Kubernetes best practices
Antithesis runs your system in our Kubernetes testing environment using the manifests you provide. For an overview of the process, please check out our Kubernetes Setup guide.
Here are some best practices we suggest you follow to get the most out of your testing.
Do’s
Set namespaces explicitly
Resources without a namespace are placed in default, which can create ambiguity and complicate pod-to-service communication. Always set the namespace field for every resource. You may still use default, but do so explicitly.
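As a minimal sketch, an explicitly namespaced resource might look like the following. The names my-app and my-app-config are hypothetical placeholders, not values your Antithesis environment requires.

```yaml
# Hypothetical ConfigMap; all names are placeholders for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
  namespace: my-app   # set explicitly, even when the value is "default"
data:
  LOG_LEVEL: info
```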
Include required resources
Always include manifests for everything your workload needs (e.g., Namespace, ServiceAccount, RoleBinding). kapp ensures correct apply ordering, but the resources themselves must be defined. Refer to our Kubernetes environment for a list of what is pre-provisioned.
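The sketch below shows one way to ship the supporting resources alongside a workload. Every name here (my-app, my-app-sa, pod-reader) and the permissions granted are assumptions for illustration; substitute whatever your workload actually needs.

```yaml
# Hypothetical supporting resources; names and rules are placeholders.
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-app
---
# A RoleBinding needs a Role to refer to, so the Role is defined as well.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: my-app
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-pod-reader
  namespace: my-app
subjects:
  - kind: ServiceAccount
    name: my-app-sa
    namespace: my-app
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
```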
Use readiness probes
Always define readiness probes. A Pod in the Running phase is not necessarily ready to serve traffic. Readiness probes provide the explicit signal that initialization is complete and the workload can safely receive requests. Without them, fuzzing or other tests may start before the system is prepared.
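A readiness probe on a Deployment could be sketched as follows. The image reference, port 8080, the /ready path, and the timing values are all assumptions; use the endpoint and timings that match your service.

```yaml
# Hypothetical Deployment; image, port, path, and timings are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/my-app/api:1.4.2
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /ready          # should succeed only once initialization is done
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
```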
Tune liveness probes
Set conservative liveness probe values. Probes that are too aggressive may kill Pods unnecessarily in a single-node K3s environment, especially during faults.
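For illustration, a conservative liveness probe might look like the fragment below, which belongs inside a container spec. The path, port, and numbers are assumptions rather than recommended values; the point is a generous timeout and failure threshold so that a brief stall during a fault does not restart the container.

```yaml
# Fragment of a container spec; all values are illustrative placeholders.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # give the process time to start
  periodSeconds: 20         # probe infrequently
  timeoutSeconds: 5         # tolerate slow responses
  failureThreshold: 6       # about two minutes of failures before a restart
```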
Use Deployments or StatefulSets instead of standalone Pods
Prefer controllers over bare Pods. Deployments and StatefulSets track readiness, support rolling updates and automatic restarts, and expose useful status fields (such as availableReplicas) that help you avoid declaring success prematurely. Use StatefulSets for clustered services that need stable identities or storage (e.g., etcd).
Set resource requests
Define resources.requests for all your containers to ensure correct scheduling and predictable performance. Memory limits are highly recommended: they cap each container's memory use, so a runaway process is OOM-killed on its own instead of destabilizing the node. Setting CPU limits is optional, as avoiding them allows Pods to burst when capacity is available. Ensure total CPU requests remain under 1 CPU (1000m) and total memory requests remain under 10 GiB. If higher memory limits are required, contact Antithesis support.
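A container's resources block following this advice could be sketched like this. The specific amounts are placeholders; what matters is that the requests across all your containers sum to less than 1 CPU and 10 GiB.

```yaml
# Fragment of a container spec; amounts are illustrative placeholders.
resources:
  requests:
    cpu: 250m        # counts toward the 1 CPU (1000m) total
    memory: 512Mi    # counts toward the 10 GiB total
  limits:
    memory: 512Mi    # memory limit set; CPU limit intentionally omitted
```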
Use digests instead of tags
Reference container images by digest rather than tag. Digests are immutable and prevent tag–digest mismatches, ensuring consistency and repeatability.
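A digest reference takes the form repository@sha256:<64 hex characters>. In the fragment below, the registry, repository, and digest are made-up placeholders; replace them with the digest of the image you actually ship.

```yaml
# Fragment of a Pod spec; registry, repository, and digest are placeholders.
containers:
  - name: api
    image: registry.example.com/my-app/api@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
    # imagePullPolicy is omitted: for digest references Kubernetes
    # defaults it to IfNotPresent.
```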
Gate bootstrap workloads
Bootstrap tasks such as init containers, Jobs, or CronJobs may cause failures if they run before their dependencies are ready. Ensure they are gated appropriately (for example, by waiting on Service endpoints or using readiness probes).
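One way to gate a workload is an init container that polls a dependency inside the cluster until it responds. The sketch below assumes a busybox image is available among your pre-loaded images and that a hypothetical Service named api in namespace my-app serves /ready on port 8080; none of these names come from the Antithesis environment.

```yaml
# Fragment of a Pod spec; image, Service name, port, and path are assumptions.
initContainers:
  - name: wait-for-api
    image: busybox:1.36   # must be provided locally; prefer a digest in practice
    command:
      - sh
      - -c
      - |
        # Poll the in-cluster Service until it answers, then exit successfully.
        until wget -q -T 2 --spider http://api.my-app.svc.cluster.local:8080/ready; do
          echo "waiting for api"
          sleep 2
        done
```

Because the request targets an in-cluster Service name, it does not depend on external connectivity.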
Check Kubernetes compatibility
The K3s version we run can differ from your cluster version, causing issues like removed or deprecated apiVersions (e.g., PodSecurityPolicy). Alpha or beta feature gates you rely on may be disabled or unavailable in K3s. Validate your manifests against the K3s version we use before submitting.
Use local-path storage
PersistentVolumeClaims must use the local-path storage class (or omit storageClassName to default to it). PVCs with other storage classes will remain stuck in Pending. Pre-create any referenced directories to match the paths in your manifests; otherwise, volume mounts will fail and pods may enter CrashLoopBackOff or remain Pending.
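A claim against the local-path storage class might be sketched as follows. The claim name, namespace, and size are hypothetical.

```yaml
# Hypothetical PVC; name, namespace, and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  namespace: my-app
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```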
Make PodDisruptionBudgets upgrade-friendly
If you’re interested in restart or upgrade testing, avoid setting PodDisruptionBudgets with minAvailable: 1 on single-replica Deployments, as this can block upgrades. Adjust them to allow restart and upgrade testing.
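One possible adjustment, shown here as a sketch, is to express the budget with maxUnavailable so that the single replica may be evicted. The names and the app: api label are placeholders, and whether this budget suits your workload is an assumption you should check.

```yaml
# Hypothetical PDB; names and selector labels are placeholders.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: my-app
spec:
  maxUnavailable: 1   # permits eviction of the only replica
  selector:
    matchLabels:
      app: api
```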
Don’ts
Don’t depend on the internet
Our air-gapped cluster has no external connectivity. Any attempt to pull images, download packages, or fetch external resources from the internet will fail. Ensure everything your workload needs is preloaded or provided locally. Common pitfalls include init containers that use curl or wget, Helm charts referencing public registries, and bootstrap scripts that install packages online.
Don’t set imagePullPolicy: Always
In our air-gapped cluster, all container images are pre-pulled. Setting imagePullPolicy: Always forces Kubernetes to fetch from a registry and will fail. Instead, reference images by digest and rely on the default pull policy.
Don’t use latest tags
The latest tag is mutable and can change over time, causing inconsistent and non-reproducible deployments. Kubernetes also defaults to imagePullPolicy: Always when latest is used, which in an air-gapped cluster results in ErrImagePull failures. Always reference images by specific tag or digest instead.
Don’t duplicate resource names
Resources of the same kind within the same namespace must have unique names. Defining duplicates will cause startup failures.
Don’t use underscores in names or hostnames
Kubernetes resource names and hostnames must follow RFC 1123: only lowercase letters, digits, and hyphens are allowed, and a name must start and end with a letter or digit. Underscores will break DNS resolution.
Don’t use privileged containers
securityContext.privileged: true violates the PodSecurity baseline and should be avoided.
Don’t use hostPath volumes
hostPath volumes are discouraged. They require the path to exist and be managed on the host node, which isn’t guaranteed. They are only acceptable for specific system-level tools (e.g., Prometheus node exporter) where the required host paths are sure to exist.
Don’t use ReadWriteMany volumes
The local-path provisioner does not support ReadWriteMany (RWX). Use ReadWriteOnce (RWO) instead. On single-node clusters, multiple replicas can still share an RWO volume, but attempting RWX will leave PVCs stuck in Pending.
Don’t oversubscribe resources
Pods enter Pending status if requested resources exceed available node capacity or the aggregate policy limits: 1 CPU total and 10 GiB total in memory requests. ResourceQuota or LimitRange manifests must not conflict with node specs or these limits, as a conflict will block scheduling.
Don’t hardcode IPs or CIDRs
Avoid setting fixed clusterIP values, ipBlock ranges in NetworkPolicies, or static IPs in container args or env vars. These may overlap with K3s Service or Pod CIDRs and break routing. Wildcard and loopback addresses (e.g., 0.0.0.0 and 127.0.0.1) are acceptable. Use DNS or Service names instead for network communication.
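As an illustration, a container can be pointed at a dependency through its Service DNS name rather than an IP. The variable names, the db Service, and the my-app namespace below are hypothetical.

```yaml
# Fragment of a container spec; variable and Service names are placeholders.
env:
  - name: DATABASE_HOST
    value: "db.my-app.svc.cluster.local"   # Service DNS name, not a fixed IP
  - name: LISTEN_ADDR
    value: "0.0.0.0:8080"                  # binding to all interfaces is fine
```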
Don’t rely on Service type LoadBalancer
Our K3s cluster has no cloud provider and servicelb is disabled, so Services of type LoadBalancer will stay in Pending. Use ClusterIP for pod-to-pod traffic, and NodePort only if you need to access a Service from the host machine running K3s.
Don’t assume Services always load balance
A standard ClusterIP Service load balances across all ready Pods behind it. A headless Service (clusterIP: None) does not: it only returns the Pod IPs, and load balancing must be handled client-side.
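The difference shows up in the manifests themselves. The sketch below pairs a standard ClusterIP Service with a headless one; the Service names, labels, and ports are hypothetical.

```yaml
# Standard Service: clients connect to one virtual IP and traffic is
# spread across the ready Pods matching the selector.
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: my-app
spec:
  type: ClusterIP
  selector:
    app: api
  ports:
    - port: 8080
      targetPort: 8080
---
# Headless Service: DNS returns the individual Pod IPs, so the client
# chooses which Pod to talk to.
apiVersion: v1
kind: Service
metadata:
  name: etcd
  namespace: my-app
spec:
  clusterIP: None
  selector:
    app: etcd
  ports:
    - port: 2379
      targetPort: 2379
```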
Don’t require multiple nodes
Our cluster has only a single node. It has no taints, so any workload can run on it. However, rules that require multiple nodes, such as Pod anti-affinity, topology spread constraints, or policies that forbid replicas on the same node, will prevent pods from scheduling.
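If your manifests carry anti-affinity for production reasons, one option is to express it as a preference instead of a requirement, so the scheduler can still place every replica on the single node. This is a sketch that assumes a soft rule is acceptable for your workload; the app: api label is a placeholder.

```yaml
# Fragment of a Pod spec; the label is a placeholder.
affinity:
  podAntiAffinity:
    # "preferred" rules are best-effort and do not block scheduling,
    # unlike requiredDuringSchedulingIgnoredDuringExecution.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: api
          topologyKey: kubernetes.io/hostname
```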