Kubernetes best practices
Antithesis runs your system in our Kubernetes testing environment using the manifests you provide. For an overview of the process, please check out our Kubernetes Setup guide.
Here are some best practices we suggest you follow to get the most out of your testing.
Do’s
Set namespaces explicitly
Resources without a namespace are placed in default, which can create ambiguity and complicate pod-to-service communication. Always set the namespace field for every resource. You may still use default, but do so explicitly.
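As a minimal sketch, the manifests below declare a namespace and pin a ConfigMap to it. The names my-app and app-config are hypothetical placeholders, not values Antithesis requires.

```yaml
# Hypothetical namespace; the name is an assumption for illustration.
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: my-app   # set explicitly, even when the value is "default"
data:
  LOG_LEVEL: info
```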
Include required resources
Always include manifests for everything your workload needs (e.g., Namespace, ServiceAccount, RoleBinding). kapp ensures correct apply ordering, but the resources themselves must be defined. Refer to our Kubernetes environment for a list of what is pre-provisioned.
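For illustration, a workload that reads ConfigMaps through the API might ship its own identity and permissions as sketched below. All names (app-sa, configmap-reader, my-app) are hypothetical, and the example assumes the my-app Namespace is defined elsewhere in the same manifest set.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: my-app
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: my-app
rules:
  - apiGroups: [""]            # "" is the core API group
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-sa-configmap-reader
  namespace: my-app
subjects:
  - kind: ServiceAccount
    name: app-sa
    namespace: my-app
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: configmap-reader
```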
Use readiness probes
Always define readiness probes. A Pod in the Running phase is not necessarily ready to serve traffic. Readiness probes provide the explicit signal that initialization is complete and the workload can safely receive requests. Without them, fuzzing or other tests may start before the system is prepared.
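A readiness probe on a Deployment could look roughly like the sketch below. The image reference, the port 8080, and the /healthz path are assumptions for illustration; substitute whatever health endpoint your service actually exposes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          # Placeholder image reference; replace with your own digest.
          image: registry.example.com/api@sha256:<digest>
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz     # assumed health endpoint
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
```

Until the probe succeeds, the Pod is excluded from Service endpoints and the Deployment does not count it in availableReplicas.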
Tune liveness probes
Set conservative liveness probe values. Probes that are too aggressive may kill Pods unnecessarily in a single-node K3s environment, especially during faults.
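What counts as conservative depends on your workload. The fragment below is one possible starting point, not a set of values recommended by Antithesis; the endpoint and all numbers are assumptions.

```yaml
# Container-level fragment; place under spec.template.spec.containers[].
livenessProbe:
  httpGet:
    path: /healthz          # assumed health endpoint
    port: 8080
  initialDelaySeconds: 30   # give the process time to start
  periodSeconds: 20
  timeoutSeconds: 5
  failureThreshold: 6       # six consecutive failures before a restart
```

With these values the kubelet restarts the container only after about two minutes of consecutive failures, which leaves room for a transient fault to clear.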
Use Deployments or StatefulSets instead of standalone Pods
Prefer controllers over bare Pods. Deployments and StatefulSets track readiness, support rolling updates and automatic restarts, and expose useful status fields (such as availableReplicas) that help you avoid declaring success prematurely. Use StatefulSets for clustered services that need stable identities or storage (e.g., etcd).
Set resource requests and limits
Define resources.requests and resources.limits. Ensure any ResourceQuota or LimitRange manifests you include are compatible with the node specs. Quotas that are too small, or limits that exceed node capacity, will prevent Pods from scheduling.
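As a sketch, requests and limits are set per container. The numbers below are arbitrary illustrations and should be sized against the actual node specs.

```yaml
# Container-level fragment; place under spec.template.spec.containers[].
resources:
  requests:
    cpu: 250m        # a quarter of one CPU core
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 512Mi
```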
Use digests instead of tags
Reference container images by digest rather than tag. Digests are immutable, so they prevent mismatches between a tag and the image it points to, ensuring consistency and repeatability.
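A digest reference has the form repository@sha256:digest. The registry, repository, and digest below are placeholders, not real values.

```yaml
# Pod-template fragment; place under spec.template.spec.
containers:
  - name: api
    # Placeholder: substitute your own registry, repository, and digest.
    image: registry.example.com/my-org/api@sha256:<64-hex-character-digest>
    # imagePullPolicy is left unset, so the default (IfNotPresent for
    # digest references) applies.
```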
Gate bootstrap workloads
Bootstrap tasks such as init containers, Jobs, or CronJobs may cause failures if they run before their dependencies are ready. Ensure they are gated appropriately (for example, by waiting on Service endpoints or using readiness probes).
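One way to gate a workload is an init container that polls an in-cluster dependency before the main containers start. The sketch below assumes a hypothetical Service named backend in the my-app namespace that serves /healthz on port 8080, and a BusyBox image that is already available to the cluster.

```yaml
# Pod-template fragment; place under spec.template.spec of a Deployment or Job.
initContainers:
  - name: wait-for-backend
    image: registry.example.com/busybox@sha256:<digest>   # placeholder
    command:
      - sh
      - -c
      - |
        # Poll the in-cluster Service (not the internet) until it answers.
        until wget -q -T 2 --spider http://backend.my-app.svc.cluster.local:8080/healthz; do
          echo "waiting for backend"
          sleep 2
        done
```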
Check Kubernetes compatibility
The K3s version we run may differ from your cluster's version, which can surface issues such as removed or deprecated APIs (e.g., PodSecurityPolicy). Alpha or beta feature gates you rely on may be disabled or unavailable in K3s. Validate your manifests against the K3s version we use before submitting.
Use local-path storage
PersistentVolumeClaims must use the local-path storage class (or leave storageClassName unset to default to it). PVCs with other storage classes will remain stuck in Pending. Pre-create any referenced directories to match the paths in your manifests; otherwise volume mounts will fail and Pods may enter CrashLoopBackOff or remain Pending.
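A claim against the local-path class might look like the sketch below. The claim name, namespace, and size are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  namespace: my-app
spec:
  storageClassName: local-path   # or omit this field to use the default class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

In upstream K3s the local-path class uses WaitForFirstConsumer binding, so a claim like this typically shows Pending until a Pod mounts it; that is expected and differs from a claim stuck on an unknown storage class.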
Make PodDisruptionBudgets upgrade-friendly
If you’re interested in restart or upgrade testing, avoid setting PodDisruptionBudgets with minAvailable: 1 on single-replica Deployments, as this can block upgrades. Adjust them to allow restart and upgrade testing.
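One possible adjustment is to express the budget as maxUnavailable, which still permits the single replica to be evicted. The names and labels below are hypothetical.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: my-app
spec:
  maxUnavailable: 1   # allows the only replica to be disrupted during upgrades
  selector:
    matchLabels:
      app: api
```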
Don’ts
Don’t depend on the internet
Our air-gapped cluster has no external connectivity. Any attempt to pull images, download packages, or fetch external resources from the internet will fail. Ensure everything your workload needs is preloaded or provided locally. Common pitfalls include init containers that use curl or wget, Helm charts referencing public registries, and bootstrap scripts that install packages online.
Don’t set imagePullPolicy: Always
In our air-gapped cluster, all container images are pre-pulled. Setting imagePullPolicy: Always forces Kubernetes to contact a registry on every Pod start, which will fail. Instead, reference images by digest and rely on the default pull policy.
Don’t use latest tags
The latest tag is mutable and can change over time, causing inconsistent and non-reproducible deployments. Kubernetes also defaults to imagePullPolicy: Always when latest is used, which in an air-gapped cluster results in ErrImagePull failures. Always reference images by specific tag or digest instead.
Don’t duplicate resource names
Resources of the same kind within the same namespace must have unique names. Defining duplicates will cause startup failures.
Don’t use underscores in names or hostnames
Kubernetes resource names and hostnames must follow RFC 1123. Only lowercase letters, digits, and hyphens are allowed. Underscores will break DNS resolution.
Don’t use privileged containers
Setting securityContext.privileged: true violates the Baseline level of the Pod Security Standards and should be avoided.
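A container-level securityContext that stays clear of privileged mode could be sketched as follows. Dropping all capabilities and disabling privilege escalation go beyond what the text above requires and are shown as optional hardening, on the assumption that your workload does not need them.

```yaml
# Container-level fragment; place under spec.template.spec.containers[].
securityContext:
  privileged: false
  allowPrivilegeEscalation: false   # optional hardening (assumption)
  capabilities:
    drop: ["ALL"]                   # optional hardening (assumption)
```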
Don’t use hostPath volumes
hostPath volumes bypass storage isolation and are restricted by the CIS Kubernetes Benchmark. Use the built-in local-path provisioner instead. In your PVCs, either leave storageClassName unset or set storageClassName: local-path.
Don’t use ReadWriteMany volumes
The local-path provisioner does not support ReadWriteMany (RWX). Use ReadWriteOnce (RWO) instead. On single-node clusters, multiple replicas can still share an RWO volume, but attempting RWX will leave PVCs stuck in Pending.
Don’t oversubscribe resources
Pods will remain in Pending if their requested resources exceed the capacity of the node. Likewise, ResourceQuota or LimitRange manifests that are incompatible with the node (quotas that are too small, or limits that exceed node capacity) will block scheduling. Always size requests and limits to match the node specifications.
Don’t hardcode IPs or CIDRs
Avoid setting fixed clusterIP values, ipBlock ranges in NetworkPolicies, or static IPs in container args or env vars. These may overlap with K3s Service or Pod CIDRs and break routing. Use DNS or Service names instead.
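For example, a container can be pointed at a dependency through its Service DNS name rather than an address. The Service name postgres, the namespace my-app, and the variable names are hypothetical, and the fully qualified form assumes the default cluster.local cluster domain.

```yaml
# Container-level fragment; place under spec.template.spec.containers[].
env:
  - name: DATABASE_HOST
    # <service>.<namespace>.svc.<cluster-domain>; the short form
    # "postgres.my-app" also resolves from inside the cluster.
    value: postgres.my-app.svc.cluster.local
  - name: DATABASE_PORT
    value: "5432"
```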
Don’t rely on Service type LoadBalancer
Our K3s cluster has no cloud provider and servicelb is disabled, so Services of type LoadBalancer will stay in Pending. Use ClusterIP for pod-to-pod traffic, and NodePort only if you need to access a Service from the host machine running K3s.
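A Service that previously used type LoadBalancer can usually be switched to ClusterIP with no other changes, as in the sketch below. The names and ports are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: my-app
spec:
  type: ClusterIP     # the default type, stated explicitly here
  selector:
    app: api
  ports:
    - name: http
      port: 80        # port other Pods connect to
      targetPort: 8080
```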
Don’t assume Services always load balance
A standard ClusterIP Service load balances across all ready Pods behind it. A headless Service (clusterIP: None) does not: it only returns the Pod IPs, and load balancing must be handled client-side.
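For comparison, a headless Service differs from a standard one only in the clusterIP field. The sketch below assumes a hypothetical etcd workload labeled app: etcd in the my-app namespace.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd
  namespace: my-app
spec:
  clusterIP: None     # headless: DNS returns the individual Pod IPs
  selector:
    app: etcd
  ports:
    - name: client
      port: 2379
```

A client resolving etcd.my-app gets one record per ready Pod and has to choose among them itself.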
Don’t require multiple nodes
Our cluster has only a single node. It has no taints, so any workload can run on it. But rules that require multiple nodes, such as Pod anti-affinity, topology spread constraints, or policies that forbid replicas on the same node, will prevent Pods from scheduling.
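If you want to keep anti-affinity for production but still schedule on one node, one option is to express it as a preference rather than a requirement. The label app: api is a hypothetical example.

```yaml
# Pod-template fragment; place under spec.template.spec.
affinity:
  podAntiAffinity:
    # "preferred" is a soft rule: the scheduler spreads replicas when it can,
    # but still places them on the same node when there is no alternative.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: api
          topologyKey: kubernetes.io/hostname
```

The requiredDuringSchedulingIgnoredDuringExecution form of the same rule would leave every replica after the first in Pending on a single-node cluster.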