Kubernetes best practices

Antithesis runs your system in our Kubernetes testing environment using the manifests you provide. For an overview of the process, please check out our Kubernetes Setup guide.

Here are some best practices we suggest you follow to get the most out of your testing.

Do’s

Set namespaces explicitly

Resources without a namespace are placed in the default namespace, which can create ambiguity and complicate pod-to-service communication. Always set the namespace field on every namespaced resource. You may still use default, but do so explicitly.
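A minimal sketch, assuming a hypothetical namespace named `my-app` and a hypothetical Service named `api`. The Namespace itself is cluster-scoped and has no namespace field, while the Service sets `metadata.namespace` explicitly.

```yaml
# "my-app" and "api" are hypothetical placeholder names.
apiVersion: v1
kind: Namespace            # cluster-scoped: no metadata.namespace
metadata:
  name: my-app
---
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: my-app        # set explicitly, even when you intend to use "default"
spec:
  selector:
    app: api
  ports:
    - port: 8080
      targetPort: 8080
```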

Include required resources

Always include manifests for everything your workload needs, such as Namespace, ServiceAccount, and RoleBinding resources. kapp ensures correct apply ordering, but the resources themselves must be defined. Refer to our Kubernetes environment documentation for a list of what is pre-provisioned.
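As an illustration, a workload that reads ConfigMaps through its own ServiceAccount needs all three of the following objects in its manifests. The names, namespace, and permissions here are hypothetical; grant only what your workload actually uses.

```yaml
# Hypothetical names: namespace "my-app", ServiceAccount "api".
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api
  namespace: my-app
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: api-read-configmaps
  namespace: my-app
rules:
  - apiGroups: [""]                 # "" is the core API group
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: api-read-configmaps
  namespace: my-app
subjects:
  - kind: ServiceAccount
    name: api
    namespace: my-app
roleRef:                            # binds the Role above to the ServiceAccount
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: api-read-configmaps
```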

Use readiness probes

Always define readiness probes. A Pod in the Running phase is not necessarily ready to serve traffic. Readiness probes provide the explicit signal that initialization is complete and the workload can safely receive requests. Without them, fuzzing or other tests may start before the system is prepared.
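A sketch of a readiness probe in a Pod template, assuming a hypothetical HTTP endpoint `/ready` on port 8080 that succeeds only once initialization has finished. The image reference and timing values are placeholders.

```yaml
# Pod template fragment (for example, under a Deployment's spec.template).
spec:
  containers:
    - name: api
      image: registry.example.com/api@sha256:<digest>   # placeholder
      ports:
        - containerPort: 8080
      readinessProbe:
        httpGet:
          path: /ready          # hypothetical; should fail until startup is complete
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 3     # marked not-ready after 3 consecutive failures
```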

Tune liveness probes

Set conservative liveness probe values. Probes with short timeouts or low failure thresholds may restart containers unnecessarily in our single-node K3s environment, especially during faults, when responses can be temporarily slow.
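A sketch of a tolerant liveness probe. The `/healthz` path, port, and all timing values are illustrative assumptions, not recommended numbers; tune them to how long your service can legitimately stall.

```yaml
# Container spec fragment.
livenessProbe:
  httpGet:
    path: /healthz              # hypothetical endpoint
    port: 8080
  initialDelaySeconds: 30       # give the process time to start before the first check
  periodSeconds: 20
  timeoutSeconds: 5             # tolerate slow responses while faults are active
  failureThreshold: 6           # roughly 2 minutes of consecutive failures before a restart
```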

Use Deployments or StatefulSets instead of standalone Pods

Prefer controllers over bare Pods. Deployments and StatefulSets track readiness, support rolling updates, replace failed Pods automatically, and expose status fields (such as availableReplicas) that show whether the workload is actually up, so success isn't declared prematurely. Use StatefulSets for clustered services that need stable identities or storage (e.g., etcd).
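A skeleton StatefulSet for a clustered service, using etcd as in the text above. The namespace, image, mount path, and storage size are hypothetical, the etcd-specific flags are omitted, and the headless Service named in `serviceName` must be defined separately.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
  namespace: my-app               # hypothetical
spec:
  serviceName: etcd               # stable DNS names like etcd-0.etcd.my-app.svc.cluster.local
  replicas: 3
  selector:
    matchLabels:
      app: etcd
  template:
    metadata:
      labels:
        app: etcd
    spec:
      containers:
        - name: etcd
          image: registry.example.com/etcd@sha256:<digest>   # placeholder
          volumeMounts:
            - name: data
              mountPath: /var/lib/etcd
  volumeClaimTemplates:           # one PVC per Pod, kept across Pod restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: local-path
        resources:
          requests:
            storage: 1Gi
```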

Set resource requests

Define resources.requests for all your containers to ensure correct scheduling and predictable performance. Memory limits are highly recommended: they cap each container's memory, so a runaway process is OOM-killed on its own instead of exhausting the node and destabilizing other workloads. CPU limits are optional; leaving them unset allows Pods to burst when capacity is available. Keep total CPU requests under 1 CPU (1000m) and total memory requests under 10 GiB. If you need more memory, contact Antithesis support.
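A sketch of a container's resources block with illustrative values. With three replicas, this container requests 750m CPU and 1.5 GiB of memory in total, which stays under the 1 CPU and 10 GiB budget; remember to add up every container in every workload you submit.

```yaml
# Container spec fragment; values are illustrative.
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    memory: 1Gi       # exceeding this OOM-kills only this container
    # No CPU limit: the container may burst when spare CPU is available.
```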

Use digests instead of tags

Reference container images by digest rather than tag. Digests are immutable and prevent tag–digest mismatches, ensuring consistency and repeatability.
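For comparison, here are the two forms of image reference in a container spec. The registry path, tag, and digest are placeholders; substitute the real sha256 digest of the image you built.

```yaml
# Container spec fragment.
containers:
  - name: api
    # Mutable: the tag can later be moved to a different image.
    # image: registry.example.com/api:1.4.2
    # Immutable: the digest identifies exactly one image.
    image: registry.example.com/api@sha256:<digest>
```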

Gate bootstrap workloads

Bootstrap tasks such as init containers, Jobs, or CronJobs may cause failures if they run before their dependencies are ready. Ensure they are gated appropriately (for example, by waiting on Service endpoints or using readiness probes).
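One way to gate on a dependency is an init container that loops until the dependency accepts connections. This sketch assumes a hypothetical `db` Service in `my-app` on port 5432 and a hypothetical preloaded image that provides `sh` and an `nc` supporting `-z`. With a ClusterIP Service, the connection typically succeeds only once at least one backing Pod is ready.

```yaml
# Pod template fragment.
spec:
  initContainers:
    - name: wait-for-db
      image: registry.example.com/tools@sha256:<digest>   # hypothetical; must be preloaded
      command:
        - sh
        - -c
        - |
          # Retry every 2 seconds until the db Service accepts TCP connections.
          until nc -z db.my-app.svc.cluster.local 5432; do
            echo "waiting for db"
            sleep 2
          done
  containers:
    - name: api
      image: registry.example.com/api@sha256:<digest>     # starts only after the loop exits
```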

Check Kubernetes compatibility

The K3s version we run may differ from your cluster's version, which can cause issues such as removed or deprecated apiVersions (e.g., PodSecurityPolicy, removed in Kubernetes 1.25). Alpha or beta feature gates you rely on may be disabled or unavailable in K3s. Validate your manifests against the K3s version we use before submitting.
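As a concrete case, CronJob manifests written against batch/v1beta1 no longer apply on Kubernetes 1.25 or later, while batch/v1 has been available since 1.21. The CronJob below is hypothetical; only its apiVersion is the point.

```yaml
# apiVersion: batch/v1beta1   <- removed in Kubernetes 1.25
apiVersion: batch/v1
kind: CronJob
metadata:
  name: compact                # hypothetical
  namespace: my-app
spec:
  schedule: "*/10 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: compact
              image: registry.example.com/tools@sha256:<digest>   # placeholder
```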

Use `local-path` storage

PersistentVolumeClaims must use the local-path storage class (or omit storageClassName entirely so the default, local-path, is used; setting it to an empty string instead disables dynamic provisioning). PVCs with other storage classes will remain stuck in Pending. Pre-create any referenced directories to match the paths in your manifests; otherwise, volume mounts will fail and Pods may enter CrashLoopBackOff or remain Pending.
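A minimal PVC that the local-path provisioner can satisfy. The claim name, namespace, and size are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                      # hypothetical
  namespace: my-app
spec:
  storageClassName: local-path    # or omit the field; do not set it to ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```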

Make `PodDisruptionBudgets` upgrade-friendly

If you’re interested in restart or upgrade testing, avoid PodDisruptionBudgets with minAvailable: 1 on single-replica Deployments. Such a budget allows zero voluntary disruptions, so evictions (for example, draining the node during an upgrade) are blocked. Adjust them to allow restart and upgrade testing.
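A sketch of an upgrade-friendly budget for a hypothetical single-replica Deployment labeled `app: api`. Using maxUnavailable: 1 lets the one Pod be evicted, where minAvailable: 1 would not.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api                 # hypothetical
  namespace: my-app
spec:
  maxUnavailable: 1         # permits evicting the single replica during restarts and upgrades
  selector:
    matchLabels:
      app: api
```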

Don’ts

Don’t depend on the internet

Our air-gapped cluster has no external connectivity. Any attempt to pull images, download packages, or fetch external resources from the internet will fail. Ensure everything your workload needs is preloaded or provided locally. Common pitfalls include init containers that use curl or wget, Helm charts referencing public registries, and bootstrap scripts that install packages online.

Don’t set `imagePullPolicy: Always`

In our air-gapped cluster, all container images are pre-pulled. Setting imagePullPolicy: Always forces Kubernetes to fetch from a registry and will fail. Instead, reference images by digest and rely on the default pull policy, which is IfNotPresent for digest references.
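In practice this means leaving the field out of the container spec entirely, as in this fragment with a placeholder image reference.

```yaml
# Container spec fragment.
containers:
  - name: api
    image: registry.example.com/api@sha256:<digest>   # placeholder
    # imagePullPolicy omitted: for a digest reference Kubernetes defaults to
    # IfNotPresent, so the pre-pulled image is used without contacting a registry.
```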

Don’t use `latest` tags

The latest tag is mutable and can change over time, causing inconsistent and non-reproducible deployments. Kubernetes also defaults to imagePullPolicy: Always when an image uses the latest tag or has neither a tag nor a digest, which in an air-gapped cluster results in ErrImagePull failures. Always reference images by a specific tag or digest instead.

Don’t duplicate resource names

Resources of the same kind within the same namespace must have unique names. Defining duplicates will cause startup failures.

Don’t use underscores in names or hostnames

Kubernetes resource names and hostnames must follow RFC-1123. Only lowercase letters, digits, and hyphens are allowed. Underscores will break DNS resolution.
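For example, a Service that might naturally be called order_service has to use a hyphen instead; the API server rejects the underscore form. The names here are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: order-service       # "order_service" would be rejected
  namespace: my-app
spec:
  selector:
    app: order-service
  ports:
    - port: 8080
```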

Don’t use privileged containers

Setting securityContext.privileged: true violates the Pod Security Standards baseline policy and should be avoided.
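A container securityContext that stays within the baseline policy. Adding NET_BIND_SERVICE is only an illustration for a process that must bind a port below 1024; drop that line if you don't need it.

```yaml
# Container spec fragment.
securityContext:
  privileged: false                 # the default; true violates the baseline policy
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
    add: ["NET_BIND_SERVICE"]       # illustrative; only if binding ports below 1024
```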

Don’t use `hostPath` volumes

hostPath volumes are discouraged. They require the path to exist and be managed on the host node, which isn’t guaranteed, so they are acceptable only for specific system-level tools (e.g., Prometheus node exporter) whose required host paths are sure to exist.
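For scratch data that only needs to live as long as the Pod, an emptyDir volume avoids depending on the host filesystem; data that must survive Pod restarts belongs in a local-path PVC instead. The names and mount path below are hypothetical.

```yaml
# Pod spec fragment.
spec:
  containers:
    - name: api
      image: registry.example.com/api@sha256:<digest>   # placeholder
      volumeMounts:
        - name: scratch
          mountPath: /tmp/work
  volumes:
    - name: scratch
      emptyDir: {}          # created with the Pod and deleted with it; no host path needed
```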

Don’t use `ReadWriteMany` volumes

The local-path provisioner does not support ReadWriteMany (RWX). Use ReadWriteOnce (RWO) instead. On single-node clusters, multiple replicas can still share an RWO volume, but attempting RWX will leave PVCs stuck in Pending.

Don’t oversubscribe resources

Pods stay in Pending if their requests exceed available node capacity or the aggregate policy limits of 1 CPU and 10 GiB of memory requested in total. ResourceQuota or LimitRange manifests must not conflict with the node's capacity or these limits, as this will block scheduling.
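If you do define a ResourceQuota, a sketch like the following keeps it inside the cluster-wide limits; the namespace and exact values are hypothetical. Note that once a quota constrains requests.cpu or requests.memory, Pods in that namespace that omit those requests are rejected unless a LimitRange supplies defaults.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute             # hypothetical
  namespace: my-app
spec:
  hard:
    requests.cpu: 900m      # below the 1 CPU aggregate limit
    requests.memory: 8Gi    # below the 10 GiB aggregate limit
```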

Don’t hardcode IPs or CIDRs

Avoid setting fixed clusterIP values, ipBlock ranges in NetworkPolicies, or static IPs in container args or env vars. These may overlap with the K3s Service or Pod CIDRs and break routing. Use DNS or Service names for network communication instead. Loopback and wildcard addresses (e.g., 127.0.0.1 and 0.0.0.0) are fine to use.
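A sketch of environment variables that follow this rule. The variable names and the `db` Service are hypothetical; what matters is that peers are addressed by Service DNS name, and only the local bind address is a literal IP.

```yaml
# Container spec fragment.
env:
  - name: DB_HOST
    value: db.my-app.svc.cluster.local   # Service DNS name, not a Pod or Service IP
  - name: LISTEN_ADDR
    value: "0.0.0.0:8080"                # binding to all local interfaces is fine
```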

Don’t rely on Service type LoadBalancer

Our K3s cluster has no cloud provider and servicelb is disabled, so Services of type LoadBalancer will stay in Pending. Use ClusterIP for pod-to-pod traffic, and NodePort only if you need to access a Service from the host machine running K3s.
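A minimal ClusterIP Service for pod-to-pod traffic, with hypothetical names. Switching `type` to NodePort is the alternative when the host machine needs access.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api                 # hypothetical
  namespace: my-app
spec:
  type: ClusterIP           # the default; reachable at api.my-app.svc.cluster.local
  # type: NodePort          # only if the host machine running K3s needs access
  selector:
    app: api
  ports:
    - port: 8080
      targetPort: 8080
```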

Don’t assume Services always load balance

A standard ClusterIP Service load balances across all ready Pods behind it. A headless Service (clusterIP: None) does not: DNS lookups return the Pod IPs directly, and load balancing must be handled client-side.
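For contrast, a headless Service for a hypothetical etcd StatefulSet. Clients resolving its name receive the individual Pod IPs and must choose among them themselves.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd                # hypothetical
  namespace: my-app
spec:
  clusterIP: None           # headless: no virtual IP, DNS returns Pod IPs
  selector:
    app: etcd
  ports:
    - name: client
      port: 2379
```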

Don’t require multiple nodes

Our cluster has only a single node. It has no taints, so any workload can run on it. However, rules that require multiple nodes, such as Pod anti-affinity, topology spread constraints, or policies that forbid replicas on the same node, will prevent Pods from scheduling.
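If your manifests express a spreading preference, one option is the preferred form of Pod anti-affinity, which only influences scoring and never blocks scheduling. The `app: api` label is hypothetical.

```yaml
# Pod template fragment.
affinity:
  podAntiAffinity:
    # "preferred" is a soft hint, so all replicas can still land on the single node;
    # requiredDuringSchedulingIgnoredDuringExecution would leave extra replicas Pending.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: api
```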