
Kubernetes best practices

Antithesis runs your system in our Kubernetes testing environment using the manifests you provide. For an overview of the process, please check out our Kubernetes Setup guide.

Here are some best practices we suggest you follow to get the most out of your testing.

Do’s

Set namespaces explicitly

Resources without a namespace are placed in default, which can create ambiguity and complicate pod-to-service communication. Always set the namespace field on every namespaced resource. You may still use default, but do so explicitly.
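
As an illustration, here is a minimal sketch of a Namespace and a Service that sets its namespace explicitly. The names demo and web, the app: web label, and the port numbers are hypothetical placeholders, not values Antithesis requires.

```yaml
# Hypothetical example: the namespace "demo" and Service "web" are placeholders.
apiVersion: v1
kind: Namespace
metadata:
  name: demo
---
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: demo        # set explicitly, even when the value is "default"
spec:
  selector:
    app: web
  ports:
    - port: 80           # port exposed by the Service
      targetPort: 8080   # port the container listens on
```

Assuming the cluster domain is the usual cluster.local, Pods in other namespaces would reach this Service at web.demo.svc.cluster.local.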

Include required resources

Always include manifests for everything your workload needs, such as Namespace, ServiceAccount, and RoleBinding resources. kapp ensures correct apply ordering, but the resources themselves must be defined. Refer to our Kubernetes environment for a list of what is pre-provisioned.

Use readiness probes

Always define readiness probes. A Pod in the Running phase is not necessarily ready to serve traffic. Readiness probes provide the explicit signal that initialization is complete and the workload can safely receive requests. Without them, fuzzing or other tests may start before the system is prepared.
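
A readiness probe might look like the following sketch. The Deployment name, labels, image reference, /healthz path, port, and timing values are all assumptions for illustration; use whatever endpoint your application actually exposes.

```yaml
# Hypothetical Deployment; names, image, path, port, and timings are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.4.2
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz        # assumed health endpoint
              port: 8080
            initialDelaySeconds: 5  # wait before the first check
            periodSeconds: 5        # check every 5 seconds
            failureThreshold: 3     # mark unready after 3 consecutive failures
```

Until the probe succeeds, the Pod is not added to Service endpoints, so traffic is held back until the workload reports itself ready.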

Tune liveness probes

Set conservative liveness probe values. Probes that are too aggressive may kill Pods unnecessarily in a single-node K3s environment, especially during faults.
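
As a rough sketch, a conservative liveness probe could use a long initial delay and tolerate several consecutive failures. The fragment below belongs under a container spec; the path, port, and numbers are illustrative assumptions, not values recommended by Antithesis.

```yaml
# Container-level fragment; all values are illustrative placeholders.
livenessProbe:
  httpGet:
    path: /healthz          # assumed health endpoint
    port: 8080
  initialDelaySeconds: 30   # give the process time to start
  periodSeconds: 20         # probe infrequently
  timeoutSeconds: 5         # allow slow responses while faults are active
  failureThreshold: 6       # restart only after 6 consecutive failures
```

With these numbers the container is restarted only after roughly two minutes of continuous failures, which leaves room for transient slowdowns.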

Use Deployments or StatefulSets instead of standalone Pods

Prefer controllers over bare Pods. Deployments and StatefulSets track readiness, support rolling updates and automatic restarts, and expose status fields (such as availableReplicas) that let you confirm the workload is actually available instead of declaring success prematurely. Use StatefulSets for clustered services that need stable identities or storage (e.g., etcd).

Set resource requests and limits

Define resources.requests and resources.limits. Ensure any ResourceQuota or LimitRange manifests you include are compatible with the node specs. Quotas that are too small, or limits that exceed node capacity, will prevent Pods from scheduling.
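
For example, a container-level resources block could look like the sketch below. The quantities are placeholders; the right values depend on your workload and on the node specs, which are not stated here.

```yaml
# Container-level fragment; quantities are illustrative placeholders.
resources:
  requests:
    cpu: 250m        # a quarter of a CPU core reserved for scheduling
    memory: 256Mi
  limits:
    cpu: "1"         # hard ceiling of one CPU core
    memory: 512Mi    # the container is OOM-killed above this
```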

Use digests instead of tags

Reference container images by digest rather than tag. Digests are immutable and prevent tag–digest mismatches, ensuring consistency and repeatability.
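
A digest-pinned image reference has the form repository@sha256:digest. In the sketch below, the registry, repository, and container name are hypothetical, and the digest is a placeholder that you would replace with the real value for your image.

```yaml
# Container list fragment; registry, repository, and digest are placeholders.
containers:
  - name: web
    # Replace <64-hex-digest> with the actual sha256 digest of your image.
    image: "registry.example.com/web@sha256:<64-hex-digest>"
    # imagePullPolicy is left unset so the default policy applies.
```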

Gate bootstrap workloads

Bootstrap tasks such as init containers, Jobs, or CronJobs may cause failures if they run before their dependencies are ready. Ensure they are gated appropriately (for example, by waiting on Service endpoints or using readiness probes).
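
One common way to gate a workload is an init container that loops until a dependency's Service name resolves. The sketch below assumes a hypothetical Service named db in a namespace named demo, the default cluster.local cluster domain, and a busybox image that is already available to the cluster.

```yaml
# Pod spec fragment; the image, Service name, and namespace are placeholders.
initContainers:
  - name: wait-for-db
    image: registry.example.com/busybox:1.36
    command:
      - sh
      - -c
      # Retry every 2 seconds until the Service's DNS name resolves.
      - 'until nslookup db.demo.svc.cluster.local; do echo waiting for db; sleep 2; done'
```

Note that this only checks DNS resolution. If the dependency must be fully ready, a stricter gate would probe its port or health endpoint instead.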

Check Kubernetes compatibility

The K3s version we run may differ from your cluster's version, so manifests that work in your cluster can fail here, for example because they use APIs that have since been deprecated or removed (e.g., PodSecurityPolicy). Alpha or beta feature gates you rely on may also be disabled or unavailable in K3s. Validate your manifests against the K3s version we use before submitting.

Use local-path storage

PersistentVolumeClaims must use the local-path storage class (or omit storageClassName to default to it). PVCs with other storage classes will remain stuck in Pending. Pre-create any referenced directories to match the paths in your manifests; otherwise volume mounts will fail and Pods may enter CrashLoopBackOff or remain Pending.
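
A PVC that follows this guidance might look like the sketch below. The claim name, namespace, and requested size are hypothetical.

```yaml
# Hypothetical PVC; name, namespace, and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  namespace: demo
spec:
  storageClassName: local-path   # or omit this field to use the default class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```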

Make PodDisruptionBudgets upgrade-friendly

If you’re interested in restart or upgrade testing, avoid setting PodDisruptionBudgets with minAvailable: 1 on single-replica Deployments, as this can block upgrades. Adjust them to allow restart and upgrade testing.
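
As one possible adjustment, a PodDisruptionBudget can be expressed with maxUnavailable so that a single-replica workload can still be evicted. The name, namespace, and label selector below are hypothetical.

```yaml
# Hypothetical PDB; name, namespace, and labels are placeholders.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
  namespace: demo
spec:
  maxUnavailable: 1   # permits eviction of the only replica
  selector:
    matchLabels:
      app: web
```

With minAvailable: 1 and one replica, no voluntary disruption is ever allowed; with maxUnavailable: 1, one Pod at a time may be taken down.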


Don’ts

Don’t depend on the internet

Our air-gapped cluster has no external connectivity. Any attempt to pull images, download packages, or fetch external resources from the internet will fail. Ensure everything your workload needs is preloaded or provided locally. Common pitfalls include init containers that use curl or wget, Helm charts referencing public registries, and bootstrap scripts that install packages online.

Don’t set imagePullPolicy: Always

In our air-gapped cluster, all container images are pre-pulled. Setting imagePullPolicy: Always forces Kubernetes to fetch from a registry and will fail. Instead, reference images by digest and rely on the default pull policy.

Don’t use latest tags

The latest tag is mutable and can change over time, causing inconsistent and non-reproducible deployments. Kubernetes also defaults to imagePullPolicy: Always when latest is used, which in an air-gapped cluster results in ErrImagePull failures. Always reference images by specific tag or digest instead.

Don’t duplicate resource names

Resources of the same kind within the same namespace must have unique names. Defining duplicates will cause startup failures.

Don’t use underscores in names or hostnames

Kubernetes resource names and hostnames must follow RFC 1123. Only lowercase letters, digits, and hyphens are allowed, and a name must start and end with a letter or digit. Underscores will break DNS resolution.

Don’t use privileged containers

securityContext.privileged: true violates the Baseline profile of the Kubernetes Pod Security Standards and should be avoided.

Don’t use hostPath volumes

hostPath volumes bypass storage isolation and are restricted by the CIS Kubernetes Benchmark. Use the built-in local-path provisioner instead. In your PVCs, either omit storageClassName or set storageClassName: local-path.

Don’t use ReadWriteMany volumes

The local-path provisioner does not support ReadWriteMany (RWX). Use ReadWriteOnce (RWO) instead. On single-node clusters, multiple replicas can still share an RWO volume, but attempting RWX will leave PVCs stuck in Pending.

Don’t oversubscribe resources

Pods will remain in Pending if their requested resources exceed the capacity of the node. Likewise, ResourceQuota or LimitRange manifests that set quotas too small or limits above node capacity will keep Pods from being scheduled. Always size requests and limits to match the node specifications.

Don’t hardcode IPs or CIDRs

Avoid setting fixed clusterIP values, ipBlock ranges in NetworkPolicies, or static IPs in container args or env vars. These may overlap with K3s Service or Pod CIDRs and break routing. Use DNS or Service names instead.
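
For instance, instead of passing a fixed IP to a container, you can pass the Service's DNS name. The variable name DB_HOST, the Service db, and the namespace demo are hypothetical, and the cluster.local suffix assumes the default cluster domain.

```yaml
# Container-level fragment; variable, Service, and namespace names are placeholders.
env:
  - name: DB_HOST
    # Resolved through cluster DNS at runtime rather than a hardcoded address.
    value: db.demo.svc.cluster.local
```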

Don’t rely on Service type LoadBalancer

Our K3s cluster has no cloud provider and servicelb is disabled, so Services of type LoadBalancer will stay in Pending. Use ClusterIP for pod-to-pod traffic, and NodePort only if you need to access a Service from the host machine running K3s.

Don’t assume Services always load balance

A standard ClusterIP Service load balances across all ready Pods behind it. A headless Service (clusterIP: None) does not: DNS returns the Pod IPs directly, and load balancing must be handled client-side.
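
The difference comes down to a single field. The sketch below defines both kinds of Service for the same set of Pods; the names, namespace, label, and port are hypothetical.

```yaml
# Hypothetical Services; names, namespace, labels, and port are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: db              # standard ClusterIP Service: one virtual IP, load balanced
  namespace: demo
spec:
  selector:
    app: db
  ports:
    - port: 5432
---
apiVersion: v1
kind: Service
metadata:
  name: db-headless     # headless Service: DNS returns the individual Pod IPs
  namespace: demo
spec:
  clusterIP: None
  selector:
    app: db
  ports:
    - port: 5432
```

Clients that connect to db-headless receive one DNS record per ready Pod and must choose among them on their own.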

Don’t require multiple nodes

Our cluster has only a single node. It has no taints, so any workload can run on it. However, rules that require multiple nodes, such as required Pod anti-affinity, topology spread constraints, or policies that forbid replicas on the same node, can prevent Pods from scheduling.
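
If you want to keep anti-affinity in your manifests, one option (a suggestion, not something the guidance above prescribes) is to express it as a soft preference rather than a hard requirement, so replicas can still be co-located on a single node. The app: web label is hypothetical.

```yaml
# Pod spec fragment; the label is a placeholder.
affinity:
  podAntiAffinity:
    # "preferred" is a soft rule: the scheduler tries to spread replicas
    # across nodes but still places them when only one node exists.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: web
```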
