Pausing faults

There are times during a test when you need a window of normal, fault-free operation: to test a system’s liveness properties, to check an invariant, or to verify that the system can recover from a failure. Antithesis provides a mechanism to request such a window, called a “quiet period”.

Antithesis injects a ready-to-use binary into every container or pod in a test run and exposes its path in the ANTITHESIS_STOP_FAULTS environment variable. A request to stop faults affects all fault types except clock jitter, which continues normally during a quiet period. It also affects all containers or pods, so the entire system gets a recovery period, not just selected services. You must provide the quiet period duration in seconds.

Invoke it from any test command or script. The example is in bash, but it translates easily to other programming languages:

[ "${ANTITHESIS_STOP_FAULTS}" ] && "${ANTITHESIS_STOP_FAULTS}" <duration_seconds>
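If your scripts request quiet periods in several places, the same guard can be wrapped in a small shell function. This is a sketch, not part of Antithesis. The name stop_faults is hypothetical. Rejecting anything other than a positive whole number of seconds is an assumption about what the binary accepts. Like the one-liner, it skips the call when ANTITHESIS_STOP_FAULTS is unset, but it returns 0 in that case, so the same script also runs cleanly outside Antithesis.

```bash
# stop_faults is a hypothetical helper name, not part of Antithesis.
# Usage: stop_faults <duration_seconds>
stop_faults() {
  # Assumption: the duration must be a positive whole number of seconds
  # (rejects empty input, non-digits, "0", and leading zeros).
  case "$1" in
    ''|*[!0-9]*|0*)
      echo "stop_faults: expected a positive whole number of seconds, got '$1'" >&2
      return 2
      ;;
  esac

  # Outside Antithesis the variable is unset: skip the request instead of failing.
  if [ -z "${ANTITHESIS_STOP_FAULTS:-}" ]; then
    echo "stop_faults: ANTITHESIS_STOP_FAULTS is not set; skipping" >&2
    return 0
  fi

  # Request the quiet period from the injected binary.
  "${ANTITHESIS_STOP_FAULTS}" "$1"
}
```

For example, `stop_faults 60` requests a 60-second quiet period.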

When you invoke it:

  1. All fault injection (except clock jitter) stops, and no new faults are scheduled.
  2. The simulated network is restored and killed containers are restarted. As with any container restart, restarted containers take some time to become fully operational.
  3. Fault injection resumes automatically once the requested duration_seconds have elapsed.
  4. Overlapping quiet period requests are merged into a single quiet period that reflects the largest requested interval.
Overlapping requests
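Antithesis performs this merging itself. To make item 4 of the list above concrete, the sketch below models it under one assumption: merging overlapping requests yields their union. Under that assumption, a later request extends the quiet period only if it would end after the current one. Requests are written as hypothetical `start duration` pairs in seconds. merge_quiet_periods is an illustrative name, not an Antithesis tool.

```bash
# Illustrative model only; merge_quiet_periods is a hypothetical name.
# Input:  one request per line as "start_seconds duration_seconds".
# Output: one merged quiet period per line as "start_seconds end_seconds".
# Assumption: overlapping requests merge into the union of their intervals.
merge_quiet_periods() {
  sort -n -k1,1 | awk '
    { s = $1; e = $1 + $2 }                   # this request covers [s, e]
    NR == 1 { cs = s; ce = e; next }          # first request opens a period
    s <= ce { if (e > ce) ce = e; next }      # overlap: extend to the later end
    { print cs, ce; cs = s; ce = e }          # gap: emit period, start a new one
    END { if (NR > 0) print cs, ce }
  '
}

printf '0 60\n30 10\n50 30\n100 20\n' | merge_quiet_periods
# prints:
# 0 80
# 100 120
```

The first three requests overlap and merge into one quiet period from 0 s to 80 s. The last request starts after that period has ended, so it forms a separate one.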

Pattern: mid-run liveness check

A common workload pattern uses ANTITHESIS_STOP_FAULTS to assert an invariant in the middle of a run without giving up the rest of the test budget:

  1. Run your workload operations while faults are active.
  2. Invoke ANTITHESIS_STOP_FAULTS with a duration_seconds long enough for the system to stabilize.
  3. Poll for health (retry reads until they succeed, wait for replicas to converge, etc.).
  4. Assert your liveness property — for example, “all replicas converge to the same value”, “queued work eventually drains”, “every committed write is readable”.
  5. Resume the workload. Faults will restart automatically when the quiet period elapses.
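Steps 2 through 4 can be sketched as a shell function that runs between workload batches. Everything here is hypothetical. liveness_checkpoint is an illustrative name, and its health and assertion arguments are names of commands or shell functions you supply. Bounding the health poll by the quiet period length is a design choice, not an Antithesis requirement. Steps 1 and 5, running and resuming the workload, are left to the caller.

```bash
# Hypothetical helper, not part of Antithesis.
# Usage: liveness_checkpoint <quiet_seconds> <health_cmd> <assert_cmd>
# health_cmd and assert_cmd are names of commands or shell functions.
liveness_checkpoint() {
  local quiet="$1" health_cmd="$2" assert_cmd="$3" deadline

  # Step 2: request a quiet period (skipped when not running under Antithesis).
  if [ -n "${ANTITHESIS_STOP_FAULTS:-}" ]; then
    "${ANTITHESIS_STOP_FAULTS}" "$quiet" || return 1
  fi

  # Step 3: poll for health, giving up once the quiet period would have elapsed.
  deadline=$(( $(date +%s) + $quiet ))
  until "$health_cmd"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "liveness_checkpoint: system not healthy within ${quiet}s" >&2
      return 1
    fi
    sleep 1
  done

  # Step 4: check the liveness property while faults are still paused.
  "$assert_cmd"
}
```

For example, a workload script might call `liveness_checkpoint 120 cluster_reachable replicas_converged || exit 1` between batches. Here cluster_reachable and replicas_converged are placeholders for your own checks. A non-zero return means that, while faults were paused, the system either did not recover or failed the property.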

This pattern is particularly useful during rolling operations — upgrades, schema migrations, config rollouts — where you want to verify the system is healthy at each step before continuing.

Anti-patterns

  • Don’t use a quiet period to hide flakiness. If a property only passes during quiet periods but fails during normal operation, that’s a real bug.
  • Don’t assume restarted containers are immediately reachable. A quiet period restarts killed containers, but they need time to come up. Add retry loops that wait until the restarted services respond before running your liveness check.
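A retry loop with exponential backoff is one way to follow the last point. The sketch below is illustrative. retry is a hypothetical helper name, and the doubling delay is a design choice, not something Antithesis prescribes.

```bash
# Hypothetical helper, not part of Antithesis.
# Usage: retry <max_attempts> <command> [args...]
# Runs the command until it succeeds, sleeping 1s, 2s, 4s, ... between attempts.
retry() {
  local attempts="$1" delay=1 i=1
  shift
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      echo "retry: attempt $i/$attempts failed: $*; sleeping ${delay}s" >&2
      sleep "$delay"
      delay=$((delay * 2))
    fi
    i=$((i + 1))
  done
  echo "retry: giving up after $attempts attempts: $*" >&2
  return 1
}
```

For example, `retry 6 curl -fsS http://node-1:8080/health` polls a hypothetical health endpoint up to six times. In the worst case it sleeps 1 + 2 + 4 + 8 + 16 = 31 seconds in total, so request a quiet period comfortably longer than that.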