Overview

Antithesis runs your system inside a deterministic hypervisor and continuously disrupts the environment in which it’s running. We call those disruptions faults: a process gets killed, the network between two services stops carrying packets, the clock jumps forward by thirty seconds.

Types of faults

Type      Examples
Network   Baseline latency, partitions, clogs, restore
Node      Node hang, node kill / stop, throttling
Clock     Forward/backward clock jumps
Other     Thread pausing, CPU modulation, custom faults

Understanding what faults occurred

Information about faults appears in three places after a run completes:

  • In the Triage report, as fault-injection events, next to your application’s logs and any assertion outcomes.
  • In the Logs Explorer, filterable by the fault injector category.
  • In the API responses that return logs, as events whose source.name is fault_injector.

See Fault events in logs for the logging format used for faults.
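As a rough illustration of the third option, the sketch below picks fault events out of a list of log events that has already been fetched and parsed. It assumes each event is a dictionary with a nested source object carrying a name field, which is one reading of "source.name"; the sample events and the message field are hypothetical and do not reflect the actual response schema.

```python
from typing import Any, Iterable


def fault_events(events: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return only the events whose source.name is "fault_injector".

    Assumes each event is a dict with an optional nested "source" dict.
    """
    return [
        event
        for event in events
        if (event.get("source") or {}).get("name") == "fault_injector"
    ]


# Hypothetical sample data, for illustration only.
sample = [
    {"source": {"name": "fault_injector"}, "message": "partition started"},
    {"source": {"name": "my_service"}, "message": "request timed out"},
    {"message": "event with no source field"},
]

for event in fault_events(sample):
    print(event["message"])
```

Events without a source field are skipped rather than treated as errors, so the filter can run over mixed application and fault logs.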

Standard fault settings

Antithesis’ basic_test runs with all network faults enabled. Thread pausing can be enabled by instrumenting your code. To enable node faults, clock jitter, or custom faults, talk to your forward-deployed engineer.

Pausing faults

By default, faults are injected throughout the test, interleaved randomly with your workload.

There are two ways to pause fault injection:

  1. Your workload can request a temporary quiet period via the ANTITHESIS_STOP_FAULTS API. Antithesis stops injecting faults for the requested duration and restores any killed containers or pods. Fault injection resumes once that duration has elapsed.
  2. The test commands eventually_ and finally_ create a terminal pause at the end of an execution, giving the system under test time to recover before final validation checks.

See Pausing faults for more details.
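A workload might wrap the first option in a small helper like the one below. This is a minimal sketch under a stated assumption: that ANTITHESIS_STOP_FAULTS is an environment variable holding the path to an executable that takes the quiet-period length in seconds as its only argument. Check the Pausing faults page for the actual calling convention. The helper name request_quiet_period is hypothetical.

```python
import os
import subprocess


def request_quiet_period(seconds: int) -> bool:
    """Ask for a fault-free window of `seconds`.

    Returns False when ANTITHESIS_STOP_FAULTS is not set, for example
    when the workload runs locally rather than inside Antithesis.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")

    # Assumption: the variable holds the path to an executable that
    # accepts the duration in seconds as its single argument.
    stop_faults = os.environ.get("ANTITHESIS_STOP_FAULTS")
    if not stop_faults:
        return False

    subprocess.run([stop_faults, str(seconds)], check=True)
    return True


if __name__ == "__main__":
    # Request 30 quiet seconds before running a validation step.
    print(request_quiet_period(30))
```

Returning False instead of raising when the variable is missing lets the same workload run unchanged outside Antithesis.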

Overlapping faults

Faults are scheduled independently and may overlap in time. The overlap behavior differs by fault type:

  • Network faults can overlap on the same target. When two network faults affect the same link at the same time, the more aggressive one wins for the duration of the overlap, e.g., a fault that drops packets on a link supersedes a slowdown on the same link. Once the more aggressive fault ends, the other resumes if its window has not yet expired. You will see overlapping network fault events emitted independently in the log; Antithesis does not collapse them into a single combined event.
  • Node faults do not overlap on the same target. Only one node fault can be active on a given container at a time. If subsequent node faults are scheduled against the same target, they’re skipped while one is in progress. The skipped fault isn’t logged.
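The two overlap rules above can be modeled in a few lines. This is a toy model for reasoning about log output, not Antithesis' scheduler: the Fault class, the numeric severity ranking, and both function names are hypothetical, and the half-open time windows are an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fault:
    name: str
    start: float
    end: float          # window is [start, end)
    severity: int = 0   # hypothetical ranking: higher is more aggressive


def effective_network_fault(faults: list[Fault], t: float) -> Optional[Fault]:
    """Network rule: among faults active at time t, the most aggressive wins."""
    active = [f for f in faults if f.start <= t < f.end]
    return max(active, key=lambda f: f.severity, default=None)


def schedule_node_faults(faults: list[Fault]) -> list[Fault]:
    """Node rule: a fault that starts while another is in progress is skipped."""
    accepted: list[Fault] = []
    busy_until = float("-inf")
    for fault in sorted(faults, key=lambda f: f.start):
        if fault.start < busy_until:
            continue  # skipped; per the text above, no log entry is produced
        accepted.append(fault)
        busy_until = fault.end
    return accepted


network = [
    Fault("slowdown", start=0, end=10, severity=1),
    Fault("drop", start=3, end=6, severity=2),
]
node = [
    Fault("kill", start=0, end=5),
    Fault("hang", start=2, end=4),
    Fault("throttle", start=5, end=8),
]

print(effective_network_fault(network, 4).name)    # drop
print(effective_network_fault(network, 7).name)    # slowdown
print([f.name for f in schedule_node_faults(node)])  # ['kill', 'throttle']
```

In this model the slowdown takes effect again at t = 7 because its window is still open after the drop ends, while the hang never runs because it was scheduled during the kill.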