Overview
Antithesis runs your system inside a deterministic hypervisor and continuously disrupts the environment in which it’s running. We call those disruptions faults: a process gets killed, the network between two services stops carrying packets, the clock jumps forward by thirty seconds.
Types of faults
| Type | Examples |
|---|---|
| Network | Baseline latency, partitions, clogs, restore |
| Node | Node hang, node kill / stop, throttling |
| Clock | Forward/backward clock jumps |
| Other | Thread pausing, CPU modulation, custom faults |
Understanding what faults occurred
Information about faults appears in three places after a run completes:
- In the Triage report, as fault-injection events, next to your application’s logs and any assertion outcomes.
- In the Logs Explorer, filterable by the `fault injector` category.
- In the API responses that return logs, as events whose `source.name` is `fault_injector`.
Fault events in logs explains the logging format for faults.
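As a rough illustration of the last item, the sketch below picks fault events out of a list of already-parsed log events. It assumes each event is a JSON object in which `source.name` is a nested field. The helper name `fault_events`, the `message` field, and the sample events are hypothetical and are not taken from the Antithesis API.

```python
def fault_events(events):
    """Return only the events whose source.name is "fault_injector".

    Assumes each event is a dict with an optional nested "source" dict;
    that shape is an assumption, not a documented schema.
    """
    return [
        event
        for event in events
        if isinstance(event.get("source"), dict)
        and event["source"].get("name") == "fault_injector"
    ]


# Hypothetical sample events, for illustration only.
sample_events = [
    {"source": {"name": "fault_injector"}, "message": "hypothetical fault event"},
    {"source": {"name": "app"}, "message": "request served"},
    {"message": "event without a source field"},
]

for event in fault_events(sample_events):
    print(event["message"])
```

Events that lack a `source` field are skipped rather than raising an error, which keeps the filter safe to run over mixed log output.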
Standard fault settings
Antithesis’ basic_test runs with all network faults enabled. Thread pausing can be enabled by instrumenting your code. To enable node faults, clock jitter, or custom faults, talk to your forward-deployed engineer.
Pausing faults
By default, faults are injected throughout the test, interleaved randomly with your workload.
There are two ways to pause fault injection:
- Your workload can request a temporary quiet period via the `ANTITHESIS_STOP_FAULTS` API. Antithesis will pause faults, restore killed containers or pods and not inject any faults for the requested duration. Faults resume after the requested duration has elapsed.
- The test commands `eventually_` and `finally_` create a terminal pause at the end of an execution, giving the system under test time to recover before final validation checks.
Pausing faults has more details.
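A workload-side request for a quiet period might look like the sketch below. It assumes that `ANTITHESIS_STOP_FAULTS` is exposed to the workload as an environment variable holding the path to an executable, and that the executable takes the quiet-period length in seconds as its only argument. Both points are assumptions to confirm against the Pausing faults page. The helper names `stop_faults_command` and `request_quiet_period` are hypothetical.

```python
import os
import subprocess


def stop_faults_command(seconds, env=None):
    """Build the argv for a quiet-period request, or None outside Antithesis.

    Assumption: ANTITHESIS_STOP_FAULTS names an executable that accepts the
    duration in whole seconds as its single argument.
    """
    if seconds <= 0:
        raise ValueError("quiet period must be a positive number of seconds")
    env = os.environ if env is None else env
    path = env.get("ANTITHESIS_STOP_FAULTS")
    if not path:
        return None  # variable not set, e.g. when running locally
    return [path, str(int(seconds))]


def request_quiet_period(seconds, env=None):
    """Ask for a quiet period; return True if a request was actually issued."""
    command = stop_faults_command(seconds, env)
    if command is None:
        return False
    subprocess.run(command, check=True)
    return True
```

Returning `False` when the variable is absent lets the same workload run unchanged outside the test environment, where there are no faults to pause.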
Overlapping faults
Faults are scheduled independently and may overlap in time. The overlap behavior differs by fault type:
- Network faults can overlap on the same target. When two network faults affect the same link at the same time, the more aggressive one wins for the duration of the overlap. For example, a fault that drops packets on a link supersedes a slowdown on the same link. Once the more aggressive fault ends, the other resumes if its window has not yet expired. Overlapping network fault events are emitted independently in the log; Antithesis does not collapse them into a single combined event.
- Node faults do not overlap on the same target. Only one node fault can be active on a given container at a time. If subsequent node faults are scheduled against the same target, they’re skipped while one is in progress. The skipped fault isn’t logged.
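The two overlap rules can be modeled with a small sketch. This is a toy model of the behavior described above, not how Antithesis schedules faults internally. The `Fault` type, the numeric `severity` ranking, and the fault names are hypothetical; the text only states that a packet-dropping fault outranks a slowdown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    name: str
    start: float       # window start, in seconds
    end: float         # window end (exclusive), in seconds
    severity: int = 0  # hypothetical ranking: higher is more aggressive


def effective_network_fault(faults, t):
    """Network rule: among faults active at time t, the most aggressive wins."""
    active = [f for f in faults if f.start <= t < f.end]
    return max(active, key=lambda f: f.severity, default=None)


def accepted_node_faults(faults):
    """Node rule: a fault is skipped if another is still in progress on the target."""
    accepted = []
    busy_until = float("-inf")
    for fault in sorted(faults, key=lambda f: f.start):
        if fault.start < busy_until:
            continue  # skipped; per the text, this skip is not logged
        accepted.append(fault)
        busy_until = fault.end
    return accepted


slowdown = Fault("slowdown", start=0, end=10, severity=1)
drop = Fault("drop", start=3, end=6, severity=2)
link_faults = [slowdown, drop]

node_faults = [
    Fault("kill", start=0, end=5),
    Fault("hang", start=2, end=4),      # overlaps the kill, so it is skipped
    Fault("throttle", start=5, end=8),  # starts once the kill has ended
]
```

In this model `effective_network_fault(link_faults, 4)` yields the drop fault, while the same call at `t=7` yields the slowdown again, mirroring the rule that the less aggressive fault resumes if its window has not expired.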