Fault events in logs and reports

Most faults Antithesis injects are recorded in the test run’s logs. This page describes the shape of those events, what each field means, and how to query them.

Accessing fault logs

  • In the triage report, fault events appear in the log viewer panel under “Fault injection events”, next to your application’s events and any assertion outcomes.
  • In the Logs Explorer, fault events are surfaced by the fault injector category (also reachable by filtering general.source = fault_injector). Temporal queries, “preceded by” and “followed by”, let you correlate failures with fault events.
  • In API responses that return logs, events whose source.name is fault_injector and which carry a fault field are fault events.
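If you process API responses yourself, the rule in the last bullet becomes a simple filter. The sketch below assumes the log events have already been parsed from JSON into Python dicts. It does not cover how you obtain that list from the API, and the sample events are hypothetical.

```python
from typing import Any


def is_fault_event(event: dict[str, Any]) -> bool:
    """True if a parsed log event is a fault event: its source.name is
    "fault_injector" and it carries a "fault" field."""
    source = event.get("source") or {}
    return source.get("name") == "fault_injector" and "fault" in event


def fault_events(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the fault events from a list of parsed log events."""
    return [e for e in events if is_fault_event(e)]


# Hypothetical sample events, already parsed from JSON.
events = [
    {"source": {"name": "fault_injector"}, "moment": {"_vtime_ticks": 10},
     "fault": {"name": "clog", "type": "network"}},
    {"source": {"name": "my-service"}, "moment": {"_vtime_ticks": 12}},
    # From the fault injector, but no "fault" field: not a fault event.
    {"source": {"name": "fault_injector"}, "moment": {"_vtime_ticks": 15}},
]
print([e["fault"]["name"] for e in fault_events(events)])  # ['clog']
```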

Fault event schema

Every fault event in the log has source and moment fields, plus a fault object that describes the injected fault:

{
  "source": {
    "name": "fault_injector"
  },
  "moment": {
    "_vtime_ticks": 51539607552,
    "input_hash": "...",
    "session_id": "..."
  },
  "fault": {
    "name": "partition",
    "type": "network",
    "affected_nodes": ["ALL"],
    "max_duration": 10,
    "details": {
      "disruption_type": "Stopped",
      "asymmetric": false,
      "partitions": [["A", "B"], ["C"]]
    }
  }
}

vtime

Events in the logs are globally ordered by a simulated deterministic virtual time, called vtime.

vtime is expressed in two ways: vtime_ticks, a 64-bit integer, and vtime_seconds, a floating-point number. The moment._vtime_ticks field holds the integer form of this deterministic virtual time. Use vtime, not application-emitted timestamps, as the source of truth for ordering. Application timestamps can be out of order under clock jitter or thread pausing.
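As a small illustration of ordering by vtime, the sketch below sorts parsed events on moment._vtime_ticks. It assumes every event carries that field. The app_ts field in the sample data is a hypothetical application timestamp. It is included only to show that it plays no part in the ordering.

```python
from typing import Any


def vtime_ticks(event: dict[str, Any]) -> int:
    """Return the event's deterministic virtual time in ticks."""
    return event["moment"]["_vtime_ticks"]


def in_vtime_order(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return events ordered by moment._vtime_ticks, ignoring any
    timestamp the application wrote itself."""
    return sorted(events, key=vtime_ticks)


# Hypothetical events: the application timestamp ("app_ts") disagrees
# with vtime, as it can under clock jitter.
events = [
    {"moment": {"_vtime_ticks": 300}, "app_ts": "12:00:01"},
    {"moment": {"_vtime_ticks": 100}, "app_ts": "12:00:05"},
    {"moment": {"_vtime_ticks": 200}, "app_ts": "12:00:03"},
]
print([vtime_ticks(e) for e in in_vtime_order(events)])  # [100, 200, 300]
```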

To convert ticks to seconds:

vtime_seconds = vtime_ticks / 4294967296
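The conversion is a single division. Because 4294967296 is 2^32, one vtime second equals 2^32 ticks. Here is a minimal Python sketch; the helper names are illustrative and not part of any Antithesis SDK.

```python
TICKS_PER_SECOND = 4294967296  # 2**32, from the formula above


def ticks_to_seconds(ticks: int) -> float:
    """Convert a moment._vtime_ticks value to vtime seconds."""
    return ticks / TICKS_PER_SECOND


def seconds_to_ticks(seconds: float) -> int:
    """Inverse conversion, rounded to the nearest tick."""
    return round(seconds * TICKS_PER_SECOND)


# The _vtime_ticks value from the schema example above.
print(ticks_to_seconds(51539607552))  # 12.0
```

Prefer comparing raw tick values when you order events or test them for equality. A Python float has a 53-bit significand, so it cannot represent every 64-bit tick value exactly. Seconds are best kept for display.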

The fault object

The fault object contains all information about the injected fault, its duration, and the nodes it affects.

| Field | Type | Description |
|---|---|---|
| name | string | One of partition, clog, restore, kill, stop, pause, throttle, skip. |
| type | string | One of network, node, clock. |
| affected_nodes | array of string | Nodes targeted by the fault, or ["ALL"]. An empty array means the fault has no effect. |
| max_duration | number | Number of seconds the fault remains active. Absent on some events, such as network restores and permanent clock skips. |
| details | object | Fault-specific payload, such as disruption_type, partitions, asymmetric, or offset. |
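One way to work with the fault object in code is a small typed wrapper. This is a sketch, not an official schema. It treats max_duration as optional because the Network restore example below omits it and permanent clock skips leave it out. A missing details defaults to empty, matching the restore example. Defaulting a missing affected_nodes to empty is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Fault:
    name: str                      # e.g. "partition", "kill", "skip"
    type: str                      # "network", "node", or "clock"
    affected_nodes: list[str]      # node names, or ["ALL"]
    max_duration: Optional[float]  # seconds; None when the field is absent
    details: dict[str, Any] = field(default_factory=dict)

    @property
    def is_noop(self) -> bool:
        """An empty affected_nodes array means the fault has no effect."""
        return len(self.affected_nodes) == 0


def parse_fault(obj: dict[str, Any]) -> Fault:
    """Build a Fault from the "fault" object of a fault event."""
    return Fault(
        name=obj["name"],
        type=obj["type"],
        affected_nodes=list(obj.get("affected_nodes", [])),  # assumption: default []
        max_duration=obj.get("max_duration"),
        details=dict(obj.get("details", {})),
    )


restore = parse_fault({"affected_nodes": ["ALL"], "name": "restore", "type": "network"})
print(restore.max_duration, restore.is_noop)  # None False
```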

Fault names in logs

In the logs, each fault appears under a fault.name and fault.type that may differ slightly from its general name. These names give finer-grained information about the event.

| Fault | fault.name | fault.type |
|---|---|---|
| Network partition | partition | network |
| Network clog | clog | network |
| Network restore | restore | network |
| Node hang | pause | node |
| Node throttling | throttle | node |
| Node termination | kill, stop | node |
| Clock jitter | skip | clock |

Thread pausing and CPU modulation happen very frequently during a test run, so they do not produce fault events. Logging each occurrence would flood the logs.
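You can also keep the name table as a lookup. This is useful for labeling events in your own tooling or for catching a fault.name/fault.type pair that doesn't match it. The dictionary mirrors the table; the describe helper is hypothetical.

```python
from typing import Any

# fault.name -> (fault.type, fault), mirroring the table above.
FAULT_NAMES: dict[str, tuple[str, str]] = {
    "partition": ("network", "Network partition"),
    "clog": ("network", "Network clog"),
    "restore": ("network", "Network restore"),
    "pause": ("node", "Node hang"),
    "throttle": ("node", "Node throttling"),
    "kill": ("node", "Node termination"),
    "stop": ("node", "Node termination"),
    "skip": ("clock", "Clock jitter"),
}


def describe(fault: dict[str, Any]) -> str:
    """Return a readable label for a fault object, checking that its
    fault.type matches the table."""
    expected_type, label = FAULT_NAMES[fault["name"]]
    if fault["type"] != expected_type:
        raise ValueError(f"unexpected type {fault['type']!r} for {fault['name']!r}")
    return f"{label} ({fault['name']})"


print(describe({"name": "stop", "type": "node"}))  # Node termination (stop)
```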

Examples by fault type

Network partition

{
  "fault": {
    "affected_nodes": [
      "ALL"
    ],
    "details": {
      "asymmetric": true,
      "disruption_type": "Slowed",
      "drop_rate": 0,
      "latency": {
        "deviation": 1597.9999999999998,
        "mean": 1492.601977
      },
      "partitions": [["client-1", "client-2"], ["server", "client-3"]]
    },
    "max_duration": 0.183884736,
    "name": "partition",
    "type": "network"
  }
}

Nodes are split into groups by details.partitions. Network links between different groups experience the details.disruption_type. Network links within the same group are not affected by this event, though an overlapping fault may still affect them.

Disruption types:

  • Stopped - packets are dropped entirely.
  • Slowed - packets are delayed with latency.
  • Jammed - packets are “piled up” in a queue until a future delivery time.
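A few lines of Python can check the grouping rule for a partition. This sketch only answers whether a link crosses groups. It ignores details.asymmetric, whose exact effect this page doesn't define. Raising an error for a node missing from every group is an assumption, not documented behavior.

```python
def link_disrupted(partitions: list[list[str]], a: str, b: str) -> bool:
    """True if the link between nodes a and b crosses a partition
    boundary, i.e. the two nodes sit in different groups of
    details.partitions. details.asymmetric is not modeled."""
    group_of = {node: i for i, group in enumerate(partitions) for node in group}
    if a not in group_of or b not in group_of:
        # Assumption: treat nodes missing from every group as an error.
        missing = a if a not in group_of else b
        raise KeyError(f"node not listed in partitions: {missing}")
    return group_of[a] != group_of[b]


# details.partitions from the partition example above.
partitions = [["client-1", "client-2"], ["server", "client-3"]]
print(link_disrupted(partitions, "client-1", "client-2"))  # False (same group)
print(link_disrupted(partitions, "client-1", "server"))    # True (different groups)
```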

Network clog

{
  "fault": {
    "affected_nodes": ["server", "client-2"],
    "details": {
      "disruption_type": "Stopped"
    },
    "max_duration": 4.515860336,
    "name": "clog",
    "type": "network"
  }
}

Any connection to a node listed in affected_nodes experiences the details.disruption_type for max_duration seconds.

Disruption types:

  • Stopped - packets are dropped entirely.
  • Slowed - packets are delayed with latency.
  • Jammed - packets are “piled up” in a queue until a future delivery time.
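Here is a similar sketch for clogs, using the affected_nodes from the example above. It makes two assumptions. First, a connection counts as being “to” a node when that node is either endpoint. Second, ["ALL"] clogs every connection.

```python
def clog_affects(affected_nodes: list[str], a: str, b: str) -> bool:
    """True if a clog with these affected_nodes disrupts a connection
    between nodes a and b. Assumption: a connection counts as "to" a
    node if that node is either endpoint."""
    if "ALL" in affected_nodes:
        return True
    return a in affected_nodes or b in affected_nodes


clogged = ["server", "client-2"]  # affected_nodes from the clog example above
print(clog_affects(clogged, "client-1", "server"))    # True
print(clog_affects(clogged, "client-1", "client-3"))  # False
```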

Network restore

{
  "fault": {
    "affected_nodes": ["ALL"],
    "name": "restore",
    "type": "network"
  }
}

All ongoing network faults are stopped. The network stays unaffected until new network faults are scheduled.
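To correlate a failure with the network conditions at that moment, you can replay fault events up to a given vtime. The sketch below is illustrative and rests on three assumptions. The events are fault events sorted by moment._vtime_ticks. Each fault lasts max_duration seconds from its own vtime. A restore cancels every network fault that began before it. A network fault without max_duration is treated as open-ended, which is also an assumption.

```python
from typing import Any, Optional

TICKS_PER_SECOND = 2**32


def active_network_faults(events: list[dict[str, Any]], at_ticks: int) -> list[dict[str, Any]]:
    """Return the network fault events still in effect at vtime at_ticks.

    Assumes events are fault events sorted by moment._vtime_ticks, each
    fault lasts max_duration seconds from its own vtime, and a restore
    ends every network fault that started before it.
    """
    active: list[tuple[Optional[int], dict[str, Any]]] = []
    for event in events:
        start = event["moment"]["_vtime_ticks"]
        if start > at_ticks:
            break  # sorted input: later events can't affect the answer
        fault = event["fault"]
        if fault["type"] != "network":
            continue
        if fault["name"] == "restore":
            active.clear()
            continue
        duration = fault.get("max_duration")
        end = None if duration is None else start + round(duration * TICKS_PER_SECOND)
        active.append((end, event))
    return [e for end, e in active if end is None or end > at_ticks]


def ev(ticks: int, name: str, duration: Optional[float] = None) -> dict[str, Any]:
    """Build a hypothetical network fault event for the example."""
    fault: dict[str, Any] = {"name": name, "type": "network", "affected_nodes": ["ALL"]}
    if duration is not None:
        fault["max_duration"] = duration
    return {"source": {"name": "fault_injector"}, "moment": {"_vtime_ticks": ticks}, "fault": fault}


s = TICKS_PER_SECOND
events = [ev(0, "clog", 5.0), ev(1 * s, "partition", 10.0), ev(2 * s, "restore")]
print(len(active_network_faults(events, 1 * s)))  # 2
print(len(active_network_faults(events, 3 * s)))  # 0
```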

Node faults

The node type covers three distinct faults that share the same structure (affected_nodes, max_duration) but differ in semantics.

Node termination (kill / stop)

{
  "fault": {
    "affected_nodes": ["server-3"],
    "max_duration": 1.7741677258234223,
    "name": "kill",
    "type": "node"
  }
}

The affected nodes are killed (name: "kill") or stopped (name: "stop") for max_duration seconds, then restarted. If the node is a Kubernetes pod, Antithesis cannot restart it. The restart is managed entirely by Kubernetes, so max_duration is 0.

If the container defines a restart policy, docker-compose may restart it immediately, which nullifies the fault. We recommend not defining a restart policy.
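A small helper can compute when a killed or stopped node is due back. It returns None for the max_duration of 0 that signals a Kubernetes-managed restart. The helper name and the sample vtime are hypothetical.

```python
from typing import Any, Optional

TICKS_PER_SECOND = 2**32


def expected_restart_seconds(event: dict[str, Any]) -> Optional[float]:
    """For a kill/stop event, return the vtime (in seconds) at which the
    node should be restarted, or None when max_duration is 0 (a pod whose
    restart Antithesis doesn't control)."""
    fault = event["fault"]
    if fault["type"] != "node" or fault["name"] not in ("kill", "stop"):
        raise ValueError("not a node termination event")
    duration = fault["max_duration"]
    if duration == 0:
        return None
    return event["moment"]["_vtime_ticks"] / TICKS_PER_SECOND + duration


kill = {
    "moment": {"_vtime_ticks": 10 * TICKS_PER_SECOND},  # hypothetical vtime: 10 s
    "fault": {"affected_nodes": ["server-3"], "max_duration": 1.5, "name": "kill", "type": "node"},
}
print(expected_restart_seconds(kill))  # 11.5
```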

Node hang (pause)

{
  "fault": {
    "affected_nodes": ["server-3"],
    "max_duration": 1.5223575492507213,
    "name": "pause",
    "type": "node"
  }
}

The affected nodes are frozen in place for max_duration seconds. The container remains on the network but cannot process anything, so other containers will see timeouts when trying to communicate with it.

Node throttling

{
  "fault": {
    "affected_nodes": ["server-3"],
    "max_duration": 1.3029411842394,
    "name": "throttle",
    "type": "node"
  }
}

The CPU of each affected node is constrained for max_duration seconds.
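When correlating failures with node faults, it can help to lay kill, stop, pause, and throttle events out as per-node time windows. This sketch takes end = start + max_duration and does not depend on event order. Two quirks follow from that choice. A pod kill with max_duration 0 yields a zero-length window, and ["ALL"] shows up as its own key rather than being expanded. The sample vtimes are hypothetical.

```python
from collections import defaultdict
from typing import Any

TICKS_PER_SECOND = 2**32
NODE_FAULTS = {"kill", "stop", "pause", "throttle"}


def node_fault_windows(events: list[dict[str, Any]]) -> dict[str, list[tuple[str, float, float]]]:
    """Group node fault events by affected node into (name, start_s, end_s)
    windows, where end_s = start_s + max_duration."""
    windows: defaultdict[str, list[tuple[str, float, float]]] = defaultdict(list)
    for event in events:
        fault = event["fault"]
        if fault["type"] != "node" or fault["name"] not in NODE_FAULTS:
            continue
        start = event["moment"]["_vtime_ticks"] / TICKS_PER_SECOND
        end = start + fault["max_duration"]
        for node in fault["affected_nodes"]:
            windows[node].append((fault["name"], start, end))
    return dict(windows)


s = TICKS_PER_SECOND
events = [
    {"moment": {"_vtime_ticks": 2 * s},
     "fault": {"affected_nodes": ["server-3"], "max_duration": 1.5, "name": "pause", "type": "node"}},
    {"moment": {"_vtime_ticks": 5 * s},
     "fault": {"affected_nodes": ["server-3"], "max_duration": 1.25, "name": "throttle", "type": "node"}},
]
print(node_fault_windows(events))
# {'server-3': [('pause', 2.0, 3.5), ('throttle', 5.0, 6.25)]}
```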

Clock jitter

System-level clock jitter moves the clock forward or backward by the amount in details.offset. The jump can be temporary or permanent. If the fault event contains a max_duration field, Antithesis reverses the offset after that duration; if max_duration is missing, the offset is permanent. Clock offsets are cumulative: each new skip event shifts the clock from wherever it already is.

{
  "fault": {
    "affected_nodes": ["ALL"],
    "details": {
      "offset": -0.11456344671067203
    },
    "max_duration": 0.15177661326674713,
    "name": "skip",
    "type": "clock"
  }
}
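Skips are cumulative and only some are reversed, so the net offset at a point in time is a sum over the skip events still in effect. This sketch assumes a temporary skip is reversed exactly max_duration seconds after its vtime. It reports the total in whatever unit details.offset uses, since this page doesn't state one. The sample events are hypothetical.

```python
from typing import Any

TICKS_PER_SECOND = 2**32


def clock_offset_at(events: list[dict[str, Any]], at_ticks: int) -> float:
    """Net clock offset in effect at vtime at_ticks: the sum of every skip
    that has started and hasn't been reversed. A skip without
    max_duration is permanent; one with max_duration is reversed after it."""
    total = 0.0
    for event in events:
        fault = event["fault"]
        if fault["type"] != "clock" or fault["name"] != "skip":
            continue
        start = event["moment"]["_vtime_ticks"]
        if start > at_ticks:
            continue  # not injected yet
        duration = fault.get("max_duration")
        if duration is not None and start + round(duration * TICKS_PER_SECOND) <= at_ticks:
            continue  # temporary skip already reversed
        total += fault["details"]["offset"]
    return total


s = TICKS_PER_SECOND
events = [
    {"moment": {"_vtime_ticks": 0},  # permanent skip
     "fault": {"affected_nodes": ["ALL"], "details": {"offset": 2.0}, "name": "skip", "type": "clock"}},
    {"moment": {"_vtime_ticks": 1 * s},  # temporary skip, reversed at 2 s
     "fault": {"affected_nodes": ["ALL"], "details": {"offset": -0.5},
               "max_duration": 1.0, "name": "skip", "type": "clock"}},
]
print(clock_offset_at(events, 3 * s // 2))  # 1.5
print(clock_offset_at(events, 3 * s))       # 2.0
```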