Multiverse debugging

Antithesis’ multiverse debugger gives you two superpowers.

You can time-travel to pull information from the past, present, or future of a reproduction of your bug, with extreme precision. This allows you to ask questions like:

  • What was the last network packet that was sent before your process died?
  • What was the internal state of your Raft algorithm at the moment a leader was chosen?
  • What was the eventual health-check result of your system after minutes of quiescence?

You can also do destructive analysis, interacting with the system without fear of losing your reproduction. You can take actions like:

  • Cause a core dump by killing a key process.
  • Take down all secondary nodes and see if a suspected race condition still occurs.
  • First choose to step-over in your debugger and later choose to step-into.

Prerequisites

You must set up SSO to use multiverse debugging.

Launching a debugging session

With a multiverse debugger, you can jump in to any moment in the test run and start investigating.

From the logs explorer

  1. Select a log line in the log viewer on the right.
  2. Click the Debug event button in the top right corner.
  3. Optionally, update the pre-filled description of the debugging session and provide emails to notify when the session is ready.
  4. Your debugging session will appear in the Debugging sessions list once it’s ready (usually in 10-30 minutes).

The precise delay here depends on how deep the chosen moment is in the simulation and on the simulation efficiency of your system.

From the triage view

  1. Select any example found.
  2. Click the get logs button. This will find the relevant log line in the logs explorer.
  3. Click the Debug event button in the top right corner.
  4. Optionally, update the pre-filled description of the debugging session and provide emails to notify when the session is ready.
  5. Your debugging session will appear in the Debugging sessions list once it’s ready (usually in 10-30 minutes).

The precise delay here depends on how deep the chosen moment is in the simulation and on the simulation efficiency of your system.

Things you can do

The multiverse debugger allows you to run nearly any bash script in any container at any point in time in the simulation – including in the future!

You can also extract a file from a specified container at a specified point in time.

The basic workflow is:

  1. Specify a moment in the test run (type it in, or select a log line in the viewer).
  2. Specify a container.
  3. Run your bash.
  4. Extract a file if desired.

You can write multi-line scripts, using ; and newline.
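
As an illustration of the multi-line form, here is a minimal sketch of a script that mixes both separators. The function name `snapshot` and the choice of commands are assumptions made for this example, and `netstat` may not be installed in your container, so the script checks for it first.

```shell
#!/usr/bin/env bash

# snapshot is a hypothetical helper name used only for this example.
snapshot() {
  # Two commands on one line, separated by ';'
  echo "== date =="; date

  # Commands on separate lines run in order, top to bottom
  echo "== listening sockets =="
  if command -v netstat >/dev/null 2>&1; then
    netstat -ln
  else
    echo "netstat not installed in this container"
  fi
}

snapshot
```

Wrapping the commands in a function is optional; it only keeps the example easy to reuse inside a longer script.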

MVD interface

Run a diagnostic script

Run a bash script (e.g. netstat) in a container you’re investigating.

netstat

Get a process id

If you’re here to debug a crashed process, get the pid of that process from the logs. Container processes have two pids: one on the host machine and another inside the container. The pid obtained from the logs is the host pid. For most debugging use cases, you’ll want the container pid.

Map a host pid to its container pid

#!/usr/bin/env bash

hpid=1275

# Get all pids of the target process in the namespaces it participates in
# This command is run on the host machine
grep NSpid "/proc/$hpid/status"
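
The `NSpid` line lists one pid per PID namespace, starting with the outermost namespace and ending with the innermost, so for a containerized process the last field is typically the container pid. The sketch below pulls out that last field; the helper name `container_pid` is an assumption for this example, and `1275` is the same placeholder host pid used above.

```shell
#!/usr/bin/env bash

# container_pid (hypothetical helper): reads the text of /proc/<pid>/status
# on stdin and prints the last field of the NSpid line, which is the pid in
# the innermost PID namespace.
container_pid() {
  awk '/^NSpid:/ { print $NF }'
}

hpid=1275  # placeholder host pid taken from the logs

# Run on the host machine; do nothing if the process does not exist.
if [ -r "/proc/$hpid/status" ]; then
  container_pid < "/proc/$hpid/status"
fi
```

If the process is not in a nested PID namespace, the `NSpid` line has a single field and the script prints the host pid unchanged.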

Get the process id of a known process

#!/usr/bin/env bash

# If you know the process name
process_name="slirp4netns"
ps --format pid --no-headers -C "$process_name" | head -n 1

# If you have `pgrep` inside your target container, that could be more ergonomic
pgrep "$process_name"

Get the container id from a host pid

If you’re here to investigate a process crash, here’s how to get the container id that was running the crashed process.

  • Grab the host pid of the crashed process from the logs.
  • Enter a time that’s a few seconds before that process crash.
  • Select the (host) container.
  • Run the following script.

#!/usr/bin/env bash

hpid=1275
grep -o 'libpod-[^.]*' "/proc/$hpid/cgroup" | sed 's/libpod-//'

# Containers are placed into distinct cgroups, so you can get the container id from the process's cgroup information
# The contents of /proc/$hpid/cgroup will look similar to:
# 0::/machine.slice/libpod-<container-id>.scope
# The important part is "libpod-<container-id>.scope", from which the script extracts the container id

# Optional: Inspect the container to find the image name, container name
# podman inspect <container-id>

Extract a file

The Extract file button lets you print the output of a process or command with one click.

To extract the output of a process or command that you run in the MVD, chain the two steps: write a bash script that saves the output to a file, provide the path of that file as the file to extract, and run them together. Running the script and the extraction separately will not work.

If the file you want to extract already exists in a container, just specify the path to it.
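
As a rough sketch of the chained case, the script below writes a command's output to a file that can then be named as the file to extract. The helper name `capture_to_file` and the path `/tmp/netstat.txt` are assumptions chosen for this example, not names the product requires.

```shell
#!/usr/bin/env bash

# capture_to_file PATH CMD [ARGS...]
# Hypothetical helper: runs CMD and writes its stdout and stderr to PATH.
capture_to_file() {
  local path="$1"
  shift
  "$@" > "$path" 2>&1
}

# /tmp/netstat.txt is a hypothetical path chosen for this example.
capture_to_file /tmp/netstat.txt netstat -an \
  || echo "netstat exited non-zero; see /tmp/netstat.txt" >&2
```

With this script in the editor, you would give `/tmp/netstat.txt` as the path of the file to extract, so that the script and the extraction run together.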

Extract file

Automate your workflow

Any set of debugging steps that you perform in one of your multiverse debugging sessions can be converted into a script that automatically runs in every one of your tests when a certain kind of bug is found. The results, output, artifacts, etc. are automatically attached to your triage report in the bug details section.

Suppose you often want to know what’s happening in your network three seconds before a segmentation fault. The triage report can gather this information automatically and have it ready when you open the report. You can still use the multiverse debugger for the truly hard cases where your scripted steps don’t work.

To get started with custom artifacts, contact us at support@antithesis.com.

Advanced mode

The multiverse debugger offers an advanced mode that goes beyond the capabilities described here, but it is also a little more complicated to use.

For instance, advanced mode lets you ask a counterfactual, or run an external debugger inside our simulation.

To launch advanced mode, open a debugging session and select “Advanced mode” from the ⋮ menu in the top right. You’ll want to read the docs first.
