Cookbook notebook
View the logs from the future
Peer 2 seconds into the future of a buggy moment and view its events.
// PRE-REQ: you have a 'moment' (likely from our boilerplate)
// spawn a branch where you'll play the simulation forward
branch = moment.branch()
branch.wait(Time.seconds(2))
// view the events leading up to this future
print(environment.events.up_to(branch))
Run a diagnostic command
Run a bash command (e.g. netstat) in a container you’re investigating.
// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)
// list all running container names
client_names = environment.containers.list({moment}).map(x => x.name)
print(client_names)
// select the container name for the container you want to run netstat in
client_name = client_names.at(0)
// spawn a branch and run the command, printing the resulting process output
branch = moment.branch()
process = bash`netstat`.run({container: client_name, branch})
print(process)
Get a process id
If you’re here to debug a crashed process, get the pid of that process from the logs. Container processes have two pids: one on the host machine and another inside the container. The pid obtained from the logs is the host pid. For most debugging use cases, you’ll want the container pid.
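As a rough illustration of how the two pids relate: on Linux, the `NSpid` line in `/proc/<pid>/status` lists a process's pid in each PID namespace it belongs to, outermost first. When read from the host, the first value is therefore the host pid and the last is the pid inside the container. Below is a minimal sketch of parsing such a line in plain JavaScript. The helper name `parseNSpid` is hypothetical and not part of the notebook API, and the sample line mirrors the output format shown in the mapping example.

```javascript
// Hypothetical helper: not part of the notebook API.
// Parses an "NSpid:" line from /proc/<pid>/status, tolerating a leading
// timestamp or other prefix in the captured output.
function parseNSpid(line) {
  const match = /NSpid:\s*((?:\d+\s*)+)/.exec(line);
  if (match === null) {
    throw new Error(`no NSpid field found in: ${line}`);
  }
  const pids = match[1].trim().split(/\s+/).map(Number);
  return {
    hostPid: pids[0],                    // pid in the outermost namespace
    containerPid: pids[pids.length - 1], // pid in the innermost namespace
    all: pids,
  };
}

const result = parseNSpid("82.108 NSpid: 3653 103");
console.log(result.hostPid, result.containerPid); // prints "3653 103"
```

If the process is not in a nested PID namespace, the line holds a single value and both fields are equal.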
Map host pid to its container pid
To map a host pid to its container pid, follow the instructions below.
// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)
// PRE-REQ: you have the host pid of the crashed process
// Rewind to a moment before the crash
pre_crash_moment = moment.rewind(Time.seconds(0.8))
branch = pre_crash_moment.branch()
// Let the host pid = 3653
hpid = 3653
// Get all pids of the target process in the namespaces it participates in
// This command is run on the host machine
print(bash`grep NSpid /proc/${hpid}/status`.run({container: environment.host, branch}))
// Output
// 82.108 NSpid: 3653 103
// The container pid is 103
Get process id of a known process
If you’re here to proactively debug an event and want the pid of a specific process in a specific container, follow the instructions below.
// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)
// grab the container you want to debug, similar to in our diagnostic command example
environment.containers.list({moment})
// container_1
// container_2
// container_3
// you want to debug container_2
chosen_container = 'container_2'
// rewind time if you're interested in debugging just before the current moment
debug_moment = moment.rewind(Time.seconds(0.8))
// if you don't know the pid of the program you want to debug, you can use a ps command to see the list of running processes
print(bash`ps aux`.run({branch: debug_moment.branch(), container: chosen_container}))
// if you know the process name
process_name = "slirp4netns"
ps_command = bash`ps --format pid --no-headers -C ${process_name} | head -n 1`.run({branch: debug_moment.branch(), container: chosen_container})
// If you have `pgrep` inside of your target container, that could be more ergonomic
ps_command = bash`pgrep ${process_name}`.run({branch: debug_moment.branch(), container: chosen_container})
// then create a variable of the pid of interest
pid_to_debug = ps_command.stdout_text
Get the container id from a host pid
If you’re here to investigate a process crash, here’s how to get the container id that was running the crashed process.
Grab the host pid of the crashed process from the logs.
// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)
// PRE-REQ: you have the host pid of the crashed process
// Rewind to a moment before the crash
pre_crash_moment = moment.rewind(Time.seconds(0.8))
branch = pre_crash_moment.branch()
// Let the host pid = 3653
hpid = 3653
// Containers are placed into distinct cgroups, so you can get the container id from the process's cgroup information
print(bash`cat /proc/${hpid}/cgroup`.run({branch, container: environment.host}))
// The output will look similar to
// 0::/machine.slice/libpod-<container-id>.scope
// The important part is: "libpod-<container-id>.scope" to get the container id
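If you would rather pull the id out of that cgroup text on the notebook side, the extraction could be sketched in plain JavaScript as follows. This assumes the `libpod-<container-id>.scope` naming shown above. `containerIdFromCgroup` is a hypothetical helper, not part of the notebook API, and the sample id is a made-up placeholder.

```javascript
// Hypothetical helper: not part of the notebook API.
// Extracts the container id from cgroup text, assuming the
// "libpod-<container-id>.scope" naming convention.
function containerIdFromCgroup(cgroupText) {
  const match = /libpod-([^.\s]+)\.scope/.exec(cgroupText);
  return match === null ? null : match[1];
}

// The id below is a made-up placeholder, not real output.
console.log(containerIdFromCgroup("0::/machine.slice/libpod-abc123def456.scope"));
// prints "abc123def456"
```

The helper returns `null` when no `libpod-` scope is present, which would be the case for a process that is not running in such a container.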
// You can also use this command to extract the container id
print(bash`grep -o 'libpod-[^.]*' /proc/${hpid}/cgroup | sed 's/libpod-//'`.run({container: environment.host, branch}))
// Inspect the container to find the image name, container name
print(bash`podman inspect <container-id>`.run({container: environment.host, branch}))
Pull a core dump
Cause a process in a container to exit and extract a core dump. To find the pid of the target process, follow the instructions in “Get a process id” above.
// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)
// grab a container, similar to in our diagnostic command example
client_name = environment.containers.list({moment}).at(0).name
// If you're here because your target process has crashed, rewind to before the crash occurred
coredump_moment = moment.rewind(Time.seconds(0.8))
// If you don't know the process id (pid) of the program you'd like a core dump for, follow the steps in get a process id example
// Then create a variable of the pid you're interested in core_dumping.
pid_to_kill = 1234
file = environment.core_dump_by_pid({moment: coredump_moment, pid: pid_to_kill, container: client_name})
print(file)
Run the profiler
Run the profiler for ten seconds and print its results.
// Start the profiler on some branch
background_profiler = environment.profiler.start({branch})
// You can optionally supply a PID if you want to look at a particular process.
// background_profiler = environment.profiler.start({branch, pid: 1})
// Advance time on the branch
// Note that instead of waiting, you could run commands here.
// This is especially helpful for investigating the performance of a series of commands
branch.wait(Time.seconds(10))
// Stop the profiler on the branch
environment.profiler.stop({branch, background_profiler})
// View the results as of the end of the branch
print(environment.profiler.report({moment: branch.end}))
Ask a counterfactual
If you have a bug you believe is not sensitive to small changes in CPU timings, you can ask counterfactual questions like “if I turn this feature flag off, does the bug still occur?”
// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)
// PRE-REQ: you have something you want to tweak that can be controlled by a container in your system under test
// define what you consider a bug as an eventset
bugs_event_set = environment.events.filter(ev => ev.output_text != null && ev.output_text.includes("FATAL"))
// check that the bug occurred in this moment's history
print(bugs_event_set.up_to(moment))
// grab a container, similar to in our diagnostic command example
feature_flag_client = environment.containers.list({moment}).at(0)?.name
// Rewind to where you want to tweak history
alternative_timeline = moment.rewind(Time.seconds(0.8)).branch()
// use your feature-flag-service (eg. StatSig) to flip a feature flag
bash`siggy gates update my-feature-flag '{ "type": "public" }'`.run({branch: alternative_timeline, container: feature_flag_client})
// wait until a bug occurs, or 10 simulated seconds pass, whichever is sooner
alternative_timeline.wait_until({until: bugs_event_set, timeout: Time.seconds(10)})
// see if the bug did occur
print(bugs_event_set.up_to(alternative_timeline))
Using language-specific debuggers
Multiverse debugging can be used to operate command-line debuggers inside the simulation. If you’re used to a GUI this may be awkward at first, but we encourage you to give it a shot. Being able to undo, view your command history, or parallelize commands offers a lot of power and usability.
If there are particular tools or abilities you’d like to be made more ergonomic, please contact us at support@antithesis.com or join our Discord.
Example
Imagine you found a core dump in a test run and you’re trying to debug it. With the help of command-line debuggers, you can determine that the core dump was caused by, say, memory corruption at <memory_address>.
But you can’t know what caused the memory corruption in that specific test run without travelling back in time and monitoring the affected area of code. Antithesis’ deterministic replayability allows you to rewind time before the core dump, set a watchpoint on the memory address that’ll be corrupted and debug what caused it and how it happened.
Below is an example workflow of how you can investigate memory corruption in the simulation using GDB or LLDB.
The example shows the commands to set a watchpoint on a memory address, set a breakpoint on a function, and read a memory address, all sent through a named pipe. Alternatively, you can run these commands in batch mode.
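For the batch-style alternative, each debugger command is passed to GDB as its own `-ex` argument. A minimal sketch of assembling such a command line from a list of commands is shown below. The helper name `gdbBatchCommand` is hypothetical, and the sketch assumes the commands contain no double quotes. How the resulting string is handed to the notebook's `bash` helper is left out, since its quoting rules are not described here.

```javascript
// Hypothetical helper: builds a GDB command line with one -ex per command.
// Assumes the debugger commands contain no double quotes.
function gdbBatchCommand(pid, commands) {
  const exArgs = commands.map(cmd => `-ex "${cmd}"`);
  return ["gdb", "-p", String(pid), ...exArgs].join(" ");
}

// <memory_address> and <function> are placeholders to fill in yourself.
const cmd = gdbBatchCommand(36, [
  "watch *(long*)<memory_address>",
  "b <function>",
]);
console.log(cmd);
// prints: gdb -p 36 -ex "watch *(long*)<memory_address>" -ex "b <function>"
```

Keeping the commands in a list makes it easy to reuse the same sequence across several branches.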
Using GDB
// PRE-REQ: you have a 'moment' and a 'pid'
chosen_container = "container_2"
branch = moment.branch()
pre_moment = branch.end.rewind(1)
pre_branch = pre_moment.branch()
// Notice the `pre_branch.branch()`
// This creates a quick branch to inspect the pids without moving `pre_branch` forward
print(bash`ps aux`.run({branch: pre_branch.branch(), container: chosen_container}))
// Creates the pipe
print(bash`mkfifo /dev/gdb_pipe`.run({branch: pre_branch, container: chosen_container}))
// Keeps the pipe open
print(bash`sleep infinity > /dev/gdb_pipe`.run_in_background({branch: pre_branch, container: chosen_container}))
// Attach the pid (e.g. 36)
print(bash`gdb -p 36 < /dev/gdb_pipe`.run_in_background({branch: pre_branch, container: chosen_container}))
// Set a watchpoint
print(bash`echo "watch *(long*)<memory_address>" > /dev/gdb_pipe`.run({branch: pre_branch, container: chosen_container}))
// Set a breakpoint
print(bash`echo "b <function>" > /dev/gdb_pipe`.run({branch: pre_branch, container: chosen_container}))
// Examine a memory address
print(bash`echo "x/2wu <memory_address>" > /dev/gdb_pipe`.run({branch: pre_branch, container: chosen_container}))
print(bash`echo "c" > /dev/gdb_pipe`.run({branch: pre_branch, container: chosen_container}))
// Move the `pre_branch` forward to hit the breakpoint
pre_branch.wait({duration: Time.seconds(3)})
// you can also run a batch of gdb commands
gdbout = bash`gdb -p ${failing_pid.toString()} -ex "watch *(long*)<memory_address>" -ex "b <function>" -ex "x/2wu <memory_address>"`.run({ branch: pre_branch, container: chosen_container})
print(gdbout)
download(gdbout)
Using LLDB
// PRE-REQ: you have a 'moment' and a 'pid'
chosen_container = "container_2"
branch = moment.branch()
pre_moment = branch.end.rewind(1)
pre_branch = pre_moment.branch()
// Notice the `pre_branch.branch()`
// This creates a quick branch to inspect the pids without moving `pre_branch` forward
print(bash`ps aux`.run({branch: pre_branch.branch(), container: chosen_container}))
// Creates the pipe
print(bash`mkfifo /dev/lldb_pipe`.run({branch: pre_branch, container: chosen_container}))
// Keeps the pipe open
print(bash`sleep infinity > /dev/lldb_pipe`.run_in_background({branch: pre_branch, container: chosen_container}))
// Attach to pid (e.g. 36)
print(bash`lldb -p 36 < /dev/lldb_pipe`.run_in_background({branch: pre_branch, container: chosen_container}))
// Set a watchpoint
print(bash`echo "watchpoint set expression <memory_address>" > /dev/lldb_pipe`.run({branch: pre_branch, container: chosen_container}))
// Set a breakpoint
print(bash`echo "b -n <function>" > /dev/lldb_pipe`.run({branch: pre_branch, container: chosen_container}))
// Read memory
print(bash`echo "memory read --size 2 --format x --count 2 <memory_address>" > /dev/lldb_pipe`.run({branch: pre_branch, container: chosen_container}))
print(bash`echo "continue" > /dev/lldb_pipe`.run({branch: pre_branch, container: chosen_container}))
// Move the `pre_branch` forward to hit the breakpoint
pre_branch.wait({duration: Time.seconds(3)})
// you can also run a batch of lldb commands
lldbout = bash`lldb attach -p ${failing_pid.toString()} --batch -o "watchpoint set expression <memory_address>" -o "b -n <function>" -o "memory read --size 2 --format x --count 2 <memory_address>" -o "continue"`.run({ branch: pre_branch, container: chosen_container})
print(lldbout)
download(lldbout)
Using JDB
You must have JDWP enabled to use JDB. To do this, set the JAVA_TOOL_OPTIONS environment variable as follows:
JAVA_TOOL_OPTIONS: -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=7896
(The port that JDWP is enabled on, in this case 7896, just needs to be an available port.) This must be set in the relevant containers, at or before the time of invocation.
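Because the JDWP port appears both in JAVA_TOOL_OPTIONS and in the `jdb -attach` command used in the example that follows, one way to keep the two in step is to derive both from a single value. The sketch below does that in plain JavaScript. `jdwpSettings` is a hypothetical helper and not part of any Antithesis or JDK API, and it only builds strings: setting the environment variable in your container configuration is still up to you.

```javascript
// Hypothetical helper: not part of any Antithesis or JDK API.
// Derives the JAVA_TOOL_OPTIONS value and the matching jdb attach command
// from one port number so the two cannot drift apart.
function jdwpSettings(port) {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  return {
    javaToolOptions:
      `-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=${port}`,
    attachCommand: `jdb -attach ${port}`,
  };
}

const settings = jdwpSettings(7896);
console.log(settings.javaToolOptions);
console.log(settings.attachCommand); // prints "jdb -attach 7896"
```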
// PRE-REQ: you have a 'moment' (likely a bug moment from a triage report)
// rewind and spawn a new branch starting shortly before your moment
debug_branch = moment.rewind(.3).branch()
// because bash shells are ephemeral, we need to write jdb commands to a file...
print(bash`mkfifo /dev/jdb_pipe`.run({branch: debug_branch, container: 'kafka-2'}))
// ...that's piping the output to jdb
// In this case, JDWP is enabled on port 7896
print(bash`jdb -attach 7896 < /dev/jdb_pipe`.run_in_background({branch: debug_branch, container: 'kafka-2'}))
// sleep to the pipe (keeping the pipe open)
print(bash`sleep infinity > /dev/jdb_pipe`.run_in_background({branch: debug_branch, container: 'kafka-2'}))
// set breakpoint
print(bash`echo 'catch java.lang.NullPointerException' > /dev/jdb_pipe`.run({branch:debug_branch, container: 'kafka-2'}))
// advance time to hit the issue
debug_branch.wait({duration: Time.seconds(.5)})
// get all local variable output
print(bash`echo 'locals' > /dev/jdb_pipe`.run({branch:debug_branch, container: 'kafka-2'}))
// access Object information, field by field
print(bash`echo 'dump getDataRequest.ctx' > /dev/jdb_pipe`.run({branch:debug_branch, container: 'kafka-2'}))
If you want to start with a Java Heap Dump instead
Generating a Java Heap Dump is done using the jcmd command-line tool, which is included in the JDK.
// PRE-REQ: you have a 'moment' (likely a bug moment from a triage report)
// spawn a branch where you'll play the simulation forward
branch = moment.branch()
// list all processes running in the kafka-2 container along with their pid
print(pid = bash` ps -elf`.run({ branch: branch, container: 'kafka-2'}))
// since the process we want is the container entrypoint we're using pid = 1, but you can use ps to find the relevant pid, as we do above
heap_dump_pid = 1
// This will generate a heap dump file...
bash`jcmd ${heap_dump_pid} GC.heap_dump /tmp/heap_dump.hprof`.run({branch: branch, container: 'kafka-2'})
// ...which you can download
download(environment.extract_file({ moment: branch.end, path: `/tmp/heap_dump.hprof`, container: 'kafka-2'}));
Using Delve or another Go debugger
In this example, we’re using Delve. You must install Delve on your container image for these commands to work.
// PRE-REQ: you have a 'moment' (likely a bug moment from a triage report)
// PRE-REQ: you have a 'container' variable holding the name of the container you want to debug
// rewind and spawn a new branch starting shortly before your moment
delve_branch = moment.rewind(Time.seconds(2)).branch()
// create a pipe for delve input/output
print(bash`mkfifo /dev/dlv_pipe && ls -l /dev/dlv_pipe`.run({branch: delve_branch, container }))
// define the pid of the process you're debugging
delve_pid = 1
// enable delve
print((pipe = bash`dlv attach ${delve_pid} --allow-non-terminal-interactive < /dev/dlv_pipe`.run_in_background({ branch: delve_branch, container })))
// sleep to the pipe (keeping the pipe open)
bash`sleep infinity > /dev/dlv_pipe`.run_in_background({branch: delve_branch, container})
// create a breakpoint and continue
bash`echo "break bp path/to/file/file.go:406" > /dev/dlv_pipe`.run({branch: delve_branch, container })
delve_branch.wait({duration:Time.seconds(2)})
From here, you can:
// print a stack trace
print(bash`echo "stack" > /dev/dlv_pipe`.run({branch: delve_branch, container }))
// move the current frame up
print(bash`echo "up 3" > /dev/dlv_pipe`.run({branch: delve_branch, container }))
// print local variable output
print(bash`echo "locals" > /dev/dlv_pipe`.run({branch: delve_branch, container }))
All available Delve CLI commands can be found in the Delve documentation.