Cookbook

Simple: View the logs from the future

Peer 2 seconds into the future of a buggy moment and view the events leading up to that point.

// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)

// spawn a branch where you'll play the simulation forward
branch = moment.branch()
branch.wait(Time.seconds(2))

// view the events leading up to this future
print(environment.events.up_to(branch))
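Conceptually, the branch shares the moment's history up to the branch point and then accumulates its own events as it plays forward; `up_to(branch)` gives you everything on that timeline up to where the branch currently ends. The toy model below is an illustrative sketch in plain JavaScript, not the Antithesis implementation: the event records, their `time` field, and the `eventsUpTo` helper are all hypothetical.

```javascript
// Illustrative model only (hypothetical data, not the Antithesis API).
// Each event records the simulated time, in seconds, at which it happened.
const history = [
  { time: 10.0, text: "event A" },
  { time: 11.5, text: "event B" },
];

// Events after the branch point exist only on the branch.
const branchEvents = [
  { time: 12.3, text: "event C" },
  { time: 13.9, text: "event D" },
  { time: 14.5, text: "event E" }, // past the branch's end, so excluded below
];

// Every event on the branch's timeline at or before `endTime`:
// the shared history plus the branch's own events.
function eventsUpTo(sharedHistory, ownEvents, endTime) {
  return [...sharedHistory, ...ownEvents].filter((ev) => ev.time <= endTime);
}

// The branch starts at t = 12.0 and waits 2 seconds, so it ends at t = 14.0.
const branchEnd = 12.0 + 2;
for (const ev of eventsUpTo(history, branchEvents, branchEnd)) {
  console.log(`${ev.time.toFixed(1)}s  ${ev.text}`);
}
```

Whether an event stamped exactly at the branch's end is included is a modeling choice made here, not something this page specifies.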

Simple: Run a diagnostic command

Run a bash command (e.g. netstat) in a container you’re investigating.

// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)

// list all running container names
client_names = environment.containers.list({moment}).map(x => x.name)
print(client_names)

// select the container name for the container you want to run netstat in
client_name = client_names.at(0)

// spawn a branch and run the command, printing the resulting process output
branch = moment.branch()
process = bash`netstat`.run({container: client_name, branch})
print(process)
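`client_names.at(0)` picks whichever container happens to be listed first. If you know part of the name you want, matching on it is more robust. A minimal sketch in plain JavaScript, assuming the list behaves like an ordinary array of strings (the doc already calls `.map` and `.at` on it); the container names and the `pickContainer` helper are hypothetical:

```javascript
// Hypothetical container names, standing in for the output of
// environment.containers.list({moment}).map(x => x.name)
const clientNames = ["workload", "client-1", "database"];

// Return the first name containing `fragment`, and fail loudly
// instead of silently falling back to whatever is listed first.
function pickContainer(names, fragment) {
  const match = names.find((name) => name.includes(fragment));
  if (match === undefined) {
    throw new Error(`no container name contains "${fragment}": ${names.join(", ")}`);
  }
  return match;
}

const clientName = pickContainer(clientNames, "client");
console.log(clientName); // client-1
```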

Pull a core dump

Cause a process in a container to exit and extract a core dump.

// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)

// grab a container, similar to in our diagnostic command example
client_name = environment.containers.list({moment}).at(0).name

// If you're here because your target process has crashed, rewind to before the crash occurred
coredump_moment = moment.rewind(Time.seconds(0.8))

// If you don't know the pid of the program you'd like a core dump for, you can use a ps command to see the list of running processes
process_name = "slirp4netns"
ps_command = bash`ps --format pid --no-headers -C ${process_name} | head -n 1`.run({branch: coredump_moment.branch(), container: client_name})

// Alternatively, if `pgrep` is available inside your target container, it may be more ergonomic
ps_command = bash`pgrep ${process_name}`.run({branch: coredump_moment.branch(), container: client_name})

// Then store the PID of the process you want to core dump
pid_to_kill = ps_command.stdout_text

file = environment.core_dump_by_pid({moment: coredump_moment, pid: pid_to_kill, container: client_name})

print(file)
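`ps_command.stdout_text` is the raw output of the command, so it may carry padding (`ps` right-aligns the PID column), a trailing newline, or, with `pgrep`, several PIDs, one per line. This page doesn't say whether `core_dump_by_pid` accepts such a string as-is; if you want a clean number, here is a minimal sketch in plain JavaScript (the `firstPid` helper is hypothetical):

```javascript
// Extract the first PID from the stdout of `ps --format pid --no-headers`
// or `pgrep`. Both print one PID per line; ps also pads with spaces.
function firstPid(stdoutText) {
  const line = stdoutText
    .split("\n")
    .map((s) => s.trim())
    .find((s) => s.length > 0);
  if (line === undefined || !/^\d+$/.test(line)) {
    throw new Error(`no PID found in: ${JSON.stringify(stdoutText)}`);
  }
  return Number.parseInt(line, 10);
}

console.log(firstPid("   4821\n"));    // 4821
console.log(firstPid("4821\n4907\n")); // 4821 (pgrep may match several processes)
```

Throwing on empty output is deliberate: if the process isn't running at the rewound moment, you want to find out before requesting a core dump.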

Run the profiler

Runs the profiler for ten seconds and prints its results.

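// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)

// Spawn a branch and start the profiler on it
branch = moment.branch()
background_profiler = environment.profiler.start({branch})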

// You can optionally supply a PID if you want to look at a particular process.
// background_profiler = environment.profiler.start({branch, pid: 1})

// Advance time on the branch
// Note that instead of waiting, you could run commands here.
// This is especially helpful for investigating the performance of a series of commands
branch.wait(Time.seconds(10))

// Stop the profiler on the branch
environment.profiler.stop({branch, background_profiler})

// View the results as of the end of the branch
print(environment.profiler.report({moment: branch.end}))
[Image: example profiler report (flame.png)]

Ask a counterfactual

If you believe your bug is not sensitive to small changes in CPU timing, you can ask counterfactual questions such as "If I turn this feature flag off, does the bug still occur?"

// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)
// PRE-REQ: you have something you want to tweak that can be controlled by a container in your system under test

// define what you consider a bug as an event set
bugs_event_set = environment.events.filter(ev => ev.output_text != null && ev.output_text.includes("FATAL"))

// check that the bug occurred in this moment's history
print(bugs_event_set.up_to(moment))

// grab a container, similar to in our diagnostic command example
feature_flag_client = environment.containers.list({moment}).at(0)?.name

// Rewind to where you want to tweak history
alternative_timeline = moment.rewind(Time.seconds(0.8)).branch()

// use your feature-flag service (e.g. Statsig) to flip a feature flag
bash`siggy gates update my-feature-flag '{ "type": "public" }'`.run({branch: alternative_timeline, container: feature_flag_client})

// wait until a bug occurs, or 10 simulated seconds pass, whichever is sooner
alternative_timeline.wait_until({until: bugs_event_set, timeout: Time.seconds(10)})

// see if the bug did occur
print(bugs_event_set.up_to(alternative_timeline))
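The `wait_until` call races the event set against a timeout and stops at whichever comes first. A toy model of that rule in plain JavaScript, reusing the same `FATAL` predicate as `bugs_event_set`; the `time` field, the sample events, and the `waitUntil` helper are hypothetical and only illustrate the stated semantics, not the Antithesis implementation:

```javascript
// Same predicate as the bugs_event_set filter above.
const isBug = (ev) => ev.output_text != null && ev.output_text.includes("FATAL");

// Hypothetical events on the alternative timeline (simulated seconds).
const alternativeTimeline = [
  { time: 1.0, output_text: "starting up" },
  { time: 2.5, output_text: null },
  { time: 4.2, output_text: "FATAL: invariant violated" },
];

// Simulated time at which waiting stops: the first matching event after
// `start`, or `start + timeout`, whichever is sooner.
function waitUntil(events, start, predicate, timeout) {
  const deadline = start + timeout;
  const hit = [...events]
    .sort((a, b) => a.time - b.time)
    .find((ev) => ev.time > start && ev.time <= deadline && predicate(ev));
  return hit ? hit.time : deadline;
}

console.log(waitUntil(alternativeTimeline, 0, isBug, 10)); // 4.2 (bug first)
console.log(waitUntil(alternativeTimeline, 0, isBug, 3));  // 3 (timeout first)
```

If waiting ends at the deadline, no matching event occurred in that window, which is what the final `up_to` check lets you confirm on the real timeline.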