Cookbook
Simple: View the logs from the future
Peer 2 seconds into the future of a buggy moment and view its events.
// PRE-REQ: you have a 'moment' (likely from our boilerplate)
// spawn a branch where you'll play the simulation forward
branch = moment.branch()
branch.wait(Time.seconds(2))
// view the events leading up to this future
print(environment.events.up_to(branch))
Simple: Run a diagnostic command
Run a bash command (e.g. netstat) in a container you’re investigating.
// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)
// list all running container names
client_names = environment.containers.list({moment}).map(x => x.name)
print(client_names)
// select the container name for the container you want to run netstat in
client_name = client_names.at(0)
// spawn a branch and run the command, printing the resulting process output
branch = moment.branch()
process = bash`netstat`.run({container: client_name, branch})
print(process)
Pull a core dump
Cause a process in a container to exit and extract a core dump.
// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)
// grab a container, similar to in our diagnostic command example
client_name = environment.containers.list({moment}).at(0).name
// If you're here because your target process has crashed, rewind to before the crash occurred
coredump_moment = moment.rewind(Time.seconds(0.8))
// If you don't know the pid of the program you'd like a core dump for, you can use a ps command to see the list of running processes
process_name = "slirp4netns"
ps_command = bash`ps --format pid --no-headers -C ${process_name} | head -n 1`.run({branch: coredump_moment.branch(), container: client_name})
// Alternatively, if you have `pgrep` inside of your target container, that could be more ergonomic
ps_command = bash`pgrep ${process_name}`.run({branch: coredump_moment.branch(), container: client_name})
// Then store the pid you'd like a core dump for in a variable.
pid_to_kill = ps_command.stdout_text
file = environment.core_dump_by_pid({moment: coredump_moment, pid: pid_to_kill, container: client_name})
print(file)
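The `ps` invocation above usually prints the PID with leading padding and a trailing newline, so the captured text may need cleaning before it is used as a `pid`. Below is a minimal sketch in plain JavaScript, assuming `stdout_text` is an ordinary string (the recipe does not confirm this). `parseFirstPid` is a hypothetical helper, not part of the environment API.

```javascript
// Hypothetical helper: extract the first PID from ps/pgrep output text.
// Assumes the text is a plain string with one PID per line.
function parseFirstPid(stdoutText) {
  const firstLine = stdoutText
    .split("\n")
    .map(line => line.trim())
    .find(line => line.length > 0);
  if (firstLine === undefined || !/^\d+$/.test(firstLine)) {
    throw new Error(`no PID found in output: ${JSON.stringify(stdoutText)}`);
  }
  return Number.parseInt(firstLine, 10);
}

// Example with made-up output in the shape ps typically prints
console.log(parseFirstPid("   4312\n"));    // 4312
console.log(parseFirstPid("4312\n4377\n")); // 4312
```

The helper throws on empty output, which makes a "process not found" case visible early. Whether `core_dump_by_pid` expects a number or a string is not stated here, so convert the result back with `String(...)` if needed.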
Run the profiler
Runs the profiler for ten seconds and prints its results.
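// PRE-REQ: you have an 'environment' and a 'branch' (e.g. branch = moment.branch(), as in the earlier examples)
// Start the profiler on that branch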
background_profiler = environment.profiler.start({branch})
// You can optionally supply a PID if you want to look at a particular process.
// background_profiler = environment.profiler.start({branch, pid: 1})
// Advance time on the branch
// Note that instead of waiting, you could run commands here.
// This is especially helpful for investigating the performance of a series of commands
branch.wait(Time.seconds(10))
// Stop the profiler on the branch
environment.profiler.stop({branch, background_profiler})
// View the results as of the end of the branch
print(environment.profiler.report({moment: branch.end}))
Ask a counterfactual
If you believe a bug is not sensitive to small changes in CPU timings, you can ask counterfactual questions like “if I turn this feature flag off, does the bug still occur?”
// PRE-REQ: you have a 'moment' and 'environment' (likely from our boilerplate)
// PRE-REQ: you have something you want to tweak that can be controlled by a container in your system under test
// define what you consider a bug as an event set
bugs_event_set = environment.events.filter(ev => ev.output_text != null && ev.output_text.includes("FATAL"))
// check that the bug occurred in this moment's history
print(bugs_event_set.up_to(moment))
// grab a container, similar to in our diagnostic command example
feature_flag_client = environment.containers.list({moment}).at(0)?.name
// Rewind to where you want to tweak history
alternative_timeline = moment.rewind(Time.seconds(0.8)).branch()
// use your feature-flag service (e.g. StatSig) to flip a feature flag
bash`siggy gates update my-feature-flag '{ "type": "public" }'`.run({branch: alternative_timeline, container: feature_flag_client})
// wait until a bug occurs, or 10 simulated seconds pass, whichever is sooner
alternative_timeline.wait_until({until: bugs_event_set, timeout: Time.seconds(10)})
// see if the bug did occur
print(bugs_event_set.up_to(alternative_timeline))
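To see what the bug predicate in this recipe matches, it can be tried on a plain array outside the notebook. This is only a sketch: the event objects below are made up, and real events may carry other fields.

```javascript
// Hypothetical event objects, for illustration only.
const sampleEvents = [
  { output_text: "INFO service started" },
  { output_text: null },
  { output_text: "FATAL: connection pool exhausted" },
  {}, // an event with no output_text field at all
];

// Same predicate as in the recipe above.
// `!= null` (loose inequality) rejects both null and undefined,
// so events without output text are skipped instead of throwing.
const isBug = ev => ev.output_text != null && ev.output_text.includes("FATAL");

const bugs = sampleEvents.filter(isBug);
console.log(bugs.length);         // 1
console.log(bugs[0].output_text); // FATAL: connection pool exhausted
```

Note that `includes` is case-sensitive, so a line containing only "fatal" would not count as a bug under this definition.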