Introducing our reactive Notebook: the paradigm devs deserve.

November 13, 2024

At Antithesis, we build an autonomous, deterministic simulation testing (DST) tool. Determinism is so in the water here that it has even seeped into our front-end: our reactive notebook. In this case, determinism was a tool that enabled us to build the low-latency, reactive experiences our users enjoy. I suspect many dev tool authors have done the same, but without realizing just how far you can push this approach.

People got their first peek at our front-end paradigm during the launch of Multiverse Debugging, a feature that enables mind-boggling new debugging options. We can run arbitrary bash commands at any point in a bug’s history or future: want a network dump 1ms before your process crashes? Our magic¹ has you covered. This wizardry happens in a reactive orchestration, customization, and analysis tool we call the Antithesis Notebook.

The Notebook is the sole tool we use to set up systems under test, define the properties your system is expected to have, generate tailored reports, and debug issues. For example, our demo report is generated this way. Here’s a peek at its backing Notebook code.²

Demo report

But rather than telling you more about what we use the Notebook for, in this post I want to talk about its execution model: running your code the instant it’s written. We think this is the logical conclusion of reactive developer experiences, and that a reactive setup is uniquely powerful. Hopefully by the end of this post you’ll be convinced that the next dev tool you build or adopt should work this way too.

Reactivity, the next DX paradigm

Consumers today enjoy ‘reactive’ interfaces that reflect their wishes the instant they’re emitted: glance at the camera to unlock, swipe right to change your life.

Reactivity is traditionally defined as a system reacting in response to changing data. In the UI/UX world, reactivity is considered a feature of some libraries (denoting automatic interface updates as data changes), rather than a programming style. In consumer interfaces, the changing data is something like a change in loading state or the fact that the user issued a gesture. In developer interfaces, the most important changing data is code.

We’re seeing glimmers of instant reactivity in dev tools. First it was syntax highlighting that updates without saving; later it was syntax checks, autocomplete, and linters. Now we even have AI copilots suggesting code as you type. But great developers know there’s something more important than what color the code is or how your linter feels: what’s most important is what the code does when it runs.

By running your code on keystroke, the Notebook’s reactive paradigm informs you of just that, and with an immediacy that’s essential to shortening iteration cycles and flattening learning curves. When you’re in a reactive regime, you’re immediately forced to reckon with the result of your code. The age-old saying of ‘test early, test often’ becomes the default.

My takeaway: In a DX context, the primary data to react to should be code. I’d stretch ‘code’ a bit to include query bodies and even the behind-the-scenes SQL that sliders in an analysis tool might generate.

But this all sounds crazy. You might reasonably say “You can’t actually run all of my code on every keystroke”. We’ve got two key tricks for pulling this off in our Notebook:

A dependency-tracking engine
Offloading expensive operations to a snappy server with a perfect memory

The Notebook uses a spreadsheet-style dependency-tracking system that watches each variable defined in user-provided code. When something changes, we recalculate only what depends on it. This approach handles lightweight configuration and analysis.
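To make the spreadsheet analogy concrete, here is a minimal sketch of dependency tracking. It is not the Notebook's actual engine: the `Sheet` class, its `define` method, and the cell names are all hypothetical, and it assumes dependencies are declared explicitly, defined before use, and free of cycles.

```python
from typing import Any, Callable, Dict, List, Set, Tuple


class Sheet:
    """Toy spreadsheet-style engine (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.cells: Dict[str, Tuple[List[str], Callable[..., Any]]] = {}
        self.values: Dict[str, Any] = {}
        self.dependents: Dict[str, Set[str]] = {}  # cell -> cells that read it
        self.recomputed: List[str] = []            # log of recalculated cells

    def define(self, name: str, deps: List[str], compute: Callable[..., Any]) -> None:
        """Define or redefine a cell, then refresh it and everything downstream."""
        if name in self.cells:
            # Drop the edges created by the previous definition.
            for old_dep in self.cells[name][0]:
                self.dependents[old_dep].discard(name)
        self.cells[name] = (deps, compute)
        for dep in deps:
            self.dependents.setdefault(dep, set()).add(name)
        self._refresh(name)

    def _refresh(self, changed: str) -> None:
        # 1. Collect the changed cell plus its transitive dependents.
        dirty: Set[str] = set()
        stack = [changed]
        while stack:
            cell = stack.pop()
            if cell not in dirty:
                dirty.add(cell)
                stack.extend(self.dependents.get(cell, ()))

        # 2. Order the dirty cells so dependencies come before dependents.
        order: List[str] = []
        visited: Set[str] = set()

        def visit(cell: str) -> None:
            if cell in visited or cell not in dirty:
                return
            visited.add(cell)
            for dep in self.cells[cell][0]:
                visit(dep)
            order.append(cell)

        for cell in dirty:
            visit(cell)

        # 3. Recalculate only the dirty cells.
        for cell in order:
            deps, compute = self.cells[cell]
            self.values[cell] = compute(*(self.values[d] for d in deps))
            self.recomputed.append(cell)


sheet = Sheet()
sheet.define("a", [], lambda: 2)
sheet.define("b", [], lambda: 10)
sheet.define("double_a", ["a"], lambda a: a * 2)
sheet.define("total", ["double_a", "b"], lambda d, b: d + b)

sheet.recomputed.clear()
sheet.define("b", [], lambda: 20)  # simulate an edit to one cell
print(sheet.values["total"])       # 24
print(sheet.recomputed)            # ['b', 'total']
```

Editing `b` recalculates `b` and `total` but leaves `double_a` alone, which is the property that keeps per-keystroke evaluation cheap.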

Next, side-effectful and expensive operations (like running a step in the simulation of our customers’ software) are defined in the Notebook, but offloaded to our low-latency memoizing server.³ As a result, we’re able to efficiently maintain the illusion that all of the code in the notebook has run from top to bottom on each keystroke, and create unique experiences like the ones we showcased in our Multiverse Debugging launch post.
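The offloading idea can be sketched with a small in-process stand-in for such a server. Everything here is an assumption made for illustration: `MemoizingServer`, `simulate_step`, and `run_notebook` are hypothetical names, not Antithesis APIs, and the sketch only works if each operation is deterministic and its arguments are JSON-serializable.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


class MemoizingServer:
    """In-process stand-in for a server that remembers every result."""

    def __init__(self) -> None:
        self._cache: Dict[str, Any] = {}
        self.executions = 0  # how many times real work was performed

    def run(self, op_name: str, fn: Callable[..., Any], *args: Any) -> Any:
        # Key the result by the operation name and its full inputs.
        payload = json.dumps([op_name, args], sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.executions += 1
            self._cache[key] = fn(*args)
        return self._cache[key]


def simulate_step(state: List[str], command: str) -> List[str]:
    """Placeholder for an expensive, deterministic simulation step."""
    return state + [command]


def run_notebook(server: MemoizingServer, commands: List[str]) -> List[str]:
    """Run every step from top to bottom, as if nothing had run before."""
    state: List[str] = []
    for command in commands:
        state = server.run("simulate_step", simulate_step, state, command)
    return state


server = MemoizingServer()
run_notebook(server, ["boot", "inject_fault"])
first_executions = server.executions   # 2: both steps did real work

# The user appends one step; the whole notebook "re-runs" from the top.
final_state = run_notebook(server, ["boot", "inject_fault", "dump_network"])
print(server.executions)  # 3: only the new step did real work
```

Because each step's key includes the state produced by the steps before it, re-running from the top is just a chain of cache hits until the first step whose inputs actually changed.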

These are the two techniques we’ve used – other systems have found different paths to similar destinations. We’ll discuss those later on.

Reproducibility: the inevitable consequence

It turns out that if you build something reactive enough, something magical falls out: reproducibility. In this case, maintaining the illusion of having just run every line of Notebook code from top to bottom mandates that if we actually did restart the Notebook and run it from top to bottom, we would end up in the same state.

This stands in stark contrast to Jupyter, the best-known notebook out there, where users decide which cells to run and in which order. Imagine Google Sheets allowing you to decide which cells were up-to-date. Chaos. For Jupyter, this scheme produces enough hidden state to motivate research on the resulting bugs. One study found that only 24% of sampled Jupyter notebooks ran without exceptions.⁴
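The hidden-state problem is easy to reproduce in miniature. The sketch below does not use Jupyter itself; it models three notebook cells as strings and uses Python's built-in `exec` to run them in a chosen order against a shared namespace. The cell contents and the `run_cells` helper are made up for illustration.

```python
from typing import Any, Dict, List

# Three "cells", listed in the order they appear on the page.
cells = [
    "x = 1",
    "y = x + 1",
    "x = 10",
]


def run_cells(order: List[int]) -> Dict[str, Any]:
    """Execute cells in the given order and return the user-visible state."""
    namespace: Dict[str, Any] = {}
    for index in order:
        exec(cells[index], namespace)
    # exec adds '__builtins__' to the namespace; hide it from the result.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


top_to_bottom = run_cells([0, 1, 2])
out_of_order = run_cells([0, 2, 1])  # the user re-ran cell 1 after cell 2

print(top_to_bottom)  # {'x': 10, 'y': 2}
print(out_of_order)   # {'x': 10, 'y': 11}
```

Both sessions show the same three cells on screen, yet they disagree about `y`. A restart followed by a top-to-bottom run can never recover the second result, which is exactly the kind of state a reactive notebook rules out.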

If you can’t reproduce your results, then you can’t trust them. Jupyter results become more like lore than analysis, and you wouldn’t dare include them in CI or as subroutines in other workflows. Conversely, being able to reproduce your results means you can take this idea to the extreme and lean into the fact that a reproducible system effectively implies a pure system: one with minimal-to-no side-effects.

A pure system isn’t just useful at the platform level; it also lets us build incredible features. For example, the record of a long, manual debugging session in our Notebook is effectively a script spelling out exactly which steps were taken in what order. We can then take that script and run it automatically should a similar-looking issue occur again, meaning you only have to do the debugging once. At the platform level, being a pure, reactive system makes the Notebook an ideal customization layer. A layer where custom code is easily unwound without having mutated the world. A layer that shows the results of your customization the instant it’s defined.

Having a dedicated customization layer saves our company from the enterprise sales failure mode of feature-bloat-by-customer-request. It also means devs using our product will eventually be in control of their own experience. How many times have you or your developers had their productivity thwarted because some product person like me didn’t anticipate their needs? With the Notebook’s customization layer exposed, we’re aiming at a future where external developers (or our services team on their behalf) can write a few lines of code to whip together a visualization, filter a list, or enable a feature as a supported flow.

Reactivity in the wild

Overall, developer-facing reactive interfaces should react as fast as possible to changes in code to promote iteration speed. This regime yields a pipeline from novice to expert, and lends itself to creating reproducible, pure systems, with all the benefits they entail. Two of my favorite tools embracing this approach are tree-sitter, a parser-generator, and React, the ever-popular JavaScript library.⁵ It’s really interesting to follow the DX wins these tools create by having similar goals to the Antithesis Notebook.

Like the Notebook, both of these tools have the property that reacting to updated state, like code changing or a stateful value mutating, should typically yield an experience as if the updated state were the initial state. As I’ve laid out above, this offers a level of reproducibility and purity.

For tree-sitter, this restriction means that a parse tree built directly from the input XYZ and a parse tree whose input was gradually edited into XYZ must end up essentially the same. Tree-sitter uses this to its advantage: as its input changes, it incrementally updates only the affected parts of the parse tree, which lowers latency.
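The invariant can be illustrated with a deliberately tiny toy. This is not tree-sitter's algorithm or API: `parse_line`, `parse_full`, and `reparse` are hypothetical helpers, and the sketch assumes a made-up grammar in which every line parses independently of its neighbours, so a line-level diff is enough.

```python
from typing import List, Tuple

Node = Tuple[str, ...]


def parse_line(line: str) -> Node:
    """Toy 'parser': a line becomes a tuple of whitespace-separated tokens."""
    return tuple(line.split())


def parse_full(lines: List[str]) -> List[Node]:
    """Parse from scratch, as if these lines were the initial input."""
    return [parse_line(line) for line in lines]


def reparse(
    old_lines: List[str], old_tree: List[Node], new_lines: List[str]
) -> Tuple[List[Node], int]:
    """Reuse nodes for the unchanged prefix and suffix; parse only the middle.

    Returns the new tree and the number of lines that were actually parsed.
    """
    shortest = min(len(old_lines), len(new_lines))

    prefix = 0
    while prefix < shortest and old_lines[prefix] == new_lines[prefix]:
        prefix += 1

    suffix = 0
    while (
        suffix < shortest - prefix
        and old_lines[-1 - suffix] == new_lines[-1 - suffix]
    ):
        suffix += 1

    middle = new_lines[prefix:len(new_lines) - suffix]
    new_tree = (
        old_tree[:prefix]
        + [parse_line(line) for line in middle]
        + old_tree[len(old_tree) - suffix:]
    )
    return new_tree, len(middle)


old_lines = ["let x = 1", "let y = 2", "print x"]
old_tree = parse_full(old_lines)

new_lines = ["let x = 1", "let y = 3", "print x"]
new_tree, parsed = reparse(old_lines, old_tree, new_lines)

print(parsed)                             # 1
print(new_tree == parse_full(new_lines))  # True
```

The final comparison is the property described above: the incrementally updated tree is indistinguishable from one built from scratch, while only the edited line was parsed again.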

For React, being a reactive library means that components expect to re-render often. This allows React to do things like Fast Refresh, which replaces and re-renders components on code changes, lowering developer iteration times.

If your tools or systems have similar sounding properties, imagine them in terms of reactivity, reproducibility, and purity to see what wins you might find.

As for the Antithesis Notebook, check out our demo environment here.⁶ It’s pretty neat the first time you use it to tackle a real problem, and I’m excited to share more in the future about how exactly we built such a thing. For our customers, this reactive experience is wired up to their simulated systems under test, giving them novel testing and debugging abilities.

If you think this workflow could be a great fit for your company, then contact us. If you’re just interested in chatting about reactive DX, software testing, or Antithesis, come join our Discord.