Harrison Brown

Senior Engineer

Our own worst best customer

March 27, 2025


At Antithesis, our job is to break software before it breaks in production – ours included. We’ve spent years stress-testing our systems with property-based testing and deterministic simulation, not just because it makes our software more reliable, but because it actually makes us faster.

Unlike many of our customers, though, we don’t neatly package everything into Docker containers. Our infrastructure is complex, with deeply integrated dependencies like BigQuery and AWS services. And we have critical parts of our system that do not run on Linux. All of these create difficulties in trying to “eat our own dog food” and test Antithesis with Antithesis.

On the other hand, we also have some serious advantages when it comes to testing with Antithesis. We’re the world experts on the Antithesis platform; if I want to know how something works, I can walk across the room and ask the engineer who created it. We’ve also had years of practice in thinking about testing “the Antithesis way”, leveraging randomness and determinism to surface and understand deep bugs. (In other words, we’ve been drinking the Kool-Aid for a long time!)

In this post, we’ll walk you through how we used Antithesis to test one of our most technically ambitious projects: Pangolin, our new, in-house distributed database. We’ll share what we learned, the challenges we faced, and our best practice recommendations for how Antithesis can help make your own systems more reliable.

Pangolin: a database with scale(s)

Pangolin is a brand-new project at Antithesis – a distributed database designed to process event data more efficiently than traditional SQL-based solutions. When we run Antithesis on a software system, our fuzzer generates event histories which are naturally tree-structured thanks to its snapshot capabilities. Pangolin is designed from the ground up to efficiently perform arbitrary computations “folding down” the branches of these trees. This enables sophisticated temporal queries that would be painfully slow or impossible in traditional relational databases, and Antithesis users will start seeing some incredible features built on this over the coming months.

Pangolin’s inner workings are pretty cool, and if you’d like to dive into them, here’s a video of Richard Hart, who masterminded the project, talking about it at the Monster Scale Summit.

Starting out is never easy

One of the main difficulties of using Antithesis on a brand-new project like Pangolin was simply getting it up and running. We had to develop not just an “MVP” version of Pangolin, but one that could run end-to-end and execute real tests. We also had to containerize Pangolin – which isn’t actually containerized in production because we use Nix for everything.

Once we had these pieces in place, we could start applying Antithesis’ full power.

Asserting software quality using our SDKs

One of the most valuable parts of Antithesis is the ability to directly express assertions – rules that check whether certain conditions hold true during execution – using our SDKs. These checks verify that the system is running smoothly and behaving as expected.

While writing Pangolin, we were constantly checking for:

Reachability of both the “happy path” and various corner cases.
Integer underflows that could occur if we subtracted a larger unsigned int from a smaller one.
Invariants related to different data structures or the timing of particular events.

While I’ve written at length elsewhere about the power of the “sometimes” assertions built into our SDKs, we found that the bread and butter of our testing approach was heavy use of the more familiar “always” assertions to express invariants. We ended up with over 150 assertions in Pangolin-related code, about 90% of which were of this type.
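To make the distinction concrete, here is a minimal sketch of the two assertion styles. It assumes a hand-rolled `Properties` tracker as a stand-in, not the actual Antithesis SDK API, and the `free_slots` function and property names are hypothetical. An “always” property fails if its condition is ever false; a “sometimes” property fails if its condition is never true during the run.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for SDK-style assertions; NOT the Antithesis SDK API.
#[derive(Default)]
pub struct Properties {
    // name -> true if the "always" condition was ever false
    always_violated: BTreeMap<&'static str, bool>,
    // name -> true if the "sometimes" condition was ever true
    sometimes_seen: BTreeMap<&'static str, bool>,
}

impl Properties {
    /// Records a violation if `condition` is false on any evaluation.
    pub fn always(&mut self, name: &'static str, condition: bool) {
        *self.always_violated.entry(name).or_insert(false) |= !condition;
    }

    /// Records success if `condition` is true on at least one evaluation.
    pub fn sometimes(&mut self, name: &'static str, condition: bool) {
        *self.sometimes_seen.entry(name).or_insert(false) |= condition;
    }

    /// Names of properties that are failing at this point in the run.
    pub fn failing(&self) -> Vec<&'static str> {
        let violated = self.always_violated.iter().filter(|(_, v)| **v).map(|(n, _)| *n);
        let never_seen = self.sometimes_seen.iter().filter(|(_, s)| !**s).map(|(n, _)| *n);
        violated.chain(never_seen).collect()
    }
}

/// Hypothetical example: free slots in a pool, guarded against unsigned underflow.
pub fn free_slots(capacity: u32, in_use: u32, props: &mut Properties) -> u32 {
    let free = capacity.checked_sub(in_use);
    // Invariant: we never hand out more slots than exist.
    props.always("in_use never exceeds capacity", free.is_some());
    // Coverage: the test should drive the pool to full at least once.
    props.sometimes("pool is sometimes full", free == Some(0));
    free.unwrap_or(0)
}
```

In this sketch, a run that never fills the pool would report the “sometimes” property as failing, which is a signal about test coverage rather than about a bug in the code.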

Frequent Antithesis testing enabled us to catch problems quickly and fix them before they grew into critical issues. It also helped us feel confident that Pangolin would run properly even in hostile environments.

Finding the hardest bugs

Some of the most insidious bugs in distributed systems – like race conditions, deadlocks, and subtle consistency violations – occur only under specific timing conditions. Normally, finding such bugs requires a combination of luck and long-term observation in production. But Antithesis, by default, forces these conditions to appear by:

Injecting controlled thread pausing to simulate concurrency issues.
Introducing a very high rate of simulated network failures to verify network fault tolerance.
Running thousands of iterations to maximize test coverage.

Note that it required basically zero effort on our part to make this happen – the platform just took care of it for us.

For an example of a tricky bug that would be almost impossible to find without Antithesis, consider the time that we tried to shut down 4.3 billion lambda instances. This occurred because:

We requested a lambda invocation,
then “revoked” it before it was initialized, causing us to increment the lambdas_scheduled_to_die counter.
The lambda connected to our coordinator, and was duly killed, but
before we could update that lambda’s state to “running”,
a timeout triggered indicating that network faults had led us to disconnect from our lambda rate limiter, and we needed to kill all outstanding lambdas that would not die for other reasons.
We calculated that as (lambdas_running - lambdas_scheduled_to_die), which underflowed, leading us to
attempt to shut down 2³² - 1 lambdas.

This could have led to a system-wide failure under the right (or wrong) conditions – something we were able to prevent before it ever reached a customer.
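The arithmetic behind that number can be sketched in a few lines. This assumes the counters were 32-bit unsigned integers (which the 2³² - 1 figure suggests) and uses hypothetical function names; it is an illustration of the failure mode, not the actual Pangolin code.

```rust
/// Mirrors the buggy calculation: unsigned subtraction that wraps on underflow.
/// (In Rust, a plain `-` panics on underflow in debug builds and wraps in
/// release builds by default; `wrapping_sub` makes the wrap explicit.)
pub fn lambdas_to_kill_wrapping(running: u32, scheduled_to_die: u32) -> u32 {
    running.wrapping_sub(scheduled_to_die)
}

/// One possible guard: clamp at zero instead of wrapping.
pub fn lambdas_to_kill_saturating(running: u32, scheduled_to_die: u32) -> u32 {
    running.saturating_sub(scheduled_to_die)
}

/// Another possible guard: surface the underflow so the caller can assert on it.
pub fn lambdas_to_kill_checked(running: u32, scheduled_to_die: u32) -> Option<u32> {
    running.checked_sub(scheduled_to_die)
}
```

With zero lambdas marked as running and one scheduled to die, the wrapping version returns `u32::MAX`, which is 4,294,967,295, while the saturating version returns 0 and the checked version returns `None`.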

Precision debugging

Once Antithesis found a bug, diagnosing it was the next challenge. Our process involved:

Analyzing logs from failed assertions to trace the issue back to its root cause, and adding extra logs as necessary. (println! debugging is still alive and well!)
Using Antithesis’ bug report to reproduce the exact failure state and determine at what point the bug was “baked in”. As Mark Logan from Mysten Labs described, our bug reports include a graph that frequently shows exactly when that happened.
Reaching for our Multiverse Debugger – println! is nice, but it’s also nice to know you have a time machine when you need it.

Although Antithesis’ deterministic hypervisor makes it easy to reproduce a bug exactly, it was harder to be sure that a change we believed fixed the bug really did, especially if the bug occurred rarely. We developed a couple of strategies to help us gain confidence in these situations:

We added “reachable” or “sometimes” assertions to code paths we suspected corresponded with the bug, ensuring that we’d know when those paths were hit.
We introduced “buggification”, adding pathological behaviors (hidden behind a flag, and driven by our random SDK module) to system components to trigger bug-inducing logic more frequently, turning rare bugs into common ones.
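As a rough illustration of the buggification idea, the sketch below gates a pathological behavior behind a flag and a random draw. Everything here is an assumption made for the example: the `Rng` type is a small xorshift64 generator standing in for an SDK-provided random source, and `Buggify` and `connect_timeout_ms` are hypothetical names.

```rust
/// Small xorshift64 generator; a stand-in for an SDK-provided random source.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from a zero state.
        Rng(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical buggification switch: off in production, on under test.
pub struct Buggify {
    enabled: bool,
    rng: Rng,
}

impl Buggify {
    pub fn new(enabled: bool, seed: u64) -> Self {
        Buggify { enabled, rng: Rng::new(seed) }
    }

    /// Returns true roughly one time in `one_in` when enabled, never when disabled.
    pub fn should_fire(&mut self, one_in: u64) -> bool {
        self.enabled && one_in > 0 && self.rng.next_u64() % one_in == 0
    }
}

/// Hypothetical use: occasionally shrink a timeout to provoke the timeout path.
pub fn connect_timeout_ms(buggify: &mut Buggify) -> u64 {
    if buggify.should_fire(4) { 1 } else { 30_000 }
}
```

Because the pathological branch is driven by the random source rather than by wall-clock time or thread scheduling, the same seed produces the same sequence of decisions, which keeps the injected behavior reproducible.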

Dogfooding can be hard

Of course, testing Pangolin with Antithesis wasn’t without its challenges. Thanks to our voluminous logging and the number of SDK properties we defined, a full test cycle – even just 20 minutes of fuzzing – could take several hours, including generating reports. This slowed down our feedback loop, preventing us from quickly iterating to find the cause of a bug. Luckily, one of the things that Pangolin will let us build is a way to stream back test results during fuzzing, which will make these feedback loops much shorter in the future – we’ll say more about this in the coming months!

We also ran into false positives, or what we like to call “Not-a-Real-Bug Syndrome.” Sometimes, we want our system to fail gracefully, but in an end-to-end test, even those graceful failures can violate desirable properties and trigger assertion failures. This creates a trade-off: we can either find a way to exclude (or live with) the test failures that represent “fake” bugs, risking failing to catch “real” ones, or we can prevent the system from experiencing the graceful failures, missing out on test coverage related to those parts of the code.

We’d sometimes write code specifically to test an invariant, but we found that running the invariant-testing code every time we hit the assertion could slow us down so much that it hurt our overall test coverage. To solve this, we wrote a utility method that called the assertion pseudo-randomly, in a small fraction of cases. We’re working on adding this method to the various Antithesis SDKs, so our customers can solve this problem as well.
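A sampled assertion of this kind might look roughly like the following. This is a sketch under assumptions, not the utility described above: `SampledCheck` is a hypothetical name, and the inline xorshift64 step stands in for whatever random source the real helper uses. The key point is that the expensive invariant is passed as a closure, so it is only evaluated when the sample is taken.

```rust
/// Hypothetical helper that evaluates an expensive invariant on a fraction of calls.
pub struct SampledCheck {
    state: u64,
    one_in: u64,
    /// How many times the invariant was actually evaluated.
    pub runs: u64,
    /// How many of those evaluations returned false.
    pub failures: u64,
}

impl SampledCheck {
    /// `one_in` = 100 means roughly 1% of calls evaluate the invariant.
    pub fn new(seed: u64, one_in: u64) -> Self {
        SampledCheck {
            // xorshift must not start from a zero state.
            state: if seed == 0 { 1 } else { seed },
            // Guard against division by zero; 1 means "check every time".
            one_in: one_in.max(1),
            runs: 0,
            failures: 0,
        }
    }

    // xorshift64 step; a stand-in for an SDK-provided random source.
    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Runs `invariant` only when sampled, so skipped calls cost almost nothing.
    pub fn check<F: FnOnce() -> bool>(&mut self, invariant: F) {
        if self.next_u64() % self.one_in != 0 {
            return;
        }
        self.runs += 1;
        if !invariant() {
            self.failures += 1;
        }
    }
}
```

The trade-off is that a violated invariant may go unnoticed on any single call, so this only makes sense when the check is hit often enough that a small sample still catches violations over the course of a test run.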

Antithesis was highly effective for smaller datasets and for uncovering subtle bugs, but handling massive volumes of data, like those in production environments, was a challenge. We couldn’t use it for performance and load testing, so for those we had to fall back on more typical integration or unit testing methods.

Despite these constraints, Antithesis provided significant value by uncovering hidden bugs and giving us the confidence that Pangolin would perform reliably under a wide range of conditions. While we had to work around some limitations, the tool helped us identify critical issues early, and it actually improved our in-house testing practices overall.

What we learned

Through this process, we learned some valuable lessons that apply to any company using Antithesis:

Invest in Assertions and the SDK: The more precise your test properties, the more value you’ll get from Antithesis. Checking system invariants with our SDK helps you figure out exactly what’s going wrong faster, and can turn up bugs you might never even see otherwise. Even basic “this code should never be reached” assertions can catch critical issues.
Simulate Worst-Case Scenarios: Production failures often occur under rare conditions. Antithesis forces those conditions to appear frequently, helping you fix problems before they impact users. Adding “buggification” code, driven by the Antithesis SDK, can help tailor rare conditions to your use case.
Automate Reproduction: Antithesis makes debugging easier by automatically reproducing failed tests and showing exactly what went wrong. This is a huge time-saver.
“Dogfooding” Works: By testing our own system with Antithesis, we made both Pangolin and Antithesis more powerful and robust. In using your own system to test your product, you gain real confidence in how it performs.

I’m going to close with a slide from Richard’s talk (emphasis mine):

0 bugs in production — There's not much else to say.

Turning insights into action

We often get asked questions like:

“What sort of tests can Antithesis help me run?”
“How can I find more bugs in my database?”
“We have a bunch of microservices, is this a good fit?”

The beauty of Antithesis is that it’s designed to work for a wide variety of systems. Whether you’re building a database, a distributed system, or something entirely different, Antithesis can test your software – automating the process and helping you find bugs you may have missed.

If you’re tired of hoping your bugs are being found (or even worse, waiting for them to crash your system in front of users), try it for yourself today! Let’s get your software running like a well-oiled machine… minus the grease stains.
