Is Antithesis working?

Autonomous testing is in the difficult position of trying to prove a negative

Consider the following scenario: you sign up for Antithesis, package up your software, write a workload, schedule regular testing, and begin receiving reports. The reports indicate that no bugs have been found. Every day the report comes back. “No bugs.” “No bugs.” What does this mean? Are there actually no bugs in your software, or is something about your testing setup broken? Or is Antithesis just doing a bad job? How can you tell?

Brown M&Ms

The band Van Halen was notorious for requiring concert venues to sign a lengthy contract before they would agree to perform. Buried within this contract was always a requirement that there be a bowl of M&M candies in the dressing room with all of the brown ones removed. But this wasn’t just an instance of rock stars being eccentric and entitled. The contract also included precise technical stipulations that were vital to the safety of the band’s pyrotechnics-laden shows. The “no brown M&Ms” clause was a way of testing whether the venue’s management had actually read the contract carefully. Before the show, the band members could examine the bowl of candies, and would immediately know whether they needed to take a second look at the rest of the setup as well.

Much like the rock band, we can use “brown M&Ms” to figure out if Antithesis is working. A brown M&M is a bug that you know is present in your software, and that you know your workload and test setup should be able to find. Run your tests with the brown M&Ms present – if Antithesis finds the bug, then you can have confidence that your tests are really working. If it doesn’t, then it’s time to figure out which of our assumptions is wrong. It could be something broken on your side, or something broken on our side, but either way we need to dig in and figure it out.

How to do this in practice

There are a few common sources of brown M&Ms:

Look at your production bug tracker

Our philosophy is that any bug that reaches production for our customers is a bug in Antithesis. This could be due to a weakness in your test setup (perhaps the test template you wrote doesn’t even call the function that triggers the bug), or it could be a weakness in Antithesis, but either way we want to know!

Any time a bug is encountered in production, or seen by your customers, please add it to a list of uncaught bugs to review with us. These bugs provide some of the most powerful guidance about the right ways to strengthen your testing. Hopefully the number of such bugs decreases with time, and they become less and less effective as brown M&Ms. In that case, the next three sections can help.

Deliberately introduce a bug

Nothing beats the real thing! Some of our most engaged customers have a sneaky practice of slipping a deliberate bug into one of their Antithesis builds, as a test of whether we can catch it. Sometimes this is an artificial bug designed to be hard to find, and sometimes this is re-introducing a true production bug that was previously solved. Hopefully, Antithesis catches the bug immediately. If not, then it’s time to examine the workload, test settings, and logs to see why the bug isn’t being hit. The material in Finding more bugs can help, as can our professional services team.
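One common pattern is to gate the deliberate bug behind a switch that is only enabled in the build you send to Antithesis, so it can never leak into production. The sketch below illustrates the idea in Python; the `INJECT_BROWN_MM` environment variable and the `apply_discount` function are hypothetical names invented for this example, not part of any real configuration.

```python
import os

def apply_discount(price: float, discount: float) -> float:
    """Return the price after applying a fractional discount."""
    # Deliberate "brown M&M" bug, active only when the (hypothetical)
    # INJECT_BROWN_MM switch is set in the Antithesis build.
    if os.environ.get("INJECT_BROWN_MM") == "1" and discount > 0.5:
        return price  # bug: large discounts are silently ignored
    return price * (1 - discount)
```

If Antithesis reports the incorrect pricing promptly, you have evidence that your workload exercises this code path; if not, that silence is itself a finding worth investigating.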

Do we find known bugs every time?

There are many advantages to finding a bug right after it’s introduced. For example, it makes bisection easier, and it makes bugs cheaper to fix because your developers have not yet lost context on the issue. But low bug-finding latency matters for another reason too: it’s an indication of testing strength. If the common bugs are found immediately, then the rare bugs may be found eventually. But if the common bugs are found slowly, then you’ll probably never find the rare bugs at all!

For this reason, it’s good to pay attention to the frequency and reliability with which particular issues are found. Once an issue has been discovered by Antithesis, do we then discover it with every subsequent test until it’s fixed? If not, you should be dissatisfied and should ask us about this mystery. It could just be that this bug is very rare and difficult to find, and that you should increase the amount of testing you’re doing. But it could also be that this is an easy bug, and that something about your test setup is reducing its likelihood or making it impossible to find in certain situations.

Use sometimes assertions

Perhaps you’re uncomfortable with the idea of introducing deliberate bugs into your build, or perhaps you only have a single toolchain and cannot follow this practice safely. In that case, the Antithesis feature of Sometimes Assertions can achieve much the same goal. A sometimes assertion is like the opposite of a conventional assertion. Instead of asserting that something always or never happens, it asserts that something is sometimes able to happen.

We can use these as brown M&Ms in a very straightforward way: simply determine the preconditions that lead up to the bug you’re interested in, and add a sometimes assertion that those preconditions are met. For example, suppose there’s a bug that requires a transaction rollback to occur while memory use is high. Instead of re-introducing the bug, add a sometimes assertion in the transaction rollback code which asserts that more than a certain fraction of memory is in use. Now every Antithesis test run will check that at some point in the testing, that combination of events occurred. This can give you great peace of mind that we would find that bug right away if it were ever accidentally reintroduced, and that we would be able to find other similar bugs.
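The transaction-rollback scenario above can be sketched as follows. This is a hedged illustration, not a definitive implementation: it assumes the Antithesis Python SDK exposes a `sometimes` assertion (with a fallback stub here so the sketch runs outside Antithesis), and `rollback_transaction`, `memory_fraction_in_use`, and the 0.8 threshold are all invented for the example.

```python
# Assumed SDK import; outside an Antithesis build, fall back to a no-op stub
# so the application behaves identically in production.
try:
    from antithesis.assertions import sometimes
except ImportError:
    def sometimes(condition, message, details=None):
        pass  # no-op outside Antithesis

HIGH_MEMORY_FRACTION = 0.8  # illustrative threshold for "memory use is high"

def rollback_transaction(txn_id: str, memory_fraction_in_use: float) -> str:
    """Roll back a transaction. `memory_fraction_in_use` is a hypothetical
    measurement supplied by the caller."""
    # Brown M&M: assert that a rollback *sometimes* coincides with high
    # memory use -- the precondition of the bug we want Antithesis to reach.
    sometimes(
        memory_fraction_in_use > HIGH_MEMORY_FRACTION,
        "Transaction rolled back while memory use was high",
        {"txn_id": txn_id, "memory_fraction": memory_fraction_in_use},
    )
    # ... actual rollback logic would go here ...
    return f"rolled back {txn_id}"
```

If a test run ends without this assertion ever passing, the report flags it, telling you that your workload never drove the system into the bug’s preconditions.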