Will Wilson
CEO

Fix the new things first!

About a decade ago I shattered my tibia, and had to spend a few months in a wheelchair. Immobility, plus constant background pain, soon made me depressed and grouchy. One day I was at the doctor’s office and the harried physician, eyes fixed firmly on my medical chart, said to me: “You seem demoralized; you should go outside, walk around, go for a hike, get a little exercise…” Then she turned, saw me sitting in the wheelchair, and said: “Oh.”

Sometimes I worry I sound like that doctor when I talk to engineering teams.

I keep talking about how once you have zero bugs, and powerful testing that enables you to stay at zero bugs, your productivity undergoes a phase change and you become a completely different software delivery organization. Having zero bugs makes it easy to stay at zero bugs, because there’s no need to triage or do root-cause analysis when issues pop up: you just revert to the version that had zero bugs.1 The amount of time this saves is almost impossible to describe.

Some people instantly get it. They have zero bugs, they love it, and they’re hungry for anything that makes it easier for them to stay at zero bugs. I talk to other people and they look at me like… well, probably like I looked at that doctor. “You’re telling me that I can solve my problem by doing something which presumes that my problem is already solved. Thanks.”

In addition to the “Normal Engineering Team” and “Zero Bugs Promised Land” operating modes, there’s a third way a team can operate. I call it the “Software Quality Death Spiral,” where you have so many bugs you’re spending all your time dealing with production emergencies or slapping band-aids on things, which means you don’t have time to fix anything for real, and the number of bugs actually grows over time. Needless to say, people in this zone are not excited when I try to sell them a product that will find more bugs.

But they should be. Because finding bugs as quickly as possible and as dependably as possible is actually the key to reaching the Zero Bugs Promised Land – whether you’re starting as a “Normal Engineering Team” or from the “Software Quality Death Spiral.” It all comes back to one dumb trick: fix the new things first.

To understand the “Promised Land” and the “Death Spiral,” we need to think about derivatives. There is some rate, r1, at which new bugs are getting introduced into your system through normal feature work, increasing scale, etc. And there is some rate, r2, at which your team is identifying and fixing bugs. Most engineering teams subconsciously adjust the proportion of effort that they spend on new features and bugfixes, such that, averaged over a few months, r1 roughly equals r2 and the number of total bugs in the backlog doesn’t dramatically grow or shrink.
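In symbols, writing B for the number of open bugs in the backlog (a label introduced here for illustration; the post itself only names r1 and r2), the bookkeeping is a single rate equation, and the steady state most teams settle into is the case where the two rates cancel on average:

```latex
\frac{dB}{dt} = r_1 - r_2
\qquad\text{so}\qquad
\langle r_1 \rangle \approx \langle r_2 \rangle \;\Longrightarrow\; B \approx \text{constant}
```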

If you aren’t familiar with calculus, imagine a bucket with a hole in it that’s also getting filled by a hose. If water is flowing out the leak faster than water is coming in, the bucket will gradually drain. If the leak is slow and the hose is large, it will gradually fill. And if the two are roughly the same, then the amount of water in the bucket will stay about the same. That bucket is your backlog.

The key insight is that r2, the rate at which things are flowing out of the bucket, is not exogenous: it’s affected by what’s already there. The “Promised Land” effect happens because when you have zero bugs, r2 suddenly gets much larger, since new bugs stand out and are easy to fix. This makes it easy to stay at zero bugs once you get there. Conversely, the “Death Spiral” happens because when the bucket gets sufficiently full, r2 gets constricted as your team gets swamped. The bucket fills further, and the problem gets worse.2
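To make the feedback loop concrete, here is a toy simulation. Every number in it is an assumption chosen for illustration, not data: 8 new bugs arrive per week (r1), and the team can fix up to 12 bugs per week when the backlog is empty, losing 0.1 fixes per week of capacity for every open bug (firefighting, triage, context switching), down to a floor of 2. The function names `fix_capacity` and `simulate` are hypothetical.

```python
def fix_capacity(backlog: float) -> float:
    # Hypothetical model of r2's ceiling: 12 fixes/week with an empty backlog,
    # minus 0.1 per open bug (swamping), but never below 2 fixes/week.
    return max(2.0, 12.0 - 0.1 * backlog)


def simulate(initial_backlog: float, new_bugs_per_week: float = 8.0,
             weeks: int = 52) -> list[float]:
    """Return the backlog size at the end of each simulated week."""
    backlog = initial_backlog
    history = []
    for _ in range(weeks):
        backlog += new_bugs_per_week  # r1: bugs flowing into the bucket
        # r2: how much flows out depends on how full the bucket already is,
        # and the team can't fix more bugs than currently exist.
        backlog -= min(backlog, fix_capacity(backlog))
        history.append(backlog)
    return history


if __name__ == "__main__":
    for start in (0, 20, 40, 100):
        print(f"start={start:>3}  after one year: {simulate(start)[-1]:.1f}")
```

With these made-up parameters there is a tipping point at a backlog of 32 bugs: a team starting below it drains to zero and stays there, while a team starting above it sees the backlog grow without bound. The exact threshold is an artifact of the assumed numbers; the bistability comes from r2 depending on B.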

This sounds like we’re just restating the situation, but the thing is, r2 is actually a leaky (heh) abstraction. Some bugs are faster to fix than others. There’s a subset of bugs that already exhibit “Promised Land” dynamics if you’re able to distinguish them – the newest ones! A new bug, if it can be quickly and reliably detected, stands out the way that all bugs do when you’re in the Promised Land, and accordingly can be hunted down and fixed with far less effort than if you allow it to enter the backlog and fester.

This suggests a strategy: fix the new things first. However many bugs you have, whether it’s zero or a million, doesn’t matter. I’ve been in both situations. I don’t judge. What matters is that you draw a line in the sand and say: “not one more.” Take that whole giant list of bugs you’ve got and write it down somewhere if it makes you feel better, then forget about them. Look, now you have zero bugs! Try to keep it that way.

I’m obviously kidding… somewhat. You do still care about that list of old bugs. Any one of them could rear its head in production and ruin your day. But you’re making a conscious decision not to work on them until the list of new bugs is at zero. This allows you to reap the artificially high r2 of newly-introduced issues, so it won’t take you that much effort to keep the new bugs at zero. Then with your newly freed-up time, you can work your way through the backlog and dig yourself out.

Note how different this is from the standard way of prioritizing which issues to work on, usually in order of severity. The severity-based approach would make sense if you had only one day left to live. But this is an iterated game, and investing in your team’s productivity by fixing the new things first will lead to a much better outcome in the long run.
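The difference between the two policies fits in a couple of sort keys. This is a minimal sketch under assumed conventions: the `Bug` record, its 1-is-worst `severity` scale, the `introduced` date, and the `line_in_sand` cutoff are all hypothetical. Within the new set it orders newest first, on the assumption that the freshest bugs have the most context still available; the old backlog is then worked in severity order.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Bug:
    name: str
    severity: int     # hypothetical scale: 1 = most severe
    introduced: date  # hypothetical: date of the first run where it showed up


def severity_first(bugs: list[Bug]) -> list[Bug]:
    """Conventional triage: most severe first, oldest first among ties."""
    return sorted(bugs, key=lambda b: (b.severity, b.introduced))


def new_things_first(bugs: list[Bug], line_in_sand: date) -> list[Bug]:
    """Bugs introduced on or after the line in the sand come first (newest
    first); the old backlog follows, ordered by severity."""
    new = [b for b in bugs if b.introduced >= line_in_sand]
    old = [b for b in bugs if b.introduced < line_in_sand]
    new.sort(key=lambda b: b.introduced, reverse=True)
    old.sort(key=lambda b: b.severity)
    return new + old
```

Under `severity_first`, a severe bug from last year always outranks a moderate one introduced yesterday; under `new_things_first`, yesterday's bug is handled while it is still cheap to fix.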

All of this is a long way of explaining why we’ve completely changed the UI for Antithesis.

In the old days, when you launched an Antithesis run, you’d get a report listing all the properties you’d defined for your system (plus all the ones we include by default): green if they were passing, and red if they were failing. This UI totally works for people who are living in the “Zero Bugs Promised Land,” where every single one of those properties is usually green. Any new red really sticks out, and they immediately charge in and solve it (perhaps using one of our cool debugging features).

That’s great for the 0.001% of the population in that situation, but I’m not excited about that as our TAM. We’re not only trying to help people stay at zero bugs, we want to get everyone there.

As soon as you are not in the “Zero Bugs Promised Land,” this view becomes very hard to interpret. Which of the red properties are new issues I should be worried about introducing into prod, and which are problems we’ve been living with for some time and will get around to fixing… someday? Making these determinations requires carrying an enormous amount of state in your head. It makes triage hard and slow, and it makes Antithesis less useful if you’re not incredibly disciplined about looking at the results all the time. Yuck!

The obvious solution is to transform this view from a point-in-time snapshot of your results into a diff against the results of your previous test run. That makes it easy to fix the new things first, which saves a lot of time and eventually helps you make it to the Promised Land – whatever stage of the quality journey you’re at.
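A naive version of that diff is easy to sketch. This is not how the Findings feature is implemented; it simply assumes each run yields a map from property name to pass/fail (`True` = passing), and the function name `diff_runs` and the property names are hypothetical. A property that is failing now counts as new if it was passing before or did not exist in the previous run.

```python
def diff_runs(previous: dict[str, bool],
              current: dict[str, bool]) -> dict[str, list[str]]:
    """Group the current run's properties by how they changed since the
    previous run. Properties that pass in both runs are omitted."""
    new_failures, resolved, still_failing = [], [], []
    for prop, passing in sorted(current.items()):
        was_passing = previous.get(prop)  # None if the property is new
        if not passing and was_passing is False:
            still_failing.append(prop)    # old backlog: can wait
        elif not passing:
            new_failures.append(prop)     # regression or new failing property
        elif was_passing is False:
            resolved.append(prop)         # failed last time, passes now
    return {"new_failures": new_failures,
            "resolved": resolved,
            "still_failing": still_failing}


prev = {"no_data_loss": True, "leader_elected": False, "no_panics": False}
curr = {"no_data_loss": False, "leader_elected": False,
        "no_panics": True, "linearizable": False}
print(diff_runs(prev, curr))
```

Fixing the new things first then means working through `new_failures` before anything else. The sketch treats a single run's pass or fail as ground truth, which breaks down for bugs that only show up some of the time.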

Sounds easy, right? It actually turned out to be pretty subtle, and to require some clever statistics. For example: if a property fails one night, then passes the next night, and then fails again, was it fixed and reintroduced, or is the bug just really hard to find? Since we’re simulating your code across a multiverse, we can actually give intelligent answers to such questions. I learned all kinds of exciting new vocabulary from this project like “survival curves” and “tree-weighted historical correlation.”
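To see why a single passing night proves little, consider the simplest possible model. This is a deliberately naive sketch and not the survival-curve or tree-weighted method described above: it assumes every run independently detects a still-present bug with the same probability p, estimated as failures / runs over the runs where the bug was known to exist. The function name `chance_still_present` is hypothetical.

```python
def chance_still_present(failures: int, runs: int, passes_since: int) -> float:
    """Probability of seeing `passes_since` consecutive passes even though
    the bug was never fixed, under a naive independent-runs assumption."""
    if runs <= 0 or not 0 <= failures <= runs:
        raise ValueError("need runs > 0 and 0 <= failures <= runs")
    p = failures / runs               # estimated per-run detection rate
    return (1 - p) ** passes_since    # every one of those runs missed it
```

A property that failed in 3 of 10 runs and then passed once would pass by luck 70% of the time, so that pass says almost nothing; one that failed 10 times out of 10 and then passed was very likely fixed. Real histories violate the independence assumption, since the code changes between runs, which is part of what makes the real problem subtle.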

But fortunately you don’t need any of that to use the new Findings feature, which is available to all of our customers. Today.