Workload#

Once you’ve deployed your software into Antithesis, there’s one more thing you need before you can start testing: a workload. A workload is what drives your software by exercising your system’s functionality. This is similar, but not identical, to what gets called a “test harness” in other contexts. We cover the important differences between conventional test harnesses and Antithesis workloads further below.

Antithesis has a relatively gentle “ramp” of usage. We have customers who have put very little effort into their workloads (basically just running existing integration tests in a loop) but who have still gotten a ton of value from the Antithesis platform. How much effort to invest in testing is a complicated question with a different answer for every team. But if you’ve decided that you do want to test better, strengthening your workload is one of the highest-leverage ways to accomplish that.

This page is about designing and writing the workload itself. For information on containerizing your workload, pushing it to your secure image registry, and deploying it into Antithesis, see the setup guide.

Workload Basics#

Our ultimate goal is to exercise all of the functionality in your system, but that’s very hard because most systems are complex and stateful. For example, if you have an API with two functions, a and b, a naive guess would be that it suffices to have two tests:

void test1() {
    a();
}

void test2() {
    b();
}

But of course this isn’t true! Functions can have side effects on the state of your system—maybe one of these functions leaves your system in a state where calling the other one will have some new effect. And of course we need to try them in both orders.

void test3() {
    a();
    b();
}

void test4() {
    b();
    a();
}

But wait! Who says we can only call each of those functions once? What if one of the functions contains a memory leak and calling it a thousand times is what causes your program to crash? Pretty soon you can end up in this kind of situation:

void test37411() {
    b();
    a();
    a();
    a();
    a();
    b();
    a();
    ...etc.
}

And all of that is in a simplified model of an API with just two functions that each take zero parameters, and without considering concurrency, resilience to out-of-band faults, network errors, etc. This combinatorial explosion of possibilities is one of the fundamental reasons that testing is so hard, and why getting exhaustive behavioral coverage of your system is often impractical.

The Antithesis approach is the opposite of trying to exhaustively enumerate all the possible test cases like the examples above. Instead, we write a program that runs in a loop and that, if it ran long enough, would eventually try every possible combination of things that a user or client could do with your system. Of course we can never actually do that, because the space of possibilities is effectively infinite. But the goal is that, if we run this program for a very long time, it will asymptotically approach all of the most interesting behavior. The Antithesis platform will then speed up that process by using coverage instrumentation, Sometimes Assertions, and other forms of feedback to guide which paths are taken.

Randomness is your friend#

Our basic approach is to use randomness. Run enough times, the following code will eventually generate every length-100 combination of the two functions:

// func_t: a pointer to a zero-argument API function such as a or b
func_t choose_function() {
    return antithesis.random.choose([a, b]);
}

void test() {
    for (int i = 0; i < 100; i++) {
        func_t func = choose_function();
        func();
    }
}

There are now 2^100 possible test cases that can result from running this test. Obviously that’s far too many for us to run all of them, but we don’t have to run all of them, because many of them are duplicates of each other, in the sense of exercising the same underlying behavior in your system. The Antithesis platform will use feedback (coverage instrumentation, log messages, Sometimes Assertions that you create, etc.) to continue running the tests that seem to be producing value, and to prune the ones that aren’t.

Moreover, the platform can suspend or stop running any test at any point, so in practice it’s fine to make the workload run forever, and to count on Antithesis to stop running a particular branch when it gets “boring”.

func_t choose_function() {
    return antithesis.random.choose([a, b]);
}

void test() {
    while (true) {
        func_t func = choose_function();
        func();
    }
}

Warning

Many languages, frameworks, and runtimes have built-in PRNG abstractions that are initialized at runtime. Within Antithesis, this is an anti-pattern, because it means that we cannot go back just a little bit and “change history”; we have to restart your program from scratch to get a different random sequence. Instead, you should get random values directly from the Antithesis SDK, or from the system random devices /dev/random or /dev/urandom. If that’s impractical, then at the very least you should periodically reseed the PRNG you are using from these sources.
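For example, reseeding might look like the following in Python (a minimal sketch using only the standard library; os.urandom reads from the same entropy pool as /dev/urandom, and do_some_random_work is a hypothetical stand-in for your workload’s random step):

import os
import random

def reseed():
    # Reseed the standard library PRNG from the system random device,
    # so Antithesis can vary the sequence without restarting the program.
    random.seed(int.from_bytes(os.urandom(8), "little"))

while True:
    reseed()  # reseed periodically, e.g. once per loop iteration
    do_some_random_work()  # hypothetical: your workload's random step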

Try everything sometimes#

Our goal is to make sure that the workload has some chance of producing any legal sequence of operations against your API. The most important way to achieve that is to make sure that the entire API surface is actually exercised in the workload. This may seem obvious, but some functionality is easy to overlook: configuration or administration APIs, for example. A good way of making sure you aren’t omitting anything important is to request a coverage report from Antithesis and look for any major files or functions that the tests never reach.

A more subtle way in which a workload can fail to exercise entire categories of functionality in your system is by neglecting concurrency. Most systems are designed to support some degree of concurrent use—whether that’s multiple clients connecting simultaneously to a service, multiple concurrent transactions on a database, or multi-threaded access to an in-process library. If your system is designed to support any of these modes of behavior, then we also want the workload to exercise it in this way.

The most obvious way to do this is by building a concurrent workload—either with threaded “tasks”, or by sending additional requests to an asynchronous API without waiting for the previous ones to finish. An easier option, with different performance and coverage tradeoffs, is simply to run multiple copies of your workload in parallel, as if they were independent clients. Both of these options slightly complicate the process of writing good assertions or validation checks in your workload (see next section), but they’re definitely worth it.

The amount of concurrency (number of threads, number of simultaneously running workload containers, or number of pipelined asynchronous requests) is an important tuning parameter of the test. Too much concurrency could swamp your service and cause it to fail in uninteresting ways, or could simply make the tests very inefficient. It’s best to expose the degree of concurrency as a parameter that we can vary. Ideally the parameter could be changed at runtime, but even if it can only be set at test setup, that’s better than having it hard-coded.
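For instance, a threaded workload in Python might look something like this (a sketch: the WORKLOAD_CONCURRENCY variable name is hypothetical, a and b are the hypothetical API functions from the running example, and we assume the Python SDK’s random_choice):

import os
import threading

from antithesis.random import random_choice

# Hypothetical knob, set at test setup: the number of simulated clients.
NUM_CLIENTS = int(os.environ.get("WORKLOAD_CONCURRENCY", "4"))

def client_loop(client_id):
    # Each simulated client independently performs random operations.
    while True:
        func = random_choice([a, b])  # hypothetical API functions, as above
        func()

threads = [threading.Thread(target=client_loop, args=(i,)) for i in range(NUM_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()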

Validation and workload assertions#

The workload’s primary job is making your software do something, but it can also be an excellent source of information about whether there are bugs. Note that this is a supplemental source of information: we should already be finding lots of bugs via things like assertions in your software, log analysis, sanitizers, default properties that detect crashes or resource leaks, etc. Nevertheless, the workload can be a powerful place to put assertions, because it knows what operations it’s performing, so it may also know what the responses or triggered behaviors should look like.

Before we get to that, however, the most important thing is just not to clutter up the existing signal with false positives due to a workload that isn’t sufficiently resilient. Remember, the Antithesis platform is going to be performing fault injection against your systems and services. This means that it will be normal for your workload to encounter network errors, dropped connections, and other effects of fault injection. It’s very important that your workload retry operations as appropriate, and above all that it not crash or log fatal error messages when it encounters “expected” errors. Such behavior could mask real bugs in your client library, which does need to work in production in the face of such issues.
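For example, each operation might be wrapped in a retry helper along these lines (a sketch using only the standard library; which exception types count as “expected” depends on your client library):

import logging
import time

def with_retries(operation, max_attempts=10):
    # Retry "expected" transient failures caused by fault injection.
    # Unexpected exceptions propagate, so real bugs still surface.
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            time.sleep(0.1 * (attempt + 1))  # simple linear backoff
    # Don't crash the workload: prolonged faults can legitimately
    # exhaust the retry budget. Log a warning and move on.
    logging.warning("operation still failing after %d attempts", max_attempts)
    return None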

Once we’ve gotten rid of all false positives, it’s a good idea to consider adding workload assertions. The workload knows what operations it has performed on your system, so it can maintain a local model of what the state of the system should be, and occasionally validate the results. For example, suppose your workload creates a set of users in your system. The workload could record each success response and add the corresponding user to a local data structure. It can then periodically query your system for a list of all users, and verify that the two lists match. Depending on the exact semantics and guarantees provided by your system to its clients, you may be able to write even stronger assertions, including ones that should be upheld in the face of arbitrary concurrency. Please contact us if you need help designing good assertions in your workload.
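A minimal sketch of that pattern, assuming the Python SDK’s always assertion and hypothetical create_user and list_users client calls:

from antithesis.assertions import always

created_users = set()  # local model: users the system acknowledged creating

def do_create_user(name):
    response = create_user(name)  # hypothetical client call
    if response.ok:
        # Only record users the system acknowledged.
        created_users.add(name)

def validate_users():
    actual = set(list_users())  # hypothetical client call
    always(
        created_users <= actual,
        "every acknowledged user is visible in the system",
        {"missing": list(created_users - actual)},
    )

Note that the sketch checks containment rather than equality: under concurrency, other clients may have created users this one doesn’t know about, so a strict equality check would be a false positive.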

One common mistake is to only run all of these assertions in a validation phase at the end of your tests, like this:

void validate_system() {
    ...
}

func_t choose_function() {
    return antithesis.random.choose([a, b]);
}

void test() {
    for (int i = 0; i < 100000; i++) {
        func_t func = choose_function();
        func();
    }
    validate_system();
}

This is an anti-pattern for three important reasons. The first is that we need to run the entire test to completion before we can tell whether a bug has occurred. Depending on how long the test is and how many resources it uses, this can be very inefficient! A second and more important problem is that it introduces a test weakness, because it’s possible for bugs to “cancel out”. Imagine if the test is able to provoke your system into a broken state, and then later, by random luck, gets back out of that state again. The third reason is that debugging is harder when there’s a very long and complicated history leading up to the bug, most of which is irrelevant. For all these reasons, we recommend that you validate “continuously”, with a repeating pattern of work → validate → work, like this:

void validate_system() {
    ...
}

func_t choose_function() {
    return antithesis.random.choose([a, b]);
}

void test() {
    for (int i = 0; i < 100000; i++) {
        func_t func = choose_function();
        func();
        if (i % 50 == 0) {
            validate_system();
        }
    }
}

In some situations, because of the underlying semantics of your system, it may be difficult or impossible to write assertions that can be checked while the workload is running or while fault injection is taking place. For example, if your system is eventually consistent, it may take some time after all operations and faults have stopped before all of the replicas give the same answer. However, it would be very useful to test that, in this situation, they do in fact eventually give the same answer! Here we recommend packaging the validation check as a separate program or script, and telling us how to run it. Then, during the course of testing, the Antithesis platform will periodically pause the workload, the fault injector, or both, wait some time for things to settle out, and then run the validation. In addition to testing eventually consistent systems, this technique is very useful when checking for availability or “uptime” properties.
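Such a check might be a small standalone script along these lines (a sketch: REPLICAS and get_state are hypothetical stand-ins for your real topology and client, and we assume the Python SDK’s always assertion for reporting the result):

from antithesis.assertions import always

REPLICAS = ["replica1:8080", "replica2:8080", "replica3:8080"]  # hypothetical

def main():
    # By the time this runs, the platform has paused the workload and/or
    # fault injector, so the replicas have had time to converge.
    states = [get_state(replica) for replica in REPLICAS]  # hypothetical client call
    always(
        all(state == states[0] for state in states),
        "all replicas eventually agree",
        {"states": states},
    )

if __name__ == "__main__":
    main()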

Is it working?#

The gold standard for whether your workload is doing what you need is simple: does it find bugs? Does it find them every time? Does it find the brown M&Ms (the bugs you deliberately planted as a canary)? Sometimes it isn’t straightforward to tell how many bugs are being left on the table, in which case we recommend adding Sometimes Assertions to your code and seeing if they are reached.

Sometimes Assertions can be useful in your workload as well! Just as a sanity check, it can be very useful to assert that the workload is starting up, reaching its validation phases, and generating the various sorts of requests that you expect it to generate. A great benefit of autonomous testing is that it can often find many issues that it wasn’t even designed to look for, but the corresponding danger is that even a workload that is broken or barely running can often produce impressive amounts of coverage. It’s important to make sure you aren’t fooling yourself with the test results, and putting Sometimes Assertions into the workload itself is one powerful way to address this risk.
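For example, using the Python SDK’s sometimes assertion (a sketch: a and b are the hypothetical API functions from the running example, and the response.ok shape is also hypothetical):

from antithesis.assertions import sometimes
from antithesis.random import random_choice

def client_loop():
    sometimes(True, "workload main loop started", {})
    while True:
        func = random_choice([a, b])  # hypothetical API functions, as above
        response = func()
        # Check that both kinds of outcome actually occur at some point
        # across all test runs: a workload that only ever succeeds (or
        # only ever fails) is probably broken.
        sometimes(response.ok, "a request succeeded", {})
        sometimes(not response.ok, "a request was rejected", {})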

Avoiding pitfalls#

The intelligent search capabilities of Antithesis bear on workload design in another way: rather than heavily tuning the random distributions of your workload, it’s most important to just make sure that everything can happen sometimes. For example, suppose you thought that, in practice, calling a twice in a row was more likely to find a bug. It might be tempting to write this code:

func_t choose_function() {
    return antithesis.random.choose([a, b]);
}

void test() {
    while (true) {
        func_t func = choose_function();
        func();
        func();
    }
}

If you’re right, then this version will find bugs slightly faster on average than the previous version. However, it’s guaranteed never to find a bug that requires the sequence a -> b -> a without a second intervening b, because every function is now always called twice in a row. In general, it’s most important to make sure that you aren’t inadvertently ruling out a possible sequence of test actions, since that creates an opening for a bug to hide in. Writing your workload in a way that makes known tricky behavior especially likely is great, but it’s important to avoid accidentally making other behaviors especially unlikely or impossible.
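One way to bias toward the suspected-tricky pattern without ruling anything out is to make the repeat itself a random choice, for example (a sketch, assuming the Python SDK’s random_choice and the same hypothetical a and b):

from antithesis.random import random_choice

def test():
    while True:
        func = random_choice([a, b])  # hypothetical API functions, as above
        func()
        # Sometimes repeat the call immediately, making double calls
        # more likely -- but never mandatory, so every sequence of
        # operations remains possible.
        if random_choice([True, False]):
            func()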