Introduction
The goal of testing a piece of software, at least notionally, is to exercise all of the functionality in the system, but that’s very hard because most systems are complex and stateful. For example, if we have an API with two functions, a and b, a naive guess would be that it suffices to have two tests: one that calls a, and one that calls b. But bugs often hide in particular sequences of operations (calling a twice in a row, say) that neither test would ever produce. Writing tests that can actually find such bugs comes down to three ideas:
- Try everything sometimes
- Notice misbehavior when it happens
- Leverage autonomy
Try everything sometimes
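As a sketch of what this can look like (the service, its operations, and the parameters here are all hypothetical), a test template can bring the system up from a cold start, use the administrative API to get it ready, and then give every operation, including reconfiguration, some chance of running at every step:

```python
import random

class FakeService:
    """Stand-in for the system under test (illustrative only)."""
    def __init__(self):
        self.configured = False
        self.data = {}

    # Administrative surface: easy to overlook, but part of the API.
    def configure(self, replicas):
        self.configured = True
        self.replicas = replicas

    # Ordinary operations.
    def put(self, k, v):
        self.data[k] = v

    def get(self, k):
        return self.data.get(k)

def run_template(seed, steps=100):
    rng = random.Random(seed)
    svc = FakeService()                         # cold start
    svc.configure(replicas=rng.randint(1, 5))   # admin API gets it ready
    # Every operation has some chance of being chosen at every step,
    # including reconfiguring the system mid-run.
    ops = [
        lambda: svc.put(rng.randint(0, 9), rng.random()),
        lambda: svc.get(rng.randint(0, 9)),
        lambda: svc.configure(replicas=rng.randint(1, 5)),
    ]
    for _ in range(steps):
        rng.choice(ops)()
    return svc
```

Deriving everything from a seed keeps any interesting run reproducible.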
Our goal is to make sure that we have some chance of producing any legal sequence of operations against our API. The most important way to achieve that is to make sure that the entire API surface is actually exercised in the test template. This may seem obvious, but some functionality is easy to overlook, for example: configuration or administration APIs. As much as possible, our test template should bring the system up from a cold start, using configuration or administration APIs to get it ready before testing the other functionality.

Don’t forget “good” crashes
Sometimes our software is supposed to exit. It’s tempting to treat “expected” panics, shutdowns, or failures as false positives and try to avoid them, but this is a mistake! Properties like “if a certain connection is unavailable for too long, the system shuts down,” “a surprise-shutdown never results in inconsistent data,” or even “our system eventually recovers from network-driven crashes,” are just as vital as properties about a healthy system, and it’s just as important that they happen sometimes in our tests. In practice, recovery processes tend to hide a lot of bugs, and we want to make sure we have a chance to catch them.

Exercise concurrency
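As a minimal sketch (the shared dictionary stands in for a real system, and the operations are placeholders), a workload can expose its degree of concurrency as a parameter and run the same random-operation loop from several threads:

```python
import random
import threading

def worker(rng, shared, lock, steps):
    # Each worker performs random operations against shared state.
    for _ in range(steps):
        with lock:
            if rng.random() < 0.5:
                shared[rng.randint(0, 9)] = rng.random()
            else:
                shared.pop(rng.randint(0, 9), None)

def run_workload(num_workers=4, steps=200, seed=0):
    """num_workers is the tuning knob: too high can swamp the system
    or make tests inefficient; too low never exercises concurrent paths."""
    shared, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=worker,
                         args=(random.Random(seed + i), shared, lock, steps))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```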
A more subtle way in which we can fail to exercise entire categories of functionality in our system is by neglecting concurrency. Most systems support some degree of concurrent use: multiple clients connecting simultaneously to a service, multiple concurrent transactions on a database, or multi-threaded access to an in-process library. If our system supports any of these modes of behavior, then we also need to exercise it in this way. The amount of concurrency (number of threads, number of simultaneously running containers, or number of pipelined asynchronous requests) is also an important tuning parameter. Having too much concurrency could swamp a service and cause it to fail in uninteresting ways, or could simply make the tests very inefficient. The Test Composer can take care of managing parallelism for you, and provides tools for managing the amount of concurrency in the system, but if you’re writing a test template from scratch, you may want to expose the degree of concurrency as a parameter that you can vary.

Notice misbehavior when it happens
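A classic example of a property that only the test template can see is conservation across the whole system. In this sketch (the accounts and transfer operation are hypothetical), each individual step looks locally sensible, but only the outside view can assert the global invariant:

```python
import random

def transfer_workload(seed, accounts=5, steps=200):
    """End-to-end property: transfers move money around, but the
    total balance across all accounts never changes."""
    rng = random.Random(seed)
    balances = [100] * accounts      # stand-in for the system's state
    initial_total = sum(balances)
    for _ in range(steps):
        src, dst = rng.randrange(accounts), rng.randrange(accounts)
        amount = rng.randint(0, balances[src])
        balances[src] -= amount      # each step is locally sensible...
        balances[dst] += amount
        # ...but only the template can check the global invariant.
        assert sum(balances) == initial_total, "money created or destroyed"
    return balances
```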
Many of our most important test properties stem from assertions in our code, but they tend to have very local views of the system. Since it stands outside the rest of the system, the test template has a great view of external or end-to-end properties, and ought to take advantage of that.

Validate continuously
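The difference is easiest to see side by side. A minimal sketch, assuming placeholder operations and a stand-in validate function:

```python
import random

def run_op(rng, state):
    """One random operation against hypothetical system state."""
    state[rng.randint(0, 9)] = rng.random()

def validate(state):
    # Stand-in for real checks against the system.
    assert all(0.0 <= v <= 1.0 for v in state.values()), "corrupt value"

def end_only(seed, steps=100):
    """Anti-pattern: one validation phase after all the work is done."""
    rng, state = random.Random(seed), {}
    for _ in range(steps):
        run_op(rng, state)
    validate(state)  # a bug introduced at step 3 is only noticed here
    return state

def continuous(seed, steps=100, every=10):
    """Better: validate as we go, so a failure points near its cause."""
    rng, state = random.Random(seed), {}
    for step in range(steps):
        run_op(rng, state)
        if step % every == 0:
            validate(state)
    validate(state)  # the final check still runs
    return state
```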
One common mistake is to validate system function only in a single validation phase at the end of your tests. By that point, the step that actually caused a problem is long past, and failures are much harder to trace back to their cause; validating throughout the run catches misbehavior close to when it happens.

Validate eventually
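One common way to express an eventually property is a retry loop with a deadline: transient, “expected” failures are tolerated, and only a check that never starts passing is reported as a bug. A sketch (the timeout and polling interval are illustrative):

```python
import time

def eventually(check, timeout_s=5.0, interval_s=0.1):
    """Keep retrying `check` until it passes or the deadline expires.
    Transient failures are expected and tolerated; only a check that
    never passes is treated as a bug."""
    deadline = time.monotonic() + timeout_s
    last_error = None
    while time.monotonic() < deadline:
        try:
            if check():
                return True
        except Exception as e:   # "expected" errors are not failures yet
            last_error = e
        time.sleep(interval_s)
    raise AssertionError(f"never recovered: {last_error}")
```

An always property, by contrast, would be asserted directly at every step with no retry.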
At the same time, other properties, like availability, can be trickier to express. While we try to architect our systems to be robust to the real-life failures we face, it’s simply true that a test which (for instance) relies on querying one of our services cannot pass while the network link between the workload and that service is down. The properties that we really care about in cases like these are that eventually, when conditions are better, our system is able to recover. It’s particularly important that our test template distinguishes between always and eventually properties to prevent our tests getting cluttered up with false positives. If our test crashes or logs fatal error messages when it encounters “expected” errors, that will mask real bugs in our client library, which does need to work in production in the face of such issues.

Validate at the end when necessary
There are advantages to validating throughout a workload, but some powerful properties only make sense when there’s no more work to do. Properties that fit here are things like checking that our data is consistent, making sure a process finishes gracefully, or looking at the actual results of some systemwide operation.

Leverage autonomy
One of the great strengths of autonomous testing is that it will frequently flush out bugs that test-writers can’t predict, by using randomness. Every part of our test is an opportunity to increase its randomness. In addition to randomizing the functions we call, the order in which we call them, and the inputs we give them, we can double down and randomize things like:

- How is the system configured?
- How many processes are running at a time?
- How long does the test run?
- When do we check that things look the way we expect?
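One way to do this (the specific knobs here are illustrative) is to derive all of these parameters from a single seed, so every run explores a different combination while any failing run stays reproducible from its seed:

```python
import random

def pick_test_parameters(seed):
    """Randomize the test's own shape, not just its inputs."""
    rng = random.Random(seed)
    return {
        "config": {"cache_mb": rng.choice([16, 64, 256]),
                   "fsync": rng.random() < 0.5},     # how the system is configured
        "num_processes": rng.randint(1, 8),          # how many run at a time
        "duration_steps": rng.randint(100, 10_000),  # how long the test runs
        "validate_every": rng.randint(1, 50),        # when we check expectations
    }
```

Logging the seed alongside any failure makes the whole combination reproducible.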
Randomizing the order of operations matters for the same reason. Suppose that, in our example API, calling a twice in a row was more likely to find a bug. It might be tempting to write code that hard-codes particular call patterns, but such code can end up never producing certain sequences, for instance a -> b -> a without a second intervening b. It’s most important to make sure that we aren’t inadvertently ruling out a possible sequence of test actions, since that creates an opening in which a bug can hide.
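A simple way to bias toward an interesting pattern without ruling anything out (a and b stand in for our example API) is to weight the random choice rather than hard-code an order, so every sequence keeps a nonzero probability:

```python
import random

def generate_ops(rng, steps):
    """Favor `a` (here, 70/30) but never forbid any sequence:
    a -> a, a -> b -> a, and every other ordering stays possible."""
    return ["a" if rng.random() < 0.7 else "b" for _ in range(steps)]

def can_produce(pattern, seeds=range(500), steps=6):
    # Sanity check: some run actually produces the pattern we care about.
    target = list(pattern)
    return any(
        target == generate_ops(random.Random(s), steps)[:len(target)]
        for s in seeds
    )
```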
Again, it’s possible to manually write a test template that accounts for all of this, but we believe the Test Composer is an extremely helpful — and powerful — tool in this regard.