The Testing Pyramid is upside-down

August 6, 2024


Imagine you add a bunch of unit tests to get your coverage up from 60 to 80 percent to make the boss happy. Later that day (it’s Friday of course) you push a release to production, content in the knowledge that it is covered by your new tests…

Suddenly, your product manager pings you on Slack: their demo stopped working. Confused, you connect to the cluster, and one resource stands out: it was marked as deleted but still existed, causing a naming conflict with the resource the product manager was trying to create. Your tests? They all passed, but none of them caught the system-level issue that let the deleted resource refuse death.

That’s exactly what happened to me a few years back while working at a startup. A subsequent end-to-end (E2E) test caught the bug, which turned out to be a race condition between two distributed components. I had initially avoided the E2E test because it would have required significantly more effort to write–and indeed, it did. But, that day, I learned why E2E tests, despite how much they suck to write, should be your first line of defense when building an application.

But why do E2E tests suck?

My decision to procrastinate writing comprehensive E2E tests is not unique. Many of my peers and friends have told me similar stories. Most teams–whether large or small–limit the number of E2E tests they write. Why? Writing, maintaining, and debugging E2E tests requires significant effort, not to mention how flaky they can be (many of us have experienced how a single unreliable E2E test can fail your entire CI/CD pipeline). Even when implemented, these tests are often reserved for light sanity checks rather than thorough testing of real-world scenarios, since their resource-intensive nature makes too many E2E tests painfully slow. E2E tests are hard.

In response to these pain points, many developers adopt the testing pyramid approach. This framework emphasizes having more unit tests, which are faster to run, easier to debug and maintain, and quicker to give feedback, while keeping the slower, more costly E2E tests few.

Keeping up with the times

The original testing pyramid was created in the early 2000s, a time when software architectures were much simpler and less distributed. In this context, it’s easy to see the motivation for the framework’s emphasis on unit tests. Unit tests are cheaper to write, run, debug and maintain – so why not have more of them? The problem is that this sidesteps the real issue: rather than offering a solution for the high cost of comprehensive E2E tests, the framework suggests that the bulk of tests should be unit tests (the base) as a way to balance reliability with productivity. But unlike unit and integration tests, E2E tests capture crucial aspects of system behavior that only they can reveal–regardless of test coverage.

Moreover, today’s software systems–whether single-node or microservices–have grown immensely complex, making them much harder to reason about and increasing the importance of comprehensive E2E testing. Without it, we cannot confidently predict how our complex systems will perform under actual usage. This is especially true in distributed architectures, where critical functionality lives in the interplay between components.

Modern software complexities make it impractical to deprioritize E2E tests as the testing pyramid suggests. What if instead we could flip the testing pyramid upside-down? That would require taking a step back and solving the pain points that made E2E testing so unappealing in the first place.

Requirements for a 10x better testing API

The reduction in productivity and increased costs associated with E2E tests are not inherent to E2E testing itself, but rather to the usability of existing testing tools. It’s an overlooked opportunity to significantly improve the developer experience, so let’s take a look at what a testing API needs to address in order to test system reliability while eliminating the common pain points of writing E2E tests.

Short feedback loops

The time from writing an E2E test to running it and getting results should be short. Meeting this requirement means reducing or eliminating the extensive manual configuration each individual test needs, so that high test coverage takes little effort. Rapid feedback lets developers iterate quickly and not shy away from testing their systems under more complicated failure modes (like I originally did).

Low maintenance

Code changes occur frequently. To accommodate this, E2E tests should evolve alongside your code changes without manual intervention. Changes to your system’s implementation should not break the tests. This adaptability enables developers to refactor their system quickly without worrying about fragile tests that break with every minor change.

Reduce needless interruptions

The pass/fail status of a test should be solely dependent on the code being tested, not influenced by external factors (e.g., timing, network interruptions) unless these external factors are deliberately introduced. This consistency allows developers to maintain focus and avoid interruptions caused by flaky tests or false alarms.
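To make this requirement concrete, here is a minimal sketch of a test whose outcome cannot depend on wall-clock timing, because time is passed in as a dependency. The names FakeClock and LeaseTable are hypothetical and exist only for this illustration; they are not part of any tool discussed in this post.

```python
from typing import Callable, Dict, Tuple


class FakeClock:
    """Hypothetical test clock: time moves only when the test advances it."""

    def __init__(self, start: int = 0) -> None:
        self._now = start

    def now(self) -> int:
        return self._now

    def advance(self, seconds: int) -> None:
        self._now += seconds


class LeaseTable:
    """Toy component: a name stays reserved until its lease expires."""

    def __init__(self, clock_now: Callable[[], int], ttl: int) -> None:
        self._now = clock_now  # injected, so tests control time
        self._ttl = ttl
        self._expiry: Dict[str, int] = {}

    def acquire(self, name: str) -> bool:
        now = self._now()
        if name in self._expiry and self._expiry[name] > now:
            return False  # lease is still held
        self._expiry[name] = now + self._ttl
        return True


def check_lease_expiry() -> Tuple[bool, bool, bool]:
    """Exercise the expiry boundary without sleeping or reading real time."""
    clock = FakeClock()
    table = LeaseTable(clock.now, ttl=30)
    first = table.acquire("demo")    # free name, so this succeeds
    clock.advance(29)
    blocked = table.acquire("demo")  # one second before expiry: refused
    clock.advance(1)
    renewed = table.acquire("demo")  # exactly at expiry: succeeds again
    return first, blocked, renewed
```

Because the test never sleeps and never reads the real clock, a slow CI machine cannot change its result. The trade-off is that every time-dependent component has to accept the clock as a parameter.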

Easy debugging

When problems are identified in the system, they should be straightforward to understand and recreate. This requirement should hold true even for traditionally elusive bugs that may seem to appear randomly, such as heisenbugs. Easy debugging reduces the cognitive load on developers, allowing them to focus on solving the problem rather than struggling to reproduce and understand it.

These requirements outline a testing API that eliminates the common pain points of end-to-end tests (e.g., that they are harder to write and maintain, flakier, and more difficult to debug), removing the need to choose between productivity and reliability.

The first requirement calls for the ability to rapidly search for bugs in the system. This search should explore various scenarios and environments, which must also be quickly set up, ideally in an automated manner to also help with the second requirement of maintainability. The third and fourth requirements hint at the ability to consistently control the entire test execution end-to-end.

What we’ve built

With these requirements in mind, we’ve taken on the challenge of developing our own solution. Central to its design are two key engineering efforts:

Ensuring determinism

If we ensure tests and their environments are deterministic, we can address the third and fourth requirements for our testing API:

Eliminate flakiness and needless false alarms
Enable easy debugging and reproduction of bugs

We achieve this by having our deterministic hypervisor create a controlled environment where your system’s operations–including timing, I/O, and execution paths–are fully deterministic and analyzable. We also mock popular external APIs (e.g. AWS) that your system might call, ensuring a consistent, reproducible environment for reliable testing results.
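As a rough illustration of why determinism helps with reproduction, the sketch below simulates a toy version of the deleted-resource race from the introduction inside a single process. It is not the hypervisor described above: the scheduler is just a seeded random number generator, and run_scenario and first_failing_seed are hypothetical names invented for this example.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple


def run_scenario(seed: int) -> Tuple[List[str], bool]:
    """Run one simulated schedule and return (trace, create_succeeded)."""
    rng = random.Random(seed)  # the only source of nondeterminism
    store: Dict[str, str] = {"demo": "active"}
    trace: List[str] = []
    outcome = {"created": False}

    def garbage_collect() -> None:
        # Physically remove records that were marked as deleted.
        if store.get("demo") == "deleted":
            del store["demo"]

    def create() -> None:
        # Creation fails with a naming conflict if any record still exists.
        if "demo" not in store:
            store["demo"] = "active"
            outcome["created"] = True

    # The delete request always lands first and only marks the record.
    store["demo"] = "deleted"
    trace.append("mark_deleted")

    # The seeded scheduler decides whether cleanup or creation runs next.
    pending: List[Tuple[str, Callable[[], None]]] = [
        ("garbage_collect", garbage_collect),
        ("create", create),
    ]
    rng.shuffle(pending)
    for name, step in pending:
        step()
        trace.append(name)
    return trace, outcome["created"]


def first_failing_seed(limit: int = 1000) -> Optional[int]:
    """Search seeds until one produces the naming conflict."""
    for seed in range(limit):
        _, created = run_scenario(seed)
        if not created:
            return seed
    return None
```

Once a failing seed is found, calling run_scenario with that seed replays the same interleaving every time, which turns an intermittent race into an ordinary, repeatable failure.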

Searching autonomously

If we remove the manual labor from setting up and maintaining test cases, we can address the first and second requirements for our testing API:

Short feedback loops for comprehensive tests
Evolve tests alongside your code changes

We achieve this automation by allowing you to focus on defining the higher-level expected behavior of your software through our assertions SDKs, rather than manually writing individual test cases. We then use a search algorithm based on the probabilities of where bugs are most likely to occur in your system. This approach allows us to focus our testing efforts on these high-risk areas, leading to a very high level of confidence in the reliability of your system without needing to search for a long time. It also differs from traditional randomized testing, which often hunts for bugs inefficiently by repeating the same actions rather than exploring new, creative ways your system might misbehave.
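The general idea of stating expected behavior once and letting a search look for violations can be sketched in a few lines. This is only an illustration under stated assumptions: it is not the assertions SDK or the search algorithm described above, the Store class and its bug are invented for the example, and the fixed weights are a crude, hypothetical stand-in for a real estimate of where bugs are likely.

```python
import random
from typing import List, Optional, Set, Tuple

Action = Tuple[str, str]  # (operation, resource name)


class Store:
    """Toy system under test, with a deliberate bug in create."""

    def __init__(self) -> None:
        self.live: Set[str] = set()
        self.tombstones: Set[str] = set()

    def apply(self, action: Action) -> None:
        op, name = action
        if op == "create":
            self.live.add(name)  # bug: a stale tombstone is not cleared
        elif op == "delete":
            if name in self.live:
                self.live.discard(name)
                self.tombstones.add(name)
        elif op == "purge":
            self.tombstones.discard(name)


def invariant_holds(store: Store) -> bool:
    """Expected behavior, stated once: no name is both live and deleted."""
    return not (store.live & store.tombstones)


def search(seed: int, runs: int = 200, steps: int = 8,
           weights: Tuple[int, int, int] = (3, 3, 1)) -> Optional[List[Action]]:
    """Return the first action sequence that breaks the invariant, if any."""
    rng = random.Random(seed)
    ops = ["create", "delete", "purge"]
    names = ["a", "b"]
    for _ in range(runs):
        store = Store()
        history: List[Action] = []
        for _ in range(steps):
            # Hypothetical weighting: favor create and delete over purge.
            op = rng.choices(ops, weights=weights)[0]
            action = (op, rng.choice(names))
            store.apply(action)
            history.append(action)
            if not invariant_holds(store):
                return history
    return None
```

No individual test case is written by hand here; the invariant is the only statement of intent, and the returned history doubles as a reproduction recipe. A real search would adapt its weights from feedback rather than fixing them up front.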


E2E testing as the new base

Sometimes, when a process is annoying or expensive, the correct approach is to do less of it and more of something else. But if it’s possible to do so, it’s almost always better just to fix the underlying problems! By solving the original pain points that made E2E testing so frustrating, we’ve flipped the testing pyramid upside-down and we can now have both high reliability and high productivity. This approach to testing catches your bugs quickly and reliably, giving you and your users the ultimate confidence in your applications. If you don’t want Friday afternoon surprises where your product manager’s demo stops working due to unforeseen bugs, contact us.
