Triage report

The Triage Report provides a high-level overview of one or more test runs. Every time you run a test, a triage report is automatically generated and emailed to your team. The report covers information about the high-level properties of your software and embeds basic debugging information for suspected violations of those properties. The report is here to help you decide whether you need to conduct a deeper investigation.

The triage report enables you to tell at a glance whether your testing has identified any new issues in your software. When your tests are run consistently, this report surfaces problems with recent changes so you can fix them immediately. This saves time in root-cause analysis and debugging.

Ideally, you will build and test your software nightly. With this nightly workflow, you only need to consider a very limited set of changes as potential sources of a new bug. This allows you to integrate and ship features more rapidly, while remaining confident that you are not exposing customers to bugs.

The three main sections of the triage report are: (1) Environment, a description of the software that was tested; (2) Utilization, high-level statistics about the run as a whole; and (3) Properties, individual pass/fail information about each of the test properties that you have configured. Each section is described in more detail below.

This document does not discuss how to start using Antithesis in the first place. For more information about placing your software under test and triggering test runs, please read our getting started guide.

The report

The rest of this document is a detailed walkthrough of an example triage report. We encourage you to follow along by referencing an interactive example report here.

Collapsed sections may be expanded by clicking the arrow on the left.

Environment

The environment section summarizes information about the software under test. It lists each of the container images that you provided to Antithesis. For each container it displays the following information:

  • The tag (if any) at which you told us to pull your containers. This is useful as a sanity check and to differentiate between test runs that test different versions of the same software.

  • The immutable digest of the container that was downloaded from the registry. This is useful because if you need this exact version of the software, you can pull the container from the registry at this digest.

  • The date on which your containers were built. This is useful for finding bugs in your CI integration that result in images silently failing to be sent to us. Antithesis works best when you are testing regularly with the newest versions of your software.

Finally, it also contains all of the above information for your configuration image.

Utilization

The utilization section assists in optimizing your Antithesis usage. You might wonder: should you run your existing tests with more parallelism? Or, should you instead strengthen your testing by improving the test template or by adding sometimes assertions? This section helps answer that question by graphing the number of new behaviors discovered over time.

If the graph is still increasing at the end of your run, this indicates that you have not yet hit diminishing returns on your existing tests, and would benefit from running with more parallelism. Conversely, the graph becoming entirely horizontal indicates that longer testing is unlikely to provide additional value. In this case, you should try to strengthen your testing instead.

It is normal for autonomous testing to discover a great deal of new behavior early and for this to plateau as low-hanging fruit is picked. Thus the graph will generally flatten out even for valuable tests. However, it should not become completely flat.
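The reasoning above can be sketched in code. This is only an illustration, not how Antithesis computes anything: the function name, the `tail_fraction` and `min_gain` parameters, and the sample data are all assumptions made for this example. It looks at the last part of a cumulative "behaviors discovered" curve and reports whether it is still rising.

```python
from typing import Sequence


def still_discovering(cumulative_behaviors: Sequence[int],
                      tail_fraction: float = 0.2,
                      min_gain: float = 0.01) -> bool:
    """Return True if the end of the curve is still rising noticeably.

    cumulative_behaviors: running total of distinct behaviors seen at
        evenly spaced sample points (a hypothetical export of the graph).
    tail_fraction: share of the run, counted from the end, to inspect.
    min_gain: minimum relative growth over that tail to count as rising.
    """
    if len(cumulative_behaviors) < 2:
        raise ValueError("need at least two samples")
    tail_len = max(2, int(len(cumulative_behaviors) * tail_fraction))
    tail = cumulative_behaviors[-tail_len:]
    start, end = tail[0], tail[-1]
    if start <= 0:
        # Nothing had been found before the tail; any discovery is growth.
        return end > 0
    return (end - start) / start >= min_gain


# Made-up curves: one still climbing, one that has gone flat.
rising = [10, 40, 90, 150, 200, 260, 310, 370, 420, 480]
flat = [10, 200, 350, 420, 450, 460, 460, 460, 460, 460]

print(still_discovering(rising))  # True: more parallelism is likely to help
print(still_discovering(flat))    # False: strengthen the tests instead
```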

For more details, see our guide to sizing your Antithesis deployment.

Properties

The properties section gives insight into the high-level properties of your system. This section shows if recent changes have introduced bugs (and thereby caused desired properties to be violated). Antithesis uses autonomous testing to assist you in finding bugs quickly with minimal time spent by developers; some of Antithesis’s largest customers write fewer than 50 total properties to test their software. You only need to reason about your software at a high level and then write high-level tests declaratively. More specifically, you write properties about your software – always properties, or properties that ought to always hold (like your software doesn’t crash); and sometimes properties, or properties about behavior that ought to happen at least once. When you have declared these properties, Antithesis is ready to initiate a test run.

For each test run, Antithesis will explore many possible execution histories of your software by varying inputs and environmental conditions. Antithesis generates test cases autonomously using the properties you have defined and feedback generated from running your software. Antithesis will search for counterexamples to always properties and examples of sometimes properties. At the end of the test run, Antithesis summarizes the status of these properties. If a property fails it often means that a bug has been identified.
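To make the difference between always and sometimes properties concrete, here is a minimal sketch of how a property's status could be derived from the evaluations seen during a run. The `PropertyResult` class and its fields are hypothetical and exist only for this illustration; this is not the Antithesis SDK or its internal logic. The sketch treats an always property that was never reached as failed, matching how the report singles out such properties.

```python
from dataclasses import dataclass


@dataclass
class PropertyResult:
    """Hypothetical record of one property's evaluations during a run."""
    name: str
    kind: str                 # "always" or "sometimes"
    examples: int = 0         # evaluations where the condition held
    counterexamples: int = 0  # evaluations where it did not

    def __post_init__(self) -> None:
        if self.kind not in ("always", "sometimes"):
            raise ValueError(f"unknown property kind: {self.kind!r}")

    def record(self, condition: bool) -> None:
        """Record one evaluation of the property's condition."""
        if condition:
            self.examples += 1
        else:
            self.counterexamples += 1

    @property
    def passed(self) -> bool:
        if self.kind == "always":
            # A single counterexample fails the property, and so does
            # never reaching it at all.
            return self.counterexamples == 0 and self.examples > 0
        # A sometimes property needs at least one example.
        return self.examples > 0


no_crash = PropertyResult("process does not crash", "always")
failover = PropertyResult("a failover happened", "sometimes")

# Three made-up execution histories: (crashed?, failed over?)
for crashed, failed_over in [(False, False), (False, True), (False, False)]:
    no_crash.record(not crashed)
    failover.record(failed_over)

print(no_crash.passed, failover.passed)  # True True
```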

Antithesis comes out of the box with a large set of default properties that should apply to any system. For example, processes should never crash, run out of memory, fill up the hard disk, etc. You may configure additional custom properties that are unique to your software. For example, imagine you promise your customers availability if a single node is killed. Antithesis can write this as a custom property in order to verify that each new version of your software keeps the promise. See Test Properties for more detailed information.

Properties may be defined by Antithesis consultants in the course of testing your software. However, you may also define properties in your own code using the Antithesis SDK. These properties will then be tested and the results will show up in your triage reports.

In the report, properties are grouped together to conveniently display them. You may request that they be regrouped or that properties be added to a group.

In its collapsed form, the group summarizes the status of the properties within it. You can expand the group by clicking the arrow to the left of its name, revealing detailed information about each property in the group. Individual properties can, in turn, be expanded.

Pass/fail property example: No 500 HTTP codes

Let us consider one property that a (hypothetical) customer has defined using the SDK.

We may expand the property group “Antithesis SDK: Always assertions” in order to see every “always assertion” that you have written in your code using our SDK. There are two groups of “always properties”; properties that trivially failed because they were never reached are singled out in the second group. Expand the first group, “Always assertions.” We see a property that has failed and several that have passed.

A failed property is likely to be a bug. We will now consider this failed property in detail, namely the property “The server never returns a 500 HTTP response code.”

This property asserts that no HTTP 500 response code is ever emitted. If Antithesis encounters a 500 response code even once, the property fails. This property is only capable of passing or failing, as opposed to the numeric properties discussed later in this documentation. The property in question failed: the report notes that exceptions to this property were found 51 times during this testing. (It also notes there were 801 examples of the property, that is, HTTP response codes that were not 500 codes.)
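The counting behind those numbers can be sketched as follows. The function name and the mix of 200 and 404 responses are assumptions made for this illustration; only the totals of 801 passing and 51 failing examples come from the example report.

```python
from collections import Counter
from typing import Dict, Iterable, Union


def summarize_no_500(status_codes: Iterable[int]) -> Dict[str, Union[int, bool]]:
    """Tally passing and failing examples for a 'no HTTP 500' property."""
    tally = Counter(
        "failing" if code == 500 else "passing" for code in status_codes
    )
    return {
        "passing": tally["passing"],
        "failing": tally["failing"],
        # Pass/fail only: one failing example is enough to fail.
        "passed": tally["failing"] == 0,
    }


# Hypothetical responses observed during a run.
codes = [200] * 700 + [404] * 101 + [500] * 51
summary = summarize_no_500(codes)
print(summary)  # {'passing': 801, 'failing': 51, 'passed': False}
```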

You can click on the underlined section “801 passing examples and 51 failing examples” to expand it. The expanded section shows the number of passing and failing examples over time.

For each property, the report includes historical information about its status, and also logs, artifacts, and other details.

The historical chart shows how this property has changed over time, but since this is a pass/fail property, it only tells you whether it passed or failed. In this case, the property has failed both for the current test and for six tests in the past. Antithesis works best when incorporated into a nightly build and test process, so that the timeline will show how various versions of the software have introduced bugs or fixed them.

Examples found

The Examples Found section lists a selection of passing and failing examples along with tools for examining them. There are many passing examples (801) and failing examples (51) in this test run. This section generally selects up to 10 passing and 10 failing examples for further inspection; in this case, it shows 2 of each. By default, one passing example and one failing example have full logging information and artifacts available: you can click on these top two examples to see the full logging history leading up to the example in question. This is customizable, so reach out to your Antithesis consultant if you would like full logging information to be included for more examples.

Copy moment

A major purpose of the Examples Found section is to allow you to investigate a bug in greater detail. For any example listed in this section, you may click the “Copy Moment” button; this copies all of the information Antithesis needs to deterministically recreate the example. You can use this to generate a multiverse debugging session or request that Antithesis generate a bug report for that moment.

If instead you want to investigate a moment that happens earlier than when the bug manifested, you could navigate to the logs view and copy any moment leading to the bug. You can do so by clicking the “Copy Moment” button that appears as you hover over each log line.

Logs

The report includes all the logs leading up to the moment when the property was violated. This includes application logs, system journal messages, and other information you may need to understand the bug in question. You may customize what is included by default in the logs, either on a per-property basis or for all of the properties in your test run.

The log viewer allows searching and filtering for particular log messages, using either substrings or regular expressions. You can also select which services or software components should have their logs included, using the button on the top left.
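If you export logs and want the same kind of filtering offline, the idea can be sketched in a few lines. The `LogLine` shape, the `filter_logs` function, and the sample messages are hypothetical; they do not describe the log viewer's implementation or any Antithesis export format.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Set


class LogLine(NamedTuple):
    time: float    # absolute time at which the line was emitted
    source: str    # service or component that emitted it
    message: str


def filter_logs(lines: Iterable[LogLine],
                pattern: str,
                use_regex: bool = False,
                sources: Optional[Set[str]] = None) -> Iterator[LogLine]:
    """Yield lines whose message matches, optionally limited to some sources."""
    compiled = re.compile(pattern) if use_regex else None
    for line in lines:
        if sources is not None and line.source not in sources:
            continue
        if compiled is not None:
            matched = compiled.search(line.message) is not None
        else:
            matched = pattern in line.message  # plain substring match
        if matched:
            yield line


logs = [
    LogLine(12.001, "server", "GET /orders -> 200"),
    LogLine(12.450, "client", "retrying request"),
    LogLine(13.020, "server", "GET /orders -> 500"),
]

# Regular expression: server responses in the 5xx range.
hits = list(filter_logs(logs, r"-> 5\d\d$", use_regex=True, sources={"server"}))
print([h.message for h in hits])  # ['GET /orders -> 500']
```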

Each log line is timestamped with the absolute time at which it was emitted from the simulation. If you are used to debugging distributed systems, this may come as a surprise. The order of the messages is the true order in which they were emitted on the underlying hardware, and is independent of the values of the system clocks on the simulated nodes. These times remain absolute and useful even if you are injecting clock faults.
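A small illustration of why this matters, using made-up events: the `Event` fields and the three-second clock fault on node-b are assumptions for this sketch only. Sorting by each node's own clock scrambles cause and effect, while sorting by the absolute time preserves it.

```python
from typing import NamedTuple


class Event(NamedTuple):
    absolute_time: float  # timestamp assigned by the simulation
    node: str
    node_clock: float     # what the node's own (possibly faulted) clock read
    message: str


# node-b's clock has been set back by three seconds (an injected clock fault).
events = [
    Event(5.0, "node-a", 5.0, "send request"),
    Event(5.2, "node-b", 2.2, "receive request"),
    Event(5.4, "node-b", 2.4, "send reply"),
    Event(5.6, "node-a", 5.6, "receive reply"),
]

by_node_clock = [e.message for e in sorted(events, key=lambda e: e.node_clock)]
by_absolute = [e.message for e in sorted(events, key=lambda e: e.absolute_time)]

print(by_node_clock)  # the reply appears to be sent before the request
print(by_absolute)    # the order in which things actually happened
```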

Artifacts

You may define a sequence of actions that should be run against your software to generate an artifact whenever we find a violation of some property.

For example, Antithesis might collect data files or gather debug information. These actions may be configured to take place at the moment that the bug is seen. However, Antithesis deterministically simulates software or even entire distributed systems, which enables time travel for artifact generation. Using this deterministic simulation, Antithesis can generate the artifacts at any point leading up to the error being generated; it can even allow the software to continue executing and generate the artifacts in the future.

Gathering the artifacts that you are interested in will make your triage report more useful. By default, when we detect a process crashing, we will additionally gather a core dump at the moment of the crash.

Details

The details section contains additional contextual information about the selected example or counterexample. For example, if a bug was detected by inspecting your program’s logs, it will contain the log line that exhibits the bug. If the bug is that your program ran out of memory, it will contain the amount of memory that was in use.

Numeric property example: peak memory usage

The previous property was only capable of passing or failing. Other properties might have a numeric value, such as Peak Memory Usage. The value of a numeric property is graphed over time, with the color still indicating whether the property passed or failed. (Usually a failure is defined as being above or below some threshold value.)

This property records the maximum memory used at any point in testing, measured as a fraction of total memory. It has a value between 0 and 1 and fails if memory usage ever exceeds 0.95, that is, 95% of available memory.
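The threshold check described above can be sketched as follows. The function name, the use of MiB as the unit, and the sample values are assumptions for illustration; only the 95% threshold comes from the text.

```python
from typing import Sequence, Tuple


def peak_memory_property(samples: Sequence[float],
                         total: float,
                         threshold: float = 0.95) -> Tuple[float, bool]:
    """Return (peak usage as a fraction of total, whether the property passed).

    samples and total must be in the same unit (MiB in the example below).
    """
    if total <= 0:
        raise ValueError("total memory must be positive")
    if not samples:
        raise ValueError("need at least one memory sample")
    peak = max(samples) / total
    # The property fails only if usage ever exceeds the threshold.
    return peak, peak <= threshold


# Hypothetical memory samples, in MiB, on a machine with 8192 MiB.
peak, passed = peak_memory_property([2048, 5120, 7987], total=8192)
print(f"peak={peak:.3f} passed={passed}")  # peak=0.975 passed=False
```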

The sections of Examples Found, Logs, Artifacts, and Details are exactly the same as in the previous example.

Summary

Autonomous testing allows you to focus on reasoning about declarative, high-level properties of your software. Antithesis then tests your software by searching for examples or counterexamples to these properties. This report summarizes the results of the test run; it tells you what properties have failed and gives you logs for preliminary debugging. However, this report only contains summary information about the bug. If you want Antithesis to investigate an issue in greater detail, you may launch a multiverse debugging session or request a bug report.