Meet the Test Composer

You set up your etcd cluster and got it running in our environment in part 1. Now, you’ll continue working in the same project directory to test it. The source code for this part of the tutorial is here.

Testing a distributed datastore

A distributed datastore needs to be consistent. You put data into it, get a successful response, fetch it in the future, and it matches your expectation.

While it’s easy enough to write a script that inserts and reads key-value pairs, making it into a meaningful test that simulates production conditions – with parallel requests, a varied cadence, and a faulty environment – is more complicated.

Antithesis simplifies this greatly by providing a Test Composer – a framework that takes care of the parallelism, variation in command order, and more. All you need to provide are the basic functions that exercise the system.
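
For contrast, here is a rough sketch of the kind of harness you would otherwise write by hand just to get parallel writers. It is not how the Test Composer is implemented. `InMemoryStore`, `worker`, and `run_workers` are hypothetical names, and the store is an in-memory stand-in rather than etcd.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Hypothetical thread-safe stand-in for a key-value datastore."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def worker(store, worker_id, num_requests, seed):
    """Insert num_requests pairs, then read each one back and compare."""
    rng = random.Random(seed)  # per-worker RNG so runs are reproducible
    written = {}
    for i in range(num_requests):
        key = f"w{worker_id}-k{i}"
        value = str(rng.getrandbits(32))
        store.put(key, value)
        written[key] = value
    # True only if every value read back matches what this worker wrote.
    return all(store.get(k) == v for k, v in written.items())


def run_workers(num_workers=4, num_requests=25):
    """Run the workers concurrently and collect one result per worker."""
    store = InMemoryStore()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(worker, store, w, num_requests, seed=w)
            for w in range(num_workers)
        ]
        return [f.result() for f in futures]
```

Even this small version has to manage threads, seeds, and result collection, and it still injects no faults and varies no ordering. Those are the parts the framework takes over.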

1. Create a test command

Start by creating a new directory inside your client directory.

$ pwd
.../etcd-antithesis/client
$ mkdir python-generate-traffic && cd python-generate-traffic

Then add some helper functions to make requests to the cluster using Antithesis’s Python SDK.

$ mkdir resources && cd resources

etcd-antithesis/
    client/
        python-generate-traffic/
            resources/
                helper.py

import etcd3, string, sys

# Antithesis SDK
from antithesis.random import (
    random_choice,
    get_random,
)

def put_request(c, key, value):
    try:
        c.put(key, value)
        return True, None
    except Exception as e:
        return False, e

def get_request(c, key):
    try:
        response = c.get(key)
        database_value = response[0].decode('utf-8')
        return True, None, database_value
    except Exception as e:
        return False, e, None

def connect_to_host():
    host = random_choice(["etcd0", "etcd1", "etcd2"])
    try:
        client = etcd3.client(host=host, port=2379)
        print(f"Client: connected to {host}")
        return client
    except Exception as e:
        print(f"Client: failed to connect to {host}. exiting")
        sys.exit(1)

Add a helper function that generates random strings to insert.

def generate_random_string():
    random_str = []
    for _ in range(16):
        random_str.append(random_choice(list(string.ascii_letters + string.digits)))
    return "".join(random_str)

To simulate multiple requests inserting data into the datastore, draw a random number between 1 and 100 to represent the number of requests in the traffic. Generate that many random strings and insert them as key-value pairs.

Add another helper function, generate_num_requests, to pick that number.

def generate_num_requests():
    return (get_random() % 100) + 1
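
As a quick sanity check on the arithmetic, this minimal sketch shows that `(x % 100) + 1` always lands in the inclusive range 1 to 100. `to_request_count` and `sample_counts` are hypothetical names, and Python's `random.Random` stands in for the SDK's `get_random()` so the snippet runs without the Antithesis SDK installed.

```python
import random


def to_request_count(raw):
    """Map any integer onto the inclusive range 1..100."""
    # In Python, raw % 100 is always in 0..99, even for negative raw.
    return (raw % 100) + 1


def sample_counts(n, seed=0):
    """Draw n request counts from a seeded stand-in random source."""
    # Assumption: random.Random replaces get_random() for this sketch only.
    rng = random.Random(seed)
    return [to_request_count(rng.getrandbits(64)) for _ in range(n)]
```

The `+ 1` matters: without it a draw could be 0 and the driver would send no traffic at all.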

Now we have some helper functions that insert key-value pairs into our etcd cluster, and we’ll use the Test Composer to orchestrate them.

The Test Composer relies on an opinionated framework that identifies executable scripts as test commands using a naming convention. There are a few types of test commands, but we’ll only use the parallel driver command in this tutorial.

To make a script a parallel driver command, all we do is name the file parallel_driver_<name> – in this case, parallel_driver_generate_traffic.py.

etcd-antithesis/
    client/
        python-generate-traffic/
            resources/
                helper.py
            parallel_driver_generate_traffic.py

A test command is an executable and requires an appropriate shebang in the first line. Later, we’ll also mark it as an executable.

#!/usr/bin/env -S python3 -u

import sys
sys.path.append("/opt/antithesis/resources")
import helper

def simulate_traffic(prefix):
    """
        This function will first connect to an etcd host, then execute a certain number of put requests. 
        The key and value for each put request are generated using Antithesis randomness (check within the helper.py file). 
        We return the key/value pairs from successful requests.
    """
    client = helper.connect_to_host()
    num_requests = helper.generate_num_requests()
    kvs = []

    for _ in range(num_requests):

        # generating random str for the key and value
        key = prefix+helper.generate_random_string()
        value = helper.generate_random_string()

        # response of the put request
        success, error = helper.put_request(client, key, value)

        if success:
            kvs.append((key, value))
            print(f"Client: successful put with key '{key}' and value '{value}'")
        else:
            print(f"Client: unsuccessful put with key '{key}', value '{value}', and error '{error}'")

    print(f"Client: traffic simulated!")
    return kvs

Notice that it’s okay for put_request to be unsuccessful during faults, and it should not break the system.
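
To see why failed puts are harmless to the driver, here is a small sketch that runs the same pattern against a fake client. `FlakyClient` and `simulate_puts` are hypothetical stand-ins, not part of etcd3 or the Antithesis SDK, and the failure rate simply imitates a faulty network.

```python
import random


class FlakyClient:
    """Hypothetical stand-in for an etcd client that fails some puts."""

    def __init__(self, failure_rate, seed=0):
        self._rng = random.Random(seed)
        self._failure_rate = failure_rate
        self.data = {}

    def put(self, key, value):
        if self._rng.random() < self._failure_rate:
            raise ConnectionError("simulated network fault")
        self.data[key] = value


def put_request(c, key, value):
    # Same shape as the helper above: never raise, report (ok, error).
    try:
        c.put(key, value)
        return True, None
    except Exception as e:
        return False, e


def simulate_puts(client, num_requests):
    """Attempt num_requests puts; return (successful pairs, failure count)."""
    kvs, failures = [], 0
    for i in range(num_requests):
        key, value = f"key-{i}", f"value-{i}"
        success, _error = put_request(client, key, value)
        if success:
            kvs.append((key, value))
        else:
            failures += 1
    return kvs, failures
```

Because only acknowledged writes are recorded in `kvs`, a later validation step never checks a key whose put was reported as failed.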

You’ve inserted some data into the distributed datastore. Now see if the values match.

validate_puts will fetch and match the value for all the successfully inserted keys.

def validate_puts(kvs):
    """
        This function will first connect to an etcd host, then perform a get request on each key in the key/value array. 
        For each successful response, we check that the get request value == value from the key/value array. 
        If we ever find a mismatch, we return it. 
    """
    client = helper.connect_to_host()

    for kv in kvs:
        key, value = kv[0], kv[1]
        success, error, database_value = helper.get_request(client, key)

        if not success:
            print(f"Client: unsuccessful get with key '{key}', and error '{error}'")
        elif value != database_value:
            print(f"Client: a key value mismatch! This shouldn't happen.")
            return False, (value, database_value)

    print(f"Client: validation ok!")
    return True, None

Now bring it all together.

if __name__ == "__main__":
    prefix = helper.generate_random_string()
    kvs = simulate_traffic(prefix)
    values_stay_consistent, mismatch = validate_puts(kvs)

values_stay_consistent should be True and mismatch should be None.
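
If you want to see both outcomes without a cluster, the sketch below runs a simplified validation against an in-memory fake. `DictClient` is a hypothetical stand-in whose `get` returns a `(value_bytes, metadata)` pair like the one the helper above indexes into. Unlike the tutorial's version, this simplified `validate_puts` takes the client as a parameter and treats a missing key as a mismatch.

```python
class DictClient:
    """Hypothetical in-memory client for illustration only."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value.encode("utf-8")

    def get(self, key):
        # Mimics a (value_bytes, metadata) return shape.
        return self.data.get(key), None


def validate_puts(client, kvs):
    """Return (True, None) if all reads match, else (False, (expected, actual))."""
    for key, value in kvs:
        stored = client.get(key)[0]
        # Assumption for this sketch: a missing key counts as a mismatch.
        actual = stored.decode("utf-8") if stored is not None else None
        if actual != value:
            return False, (value, actual)
    return True, None


def demo():
    """Validate once against clean data, then again after corrupting a value."""
    client = DictClient()
    kvs = [("a", "1"), ("b", "2")]
    for key, value in kvs:
        client.put(key, value)
    clean = validate_puts(client, kvs)
    client.data["b"] = b"stale"  # simulate an inconsistent read
    corrupted = validate_puts(client, kvs)
    return clean, corrupted
```

Calling `demo()` returns `((True, None), (False, ("2", "stale")))`: the first result is the healthy case, the second is the mismatch the test is designed to catch.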

2. Add some Assertions to validate

Assertions express properties your system should have, and Antithesis relies on assertions to understand what you’re testing for. The Assertions in Antithesis page describes the mechanics in a lot more detail.

Antithesis’s SDKs provide many types of assertions, but we’ll only use two here.

The first is an Always assertion – these assertions are similar to the programming assertions you’re familiar with, but they don’t crash your program. They create a property that Antithesis will test, and list in the triage report as passing or failing.

You always want the datastore to be consistent. So, in your parallel driver command, values_stay_consistent must always be true.

Import always and sometimes from antithesis.assertions at the top of the script, then add an always assertion to test that:

if __name__ == "__main__":
    prefix = helper.generate_random_string()
    kvs = simulate_traffic(prefix)
    values_stay_consistent, mismatch = validate_puts(kvs)

    # We expect that the values we put in the database stay consistent
    always(values_stay_consistent, "Database key values stay consistent", {"mismatch":mismatch})
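
To make the "doesn't crash your program" behavior concrete, here is a toy model of a single Always property. `AlwaysProperty` is a hypothetical stand-in written for this sketch, not the SDK's implementation. It assumes a property passes only if it was evaluated at least once and never saw a false condition.

```python
class AlwaysProperty:
    """Hypothetical model of one Always property, for illustration only."""

    def __init__(self, message):
        self.message = message
        self.evaluations = 0
        self.failures = []

    def check(self, condition, details=None):
        # Record the outcome instead of raising, so the caller keeps running.
        self.evaluations += 1
        if not condition:
            self.failures.append(details)

    @property
    def passed(self):
        # Assumed rule: evaluated at least once and never false.
        return self.evaluations > 0 and not self.failures
```

For example, after `p = AlwaysProperty("Database key values stay consistent")`, a call to `p.check(False, {"mismatch": ("a", "b")})` returns normally, but `p.passed` is `False` from then on and the details are kept for reporting.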

The second assertion we’ll use is a Sometimes assertion (these are so valuable and unusual they get a whole section of documentation to themselves).

When inserting key-value pairs into a distributed datastore in the face of network and environmental faults, it’s okay for some requests to fail. But if none of them succeed then your system is never able to insert keys into etcd and that’s either a bug or a test misconfiguration that needs attention.

Here’s what a sometimes assertion looks like.

sometimes(success, "Client can make successful put requests", {"error":str(error)})
sometimes(error is not None, "Client put requests can fail", None)

The first parameter is the condition that should be true at least some of the time. The second describes the property we’re asserting. The third attaches extra context, such as the error, to the assertion.
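
A toy model may help show how the two sometimes assertions complement each other. `SometimesProperty` and `run_puts` are hypothetical names for this sketch and do not reflect how the SDK is implemented. The assumed rule is that a property passes once its condition has been true at least once.

```python
class SometimesProperty:
    """Hypothetical model of one Sometimes property, for illustration only."""

    def __init__(self, message):
        self.message = message
        self.evaluations = 0
        self.hits = 0

    def check(self, condition):
        self.evaluations += 1
        if condition:
            self.hits += 1

    @property
    def passed(self):
        # Assumed rule: the condition was true at least once.
        return self.hits > 0


def run_puts(outcomes):
    """Feed a list of put outcomes (True = success) through both properties."""
    can_succeed = SometimesProperty("Client can make successful put requests")
    can_fail = SometimesProperty("Client put requests can fail")
    for success in outcomes:
        can_succeed.check(success)
        can_fail.check(not success)
    return can_succeed.passed, can_fail.passed
```

`run_puts([False, True, False])` returns `(True, True)`, the healthy outcome under fault injection. `run_puts([False, False])` returns `(False, True)`, which is the "no put ever succeeds" situation described above. `run_puts([True])` returns `(True, False)`, which would suggest the faults never actually affected a request.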

Here are the traffic simulation and validation functions now:

def simulate_traffic(prefix):
    """
        This function will first connect to an etcd host, then execute a certain number of put requests. 
        The key and value for each put request are generated using Antithesis randomness (check within the helper.py file). 
        We return the key/value pairs from successful requests.
    """
    client = helper.connect_to_host()
    num_requests = helper.generate_num_requests()
    kvs = []

    for _ in range(num_requests):

        # generating random str for the key and value
        key = prefix+helper.generate_random_string()
        value = helper.generate_random_string()

        # response of the put request
        success, error = helper.put_request(client, key, value)

        # Antithesis Assertion: sometimes put requests are successful. A failed request is OK since we expect them to happen.
        sometimes(success, "Client can make successful put requests", {"error":str(error)})
        sometimes(error is not None, "Client put requests can fail", None)

        if success:
            kvs.append((key, value))
            print(f"Client: successful put with key '{key}' and value '{value}'")
        else:
            print(f"Client: unsuccessful put with key '{key}', value '{value}', and error '{error}'")

    print(f"Client: traffic simulated!")
    return kvs

def validate_puts(kvs):
    """
        This function will first connect to an etcd host, then perform a get request on each key in the key/value array. 
        For each successful response, we check that the get request value == value from the key/value array. 
        If we ever find a mismatch, we return it. 
    """
    client = helper.connect_to_host()

    for kv in kvs:
        key, value = kv[0], kv[1]
        success, error, database_value = helper.get_request(client, key)

        # Antithesis Assertion: sometimes get requests are successful. A failed request is OK since we expect them to happen.
        sometimes(success, "Client can make successful get requests", {"error":str(error)})
        sometimes(error is not None, "Client get requests can fail", None)

        if not success:
            print(f"Client: unsuccessful get with key '{key}', and error '{error}'")
        elif value != database_value:
            print(f"Client: a key value mismatch! This shouldn't happen.")
            return False, (value, database_value)

    print(f"Client: validation ok!")
    return True, None

The assertions you’ve added will show up in the triage report as properties, and the report will show if they passed or failed in testing.

3. Build your client

Now you have a test template with one test command to exercise the etcd cluster.

To package it, add instructions in the Dockerfile.client.

FROM docker.io/ubuntu:latest 

# Install dependencies
RUN apt-get update -y && apt-get install -y pip

# PYTHON:

# Install Python and other dependencies
RUN apt-get install -y python3
RUN apt-get install -y python3-etcd3 python3-numpy python3-protobuf python3-filelock

# Install Antithesis Python SDK
RUN pip install antithesis cffi --break-system-packages

# Copying executable into Test Composer directory
COPY ./python-generate-traffic/parallel_driver_generate_traffic.py /opt/antithesis/test/v1/main/parallel_driver_generate_traffic.py

# Copying additional resources into a resources folder
COPY ./python-generate-traffic/resources/helper.py /opt/antithesis/resources/helper.py
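
The destination of helper.py matters because the driver script appends /opt/antithesis/resources to sys.path before running `import helper`. The sketch below reproduces that lookup in isolation. A temporary directory stands in for /opt/antithesis/resources, and `helper_demo` and `import_from_resources` are hypothetical names.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_resources(resources_dir, module_name):
    """Append resources_dir to sys.path and import module_name from it."""
    if resources_dir not in sys.path:
        sys.path.append(resources_dir)
    importlib.invalidate_caches()  # make sure the new file is discovered
    return importlib.import_module(module_name)


def demo():
    """Write a tiny module into a temp dir and import it via sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        # helper_demo.py is a hypothetical stand-in for helper.py.
        Path(tmp, "helper_demo.py").write_text(
            "def generate_num_requests():\n    return 42\n"
        )
        try:
            module = import_from_resources(tmp, "helper_demo")
            return module.generate_num_requests()
        finally:
            # Leave sys.path as we found it.
            if tmp in sys.path:
                sys.path.remove(tmp)
```

If the COPY destination and the path passed to `sys.path.append` ever disagree, the driver fails at `import helper` with a ModuleNotFoundError before sending any traffic.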

Remember that test commands must be executables, so make sure the parallel-driver script is.

$ pwd
.../etcd-antithesis/client/python-generate-traffic
$ chmod +x parallel_driver_generate_traffic.py
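
Before building the image, you may want a quick local check that the script meets the three requirements mentioned so far: the parallel_driver_ name prefix, a shebang on the first line, and the executable bit. This is a minimal sketch for POSIX systems. `check_test_command` and `demo` are hypothetical helpers, not part of any Antithesis tooling, and the Test Composer's own discovery logic may differ.

```python
import stat
import tempfile
from pathlib import Path

PREFIX = "parallel_driver_"
ANY_EXEC_BIT = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def check_test_command(path):
    """Return a list of problems found with a parallel driver script."""
    path = Path(path)
    problems = []
    if not path.name.startswith(PREFIX):
        problems.append(f"name does not start with {PREFIX!r}")
    with path.open("r", encoding="utf-8") as f:
        first_line = f.readline()
    if not first_line.startswith("#!"):
        problems.append("first line is not a shebang")
    # Checks whether any execute bit (user, group, or other) is set.
    if not path.stat().st_mode & ANY_EXEC_BIT:
        problems.append("file is not executable")
    return problems


def demo():
    """Check a throwaway script before and after setting the execute bits."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp, "parallel_driver_demo.py")
        script.write_text("#!/usr/bin/env -S python3 -u\nprint('hello')\n")
        script.chmod(0o644)
        before = check_test_command(script)
        script.chmod(0o755)
        after = check_test_command(script)
    return before, after
```

An empty list means the script passed all three checks. Here `demo()` returns `(["file is not executable"], [])`.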

Build your client container image. Replace $TENANT_NAME with your tenant’s name.

$ docker build . -f Dockerfile.client -t us-central1-docker.pkg.dev/molten-verve-216720/$TENANT_NAME-repository/etcd-client:v1

To make the simulation more realistic, you should always run multiple client containers using a single container image.

Update the docker-compose.yaml to configure 2 client containers. The client containers must be kept running for Antithesis to keep testing. Add a sleep infinity entrypoint for them.

  client1:
    image: 'etcd-tutorial-client:v1'
    container_name: client1
    entrypoint: "sleep infinity"

  client2:
    image: 'etcd-tutorial-client:v1'
    container_name: client2
    entrypoint: "sleep infinity"

Rebuild your config image:

$ docker build . -f Dockerfile.config -t us-central1-docker.pkg.dev/molten-verve-216720/$TENANT_NAME-repository/etcd-config:v2

Follow these steps to push your updated container images to the Antithesis registry. Also, make sure to modify the curl request accordingly.

In practice, you might want to check your test commands and setup locally before running them in Antithesis.

4. Run your test

Now, run the curl command to kick off a test run. Here is the modified request.

Remember to change user, password, <tenant> and antithesis.report.recipients accordingly.

curl --fail -u 'user:password' \
-X POST https://<tenant>.antithesis.com/api/v1/launch/basic_test \
-d '{"params": { "antithesis.description":"basic_test on main",
    "antithesis.duration":"30",
    "antithesis.config_image":"etcd-config:v2",
    "antithesis.images":"docker.io/bitnami/etcd:3.5;etcd-health-checker:v1;etcd-client:v1", 
    "antithesis.report.recipients":"foo@email.com;bar@email.com"
    } }'
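
If you would rather launch the run from a script, the same request can be assembled with Python's standard library. This is a sketch that only builds the request. `build_launch_request` and `PARAMS` are hypothetical names, and the tenant, user, and password passed in are placeholders you must replace, just as in the curl command.

```python
import base64
import json
import urllib.request

# Same parameters as the curl command above.
PARAMS = {
    "antithesis.description": "basic_test on main",
    "antithesis.duration": "30",
    "antithesis.config_image": "etcd-config:v2",
    "antithesis.images": (
        "docker.io/bitnami/etcd:3.5;etcd-health-checker:v1;etcd-client:v1"
    ),
    "antithesis.report.recipients": "foo@email.com;bar@email.com",
}


def build_launch_request(tenant, user, password, params):
    """Build (but do not send) the POST request that launches basic_test."""
    url = f"https://{tenant}.antithesis.com/api/v1/launch/basic_test"
    body = json.dumps({"params": params}).encode("utf-8")
    # HTTP basic auth, equivalent to curl's -u 'user:password'.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

To send it, pass the result to `urllib.request.urlopen`. Like `curl --fail`, `urlopen` reports an HTTP error status as a failure, in this case by raising `urllib.error.HTTPError`.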

This is still set up to run for 30 minutes, so you’ll get a new triage report within an hour.

To recap, your first test run validated that Antithesis could run your etcd cluster. We saw your containers come up in the right order, and your system signaled to Antithesis that it was ready to test.

Here, we added a test template in the client container to actually exercise your system.

There’s a lot of depth to test templates, and iterating on your test template is a great way to improve your testing. Check out that section of the docs here.

Within an hour, you’ll receive an email with a link to a triage report.