“How should we test this?”

“Let’s just run it in production and monitor it closely.”

— You and your coworker, probably.

While often mocked, testing in production is the most definitive way to ensure that your system is operating as expected.  Segment has been on a journey for the last 18 months to include end-to-end testing in production as part of our broader testing strategy, so we wanted to share some of the work we’ve been doing in this area.

For those unfamiliar, Segment is a Customer Data Infrastructure which helps our customers route data about their users from various collection points (web, mobile, server-side) to hundreds of Destinations (partners which receive data from segment) and data warehouses.  

The numerous components which compose Segment’s backend creates a challenging environment for testing in general, but especially in production. To manage this complexity, we’ve decided to focus on two areas.

First, we’ve been building towards a staging environment that faithfully represents our production environment. Second, since we cannot cost-effectively operate a staging environment at the same scale as our production environment, we’ve been developing end-to-end tests for production.

Much has been written about other types of testing, so we’re going to focus on end-to-end testing in this post.

End-to-end tests are tests which run against the entire infrastructure. They are distinct from integration tests because they’re run on real infrastructure whereas integration tests are not. These tests are also distinct from unit tests, which only test a very small amount of code or even just one method.  End-to-end tests should also exercise the exact same code paths used by a customer sending data to Segment’s API.

So what does an end-to-end test look like for Segment?

  1. Send an event to the Segment Tracking API

  2. Process that event through our many streaming services (e.g. validation, deduplication, etc)

  3. Send the event into Centrifuge, which handles reliable delivery of events to Destinations in the presence of network timeouts or other failures outside of our control

  4. Verify that the event is received by a Webhook destination

  5. Emit latency and delivery metrics to alert on using segmentio/stats.

To implement this kind of test, we required an end-to-end testing framework that would make it easy for developers to build new tests.

When we started looking at solutions, we played around with some other end-to-end frameworks with varying degrees of success. They often incorporated ideas about contracts and assertions which were tightly coupled to the framework. This not only made it difficult to add new types of tests, but it also made them difficult to debug.

Before we had end-to-end tests, our staging environment wasn’t effective at preventing bugs from getting to production.  Software is updated more frequently in staging, often being a week or so ahead of the production version.  Additionally, configuration of the staging environment was haphazard and occasionally broke due to changes in the software. These breaks were often silent because we had not been monitoring them in staging.

Today we’re open sourcing Orbital, a framework which meets the requirements presented above and helped us reach our testing goals.

Orbital provides the means to define, register and run tests as part of a perpetually-running end-to-end test service. Additionally, it provides metrics (using segmentio/stats) around test latency and failure rates which we can monitor and alert on.


Orbital is a lightweight test framework for running systems tests defined in Go.  Orbital is inspired by Go’s own testing library, specifically the testing.T abstraction.  testing.T is a struct that gets injected into each test which defines a set of methods to determine whether or not that test passed.  We like Go’s testing package for two reasons.

First, the package takes a users first approach in it’s design.  The API couldn’t be more simple!  Doing this greatly reduces friction when writing tests, increasing the likelihood that they’ll get written and maintained properly.

Second, modeling orbital.O  after testing.T gives us the flexibility we need to define our arbitrarily complex tests.  After trying to enumerate all the different things we’d like to support we found that there are just too many behavioral edge cases that needed actual code to describe properly.  For example, say you want to check events were received by a webhook and also that some counters were updated.  This was difficult to articulate with an assertion based framework like the one we were using before. With Orbital, we’re now only limited by what the Go language supports which is an improvement over the “mutation→assertion” style tests which we encountered before.

The following example exercises the above illustrated case: sending an event to our Tracking API produces a webhook to a configured endpoint. In this case, we’ve configured the webhook to be our own end-to-end service’s API for test verification.

type Harness struct {
        API    string
        Waiter *webhook.RouteLogger

type event struct {
        Email     string    `json:"email"`
        Timestamp time.Time `json:"timestamp"`
        ID        string    `json:"id"`
        Processed bool      `json:"processed"`

func (h Harness) OrbitalSmoke(ctx context.Context, o *orbital.O) {
        evt := event{
                Email:     phony.Get("email"),
                ID:        phony.Get("ksuid"),
                Timestamp: time.Now().UTC(),
        // Mark the event as Sent.
        err := h.Waiter.Sent(evt.ID)
        assert.NoError(o, err, "error marking sent")
        // Cleanup after we're done.
        defer h.Waiter.Delete(evt.ID)
        assert.NoError(o, send(h.API, evt), "sending event shouldn't fail")
        // Block until the event has been received
        r, err := h.Waiter.Wait(ctx, evt.ID)
        assert.NoError(o, err, "error waiting")
        var recv event
        err = json.Unmarshal([]byte(r.Body), &recv)
        assert.NoError(o, err, "error unmarshaling")
        assert.True(o, recv.Processed, "processed should be set to true")

As you can see, the code is very straightforward.  Each test runs in its own goroutine and blocks until it’s completed or the context is cancelled.  Modeling tests in this way allows us to check arbitrary side effects and allows for any kind of behavioral testing your imagination can come up with.

Orbital provides a Service struct which registers the tests and manages the process lifecycle. This struct allows you to set global timeouts for all tests as well as configure logging and metrics. During test registration, you set the period (how often the test is run), name, function and optional timeout override.

                Name:    "smoke test",
                Period:  1 * time.Second,
                Timeout: 3 * time.Second,
                Func:    harness.OrbitalSmoke,

One key factor in the design of this framework is the embedded webhook package. This special webhook operates like a normal HTTP server which logs requests to an interface.  One implementation of this interface (RouteLogger) is configured such that after sending an event, you can block your goroutine waiting until that event is received by the webhook or a timeout occurs.

func (s *RouteLogger) Wait(ctx context.Context, key string) (Request, error) {
        c, ok := s.rc[key]
        if !ok {
                return Request{}, errors.New("Wait called before Sent")
        select {
        case r := <-c:
                return r, nil
        case <-ctx.Done():
                return Request{}, ctx.Err()

With this primitive, we can send requests to the API, then wait for them to be sent back after being processed in our pipeline. In the above example, we’re doing this on the line h.Waiter.Wait(ctx, evt.ID). To see a full example of both a tester and tested service, check out the examples directory on GitHub.

How do we use it?

Our Orbital tests are deployed as a service that runs inside of our staging and production infrastructure.  It sends events to the Segment Tracking API using our various library implementations.  We even fork out to headless Chrome to execute tests in the browser with analytics.js!  This framework generates metrics used for dashboards and alerting. Here you can see a comparison of our staging vs production environments.


From the screenshots above you can see that something was broken in staging by looking at the top center graph.

This library was important to creating confidence that our stage environment behaves the same way as our production environment.  We’re now at the point where we can block a release if any of the tests fail in stage.  We know for certain that something did actually break and needs to be investigated. This is the testing strategy you need to have in your infrastructure to reach the ever elusive 5-9s of reliability.

What remains?

Orbital has already proven instrumental in reducing the number of bugs making it to production.  We’ve written numerous tests across multiple teams which exercise various known customer configurations.  However, the framework is not yet bulletproof.

Although you can scale on a single instance to tens of thousands of requests per second, eventually you’ll hit a bottleneck somewhere.  Unfortunately, this framework doesn’t elegantly scale out right now.  Currently, the “RouteLogger/Waiter” records messages sent and received in memory; not to a shared resource or DB.  So if you have multiple tasks running which are load balanced, those requests are unlikely to be sent to the right task and the tests will fail/timeout.  This is a non-trivial but ultimately a solvable problem.

If this is interesting to you, reach out to us!  We’d love to hear from you.  You can find us on twitter @Segment.  Check out our Open Source initiatives here.  We’ve also got many positions in Engineering which involve solving problems similar to this which you can see here.

If you’re interested in reading more about this topic, check out these other great resources on testing in production: