The Segment Infrastructure

Under the hood of the system that processes 1M+ events per second


Customer data lives everywhere: your website, your mobile apps, and your internal tools.

That’s why collecting and processing all of it is a tricky problem. Segment has built libraries, automatic sources, and functions to collect data from anywhere—hundreds of thousands of times per second.

We’ve carefully designed each of these areas to ensure they’re:

Performant (batching, async, real-time, off-page)
Reliable (cross-platform, handle rate-limits, retries)
Easy (set up in a few clicks, elegant, modern API)

Here’s how we do it.


Data can be messy. Anyone who has dealt with third-party APIs, JSON blobs, and semi-structured text knows that only 20-30% of your time is spent driving insights. Most of your time is spent cleaning the data you already have.

At minimum, you’ll want to make sure your data infrastructure can:

Handle GDPR suppressions across millions of users
Validate and enforce rules on arbitrary inputs
Allow you to transform and format individual events
Deduplicate retried requests
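
To make the checklist above concrete, here is a minimal sketch of those four steps applied to a single event. It is an illustration under stated assumptions, not a description of how Segment implements them: the field names (messageId, userId, event), the REQUIRED_FIELDS schema, the EventPipeline class, and the in-memory sets are all hypothetical. A production system would likely keep suppression lists and seen IDs in a shared store and expire old IDs after some window.

```python
from typing import Any, Callable, Optional

Event = dict[str, Any]

# Hypothetical minimal schema: field name -> required type.
REQUIRED_FIELDS = {"messageId": str, "userId": str, "event": str}


def lowercase_event_name(event: Event) -> Event:
    """Example transform: return a copy with a normalized event name."""
    return {**event, "event": event["event"].strip().lower()}


class EventPipeline:
    """Illustrative pipeline: validate, suppress, deduplicate, transform."""

    def __init__(
        self,
        suppressed_users: set[str],
        transform: Callable[[Event], Event] = lowercase_event_name,
    ) -> None:
        self.suppressed_users = suppressed_users
        self.transform = transform
        # Unbounded in-memory set (assumption); fine for a sketch only.
        self.seen_message_ids: set[str] = set()

    def process(self, event: Event) -> Optional[Event]:
        """Return the cleaned event, or None if it should be dropped."""
        # 1. Validate: every required field must be present with the right type.
        for name, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(event.get(name), expected_type):
                return None
        # 2. Suppress: drop events for users who asked not to be tracked.
        if event["userId"] in self.suppressed_users:
            return None
        # 3. Deduplicate: a retried request reuses the same message ID.
        if event["messageId"] in self.seen_message_ids:
            return None
        self.seen_message_ids.add(event["messageId"])
        # 4. Transform: format the event before passing it downstream.
        return self.transform(event)


pipeline = EventPipeline(suppressed_users={"user-2"})
incoming = [
    {"messageId": "m1", "userId": "user-1", "event": " Order Completed "},
    {"messageId": "m1", "userId": "user-1", "event": " Order Completed "},  # retry
    {"messageId": "m2", "userId": "user-2", "event": "Page Viewed"},  # suppressed
    {"messageId": "m3", "userId": "user-3"},  # invalid: no event name
]
accepted = [out for e in incoming if (out := pipeline.process(e)) is not None]
print(accepted)
```

Of the four incoming events, only the first survives: the second is a retry with the same message ID, the third belongs to a suppressed user, and the fourth fails validation. Validation runs first in this sketch so the later steps can assume the fields they read are present.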