Today, we’re excited to share the architecture for Centrifuge–Segment’s system for reliably sending billions of messages per day to hundreds of public APIs. This post explores the problems Centrifuge solves, as well as the data model we use to run it in production.
The barrier holding back most open source projects is surprisingly mundane. It’s not test coverage. It’s not performance. It’s not code quality.
We’ve been longtime admirers of Google’s efforts to speed up the internet: everything from SPDY to Chrome to Google Fiber. Google has invested heavily in making the internet a better, faster place for billions of people across the world.
As part of our push to open up what’s going on internally at Segment – we’d like to share how we run our CI builds. Most of our approaches follow standard practices, but we wanted to share a few tips and tricks we use to speed up our build pipeline.
AWS is the default for running production infrastructure. It’s cheap, scalable, and flexible to whatever configuration you’d like to run on top of it. But that flexibility comes with a cost: it makes AWS endlessly configurable.
For the past year, we’ve been heavy users of Amazon’s EC2 Container Service (ECS). It’s given us an easy way to run and deploy thousands of containers across our infrastructure.
Since Segment’s first launch in 2012, we’ve used queues everywhere. Our API queues messages immediately. Our workers communicate by consuming from one queue and then publishing to another. It’s given us a ton of leeway when it comes to dealing with sudden batches of events or ensuring fault tolerance between services.
I recently jumped back into frontend development for the first time in months, and I was immediately struck by one thing: everything had changed.
At Segment, we’ve fully embraced the idea of microservices; but not for the reasons you might think.
Every month, Segment collects, transforms and routes over 50 billion API calls to hundreds of different business-critical applications. We’ve come a long way from the early days, where my co-founders and I were running just a handful of instances.
In Segment’s early days, our infrastructure was pretty hacked together. We provisioned instances through the AWS UI, had a graveyard of unused AMIs, and configuration was implemented three different ways.
It wasn’t long ago that building out an analytics pipeline took serious engineering chops. Buying racks and drives, scaling to thousands of requests a second, running ETL jobs, cleaning the data, etc. A team of engineers could easily spend months on it.
We’ve been running Node in production for a little over two years now, scaling from a trickle of 30 requests per second up to thousands today. We’ve been hit with almost every kind of weird request pattern under the sun.
On April 7th (yesterday), a new zero-day vulnerability in OpenSSL was revealed, dubbed the “Heartbleed“ exploit. It allows the attacker to read a random 64-kilobyte section of memory from any server accepting SSL connections with a compromised version of OpenSSL. We’ve patched the vulnerability in our service and taken steps to avoid further information leakage.
Five months ago, we released a small library called Analytics.js by submitting it to Hacker News. A couple hours in it hit the #1 spot, and over the course of the day it grew from just 20 stars to over 1,000. Since then we’ve learned a ton about managing an open-source library, so I wanted to share some of those tips.
We’re happy to announce that we just released our newest premium service: our Omniture integration. Omniture is the premier E-Commerce tool for tracking user behavior, and it’s used by tons of large businesses worldwide.
It’s been said that “constraints drive creativity.” If that’s true, then PHP is a language which is ripe for creative solutions. I just spent the past week building our PHP library for Segment, and discovered a variety of approaches used to get good performance making server-side requests.