Nightmare is a browser automation library for Node.js, designed to be much simpler and easier to use than PhantomJS. We originally built Nightmare to create integration logos with 99Designs Tasks before they had an API, and we still use it in Sherlock. But the vast majority of Nightmare developers (now 55k+ downloads per month) use it for web UI testing and crawling.
A few months ago we were at a conference in Half Moon Bay talking with a general manager at a large mobile company, and he said, “One of my projects for the next six months is to reduce SDK bloat in all our apps.” Six months is a substantial investment! So we asked why this mattered so much to him.
As part of our push to open up what’s going on internally at Segment, we’d like to share how we run our CI builds. Most of our approaches follow standard practices, but we wanted to share a few tips and tricks we use to speed up our build pipeline.
AWS is the default for running production infrastructure. It’s cheap, scalable, and flexible to whatever configuration you’d like to run on top of it. But that flexibility comes with a cost: it makes AWS endlessly configurable.
Segment’s mobile SDKs are designed to track behavioral data from your app and translate and route that data to hundreds of downstream integrations. One of the SDK’s core tasks is to upload behavioral data to our servers. Since every network request requires your app to power up the device’s radio, uploading this data in real-time can quickly drain a battery.
Since Segment’s first launch in 2012, we’ve used queues everywhere. Our API queues messages immediately. Our workers communicate by consuming from one queue and then publishing to another. It’s given us a ton of leeway when it comes to dealing with sudden batches of events or ensuring fault tolerance between services.
It’s common for teams to use multiple tools to understand how users interact with their product. Two very popular analytics tools are Google Analytics and Mixpanel. These tools complement each other nicely since they offer slightly different analysis capabilities. But the potential downside of using two different tools is data discrepancies. That is to say, when data is inconsistent, it’s hard to trust.
When your analytics questions run into the edges of out-of-the-box tools, it’s probably time for you to choose a database for analytics. It’s not a good idea to write scripts to query your production database, because you could reorder the data and likely slow down your app. You might also accidentally delete important info if you have analysts or engineers poking around in there.
Every month, Segment collects, transforms and routes over 50 billion API calls to hundreds of different business-critical applications. We’ve come a long way from the early days, where my co-founders and I were running just a handful of instances.
Growing a business is hard, and growing the engineering team to support it is arguably harder, but doing both without a stable infrastructure is basically impossible. That is particularly true for high-growth businesses, where every engineer must be empowered to write, test, and ship code with a high degree of autonomy.
A little while ago we open-sourced a static site generator called Metalsmith. We built Metalsmith to be flexible enough that it could build blogs (like the one you’re reading now), knowledge bases, and most importantly our technical documentation.
In Segment’s early days, our infrastructure was pretty hacked together. We provisioned instances through the AWS UI, had a graveyard of unused AMIs, and configuration was implemented three different ways.
It wasn’t long ago that building out an analytics pipeline took serious engineering chops. Buying racks and drives, scaling to thousands of requests a second, running ETL jobs, cleaning the data, etc. A team of engineers could easily spend months on it.
Last week, we open sourced Sherlock, a pluggable tool for detecting third-party services on a given web page. You might use this to detect analytics trackers (e.g., Google Analytics, Mixpanel) or social media widgets (e.g., Facebook, Twitter) on your site.
Make is awesome! It’s simple, familiar, and compatible with everything. Unfortunately, editing a Makefile can be challenging because it has a very terse and cryptic syntax. In this post, we will outline how we author them to get simple, yet powerful, build systems.
We’ve been running Node in production for a little over two years now, scaling from a trickle of 30 requests per second up to thousands today. We’ve been hit with almost every kind of weird request pattern under the sun.
No matter what scale of app you’re working on, keeping track of user activity is critical to its success. Segment helps you collect data about what your users are doing, then visualize and manipulate that data with integrations for analytics, marketing, and more.
At Segment, we help companies record and manage their customer data. Our API has three basic methods: identify, track, and page/screen. These methods describe facts about customers. We often get asked why we don’t support recording sessions, and the short answer is that sessions aren’t facts, they’re stories.
Your analytics are only as good as the data you’re tracking. And deciding what to track is the hardest part about making your data useful. It’s overwhelming to create a tracking plan from scratch, so this article will give you a head start.
Every month we’re going to do a round-up of all the projects we’ve open-sourced on GitHub. We have hundreds of projects available for anyone to use, ranging from CSS libraries and UI components to static-site generators and server tools. Not to mention that Segment all started from analytics.js.
TJ comes to us after building a bunch of badass real-time stuff with the CloudUp crew. At Segment he’ll be heading up our Victoria office, aka taking control of as many coffee shops in the greater Canada area as possible.
When we analyze usage and customers at Segment, we constantly need to join queries across Mongo and Redis. Why? Because our account information is in Mongo and our API usage is in Redis. Today we’re open sourcing Hydros. It’s a quick cheat to let us run SQL queries for analysis, while using NoSQL in production.
Five months ago, we released a small library called Analytics.js by submitting it to Hacker News. A couple of hours in, it hit the #1 spot, and over the course of the day it grew from just 20 stars to over 1,000. Since then we’ve learned a ton about managing an open-source library, so I wanted to share some of those tips.
It’s been said that “constraints drive creativity.” If that’s true, then PHP is a language which is ripe for creative solutions. I just spent the past week building our PHP library for Segment, and discovered a variety of approaches used to get good performance making server-side requests.