Calvin French-Owen, Chris Sperandio on August 4th 2015
It wasn’t long ago that building out an analytics pipeline took serious engineering chops. Buying racks and drives, scaling to thousands of requests a second, running ETL jobs, cleaning the data, etc. A team of engineers could easily spend months on it.
But these days, it’s getting easier and cheaper. We’re seeing the UNIX-ification of hosted services: each one designed to do one thing and do it well. And they’re all composable.
It made us wonder: just how quickly could a single person build their own pipeline without having to worry about maintaining it? An entirely managed data processing stream?
It sounded like analytics Zen. So we set out to find our inner joins peace armed with just a few tools: Terraform, Segment, DynamoDB and Lambda.
(Check it out on Github.)
As a toy example, our data pipeline takes events uploaded to S3 and increments a time-series count for each event in Dynamo. It’s the simplest rollup we could possibly do to answer questions like: “How many purchases did I get in the past hour?” or “Are my signups increasing month over month?”
Here’s the general dataflow:
Event data enters your S3 bucket through Segment’s integration. The integration uploads all the analytics data sent to the Segment API on an hourly basis.
That’s where the composability of AWS comes in. S3 has a little-known feature called “Event Notifications”. We can configure the bucket to push a notification to a Lambda function on every file upload.
In theory, our Lambda function could do practically anything once it gets a file. In our example, we’ll extract the individual events, and then increment each <event, hour> pair in Dynamo.
Once our function is up and running, we’ll have a very rudimentary timeseries database to query event counts over time.
From there, we’ll handle the incoming events, and update each item in Dynamo:
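The original handler code didn’t survive extraction, so here’s a minimal sketch of the update logic. The table name, key attributes, and helper name are assumptions for illustration, not the exact schema from the post:

```javascript
// Given a parsed event name and its timestamp, build the DynamoDB
// UpdateItem parameters that atomically increment the <event, hour>
// counter. Table/attribute names here are hypothetical.
function buildUpdate(eventName, timestamp) {
  // truncate the timestamp down to the hour, e.g. "2015-08-04T10"
  const hour = new Date(timestamp).toISOString().slice(0, 13);
  return {
    TableName: 'events',
    Key: { event: { S: eventName }, hour: { S: hour } },
    UpdateExpression: 'ADD #count :one',
    ExpressionAttributeNames: { '#count': 'count' },
    ExpressionAttributeValues: { ':one': { N: '1' } }
  };
}

// In the real handler, the result would be passed to the AWS SDK, roughly:
// dynamo.updateItem(buildUpdate(e.event, e.timestamp), callback);
```

The `ADD` action means concurrent Lambda invocations can increment the same counter without a read-modify-write race.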
And finally, we can query for the events in our database using the CLI:
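For instance, looking up a single `<event, hour>` pair might look like this (the table and attribute names are assumptions carried over from the sketch above, not the post’s exact schema):

```shell
aws dynamodb get-item \
  --table-name events \
  --key '{"event": {"S": "completed_order"}, "hour": {"S": "2015-08-04T10"}}'
```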
We could also build dashboards on it, à la Google Analytics or Geckoboard.
Even though we have our architecture and Lambda function written, there’s still the task of describing and provisioning the pipeline on AWS.
Configuring these types of resources has been kind of a pain for us in the past. We’ve tried Cloudformation templates (verbose) and manually creating all the resources through the AWS UI (slooooow).
Neither of these options has been much fun for us, so we’ve started using Terraform as an alternative.
If you haven’t heard of Terraform, definitely give it a spin. It’s a badass tool designed to help provision and describe your infrastructure. It uses a much simpler configuration syntax than Cloudformation, and is far less error-prone than using the AWS Console UI.
As a taste, here’s what our lambda.tf file looks like to provision our Lambda function:
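The original lambda.tf didn’t survive extraction; the sketch below is a reconstruction in present-day Terraform syntax, and every name (function, bucket, role, runtime, zip file) is a placeholder rather than the original file:

```hcl
resource "aws_lambda_function" "process_events" {
  filename      = "lambda.zip"
  function_name = "process_events"
  role          = aws_iam_role.lambda.arn
  handler       = "index.handler"
  runtime       = "nodejs18.x"
}

# Wire up the S3 "Event Notifications" so every upload invokes the function.
resource "aws_s3_bucket_notification" "uploads" {
  bucket = aws_s3_bucket.events.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.process_events.arn
    events              = ["s3:ObjectCreated:*"]
  }
}
```

A real plan would also need an `aws_lambda_permission` granting S3 the right to invoke the function.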
The Terraform plan for this project creates an S3 bucket, the Lambda function, the necessary IAM roles and policies, and the Dynamo database we’ll use for storing the data. It runs in under 10 seconds and immediately sets up our infrastructure so that everything is working properly.
If we ever want to make a change, a simple terraform apply will update our production environment. At Segment, we commit our configurations to source control so that they’re easily audited and changelog’d.
We just walked through a basic example, but with Lambda there’s really no limit to what your functions might do. You could publish the events to Kinesis with additional Lambda handlers for further processing, or pull in extra fields from your database. The possibilities are pretty much endless thanks to the APIs Amazon has created.
If you’d like to build your own pipeline, just clone or fork our example Github repo.
And that’s it! We’ll keep streaming. You keep dreaming.
Orta Therox on June 24th 2015
We’re excited to welcome Orta from CocoaPods to the blog to discuss the new Stats feature! We’re big fans of CocoaPods and are excited to help support the project.
CocoaPods is the Dependency Manager for iOS and Mac projects. It works similarly to npm, rubygems, gradle, or nuget. We’ve been running the open source project for 5 years, and we’ve tried to keep the web infrastructure as minimal as possible.
Our users have been asking for years about getting feedback on how many downloads their libraries have received. We’ve been thinking about the problem for a while, and finally ended up asking Segment if they would sponsor a backend for the project.
It wasn’t enough to offer just download counts. We spend a lot of time working around the intricacies of Xcode (Apple’s developer tool) project files; in this context, however, those intricacies provide us with the foundations for a really nice feature. CocoaPods Stats will be able to keep track of the unique number of installs within Apps / Watch Apps / Extensions / Unit Tests.
This means that developers using continuous integration only register as one install, even if the server runs pod install each time, separating total installations from actual downloads.
Let’s go over how we check which pods get sent up for analytics, and how we do the unique installs. CocoaPods-Stats is a plugin that will be bundled with CocoaPods within a version or two. It registers as a post-install plugin and runs on every pod install or pod update.
We’re very pessimistic about sending a Pod up to our stats server. We ensure that you have a CocoaPods/Specs repo set up as your master repo, then ensure that each pod to be sent is inside that repo before accepting it as a public domain pod.
First up, we don’t want to know anything about your app. So in order to know unique targets we use your project’s target UUID as an identifier. These are a hash of your MAC address, Xcode’s process id and the time of target creation (but we only know the UUID/hash, so your MAC address is unknown to us). These UUIDs never change in a project’s lifetime (contrary to, for example, the bundle identifier). We double hash it just to be super safe.
We then also send along the CocoaPods version that was used to generate this installation, and whether it is a pod try [pod] rather than a real install.
My first attempt at a stats architecture was based on how npm does stats: roughly speaking, they send all logs to S3, where they are map-reduced on a daily basis into individual package metrics. This is an elegant solution for a company with people working full time on uptime and stability. As someone who wants to be building iOS apps, and not maintaining more infrastructure in my spare time, I wanted to avoid this.
We use Segment at Artsy, where I work, and our analytics team had really good things to say about Segment’s Redshift infrastructure. So I reached out about having Segment host the stats infrastructure for CocoaPods.
We were offered a lot of great advice around the data modelling and were up and running really quickly. So you already know about the CocoaPods plugin; from there it sends your anonymous Pod stats up to stats.cocoapods.org, which acts as a conduit sending analytics events to Segment. A daily task is triggered on the website; it makes SQL requests against the Redshift instance, and the results are imported into metrics.cocoapods.org.
If you want to learn more about CocoaPods, check us out here.
Anthony Short on May 11th 2015
Over the past few months at Segment we’ve been rebuilding large parts of our app UI. A lot of it had become impossible to maintain because we were relying on models binding to the DOM via events.
Views that are data-bound to the DOM sound great but they are difficult to follow once they become complex and bi-directional. You’d often forget to bind some events and a portion of your UI would be out of sync, or you’d add a new feature and break 3 others.
So we decided to take on the challenge to build our own functional alternative to React.
We managed to get a prototype working in about a month. It could render DOM elements and the diffing wasn’t too bad. However, the only way to know if it was any good was to throw it into a real project. So that’s what we did. We built the Tracking Plan using the library. At this point it didn’t even have a real name.
It started simple. We found bugs and things we’d overlooked, then we started seeing patterns arise and ways to make the development experience better.
We were able to quickly try some ideas and trash them if they didn’t work. At first we started building it like a game engine. It had a rendering loop that would check to see if components were dirty and re-render on every frame, and a scene that managed all the components and inputs like a game world. This turned out to be annoying for debugging and made it overly complex.
Thanks to this process of iteration we were able to cut scope. We never needed refs like React, so we didn’t add them. We started with a syntax that used prototypes and constructors, but it was unnecessarily verbose. We haven’t had to worry about maintaining text selection because we haven’t run across it in real-world use. We also haven’t had any issues with element focus because we’re only supporting newer browsers.
We spent many late nights discussing the API on a white board and it’s something we care about a lot. We wanted it to be so simple that it would be almost invisible to the user. An API is just UI for developers so we treated it like any other design problem at Segment — build, test, iterate.
Performance is the most important feature of any UI library. We couldn’t be sure if the library was on the right path until we’d seen it running in a real app with real data and constraints. We managed to get decent performance on the first try and we’ve been fine-tuning performance as we add and remove new features.
We first ran into performance issues when we had to re-build the debugger. Some customers were sending hundreds of events per second and the animation wouldn’t work correctly if we were just trashing DOM elements every frame. We implemented a more optimized key diffing algorithm and now it renders hundreds of events per second at a smooth 60 fps with ease. Animations included.
Eventually everything started to settle down. We took the risk of implementing our own library, and it now powers a large portion of our app. We’ve stripped out thousands of lines of code, and it’s now incredibly easy to add new features and maintain the app thanks to this new library.
Finally, we think it’s ready to share with everyone else.
Deku is our library for building user interfaces. It supports many of the features you’re familiar with in React but aims to be small and functional. You define your UI as a tree of components and whenever a state change occurs it re-renders the entire tree to patch the DOM using a highly optimized diffing algorithm.
The whole library weighs in at less than 10kb and is easy to follow. It’s also built from npm modules, so some of those modules are probably being used elsewhere in your code anyway.
It uses the same concept of components as React. However, we don’t support older browsers, so the codebase is small and the component API is almost non-existent. It even supports JSX thanks to Babel.
Here’s what a component looks like in Deku:
Then you can import that component and render your app:
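The original code samples didn’t survive extraction, so here’s a self-contained sketch of the shape. The names are hypothetical and the string renderer stands in for Deku’s real DOM tree, so treat this as an illustration of the idea rather than Deku’s actual API:

```javascript
// A component is a plain object; render receives the component object
// holding props and state, and returns a virtual node.
const App = {
  initialState() {
    return { clicked: false };
  },
  render({ props, state }) {
    const text = state.clicked ? 'Thanks!' : props.label;
    return { type: 'button', attrs: { class: 'App' }, children: [text] };
  }
};

// Toy stand-in for the renderer: walk the virtual node to an HTML string.
function renderToString(node) {
  if (typeof node === 'string') return node;
  const attrs = Object.entries(node.attrs).map(([k, v]) => ` ${k}="${v}"`).join('');
  return `<${node.type}${attrs}>${node.children.map(renderToString).join('')}</${node.type}>`;
}
```

Calling `renderToString(App.render({ props: { label: 'Click me' }, state: App.initialState() }))` yields `<button class="App">Click me</button>`.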
You’ll notice there is no concept of classes or use of this. We’re using plain objects and functions. The ES6 module syntax is used to define components, and every lifecycle hook is passed the component object, which holds the state you’ll use to render your template.
We never really needed classes. What’s the point when you never initialize them anyway? The beauty of using plain functions is that the user can use the ES6 module system to define them however they want! Best of all, there’s no new syntax to learn.
Deku has many of the same lifecycle hooks, but with two new ones: beforeRender and afterRender. These are called on every single render, including the first, unlike the update hooks. We’ve found these let us stop thinking about the lifecycle state so much.
Some of the lifecycle hooks are passed the setState function so you can trigger side-effects to update state and re-render the app. DOM events are delegated to the root element, and we don’t need any sort of synthesized event system because we’re not supporting IE9 and below. This means you never need to worry about handling or optimizing event binding.
To render the component to the DOM we need to create a tree. The tree will manage loading data and communicating between components, and allows us to use plugins on the entire application. For us it has eliminated the need for anything like Flux, and there are no singletons in sight.
You can render the component tree any way you’d like — you just need a renderer for it. So far we have an HTML renderer for the server and a DOM renderer for the client, since those are the two we’ve needed. It would be possible to build a canvas or WebGL renderer.
The dbmonster performance mini-app written in Deku is also very fast, rendering at roughly 15-16 fps compared to most other libraries, which render at 11-12 fps. We’re always looking for ways to optimize the diffing algorithm even further, but we think it’s already fast enough.
The first thing we usually get asked when we tell people about Deku is “Why didn’t you just use React?”. It could seem like a classic case of NIH syndrome.
We originally looked into this project because we use Duo as a front-end build tool. Duo is like npm, but just uses Github. It believes in small modules doing one thing well. React was a ‘big thing’ doing many things within a black box. We like knowing in detail how code works, so we feel comfortable with it and can debug it when something goes wrong. It’s very hard to do that with React or any big framework.
We ended up using React for a short time, but the API forced us to use a class-like syntax that would lock us into the framework. We also found that we kept fighting with function context all the time, which is a waste of brain energy. React has some functional aspects to it, but it still feels very object-oriented. You’re always concerning yourself with implicit environment state thanks to this and the class system. If you don’t use classes, you never need to worry about this, you never need decorators, and you force people to think about their logic in a functional way.
What started as a hack project to see if we could better understand the concept behind React has developed into a library that is replacing thousands of lines of code and has become the backbone of our entire UI. It’s also a lot of fun!
We’ve come a long way in the past few months. Next we’re going to look at a few ways we could add animation states to components to solve a problem that plagues every component system using virtual DOM.
In our next post on Deku, we’ll explain how we structure our components and how we deal with CSS. We’ll also show off our UIKit — the set of components we’ve constructed to rapidly build out our UI.
Steven Miller, Dominic Barnes on April 9th 2015
Last week, we open sourced Sherlock, a pluggable tool for detecting third-party services on a given web page. You might use this to detect analytics trackers (eg: Google Analytics, Mixpanel, etc.), or social media widgets (eg: Facebook, Twitter, etc.) on your site.
We know that setting up your integrations has required some manual work. You’ve had to gather all your API keys and enter them into your Segment project one by one. We wanted to make this process easier for you, and thought that a “detective” to find your existing integrations would help!
Enter Sherlock. When you tell us your project’s url, Sherlock searches through your web page and finds the integrations you’re already using. Then, he automatically enters your integrations’ settings, which makes turning on new tools a bit easier.
Here’s a code sample of Sherlock in action:
Since there are no services baked into Sherlock itself, we’re adding a Twitter plugin here manually. Sherlock opens the url, and if widgets.js is present on the page, it will be added to the results.
The above example is admittedly trivial. Here’s a more realistic use-case:
Here, we are adding sherlock-segment, a collection of plugins for about 20 of the integrations on our platform. Now, results will look like this:
To make your own plugin, simply add the following details to your package.json (feel free to use sherlock-segment as a starting point):
name should include “sherlock-“ as a prefix
keywords should include “sherlock”
Your plugin should export an array of service configuration objects, each object can support the following keys:
name should be a human-readable string
script can be a string, regular expression, or a function that matches the src attribute of a script tag
settings is an optional function that is run on the page to extract configuration
Here is an example service configuration:
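The original example was stripped, so here’s a hypothetical service configuration (the names and regular expression are illustrative, not the sherlock-segment source), plus a tiny matcher showing how the `script` key could be applied to a page’s script `src` attributes:

```javascript
// A hypothetical Twitter-widgets service configuration.
const twitter = {
  name: 'Twitter',
  script: /platform\.twitter\.com\/widgets\.js/,
  settings: function () {
    // runs on the page; pull whatever config the widget exposes
    return { widgets: true };
  }
};

// Apply a service's `script` matcher (string, RegExp, or function)
// to a single script src.
function matchesScript(service, src) {
  const matcher = service.script;
  if (typeof matcher === 'string') return src.indexOf(matcher) !== -1;
  if (matcher instanceof RegExp) return matcher.test(src);
  return matcher(src); // function matcher
}
```

Supporting all three matcher types keeps simple cases simple (a substring) while leaving room for arbitrary logic.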
Dominic Barnes on April 3rd 2015
Make is awesome! It’s simple, familiar, and compatible with everything. Unfortunately, editing a Makefile can be challenging because it has a very terse and cryptic syntax. In this post, we will outline how we author them to get simple, yet powerful, build systems.
For the uninitiated, check out this gist by Isaac Schlueter. That gist takes the form of a heavily-commented Makefile, which makes it a great learning tool. In fact, I would recommend checking it out regardless of your skill level before reading the remainder of this post.
Here at Segment, we write a lot of code. One of our philosophies is that the code we write should be beautiful, especially since we’ll be spending literally hours a day looking at it.
By beautiful, we mean that code should not be convoluted and verbose; instead it should be expressive and concise. This philosophy is even reflected in how we write a Makefile.
We dedicate the top section of each Makefile as a place to define variables (much like normal source code). These variables will be used to reduce the amount of code in our recipes, making them far easier to read.
In node projects, we always rely on modules that are installed locally instead of globally. This gives each project its own dependencies, leaving us room to upgrade freely without worrying about compatibility across our many other projects.
This decision requires more typing at first:
But it’s easily fixed by using a variable:
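A sketch of the pattern (the tool and paths here are hypothetical, not our exact Makefile): the long local-binary path is defined once at the top, and recipes stay short.

```make
BIN := ./node_modules/.bin

test:
	@$(BIN)/mocha test/
```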
We use this same pattern frequently, as it helps to shorten the code written in a recipe, making the intention far more clear. This makes understanding the recipe much easier, which leads to faster development and maintenance.
Beyond just using variables for the command name, we also put shared flags behind their own variable as well.
This helps keep things dry, but also gives developers a hook to change the flags themselves if needed:
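For instance, a flags variable declared with `?=` can be overridden from the command line (`make test MOCHA_FLAGS="--reporter spec"`). The names below are illustrative, not our actual Makefile:

```make
BIN := ./node_modules/.bin
MOCHA_FLAGS ?= --reporter dot

test:
	@$(BIN)/mocha $(MOCHA_FLAGS) test/
```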
When writing code and interacting with developer tools, we seek to avoid noise as much as possible. There are enough things on a programmer’s mind, so it’s best to avoid adding to that cognitive load unnecessarily.
One example is “echoing” in Make, which outputs each command of your recipe as it is executed. You may notice that we used the @ prefix on the recipes above, which suppresses that behavior. This is a small thing, but it is part of the larger goal.
We also run many commands in “quiet mode”, which basically suppresses all output except errors. This is one case where we definitely want to alert the developer, so they can take the necessary action to correct it.
Now, when we run make, we will only see errors from the corresponding build. If nothing is output, we can assume everything went according to plan!
There are some target names that are so commonly used they have practically become a convention. While we haven’t invented most of the targets mentioned here, the main principle is that using names consistently throughout an organization improves the experience for developers who are new to a specific project.
Since we have a lot of web projects, the build/ directory is often reserved as the destination for any files we are bundling to serve to the client.
This target is used to delete any transient files from the project. This generally includes:
build/ directory (the generated client assets)
intermediary build files/caches
test coverage reports
Remote dependencies are not part of this process. (see clean-deps)
Depending on the size and complexity of a project, the downloaded dependencies can take a considerable amount of time to completely resolve and download. As a result, they are cleaned using a distinct target.
While Make will automatically assume the first target in a Makefile is the default one to run, we adopt the convention of putting a default target in every Makefile, just for consistency and flexibility.
For our projects, the default target is usually synonymous with build, as it is common practice to enter a project and use make to kick off the initial build.
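The convention is small enough to sketch in a couple of lines (the build target’s recipe lives elsewhere in the Makefile):

```make
default: build

.PHONY: default
```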
Runs static analysis (eg: JSHint, ESLint, etc) against the source code for this project.
This starts up the web server for the given project (in the case of web projects).
This is exclusively for running the automated tests within a project. Depending on the complexity of the project, there could also be other related targets, such as test-server. But regardless, the test target will be the entry-point for a developer to run those tests.
All in all, Make is a powerful tool suitable for many projects regardless of size, tooling and environment. Other tools like Grunt and Gulp are great, but Make comes out on top for being even more powerful, expressive and portable. It has become a staple in practically all of our projects, and the conventions we follow have helped to create a more predictable workflow for everyone on the team.
Calvin French-Owen on April 1st 2015
We’ve been running Node in production for a little over two years now, scaling from a trickle of 30 requests per second up to thousands today. We’ve been hit with almost every kind of weird request pattern under the sun.
First there was the customer who liked to batch their data into a single dump every Friday night (getting called on a Friday night used to be a good thing). Then the user who sent us their visitor’s entire social graph with every request. And finally an early customer who hit us with a while(true) send(data) loop and caused a minor emergency.
By now, our ops team has seen the good, the bad, and the ugly of Node. Here’s what we’ve learned.
One of the great things about Node is that you don’t have to worry about threading and locking. Since everything runs on a single thread, the state of the world is incredibly simple. At any given time there’s only a single running code block.
But here… there be dragons.
Our API ingests tons of small pieces of customer data. When we get data, we want to make sure we’re actually parsing the JSON and representing any ISO strings as dates. We traverse the JSON data we receive, converting any date strings into native Date objects. As long as the total size is under 15kb, we’ll pass it through our system.
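A sketch of that traversal (names and the exact regex are assumptions, not our production code) makes the failure mode obvious: the recursion touches every node, so a massively nested payload means a massive amount of synchronous work on the event loop.

```javascript
// Walk parsed JSON, replacing ISO-8601 strings with native Date objects.
const ISO = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/;

function coerceDates(value) {
  if (typeof value === 'string' && ISO.test(value)) return new Date(value);
  if (Array.isArray(value)) return value.map(coerceDates);
  if (value && typeof value === 'object') {
    // recurse into every key — this is the work that blocks on huge blobs
    for (const key of Object.keys(value)) value[key] = coerceDates(value[key]);
  }
  return value;
}
```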
It seemed innocent enough… until we’d get a massively nested JSON blob and we’d start traversing. It’d take seconds, even minutes, before we chewed through all the queued up function calls. Here’s what the times and sizes would look like after an initial large batch would get rejected:
And then things would only get worse: the problems would start cascading. Our API servers would start failing healthchecks and disconnect from the ELB. The lack of heartbeat would cause the NSQ connection to disconnect so we weren’t actually publishing messages. Our customer’s clients would start retrying, and we’d be hit with a massive flood of requests. Not. Good.
Clearly something had to be done–we had to find out where the blockage was happening and then limit it.
Now we use node-blocked to get visibility into whether our production apps are blocking on the event loop, like this errant worker:
It’s a simple module which checks when the event loop is getting blocked and calls you when it happens. We hooked it up to our logging and statsd monitoring so we can get alerted when a serious blockage occurs.
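This isn’t node-blocked’s actual source, but the technique behind a module like it can be sketched in a few lines: schedule a repeating timer and report whenever it fires later than it should, since that lag is time the event loop spent blocked.

```javascript
// Report event-loop blockage: a timer due every `intervalMs` that fires
// late by more than `thresholdMs` means the loop was blocked that long.
function onBlocked(thresholdMs, report, intervalMs = 100) {
  let last = Date.now();
  return setInterval(() => {
    const now = Date.now();
    const lag = now - last - intervalMs;
    if (lag > thresholdMs) report(lag);
    last = now;
  }, intervalMs);
}
```

In a worker you’d wire the `report` callback into logging and statsd, which is effectively what we did with the real module.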
We dropped in the module and immediately started seeing the following in our logs:
A customer was sending us really large batches of nested JSON. Applying a few stricter limits to our API (this was back before we had limits) and moving the processing to a background worker fixed the problem for good.
To further avoid event loop problems entirely, we’ve started switching more of our data processing services to Go and using goroutines, but that’s a topic for an upcoming post!
Error handling is tricky in every language–and node is no exception. Plenty of times, there will be an uncaught exception which–through no fault of your own–bubbles up and kills the whole process.
There are multiple ways around this using the vm module or domains. We haven’t perfected error handling, but here’s our take.
Simple exceptions should be caught using a linter. There’s no reason to have bugs for undefined vars when they could be caught with some basic automation.
To make that super easy, we started adding make-lint to all of our projects. It catches unhandled errors and undefined variables before they even get pushed to production. Then our makefiles run the linter as the first target of `make test`.
If you’re not already catching exceptions in development, add make-lint today and save yourself a ton of hassle. We tried to make the defaults sane so that it shouldn’t hamper your coding style but still catch errors.
In prod, things get trickier. Connections across the internet fail way more often. The most important thing is that we know when and where uncaught exceptions are happening, which is often easier said than done.
Fortunately, Node has a global uncaughtException handler, which we use to detect when the process is totally hosed.
We ship all logs off to a separate server for collection, so we want to make sure to have enough time to log the error before the process dies. Our cleanup could use a bit more sophistication, but typically we’ll attempt to disconnect and then exit after a timeout.
Actually serializing errors also requires some special attention (handled for us by YAL). You’ll want to make sure to include both the message and the stack explicitly, since they are non-enumerable properties and will be missed by simply calling JSON.stringify on the error.
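The gotcha is easy to demonstrate (this is a minimal sketch of the point, not YAL’s implementation): `JSON.stringify` on an Error yields `"{}"` because message and stack are non-enumerable, so they must be pulled out explicitly.

```javascript
// Extract the non-enumerable fields so the error survives serialization.
function serializeError(err) {
  return {
    message: err.message,
    stack: err.stack
  };
}
```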
Finally, we’ve also written our own module called oh-crap to automatically dump a heap snapshot for later examination.
It’s easily loaded into the chrome developer tools, and incredibly handy for those times we’re hunting the root cause of the crash. We just drop it in and we’ve instantly got full access to whatever state killed our beloved workers.
It’s easy to overload the system by setting our concurrency too high. When that happens, the CPU on the box starts pegging, and nothing is able to complete. Node doesn’t do a great job handling this case, so it’s important to know when we’re load testing just how much concurrency we can really deal with.
Our solution is to stick queues between every piece of processing. We have lots of little workers reading from NSQ, and each of them sets a maxInFlight parameter specifying just how many messages the worker should deal with concurrently.
If we see the CPU thrashing, we’ll adjust the concurrency and reload the worker. It’s way easier to think about the concurrency once at boot rather than constantly tweaking our application code and limiting it across different pipelines.
It also means we get free visibility into where data is queueing, not to mention the ability to pause entire data processing flows if a problem occurs. It gives us much better isolation between processes and makes them easier to reason about.
We moved away from using streams for most of our modules in favor of dedicated queues. But, there are a few places where they still make sense.
The biggest overall gotcha with streams is their error handling. By default, piping won’t cause streams to propagate their errors to whatever stream is next.
Take the example of a file processing pipeline which is reading some files, extracting some data and then running some transforms on it:
Looking at this code, it’s easy to miss that we haven’t actually set up our error handling properly. Sure, the resulting pipeline stream has handlers, but if any errors occur in the Transform streams, they’ll go uncaught.
To get around this, we use Julian Gruber’s nifty multipipe module, which provides a nice API for centralized error handling. That way we can attach a single error handler and be off to the races.
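A toy version of the idea (not multipipe’s actual source, and the function name is made up) shows how centralized error handling works: pipe the streams together, but re-emit every upstream error on the final stream so one handler catches them all.

```javascript
// Pipe streams left to right, forwarding each stream's 'error' events
// to the last stream so a single handler covers the whole pipeline.
function errorAwarePipe(...streams) {
  const last = streams[streams.length - 1];
  streams.reduce((from, to) => {
    from.on('error', err => last.emit('error', err));
    return from.pipe(to);
  });
  return last;
}
```

Usage mirrors the file-processing example above: `errorAwarePipe(read, transform, write).on('error', handle)` catches a failure in any stage.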
If you’re also running Node in production and dealing with a highly variable data pipeline, you’ve probably run into a lot of similar issues. For all these gotchas, we’ve been able to scale our node processes pretty smoothly.
Now we’re starting to move our new plumbing and data infrastructure layers to Go. The language makes different trade-offs, but we’ve been quite impressed so far. We’ve got another post in the works on how it’s working out for us, along with a bunch of open source goodies! (and if you love distributed systems, we’re hiring!)
Have other tips for dealing with a large scale node system? Send them our way.
And as always: peace, love, ops, analytics.
Chris Sperandio on February 19th 2015
Before I joined Segment, I was something of a Github stalker. Which is how I found Segment.
(To be clear, I’m still a Github stalker, only now I work here.)
I snooped through Segment’s projects for I don’t know how long before starting to realize what drew me in so consistently across every project. And it wasn’t until I joined the team and learned the thought-process and ethos behind them that I gained a sincere appreciation for why we have over 1000 repos.
Our approach to software is radically modular, pluggable and composable.
Which makes sense, because in reality, that’s the whole point of Segment.
When you build your tools at the right level of abstraction, incidental complexity is hidden away and edge-cases take care of themselves. Not to mention, you can be a lot more productive. It’s what precipitated analytics.js in our early days: the intention to find the right level of abstraction for collecting data about who your users are and what they’re doing. It’s why people love express, koa, and rework, too.
It’s also why we’re big proponents of the component and duo ecosystem. We even manage customer and partner logos with an extensible and modular system, and our entire front-end is built with components based on ripple and, more recently, deku.
It’s hard to communicate the power of this modular and composable approach, but it ends up being disarmingly obvious to developers and product strategists alike (see Rich Hickey’s presentation). Rather than attempt to explain it outright, I’ll give you a tour of our more popular open source repos to show you how we’ve tried to make them small, self-contained, and composable.
Let’s dive into some examples!
When building our documentation, academy, blog, job board, and help section, we wanted the speed and simplicity of static sites over the restrictiveness and complexity of a CMS. And though there’s a lot of logic that could be shared between them, each called for its own unique feature-set and build process.
But when we looked at existing static site generators, they all imposed a degree of structure on the content, and weren’t flexible enough for our wide array of use cases. Enter: Metalsmith.
Metalsmith does not impose any assumptions on your content model or build process. In fact, Metalsmith is just an abstraction for manipulating a directory of files.
Breaking static sites down to their core, the underlying abstraction includes content in files (blog entries, job listings, what have you), and these files’ associated metadata. Metalsmith allows you to read a directory of files, then run a series of plugins on that data to transform it exactly the way you need.
For example, you can run markdown files through handlebars templates, create navigation or a table of contents, compress images, concatenate scripts, or anything your heart desires before writing the result to the build directory.
For example, our blog articles are just files with two sections: a header with metadata about the author, date, title and url, and then markdown for the content of the article. Metalsmith transforms the markdown to HTML, wraps the posts in their layout, looks up and inserts the author’s avatar, renders any custom Handlebars helpers, etc. The beauty is that the build process is completely customizable and abstract enough to cover many use cases. It’s just a matter of which plugins you choose. And the word is out: the Metalsmith plugin ecosystem is booming!
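The files-plus-plugins model is small enough to sketch in a few lines of plain Node. This is not Metalsmith itself, just a toy illustration of the abstraction: a “files” object maps paths to contents and metadata, and each plugin transforms that object in turn (the `markdown` plugin here is a fake stand-in, not a real renderer):

```javascript
// Toy illustration of the Metalsmith model: a "files" object maps
// paths to { contents, ...metadata }, and plugins transform it in order.
function buildPipeline(plugins) {
  return function build(files) {
    // Each plugin receives the whole files object and mutates it.
    for (const plugin of plugins) plugin(files);
    return files;
  };
}

// A fake "markdown" plugin: rename .md files and wrap their contents.
function markdown(files) {
  for (const path of Object.keys(files)) {
    if (!path.endsWith('.md')) continue;
    const file = files[path];
    files[path.replace(/\.md$/, '.html')] = {
      ...file,
      contents: '<p>' + file.contents + '</p>',
    };
    delete files[path];
  }
}

const build = buildPipeline([markdown]);
const out = build({
  'post.md': { title: 'Hello', contents: 'Hi there' },
});
console.log(Object.keys(out)); // [ 'post.html' ]
```

Because every plugin shares the same shape (a function over the files object), adding a table of contents or an image compressor is just another entry in the array.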
By building our static sites with Metalsmith and hosting the source on Github, our marketing, success, and business teams can create and edit posts right from the Github web interface, or work locally and “sync” their updates with Github for Mac. This workflow closely mirrors a traditional CMS, but gives us the speed and reliability of a pre-built, static site, while also giving us things like auto-generated code samples for every language and automatically compiled navigation, two areas otherwise prone to falling out of date.
As you’ve heard, Segment is committed to a component-driven development model: breaking things into small pieces that can be developed in isolation, and then shared and reused. But when everything is a plugin, that means an awful lot of small repos. So we built tooling to make working with lots of small repos frictionless.
For example, when we or a partner want to add a new integration to Segment, the very first thing we need to do is create a new repo to house that project. In order to enforce common style across file structure and code, we built a project scaffolder that generates a “base” the developer can use to jump right into their project.
This is where Khaos comes in: our own project scaffolder built on Metalsmith, the first example here of building building blocks (with building blocks :). In the source, you can see how we tried to make even Khaos itself composable and modular.
Khaos is really just a CLI wrapper for metalsmith with the following plugin pipeline:
Read template files from a directory.
Parse files for template placeholders.
Prompt user to fill in each placeholder.
Render files with a templating engine.
Write filled-in files to a new directory.
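The parse-and-render steps in the middle of that pipeline can be sketched in plain JavaScript. This skips the interactive prompt and the file I/O, and the `{{placeholder}}` syntax is just a Handlebars-style choice for illustration, not necessarily what Khaos uses:

```javascript
// Find {{placeholder}} names across a set of template files.
function findPlaceholders(templates) {
  const names = new Set();
  for (const contents of Object.values(templates)) {
    for (const match of contents.matchAll(/\{\{(\w+)\}\}/g)) {
      names.add(match[1]);
    }
  }
  return [...names];
}

// Render every template with the user's answers filled in.
function render(templates, answers) {
  const out = {};
  for (const [path, contents] of Object.entries(templates)) {
    out[path] = contents.replace(/\{\{(\w+)\}\}/g, (_, name) => answers[name]);
  }
  return out;
}

const templates = { 'Readme.md': '# {{name}}\n\n{{description}}' };
console.log(findPlaceholders(templates)); // [ 'name', 'description' ]
```

In the real tool, each placeholder found in step 2 becomes a question in step 3, and the answers feed straight into the render in step 4.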
We have khaos templates for new logos, integrations, back-end services, nightmare plugins, etc. Not only does this make getting started easy, but it reinforces cultural values like defaulting to MIT licensing.
As you might guess, we use Segment as the backbone of our customer data pipeline to route our data into our third-party tools and to Amazon Redshift.
While we use our partners’ visualization tools to write and share ad-hoc queries against data in Segment SQL, we wanted to make the most important data points accessible in real time throughout the organization. So we built Metrics.
We query the underlying data from Segment Warehouses and services like Stripe and Zendesk, and use Metrics to orchestrate these queries and store the aggregate metrics for each team. On any given team’s board you might see ARR, MRR, daily signups, the depth of our queue for new integration requests, number of active Zendesk tickets by department, number of deploys in the last week; the list goes on.
We’ll go into more details about the business motivations and outcomes around Metrics in a future blog post, but what excites me most about Metrics is how it’s designed under the hood. It’s another example of offloading feature scope to plugins.
You can use plugins to define what data gets collected and stored, the interval at which it’s updated, and where those metrics are pushed: to dashboards, spreadsheets, summary emails, or anywhere else your heart desires.
All Metrics does is expose an API for orchestrating this dance via plugins.
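In spirit, that orchestrator is tiny: plugins register collectors that write keyed values, and consumers that read them. Here’s a hypothetical miniature version of the idea; the class and method names are made up for illustration and aren’t the real Metrics API:

```javascript
// A miniature take on the Metrics idea: plugins are functions that
// receive the instance and wire up collectors or consumers.
class Metrics {
  constructor() {
    this.values = {};
  }
  use(plugin) {
    plugin(this); // each plugin decides what to collect or where to push
    return this;
  }
  set(key, value) {
    this.values[key] = value;
    return this;
  }
  get(key) {
    return this.values[key];
  }
}

// A "collector" plugin computing a fake daily signups number.
// In a real setup this would run a warehouse query on an interval.
function signups(metrics) {
  metrics.set('signups today', 42);
}

const metrics = new Metrics().use(signups);
console.log(metrics.get('signups today')); // 42
```

A dashboard or email plugin would be the mirror image: a function that reads keys out on its own schedule and pushes them somewhere.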
Check it out on github here!
We have a bit of an obsession with automation and elimination of the mundane. And that’s what drove the development of Hermes.
Raph, our beloved head of sales and first businessperson at Segment, thought Hermes was Ian’s potty-mouthed “ami français” for a few months. Nope. Hermes is a chatbot whose sole feature is, you guessed it, a plugin interface.
When you’re building a new feature for Hermes, like looking up an account’s usage, or fetching a gif from the interwebs, all you need to do is tell Hermes what he’s listening for and what to say back. Everything in between, you define in your own plugin.
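A plugin interface like that can be very small: declare a pattern to listen for, and a handler that produces the reply. This is a hypothetical sketch of the pattern, not Hermes’ actual API:

```javascript
// Minimal chatbot core: plugins register a pattern and a handler,
// and the bot dispatches each incoming message to the first match.
class Bot {
  constructor() {
    this.handlers = [];
  }
  hear(pattern, handler) {
    this.handlers.push({ pattern, handler });
  }
  use(plugin) {
    plugin(this);
    return this;
  }
  receive(message) {
    for (const { pattern, handler } of this.handlers) {
      const match = message.match(pattern);
      if (match) return handler(match);
    }
    return null; // nobody was listening for this
  }
}

// A plugin that answers lunch questions.
function lunch(bot) {
  bot.hear(/lunch/i, () => 'Lunch is here!');
}

const hermes = new Bot().use(lunch);
console.log(hermes.receive('is lunch ready?')); // Lunch is here!
```

Everything between the pattern and the reply, whether that’s querying Loggly or kicking off a build, lives inside the plugin’s handler.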
Whether we want to announce that lunch is here, check Loggly for errors related to a customer’s project, kickoff a Metalsmith build of the latest blog release, or create an SVG logo for a new integration, we get our boy Hermes Hubeau to do the dirty (repetitive) work.
We were thankful for the plugin approach when we switched from Hipchat to Slack. Instead of rewriting all of Hermes, we just hot-swapped the old plugin with the new!
“Wait a minute — your chat bot creates SVG logos?!”
Nope! Humans do. Hermes just knows how to ask politely. He creates a new logo repo with Khaos, then spins up a Nightmare instance based on Metalsmith plugins, navigates to 99designs Tasks, and posts a job. When the job is finished, he resizes the logos with our logo component creation CLI.
Automating these sorts of jobs, for which there was not yet a public API, required us to mimic and automate pointing and clicking in a web browser, and that’s what Nightmare does. While there were plenty of tools to do this, like PhantomJS, webdriver APIs imposed the burden of a convoluted interface and lots of mental overhead. So we wrote a library that puts all those headaches under the covers, and lets you automate browsers the way you browse the web.
Browser automation is nothing new, but we tried to give Nightmare a cleaner API and a plugin interface so people could more easily compose automations. The goal is to make it really simple to automate tasks on the web and create APIs where a public one doesn’t yet exist.
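The design trick behind that kind of fluent API is queueing: each chained call just pushes a step onto a list, and nothing executes until the end. Here’s a browser-free sketch of the pattern with made-up names, to show the shape rather than Nightmare’s real behavior:

```javascript
// The chainable-queue pattern behind fluent automation APIs: each
// method queues a step; run() executes them in order on shared state.
class Flow {
  constructor() {
    this.queue = [];
  }
  goto(url) {
    this.queue.push(state => { state.url = url; });
    return this; // returning `this` is what makes chaining work
  }
  type(selector, text) {
    this.queue.push(state => { state.typed = { selector, text }; });
    return this;
  }
  run() {
    const state = {};
    for (const step of this.queue) step(state);
    return state;
  }
}

const result = new Flow()
  .goto('https://example.com')
  .type('input[name=q]', 'hello')
  .run();
console.log(result.url); // https://example.com
```

In a real browser automation library, each queued step would drive the page asynchronously; the chaining and deferred execution are the parts that make the API read like a description of how you browse.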
We try to break things into small, reusable pieces and hate to solve the same problem twice, pushing for simple solutions that build off of each other and are flexible enough for multiple use cases.
This level of commitment to “building building blocks” and sharing them with the community is what drew me in so hypnotically to Segment in the first place, and why I feel so immensely fortunate to be here now. As a success engineer, part of my job is to build and maintain internal tooling that enables us to better serve our customers. I’m empowered to apply the same principles and rigor used by our product team and core engineers to those projects, and my development and product direction skills have improved faster than I ever thought possible as a result.
If you think of any cool use cases for any of these tools at your company, we’d love to hear more about them. Tweet us @segment with your ideas or fork away on GitHub! We always appreciate new plugins and contributions. And if any of this particularly resonates with you, we’re hiring!
TJ Holowaychuk on February 21st 2014
One of the most popular logging libraries for node.js is Winston. Winston is a great library that lets you easily send your logs to any number of services directly from your application. For lots of cases Winston is all you need, however there are some problems with this technique when you’re dealing with mission critical nodes in a distributed system. To solve them we wrote a simple solution called Yet-Another-Logger.
The typical multicast logging setup looks something like this:
The biggest issue with this technique for us was that many of these plugins are only enabled in production, or cause problems that are only visible under heavy load. For example it’s not uncommon for such libraries to use CPU-intensive methods of retrieving stack traces, or cause memory leaks, or even worse uncaught exceptions!
Another major drawback is that if your application is network-bound like ours is, then sending millions of log requests out to multiple services can quickly take its toll on the network, slowing down everything else.
Finally, the use of logging intermediaries lets you add or remove services at will, without re-deploying most of your cluster or making code changes to the applications themselves.
Our solution was to build a simple client/server system of nodes to isolate any problems to a set of servers whose sole job is to fan out the logs. We call it Yet-Another-Logger, or YAL.
The Yet-Another-Logger client is pretty much what you would expect from a standard logging client. It has some log-level methods and accepts message arguments, standard stuff. The only difference is that you instantiate the client with an array of YAL Server addresses, which it uses to round-robin.
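The round-robin itself is simple. Here’s a sketch of the client-side selection; the real client sends each message over an Axon socket, which this stub only gestures at, and the class and method names are illustrative:

```javascript
// Sketch of a round-robin log client: cycle through server addresses
// so load is spread evenly across the fan-out tier.
class LogClient {
  constructor(addresses) {
    this.addresses = addresses;
    this.index = 0;
  }
  next() {
    const address = this.addresses[this.index];
    this.index = (this.index + 1) % this.addresses.length;
    return address;
  }
  info(type, message) {
    // The real client would write this payload to an Axon socket;
    // here we just return it so the cycling is visible.
    return { server: this.next(), level: 'info', type, message };
  }
}

const client = new LogClient(['tcp://log-1:5000', 'tcp://log-2:5000']);
console.log(client.info('user signup', { id: 1 }).server); // tcp://log-1:5000
```

With three or more servers in the rotation, losing one node just means its share of messages flows to the others.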
YAL is backed by the Axon library, a zeromq-inspired messaging library. The great thing about this is that when a node goes down, messages will be routed to stable nodes, and delivery resumes when the node comes back online.
The YAL server is also extremely simple. It accepts log events from the clients and distributes them to any number of configured services, taking the load off of mission-critical applications.
At the time of writing YAL Server is a library, and does not provide an executable, however in the near future an executable may be provided too. Until then a typical setup would include writing a little executable specific to your system.
Server plugins are simply functions that accept a server instance and listen on the 'message' event. That makes writing YAL plugins really simple. It’s also trivial to re-use an existing Winston setup by just plunking your Winston code right into YAL Server.
I’d recommend always running at least 3 YAL Servers in a cluster for redundancy, so you can be sure not to lose any data.
That’s all for now! The two pieces themselves are very simple, but combined they give your distributed system a nice layer of added protection against logging-related outages.
Coming up soon I’ll be blogging about some Elasticsearch tooling that we’ve built exclusively for digging through all of those logs we’re sending through YAL!
Anthony Short on February 19th 2014
Every month we’re going to do a round-up of all the projects we’ve open-sourced on Github. We have hundreds of projects available for anyone to use, ranging from CSS libraries and UI components to static-site generators and server tools. Not to mention that Segment itself started from analytics.js.
Myth is a preprocessor that lets you write pure CSS without having to worry about slow browser support, or even slow spec approval. It’s like a CSS polyfill.
Diff two versions of a node module.
Yet-Another-Logger that pushes logs to log servers with axon/tcp to delegate network overhead.
Adds some concurrency to a transform stream so that multiple items may be transformed at once.
A FIFO queue for co.
If you want to see more of the awesome code we’re releasing, follow us on Github or follow any of our team members. We’re all open-source fanatics.