
Anthony Short on May 11th 2015

Over the past few months at Segment we’ve been rebuilding large parts of our app UI. A lot of it had become impossible to maintain because we were relying on models binding to the DOM via events.

Views that are data-bound to the DOM sound great but they are difficult to follow once they become complex and bi-directional. You’d often forget to bind some events and a portion of your UI would be out of sync, or you’d add a new feature and break 3 others.

So we decided to take on the challenge to build our own functional alternative to React.

Building a prototype

We managed to get a prototype working in about a month. It could render DOM elements and the diffing wasn’t too bad. However, the only way to know if it was any good was to throw it into a real project. So that’s what we did. We built the Tracking Plan using the library. At this point it didn’t even have a real name.

It started simple: we found bugs and things we'd overlooked, then we started seeing patterns arise and ways to make the development experience better.

We were able to quickly try some ideas and trash them if they didn’t work. At first we started building it like a game engine. It had a rendering loop that would check to see if components were dirty and re-render on every frame, and a scene that managed all the components and inputs like a game world. This turned out to be annoying for debugging and made it overly complex.

Build, test, iterate

Thanks to this process of iteration we were able to cut scope. We never needed context or refs like React, so we didn’t add them. We started with a syntax that used prototypes and constructors but it was unnecessarily verbose. We haven’t had to worry about maintaining text selection because we haven’t run across it in real-world use. We also haven’t had any issues with element focus because we’re only supporting newer browsers.

We spent many late nights discussing the API on a white board and it’s something we care about a lot. We wanted it to be so simple that it would be almost invisible to the user. An API is just UI for developers so we treated it like any other design problem at Segment — build, test, iterate.

Fine-tuning performance

Performance is the most important feature of any UI library. We couldn’t be sure if the library was on the right path until we’d seen it running in a real app with real data and constraints. We managed to get decent performance on the first try and we’ve been fine-tuning performance as we add and remove new features.

We first ran into performance issues when we had to re-build the debugger. Some customers were sending hundreds of events per second and the animation wouldn’t work correctly if we were just trashing DOM elements every frame. We implemented a more optimized key diffing algorithm and now it renders hundreds of events per second at a smooth 60 fps with ease. Animations included.

Stabilizing the API

Eventually everything started to settle down. We took the risk and implemented our own library, and it now powers a large portion of our app. We’ve stripped thousands of lines of code and, thanks to this new library, it’s now incredibly easy to add new features and maintain.

Finally, we think it’s ready to share with everyone else.

Introducing Deku

Deku is our library for building user interfaces. It supports many of the features you’re familiar with in React but aims to be small and functional. You define your UI as a tree of components and whenever a state change occurs it re-renders the entire tree to patch the DOM using a highly optimized diffing algorithm.

The whole library weighs in at less than 10kb and is easy to follow. It’s also built from npm modules, so some of those modules are probably being used elsewhere in your code anyway.

It uses the same concept of components as React. However, we don’t support older browsers, so the codebase is small and the component API is almost non-existent. It even supports JSX thanks to Babel.

Here’s what a component looks like in Deku:

Then you can import that component and render your app:
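Again approximating the API described in this post (a tree wraps the app, and the DOM renderer mounts it):

```js
// Approximate sketch of mounting a Deku app.
import { tree, render } from 'deku'
import Button from './button'

const app = tree(<Button label="Hello" onClick={() => console.log('clicked')} />)
render(app, document.body)
```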

Designed for ES6

You’ll notice there is no concept of classes or use of this. We’re using plain objects and functions. The ES6 module syntax is used to define components and every lifecycle hook is passed the component object which holds the props and state you’ll use to render your template.

We never really needed classes. What’s the point when you never initialize them anyway? The beauty of using plain functions is that the user can use the ES6 module system to define them however they want! Best of all, there’s no new syntax to learn.

Lifecycle hooks

Deku has many of the same lifecycle hooks but with two new ones - beforeRender and afterRender. These are called on every single render, including the first, unlike the update hooks. We’ve found these let us stop thinking about the lifecycle state so much.

Some of the lifecycle hooks are passed the setState function so you can trigger side-effects to update state and re-render the app. DOM events are delegated to the root element and we don’t need to use any sort of synthesized event system because we’re not supporting IE9 and below. This means you never need to worry about handling or optimizing event binding.
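As a rough illustration of the hooks described above (the signatures are approximate, and fetchUser is a hypothetical helper):

```js
// Approximate hook shapes; every hook receives the component object ({ props, state }).
export function beforeRender({ props, state }) {
  // runs before every render, including the first
}

export function afterRender({ props, state }, el) {
  // runs after every render; el is the component's root DOM element
}

export function afterMount({ props, state }, el, setState) {
  // hooks that receive setState can trigger side-effects that update state and re-render
  fetchUser(props.userId).then(user => setState({ user }))
}
```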

Client and server rendering

To render the component to the DOM we need to create a tree. The tree will manage loading data, communicating between components and allows us to use plugins on the entire application. For us it has eliminated the need for anything like Flux and there are no singletons in sight.

You can render the component tree any way you’d like — you just need a renderer for it. So far we have an HTML renderer for the server and a DOM renderer for the client since those are the two we’ve needed. It would be possible to build a canvas or WebGL renderer.

Performance benchmarks

The dbmonster performance mini-app written in Deku is also very fast and renders at roughly 15-16 fps compared to most other libraries which render at 11-12 fps. We’re always looking for ways to optimize the diffing algorithm even further, but we think it’s already fast enough.

You can read more about Deku and view some examples on its GitHub page.

Why not React?

The first thing we usually get asked when we tell people about Deku is “Why didn’t you just use React?”. It could seem like a classic case of NIH syndrome.

We originally looked into this project because we use Duo as a front-end build tool. Duo is like npm, but just uses Github. It believes in small modules doing one thing well. React was a ‘big thing’ doing many things within a black box. We like knowing in detail how code works, so we feel comfortable with it and can debug it when something goes wrong. It’s very hard to do that with React or any big framework.

So we looked for smaller alternatives, like virtual-dom and mercury. The documentation for virtual-dom was slim and we didn’t think the API for mercury was very user friendly.

We ended up using React for a short time but the API forced us to use a class-like syntax that would lock us into the framework. We also found that we kept fighting with function context all the time, which is a waste of brain energy. React has some functional aspects to it but it still feels very object-oriented. You’re always concerning yourself with implicit environment state thanks to this and the class system. If you don’t use classes you never need to worry about this, you never need decorators, and you force people to think about their logic in a functional way.

What started as a hack project to see if we could better understand the concept behind React has developed into a library that is replacing thousands of lines of code and has become the backbone of our entire UI. It’s also a lot of fun!

What’s next

We’ve come a long way in the past few months. Next we’re going to look at a few ways we could add animation states to components to solve a problem that plagues every component system using virtual DOM.

In our next post on Deku, we’ll explain how we structure our components and how we deal with CSS. We’ll also show off our UIKit — the set of components we’ve constructed to rapidly build out our UI.

Steven Miller and Dominic Barnes on April 9th 2015

Last week, we open sourced Sherlock, a pluggable tool for detecting third-party services on a given web page. You might use this to detect analytics trackers (eg: Google Analytics, Mixpanel, etc.), or social media widgets (eg: Facebook, Twitter, etc.) on your site.

Sherlock at Segment

We know that setting up your integrations has required some manual work. You’ve had to gather all your API keys and enter them into your Segment project one by one. We wanted to make this process easier for you, and thought that a “detective” to find your existing integrations would help!

Enter Sherlock. When you tell us your project’s url, Sherlock searches through your web page and finds the integrations you’re already using. Then, he automatically enters your integrations’ settings, which makes turning on new tools a bit easier.

How It Works

Here’s a code sample of Sherlock in action:

Since there are no services baked into Sherlock itself, we’re adding a Twitter plugin here manually. Sherlock opens the url and, if widgets.js is present on the page, Twitter will be added to the results.

The above example is admittedly trivial. Here’s a more realistic use-case:
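Again illustrative rather than exact, this time loading a whole plugin collection instead of a single plugin:

```js
// Hypothetical sketch: use the sherlock-segment plugin collection.
var Sherlock = require('sherlock');
var segment = require('sherlock-segment');

new Sherlock()
  .use(segment)
  .analyze('https://example.com', function (err, results) {
    if (err) throw err;
    console.log(results);
  });
```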

Here, we are adding sherlock-segment, a collection of plugins for about 20 of the integrations on our platform. Now, results will look like this:
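Something like the shape below, with each detected integration mapped to whatever settings could be extracted (the values are made up for illustration):

```js
// Illustrative shape of the results object.
{
  "Google Analytics": { "trackingId": "UA-XXXXXXX-1" },
  "Mixpanel": { "token": "<project token>" },
  "Twitter": {}
}
```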

Custom Plugins

To make your own plugin, simply add the following details to your package.json (feel free to use sherlock-segment as a starting point):

  • name should include “sherlock-“ as a prefix

  • keywords should include “sherlock”

Your plugin should export an array of service configuration objects; each object can support the following keys:

  • name should be a human-readable string

  • script can be a string, regular expression, or a function that matches the src attribute of a script tag

  • settings is an optional function that is run on the page to extract configuration

Here is an example service configuration:
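For instance, a configuration following the shape above might look like this (the service name, match pattern, and settings logic are all illustrative):

```js
// Example service configuration (illustrative values).
module.exports = [
  {
    // human-readable name
    name: 'Example Analytics',

    // matches the src attribute of the script tag the service injects
    script: /cdn\.example-analytics\.com\/analytics\.js/,

    // optional: runs on the page to extract existing configuration
    settings: function () {
      var config = window.__exampleAnalyticsConfig || {};
      return { apiKey: config.apiKey };
    }
  }
];
```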

Contributing

Our plugin currently supports about 20 integrations. If you are interested in helping us support even more, feel free to open up an issue or a PR on GitHub!

Dominic Barnes on April 3rd 2015

Make is awesome! It’s simple, familiar, and compatible with everything. Unfortunately, editing a Makefile can be challenging because it has a very terse and cryptic syntax. In this post, we will outline how we author them to get simple, yet powerful, build systems.

For the uninitiated, check out this gist by Isaac Schlueter. That gist takes the form of a heavily-commented Makefile, which makes it a great learning tool. In fact, I would recommend checking it out regardless of your skill level before reading the remainder of this post.

readability: documentation | dry

Here at Segment, we write a lot of code. One of our philosophies is that the code we write should be beautiful, especially since we’ll be spending literally hours a day looking at it.

By beautiful, we mean that code should not be convoluted and verbose, but instead it should be expressive and concise. This philosophy is even reflected in how we write a Makefile.

We dedicate the top section of each Makefile as a place to define variables (much like normal source code). These variables will be used to reduce the amount of code used in our recipes, making them far easier to read.

In Node projects, we always rely on modules that are installed locally instead of globally. This allows us to give each project its own dependencies, giving us the room to upgrade freely without worrying about compatibility across our many other projects.

This decision requires more typing at first:
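For example (the test runner and paths are illustrative):

```make
# Without variables: the full local binary path appears in every recipe.
test:
	@./node_modules/.bin/mocha --reporter spec ./test
```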

But it’s easily fixed by using Makefile variables:

We use this same pattern frequently, as it helps to shorten the code written in a recipe, making the intention far more clear. This makes understanding the recipe much easier, which leads to faster development and maintenance.

Beyond just using variables for the command name, we also put shared flags behind their own variable as well.

This helps keep things DRY, but also gives developers a hook to change the flags themselves if needed:
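For example (the flag values are illustrative):

```make
MOCHA = ./node_modules/.bin/mocha

# ?= only assigns when the variable isn't already set, so a developer can
# override it from the environment or the command line:
#   make test MOCHA_FLAGS="--reporter dot"
MOCHA_FLAGS ?= --reporter spec

test:
	@$(MOCHA) $(MOCHA_FLAGS) ./test
```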

clean: documentation

When writing code and interacting with developer tools, we seek to avoid noise as much as possible. There are enough things on a programmer’s mind, so it’s best to avoid adding to that cognitive load unnecessarily.

One example is “echoing” in Make, which basically outputs each command of your recipe as it is being executed. You may notice that we used the @ prefix on the recipes above, which actually suppresses that behavior. This is a small thing, but it is part of the larger goal.

We also run many commands in “quiet mode”, which basically suppresses all output except errors. This is one case where we definitely want to alert the developer, so they can take the necessary action to correct it.
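Put together, a recipe in this style looks something like the sketch below; $(COMPILE) and its --quiet flag are placeholders for whatever build tool a project actually uses:

```make
# @ suppresses echoing the command itself; the tool's own quiet flag suppresses
# everything except errors. ($(COMPILE) and --quiet are placeholders.)
build: $(SRC)
	@$(COMPILE) --quiet index.js > build/build.js
```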

Now when running make, we will only see errors that happened during the corresponding build. If nothing is output, we can assume everything went according to plan!

targets:

There are some target names that are so commonly used, they have practically become a convention. We didn’t invent most of the targets mentioned here; the main principle is that using names consistently throughout an organization improves the experience for developers who are new to a specific project.

build

Since we have a lot of web projects, the build/ directory is often reserved as the destination for any files we are bundling to serve to the client.

clean

This target is used to delete any transient files from the project. This generally includes:

  • the build/ directory (the generated client assets)

  • intermediary build files/caches

  • test coverage reports

Remote dependencies are not part of this process. (see clean-deps)

clean-deps

Depending on the size and complexity of a project, the downloaded dependencies can take a considerable amount of time to completely resolve and download. As a result, they are cleaned using a distinct target.

default

While Make will automatically assume the first target in a Makefile is the default one to run, we adopt the convention of putting a default target in every Makefile, just for consistency and flexibility.

For our projects, the default target is usually synonymous with build, as it is common practice to enter a project and use make to kick off the initial build.

lint

Runs static analysis (eg: JSHint, ESLint, etc) against the source code for this project.

server

This starts up the web server for the given project. (in the case of web projects)

test

This is exclusively for running the automated tests within a project. Depending on the complexity of the project, there could also be other related targets, such as test-browser or test-server. But regardless, the test target will be the entry-point for a developer to run those tests.
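Pulled together, a skeleton Makefile following these conventions might look like the sketch below; the variables and recipes are illustrative rather than a real project’s build:

```make
# Conventional targets; $(COMPILE), $(ESLINT) and $(MOCHA) are placeholder variables.
default: build

node_modules: package.json
	@npm install

build: node_modules
	@$(COMPILE) index.js > build/build.js

clean:
	@rm -rf build coverage

clean-deps:
	@rm -rf node_modules

lint:
	@$(ESLINT) lib test

server:
	@node server.js

test: lint
	@$(MOCHA) --reporter spec ./test

.PHONY: default build clean clean-deps lint server test
```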

done:

All in all, Make is a powerful tool suitable for many projects regardless of size, tooling and environment. Other tools like Grunt and Gulp are great, but Make comes out on top for being even more powerful, expressive and portable. It has become a staple in practically all of our projects, and the conventions we follow have helped to create a more predictable workflow for everyone on the team.

Calvin French-Owen on April 1st 2015

We’ve been running Node in production for a little over two years now, scaling from a trickle of 30 requests per second up to thousands today. We’ve been hit with almost every kind of weird request pattern under the sun.

First there was the customer who liked to batch their data into a single dump every Friday night (getting called on a Friday night used to be a good thing). Then the user who sent us their visitor’s entire social graph with every request. And finally an early customer who hit us with a while(true) send(data) loop and caused a minor emergency.

By now, our ops team has seen the good, the bad, and the ugly of Node. Here’s what we’ve learned.

Beware the event loop

One of the great things about Node is that you don’t have to worry about threading and locking. Since everything runs on a single thread, the state of the world is incredibly simple. At any given time there’s only a single running code block.

But here… there be dragons.

Our API ingests tons of small pieces of customer data. When we get data, we want to make sure we’re actually taking the JSON and representing any ISO date strings as dates. We traverse the JSON data we receive, converting any date strings into native Date objects. As long as the total size is under 15kb, we’ll pass it through our system.
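A minimal sketch of that kind of traversal (not Segment’s actual code) looks something like this:

```js
// Recursively walk the parsed JSON and convert ISO 8601 strings into Date objects.
// On a deeply nested blob this is a lot of synchronous work on the event loop.
var ISO_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/;

function traverse(obj) {
  Object.keys(obj).forEach(function (key) {
    var value = obj[key];
    if (typeof value === 'string' && ISO_DATE.test(value)) {
      obj[key] = new Date(value);
    } else if (value && typeof value === 'object') {
      traverse(value);
    }
  });
  return obj;
}
```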

It seemed innocent enough… until we’d get a massively nested JSON blob and we’d start traversing. It’d take seconds, even minutes, before we chewed through all the queued up function calls. Here’s what the times and sizes would look like after an initial large batch would get rejected:

Ouch!

And then things would only get worse: the problems would start cascading. Our API servers would start failing healthchecks and disconnect from the ELB. The lack of heartbeat would cause the NSQ connection to disconnect so we weren’t actually publishing messages. Our customer’s clients would start retrying, and we’d be hit with a massive flood of requests. Not. Good.

Clearly something had to be done–we had to find out where the blockage was happening and then limit it.

Now we use node-blocked to get visibility into whether our production apps are blocking on the event loop, like this errant worker:

It’s a simple module which checks when the event loop is getting blocked and calls you when it happens. We hooked it up to our logging and statsd monitoring so we can get alerted when a serious blockage occurs.

We dropped in the module and immediately started seeing the following in our logs:

A customer was sending us really large batches of nested JSON. Applying a few stricter limits to our API (this was back before we had limits) and moving the processing to a background worker fixed the problem for good.

To further avoid event loop problems entirely, we’ve started switching more of our data processing services to Go and using goroutines, but that’s a topic for an upcoming post!

Exceptions: the silent noisy killer

Error handling is tricky in every language–and node is no exception. Plenty of times, there will be an uncaught exception which–through no fault of your own–bubbles up and kills the whole process.

There are multiple ways around this using the vm module or domains. We haven’t perfected error handling, but here’s our take.

Simple exceptions should be caught using a linter. There’s no reason to have bugs for undefined vars when they could be caught with some basic automation.

To make that super easy, we started adding make-lint to all of our projects. It catches unhandled errors and undefined variables before they even get pushed to production. Then our makefiles run the linter as the first target of `make test`.

If you’re not already catching exceptions in development, add make-lint today and save yourself a ton of hassle. We tried to make the defaults sane so that it shouldn’t hamper your coding style but still catch errors.

In prod, things get trickier. Connections across the internet fail way more often. The most important thing is that we know when and where uncaught exceptions are happening, which is often easier said than done.

Fortunately, Node has a global uncaughtException handler, which we use to detect when the process is totally hosed.

We ship all logs off to a separate server for collection, so we want to make sure to have enough time to log the error before the process dies. Our cleanup could use a bit more sophistication, but typically we’ll attempt to disconnect and then exit after a timeout.

Actually serializing errors also requires some special attention (handled for us by YAL). You’ll want to make sure to include both the message and stack explicitly, since they are non-enumerable properties and will be missed by simply calling JSON.stringify on the error.
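Put together, the handler looks roughly like this; the logger and cleanup calls are simplified placeholders:

```js
process.on('uncaughtException', function (err) {
  // message and stack are non-enumerable, so serialize them explicitly.
  log.error('uncaught exception', { message: err.message, stack: err.stack });

  // attempt a graceful disconnect, but don't wait forever before exiting.
  server.close();
  setTimeout(function () {
    process.exit(1);
  }, 3000);
});
```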

Finally, we’ve also written our own module called oh-crapto automatically dump a heap snapshot for later examination.

It’s easily loaded into the chrome developer tools, and incredibly handy for those times we’re hunting the root cause of the crash. We just drop it in and we’ve instantly got full access to whatever state killed our beloved workers.

Limiting concurrency

It’s easy to overload the system by setting our concurrency too high. When that happens, the CPU on the box starts pegging, and nothing is able to complete. Node doesn’t do a great job handling this case, so it’s important to know when we’re load testing just how much concurrency we can really deal with.

Our solution is to stick queues between every piece of processing. We have lots of little workers reading from NSQ and each of them sets a maxInFlight parameter specifying just how many messages the worker should deal with concurrently.
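With the nsqjs client, for example, that looks something like the sketch below (topic, channel, addresses, and the handle function are illustrative, not Segment’s actual setup):

```js
var nsq = require('nsqjs');

var reader = new nsq.Reader('events', 'process', {
  lookupdHTTPAddresses: '127.0.0.1:4161',
  maxInFlight: 10   // how many messages this worker handles concurrently
});

reader.connect();

reader.on('message', function (msg) {
  handle(msg.json(), function (err) {
    if (err) return msg.requeue();
    msg.finish();
  });
});
```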

If we see the CPU thrashing, we’ll adjust the concurrency and reload the worker. It’s way easier to think about the concurrency once at boot rather than constantly tweaking our application code and limiting it across different pipelines.

It also means we get free visibility into where data is queueing, not to mention the ability to pause entire data processing flows if a problem occurs. It gives us much better isolation between processes and makes them easier to reason about.

Streams and errors

We moved away from using streams for most of our modules in favor of dedicated queues. But, there are a few places where they still make sense.

The biggest overall gotcha with streams is their error handling. By default, piping won’t cause streams to propagate their errors to whatever stream is next.

Take the example of a file processing pipeline which is reading some files, extracting some data and then running some transforms on it:
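A sketch of that kind of pipeline, with the tempting-but-wrong error handling; Extract and Transform stand in for the custom streams described here, and log is a placeholder logger:

```js
var fs = require('fs');

// Only the last stream in the chain has an error handler;
// errors from the reader, Extract, or Transform stages go uncaught.
fs.createReadStream('events.log')
  .pipe(new Extract())     // pull out the fields we care about
  .pipe(new Transform())   // reshape each record
  .pipe(fs.createWriteStream('events.csv'))
  .on('error', function (err) {
    log.error('pipeline failed', { message: err.message, stack: err.stack });
  });
```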

Looking at this code, it’s easy to miss that we haven’t actually set up our error handling properly. Sure, the resulting pipeline stream has handlers, but if any errors occur in the Reader, Extract, or Transform streams, they’ll go uncaught.

To get around this, we use Julian Gruber’s nifty multipipe module, which provides a nice API for centralized error handling. That way we can attach a single error handler and be off to the races.
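With multipipe, the same pipeline gets one centralized error handler (reusing the placeholder streams and logger from the sketch above):

```js
var fs = require('fs');
var pipe = require('multipipe');

pipe(
  fs.createReadStream('events.log'),
  new Extract(),
  new Transform(),
  fs.createWriteStream('events.csv'),
  function (err) {
    if (err) log.error('pipeline failed', { message: err.message, stack: err.stack });
  }
);
```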

Go’ing Forward

If you’re also running Node in production and dealing with a highly variable data pipeline, you’ve probably run into a lot of similar issues. For all these gotchas, we’ve been able to scale our node processes pretty smoothly.

Now we’re starting to move our new plumbing and data infrastructure layers to Go. The language makes different trade-offs, but we’ve been quite impressed so far. We’ve got another post in the works on how it’s working out for us, along with a bunch of open source goodies! (and if you love distributed systems, we’re hiring!)

Have other tips for dealing with a large scale node system? Send them our way.

And as always: peace, love, ops, analytics.

Chris Sperandio on February 19th 2015

Before I joined Segment, I was something of a Github stalker. Which is how I found Segment.

(To be clear, I’m still a Github stalker, only now I work here.)

We’ve all been there before: lost in the depths of a mental call stack, eight repos deep with only a foggy idea of why it’s 3AM and how I ended up here.

I snooped through Segment’s projects for I don’t know how long before starting to realize what drew me in so consistently across every project. And it wasn’t until I joined the team and learned the thought-process and ethos behind them that I gained a sincere appreciation for why we have over 1000 repos.

Our approach to software is radically modular, pluggable, and composable.

Which makes sense, because in reality, that’s the whole point of Segment.

When you build your tools at the right level of abstraction, incidental complexity is hidden away and edge-cases take care of themselves. Not to mention, you can be a lot more productive. It’s what precipitated analytics.js in our early days: the intention to find the right level of abstraction for collecting data about who your users are and what they’re doing. It’s why people love express, koa, and rework too.

It’s also why we’re big proponents of the component and duo ecosystem. We even manage customer and partner logos with an extensible and modular systemand our entire front-end is built with components based on ripple and, more recently, deku.

It’s hard to communicate the power of this modular and composable approach, but it ends up being disarmingly obvious to developers and product strategists alike (see Rich Hickey’s presentation. Rather than attempt to explain it outright, I’ll give you a tour of our more popular open source repos to show you how we’ve tried to make them small, self-contained, and composable.

Let’s dive into some examples!

Metalsmith

segmentio/metalsmith

When building our documentation, academy, blog, job board, and help section, we wanted the speed and simplicity of static sites over the restrictiveness and complexity of a CMS. And though there’s a lot of logic that could be shared between them, each called for its own unique feature-set and build process.

But when we looked at existing static site generators, they all imposed a degree of structure on the content, and weren’t flexible enough for our wide array of use cases. Enter: Metalsmith.

Metalsmith does not impose any assumptions on your content model or build process. In fact, Metalsmith is just an abstraction for manipulating a directory of files.

Breaking static sites down to their core, the underlying abstraction includes content in files (blog entries, job listings, what have you), and these files’ associated metadata. Metalsmith allows you to read a directory of files, then run a series of plugins on that data to transform it exactly the way you need.

For example, you can run markdown files through handlebars templates, create navigation or a table of contents, compress images, concatenate scripts, or anything your heart desires before writing the result to the build directory.

For example, our blog articles are just files with two sections: a header with metadata about the author, date, title and url, and then markdown for the content of the article. Metalsmith transforms the markdown to HTML, wraps the posts in their layout, looks up and inserts the author’s avatar, renders any custom Handlebars helpers, etc. The beauty is that the build process is completely customizable and abstract for many use cases. It’s just a matter of which plugins you choose. And the word is out: the metalsmith plugin ecosystem is booming!
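A minimal build script in that spirit looks something like the sketch below; the plugin choices and directory names are illustrative, and the real builds use many more plugins:

```js
var Metalsmith = require('metalsmith');
var markdown = require('metalsmith-markdown');
var templates = require('metalsmith-templates');

Metalsmith(__dirname)
  .source('./articles')          // markdown files with a metadata header
  .destination('./build')
  .use(markdown())               // transform markdown into HTML
  .use(templates('handlebars'))  // wrap each post in its layout
  .build(function (err) {
    if (err) throw err;
  });
```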

By building our static sites with Metalsmith and hosting the source on Github, our marketing, success, and business teams can create and edit posts right from the Github web interface, or work locally and “sync” their updates with Github for Mac. This workflow closely mirrors a traditional CMS, but gives us the speed and reliability of a pre-built, static site while also lending things like auto-generated code samples for every language and automatically compiled navigation, two places otherwise prone to falling out of date.

Khaos

segmentio/khaos

As you’ve heard, Segment is committed to a component-driven development model: breaking things into small pieces that can be developed in isolation, and then shared and reused. But when everything is a plugin, that means an awful lot of small repos. So we build tooling to make working with lots of small repos frictionless.

For example, when we or a partner want to add a new integration to Segment, the very first thing we need to do is create a new repo to house that project. In order to enforce common style across file structure and code, we built a project scaffolder that generates a “base” the developer can use to jump right into their project.

This is where Khaos comes in — our own project scaffolder that’s built on Metalsmith, the first example here of building building blocks (with building blocks :). In the source, you can see how even we tried to make Khaos itself composable and modular.

Khaos is really just a CLI wrapper for metalsmith with the following plugin pipeline:

  • Read template files from a directory.

  • Parse files for template placeholders.

  • Prompt user to fill in each placeholder.

  • Render files with a templating engine.

  • Write filled-in files to a new directory.

We have khaos templates for new logos, integrations, back-end services, nightmare plugins, etc. Not only does this make getting started easy, but it reinforces cultural values like defaulting yes to MIT licensing.

Metrics

segmentio/metrics

As you might guess, we use Segment as the backbone of our customer data pipeline to route our data into our third-party tools and to Amazon Redshift.

While we use our partners’ visualization tools to write and share ad-hoc queries against data in Segment SQL, we wanted to make the most important data points accessible in real time throughout the organization. So we built Metrics.

We query the underlying data from Segment Warehouses and services like Stripe and Zendesk, and use Metrics to orchestrate these queries and store the aggregate metrics for each team. On any given team’s board you might see ARR, MRR, daily signups, the depth of our queue for new integration requests, the number of active Zendesk tickets by department, the number of deploys in the last week – the list goes on.

We’ll go into more details about the business motivations and outcomes around Metrics in a future blog post, but what excites me most about Metrics is how it’s designed under the hood. It’s another example of offloading feature scope to plugins.

You can use plugins to define what data gets collected and stored, the interval at which it’s updated, and where those metrics are pushed: to dashboards, spreadsheets, summary emails, or anywhere else your heart desires.

All Metrics does is expose an API for orchestrating this dance via plugins.

Check it out on github here!

Hermes

segmentio/hermes

We have a bit of an obsession with automation and elimination of the mundane. And that’s what drove the development of Hermes.

Raph, our beloved head of sales and first businessperson at Segment, thought Hermes was Ian’s potty-mouthed “ami français” for a few months. Nope. Hermes is a chatbot whose sole feature is, you guessed it, a plugin interface.

When you’re building a new feature for Hermes, like looking up an account’s usage, or fetching a gif from the interwebs, all you need to do is tell Hermes what he’s listening for and what to say back. Everything in between, you define in your own plugin.

Whether we want to announce that lunch is here, check Loggly for errors related to a customer’s project, kickoff a Metalsmith build of the latest blog release, or create an SVG logo for a new integration, we get our boy Hermes Hubeau to do the dirty (repetitive) work.

We were thankful for the plugin approach when we switched from Hipchat to Slack. Instead of rewriting all of Hermes, we just hot-swapped the old plugin with the new!

Nightmare

segmentio/nightmare

“Wait a minute — your chat bot creates SVG logos?!”

Nope! Humans do. Hermes just knows how to ask politely. He creates a new logo repo with Khaos, then spins up a Nightmare instance based on Metalsmith plugins, navigates to 99designs Tasks, and posts a job. When the job is finished, he resizes the logos with our logo component creation CLI.

Automating these sorts of jobs, for which there was not yet a public API, required us to mimic and automate pointing and clicking in a web browser, and that’s what Nightmare does. While there were plenty of tools to do this, like PhantomJS, webdriver APIs imposed the burden of a convoluted interface and lots of mental overhead. So we wrote a library that puts all those headaches under the covers, and lets you automate browsers the way you browse the web:
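The chainable API looks roughly like this (the site, selectors, and search flow are illustrative):

```js
var Nightmare = require('nightmare');

new Nightmare()
  .goto('https://example.com/search')
  .type('input[name="q"]', 'segment')
  .click('button[type="submit"]')
  .evaluate(function () {
    // runs in the page context
    return document.title;
  }, function (title) {
    console.log(title);
  })
  .run(function (err, nightmare) {
    if (err) throw err;
  });
```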

Browser automation is nothing new, but we tried to give Nightmare a cleaner API and a plugin interface so people could more easily compose automations. The goal is to make it really simple to automate tasks on the web and create APIs where a public one doesn’t yet exist.

Check out GitHub for Nightmare plugins like X-Ray, 99designs, Secureapay, and LinkedIn.

Why

We try to break things into small, reusable pieces and hate to solve the same problem twice, pushing for simple solutions that build off of each other and are flexible enough for multiple use cases.

This level of commitment to “building building blocks” and sharing them with the community is what drew me in so hypnotically to Segment in the first place, and why I feel so immensely fortunate to be here now. As a success engineer, part of my job is to build and maintain internal tooling that enables us to better serve our customers. I’m empowered to apply the same principles and rigor used by our product team and core engineers to those projects, and my development and product direction skills have improved faster than I ever thought possible as a result.

If you think of any cool use cases for any of these tools at your company, we’d love to hear more about them. Tweet us @segment with your ideas or fork away on GitHub! We always appreciate new plugins and contributions. And if any of this particularly resonates with you, we’re hiring!

TJ Holowaychuk on February 21st 2014

One of the most popular logging libraries for node.js is Winston. Winston is a great library that lets you easily send your logs to any number of services directly from your application. For lots of cases Winston is all you need; however, there are some problems with this technique when you’re dealing with mission-critical nodes in a distributed system. To solve them we wrote a simple solution called Yet-Another-Logger.

The Problem

The typical multicast logging setup looks something like this:

The biggest issue with this technique for us was that many of these plugins are only enabled in production, or cause problems that are only visible under heavy load. For example it’s not uncommon for such libraries to use CPU-intensive methods of retrieving stack traces, or cause memory leaks, or even worse uncaught exceptions!

Another major drawback is that if your application is network-bound like ours is, then sending millions of log requests out to multiple services can quickly take its toll on the network, slowing down everything else.

Finally, the use of logging intermediaries allows you to add or remove services at will, without re-deploying most of your cluster or making code changes to the applications themselves.

The Solution

Our solution was to build a simple client/server system of nodes to isolate any problems to a set of servers whose sole job is to fan out the logs. We call it Yet-Another-Logger, or YAL.

It’s made up of two parts: the YAL Client which sends data to a cluster of YAL Servers, which in turn fan-out your logs out to the target services. Together they give you an architecture like this:

YAL Client

The Yet-Another-Logger client is pretty much what you would expect from a standard logging client. It has some log-level methods and accepts type and message arguments—standard stuff. The only difference is that you instantiate the client with an array of YAL Server addresses, which it uses to round-robin:
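Roughly like this; the addresses are illustrative and the exact method names may differ:

```js
var Logger = require('yal');

// round-robins across the YAL Server addresses
var log = new Logger([
  'tcp://logs-1.internal:5000',
  'tcp://logs-2.internal:5000',
  'tcp://logs-3.internal:5000'
]);

// log-level methods take a type and a message
log.info('user signup', { userId: '123', plan: 'startup' });
log.error('charge failed', { userId: '123', code: 'card_declined' });
```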

YAL is backed by the Axon library, a zeromq-inspired messaging library. The great thing about this is that when a node goes down, messages will be routed to stable nodes, and then resume when the node comes back online.

YAL Server

The YAL server is also extremely simple. It accepts log events from the clients and distributes them to any number of configured services, taking the load off of mission-critical applications.

At the time of writing YAL Server is a library, and does not provide an executable, however in the near future an executable may be provided too. Until then a typical setup would include writing a little executable specific to your system.

Server plugins are simply functions that accept a server instance, and listen on the 'message' event. That makes writing YAL plugins really simple. It’s also trivial to re-use an existing Winston setup by just plunking your Winston code right into YAL Server.
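A plugin is just a function over the server, something like the sketch below; the Elasticsearch client call is illustrative, not a documented YAL plugin:

```js
// A YAL Server plugin: take a server instance, listen for log events, fan them out.
function elasticsearch(client) {
  return function (server) {
    server.on('message', function (msg) {
      client.index({ index: 'logs', type: msg.type || 'log', body: msg });
    });
  };
}
```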

I’d recommend always running at least 3 YAL Servers in a cluster for redundancy, so you can be sure not to lose any data.

That’s it!

That’s all for now! The two pieces themselves are very simple, but combined they give your distributed system a nice layer of added protection against logging-related outages.

Coming up soon I’ll be blogging about some Elasticsearch tooling that we’ve built exclusively for digging through all of those logs we’re sending through YAL!

Anthony Short on February 19th 2014

Every month we’re going to do a round-up of all the projects we’ve open-sourced on Github. We have hundreds of projects available for anyone to use, ranging from CSS libraries and UI components to static-site generators and server tools. Not to mention that Segment all started from analytics.js.


Myth

Myth is a preprocessor that lets you write pure CSS without having to worry about slow browser support, or even slow spec approval. It’s like a CSS polyfill.

npm-diff

Diff two versions of a node module.

YAL

Yet-Another-Logger that pushes logs to log servers with axon/tcp to delegate network overhead.

concurrent-transform

Adds concurrency to a transform stream so that multiple items may be transformed at once.

co-queue

A FIFO queue for co.


If you want to see more of the awesome code we’re releasing, follow us on Github or follow any of our team members. We’re all open-source fanatics.

Peter Reinhardt on November 27th 2013

When we analyze usage and customers at Segment, we constantly need to join queries across Mongo and Redis. Why? Because our account information is in Mongo and our API usage is in Redis. Today we’re open sourcing Hydros. It’s a quick cheat that lets us run SQL queries for analysis, while using NoSQL in production.

What we’ve noticed is that every business question boils down to a simple join across account info and usage. Here are some examples:

  • Enterprise integrations: find the integrations used by projects (Mongo) that send over 100 million API calls per month (Redis).

  • Mobile projects: get the names of projects (Mongo) that use our iOS or Android SDKs (Redis).

  • Power users: get the emails of users (Mongo) who have 20 or more active projects (Redis).

Before Hydros, I’d cobble together a bunch of 50-line node scripts that would connect to both databases. All the join and relational logic was in code. It was horrible. Just a huge, messy folder of code that I never wanted to touch again. Check out cohort.js for a taste of what should have been a simple SQL query.

For an engineer turned business guy, this is pretty frustrating. I wanted something maintainable, that we could build on as the company grows.

This was such an annoying problem for us that we even went so far as to sync our entire database to Google Spreadsheets so that we could sort, filter and join the databases there. Ilya made some magic happen there, but Google Docs is just really slow and clunky.

Finally! one night at Happy Data Hour, Josh from Mode yammered my ear off about how Yammer’s internal analytics system worked. I was a couple beers in, but what I understood, I liked :) I definitely walked away with a bastardized version of what they’ve accomplished over there, but…

Yammer’s system was simple and badass.

It was akin to “data marts”… you sync your databases to SQL tables idempotently and transactionally, and then run the SQL queries there. Simple.

Here we were, getting all fancy with NoSQL, but the answer was right there all along. Good old SQL.

If we had a good syncing abstraction, all we’d need to do is:

  1. Write idempotent transformations from production databases to SQL tables.

  2. Run our queries against the SQL tables.

Want!

So that weekend I got really excited, and started building a similar system for ourselves. After a couple fresh starts and a rewrite, Hydros was born.

Hydros is a node module that lets you easily pull any data source into a MySQL table. You define the SQL table name, columns, and two functions: list and get.

The list function generates a list of all rows that should be in the Hydros table. For a user table this would be an array of all the user IDs. For a project table this would be an array of all the project IDs. Drop-dead simple, that’s the point :)

The get function is responsible for filling in a single row. The function is passed a single row id, and returns all the column values for that row. For a project table, this might mean looking up project metadata in Mongo, or looking up API usage in Redis.

Between those two functions, you get a full sync: list all the rows, then get the columns for each row.

Hydros handles table creation and manages the timing of list and get for you automatically.

The goal is to have many simple tables in MySQL, and then have many simple Hydros instances syncing the data into them. We have a half-dozen tables already, and it’s growing quickly.

For example, a Hydros implementation of a “Project API Usage” table might work like this:

  1. list project IDs from Mongo

  2. get each project’s API usage by pulling counters from Redis

The Hydros table gets a list of rows it should have by polling the list function. Then, at a higher frequency, Hydros polls the get function to fill out the columns for each row. You control the refresh time.

Here’s an incomplete implementation of that example:

How we use it.

At Segment we use Hydros to answer a ton of business questions. Combined with Chartio, even the nontechnical people on our team can run queries and dashboard the results.

We have seven tables so far:

  • Project API Usage (list: Mongo, get: Redis)

  • Project Integrations (list: Mongo, get: Mongo)

  • Project Channel Usage (list: Mongo, get: Redis)

  • Project Library Usage (list: Mongo, get: Redis)

  • Project Metadata (list: Mongo, get: Mongo)

  • User Metadata (list: Mongo, get: Mongo)

  • User Projects (list: Mongo, get: Mongo)

And from that we create 25 charts and tables. Here are some examples:

  • A table of client libraries, sorted by popularity.

  • A table of integrations sorted by popularity. Juicy competitor data!

  • A graph of monthly project cohorts, colored by payment tier.

  • A graph of monthly project cohorts, colored by client library.

The charts tell us which client libraries and integrations we should focus on, help us estimate future revenue, and even let us prioritize enterprise leads to contact.

Hydros keeps the underlying tables in sync with production Mongo and Redis, completing a sync every 8 hours. Chartio keeps the charts up to date within 30 minutes. This is plenty fast for product analytics!

With Hydros in place, we’re saving a ton of time. Instead of writing piles of janky query code for every business question, we just run SQL queries. Best of all, we get to use business intelligence tools like Chartio right out of the box. We’re actually able to build on our analysis instead of treading water.

If you want to take Hydros for a spin, we open-sourced it. Check it out on Github: https://github.com/segmentio/hydros.

Calvin French-Owen on May 9th 2013

Five months ago, we released a small library called Analytics.js by submitting it to Hacker News. A couple hours in it hit the #1 spot, and over the course of the day it grew from just 20 stars to over 1,000. Since then we’ve learned a ton about managing an open-source library, so I wanted to share some of those tips.

At the very beginning, we knew absolutely nothing about managing an open-source library. I don’t think any of us had even been on the merging side of a pull request before. So we had to learn fast.

Since Analytics.js has over 2,000 stars now, lots of people are making amazing contributions from the open-source community. Along the way, we’ve learned a lot about what we can do to keep pull requests top-quality, and how to streamline the development process for contributors.

1. Keep a consistent style.

New contributors will look at your existing codebase to learn how to add functionality to your library. And that’s exactly what they should be doing. Every developer wants to match the structure of a library they contribute to, but don’t own. Your job as a maintainer is to make that as easy as possible.

The trouble starts when your library leaves ambiguity in its source. If you do the same thing two different ways in two different places, how are contributors going to know which way is recommended? Answer: they won’t.

In the worst case they might even decide that because you aren’t consistent, they don’t have to be either!

Solving this takes a lot of discipline and consistency. As a rule, you shouldn’t experiment with different styles inside a single open source repository. If you want to change styles, do it quickly and globally. Otherwise, newcomers won’t be able to differentiate new conventions from ones you abandoned months ago.

We started off being very poorly equipped to handle this. All of our code lived in a single file, and the functions weren’t organized at all. (And if you check out the commits, that was after a cleanup!) We hadn’t taken the time to set a consistent style - the library was a jumble of different conventions.

As the pull requests started coming in, each one conflicted with the others. Everyone was modifying the same parts of the same files and adding their own utility functions wherever they felt like it.

Which leads into my next point…

2. Make the “right” way, the easy way.

The initial way we structured our code was leading to loads of problems with merging pull requests: namely we had no structure! One of the big changes we made to fix our structure was moving over to Component.

We love Component because it eliminates the magic from our code and reduces our library’s scope. It lets us use CommonJS, so we can just require the modules we need, right from where we need them. Everything is explicit, which means our code is much easier for newcomers to follow. It’s a maintainer’s dream.

While making the switch, we wrote a bunch of our own components to replace all the utility functions we had been attaching to our global analytics object. And now, since components are easy to include and use everywhere else in the library, pull requesters just use them by default!

As soon as we released the right way and made it clear, pull request quality went up dramatically.

As far as keeping a consistent style goes, you have to be militant when it comes to new code. You cannot be afraid of commenting on pull requests even if it seems like a minor style correction, or refusing requests which needlessly clutter your API.

And remember, that goes for your own code as well! If you get lazy while adding new features, why shouldn’t contributors? The more clean code in your repository, the more good examples you have for newcomers to learn from.

Speaking of not getting lazy…

3. Write tests, and hook up Travis.

Having great test coverage is easily the best way to speed up development. We push changes all the time, so we can’t afford to spend time worrying about breaking existing functionality. We write lots of tests, and get lots of benefits: much faster development, more confidence in our own code, more trust from outsiders, and…

It also leads to much higher-quality pull requests!

When developers copy the library coding style, that extends to tests as well. We don’t let contributors add their own code without adding corresponding tests.

The good part is we don’t have to enforce this too much. Thanks to Travis-CI, nearly all of our pull requests come complete with tests patterned from our existing tests. And since we’ve made sure our existing tests are high quality, the copied tests naturally start off at a higher bar.

Travis is so well integrated with GitHub, that it will make merging your pull requests significantly easier. Both you and your contributors get notifications when a pull request doesn’t pass all your tests, so they’ll be incentivized to fix a breaking change.

Having a passing Travis badge also inspires trust that your library actually does work correctly and is still maintained. Not to mention peer pressures you into keeping your tests in good condition (which we all know is the hardest part).

4. Version from day one.

Versioning properly, just like testing, takes discipline and is completely worth it. When developers are having issues, most will immediately check to see if they are running the latest version of a library. If not, they’ll update and pray the bug is fixed. Without versioning, you’ll get issues reported about bugs you’ve already fixed.

When we started, we had no idea how to manage versions at all; our repository was basically just a pile of commits. If you push frequent updates, this will needlessly hassle the developers using your library.

From the start, there are three things your repo needs for versioning:

  1. Readme.md describing what your library does.

  2. Version numbers both in the source and in git tags.

  3. History.md containing versioned and dated descriptions of your changes.

Having a changelog is essential for letting developers track down issues. Any time a developer finds a potential bug, the changelog is the first place they’ll go to see if an upgrade will fix it.

Tagging our repository also turned out to be immensely useful. Each version has a well defined point where the code has stabilized. If you’re using a frontend registry like Component or Bower, you also get the advantage of automatic packaging.

Don’t forget to put the version somewhere in your source too, where the developer using the library can access it. Because when you’re helping someone remotely debug you’ll want to have a quick way to check what version they’re running so you don’t waste time needlessly.

Oh, and use semver. It’s the standard for open-source projects.

5. Add a Makefile—the hacker’s instruction manual.

I’ll let you in on a little secret: we have close to no documentation for people wanting to contribute to the library. (That’s another problem we need to fix.) But I’m amazed at how many pull requests we receive even without any building or testing instructions whatsoever.

Just from looking at our repository, how do new developers know how to build the script?

As I mentioned before, we switched our build process over to Component. It’s worlds ahead of our old build process, but still hasn’t gained widespread adoption yet (though we’re betting it will).

Considering that most of them have never seen a Component-based repository, my best guess is that they learn through our Makefile. Running make forms a natural entry point to see how a new project works. It’s become the de facto instruction manual for editing code.

We use long flags in our scripts, and give each command a descriptive name. By using Component to structure and build our library, we can essentially defer to their documentation before writing any of our own. Once you understand how Component works, you know how any Component-based repo is built.

We’ve done a lot to clean up our Makefile through the different iterations of ourcode. Remember, your build and test processes are part of your code as well. They should be clean and readable as they are the starting point newcomers.

6. Keep iterating on your process.

Notice how all the tips I’ve mentioned are about streamlining your process? That’s because maintaining a popular repository is all about staying above water. You’ll be making lots of little changes throughout the day as new issues are filed, and if you don’t optimize your development process all of your free time will evaporate.

Not only that, but the quality of your library will suffer. Without a good build system, automated testing, and a clean codebase, fixing small bugs becomes a chore, so issues start taking longer and longer to resolve. No one wants that.

We’ve learned a lot when it comes to managing a repo of our size, but we still have a long ways to go in terms of managing a really big project. Managing a large open source project takes a lot of work. As the codebase grows, it will be harder and harder to make major overhauls of the code. More and more people will start depending on it, and the number of pull requests will start growing.

We still have a few more TODOs that will hopefully make maintaining Analytics.js even easier:

  • We want to split up our tests even more to make them as manageable as possible. Right now the file sizes are getting pretty out of control, which means it’s hard for newcomers to keep everything in their head at once.

  • Add better contributor documentation. It’s kind of ridiculous that we haven’t done this yet. We’re surprised we’ve even gotten any pull requests at all without it, so this is very high on our list.

  • Start pull requesting every change we make to the repository ourselves as well. This way we can always peer review each other’s changes, and other contributors can get involved with discussions.

Lots of these tactics come from the Node.js source, which has great guidelines for new contributors.

Every Node.js commit is first pull requested, and reviewed by a core contributor before it is merged. New features are discussed first as issues or pull requests, so multiple opinions are considered. Node commit logs are clean, yet detailed. They have an extensive guide for new contributors, and a linter to serve as a rough style guide.


No project is perfect, but learning these lessons firsthand has helped us adopt better practices across the rest of our libraries as well. Hopefully, they’ll help you too!

Got open source lessons of your own, or projects which do a particularly good job of this? We’d seriously love to hear them, or post them up in the comments on Hacker News.
