Why You Should Own Your Data

If you’re just starting out with analytics, building an end-to-end analytics infrastructure is probably overkill (for reference, it cost a company with more than 500 employees 13 months and $240k to build theirs). However, having access to the raw underlying data is strongly encouraged nonetheless, since one day it can help you answer granular questions about your business.

In this lesson, we’ll share why you might want to own your data and your options for data storage if you’re using a popular analytics tool on the market.

Own your data or else.

Ivan Kirigin from YesGraph gets to the heart of the issue in his post about owning your data:

If you don’t control your analytics data, you’ll be left with a choice: answer with the tools you have or don’t answer at all. — Ivan Kirigin, “Why You Must Store Your Own Analytics Data”

Answering with your tools will certainly get you pretty far. These services let you easily make funnel reports, graphs, dashboards, etc. with your data so you can get a holistic view of your customers and business. And you can make decisions as quickly as setting up the tool—aside from implementing the tracking, there is no further engineering required.

But at a certain point it will be hard for you to get to the bottom of a question or unusual signal in your data. “Why did that happen?” “How often does this happen?” Unfortunately, it may be difficult or even impossible for some tools to provide that depth and specificity in their reporting. It’s at this point where owning your data gives you the leg up.

You might not be there yet, but being unable to answer a critical question about your business is harmful in these two ways:

  1. Loss of time. This is the obvious one. Competitors who can answer these questions now have a chance to close the gap.
  2. Aversion to questions that are hard or costly to answer. This one is trickier, but also more dangerous. Every time you hit an unanswerable question, you’ll weigh the cost of answering it (data dumps, transforming data, etc.). Over time, your organization may begin to avoid hard-to-answer questions in favor of easy ones.

Tales of data ownership.

Here are some stories from other companies that illustrate the importance of owning your own data.


Case Study: Optimizely

Optimizely, an optimization and personalization platform, uses SQL to gain actionable insights on user engagement. Before building and analyzing their own data warehouse, they relied on Excel, Salesforce reports, and Zendesk. However, it was too difficult and time-consuming to measure the metrics necessary to track adoption and usage of their new products.

The team, empowered with direct access to its own data, is currently building a custom dashboard with ChartIO that’ll pull relevant information about a particular customer for their support team. The dashboard will include product usage and engagement, ticket history, and other contextual data to enable their support team to keep customers happy.

Optimizely's support dashboard

More information can be found at the Optimizely, ChartIO Case Study.


Case Study: OneMonth

OneMonth, an online education platform aimed at teaching students practical skills (ranging from web development to growth) within one month, was happily using an assortment of analytics tools to set growth goals and measure progress.

The inevitable day came when the team could no longer ignore the inconvenience and time-suck of asking more granular questions (“What’s the LTV of user persona A based on their course completion rate?”) and answering them by manually stitching together data from disparate, siloed sources. Weekly reports of key metrics took entire mornings to piece together. There were also discrepancies in the data across tools, making it hard to be confident in decision-making.

To improve speed and maintain a single source of truth, the team decided to move all its data into one central location—Amazon Redshift.

Now all data is moved into Redshift: customer event data, production data, Typeform entries, Lesschurn events, Stripe, etc. Figuring out LTV of highly specific cohorts is just a query away. Reports with key metrics are automatically generated and emailed on Sunday evenings, saving the team valuable time.
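To make “just a query away” concrete, here’s a minimal Python sketch of the kind of cohort-LTV rollup that a single SQL query over a warehouse like Redshift performs. The cohort labels and field names are hypothetical, purely for illustration:

```python
# Sketch of a cohort-LTV rollup, the kind of aggregation one SQL query
# over a warehouse would do. Field names and cohorts are hypothetical.
from collections import defaultdict

def ltv_by_cohort(users, payments):
    """Average lifetime revenue per user, grouped by cohort label."""
    revenue = defaultdict(float)
    for p in payments:                       # p: {"user_id": ..., "amount": ...}
        revenue[p["user_id"]] += p["amount"]

    totals, counts = defaultdict(float), defaultdict(int)
    for u in users:                          # u: {"id": ..., "cohort": ...}
        totals[u["cohort"]] += revenue[u["id"]]
        counts[u["cohort"]] += 1
    return {c: totals[c] / counts[c] for c in totals}

users = [
    {"id": 1, "cohort": "completed_course"},
    {"id": 2, "cohort": "completed_course"},
    {"id": 3, "cohort": "dropped_out"},
]
payments = [
    {"user_id": 1, "amount": 99.0},
    {"user_id": 1, "amount": 49.0},
    {"user_id": 2, "amount": 99.0},
    {"user_id": 3, "amount": 49.0},
]
print(ltv_by_cohort(users, payments))
# → {'completed_course': 123.5, 'dropped_out': 49.0}
```

With all the data in one warehouse, this is a single `GROUP BY` over joined tables rather than a morning of copy-pasting between tools.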


Ways to own your data.

If you’re not owning your data yet, fear not! There are a lot of ways for you to keep it near and dear.

Google Analytics, a crowd favorite, is great for slicing and dicing your aggregate visitor data. Unfortunately, the free version of Google Analytics does not give you access to your raw data. Unfiltered, raw analytics data is available only on the premium tier, which starts at $150k/year.

Mixpanel, another event-based analytics platform, provides an API for exporting your data. Hitting their raw data export API is totally free, but note that you can only run one export per project at any given time. Here is their documentation on getting started.
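As a sketch of what that export looks like in practice: Mixpanel’s raw export endpoint returns newline-delimited JSON, one event per line (endpoint and parameter names as documented at the time of writing; check Mixpanel’s current docs before relying on them):

```python
# Sketch of pulling raw events from Mixpanel's raw data export API.
# The response body is newline-delimited JSON, one event per line.
import json

EXPORT_URL = "https://data.mixpanel.com/api/2.0/export/"

def parse_raw_export(body):
    """Parse a newline-delimited JSON export into a list of event dicts."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]

def fetch_events(api_secret, from_date, to_date):
    """Download raw events for a date range (YYYY-MM-DD strings)."""
    import requests  # third-party; pip install requests
    resp = requests.get(
        EXPORT_URL,
        params={"from_date": from_date, "to_date": to_date},
        auth=(api_secret, ""),  # API secret as the basic-auth username
        timeout=300,
    )
    resp.raise_for_status()
    return parse_raw_export(resp.text)

# Offline example of the format the API returns:
sample = '{"event": "Signed Up", "properties": {"distinct_id": "u1"}}\n'
print(parse_raw_export(sample)[0]["event"])  # → Signed Up
```

From there you can load the parsed events into your own database or warehouse on a schedule.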

Segment — If you’re one of our customers, there are three ways to do this in near real-time: webhooks, S3, and Warehouses. Our webhooks integration sends Segment data to an endpoint you host; many of our customers use webhooks to keep a copy of their data and power internal apps. This integration is free, depending on how much data you send us. If you don’t want to deal with setting up storage, our S3 integration automatically copies your Segment data to your S3 bucket every hour; it’s available on our growth plan at $449/month. Lastly, you can check out our Warehouses offering, where we programmatically schematize and load your web and mobile data into your own Amazon Redshift or Postgres database.
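The webhook option is simpler than it may sound: you just need an HTTP endpoint that accepts POSTed JSON messages and stores them. Here’s a minimal stdlib-only Python sketch (the `type` and `userId` fields follow Segment’s message spec; the in-memory list is a stand-in for your real storage):

```python
# Minimal sketch of a webhook endpoint that keeps a copy of incoming
# Segment events. EVENT_LOG is a stand-in for real storage (file, queue,
# or database); field names follow Segment's message spec.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

EVENT_LOG = []

def record_event(event):
    """Store one Segment message and return a small summary for logging."""
    EVENT_LOG.append(event)
    return {"type": event.get("type"), "userId": event.get("userId")}

class SegmentWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        record_event(json.loads(self.rfile.read(length)))
        self.send_response(200)
        self.end_headers()

# To serve for real:
#   HTTPServer(("", 8080), SegmentWebhook).serve_forever()

# Offline example of handling one message:
print(record_event({"type": "track", "userId": "u1", "event": "Course Started"}))
# → {'type': 'track', 'userId': 'u1'}
```

Swap the in-memory list for writes to a database or object store, and you have a running copy of every event Segment sends you.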

Keen helps companies access their raw data for free via Extractions.

Amplitude is a predictive analytics service. They are one of the very few tools that provide raw data export access on the free tier. Learn more about their Export API.

KISSmetrics, another popular customer journey tool, provides raw data export for free. Here is the documentation to help you get started.

The last option is building the entire customer data pipeline yourself. Here are a few data pipeline management technologies to consider if you’re interested: Luigi, Snowplow, and Airflow. We won’t dive into the nitty-gritty details here (wait for the course: Leveraging Raw Data!), but here are some awesome blog posts to point you in the right direction. One caution: only go this route if you have serious engineering resources to dedicate to analytics.

Control your fate—and data.

There comes a day when your analytics needs are no longer served adequately without raw, direct access to your customer data, with flexible interfaces like SQL. Until then, it’s smart to leverage hosted analytics tools and their reporting as much as possible, so long as you also own or have access to the raw data for when the time comes!
