Building Trust in Your Data

By Kevin White, Head of Growth Marketing @ Segment

Data can be your best friend or your worst enemy.

When it’s collected in a standardized way and consistent across the tools where it’s used, data acts as your foundation for unlocking growth. As a result, teams who rely on data will trust that it’s accurate and understand what it’s telling them so they can use it effectively.

On the other hand, messy data causes less-than-ideal outcomes like misdirected communication with customers, confusing user experiences, ill-informed strategy decisions, and delays in time to business insight.

To gain widespread trust in data across your organization, it’s critical to lay the right infrastructure and process for how it’s collected, cleaned, governed, and acted on.

In this chapter, we’ll outline a path to do just that. We’ll start with a method for getting everyone on the same page, share proven frameworks that you can use for data collection, and provide a few pointers for building trust in data across your organization.

Get everyone on the same page

Your first step to cleaning up your data? Naming conventions.

Naming conventions ensure your data is collected in a uniform fashion so that the teams who use it are all on the same page and speaking the same language.

You may not realize it, but there are many ways to name the same user interaction. Take, for example, the simple action of a user signing up for a newsletter. You could implement the event as “Sign up,” “Signup,” or “User Signed Up.”


Without a consistent, agreed-upon naming convention, you’d inevitably collect a mixed bag of data that varies with the preferences of whoever implemented it. In the example above, your teammates would be left guessing which event actually corresponds to a user signing up for your newsletter. This may not seem like a big deal, but imagine how confusing it will become as your traffic, customer base, and number of meaningful events grow over time.

To avoid getting into this situation and enable your company to actually put data to use, there are two simple things you can do today:

  • Align on a framework for naming your events and properties (down to the casing)
  • Put a process in place to enforce your company’s framework

When it comes to naming conventions, we highly recommend using the Object → Action framework as it’s easy to understand and get everyone to adhere to.

For example, imagine you’re building a music app that functions in a way that’s similar to Spotify. To understand user activity and the context of that activity, you can apply an Object → Action naming convention. Here’s how to use the Object → Action framework when a song is played: First, choose your objects (e.g., Song). Then define the actions your users can perform on those objects (e.g., Played). When you put it all together, your event reads ‘Song Played’.
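The naming steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the helper names and the Title Case rule encoded in the pattern are assumptions chosen for the example.

```python
import re

# Assumed convention for this sketch: two or more Title Case words,
# object first and action last, e.g. "Song Played".
EVENT_NAME_PATTERN = re.compile(r"[A-Z][a-z]+( [A-Z][a-z]+)+")


def event_name(obj: str, action: str) -> str:
    """Build an event name as '<Object> <Action>' in Title Case."""
    return f"{obj.strip().title()} {action.strip().title()}"


def follows_convention(name: str) -> bool:
    """Return True if the whole name matches the assumed Title Case pattern."""
    return EVENT_NAME_PATTERN.fullmatch(name) is not None
```

A check like `follows_convention` only covers casing and word count. Whether the first word is really an object and the last an action still needs a human reviewer.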

The Object-Action Framework makes it easy to:

  • Analyze a particular feature’s performance
  • Quickly scan a list of events to find what you’re looking for
  • Impose a standard that’s easy for teams to understand

Of course, the object-action framework isn’t the only way to do this.

You can use any order of actions and objects, and any type of casing. You can also use the present or past tense. What really matters is that you keep data collection consistent!

Developing your data dictionary

We’ve helped thousands of companies implement customer data collection and found that the most successful teams have one thing in common: they use a data dictionary or data collection spec.

A data collection spec clarifies what user actions to collect, where those events live in the code base, and why those events are necessary from a business perspective. These documents of record typically live in a spreadsheet. They serve both as a project management tool and as collateral to align your team (and the entire organization, for that matter) around what data to measure success by.
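Although these specs usually live in a spreadsheet, the same three questions (what, where, why) can be captured as a small data structure. The sketch below is illustrative only: the `SpecEntry` class, its field names, and the sample rows are assumptions, not a Segment format.

```python
from dataclasses import dataclass, field


@dataclass
class SpecEntry:
    event: str       # what user action to collect
    location: str    # where the event lives in the code base
    why: str         # the business reason for collecting it
    properties: dict[str, type] = field(default_factory=dict)


# Hypothetical rows; a real dictionary would come from your own spec.
DATA_DICTIONARY = [
    SpecEntry(
        event="User Signed Up",
        location="signup form submit handler",
        why="Measures top-of-funnel conversion",
        properties={"plan": str, "source": str},
    ),
    SpecEntry(
        event="Song Played",
        location="player component",
        why="Core engagement metric",
        properties={"title": str, "duration_seconds": int},
    ),
]

# Index by event name so other code can look up an entry quickly.
SPEC_BY_EVENT = {entry.event: entry for entry in DATA_DICTIONARY}
```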

When first getting started, it’s helpful to limit data collection to a handful of core user events. These events should also have a rich set of properties that give context on the action taking place. For example, if one of your core events is a user signing up for a free trial, named ‘User Signed Up’, you’d probably also want to collect properties that give context on who is performing that action, where they are coming from, and so on.

Here’s an example of what that event could look like in your code base:

[Image: analytics tracking code example]
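As a rough sketch, a ‘User Signed Up’ event with context properties could look like the following. The `track` function here is a stand-in that only records calls in a list; it is not a real client library, and the user ID and property names are hypothetical.

```python
from typing import Any

# Stand-in for an analytics client. A real integration would call the
# client library's own track method instead of appending to a list.
sent_events: list[dict[str, Any]] = []


def track(user_id: str, event: str, properties: dict[str, Any]) -> None:
    """Record an event together with its context properties (stub)."""
    sent_events.append(
        {"userId": user_id, "event": event, "properties": properties}
    )


track(
    user_id="user_123",            # hypothetical identifier
    event="User Signed Up",
    properties={
        "plan": "free_trial",      # what they signed up for
        "source": "homepage",      # where they came from
        "referrer": "newsletter",
    },
)
```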

When getting started, follow these rules when spec-ing out your data dictionary to keep it neat, tidy, and semantically useful:

  • Don’t create event names dynamically
  • Don’t create events to track properties
  • Don’t create property keys dynamically
  • Make sure every event helps you answer a question about your business
  • Start with your core customer lifecycle to construct your funnel
  • Only add events as you feel they’re missing
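The first two rules are easiest to see side by side. In this sketch (function names and song titles are made up for illustration), the anti-pattern bakes a varying value into the event name, while the preferred version keeps one stable name and moves the detail into a property.

```python
def bad_event_name(song_title: str) -> str:
    # Anti-pattern: the song title leaks into the event name, so every
    # song produces a distinct event.
    return f"Song Played - {song_title}"


def good_event(song_title: str) -> dict:
    # Preferred: one stable event name, with the varying detail as a property.
    return {"event": "Song Played", "properties": {"title": song_title}}


titles = ["Intro", "Outro", "Interlude"]
bad_names = {bad_event_name(t) for t in titles}       # one name per song
good_names = {good_event(t)["event"] for t in titles}  # a single name
```

With three songs, `bad_names` holds three distinct event names while `good_names` holds one, which is what keeps an event list scannable as a catalog grows.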

Data dictionary examples

Because data dictionaries can be a somewhat new concept for teams to become acquainted with, we’ve developed sample tracking plans for a variety of industries and use cases.

BASIC DATA DICTIONARY

Here is a simplified version of a data dictionary. We recommend starting with a plan like this before digging into more complicated tracking.

ADVANCED DATA DICTIONARY

Here are the tracking plans we use to help customers get started (and also for our own Segment tracking). Some of the event properties have been trimmed to keep things clean, but everything else is here.

Ensuring data quality and consistency

Any team that uses data benefits from pristine data quality. Product teams can iterate faster and build immersive user experiences with confidence. Analytics teams can build queries without heavy workarounds and inform cross-functional decisions faster than ever. And marketing teams can inspire user action and improve advertising efforts by personalizing messaging according to user behaviors and traits.

Getting your organization to a state of high-quality data that all stakeholders trust takes a combination of alignment, validation, and enforcement.

Let’s dig into each of those...

ALIGNMENT

All teams need to be aligned around the importance of data before it can be clean (or cleaned up) and trusted across your org. This means standardizing data collection with an actionable data collection dictionary.

As mentioned above, we recommend using a “data dictionary” to document and standardize customer activity. We’ve found it helpful to assign a single owner of this document to oversee and enforce data standards throughout your organization. This owner should ensure:

  • Your naming convention and data schema are documented in a way that’s easy for any team across your organization to understand
  • Customer data is collected in a uniform fashion that matches the spec
  • Resources are available and secured to diagnose any areas where “dirty data” is introduced and apply a fix

VALIDATION

Once your data dictionary is agreed upon, set, and implemented, you’ll want to make sure dirty data stays out of the picture. To do so, you’ll need a method for validating how new user actions make their way to your codebase.

Even with rigorous naming conventions and instrumentation instructions, errors will be introduced if engineers don’t receive automated feedback to help them identify and resolve issues during implementation. And when you’re responsible for reviewing thousands of lines of code across dozens of events, it’s inevitable that mistakes will happen.

A single tracking error on a business-critical event, like ‘Lead Captured’, can cost your business hundreds of thousands of dollars. The problem is that these bugs are typically detected weeks or months later, and by that time, the damage has been done.

Time is of the essence, so it’s important to detect mistakes before they make their way to your production environment. Rather than manually comparing event payloads against your data dictionary, you’ll want a way to automatically confirm when data matches your spec and alert you when it doesn’t. There are many ways to go about this, but (naturally) we prefer using our Protocols product to either 1) send a daily digest of current and new violations or 2) enable violation event forwarding to send violations as .track() calls to a Segment Source.
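To make the idea of automatic validation concrete, here is a minimal, tool-agnostic sketch of checking a payload against a spec. It does not reproduce how Protocols works; the `SPEC` contents, the `find_violations` function, and the violation messages are all assumptions for illustration.

```python
# Hypothetical spec: event name -> required properties and their types.
SPEC = {
    "Lead Captured": {"email": str, "source": str},
    "Song Played": {"title": str, "duration_seconds": int},
}


def find_violations(event: str, properties: dict) -> list[str]:
    """Return readable violations; an empty list means the payload matches."""
    if event not in SPEC:
        return [f"unplanned event: {event!r}"]
    expected = SPEC[event]
    violations = []
    # Every property in the spec must be present with the right type.
    for key, expected_type in expected.items():
        if key not in properties:
            violations.append(f"missing property: {key!r}")
        elif not isinstance(properties[key], expected_type):
            violations.append(
                f"wrong type for {key!r}: expected {expected_type.__name__}"
            )
    # Properties that the spec does not mention are flagged as well.
    for key in properties:
        if key not in expected:
            violations.append(f"unplanned property: {key!r}")
    return violations
```

Running a check like this in a test suite or staging environment is one way to surface a misnamed event before it reaches production, rather than weeks later.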

ENFORCEMENT

To really take data quality to the next level, you can implement a system and standards for data enforcement.

There’s a wide range of options to consider when it comes to enforcement. On the lighter side, you may want to block only PII from reaching tools where it can be seen by anyone with access to that tool. Or (in more developed use cases) you may want to completely block all data that does not match your spec or schema from reaching any downstream tools. At first, it’s probably best to start at the lighter end of the enforcement spectrum and slowly make your way to the other end.
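The lighter end of that spectrum can be sketched as a small filter that removes sensitive properties before an event is forwarded. The set of keys treated as PII and the function name are assumptions; what counts as PII depends on your own data and legal requirements.

```python
# Hypothetical list of property keys treated as PII in this sketch.
PII_KEYS = {"email", "phone", "ssn"}


def strip_pii(properties: dict) -> dict:
    """Return a copy of the properties with PII keys removed."""
    return {k: v for k, v in properties.items() if k not in PII_KEYS}
```

Because the function returns a copy, the original payload stays intact for destinations that are allowed to receive it.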

If discarding data from blocked events sounds scary, there are precautions you can take to ensure no data is lost while still enforcing standards. For example, you could configure an isolated data warehouse to receive data that doesn’t meet your enforcement standards. Doing so ensures that no data is lost, and you could retroactively get discarded data into the necessary tools with a bit of analytics and data engineering help.
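The quarantine approach can be sketched as a simple routing step. In this illustration the two lists stand in for real destinations (downstream tools and an isolated warehouse), and the allowed event names are hypothetical.

```python
ALLOWED_EVENTS = {"User Signed Up", "Song Played"}  # hypothetical spec

downstream: list[dict] = []   # stands in for analytics and marketing tools
quarantine: list[dict] = []   # stands in for an isolated warehouse


def route(event: dict) -> str:
    """Send conforming events downstream; keep the rest in quarantine."""
    if event.get("event") in ALLOWED_EVENTS:
        downstream.append(event)
        return "downstream"
    quarantine.append(event)
    return "quarantine"


route({"event": "Song Played", "properties": {"title": "Intro"}})
route({"event": "song_played", "properties": {"title": "Intro"}})
```

Nothing is dropped in this design: the misnamed `song_played` event is kept in `quarantine`, where it can be corrected and replayed later.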
