Segment Protocols enables Business Tier users to proactively prevent and transform bad data so it is clean and consistent across tools. The first major steps to prevent bad data are:
Creating a tracking plan to define what bad data looks like
Analyzing which data points are not adhering to the tracking plan
Segment Protocols provides tools to help you monitor which data points are not adhering to the tracking plan. We also provide the ability to analyze bad data in downstream analytics and business intelligence tools. In this guide, we will use both capabilities to build a data quality dashboard that shows which events violate your tracking plan.
For this recipe, we’ll use Mixpanel to build the dashboard, but the concepts are similar for other Segment-supported analytics tools. We’ll set up this dashboard for one Source, but other Sources can be added to the same dashboard by repeating steps 1-4.
Step 1: Create a Tracking Plan
Log into the Segment web application. If you have not already connected a Source, do so now.
Once the Source is enabled, head to the Protocols tab, and click New Tracking Plan:
Give your tracking plan a name. If your Source has already sent in data, you can build your tracking plan from the events it has already received. In this case, click Import events from source. If this is a new Source with no events yet, select Add events manually:
Select the Source, then click Import and save:
The tracking plan will auto-populate with events that have already flowed in. If you click the arrow next to any event, you can see all properties or traits that have been seen:
From here, you can choose to require certain properties or even enforce the data type:
You may also add events and properties to the tracking plan manually. Refer to the Segment documentation for a complete guide to creating your tracking plan.
Once you are happy with your tracking plan, click the dropdown that says 0 connected sources, then Connect source. Select your Source, then click Next:
Review the consequences, then click Connect source:
Your tracking plan should now be live for the selected Source.
Step 2: Configure the Schema
Now, we’ll choose what happens to events that violate the tracking plan. Potential reasons for violation include:
Missing required properties. For example, for an order_completed event, if the price property is set to required and missing, the event is in violation.
Invalid property value data types. For example, if an order ID is required to be a String but an integer is received, the event is in violation.
Property values that do not pass applied conditional filtering. For example, for an email_opened event, if the campaign_id property does not match the required regular expression, the event is in violation.
Protocols can also monitor for unplanned events, properties, and traits that are not explicitly listed in the tracking plan.
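To make these checks concrete, here is a minimal sketch of how an event could be compared against a tracking plan. The plan structure, the find_issues function, and the campaign_id pattern are all hypothetical and only illustrate the three violation types and the unplanned cases described above. This is not how Protocols stores or evaluates a tracking plan.

```python
import re

# Hypothetical tracking plan: event name -> property rules.
# The structure and the campaign_id pattern are illustrative assumptions.
TRACKING_PLAN = {
    "order_completed": {
        "price": {"required": True, "type": (int, float)},
        "order_id": {"required": True, "type": str},
    },
    "email_opened": {
        "campaign_id": {"required": False, "type": str, "pattern": r"cmp_\d+"},
    },
}


def find_issues(event_name, properties):
    """Return a list of violations and unplanned items for one event."""
    rules = TRACKING_PLAN.get(event_name)
    if rules is None:
        return [f"unplanned event: {event_name}"]

    issues = []
    for name, rule in rules.items():
        if name not in properties:
            if rule.get("required"):
                issues.append(f"missing required property: {name}")
            continue
        value = properties[name]
        if not isinstance(value, rule["type"]):
            issues.append(f"invalid type for property: {name}")
            continue
        pattern = rule.get("pattern")
        if pattern and not re.fullmatch(pattern, value):
            issues.append(f"pattern mismatch for property: {name}")

    # Properties that are present but not listed in the plan are unplanned.
    for name in properties:
        if name not in rules:
            issues.append(f"unplanned property: {name}")
    return issues


# An order with no price and an integer order ID has two violations.
print(find_issues("order_completed", {"order_id": 1234}))
```

In this sketch an unplanned event is reported on its own, because there are no rules to check its properties against.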
We can choose to allow or block violations and unplanned events:
Allow: send to the Destination, despite being unplanned or in violation
Block: do not send to the Destination
For unplanned or violating event properties and traits, we can additionally choose to omit those properties or traits while sending the non-violating and planned properties/traits to Destinations.
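The difference between allowing, blocking, and omitting can be sketched as a small function. The config keys and the apply_schema_config name are assumptions made up for this illustration. In Segment, these choices are made in the Schema Configuration UI, not in code.

```python
def apply_schema_config(event, planned_properties, has_violation, config):
    """Return the event to deliver, or None if the event is blocked.

    config keys (hypothetical):
      "violations": "allow" or "block"
      "unplanned_properties": "allow" or "omit"
    """
    if has_violation and config["violations"] == "block":
        return None  # Block: nothing is sent to the Destination.

    properties = dict(event["properties"])
    if config["unplanned_properties"] == "omit":
        # Omit: drop unplanned properties but still deliver the rest.
        properties = {
            name: value
            for name, value in properties.items()
            if name in planned_properties
        }
    return {**event, "properties": properties}


event = {"event": "order_completed", "properties": {"price": 9.99, "coupon": "X"}}
config = {"violations": "block", "unplanned_properties": "omit"}

# The unplanned "coupon" property is removed; "price" is still delivered.
print(apply_schema_config(event, {"price"}, False, config))
```

Omitting is the middle ground: the Destination still receives the event, but without the properties that are not in the tracking plan.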
To set this up in your Segment workspace, navigate to the desired Source and select Settings, then Schema Configuration:
In the first section (Unplanned Events, Properties and Values), use the matrix to decide which values are allowed, blocked, or omitted:
Note that this is configured per Source, so the settings applied here will not apply to other Sources.
Step 3: Forward Violations & Blocked Events
To demonstrate the concept of violation forwarding, we are going to set up an additional Source, which we will call a Violation Source. The Violation Source receives violations and/or blocked events from the standard Source. From the Violation Source, events can be sent to an analytics Destination (such as Mixpanel) for monitoring.
To set this up, we are going to create a new Source. From the Connections tab, select Sources then Add Source:
Give it a name such as Violation Source, then click Add Source:
We then want to return to the Schema Configuration page from step 2. From the Segment webapp, navigate back to the standard Source (not the Violation Source) and select Settings, then Schema Configuration:
Scroll to Forwarding Settings, then enable forwarding for Violations and/or Blocked Events and Traits, and select the Violation Source.
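Conceptually, forwarding turns each violation into an ordinary event that arrives in the Violation Source. The sketch below shows one way to picture such an event. The event name, the property names, and the to_violation_event helper are assumptions for illustration only. This guide does not describe the payload Segment actually forwards, so check the Debugger tab of the Violation Source to see the real shape.

```python
from datetime import datetime, timezone


def to_violation_event(source_slug, event_name, violation_type, field):
    """Wrap one violation as a track-style event for the Violation Source."""
    return {
        "type": "track",
        "event": "Violation Generated",  # assumed event name
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "properties": {  # assumed property names
            "sourceSlug": source_slug,
            "eventName": event_name,
            "violationType": violation_type,
            "violationField": field,
        },
    }


# A missing required price on order_completed, reported by a Source
# with the made-up slug "web-store".
forwarded = to_violation_event(
    "web-store", "order_completed", "Required", "properties.price"
)
print(forwarded["event"], forwarded["properties"]["eventName"])
```

Because the forwarded item is itself an event, any analytics Destination connected to the Violation Source can count and chart it like any other event.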
Step 4: Connect Your Analytics Tool
Now, we’ll set up Mixpanel (the Segment Destination where our dashboard will live) and connect it to the Violation Source. This will be a brief overview of setting up Mixpanel, but you can refer to the full instructions within the Mixpanel (Actions) Destination Segment Documentation.
From your Segment workspace under Connections, click Destinations, then New Destination:
Search for and select the Mixpanel (Actions) Destination. Click “Configure Mixpanel”:
Connect the Violation Source, then click Next:
Give your Destination a name, such as Mixpanel Violation Destination, then click Save:
If you haven’t already, in a separate tab, create a free Mixpanel account.
Log into your Mixpanel account, then go to the Mixpanel project settings and copy the unique token and API secret.
Back in Segment, in the Basic Settings tab of the Destination, paste the token and secret. Turn on the switch for Enable Destination, then click Save Changes:
At this point, ensure data is flowing in from your standard Source using the Debugger tab.
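If your standard Source is not receiving traffic yet, one way to test the pipeline is to send a deliberately incomplete event through Segment's HTTP Tracking API. The sketch below builds a track request using only the Python standard library. It assumes the default api.segment.io endpoint, a placeholder write key, a made-up user ID, and a tracking plan in which order_completed requires price. Adjust these to match your workspace.

```python
import base64
import json
import urllib.request

# Default HTTP Tracking API endpoint (assumed; your workspace region may differ).
TRACK_URL = "https://api.segment.io/v1/track"


def build_track_request(write_key, user_id, event, properties):
    """Build (but do not send) a POST request for one track call."""
    body = json.dumps(
        {"userId": user_id, "event": event, "properties": properties}
    ).encode("utf-8")
    # The write key is the Basic auth username; the password is empty.
    token = base64.b64encode(f"{write_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        TRACK_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# "YOUR_WRITE_KEY" is a placeholder for the standard Source's write key.
# The event omits "price" on purpose so that it violates the tracking plan.
request = build_track_request(
    "YOUR_WRITE_KEY", "test-user-1", "order_completed", {"order_id": "A-1001"}
)

# Uncomment to actually send the event:
# urllib.request.urlopen(request)
```

After sending, the event should appear in the Debugger of the standard Source, and, if forwarding is enabled, a corresponding violation should appear in the Debugger of the Violation Source.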
Step 5: Set Up Your Dashboard
In this step, we’ll create a basic data quality dashboard showing:
1) all violating events over time
2) the events ranked by number of violations.
This will help you understand the recency and total amount of bad data, respectively, broken down by event. You are welcome to create more, but this is a solid starting point.
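The two charts correspond to two simple aggregations over the forwarded violations. As a rough sketch, assuming each violation is reduced to a date and the name of the violating event (the sample records below are made up for illustration), the calculations look like this:

```python
from collections import Counter

# Made-up sample data: (ISO date, name of the violating event).
violations = [
    ("2024-05-01", "order_completed"),
    ("2024-05-01", "email_opened"),
    ("2024-05-02", "order_completed"),
    ("2024-05-02", "order_completed"),
]

# Chart 1: number of violations per day, in chronological order.
per_day = sorted(Counter(day for day, _ in violations).items())

# Chart 2: events ranked by total number of violations.
ranked = Counter(name for _, name in violations).most_common()

print(per_day)  # [('2024-05-01', 2), ('2024-05-02', 2)]
print(ranked)   # [('order_completed', 3), ('email_opened', 1)]
```

Mixpanel performs these aggregations for you. The sketch only shows what each chart is counting.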
Log in to Mixpanel, then select New Dashboard in the upper-left corner:
Give the dashboard a name, then click Add, then Insights Report:
Give the chart a name, such as All Violating Events, then under Events & Cohorts, select Your Top Events. Click Save:
If you go back to the Violations dashboard, you should see the first chart. Click Add content to create another chart. Select Insights Report again:
Give it a name, such as All Events with Violations. Under Events & Cohorts, select Your Top Events again.
On the right side, choose Bar chart, then click Save:
Back in your dashboard, you’ll see both charts. You did it!
In this recipe, we created a tracking plan, defined what to do with violations, and forwarded violations to Mixpanel to create a centralized data health dashboard. There are many more ways to understand your data health, such as forwarding violations to Slack, or using Typewriter to enforce your tracking plan in code before it is pushed to production. To learn more about the possibilities of Segment Protocols, refer to the documentation and this video.