Bad data is a real problem for your company. It can lead to extra work, a bad customer experience, and even data breaches. Fortunately, bad data can be resolved and prevented with a proper data governance strategy.
Bad data can be a real problem for your company if you allow it to sneak into your data stream. According to estimates, the side effects of bad data could cost your company as much as 30% of your yearly revenue.
Fortunately, bad data can be resolved, and even prevented, with a proper data governance strategy.
What is data governance?
Data governance is the process and framework an organization uses to maintain the quality, consistency, and security of its data. It prevents bad data by setting clear standards and policies for how data that enters the pipeline is named and organized.
What is a data governance strategy?
A data governance strategy (also called a data governance program) dictates how an organization names, categorizes, and organizes data, and makes it accessible to the right stakeholders. Your data governance strategy is your plan of attack to prevent your company from collecting bad data. It’s as important to your company as your product strategy, your pricing strategy, or your sales strategy. Without a data governance strategy, you won’t have a good understanding of what data is collected and why (and ultimately you’ll make less-informed business decisions).
An adequate data governance framework also ensures data security and regulatory compliance with laws like the GDPR.
Preventing bad data from entering your data stream comes down to mastering the four components of data governance: standardize, diagnose, defend, and transform.
Your data governance strategy will help you build a plan for each of those components.
The four components of a data governance strategy
Before you can develop your strategy, you need to fully understand what each of those four components means. Each of these data governance practices is designed to give you clear, consistent data that’s usable.
We’ll break down each component so that you understand what’s involved with it and why:
Standardize
The most important part of your data governance strategy is creating standardized naming conventions for data-collection events. Doing this will prevent you from losing data, creating siloed data, or having redundant data, all of which can be serious problems.
Standardizing your naming conventions allows your data-collection events to be consistent across platforms. You’ll need to be specific about whether you’re going to use upper- or lower-case letters, dashes or underscores, and spaces or no spaces between words. If you don’t take the time to detail this, you’ll have several data-collection events that describe the same action but differ only in formatting, such as multiple variants of “sign up” that mix capitalization, underscores, and spacing.
If you end up with that, when it comes time to analyze your data, you won’t know which version of “sign up” you should be using for correct analysis.
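To make the naming problem concrete, here is a minimal Python sketch that checks event names against a convention and groups formatting variants of the same event. The convention used here (Title Case words separated by single spaces) and the sample event names are assumptions made for illustration; substitute whatever rules your own tracking plan defines.

```python
import re
from collections import defaultdict

# Assumed convention for this sketch: Title Case words separated by single
# spaces, e.g. "Signed Up". Your own convention may differ.
CONVENTION = re.compile(r"[A-Z][a-z0-9]*( [A-Z][a-z0-9]*)*")


def follows_convention(name: str) -> bool:
    """Return True if the event name matches the assumed convention."""
    return bool(CONVENTION.fullmatch(name))


def canonical_key(name: str) -> str:
    """Strip spaces, dashes, and underscores and lowercase the name,
    so formatting variants of one event compare equal."""
    return re.sub(r"[\s_\-]+", "", name).lower()


def group_variants(names):
    """Group names that differ only in formatting; keep groups with 2+ variants."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical_key(name)].append(name)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Hypothetical event names, invented for illustration.
observed = ["Signed Up", "signed_up", "signed-up", "SignedUp", "Account Created"]
duplicates = group_variants(observed)
nonconforming = [name for name in observed if not follows_convention(name)]
```

Here `duplicates` collects the four spellings of the sign-up event under one key, and `nonconforming` lists the names that would need to be renamed to match the convention.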
Once you’ve developed your standard naming conventions, make sure you log everything in your tracking plan. Tracking plans explain which data-collection events are currently named, where they live, and how future events will be named. Tracking plans will also help you understand the status of each data-collection event.
Here’s a simple tracking plan that we’ve put together as an example.
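If you keep your tracking plan in code rather than a spreadsheet, it might look something like the sketch below. The field names, statuses, and every event other than “Account Created” are hypothetical and chosen only to illustrate the idea of recording each event’s name, location, and status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedEvent:
    name: str         # standardized event name
    location: str     # where the event is expected to fire
    description: str  # why the event is collected
    status: str       # assumed statuses: "live", "planned", "deprecated"


# Illustrative entries only; not a real tracking plan.
TRACKING_PLAN = [
    PlannedEvent("Account Created", "/signup", "User finishes creating an account", "live"),
    PlannedEvent("Signed In", "/sign-in", "User signs in to an existing account", "live"),
    PlannedEvent("Plan Upgraded", "/billing", "User moves to a paid plan", "planned"),
]


def events_with_status(plan, status):
    """Return the names of all planned events that have the given status."""
    return [event.name for event in plan if event.status == status]


live_events = events_with_status(TRACKING_PLAN, "live")
```

Keeping status alongside each event makes it easy to answer questions such as which events should already be sending data.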
Diagnose
Now that you’ve created standardized naming conventions, you need to diagnose the quality of your active data events to make sure that they’re working properly.
If your events are not collecting data properly, you’re leaving your company open to missing or lost data. For example, say you’re collecting data when an end-user creates a new account on your platform. The data event is called “Account Created,” and its location is your signup page, /signup.
But because of an oversight, you accidentally used the URL /sign-in as the location for this event. As a result, all of the collected data is wrong, and the correct data was never collected. If your data governance strategy had a plan to diagnose problems like that, you would’ve caught it before any data collection happened.
Diagnosing data errors can be done manually by triggering each one of your events and monitoring the result, but that process is time-consuming and is itself open to problems. Instead, you could simply use a testing tool like Protocols. That tool can test all of your events in minutes, rather than the hours it would take if you did it manually.
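To show the kind of check a diagnosis step performs, here is a minimal Python sketch that compares events captured during a test run against the locations your tracking plan expects. It is not how Protocols or any particular tool works; the `diagnose` function and the event dictionary shape are assumptions made for this example.

```python
# Expected location for each planned event ("Account Created" on /signup,
# as in the example above).
EXPECTED_LOCATIONS = {"Account Created": "/signup"}


def diagnose(captured_events, expected_locations):
    """Return a list of human-readable problems found in a test run.

    captured_events: list of dicts with "name" and "location" keys
    (a hypothetical shape chosen for this sketch).
    """
    problems = []
    seen = set()
    for event in captured_events:
        name, location = event["name"], event["location"]
        expected = expected_locations.get(name)
        if expected is None:
            continue  # events outside the plan are a separate concern
        seen.add(name)
        if location != expected:
            problems.append(f"{name}: fired from {location}, expected {expected}")
    # Planned events that never fired are also a problem.
    for name in expected_locations:
        if name not in seen:
            problems.append(f"{name}: never fired during the test run")
    return problems


# The misconfigured event from the example: it fires on /sign-in instead of /signup.
test_run = [{"name": "Account Created", "location": "/sign-in"}]
report = diagnose(test_run, EXPECTED_LOCATIONS)
```

Running a check like this before launch would flag the misplaced “Account Created” event instead of letting it collect the wrong data.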
Defend
The next component of your data governance strategy is how you’re going to defend against rogue data events.
Rogue data events often happen simply because of oversight. Someone created an event that’s not approved in your tracking plan, and it started collecting data. Rogue events can be a problem if they’re unchecked because they’ll lead to undefined data cluttering up your data warehouse.
The only practical way to prevent this is to use a data governance tool that can automatically catch those unapproved data-collection events. Data governance tools do this by comparing incoming data to your tracking plan.
If the incoming data doesn’t match your tracking plan, you can either prevent that data from being stored in your data storage systems, or you can route it to an isolated database for review.
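The block-or-quarantine decision can be sketched in a few lines of Python. This is a simplified illustration, not any specific tool’s implementation: the approved event names, the event dictionary shape, and the `route_events` function are all hypothetical.

```python
# Event names approved in the tracking plan (illustrative values).
APPROVED_EVENTS = {"Account Created", "Signed In"}


def route_events(incoming, approved):
    """Split incoming events into those that match the tracking plan
    and those that should be held in an isolated store for review."""
    accepted, quarantined = [], []
    for event in incoming:
        if event["name"] in approved:
            accepted.append(event)
        else:
            quarantined.append(event)
    return accepted, quarantined


incoming = [
    {"name": "Account Created", "user_id": 1},
    {"name": "btn_click_test", "user_id": 2},  # rogue event, not in the plan
]
accepted, quarantined = route_events(incoming, APPROVED_EVENTS)
```

To block rogue events outright instead of reviewing them, you would simply discard the `quarantined` list rather than storing it.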
Transform
The first three components of your data governance strategy are about planning for the future. But the fourth component is about looking at the past.
Chances are, you’ve been collecting data prior to implementing your data governance strategy. That means you’re going to have nonstandard naming conventions and data that isn’t formatted properly.
This component is designed to help you transform all past data events that aren’t named properly. You’ll need to do a comprehensive review of all of your data-collection events to make sure you have a good understanding of where your data is coming from and why.
Like the other steps, this can be done manually, but it will be very time-consuming. Instead, consider using a data governance tool.
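At its core, transforming past events means mapping legacy names to their standardized equivalents. The Python sketch below shows one way to do that under stated assumptions: the legacy names, the mapping, and the `transform` function are invented for illustration, and a real migration would also need to handle event properties and backfills.

```python
# Hypothetical mapping from legacy event names to standardized ones.
LEGACY_TO_STANDARD = {
    "signed_up": "Signed Up",
    "signup": "Signed Up",
    "acct-created": "Account Created",
}


def transform(events, mapping):
    """Rename legacy events to their standard names.

    Returns (transformed, unmapped): events that now carry a standard name,
    and events with unknown names that need manual review.
    """
    standard_names = set(mapping.values())
    transformed, unmapped = [], []
    for event in events:
        name = event["name"]
        if name in mapping:
            # Copy the event so the historical record is not mutated in place.
            transformed.append({**event, "name": mapping[name]})
        elif name in standard_names:
            transformed.append(event)  # already standardized
        else:
            unmapped.append(event)
    return transformed, unmapped


history = [
    {"name": "signed_up", "user_id": 1},
    {"name": "Signed Up", "user_id": 2},
    {"name": "mystery_event", "user_id": 3},
]
transformed, unmapped = transform(history, LEGACY_TO_STANDARD)
```

Anything that lands in `unmapped` is a signal that your review of past events is incomplete and the mapping needs another entry.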
Why data governance is essential to data management
Data governance and data management are closely related, but they’re not the same. Data governance helps to standardize your data and prevent bad data from entering your data stream. Data management is a broader concept that implements the standards set by your data governance strategy but also manages your overall data strategy.
Data Republic offers a simple explanation of the difference: “Data management is the implementation of architectures, tools, and processes to achieve stated data governance objectives.”
Without a data governance strategy, you won’t have any data standards for data management to implement. On the other hand, if you don’t have data management, implementing your data standards will be very difficult.
If you want proper data management, you need to have a clearly defined data governance strategy.
Data governance prevents bad data
Preventing bad data comes down to having data policies to deal with it. That’s exactly what your data governance strategy is designed to do. Good data governance means coming up with processes to handle naming standardization, data quality diagnosis, rogue event defense, and data transformation.