Data Integrity: What It Is & How to Ensure It

Kelly Kirwan on October 26th 2021

Remember the childhood game of telephone? You whisper a word like “banana” to someone, then they repeat that word in someone else’s ear, and so on, down the line. The word tends to change with each repetition, and by the end of the game, the last person in the chain may end up saying “pants.” Just like that, you’ve experienced a loss of data integrity.

Now take a step back and imagine a game of telephone on a global scale that’s growing exponentially. In its Annual Internet Report, Cisco projects that by 2023 the number of networked devices will grow to more than three times the global population. And the more data in circulation, the harder it is to manage and understand.

Big data—how we collect it, what we do with it, and how we protect or share it—is a growing concern for companies, researchers, governing bodies, and individuals alike. Ensuring data integrity is vital for business growth and customer trust.

Below we cover the essentials of data integrity:

  • What is data integrity?

  • Why data integrity is critical for business success

  • 3 important steps to ensure data integrity

  • Guarantee the integrity of your data throughout the data lifecycle with Segment

  • Data integrity FAQs

What is data integrity?

Data integrity is the accuracy, completeness, and consistency of data across multiple sources, teams, and formats. Maintaining it is a company’s continual process of gathering and checking that data. A data integrity audit asks: Is the data accurate? Was it damaged or accidentally lost? Has it been changed in any way? Are data sets complete and free of duplicates? The answers add up to a data set’s integrity, or lack thereof.

Companies ensure data integrity through quality control, integration, validation, de-duping, preservation and storage, privileged access, cybersecurity, and audits.
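To make the audit questions above concrete, here is a minimal Python sketch that checks two of them, completeness and duplication, over a list of customer records. The field names (`user_id`, `email`, `signup_date`) and the rule that `user_id` must be unique are assumptions for illustration, not part of any particular product.

```python
from dataclasses import dataclass

# Hypothetical required fields for a customer record; adjust to your own schema.
REQUIRED_FIELDS = ("user_id", "email", "signup_date")


@dataclass
class AuditReport:
    total: int
    incomplete: list  # indexes of records missing a required field
    duplicates: list  # indexes of records whose user_id was already seen


def audit(records: list[dict]) -> AuditReport:
    """Flag incomplete and duplicate records in a list of dicts."""
    incomplete, duplicates = [], []
    seen_ids = set()
    for i, record in enumerate(records):
        # Completeness: every required field must be present and non-empty.
        if any(not record.get(field) for field in REQUIRED_FIELDS):
            incomplete.append(i)
        # Duplication: the same user_id should appear only once.
        uid = record.get("user_id")
        if uid is not None:
            if uid in seen_ids:
                duplicates.append(i)
            seen_ids.add(uid)
    return AuditReport(total=len(records), incomplete=incomplete, duplicates=duplicates)


records = [
    {"user_id": "u1", "email": "ana@example.com", "signup_date": "2021-10-01"},
    {"user_id": "u2", "email": "", "signup_date": "2021-10-02"},  # missing email
    {"user_id": "u1", "email": "ana@example.com", "signup_date": "2021-10-01"},  # repeat
]
print(audit(records))
```

Accuracy is harder to check automatically, since it means comparing records against a trusted source, which is why audits usually combine checks like these with review.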

Data integrity threats can take many forms

Data integrity threats include:

  • Data loss or deletion (e.g., hackers gain access and damage data with malicious intent)

  • Transfer errors (e.g., cell referencing errors when transferring data from Microsoft Excel to Google Sheets)

  • Data entry mistakes (e.g., transposing a number)

  • Collection errors (e.g., survey results don’t paint an accurate picture of a customer)

  • Errors of commission (e.g., creating or adding inaccurate information)

  • Manipulation (e.g., changing data that already exists)

  • Corrupted data (e.g., due to server power outages)

  • Duplicate data (e.g., creating more than one set of information about a customer)
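Several of these threats, such as corrupted data and transfer errors, change the bytes of a data set in transit or at rest. A common way to detect that is to compare a cryptographic checksum computed before and after the move. This sketch uses Python's standard `hashlib`; the sample export and the publish-then-verify workflow are assumptions. A plain checksum catches accidental changes, but not deliberate tampering by someone who can also replace the digest (that calls for a keyed MAC or a signature), and it cannot tell you whether the data was wrong to begin with.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Hex SHA-256 digest; any change to the bytes produces a different value."""
    return hashlib.sha256(data).hexdigest()


def verify(received: bytes, expected_digest: str) -> bool:
    """Recompute the digest on the receiving side and compare it to the published one."""
    return sha256_digest(received) == expected_digest


# Sender side: export the data and publish its digest alongside it.
export = b"user_id,email\nu1,ana@example.com\n"
published_digest = sha256_digest(export)

# Receiver side: a single changed character is enough to fail the check.
print(verify(export, published_digest))                                  # True
print(verify(b"user_id,email\nu1,ana@exmaple.com\n", published_digest))  # False
```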

Data integrity is NOT the same as data security

It’s easy to confuse integrity processes with cybersecurity, but they are not exactly the same thing. Data security involves setting permissions that protect sensitive data from unauthorized access, malware, or data breaches. Protecting the integrity of data requires security, sure, plus a whole lot more. Automation, error-checking, physical integrity, logical integrity, domain integrity, access controls, referential integrity, and entity integrity all play a role in maintaining accurate data.
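Entity, referential, and domain integrity are easiest to see as database constraints. As a sketch, assuming a relational store (SQLite here, through Python's built-in `sqlite3` module) and hypothetical `customers` and `orders` tables: a primary key enforces entity integrity, a foreign key enforces referential integrity, and a `CHECK` constraint enforces domain integrity, so the database itself rejects rows that would break them.

```python
import sqlite3

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- entity integrity: one row per customer
    email       TEXT NOT NULL UNIQUE   -- no duplicate customer records by email
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customers(customer_id),                     -- referential integrity
    status      TEXT NOT NULL
        CHECK (status IN ('pending', 'shipped', 'delivered'))  -- domain integrity
);
""")


def try_insert(sql: str, params: tuple) -> str:
    """Run one INSERT in its own transaction and report whether it was accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert("INSERT INTO customers VALUES (?, ?)", (1, "ana@example.com")))
print(try_insert("INSERT INTO customers VALUES (?, ?)", (2, "ana@example.com")))  # duplicate email
print(try_insert("INSERT INTO orders VALUES (?, ?, ?)", (10, 99, "pending")))     # no customer 99
print(try_insert("INSERT INTO orders VALUES (?, ?, ?)", (11, 1, "lost")))         # status outside domain
```

Exact error messages differ between SQLite versions, which is why the helper simply passes the exception text through.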

“What good is collecting customer data if you don’t trust it?”

- Tido Carriero, Segment’s VP of Engineering, Product, and Design, at Synapse 2018

Why data integrity is critical for business success

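Businesses often underestimate the importance of data integrity. Yet it is the “I” in the “CIA triad” (confidentiality, integrity, and availability) that underpins information security, and the GDPR names “integrity and confidentiality” among its core principles for processing personal data. Getting it wrong costs serious money in product development, marketing, personnel, and even possible fines or lawsuits. To get it right, you need to cultivate data accuracy, control accessibility, and achieve full integration.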

Accurate data is critical for trust

Back to that game of telephone. Imagine your customer data is manipulated in some way, and it’s no longer accurate. The (inaccurate) data might tell you that most people like to eat apples for breakfast, but in fact, they like bananas.

In this scenario, two things may happen: in the short term, no one will buy your product, and in the long term, trust in your brand will decrease. So basically, you just wasted a bunch of money on a product that no one wants. And you’re bleeding customers and investors left and right. After all, who wants to get involved with a company that is so blatantly wrong?

Controlled access protects data and keeps teams efficient

Good data integrity policies include managing access, so everyone has the data they need, no more and no less. Privileged access has two distinct advantages. First, it increases efficiency, because different teams can access and use the same data in sync. By managing access to different sets of information, you can protect assets while still letting personnel carry out tasks based on their specific job responsibilities.

Second, and maybe more importantly, privileged access ensures only appropriate users can view or edit information. This provides more robust security and helps you comply with regulatory requirements such as the GDPR.
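One way to picture this kind of access control is a deny-by-default, role-based check on each field. The roles, field names, and the `view_for` and `edit` helpers below are hypothetical; real systems usually enforce these rules in the database, warehouse, or identity provider rather than in application code.

```python
# Hypothetical roles and field lists for illustration; not a Segment feature.
CAN_READ: dict[str, set[str]] = {
    "support":   {"user_id", "email", "open_tickets"},
    "marketing": {"user_id", "segment", "last_purchase"},
}
CAN_EDIT: dict[str, set[str]] = {
    "support":   {"open_tickets"},
    "marketing": {"segment"},
}


def view_for(role: str, record: dict) -> dict:
    """Return only the fields this role may read; unknown roles get nothing (deny by default)."""
    allowed = CAN_READ.get(role, set())
    return {field: value for field, value in record.items() if field in allowed}


def edit(role: str, record: dict, field: str, value) -> dict:
    """Return an updated copy of the record, or raise if the role may not edit this field."""
    if field not in CAN_EDIT.get(role, set()):
        raise PermissionError(f"role {role!r} may not edit {field!r}")
    return {**record, field: value}


customer = {"user_id": "u1", "email": "ana@example.com", "open_tickets": 2,
            "segment": "frequent-buyer", "last_purchase": "2021-10-20"}
print(view_for("marketing", customer))
# {'user_id': 'u1', 'segment': 'frequent-buyer', 'last_purchase': '2021-10-20'}
```

Returning a copy from `edit` rather than mutating the record keeps the original intact, which makes it easier to log or review a change before it is saved.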

Integrated data yields valuable insights

Integration is critical to ensure consistency and completeness of data. For example, you’re probably collecting data on your customers at each point of interaction—mobile, phone, social, brick-and-mortar, etc.

But if you want a complete, accurate data set for decision making, you need to combine data from all those touchpoints into a single point of reference. Once you pool all that data in one massive data lake and integrate it with your tech stack, you can draw much more sophisticated insights about your customers.
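Combining touchpoints usually comes down to agreeing on a join key and normalizing it before grouping. This sketch assumes each event carries an email address and groups events into one profile per customer; the event shape is hypothetical, a stable user ID is generally a safer key than email, and lowercasing whole addresses is a simplification.

```python
from collections import defaultdict


def build_profiles(events: list[dict]) -> dict[str, dict]:
    """Group touchpoint events into one profile per customer."""
    profiles = defaultdict(lambda: {"sources": set(), "actions": []})
    for event in events:
        # Normalize the join key so "Ana@Example.com " and "ana@example.com" match.
        key = event["email"].strip().lower()
        profiles[key]["sources"].add(event["source"])
        profiles[key]["actions"].append(event["action"])
    return dict(profiles)


events = [
    {"source": "mobile", "email": "Ana@Example.com ", "action": "App Opened"},
    {"source": "web",    "email": "ana@example.com",  "action": "Product Viewed"},
    {"source": "store",  "email": "ben@example.com",  "action": "Order Completed"},
]
profiles = build_profiles(events)
print(profiles["ana@example.com"]["actions"])  # ['App Opened', 'Product Viewed']
```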

3 important steps to ensure data integrity

Ensuring data integrity takes intention and consistency. It will necessitate an ongoing investment of time and money, but it will pay for itself in the long term.

Standardize data collection & cleaning before storing

Standardized data collection means collecting data in a uniform format. First, determine the data sources and standards for spelling, capitalization, and spacing across all platforms. For example, if you collect customer addresses, use one standard format like “Ave” instead of “ave,” “Ave.,” or “avenue.” That’s a tedious process to do manually, so consider using an automated tool like Segment’s Protocols to handle this process.
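A rule like “always write Ave” can be expressed as a small normalization function. The suffix table below is a hypothetical house style, and the word-by-word approach is deliberately naive: it would also rewrite “St.” when it means “Saint,” and `str.capitalize()` turns “McDonald” into “Mcdonald.” Cases like those need a proper address-parsing library or service.

```python
# Hypothetical house style for street suffixes; extend to match your own standard.
SUFFIXES = {
    "ave": "Ave", "avenue": "Ave",
    "st": "St", "street": "St",
    "blvd": "Blvd", "boulevard": "Blvd",
}


def standardize_address(address: str) -> str:
    """Normalize spacing, capitalization, and street-suffix spelling."""
    words = []
    for word in address.split():        # split() with no argument also collapses repeated spaces
        key = word.lower().rstrip(".")  # treat "Ave." and "ave" the same
        words.append(SUFFIXES.get(key, word.capitalize()))
    return " ".join(words)


for raw in ["42 elm ave", "42  Elm  Ave.", "42 ELM avenue"]:
    print(standardize_address(raw))  # prints 42 Elm Ave each time
```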

Validate and diagnose data quality issues quickly

Data validation is the process of verifying the accuracy of source data to prevent mistakes before the data is moved or used. For example, validation can catch naming-convention mistakes such as incorrect word order, typos, or inconsistent capitalization. Protocols sends engineers an automated validation report so they can correct errors without manually comparing each line of code.
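To illustrate the kind of check a validation report automates, here is a small sketch that flags event names breaking an assumed convention (Title Case words separated by single spaces, such as “Order Completed”) and names that differ from an earlier one only by case or spacing. The convention and the `validate_event_names` helper are assumptions for illustration; this is not how Protocols is implemented.

```python
import re

# Assumed convention: two or more Title Case words separated by single spaces.
EVENT_NAME = re.compile(r"[A-Z][a-z0-9]*(?: [A-Z][a-z0-9]*)+")


def validate_event_names(names: list[str]) -> list[tuple[str, str]]:
    """Return (name, problem) pairs for every naming issue found."""
    problems = []
    first_spelling: dict[str, str] = {}  # normalized name -> first spelling seen
    for name in names:
        if not EVENT_NAME.fullmatch(name):
            problems.append((name, "not Title Case words separated by single spaces"))
        key = " ".join(name.lower().split())  # ignore case and extra spaces
        earlier = first_spelling.setdefault(key, name)
        if earlier != name:
            problems.append((name, f"variant of {earlier!r}"))
    return problems


for name, problem in validate_event_names(
        ["Order Completed", "order completed", "Product Viewed", "Signed_Up"]):
    print(f"{name}: {problem}")
```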

Set specs to block bad data at the source

What’s better than cleaning up a mess? Not making a mess in the first place. You should block bad data at the source to prevent it from ever polluting your data warehouses or marketing tools. With Protocols, all data that falls outside your Tracking Plan is quarantined in a separate warehouse. That way, you can diagnose any tracking problems and keep or discard the non-standard data.
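The idea of checking each event against a tracking plan and setting aside anything that does not conform can be sketched in a few lines. The `TRACKING_PLAN` structure, the property types, and the `_reason` field below are hypothetical and only mimic the concept; they are not the Protocols API or its quarantine mechanism.

```python
# Hypothetical tracking plan: allowed events and the typed properties each requires.
TRACKING_PLAN = {
    "Order Completed": {"order_id": str, "revenue": (int, float)},
    "Product Viewed": {"product_id": str},
}


def conforms(event: dict) -> tuple[bool, str]:
    """Check one event against the plan and explain the first violation found."""
    spec = TRACKING_PLAN.get(event.get("event"))
    if spec is None:
        return False, "event not in tracking plan"
    props = event.get("properties", {})
    for prop, expected_type in spec.items():
        if prop not in props:
            return False, f"missing property {prop!r}"
        if not isinstance(props[prop], expected_type):
            return False, f"{prop!r} has type {type(props[prop]).__name__}"
    return True, "ok"


def route(events: list[dict]) -> tuple[list[dict], list[dict]]:
    """Forward conforming events; quarantine the rest with the reason attached."""
    accepted, quarantined = [], []
    for event in events:
        ok, reason = conforms(event)
        if ok:
            accepted.append(event)
        else:
            quarantined.append({**event, "_reason": reason})
    return accepted, quarantined


events = [
    {"event": "Order Completed", "properties": {"order_id": "o1", "revenue": 19.99}},
    {"event": "Order Completed", "properties": {"order_id": "o2", "revenue": "19.99"}},
    {"event": "order_completed", "properties": {}},
]
accepted, quarantined = route(events)
print(len(accepted), [q["_reason"] for q in quarantined])
```

Keeping the rejected events, rather than dropping them, is what lets you diagnose the tracking problem later and decide whether to replay or discard them.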

Guarantee the integrity of your data throughout the data lifecycle with Segment

To trust and use your data effectively, you need to ensure data integrity through standardized collection, validation, and diagnostics across all platforms. Segment’s Protocols does all of the above automatically so that you can ensure data integrity with minimal manual effort. Your customers are always connected, and your data should be, too.

Frequently asked questions
