Yanis Bousdira on September 8th 2021
Kelly Kirwan on August 27th 2021
Doug Roberge, Jen Skene on August 19th 2021
Kelly Kirwan on August 13th 2021
Geoffrey Keating on June 9th 2021
Geoffrey Keating on June 8th 2021
By using data models, various stakeholders such as developers, data architects, and business analysts can agree on the data they’ll capture and how they want to use it before building databases and warehouses.
A data model specifies what information to capture, how to store it, and how it relates to various components of your business. The ultimate aim for data modeling is to establish clear data standards for your entire organization. For example, a model for an eCommerce website might specify the customer data you’ll capture. It will define how to label that data and its relation to product information and the sales process.
Like a blueprint for a house, a data model defines what to build and how, before starting construction, when things become much more complicated to change. This approach prevents database design and development errors, capturing unnecessary data, and duplicating data in multiple locations.
In this article, we’ll cover these basics of data modeling:
Understanding different types of data models
Why data models are necessary for building a data infrastructure
Top three data modeling techniques
Data models get divided into three categories: abstract, conceptual, and physical models. They help align stakeholders around the why, how, and what of your data project. Each type of model serves a different purpose and audience in the data modeling process.
Conceptual data models visualize the concepts and rules that govern the business processes you’re modeling without going into technical details. You use this visualization to align business stakeholders, system architects, and developers on the project and business requirements: what information the data system will contain, how elements should relate to each other, and their dependencies.
Typically, a conceptual model shows a high-level view of the system’s content, organization, and relevant business rules. For example, a data model for an eCommerce business will contain vendors, products, customers, and sales. A business rule could be that each vendor needs to supply at least one product.
There’s no standard format for conceptual models. What matters is that it helps both technical and non-technical stakeholders align and agree on the purpose, scope, and design of their data project. All of the below images could be examples of conceptual data models.
A logical data model is based on the conceptual model and defines the project’s data elements and relationships. You’ll see the names of specific entities in the database, as well as their attributes. To stay with the eCommerce example: A logical model shows products are identified through a “product ID,” with properties like a description, category, and unit price.
Data architects and business analysts use the logical data model to plan the implementation of a database management system—software that stores, retrieves, defines, and manages data in a database.
The physical data model gets technical. Database analysts and developers use it for the design of the database and relevant data structures. The model specifies the types of data you’ll store along with technical data requirements.
An example of data type specifications is whether a piece of data will be an integer—a number without a decimal point—or a float—a number with a decimal place. Technical requirements include details on storage needs, access speed, and data redundancy—storing a piece of data in multiple locations to increase durability and improve query performance.
In practice, only very large projects, say modeling a container shipping business, move from conceptual to logical to physical models. Most other projects skip the conceptual phase and spend most of their time in logical modeling. Some teams even cover elements from the physical phase simultaneously because the people working on the logical model also do the technical implementation.
Data models are a visual representation that turns abstract ideas (“we want to track our global container shipments in real time”) into a technical implementation plan (“we will store an attribute called ‘container GPS location’ in a table called ‘Containers’ as an integer”). They help avoid costly demolition and reconstruction of your data infrastructure because data modelers need to think about the data they'll need, its relations, the data architecture, and even whether your project is viable before creating databases and warehouses.
Data models also help with data governance and legal compliance. They allow you to set standards from the start of the project so teams don’t end up with conflicting data sets that need cleaning up before they can use it or, worse, can’t use at all.
Data models and standardization help avoid situations like a sign-up field labeled in nearly a dozen different ways across the organization.
You can also identify sensitive information—social security numbers, passwords, credit card numbers—while you’re modeling so you can involve security and legal experts before you start building.
Safe, accurate, and high-quality data, confers a range of real-world benefits for various teams in your organization. Product teams can iterate faster and build immersive user experiences. Analytics and business intelligence teams can create queries without heavy workarounds. And marketing teams can improve advertising efforts by personalizing messaging according to user behaviors and traits.
Customer Data Platforms (CDPs) like Segment can do much of the heavy-lifting during data modeling projects by simplifying and systematizing data storage and organization. Segment’s Connections feature makes it easy to capture, organize, and visualize every customer-facing interaction with your business, whether digital or offline. Protocols lets you define your data standards and enforce them at the point of collection. Functionality, like real-time data validation and automatic enforcement controls, allows your to diagnose issues before they pollute your marketing and analytics tools or data warehouse.
There are many different techniques to design and structure a database. You should explore these techniques and decide on the most suitable one for your project at the end of the conceptual phase. These data modeling methodologies define how the database gets structured and closely relate to the type of formatting or technology you can use to manage your data project.
For example, many people now default to graph modeling because it’s new and popular, even when a simple relational model would suffice. Understanding the most popular techniques helps you avoid such mistakes.
In a relational data model, data gets stored in tables, of which specific elements link to information in other tables. Entities can have a one-to-one, one-to-many, or many-to-many relationship.
Relational databases often use SQL (Structured Query Language), a programming language, for accessing and managing data. They’re frequently used in point-of-sale systems, as well as for other types of transaction processing.
The Entity-Relationship Model—sometimes referred to as ER model—is similar to the relational model. It visualizes the relationships between different elements in a system but without going into technical details. You can use the ER model during the conceptual phase to align technical and non-technical stakeholders.
To understand dimensional data models, picture a cube. Each side of the cube represents an aspect of the data you’re trying to capture.
For example, suppose your business sells multiple products to different customer segments, and you want to evaluate sales performance over time. You can visualize this as a data cube, with dimensions for time, products, and customer segments. By traveling up, down, left, and right on the axes of the cube, you can make comparisons across all those dimensions. You’ll see how the sales of each of these products compare to each other and different customer segments at any point in time.
You use the cube model during the conceptual phase. One of the most frequent manifestations of such a cube in the logical stage is the “star schema,” like the one below. At first, it might look like a relational model. Still, the star schema is different because it has a central node that connects to many others.
During the conceptual phase, most people sketch a data model on a whiteboard. Such a sketch resembles the graph model. It consists of “nodes” and edges—a node represents where the data is stored, the edge the relation between nodes. It’s also the main advantage of this approach: “what you sketch on the whiteboard is what you store in the database.”
Other techniques require you to translate the output from the conceptual phase into a different format for the logical and physical implementation—for example, going from an ER to a relational model or from a cube model to a star schema. Not so with graph models. You can implement them straight away using technology like Neo4j, a native graph database platform.
When you understand the purpose of data models and the process to follow, they’re not challenging to create, especially if you also collect, organize, and standardize your data with Segment. You’ll align all stakeholders before starting technical implementation and avoid costly mistakes or rebuilds. You’ll know what expertise you need on the team to execute your plan and have your data governance defined, too.