Implementing role-based access control with Segment
The Endeavor Digital Engineering team used Segment, allowing them to rapidly onboard additional data sources and expose those sources to the data science team for analysis.
This article is an introduction to Endeavor Digital’s engineering team’s approach to managing Segment. The data engineering team is a part of Endeavor Digital, which sits at the intersection of technology, content, and celebrity talent. The Endeavor Digital team operates mature, large-scale businesses and launches new businesses that are driving the future of media and entertainment.
In this post, we will focus on our role-based access management design pattern and its implementation using infrastructure-as-code principles and the Segment Public API. Over the coming months, we will be sharing more about our engineering practices and their implementation in our data stack. We hope this article can be informative and useful for other engineering teams facing similar problems.
At Endeavor, much of our data infrastructure and services are managed in-house through data orchestration and storage tools like Prefect, dbt, and Snowflake. However, we've found that including Segment as part of our data stack has enabled us to rapidly onboard additional data sources - especially websites and applications - and expose those sources to our data science team for further exploratory analysis. For example, when working with a new internal customer our team can send them a simple one-line JavaScript snippet to include in whichever websites we'd like to enable user activity tracking against.
Before we dive further into our implementation of role-based access control, let’s discuss some of the unique needs that we have at Endeavor and what this framework does to help achieve them.
Our data team manages pipelines and provides analytics for dozens of Endeavor brands or “business units” such as the UFC, World’s Strongest Man, Frieze, Professional Bull Riders, etc. Given that we serve a conglomeration of businesses, each with their own distinct customer data, responsible stewardship is always at the top of mind when architecting our pipelines.
We’re diligent in applying a federated model to separate different business units' data within our infrastructure. In the realm of Segment, this means that every business unit has their own set of sources and personas to prevent any sort of cross-over of data where it shouldn’t exist.
Practically, this translates to us having dozens of sources and personas spaces, all of which are methodically created, named, and labeled according to the property they belong to. For example, sources are named in the following format:
Moreover, all sources have a minimum of two labels: property and environment (property:ufc and environment:prod for the example above).
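As an illustration of the labeling rule, a check along these lines could flag sources that are missing either label. This is only a sketch: the function name and the representation of labels as "key:value" strings are assumptions made for the example, not part of our tooling or of Segment's API.

```python
# Label keys that every source is expected to carry (from the rule above).
REQUIRED_LABEL_KEYS = {"property", "environment"}


def missing_required_labels(labels):
    """Return the required label keys absent from a list of 'key:value' labels."""
    present = {label.split(":", 1)[0] for label in labels}
    return REQUIRED_LABEL_KEYS - present


# Hypothetical label lists, using the label values mentioned above.
print(missing_required_labels(["property:ufc", "environment:prod"]))  # set()
print(missing_required_labels(["property:ufc"]))  # {'environment'}
```

Running a check like this before a source is created would keep unlabeled sources out of the workspace.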
The ability to create these resources and labels programmatically through code is a crucial part of our overall strategy for governing Segment. (Stay tuned, we hope to publish a subsequent article in the near future outlining our approach.)
Property labels are instrumental in helping us identify which users should have access to which resources within Segment, and are used directly in our implementation of role-based access control. This enables users from different Endeavor business units to self-serve within Segment and fully realize some of our favorite capabilities of the platform.
Broadly speaking, RBAC is a security framework that manages user access to a resource based on their specific role in a group or organization. You've almost certainly encountered this paradigm at some point in your professional journey.
For example, take this blog post. A user whose role we can define as "writer" probably has privileges beyond writing. They can publish, edit, delete, or read a blog post. A user to whom we assign the "reader" role will only have the read privilege. This separation of capabilities is based on the user’s role.
One of the greatest benefits of RBAC is its scalability. In less mature organizations, you might find access governed at the level of the individual, with permissions assigned on an ad-hoc basis, e.g. Aliyah has "read", "publish" & "edit" permissions, Miguel has "read" & "delete" permissions, Faye has "read" & "publish" permissions. These can be changed immediately depending on the assignment of each user.
As this team of three grows, this process of onboarding and managing access manually becomes increasingly difficult to scale.
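To make the contrast concrete, here is a minimal sketch of the role-based alternative, using the "writer" and "reader" roles from the blog example. The role assignments and the helper name are hypothetical and chosen only for illustration.

```python
# Permissions are attached to roles, not to individual users.
ROLE_PERMISSIONS = {
    "writer": {"read", "publish", "edit", "delete"},
    "reader": {"read"},
}

# Each user is assigned a role (hypothetical assignments).
USER_ROLES = {
    "Aliyah": "writer",
    "Miguel": "reader",
    "Faye": "reader",
}


def can(user, action):
    """Return True if the user's role grants the given action."""
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())


print(can("Aliyah", "edit"))    # True
print(can("Miguel", "delete"))  # False
```

Onboarding a new teammate becomes a single role assignment, and changing what a role may do updates every user who holds it.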
Endeavor has dozens of users that interact with Segment, many of whom operate within different business units. This includes marketers, product managers, data engineers and scientists, analysts, and members of our IT team.
As these users have discrete roles in our broader organization and often their own distinct business units, relying on role-based access control principles to govern their access makes a lot of sense.
While it is possible to implement role-based access control in Segment using just its web interface, combining it with an infrastructure-as-code (IaC) framework makes it even more useful.
Infrastructure-as-code is exactly as it sounds: managing and provisioning applications through code and configuration files. Our engineering team is a big fan of IaC and uses it wherever feasible, especially in the realm of access control.
Thankfully, Segment exposes the Segment Public API, which enables us to write code that interacts directly with Segment's identity and access management settings.
So, why did we decide to use infrastructure-as-code to manage RBAC?
"Nuclear Sub Principle": By committing all of our access management documents as configuration files to a git repository, we can use peer review and require a minimum number of approvers before access is provisioned. This removes the need for a single human administrator interacting with a web interface; instead, two or more developers with sufficient access review each pull request, so every change has peer oversight.
Auditability: A shared repository where all our configuration files are version controlled allows our team to understand exactly which users had what level of access at any particular point in time, as well as the ability to know who granted that access.
Scalability: IaC makes expansion even more streamlined by removing admin-user bottlenecks and introducing a common, singular interface for multiple platforms' access management settings.
Repeatability: We're able to easily reproduce our permission structures across Segment workspaces (e.g. production and development) and fully recover our access management configuration in case of catastrophic failure.
Before describing our method to implement RBAC-as-Code, let's further define a few of the object types that we'll be working with:
Users:
Individuals that can be assigned to a user group
Never assigned permissions directly, rather they inherit permissions from their group
Can belong to multiple user groups
Identified by their email address
Roles: A basic grouping of permissions. For example:
Minimal Workspace Access: Access to view the workspace. Cannot view any sub-resources or make changes to the workspace.
Source Read-only: Read-only access to assigned Source(s), Source settings, enabled Destinations, Schema, live data in the Debugger, and connected Tracking Plans.
You can find a comprehensive list of roles available in Segment here.
User Groups:
These hold users and one or more roles that users belonging to the user group are eligible to assume
User groups are a useful abstraction above the level of a role that allows us to overlay our business-specific role definitions over Segment’s built-in roles
Able to define access based on our organization’s specific roles. At Endeavor, we may use groups like Data Scientist, Taste of London Marketer, and IT Administrator to categorize various levels of access.
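The relationship between these three object types can be sketched as a small data model. The class and function names below are assumptions made for this example, as are the group memberships and email addresses; only the role and group names come from the descriptions above.

```python
from dataclasses import dataclass, field


@dataclass
class UserGroup:
    """A named group holding roles and the emails of its members."""
    name: str
    roles: list = field(default_factory=list)
    members: set = field(default_factory=set)


def roles_for(email, groups):
    """Collect the roles a user inherits from every group they belong to."""
    inherited = set()
    for group in groups:
        if email in group.members:
            inherited.update(group.roles)
    return inherited


# Hypothetical groups and members.
groups = [
    UserGroup("Data Scientist", ["Source Read-only"], {"faye@example.com"}),
    UserGroup(
        "Taste of London Marketer",
        ["Minimal Workspace Access"],
        {"faye@example.com", "miguel@example.com"},
    ),
]

print(sorted(roles_for("faye@example.com", groups)))
# ['Minimal Workspace Access', 'Source Read-only']
```

Users never carry permissions themselves in this model; everything they can do is derived from group membership.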
Using infrastructure-as-code principles, all configurations for manipulating user, role, and user group objects are defined in YAML files. Of course, JSON or any other data serialization language can be used in its place, but we prefer YAML due to its readability.
We have two kinds of YAML files: users and groups. These files are processed using a wrapper that we’ve built around the Segment Public API.
User YAML files accept two inputs: an email address and status.
Line by line explanation below:
email: The email address acts as a unique identifier for a user.
status: A toggle that determines whether we on-board or off-board a user. To send an invite, we set the status to active; to revoke access, we simply switch this field to inactive.
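A user file of this shape might be processed along the following lines. This is a hedged sketch rather than our actual wrapper: the parsed YAML is represented as a plain dictionary, and the function name, the action names ("invite", "remove", "noop"), and the idempotency check against existing workspace members are all assumptions.

```python
def plan_user_action(user_config, existing_emails):
    """Decide what to do for one user file, given current workspace members."""
    email = user_config["email"].strip().lower()
    status = user_config["status"]
    if status not in ("active", "inactive"):
        raise ValueError(f"unknown status: {status!r}")

    is_member = email in existing_emails
    if status == "active" and not is_member:
        return "invite"   # on-board: the user is not in the workspace yet
    if status == "inactive" and is_member:
        return "remove"   # off-board: the user still has access
    return "noop"         # the workspace already matches the file


# Hypothetical inputs.
current = {"aliyah@example.com"}
print(plan_user_action({"email": "faye@example.com", "status": "active"}, current))
# invite
print(plan_user_action({"email": "aliyah@example.com", "status": "inactive"}, current))
# remove
```

Comparing the file against the current state means the same configuration can be applied repeatedly without sending duplicate invites.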
To accomplish this, we utilize the following Public API endpoints:
Assigning users and roles to groups
Line by line explanation below:
User group files accept the following configuration options:
Status: used for group on- and off-boarding.
Permissions:
Role: a predefined Segment role.
Type: the subset of objects this role should apply to. We allow for three input parameters here: all, label, and entity.
All grants access to every object that the Segment role applies to, e.g. role “Source Read-only” and type “all” would grant read-only privileges on all sources in the workspace.
Label grants access to all objects with a particular label or labels defined in the Target field. For example, “Source Read-only” and targets “property:taste_of_london” and “property:tol” would grant read-only access to all sources that have property labels of “taste_of_london” or “tol”.
Entity grants access to specific entities based on their name(s) as defined in the Target field. For example, “Source Read-only” and target “Taste_of_London_Main_Website” would grant read-only access to a source named “Taste_of_London_Main_Website”.
Target: Used to scope down access to specific labels or individual objects.
Users: a list of users as defined by their email address that are part of this group. Removing a user's email address from this list will also remove them from the group.
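The scoping rules and the membership behavior described above can be sketched as two small functions. The lowercase dictionary keys, the function names, and the in-memory source catalog are assumptions for this illustration; the UFC source name is hypothetical, and a real implementation would fetch sources and group members from the Segment Public API.

```python
def resolve_sources(permission, sources):
    """Return the names of the sources a permission entry applies to.

    `sources` maps a source name to its list of 'key:value' labels.
    """
    scope = permission["type"]
    targets = set(permission.get("target", []))
    if scope == "all":
        return set(sources)
    if scope == "label":
        return {name for name, labels in sources.items() if targets & set(labels)}
    if scope == "entity":
        return {name for name in sources if name in targets}
    raise ValueError(f"unknown type: {scope!r}")


def membership_changes(desired, current):
    """Compare the emails in the file with the group's current members."""
    desired, current = set(desired), set(current)
    return {"add": desired - current, "remove": current - desired}


# Hypothetical source catalog.
sources = {
    "Taste_of_London_Main_Website": ["property:taste_of_london", "environment:prod"],
    "UFC_Main_Website": ["property:ufc", "environment:prod"],
}
permission = {
    "role": "Source Read-only",
    "type": "label",
    "target": ["property:taste_of_london", "property:tol"],
}
print(resolve_sources(permission, sources))
# {'Taste_of_London_Main_Website'}
```

Because membership is computed as a difference against the current state, deleting an email from the file is enough to schedule that user's removal from the group.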
Below is an example of an inactive YAML file:
We use the following Public API endpoints to manage user groups:
We hope that this article proves useful as a basic introduction to the use of role-based access control, infrastructure-as-code, and our implementation of these paradigms via the Segment Public API. Please feel free to look up the author on LinkedIn if you'd like to further chat about this topic.