Accelerate Your Data Activation with AI-Assisted Suggested Mappings

Experience the future of data activation with Twilio Segment's Suggested Mappings, where AI generates complex mappings in seconds, giving you the perfect blend of automation and control.

By Pooya Jaferian, Doosan Baik, Rajul Vadera

Imagine a data activation process where complex mappings are automatically generated using AI with just a few clicks, yet you still retain full control to modify the suggestions.

That ease of creation is possible with the introduction of Suggested Mappings, Twilio Segment’s latest feature designed to simplify and enhance the tedious task of data mapping. By leveraging AI technology, Suggested Mappings automates the mapping of data warehouse columns to destination fields, providing you with smart suggestions while allowing you to tweak and perfect each mapping. Experience the perfect blend of automation and customization, accelerating your path to activating your data.

The Challenge

In the current data activation workflow for RETL, users must manually connect data warehouses to various destinations, select necessary actions, and map each column individually. 

There are several reasons why this process is both time-consuming and error-prone:

  • Data warehouses often contain complex and nested structures, making it challenging to understand and correctly map data to the appropriate fields.

  • With potentially hundreds or thousands of columns, the sheer volume of data adds to the labor-intensive nature of this task.

  • The diversity in data types and formats requires meticulous attention to detail, increasing the likelihood of errors.

The repetitive and tedious nature of manual mapping exacerbates these issues, leading to inconsistencies and human errors. 

By using AI to assist users in this process, Suggested Mappings addresses these issues, providing a more efficient, accurate, and consistent way to map data from warehouses to destinations.

Our Solution

Here’s a detailed look at the key components and how they work together to achieve this:

1. JSON Schema for the Destination Action:

In Segment Action Destinations, each “Action” defines an operation to be performed in a third-party destination. This includes specifying the type of operation (e.g., creating a user, updating an object) and all the required/optional data and their types required to execute this operation. This comprehensive definition can be modeled using a JSON schema, effectively representing the “API” for the destination. The JSON schema serves as a blueprint that outlines the structure and data types expected by the destination, ensuring that all necessary fields are accurately populated.

2. Data Warehouse Schema and Sample Data:

The data that needs to be mapped originates from the user’s data warehouse. This data includes valuable information for the mapping process:

  • Schema: This is the structure of the data, consisting of column names and their respective data types.

  • Sample Data: The actual data stored in each column, providing real-world examples of the data that will be mapped. Sample data is crucial when column names in the schema are not informative enough, as it offers context and clarity. It also helps in identifying and correctly mapping nested JSON structures, ensuring that complex data types are handled accurately.

Using these two components, we can abstract the mapping problem as an LLM function calling problem, which LLMs are well-suited to handle. The process involves the following steps:

  • Schema Analysis: The LLM analyzes the JSON schema of the destination action to understand the required data fields and their types.

  • Data Extraction: The LLM examines the schema and sample data from the data warehouse, identifying the relevant columns and their data types.

  • Function Call Generation: The LLM uses the extracted data to generate a “Function Call” that conforms to the JSON schema of the destination action. This function call includes all the necessary parameters, mapped correctly from the data warehouse to the destination fields.

Figure 1 highlights the high-level approach that we use to generate mappings using LLMs.

Figure 1. High level approach to generate mappings

Architecture

The architecture of Suggested Mappings is designed to automate the data mapping process by leveraging AI models, ensuring both efficiency and accuracy. Figure 2 shows the interaction between various components of the system.

Figure 2. Architecture Diagram for Suggested Mappings

The process begins when a user invokes the suggested mappings feature from the Segment Application. The Segment Application makes an API call to the Segment Control Plane, which acts as a central hub, coordinating the entire mapping process. 

The Control Plane then requests the schema and sample rows from the user’s data warehouse via the Data Warehouse Service. Once the data is obtained, it is sent to the Privacy Service, where personally identifiable information (PII) is redacted to ensure that no sensitive information is used during the AI inference process. The sanitized schema and sample rows are then sent back to the Segment Control Plane. Concurrently, the Control Plane fetches the action definition representing the destination API from a database of actions and creates a JSON schema representing this action.

This combined data, which now includes the sanitized warehouse schema, sample rows, and the action JSON schema, is sent to the AI Service. The AI Service selects an appropriate AI model from a repository, which could be hosted externally (such as OpenAI) or internally (on our AWS Bedrock instance). The AI model performs the inference on the received data to generate the necessary mappings.

Once the mappings are generated, the AI Service sends them back to the Segment Control Plane. The Control Plane processes these mappings by converting them into the “Mapping Kit” syntax, Segment’s proprietary data mapping language. Finally, these mapping suggestions are forwarded to the Segment Application. Here, users can review, modify, and finalize the mappings, maintaining control over the data mapping process.

AI Nutrition Label

Twilio’s AI Nutrition Facts provide an overview of the AI feature you’re using, so you can better understand how the AI is working with your data. Twilio outlines AI qualities in Suggested Mappings in the Nutrition Facts label below. For more information, including the AI Nutrition Facts label glossary, refer to the AI Nutrition Facts page.

Looking Ahead

The development of Suggested Mappings is an ongoing process, and we are committed to continuous improvement. By collecting detailed analytics on user interactions, specifically tracking the diffs between AI-generated suggestions and user adjustments, we gain valuable insights into user requirements, enabling us to refine our prompt design and fine-tune our AI models. While we currently utilize GPT-3.5, we are evaluating other options, including internal AI models and the latest from OpenAI, to ensure optimal performance. Additionally, we plan to expand the capabilities of Suggested Mappings to include support for streaming sources, extending its utility beyond RETL.

Get Started with Suggested Mappings

Suggested Mappings brings a practical improvement to the data activation workflow by leveraging Large Language Models. This feature reduces manual effort, minimizes errors, and significantly enhances efficiency. By automating the mapping process, Suggested Mappings directly impacts Customer Time to Value, helping Segment customers activate their data from warehouses much faster. Users can initiate this feature from the Segment Application, review and fine-tune the AI-generated mappings, and seamlessly integrate their data into various destinations.

Ready to experience the magic of automated data mappings? Navigate to your rETL destination mappings page, scroll down to the “Select mappings” section, and click on the “Suggest Mappings” button to get started.

Happy Mapping!

Test drive Segment CDP today

It’s free to connect your data sources and destinations to the Segment CDP. Use one API to collect analytics data across any platform.

Recommended articles

Loading

Want to keep updated on Segment launches, events, and updates?