Keeping it together: Achieving Warehouse interoperability without duplication of business logic

By introducing a SQL dialect transpiling layer, we accelerated support for new warehouses in Linked Audiences while keeping our audience computation logic consolidated, which keeps future maintenance and development costs low.

By Andrés Rubio Chávez

Classic Twilio Engage Audiences let you group users or accounts based on event behavior and traits tracked within Segment. Now, Linked Audiences makes it easy to combine real-time behavioral data, Customer AI predictive traits, and rich entity data from your own data warehouse, all within a simple audience builder. Marketers no longer need to wait for data teams to build rich, targeted audience lists.

The audience computation

In an audience flow, the interactive input from the Audience Builder (or from the CreateAudience API) is converted into an abstract syntax tree (AST) that represents the audience. An audience computation service then uses this AST to find the profiles that fit the targeting criteria, drawing on Segment’s identity resolution, event tracking, and the trait data associated with each profile. The resulting list of matching profiles can then be synchronized to hundreds of Destinations, accessed through the Profile API, downloaded as a CSV file, or visualized in the Audience Explorer.
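As a purely hypothetical illustration (Segment’s internal AST format isn’t public), an audience like “users on an enterprise plan who completed at least three orders in the last 30 days” could be represented as a tree of condition nodes:

```python
# Hypothetical audience AST (illustrative only -- not Segment's actual
# internal representation): users with trait plan == 'enterprise' who
# performed 'Order Completed' at least 3 times in the last 30 days.
audience_ast = {
    "type": "and",
    "children": [
        {
            "type": "event_condition",
            "event": "Order Completed",
            "aggregation": {"op": ">=", "count": 3},
            "window_days": 30,
        },
        {
            "type": "trait_condition",
            "trait": "plan",
            "value": "enterprise",
        },
    ],
}
```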

But what if this profile data already lives in your data warehouse?

Supporting all the warehouses

Engage now leverages Reverse ETL (rETL) and Profiles Sync to model profile and related entity data of any shape from the warehouse and use it as traits and events during audience computation. Here, audience computation consists of transforming the AST into SQL queries that are then executed in the target warehouse.
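To make that transformation concrete, here is a toy generator that walks the hypothetical AST from the previous section and emits a single query. It is a sketch invented for this post: the table names (profile_events, profile_traits) are assumptions, and the real query generation is far more involved.

```python
def generate_sql(ast: dict) -> str:
    """Toy AST-to-SQL generator (illustrative only; invented names)."""
    if ast["type"] == "and":
        # Each child resolves to a set of user_ids; AND them via INTERSECT.
        return " INTERSECT ".join(generate_sql(c) for c in ast["children"])
    if ast["type"] == "event_condition":
        return (
            "SELECT user_id FROM profile_events "
            f"WHERE event = '{ast['event']}' "
            f"AND timestamp >= NOW() - INTERVAL '{ast['window_days']} days' "
            "GROUP BY user_id "
            f"HAVING COUNT(*) {ast['aggregation']['op']} {ast['aggregation']['count']}"
        )
    if ast["type"] == "trait_condition":
        return (
            "SELECT user_id FROM profile_traits "
            f"WHERE trait = '{ast['trait']}' AND value = '{ast['value']}'"
        )
    raise ValueError(f"unknown node type: {ast['type']}")
```

Note that the emitted SQL is written in one specific dialect (the Postgres-style NOW() and INTERVAL). The moment we target a different warehouse, that choice becomes a problem.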

However, this doesn’t come without its own challenges. No two warehouses are equal; they differ in both syntax and functionality. Even though all of them can be queried using SQL, each one has its own particular dialect.
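For a taste of how much the dialects diverge, here is the same predicate, “timestamp within the last 30 days”, spelled in three warehouse dialects:

```python
# The same predicate in three dialects (each valid in its own engine):
snowflake = "timestamp >= DATEADD(day, -30, CURRENT_TIMESTAMP())"
bigquery  = "timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)"
postgres  = "timestamp >= NOW() - INTERVAL '30 days'"
```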

It would follow that custom query generation logic per warehouse would be required to handle these differences, right? Unfortunately, such an approach brings multiple drawbacks: the audience generation logic would be duplicated with minor differences, one copy per warehouse, and code maintenance and future development would quickly become a nightmare. A new feature or bug fix would turn into shotgun surgery, with small changes introduced to each query generator, increasing time and effort with every warehouse onboarded, and that’s before considering testing.

Our solution

Following the single responsibility principle, we can abstract the dialect and other differences away from the query generation logic, keeping it warehouse-agnostic, by introducing a SQL transpiler layer. This transpiler takes an input query in a base dialect and translates it to the target warehouse’s dialect before sending it to rETL. That way, our business logic stays consolidated and decoupled from the warehouse that executes it.
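In code terms, the boundary could look like a narrow interface that the query generator depends on; the names below are invented for illustration, not our actual internal API:

```python
from typing import Protocol

class SQLTranspiler(Protocol):
    """Hypothetical boundary between warehouse-agnostic query
    generation and dialect-specific execution (names are illustrative)."""

    def transpile(self, sql: str, target_dialect: str) -> str:
        """Translate a query from the base dialect to target_dialect."""
        ...
```

The query generator only ever emits queries in the base dialect; everything dialect-specific lives behind this boundary.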

Now, writing a full-fledged SQL dialect transpiler is not an easy task, and the effort to develop such a component from scratch would be even greater than duplicating the query generation logic. That’s why we decided to leverage SQLGlot, a third-party, open-source Python library that provides a SQL parser and bi-directional dialect transpiler (among other things), to do that for us. Since our services are mainly written in Go, TypeScript, and other languages, we encapsulated this library in a containerized RPC microservice, along with any configuration we may need to support new warehouses. This structure lets us deploy the transpiler as a standalone service or as an application sidecar where needed, keeps it decoupled from our business logic, and leaves the door open for extension.
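Here is a minimal sketch of the transpilation step with SQLGlot, assuming Postgres as the base dialect for illustration:

```python
import sqlglot  # pip install sqlglot

# One base-dialect query (the toy generator's output shape), transpiled
# to several warehouse dialects. "postgres" as the base is an assumption.
base_sql = """
SELECT user_id
FROM profile_events
WHERE event = 'Order Completed'
  AND timestamp >= NOW() - INTERVAL '30 days'
GROUP BY user_id
HAVING COUNT(*) >= 3
"""

for target in ("snowflake", "bigquery", "redshift"):
    transpiled = sqlglot.transpile(base_sql, read="postgres", write=target)[0]
    print(f"-- {target}\n{transpiled}\n")
```

In production, a call like this sits behind a thin RPC endpoint, so our Go and TypeScript services can use it without embedding a Python runtime.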

Conclusion

With this new SQL dialect transpiler service in place, we streamlined the audience computation flow, made query generation warehouse-agnostic, and paved the way to quickly roll out new features to all supported warehouses and add new warehouse vendors to our catalog.

And not only that: the transpiler has also found its way into database migrations, E2E and integration testing, and the development of further warehouse interoperability features. A great addition to a developer’s toolkit!

To learn more about how to enrich audiences with data from your warehouse or lakehouse and everything that is possible with Engage and Unify, check out our product pages.

