Growth & Marketing

Cross-database Joins!

Nov 17, 2014

By Harry Glaser


We welcome Harry Glaser, CEO of our SQL partner Periscope, to the Segment blog! Harry’s talking cross-database joins on the heels of our Amazon Redshiftlaunch. If you want to analyze behavioral data across platforms, and Excel won’t cut it, here are some tips to level-up your analytics game.

All The King’s Databases

It began with the best of intentions: You launched your first web app for your customers, backed by a database full of transactional data to analyze. In time you added a read replica, and replaced Excel with an more-advanced visualization tool to go with it.

Now you’re launching your first mobile app. You want SQL access to the underlying data store, but building a server to receive pings is much too difficult. So you make great use of a fabulous event-tracking to SQL solution.

But now your data is in two places. What if you want to know whether your iOS users are big spenders? You’d need to slice monthly iOS users in your mobile app database by payment plan information in your web app database. Luckily, there is a solution: cross-database joins.

Cross-Database Joins

You need to connect your transactional, web, and mobile in one table. To start, Let’s count our iPhone MAUs (monthy active users), with Segment’s SQL schema as an example.

We’re counting a user as active in a given month if they’ve started a session in that month. This query gives us a graph like this:

asset_rEM8Btz9.png

Now, we just need to bring our payment plans into the chart. This is where the magic happens. We’ll join in the users table on our web database, and slice the query by users.payment_plan:

Note that we now need to fully qualify the tables in the FROM and JOIN clauses with their database names: segment and web_prod.

And our hard work pays off! Here we can see our iPhone MAUs sliced by payment plan:

asset_5YQOBsoL.png

How It Works

Cross-database joins, and in the case of Periscope – our query speeds, are enabled by our Postgres-based data cache. Each customer’s data is stored in the same database, with one schema per (database, schema) pair. This architecture allows us to run exactly the query you wrote, with some simple rewrites to make it valid.

Here’s the rewritten query:

In this example, your Segment database’s iphone_production schema is translated to the db_1234_iphone_production schema in Periscope’s data cache. And web_prod‘s (unspecified) public schema is translated to the db_1235_public schema. The rest of the query remains the same!

Start Exploring

We hope you enjoyed this lesson in cross-data base joins. If you have any thoughts or questions, reach out to us on Twitter @PeriscopeData or hit up friends@segment.com. We’d love to discuss.

We’re also offering a 30 day trial for Segment Redshift customers. To get started, swing on over to the Segment Redshift access page and then sign up for Periscope.

The State of Personalization 2023

Our annual look at how attitudes, preferences, and experiences with personalization have evolved over the past year.

Get the report
SOP 2023

The State of Personalization 2023

Our annual look at how attitudes, preferences, and experiences with personalization have evolved over the past year.

Get the report
SOP 2023

Share article

Want to keep updated on Segment launches, events, and updates?

We’ll share a copy of this guide and send you content and updates about Twilio Segment’s products as we continue to build the world’s leading CDP. We use your information according to our privacy policy. You can update your preferences at any time.