StartupMarketplaceMobileMediaB2BRetailAny Industry

Using Hex, Python, Segment, and AWS S3 for easy data visualization and collaboration

Perform analysis and create visualizations of your first-party customer event data, using this recipe which combines Hex’s charting capabilities with data stored in an AWS S3 bucket (a popular Segment destination). You’ll also be able to publish and share your data work, allowing you to collaborate with your colleagues.

Charlie Gerrior

Made by Charlie Gerrior

Segment customers who collect first-party customer data often want to be able to analyze it and to build visualizations. In this recipe, we share an easy and interactive way for Segment users to share their work with their colleagues without sending screenshots back and forth. We do so by leveraging Hex, a little bit of Python code, and Segment’s AWS S3 destination for easy data visualization and collaboration.

With Segment, you can collect, transform, send, and archive your first-party customer data. Segment simplifies the process of collecting data and connecting new tools, allowing you to spend more time using your data, and less time trying to collect it. A very popular destination for your first-party customer data is AWS S3. AWS S3 is is an object storage service offering industry-leading scalability, data availability, security, and performance.

With your data stored in AWS S3, you can now use it within Hex. Hex is a modern data workspace which makes it easy to:

  • connect to data

  • analyze it in collaborative SQL and Python-powered notebooks

  • share your work as interactive data apps and stories

Step 1 - Add an AWS S3 Destination

Follow the documentation here to add an AWS S3 Destination to your Segment workspace and connect it to an existing source. Make note of your bucket’s name. Once set up, verify that events are flowing into your S3 bucket (you may have to wait ~1 hour for files to appear). Also take note of the path used by Segment in your S3 bucket.

Step 2 - Create/Retrieve API key(s) from AWS

In order for Hex to read data from the S3 bucket previously set up as a Destination in your Segment workspace, you’ll need to generate a new set of AWS API credentials. Once you open the AWS Console, click on the following in this order:

  1. Your username near the top right and select My Security Credentials

  2. Users in the sidebar

  3. Your username

  4. Security Credentials tab

  5. Create Access Key

  6. Show User Security Credentials

Step 3 - Create a new project in Hex

  1. As a first step, create Secrets for the two AWS credentials: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. Additionally, create a Secret for the name of your S3 bucket

    1. To create the Secrets, navigate to the Variables tab and hit the + Add button. In this example, the Secrets are named

      aws_access_key_id, aws_secret_access_key, and s3_bucket_name

  2. Add a Python cell and copy in this code...

  3. In order to visualize our results, we’ll add a Table cell using df_froms3 as the dataframe

  4. Because we’re only interested in Track events, we’ll now add a Filter cell, with source set to df_froms3.

    1. Add a filter where type is equal to track

    2. Save the result as filter_result (the default)

  5. Now that our data has been cleansed, we will add a Chart cell using filter_result for our Data.

    1. Let’s analyze how frequently each Track event appears in our data. To do so, set:

      1. Mark to Bar

      2. X-axis to event

      3. Y-axis to sent_at or any other field

        1. set Type to Quantitative

        2. set Aggregate to Count

      4. The resulting chart should look something like this:


(Optional) Step 4 - Publish/share your work

  1. You can make you work available to others in your organization by publishing and sharing it.

  2. Here is the published app used in this recipe.

Wrapping up

In this recipe, we learned how to set up an AWS S3 destination in Segment, send our event data there, and then read that data using Hex (and some Python). In Hex, we ultimately created a chart to show how frequently each type of event appears in our data set. This work can then be shared and published, allowing our colleagues to easily collaborate with us.

Getting started is easy

Start connecting your data with Segment.