Segment enables you to collect clean, compliant customer data and sync it to data warehouses like AWS Redshift without needing to write code or maintain data pipelines. This treasure-trove of data can then be leveraged by anyone to unlock further insights.
One of the key methods in use today to unlock insights is to build predictive analytics based on machine learning models. However, these projects often require a large lift: writing data pipelines, spinning up expensive data science infrastructure, and staffing data science DevOps teams. In this recipe, we will learn how to build machine learning models without that lift and without extensive machine learning experience, and how to deliver predictive analytics to every business unit and group in your company instantly. This recipe is a good fit for data analysts, but even experienced machine learning experts can use it to unlock predictive analytics faster and more cheaply.
In this particular example, we will be predicting the lifetime value (LTV) of a customer on an e-commerce store.
We start by capturing customer data in Segment. To get started, please follow these steps:
Log in to Segment or sign up for a new workspace.
Create a source for each web, mobile, or server application that holds eCommerce and (optionally) email lifecycle events.
Make sure to collect as much of the eCommerce Spec as possible across all channels. The more data we receive, the better our ML predictions will be.
Navigate to Profiles → Profiles Settings.
Click ‘Sources’ and ensure these sources are flowing.
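As a rough illustration of what these sources send, the events can be sketched as plain track payloads. The event names `Product Added` and `Order Completed` come from Segment's eCommerce Spec, and `Email Link Clicked` is an email lifecycle event. The helper `build_track_event`, the user ID, and all property values are hypothetical. A real source would send these through a Segment library rather than build dictionaries by hand.

```python
from datetime import datetime, timezone


def build_track_event(user_id, event, properties):
    """Return a dict shaped like a Segment track call (illustrative only)."""
    return {
        "type": "track",
        "userId": user_id,
        "event": event,
        "properties": properties,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Sample values below are assumptions, not data from a real workspace.
events = [
    build_track_event("user_123", "Product Added",
                      {"product_id": "sku_1", "price": 19.99, "quantity": 1}),
    build_track_event("user_123", "Order Completed",
                      {"order_id": "order_1", "total": 19.99, "currency": "USD"}),
    build_track_event("user_123", "Email Link Clicked",
                      {"email_id": "welcome_1"}),
]

for e in events:
    print(e["event"], e["properties"])
```

Using the same event names on web, mobile, and server sources is what lets Segment aggregate the events across channels later.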
Every ML model requires training data which is composed of features (e.g. user traits or events that we think affect the prediction) and the actual value (e.g. user’s actual current LTV). By leveraging this data, an ML model will be able to predict the output value based on the same features for new users.
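To make the split between features and the target value concrete, here is a minimal sketch in plain Python. The rows, the field names (`email_clicks`, `cart_adds`, `ltv`), and the single-feature least-squares fit are all assumptions chosen for illustration. They are not the model this recipe builds, and a real model would use all the features.

```python
# Hypothetical training rows: two features plus the known target (current LTV).
training_rows = [
    {"email_clicks": 2, "cart_adds": 1, "ltv": 30.0},
    {"email_clicks": 5, "cart_adds": 2, "ltv": 50.0},
    {"email_clicks": 4, "cart_adds": 3, "ltv": 70.0},
    {"email_clicks": 9, "cart_adds": 4, "ltv": 90.0},
]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# For brevity, train on one feature only (cart_adds) against the target (ltv).
xs = [row["cart_adds"] for row in training_rows]
ys = [row["ltv"] for row in training_rows]
slope, intercept = fit_line(xs, ys)

# Predict LTV for a new user whose actual LTV is not yet known.
new_user = {"email_clicks": 3, "cart_adds": 5}
predicted_ltv = slope * new_user["cart_adds"] + intercept
print(predicted_ltv)
```

The new user supplies the same features as the training rows but no `ltv` value, which is the quantity the model estimates.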
Most commonly, training data is delivered in a raw format to the machine or data pipeline where it is then transformed and eventually used to compile an ML prediction model.
In our case, we will simply set up Segment Computed Traits and connect them to Redshift, which has several key advantages:
Computed Traits are unified based on multiple identifiers. For example, if one of our features is an email click count and the user has multiple emails, we will correctly add up clicks across the multiple emails and sync as one attribute. Or, if the user is adding products to the cart in both the website and mobile app, then we will add them up together and sync them as one attribute.
There is no need to build your own data pipeline. Just point and click to enable syncing.
Computed Traits come with predefined data types, so there is no need to do data preprocessing or quality checks.
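The first advantage, unification across identifiers, can be sketched as follows. This is a conceptual illustration only and does not show how Segment implements identity resolution. The identity map, the sample events, the trait names, and the function `compute_traits` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical identity map: every known identifier points to one unified profile.
identifier_to_profile = {
    "jane@work.example": "profile_1",
    "jane@home.example": "profile_1",
    "device_abc": "profile_1",
    "sam@mail.example": "profile_2",
}

# Hypothetical raw events, each tagged with whichever identifier was seen.
events = [
    ("jane@work.example", "Email Link Clicked"),
    ("jane@home.example", "Email Link Clicked"),
    ("jane@home.example", "Email Link Clicked"),
    ("device_abc", "Product Added"),
    ("jane@work.example", "Product Added"),
    ("sam@mail.example", "Email Link Clicked"),
]

# Which counter each event name feeds (trait names are assumptions).
TRAIT_FOR_EVENT = {
    "Email Link Clicked": "email_click_count",
    "Product Added": "cart_add_count",
}


def compute_traits(events, identifier_to_profile):
    """Count events per unified profile rather than per raw identifier."""
    traits = defaultdict(lambda: defaultdict(int))
    for identifier, event_name in events:
        profile = identifier_to_profile.get(identifier)
        trait = TRAIT_FOR_EVENT.get(event_name)
        if profile is None or trait is None:
            continue  # Skip unknown identifiers and events we do not count.
        traits[profile][trait] += 1
    return {profile: dict(counts) for profile, counts in traits.items()}


traits = compute_traits(events, identifier_to_profile)
print(traits)
```

Clicks from both of Jane's email addresses land on the same profile, so each user syncs as a single row with one value per trait.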
We will be setting up three computed traits: (a) the user’s current LTV, (b) how many times they have clicked emails, and (c) how many times they have added a product to the cart.
Log in to Segment and follow these steps:
Choose "Engage" and then "Computed Traits".
Create a new Computed Trait.