If you don’t trust your data, it’s useless. This is just one of many helpful nuggets our customer Fareed Mosavat, Senior Growth PM at Instacart, shared with us this week.
If you’re not familiar, Instacart is an awesome service for delivering groceries and is leading the on-demand economy with crazy growth.
We took some time with Fareed to discuss challenges that face most data-driven product and growth teams. To Fareed, understanding user behavior is essential for devising, running, and interpreting experiments. And making data easily accessible in a raw format is the only way the company can measure and achieve its specific goals.
In the Q&A, Fareed covers:
How the growth team operates at Instacart
When out-of-the-box analytics tools work, and when they fall short
Why row-level event data is vital for data confidence and advanced analysis
How the team discovered that building a data pipeline to Redshift is harder than it looks
Why combining data sources into a single database is the “Holy Grail”
Dive into the interview below, or get the PDF.
Growth Team at Instacart
Diana: Fareed! Thanks for chatting with us today. Why don’t we start with you telling us a little bit about your role and responsibilities at Instacart.
Fareed: Sure, Diana! I’m the growth product manager. Our team is responsible for consumer growth, so growing our user base, retaining them and activating them through the whole funnel.
Diana: Not a small task! Are you a part of the product team?
Fareed: Yes, it’s a multidisciplinary team, but it’s a product team. We’ve got a designer, a couple of engineers and myself. And then we work closely with a bunch of other teams, like analytics and marketing.
Diana: What’s your main focus right now on the growth team?
Fareed: Number one is just making sure that we have everything that our users are doing recorded, measured, and in a good place. Having our data in order is the only way we can make good product decisions, and Segment helps a lot with that.
Then the second thing we’re really focused on is first-time user activation. So, figuring out what are people doing in their first session, why are they dropping off, when are they dropping off, how can we help them get to their first order.
We know there’s a lot of real-world stuff that happens after their first experience, like quality of service and fulfillment, that are sort of outside of our team’s control. So we’re focusing on getting people that first wonderful experience as quickly as possible, working across the mobile apps and the website.
Switching to SQL
Diana: So I know you’re using Segment Warehouses to load your user data into Amazon Redshift. Why did you want to do this type of analysis in SQL rather than in something like Google Analytics or Mixpanel?
Fareed: Yeah, so we’ve been using Amplitude through Segment for a bunch of stuff, and it’s super helpful. Amplitude helped us look at aggregate numbers, counts, funnels, analyze user segments, and understand how many people are taking certain actions in our product.
But I think there are a couple other reasons why the row-level individual event data, whether it be SQL or somewhere else, is super important.
One is that SQL makes tracking easier to debug. I can watch events fly by in the Segment debugger, but if you have a lot of data, it’s hard to catch everything. We have a very specific taxonomy, rules, and event names, and all of it needs to be correct. With SQL, we can easily diagnose issues like when we forget to pass important traits like userID.
SELECT * FROM checkout_placed_order WHERE userID IS NULL AND platform = 'ios'
The second is we have a couple of self-defined metrics that are important and specific to the company. And those things tend to be buried in a database somewhere, usually in SQL and sometimes outside of this event data. Being able to merge those metrics and that analysis with our event analytics is really important.
The Holy Grail: One Database. All of the Data.
Diana: What are some of those metrics?
Fareed: Things like quality of service, refunds, consumer support, how much people spend.
We have a lot of steps in our funnel for each order, some of which are online and some of which are not. They are all recorded somewhere, but they happen in different places in the process. The Holy Grail we’re working towards is getting all of the data into one place.
Combining this fulfillment and shipping data with our Segment user behavior data is key for us to connect the dots across the entire customer experience.
With Segment Warehouses, we can put all of the data about how customers are using our apps and websites right into our own Redshift alongside this other data to query as we want.
Diana: What are some of the questions that you’re querying across your Segment data and the internal data to answer?
Fareed: The big one is A/B testing and understanding the behavior of users in one test group versus another. We’re currently working towards removing signup from the onboarding flow and making it part of checkout. There isn’t a clear event there, so we need to be able to watch an anonymous user from beginning to end and see their conversion.
We’re measuring what percentage of users with zero orders check out on the same day or within seven days, and we want to be able to use any window we want. Those kinds of things are a little bit easier to define as a set of rules in SQL than to try and manipulate a UI to give us exactly what we want.
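To make the "any window we want" idea concrete, here is a minimal sketch of a conversion-window query. It is an illustration, not Instacart's actual query: the tables `first_visits` and `orders`, their columns, and the demo rows are all hypothetical, and it runs against an in-memory SQLite database so it works anywhere. On Redshift, the `julianday` subtraction would likely be replaced with `DATEDIFF`.

```python
import sqlite3

# Hypothetical schema (not from the interview):
#   first_visits(user_id, first_seen_on)  one row per visitor
#   orders(user_id, ordered_on)           one row per order
# Dates are stored as YYYY-MM-DD text.
CONVERSION_SQL = """
SELECT AVG(CASE
             WHEN o.first_order_on IS NOT NULL
              AND julianday(o.first_order_on) - julianday(v.first_seen_on) <= :window_days
             THEN 1.0 ELSE 0.0
           END)
FROM first_visits AS v
LEFT JOIN (
    SELECT user_id, MIN(ordered_on) AS first_order_on
    FROM orders
    GROUP BY user_id
) AS o ON o.user_id = v.user_id
"""


def conversion_rate(conn, window_days):
    """Share of visitors whose first order lands within window_days of their first visit."""
    return conn.execute(CONVERSION_SQL, {"window_days": window_days}).fetchone()[0]


def build_conversion_demo():
    """Create an in-memory database with a few made-up visitors and orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE first_visits (user_id TEXT PRIMARY KEY, first_seen_on TEXT)")
    conn.execute("CREATE TABLE orders (user_id TEXT, ordered_on TEXT)")
    conn.executemany(
        "INSERT INTO first_visits VALUES (?, ?)",
        [("u1", "2015-06-01"), ("u2", "2015-06-01"),
         ("u3", "2015-06-02"), ("u4", "2015-06-03")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("u1", "2015-06-01"),   # same-day order
         ("u2", "2015-06-05"),   # four days after first visit
         ("u3", "2015-06-20")],  # eighteen days after; u4 never orders
    )
    return conn


if __name__ == "__main__":
    demo = build_conversion_demo()
    for days in (0, 7, 30):
        print(days, conversion_rate(demo, days))
```

Because the window is just a query parameter, switching from same-day to seven-day conversion is a one-number change rather than a new report in a UI.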
The second is defining metrics specific to Instacart. Let’s take visitors for example. We have different definitions for visitors: landing page visitors, storefront visitors, and visitors per region. You might be able to figure this out in out-of-the-box tools, but to really trust it you have to know exactly how it was defined. While these out-of-the-box options are great for quick analysis, they tend to be a little bit opaque for understanding session measurement, new vs. returning users, and stuff like that.
Third is making sense of multi-touch attribution. We find that users will visit a couple of times before they actually place an order. Previously, we were only attributing that to the last click. But now, because we have all the data, we’re actually able to do a longer attribution cycle and understand how many touches a user had before they actually convert. It’s already been really helpful, but it will become even more important over time as we understand what our marketing ROI looks like and can maximize results from our spend.
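Counting touches before conversion is the sort of question that row-level data makes straightforward. The sketch below is one possible way to do it, assuming hypothetical `touches` and `orders` tables with ISO-8601 text timestamps. The table names, columns, and sample rows are invented for illustration and run in SQLite rather than Redshift.

```python
import sqlite3

# Hypothetical schema (not from the interview):
#   touches(user_id, channel, touched_at)  one row per visit or marketing touch
#   orders(user_id, ordered_at)            one row per order
# Timestamps are ISO-8601 text, so string comparison matches time order.
TOUCHES_SQL = """
WITH first_orders AS (
    SELECT user_id, MIN(ordered_at) AS first_order_at
    FROM orders
    GROUP BY user_id
)
SELECT t.user_id, COUNT(*) AS touches_before_order
FROM touches AS t
JOIN first_orders AS o ON o.user_id = t.user_id
WHERE t.touched_at <= o.first_order_at
GROUP BY t.user_id
ORDER BY t.user_id
"""


def touches_before_first_order(conn):
    """Map each converting user to the number of touches up to their first order."""
    return dict(conn.execute(TOUCHES_SQL).fetchall())


def build_attribution_demo():
    """Create an in-memory database with a few made-up touches and orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE touches (user_id TEXT, channel TEXT, touched_at TEXT)")
    conn.execute("CREATE TABLE orders (user_id TEXT, ordered_at TEXT)")
    conn.executemany(
        "INSERT INTO touches VALUES (?, ?, ?)",
        [("u1", "email", "2015-06-01T09:00"),
         ("u1", "paid_search", "2015-06-02T09:00"),
         ("u1", "direct", "2015-06-03T09:00"),
         ("u1", "email", "2015-06-10T09:00"),        # after the first order, so ignored
         ("u2", "paid_search", "2015-06-04T10:00"),
         ("u3", "email", "2015-06-05T08:00")],       # u3 never orders
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("u1", "2015-06-03T09:30"), ("u2", "2015-06-04T10:20")],
    )
    return conn


if __name__ == "__main__":
    print(touches_before_first_order(build_attribution_demo()))
```

A last-click model would credit only `direct` for the first user here, while the touch count shows that two earlier visits came before that order.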
On Building a Data Pipeline
Diana: I heard you were building your own Redshift pipeline for a bit before you chose to go with Warehouses. What made you make the switch?
Fareed: We’ve tried a lot of different things here. I think the biggest reason we use Segment is because it gives us the most portability, so we can use whatever services we want and still be able to keep our instrumentation clean and in a single place. So, we have used a plethora of things.
Our team has played around with S3. The shopper team is using Outbound, we’re using Amplitude, we’ve tried Mixpanel, Google Analytics, tag manager, etc. We try all this stuff, and it’s just flipping switches.
So with Redshift, it was another thing like that where we said to ourselves, “We can bake it off against our own internal system or just turn it on with Segment.” Our main goal was finding something sustainable long term without sacrificing the ability to get up and running quickly.
Going from schema-less data to a schema-style SQL database turns out to be harder than it looks. With a team of engineers and a number of months, we could build something, but it would take a long time and a lot of work to be as fully featured or scalable as Segment Warehouses.
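A small sketch hints at why this step is harder than it looks. The code below flattens schema-less JSON-style events into fixed columns. It is only an illustration and says nothing about how Segment Warehouses is built: the function names, the underscore naming rule, and the sample events are assumptions made up for this example.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining nested keys with underscores."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def infer_columns(events):
    """Union of all flattened keys across events, in first-seen order."""
    columns = []
    for event in events:
        for name in flatten(event):
            if name not in columns:
                columns.append(name)
    return columns


def to_rows(events, columns):
    """Turn each event into a fixed-width tuple, using None where a key is missing."""
    return [tuple(flatten(event).get(col) for col in columns) for event in events]


if __name__ == "__main__":
    # Made-up events: the second one adds a property the first never sent.
    events = [
        {"event": "Order Placed", "userId": "u1", "properties": {"total": 42.5}},
        {"event": "Order Placed", "userId": None,
         "properties": {"total": 10, "coupon": "FREE"}},
    ]
    columns = infer_columns(events)
    print(columns)
    for row in to_rows(events, columns):
        print(row)
```

Even this toy version has to cope with a column appearing partway through the data and with a missing `userId`. A production pipeline would also need to handle type conflicts, schema changes on live tables, and load performance, none of which this sketch attempts.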
Building data pipelines is not our primary job. But we do need the data in SQL. Luckily, other people have already solved this problem for us.
Plus, it was really important for this data to be in our own Redshift, which Segment could do for us. At the end of the day this is our data, right? Our users took these actions, they did their thing, and it’s important that no matter what we choose from a vendor standpoint that we own this data and that it exists with us.
Thanks so much to Fareed for chatting with us about the growth team at Instacart, getting your data squeaky clean and all into one place to analyze.
If you’re curious about how Segment Warehouses can help you load your web and mobile data into Redshift or Postgres without writing a line of ingestion code, you can learn more here!