Maggie Yu on May 14th 2021

At Segment, we rely heavily on custom-built, internal tools for running ad-hoc tasks and debugging problems in our infrastructure. These are run not just by the engineers doing product development but also by our support teams, who use them to efficiently investigate and resolve customer-reported issues.

This internal tooling has been especially critical for maintaining the Segment Data Lakes product, which, as described in our previous blog post, loads data into a customer’s S3 bucket and Glue Catalog via EMR jobs. 

Because the latter infrastructure is all in a customer’s AWS account and not a Segment-owned one, configuring, operating, and debugging issues in this product is a bit trickier than it otherwise would be for something running completely in Segment infrastructure.

One of the most common problems customers have when trying to set up Segment Data Lakes is missing permissions in their IAM policies. So, one of our tools tests if Segment’s role is allowed to access the target S3 bucket, Glue Catalog, and EMR cluster in the customer account. Another common tooling task is to submit EMR jobs to replay archived data into Segment Data Lakes, so customers can bootstrap the latter from historical data.

Since the people running these internal tools can access a customer’s resources, security is extremely important. Hence, all the security best practices that apply for sensitive production systems need to apply for our Data Lakes tooling as well. Among other examples, the access to the tools should be limited to a small group of people, and all the operations performed using the tools need to be tracked and auditable.

We evaluated multiple approaches to creating and securing these tools, but ultimately settled on using AWS’s API Gateway product. In this blog post, we’ll explain how we integrated the latter with the Data Lakes backend, and how we used IAM authorization and API Gateway resource policies to tighten up access control. We will also provide some examples of setting up the associated infrastructure in Terraform.

Design Evolution

Data Lakes tooling has two components – a CLI client and an RPC service. Each CLI command issues a request to the RPC service, which can then access the associated customer resources.

When Segment Data Lakes was launched, only the Data Lakes engineers could access and use the CLI tool. This was fine for the initial launch, but in order to scale the product, we realized that we needed to allow wider access inside Segment. We thus set out to design some improvements that would allow us to open up this access in a safe, controlled manner.

Our initial design depended on SAML to authenticate users, which is how several other, internal tools at Segment address this problem. Segment uses the Okta-AWS SAML integration to manage and assign access to AWS IAM roles, and then we use aws-okta to authenticate with AWS and run tooling commands with the resulting credentials.

The SAML approach would allow us to verify that the operator of our tools is an authenticated and authorized Segment employee. However, this would add a lot of complexity because we’d need to create and maintain the code for both the CLI client and the associated SAML auth flow. 

Just as we were about to implement the SAML solution, AWS released several new features for their AWS API Gateway product, including improved support for IAM-role-based API authentication. Since Segment was already using IAM roles to delegate access to Segment employees, it seemed fitting to also control access to the Data Lakes APIs using IAM-based policies.

With this solution, we could offload responsibility for authentication and authorization to an external service and spend more time focusing on just the Data Lakes aspects of our tooling. 

After working on a POC to validate the setup would work, we officially decided to lean into the AWS API Gateway with IAM authorization instead of supporting a SAML flow in our code.

Before getting into the details of how we integrated AWS API Gateway into our design, let’s review how this product works and the various features it offers.

What is AWS API Gateway?

AWS API Gateway sits between API clients and the services that back these APIs. It supports integrations with various backend types including AWS Lambda and containerized services running in ECS or EKS. The product also provides a number of features to help maintain and deploy APIs, for instance managing and routing traffic, monitoring API calls, and controlling access from a security standpoint.

API Gateway supports two types of APIs – HTTP APIs and REST APIs. Many features are available for both, but some are exclusive to just one or the other. Before deciding on which type of API Gateway best fits your use case, it’s important to understand the differences between the two. Check out this guide in the AWS documentation library for a good summary.

Important API Gateway Concepts

Here are some API Gateway concepts that will be mentioned in this blog post and that you may come across when you set up your own AWS API Gateway.

Resource policies

IAM policies are attached to IAM users, groups or roles whereas API Gateway resource policies are attached to resources. You can use IAM policies and/or resource policies to define who can access the resource and what actions can be performed.

Proxy integration

A proxy integration allows requests to be proxied to a backing service without intermediate body transformations. If a proxy integration is used, only the headers, path parameters, and query string parameters on each request can be modified. If a proxy integration is not used, mapping templates can also be defined to transform request bodies.

Proxy resource

A proxy resource allows bundling access to multiple resources via a greedy path parameter, {proxy+}. With a proxy resource, you don’t need to define a separate resource for each path. This is especially useful if all the resources are backed by the same backing service. For more details, refer to this AWS doc.

Stage

A Stage in API Gateway is a named reference to a deployment. You can create and manage multiple stages for each API, e.g. alpha, beta and production. Each stage can be integrated with different backend services.

Why we chose a REST API over an HTTP one

In our case, we really wanted to use resource policies to define access control, and this feature is only available in REST APIs. HTTP APIs are cheaper than REST APIs, but since we expected low request volumes from our tooling, total monthly cost wouldn’t be a concern. Therefore, we decided to go with a REST API.

If you need to integrate a REST API with an ALB, or if you want to use API Gateway to handle API access control, you may experience some challenges. In the following sections, we share how we addressed these issues.

Setting up a REST API Gateway

ALB Integration

Our backend tooling service resides in a private VPC and sits behind an ALB. HTTP APIs support integrating with an ALB directly, but REST APIs do not (see comparisons between HTTP APIs and REST APIs). To integrate a REST API with an ALB, you need to place an NLB between the two. 

The setup here involves two steps:

  1. NLB → ALB Integration

  2. REST API Gateway → NLB Integration

NLB → ALB Integration

The first step is to create an NLB that can forward traffic to the ALB.

This AWS blog explains how to set up this integration using a Lambda function, and it includes the Lambda function code and a CloudFormation template that you can use for the setup.

REST API Gateway → NLB Integration

The next step is to link the REST API to the NLB.

This can be achieved by first creating a VPC link to the NLB, and then using this VPC link in the API Gateway method integration. This guide from AWS provides instructions for integrating an API with a VPC link to an NLB.

We set up our API using both a proxy integration and a proxy resource, so if a new endpoint is introduced or our API schema evolves, we don't need to update the API Gateway configuration.

Terraform example

We use Terraform to organize and manage our AWS infrastructure at Segment. The following is an example of integrating a REST API with an NLB, and creating a proxy integration with a proxy resource:
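
A minimal sketch of what this setup can look like (the resource names and the variables var.nlb_arn and var.nlb_dns_name are illustrative placeholders, not our production configuration):

    # VPC link that lets API Gateway reach the NLB in our VPC
    resource "aws_api_gateway_vpc_link" "tooling" {
      name        = "tooling-vpc-link"
      target_arns = ["${var.nlb_arn}"]
    }

    resource "aws_api_gateway_rest_api" "tooling" {
      name = "data-lakes-tooling"
    }

    # Proxy resource: the greedy path parameter {proxy+} matches every
    # sub-path, so we don't define one API Gateway resource per endpoint
    resource "aws_api_gateway_resource" "proxy" {
      rest_api_id = "${aws_api_gateway_rest_api.tooling.id}"
      parent_id   = "${aws_api_gateway_rest_api.tooling.root_resource_id}"
      path_part   = "{proxy+}"
    }

    resource "aws_api_gateway_method" "proxy" {
      rest_api_id   = "${aws_api_gateway_rest_api.tooling.id}"
      resource_id   = "${aws_api_gateway_resource.proxy.id}"
      http_method   = "ANY"
      authorization = "AWS_IAM" # IAM authorization, covered below

      request_parameters = {
        "method.request.path.proxy" = true
      }
    }

    # Proxy integration: requests are forwarded to the NLB unmodified
    resource "aws_api_gateway_integration" "proxy" {
      rest_api_id = "${aws_api_gateway_rest_api.tooling.id}"
      resource_id = "${aws_api_gateway_resource.proxy.id}"
      http_method = "${aws_api_gateway_method.proxy.http_method}"

      type                    = "HTTP_PROXY"
      integration_http_method = "ANY"
      uri                     = "http://${var.nlb_dns_name}/{proxy}"
      connection_type         = "VPC_LINK"
      connection_id           = "${aws_api_gateway_vpc_link.tooling.id}"

      request_parameters = {
        "integration.request.path.proxy" = "method.request.path.proxy"
      }
    }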

Authorization & access control

To ensure only authorized users can access our tooling service, we use a combination of IAM Authorization and API Gateway resource policies. In the following sections, we go into more detail about what these are and how we configured them for our tool.

IAM Authorization

API Gateway supports enabling IAM authorization for its APIs. Once this is enabled, API requests must be signed with AWS security credentials (AWS doc). If a request is not signed (i.e., made with an anonymous identity), it will fail. Otherwise, API Gateway allows or denies the request based on the IAM policies and/or resource policies defined. This AWS guide explains how API Gateway decides whether to allow or deny based on the combination of an IAM policy and a resource policy.

Resource Policy

An IAM policy and/or resource policy can be used to specify who can access a resource and what actions can be performed, but we use only the latter for our use case. 

At Segment, we have multiple roles in the same AWS account, e.g. a read role with a ReadOnlyAccess policy, an admin role with an AdministratorAccess policy, etc. Some policies grant execute-api permission, which is required for invoking APIs in API Gateway, and some do not. For instance, the AWS managed policy AdministratorAccess allows execute-api for all resources. 

If you want to restrict execute-api permission to a specific role api-invoke-role, you need to attach an IAM policy that denies execute-api to all of the other roles that have this permission, or explicitly deny those roles in the resource policy. 

For the former approach, you would create a policy to deny invoking execute-api against your API Gateway and then attach this policy to the roles that should not be allowed to invoke the APIs. Here’s an example policy:
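
Something along these lines (the account ID, region, and API ID are placeholders), attached to every role that should not have access:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DenyToolingAPIInvoke",
          "Effect": "Deny",
          "Action": "execute-api:Invoke",
          "Resource": "arn:aws:execute-api:us-west-2:123456789012:abcdef1234/*"
        }
      ]
    }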

For the latter approach, you would list all the roles that should not be allowed to invoke the APIs in the resource policy. The following is an example:
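
Roughly like this (again with placeholder ARNs), where every role that should be blocked has to be listed explicitly:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": "*",
          "Action": "execute-api:Invoke",
          "Resource": "arn:aws:execute-api:us-west-2:123456789012:abcdef1234/*"
        },
        {
          "Effect": "Deny",
          "Principal": {
            "AWS": [
              "arn:aws:iam::123456789012:role/read",
              "arn:aws:iam::123456789012:role/admin"
            ]
          },
          "Action": "execute-api:Invoke",
          "Resource": "arn:aws:execute-api:us-west-2:123456789012:abcdef1234/*"
        }
      ]
    }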

If there are many roles in the account and new roles are created over time, it can be very tedious to create these policies and keep them all up-to-date. This is even worse if there are multiple APIs in the same account and each has a separate set of allowed IAM roles.

As an alternative to this, you can create a resource policy that explicitly allows access to a subset of roles while blocking access to everything else by default.

This is the approach that we decided to go with:
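
A sketch of that policy (placeholder ARNs; the deny statement uses a condition on aws:PrincipalArn so any caller other than api-invoke-role is rejected, even if their IAM policy grants execute-api):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::123456789012:role/api-invoke-role"
          },
          "Action": "execute-api:Invoke",
          "Resource": "arn:aws:execute-api:us-west-2:123456789012:abcdef1234/*"
        },
        {
          "Effect": "Deny",
          "Principal": "*",
          "Action": "execute-api:Invoke",
          "Resource": "arn:aws:execute-api:us-west-2:123456789012:abcdef1234/*",
          "Condition": {
            "StringNotLike": {
              "aws:PrincipalArn": "arn:aws:iam::123456789012:role/api-invoke-role"
            }
          }
        }
      ]
    }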

Terraform example

The following is an example of implementing the above approach in Terraform:
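
One way to express this (a sketch under the same placeholder names, rendering the policy with aws_iam_policy_document and attaching it via the REST API's policy argument):

    data "aws_iam_policy_document" "tooling_api" {
      # Explicitly allow the tooling role...
      statement {
        effect  = "Allow"
        actions = ["execute-api:Invoke"]

        principals {
          type        = "AWS"
          identifiers = ["arn:aws:iam::123456789012:role/api-invoke-role"]
        }

        # API Gateway expands this shorthand to the API's own ARN
        resources = ["execute-api:/*"]
      }

      # ...and deny everyone else, regardless of their IAM policies
      statement {
        effect  = "Deny"
        actions = ["execute-api:Invoke"]

        principals {
          type        = "AWS"
          identifiers = ["*"]
        }

        resources = ["execute-api:/*"]

        condition {
          test     = "StringNotLike"
          variable = "aws:PrincipalArn"
          values   = ["arn:aws:iam::123456789012:role/api-invoke-role"]
        }
      }
    }

    # The same REST API resource as in the earlier example,
    # now with the resource policy attached
    resource "aws_api_gateway_rest_api" "tooling" {
      name   = "data-lakes-tooling"
      policy = "${data.aws_iam_policy_document.tooling_api.json}"
    }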

Access logging

Another important security requirement is to keep an audit trail of the requests made to the API Gateway. Each record should contain the caller identity and the details of the request that was made. 

API Gateway has a built-in integration with CloudWatch that makes it easy to send these access logs there. You can select a log format (CLF, JSON, XML, or CSV) and specify what information should be logged.

In addition to using CloudWatch, we also created a Segment source for the audit logs. Our backend service issues a track call for every request, and all the track events are synced into our warehouse. With this, we didn’t need to implement a connector to export the data from CloudWatch and load it into our warehouse.
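
For illustration, a track call from a Go service using Segment's analytics-go library might look like this (the write key, event name, and properties here are hypothetical, not our actual audit schema):

    package main

    import (
        "gopkg.in/segmentio/analytics-go.v3"
    )

    func main() {
        // Write key for the internal audit-log Segment source (placeholder)
        client := analytics.New("WRITE_KEY")
        defer client.Close()

        // One track call per tooling API request; these events are synced
        // into the warehouse alongside the CloudWatch access logs
        client.Enqueue(analytics.Track{
            UserId: "arn:aws:iam::123456789012:role/api-invoke-role", // caller identity
            Event:  "Tooling API Request",                            // hypothetical event name
            Properties: analytics.NewProperties().
                Set("path", "/emr/status").
                Set("method", "POST"),
        })
    }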

Terraform example

The following shows how to configure CloudWatch audit logs for an API Gateway stage. Note that you need to create an IAM role that allows API Gateway to write to CloudWatch Logs:
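
A sketch of the relevant pieces (the log group name, retention period, and JSON format string are illustrative):

    # Account-level setting: the role API Gateway assumes to write to CloudWatch
    resource "aws_api_gateway_account" "this" {
      cloudwatch_role_arn = "${aws_iam_role.api_gateway_cloudwatch.arn}"
    }

    data "aws_iam_policy_document" "apigw_assume" {
      statement {
        actions = ["sts:AssumeRole"]

        principals {
          type        = "Service"
          identifiers = ["apigateway.amazonaws.com"]
        }
      }
    }

    resource "aws_iam_role" "api_gateway_cloudwatch" {
      name               = "api-gateway-cloudwatch"
      assume_role_policy = "${data.aws_iam_policy_document.apigw_assume.json}"
    }

    resource "aws_iam_role_policy_attachment" "api_gateway_cloudwatch" {
      role       = "${aws_iam_role.api_gateway_cloudwatch.name}"
      policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"
    }

    resource "aws_cloudwatch_log_group" "api_access_logs" {
      name              = "data-lakes-tooling-access-logs"
      retention_in_days = 365
    }

    # JSON-formatted access logs capturing the caller identity and request details
    resource "aws_api_gateway_stage" "production" {
      rest_api_id   = "${aws_api_gateway_rest_api.tooling.id}"
      deployment_id = "${aws_api_gateway_deployment.tooling.id}"
      stage_name    = "production"

      access_log_settings {
        destination_arn = "${aws_cloudwatch_log_group.api_access_logs.arn}"
        format          = "{\"requestId\":\"$context.requestId\",\"caller\":\"$context.identity.caller\",\"user\":\"$context.identity.userArn\",\"sourceIp\":\"$context.identity.sourceIp\",\"requestTime\":\"$context.requestTime\",\"httpMethod\":\"$context.httpMethod\",\"resourcePath\":\"$context.resourcePath\",\"status\":\"$context.status\"}"
      }
    }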

Signing requests

With IAM authorization enabled, all API requests need to be signed with an AWS access key so AWS can identify the caller. There are multiple versions of the signing process, with v4 being the latest (and recommended) one. The various AWS SDKs provide helper functions that handle the low-level details of this process.

Here’s an example using the AWS SDK for go:
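
A sketch of what this looks like with the v4 signer from aws-sdk-go (the API URL, path, and request body are placeholders):

    package main

    import (
        "bytes"
        "fmt"
        "net/http"
        "time"

        "github.com/aws/aws-sdk-go/aws/session"
        v4 "github.com/aws/aws-sdk-go/aws/signer/v4"
    )

    func main() {
        // Load AWS credentials from the environment/shared config
        sess := session.Must(session.NewSession())
        signer := v4.NewSigner(sess.Config.Credentials)

        body := bytes.NewReader([]byte(`{"clusterId": "j-XXXXXXXXXXXXX"}`))
        req, err := http.NewRequest(
            "POST",
            "https://abcdef1234.execute-api.us-west-2.amazonaws.com/production/emr/status",
            body,
        )
        if err != nil {
            panic(err)
        }

        // Sign the request with SigV4 for the execute-api service
        if _, err := signer.Sign(req, body, "execute-api", "us-west-2", time.Now()); err != nil {
            panic(err)
        }

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        fmt.Println(resp.Status)
    }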

In our use case, the CLI client automatically wraps the input, signs the request using the user’s AWS credentials, and sends the request to our API Gateway, so operators never have to deal with signing themselves.

For example, we expose the following command to check the status of an EMR cluster. The CLI client maps the command and flags to the request path and body in our API:
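
The shape of the command is roughly as follows (the binary name, subcommand, and flag here are hypothetical stand-ins for our internal tool):

    # Hypothetical CLI invocation; the client maps this to a signed
    # POST <api-url>/production/emr/status request with the cluster ID
    # in the request body
    $ datalakes emr status --cluster-id j-XXXXXXXXXXXXX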

Deployment

One thing to keep in mind when using an API Gateway REST API is that it does not support automatic deployments (HTTP APIs do). So, if you make changes to API resources or stages, you need to manually trigger a deployment.

Terraform example

This example demonstrates how to trigger a deployment with Terraform 0.11. The trigger hash could be a hash of the Terraform file(s) containing resources associated with the API Gateway. If there are any changes in these files, the hash will change and thus a deployment will be triggered.
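
A sketch of the trigger (the hashed file path is illustrative; the stage from the logging example above points at this deployment):

    resource "aws_api_gateway_deployment" "tooling" {
      rest_api_id = "${aws_api_gateway_rest_api.tooling.id}"

      # Hash of the file(s) defining the API; any change to the hash
      # forces Terraform to create a new deployment
      variables = {
        trigger_hash = "${md5(file("${path.module}/api_gateway.tf"))}"
      }

      lifecycle {
        create_before_destroy = true
      }
    }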

Note that Terraform 0.12 supports an explicit triggers argument on the deployment resource, so you can provide the trigger hash there instead of via the deployment variables.

Conclusion

We have been using this API-Gateway-based setup for several months, and it has been working really well for the Data Lakes tooling use case. Before this setup became available, the tooling operations were executed solely by our engineering team to resolve customers’ issues. After the solution was deployed, we opened up the tooling service to our support team, reducing both the dependency on the engineering team and our operational costs.

This was our first time using the API Gateway product, so we had to spend some time figuring out various solutions and workarounds (e.g. integrating with an ALB, triggering deployments with Terraform, etc.), but overall we had a pleasant experience once we worked through these issues.

There are many other features of API Gateway that we didn’t use for our tooling and thus didn’t discuss in this blog post, but which may still be helpful for your use case. For example, API Gateway allows you to set throttling limits and set up a cache, which can help prevent your system from being overwhelmed by high request volumes. You can also use API Gateway to manage and run multiple versions of an API simultaneously by configuring different backend integrations.


Jeevan Saini on March 29th 2021

It was a cold and dark mid-December day in Vancouver; another Staff Security Engineer and I were part of a developer-led threat modeling session. We couldn’t find a single additional security concern to raise. The developer leading the project had considered all the options!

When the session was over, the Staff Security Engineer asked: “What have we done? What purpose do we even have any more?” We joked about losing our jobs, but we knew that this is the best kind of problem to have!

Record scratch, freeze frame: You may be wondering, “How did we get here?” 

Why is threat modeling important?

If you’ve never heard of Threat Modeling before, you’re not alone. Threat Modeling can mean very different things to different people. At Segment, threat modeling is the process of systematically identifying assets within a system or feature, brainstorming ways to compromise those assets, and collaboratively designing ways to defend them.

Threat modeling is a critical step in Segment’s Security Development Lifecycle. Before we make a feature from Segment available to the public, the Security Engineering (SecEng) team reviews it. Our job is to reduce the risk of a bad actor exploiting a vulnerability in that feature, which ultimately ensures the safety of our clients’ data.

What are the benefits of threat modeling?

We love to do threat modeling because we get to fix vulnerabilities before they are coded into the system. It is much cheaper to fix vulnerabilities like this during the design phase. Fixing vulnerabilities in production might mean that a developer has to learn about the vulnerable system from scratch, on-the-fly, and spend valuable time context switching. We also save money because we reduce the number of vulnerabilities discovered in our bug bounty program. And since developers spend less time fixing vulnerabilities, it means they get to focus on building features, not fighting fires. 

From a security perspective, it also means fewer security incidents where we have to activate our SIRT and possibly tell our customers we’ve messed up. Everyone wins, as we’ve essentially created a Utopia for developers and Security engineers!

What are the challenges of threat modeling?

Threat modeling sounds great, but it isn’t without its challenges. As much fun as it is, there are usually only a handful of folks at an organization who are good at threat modeling. Security teams can often become bottlenecks on threat modeling and reviews because of the ratio of Security Engineers to Engineers in an organization. Some people think threat modeling is too expensive when counted in engineering-hours, given the number of people involved - but given the engineering-hours you save in responding to incidents, this isn’t true.

With Segment growing at a very rapid pace and our feature set growing even quicker, it was time to try something different. Self-serve threat modeling checks all of the right boxes: it removes bottlenecks, and helps deal with an unbalanced ratio of security engineers to developers. If we could meet those requirements, we’d have built ourselves a Segment Threat Modeling Utopia.

Why a self-serve threat model?

In our security Utopia, the Engineering teams perform the threat modeling exercises, and the Security team are optional attendees. Security can then pick and choose which sessions they attend based on interests, priorities, and size of attack surfaces.

The security “goal state” is important to model, because it solves many problems. In our current world, the Security team’s availability becomes a bottleneck, but in our goal state, no engineering team is blocked by a lack of Security personnel, because the engineering teams can run their own threat modeling sessions and the Security team are optional attendees. In our current world, threat modeling can take up additional engineering hours to review, where in the goal state, threat modeling activities are built into the design review, which is already a standard meeting in the development cycle. And while threat modeling is hard to scale when only a handful of people are responsible for it, it scales beautifully when everyone can do it. 

Developers know their features better than anyone: they understand what works well, where the gaps are, and where they might run into trouble. In contrast, Security Engineers can only spend a few hours on a specific feature when they assess it, but are expected to provide in-depth security concerns about that feature. It makes a lot more sense for the feature experts to discuss potential security concerns, rather than the folks that are looking at the problem from the outside.

Size matters. The Engineering teams are considerably larger than the Security Engineering team. As a Security Engineer, I know when more folks look at a problem, we discover more potential issues. We believe that with training and practice, coupled with their intimate knowledge of their features’ architectures and codebases, our Engineering teams will be better at threat modeling than our Security team. With this outcome on the horizon, it just makes sense to invest in self-service threat modeling training.

How we built Utopia

Self-serve threat modeling is an Engineering-wide initiative and it has taken considerable developer time. We have a comprehensive curriculum that goes through several stages to make sure this project is a success.

There are five distinct phases on the path to fully self-serve threat modeling in Engineering:

  1. Traditional Phase

  2. Training Phase

  3. Observation Phase

  4. Review Phase

  5. Security Optional Phase

1. Traditional Phase

This is less a stage, and more “where we were before we started the self-serve threat modeling project.”

All features need a security review during the design phase. Security is responsible for the review and will build the threat model for the feature when appropriate.

2. Training Phase

In order to shift to a self-service model, the first and most important step is to train developers how to perform a threat-modeling exercise. During this stage, you teach developers the threat modeling workflow, strategies for how to find threats, how to prioritize them, and how to decide which threats should be addressed.

(We’ll cover the actual training in more detail below.)

3. Observation Phase

This is where Segment is currently in our journey.

The vast majority of developers have been fully trained, and now teams lead their own threat modeling sessions. The Security team is still (virtually) in the room, and are there to help guide engineers to a successful session.

The people who are leading the session create all of the necessary pre-read material before the threat modeling session begins. We use an established threat-modeling section as part of our standard design document template. The design document provides the appropriate context, but developers are asked to create architectural diagrams, explain what is in-scope and out-of-scope for the meeting, and catalog the assets involved in the project. The assets could be infrastructure, the application system itself, data in the system; anything that’s worth protecting gets listed out as an “asset.” 

Security’s role during this phase is to attend the meeting and help guide the conversation whenever required. Security does NOT tell the developers what the vulnerabilities are, but rather guides the team to discover the vulnerabilities themselves. Although a threat model is occurring, this is still a great time to teach developers how to discover vulnerabilities themselves. 

There are still situations where developers do not discover important vulnerabilities, even with hints, and that is fine. When that happens, Security gets involved and helps guide the thought process so the development team learns to discover those vulnerabilities. Again, this is a teaching moment, and everyone involved goes away feeling better about the threat model.

We’ve had many developer-led self-service threat modeling sessions so far, and we know that things are going well when a Staff Security Engineer says “Well, I feel completely useless in this meeting.” 

Threat modeling is not an easy skill to learn, and it takes practice to become good at it. It is our responsibility as Security Engineers to create a culture of learning. As our teams start to discover critical- and high-severity vulnerabilities more consistently, they graduate on to the next phase.

4. Review Phase

In this phase, the Engineering teams are so good at threat modeling that the Security team no longer needs to be in the meeting. Our job is to review the artifacts from the threat modeling session, and validate that we don’t anticipate additional critical- or high-severity vulnerabilities.

You might keep finding medium- and low-priority vulnerabilities. That’s great! You’re encouraged to share them with the team, and explain how you discovered those vulnerabilities.

To go on to the next phase, the Engineering teams must discover critical-, high- and medium-severity vulnerabilities consistently. As an aside, after a few cycles of this, the Security team should start seeing a noticeable decline in bug-bounty and penetration test findings for those teams’ features. 

5. Security-Optional Phase

This is the ideal world that we Security Engineers dream about. Once teams are at this stage, it is optional for Security to even review threat models. 

We believe it may take us a year or two to get to this phase, but we know that in the end, it will have been worth the effort.

Training deep dive

The team at Segment reviewed many options for training. We ultimately decided that the best way to proceed would be to build the training from scratch. The training itself had to be Segment-relevant and specific to our cultural norms.

The training was delivered to the entire Engineering organization, and anyone else who wanted to sign up: this included product managers, tech writers, and members of the general IT staff. This meant that the training had to be engaging for all levels of skill and experience. Classes were made up of a mix of folks that were junior, mid-career, and very senior. The training had to also be relevant to people who have never done threat modeling before, as well as to others who are security champions.

We broke the training down into three parts, which we delivered in three-week intervals:

  • Part 1 - Introduction to Threat Modeling 

  • Part 2 - Foundations of Threat Modeling

  • Part 3 - Live Threat Modeling

Introduction to Threat Modeling

The goal of the first training session is to introduce the concept of threat modeling, teach the workflow, and show developers the tools Security uses to find vulnerabilities. 

We introduced the concept of threat modeling to the developers by making parallels with how you might think about personal safety. This is much more relatable than diving right into software vulnerability concepts. 

Because we started this process in April of 2020, in the very first exercise we talked about what our grocery shopping habits were like before and after the start of the pandemic. This exercise got us talking about risks, and how people evaluate them and decide their tolerance for them. We worked through several non-software risk exercises to help make threat modeling relatable.

Later in the session we introduce STRIDE to the class, and do a deep-dive and talk about each vulnerability class. We then talk about real-life vulnerabilities and how they impact applications. We do some light exercises to help the class get used to asking the right security questions.

Foundations of Threat Modeling

The goal of the second training is to get developers’ hands dirty with exercises that solidify their knowledge.

We introduce the concepts of “assets,” and the value of a diversity of viewpoints in the threat modeling sessions. It’s important that as we go through a threat modeling exercise, everyone in the class understands the assets in the system we’re assessing. It’s also important that all members of the Engineering teams get involved in the threat modeling exercises. Each member of the team has a slightly different background and experience, and so they each see the system in a different way. When finding potential vulnerabilities, everyone’s voice is important.

As a part of this training, we have a lengthy discussion about a fake Segment feature: a credit-card billing system. The class is broken up into small teams, and each team reads a design document and talks through it to discover potential vulnerabilities in the system. At the end of the evaluation, everyone comes back together, and we discuss and compare the top three threats from each group. The class learns how to read a design document with a focus on security, how to identify security vulnerabilities, and how to prioritize risks.

The fake Segment application we trained on

Live Threat Modeling

Performing an end-to-end threat modeling session with the team cements the concepts that we discussed.

In the final training, we do an actual threat modeling exercise. We discuss the feature we are threat modeling, and review the architectural diagrams. We then list the assets in the system, and discuss what is in, and out of scope for the exercise. Next, we spend some time finding potential vulnerabilities, and discuss how to prioritize them. Finally, we discuss when the vulnerabilities we discovered should be remediated, based on severity and priority. 

Key takeaways

We’ve learned a LOT while rolling out our self-serve threat modeling program. Here’s some advice if you’re considering doing something similar.

Fail fast

One of our initial learnings was to fail fast and iterate on the training contents. There were two teams willing to sit through the very first training iterations.

Immediately after the initial sessions, I held focus groups to get attendees’ feedback and better understand how to improve the training. It had an immediate impact - we adjusted the length of our sessions, and changed what concepts we introduced and when. With feedback from the first team, we modified our approach, and ran the training for the second team. With the feedback and suggestions from those two initial teams, we solidified the training content and approach. We were then ready to roll it out for the rest of the organization.

Trust your counterparts

I was initially hesitant, but I have been totally impressed by our developers. They learn quickly and are interested and engaged in the training. When you create a safe space for them to learn and make mistakes, you’ll get to your self-service goals much faster. Trust that they can figure out how to uncover vulnerabilities, and coach the team on how to find them.

Security culture improvements

As I rolled out the threat modeling training, I noticed that people in the classes really enjoyed it. This led to more people reaching out to me about the concerns they have about vulnerabilities within their systems, allowing me to give them proactive advice and support. In the past, developers contacted me only occasionally; now they reach out far more often, and in greater numbers.

The simplicity of the training made the scary topic of “security” accessible to everyone, which made people less afraid to ask us security-related questions. Many folks have even asked about more security learning resources, and how to join application-security-related meetups.

With this project, we identified and created security champions at Segment. These are folks who are interested in learning about security, and who often impress us with their security knowledge. We have only seen benefits from this program.

How do we get to Utopia?

Every company is unique! I recommend that you start somewhere, and adjust the training to fit your culture.

We at Segment don’t want you to start from scratch! Today, we’re open-sourcing Segment’s Threat Modeling training content for you to adapt and modify. You can find the links below. 

We hope that you can use these training modules, give feedback on how we can make it better, and share what you find with the community.

The slides are great, but how can I learn more?

We have a strong sense of community here at Segment and in addition to sharing our training slides and this blog post, we have an email address so you can submit your questions and thoughts about the program. I am always interested in hearing feedback, good or bad!

Email address: sstm@segment.com 

I will also be talking about this entire program at an upcoming OWASP Vancouver meetup. Feel free to add yourself to the OWASP Vancouver meetup group.

Future improvements, even in Utopia 

So, we’re on our way to Utopia, and things couldn’t look brighter. Time to rest on our laurels, right? Wrong. We’re constantly thinking about how we could improve our threat modeling training, and build sustainable systems. We’ve trained all of our current developers, but what about new Segmenters in the future? We’re thinking about how best to incorporate this training into onboarding for new hires.

We’re also thinking about how to keep this knowledge fresh. Maybe you took the training months ago, but haven’t had to use it since, and now you have this huge project - yippee! Threat modeling refresher courses to the rescue! Finally, for the people who really enjoyed the training and want to learn more, or who want to deep-dive on advanced strategies for large and complex systems, we’re working on a Threat Modeling “201” class.

We are hiring!

The Security team is growing, and we are looking for candidates who have a fresh approach to security problems. If you found this blog interesting, you may want to check out our careers page.

Meredydd Luff on March 25th 2021

This is a guest post from Meredydd Luff, Founder of Anvil, the Python-based drag-and-drop app builder. It was first published on the Anvil blog.

Segment makes it easy to collect customer interaction data from your apps and websites, and send it to all your analytics tools, data warehouses, CRMs, and literally anything else. 

Anvil makes it easy to build web apps, by letting you do it all in Python (no HTML or JS required!). Today, I’m going to show you how to use them together: we’re going to wire up an Anvil app to collect user interactions with Segment.

We’re going to take a simple feedback-form app, and use Segment to track the user’s journey as they fill it out. Then we’re going to show you how to use Anvil’s built-in user authentication to identify users to Segment. 

Ready? Let’s dive in.

1. Create an Anvil app

We’re going to start with a simple feedback form, built from this tutorial. The tutorial will take you 10 minutes, but if you’re feeling impatient you can get a complete copy of the app by clicking this link.

But seriously, consider checking out the 10-minute tutorial – it’s pretty simple to get started:

Building web apps with Anvil is really easy. Try the tutorial!

Deploy the app

We need to publish our app to give it a URL:

Choose a URL for your app and Anvil deploys it

OK, we have an app that can receive feedback! Now we want to track when a user starts typing, and when they hit the Submit button. For that, we need a Segment account:


2. Set up our Segment account

We need Segment to recognize our app as a data source, using the analytics.js SDK. This is pretty easy - when you sign up for a free account at segment.com, you’ll get prompted to “create a source for your website”. Follow the instructions to get a code snippet:

Adding a source to our Segment workspace

We now have a code snippet we can embed in our page. That Javascript is pretty ugly, but don’t worry - we’ll be using it entirely from Python!


3. Generate Segment events from Python code

First, paste that snippet into your app’s Native Libraries. This will include the snippet in your app’s HTML.

Don’t worry, that’s the last Javascript we’ll need today.

Now we can import Segment’s analytics SDK and use it from Python! Open your form’s code and add at the top:
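
Using Anvil’s anvil.js bridge to the browser’s window object, the import looks like this:

    from anvil.js.window import analytics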

That imports the analytics Javascript object into our Python module. We can now call all the functions from the Segment docs, but we can do it from Python! (For more information about calling Javascript functions from Python, you can check the Anvil docs.)

Let’s record an event when the user clicks the Submit button. Open your form’s code and add this line at the top of the submit_btn_click function:
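
A single track call does it:

    analytics.track("Feedback submitted")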

That’s it! Now, every time you click that button, your app sends a "Feedback submitted" event to Segment.

You can verify that it’s working in your Segment dashboard, by opening the Debug panel for the source. Then open your app in another tab, and watch the events flow in:

This display updates live. It’s pretty cool.


Record detailed client-side behavior

Let’s get a bit more detail about what our users are up to. Let’s say we want to track our funnel. Specifically, we want to know how many people type something into the feedback form but don’t submit it.

When you write an Anvil app, you write code in Python that runs in the browser as your user interacts with your app. This means your Python code can know, for example, about each keypress in a text box. We can use this to send a Segment event when the user first starts typing.

We’ll write a function called on_started_editing(), and trigger it on the change event of any of our text fields. The first time this function is called, we’ll call analytics.track() to send the event to Segment.

Here’s how our Form1 class looks when we’re done:
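
A sketch of the finished form (the event name "Started editing" is an assumption for the example, and the template import follows Anvil’s generated-form convention):

    from ._anvil_designer import Form1Template
    from anvil import *
    from anvil.js.window import analytics


    class Form1(Form1Template):
        def __init__(self, **properties):
            self.init_components(**properties)
            self.started_editing = False

        def on_started_editing(self, **event_args):
            # Fires on every change event, but we only tell Segment once
            if not self.started_editing:
                self.started_editing = True
                analytics.track("Started editing")

        def submit_btn_click(self, **event_args):
            analytics.track("Feedback submitted")
            # ...existing code that saves the feedback...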

Wiring up the event

Finally, it’s time to wire up the event, so that Anvil calls on_started_editing() every time someone edits their name, email address, or feedback text. We do this by selecting each component, opening the Properties panel, scrolling down to Events, and entering on_started_editing in the change box:

Now run your app, and watch the Segment debugger as you start to type!


4. Identify users to Segment when they log in

If a user logs into our app, we want to associate tracked events with individual users. Let’s add Anvil’s built-in user authentication to our app, then integrate it with Segment’s user identification.

First, we’ll add the Users Service to our app, and call anvil.users.login_with_form() in the form constructor. That function returns a row from the Users table. We can then pass the ID of the row, and the email column, to analytics.identify():
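
In the form constructor, that can look something like this (a sketch; handling for a cancelled login is omitted):

    import anvil.users
    from anvil.js.window import analytics

    class Form1(Form1Template):
        def __init__(self, **properties):
            self.init_components(**properties)

            # login_with_form() returns the logged-in user's row
            # from the Users table
            user = anvil.users.login_with_form()

            # Pass the row ID and the email column to Segment
            analytics.identify(user.get_id(), {"email": user["email"]})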

And that’s the whole process - this is all it takes to add user authentication to our app and integrate it with Segment.

Again, once you’ve done this, you can run the app and watch the information arrive in Segment’s debugger! (You probably want to add a user account to your app first!)


That’s it!

We’ve just created a web app with nothing but Python, added Segment tracking to it, and even integrated Anvil’s user authentication with Segment’s user identities. Pretty cool, huh?

More about Anvil

Anvil is a platform for building full-stack web apps with nothing but Python. No need to wrestle with JS, HTML, CSS, Python, SQL, and all their frameworks – just build it all in Python.
