Recent articles

Jim Young on October 22nd 2021

From October 20-21, Segment joined SIGNAL, Twilio’s annual customer and developer conference, alongside 50,000+ developers, product leaders, enterprises, and startups.

Jim Young on October 21st 2021

In addition to keynotes, product demos, and some very special guests, SIGNAL featured over 100 sessions exploring the intersection of technology, innovation, and the future of customer engagement. 

Nupur Bhade Vilas on October 20th 2021

Meet Twilio Engage: the first growth automation platform designed for the digital era.

Geoffrey Keating on October 20th 2021

We outline a path to building trust in data across your organization, and explain why it's critical to lay the right infrastructure and process for how data is collected, cleaned, governed, and acted on.

Benjamin Yolken on October 14th 2021

At Segment, we use Apache Kafka extensively to store customer events and connect the key pieces of our data processing pipeline. Last year we open-sourced topicctl, a tool that we developed for safer and easier management of the topics in our Kafka clusters; see our previous blog post for more details.

Since the initial release of topicctl, we’ve been working on several enhancements to the tool, with a particular focus on removing its dependencies on the Apache ZooKeeper APIs; as described more below, this is needed for a future world in which Kafka runs without ZooKeeper. We’ve also added authentication on broker API calls and fixed a number of user-reported bugs.

After months of internal testing, we’re pleased to announce that the new version, which we’re referring to as “v1”, is now ready for general use! See the repo README for more details on installing and using the latest version.

In the remainder of this post, we’d like to go into more detail on these changes and explain some of the technical challenges we faced in the process.

Kafka, ZooKeeper, and topicctl

A Kafka cluster consists of one or more brokers (i.e. nodes), which expose a set of APIs that allow clients to read and write data, among other use cases. The brokers coordinate to ensure that each has the latest version of the configuration metadata, that there is a single, agreed-upon leader for each partition, that messages are replicated to the right locations, and so forth.

Original architecture

Historically, the coordination described above has been done via a distributed key-value store called Apache ZooKeeper. ZooKeeper stores shared metadata about everything in the cluster (brokers, topics, partitions, etc.) and has primitives to support coordination activities like leader election.

ZooKeeper was not just used internally by Kafka, but also externally by clients as an interface for interacting with cluster metadata. To fetch all topics in the cluster, for instance, a client would hit the ZooKeeper API and read the keys and values in a particular place in the ZooKeeper data hierarchy. Similarly, updates to metadata, e.g. changing the brokers assigned to each partition in a topic, were done by writing JSON blobs into ZooKeeper with the expected format in the expected place.
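To make that concrete, here's a rough sketch of how a client could list topics straight out of ZooKeeper using a Go ZooKeeper client. The library shown and the znode path follow Kafka's conventions, but the snippet is illustrative rather than topicctl's actual code:

package main

import (
	"fmt"
	"time"

	"github.com/go-zookeeper/zk"
)

func main() {
	// Connect directly to ZooKeeper rather than to the Kafka brokers.
	conn, _, err := zk.Connect([]string{"zookeeper:2181"}, 10*time.Second)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Kafka stores one child znode per topic under /brokers/topics.
	topics, _, err := conn.Children("/brokers/topics")
	if err != nil {
		panic(err)
	}
	for _, topic := range topics {
		fmt.Println(topic)
	}
}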

Some of these operations could also be done through the broker APIs, but many could only be done via ZooKeeper.

Given these conditions, we decided to use ZooKeeper APIs extensively in the original version of topicctl. Although it might have been possible to provide some subset of functionality without going through ZooKeeper, this “mixed access” mode would have made the code significantly more complex and made troubleshooting connection issues harder because different operations would be talking to different systems.

Towards a ZooKeeper-less World

In 2019, a proposal was made to remove the ZooKeeper dependency from Kafka. This would require handling all coordination activities internally within the cluster (involving some significant architectural changes) and also adding new APIs so that clients would no longer need to hit ZooKeeper for any administrative operations.

The motivation behind this proposal was pretty straightforward — ZooKeeper is a robust system and generally works well for the coordination use cases of Kafka, but can be complex to set up and manage. Removing it would significantly simplify the Kafka architecture and improve its scalability.

This proposal was on our radar when we originally created topicctl, but the implementation was so far off in the future that we weren’t worried about it interfering with our initial release. Recently, however, the first Kafka version that can run without ZooKeeper landed. We realized that we needed to embrace this new world so the tool would continue to work with newer Kafka versions.

At the same time, we got feedback both internally and externally that depending on ZooKeeper APIs for the tool would make security significantly harder. ZooKeeper does have its own ACL system, but managing this in parallel with the Kafka one is a pain, so many companies just block ZooKeeper API access completely for everything except the Kafka brokers. Many users would be reluctant to open this access up (rightfully so!) and thus the ZooKeeper requirement was blocking the adoption of the tool in many environments.

Given these multiple factors, removing the ZooKeeper requirement from topicctl became a high priority.

Removing the ZooKeeper requirement

In the original code for topicctl, all cluster admin access went through a single struct type, the admin client, which then used a private ZooKeeper client for fetching configs, updating topics, etc. This struct exposed methods that could be called by other parts of the tool; the golang code for triggering a leader election in the cluster, for instance, looked like the following (some details omitted for simplicity):
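(The original snippet isn't reproduced here; below is a rough reconstruction of what that ZooKeeper-backed code looked like. The type names, helper structs, and go-zookeeper usage are illustrative, not the exact topicctl code.)

package admin

import (
	"context"
	"encoding/json"

	"github.com/go-zookeeper/zk"
)

// Client is a simplified stand-in for topicctl's original admin client, which
// wrapped a private ZooKeeper connection.
type Client struct {
	zkConn *zk.Conn
}

type electionPartition struct {
	Topic     string `json:"topic"`
	Partition int    `json:"partition"`
}

// RunLeaderElection triggers a preferred-replica leader election by writing a
// JSON blob into ZooKeeper; no Kafka broker APIs are involved.
func (c *Client) RunLeaderElection(
	ctx context.Context,
	topic string,
	partitions []int,
) error {
	election := []electionPartition{}
	for _, p := range partitions {
		election = append(election, electionPartition{Topic: topic, Partition: p})
	}
	data, err := json.Marshal(map[string]interface{}{
		"version":    1,
		"partitions": election,
	})
	if err != nil {
		return err
	}
	// By convention, the brokers watch this path and kick off an election
	// when the znode appears.
	_, err = c.zkConn.Create(
		"/admin/preferred_replica_election",
		data,
		0, // flags: a regular, persistent znode
		zk.WorldACL(zk.PermAll),
	)
	return err
}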

Note that the client in this case isn’t actually communicating with the Kafka brokers or using any Kafka APIs. It’s just writing some JSON into /admin/preferred_replica_election, which, by convention, is where the Kafka brokers will look to start the process of running a leader election.

Our first step was to convert the APIs exposed by this struct into a golang interface with two implementations: one that depended on ZooKeeper, i.e. using the code from our original version of the admin client, and a second that only used Kafka broker APIs.

So, the Client above became:
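(As above, this is a reconstructed sketch rather than the exact interface; the real one exposes many more methods.)

package admin

import "context"

// Client is now a golang interface rather than a concrete struct; both the
// ZooKeeper-backed and the broker-API-backed admin clients implement it.
type Client interface {
	// RunLeaderElection triggers a preferred-replica leader election for the
	// given partitions of a topic.
	RunLeaderElection(ctx context.Context, topic string, partitions []int) error

	// GetTopics, UpdateTopicConfig, and the other admin methods are omitted
	// here for brevity.
}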

with the RunLeaderElection implementations becoming the following for the ZooKeeper and ZooKeeper-less versions, respectively:
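(Again a reconstruction; the struct names and bodies below are illustrative.)

package admin

import (
	"context"
	"errors"
)

// ZKAdminClient keeps the original ZooKeeper-based behavior behind the new
// interface.
type ZKAdminClient struct {
	// private ZooKeeper client omitted
}

func (c *ZKAdminClient) RunLeaderElection(
	ctx context.Context, topic string, partitions []int,
) error {
	// Same JSON write into /admin/preferred_replica_election as shown above.
	return nil
}

// BrokerAdminClient talks only to the Kafka broker APIs. At this point it was
// still a stub, because kafka-go did not yet expose the APIs we needed.
type BrokerAdminClient struct {
	// kafka-go client omitted
}

func (c *BrokerAdminClient) RunLeaderElection(
	ctx context.Context, topic string, partitions []int,
) error {
	return errors.New("not yet supported by the broker-based client")
}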

The next step was to fill out the details of the broker-based admin client so that it actually worked. topicctl was already using the excellent kafka-go library for its functionality that depended on broker APIs (e.g., tailing topics), so we wanted to use that here as well. Unfortunately, however, this library was designed primarily for reading and writing data, as opposed to metadata, so it only supported a subset of the admin-related Kafka API.

After doing an inventory of our client’s requirements, we determined that there were six API calls we needed that were not yet supported by kafka-go:

Our next step was to update kafka-go to support these! At first, it looked easy: this library already had a nice interface for adding new Kafka APIs; all you had to do was create Go structs to match the API message specs, and then add some helper functions to do the calls.

But, as often happens, we ran into a wrinkle: a new variant of the Kafka protocol had been recently introduced (described here) to make API messages more space-efficient. Although most of the APIs we needed had versions predating the update, a few only supported the new protocol format. To add all of the APIs we needed, we’d have to update kafka-go to support the new format.

Thus, we first went through all of the protocol code in kafka-go, updating it to support both the old and new formats. The proposal linked above didn’t have 100% of the details we needed, so in several cases, we also had to consult the Kafka code to fully understand how newer messages were formatted. After much trial and error, we eventually got this code working and merged.

Once that was done, we were unblocked from adding the additional APIs, which we did in this change. Finally, we could go back to the topicctl code and fill out the implementation of the broker-based admin client.

Returning to the RunLeaderElection example from above, we now had something like:
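(The snippet below is an approximation rather than the literal topicctl code, and the exact kafka-go request fields may differ slightly from what's shown; check the kafka-go documentation for the authoritative ElectLeaders API.)

package admin

import (
	"context"
	"time"

	kafka "github.com/segmentio/kafka-go"
)

// BrokerAdminClient wraps a kafka-go client and never touches ZooKeeper.
type BrokerAdminClient struct {
	client *kafka.Client
}

// RunLeaderElection now calls the Kafka ElectLeaders API on the brokers
// instead of writing a znode. The request fields below are approximate.
func (c *BrokerAdminClient) RunLeaderElection(
	ctx context.Context, topic string, partitions []int,
) error {
	_, err := c.client.ElectLeaders(ctx, &kafka.ElectLeadersRequest{
		Topic:      topic,
		Partitions: partitions,
		Timeout:    30 * time.Second,
	})
	return err
}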

The end result is that we were able to get topicctl working end-to-end with either the ZooKeeper-based implementation (required for older clusters) or the ZooKeeper-less one (for newer clusters), with only minimal changes in the other parts of the code.

Security updates

In addition to removing the ZooKeeper requirement from topicctl, we also got several requests to support secure communication between the tool and the brokers in a cluster. We didn’t include these in the original version because we don’t (yet) depend on these features internally at Segment, but they’re becoming increasingly important, particularly as users adopt externally hosted Kafka solutions like AWS MSK and Confluent Cloud.

We went ahead and fixed this, at least for the most common security mechanisms that Kafka supports. First, and most significantly, topicctl can now use TLS (called “SSL” for historical reasons in the Kafka documentation) to encrypt all communication between the tool and the brokers. 

In addition to TLS, we also added support for SASL authentication on these links. This provides a secure way for a client to present a username and password to the API; the permissions for each authenticated user can then be controlled in a fine-grained way via Kafka’s authorization settings.
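For illustration only, a topicctl cluster config with both of these enabled might look roughly like the following; the field names here are indicative, so treat the repo README as the authoritative reference:

# cluster.yaml (illustrative; see the repo README for the exact fields)
meta:
  name: my-cluster
  environment: prod
  region: us-west-2

spec:
  bootstrapAddrs:
    - broker-1.example.com:9092
  tls:
    enabled: true
  sasl:
    enabled: true
    mechanism: SCRAM-SHA-512
    username: topicctl-user
    password: my-password   # better: supply via an environment variable or flag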

Testing and release

As we updated the internals of topicctl, we extended our unit tests to run through the core flows like applying a topic change under multiple conditions, e.g. using ZooKeeper vs. only using Kafka APIs. We also used docker-compose to create local clusters with different combinations of Kafka versions, security settings, and client settings to ensure that the tool worked as expected in all cases. 

Once this initial testing was done, we updated the internal tooling that Segment engineers use to run topicctl to use either the old version or the new one, depending on the cluster. In this way, we could roll out to newer, lower-risk clusters first, then eventually work up to the bigger, riskier ones. 

After several months of usage, we felt confident enough to use v1 for all of our clusters and deprecate the old version for both internal and external users of the tool.

Conclusion

topicctl v1 is ready for general use! You might find it a useful addition to your Kafka toolkit for understanding the data and metadata in your clusters, and for making config changes. Also, feel free to create issues in our GitHub repository to report problems or request features for future versions.

Pete Walker on October 12th 2021

At Segment, we work with a wide range of customers both in terms of industry and scale. Some customers have millions of users and have a lot of experience integrating our tracking libraries or similar systems; others are comparatively smaller and are just starting to work with such systems. Especially with these smaller customers, a common question is: how do we actually integrate Segment into our unique SPA (Single Page App) architecture? 

In fact, we’ve seen this concern about using tracking libraries inside a SPA become a barrier to using tracking libraries at all! But not tracking anything means you don’t have visibility into how your customers are interacting with your brand. Therefore, you can’t objectively answer questions like: Which advertising or engagement channels are the most profitable for our business? What content or products should we be focusing our attention on improving? What behavior is correlated with people staying with our brand, or worse, churning?

The advantage Segment brings is that the way we track customer behavior is completely agnostic to the tools your teams rely on to answer questions about your business and then act on that information to improve it. Once your customer data has been collected by Segment, we take care of the rest by making the data accessible to the teams that need it, empowering informed decision-making for improving the customer experience.

How Segment Works

Segment works by allowing engineering teams to collect data once, saving the time it would otherwise take to integrate each analytics or engagement tool individually. This also allows end users of the data (i.e. analytics, marketing, and product teams) to add the customer data tools that they need in a matter of minutes rather than months.

Getting clean tracking in place is crucial, especially for small businesses just starting out. At some point, your business will outgrow any ad hoc tracking implementation, making data-driven initiatives come to a screeching halt. Customer data should be an investment, not a liability.

I’d like to outline exactly how to accomplish an initial Segment integration, and what considerations need to be made for an effective implementation.  

What is a Single Page App?

A SPA, or Single Page Application, is a front-end architecture pattern where the entire application sits in a single HTML file in the browser, and the user interface is loaded dynamically via JavaScript as the user interacts with it. SPAs are very commonly implemented using libraries like React.js, Angular.js, or Vue.js; well-known examples include Gmail and Twitter. This architecture contrasts with the more traditional Multi-Page Application, which links to multiple HTML pages that are each served to the client from a server as the user visits them.

When integrating Segment with a SPA, the hidden challenge is that there is technically only one HTML page to track, and therefore no distinction between pages. This is an important consideration for the third type of data you’ll want to collect about your customers, “the who, the what, and the where,” since there is only one “where” with a SPA!

Segment’s Javascript Library

When integrating Segment on a Web Source, you'll first inject the tracking snippet into your site, which will look something like this:
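(The real snippet is a longer, minified loader that you should copy from your source settings in the Segment app; this simplified stand-in just shows its shape, with a placeholder write key.)

<script>
  // Simplified stand-in for the real Segment snippet, which is a longer,
  // minified loader that defines window.analytics and queues any calls made
  // before analytics.js finishes loading.
  // ...minified loader omitted...
  analytics.load("YOUR_WRITE_KEY"); // boot analytics.js with your source's write key
  analytics.page();                 // record a page view when the HTML page loads
</script>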

You’ll notice that the second to last line in the script is “analytics.page()”. This JavaScript records a page view and sends it to your Segment workspace every time that HTML page is loaded. The trouble with SPA integrations is that the HTML document only gets loaded once! Therefore the Analytics.js library won’t record a page view automatically for each “page” in our app, even if we are using a routing library like React Router to give our SPA the appearance of having multiple pages.

So, in order for us to effectively record where users are interacting with our app, we need to take manual control over the “analytics.page()” call.

The Implementation

As a demonstration, we’ve created this simple site from this code to illustrate how to accomplish an effective SPA implementation. We’re using React.js and React Router to give our app the appearance of having multiple pages and routes.

First, you’ll want to create a JavaScript source and place the Segment SDK into the only HTML document. You’ll then want to remove the “analytics.page()” call from the snippet. Once this is done, you are then free to call “analytics.page()” on whatever makes sense as a “page” in the context of the app.

In our case, we have three different React components in a React Router switch: “Home”, “experiment”, and “about.”
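Here's roughly how that routing is wired up (React Router v5 style; the component and path names below are illustrative, based on the demo described above):

// App.js
import React from "react";
import { BrowserRouter, Switch, Route } from "react-router-dom";
import Home from "./Home";
import Experiment from "./Experiment";
import About from "./About";

export default function App() {
  return (
    <BrowserRouter>
      <Switch>
        <Route exact path="/" component={Home} />
        <Route path="/experiment" component={Experiment} />
        <Route path="/about" component={About} />
      </Switch>
    </BrowserRouter>
  );
}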

In a sense, each one of these components shows an entirely different page, and so we’ll want to attach a page call to each one of these React components. This is so we can answer questions like how many times a specific user has used the experiment page, for instance. 

In each of these components, we’ll call “analytics.page()” every time that specific component is loaded. In React, the corresponding lifecycle method for doing something when a component is loaded is “componentDidMount”, which is where we will track our page view:
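(The demo's exact code isn't reproduced here; the class component below is a representative sketch.)

// Experiment.js
import React from "react";

export default class Experiment extends React.Component {
  componentDidMount() {
    // Fires each time this component mounts, i.e. each time the user
    // navigates to the experiment "page".
    window.analytics.page("Experiment");
  }

  render() {
    return <h1>Experiment</h1>;
  }
}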

As a side note, if you use hooks rather than class-based components in React, you can implement page views from within the “useEffect()” hook.
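A hooks-based version of the same page call might look something like this:

// About.js
import React, { useEffect } from "react";

export default function About() {
  useEffect(() => {
    // The empty dependency array means this runs once, on mount.
    window.analytics.page("About");
  }, []);

  return <h1>About</h1>;
}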

After doing the same for each one of our components that we consider to be a page, we can then start to track user actions and profile traits via the same analytics.js library, using the “analytics.track()” and “analytics.identify()” methods respectively.
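For example (the event name, user ID, and traits below are purely illustrative):

// Record a user action...
window.analytics.track("Experiment Started", {
  variant: "B",
});

// ...and attach profile traits to a known user.
window.analytics.identify("user_123", {
  email: "jane@example.com",
  plan: "free",
});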

Wrapping up

And there you have it! For any other user behaviors besides page views, we will embed inline tracking code much the same way we did for our page calls, and we are well on our way to a full implementation. For those of you wondering what exactly you should be tracking besides page views, there is an excellent article in our Analytics Academy that covers just that.

There is always a little bit of subtlety to every implementation. But because of the flexibility of Segment’s tracking libraries, it's easy to customize an implementation to virtually any architecture.

Kelly Kirwan on October 8th 2021

We’re entering a new era for customer engagement. 

This has been clear in the explosion of digital touchpoints we’ve seen in the past year and a half, the surge of customer data that followed, and the new omnichannel mindset customers have adopted. 

But where do we go from here? 

That was the question we set out to answer in our new guide, The Next Generation of Customer Engagement. We wanted to define what exactly is our “new normal,” and understand how businesses can meaningfully engage customers on this new playing field. 

Here’s a preview of what we found. 

The importance of data accessibility 

90% of businesses said that customer insights increased due to greater digital engagement during the pandemic. 

But many businesses are still struggling to make sense of their data, and make it actionable, in large part due to departmental silos and legacy infrastructures. 

The average company manages 162.9 TB of data. For enterprises, that number jumps to 347.6 TB. And global data creation is projected to reach more than 180 zettabytes by 2025. (That’s 180 followed by 21 zeros.)

This influx of data can help businesses build a holistic, nuanced understanding of their customers – but only if it’s cleaned, consolidated, and easily shared across teams.

Yet, only 14% of companies say that they’ve made data widely accessible across their organization.

To rectify this, businesses need to prioritize having a scalable data infrastructure that can not only handle increasing volumes of customer data, but make it so any team member can use said data (without relying on developers). 

A focus on first-party data 

From data breaches to the opaque marketplace of third-party data, consumers have become increasingly wary of who’s collecting their information (and how that information is being used). 

As a result, both government regulators and browsers have taken a stronger stance on data privacy, which we’ve seen in initiatives like the GDPR and the phaseout of third-party cookies. 

To adapt to this privacy-first landscape, businesses need to focus on first-party data (which is collected directly by your company). Not only is it important from a legal and ethical standpoint, but first-party data also lends itself to a more transparent relationship with your customers. 

And while some have feared that the end of third-party cookies could lead to a “personalization-privacy paradox,” that’s actually not the case. In fact, 69% of consumers say they appreciate personalization, as long as it’s based on data they’ve shared with a business directly. That is, first-party data. 

The expectation of omnichannel 

Consumers today expect to be able to switch seamlessly between channels when interacting with a business. Or, in other words: they expect an omnichannel experience. 

Ensuring continuity as consumers switch between online and offline touchpoints is no easy task. In fact, only 24% of businesses say they’re successfully investing in omnichannel personalization.

But for 45% of consumers, all it takes is one unpersonalized experience for them to no longer do business with a brand. These are high stakes. 

That’s why customer data platforms have emerged as an essential component for omnichannel engagement, for their ability to consolidate data that’s scattered across different apps and teams, and then send it to any downstream tool for activation. 

In fact, 73% of companies surveyed this year said that a customer data platform will be critical to their customer experience efforts going forward. 

The future of customer engagement

For businesses, survival will always come down to their ability to meaningfully connect with customers. In our latest e-book, The Next Generation of Customer Engagement, we go into more depth on the themes mentioned above and distill the five leading trends that will define customer relationships in the years to come.

You can read the full guide here.

Guest author: Dan McGaw on October 8th 2021

I couldn’t count the number of times I've been asked the following questions in my 20 years of MarTech implementations:

  1. Which success metrics should I care about?

  2. Which user events are the most actionable?

  3. How should I build and use my stack to scale?

B2B marketers compete in a distinct marketplace. Small, known customer bases and lengthy purchase cycles require clearly defined lead scoring, tight nurturing funnels, and consistent engagement across the customer lifecycle.

That's why we’re collaborating with Segment’s Startup Program to show you the tools, metrics, and user events crucial for optimizing your stack's impact.

The Added Value of a Solid B2B Subscription Stack

Getting B2B marketing right means aligning your team members with their responsibility to the purchase cycle, helping them identify and fast-track high-value leads, and ensuring smooth transitions between funnel stages and business functions. I can’t imagine B2B companies remaining competitive without the feedback, insights, and added functionality that tech stacks provide. 

Top B2B Events Your Stack Should Track 

You can't optimize what you don't measure. Marketers in B2B subscription businesses need deep visibility into their audience’s firmographics, combined with the ability to get granular and tinker with each conversion step along the way. Here are the top B2B events you should track:

  • Lead Created: Unknown visitors become leads once they submit a lead form, sharing attributes and contact information.

  • User Created: Occurs once visitors create a login in the application for the first time. You’ll benefit from an event property that’ll distinguish between new accounts and users invited to existing accounts.

  • [Feature Used]: This event’s name will be custom to the most important features in your application. Record it so you can optimize for retention, advocacy, or revenue. Examples include Integration Enabled or Task Created.

  • Order Completed: This event will be used to report on revenue, conversion rates from free to paid, and customer retention.

Top B2B Subscription Metrics—How a Stack Helps Measure What Matters

You need metrics covering your customer journey from beginning to end across nurturing touchpoints and noteworthy events. That also includes recurring subscription metrics and evaluating marketing and sales performance (and marketing-sales collaboration). 

Marketers can learn how features correlate to engagement, churn, and satisfaction by analyzing behavioral differences in feature usage across user groups. Below are just a few metrics that B2B businesses should focus on.

New and Total Monthly Recurring Revenue (MRR)

Users who trigger the Payment Completed event mark the start of MRR or ARR. These metrics help you identify valuable user cohorts and behaviors by comparing marketing channels, feature usage, and retention trends.

Trial Subscription Conversion Rate

Learn how frequently visitors become trial users after viewing your marketing pages. Cross-reference page views with User Created to get a handle on how well your channels or campaigns convert.

Sales Opportunity Closed-Won Rate

Combine the User Created and Payment Completed events to begin tracking. Factor in feature usage and original marketing channel of these users in your analysis to optimize this conversion rate.

Visitor-to-Signup Conversion Rate

This conversion rate is a key part of the full customer journey, and you’ll get the most mileage out of it by integrating your CRM tool with a data warehouse. When analyzing, you’ll want to look at both CRM and website activity, as recorded by Segment.

Use Cases of Integrations for B2B Growth

When you combine the power of a variety of sophisticated MarTech tools, you not only pool together valuable resources, you expand each tool’s abilities. Making the best use of these advanced use cases can help you scale B2B revenue. Get inspired below.

Analyze Firmographics to Validate Lead Quality

Lead quality almost always takes precedence over lead volume—early- and middle-stage B2B marketing programs frequently shift their focus away from volume after sales struggles to qualify poor-fit leads.

In this use case, use Clearbit Reveal to gather firmographic data via IP addresses, which you need to qualify visitors. You'll next use Clearbit Enrichment to add even more context to leads. Finally, you’ll pass this data to your data warehouse (BigQuery) and your user journey analytics tool (Amplitude).

Combine Salesforce and Autopilot for Automated Messaging and Salesforce Integration

Connecting Salesforce and Autopilot creates a loop that synchronizes shared fields. When data is added or changed in one, it's applied to the same contact in the other.

We frequently use Autopilot with our clients in this way for lead scoring and automated messaging, as well as for keeping CRM records complete in Salesforce.

Automate and Power Personalized Messaging

Clearbit's enrichment data can superpower your touchpoints. When you use Segment to connect it to Customer.io and Drift, you can trigger automatic personalized message sends based on company info for each lead.

I’ve seen custom messaging make a considerable impact—more leads, higher conversion rates, broader coverage of the customer lifecycle, and, most of all, volumes of first-party data for further optimization.

Add Customer.io Email Events

Email marketing data is often disconnected from tech stacks and conversion strategies, leaving money on the table. This use case is designed to help with just that — fully integrating key email events and expanding your analysis of the full customer journey.

Email events Segment is able to send downstream include Email Delivered, Opened, Clicked, and Unsubscribed. They can then be processed and analyzed in BigQuery or Amplitude, helping you validate messaging strategies, improve email conversion rates, and even move the needle on retention.

Enable Custom Marketing Attribution

Custom attribution models use JavaScript to send data such as the Page and Identify tables. But such custom data sources can be hard to sync with your reporting and data exploration tools, such as Chartio.

With Segment, you can translate data and consolidate taxonomy, so this data lives alongside data from other sources in your warehouse and reporting tools. As a result, you can build accurate attribution and calculate your ROAS.

Your Visual Reference of B2B Tool Integrations

These use cases are drawn from our B2B subscription MarTech stack infographic, which includes diagrams, examples, and explanations of B2B stacks integrated through Segment.

Download the PDF to remind yourself of best practices and ideas for your stack.

Join the Segment Startup Program, Build a Strong Stack, Grow Your Ecommerce Business

Segment's Startup Program gives early-stage startups all the tools they need to build high-performing stacks. Eligible startups get $25k in Segment credits for up to two years, using Segment’s Team Plan. On top of that, Segment is also giving away more than $1 million in free marketing and analytics platforms (like Amplitude and Amazon Web Services). Heavy discounts on software are also available—and you unlock Segment’s advanced resources like the Analytics Academy and Analytics office hours.

To be eligible, startups must have been incorporated less than two years ago and must not have raised more than $5 million in funding.

Don’t wait! Learn more about Segment’s one-of-a-kind Startup Program. And if you’d like a hand picking your tools along the way, feel free to use our WYSIWYG MarTech stack builder.

About the Author

Dan McGaw is the founder of McGaw.io, a MarTech speaker, and the co-founder of analytics tools such as UTM.io. He’s worked extensively with Segment implementations and led the creation of tools such as the Segment CSV importer.

Sonia Sidhpura, Michael Tan, Bryn Saunders on September 30th 2021

Enhanced Security Services enables our customers to stay ahead of the complex threat landscape, detect breaches within 24 hours, and gain unmatched visibility into their security posture.

Pablo Vidal Bouza on July 15th 2021

How Segment moved from traditional SSH bastion hosts to AWS Systems Manager (SSM) to manage access to infrastructure.

Leif Dreizler on March 2nd 2021

Building customer-facing security features in partnership with dev teams helps you better serve your customers, unlocks additional revenue, and bidirectionally transfers knowledge between teams—a concept at the very core of DevSecOps.

Udit Mehta on January 20th 2021

Learn how we use AWS Step Functions for large-scale data orchestration

Growth & Marketing

Nupur Bhade Vilas on October 20th 2021

Meet Twilio Engage: the first growth automation platform designed for the digital era.

Sam Gehret on July 29th 2021

A look at server-side activation as the new alternative to the third-party advertising pixel.

Sudheendra Chilappagari on February 18th 2021

Learn how to use Segment and Twilio Programmable Messaging to send a personalized SMS campaign.

Josephine Liu, Sherry Huang on June 9th 2021

Our latest feature, Journeys, empowers teams to unify touchpoints across the end-to-end customer journey.

Kate Butterfield on June 16th 2021

Get an inside look at the design process for Journeys.

Katrina Wong on March 31st 2021

With Segment, brands can leverage their first-party customer data to build deeper customer relationships.

Madelyn Mullen on August 17th 2020

Your business growth depends on empowering every team with good data. Introducing the Segment Data Council, a series of interviews with seasoned customer data experts who know how to build bridges across the organization and empower teams.

Madelyn Mullen on August 17th 2020

Imagine if your PMs had an overview of support tickets, billing issues, sales interactions, and users’ clickstreams—all unified and available via self-service. It would be the Holy Grail of data management. Listen to more in this Data Council episode.

Madelyn Mullen on August 17th 2020

Simply put, data governance leads to better automation. Listen to this Data Council episode to hear how Arjun Grama honed his customer data wrangling techniques to transform product lines at IBM and raise the bar on growth KPIs at Anheuser-Busch InBev.

Madelyn Mullen on August 17th 2020

What does it take for a data driven business case to excite stakeholders across an organization? Tune in to this Data Council episode for an insider perspective from Kurt Williams, Global Director of Customer Products at Anheuser-Busch InBev.
