Geoffrey Keating on November 10th 2021

Twilio Segment releases its 2021 Growth Report, an exclusive view into how customer data fuels today’s most high-performing businesses.

Recent articles

Niels Tindbaek, Simone Roscitt on December 2nd 2021

Now more than ever, customers expect personalized, integrated, and frictionless digital experiences, and it’s rapidly changing how businesses build and adapt their customer data infrastructure.

Kelly Kirwan on December 1st 2021

Are you concerned that data silos are stunting your organization’s growth? Read this to learn what data silos are, why they matter, and how to fix them.

Jes Kirkwood on December 1st 2021

GitHub's VP, Growth Thibault Imbert reveals how the company's growth team drives results in an exclusive interview.

Geoffrey Keating on November 30th 2021

A data pipeline comprises the tools and processes that move data from source to destination. Here’s how to build one.

Geoffrey Keating on November 23rd 2021

All I want for Christmas is a customer data platform that makes it through the holidays.

Not your average wish-list headliner, except among infrastructure teams dealing with peak requests that are at least five times greater than the average load.

An outage during any of the peak shopping season events (Thanksgiving, Black Friday, 11/11 in China, Christmas) can easily cost you your festive mood and your company hundreds of thousands to millions of dollars.

At least 48 prominent brands experienced technical problems on 2020's Black Friday. Luckily, you don't have to put your fate in the hands of Santa or anyone else to make your wish come true. You can turn the holidays—and other peaks—into business as usual with proper and timely preparation.

We've put together five peak-season best practices for your data infrastructure that get you ready for some peaceful time off while the business keeps humming along.

1. Know previous peaks and what's changed since

Start preparing by revisiting the data and lessons learned from last year's holidays or another peak event. While historical performance isn't a mirror of the future, it's a helpful starting point for your preparations.

When making your projections, consider industry data like the above chart for the eCommerce sector, which shows and predicts that overall online sales will keep increasing.

These are common characteristics of peak events you should expect to find in your historical data, according to the Google Cloud blog:

  • Traffic increases of 5 to 20 times (or even greater).

  • Higher conversion rates and a more considerable burden on back-end systems—like payment processing—than on the front end.

  • Rapidly increasing traffic in a short period as the event starts.

  • A trailing decline to normal levels that's much slower than the acceleration to the peak level.

Not every business runs into a peak over the holidays—another good reason to check your historical data before acting.

Looking at Segment's data, for example, on average across all our customers, we saw a decrease of 30% in traffic during Thanksgiving. But for those in online retail, we observed surges of up to 1,000% for several hours during peak time.
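To make figures like these comparable with your own history, it can help to express a peak as a multiple of baseline traffic. The sketch below is a minimal illustration in Python; the hourly request counts and the choice of baseline hours are made up, and the function names are hypothetical rather than part of any Segment tooling.

```python
from statistics import mean

def peak_multiplier(hourly_requests, baseline_hours):
    """Return the busiest hour as a multiple of the average baseline hour."""
    baseline = mean(hourly_requests[h] for h in baseline_hours)
    if baseline <= 0:
        raise ValueError("baseline traffic must be positive")
    return max(hourly_requests) / baseline

def percent_increase(multiplier):
    """Convert a multiplier into a percentage increase (11x is a 1,000% surge)."""
    return (multiplier - 1) * 100

# Hypothetical hourly request counts for part of one peak day.
hourly = [100, 90, 110, 100, 400, 1100, 900, 300]
m = peak_multiplier(hourly, baseline_hours=range(0, 4))
print(f"peak is {m:.1f}x baseline, a {percent_increase(m):.0f}% increase")
```

Note that a surge of 1,000% corresponds to roughly 11 times the baseline, not 10 times, which matters when you translate percentages into capacity.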

Once you've collected your historical data, you want to consider what major changes have happened since then. Ask yourself:

  • What systems have we rolled out or refactored significantly?

  • What major features or services have we launched?

  • What new sources of data have we plugged into our data stack?

  • What types of customers, partners, and other vendors have we added or changed?

Note any such changes, especially if you haven't tested them under significant amounts of load. You'll want to pay special attention to these in the subsequent steps of your preparation.

You'll also want to reach out to colleagues in other departments to get their projections for the upcoming peaks. Forecasts for sales revenue, inventory, and shipping, as well as marketing initiatives, will all give you insights into what kind of volume to plan for.

2. Check in with customers and partners

Most people and businesses make plans for the holiday season. Whether you work directly with consumers (B2C) or other businesses (B2B), reach out and learn their plans so you can further improve yours. Also, talk to partners you rely on—like us, Segment, or cloud providers like AWS or Google—so they can make the necessary preparations, too.

In a B2C business, you want to understand how consumer behavior might change compared to last year. Maybe a new type of device is more popular now or a different social media channel. Such changes can affect loads on different areas of your system or require you to collect new data events. You also want to check which product launches, special offers, and other marketing campaigns look like upcoming holiday hits. Short of coordinating research interviews directly with customers—never a bad idea—heading over to the folks in marketing, growth, or research should get you plenty of insights.

For a B2B-focused company, you want to understand what the businesses you serve are planning for their customers. Your customer should be running through a preparation exercise similar to what we're outlining here. Figure out where you can best support them in this process and the kind of numbers they're expecting, and agree on how you'll cooperate over the holidays.

Some of Segment's B2B customers, all of whom we proactively reach out to so we can support them during peak seasons.

Make sure to talk to partners you rely on once you've collected this information from your customers. Albert Strasheim, VP, Segment core engineering, says the cloud isn't infinite. Your vendors face the same peak events as you do, and their other customers compete with you for the same cloud resources.

Reach out early to:

  • Check how your partners can support you. They might have documentation on handling peaks, template configurations for their products or services, and dedicated support teams to take on some of the preparation tasks.

  • Make reservations for the cloud resources you'll need. (More on this later under Create headroom and other buffers, as most but not all cloud resources can scale up automatically.)

  • Establish how and with whom you'll communicate leading up to and during the peak event.

Don't forget to check the existing service-level agreements (SLAs) you have in place with vendors and whether the agreed response times suffice.

3. Prepare with tests, game days, and system checks

You can only determine whether your team, partners, and infrastructure are peak-season-ready through tests and games that simulate actual events as closely as possible.

Since you can't measure, test, or prepare for the unknown, you first need to establish which metrics reflect system health and whether you're capturing such data for monitoring.

If you had to pick just four such data points, you can't go wrong with Google's Four Golden Signals:

  • Latency: The time it takes the system to service a request, such as loading a web page.

  • Traffic: The total demand on the system, usually measured in requests per second.

  • Errors: The rate of failing requests, either as an absolute number or a proportion of all requests.

  • Saturation: One or more metrics reflecting the utilization of critical system resources, say what percentage of memory the servers are using or the amount of database storage you have available.

You can complement technical health metrics like the ones above by monitoring critical business numbers such as revenue, website traffic, and orders placed. These indicators can also signal potential problems when they show irregularities.
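As a rough illustration of how the four signals could be derived from raw request records, here is a minimal Python sketch. The `Request` record, the choice of the 95th percentile for latency, the use of 5xx status codes as errors, and memory as the saturation metric are all assumptions made for this example; your monitoring stack will likely compute these for you.

```python
from dataclasses import dataclass

@dataclass
class Request:
    duration_ms: float  # time the system took to serve the request
    status: int         # HTTP status code returned

def golden_signals(requests, window_seconds, memory_used, memory_total):
    """Summarize one monitoring window into the four signals."""
    n = len(requests)
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank 95th percentile, computed with integer math.
    p95 = durations[(95 * n + 99) // 100 - 1] if n else 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": p95,
        "traffic_rps": n / window_seconds,
        "error_rate": errors / n if n else 0.0,
        "saturation": memory_used / memory_total,
    }

# Hypothetical sample: 20 requests in a 10-second window, one server error.
sample = [
    Request(duration_ms=10.0 * (i + 1), status=500 if i == 0 else 200)
    for i in range(20)
]
signals = golden_signals(sample, window_seconds=10, memory_used=6, memory_total=8)
print(signals)
```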

With the numbers to monitor identified, you'll want to see how different aspects of your data infrastructure hold up under a large volume of simulated requests—load testing.

You want to test the entire customer journey—not just individual elements—under such pressure, so you'll see what the customer experience will be like. Make sure to try several load mixes, like mobile versus desktop, various transaction types, or traffic coming from different regions. These test variations can all reveal particular weaknesses in your system.
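One way to think about a load mix is as a set of weighted scenarios that share a total request budget. The sketch below splits a budget across scenarios in proportion to their weights; the scenario names and weights are hypothetical, and a real load-testing tool would typically offer its own way to express this.

```python
def plan_load_mix(total_requests, mix):
    """Split total_requests across scenarios in proportion to their weights."""
    total_weight = sum(mix.values())
    exact = {name: total_requests * w / total_weight for name, w in mix.items()}
    counts = {name: int(x) for name, x in exact.items()}
    leftover = total_requests - sum(counts.values())
    # Give any remaining requests to the scenarios with the largest fractions.
    by_fraction = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts

# Hypothetical mix: weights are relative, not percentages.
mix = {"mobile_checkout": 5, "desktop_checkout": 3, "mobile_browse": 2}
plan = plan_load_mix(1000, mix)
print(plan)
```

Running the same test with several different mixes, for example one dominated by mobile traffic and one dominated by a single region, makes it easier to see which part of the system gives way first.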

You'll also want to check that spikes in traffic don't set off automated defensive mechanisms in your system that mistake the holiday surge for an attack or other security breach.

While load tests reveal how a system responds to peak traffic, game days show how your teams respond. You'll want to think through potential failures during the holidays and then stage those situations to see where your operational procedures or knowledge are insufficient. Depending on the stakes and your team's size, you might want to run several of these and even include outside vendors.

You could, for example, simulate that one of your primary methods of payment stops working because of an outage at your payment processor. Such a simulation will reveal which teams need to get involved—did anyone think of the added load on customer support such an outage will cause?—and whether you’ve established effective lines of communication with a third party like your payment processor.

4. Create headroom and other buffers

You'll want to create buffers in your system because your earlier plans and test results are estimates—reality can always turn out differently.

At Segment, we look at vital infrastructure components like Kafka, a data event streaming platform, and DynamoDB, a NoSQL database service, to ensure they have headroom—additional space above the peak traffic we expect.
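A headroom check can be as simple as comparing provisioned capacity with the expected peak for each component. The following Python sketch assumes you already have both numbers per component; the component names, the figures, and the 25% minimum are invented for illustration and are not Segment's actual thresholds.

```python
def headroom_ratio(capacity, expected_peak):
    """Spare capacity above the expected peak, as a fraction of that peak."""
    return (capacity - expected_peak) / expected_peak

def components_short_on_headroom(components, minimum_ratio):
    """Return the names of components whose headroom is below the minimum."""
    return [
        name
        for name, (capacity, expected_peak) in components.items()
        if headroom_ratio(capacity, expected_peak) < minimum_ratio
    ]

# Hypothetical (capacity, expected peak) pairs in each component's own unit.
components = {
    "kafka_ingest_msgs_per_s": (150_000, 100_000),
    "dynamodb_write_units": (11_000, 10_000),
}
at_risk = components_short_on_headroom(components, minimum_ratio=0.25)
print(at_risk)
```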

You need to do such checks on all critical pieces of your tech stack. As much as possible, you want to create headroom and other buffers in your system through auto-scaling: the automatic expanding and shrinking of storage, memory, and other resources as traffic goes up or down.

You can configure many modern cloud services like AWS and Google for such auto-scaling. Yet, these processes might not keep up with rapid, five- to tenfold traffic increases. Under such circumstances, manual scaling might still be the most reliable solution. Discuss your plans and projections with your partners to determine which parts of your system can auto-scale and which elements need reservations and manual operation.

Amazon Kinesis is an advanced data product that helps customers analyze their video and data streams in real time. (Screenshot from the Kinesis explainer video.)

A typical issue we see with Segment customers is scaling Kinesis, which provides real-time insights on video and data streams. Configuring auto-scaling for Kinesis is possible, but it isn't straightforward and is often overlooked during peak season preparations.
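If you size Kinesis manually, a back-of-the-envelope shard estimate is a useful starting point. The sketch below assumes a Kinesis data stream in provisioned mode and uses the per-shard limits AWS has documented (1 MB/s or 1,000 records/s for writes, 2 MB/s for shared reads); check the current AWS quotas before relying on these constants. The traffic figures and the `headroom` parameter are hypothetical.

```python
import math

# Assumed per-shard limits for provisioned mode; verify against AWS docs.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0

def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s, headroom=0.0):
    """Estimate the shard count that satisfies every per-shard limit."""
    factor = 1 + headroom
    return max(
        1,
        math.ceil(write_mb_per_s * factor / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s * factor / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s * factor / READ_MB_PER_SHARD),
    )

# Hypothetical stream: normal load, then a tenfold peak with 25% headroom.
normal = shards_needed(write_mb_per_s=4, records_per_s=3000, read_mb_per_s=8)
peak = shards_needed(40, 30_000, 80, headroom=0.25)
print(f"normal: {normal} shards, peak with headroom: {peak} shards")
```

An estimate like this only tells you the target size. Resharding takes time, so you would still want to scale up ahead of the event rather than during it.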

A final, important buffer you want to create is a change freeze across your infrastructure leading up to and during peak events. Changes in one part of a system can trigger unexpected events elsewhere and render all of your preparations useless. At a minimum, such a change freeze should apply to releasing new or updated features. Typically, you want to extend it to the scope of marketing activities and third-party services integrated with or connected to your system.
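A change freeze is easier to enforce when a deploy pipeline can check it automatically. Here is a minimal sketch of such a check in Python; the freeze dates are hypothetical examples around the 2021 holidays, and how you wire the check into your release process is left open.

```python
from datetime import date

# Hypothetical freeze windows (inclusive start and end dates).
FREEZE_WINDOWS = [
    (date(2021, 11, 22), date(2021, 11, 30)),  # Thanksgiving and Black Friday
    (date(2021, 12, 20), date(2022, 1, 3)),    # Christmas and New Year's
]

def is_frozen(day, windows=FREEZE_WINDOWS):
    """Return True if releases should be blocked on the given day."""
    return any(start <= day <= end for start, end in windows)

for day in (date(2021, 11, 26), date(2021, 12, 10)):
    status = "blocked" if is_frozen(day) else "allowed"
    print(f"{day.isoformat()}: releases {status}")
```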

5. Ensure focus and clarity during the peak event

When you've done all the preparations we've run through, a peak day can unfold much like a regular one. But some folks will have to work or be on call on days they'd rather be home, no matter how much prep work you put in, and they'd better know what to do in case something does go wrong.

You want to establish well in advance the exact responsibilities of each team and which individuals take on which shifts. Do such planning before the festive season starts and spread the burden of working on events such as Thanksgiving, Christmas, and perhaps New Year's evenly across team members.

Make sure to carefully review your standard on-call and incident procedures and see what changes you need to make for peak season by asking questions such as:

  • Are alerts set up for every critical metric you identified earlier?

  • To whom does the first alert go? Who is the fallback if that alert goes unanswered or unnoticed? What are the required response times for different types of alerts?

  • What's the first action someone should take for a specific alert? What happens when the initial procedure, template, or checklist doesn't solve the problem?

  • What does escalation look like? Under what circumstances and how can someone reach the engineering management team or even executives?
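Writing the answers to these questions down as data makes gaps easier to spot. The sketch below models an escalation policy as an ordered list of contacts, each with an acknowledgment timeout, and works out who should hold an unanswered alert after a given number of minutes. The roles and timeouts are hypothetical; paging tools usually provide their own escalation configuration.

```python
from dataclasses import dataclass

@dataclass
class EscalationStep:
    contact: str              # who gets paged at this step
    ack_timeout_minutes: int  # how long they have to acknowledge

def who_to_page(policy, minutes_unacknowledged):
    """Return the contact who should hold the alert after the given delay."""
    elapsed = 0
    for step in policy:
        elapsed += step.ack_timeout_minutes
        if minutes_unacknowledged < elapsed:
            return step.contact
    # Past the last timeout, the alert stays with the final contact.
    return policy[-1].contact

# Hypothetical peak-season policy.
policy = [
    EscalationStep("primary on-call", 5),
    EscalationStep("secondary on-call", 10),
    EscalationStep("engineering manager", 15),
]
for minutes in (0, 5, 20, 60):
    print(f"after {minutes} min: {who_to_page(policy, minutes)}")
```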

Don't just think about internal procedures and contacts when answering such questions. Consider which incidents need third-party involvement and include their contact persons and details in your plans. Segment's support team, for example, has coverage plans in place during holidays and has team members standing by should a significant issue occur.

Most incidents are manageable if they get detected and handled within minutes to an hour or two. Albert Strasheim, VP, Segment core engineering, sees customers run into trouble when there's no automated alerting and noticing something is wrong takes more than a few hours.

You can avoid such problems by having virtual or in-office dashboards with your critical metrics and alerts displayed in real time. You might want to create a war room—physical or virtual—for larger operations, where people from all relevant teams work together synchronously and communication is instant.

Get next year's holiday gift ready now

That's the best part of the holiday season: you can be pretty sure the same events will be on the calendar again next year. All the effort you put in now increases the chances of your wishes coming true this holiday season and the one after.

Jes Kirkwood on November 22nd 2021

Zendesk's Director, Product Growth & Monetization Mona Nasiri reveals how the company's growth team drives results in an exclusive interview.

Geoffrey Keating on November 19th 2021

Tray.io uses Segment to calculate a user score that predicts retention. Here's how they built it.

Humberto Oliveira, Seth Familian, Charles Crawford on November 18th 2021

The term 'RFM' stands for Recency, Frequency, and Monetary Value. Tracking these three fairly straightforward characteristics allows you to quickly build complex models about how your customers relate to your brand, and how you should engage with them.

Kelly Kirwan on November 17th 2021

Data cleaning or data cleansing is the process of identifying and removing dirty data. It is a crucial step in ensuring high-quality data for your organization.

Pablo Vidal Bouza on July 15th 2021

How Segment moved from traditional SSH bastion hosts to use AWS Systems Manager SSM to manage access to infrastructure.

Leif Dreizler on March 2nd 2021

Building customer-facing security features in partnership with dev teams helps you better serve your customers, unlocks additional revenue, and bidirectionally transfers knowledge between teams—a concept at the very core of DevSecOps.

Udit Mehta on January 20th 2021

Learn how we use AWS Step Functions for large-scale data orchestration

Nupur Bhade Vilas on October 20th 2021

Meet Twilio Engage: the first growth automation platform designed for the digital era.

Sam Gehret on July 29th 2021

A look at server-side activation as the new alternative to the third-party advertising pixel.

Sudheendra Chilappagari on February 18th 2021

Learn how to use Segment and Twilio Programmable Messaging to send a personalized SMS campaign.

Josephine Liu, Sherry Huang on June 9th 2021

Our latest feature, Journeys, empowers teams to unify touchpoints across the end-to-end customer journey.

Kate Butterfield on June 16th 2021

Get an inside look at the design process for Journeys.

Katrina Wong on March 31st 2021

With Segment, brands can leverage their first-party customer data to build deeper customer relationships.

Madelyn Mullen on August 17th 2020

Your business growth depends on empowering every team with good data. Introducing the Segment Data Council, a series of interviews with seasoned customer data experts who know how to build bridges across the organization and empower teams.

Madelyn Mullen on August 17th 2020

Imagine if your PMs had an overview of support tickets, billing issues, sales interactions, and users’ clickstreams—all unified and available via self-service. It would be the Holy Grail of data management. Listen to more in this Data Council episode.

Madelyn Mullen on August 17th 2020

Simply put, data governance leads to better automation. Listen to this Data Council episode to hear how Arjun Grama grew his customer data wrangling techniques to transform product lines at IBM and raise the bar on growth KPIs at Anheuser-Busch InBev.

Madelyn Mullen on August 17th 2020

What does it take for a data driven business case to excite stakeholders across an organization? Tune in to this Data Council episode for an insider perspective from Kurt Williams, Global Director of Customer Products at Anheuser-Busch InBev.
