Any Industry Engineering Intermediate

Privacy-First Testing: The Power of Synthetic Data in Segment Setup

Explore how synthetic data lets you test Segment configurations rigorously and at scale without compromising user privacy, so you can roll out your implementation with more confidence.

Made by Human37

Instinctively, you might copy a production dataset and use it for testing. But what if your tests include checking that sensitive data is filtered out? What if they must be repeated with every release? How do you build a framework that tests your Segment implementation at scale, including all of these checks, without putting your real users’ privacy at risk if a test goes wrong?

The answer is synthetic data. Synthetic data refers to artificially generated data that mimics the statistical properties and patterns of real-world data, without containing any personally identifiable information (PII) or sensitive details.

The advantage? None of the users actually exist, and none of the actions ever took place, so a failed or misconfigured test cannot expose anyone’s personal data. This makes synthetic data ideal for testing.

Recipe

Clone the h37-fakestream repository.
Open it in your favorite IDE.
Create a new source in Segment to ingest the test data. We recommend an HTTP source.
Copy the source’s write key.

In FakeStream.js, go to line 6 and replace {{INSERT KEY HERE}} with your own write key.

const { faker } = require("@faker-js/faker");
const { Analytics } = require("@segment/analytics-node");

// Initialize Segment client with your write key
const analytics = new Analytics({
  writeKey: "{{INSERT KEY HERE}}",
});
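Hard-coding the key works, but it is easy to commit it by accident or to forget to replace the placeholder. A minimal sketch, assuming you would rather keep the key out of the file: read it from an environment variable and fail fast if it is missing. The variable name `SEGMENT_WRITE_KEY` and the `resolveWriteKey` helper are hypothetical, not part of the h37-fakestream repository.

```javascript
// Hypothetical helper: read the write key from an environment variable
// (SEGMENT_WRITE_KEY is an assumed name) and refuse to run if it is missing
// or still the placeholder from the repository.
function resolveWriteKey(env) {
  const key = (env.SEGMENT_WRITE_KEY || "").trim();
  if (key === "" || key === "{{INSERT KEY HERE}}") {
    throw new Error("Set SEGMENT_WRITE_KEY to the write key of your test source.");
  }
  return key;
}

// In FakeStream.js this would replace the hard-coded string:
// const analytics = new Analytics({ writeKey: resolveWriteKey(process.env) });
```

Failing before any event is queued means a misconfigured run sends nothing rather than sending to the wrong place.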

Lines 12-47 contain the mock events and their properties. Replace them with the events you want to send to Segment for testing. When a property lists several values, one of them is picked at random each time the event is sent. To make sure every property value is sent at least once, you can either:

Give each event a single value per property, and duplicate the event once for every other value you want to cover.
Set the event’s probability to 1.0 and generate a sufficiently large number of users (covered later in this guide).

// Define event names and their associated properties with probabilities
const events = [
  {
    name: "CTA Selected",
    properties: {
      cta_type: ["Sign-up Now", "Sign-up For Free"],
    },
    probability: 1.0,
  },
  {
    name: "Upgrade Initiated",
    properties: {},
    probability: 1.0,
  },
  {
    name: "Plan Selected",
    properties: {
      plan_type: ["Free", "Paid"],
    },
    probability: 0.9,
  },
  {
    name: "Payment Details Completed",
    properties: {
      payment_type: ["Visa", "Paypal", "Crypto"],
      payment_plan: ["Monthly", "Yearly"],
    },
    probability: 0.7,
  },
  {
    name: "Upgrade Completed",
    properties: {},
    probability: 0.4,
  },
];
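For the first option in the list above, duplicating events by hand gets tedious once an event has several multi-valued properties. A sketch of a hypothetical `expandEvent` helper that turns one definition into one event per combination of values, each property holding a single value. Keep `probability` at 1.0 on the expanded events if every combination must actually be sent.

```javascript
// Hypothetical helper: expand an event definition whose properties list
// several values into one definition per combination (a cartesian product),
// so each property carries exactly one value.
function expandEvent(event) {
  let combos = [{}];
  for (const [prop, values] of Object.entries(event.properties)) {
    const next = [];
    for (const combo of combos) {
      for (const value of values) {
        next.push({ ...combo, [prop]: [value] });
      }
    }
    combos = next;
  }
  return combos.map((properties) => ({ ...event, properties }));
}

const expanded = expandEvent({
  name: "Payment Details Completed",
  properties: {
    payment_type: ["Visa", "Paypal", "Crypto"],
    payment_plan: ["Monthly", "Yearly"],
  },
  probability: 1.0,
});
console.log(expanded.length); // 6 (3 payment types x 2 payment plans)
```

You could apply it to the whole list with `events.flatMap(expandEvent)`; an event without properties simply comes back unchanged.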

Lines 50-61 define the traits of each generated user; edit them to set specific user properties.

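// Build one synthetic user: a random UUID plus fake traits from faker.
function generateUser() {
  const userId = faker.string.uuid();
  const name = faker.person.fullName();
  const userTraits = {
    company: faker.company.name(),
    name: name,
    // Second word of the full name, lowercased, plus a number from 0 to 100,
    // at a made-up domain rather than a real email provider.
    email: `${name.split(" ")[1].toLowerCase()}${Math.round(
      Math.random() * 100
    )}@fakeHuman.37`,
  };
  return { userId, userTraits };
}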
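If the tests are repeated on every release, it helps when each run produces the same users, so results are comparable. With `@faker-js/faker` you can get this by calling `faker.seed()` with a fixed number before generating users. The dependency-free sketch below shows the same idea with a small seeded PRNG (mulberry32); the name lists, `generateSeededUser`, and the `synthetic-user-N` ID format are hypothetical stand-ins, not the repository's code.

```javascript
// Minimal seeded PRNG (mulberry32): the same seed always yields the same
// sequence of numbers in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical placeholder name lists standing in for faker's locale data.
const FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor"];
const LAST_NAMES = ["Smith", "Garcia", "Chen", "Okafor"];

// Deterministic variant of generateUser(): every random choice comes from `rng`.
function generateSeededUser(rng, index) {
  const pick = (list) => list[Math.floor(rng() * list.length)];
  const first = pick(FIRST_NAMES);
  const last = pick(LAST_NAMES);
  return {
    userId: `synthetic-user-${index}`,
    userTraits: {
      name: `${first} ${last}`,
      email: `${last.toLowerCase()}${Math.floor(rng() * 100)}@fakeHuman.37`,
    },
  };
}

// Fix the seed so every release's test run sends the same users.
const rng = mulberry32(2024);
const users = Array.from({ length: 3 }, (_, i) => generateSeededUser(rng, i));
console.log(users);
```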

Finally, set how many users to generate on line 10.

const numUsers = 10;
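How many users count as "sufficiently large"? Assuming each user sends an event with probability p and picks one of its k property values uniformly at random, a single user misses a given value with probability 1 - p/k, so all n users miss it with probability (1 - p/k)^n. Requiring (1 - p/k)^n ≤ ε gives n ≥ ln ε / ln(1 - p/k). The hypothetical helper below computes that bound.

```javascript
// Smallest number of users n for which the chance that one specific property
// value is never sent is at most `epsilon`. Assumes each user sends the event
// with probability `p` and picks among `k` values uniformly and independently.
function minUsersForCoverage(p, k, epsilon = 0.001) {
  const perUserHit = p / k; // chance a single user sends this specific value
  if (perUserHit <= 0) throw new Error("This value can never be sent.");
  if (perUserHit >= 1) return 1; // always sent, one user is enough
  return Math.ceil(Math.log(epsilon) / Math.log(1 - perUserHit));
}

// payment_type on "Payment Details Completed": p = 0.7, k = 3 values
console.log(minUsersForCoverage(0.7, 3)); // 26
```

To cover every value of every property at once, pass your overall tolerance divided by the total number of values as `epsilon` (a union bound), and use the largest result as `numUsers`.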

Run the script with your IDE’s “run” command, or execute the following command in a terminal from the FakeStream root folder:

node FakeStream.js

Data will start flowing into Segment via your selected source.
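The recipe does not show the loop that actually sends the data. A minimal sketch of what it plausibly does per user, assuming each event fires independently with its probability (the repository may implement this differently, for example as a funnel that stops at the first skipped step): pick the events, call `identify` with the traits, then `track` once per event. `identify`, `track`, and `closeAndFlush` are methods of the `@segment/analytics-node` client; `eventsForUser`, `sendSyntheticUser`, and the recording stub are hypothetical. Passing the client in as a parameter lets you dry-run the logic without sending anything to Segment.

```javascript
// Assumption: each event fires independently with its `probability`, and each
// property gets one value picked uniformly at random from its list.
function eventsForUser(events, random = Math.random) {
  const calls = [];
  for (const def of events) {
    if (random() >= def.probability) continue; // skipped for this user
    const properties = {};
    for (const [prop, values] of Object.entries(def.properties)) {
      properties[prop] = values[Math.floor(random() * values.length)];
    }
    calls.push({ event: def.name, properties });
  }
  return calls;
}

// Send one user through a client exposing identify() and track(), such as
// the @segment/analytics-node Analytics instance.
function sendSyntheticUser(client, user, calls) {
  client.identify({ userId: user.userId, traits: user.userTraits });
  for (const { event, properties } of calls) {
    client.track({ userId: user.userId, event, properties });
  }
}

// Recording stub for dry runs: stores messages instead of sending them.
function createRecordingClient() {
  const sent = [];
  return {
    sent,
    identify: (msg) => sent.push({ type: "identify", ...msg }),
    track: (msg) => sent.push({ type: "track", ...msg }),
  };
}

// Dry run with a one-event definition. With the real client, pass `analytics`
// instead and finish with `await analytics.closeAndFlush();` so queued events
// are delivered before the process exits.
const dryRun = createRecordingClient();
const demoEvents = [
  { name: "CTA Selected", properties: { cta_type: ["Sign-up Now", "Sign-up For Free"] }, probability: 1.0 },
];
sendSyntheticUser(
  dryRun,
  { userId: "synthetic-user-1", userTraits: { name: "Sam Smith" } },
  eventsForUser(demoEvents)
);
console.log(dryRun.sent.map((m) => m.type)); // [ 'identify', 'track' ]
```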

Applying FakeStream to Segment testing

Within Segment, FakeStream data can help validate whether:

Protocols enforces your tracking plan as expected
Destination-specific filters work as expected
Functions work as expected
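To confirm that Protocols actually catches bad data, the synthetic stream also needs events that break the plan on purpose. A sketch, assuming a tracking plan that restricts each property to the values listed in `events` and marks every listed property as required: the hypothetical `violationsFor` helper produces, per property, one payload with an unplanned value and one with the property missing. Send these through the same test source and check that Protocols flags or blocks them (depending on your settings) while the valid events pass.

```javascript
// Hypothetical helper: derive negative test cases from one event definition
// (same shape as the entries in `events`). Assumes the tracking plan allows
// only the listed values and marks every listed property as required.
function violationsFor(def) {
  const valid = {};
  for (const [prop, values] of Object.entries(def.properties)) {
    valid[prop] = values[0]; // start from a known-good payload
  }
  const cases = [];
  for (const prop of Object.keys(valid)) {
    // A value the tracking plan does not allow.
    cases.push({ event: def.name, properties: { ...valid, [prop]: "UNPLANNED_VALUE" } });
    // The same payload with a required property left out.
    const { [prop]: _omitted, ...rest } = valid;
    cases.push({ event: def.name, properties: rest });
  }
  return cases;
}

const badCases = violationsFor({
  name: "Payment Details Completed",
  properties: {
    payment_type: ["Visa", "Paypal", "Crypto"],
    payment_plan: ["Monthly", "Yearly"],
  },
  probability: 0.7,
});
console.log(badCases.length); // 4 (2 properties x 2 kinds of violation)
```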

Summary

In this recipe, we covered how to use Human37’s FakeStream to test your Segment setup safely, without exposing real customers’ data.
