Testing your Segment configuration before moving it into production is a best practice.
Instinctively, you might copy a production data set and use it for testing. But what if your tests include filtering out sensitive data? What if the tests need to be repeated with every release? How do you build a framework that lets you test your Segment implementation at scale, including all of these checks, without compromising your actual users' privacy if a test goes wrong?
The answer is synthetic data. Synthetic data refers to artificially generated data that mimics the statistical properties and patterns of real-world data, without containing any personally identifiable information (PII) or sensitive details.
The advantage? None of the users actually exist, and none of the actions ever took place. This makes synthetic data ideal for testing without putting any real user's data at risk.
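To make the idea concrete, here is a minimal Python sketch of what synthetic Segment track events could look like. It is not taken from h37-fakestream: the event names, funnel weights, the `email` property, and the function names are hypothetical. A seeded random generator keeps the data set reproducible, so the same events can be replayed with every release. The fake email addresses use the reserved example.com domain, which gives PII-filtering rules something realistic to act on without matching any real person.

```python
import random
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical event names with funnel-like weights: many views, fewer purchases.
EVENT_WEIGHTS = {"Product Viewed": 70, "Product Added": 20, "Order Completed": 10}


def synthetic_user(rng: random.Random) -> dict:
    """Create a fake user with a random ID and an address on a reserved domain."""
    user_id = f"synthetic-{uuid.UUID(int=rng.getrandbits(128), version=4)}"
    # example.com is reserved for documentation (RFC 2606), so no real inbox matches.
    return {"userId": user_id, "email": f"{user_id}@example.com"}


def synthetic_track_events(n: int, seed: int = 42) -> list[dict]:
    """Return n Segment-style track events spread over 30 days, reproducible via seed."""
    rng = random.Random(seed)
    users = [synthetic_user(rng) for _ in range(max(1, n // 5))]
    names = list(EVENT_WEIGHTS)
    weights = list(EVENT_WEIGHTS.values())
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    events = []
    for _ in range(n):
        user = rng.choice(users)
        events.append({
            "type": "track",
            "userId": user["userId"],
            "event": rng.choices(names, weights=weights, k=1)[0],
            "properties": {
                "price": round(rng.uniform(5.0, 200.0), 2),
                # Fake but PII-shaped field, useful for testing filtering rules.
                "email": user["email"],
            },
            "timestamp": (start + timedelta(minutes=rng.randrange(30 * 24 * 60))).isoformat(),
        })
    return events


if __name__ == "__main__":
    for event in synthetic_track_events(3):
        print(event)
```

Because the generator is seeded, a failing test can be rerun against exactly the same events, which is harder to guarantee with a copied production extract.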
Recipe
Clone the h37-fakestream repository.
Open it in your favorite IDE.
Create a new source in Segment to ingest the test data. We recommend an HTTP source.
Copy the new source's write key.
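With the write key in hand, synthetic events can be sent to the new source through Segment's HTTP Tracking API. The minimal Python sketch below uses only the standard library and assumes the default `https://api.segment.io/v1/batch` endpoint, which authenticates with HTTP Basic auth using the write key as the username and an empty password. h37-fakestream may send data differently; `build_batch_request` and `send_batch` are hypothetical names used for illustration.

```python
import base64
import json
import urllib.request

# Assumed default endpoint of Segment's HTTP Tracking API (batch method).
SEGMENT_BATCH_URL = "https://api.segment.io/v1/batch"


def build_batch_request(write_key: str, events: list[dict]) -> urllib.request.Request:
    """Build a POST request carrying all events in a single batch payload."""
    # Basic auth: the write key is the username and the password is empty.
    token = base64.b64encode(f"{write_key}:".encode("utf-8")).decode("ascii")
    body = json.dumps({"batch": events}).encode("utf-8")
    return urllib.request.Request(
        SEGMENT_BATCH_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send_batch(write_key: str, events: list[dict], timeout: float = 10.0) -> int:
    """Send the batch and return the HTTP status (urlopen raises HTTPError on 4xx/5xx)."""
    request = build_batch_request(write_key, events)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A call such as `send_batch("YOUR_WRITE_KEY", [{"type": "track", "userId": "synthetic-1", "event": "Test Event"}])` would post one event, where `YOUR_WRITE_KEY` is a placeholder for the key you copied. Building the request separately from sending it lets you inspect the payload in a unit test without touching the network. Segment documents size limits per batch and per event, so keep batches small.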