Amazon S3 Destination

Segment makes it easy to send your data to Amazon S3 (and lots of other destinations). Once you've tracked your data through our open source libraries, we translate and route your data to Amazon S3 in the format it understands. Learn more about how to use Amazon S3 with Segment.

Getting Started

The Amazon S3 destination puts the raw logs of the data we’re receiving into your S3 bucket, encrypted, no matter what region the bucket is in.

Note: The data is copied into your bucket every hour around the :40 minute mark. You may see multiple files over a period of time depending on how much data is copied.

Keep in mind that Amazon S3 works differently from most of our destinations: using a destination selector like the integrations object has no effect on events sent to Amazon S3.

The diagram below illustrates how the S3 destination works.

Data from your sources is processed by the Segment Tracking API, which collects the events in batches. When a batch reaches 100 MB, or once per hour, a process in the Segment backend uploads it to a secure Segment S3 bucket, from which the files are securely copied to your own S3 bucket.

Required Steps

  • Create a bucket in your preferred region.
  • Create a folder “segment-logs” inside the bucket.
  • Edit your bucket policy to allow Segment to copy files into the bucket:
{
    "Version": "2008-10-17",
    "Id": "Policy1425281770533",
    "Statement": [
        {
            "Sid": "AllowSegmentUser",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::107630771604:user/s3-copy"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/segment-logs/*"
        }
    ]
}
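As a sketch, you can generate this policy with your own bucket name substituted and sanity-check the Resource string before applying it (the bucket name below is a placeholder):

```python
import json

# Placeholder -- replace with your actual bucket name.
BUCKET = "my-company-segment-logs"

# The bucket policy from above, with the bucket name filled in.
policy = {
    "Version": "2008-10-17",
    "Id": "Policy1425281770533",
    "Statement": [
        {
            "Sid": "AllowSegmentUser",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::107630771604:user/s3-copy"},
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/segment-logs/*",
        }
    ],
}

# The Resource string must end with /* so Segment can write any
# object key under the segment-logs/ prefix.
assert policy["Statement"][0]["Resource"].endswith("/*")

print(json.dumps(policy, indent=4))
```

You can then apply the generated document with `aws s3api put-bucket-policy --bucket YOUR_BUCKET_NAME --policy file://policy.json`.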

Note: the Resource property string must end with /*.

Specifically, this grants the Segment s3-copy user the s3:PutObject permission for your bucket.

If you have server-side encryption enabled, please see additional required setup.

You can edit your bucket policy in the AWS management console by right-clicking the bucket and then selecting the “edit policy” option.

Lastly, enable the Amazon S3 destination in your Segment destination catalog, and enter your bucket name in the destination settings. It takes about an hour to start receiving data.

Data format

Your logs will be stored as gzipped, newline-separated JSON containing the full call information. For a list of supported properties, you’ll want to check out our Spec docs.

The logs themselves are binned by day, and named according to the following file format:

s3://{bucket}/segment-logs/{source-id}/{received-day}/filename.gz

The received-day refers to the unix timestamp of the UTC date on which the files were received by our API, which makes it easy to find all calls received within a certain timeframe.
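As an illustrative sketch, the prefix for a given day can be computed like this, assuming the received-day folder name is the unix timestamp (in milliseconds) of midnight UTC on the day the calls were received; verify the exact folder names against your own bucket:

```python
from datetime import datetime, timezone

def log_prefix(bucket: str, source_id: str, received_at: datetime) -> str:
    # Assumption: the received-day folder is the unix timestamp (ms)
    # of midnight UTC on the day the API received the calls.
    midnight = received_at.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)
    day = int(midnight.timestamp()) * 1000
    return f"s3://{bucket}/segment-logs/{source_id}/{day}/"

print(log_prefix("my-bucket", "sourceId123",
                 datetime(2019, 6, 1, 15, 30, tzinfo=timezone.utc)))
# s3://my-bucket/segment-logs/sourceId123/1559347200000/
```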

Encryption

Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)

Segment supports optional, S3-managed Server-Side Encryption, which you can enable or disable from the Destination Configuration UI. The destination now enables encryption by default, and we recommend that you keep it enabled. If you've had the S3 destination enabled since before October 2017, you might need to enable encryption manually on your bucket.

While most client libraries transparently decrypt the file when fetching it, you should make sure that any applications that consume data in the S3 bucket are ready to decrypt the data before you enable this feature. When you're ready, you can enable encryption from the setting in the destination configuration UI.

Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS)

Segment can also write to S3 buckets with Default Encryption set to AWS-KMS. This ensures that objects written to your bucket are encrypted using customer managed keys created in your AWS Key Management Service (KMS). Follow the steps below to enable encryption using AWS KMS Managed Keys:

Create a new customer-managed key and grant the Segment user permissions to generate new keys

The Segment user must have the permission to GenerateDataKey from your AWS Key Management Service. Here is a sample policy document that grants the Segment user the necessary permissions.

{
    "Version": "2012-10-17",
    "Id": "key-consolepolicy-3",
    "Statement": [
        {
            "Sid": "Allow Segment S3 user to generate key",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::107630771604:user/s3-copy"
            },
            "Action": "kms:GenerateDataKey",
            "Resource": "*"
        }
    ]
}

(Screenshot: creating a customer-managed key)

Update the S3 bucket default encryption property

The target S3 bucket should have the “Default encryption” property enabled and set to AWS-KMS. Choose the customer-managed key generated in the above step for encryption.

(Screenshot: updating the default encryption property)

Disable ServerSideEncryption in Segment S3 Destination settings

Disable the Server Side Encryption setting in the Segment destination configuration. This allows bucket-level encryption to be enabled, so Amazon can automatically encrypt objects using KMS managed keys.

(Screenshot: disabling the Server Side Encryption setting)

Enforcing encryption

To further secure your bucket by ensuring that all files are uploaded with the encryption flag present, you can add to the bucket policy to strictly enforce that all uploads trigger encryption.

We recommend doing this as a best practice. The following policy strictly enforces upload encryption with Amazon S3-Managed keys.

{
    "Version": "2008-10-17",
    "Id": "Policy1425281770533",
    "Statement": [
        {
            "Sid": "AllowSegmentUser",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::107630771604:user/s3-copy"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/segment-logs/*"
        },
        {
            "Sid": "DenyIncorrectEncryptionHeader",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
            "Condition": {
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "AES256"
                }
            }
        },
        {
            "Sid": "DenyUnEncryptedObjectUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
            "Condition": {
                "Null": {
                    "s3:x-amz-server-side-encryption": "true"
                }
            }
        }
    ]
}
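The combined effect of the two Deny statements can be sketched as a simple predicate: an upload is rejected if the x-amz-server-side-encryption header is missing (DenyUnEncryptedObjectUploads) or set to anything other than AES256 (DenyIncorrectEncryptionHeader):

```python
from typing import Optional

def upload_allowed(sse_header: Optional[str]) -> bool:
    """Mimic the policy's two Deny statements for a PutObject request."""
    if sse_header is None:
        # "Null" condition matches -> DenyUnEncryptedObjectUploads fires.
        return False
    if sse_header != "AES256":
        # "StringNotEquals" matches -> DenyIncorrectEncryptionHeader fires.
        return False
    return True

print(upload_allowed("AES256"))   # True
print(upload_allowed(None))       # False
print(upload_allowed("aws:kms"))  # False
```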

Region

Segment automatically infers the region of your bucket when we copy data to it, so you do not need to specify a bucket region in your configuration. However, if you’re using VPC Endpoints for your S3 bucket, make sure the endpoint is configured in the same region as your bucket. You can find more information on this in the AWS S3 docs here.

Custom Path Prefix

To use a custom key prefix for the files in your bucket, append the path to the bucket name in the Segment S3 destination configuration UI. For example, a bucket string of mytestbucket/path/prefix results in data being copied to /path/prefix/segment-logs/{source-id}/{received-day}/.
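A small sketch of how the bucket string splits into bucket name and key prefix (the helper name is ours, not part of the Segment configuration):

```python
def destination_prefix(bucket_setting: str) -> str:
    """Split the configured bucket string into bucket + optional key
    prefix and return the resulting segment-logs path."""
    bucket, _, prefix = bucket_setting.partition("/")
    if prefix:
        return f"s3://{bucket}/{prefix}/segment-logs/"
    return f"s3://{bucket}/segment-logs/"

print(destination_prefix("mytestbucket/path/prefix"))
# s3://mytestbucket/path/prefix/segment-logs/
print(destination_prefix("mytestbucket"))
# s3://mytestbucket/segment-logs/
```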

How can I download the data from my bucket?

We’ve had the most luck using the AWS CLI and writing a short script to download specific days, one at a time. We’ve found AWS CLI to be significantly faster than s3cmd because it downloads files in parallel.

NOTE: S3 transparently decompresses the files for most clients. However, to access the raw gzipped data you can programmatically download the file using the AWS SDK and setting ResponseContentEncoding: none. (This functionality isn't available in the AWS CLI.) You can also manually remove the metadata on the file (Content-Type: text/plain and Content-Encoding: gzip) through the AWS interface, which allows you to download the file as gzipped.

To set up the AWS CLI, you'll need to first install it. There are detailed instructions here, or the following will generally work for Linux machines:

$ sudo apt-get install awscli

Then you’ll need to configure AWS CLI with your Access Key ID and Secret Access Key. You can create or find these keys in your Amazon IAM user management console. Then run the following command which will prompt you for the access keys:

$ aws configure

Now you’re ready to download some logs!

To see a list of the most recent log folders:

$ aws s3 ls s3://{bucket}/segment-logs/{source-id}/ | tail -10

To download the files for a specific day:

$ aws s3 sync s3://{bucket}/segment-logs/{source-id}/{received-day} .

Or to download all files for a source:

$ aws s3 sync s3://{bucket}/segment-logs/{source-id} .

To put the files in a specific folder replace the . at the end (“current directory”) with the desired directory like ~/Downloads/logs.
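Once downloaded, each file is gzipped, newline-separated JSON, so a short script can iterate over the events. This sketch builds a synthetic log file in place of a real download:

```python
import gzip
import json
import os
import tempfile

def read_log(path):
    """Yield one event dict per line of a gzipped, newline-separated JSON log."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

# Demo: write a synthetic .gz file standing in for a downloaded log.
events = [{"type": "track", "event": "Order Completed"},
          {"type": "identify", "userId": "u1"}]
path = os.path.join(tempfile.mkdtemp(), "1559347200000.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write("\n".join(json.dumps(e) for e in events))

# Filter for track calls only.
tracks = [e for e in read_log(path) if e["type"] == "track"]
print(len(tracks))  # 1
```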


Personas

You can send computed traits and audiences generated through Segment Personas to this destination as a user property. To learn more about Personas, reach out for a demo.

For user-property destinations, an identify call is sent to the destination for each user being added and removed. The property name is the snake_cased version of the audience name you provide, with a true/false value. For example, when a user first completes an order in the last 30 days, we send an identify call with the property order_completed_last_30days: true; when the user no longer satisfies the audience criteria, we set that value to false.

When the audience is first created an identify call is sent for every user in the audience. Subsequent syncs will only send updates for those users which were added or removed since the last sync.
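As a rough sketch of the property naming, the audience name is lowercased and non-alphanumeric runs become underscores; the helper below is an approximation, not Segment's exact normalization:

```python
import re

def audience_property(audience_name: str) -> str:
    """Approximate the snake_cased property name derived from an
    audience name (sketch; the exact normalization may differ)."""
    return re.sub(r"[^a-z0-9]+", "_", audience_name.lower()).strip("_")

# A hypothetical audience name matching the example above:
prop = audience_property("Order Completed Last 30Days")
print(prop)  # order_completed_last_30days

# The resulting identify payload would carry the trait:
payload = {"userId": "u1", "traits": {prop: True}}
```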

Settings

Segment lets you change these destination settings via your Segment dashboard without having to touch any code.

Bucket Name

Your S3 bucket name.

Use Server Side Encryption?

If you enable this setting, the data we copy to your bucket will be encrypted at rest using S3-Managed encryption keys. For more information, see here.

Adding Amazon S3 to the integrations object

To add Amazon S3 to the integrations JSON object (for example, to filter data from a specific source), use the following valid name for this integration:
  • Amazon S3
