Amazon S3 Destination

Segment makes it easy to send your data to Amazon S3 (and lots of other destinations). Once you've tracked your data through our open source libraries, we'll translate and route it to Amazon S3 in the format it understands. Learn more about how to use Amazon S3 with Segment.

Getting Started

The Amazon S3 destination puts the raw logs of the data we’re receiving into your S3 bucket, encrypted, no matter what region your bucket is in.

Note: The data is copied into your bucket every hour, around the :40 minute mark. You may see multiple files over a period of time, depending on the volume of data copied.

Required Steps

  • Create a bucket in your preferred region.
  • Create a folder “segment-logs” inside the bucket.
  • Edit your bucket policy to allow Segment to copy files into the bucket:
{
    "Version": "2008-10-17",
    "Id": "Policy1425281770533",
    "Statement": [
        {
            "Sid": "Stmt1425281765688",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::107630771604:user/s3-copy"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/segment-logs/*"
        }
    ]
}

Note: the Resource property string must end with /*.

Specifically, this grants the Segment s3-copy user permission to call s3:PutObject on the segment-logs folder of your bucket.
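
If you manage several buckets, it can help to generate the policy rather than edit it by hand. The following is a minimal sketch that fills the bucket name into the policy shown above. The function name build_segment_policy and the bucket name my-example-bucket are illustrative assumptions, not part of Segment's tooling.

```python
import json

# Principal ARN copied from the policy shown above.
SEGMENT_PRINCIPAL = "arn:aws:iam::107630771604:user/s3-copy"


def build_segment_policy(bucket_name: str) -> dict:
    """Return the bucket policy from this page for the given bucket name."""
    return {
        "Version": "2008-10-17",
        "Id": "Policy1425281770533",
        "Statement": [
            {
                "Sid": "Stmt1425281765688",
                "Effect": "Allow",
                "Principal": {"AWS": SEGMENT_PRINCIPAL},
                "Action": "s3:PutObject",
                # The Resource string must end with /*.
                "Resource": f"arn:aws:s3:::{bucket_name}/segment-logs/*",
            }
        ],
    }


# "my-example-bucket" is a hypothetical bucket name.
policy_json = json.dumps(build_segment_policy("my-example-bucket"), indent=4)
print(policy_json)
```

The printed JSON can be pasted into the bucket policy editor.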

You can edit your bucket policy in the AWS management console by right-clicking the bucket and then selecting the “edit policy” option.

Lastly, enable the Amazon S3 destination in your Segment destination catalog, and enter your bucket name in the destination settings. It will take about an hour to start receiving data.

Data format

Your logs will be stored as gzipped, newline-separated JSON containing the full call information. For a list of supported properties, you’ll want to check out our Spec docs.
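
As an illustration of this format, here is a minimal Python sketch that decompresses a log file and parses one JSON object per line. The function name read_segment_log and the sample calls are hypothetical. The sample is built in memory so the snippet runs without a real bucket.

```python
import gzip
import json


def read_segment_log(raw: bytes) -> list:
    """Decompress a gzipped log file and parse one JSON object per line."""
    text = gzip.decompress(raw).decode("utf-8")
    # Split on "\n" only, and skip blank lines such as a trailing newline.
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Hypothetical sample calls standing in for a downloaded log file.
sample_calls = [
    {"type": "track", "event": "Signed Up", "userId": "user-1"},
    {"type": "identify", "userId": "user-1"},
]
sample_file = gzip.compress(
    "\n".join(json.dumps(call) for call in sample_calls).encode("utf-8")
)

for call in read_segment_log(sample_file):
    print(call["type"], call.get("event"))
```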

The logs themselves are binned by day, and named according to the following file format:

s3://{bucket}/segment-logs/{source-id}/{received-day}/filename.gz

The received-day will refer to the UTC day that the files were received by our API, which makes it easy to find all calls received within a certain timeframe.
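
When processing a bucket listing, it can be handy to split each object key back into these components. The sketch below assumes only the path layout shown above. The names LogKey and parse_log_key are hypothetical, and SOURCE_ID and RECEIVED_DAY are placeholders rather than real values.

```python
from typing import NamedTuple


class LogKey(NamedTuple):
    source_id: str
    received_day: str
    filename: str


def parse_log_key(key: str) -> LogKey:
    """Split '.../segment-logs/{source-id}/{received-day}/filename.gz' into parts."""
    parts = key.strip("/").split("/")
    try:
        # Locate the segment-logs folder so a path prefix in front is tolerated.
        start = parts.index("segment-logs")
    except ValueError:
        raise ValueError(f"not a Segment log key: {key!r}") from None
    rest = parts[start + 1:]
    if len(rest) != 3:
        raise ValueError(f"unexpected key layout: {key!r}")
    return LogKey(*rest)


# Placeholder values; substitute a key from your own bucket listing.
parsed = parse_log_key("segment-logs/SOURCE_ID/RECEIVED_DAY/filename.gz")
print(parsed.source_id, parsed.received_day, parsed.filename)
```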

Encryption

Segment supports S3-managed Server-Side Encryption via an optional setting, which you can enable or disable in our UI. Since we added the feature, the destination enables encryption by default, and we recommend that you keep it enabled. If you’ve had the S3 destination enabled since before October 2017, encryption is likely not enabled on your bucket yet. While most client libraries will transparently decrypt the file when fetching it, you should make sure that any applications that depend on the S3 bucket are ready to decrypt the data before enabling this feature. Once you have, you can enable it by toggling the setting in the destination configuration UI.

If you’d like to further secure your bucket by ensuring that all files are uploaded with the encryption flag present, you may augment the bucket policy to deny any upload that does not request encryption. We recommend doing so as a best practice. The following policy does just that:

{
  "Version": "2012-10-17",
  "Id": "PutObjPolicy",
  "Statement": [
    {
      "Sid": "DenyIncorrectEncryptionHeader",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::YourBucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    },
    {
      "Sid": "DenyUnEncryptedObjectUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::YourBucket/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    }
  ]
}
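
To make the effect of the two Deny statements concrete, here is a deliberately simplified Python model of them. It is not the AWS policy evaluation engine. It only mirrors the two conditions above on the x-amz-server-side-encryption header, and the function name upload_is_denied is hypothetical.

```python
from typing import Optional


def upload_is_denied(sse_header: Optional[str]) -> bool:
    """Simplified model of the two Deny statements in the policy above."""
    # DenyUnEncryptedObjectUploads: the header is absent from the request.
    if sse_header is None:
        return True
    # DenyIncorrectEncryptionHeader: the header is present but is not AES256.
    return sse_header != "AES256"


for header in (None, "AES256", "aws:kms"):
    print(repr(header), "denied" if upload_is_denied(header) else "allowed")
```

Under this model, only uploads that explicitly request AES256 (S3-managed keys) are allowed through.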

Region

We automatically infer the region of your bucket in the process of copying data to it, so there is no additional configuration required to target a bucket region. However, if you’re using VPC Endpoints for your S3 bucket, make sure the endpoint was configured in the same region as your bucket. You can find more information on this in the AWS S3 docs here.

Custom Path Prefix

If you want to use a custom key prefix for the files in your bucket, append the path to the bucket name in the Segment S3 destination configuration UI. For example, a bucket string mytestbucket/path/prefix would result in data being copied to /path/prefix/segment-logs/{source-id}/{received-day}/.
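
The mapping from the bucket setting to the final location can be sketched as follows. This is an illustration of the rule described above, not Segment's implementation. The function name destination_prefix is hypothetical, SOURCE_ID and RECEIVED_DAY are placeholders, and the key prefix is returned without a leading slash.

```python
def destination_prefix(bucket_setting: str, source_id: str, received_day: str) -> tuple:
    """Split a 'bucket/optional/prefix' setting into (bucket, key prefix)."""
    # Everything before the first slash is the bucket; the rest is the custom prefix.
    bucket, _, prefix = bucket_setting.strip("/").partition("/")
    # Drop the custom prefix when it is empty, then join the path components.
    parts = [p for p in (prefix, "segment-logs", source_id, received_day) if p]
    return bucket, "/".join(parts) + "/"


print(destination_prefix("mytestbucket/path/prefix", "SOURCE_ID", "RECEIVED_DAY"))
print(destination_prefix("mytestbucket", "SOURCE_ID", "RECEIVED_DAY"))
```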

How can I download the data from my bucket?

We’ve had the most luck using the AWS CLI and writing a short script to download particular days, one at a time. We’ve found AWS CLI to be significantly faster than s3cmd because it downloads files in parallel.

NOTE: The files are stored with Content-Encoding: gzip, so most clients transparently decompress them on download. However, if you would like to access the raw gzipped data, you can programmatically download the file using the AWS SDK and setting ResponseContentEncoding: none (this doesn’t work on the CLI). You can also manually remove the metadata on the file (Content-Type: text/plain and Content-Encoding: gzip) through the AWS interface, which will allow you to download the file as gzipped.
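
Because a downloaded file may arrive either already decompressed or still gzipped, depending on the client, a script can check before decompressing. The sketch below tests for the two-byte gzip magic number. The function name ensure_decompressed is hypothetical.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def ensure_decompressed(data: bytes) -> bytes:
    """Return the decompressed payload whether or not the input is still gzipped."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


# Hypothetical payload used to show both cases.
raw = b'{"type": "track"}\n'
print(ensure_decompressed(gzip.compress(raw)) == raw)
print(ensure_decompressed(raw) == raw)
```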

To set up AWS CLI, you’ll need to first install it. There are detailed instructions here, or this will generally work for Debian-based Linux machines:

$ sudo apt-get install awscli

Then you’ll need to configure AWS CLI with your Access Key ID and Secret Access Key. You can create or find these keys in your Amazon IAM user management console. Then run the following command, which will prompt you for the access keys:

$ aws configure

Now you’re ready to download some logs!

To see a list of the most recent log folders:

$ aws s3 ls s3://{bucket}/segment-logs/{source-id}/ | tail -10

To download the files for a specific day:

$ aws s3 sync s3://{bucket}/segment-logs/{source-id}/{received-day} .

Or to download all files for a source:

$ aws s3 sync s3://{bucket}/segment-logs/{source-id} .

To put the files in a specific folder, replace the . at the end (“current directory”) with the desired directory, such as ~/Downloads/logs.
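
The "short script to download particular days, one at a time" mentioned earlier could look roughly like the Python sketch below, which wraps the aws s3 sync command shown above. It assumes AWS CLI is installed and configured. The function names, the bucket name my-bucket, and the SOURCE_ID and RECEIVED_DAY values are hypothetical placeholders.

```python
import os
import subprocess
from typing import Iterable, List


def build_sync_command(bucket: str, source_id: str, received_day: str,
                       dest: str = ".") -> List[str]:
    """Build the 'aws s3 sync' command for one day of logs."""
    source = f"s3://{bucket}/segment-logs/{source_id}/{received_day}"
    return ["aws", "s3", "sync", source, dest]


def download_days(bucket: str, source_id: str, received_days: Iterable[str],
                  dest_root: str = "~/Downloads/logs") -> None:
    """Download each day into its own subfolder, one day at a time."""
    # subprocess does not expand "~", so resolve it here.
    dest_root = os.path.expanduser(dest_root)
    for day in received_days:
        dest = os.path.join(dest_root, day)
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(build_sync_command(bucket, source_id, day, dest), check=True)


# Show the command for one day without running it.
print(" ".join(build_sync_command("my-bucket", "SOURCE_ID", "RECEIVED_DAY")))
```

Calling download_days with a list of day folder names taken from the aws s3 ls output would then fetch them in sequence.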


Supported Sources and Connection Modes

Web | Mobile | Server
📱 Device-based
☁️ Cloud-based

To learn more about Connection Modes and what dictates which we support, see here.

Settings

Segment lets you change these destination settings via your Segment dashboard without having to touch any code.

Bucket Name

Your S3 bucket name.

Use Server Side Encryption?

If you enable this setting, the data we copy to your bucket will be encrypted at rest using S3-Managed encryption keys. For more information, see here.

