Amazon Kinesis Destination

Amazon Kinesis enables you to build custom applications that process or analyze streaming data for specialized needs. Amazon Kinesis Streams can continuously capture and store terabytes of data per hour from hundreds of thousands of sources such as website clickstreams, financial transactions, social media feeds, IT logs, and location-tracking events.

This document was last updated on July 17, 2018. If you notice any gaps, outdated information or simply want to leave some feedback to help us improve our documentation, please let us know!

Getting Started

The first step is to make sure Amazon Kinesis supports the source type and connection mode you’ve chosen to implement. You can learn more about what dictates the connection modes we support here.

Web    Mobile    Server
📱 Device-mode
☁️ Cloud-mode
  1. Create a Kinesis stream. A stream is composed of multiple shards, each of which provides a fixed unit of capacity. The total capacity of the stream is the sum of the capacities of its shards. Each shard provides 1 MB/s of write capacity and 2 MB/s of read capacity. See the Amazon Kinesis Developer Guide for more information on estimating the number of shards your stream needs. Follow these instructions to create a new AWS Kinesis Stream.
  2. Create an IAM policy. Sign in to the Identity and Access Management (IAM) console and follow these instructions to create an IAM policy that allows Segment to write to your Kinesis Stream. Select the Create Policy from JSON option and use the following template policy in the Policy Document field. Be sure to replace {region}, {account-id}, and {stream-name} with the applicable values.
    {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "kinesis:PutRecord"
               ],
               "Resource": [
                   "arn:aws:kinesis:{region}:{account-id}:stream/{stream-name}"
               ]
           }
       ]
    }
    
  3. Create an IAM role. Follow these instructions to create an IAM role that allows Segment to write to your Kinesis Stream. When prompted to enter an Account ID, enter “595280932656”. Make sure to enable ‘Require External ID’ and enter your Segment Source ID as the External ID. You can find the Source ID by navigating to Settings > API Keys from your Segment source homepage. When adding permissions to your new role, find the policy you created above and attach it.

    Note: If you have multiple sources using Kinesis, enter one of their source IDs here for now and then follow the procedure outlined in the Multiple Sources section at the bottom of this doc once you’ve completed this step and saved your IAM role.

  4. Create a new Kinesis destination. In the Segment source that you want to connect to your Kinesis destination, click the “Add Destination” button. Search for and select the Amazon Kinesis destination, then enter the settings: Role Address, AWS Kinesis Stream Region, and AWS Kinesis Stream Name.
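
As a rough illustration of the capacity arithmetic in step 1, the sketch below estimates a shard count from expected throughput. It uses only the per-shard figures quoted above (1 MB/s write, 2 MB/s read). The function name estimateShards is hypothetical, and the sketch ignores other limits, such as per-shard records-per-second caps, so treat the result as a starting point and check it against the Amazon Kinesis Developer Guide.

```javascript
// Per-shard capacities as stated in step 1 above.
const WRITE_MB_PER_SHARD = 1;
const READ_MB_PER_SHARD = 2;

// estimateShards is a hypothetical helper, not part of any AWS or Segment SDK.
// It returns the smallest shard count that covers both write and read throughput.
function estimateShards(writeMBps, readMBps) {
  const forWrites = Math.ceil(writeMBps / WRITE_MB_PER_SHARD);
  const forReads = Math.ceil(readMBps / READ_MB_PER_SHARD);
  // A stream always needs at least one shard.
  return Math.max(1, forWrites, forReads);
}

// 3.5 MB/s of writes needs 4 shards; 5 MB/s of reads needs 3; the larger wins.
console.log(estimateShards(3.5, 5)); // 4
```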

Page

If you haven’t had a chance to review our spec, please take a look to understand what the Page method does. An example call would look like:

  analytics.page();

Identify

If you haven’t had a chance to review our spec, please take a look to understand what the Identify method does. An example call would look like:

analytics.identify('97980cfea0085', {
  email: 'gibbons@initech.com',
  name: 'John Gibbons'
});

Track

If you haven’t had a chance to review our spec, please take a look to understand what the Track method does. An example call would look like:

analytics.track("User Registered", {
  checkinDate: new Date(),
  myCoolProperty: "foobar",
});

Data Model

Let’s say you’re connecting your Segment customer data stream to Kinesis Stream arn:aws:kinesis:{region}:{account-id}:stream/{stream-name}. If you send Segment the following in a track call:

{
  "userId": "user_1",
  "event": "User Registered",
  "properties": {
    "plan": "Pro Annual",
    "account_type" : "Facebook"
  }
}

The Segment Kinesis destination issues a PutRecord request with the following parameters:

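kinesis.putRecord({
  // The message is JSON-stringified, then base64-encoded.
  Data: Buffer.from(JSON.stringify(msg)).toString('base64'),
  // Falls back to anonymousId when the message has no userId.
  PartitionKey: msg.userId() || msg.anonymousId(),
  StreamName: 'stream-name'
});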

Segment uses userId || anonymousId as the PartitionKey. Amazon Kinesis uses the partition key to distribute data across shards: it segregates the data records that belong to a stream into multiple shards, using the partition key associated with each data record to determine which shard that record belongs to.
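
To make the partition key's role more concrete, here is a minimal sketch of how a key could map to a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The helper shardIndexFor is hypothetical and assumes the shards split the hash key space evenly, which may not hold for a stream that has been resharded.

```javascript
const crypto = require('crypto');

// Hash the partition key with MD5 and read the digest as a 128-bit integer.
function hashKey(partitionKey) {
  const hex = crypto.createHash('md5').update(partitionKey, 'utf8').digest('hex');
  return BigInt('0x' + hex);
}

// Hypothetical helper: assumes `shardCount` shards that divide the 128-bit
// hash key space into equal ranges. Real streams can have uneven ranges.
function shardIndexFor(partitionKey, shardCount) {
  const rangeSize = (2n ** 128n) / BigInt(shardCount);
  const index = hashKey(partitionKey) / rangeSize;
  // Integer division can leave a small leftover range at the top; clamp it.
  const last = BigInt(shardCount - 1);
  return Number(index > last ? last : index);
}

// The same key always lands on the same shard.
console.log(shardIndexFor('user_1', 4));
```

Because the mapping is deterministic, all events for one userId go to one shard, which preserves per-user ordering but can concentrate load when a few users generate most of the traffic.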

Note: The JSON payload is stringified and then base64-encoded.
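
On the consuming side, the encoding has to be reversed before the event can be used. The sketch below shows one way to do that in Node.js. Both function names are hypothetical, and it assumes your consumer hands you the record data as base64 text; some client libraries return a Buffer instead, in which case you would convert it to a string first.

```javascript
// Encode the way the snippet above describes: JSON-stringify, then base64.
function encodeRecord(msg) {
  return Buffer.from(JSON.stringify(msg), 'utf8').toString('base64');
}

// decodeRecord is a hypothetical consumer-side helper that reverses the encoding.
function decodeRecord(base64Data) {
  return JSON.parse(Buffer.from(base64Data, 'base64').toString('utf8'));
}

const encoded = encodeRecord({ userId: 'user_1', event: 'User Registered' });
console.log(decodeRecord(encoded).event); // User Registered
```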

Group

If you haven’t had a chance to review our spec, please take a look to understand what the Group method does.

An example group call is shown below:

analytics.group("0e8c78ea9d9dsasahjg", {
  name: "group_name",
  employees: 3,
  plan: "enterprise",
  industry: "Technology"
});

Troubleshooting

When you get started, we recommend using one of the open source Kinesis tailing utilities to validate that data is flowing correctly!

Best Practices

Multiple Sources

If you have multiple sources using Kinesis/Firehose, you have two options:

Attach multiple sources to your IAM role

Find the IAM role you created for this destination in the AWS Console in Services > IAM > Roles. Click on the role, and navigate to the Trust Relationships tab. Click Edit trust relationship. You should see a snippet that looks something like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::595280932656:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "YOUR_SEGMENT_SOURCE_ID"
        }
      }
    }
  ]
}

Replace that snippet with the following, and replace the contents of the array with all of your source IDs.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::595280932656:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": ["YOUR_SEGMENT_SOURCE_ID", "ANOTHER_SOURCE_ID", "A_THIRD_SOURCE_ID"]
        }
      }
    }
  ]
}
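
If you manage many sources, you may prefer to generate this document rather than edit it by hand. The following is a minimal sketch, assuming you keep your source IDs in an array. buildTrustPolicy is a hypothetical helper, and the IDs shown are the same placeholders used above.

```javascript
// Segment's AWS account ID, as given in step 3 of Getting Started.
const SEGMENT_ACCOUNT_ID = '595280932656';

// buildTrustPolicy is a hypothetical helper; it returns a trust policy that
// accepts any of the given source IDs as the external ID.
function buildTrustPolicy(sourceIds) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Principal: { AWS: `arn:aws:iam::${SEGMENT_ACCOUNT_ID}:root` },
        Action: 'sts:AssumeRole',
        Condition: { StringEquals: { 'sts:ExternalId': sourceIds } }
      }
    ]
  };
}

// Paste the printed JSON into the Edit trust relationship dialog.
const policy = buildTrustPolicy(['YOUR_SEGMENT_SOURCE_ID', 'ANOTHER_SOURCE_ID']);
console.log(JSON.stringify(policy, null, 2));
```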

Use a single secret ID

If you have so many sources using Kinesis that it is impractical to attach all of their IDs to your IAM role, you can set a single ID to use instead. Prefer the approach above where possible, since this one requires you to keep track of a secret value. To set this value, go to the Kinesis destination settings in each of your Segment sources and set the ‘Secret ID’ to a value of your choosing. This value is a secret and should be treated as sensitively as a password. Once all of your sources have been updated to use this value, find the IAM role you created for this destination in the AWS Console in Services > IAM > Roles. Click on the role, and navigate to the Trust Relationships tab. Click Edit trust relationship. You should see a snippet that looks something like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::595280932656:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "YOUR_SEGMENT_SOURCE_ID"
        }
      }
    }
  ]
}

Replace your source ID (shown as “YOUR_SEGMENT_SOURCE_ID” in the snippet) with your secret ID.


Personas

You can send computed traits and audiences generated through Segment Personas to this destination as a user property. To learn more about Personas, reach out for a demo.

For user-property destinations, an identify call will be sent to the destination for each user being added and removed. The property name will be the snake_cased version of the audience name you provide, with a true/false value. For example, when a user first completes an order in the last 30 days, we will send an identify call with the property order_completed_last_30days: true, and when this user no longer satisfies the audience criteria we will set that value to false.

When the audience is first created, an identify call is sent for every user in the audience. Subsequent syncs will only send updates for users who were added or removed since the last sync.
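
To give a sense of what such an identify call might carry, here is an illustrative sketch. Both helpers are hypothetical: toSnakeCase only approximates snake_casing and Segment's exact rules may differ, the audience name "Order Completed Last 30days" is an assumed example, and the payload shape is simplified.

```javascript
// Hypothetical approximation of snake_casing; Segment's exact rules may differ.
function toSnakeCase(name) {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_') // runs of non-alphanumerics become one underscore
    .replace(/^_+|_+$/g, '');    // drop leading and trailing underscores
}

// Hypothetical, simplified shape of the identify call for an audience change.
function audienceIdentify(userId, audienceName, isMember) {
  return {
    type: 'identify',
    userId,
    traits: { [toSnakeCase(audienceName)]: isMember }
  };
}

console.log(audienceIdentify('user_1', 'Order Completed Last 30days', true).traits);
```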

Settings

Segment lets you change these destination settings via your Segment dashboard without having to touch any code.

AWS Kinesis Stream Region

The Kinesis Stream’s AWS region key

Role Address

The address of the AWS role that will be writing to Kinesis (ex: arn:aws:iam::874699288871:role/example-role)

Secret ID

If you have so many sources that it’s impractical to attach all of their source IDs as external IDs to your IAM role, you can specify a single external ID here instead and attach that as an external ID to your IAM role. This value is a secret and should be treated as a password.

AWS Kinesis Stream Name

The Kinesis Stream Name

Use Segment Message ID

You can enable this option if you want to use the Segment-generated messageId as the Partition Key. If you see too many ProvisionedThroughputExceededException errors, your Segment events are probably not being evenly distributed across your shards, because events are not evenly distributed across your users (the default partition key is userId or anonymousId). This option should provide a much more stable and even distribution.
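
The effect of this setting can be sketched as follows. partitionKeyFor is a hypothetical stand-in for the key choice described above, not Segment's actual implementation, and the event objects are made up for illustration. With the default key, every event from one busy user shares a single partition key; with the message ID, each event gets its own.

```javascript
// Hypothetical sketch of the key choice described above.
function partitionKeyFor(msg, useMessageId) {
  if (useMessageId) return msg.messageId;
  return msg.userId || msg.anonymousId;
}

// 1,000 made-up events, all from a single busy user.
const events = Array.from({ length: 1000 }, (_, i) => ({
  userId: 'user_1',
  messageId: `msg-${i}`
}));

// Count how many distinct partition keys the events produce.
const distinct = (useMessageId) =>
  new Set(events.map((e) => partitionKeyFor(e, useMessageId))).size;

console.log(distinct(false)); // 1
console.log(distinct(true));  // 1000
```

One trade-off to keep in mind: events for the same user are no longer guaranteed to land on the same shard, so consumers that rely on per-user ordering within a shard may need to reorder events themselves.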

Adding Amazon Kinesis to the integrations object

To add Amazon Kinesis to the integrations JSON object (for example, to filter data from a specific source), use the following valid name for this integration:
  • Amazon Kinesis
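
For example, a call that should go only to this destination could pass an options object like the one sketched below. The event name and properties are reused from the examples above purely for illustration, and the guard around analytics is there only so the snippet runs outside a page that has loaded analytics.js.

```javascript
// Disable every destination by default, then enable Amazon Kinesis by name.
const options = {
  integrations: {
    All: false,
    'Amazon Kinesis': true
  }
};

// On a page that has loaded analytics.js, pass the options as the third argument.
if (typeof analytics !== 'undefined') {
  analytics.track('User Registered', { plan: 'Pro Annual' }, options);
}
```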

