Custom Sources

NOTE: Custom Sources (Source Functions) are currently in beta. Their use is governed by (1) the Segment First Access and Beta Terms and Conditions and (2) the Segment Acceptable Use Policy.

Custom Sources allow you to gather data from thousands of Cloud Applications without having to worry about setting up or maintaining any infrastructure. Custom Sources are small pieces of code that you upload to a Segment Source to translate webhook or POST events into events or objects that match the Segment Spec.

Here are some example use cases we’ve heard from customers that you can implement with Source Functions in your organization:

  • LTV computed trait using Zuora subscription data: A customer hooked up Zuora webhooks to a webhook source and sent that data into Personas where they used a computed trait to calculate the LTV for the customer in real-time.
  • Start an onboarding campaign when a customer is marked “Closed Won” in Pipedrive: One customer set up an automated onboarding email campaign in Iterable that was triggered when the lead closed in Pipedrive.
  • Combine mobile subscription revenue with web revenues: A major media player was able to connect their mobile and web revenues together for the first time and analyze this data together in Amplitude.

Here is an example of what a Source Function could be used to do with a webhook from Drift.


Getting Started

Setup your source

In the sources catalog, search for “Custom Source”. From there you will need to add a name to your source and click the “Add Source” button.


Write your source

The next step is to write your source function to transform the webhook payload and send events or objects downstream.

Building your function

From the source overview page, click on “Write New Function” to open the web-based editor.


This will open a new tab containing the development environment. The editor comes pre-filled with a template you can use as a starting point, and has three key areas: Editor, Testing Environment, and Deploy.


Templates

To help you get started, we’ve written templates for some common sources:

You can view the full list of templates available here.

Accessing the webhook payload

You can access the body, headers, and query parameters of the request through the following:

exports.processEvents = async (event) => {
  event.payload.body;
  event.payload.headers;
  event.payload.queryParameters;
}

The payload is wrapped in a “payload” object so that you can access each piece of the request independently.

Note that the payload that you receive from the webhook will not be wrapped in the payload object.

Function Output

To keep your function simple, Segment handles sending all of the identify, group, and track events for you. The only requirement is that your function return an object in the exact format outlined below. You can mix and match identify, track, and group events in any order, as long as each individual object is returned inside the events or objects array with the required values.

{
    events: [{
        type: 'identify',
        userId: '1234',
        traits: {}
    },
    {
        type: 'track',
        event: 'Event Name',
        userId: '1234',
        properties: {}
    },
    {
        type: 'group',
        userId: '1234',
        groupId: '1234',
        traits: {}
    }],
    objects: [{
      collection: 'rooms',
      id: 'String of Object ID',
      properties: {}
    }]
}
Response Key | Values | Description
events | Array | Top-level value. All events should be contained within this array. These events are sent to our tracking API. Only one key called events can be returned from the function.
objects | Array | Top-level value. All objects should be contained within this array. These objects are sent to our objects API. Only one key called objects can be returned from the function.

Events

Events are used to trigger real-time workflows in downstream streaming destinations. These events are compatible with both streaming destinations and warehouses. Events should match the following return format from the function:

Response Key | Values | Description
type | String | The type of event you want to track: identify, track, group, screen, or page. All values should be lower case.
event | String | The name of the event you want to fire. This is only valid if type='track'.
userId | String | The user ID you want to associate the event with. If the type of the call is group, then use the groupId.
groupId | String | The Account or Group ID of the user. Only used if type='group'.
properties | Object | A list of key/value pairs that will be sent as properties of the event.

Objects

Objects are pieces of data that you can ETL (extract, transform, load) to your warehouse. Objects are not compatible with streaming destinations.

Response Key | Values | Description
collection | String | The collection translates to the name of the table in your warehouse. Examples: products, rooms, leads. Names should be lower case and plural.
id | String | The unique object ID. Any future objects with the same object ID will be upserted, de-duped, and merged.
properties | Object | A list of key/value pairs that will be sent as properties of the object. These translate to columns in your downstream warehouse.

Uploading a function with dependencies

Does your function have dependencies? If your function has dependencies and you need to run npm install, then build and deploy your function with the CLI (see “Using the CLI” below). Dependencies are not currently supported in the web-based IDE, but we’re looking to add that in the near future.

Deploy your source

Click on “Save and Deploy” to deploy your function. This will deploy your function and override any currently deployed function for that source. To confirm the deployment succeeded, refresh your source overview page; you will see “Function uploaded”.

Copy your webhook URL to the upstream source

Back on the Source Overview page, copy the source function URL and paste it into your upstream webhook settings. Once you’ve added the URL, trigger an event in that source. You should now see events appearing in the debugger.

Using the CLI

The command line client allows you to write a source using your own editor. You can use a continuous integration server to test it and upload your function using the CLI.

Download the CLI Client

curl https://raw.githubusercontent.com/segmentio/functions-cli-bin/master/install.sh | sh

Authenticate and upload your Function

First create an access token by following these directions: https://segment.com/docs/config-api/authentication/#create-an-access-token. Then, create a file in your home directory: ~/.source-fn-config.json

{
   "token": "<token created in the workspace>",
   "workspace": "<workspace name>"
}

Zip your Function

zip function.zip handler.js

To zip your function with dependencies

zip -r function.zip .

Upload your Function

source-functions-cli upload --file function.zip --source <source slug>

Parameters:
  • --file - The zipped file from the previous step
  • --source - The source slug you want to upload to

Debugging after a function has been uploaded

When you have real events flowing through the function and you’re still not seeing events appear in the debugger, you can view logs from your source function to debug and understand what’s going on. Use the following command to get the last 100 lines of output from the function:

source-functions-cli logs --source <source name>

Debugging and Troubleshooting

To debug your function in the web IDE, you can use the built-in testing environment on the right-hand side of the development center.

The first thing you need to do is understand how events are sent from your upstream webhook. To do this, go to https://requestbin.com and copy the webhook URL into your upstream source, then trigger an event from your upstream source. RequestBin will display the payload, including the body, query parameters, and headers. Copy this payload into the Developer Center testing environment.
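The test payload you paste into the editor should mirror the wrapped shape described under “Accessing the webhook payload”. For example (all values below are hypothetical placeholders):

```json
{
  "payload": {
    "body": { "email": "jane@example.com" },
    "headers": { "Content-Type": "application/json" },
    "queryParameters": {}
  }
}
```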

Note: Remove the comments to ensure your test event contains valid JSON.

Non-JSON Payloads

We currently support non-JSON payloads via the incoming event.payload.body, which is your primary access point to non-JSON data. If the data is not JSON, for example if it’s form-encoded, you will need to parse it out of event.payload.body yourself.

For form-encoded data, it could look something like this:

exports.processEvents = async (event) => {
  // Parse the form-encoded body with Node's querystring module
  const querystring = require('querystring')
  let parsedEvent = querystring.parse(event.payload.body)
  return transform(parsedEvent)
}

Accessing Header data

You can access header data via event.payload.headers. Header data is usually best accessed like this:

exports.processEvents = async (event) => {
  // Read a single header value, e.g. the User-Agent
  let userAgent = event.payload.headers['User-Agent']
  return transform(userAgent)
}

Manually triggering a function

Manually send a payload to a source function with curl:

curl -X POST \
  'https://fn.segmentapis.com/?b=your-key-here' \
  -H 'Content-Type: application/json' \
  -d '{"key": "value"}'

Alternatively, the key can be passed in an HTTP header:

curl -X POST \
  'https://fn.segmentapis.com/' \
  -H 'X-Segment-Source-Function-Token: your-key-here' \
  -H 'Content-Type: application/json' \
  -d '{"key": "value"}'

FAQ

What is the retry policy for a webhook payload?

If the function fails, the webhook payload will be retried up to 5 times with exponential backoff. After 5 attempts, the message will be dropped.

What is the maximum payload size for the incoming webhook?

The maximum payload size for an incoming webhook payload is 2MB.

What is the timeout for a function to execute?

The Lambda execution limit is 1 second.



Questions? Need help? Contact us!
Can we improve this doc?
Email us: docs-feedback@segment.com!