Postgres Destination

Heroku Postgres

This guide will explain how to set up a Postgres database with Heroku. Heroku is a cloud platform as a service which simplifies the process of setting up and administering a Postgres database.

  1. Signup for a Heroku account

    The first step to setting up Postgres on Heroku is to get a Heroku account. You can sign up for a free account here.

  2. Log in to your Heroku account

    You can log in to Heroku here.

  3. Go to the Databases page

    To get to the Databases page, either click here or in the Dashboard menu dropdown, select Databases.

  4. Create a Database

    Once you are at the Databases page, you will see a button called Create Database in the top right-hand corner. Click this button.

  5. Choose your plan

    Once you press the Create Database button, a modal will pop-up with choices of plans. More expensive plans will yield better performance and more storage. Depending on your needs, some plans may be more suitable than others. We recommend new customers start with the Standard 4 plan.

    In addition, for the fastest sync times, select the US East region. If you must store data elsewhere, let us know.

    After you are done configuring these settings, click Add Database.

  6. Wait for the database provisioning to complete

    You may have to refresh the page to have the database show up. After a few moments, the database should shift into the Available state.

  7. View the database overview and settings

    To access the database overview and settings, click on the database. You will see a screen with Connection Settings. You will need these settings for the next steps of the connection process.

RDS Postgres

This guide will explain how to set up a Postgres database with Amazon Relational Database Service (RDS). RDS simplifies the process of setting up and administering a Postgres database.

As a supplement to this guide, Amazon has created an official guide to setting up a Postgres database.

  1. Log in to your AWS account

    If you don’t have an AWS account, you can sign up for an account by visiting the AWS homepage and clicking ‘Create an AWS Account’ in the top right-hand corner.

  2. Open the RDS Console

    Go to the RDS console when you are logged in to AWS.

  3. Select the region you’d like to place the database in

    In the top right-hand corner of the console, you should see a drop-down with various AWS regions. We suggest putting your database in US West for the best performance.

  4. Launch a DB Instance

    Go to the Instances tab on the sidebar on the left, and then click the button that says Launch DB Instance.

  5. Select the PostgreSQL Engine

    Click on the PostgreSQL icon and then click the Select button.

  6. Select whether or not you’d like to use the database for production purposes

    There are two differences between the production and non-production options on this screen.

    Multi-AZ Deployment means that Amazon will maintain an additional database machine in a separate availability zone. An availability zone is a datacenter that is independent of other availability zones in the same geographic region. In the case of the failure of one of the database machines or availability zone loss, Amazon will automatically transition to using the other database machine, resulting in no downtime.

    Provisioned IOPS helps to guarantee the disk I/O performance of a database. Due to the fact that databases often cannot keep all of their data in RAM, they must store some data on disk. When running queries, the database may have to read data from the disk. With Provisioned IOPS, Amazon guarantees that disk will be able to perform a certain number of reads and writes a second.

    If you anticipate high utilization on your Postgres database or downtime is unacceptable, please choose Yes. If you don’t plan to have high-utilization of your database or periods of downtime are acceptable and you know how to recover from them, choose No.

    After choosing, press Next Step.

  7. Specify the DB Details

    This screen will prompt you to specify details about the database. If you’re not sure about an option, there are some details in this document below, and also on the sidebar when edit an option. The instance specifications options are:

    License Model: only has one option, so choose the default.

    DB Engine Version: specifies the version of Postgres to use. If you aren’t sure which version you’d like to use, the default is fine.

    DB Instance Class: selects the machine your database will run on. If you’re not sure what DB instance class is suitable for your database, check the DB Instance Classes chartand the Pricing Page.

    Multi-AZ Deployment: whether or not you want a backup machine on standby. The pricing is equivalent to running two instances.

    Storage Type: specifies the type of disk you’d like to use for the instance. From the sidebar information:

    • General Purpose (SSD)storage is suitable for a broad range of database workloads. Provides baseline of 3 IOPS/GB and ability to burst to 3,000 IOPS.
    • Provisioned IOPS (SSD)storage is suitable for I/O-intensive database workloads. Provides flexibility to provision I/O ranging from 1,000 to 30,000 IOPS.
    • Magnetic storage may be used for small database workloads where data is accessed less frequently.

      Provisioned IOPS allows you to specify what performance guarantees you’d like on disk I/O.

      The database settings are:

      DB Instance Identifier is a unique identifier for the database. The ID must be unique for your account in a single region.

      Master Username is the username you will use to log in to the instance.

      Master Password is a password that is 8 to 128 ASCII characters long that doesn’t contain the characters /, “, or @.

      After you are done configuring all of these settings, press Next Step.

  8. Configure the advanced settings

    The options for Network & Security are:

     - VPC specifies the Virtual Private Cloud you want the servers to reside in. If you have previously set up a VPC that you want the database in, select it here. If you aren’t sure or don’t have a VPC set up, select Create New VPC
     - Subnet Group specifies the subnets that the DB instances can use in the VPC. If you’re not sure, select Create new DB Subnet Group
     - Publicly Accessible specifies whether your DB instances are internet-addressable. This option must be set to Yes.
     - Availability Zone specifies which availability zone you want the instances to reside in. If you have a preference, you can set it here, else leave it on the No Preference default.
     - VPC Security Groups specify traffic rules concerning what traffic can leave the instances and what traffic can arrive at the instance. Unless you’ve previously made a security group specifically for DB instances, it’s best to create a new one.
    

    The options for Database Options are:

     - Database Name is an optional value for a Postgres database to be created at instance startup. We highly recommend filling this out to avoid manual creation of the database unless you have a good reason to create a database manually. This value must be 8 characters or less. If you fill this out, keep a note of what the value is.
     - Database Port specifies what port the DB listens on. The default of 5432 is fine.
     - DB Parameter Group specifies the configuration applied to the database. If you haven’t created a custom parameter group that you want to use, choosing the default is fine.
     - Option Group specifies additional options of the database. At the time of writing, option groups are not available for Postgres.
     - Copy Tags To Snapshots specifies whether you want the tags metadata on DB instances copied to corresponding instance snapshots.  It’s fine to leave it on the default, but you can [learn more about it here.](http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Tagging.html)
     - Enable Encryption allows you to specify whether you want the database and snapshots to be encrypted. If you choose to enable encryption, your data will be encrypted with AES-256, both in the instances themselves and in data at rest. There are some limitations though, which you can[read about here](http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.Encryption.html). You will also have to specify a key in the AWS Key Management Service. If you select Yes, another option will appear to allow you to select what key you’d like to use.
    

    The options for Backup are:

     - Backup Retention Period specifies how long you want to retain automatic point-in-time recovery backups. Specifying a longer period of time will increase cost.
     - Backup Window allows you to select the time of day you’d like backups to occur. If you have no preference, select No Preference.
    

    The options for Maintenance are:

     - Auto Minor Version Upgrade allows you to choose whether or not the database automatically receives minor version upgrade.
     - Maintenance Window allows you to select a period of time that you prefer updates and other maintenance to be applied. If you select No Preference, a random time period will be picked. We recommend choosing a time window where usage is historically low.
    

    When you’re finished choosing settings, press Launch DB Instance.

  9. Wait for DB Instance to be Created

    In the Instances tab, you should be able to observe the DB instance being created. After a while, the status should change to Available.

Network Permissions for Segment to RDS

This guide will help you change permissions on your Amazon Relational Database Service (RDS) instance to allow Segment to connect.

  1. Open the RDS Console

    To get to the RDS console when you are logged in to AWS, visit this page.

  2. Go to the Instances tab

    On the right-hand sidebar, click Instances

  3. Expand the DB Instance information

    You can do this by clicking the arrow near the left hand side of the row.

  4. Go to the tab with the magnifying glass

    This is the tab pictured above. This tab will show many details, including the Security Groups.

  5. Click on the security group

    This should bring you to a page to configure the active security group.

  6. Click on the Inbound tab

    This should bring you to a screen that looks like this

  7. Click Edit to add a new rule

  8. Click Add Rule

    Select PostgreSQL as the type. For Source, change the custom IP to 52.25.130.38/32. This will allow Segment to connect to the instance. Press Save when done.

Segment should be able to connect to your database now!

Compose Postgres

Compose is the first DBaaS (Database as a Service) of its kind, geared at helping developers spend more time building their applications rather than wrestling with database provisioning and maintenance. Compose provides easy to deploy and scale data stores and services in many flavors: PostgreSQL, MongoDB, RethinkDB, Elasticsearch, Redis, etcd, and RabbitMQ.

Using Compose, companies can deploy databases instantly with backups, monitoring, performance tuning, and a full-suite of management tools. Compose Enterpise brings all this to the corporate VPC (virtual private cloud).

Compose uses Segment for hooking together web analytics, email, and social tracking and manages its Segment warehouse on PostgreSQL. Compose is pleased to be able to harness [the power of Postgres to query Segment data and be able create custom reports.

  1. Setup PostgreSQL

    If you don’t yet have an account with Compose, sign-up and select the PostgreSQL database to get started.

    For those of you already on Compose, if don’t yet have a PostgreSQL instance, you can add one from the Deployments page in the management console by clicking “Create Deployment” then selecting PostgreSQL or just add a PostgreSQL deployment to your account.

    Once your PostgreSQL deployment is spun up, you may want to create a user to be the owner of the database you’ll use for Segment. There is already an admin user role that is generated on initialization of your deployment, but this user has full privileges for your deployment so you may want to create additional users with more specific privileges. You may also want to manually scale up your deployment for the initial load of Segment data since it loads the past 2 months of data by default. You can then scale it back down according to your data needs after the initial load. The easy-to-use management console lets you perform these tasks, monitor your deployments, configure security settings, manage backups, and more.

    Now, all you need to do is create a database where your Segment data will live. You can create a database directly from the Data Browser interface in the Compose management console, by using a tool such as the pgAdmin GUI or programmatically via code you’ve written. For simplicity, this database is simply named “segment” and associated it to the “compose” user as the owner. Here is the SQL statement to create the database for Segment data, using the default PostgreSQL arguments (set yours appropriately to your requirements):

    SQL

     CREATE DATABASE segment
     WITH OWNER = compose
     ENCODING = 'SQL_ASCII'
     TABLESPACE = pg_default
     LC_COLLATE = 'C'
     LC_CTYPE = 'C'
     CONNECTION LIMIT = -1;
    

    And that’s it! You don’t even need to create any tables - Segment will handle that for you.

  2. Browse & Query

    And now the fun part - browsing and querying the data!

    You’ll notice in your PostgreSQL database that a new schema has been created for each source that was synced. Under the production source schema a whole bunch of tables were created. You can see the tables in the Compose data browser “Tables” view:

    When the Segment data is loaded to the PostgreSQL database, several tables are created by default: aliases, groups, identifies, pages, screens and tracks. You may also have accounts and users tables if you use unique calls for groups and for identifies. To learn more about these default tables and their fields, see the Segment schema documentation.

    All of the other tables will be event-specific, according to the event names and properties you use in your tracks calls. The number of tables will depend on the number of unique events you’re tracking. For example, at Compose, there is a track call for when customers view their deployments such as:

    JavaScript

     analytics.track('deployments_show', {
     deployment_name: 'heroic-rabbitmq-62',
     deployment_type: 'RabbitMQ'
     });
    

    In the Postgres Segment database, there will then be a table named “deployments_show” which can be queried for that deployment to see how many times it was viewed:

    SQL

     SELECT COUNT(id)
     -- Don't forget the schema: FROM <source>.<table>
     FROM production.deployments_show
     WHERE deployment_name = 'heroic-rabbitmq-62';
    

    The result is 18 times in the past 2 months by a particular database user. To verify, just join to the identifies table, which contains user data, through the user_id foreign key:

    SQL

     SELECT DISTINCT i.name
     FROM production.identifies i
     JOIN production.deployments_show ds ON ds.user_id = i.user_id
     WHERE ds.deployment_name = 'heroic-rabbitmq-62';
    

    A more interesting query for this, however, might be to see how many deployments were created in November using the “deployments_new” event:

    SQL

     SELECT COUNT(DISTINCT id)
     FROM production.deployments_new
     WHERE original_timestamp &gt;= '2015-11-01'
     AND original_timestamp &lt; '2015-12-01';
    

    In this way custom reports can be created for analysis on the tracking data, using SQL as simple or as complex as needed, to expose insights which the Segment-integrated tracking tools may not be able to do easily.

Database Setup

After you have set up your Postgres database running, there are a few good steps to take before making the connection in Segment to get the best experience.

Database User + Permissions

Depending on how exactly you set up your database, you probably have an admin username/password. While you can give this directly to Segment, we would prefer that you create a separate user for us (and any other 3rd-parties that might connect with your database) in order to promote isolation and make it easier to audit access.

-- create a user named "segment" that Segment will use when connecting to your Redshift cluster.
CREATE USER segment WITH PASSWORD '<enter password here>';

-- allows the "segment" user to create new schemas and temporary tables on the specified database.
GRANT CREATE TEMPORARY ON DATABASE '<enter database name here>' TO 'segment';

Connect with Segment

  1. Open up Segment in another browser window or tab

    Visit the Segment Workspaces screen. Click on the workspace you’d like the database to be associated with.

  2. Go to the Warehouses tab

    In the Workspace sidebar, you will see a Warehouses tab. Click this tab. If for some reason you don’t see this tab, pleasecontact support.

  3. Click the Add Warehouse button.

  4. Configure the Database Connection

    Select Postgres database from the ‘Choose Your Warehouse’ dropdown. Then, copy the relevant settings into the text fields on this page and then press the Connect button.

  5. Verify the database connected successfully

    You should see a message indicating that the connection was successful. If the connection wasn’t successful, double-check that you entered the settings correctly. If it still isn’t working, feel free to contact us!

Sync schedule

Your data will be available in Warehouses within 24-48 hours after your first sync. Your warehouse will then be on a sync schedule based on your Warehouse Plan.

Segment allows you to schedule the time and frequency of loading data into your data warehouse.

You can schedule your warehouse syncs by going to Warehouse > Settings > Sync Schedule. You can schedule up to the number of syncs allowed on your billing plan.

sync schedule image

Query Speed

The speed of your queries depends on the capabilities of the hardware you have chosen as well as the size of the dataset. The amount of data utilization in the cluster will also impact query speed. For Redshift clusters if you’re above 75% utilization, you will likely experience degradation in query speed.

All the power of PostgreSQL for Segment data - it’s that easy. Segment has written an article on useful queries that you may want to read to get started on pulling data and creating reports from your warehouse.

Once you’ve got the data in Postgres, you can do so much more with it, too. One idea might be to develop an app that performs various functions based on the different events being loaded to the database, potentially using RabbitMQ as your asynchronous message broker. An example could be that you want a banner to display on your website once your 1000th customer has signed up. The data is at your fingertips; you just need to decide how you want to use it.


If you have any questions or see anywhere we can improve our documentation, please let us know or kick off a conversation in the Segment Community!