Postgres Destination

Segment makes it easy to send your data to (and lots of other destinations). Once you've tracked your data through our open source libraries we'll translate and route your data to in the format they understand. Learn more about how to use with Segment.

PostgreSQL, often simply Postgres, is an object-relational database management system (ORDBMS) with an emphasis on extensibility and standards compliance. As a database server, its primary functions are to store data securely and return that data in response to requests from other software applications. It can handle workloads ranging from small single-machine applications to large Internet-facing applications (or for data warehousing) with many concurrent users; on macOS Server, PostgreSQL is the default database; and it is also available for Microsoft Windows and Linux (supplied in most distributions).

PostgreSQL is ACID-compliant and transactional. PostgreSQL has updatable views and materialized views, triggers, foreign keys; supports functions and stored procedures, and other expandability. Developed by the PostgreSQL Global Development Group, free and open-source.

This document was last updated on April, 2018. If you notice any gaps, out-dated information or simply want to leave some feedback to help us improve our documentation, please let us know!

Getting started

Find below a selection of options on how to create your Postgres warehouse.

Heroku Postgres

This guide will explain how to set up a Postgres database with Heroku. Heroku is a cloud platform as a service which simplifies the process of setting up and administering a Postgres database.

  1. Signup for a Heroku account

    The first step to setting up Postgres on Heroku is to get a Heroku account. You can sign up for a free account here.

  2. Log in to your Heroku account

    You can log in to Heroku here.

  3. Create a new app and go to the Data page

    To get to the Data page, either click here or in the Dashboard menu dropdown, select Data.

  4. Create a Database

    Once you are at the Data page, you will see a button called Create one under Heroku Postgres. Click this button.

  5. Choose your plan

    Once you press the Create Database button, a modal will pop-up with choices of plans. More expensive plans will yield better performance and more storage. Depending on your needs, some plans may be more suitable than others. We recommend new customers start with the Standard 4 plan.

    In addition, for the fastest sync times, select the US East region. If you must store data elsewhere, let us know.

    After you are done configuring these settings, click in “Login to Install” and/or “Visit Heroku Dashboard to create your first app”.

  6. Within your newly created app go to the “Resources” tab and add the “Heroku Postgres” Add-on.

  7. Click on “Provision” and wait for the database provisioning to complete

    Click on the “Heroku Postgres :: Database”. After a few moments, the database should shift into the Available state.

  8. View the database overview and settings

    To access the database overview and settings, click on the tabs. You will see the “View credentials” under the “Settings”. You will need these settings for the next steps of the connection process.

RDS Postgres

This guide will explain how to set up a Postgres database with Amazon Relational Database Service (RDS). RDS simplifies the process of setting up and administering a Postgres database.

As a supplement to this guide, Amazon has created an official guide to setting up a Postgres database.

  1. Log in to your AWS account

    If you don’t have an AWS account, you can sign up for an account by visiting the AWS homepage and clicking ‘Create an AWS Account’ in the top right-hand corner.

  2. Open the RDS Console

    Go to the RDS console when you are logged in to AWS.

  3. Select the region you’d like to place the database in

    In the top right-hand corner of the console, you should see a drop-down with various AWS regions. We suggest putting your database in US West for the best performance.

  4. Launch a DB Instance

    Go to the Instances tab on the sidebar on the left, and then click the button that says Launch DB Instance.

  5. Select the PostgreSQL Engine

    Click on the PostgreSQL icon and then click the Next button.

  6. Select whether or not you’d like to use the database for production purposes

    There are two differences between the production and non-production options on this screen.

    Multi-AZ Deployment means that Amazon will maintain an additional database machine in a separate availability zone. An availability zone is a datacenter that is independent of other availability zones in the same geographic region. In the case of the failure of one of the database machines or availability zone loss, Amazon will automatically transition to using the other database machine, resulting in no downtime.

    Provisioned IOPS helps to guarantee the disk I/O performance of a database. Due to the fact that databases often cannot keep all of their data in RAM, they must store some data on disk. When running queries, the database may have to read data from the disk. With Provisioned IOPS, Amazon guarantees that disk will be able to perform a certain number of reads and writes a second.

    If you anticipate high utilization on your Postgres database or downtime is unacceptable, please choose Production. If you don’t plan to have high-utilization of your database or periods of downtime are acceptable and you know how to recover from them, choose Dev/Test.

    After choosing, press Next.

  7. Specify the DB Details

    This screen will prompt you to specify details about the database. If you’re not sure about an option, there are some details in this document below, and also on the sidebar when edit an option. The instance specifications options are:

    License Model: only has one option, so choose the default.

    DB Engine Version: specifies the version of Postgres to use. If you aren’t sure which version you’d like to use, the default is fine.

    DB Instance Class: selects the machine your database will run on. If you’re not sure what DB instance class is suitable for your database, check the DB Instance Classes chartand the Pricing Page.

    Multi-AZ Deployment: whether or not you want a backup machine on standby. The pricing is equivalent to running two instances.

    Storage Type: specifies the type of disk you’d like to use for the instance. From the sidebar information:

    • General Purpose (SSD)storage is suitable for a broad range of database workloads. Provides baseline of 3 IOPS/GB and ability to burst to 3,000 IOPS.
    • Provisioned IOPS (SSD)storage is suitable for I/O-intensive database workloads. Provides flexibility to provision I/O ranging from 1,000 to 30,000 IOPS.
    • Magnetic storage may be used for small database workloads where data is accessed less frequently.

      Provisioned IOPS allows you to specify what performance guarantees you’d like on disk I/O.

      The database settings are:

      DB Instance Identifier is a unique identifier for the database. The ID must be unique for your account in a single region.

      Master Username is the username you will use to log in to the instance.

      Master Password is a password that is 8 to 128 ASCII characters long that doesn’t contain the characters /, “, or @.

      After you are done configuring all of these settings, press Next Step.

  8. Configure the advanced settings

    The options for Network & Security are:

    • VPC specifies the Virtual Private Cloud you want the servers to reside in. If you have previously set up a VPC that you want the database in, select it here. If you aren’t sure or don’t have a VPC set up, select Create New VPC
    • Subnet Group specifies the subnets that the DB instances can use in the VPC. If you’re not sure, select Create new DB Subnet Group
    • Publicly Accessible specifies whether your DB instances are internet-addressable. This option must be set to Yes.
    • Availability Zone specifies which availability zone you want the instances to reside in. If you have a preference, you can set it here, else leave it on the No Preference default.
    • VPC Security Groups specify traffic rules concerning what traffic can leave the instances and what traffic can arrive at the instance. Unless you’ve previously made a security group specifically for DB instances, it’s best to create a new one.

      The options for Database Options are:

    • Database Name is an optional value for a Postgres database to be created at instance startup. We highly recommend filling this out to avoid manual creation of the database unless you have a good reason to create a database manually. This value must be 8 characters or less. If you fill this out, keep a note of what the value is.

    • Database Port specifies what port the DB listens on. The default of 5432 is fine.
    • DB Parameter Group specifies the configuration applied to the database. If you haven’t created a custom parameter group that you want to use, choosing the default is fine.
    • Option Group specifies additional options of the database. At the time of writing, option groups are not available for Postgres.
    • Copy Tags To Snapshots specifies whether you want the tags metadata on DB instances copied to corresponding instance snapshots. It’s fine to leave it on the default, but you can learn more about it here.
    • Enable Encryption allows you to specify whether you want the database and snapshots to be encrypted. If you choose to enable encryption, your data will be encrypted with AES-256, both in the instances themselves and in data at rest. There are some limitations though, which you can read about here. You will also have to specify a key in the AWS Key Management Service. If you select Yes, another option will appear to allow you to select what key you’d like to use.

      The options for Backup are:

    • Backup Retention Period specifies how long you want to retain automatic point-in-time recovery backups. Specifying a longer period of time will increase cost.

    • Backup Window allows you to select the time of day you’d like backups to occur. If you have no preference, select No Preference.

      The options for Maintenance are:

    • Auto Minor Version Upgrade allows you to choose whether or not the database automatically receives minor version upgrade.

    • Maintenance Window allows you to select a period of time that you prefer updates and other maintenance to be applied. If you select No Preference, a random time period will be picked. We recommend choosing a time window where usage is historically low.

      When you’re finished choosing settings, press Launch DB Instance.

  9. Wait for DB Instance to be Created

    In the Instances tab, you should be able to observe the DB instance being created. After a while, the status should change to Available.

Network Permissions for Segment to RDS

This guide will help you change permissions on your Amazon Relational Database Service (RDS) instance to allow Segment to connect.

  1. Open the RDS Console

    To get to the RDS console when you are logged in to AWS, visit this page.

  2. Go to the Instances tab

    On the left-hand sidebar, click Instances

  3. Go to the Connect and Details sections of the DB Instance

    You can do this by clicking in the instance and scrolling down to the relevant sections.

  4. Click on the security group

    This should bring you to a page to configure the active security group.

  5. Click on the Inbound tab

    This should bring you to a screen that looks like this

  6. Click Edit to add a new rule

    Select PostgreSQL as the type. For Source, change the custom IP to 52.25.130.38/32. This will allow Segment to connect to the instance. Press Save when done.

Segment should be able to connect to your database now!

Compose Postgres

Compose is the first DBaaS (Database as a Service) of its kind, geared at helping developers spend more time building their applications rather than wrestling with database provisioning and maintenance. Compose provides easy to deploy and scale data stores and services in many flavors: PostgreSQL, MongoDB, RethinkDB, Elasticsearch, Redis, etcd, and RabbitMQ.

Using Compose, companies can deploy databases instantly with backups, monitoring, performance tuning, and a full-suite of management tools. Compose Enterpise brings all this to the corporate VPC (virtual private cloud).

Compose uses Segment for hooking together web analytics, email, and social tracking and manages its Segment warehouse on PostgreSQL. Compose is pleased to be able to harness [the power of Postgres to query Segment data and be able create custom reports.

  1. Setup PostgreSQL

    If you don’t yet have an account with Compose, sign-up and select the PostgreSQL database to get started.

    For those of you already on Compose, if don’t yet have a PostgreSQL instance, you can add one from the Deployments page in the management console by clicking “Create Deployment” then selecting PostgreSQL or just add a PostgreSQL deployment to your account.

    Once your PostgreSQL deployment is spun up, you may want to create a user to be the owner of the database you’ll use for Segment. There is already an admin user role that is generated on initialization of your deployment, but this user has full privileges for your deployment so you may want to create additional users with more specific privileges. You may also want to manually scale up your deployment for the initial load of Segment data since it loads the past 2 months of data by default. You can then scale it back down according to your data needs after the initial load. The easy-to-use management console lets you perform these tasks, monitor your deployments, configure security settings, manage backups, and more.

    Now, all you need to do is create a database where your Segment data will live. You can create a database directly from the Data Browser interface in the Compose management console, by using a tool such as the pgAdmin GUI or programmatically via code you’ve written. For simplicity, this database is simply named “segment” and associated it to the “compose” user as the owner. Here is the SQL statement to create the database for Segment data, using the default PostgreSQL arguments (set yours appropriately to your requirements):

    SQL

     CREATE DATABASE segment
     WITH OWNER = compose
     ENCODING = 'SQL_ASCII'
     TABLESPACE = pg_default
     LC_COLLATE = 'C'
     LC_CTYPE = 'C'
     CONNECTION LIMIT = -1;
    

    And that’s it! You don’t even need to create any tables - Segment will handle that for you.

  2. Browse & Query

    And now the fun part - browsing and querying the data!

    You’ll notice in your PostgreSQL database that a new schema has been created for each source that was synced. Under the production source schema a whole bunch of tables were created. You can see the tables in the Compose data browser “Tables” view:

    When the Segment data is loaded to the PostgreSQL database, several tables are created by default: aliases, groups, identifies, pages, screens and tracks. You may also have accounts and users tables if you use unique calls for groups and for identifies. To learn more about these default tables and their fields, see the Segment schema documentation.

    All of the other tables will be event-specific, according to the event names and properties you use in your tracks calls. The number of tables will depend on the number of unique events you’re tracking. For example, at Compose, there is a track call for when customers view their deployments such as:

    JavaScript

     analytics.track('deployments_show', {
     deployment_name: 'heroic-rabbitmq-62',
     deployment_type: 'RabbitMQ'
     });
    

    In the Postgres Segment database, there will then be a table named “deployments_show” which can be queried for that deployment to see how many times it was viewed:

    SQL

     SELECT COUNT(id)
     -- Don't forget the schema: FROM <source>.<table>
     FROM production.deployments_show
     WHERE deployment_name = 'heroic-rabbitmq-62';
    

    The result is 18 times in the past 2 months by a particular database user. To verify, just join to the identifies table, which contains user data, through the user_id foreign key:

    SQL

     SELECT DISTINCT i.name
     FROM production.identifies i
     JOIN production.deployments_show ds ON ds.user_id = i.user_id
     WHERE ds.deployment_name = 'heroic-rabbitmq-62';
    

    A more interesting query for this, however, might be to see how many deployments were created in November using the “deployments_new” event:

    SQL

     SELECT COUNT(DISTINCT id)
     FROM production.deployments_new
     WHERE original_timestamp &gt;= '2015-11-01'
     AND original_timestamp &lt; '2015-12-01';
    

    In this way custom reports can be created for analysis on the tracking data, using SQL as simple or as complex as needed, to expose insights which the Segment-integrated tracking tools may not be able to do easily.

Database Setup

After you have set up your Postgres database running, there are a few good steps to take before making the connection in Segment to get the best experience.

Database User + Permissions

Depending on how exactly you set up your database, you probably have an admin username/password. While you can give this directly to Segment, we would prefer that you create a separate user for us (and any other 3rd-parties that might connect with your database) in order to promote isolation and make it easier to audit access.

-- create a user named "segment" that Segment will use when connecting to your Redshift cluster.
CREATE USER segment WITH PASSWORD '<enter password here>';

-- allows the "segment" user to create new schemas and temporary tables on the specified database.
GRANT CREATE, TEMPORARY ON DATABASE <enter database name here> TO segment;

Connect with Segment

  1. Open up Segment in another browser window or tab

    Visit the Segment Workspaces screen. Click on the workspace you’d like the database to be associated with.

  1. Click on Add Destination

    In the Workspace, you can find the button beside the Destinations.

  2. Either select “Warehouses” categories on the left-hand sidebar or use the search field and look for Postgres.

  3. Configure the Database Connection

    Select Postgres database. Then, copy the relevant settings into the text fields on this page and then press the Connect button.

  4. Verify the database connected successfully

    You should see a message indicating that the connection was successful. If the connection wasn’t successful, double-check that you entered the settings correctly. If it still isn’t working, feel free to contact us!

Sync schedule

Your data will be available in Warehouses within 24-48 hours after your first sync. Your warehouse will then be on a sync schedule based on your Warehouse Plan.

Segment allows you to schedule the time and frequency of loading data into your data warehouse.

You can schedule your warehouse syncs by going to Warehouse > Settings > Sync Schedule. You can schedule up to the number of syncs allowed on your billing plan.

Security

Make sure you’re logging in with a user that has read and write permissions so that we can write to your database.

- Whitelist the Segment IP (52.25.130.38/32)
- Create a user that has read/write permissions.

Best Practices

All the power of PostgreSQL for Segment data - it’s that easy. Segment has written an article on useful queries that you may want to read to get started on pulling data and creating reports from your warehouse.

Once you’ve got the data in Postgres, you can do so much more with it, too. One idea might be to develop an app that performs various functions based on the different events being loaded to the database, potentially using RabbitMQ as your asynchronous message broker. An example could be that you want a banner to display on your website once your 1000th customer has signed up. The data is at your fingertips; you just need to decide how you want to use it.

Query Speed

The speed of your queries depends on the capabilities of the hardware you have chosen as well as the size of the dataset. The amount of data utilization in the cluster will also impact query speed. Check with your hosting provider or Postgres docs for performance best practices.

Single and Double Quotes in PostgreSQL

If you use double quotes on the name of a table, column, index, or other object when you create it, and if there is even one capital letter in that identifier, you will need to use double quotes every single time you query it.

Single quotes and double quotes in PostgreSQL have completely different jobs, and return completely different data types. Single quotes return text strings. Double quotes return identifiers, but with the case preserved.

If you create a table using double quotes:

CREATE TABLE "Example" ( ... );

We have now created a table in which the table name has not been forced to lowercase, but which has preserved the capital E. This means that the following query will now fail:

select * from example; ERROR: relation "example" does not exist

For more information on single vs double follow this link.

FAQs

Can I add an index to my tables?

Yes! Adding indexes to your tables won’t block your Segment syncs. However, be aware that the more indexes you have, the slower the syncs will go. All indexes must be updated whenever new data are loaded, which isn’t unique to Segment, but rather how Postgres works natively.

Troubleshooting

Permission denied for databaseThe syncs are failing due to a permissions issue. The user you have configured in Segment does not have permission to connect to the appropriate database. In order to resolve these errors we recommend connecting to your warehouse with the owner, or granting permissions to the current user you’re connect with. The following SQL statement can be run for the user connected in your Segment warehouse settings: GRANT CONNECT ON DATABASE <database_name> TO <user>
Permission denied for schemaThe syncs for the source, <source_name>, are failing due to a permissions issue. It looks like the user connected does not have permission to view the necessary schemas in the warehouse. In order to resolve these errors we recommend connecting your warehouse with the owner, or granting permissions to the current user you’ve connect with. The following SQL statement can be run for every schema in your warehouse for the user connected in your Segment warehouse settings. GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA <schema_name> TO <user>
Dial TCP: no such hostThe syncs are failing because we are running into errors when trying to connect to the warehouse. It appears we are unable to locate the host entered in your warehouse settings. This error is usually due to an invalid host address, a warehouse hosted on a private IP, or a credentials issue. In order to resolve the error, we suggest checking the following settings: - The host address listed under your warehouse settings - Ensuring the host is configured to a publicly accessible IP - Ensuring the username and password entered in your Segment workspace matches the username and password on the Warehouse directly
Dial TCP: i/o timeoutThe warehouse syncs are failing due to a connection issue: dial tcp XX.XXX.XXX.XXX:XXXX: i/o timeout This error can be caused for a few reasons: - Your Warehouse went offline. - There’s a setting needed for Segment to connect which hasn’t been correctly configured. Please refer to our Warehouse docs to ensure all steps outlined there have been followed.
Schemadoes not existThe syncs are failing due to a permissions issue. It looks like the user connected does not have permission to create schemas in your warehouse. In order to resolve these errors we recommend connecting to your warehouse with the owner, or granting permissions to the current user you’ve connect with. The following SQL statement can be run for every schema in your warehouse for the user connected in your Segment warehouse settings. GRANT CREATE ON DATABASE <database_name> TO <user>

If you have any questions, or see anywhere we can improve our documentation, please let us know!