Choosing a Warehouse
Comparing Redshift and Postgres
In most cases, you will get a much better price-to-performance ratio with Redshift for typical analyses.
Redshift lacks some features, datatypes, and functions supported by Postgres and also implements some features differently. If you need any of the features or functions missing in Redshift and BigQuery, choose Postgres. If not (or you’re not sure), Segment recommends choosing Redshift.
If you’d like more information, Amazon wrote about this in their documentation.
Comparing Redshift and BigQuery
Both Redshift and BigQuery are attractive cloud-hosted, affordable, and performant analytical databases. The differences between the two are around their architecture and pricing.
When you provision a Redshift cluster, you’re renting a server from Amazon Web Services. Your cluster consists of nodes, each with dedicated memory, CPU, and disk storage. These nodes handle data storage, query execution, and - if your cluster contains multiple nodes - a leader node will handle coordination across the cluster.
Redshift performance and storage capacity is a function of cluster size and cluster type. As your storage or performance requirements change, you can scale up or down your cluster as needed.
With BigQuery, you’re not constrained by the storage capacity or compute resources of a given cluster. Instead, you can load large amounts of data into BigQuery without running out of memory, and execute complex queries without maxing out CPU.
This is possible because BigQuery takes advantage of distributed storage and networking to separate data storage from compute power. Google’sColossus distributed file system distributes data across many servers in the Google cloud. When you execute a query, the Dremel query engine splits the query into smaller sub-tasks, distributes the sub-tasks to computers across Google data centers, and then re-assembles them into your results.
The difference in architecture translates into differences in pricing.
Redshift prices are based on an hourly rate determined by the number and types of nodes in your cluster. They offer dense storage - optimized for storage - and dense compute nodes - optimized for query performance.
BigQuery has two pricing options: variable and fixed pricing. With the variable, pay-as-you-go plan, you pay for the data you load into BigQuery, and then pay for the amount of data you query. BigQuery allows you to set up Cost Controls and Alerts to help control and monitor costs.
Fixed-price plans are more for high-volume customers and allow you to rent a fixed amount of compute power.
Redshift does require you to create a cluster, choose sort and distribution keys, and resize your cluster as storage and performance needs change over time.
BigQuery is “fully-managed”, which means that you’ll never have to resize or adjust distribution or sort keys. BigQuery handles all of that.
This page was last modified: 30 Jun 2021
Questions? Problems? Need more info? Contact Segment Support for assistance!