Choosing a Warehouse
Comparing Redshift and Postgres
In most cases, you will get a much better price-to-performance ratio with Redshift for typical analyses.
Redshift lacks some features, datatypes, and functions supported by Postgres and also implements some features differently. If you absolutely need any of the features or functions missing in Redshift and BigQuery, choose Postgres. If not (or you’re not sure), we recommend Redshift.
If you’d like more information, Amazon wrote about this in their documentation.
Comparing Redshift and BigQuery
Both Redshift and BigQuery are attractive cloud-hosted, relatively cheap, and performant analytical databases. The differences between the two are largely around their architecture and pricing.
When you provision a Redshift cluster, you’re renting a server from Amazon Web Services. Your cluster is comprised of nodes, each with dedicated memory, CPU, and disk storage. These nodes handle data storage, query execution, and - if your cluster contains multiple nodes - a leader node will handle coordination across the cluster.
Redshift performance and storage capacity is a function of cluster size and cluster type. As your storage or performance requirements change, you can scale up or down your cluster as needed.
With BigQuery, you’re not constrained by the storage capacity or compute resources of a given cluster. Instead, you can load large amounts of data into BigQuery without running out of memory, and execute complex queries without maxing out CPU.
This is possible because BigQuery takes advantage of distributed storage and networking to separate data storage from compute power. Data is distributed across many servers in the Google cloud using their Colossus distributed file system. When you execute a query, the Dremel query engine splits the query into smaller sub-tasks, distributes the sub-tasks to many computers across Google data centers, and then re-assembles them into your results.
The difference in architecture translates into differences in pricing.
Redshift prices based on an hourly rate determined by the number and types of nodes in your cluster. They offer dense storage - optimized for storage - and dense compute nodes - optimized for query performance.
BigQuery has two pricing options: variable and fixed pricing. With the variable, pay-as-you-go plan, you pay for the data you load into BigQuery, and then pay only for the amount of data you query. BigQuery allows you to setup Cost Controls and Alerts to help control and monitor costs.
Fixed-price plans are geared toward high-volume customers and allow you to rent a fixed amount of compute power.
Redshift does require you to create a cluster, schedule vacuums, choose sort and distribution keys, and resize your cluster as storage and performance needs change over time.
BigQuery is “fully-managed”, which means that you’ll never have to resize, vacuum, or adjust distribution or sort keys. All of that is handled by BigQuery.
Questions? Problems? Need more info? Contact us, and we can help!