Amazon Redshift Pros
First, let’s look at some of the advantages of Amazon Redshift:
- High Performance – Redshift achieves high performance through massive parallelism, efficient data compression, query optimization, and data distribution. Its Massively Parallel Processing (MPP) architecture parallelizes data loading, backup, and restore operations, and the queries you execute are distributed across multiple nodes. Redshift’s columnar storage is optimized for large, repetitive data sets and dramatically reduces disk input/output (I/O) operations, which improves performance.
- Speed – When it comes to loading data and querying it for analytics and reporting, Redshift is extremely fast. MPP allows you to load data at very high speeds.
- Scalability – Scalability is crucial for any data warehousing solution, and Redshift performs well in this arena too. It is horizontally scalable: whenever you need more storage or faster queries, you can add nodes using the AWS console or the Cluster API, and Redshift will resize the cluster for you.
- Transparent and Competitive Pricing – Redshift is considerably cheaper than alternatives or an on-premises solution. Redshift has two pricing models that give you the flexibility to categorize the expense as an operational expense or a capital expense.
- SQL Interface – The Redshift query engine is based on PostgreSQL and exposes a very similar interface, which means developers who are already familiar with SQL won’t face a steep learning curve. Because Redshift works with existing Postgres drivers, it connects easily to most business intelligence tools.
- Security – Redshift comes packed with security features, including several ways to handle access control, Virtual Private Cloud (VPC) support for network isolation, and data encryption. You can launch Redshift clusters inside your VPC, define security groups, and restrict inbound or outbound access to those clusters.
- AWS Ecosystem – Many businesses are already running their infrastructure on AWS. As you’d expect, Redshift works very well with the rest of the AWS infrastructure tools. For example, when loading or dumping data on S3, Redshift uses MPP to move data very quickly.
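To make the S3 loading path mentioned above more concrete, here is a minimal Python sketch that builds a Redshift COPY statement for CSV files stored in S3. The helper name `build_copy_statement`, the table name, the bucket, and the IAM role ARN are all hypothetical placeholders rather than anything from a real setup, and the sketch only produces the SQL text; you would still run it through whichever Postgres-compatible driver you already use.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads CSV files from S3.

    Illustration only: the values are interpolated directly into the SQL,
    so pass trusted, hard-coded identifiers and never raw user input.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with 's3://'")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV;"
    )


# Hypothetical table, bucket, and role, used purely as placeholders.
statement = build_copy_statement(
    table="events",
    s3_uri="s3://example-bucket/events/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

Pointing the S3 URI at a prefix that holds several files, rather than at one large file, gives Redshift the chance to spread the load across the cluster.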
Amazon Redshift Cons
Now that we’ve addressed the benefits of Redshift, let’s talk about some of the limitations and disadvantages.
- Limited Support for Parallel Upload – Redshift can quickly load data from Amazon S3, Amazon DynamoDB, and Amazon EMR using Massively Parallel Processing, but it doesn’t support parallel loading from other sources. If you’re working with other data sources, you’ll need to use an ETL solution, JDBC inserts, or scripts to load data.
- Uniqueness Not Enforced – Redshift doesn’t offer a way to enforce uniqueness on inserted data. So if you have a distributed system that writes data to Redshift, you’ll need to handle uniqueness yourself, either with some method of data de-duplication or in the application layer.
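As one way to picture the application-layer option for uniqueness, here is a small Python sketch that de-duplicates a batch of rows by key before the batch is loaded. The function name `deduplicate`, the `id` key, and the "last write wins" rule are assumptions made for illustration; your own pipeline may need a different rule, such as keeping the row with the newest timestamp.

```python
from typing import Any, Dict, Hashable, Iterable, List

Row = Dict[str, Any]


def deduplicate(rows: Iterable[Row], key: str) -> List[Row]:
    """Keep one row per key value; a later row replaces an earlier one."""
    latest: Dict[Hashable, Row] = {}
    for row in rows:
        # Overwriting keeps the most recently seen row for this key.
        latest[row[key]] = row
    return list(latest.values())


# Hypothetical batch containing two rows that share the same id.
batch = [
    {"id": 1, "status": "created"},
    {"id": 2, "status": "created"},
    {"id": 1, "status": "shipped"},
]
unique_rows = deduplicate(batch, key="id")
print(unique_rows)
```

This only removes duplicates within a single batch. Rows that already exist in the table still need to be handled separately, for example by loading into a staging table and merging from there.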
Amazon Redshift is a powerful solution for data warehousing, and after reading this blog post you should have a better understanding of some of the pros and cons of Redshift. It has some limitations, but it is still our preferred data warehousing solution.