Overview of Amazon Redshift

Most of us have come across the term “data warehousing” at some point. Despite the name, a data warehouse is not a physical warehouse. Data warehousing refers to a system that stores, reports on, and analyzes an organization’s data, usually in support of business intelligence. True to the ‘warehouse’ in its name, it keeps all historical data in one place, where it is used to generate analytics reports for the organization. IBM Db2, Apache Hive, Amazon Redshift, Google BigQuery, and Oracle Autonomous Data Warehouse are some of the most widely used data warehouse products on the global market.

Now, let’s discuss Amazon Redshift, which is part of Amazon Web Services (AWS). Although Amazon offers many hosted database services, Redshift stands out for the large storage capacity of its data clusters. Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, databases, and data lakes, running on AWS-designed hardware.

Features of Amazon Redshift

Now, let’s look at some key features of Redshift. As mentioned earlier, Redshift can analyze data regardless of how or where it is stored. The features below are grouped into two areas: Analysis, and Security & Reliability.

Analysis Features

  • Federated query: This feature lets you query live data across your data sources without moving any data. Rather than moving data over the network, Redshift distributes part of the computation directly to the remote databases and uses its own parallel processing for the rest, which improves performance.
  • Data sharing: This feature provides fast, live access to data, so consumers always see up-to-date information in the shared database. Because the data is shared in place, there is no need to copy or send files across the network.
  • Redshift ML: Redshift ML is a built-in tool for creating, training, and deploying ML models on your data. These models can help with tasks such as churn detection and financial forecasting.
  • Amazon Redshift Integration for Apache Spark: This feature lets you run Apache Spark applications on Redshift data, which widens the scope for analytics and ML solutions in data warehousing. The integration also makes it easier to monitor and troubleshoot performance issues in Apache Spark applications.
  • Streaming ingestion: With streaming ingestion, you can use SQL to connect to and ingest data from Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (MSK). It also makes downstream pipelines easier to create and manage by letting you create materialized views directly on top of the streams.
  • Query and export data to and from your data lake: Redshift works with data lakes that store data in open formats. You can query open-format files directly in Amazon S3, and you can use the UNLOAD command to export data to the data lake. This lets you keep frequently accessed data in Amazon Redshift while storing the remaining exabytes of data in Amazon S3.
  • AWS services integration: Native integration with AWS services, databases, and machine learning services makes it easier to handle complete analytics workflows without friction.
  • Partner console integration: You can speed up data onboarding and create business insights by integrating with selected partners in the Redshift console. With these integrations, you can bring data from applications such as Salesforce and Google Analytics into your Redshift data warehouse more efficiently.
  • Auto-copy from Amazon S3: Amazon Redshift supports auto-copy to simplify and automate data loading from Amazon S3, reducing the time and effort needed to build custom solutions or manage third-party services. With this feature, Amazon Redshift removes the need to run copy procedures manually and repeatedly by automating file ingestion and handling the continuous data loading steps under the hood. Auto-copy supports all file formats accepted by the Redshift COPY command, including CSV, JSON, Parquet, and Avro.
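To make the UNLOAD and COPY commands mentioned above more concrete, here is a minimal Python sketch that only builds the SQL text for each command. The helper names (build_unload, build_copy), the bucket, the table, and the IAM role ARN are all hypothetical placeholders, and the sketch covers only the Parquet and CSV formats. It does not connect to Redshift; you would run the resulting statements with the SQL client of your choice.

```python
def build_unload(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build an UNLOAD statement that exports query results to S3 as Parquet."""
    # UNLOAD takes the query as a quoted string, so single quotes
    # inside the query are doubled.
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


def build_copy(table: str, s3_path: str, iam_role_arn: str,
               file_format: str = "PARQUET") -> str:
    """Build a COPY statement that loads files from S3 into a table."""
    fmt = file_format.upper()
    # This sketch handles only two formats; JSON and Avro take
    # additional options that are left out here.
    if fmt not in {"PARQUET", "CSV"}:
        raise ValueError(f"format not covered by this sketch: {file_format}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {fmt};"
    )


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"  # placeholder
    print(build_unload("select * from sales where region = 'EU'",
                       "s3://example-bucket/sales/", role))
    print(build_copy("sales", "s3://example-bucket/sales/", role))
```

Keeping statement construction separate from execution makes it easy to review the SQL before it touches a real cluster.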

Security & Reliability Features

  • Amazon Redshift Serverless: This is a serverless option for Redshift that makes it easy to run analytics in seconds without the hassle of setting up and managing data warehouse infrastructure. With it, any user can get insights from data simply by loading the data into the data warehouse.
  • Query Editor v2: Query Editor v2 lets you visualize query results with ease. It also provides an editor to build SQL queries, analyses, visualizations, and annotations, and to share them securely with your team.
  • Automated Table Design: Redshift monitors user workloads and uses sophisticated algorithms to find ways to improve the layout of data to optimize the query speed.
  • Query using your own tools: Amazon Redshift gives you the flexibility to run queries within the console or connect SQL client tools, libraries, or data science tools including Amazon QuickSight, Tableau, PowerBI, QueryBook and Jupyter Notebook.
  • Fault tolerant: There are multiple features that enhance the reliability of your data warehouse cluster. For example, Amazon Redshift continuously monitors the health of the cluster and automatically re-replicates data from failed drives and replaces nodes as necessary for fault tolerance. Clusters can also be relocated to alternative Availability Zones (AZs) without any data loss or application changes.
  • Granular access controls: Granular row-level and column-level security controls ensure that users see only the data they are permitted to access. Amazon Redshift is integrated with AWS Lake Formation, ensuring that Lake Formation’s column-level access controls are also enforced for Redshift queries on the data in the data lake.
  • Multi AZ: The new Redshift Multi-AZ configuration further expands the recovery capabilities by reducing recovery time and guaranteeing capacity to automatically recover with no data loss. A Redshift Multi-AZ data warehouse maximizes performance and value by delivering high availability without having to use standby resources.
  • End-to-end encryption: With just a few parameter settings, you can set up Amazon Redshift to use SSL to secure data in transit, and hardware-accelerated AES-256 encryption for data at rest. If you choose to enable encryption of data at rest, all data written to disk will be encrypted as well as any backups. Amazon Redshift takes care of key management by default.
  • Network isolation: Amazon Redshift lets you configure firewall rules to control network access to your data warehouse cluster. You can run Amazon Redshift inside Amazon Virtual Private Cloud (VPC) to isolate your data warehouse cluster in your own virtual network and connect it to your existing IT infrastructure using an industry-standard encrypted IPsec VPN.
  • Tokenization: AWS Lambda user-defined functions (UDFs) let you use an AWS Lambda function as a UDF in Amazon Redshift and invoke it from Redshift SQL queries. With Lambda UDFs, you can write custom extensions for your SQL queries to achieve tighter integration with other services, and you can also enable external tokenization of sensitive data.
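As an illustration of the tokenization idea above, here is a minimal Python sketch of a Lambda handler that could back a Lambda UDF. It assumes the batch interface in which Redshift sends rows under an "arguments" key and expects a JSON string with "success", "num_records", and "results" (or "error_msg") in return; check the current AWS documentation before relying on these field names. The secret key and the HMAC-based scheme are illustrative assumptions only. Note that an HMAC is one-way, so this is pseudonymization rather than reversible tokenization, which would normally call an external tokenization service.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration; a real function would load it
# from a secrets store rather than hard-coding it.
SECRET_KEY = b"demo-key-change-me"


def tokenize(value):
    """Return a deterministic token for a value, or None for NULL input."""
    if value is None:
        return None
    digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def lambda_handler(event, context):
    """Tokenize the first argument of every row in the batch."""
    try:
        rows = event["arguments"]  # one inner list per input row
        results = [tokenize(row[0]) for row in rows]
        return json.dumps({
            "success": True,
            "num_records": len(results),
            "results": results,
        })
    except Exception as exc:  # report the failure back to the caller
        return json.dumps({"success": False, "error_msg": str(exc)})


if __name__ == "__main__":
    sample = {"arguments": [["alice@example.com"], [None]]}
    print(lambda_handler(sample, None))
```

Because the token is deterministic, equal inputs map to equal tokens, so joins and GROUP BY on the tokenized column still work.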

Amazon Redshift Node Types

Redshift offers different node types to suit different workloads. You can choose one based on your required performance, data size, and expected growth.

First, learn more about the node types so you can choose the best cluster configuration for your needs. You can quickly scale your cluster, pause and resume it, and even switch between node types with a single API call or in the Redshift console.

  • RA3: Allows you to optimize your data warehouse by scaling and paying for compute and managed storage independently. With RA3, you choose the number of nodes based on your performance requirements and pay only for the managed storage you use. You should size your RA3 cluster based on the amount of data you process daily.
  • DC2: It enables compute-intensive data warehouses with local SSD storage included. Choose the number of nodes you need based on data size and performance requirements. DC2 nodes store your data locally for high performance, and as the data size grows, you can add more compute nodes to increase the storage capacity of the cluster. For datasets under 1 TB uncompressed, we recommend DC2 node types for the best performance at the lowest price. If you expect your data to grow, we recommend using RA3 nodes so you can size compute and storage independently to achieve the best price and performance.
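The sizing guidance above can be written as a small rule of thumb. The Python sketch below is only an illustration of that guidance: the function name and the 1 TB threshold constant are assumptions of this example, and defaulting to RA3 for datasets of 1 TB or more is an inference, since the text only states the DC2 recommendation for smaller datasets.

```python
DC2_MAX_UNCOMPRESSED_TB = 1.0  # threshold taken from the guidance above


def suggest_node_type(uncompressed_tb: float, expects_growth: bool) -> str:
    """Suggest a node family from dataset size and growth expectations."""
    if uncompressed_tb < 0:
        raise ValueError("dataset size cannot be negative")
    # Small, stable datasets: local SSD storage on DC2 is recommended.
    if uncompressed_tb < DC2_MAX_UNCOMPRESSED_TB and not expects_growth:
        return "DC2"
    # Growing data (and, by assumption, larger data): RA3 lets compute
    # and managed storage scale independently.
    return "RA3"


if __name__ == "__main__":
    print(suggest_node_type(0.4, expects_growth=False))
    print(suggest_node_type(0.4, expects_growth=True))
```

A real sizing decision would also weigh query concurrency and performance requirements, which this sketch ignores.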

Amazon Redshift Pricing

With Amazon Redshift, prices start at $0.25 per hour and increase with the number of nodes and the amount of storage you need. Choose your configuration based on your actual requirements, because billing depends on the capacity you purchase. With provisioned Amazon Redshift, you can choose On-Demand Instances and pay for your database by the hour with no long-term commitments or upfront fees, or choose Reserved Instances for additional savings. Alternatively, Amazon Redshift Serverless automatically starts up, shuts down, and scales capacity up or down based on your application’s needs, so you pay only for the capacity consumed while processing the workload.

Once you make your selection, you may wish to use Elastic Resize to adjust the amount of provisioned compute capacity within minutes for steady-state processing. With Resize Scheduler, you can add and remove nodes on a daily or weekly basis to optimize cost and get the best performance.
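To see how hourly on-demand billing and scheduled resizing interact, here is a minimal Python sketch of the arithmetic. It assumes a simple model in which cost equals the hourly rate per node times the number of nodes times the hours run; the function name and the example schedule are hypothetical, and real bills include other items such as managed storage. Use the official pricing calculator for actual estimates.

```python
def on_demand_cost(hourly_rate_usd: float, schedule) -> float:
    """Estimate on-demand cost for a schedule of (node_count, hours) segments."""
    total = 0.0
    for node_count, hours in schedule:
        if node_count < 0 or hours < 0:
            raise ValueError("node counts and hours must be non-negative")
        # Each segment is billed per node, per hour.
        total += hourly_rate_usd * node_count * hours
    return total


if __name__ == "__main__":
    rate = 0.25  # the starting price per hour quoted above

    # Fixed cluster: 4 nodes running for a full day.
    fixed = on_demand_cost(rate, [(4, 24)])

    # Scheduled resize: 4 nodes for 10 busy hours, 2 nodes otherwise.
    resized = on_demand_cost(rate, [(4, 10), (2, 14)])

    print(f"fixed: ${fixed:.2f}, resized: ${resized:.2f}")
```

Under these assumptions, scaling down during quiet hours lowers the daily figure without changing the hourly rate.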

AWS also offers a free trial, through which an organization can use a DC2 large node free for two months. The trial is limited to 750 hours of run time and 160 GB of SSD storage.

Amazon Redshift Price Calculator

You can try the free pricing calculator to estimate your charges before choosing your services.