Getting Started with AWS Redshift

This article gives an overview of AWS Redshift and walks through creating a Redshift cluster step by step.

Introduction

AWS Redshift is a columnar data warehouse service on the AWS cloud that can scale to petabytes of storage, and the infrastructure hosting the warehouse is fully managed by AWS. Redshift operates in a clustered model with a leader node and multiple worker nodes, like most clustered or distributed databases. It is based on Postgres, so it shares many similarities with Postgres, including a query language that is nearly identical to standard Structured Query Language (SQL). Redshift supports almost all the major database objects, such as databases, tables, views, and even stored procedures. In this article, we will explore how to create your first Redshift cluster on AWS and start operating it.

Prerequisites

An AWS account with the required privileges is needed to use the AWS Redshift service. To create an AWS account, you need a credit card or another payment method supported by AWS. First-time users who intend to open a new AWS account can read this article, which explains the process of opening and activating a new AWS account.

Once you have a new AWS account, AWS offers many services under the free tier, where you receive a certain amount of usage of specific services for free. New accounts get a 2-month Redshift free trial, so if you are a new user, you would not be charged for Redshift usage for 2 months for a specific type of Redshift cluster.

Creating your first AWS Redshift Cluster

It is assumed that the reader has an AWS account and the administrative privileges required to operate on Redshift. If you are a new user, it is highly probable that you are the root/admin user and have all the permissions required to operate anything on AWS. Once you log on to AWS using your user credentials (user ID and password), you are shown the landing screen, which is also called the AWS Console Home Page.

In the AWS cloud, almost every service, with a few exceptions, is regional, which means that whatever you create is created in the region you have selected. The default region in AWS is N. Virginia, which you can see in the top-right corner. If you wish to create your Redshift cluster in a different region, you can select the region of your choice. You can learn more about AWS regions from this article. After selecting the region, the next step is to navigate to the AWS Redshift home page. Type Redshift in the search box, and you will find the service name listed.

Click on the service name and you will be taken to the home page, or dashboard page, of Redshift.

Once you are on the home page of AWS Redshift, you will find several icons in the left pane that offer options to operate on various features of Redshift. To get started, we need to create a cluster first and then log on to it to create database objects. On the right-hand side of the screen, you will find a button named Create Cluster. Click this button to start specifying the configuration with which the cluster will be built.

Cluster Configuration

In the cluster creation wizard, you need to provide several details that determine the configuration of your AWS Redshift cluster. First, provide a cluster name of your choice. The next detail is Node Type, which determines the capacity of the nodes in your cluster. DC2 stands for Dense Compute, DS2 stands for Dense Storage, and RA3 is the latest and most advanced offering from Redshift, with the most powerful nodes and very large compute and storage capacity. RA3 is shown by default as the recommended option. First-time users who are just getting started with Redshift often do not need such high-capacity nodes, which can incur a lot of cost. DC2 usage is covered in the free tier and offers a reasonable configuration at an affordable cost for modest data volumes. So, select the dc2.large node type, which offers 160 GB of storage per node. You can read more about Redshift node types from here.

The next step is to select the number of nodes in the cluster. We can create a single-node cluster, but that would technically not count as a cluster, so we will use a 2-node cluster. The default value for the number of nodes is 2, which you can change as required. Below the number of nodes, the console shows that the estimated cost of running this cluster for an entire month is $320. It is recommended to terminate the cluster when it is not in use. Creating or terminating a cluster is quick and takes only a few minutes. You can either pause or terminate a cluster when it is not required, depending on your use case. First-time users are covered under the free tier, so they would not be charged for using a 2-node DC2 cluster for a couple of hours.
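To get a feel for what a short session might cost outside the free trial, the monthly estimate can be prorated to the hours actually used. The sketch below is only an illustration: the $320 figure is the estimate quoted above, the 730 hours per month is an assumption (8760 hours in a year divided by 12), and the function name is hypothetical. Actual billing depends on the region and current AWS pricing.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def estimated_cost(monthly_estimate: float, hours_used: float,
                   hours_per_month: float = HOURS_PER_MONTH) -> float:
    """Prorate a monthly cluster estimate to the hours actually used."""
    if hours_used < 0:
        raise ValueError("hours_used must be non-negative")
    hourly_rate = monthly_estimate / hours_per_month
    return round(hourly_rate * hours_used, 2)


# A 3-hour session on the 2-node cluster estimated at $320 per month.
print(estimated_cost(320, 3))  # 1.32
```

Under these assumptions, a few hours of experimentation costs little, while a forgotten cluster accrues the full monthly amount.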

Database Configuration

The next step is to specify the database configuration. The default database name is dev, and the default port on which AWS Redshift listens is 5439. You can change these settings as needed or use the default values. In this case, we will use the defaults.

After specifying the database name and port, the next required detail is the master username and password, the administrative credentials that provide full access to the AWS Redshift cluster. The default username is awsuser. Provide a password of your choice that follows the rules listed below the password box. This completes the database-level configuration of Redshift.

Additional Configurations

Cluster permissions is an optional setting for specifying the Identity and Access Management (IAM) roles that allow the AWS Redshift cluster to communicate and integrate with other AWS services. It can be modified even after the cluster is created, so we will not configure it for now.
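Because IAM roles can be attached after creation, the same change can also be made programmatically. The sketch below assumes the boto3 library is installed and AWS credentials are configured; the helper names, the cluster identifier, and the role ARN are hypothetical placeholders, not values from this walkthrough.

```python
def iam_role_update(cluster_id: str, role_arn: str) -> dict:
    """Build the arguments for attaching one IAM role to an existing cluster."""
    return {
        "ClusterIdentifier": cluster_id,
        "AddIamRoles": [role_arn],
    }


def apply_iam_role_update(params: dict) -> dict:
    """Send the update to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("redshift")
    return client.modify_cluster_iam_roles(**params)


# Hypothetical identifiers, for illustration only.
params = iam_role_update(
    "my-first-cluster",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(params)
```

Calling apply_iam_role_update(params) would perform the change; it is deliberately not invoked here so that the sketch has no side effects.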

In the additional configurations section, switch off the Use Defaults switch, as we intend to change the accessibility of the cluster. We intend to use the cluster from our personal machine over an open internet connection. This is generally not the recommended configuration for production scenarios, but for first-time users who are just getting started with Redshift and have no sensitive data in the cluster, it is acceptable to use the cluster over the open internet for a very short duration. The additional configuration section allows specifying details like network configuration, security, backup management, and the parameter and option groups that control the behavior of the Redshift cluster, as well as maintenance windows.

The only option we need to change here is the Publicly Accessible setting. Its default value is No. Change it to Yes, so that the necessary network changes are made to allow the AWS Redshift cluster to be used over the open internet through the cluster endpoint that will be created.
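The choices made in the wizard map onto the arguments of the create_cluster call in boto3's Redshift client. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names and the cluster identifier are hypothetical, and the defaults simply mirror the values used in this walkthrough.

```python
def create_cluster_params(cluster_id: str, master_password: str,
                          node_type: str = "dc2.large",
                          number_of_nodes: int = 2,
                          db_name: str = "dev", port: int = 5439,
                          master_username: str = "awsuser",
                          publicly_accessible: bool = True) -> dict:
    """Collect the wizard choices as keyword arguments for create_cluster."""
    params = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "DBName": db_name,
        "Port": port,
        "MasterUsername": master_username,
        "MasterUserPassword": master_password,
        "PubliclyAccessible": publicly_accessible,
    }
    if number_of_nodes > 1:
        params["ClusterType"] = "multi-node"
        params["NumberOfNodes"] = number_of_nodes
    else:
        # NumberOfNodes is only passed for multi-node clusters.
        params["ClusterType"] = "single-node"
    return params


def create_cluster(params: dict) -> dict:
    """Submit the request to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("redshift").create_cluster(**params)
```

A call such as create_cluster(create_cluster_params("my-first-cluster", password)) would correspond to clicking Create Cluster with the settings described above. Keeping the password out of source code, for example by reading it from an environment variable, is advisable.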

Once this configuration is complete, click on the Create Cluster button. This starts creating your cluster and takes you to the clusters window, where the cluster is shown in Modifying status. Do not be alarmed that a cluster you are just creating shows Modifying instead of a creating, pending, or in-progress status. This is the terminology that AWS uses while creating or modifying any type of cluster.

Once the cluster is created, you will find it in Available status.
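When the cluster is created from a script rather than the console, the wait for the Available status can be expressed as a simple polling loop. This is a sketch under stated assumptions: the function names are hypothetical, the polling interval and retry count are arbitrary, and the status string "available" is the lowercase value that the describe_clusters API reports. boto3 also ships a built-in waiter named cluster_available that serves the same purpose.

```python
import time
from typing import Callable


def wait_until_available(get_status: Callable[[], str],
                         poll_seconds: float = 30.0,
                         max_polls: int = 40,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll a status function until it reports 'available' or polls run out."""
    for _ in range(max_polls):
        if get_status() == "available":
            return True
        sleep(poll_seconds)
    return False


def redshift_status_reader(cluster_id: str) -> Callable[[], str]:
    """Return a function that reads the cluster status (needs boto3)."""
    import boto3  # imported lazily so the loop above works without boto3

    client = boto3.client("redshift")

    def get_status() -> str:
        response = client.describe_clusters(ClusterIdentifier=cluster_id)
        return response["Clusters"][0]["ClusterStatus"]

    return get_status
```

Passing the status reader in as a function keeps the loop independent of AWS, so it can be exercised with a stub before pointing it at a real cluster.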

Once you click on the Dashboard, you will see the statistics of the cluster, for example, 1 Cluster(s), 2 Total nodes, etc. Consider exploring this page to check out more details about your cluster.

Querying AWS Redshift Cluster

Click on the Editor icon in the left pane to connect to Redshift and run queries to interrogate the database or create database objects. This page requires your master username and password to log on and lets you use the database from the browser itself, without an external IDE. Provide the details and click on the Connect to database button.
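Since the cluster was made publicly accessible, the same credentials can also be used from a client on your own machine. The sketch below assumes the psycopg2 Postgres driver is installed, which is one common way to connect to Redshift but is not part of this walkthrough; the helper names and the endpoint host are hypothetical placeholders, and the real endpoint is shown on the cluster's details page.

```python
def connection_kwargs(host: str, password: str, dbname: str = "dev",
                      port: int = 5439, user: str = "awsuser") -> dict:
    """Collect connection settings using the defaults from this walkthrough."""
    return {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "password": password,
    }


def run_query(kwargs: dict, sql: str) -> list:
    """Run one query and return all rows (needs psycopg2 and network access)."""
    import psycopg2  # imported lazily so the builder works without the driver

    conn = psycopg2.connect(**kwargs)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


# Hypothetical endpoint, for illustration only.
settings = connection_kwargs(
    "my-first-cluster.example.us-east-1.redshift.amazonaws.com",
    "your-password-here",
)
print({k: v for k, v in settings.items() if k != "password"})
```

A call such as run_query(settings, "SELECT current_database()") would then return the rows of that query.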

Once you successfully log on, you will be taken to the query window. The data objects pane lists the system objects and schemas. The Query editor window lets you run queries against the selected schema.

You can start running DDL (Data Definition Language) and DML (Data Manipulation Language) queries from the Query Editor window. You can read more about the AWS Redshift query language from here.
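To make the distinction concrete, the sketch below holds a few statements that could be pasted into the Query Editor and labels each as DDL or DML by its leading keyword. The table and column names are hypothetical, and the classifier is a deliberate simplification (SELECT is grouped with DML here). The submit function shows one programmatic route, the Redshift Data API in boto3, and assumes boto3 is installed, credentials are configured, and the cluster identifier is your own.

```python
STATEMENTS = [
    "CREATE TABLE IF NOT EXISTS public.visits "
    "(visit_id INTEGER, page VARCHAR(100), visited_at TIMESTAMP)",
    "INSERT INTO public.visits VALUES (1, '/home', '2021-01-01 10:00:00')",
    "SELECT page, COUNT(*) AS hits FROM public.visits GROUP BY page",
]

DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE"}
DML_KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def classify(sql: str) -> str:
    """Label a statement as DDL or DML based on its first keyword."""
    keyword = sql.strip().split(None, 1)[0].upper()
    if keyword in DDL_KEYWORDS:
        return "DDL"
    if keyword in DML_KEYWORDS:
        return "DML"
    return "OTHER"


def submit(sql: str, cluster_id: str, database: str = "dev",
           db_user: str = "awsuser") -> str:
    """Submit one statement via the Redshift Data API and return its id."""
    import boto3  # imported lazily so classify() works without boto3

    client = boto3.client("redshift-data")
    response = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=sql,
    )
    return response["Id"]


for statement in STATEMENTS:
    print(classify(statement), "-", statement[:40])
```

The Data API runs statements asynchronously, so the returned id would be used to check the statement's status and fetch results in a follow-up call.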

Deleting AWS Redshift Cluster

Once you are done using your cluster, it is recommended to terminate it to avoid incurring cost or wasting free-tier usage. Navigate to the dashboard page by clicking on the dashboard icon in the left pane. Select your cluster and click on the Delete option in the Actions menu.

You will be prompted with a pop-up dialog that asks whether to create a final snapshot. A snapshot incurs additional cost, so if you do not have any data that you want to retain, you can uncheck this option. Click on the Delete button to start the deletion process; within a minute or two, the AWS Redshift cluster will be deleted.
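The snapshot choice in the dialog corresponds to two arguments of the delete_cluster call in boto3. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names and identifiers are hypothetical.

```python
from typing import Optional


def delete_cluster_params(cluster_id: str,
                          final_snapshot_id: Optional[str] = None) -> dict:
    """Build delete_cluster arguments, with or without a final snapshot."""
    if final_snapshot_id is None:
        # Equivalent to unchecking the final snapshot option in the dialog.
        return {
            "ClusterIdentifier": cluster_id,
            "SkipFinalClusterSnapshot": True,
        }
    return {
        "ClusterIdentifier": cluster_id,
        "SkipFinalClusterSnapshot": False,
        "FinalClusterSnapshotIdentifier": final_snapshot_id,
    }


def delete_cluster(params: dict) -> dict:
    """Send the delete request to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("redshift").delete_cluster(**params)


print(delete_cluster_params("my-first-cluster"))
```

Skipping the snapshot discards the data permanently, so the variant with a snapshot identifier is the safer default whenever the cluster holds anything worth keeping.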

Conclusion

In this article, we covered the process of creating an AWS Redshift cluster and the various details required to create one. We briefly looked at how to access the cluster from the browser and run SQL queries against it. Finally, we learned how to delete the cluster once it is no longer required, to stop incurring any usage cost.

Translated from: https://www.sqlshack.com/getting-started-with-aws-redshift/