Starting with Amazon Redshift Cluster for SQL Database Developer

In this Amazon Redshift tutorial, I want to guide SQL database developers with creating Amazon Redshift Cluster connecting Redshift database with Query Editor and help them to build their SQL query codes correctly and execute them without a problem. I also provide links to reference documents for places where I find useful like how primary key and other constraints are handled by Amazon Redshift database and SQL commands supported on Redshift database, etc. I think SQL developers new on Redshift database will find useful especially the section about Query Editor.

Amazon Redshift

Create Amazon Redshift Cluster and Create First Database

Sign in to AWS Administration Management Console.
Go to the region where you want to create your first Amazon Redshift cluster.
Choose the Amazon Redshift service from Services.
Here is your Amazon Redshift Service screen on AWS Management Console.

Amazon Redshift Service on AWS Management Console

If you have not yet used Amazon Redshift you can try this service free for a two months period of time. Please review guide for how to try and use Amazon Redshift service for free.

Now press on Quick launch cluster button

When the quick launch cluster configuration screen is displayed, first check for appropriate Redshift cluster node types. At this point in this tutorial, I will choose dc2.large as cluster node type since price is more important than performance since this Redshift cluster will be created for this tutorial only.

Amazon Redshift Cluster node types

For the number of nodes, I'll set this configuration value to 1.
In this case my Redshift cluster will be a single node cluster and this node will perform both as leader node and compute node.

In the following section, you will give a name to your Redshift cluster by typing in "Cluster identifier" textbox.

Redshift database identifier, master user and password

Also type a new database name or accept the default suggested name "dev" as database name in your Redshift cluster.

5439 is the default port number on which the Redshift database accepts external connections.

And provide your master user name and password for your Amazon Redshift cluster

For the Cluster permissions though it is optional, if you plan to use other AWS services you can choose additional IAM roles and associate with this Amazon Redshift cluster.
Redshift administrator can associate up to 10 IAM roles with a Redshift cluster.
Please note you can only select roles with the principal "redshift.amazonaws.com" from the list.
I plan to use S3 Simple Storage Service, Glue or Amazon Specktrum with a NOSQL RDS database but now I will leave empty.

In Default settings section, some default values for network, security, backup and maintenance settings are configured for you. You can switch to advanced settings and configure those values for your requirement.

Press Launch cluster to start creation of Amazon Redshift Cluster.

After a short time on Clusters screen of Amazon Redshift service, you will see your Redshift cluster is running and available in healty status

Amazon Redshift clusters

Connect to Amazon Redshift using Query Editor

Now switch to Query editor from the left Redshift service menu.
Using Query editor database developers can connect to Amazon Redshift database on the target cluster.
Provide the database, master user name and password on following screen and press Connect

connect Redshift database using Query Editor

Here is the Query Editor screen

As you can see, it is possible to display tables from different schemas. The default displayed schema is information_schema and tables under this schema.

Database developer or Redshift administrator can work with more than one query screen. Execution results of the query are displayed under query editor screen.

Amazon Redshift Query Editor

Working with Amazon Redshift Query Editor

You can paste following SQL Select query to list database tables existing on our cluster and execute it in Query Editor screen


select * from information_schema.tables;

Code

If you don't use the schema name and execute the SELECT statement directly on "tables" table, you will get the execution exception:
[Amazon](500310) Invalid operation: relation "tables" does not exist;
If you get such an error, be sure you are using the schema name in front of the database table name and check the schema name is correct.

For running your first queries, take care of following syntax issues.
You can assume that you are executing multiple select statements if you have ";" between queries or at the end of each command. But that is not correct. Only the last SELECT query is executed on Redshift database. As a result on the query results screen the database developer can only see the last SQL query output.


select top 3 * from information_schema.tables;

select top 1 * from information_schema.tables;

Code

Later when we create our first database table we can test executing a SELECT query following an INSERT command on sample table. There we will see that the first command, INSERT will not be executed by Redshift database.
The last command or in this case the second commend which is SELECT query will be executed.

For comments, database programmer can use either of the syntax


-- This is a comment

/*

This is a comment, too

*/

SQL Comments

Query Editor tool is not providing an encouraging experience for database developers.
For example, following SQL script will return no resultset on screen as query results since the last command is comment command "--"

Query editor problems for Amazon Redshift

Guess what :)
If you leave an empty line as the last command line, or even a single space " " after last command's ";" end character, database developer will get an SQL exception!
Query can not be empty

Redshift database error: Query can not be empty

If you get used to these issues, you can start working on Query Editor and expect what can cause problems and how to deal with errors
Otherwise, if you type your SQL scripts as you are used to from your previous experiences on different query editor tools, you might get crazy in a short time.
Probably, once you execute a query you will get a result set but not on the second execution. Then you will search on development forums for why Redshift sometimes does not display query results, etc.

With Query Editor, there are some good features too that we are used to from other database SQL editors. For example, on Redshift Query Editor, you can highlight a code line execute that SQL command only.

Redshift Query Editor feature

Create Table on Amazon Redshift Database

As an SQL database developer, we can create a table on Redshift database easily. But before we execute CREATE TABLE command, maybe we can first check supported data types on Amazon Redshift.

I just type the data types below to reference them easily in future

Supported or Valid Data Types on Amazon Redshift Database
smallint - Signed two-byte integer
integer - Signed four-byte integer
bigint - Signed eight-byte integer
decimal - Exact numeric of selectable precision
real - Single precision floating-point number
double precision - Double precision floating-point number
boolean - Logical Boolean (true/false)
char - Fixed-length character string
varchar - Variable-length character string with a user-defined limit
date - Calendar date (year, month, day)
timestamp - Date and time (without time zone)
timestamptz - Date and time (with time zone)

It is possible to create IDENTITY columns.

On Amazon Redshift database a table with Primary Key can be created.
But this does not mean Amazon Redshift checks uniqueness of primary key column values.
Primary Key on Redshift database table is for information only.
Oh, strange, isn't it? I cannot understand why a check constraint like Primary Key is not implemented on Amazon Redshift.
For more details on not-enforcing uniqueness on primary keys, etc please refor to this documentation.

Let's make the primary key problem on Amazon Redshift more clear with a sample case.

We can also create our first Redshift database table in our Redshift tutorial. Please pay attention to IDENTITY column ID and our PRIMARY KEY column CODE.


create table Country (

 id integer identity(1,1) not null,

 code varchar(3) not null primary key,

 country nvarchar(100)

);

SQL Table Creation

Then execute below two INSERT commands one by one and then the last SQL SELECT statement


insert into Country (code, country) values ('TR','Türkiye');

insert into Country (code, country) values ('DE','Deutschland');


select * from Country;

SQL code

primary key on Amazon Redshift database table

SQL SELECT query result shows that Identity column works as expected. By default it auto generates integer values for the ID field starting from 1 and increasing by 1.

Now I will try to insert a third row which has its primary column value same as one of previous examples. So what we expect from SQL Server for instance, is to throw SQL exception something like "duplicate key value", etc.
Let's try this on Redshift database and see how it behaves.


insert into Country (code, country) values ('TR','Turkey');

SQL

When you execute above SQL Insert command, Redshift database will successfully execute it without any error or exception. Is this successful, it can be discussed!

Amazon Redshift and uniqueness on primary key columns

You see above, although Primary Key is defined in table creation, there can be multiple rows with same primary key column value.
So please do not forget, Amazon Redshift does not enforce uniqueness on primary key columns.

While creating and dropping database tables, SQL developers can use following DROP TABLE syntax on Redshift Query Editor


drop table if exists Country;

SQL Code

For database programmers, I can suggest them to have a look at SQL commands suppported on Amazon Redshift. For example, I am not familier with VACUUM command, etc. so I can learn what they are used for.