Connect Data Virtuality to Amazon Redshift Database

Data Virtuality is used to connect different data sources and provide Business Intelligence BI tools a single unified source of data. This logical data warehouse solution, Data Virtuality can also connect Amazon Redshift databases. In this tutorial, I showed how Data Virtuality Studio can be configured to add Amazon Redshift databases as a new data source. To connect to Amazon Redshift database, data sources can be configured on Data Virtuality Studio as shown in this guide.

Launch Data Virtuality Studio
On Data Explorer window on the left side of the screen, right click on Data sources and choose Add data source context menu option

adding new data sources to Data Virtuality server

With its huge number of data connectors, Data Virtuality as a Logical Data Warehouse supports many data platforms as its data sources.
When the Add data source dialog screen is displayed, Data Virtuality admins will see the possibilities of different data source options in the displayed list.
Under the JDBC node, a huge number of different data platforms are listed as possible data sources.

data source types available for Data Virtuality

For the list of all data connectors provided by Data Virtuality software, please visit Data Virtuality Connectors page.

Amazon Redshift is one of the data sources supported by Data Virtuality.
So we can add an Amazon Redshift database as a new data source to our Data Virtuality server using Data Virtuality Studio.

Choose Amazon Redshift from the list of data source types and click on Next button.
The next step will enable users to configure Amazon Redshift database connection details using following screen.

add Amazon Redhift database as new data source for Data Virtuality

In Alias, type a descriptive name to your new data source on Data Virtuality Studio.

Host is the IP address or the DNS name for the Amazon Redshift cluster.
Default port for Redshift is 5439.

As Database, provide the name of the target database you want to connect to from Amazon Redshift cluster.
It is possible to run multiple databases on an Amazon Redshift cluster. You should provide the name of the Redshift database you are trying to add into your Logical Data Warehouse LDW data sources in lower case.

User name and Password are authentication data of a valid username on Amazon Redshift. If these values are not correct, you won't be able to connect to target Amazon Redshift cluster.

There is no requirement to change the Data source parameters, Translator parameters or JDBC parameters. Keep them as default.

AWS S3 Load section and its parameter values are important.
Please note that for data import into an Amazon Redshift database, the best performing method is using Redshift's SQL COPY command.
Using COPY command, data stored in an Amazon S3 bucket in the form of a text file (delimeted, CSV file, etc.) can be imported into Amazon Redshift database table quickly.
Trying to migrate data into a Redshift table using INSERT statements can not be compared in terms of performance with the performance of COPY command.

So in AWS S3 Load section, it is good to provide a valid Amazon S3 bucket name, the region that AWS S3 bucket is related to, and a user's secret id and its secret key who has access to previousy defined S3 bucket.

Here is how a sample Amazon Redshift database connection can be defined on Data Virtuality add data srouce screen.

Amazon Redshift database connection settings on Data Virtuality Studio

Test connection using the button. If it is successfull with message "Successfully connected to Redshift", click on Next button.

The next screen enables Data Virtuality user to collect or gather statistics about for the data source.
Mark the checkboxes as you wish or keep them unmarked as given by default

gathering statistics for Redshift Data Virtuality data source

Click Finish to add the Amazon Redshift database as the next data source to your Data Virtuality Studio.

"Data source added." success message will be displayed if no error occurs during process.

The new data source will be listed under Data Sources window of Data Virtuality Studio by its alias name.

If you click on the data source name, tables or views of the sample Amazon Redshift database will be listed in a new screen.

Amazon Redshift database tables and views on Data Explorer

SQL developers can execute sample SQL SELECT commands on the new data source now by double-clicking on one of the listed tables in a new SQL Editor screen.


SELECT "catid", "catgroup", "catname", "catdesc" FROM "RedshiftSampleDatabase.category";;

Code

Or execute CALL procedure as follows available for Amazon Redshift data source type.


call "RedshiftSampleDatabase.native"(

 "request" => 'SELECT "catid", "catgroup", "catname", "catdesc" FROM "category";;'

);;

Code

In above screenshot, you can see executing sample SQL queries on Amazon Redshift database using Data Virtuality SQL editor on DV Studio.