SQL Server administration and T-SQL development, Web Programming with ASP.NET, HTML5 and Javascript, Windows Phone 8 app development, SAP Smartforms and ABAP Programming, Windows 7, Visual Studio and MS Office software
Development resources, articles, tutorials, code samples, tools and downloads for AWS Amazon Web Services, Redshift, AWS Lambda Functions, S3 Buckets, VPC, EC2, IAM

Connect to Exasol Database from Greenplum using PXF Extension Framework


PXF Extension Framework (PXF) enables Pivotal Greenplum Data Warehouse professionals to connect external data sources on demand to query data stored on those remote databases using federated queries without replicating data physically. Exasol or ExaSolution Data Warehouse is the TPC-H benchmark leader among other databases on the market. This Greenplum tutorial shows SQL developers and Pivotal Greenplum administrators how to configure Greenplum database server for connecting Exasol databases via PXF Extension Framework. Both Greenplum Data Warehouse and Exasol Data Warehouse platforms are installed on an AWS account.

To summarize the process, Greenplum cluster administrators can follow below steps for enabling SQL developers to access Exasol database tables using PXF framework and create external tables.
First, download Exasol JDBC driver which will enable connection between Greenplum database to Exasol data warehouse.
As second step, connection properties configuration file will be created. The jdbc-site.xml file contains target Exasol database server host information, database user and password, etc.
Last step on Greenplum server configuration is restarting the PXF service.
After this point, SQL programmers can create external tables on Greenplum databases.
It is possible to use these external tables for data migration between Exasol database and Greenplum database, too.

As seen in below image PXF Platform Extension Framework enables Pivotal Greenplum to connect various data sources easily. For example please read referred tutorial to connect SQL Server database from Pivotal Greenplum using PXF.

Pivotal Greenplum Platform Extension Framework (PXF)

Let's start our Greenplum tutorial.


Download Exasol JDBC Driver

As mentioned before, for connecting Greenplum database to Exasol cluster, JDBC drivers are used by Greenplum PXF Extension Framework. SQL developers can download Exasol JDBC driver from official Exasol Download portal. For not to experience any problems, it is better to know the version of the target Exasol Data Warehouse cluster.

Exasol Data Warehouse version can be controlled on EXASolution web-based tool.
On Software Configuration page EXASolution database version can be easily detected.

check Exasol Database version using EXASolution

In this Greenplum tutorial, I will show how data professionals can connect Exasol 6.2.1 from Greenplum database.
Various downloads including JDBC drivers for Exasol 6.2.1 can be displayed at Exasol Version 6.2.1 Downloads

Since Pivotal Greenplum PXF is running on a Linux server, the required download file is EXASOL_JDBC-6.2.1.tar.gz
Administrators can choose the correct download file from the downloads section easily.
After downloading EXASOL_JDBC-6.2.1.tar.gz on a Windows computer, I extract the compressed file to a folder.
Then among extracted file contents, upload exajdbc.jar Exasol JDBC driver file to an Amazon S3 bucket which will be a bridge for JDBC file copy.

The AWS S3 bucket folder and full path which I used for this Greenplum tutorial is "database-jdbc-drivers/exajdbc.jar"

upload Exasol JDBC driver to Amazon S3 bucket

Let's now continue with Greenplum cluster PXF configuration for Exasol connection.


Greenplum PXF Configuration for Exasol Database

For configuring PXF virtualization framework to connect Greenplum and Exasol databases, there are three steps:
First upload JDBC driver to Greenplum server,
Then configure database JDBC connection file for connecting to Exasol,
And finally, restarting the PXF service

But before all these step, let's connect to Greenplum cluster, let's assume the server IP address is 10.159.82.30
Using PuTTY over an SSH connection, you will need the private key file (.ppk file) for the gpadmin Pivotal Greenplum cluster administrator user.

connect Greenplum database server using SSH with PuTTY


Copy Exasol JDBC Driver to Pivotal Greenplum Server

After we have connected to the Greenplum cluster, it is time to copy the JDBC driver from its location on S3 bucket to the appropriate file folder.

[gpadmin@mdw ~]$ echo $PXF_CONF
/usr/local/greenplum-pxf
[gpadmin@mdw ~]$ cd /usr/local/greenplum-pxf
[gpadmin@mdw greenplum-pxf]$ ls
conf keytabs lib logs servers templates
[gpadmin@mdw greenplum-pxf]$ cd lib
[gpadmin@mdw lib]$ ls
Code

connect Greenplum server and display PXF lib folder for JDBC drivers

Within "lib" folder, Greenplum server administrators can store JDBC drivers' .jar files which will enable PXF framework to connect external data sources including Exasol database.
List command "ls" displays the JDBC driver files in "lib" folder for Greenplum PFX framework. In my case, I had ngdbc-2.4.70.jar for SAP HANA database driver, mssql-jdbc-7.4.1.jre8.jar for SQL Server JDBC driver and RedshiftJDBC42-1.2.37.1061.jar for connecting to Amazon Redshift via its native JDBC driver.

Now copy the exajdbc.jar Exasol JDBC driver file into Greenplum server "lib" server using below AWS CLI command.

aws s3 cp s3://database-jdbc-drivers/exajdbc.jar exajdbc.jar
Code

copy Exasol JDBC driver from Amazon S3 to Greenplum server PFX lib directory

Now the JDBC driver required for Greenplum PXF data virtualization framework is copied into the correct folder on Greenplum server.


Connection Configuration

Greenplum administrators can now define JDBC connections to Exasol databases using the uploaded driver in previous step.
To define connection properties, admins have to create connection configuration files on the server. While we are still on the PuTTY screen connected to the Greenplum server, let's go up 1 level in directory structure and then into "servers" subfolder as seen below.

[gpadmin@mdw lib]$ cd ..
[gpadmin@mdw greenplum-pxf]$ ls
conf keytabs lib logs servers templates
[gpadmin@mdw greenplum-pxf]$ cd servers
[gpadmin@mdw servers]$ ls
b1p c3p default hana k0s pcs redshift sqlserver
Code

Each sub-folder within "servers" directory represents a separate database connection to a remote data source.
For example, in "sqlserver" subfolder I have provided JDBC connection details to connect to a specific SQL Server database instance. Similarly, the "redshift" subfolder includes a file storing connection string properties for accessing the Amazon Redshift cluster in my current account.

Pivotal PXF connection configuration files

Now create a new separate folder for our Exasol database connection. You can rename it "exasol" if you have only one target Exasol database connection.

Additionally, we are copying the template for JDBC connection properties file jdbc-site.xml into recently created "exasol" folder.

[gpadmin@mdw servers]$ mkdir exasol
[gpadmin@mdw servers]$ cd exasol
[gpadmin@mdw exasol]$ ll
total 0
[gpadmin@mdw exasol]$ copy $PXF_CONF/templates/jdbc-site.xml .
[gpadmin@mdw exasol]$ ls
jdbc-site.xml
[gpadmin@mdw exasol]$
Code

create PXF connection folder for Exasol on Greenplum

Now, Greenplum server administrator can edit the jdbc-site.xml connection properties file using "vi" editor by command "vi jdbc-site.xml"

Below screenshot displays the 4 important connection properties required for Greenplum data warehouse administrator to enable a successfull connection to Exasol database.

These 4 important JDBC connection properties are:
jdbc.driver: Class name of the JDBC driver
jdbc.url: The URL that the JDBC driver can use to connect to the database
jdbc.user: User name for connecting to the database
jdbc.password: Password for connecting to the database

The main class that will be used for Exasol JDBC driver is com.exasol.jdbc.EXADriver

The JDBC connection string is in the format of jdbc:exa:10.159.82.11:8563
If you have more than one nodes in your cluster, the correct connection URL for the JDBC will be in following format:
jdbc:exa:10.159.82.11,10.159.82.13:8563

For authentication on the Exasol database, a valid user and password is required.
As JDBC user, you can use "sys" user with its password in jdbc.password section. If you have any other database user, you can define it in the connection properties file instead of sys as well.

jdbc-site configuration file to connect Exasol database from Greenplum using PXF


Restart PXF Service

After JDBC file is copied into lib folder and under servers folder the configuration settings is created successfully for the target Exasol database, we can restart the Greenplum PXF service to take the changes into account.

Change your current directory to $GPHOME/pxf/bin and restart PXF service using "pxf cluster stop/start" commands as follows:

[gpadmin@mdw ....]$ cd $GPHOME/pxf/bin
[gpadmin@mdw bin]$ ls
pxf pxf-cli
[gpadmin@mdw bin]$ pxf cluster stop
Stopping PXF on 1 segment hosts...
PXF stopped successfully on 1 out of 1 hosts
[gpadmin@mdw bin]$ pxf cluster start
Starting PXF on 1 segment hosts...
PXF started successfully on 1 out of 1 hosts
Code

restart the Greenplum PXF service

After this point, we are completed with Pivotal Greenplum cluster node. We can now connect to Greenplum database using a database management tool like DBeaver in order to create database objects that will enable SQL developers to query remote data sources' data.


Create External Table on Pivotal Greenplum Server

In our Greenplum database tutorial, now SQL programmers can create database objects which are used as bridges between Greenplum and Exasol database tables. The Pivotal Greenplum PXF Extension Framework enables database developers create external tables just like "virtual tables" in some other data platforms.

Here is how Greenplum SQL developer can create an external table using "exasol" JDBC connection which is defined before in previous steps.
The syntax of "CREATE EXTERNAL TABLE" can be simplified as:
public.pxf_exasol_cities is the schema name and external table name on current Greenplum database.
Location is pointint to the remote data source and target table or view.
Profile is JDBC since we are using JDBC connection for target Exasol database.
Server value is "exasol" which is the name of the folder we have created in previous steps on Greenplum server including the jdbc-site.xml connection configuration file for Exasol database.
After "//" you can see "kodyaz.cities". This is the schema name and table/view name that is being queried on target Exasol database via PXF connection.

-- create external table
CREATE EXTERNAL TABLE public.pxf_exasol_cities
(
CITY_ID SMALLINT,
COUNTRY_CODE VARCHAR(2),
ZIP_CODE VARCHAR(20),
CITY_NAME VARCHAR(200),
DISTRICT VARCHAR(50),
AREA VARCHAR(50),
AREA_SHORT VARCHAR(50),
LAT DECIMAL(9,6),
LON DECIMAL(9,6)
)
LOCATION ('pxf://kodyaz.cities?PROFILE=Jdbc&SERVER=exasol')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
Code

So by creating above external table on Greenplum database, SQL developers can access to kodyaz.cities table on remote Exasol Data Warehouse easily.

create external table on Greenplum for Exasol database access

External table now enables Pivotal Greenplum Data Warehouse SQL developers to query data stored on remote data source, our target Exasol database table "kodyaz.cities".

select * from public.pxf_exasol_cities;
Code

Greenplum external table SQL query for Exasol database data


Import Exasol Database Table into Greenplum Database using PXF

To migrate data physically into Greenplum database from Exasol DWH, a database table can be created with same data structure. Data migration from remote Exasol data source can be done via "INSERT INTO ... SELECT FROM ..." SQL command shown in this section below.

Greenplum database administrator can create a table for the data to be imported from Exasol database table. We have already table definition and DDL script in previous step. Instead of creating an external table, this time Greenplum SQL developers can create a physical table within the database.

-- create local table
CREATE TABLE public.cities
(
CITY_ID SMALLINT,
COUNTRY_CODE VARCHAR(2),
ZIP_CODE VARCHAR(20),
CITY_NAME VARCHAR(200),
DISTRICT VARCHAR(50),
AREA VARCHAR(50),
AREA_SHORT VARCHAR(50),
LAT DECIMAL(9,6),
LON DECIMAL(9,6)
);
Code

After the table is created on Greenplum database, following SQL INSERT statement can be used to insert data into physical Greenplum database table from external table linked to Exasol Data Warehouse.

-- migrate data from external resource into current database table
insert into public.cities select * from public.pxf_exasol_cities;
Code

Now the same data in remote Exasol database table is replicated into current Greenplum database table physically. This is an easy method of data replication or data migration from Exasol database into Greenplum data warehouse platform.

I hope this Greenplum tutorial is useful for SQL developers to understand Pivotal Greenplum Platform Extension Framework (PXF) and how to use it for connecting Exasol to Greenplum in order to create external tables and query data virtually, or for data migration between these two data warehouse platforms.



AWS


Copyright © 2004 - 2024 Eralper YILMAZ. All rights reserved.