SQL Server administration and T-SQL development, Web Programming with ASP.NET, HTML5 and Javascript, Windows Phone 8 app development, SAP Smartforms and ABAP Programming, Windows 7, Visual Studio and MS Office software
Development resources, articles, tutorials, code samples, tools and downloads for AWS Amazon Web Services, Redshift, AWS Lambda Functions, S3 Buckets, VPC, EC2, IAM

Connect SQL Server Database from Pivotal Greenplum using PXF


In this Greenplum database tutorial, I want to share the steps to connect SQL Server database using PXF Greenplum Platform Extension Framework and external tables. PXF enables SQL developers to define database connections to remote data sources and create external tables which are linked tables or virtual tables with data is read on demand from external resource. Since I did not come across with a sample that connects Greenplum to SQL Server database on the web, I wanted to show the required configurations in order with database programmers and Greenplum data warehouse developers.

Greenplum to SQL Server database connection using PXF


Connect to Greenplum Server Master Node

Connect to Greenplum master node using putty

putty to connect to Greenplum server over ssh

Login with "gpadmin" user and execute following command

sudo su - gpadmin
Code

logon to Greenplum server


Download SQL Server JDBC Driver

Download the appropriate JDBC driver for SQL Server from Microsoft SQL Docs web portal.

SQL Server JDBC driver downloads

If you choose Microsoft JDBC Driver 7.4 for SQL Server like me to download and install on your Greenplum Server,
For Linux the driver download file is sqljdbc_7.4.1.0_enu.tar.gz
And to install the driver on a Windows server, the download file is sqljdbc_7.4.1.0_enu.exe
Both have a size of 37.5 MB

If you unzip the compressed file, you will see instructions for how to install the JDBC driver on Linux

For accessing SQL Server from Pivotal Greenplum using PXF, the data warehouse administrators can use the mssql-jdbc-7.4.1.jre8.jar .jar file


Greenplum PXF Configuration to Connect SQL Server

Check PXF configuration folder and switch to that directory.
Run following command one after an other

Echo command will show us that the $PXF_CONF (PXF Configuration Directory) is usr/local/greenplum-pxf, for your notes. Then switch to PXF user configuration directory.

echo $PXF_CONF
cd $PXF_CONF
Code

If you list the contents of the folder, you will see that there are subfolders:
lib folder for external driver files,
templates folder where the PXF server connection configuration file templates are stored, servers folder where Greenplum PXF developers should create the server connection file

PXF user configuration directory

Switch to "lib" folder and copy the SQL Server JDBC driver file mssql-jdbc-7.4.1.jre8.jar into this directory using "cp" copy command

jdbc driver files in lib folder

Now, go back one level up from "lib" folder. Then switch to "templates" folder. You will see "jdbc-site.xml" template file in this folder. We will copy this file and make modifications on it in following steps

Greenplum PXF templates folder

Again return to $PXF_CONF folder which is back in one level up using "cd .." command
Now enter into "servers" folder.
The Greenplum administrator should create separate sub-folder for each external JDBC connection in this "servers" folder

If you have not yet defined anything, you will only see the "default" subfolder here.

As seen below, I have already created JDBC connections for PXF external tables since I have 3 more subfolders in this directory.

Greenplum PXF servers folder

To connect to SQL Server, you need to create a server connection definition.
Let's name it "sqlserver" . By the way, you can give any descriptive name here, this name is not related with the target data source platform.

Create the subfolder using following command

mkdir sqlserver
Code

And enter to sqlserver folder

cd sqlserver
Code

And now copy the jdbc connection template file into this new folder

cp $PXF_CONF/templates/jdbc-site.xml .
Code

Greenplum PXF templates directory

Now the Greenplum developers should edit the jdbc-site.xml file contents for the target SQL Server database instance.
If you look at the contents with "more jdbc-site.xml" command, you will see that we need following information to successfully connect to a remote SQL Server instance.
JDBC driver class name,
JDBC URL for SQL Server,
A valid database user and its password

jdbc-site.xml template for Greenplum JDBC connection to data sources

Using "Ctrl+C" you can exit from the file content display command.

For the JDBC connection URL for SQL Server is different from the one shared in the template which is valid for PostgreSQL databases.
So, if you are trying to connect to a different data platform than PostgreSQL (or its variants like Amazon Redshift, Greenplum, etc.), you should have a look at JDBC Drivers Reference

For SQL Server the JDBC URL syntax is in following format:

jdbc:sqlserver://<server>[:<port>];DatabaseName=<databaseName>
Code

Port is default 1433.
For example, if you want to access to a SQL Server database named "kodyazdb"running on host "10.160.80.40" thant the JDBC URL will be:

jdbc:sqlserver://10.160.80.40:1433;DatabaseName=kodyazdb
Code

As I said before the template is misleading the developers at this point, you should double-check the JDBC connection URL from other resources too.

Now edit the "jdbc-site.xml" connection configuration file using "vi" tool.

SQL Server connection settings for Greenplum PXF external table

Save the changes.


Create External Table on Greenplum Database

Now we are ready to create external table on Greenplum database which connects to SQL Server database table and reads data on demand using PXF, Greenplum Platform Extension Framework.

Here is the CREATE EXTERNAL TABLE command for SQL developers on Greenplum database.
In following above SQL SELECT command;
"dbo.states" is the schema name and the table or view name where the data is being read on the SQL Server data source. Since the database name is already defined in jdbc-site.xml file, database is not mentioned in the script.
"sqlserver", the value of "SERVER" parameter in Location argument is the name of the folder created for server configuration where the jdbc-site.xml file is created in

CREATE EXTERNAL TABLE pxf_states(code char(2), name varchar(50))
LOCATION ('pxf://dbo.states?PROFILE=Jdbc&SERVER=sqlserver')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
Code

Then by querying the external table pxf_states using following SQL Select statement;

select * from pxf_states;
Code

SQL developers building applications on Greenplum database can read data from external database tables easily using PXF, Greenplum Platform Extension Framework.

Pivotal Greenplum external table from SQL Server



AWS


Copyright © 2004 - 2024 Eralper YILMAZ. All rights reserved.