SQL Server administration and T-SQL development, Web Programming with ASP.NET, HTML5 and Javascript, Windows Phone 8 app development, SAP Smartforms and ABAP Programming, Windows 7, Visual Studio and MS Office software
Greenplum Data Warehouse Administration and SQL Development tutorials, guides and code samples, tools and downloads for SQL Database Professionals

Display Greenplum Database Table Distribution Key and Data Distribution

Pivotal Greenplum database SQL developers frequently require to display table distribution keys and table data distribution among cluster segments. Since different number of management tools can be used with Greenplum Data Warehouse platform, the most convenient way for programmers and database adminstrators is listing table distribution keys using SQL. In this Greenplum tutorial, I want to share two SQL queries; one for displaying distribution keys of all database tables in Greenplum and the other is showing the data distribution of a specific Greenplum database table among cluster segments by giving row counts for each segment.


Distribution Key of Greenplum Database Tables

Greenplum introduced pg_get_table_distributedby() function for developers so that they can get the distribution key of a database table by passing the "oid" object id value in their SQL queries as follows.
Here is a sample SQL query which returns all database tables and their distribution styles and distribution keys as well as the data storage mode for a specific schema.

select
 c.oid,
 n.nspname as schemaname,
 c.relname as tablename,
 pg_get_table_distributedby(c.oid) distributedby,
 case c.relstorage
  when 'a' then ' append-optimized'
  when 'c' then 'column-oriented'
  when 'h' then 'heap'
  when 'v' then 'virtual'
  when 'x' then 'external table'
 end as "data storage mode"
from pg_class as c
inner join pg_namespace as n
 on c.relnamespace = n.oid
where
 n.nspname = 'poc'
 and c.relstorage = 'c'
order by n.nspname, c.relname ;

Executing above SQL query on a Greenpllum database and replacing the schema name (nspname field criteria in WHERE clause) with a target schema name (or removing), will give you a list of filtered database tables and their distribution styles as well as distribution keys or columns used for distributing table data among Greenplum cluster segments.

Greenplum database table distribution style and distribution keys by SQL query


Distribution of Greenplum Database Table Data on Segments

To display table data distribution among cluster segments, Greenplum database administrator can query table by using gp_segment_id column. Following SQL query can be used for a specific table's distribution between cluster segments.

SELECT gp_segment_id, count(*)
FROM poc.vbak GROUP BY gp_segment_id
ORDER BY gp_segment_id;

The output of the above SQL query showed me that the data distribution of VBAK table is quite good and nearly equally distributed among cluster segments. This proves that the distribution key VBELN column selected as VBAK table distribution key is a good choice.

Greenplum database table data distribution on segments

I hope the SQL SELECT statements will help Pivotal Greenplum data warehouse SQL developer and administrators.



Greenplum DWH


Copyright © 2004 - 2021 Eralper YILMAZ. All rights reserved.