Display Greenplum Database Table Distribution Key and Data Distribution

Pivotal Greenplum database SQL developers frequently require to display table distribution keys and table data distribution among cluster segments. Since different number of management tools can be used with Greenplum Data Warehouse platform, the most convenient way for programmers and database adminstrators is listing table distribution keys using SQL. In this Greenplum tutorial, I want to share two SQL queries; one for displaying distribution keys of all database tables in Greenplum and the other is showing the data distribution of a specific Greenplum database table among cluster segments by giving row counts for each segment.

Distribution Key of Greenplum Database Tables

Greenplum introduced pg_get_table_distributedby() function for developers so that they can get the distribution key of a database table by passing the "oid" object id value in their SQL queries as follows.
Here is a sample SQL query which returns all database tables and their distribution styles and distribution keys as well as the data storage mode for a specific schema.


select

 c.oid,

 n.nspname as schemaname,

 c.relname as tablename,

 pg_get_table_distributedby(c.oid) distributedby,

 case c.relstorage

  when 'a' then ' append-optimized'

  when 'c' then 'column-oriented'

  when 'h' then 'heap'

  when 'v' then 'virtual'

  when 'x' then 'external table'

 end as "data storage mode"

from pg_class as c

inner join pg_namespace as n

 on c.relnamespace = n.oid

where

 n.nspname = 'poc'

 and c.relstorage = 'c'

order by n.nspname, c.relname ;

Executing above SQL query on a Greenpllum database and replacing the schema name (nspname field criteria in WHERE clause) with a target schema name (or removing), will give you a list of filtered database tables and their distribution styles as well as distribution keys or columns used for distributing table data among Greenplum cluster segments.

Greenplum database table distribution style and distribution keys by SQL query

Distribution of Greenplum Database Table Data on Segments

To display table data distribution among cluster segments, Greenplum database administrator can query table by using gp_segment_id column. Following SQL query can be used for a specific table's distribution between cluster segments.


SELECT gp_segment_id, count(*)

FROM poc.vbak GROUP BY gp_segment_id

ORDER BY gp_segment_id;

The output of the above SQL query showed me that the data distribution of VBAK table is quite good and nearly equally distributed among cluster segments. This proves that the distribution key VBELN column selected as VBAK table distribution key is a good choice.

Greenplum database table data distribution on segments

I hope the SQL SELECT statements will help Pivotal Greenplum data warehouse SQL developer and administrators.