SQL Server administration and T-SQL development, Web Programming with ASP.NET, HTML5 and Javascript, Windows Phone 8 app development, SAP Smartforms and ABAP Programming, Windows 7, Visual Studio and MS Office software
Greenplum Data Warehouse Administration and SQL Development tutorials, guides and code samples, tools and downloads for SQL Database Professionals

Collect Statistics About Greenplum Database Tables using Analyze and AnalyzeDB

Pivotal Greenplum collect statistics about a database and about the table and table columns in that Greenplum database using analyze command and analyzedb utility. The collected statistics data is stored in pg_statistic system table. All database platforms use collected statistics to build a performant SQL query execution plan by deciding on the least resource using execution plan and fastest result. For the correctness and efficiency of the execution plans created the statistics should be up to date and updated continuously

In this Greenplum tutorial, I want to show how analyze command and analyzedb utility can be executed by SQL developers.

Both Analyze and AnalyzeDB can be executed both on psql utility by connecting to a database on Greenplum Data Warehouse or on a SQL Editor like DBeaver, DbVizualizer or pgAdmin etc.
Let's execute the commands using psql. First connect to the master node of the Greenplum cluster and connect to the target database as seen below where I connected to dev database.

[gpadmin@mdw ~]$ psql
psql (9.4.24)
Type "help" for help.

dev=# \c dev
You are now connected to database "dev" as user "gpadmin".
dev=# analyze verbose poc.vbkd;
INFO: analyzing "poc.vbkd"
INFO: Executing SQL: select pg_catalog.gp_acquire_sample_rows(25709, 2500, 'f');
ANALYZE
dev=#

As seen above SQL developers can execute "analyze verbose" command followed by the database table name or directly "analyze" with the table name.

Analyze command to collect statistics on Greenplum database

Here is how analyze is executed to collect statistics about an other database table on Greenplum.

dev=# analyze poc.vbak;
ANALYZE
dev=#

Greenplum analyze table command

The same commands can be executed on DBeaver SQL editor screen too.

analyze storm.vbap;

Greenplum database analyze table command

Instead of collecting and updating statistics one by one about all database tables, Greenplum SQL administrators can execute Analyze command for the whole database at once.

analyze;

analyze all database tables on Greenplum

For more on analyze command please review the Greenplum documentation.

By connecting to master node, all database table statistics collection can be triggered by executing the analyzedb utility

After you logon to the Greenplum cluster master node, call analyzedb utility with -d and pass the database name as I did for dev database below

analyzedb -d dev

For more on analyzedb please refer to official documentation.

To check the version information about the analyzedb utility, execute:

[gpadmin@mdw ~]$ analyzedb --version
analyzedb version 1.0
[gpadmin@mdw ~]$

Greenplum analyzedb utility version

Especially after high number of data inserts or updates and bulk imports, it is critical to collect statistics and keeping statistics up-to-date for performance and for better execution plan creation for database users queries. Database adminsitrators can manually execute Analyze command and AnalyzeDb utility or execute them automatically by a schedule for more healthy statistics about database table and table columns.

Greenplum DWH


Copyright © 2004 - 2021 Eralper YILMAZ. All rights reserved.