Evaluate Weigh the pros and cons of technologies, products and projects you are considering.

Cluster indexes: Under the hood

Some interesting trivia about cluster indexes in DB2.

There is a widespread misconception that data rows are stored on the leaf level of a cluster index. With DB2, this is not true.

Why should we care? In some cases, it certainly pays to understand what's going on under the hood. This is one of those cases, so let us discuss cluster indexes in more detail.

It is quite easy to check how many pages a table uses and if the table has a cluster index:

SELECT NPAGES FROM SYSSTAT.TABLES WHERE TABNAME='SALES_DETAIL'
NPAGES      
-----------
         807 
SELECT INDEXTYPE,NLEAF,CLUSTERFACTOR,INDNAME FROM SYSCAT.INDEXES WHERE TABNAME='SALES_DETAIL'  AND INDEXTYPE='CLUS'
  0 record(s) selected.

In this case, the table has 807 pages and there is no cluster index.. Let us create one:

CREATE INDEX SALES_CUST_ID ON SALES_DETAIL(CUSTOMER_ID) CLUSTER;

Note: It is important to realize that the statistics for this index have not been created automatically, which is also easy to check:

SELECT INDEXTYPE,NLEAF,CLUSTERFACTOR,INDNAME FROM SYSCAT.INDEXES WHERE TABNAME='SALES_DETAIL'  AND INDEXTYPE='CLUS'

INDEXTYPE NLEAF       CLUSTERFACTOR            INDNAME
--------- ----------- ------------------------ -------------
CLUS               -1   -1.00000000000000E+000 SALES_CUST_ID

When NLEAF equals to -1, that indicates that the statistics have not been collected. Having created the statistics, it is very clear that a separate database object has been created:

RUNSTATS ON TABLE DB2ADMIN.SALES_DETAIL AND DETAILED INDEXES ALL;
SELECT NPAGES FROM SYSSTAT.TABLES WHERE TABNAME='SALES_DETAIL'

NPAGES      
-----------
         807

SELECT INDEXTYPE,NLEAF,CLUSTERFACTOR,INDNAME FROM SYSCAT.INDEXES WHERE TABNAME='SALES_DETAIL'  AND INDEXTYPE='CLUS';

INDEXTYPE NLEAF       CLUSTERFACTOR            INDNAME
--------- ----------- ------------------------ -------------
CLUS               83   +6.05269042343423E-001 SALES_CUST_ID

So here is an important point: a cluster index uses up just as many index pages as a regular index would.

Because the table and the cluster index are two separate objects, creation of a cluster index does not mean that the rows are automatically reordered. Just have a look at the cluster factors of the indexes on SALES_DETAIL table.

The cluster factor of the cluster index that we just created is more or less low (high would mean close to 1), while the cluster factors of all the other indexes on the table are unchanged so far. This means that the rows of SALES table have not been reordered yet. To actually reorder the rows, one needs to issue REORG TABLE statement explicitly:

REORG TABLE DB2ADMIN.SALES_DETAIL 

That done, now the rows definitely are stored in the order of the clustered index, aren't they? Let us double check:

SELECT INDEXTYPE,NLEAF,CLUSTERFACTOR,INDNAME FROM SYSCAT.INDEXES WHERE TABNAME='SALES_DETAIL'  AND INDEXTYPE='CLUS';

INDEXTYPE NLEAF       CLUSTERFACTOR            INDNAME
--------- ----------- ------------------------ -------------
CLUS               83   +6.05269042343423E-001 SALES_CUST_ID

According to the statistics, nothing has changed! The cluster factors of the cluster index on the table is exactly the same. What is going on? In fact, REORG utility did reorganize the table and did store the rows in the order of its cluster index. However, the utility did not refresh the table's statistics. Having run the RUNSTATS utility again, we shall see that the cluster factor is exactly 1, as it should. These examples illustrate the need to run statistics after reorganizing a table.

Now the results are exactly what we had expected. The rows are stored in the order of the cluster index, but for the time being only. Although index entries are always stored in order no matter what, the data rows aren't. DB2 will definitely try to maintain the once established order, but as the time goes by, there may appear more and more rows stored out of order. As a result, the cluster factor of the cluster index may degrade. This degradation may be detected by running RUNSTATS utility and fixed by reorganizing the table.

About the Author

Alexander Kuznetsov has 15 years of experience in software design, development and database administration. Currently he improves performance of applications running in a multi-terabyte database environment. Alexander is an IBM Certified Advanced Technical Expert (DB2 Cluster) and an IBM Certified Solutions Expert (Database Administration and Application Development).

For More Information

  • Feedback: E-mail the editor with your thoughts about this tip.
  • More tips: Hundreds of free DB2 tips and scripts.
  • Tip contest: Have a DB2 tip to offer your fellow DBAs and developers? The best tips submitted will receive a cool prize -- submit your tip today!
  • Ask the Experts: Our SQL, database design, Oracle, SQL Server, DB2, metadata, and data warehousing gurus are waiting to answer your toughest questions.
  • Forums: Ask your technical DB2 questions--or help out your peers by answering them--in our active forums.
  • Best Web Links: DB2 tips, tutorials, and scripts from around the Web.

Dig Deeper on DB2 UDB (universal databases)

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.

-ADS BY GOOGLE

SearchDataCenter

Close