Let us consider a very common situation. A GUI application issues a query, then the first 25 rows from the result...
set are displayed on screen. If the user presses Page Down, the next 25 rows are displayed, and so on. It is well known that the FETCH FIRST 25 ROWS ONLY and OPTIMIZE FOR 25 ROWS clauses of the SELECT statement are very useful in this situation. In fact, using either of them is a straightforward way to tell the optimizer what you actually need. The optimizer, in turn, might choose an entirely different access plan to satisfy a query with such a clause. For example, the query
SELECT * FROM CUSTOMER ORDER BY LAST_NAME
may be satisfied by a tablespace scan followed by a sort. As a result, the initial response time may be quite high; however, this is the fastest way to retrieve the whole result set. Adding an OPTIMIZE FOR or a FETCH FIRST clause may cause the optimizer to choose an index access plan instead, so that the first 25 rows are returned as soon as possible. The immediate response usually comes at a price: should you ever need to retrieve all the rows, the overall execution time would be high. It could be significantly higher than the execution time of a tablespace scan.
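For instance, assuming an index exists on LAST_NAME (an assumption made here for illustration), the query could be written in either of the following ways, and the optimizer may then prefer index access over a scan-and-sort plan:

SELECT * FROM CUSTOMER ORDER BY LAST_NAME OPTIMIZE FOR 25 ROWS;
SELECT * FROM CUSTOMER ORDER BY LAST_NAME FETCH FIRST 25 ROWS ONLY;

Note the difference: OPTIMIZE FOR 25 ROWS is only a hint and does not change the result set, while FETCH FIRST 25 ROWS ONLY also caps the result set at 25 rows.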
You may also have been advised to use the OLAP function ROW_NUMBER() to limit the output to 25 rows at a time. This advice is more controversial; let us discuss it in more detail and examine its potential drawbacks. If you use Visual Explain to compare the execution plans of these two statements:
SELECT * FROM (SELECT ROW_NUMBER() OVER() AS N, S.* FROM SALES_DETAIL S) AS T WHERE N BETWEEN 26 AND 50;
SELECT * FROM SALES_DETAIL FETCH FIRST 50 ROWS ONLY;
you will not notice much difference. Both plans will be tablespace scans. If you run these two queries against a small table, you will not notice much difference in response time either. Although the plans look very similar, the actual execution is quite different:
- the first statement will scan the whole table, checking every row against the criterion N BETWEEN 26 AND 50
- the second statement will also start scanning the whole table, but the scan stops as soon as the required number of rows (50 in this example) has been retrieved.
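Also note that without an ORDER BY inside the OVER() clause, ROW_NUMBER() assigns numbers in whatever order the rows happen to be scanned, so "rows 26 through 50" may differ from one execution to the next. A sketch of a stable paging query follows; the columns SALE_DATE and SALE_ID are hypothetical and should together identify a row uniquely:

SELECT * FROM (SELECT ROW_NUMBER() OVER(ORDER BY SALE_DATE, SALE_ID) AS N, S.* FROM SALES_DETAIL S) AS T WHERE N BETWEEN 26 AND 50;

Even in this form, DB2 must number at least the first 50 rows of the ordered result before the predicate can filter them.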
Do not assume that these two statements will perform equally well against a big table. If you use the db2batch utility to measure real execution costs, you will see the difference: the first statement will scan the whole table, while the second one will read only the few data pages containing the first 50 rows. Should the table grow really big in a production environment, the difference would be dramatic, so it is usually a good idea to test your application against realistically large data volumes from the early stages of development.
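A measurement run might look like the following; the exact flags vary by DB2 version, so treat this as a sketch and check db2batch's help on your system:

db2batch -d SAMPLE -f queries.sql -i complete

Here queries.sql would contain the two statements above, SAMPLE is a placeholder database name, and -i complete asks db2batch to break the elapsed time down by prepare, execute, and fetch for each statement.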
As we have seen, the FETCH FIRST and OPTIMIZE FOR clauses are usually more appropriate for limiting the number of rows when the result set is big. I have also demonstrated the importance of considering both the execution plan and the real execution costs.
About the Author
Alexander Kuznetsov has 15 years of experience in software design, development and database administration. Currently he improves performance of applications running in a multi-terabyte database environment. Alexander is an IBM Certified Advanced Technical Expert (DB2 Cluster) and an IBM Certified Solutions Expert (Database Administration and Application Development).