I see a trend in the IT industry today. It manifests itself by a call for "someone else to take care of things for me." How many IT professionals have gained a new employer as a result of a company decision to outsource? How many database administrators have gained a new application to manage as a result of a company decision to purchase an application package? I don't think this trend is going to come to halt, so here are some suggestions designed to help you cope with some of these decisions are being made outside your control.
Looking at the DB2 application marketplace, IBM must be extremely pleased as the license count for DB2 on OS/390 goes through the 9,000 mark headed for numbers as high as the Dow Jones Industrial Average. A large portion of this growth must be attributed to applications that have chosen this rock solid database as their platform. So what can a capacity planner, performance analyst, system programmer or database administrator do when asked by his company to manage the performance of someone else's SQL?
If performance is unacceptable, there is always one easy solution. In the words of Ted VanDuyn, "Memory solves all problems!" Therefore, you can always buy more memory and increase the size or number of your DB2 buffer pools. However, in most of the places I have traveled, that is not the politically correct answer. When a simple answer will not do, a more complicated answer should be explained. Hopefully, this answer can be simple enough for all to use.
When faced with tuning someone else's SQL, you have two basic choices. First, you can add or drop indexes from the application schema or second you can add or expand DB2 buffer pools. That sounds pretty simple. It is not all that complicated. I have listened to many presentations on this subject and have left thinking that the only person that could possible execute on what was presented was the speaker. I want to convince you that you can tune someone else's SQL and make it perform efficiently.
Index changes are usually called for when the access patterns for the application vary from what was anticipated by the designer. The number of indexes that an application should have is a subjective value. The number of indexes is not as important as ensuring the usefulness of the indexes that are defined and creating the indexes that would improve performance. Index analysis should be done from a big picture perspective as opposed to helping one SQL statement to cure an immediate and troubling problem.
Let me start with the targeted end result of index tuning, which should be one of two goals, or both: reduced elapsed time for DB2 threads and reduced I/O traffic through DB2. This means that you would need some idea of what response time and I/O traffic are today. This can be obtained from DB2 accounting information. Any DB2 monitor will load this data into DB2 tables. Once it is there, you can summarize the data on a daily basis. Continue this practice as you change indexes and use this data to determine if an improvement has been made.
Many people ask "How do I get the right indexes?" It is not a matter of right or wrong, good or bad. Look upon this design process as determining the ones that are most beneficial and having them available at appropriate times. Indexes positively affect an application when used to materialize a result set with fewer I/Os to DB2 objects. Indexes negatively affect an application when data is updated and the indexes have to be changed as well. Effective indexes are the ones that provide the greatest I/O reduction when contrasted with the time required to keep the indexes updated. Some indexes may only be valuable during certain types of application processes. If these processes are infrequent, create the indexes before the process begins and remove them when finished.
The design process for application indexes requires some data to be collected. Just as you would interview the programmers for a new application, you will need to interview your vendor to obtain information about their product. You will need to know type of SQL used by the product, whether it used Static, Dynamic SQL or a combination of both. If it uses Dynamic SQL, it is important to know if the product uses host variables or literals in its predicates. From the DB2 catalog you can also get a list of objects that are accessed by package or DBRM. As you gather this information, create a spreadsheet or table that contains the packages, the objects accessed the type of access that will be performed and some weighting factor that will give priority to those processes performed most frequently. The type of access to the objects will provide you with a picture of the application's use of its data. If there are indexes that are only updated and never used for select processing, consider these indexes prime targets for elimination. However, before removing the indexes, check for those infrequent processes that might be benefited by the index for short periods of time. For those processes that have high update activity, index evaluation is important to ensure the ones chosen are providing value.
Verifying the index design process is built upon the accounting data mentioned earlier. As you change the index structure, review the response time and I/O statistics for corresponding reductions. Remember that sometimes it may take a week of processing to determine the impact of a change.
Buffer Pool Design
Buffer pool usage is a tradeoff. The tradeoff involves the use of memory to cache data so as to avoid I/O to disk drives. If you use too much memory for buffer pools, this tradeoff can be to the detriment of OS/390 if it causes increased paging activity. Memory for buffer pools is currently limited to a private address space or to an Expanded Storage Only Hiperspace. In Version 6, DB2 will provide for buffer pools in a Data Space and with this change each customer will need to re-evaluate their buffer pool design.
While the limitation exists for pools in central or expanded storage, the design of pools must be based upon the amount of memory available in each category. Some history might be useful. Expanded storage was created when IBM made CPUs with new fast memory and less new slower memory. Central storage was the new faster memory and expanded storage was the less new slower memory. Central Storage is byte addressable in OS/390 and expanded storage is page addressable. This made sense when the price of memory was expansive and when cheaper storage could be used to reduce OS/390 paging to disk drives. With the CMOS processors from IBM, all of the memory is the same speed and costs the same. Therefore, the limitation today is the 2GB limitation of byte addressable storage in OS/390. If you have 2GB or less of memory per LPAR, then it makes sense to use all of this for virtual storage and create Virtual Buffer pools without Hiperpools. If you have more than 2GB of memory or more per LPAR, use the storage over 2GB as expanded storage and create Hiperpools to use that storage limiting the use of this storage only when it increases OS/390 paging.
Are there any other limitations? Only one other limitation exists today. The buffer pools are allocated in the DB2 database address space which is limited to 2GB of virtual storage. The practical limitation based upon what else must be loaded in that address space is usually 1.5 - 1.6 GB of virtual storage. This could be less if you have an application that utilizes dynamic SQL heavily and have chosen to cache the SQL prepares in the EDM pool.
What are the considerations for buffer pool design? The objective of buffer pool tuning should be to increase hit ratios of all pools as much as possible accomplishing I/O avoidance. To accomplish this, application data should be isolated into separate pools as much as possible based upon data usage. Lets start with some basics. DB2 system objects must be in BP0. I also suggest isolating sort work objects in DSNDB07 in BP7. However, what would you do with your application data?
Code tables seem to be a part of all applications and they tend to be small in size and high in access. Attempt to put code tables in a pool to itself. As a suggestion, run statistics, then add up the NACTIVE statistic for the tablespaces containing these tables, and then determine if you can allocate a pool with that number of pages. Be careful also of multi-table tablespaces that don't contain all code tables. It probably won't happen, but you can never be too careful. If you can do this, the data will be loaded into a pool and then always accessed within memory. Even if you can't allocate the entire amount, the closer you get to that total amount will influence the hit ratio for that pool. Remember also that other buffer pools will be created so don't allocate all the memory you have for this one pool. You can also come back later and increase the size if you have memory left over after the entire tuning process.
Indexes are next in importance. My suggestion is to have at least two pools for indexes, one for high priority applications and a second one for all the rest. However, each of these could be divided one more time, making a total of four pools for indexes. The second division of these pools is based upon the access patterns to the indexes. My suggestion is to have one for indexes that are primarily probed and a second pool for those whose leaf pages are scanned the majority of the time. The indexes that probe for RIDs will probably provide the best hit ratio with the smallest amount of memory compared to the total number of pages in the indexes. Those indexes that are scanned will require more memory to achieve the same hit ratios.
Finally, the table data should be allocated across multiple buffer pools. Allocation of pools for this data is the hardest to quantify into generic suggestions. Tables will require some creativity and intuition to achieve the best results. There are vendor tools that provide buffer pool usage data that are very beneficial in configuring pools for your data. What I can suggest is a place to start. Begin with a minimum of four pools using the same characteristics as the indexes, two for high priority tables and two for low priority tables. Separating these further by scans and random processing can only be accomplished with performance trace data. I would not recommend the traces, rather use your best judgment and then change based on performance.
Establish a starting point for your subsystem using 10 buffer pools or less. Monitor your threads response time and your buffer pool hit ratios with your DB2 monitor. Make minor changes from one pool to another and observe the changes. For buffer pools that achieve over a 95 percent hit ratio, you may want to examine the activity to these pools. If the getpage count is very high, this pool may be over allocated. This is especially true if you have other pools with hit ratios that are less than 20 percent. Be aware that buffer pool ratios can be a negative number. This condition is caused when a prefetch I/O brings in 32 pages into the pool and the pages are stolen before the application can process all the rows on the page. When the application finally tries to process a row on the page, DB2 will bring in the page with a synchronous I/O. This second I/O will usually be all that is required, but for threads that experience excessive lock wait time this process of synchronous I/O for processing a row could be repeated multiple times. Another point to consider is that adding more pages to this pool may exacerbate the problem instead of fixing it. In this case, you probably have multiple DB2 objects being scanned at the same time.
A DB2 administrator should be able to get started down the road of tuning some else's SQL by using these steps. Remember the assumption was made that no SQL could be changed. As a result, the suggestions do not lead to a perfectly tuned application. That normally requires changing the SQL to get the most from DB2. Most vendors want their application to run on as many relational databases as possible. Therefore, they write their SQL generically without specific changes to achieve performance on DB2 for OS/390. I wish you good luck when tuning someone else's SQL. Write or e-mail me with your thoughts or suggestions as to how this process can be improved.
About the Author
Tom Moulder is a Software Consultant at BMC Software.
For More Information
- The Best Database Design Web Links: tips, tutorials, scripts, and more.
- Have a database design tip to offer your fellow DBA's? The best tips submitted will receive a cool prize--submit your tip today!
- Ask your technical database design questions--or help out your peers by answering them--in our live discussion forums. Also, give us your feedback about this tip in the "Sound Off" forum.
- Check out our new Ask the Experts feature: Our database design gurus are waiting to answer your toughest questions.