Sometimes we need to insert, update or delete a lot of rows. Doing that in one SQL statement may be unwise. Even if the SQL statement is absolutely correct, it might fail for a very simple reason: the resulting transaction might be too large for your server to handle. Another unpleasant possibility would be a transaction that eventually succeeds, but only after hours of utilizing all or almost all the server's resources.
There are several easy ways out of this situation. Basically we need to split a large transaction into several smaller ones. This may be quite easy to accomplish. For example, this statement will assign "Preferred" status to all the customers who opened accounts at least one year ago:
UPDATE CUSTOMER SET STATUS='PREFERRED' WHERE OPEN_DATE < CURRENT DATE - 1 YEAR
It can be easily split into smaller ones, like this:
UPDATE CUSTOMER SET STATUS='PREFERRED'
WHERE OPEN_DATE < CURRENT DATE - 1 YEAR AND LAST_NAME LIKE 'A%';

UPDATE CUSTOMER SET STATUS='PREFERRED'
WHERE OPEN_DATE < CURRENT DATE - 1 YEAR AND LAST_NAME LIKE 'B%';

UPDATE CUSTOMER SET STATUS='PREFERRED'
WHERE OPEN_DATE < CURRENT DATE - 1 YEAR AND LAST_NAME LIKE 'C%';
and so on. This approach may or may not work, depending on how the data is distributed. In fact, in many cases it won't work well enough because of data skew: just a few rows, if any, will match the condition LAST_NAME LIKE 'Z%', but the statement with the criteria LAST_NAME LIKE 'S%' will still touch too many rows.
Let us discuss how to split the workload into even chunks, provided that the table has a surrogate primary key. Here is the table structure and some sample data:
CREATE TABLE CUSTOMER(
  CUSTOMER_ID INT NOT NULL PRIMARY KEY,
  STATUS CHAR(20),
  LAST_NAME VARCHAR(30),
  OPEN_DATE DATE NOT NULL);

INSERT INTO CUSTOMER VALUES
  (1,   'REGULAR CUSTOMER', 'LASTNAME1', DATE('01/01/2001')),
  (209, 'REGULAR CUSTOMER', 'LASTNAME2', DATE('01/01/2001')),
  (31,  'REGULAR CUSTOMER', 'LASTNAME3', DATE('01/01/2001')),
  (445, 'REGULAR CUSTOMER', 'LASTNAME4', DATE('01/01/2001')),
  (57,  'REGULAR CUSTOMER', 'LASTNAME5', DATE('01/01/2001')),
  (609, 'REGULAR CUSTOMER', 'LASTNAME6', DATE('01/01/2001'));
We shall modify these six rows in two chunks of three rows each. First, let us use ROW_NUMBER() to assign consecutive numbers to the primary keys of the rows to be modified:
CREATE TABLE MATCHING_ROWS(NUM INT, CUSTOMER_ID INT);

INSERT INTO MATCHING_ROWS
SELECT ROW_NUMBER() OVER(ORDER BY CUSTOMER_ID) AS NUM, CUSTOMER_ID
FROM CUSTOMER
WHERE OPEN_DATE < CURRENT DATE - 1 YEAR;

CREATE INDEX MATCHING_ROWS_NUM ON MATCHING_ROWS(NUM, CUSTOMER_ID);

SELECT * FROM MATCHING_ROWS;

NUM         CUSTOMER_ID
----------- -----------
          1           1
          2          31
          3          57
          4         209
          5         445
          6         609

Note that the INSERT carries the same WHERE clause as the original UPDATE, so that MATCHING_ROWS contains only the rows we actually intend to modify.
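With the mapping in place, it is also easy to see how many chunks a given chunk size will produce. For example, with the chunk size of 3 used in this sample (the +2 makes the integer division round up):

SELECT MAX(NUM) AS TOTAL_ROWS,
       (MAX(NUM) + 2) / 3 AS CHUNKS
FROM MATCHING_ROWS;

For our six sample rows this returns 6 total rows and 2 chunks.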
Now it is very easy to update three rows at a time:

UPDATE CUSTOMER SET STATUS='PREFERRED'
WHERE CUSTOMER_ID IN
  (SELECT CUSTOMER_ID FROM MATCHING_ROWS WHERE NUM BETWEEN 1 AND 3);
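and then the remaining chunk, with a COMMIT in between so that each chunk runs as its own short transaction:

COMMIT;

UPDATE CUSTOMER SET STATUS='PREFERRED'
WHERE CUSTOMER_ID IN
  (SELECT CUSTOMER_ID FROM MATCHING_ROWS WHERE NUM BETWEEN 4 AND 6);
COMMIT;

For a table with many chunks this pattern can be automated rather than typed out by hand. Here is a minimal sketch in DB2 SQL PL; the chunk size of 3 matches our sample, and in practice you would pick a size your transaction log can absorb comfortably:

BEGIN
  DECLARE V_START INT DEFAULT 1;
  DECLARE V_MAX INT;
  SELECT MAX(NUM) INTO V_MAX FROM MATCHING_ROWS;
  -- Update one chunk per iteration, committing after each one
  WHILE V_START <= V_MAX DO
    UPDATE CUSTOMER SET STATUS='PREFERRED'
      WHERE CUSTOMER_ID IN
        (SELECT CUSTOMER_ID FROM MATCHING_ROWS
          WHERE NUM BETWEEN V_START AND V_START + 2);
    COMMIT;
    SET V_START = V_START + 3;
  END WHILE;
END

Once all chunks have been processed, the helper table can be dropped with DROP TABLE MATCHING_ROWS.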
We have seen how to split a large transaction into several smaller ones using very simple means. Next time we shall discuss an entirely different approach to the same problem.
About the Author
Alexander Kuznetsov has 15 years of experience in software design, development and database administration. He currently works on improving the performance of applications running in a multi-terabyte database environment. Alexander is an IBM Certified Advanced Technical Expert (DB2 Cluster) and an IBM Certified Solutions Expert (Database Administration and Application Development).