Find duplicate records in a file

Find duplicate records in a file using SQL statements.

If I have a file and need to find its duplicate records (two or more records in the file with the same value in specific fields), I use the following steps to find them:

1. I want to see what duplicate data there is in FILEA, using a set of fields, e.g. to check whether there is more than one record with a value of 11 in FIELDA in FILEA:

a. If there is already a field in FILEA itself that can be updated with a duplicate flag (e.g. "D") then I would skip step 1.a.

b. If there is no field in FILEA that can be used to flag a record as a duplicate, then I would do step 1.a., which uses SQL to create an interim file called FILEATEMP containing an additional field called "FLAGFIELD".

1. a. After changing the "Interactive SQL Session Services" so that SELECT output goes to a file, create a temp file (FILEATEMP) with the following selection in SQL:

SELECT ' ' AS FLAGFIELD, FIELDA, FIELDB, FIELDC, FIELDD FROM FILEA

The above select will create FILEATEMP with the following fields (FLAGFIELD will be a 1-character blank value):

 
 DBU                  Mode . . . : Display     
 Control . . .                                                   
              1           2           3              4       5
   Record#  FLAGFIELD     FIELDA  FIELDB  FIELDC    FIELDD         
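
Step 1.a. can be tried outside the iSeries with a minimal sketch like the one below. It assumes SQLite (through Python's sqlite3 module) as a stand-in for the Interactive SQL session, and uses CREATE TABLE ... AS SELECT in place of directing SELECT output to a file. The three sample rows are taken from the listing in step 4.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the iSeries library.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FILEA (FIELDA INTEGER, FIELDB TEXT, FIELDC TEXT, FIELDD TEXT)"
)
conn.executemany(
    "INSERT INTO FILEA VALUES (?, ?, ?, ?)",
    [
        (10, "978", "ABC9", "QIC"),
        (11, "FDE", "AB10", "QJC"),
        (11, "IUF", "AB11", "QKC"),
    ],
)

# Build the interim file with a blank one-character flag in front of the data.
conn.execute(
    "CREATE TABLE FILEATEMP AS "
    "SELECT ' ' AS FLAGFIELD, FIELDA, FIELDB, FIELDC, FIELDD FROM FILEA"
)

# Column names of the new file, in order, and the flag value of every row.
columns = [row[1] for row in conn.execute("PRAGMA table_info(FILEATEMP)")]
flags = [row[0] for row in conn.execute("SELECT FLAGFIELD FROM FILEATEMP")]
print(columns)
print(flags)
```

FILEA itself is left unchanged; only FILEATEMP carries the extra field.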

2. Create a temp SQL output file FILEB with the following statement:

SELECT FIELDA, COUNT(*) FROM FILEA GROUP BY FIELDA HAVING COUNT(*) > 1

If FILEA has more than one record with a value of 11 in FIELDA, then FILEB will contain a row for the value 11 together with its count.
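
The grouping in step 2 can be sketched as follows, again assuming SQLite through Python's sqlite3 module rather than the iSeries itself. The count column is given the hypothetical name DUPCOUNT here so it can be referenced by name; the sample data is a shortened version of the listing in step 4.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the iSeries library.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FILEA (FIELDA INTEGER, FIELDB TEXT)")
conn.executemany(
    "INSERT INTO FILEA VALUES (?, ?)",
    [(2, "123"), (3, "456"), (4, "600"), (11, "FDE"), (11, "IUF"), (11, "LE")],
)

# One output row per FIELDA value that occurs more than once.
# DUPCOUNT is a hypothetical column name chosen for this sketch.
conn.execute(
    "CREATE TABLE FILEB AS "
    "SELECT FIELDA, COUNT(*) AS DUPCOUNT FROM FILEA "
    "GROUP BY FIELDA HAVING COUNT(*) > 1"
)

duplicates = conn.execute("SELECT FIELDA, DUPCOUNT FROM FILEB").fetchall()
print(duplicates)
```

Values that occur only once (2, 3 and 4 in this sample) are filtered out by the HAVING clause, so FILEB holds only the keys that need flagging.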

3. Now execute the following statement in SQL. (Use FILEA if the file already has the field that needs to be updated, or use FILEATEMP if an interim file was created using SQL as in step 1.a. above.)

UPDATE XYZLIB/FILEA A1 SET FLAGFIELD = 'D' WHERE exists (select * from FILEB B1 where A1.FIELDA = B1.FIELDA)

This will update FILEA (if a field that can be updated exists in FILEA) or FILEATEMP (the interim file created in SQL with the additional field).
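
A minimal sketch of steps 2 and 3 together is shown below, assuming SQLite through Python's sqlite3 module as a stand-in for the iSeries. The update target is written without the A1 alias and without the XYZLIB qualifier, which is a choice made for this sketch only. DUPCOUNT is a hypothetical column name. The FIELDA values follow the listing in step 4.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the iSeries library.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FILEA (FLAGFIELD TEXT, FIELDA INTEGER)")
fielda_values = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11, 11, 11, 11, 11]
conn.executemany(
    "INSERT INTO FILEA VALUES (' ', ?)", [(value,) for value in fielda_values]
)

# Step 2: keys that occur more than once.
conn.execute(
    "CREATE TABLE FILEB AS "
    "SELECT FIELDA, COUNT(*) AS DUPCOUNT FROM FILEA "
    "GROUP BY FIELDA HAVING COUNT(*) > 1"
)

# Step 3: flag every FILEA row whose key appears in FILEB.
cursor = conn.execute(
    "UPDATE FILEA SET FLAGFIELD = 'D' "
    "WHERE EXISTS (SELECT 1 FROM FILEB B1 WHERE B1.FIELDA = FILEA.FIELDA)"
)
rows_updated = cursor.rowcount

# SQLite's rowid plays the role of the record number here.
flagged = conn.execute(
    "SELECT rowid, FIELDA FROM FILEA WHERE FLAGFIELD = 'D' ORDER BY rowid"
).fetchall()
print(rows_updated)
print(flagged)
```

With this sample data the update touches the six rows that hold the value 11 and leaves the other nine rows with a blank flag.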

4. The update in step 3 above will show the following message after the update is complete:

 " 6 rows updated in FILEA in XYZLIB  "

This is the final output from the file, where FLAGFIELD has a value of "D" for record #s 10 to 15, because the value 11 in FIELDA makes those records duplicates in the file:

 Control . . .                                   
                 1              2            3       4           5
 Record#     FLAGFIELD     FIELDA            FIELDB  FIELDC      FIELDD
     
       1                        2            123     ABC1        QAC  
       2                        3            456     ABC2        QBC  
       3                        4            600     ABC3        QCC  
       4                        5            789     ABC4        QDC  
       5                        6            159     ABC5        QEC  
       6                        7            753     ABC6        QFC  
       7                        8            645     ABC7        QGC  
       8                        9            312     ABC8        QHC  
       9                       10            978     ABC9        QIC  
      10         D             11            FDE     AB10        QJC  
      11         D             11            IUF     AB11        QKC  
      12         D             11            LE      AB12        QLC  
      13         D             11            OIV     AB13        QMC  
      14         D             11            P-90    AB14        QNC  
      15         D             11            45C     AB14        QOC  

5. Alternatively, instead of using a value of "D", I could move the count from the SQL001 field derived in FILEB during step 2 (a value of 6 in this case) into FLAGFIELD, and then sort the destination file (FILEA or FILEATEMP) in FLAGFIELD order to show the duplicate records from the minimum to the maximum number of occurrences.
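
The variation in step 5 might look like the sketch below, assuming SQLite through Python's sqlite3 module as a stand-in for the iSeries. Two things differ from the steps above and are assumptions of this sketch: FLAGFIELD is defined as a numeric column so that it can hold any count (a 1-character field only fits single-digit counts), and the count column in FILEB is given the hypothetical name DUPCOUNT instead of the system-generated name. The sample data is made up for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the iSeries library.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FILEA (FLAGFIELD INTEGER, FIELDA INTEGER)")
conn.executemany(
    "INSERT INTO FILEA VALUES (0, ?)",
    [(11,), (3,), (11,), (2,), (3,), (11,)],
)

# Keys that occur more than once, with their counts.
conn.execute(
    "CREATE TABLE FILEB AS "
    "SELECT FIELDA, COUNT(*) AS DUPCOUNT FROM FILEA "
    "GROUP BY FIELDA HAVING COUNT(*) > 1"
)

# Copy the count into FLAGFIELD for every duplicated key.
conn.execute(
    "UPDATE FILEA SET FLAGFIELD = "
    "(SELECT B1.DUPCOUNT FROM FILEB B1 WHERE B1.FIELDA = FILEA.FIELDA) "
    "WHERE EXISTS (SELECT 1 FROM FILEB B1 WHERE B1.FIELDA = FILEA.FIELDA)"
)

# Sort from the fewest to the most occurrences.
by_occurrences = conn.execute(
    "SELECT FIELDA, FLAGFIELD FROM FILEA ORDER BY FLAGFIELD, FIELDA"
).fetchall()
print(by_occurrences)
```

Rows that are not duplicated keep a flag of 0 in this sketch, so they sort ahead of every duplicated key.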

This was first published in September 2004
