If I need to find the duplicate records in a file (two or more records with the same value in specific fields), I use the following steps to find them:
1. I want to see what duplicate data there is in FILEA, using a set of fields.
E.g. to check if there is more than one record with a value of 11 in FIELDA in FILEA:
a. If there is already a field in FILEA itself that can be updated with a duplicate flag (e.g. "D") then I would skip step 1.a.
b. If there is no field in FILEA that can be used to flag a record as a duplicate, then I would do step 1.a. and use SQL to create an interim file called FILEATEMP that contains an additional field called "FLAGFIELD".
1. a. In the "Interactive SQL Session Services", change the SELECT output so that it goes to a file, then create a temp file (FILEATEMP) with the following selection in SQL:
SELECT ' ' FLAGFIELD, FIELDA, FIELDB, FIELDC, FIELDD
FROM FILEA
The above SELECT will create FILEATEMP with the following fields (FLAGFIELD will be a 1-character blank value):
2. Create a temp SQL output file FILEB with the following statement:
SELECT FIELDA, COUNT(*) FROM FILEA GROUP BY FIELDA HAVING
COUNT(*) > 1
If FILEA has more than one record with a value of 11 in FIELDA, then one row containing that FIELDA value and its count will be output to FILEB.
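To try step 2 away from the AS/400, the following minimal sketch runs the same GROUP BY ... HAVING statement against an in-memory SQLite database from Python. SQLite is only a stand-in for the real database, the sample rows are invented, and the column name DUPCOUNT is a hypothetical name added here so the count can be referred to later.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the AS/400 library.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FILEA (FIELDA INTEGER, FIELDB TEXT)")

# Invented sample data: the value 11 appears three times in FIELDA.
rows = [(10, "a"), (11, "b"), (11, "c"), (11, "d"), (12, "e")]
conn.executemany("INSERT INTO FILEA VALUES (?, ?)", rows)

# Step 2: write one row per duplicated FIELDA value, with its count, to FILEB.
# DUPCOUNT is a hypothetical column name, not one taken from the tip.
conn.execute(
    "CREATE TABLE FILEB AS "
    "SELECT FIELDA, COUNT(*) AS DUPCOUNT FROM FILEA "
    "GROUP BY FIELDA HAVING COUNT(*) > 1"
)

fileb = conn.execute("SELECT FIELDA, DUPCOUNT FROM FILEB").fetchall()
print(fileb)  # [(11, 3)]
```

Only the value 11 reaches FILEB, because 10 and 12 each occur once and are filtered out by the HAVING clause.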
3. Now execute the following statement in SQL (use FILEA if the file already has the field that needs to be updated, or use FILEATEMP if an interim file was created using SQL as in step 1.a. above):
UPDATE XYZLIB/FILEA A1 SET FLAGFIELD = 'D' WHERE exists (select * from FILEB B1 where A1.FIELDA = B1.FIELDA)
This will update FILEA (if a field exists in FILEA that can be updated) or FILEATEMP (the interim file created in SQL with the additional field).
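The effect of the step 3 update can be checked with a small sketch, again using an in-memory SQLite database from Python as a stand-in. The sample rows and the DUPCOUNT column name are assumptions made for illustration. The UPDATE here refers to the table by its own name rather than by the A1 correlation name, purely to keep the sketch portable across SQLite versions.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the AS/400 database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FILEA (FLAGFIELD TEXT, FIELDA INTEGER)")

# Invented sample data; every record starts with a blank flag.
conn.executemany("INSERT INTO FILEA VALUES (' ', ?)", [(10,), (11,), (11,), (12,)])

# Step 2: FILEB holds the FIELDA values that occur more than once.
conn.execute(
    "CREATE TABLE FILEB AS "
    "SELECT FIELDA, COUNT(*) AS DUPCOUNT FROM FILEA "
    "GROUP BY FIELDA HAVING COUNT(*) > 1"
)

# Step 3: flag every FILEA record whose FIELDA value appears in FILEB.
cur = conn.execute(
    "UPDATE FILEA SET FLAGFIELD = 'D' "
    "WHERE EXISTS (SELECT * FROM FILEB B1 WHERE FILEA.FIELDA = B1.FIELDA)"
)
updated = cur.rowcount  # number of records flagged

flagged = conn.execute(
    "SELECT FLAGFIELD, FIELDA FROM FILEA ORDER BY rowid"
).fetchall()
print(updated, flagged)
```

Records whose FIELDA value is not in FILEB keep their blank flag, so the file can later be filtered on FLAGFIELD = 'D'.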
4. This is the final output from the file, where FLAGFIELD has a value of "D" for records 10 to 15, because the value 11 in FIELDA makes those records duplicates in the file:
The update in step 3 above will show the following message after the update is complete:
5. Alternatively, instead of using a value of "D", I could move the count itself (in this case a value of 6, held in the SQL001 field derived in FILEB during step 2) into FLAGFIELD, and then sort the destination file (FILEA or FILEATEMP) in FLAGFIELD order to show the duplicate records from the minimum to the maximum number of occurrences.
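A minimal sketch of the step 5 variation follows, run against an in-memory SQLite database from Python as a stand-in. The sample data is invented, and DUPCOUNT is a hypothetical name used here in place of the system-generated count field. The count is copied into FLAGFIELD with a correlated subquery, which is one possible way to do it and not necessarily the statement the tip has in mind.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the AS/400 database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FILEA (FLAGFIELD TEXT, FIELDA INTEGER)")

# Invented sample data: 11 occurs three times, 12 occurs twice.
sample = [10, 11, 11, 11, 12, 12, 13]
conn.executemany("INSERT INTO FILEA VALUES (' ', ?)", [(v,) for v in sample])

# Step 2: FILEB holds each duplicated value and its count.
conn.execute(
    "CREATE TABLE FILEB AS "
    "SELECT FIELDA, COUNT(*) AS DUPCOUNT FROM FILEA "
    "GROUP BY FIELDA HAVING COUNT(*) > 1"
)

# Step 5: store the count, not 'D', in FLAGFIELD for the duplicated records.
conn.execute(
    "UPDATE FILEA SET FLAGFIELD = "
    "(SELECT CAST(B1.DUPCOUNT AS TEXT) FROM FILEB B1 WHERE B1.FIELDA = FILEA.FIELDA) "
    "WHERE EXISTS (SELECT * FROM FILEB B1 WHERE B1.FIELDA = FILEA.FIELDA)"
)

# Sort the flagged records from the fewest to the most occurrences.
ordered = conn.execute(
    "SELECT FLAGFIELD, FIELDA FROM FILEA "
    "WHERE FLAGFIELD <> ' ' ORDER BY FLAGFIELD, FIELDA"
).fetchall()
print(ordered)
```

Because FLAGFIELD is a 1-character field in step 1.a., this only works as written for counts up to 9. A wider or numeric flag field would be needed for larger counts and for a correct numeric sort.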