If I have a file and need to find the duplicate records in it (two or more records with the same value in specific fields), I use the following steps to find them:
1. I want to see what duplicate data there is in FILEA, based on a set of fields, e.g. to check whether more than one record in FILEA has a value of 11 in FIELDA:
a. If FILEA already has a field that can be updated with a duplicate flag (e.g. "D"), I skip step 1.a.
b. If FILEA has no field that can be used to flag a record as a duplicate, I do step 1.a., which uses SQL to create an interim file called FILEATEMP containing an extra field called "FLAGFIELD".
1. a. Change the "Interactive SQL Session Services" so that SELECT output goes to a file, then create a temp file (FILEATEMP) with the following selection in SQL:
SELECT ' ' FLAGFIELD, FIELDA, FIELDB, FIELDC, FIELDD FROM FILEA
The above select will create FILEATEMP with the following fields (FLAGFIELD will be 1 Char blank value) :
DBU Mode . . . : Display
Control . . .
         1          2       3       4       5
Record#  FLAGFIELD  FIELDA  FIELDB  FIELDC  FIELDD
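To try step 1.a. away from the AS/400, the same SELECT can be run against a scratch database. The following is only a minimal sketch: it assumes Python's built-in sqlite3 module as a stand-in for Interactive SQL, uses CREATE TABLE ... AS in place of the "SELECT output to a file" session setting, and loads a single sample row into a made-up copy of FILEA.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the AS/400 library.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FILEA (FIELDA INTEGER, FIELDB TEXT, FIELDC TEXT, FIELDD TEXT)"
)
conn.execute("INSERT INTO FILEA VALUES (11, 'FDE', 'AB10', 'QJC')")

# CREATE TABLE ... AS plays the role of sending SELECT output to FILEATEMP.
conn.execute(
    "CREATE TABLE FILEATEMP AS "
    "SELECT ' ' AS FLAGFIELD, FIELDA, FIELDB, FIELDC, FIELDD FROM FILEA"
)

cur = conn.execute("SELECT * FROM FILEATEMP")
columns = [d[0] for d in cur.description]  # field names of the interim file
first_row = cur.fetchone()                 # FLAGFIELD starts as one blank
print(columns)
print(first_row)
```

The interim table carries the blank FLAGFIELD in front of the original four fields, which is the layout step 3 needs.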
2. Create a temp SQL output file FILEB with the following statement:
SELECT FIELDA, COUNT(*) FROM FILEA GROUP BY FIELDA HAVING COUNT(*) > 1
If FILEA has more than one record with a value of 11 in FIELDA, then one row for that value, together with its count, will be output to FILEB.
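The effect of the GROUP BY ... HAVING statement in step 2 can be checked on a small sample. This sketch again assumes sqlite3 in place of Interactive SQL, and the five sample rows are invented for illustration; the result list plays the role of FILEB.

```python
import sqlite3

# Assumption: in-memory SQLite table with made-up rows standing in for FILEA.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FILEA (FIELDA INTEGER, FIELDB TEXT)")
rows = [(2, "123"), (3, "456"), (11, "FDE"), (11, "IUF"), (11, "LE")]
conn.executemany("INSERT INTO FILEA VALUES (?, ?)", rows)

# Same statement as step 2: one output row per FIELDA value that repeats.
duplicates = conn.execute(
    "SELECT FIELDA, COUNT(*) FROM FILEA GROUP BY FIELDA HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)  # [(11, 3)]
```

Values that occur only once (2 and 3 here) are filtered out by the HAVING clause, so only the duplicated key and its count remain.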
3. Now execute the following statement in SQL (use FILEA if the file already has a field that can be updated, or FILEATEMP if an interim file was created using SQL as in step 1.a. above):
UPDATE XYZLIB/FILEA A1 SET FLAGFIELD = 'D' WHERE exists (select * from FILEB B1 where A1.FIELDA = B1.FIELDA)
This updates whichever file is named in the statement: FILEA (if it already has a field that can be updated) or FILEATEMP (the interim file created in SQL with the additional field).
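Steps 2 and 3 together can be sketched as follows. This is an illustration under assumptions, not the AS/400 session itself: sqlite3 stands in for Interactive SQL, the sample rows are invented, the count column is given the hypothetical name DUPCOUNT, and the A1 alias on the updated file is dropped in favor of the table name because alias support in UPDATE varies between SQL dialects.

```python
import sqlite3

# Assumption: in-memory SQLite tables with made-up rows stand in for the files.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FILEA (FLAGFIELD TEXT, FIELDA INTEGER, FIELDB TEXT)")
rows = [(" ", 2, "123"), (" ", 3, "456"), (" ", 11, "FDE"), (" ", 11, "IUF")]
conn.executemany("INSERT INTO FILEA VALUES (?, ?, ?)", rows)

# Step 2: FILEB holds one row per duplicated FIELDA value.
conn.execute(
    "CREATE TABLE FILEB AS "
    "SELECT FIELDA, COUNT(*) AS DUPCOUNT FROM FILEA "
    "GROUP BY FIELDA HAVING COUNT(*) > 1"
)

# Step 3: flag every FILEA record whose FIELDA value appears in FILEB.
cur = conn.execute(
    "UPDATE FILEA SET FLAGFIELD = 'D' "
    "WHERE EXISTS (SELECT * FROM FILEB B1 WHERE FILEA.FIELDA = B1.FIELDA)"
)
updated = cur.rowcount  # number of rows changed by the UPDATE
flagged = conn.execute(
    "SELECT FIELDA, FIELDB FROM FILEA WHERE FLAGFIELD = 'D' ORDER BY FIELDB"
).fetchall()
print(updated, flagged)
```

Every record sharing a duplicated value is flagged, including the first occurrence, which matches the behavior of the EXISTS test in the statement above.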
4. When the update in step 3 completes, it shows the following message:
"6 rows updated in FILEA in XYZLIB"
This is the final output from the file, where FLAGFIELD has a value of "D" for record #s 10 to 15, because the value 11 in FIELDA makes those records duplicates in the file:
Control . . .
         1          2       3       4       5
Record#  FLAGFIELD  FIELDA  FIELDB  FIELDC  FIELDD
1                   2       123     ABC1    QAC
2                   3       456     ABC2    QBC
3                   4       600     ABC3    QCC
4                   5       789     ABC4    QDC
5                   6       159     ABC5    QEC
6                   7       753     ABC6    QFC
7                   8       645     ABC7    QGC
8                   9       312     ABC8    QHC
9                   10      978     ABC9    QIC
10       D          11      FDE     AB10    QJC
11       D          11      IUF     AB11    QKC
12       D          11      LE      AB12    QLC
13       D          11      OIV     AB13    QMC
14       D          11      P-90    AB14    QNC
15       D          11      45C     AB14    QOC
5. Alternatively, instead of using a value of "D", I could move the count of occurrences (in this case 6, held in the SQL001 field derived in FILEB during step 2) into FLAGFIELD, and then sort the destination file (FILEA or FILEATEMP) in FLAGFIELD order to show the duplicate records from the minimum to the maximum number of occurrences.
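The count-based variant in step 5 might look like the sketch below. The assumptions are the same as before (sqlite3 as a stand-in, invented sample rows, a hypothetical DUPCOUNT name for the count column instead of SQL001), plus one more: FLAGFIELD is declared numeric here, since a 1-character field cannot hold a count above 9.

```python
import sqlite3

# Assumption: in-memory SQLite tables with made-up rows stand in for the files.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FILEA (FLAGFIELD INTEGER DEFAULT 0, FIELDA INTEGER, FIELDB TEXT)"
)
rows = [(2, "123"), (7, "AAA"), (7, "BBB"), (11, "FDE"), (11, "IUF"), (11, "LE")]
conn.executemany("INSERT INTO FILEA (FIELDA, FIELDB) VALUES (?, ?)", rows)

# Step 2: one row per duplicated value, with its number of occurrences.
conn.execute(
    "CREATE TABLE FILEB AS "
    "SELECT FIELDA, COUNT(*) AS DUPCOUNT FROM FILEA "
    "GROUP BY FIELDA HAVING COUNT(*) > 1"
)

# Step 5: copy the count into FLAGFIELD instead of setting it to 'D'.
conn.execute(
    "UPDATE FILEA SET FLAGFIELD = "
    "(SELECT B1.DUPCOUNT FROM FILEB B1 WHERE B1.FIELDA = FILEA.FIELDA) "
    "WHERE EXISTS (SELECT * FROM FILEB B1 WHERE B1.FIELDA = FILEA.FIELDA)"
)

# Sort from the fewest to the most occurrences.
ranked = conn.execute(
    "SELECT FIELDA, FLAGFIELD FROM FILEA ORDER BY FLAGFIELD, FIELDA"
).fetchall()
print(ranked)
```

The WHERE EXISTS clause keeps non-duplicated records at their initial value rather than overwriting them with a null from the subquery.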