I have a file (900,000 records, 1 GB) that contains multiple records with the same key (not very useful, but the result of copying the data several times). I would like to remove these duplicate records in a simple way (SQL, ...).
I already tried creating a file with a unique key and then copying the data into it, but it is very slow (probably because the index is updated for each record). I've seen some SQL advice, but the last step (a DELETE command on a view over the file) doesn't work on my V4R5 system:
DELETE FROM tmp/cq0942xxxx WHERE DUPRRN in (SELECT MRRN FROM tmp/cq0942xxx)
It really depends on the columns involved in your definition of "duplicate data". This query assumes that the column with the unique identifier (idcol) is fine, but that two other columns (FirstName and LastName) contain duplicate data. For each set of duplicates, it keeps the row with the lowest idcol and deletes the rest:
DELETE FROM mytab WHERE EXISTS (SELECT idcol FROM mytab innertab WHERE innertab.FirstName = mytab.FirstName AND innertab.LastName = mytab.LastName AND innertab.idcol < mytab.idcol)
Just about every method is going to be fairly slow when a large number of rows have to be scanned and processed.
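To try the correlated DELETE safely before running it against a large file, it can be exercised on a handful of rows first. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for DB2/400, and the sample rows and the remove_duplicates helper are made up for the example. The table and column names (mytab, idcol, FirstName, LastName) follow the query above. SQL syntax and behavior on V4R5 may differ, so treat this as a check of the logic rather than of the platform.

```python
import sqlite3


def remove_duplicates(conn):
    """Delete each row that has a matching row with a smaller idcol.

    The row with the lowest idcol in every (FirstName, LastName) group
    never satisfies the EXISTS test, so exactly one row per group survives.
    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM mytab
        WHERE EXISTS (
            SELECT 1
            FROM mytab AS innertab
            WHERE innertab.FirstName = mytab.FirstName
              AND innertab.LastName = mytab.LastName
              AND innertab.idcol < mytab.idcol
        )
        """
    )
    conn.commit()
    return cur.rowcount


# Hypothetical sample data: three distinct people, two of them duplicated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytab (idcol INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)
conn.executemany(
    "INSERT INTO mytab VALUES (?, ?, ?)",
    [
        (1, "Ann", "Lee"),
        (2, "Ann", "Lee"),
        (3, "Bob", "Ray"),
        (4, "Ann", "Lee"),
        (5, "Bob", "Ray"),
        (6, "Cy", "Fox"),
    ],
)

deleted = remove_duplicates(conn)
remaining = conn.execute(
    "SELECT idcol, FirstName, LastName FROM mytab ORDER BY idcol"
).fetchall()
print(deleted, remaining)
```

Note that rows with a NULL in FirstName or LastName are never treated as duplicates by this query, because a comparison with NULL does not evaluate to true.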