I have a table with 42 fields and no constraints defined on any of them. Fresh records are inserted every hour using SQL*Loader, averaging around 60 million records loaded per day.
The problem is that quite a number of the records are duplicates, and these are increasing the total size of the database. If I use a trigger to check for duplicates during the insert phase, it slows down the insert. Can somebody suggest a script to clean up the duplicate records (except the original record, of course) that can be run during lean hours?
I am running 8.1.6 on Unix.
It would have been better to have constraints, but no problem.
Create an index on the columns you need in order to find the duplicates; once you are done you can drop that index. This will save a huge amount of time.
create index idx_columnname on tablename (columnname);

delete from tablename a
where a.rowid not in (select max(b.rowid)
                      from tablename b
                      where b.duplicatedcolname = a.duplicatedcolname);

Add an equality predicate in the subquery for every other column that must match for a row to count as a duplicate. Then commit and drop the index.
tablename : your table name
columnname : the column(s) to be searched for duplicated values
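To make the steps concrete, here is a sketch using hypothetical names (a table big_table whose duplicates are defined by col1, col2, col3 -- substitute your own table and column list). The group-by form keeps exactly one row per combination of the listed columns (the one with the highest rowid) and avoids running a correlated subquery per row, which matters on a table this size:

create index idx_dedup on big_table (col1, col2, col3);

delete from big_table
where rowid not in (select max(rowid)
                    from big_table
                    group by col1, col2, col3);

commit;

drop index idx_dedup;

Run it during lean hours, and make sure your rollback segments are sized for the delete (or delete in batches) given the volumes involved.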
Export/import is out of the question due to the size of the database.
I will follow what oracledba has suggested, that is, creating the index when necessary, deleting the duplicates, and then dropping the index.