removing duplicates and sql performance

**rkiss** · 09-04-2003, 11:59 AM

Hi,

I have the following code to identify and remove duplicates. The problem is that I have around 27,000 records and the deduping using this code takes over an hour. Can someone suggest a faster code for this type of deduping. Thank you

----------------------------------------------------------
SELECT lname, fname, company, address_1, source, COUNT(*)
FROM leads
group by lname, fname, company, source, address_1
having count(*) > 1;

DELETE FROM leads_raw a
WHERE rowid < (
select max(rowid)
from leads_raw b
where b.fname = a.fname
and b.lname = a.lname
and b.company = a.company
and b.address_1 = a.address_1);

------------------------------------------------------------

Regards

Roman

**slimdave** · 09-04-2003, 12:06 PM

See if this works for you ...

Code:

Delete from
   leads_raw
where
   rowid in
   (
   select ri
   from
      (
      select 
         rowid ri,
         min(rowid) over
            (partition by 
                fname,
                lname,
                company,
                address_1) min_ri
      from
         leads_raw
      )
   where
      ri != min_ri
   );

**rkiss** · 09-04-2003, 12:35 PM

much faster.

Thank you

**sysdba** · 09-04-2003, 12:45 PM

How faster?
Can you show us the time of your method and slimdave's?

Thread: removing duplicates and sql performance

Thread Tools

Display

Hybrid View

removing duplicates and sql performance

Posting Permissions