SQL query to weed out old duplicate entries
DBAsupport.com Forums - Powered by vBulletin
Results 1 to 5 of 5

Thread: SQL query to weed out old duplicate entries

  1. #1
    Join Date
    Jul 2003
    Posts
    3

    SQL query to weed out old duplicate entries

    I have a database of job numbers and associated dates.
    I want write an SQL query to select the duplicate job numbers and of those duplicates, select all but the newest of the jobs. The end result being a list of the old duplicate jobs, which can then safely be removed from the database.

  2. #2
    Join Date
    Jul 2002
    Location
    Lake Worth, FL
    Posts
    1,477

    Cool

    Try:

    select job, job_date from jobs j
    where job_date < (
    select MAX(job_date) from jobs m
    where m.job = j.job);
    "The person who says it cannot be done should not interrupt the person doing it." --Chinese Proverb

  3. #3
    Join Date
    Jul 2003
    Posts
    3
    Ok, that has helped get the major roadblock out of the way: How to do a "for each item in a, do blah".

    Thank you for that much; but I'm still not where I need to be. I still need to figure out how return all the jobs who's date's aren't max, since it seems that the MAX(dateField) returns the highest date VALUE (based on pur numerical comparison) not the latest date (ie, 9/9/02 is returned because it contains the the 'biggest numbers' rather than the latest date of 7/11/03)...

    The other annoying thing, is the whole thing chokes up on the data when I tried the statements as suggested above. It just slows to a crawl for over 15 mins trying to process it. When it finally finished, it still seemed to return some weird values (and the dates are not accurate due to the 'MAX' problem described above. The database has ~11,000 entries in it. I'm not sure if that's too much for it to handle, but I would have thought it would be fine. The whole time I've taken to write this reply, it's not even 5% done with a simple 'A-Z sort' on the results that it returned earlier. Argh.

    I'll have to work on this more Monday. It's almost quitting time. (So I appologise in advance for not responding further until then.)

  4. #4
    Join Date
    Aug 2002
    Location
    Colorado Springs
    Posts
    5,253
    It sounds like the max problem is one of your own making, if you are storing dates as strings or numbers. you will have to convert them to dates, using an appropriate format picture, and then apply the max function to it.

    If you are on a high enough version you can use analytic functions to find the old jobs.

    Code:
    select
       job,
       to_char(datefield,'DD/MM/YYYY')
    from
       (
       select
          job,
          to_date(datefield,'DD/MM/YYYY') datefield,
          max(to_date(datefield,'DD/MM/YYYY') over
              partition by (job) max_datefield
       from
          jobs
       )
    where
       datefield < max_datefield
    A composite index on job and datefield would be nice, but that date format is really going to hurt you.
    David Aldridge,
    "The Oracle Sponge"

    Senior Manager, Business Intelligence Development
    XM Satellite Radio
    Washington, DC

    Oracle ACE

  5. #5
    Join Date
    Jul 2003
    Posts
    3
    Ok, I just looked at it, and for some reason the date/time field had been changed. I was able to change the field to the proper format, so that works now.

    Now that the date/time is fixed, I ran a test on a small subset of the data, and the first bit of code DOES work...It just takes a LOT of time to run when there are all the records in the database. I'll have to just let it churn...

    Thanks for ALL your help!
    It's very much appreciated.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width