DBAsupport.com Forums - Powered by vBulletin
Results 1 to 6 of 6

Thread: How to eliminate outliers.

  1. #1
    Join Date
    Aug 2002
    Location
    Atlanta
    Posts
    1,187

    How to eliminate outliers.

    I'd like to remove the top and bottom N values. I think I'm close.

    Here's what I'm working with where I fudge the rownum values to make the query work
    as I know there are 220 rows. So I want all values without the top and bottom N.

    How can I at run time do something like

    rownum <= count(*) -N

    For rnum it would be easy enough as I could just do....

    rnum >= N

    Code:
    select * from 
    	( 
       		select p.*, rownum rnum
               		from (
    				select PRCSNAME,datediff('mi',enddttm,begindttm )*-1 
    				from PROCESS_SCHEDULER_HISTORY where prcsname=...some job name 
    				order by datediff('mi',enddttm,begindttm )*-1 desc
    			     ) p
              	where rownum <= 215
    	) 
    where rnum >= 5
    I'm stmontgo and I approve of this message

  2. #2
    Join Date
    Nov 2000
    Location
    Pittsburgh, PA
    Posts
    4,166
    This might give you something to work with. You may need to add columns to the sub query so that you can have them in the final query. There are some functions that will give you average and standard distribution which will help you to better focus on removing the outliers.

    Code:
    select PRCSNAME, elapsed_days
    	from ( select PRCSNAME, datediff('mi',enddttm,begindttm )*-1 elapsed_days, rownum rn
               from PROCESS_SCHEDULER_HISTORY 
              where prcsname='...some job name' 
              order by datediff('mi',enddttm,begindttm )*-1 desc ) p
     where rn between (rn*.25) and (rn*.975)
     order by elapsed_days;

  3. #3
    Join Date
    Aug 2002
    Location
    Atlanta
    Posts
    1,187

    thanks

    it didn't work - now rows returned.

    I think there is some magical distintion between the rownum and rnum.

    I was able to get it work by creating a function - not terribly elegant but it's working

    Code:
    create or replace function count_job_occurances (p_prcsname PROCESS_SCHEDULER_HISTORY.PRCSNAME%type)
    RETURN NUMBER
    IS
    v_count_occurances NUMBER;
    BEGIN
    SELECT count(prcsname) INTO v_count_occurances FROM PROCESS_SCHEDULER_HISTORY
    where PRCSNAME = p_prcsname;
    RETURN v_count_occurances;
    END;
    /
    
    
    select * from
       (
                       select p.*, rownum rnum
                               from (
                               select PRCSNAME,datediff('mi',enddttm,begindttm )*-1
                               from PROCESS_SCHEDULER_HISTORY where prcsname='PER099'
                               order by datediff('mi',enddttm,begindttm )*-1 desc
                            ) p
               where rownum <  count_job_occurances('PER099')-5
       )
    where rnum >= 5
    /
    I'm stmontgo and I approve of this message

  4. #4
    Join Date
    Nov 2000
    Location
    Pittsburgh, PA
    Posts
    4,166
    For the previous one try it without the outer where clause.
    You can also try this. Analytics can be useful, but hard to get right.
    Also I have not tried the sql to see if it compiles.

    Code:
    select PRCSNAME, elapsed_days
      from (select PRCSNAME, 
                   datediff('mi',enddttm,begindttm )*-1 elapsed_days, 
                   row_number() over
                 ( partition by datediff('mi',enddttm,begindttm )*-1
                      order by datediff('mi',enddttm,begindttm )*-1 ) rn 
              from PROCESS_SCHEDULER_HISTORY
                 ) start with rn = 1)
     WHERE rn between (rn*.25) and (rn*.975)
     order by elapsed_days;

  5. #5
    Join Date
    Aug 2002
    Location
    Atlanta
    Posts
    1,187
    Thanks - unfortunately that did not work.

    Forgive me for double dipping but I also posted on the metalink forum as well
    and I wanted to share what I ended up using.

    I really appreciate your help.

    Steve

    Code:
    select * from 
        ( 
               select p.*, rownum rnum 
                       from ( 
                    select PRCSNAME,datediff('mi',enddttm,begindttm )*-1 
                    from PROCESS_SCHEDULER_HISTORY where prcsname= 
                    order by datediff('mi',enddttm,begindttm )*-1 desc 
                     ) p 
                  where rownum <= (select count(*) - &&how_many from PROCESS_SCHEDULER_HISTORY where prcsname=)
        ) 
    where rnum >= &&how_many ;
    I'm stmontgo and I approve of this message

  6. #6
    Join Date
    Nov 2000
    Location
    Pittsburgh, PA
    Posts
    4,166
    Quote Originally Posted by stmontgo View Post
    Thanks - unfortunately that did not work.

    Forgive me for double dipping but I also posted on the metalink forum as well
    and I wanted to share what I ended up using.
    Fortunately double dipping in forums is not as bad as double dipping your chip in the dip.
    You seemed like you were close to figuring it out anyway.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width