Predicate pushing in views containing analytic functions
DBAsupport.com Forums - Powered by vBulletin
Results 1 to 6 of 6

Thread: Predicate pushing in views containing analytic functions

Hybrid View

  1. #1
    Join Date
    Apr 2001
    Posts
    118
    I'm hoping someone here can help me out with this problem. This is on Oracle 8.1.7 on a WindowsNT server.

    I have defined a view on a table that uses an analytic function to return a subset of rows from the table. I chose to use the analytic function because it is much more efficient than the approach that was used before, which consisted of joining the table back to itself and was found to be somewhat non-performant with certain queries against large tables. I have checked the logic of the new view and it does return the correct answers that the application requires.

    In my initial test, the analytic query approach was astoundingly better in both speed and in reducing logical/physical I/O. So I was happy. However, the view using the analytic function turned out to be much more inefficient in many cases than the old approach. :( I have been looking at this with some people and we have determined that the problem seems to be in how the CBO pushes the predicates, or doesn't push the predicates, when evaluating queries with this view. It seems that the CBO will only push a predicate based upon the columns in the PARTITION clause of the analytic function. When a query is written with a WHERE clause based upon the partitioning row, I can see that that predicate gets pushed "inside" the analytic function, the index on that column is used to select just the rows that we are interested in, the partitioning is applied to that small subset of rows, and the result is returned very efficiently.

    But when I query this view using a WHERE clause based upon a column that is not part of the PARTITION clause of the analytic function, the performance is horrible. We have determined that this prediciate is NOT pushed "inside" the analytic function, so the analytic function is processing the WHOLE TABLE and then applying the WHERE clause predicate. This is quite disappointing because it is actually much worse than what was done in the application beforehand.

    My questions are:

        1. Is this just how the CBO works with the analytic functions? Only predicates on the same column(s) as used in the partitioning clauses will be pushed "inside" of the analytic function to limit the number of rows processed?

        2. Or is my view construction just wrong and there is a better way to ask the same question so that the CBO will push the predicates as I hope?

    I'll post a reply containing some SQL for the view, test data and queries that demonstrate what I am seeing. This should help if I didn't explain things well in this post.

    Thanks,

    Heath


    [Edited by Heath on 04-16-2002 at 10:32 AM]

  2. #2
    Join Date
    Apr 2001
    Posts
    118

    Do this to see what I am seeing with my tests

    First, create a test table and populate it with some rows.

    Code:
    create table t( Group_id    VARCHAR2(32),
                    Group_Key   NUMBER,
                    col2        NUMBER );
    Populate it with some rows.

    Code:
    insert into t
        select object_name,
               ROW_NUMBER() OVER( partition by object_name
                                  order by object_id ),
               object_id
        from all_objects
        where rownum<=25000;
    
    insert into t
        select Group_id || '1',
               Group_Key,
               col2 * -1
        from t;
    
    commit;
    Create indexes on two of the columns and compute statistics.
    The GROUP_ID column will be used in the PARTITION clause of the analytic function.
    The COL2 column will be used to show the inefficient results.
    Code:
    create index IDX_T_GRPID on T(GROUP_ID);
    
    create index IDX_T_COL2 on T(COL2);
    
    analyze table t compute statistics;
    Create a package that will be used to keep track of the number of rows processed by the analytic function. It will also be used to reject a subset of the rows processed by the analytic function.

    Code:
    create or replace package test_anafunc is
        function test_func( x NUMBER ) return number;
    
        function get_Count return number;
        procedure reset_count;
    end;
    /
    
    create or replace package body test_anafunc is
    
        p_Count     NUMBER(38);
        
        function test_func( x NUMBER ) return number is
        begin
            p_Count := p_Count + 1;
            
            if( x > 1 ) then
                return 1;
            end if;
    
            return 0;
        end test_func;
    
        function get_Count return number is
        begin
            return p_Count;
        end get_Count;
        
        procedure reset_count is
        begin
            p_Count := 0;
        end reset_Count;
    
    begin
        p_Count := 0;
    end;
    /
    Define the view.

    Code:
    create or replace view test_view as
    select a.group_id, a.group_key, a.col2
    from (select t.group_id, t.group_key, t.col2,
                 ROW_NUMBER() OVER( PARTITION BY GROUP_ID
                                    ORDER BY GROUP_KEY ASC NULLS LAST ) RNUM
          from t
          where test_anafunc.test_func(GROUP_KEY) = 1 ) a
    where a.RNUM = 1;
    Here are the actual tests.

    Test 1: Query using column that is in the PARTITION clause.

    You should replace the string 'TRACE' with the name of some object in your database.

    The result will be to only see one row from table T for that name, the row having GROUP_KEY=2.

    The execution plan will show the use of the index on the GROUP_ID column.

    The TEST_ANAFUNC.GET_COUNT calls will show the number of rows processed by the analytic function. For my test case, this was 7 because I had seven rows in table T having GROUP_ID='TRACE'.

    This seems to show that the predicate was pushed the way I wanted.
    Code:
    heath@MY_DB>set timing on
    heath@MY_DB>set autotrace on explain statistics
    heath@MY_DB>select * from test_View where group_id = 'TRACE'
      2  /
    
    GROUP_ID                          GROUP_KEY       COL2
    -------------------------------- ---------- ----------
    TRACE                                     2       7942
    
    Elapsed: 00:00:00.10
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=1 Bytes=57)
       1    0   VIEW (Cost=5 Card=1 Bytes=57)
       2    1     WINDOW (SORT)
       3    2       TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=3 Card=1 Bytes=27)
       4    3         INDEX (RANGE SCAN) OF 'IDX_T_GRPID' (NON-UNIQUE) (Cost=1 Card=1)
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
              0  db block gets
              3  consistent gets
              0  physical reads
              0  redo size
            486  bytes sent via SQL*Net to client
            425  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              1  sorts (memory)
              0  sorts (disk)
              1  rows processed
    
    heath@MY_DB>set autotrace off
    heath@MY_DB>select test_anafunc.get_Count from sys.dual
      2  /
    
     GET_COUNT
    ----------
             7
    Test 2: Query using column that is NOT in the PARTITION clause.

    You should replace the number 10816 with the actual object id of some object in your database that was put into table T.

    The result will be to only see one row from table T for that name, the row having GROUP_KEY=2.

    The execution plan will show a FULL TABLE SCAN, not the use of the index on the COL2 column.

    The TEST_ANAFUNC.GET_COUNT calls will show the number of rows processed by the analytic function. For my test case, this was 49684, the total number of rows in table T.

    This seems to show that the predicate was not pushed the way I wanted.

    Code:
    heath@MY_DB>execute test_anafunc.reset_Count
    
    PL/SQL procedure successfully completed.
    
    Elapsed: 00:00:00.10
    heath@MY_DB>/
    
     GET_COUNT
    ----------
             0
    
    Elapsed: 00:00:00.20
    heath@MY_DB>set autotrace on explain statistics
    heath@MY_DB>select * from test_view where col2 = 10816
      2  /
    
    GROUP_ID                          GROUP_KEY       COL2
    -------------------------------- ---------- ----------
    TABLE_X                                   2      10816
    
    Elapsed: 00:00:04.97
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=40 Card=497 Bytes=28329)
       1    0   VIEW (Cost=40 Card=497 Bytes=28329)
       2    1     WINDOW (SORT)
       3    2       TABLE ACCESS (FULL) OF 'T' (Cost=36 Card=497 Bytes=13419)
    
    Statistics
    ----------------------------------------------------------
              8  recursive calls
             11  db block gets
            236  consistent gets
            251  physical reads
              0  redo size
            505  bytes sent via SQL*Net to client
            425  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              0  sorts (memory)
              1  sorts (disk)
              1  rows processed
    
    heath@MY_DB>set autotrace off
    heath@MY_DB>select test_anafunc.get_Count from sys.dual
      2  /
    
     GET_COUNT
    ----------
         49684
    
    Elapsed: 00:00:00.20
    Contrast this to what I wanted to happen behind the scenes.

    Code:
    heath@MY_DB>execute test_anafunc.reset_Count
    
    PL/SQL procedure successfully completed.
    
    Elapsed: 00:00:00.10
    heath@MY_DB>/
    
     GET_COUNT
    ----------
             0
    
    Elapsed: 00:00:00.20
    heath@MY_DB>set autotrace on explain statistics
    heath@MY_DB>select a.group_id, a.group_key, a.col2
      2  from (select t.group_id, t.group_key, t.col2,
      3               ROW_NUMBER() OVER( PARTITION BY GROUP_ID
      4                                  ORDER BY GROUP_KEY ASC NULLS LAST ) RNUM
      5        from t
      6        where test_anafunc.test_func(GROUP_KEY) = 1
      7            and col2 = 10816 ) a
      8  where a.RNUM = 1
      9  /
    
    GROUP_ID                          GROUP_KEY       COL2
    -------------------------------- ---------- ----------
    TABLE_X                                   2      10816
    
    Elapsed: 00:00:00.11
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=57)
       1    0   VIEW (Cost=4 Card=1 Bytes=57)
       2    1     WINDOW (SORT)
       3    2       TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=27)
       4    3         INDEX (RANGE SCAN) OF 'IDX_T_COL2' (NON-UNIQUE) (Cost=1 Card=1)
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
              0  db block gets
              3  consistent gets
              0  physical reads
              0  redo size
            505  bytes sent via SQL*Net to client
            425  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              1  sorts (memory)
              0  sorts (disk)
              1  rows processed
    
    heath@MY_DB>set autotrace off
    heath@MY_DB>select test_anafunc.get_Count from sys.dual
      2  /
    
     GET_COUNT
    ----------
             1
    
    Elapsed: 00:00:00.11

  3. #3
    Join Date
    Apr 2001
    Posts
    118
    Nobody has any thoughts or comments on this?

  4. #4
    Join Date
    Apr 2001
    Posts
    118
    If anyone cares, I managed to get an answer from Oracle support on this. Predicates will not be pushed if the view contains a window function (OVER keyword). If it does and the accessing query only accesses the columns in the PARTITION BY clause, then the predicate will be pushed into the view. If the accessing query references a column not in the PARTITION BY clause, then the predicate will not be pushed.

    Not the answer that I wanted to hear, but agrees with the behavior that I observed. Unfortunately, the would seem to limit the applicability of using the analytic functions. :(

  5. #5
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,136
    Thanks.
    You did a lot of research.
    Good job.
    Keep it up.
    Unfortunately, many DBAs do not have time to investigate what you have found.

  6. #6
    Join Date
    Apr 2001
    Posts
    118
    I actually was able to submit this question to Tom Kyte. If anyone is still interested, you can read Tom educating me at http://asktom.oracle.com/pls/ask/f?p...3469884600671,.

    Edited to fix the URL.

    [Edited by Heath on 04-19-2002 at 04:47 PM]

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width