My first try at using one of the new analytic functions isn't very performant. :(
DBAsupport.com Forums - Powered by vBulletin
Results 1 to 3 of 3

Thread: My first try at using one of the new analytic functions isn't very performant. :(

  1. #1
    Join Date
    Apr 2001
    Posts
    118
    Somebody please straighten me out on this one. :)

    Let me describe the problem first.

    There are two tables in the schema, table a and table b. Table a holds information about the individual units of an item. The business logic is that there will only be zero to three rows in table a for each item. Table b holds related information about activities performed on each unit of each item in table a.


    create table TABLE_A(
        KEY NUMBER PRIMARY KEY,
        ID NUMBER,
        UNIT_ID NUMBER,
        COL_X VARCHAR2,
        COL_Y VARCHAR2,
        other columns );

    create table TABLE_B(
        ID NUMBER,
        UNIT_ID NUMBER,
        ACTIVITY VARCHAR2,
        other columns );


    Single column indexes exist on TABLE_A.ID and TABLE_B.ID.

    The rows in the two tables are related by their ID and UNIT_ID values. There can be more than one row in table b for each ID, UNIT_ID pair.

    I was asked to write a SELECT statement that generates a single string of the COL_X and COL_Y values of each unit of each item in table a, with the information for each unit being separated by a newline character. This is another example of the classic bounded table pivoting problem. So, I threw the following together assuming that the UNIT_ID values for each item ID were sequential numbers starting with 1.


        select a.ID,
            SUBSTR(
                MAX(
                    CASE WHEN a.UNIT_ID = 1 THEN CHR(10) || a.COL_X || ' ' || a.COL_Y
                         ELSE NULL
                    END )
                ||
                MAX(
                    CASE WHEN a.UNIT_ID = 2 THEN CHR(10) || a.COL_X || ' ' || a.COL_Y
                         ELSE NULL
                    END )
                ||
                MAX(
                    CASE WHEN a.UNIT_ID = 3 THEN CHR(10) || a.COL_X || ' ' || a.COL_Y
                         ELSE NULL
                    END), 2 )
                "LABEL"
        from TABLE_A a, TABLE_B b
        where NVL( b.ACTIVITY, 'I' ) = 'I' and a.ID = b.ID and a.UNIT_ID = b.UNIT_ID
        group by a.ID;


    This worked just fine and returned the results quickly, but then I learned that my assumption was wrong. While there will be only up to 3 UNIT_ID values for a specific ID, no assumptions can be made as to the actual values of those unit identifiers. I then decided to try one of the new (at least to me) analytic functions since this customer is running 8i (8.1.6 on WinNT).

    Here's what I wrote:


        select a.ID,
            SUBSTR(
                MAX(
                    CASE WHEN inner.UNITROWNUM = 1 THEN CHR(10) || a.COL_X || ' ' || a.COL_Y
                         ELSE NULL
                    END )
                ||
                MAX(
                    CASE WHEN inner.UNITROWNUM = 2 THEN CHR(10) || a.COL_X || ' ' || a.COL_Y
                         ELSE NULL
                    END )
                ||
                MAX(
                    CASE WHEN inner.UNITROWNUM = 3 THEN CHR(10) || a.COL_X || ' ' || a.COL_Y
                         ELSE NULL
                    END ), 2 )
                "LABEL"
        from(
            select TABLE_A.KEY,
                   ROW_NUMBER() OVER(
                       PARTITION BY TABLE_A.ID
                       ORDER BY TABLE_A.UNIT_ID) AS "UNITROWNUM"
            from TABLE_A, TABLE_B b
            where NVL( b.ACTIVITY, 'I' ) = 'I' and TABLE_A.ID = b.ID
                and TABLE_A.UNIT_ID = b.UNIT_ID ) inner,
            TABLE_A a
        where a.KEY = inner.KEY
        group by a.ID


    This new query works, but it is much slower than the original one. I looked at the execution plans of both statements and for the first, both of the indexes on the ID columns of table a and b are used. For the second select (using the analytic function) the index on the ID column of table a is used, but there is a full table scan on table b instead of using its index on the ID column.

    I hope all of the above made sense. :)

    Any suggestions as to what I can do to speed up the second query?

    Thanks,

    Heath

  2. #2
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    Part of the problem may be that you are joining A in twice. Try this:

    SELECT
    ---A.ID,
    ---SUBSTR(
    ------MAX(
    ---------CASE WHEN T.UNITROWNUM = 1 THEN CHR(10) || T.COL_X || ' ' || T.COL_Y
    ---------------ELSE NULL
    ---------END )
    ------||
    ------MAX(
    ---------CASE WHEN T.UNITROWNUM = 2 THEN CHR(10) || T.COL_X || ' ' || T.COL_Y
    ---------------ELSE NULL
    ---------END )
    ------||
    ------MAX(
    ---------CASE WHEN T.UNITROWNUM = 3 THEN CHR(10) || T.COL_X || ' ' || T.COL_Y
    ---------------ELSE NULL
    ---------END ), 2 )
    ------LABEL
    FROM
    ---(
    ------SELECT
    ---------A.KEY,
    ---------ROW_NUMBER() OVER(
    ---------------------------PARTITION BY
    ------------------------------A.ID
    ---------------------------ORDER BY
    ------------------------------A.UNIT_ID
    ---------------------------)
    ------------AS UNITROWNUM
    ------FROM
    ---------TABLE_A---A,
    ---------TABLE_B---B
    ------WHERE
    ---------NVL( B.ACTIVITY, 'I' )---=---'I'---------AND
    ---------A.ID---------------------=---B.ID---------AND
    ---------A.UNIT_ID---------------=---B.UNIT_ID
    ---) T
    GROUP BY
    ---T.ID


    If that doesn't work, hint the index.

    Let me know if that helps,

    - Chris

  3. #3
    Join Date
    Apr 2001
    Posts
    118
    I rewrote the select and removed the extra join on the table per your suggestion. The optimizer still did not use the index, so I forced them with hints. At that point, the performance improved somewhat but it still isn't all that great. It is still taking the statement a few seconds to execute on our test database. :(

    While waiting for some help with the analytic function based solution, I figured out a solution more similar to my original one, one that does not use analytic functions and does not assume specific values for the UNIT_ID values. The performance of this approach was good, so I started to wonder what the difference was between the two statements. I turned on the database timing and sql logging and ran a series of tests using both approaches. I defined two views whose definitions were the two solutions to the problem. VIEW_X was defined using the approach without the analytic function. VIEW_Y was defined using the approach with the analytic function.


    create or replace view VIEW_X as
    select x.ID,
    SUBSTR(
       MAX(    
       CASE WHEN x.UNIT_ID = T.MIN_UNITID THEN CHR(10) || x.COL_X || ' ' || x.COL_Y
            ELSE NULL
         END )
         ||
       MAX(
         CASE WHEN x.UNIT_ID = T.MAX_UNITID AND x.UNIT_ID != T.MIN_UNITID THEN CHR(10) || x.COL_X || ' ' || x.COL_Y
            ELSE NULL
         END )
         ||
       MAX(
         CASE WHEN x.UNIT_ID = (T.SUM_UNITID - T.MIN_UNITID - T.MAX_UNITID) THEN CHR(10) || x.COL_X || ' ' || x.COL_Y
            ELSE NULL
         END ), 2 )
         "LABEL"
    from(
       select a2.ID,
           MIN(a2.UNIT_ID) "MIN_UNITID",
           MAX(a2.UNIT_ID) "MAX_UNITID",
           SUM(a2.UNIT_ID) "SUM_UNITID"
       from VIEW_A x2, VIEW_B b
       where NVL( b.CD_ACTIVITY, 'I' ) = 'I' and a2.ID = b.ID and a2.UNIT_ID = b.UNIT_ID
       group by a2.ID ) T,
       VIEW_A a
    where a.ID = T.ID
    group by a.ID


    create or replace view VIEW_Y as
    select t.ID,
       SUBSTR(
         MAX(
           CASE WHEN T.UNITROWNUM = 1 THEN CHR(10) || T.COL_X || ' ' || T.COL_Y
              ELSE NULL
           END )
         ||
         MAX(
           CASE WHEN T.UNITROWNUM = 2 THEN CHR(10) || T.COL_X || ' ' || T.COL_Y
              ELSE NULL
           END )
         ||
         MAX(
           CASE WHEN T.UNITROWNUM = 3 THEN CHR(10) || T.COL_X || ' ' || T.COL_Y
              ELSE NULL
           END ), 2 ) "LABEL"
    from(
       select /*+ INDEX( a IDX_TABLEA_ID)
                 INDEX( b IDX_TABLEB_ID ) */
           a.ID,
           a.COL_X,
           a.COL_Y,
           ROW_NUMBER() OVER(
             PARTITION BY a.ID
             ORDER BY a.UNIT_ID ) AS "UNITROWNUM"
       from VIEW_A a,
          VIEW_B b
       where NVL( b.CD_ACTIVITY, 'I' ) = 'I' AND a.ID=b.ID AND
         a.UNIT_ID = b.UNIT_ID ) T
    group by T.ID


    For the test, I executed the same query 4 times against each view, but I massaged the data between each execution so that it tested the cases where there were 3, 2, 1 or 0 units for the item. I was surprised at the difference in the amount of work that is performed using the analytic function. Contrast the two below:

    Execution plan for statement without the analytic function:

    ********************************************************************************

        select *
        from VIEW_X
        where ID=3852112

    call    count      cpu    elapsed      disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        4      0.03      0.03          0          0          0          0
    Execute      4      0.00      0.00          0          0          0          0
    Fetch        4      0.00      0.00          0        120          0          3
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total      12      0.03      0.03          0        120          0          3

    Misses in library cache during parse: 1
    Optimizer goal: CHOOSE
    Parsing user id: 19  (MY_USER)

    Rows    Execution Plan
    -------  ---------------------------------------------------
          0  SELECT STATEMENT  GOAL: CHOOSE
          1  VIEW OF 'VIEW_X'
          1    SORT (GROUP BY)
          3    FILTER
          4      MERGE JOIN (CARTESIAN)
          2      VIEW
          2        SORT (GROUP BY)
          3        FILTER
          4          NESTED LOOPS
          4          TABLE ACCESS  GOAL: ANALYZED (BY INDEX ROWID)
                          OF 'TABLE_A'
          4            INDEX  GOAL: ANALYZED (RANGE SCAN) OF
                          'IDX_TABLEA_ID' (NON-UNIQUE)
          6          TABLE ACCESS  GOAL: ANALYZED (BY INDEX ROWID)
                          OF 'TABLE_B'
        12            INDEX  GOAL: ANALYZED (RANGE SCAN) OF
                          'IDX_TABLEB_ID' (NON-UNIQUE)
          3          NESTED LOOPS
          6          TABLE ACCESS  GOAL: ANALYZED (BY INDEX ROWID)
                          OF 'TABLE_B'
          6            INDEX  GOAL: ANALYZED (RANGE SCAN) OF
                          'IDX_TABLEB_KEY' (NON-UNIQUE)
          3          INDEX (UNIQUE SCAN) OF 'IDX_TABLEC'
                          (UNIQUE)
          0          INDEX (UNIQUE SCAN) OF 'IDX_TABLEC' (UNIQUE)

          3          NESTED LOOPS
          6          TABLE ACCESS  GOAL: ANALYZED (BY INDEX ROWID)
                          OF 'TABLE_A'
          6            INDEX  GOAL: ANALYZED (RANGE SCAN) OF
                          'IDX_TABLEA_KEY' (NON-UNIQUE)
          3          INDEX (UNIQUE SCAN) OF 'IDX_TABLEC'
                          (UNIQUE)
          0          INDEX (UNIQUE SCAN) OF 'IDX_TABLEC' (UNIQUE)

          4      SORT (JOIN)
          3        TABLE ACCESS  GOAL: ANALYZED (BY INDEX ROWID) OF
                      'TABLE_A'
          4        INDEX  GOAL: ANALYZED (RANGE SCAN) OF
                        'IDX_TABLEA_ID' (NON-UNIQUE)
          3      NESTED LOOPS
          6      TABLE ACCESS  GOAL: ANALYZED (BY INDEX ROWID) OF
                      'TABLE_A'
          6        INDEX  GOAL: ANALYZED (RANGE SCAN) OF
                      'IDX_TABLEA_KEY' (NON-UNIQUE)
          3      INDEX (UNIQUE SCAN) OF 'IDX_TABLEC' (UNIQUE)
          0      INDEX (UNIQUE SCAN) OF 'IDX_TABLEC' (UNIQUE)

    ********************************************************************************


    Execution plan for statement with the analytic function:

    ********************************************************************************

        select *
        from VIEW_Y
        where ID=3852112

    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------ -------- ---------- ---------- ---------- ---------- ----------
    Parse        4      0.24       0.75          0          0          0           0
    Execute      8      0.00       0.00          0          0          0           0
    Fetch        4      8.27      14.30        284     303636          8           3
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total       16      8.51      15.05        284     303636          8           3

    Misses in library cache during parse: 4
    Optimizer goal: CHOOSE
    Parsing user id: 19  (MY_USER)

    Rows     Execution Plan
    -------  ---------------------------------------------------
          0  SELECT STATEMENT   GOAL: CHOOSE
          1   VIEW OF 'VIEW_Y'
          1    SORT (GROUP BY)
          3     VIEW
       6818      WINDOW (SORT)
       6818       FILTER
       6820        NESTED LOOPS
       6820         TABLE ACCESS   GOAL: ANALYZED (BY INDEX ROWID) OF
                        'TABLE_A'
       6820          INDEX   GOAL: ANALYZED (FULL SCAN) OF
                         'IDX_TABLEA_ID' (NON-UNIQUE)
      13638         TABLE ACCESS   GOAL: ANALYZED (BY INDEX ROWID) OF
                        'TABLE_B'
      16516          INDEX   GOAL: ANALYZED (RANGE SCAN) OF
                         'IDX_TABLEB_ID' (NON-UNIQUE)
       6818        NESTED LOOPS
      13636         TABLE ACCESS   GOAL: ANALYZED (BY INDEX ROWID) OF
                        'TABLE_B'
      13636          INDEX   GOAL: ANALYZED (RANGE SCAN) OF
                         'IDX_TABLEB_KEY' (NON-UNIQUE)
       6818         INDEX (UNIQUE SCAN) OF 'IDX_TABLEC' (UNIQUE)
          1        INDEX (UNIQUE SCAN) OF 'IDX_TABLEC' (UNIQUE)
       6818        NESTED LOOPS
      13636         TABLE ACCESS   GOAL: ANALYZED (BY INDEX ROWID) OF
                        'TABLE_A'
      13636          INDEX   GOAL: ANALYZED (RANGE SCAN) OF
                         'IDX_TABLEA_KEY' (NON-UNIQUE)
       6818         INDEX (UNIQUE SCAN) OF 'IDX_TABLEC' (UNIQUE)
          0        INDEX (UNIQUE SCAN) OF 'IDX_TABLEC' (UNIQUE)

    ********************************************************************************


    FWIW, tables A and B in my original example aren't really tables, they are views. In the execution plans above, VIEW_A corresponds to the original TABLE_A and is defined across the actual TABLE_A and TABLE_C. VIEW_B corresponds to the original TABLE_B and is defined across the actual TABLE_B and TABLE_C.

    IDX_TABLEA_ID is a single column index on the ID column of the actual TABLE_A.
    IDX_TABLEB_ID is a single column index on the ID column of the actual TABLE_B.
    IDX_TABLEA_KEY is a single column index on the KEY column of the actual TABLE_A.
    IDX_TABLEB_KEY is a single column index on the KEY column of the actual TABLE_B.
    IDX_TABLEC is an index on the actual TABLE_C which is used to join it to either TABLE_A or TABLE_B in VIEW_A and VIEW_B respectively based upon their KEY column values.

    TABLE_A has 6820 rows in our test database.
    TABLE_B has 124,075 rows.

    Both tables and all of their indexes have current computed statistics.

    I have a solution that works that doesn't use the analytic function, but my curiosity has been piqued as to why so many more rows are processed using the analytic function. I would like to think that using them is an acceptable approach to this type of problem because the only reason that I could find a solution without using the analytic function was that the specific business case allowed me to only worry about a maximum of three units per item. My non-analytic function solution will not work if there could be more than three units, so I would like to have an answer for this type of problem when that assumption is no longer valid.

    Is this the nature of the analytic functions that they process so many rows, or is this caused by how I wrote the statement?

    Thanks,

    Heath


Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width