DBAsupport.com Forums - Powered by vBulletin
Results 1 to 4 of 4

Thread: when is histogram appropriate

  1. #1
    Join Date
    Apr 2001
    Posts
    3

    Question

    when is histogram appropriate

  2. #2
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    IMHO, histograms are most useful in the following 2 situations:

    - Severely Skewed distributions
    If you have a lookup code that can be 1 of 5 values, an even distribution would have 1/5th, or 20% of the records containing each value. Severely skewed statistics would be more like:
    Value1 - 60%
    Value2 - 35%
    Value3 - 3%
    Value2 - 1%
    Value1 - 1%

    In such circumstances, it is almost impossible for the optimizer to choose an execution plan that will be optimal for all values. A Histogram will give it the information it needs to make the proper choices for each different value.

    - 'Extreme' Data Values
    This is simply an extrapolation of the first case.
    Say, for example, you have an indexed date field and 99.9% of the values are within the past ten years. Now what happens if there is a single date that has the year 1895? Such an entry will severely affect the optimizer's view of the data in that field. The ANALYZE process finds the lowest and highest values and stores these stats. The optimizer assumes that the distribution between these low and high values is 'flat'. That extreme date value will make the optimizer think that you have just as many 1899 entries as 1999 entries, which is certainly not the case. Again, the optimizer will make bad choices based on this incomplete data. In particular, it will affect ranged queries, which are quite common in date fields, hence my example. ie WHERE DateColumn BETWEEN '1/1/00' and '2/1/00'

    In both of these cases, creating a histogram will provide the optimizer with enough information to make 'good' decisions when building the execution plans for statements using these fields.

    Now, there are several problems with histograms.

    - They *do not* work with binds. The values *must* be hard-coded in the statement. This means that they also will not work with CURSOR_SHARED=FORCE.

    - Hard-coding these values will mean more statements in your shared pool and more parsing.

    - Histograms must be re-built after every ANALYZE, which wipes them out.

    - I hit a problem that I cannot find any corroboration on with exporting and re-importing stats with histograms via the DBMS_STATS package.

    - Histograms go out the window with distributed queries.

    However, given all that, they were still a life-saver for performance on my current project where many columns were already using hard-coded values, several columns have 'extreme' date values and *every* bloody column has severely-skewed distributions.

    HTH,

    - Chris

  3. #3
    Join Date
    Nov 2000
    Posts
    212
    Originally posted by chrisrlong
    IMHO, - They *do not* work with binds. The values *must* be hard-coded in the statement. This means that they also will not work with CURSOR_SHARED=FORCE.

    - Hard-coding these values will mean more statements in your shared pool and more parsing.
    If your system is OLTP or running 24x7 and/or has heavy load by many SQL with hardcoded values and it depends on Oracle version as well (some say 816 is quite ok), then the point above can be real performance killer (even cursor_shared=force will consume 25% of CPU more per parse).
    The ixora site and OAUGnet-DBA mailing list has A LOT OF problem descriptions regarding that and my experience is that while on development system you may not experience the problem on a real system it may mean system almost freezing.

    If your system suffers from that the only solutions folks found are NOT TO USE hardcoded values(this is a stringly recomended solution ) or run alter system flush shared pool on some regular basis, e.g. before starting work day.

    The real problem is, as noted by some authors, that Oracle does contradicts itself: binding of variables is ok for shared pool performance, but not for optimizer, hardcoding of values can be a killer for shared pool but is good for optimizer.

  4. #4
    Join Date
    Mar 2000
    Location
    Chennai.Tamilnadu.India.
    Posts
    658

    What is Histogram ?

    Hi, 25th April 2001 17:34 hrs chennai

    Histogram===>These are statistics collected by ANALYZE command which is stored in data dictionary views and used ny the CBO to tune the execution of SQL statement.

    CBO==>Cost Based Optimizer.

    For Uniformly distributed data the CBO accurately guesses the cost of executing a particulat statement.However when the data is not uniformly distributed the optimizer cannot accurately estimate the selectivity of a query .In this casesHISTOGRAMS are necessary to describe the distribution of a particular columns.

    When to USe Hostograms
    -----------------------------

    For columns frequently used in WHERE clauses, i.e indexed columns and moreover having highly skewed data distribution.

    When Not to USe Hostograms
    ----------------------------------
    On columns used in predicate with bind variables
    On uniformly distributed data columns
    on unique cols used in equality predicates

    How to generate Histograms Statistics
    --------------------------------------------
    Analyze Table compute statistics for columns
    size 10;

    The SIZE colum states the max number of buckets

    If the number of occuring distinct values in a col is relatively small then it is useful to set the number of buckets to be greater than the number of frequentlu occuring distinct values.

    D views
    =====
    DBA_HISTOGRAMS
    DBA_TAB_HISTOGRAMS
    DBA_PART_HISTOGRAMS

    Cheers

    Padmam
    Attitude:Attack every problem with enthusiasam ...as if your survival depends upon it

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •  


    Click Here to Expand Forum to Full Width