Synopsis. Oracle 10g Release 2 (10gR2) has
improved significantly the methodology for tracking performance metrics within
the database. This article – the last in this series – demonstrates how to implement
adaptive thresholds and use their metrics to effectively detect significant
performance threshold violations while tuning out the "noise" from false
positives.
In the prior
article in this series, I discussed how Oracle 10gR2's new adaptive
thresholds feature can improve the detection of threshold violations for
about 15 critical metrics and how they can filter out false positives to
provide more accurate warning mechanisms. In this article, I will spend
a lot less time on theory. Instead, I will concentrate on how simple it is to
set up adaptive thresholds with some actual examples of how to create metric
baselines using both static metrics and rolling time periods. I will also
demonstrate how to set up adaptive thresholds using metric baselines to elevate
your Oracle 10gR2 database's warning and critical thresholds to new heights of
sensitivity and meaningfulness.
Enabling Metric Baselines
Before I create any metric baselines, however, I need to
activate the metric baseline feature from within Oracle 10g Enterprise Manager
Database Control. When I select the Metric Baselines breadcrumb for my test
database, the Enterprise Manager screen shown in Figure
2.1 is displayed when no metric baselines have been enabled yet. After
I click on the Enable Metric Baselines button, Enterprise Manager
requests a final confirmation; then, when I click on the OK
button, metric baselines are automatically activated.
Creating a Metric Baseline
My next step is to create a metric baseline. Oracle
10gR2 allows me to do this using either statistics gathered from a rolling
time period immediately prior to today, or from a static but specific
range of dates. Figure
2.2 shows the Enterprise Manager screen that determines which method to
use. Note that I can use a time frame as short as the past seven days for the rolling
time period – the minimum for the creation of any metric baseline – up to
the maximum of a rolling 91-day time period.
On the other hand, to create a static metric baseline,
I instead specify the name of the new static baseline as well as a range of
dates. This date range must correspond to a minimum of seven contiguous days.
The time period chosen should be representative of the type of adaptive
thresholds that utilize these metrics. For example, if I know that my database
has been performing in a relatively stable fashion during a particular time
period, the corresponding statistics would be valuable metrics for comparison
against future performance.
Once I've selected a time frame, I can just click on the Compute
Statistics button, and Oracle 10gR2 will determine if there are
minimum sufficient data to facilitate creation of at least one adaptive
threshold measurement. Figure
2.3 shows Enterprise Manager's response if there are insufficient data.
Note that Oracle marks with a warning symbol any metrics that cannot be
determined; also, no data will be displayed when I click into any of the
"eyeglass" icons for each faulty metric.
Even in this case, all is not lost! I could also change the
variables that are used to calculate the time series for the metric static
baseline. For example, I could change the granularity of the Day Grouping
series from By Hour of Day to By Day of Week; I could also modify
the Week Grouping to reflect Weekends and Weekdays instead of By
Day of Week, or even eliminate a grouping scheme altogether. Once I've done
this, I would simply click on the Compute Statistics button, and Oracle
10gR2 will once again determine if there are minimum sufficient data to create
the static metric baseline.
On the other hand, if sufficient data do exist, then Oracle
will display a checkmark to indicate this. Figure
2.4 shows Enterprise Manager's response when there are sufficient data
to create the metric baseline. In this case, I have created a new static metric
baseline, MB1 using all available data for the time period between March
12, 2005 and March 21, 2005. I can simply click into each of the
"eyeglass" icons to see how Oracle 10gR2 will calculate the values for the
metric baseline. Clicking the OK button on this page completes the
creation of the metric baseline, establishing it for immediate use with
adaptive thresholds. (Remember from the previous article that Oracle 10gR2
permits the creation and retention of multiple static baselines for use
by adaptive thresholds.)
Creating Adaptive Thresholds from a Metric Baseline
Now that I've established a valid static metric baseline, it
is time to use that baseline to activate appropriate adaptive thresholds. As
shown in Figure
2.5, I will select the MB1 baseline from the Metric Baselines
screen by clicking the Set Adaptive Thresholds button.
Oracle then passes control to the Manage Adaptive
Thresholds page (Figure
2.6). From this screen, I can set specific adaptive thresholds for each
of the displayed metrics. In this example, I have selected just the Number of
Transactions (per second) adaptive threshold for editing; once selected, I can
then set up adaptive thresholds for the metric by clicking on the Edit
button on this page.
Figure
2.7 shows the two different sets of options available for setting
adaptive thresholds:
-
If I choose the Significance Level option, Oracle will use
percentiles to determine when an adaptive threshold has been breached. For this
example, I've set the warning and critical levels for Number of Transactions to
High and Very High. Oracle will then raise a warning alert when
the number of transactions measured reach 95% of the adaptive threshold value
as of the time specified; likewise, Oracle will raise a critical alert when 99%
is reached.
-
The other option, Percentage of Maximum, Oracle will
determine when an adaptive threshold has been breached based on the specified
percentages, using the 99th percentile value for the calculation.
For example, if I set the warning and critical threshold values to 125% and
150% respectively for the Number of Transactions metric, and the 99th
percentile value is 10 logins as of a particular point in time, then a warning
alert would be raised at 13 and 16 logins, respectively.
-
Note that in either case, I can also specify the total concurrent
number of threshold violations that must occur before an alert is raised.
If I click on the "eyeglass" icon for any of the Metric
Names displayed, Oracle 10gR2 will show me a graph (Figure
2.8) that demonstrates how adaptive thresholds will be applied for the
selected metric. Note that there are two distinct sets of adaptive thresholds:
one for normal daytime hours (7AM – 7PM) and one for normal nighttime
and early morning hours (7PM – 7AM). When I created the MB1 static metric
baseline, I selected the By Day and Night option for grouping the
baseline's statistics.
Using Adaptive Thresholds: It's the Deflection That Counts
Now that I have established at least one adaptive threshold,
I can monitor them via what I like to call a deflection graph. For each
of the three categories of baselined metrics (Performance Metrics, Workload
Volume Metrics, and Workload Type Metrics), I can choose and
position three different metrics for display in the deflection graph. For this
example, I have selected the nine metrics as shown in Figure
2.9 on the Configure Normalization Metrics page. I accessed this
page by clicking on the Baseline Normalized Metrics breadcrumb from the
Metrics Baseline page (see Figure
2.4).
Since all of the metrics have been selected, Oracle 10gR2
provides a series of graphs that show the deflection from the established
adaptive thresholds for the metrics. Figure
2.10 shows a typical display of these deflections on the Baseline
Normalized Metrics page, which I have accessed from its corresponding
breadcrumb on the Metric Baselines page. The most interesting feature of
these graphs is that they have no X-axis; indeed, they all essentially use the
same axis because they really represent the common deflection as measured
across all adaptive thresholds.
Testing Adaptive Thresholds: Observing the Deflection
Note that the graph that represents the Number of
Transactions metric shows virtually no deflection from the established adaptive
thresholds, while many of the other metrics do indeed show some apparent
deflections. However, that's about to change dramatically in my next
demonstration.
I first executed the SQL code in Listing
2.1 against my Oracle database at 7:55 PM, passing it a value of 40
to simulate a rapid increase in the number of transactions per second (i.e.
over 40/sec). At 8:01 PM, I once again ran the same code, but passed in a value
of 12. In both cases, this caused an obvious and observable deflection
in the graph on the Baseline Normalized Metrics page for the Number of
Transactions (per second) graph as shown in the highlighted section of the
graph in Figure
2.11.
When I executed the same SQL code with a value of six (6)
at 8:05 PM, however, Oracle doesn't even detect the deflection because the
upper boundary of significance is below ten (10) transactions per second as
of that the time of day. This is the essence of adaptive thresholds: Oracle
will only signal that a warning or critical threshold has been
breached when the violation exceeds the threshold that's in force as of the
specific point in time.
One final point of interest on the page that displays
these deflection graphs: I can also "tune out" even more of the "white noise"
for the adaptive thresholds in force by selecting the High setting
instead of the Medium or Low setting for Noise Reduction
as shown in Figure
2.11. These settings tend to filter out even more of the
less-noticeable deflections, thus yielding a more obvious view of the true
"spikes" that have been detected so far.
Deactivating Adaptive Thresholds
If I want to deactivate all adaptive thresholds, I simply
choose the Disable Metric Baselines button on the Metric Baselines page.
As Figure
2.12 shows, Oracle 10gR2 prompts me for a final confirmation; once I
click on the OK button, all adaptive thresholds will be deactivated.
Caveats
Finally, a warning: Oracle strongly recommends
gaining experience with adaptive thresholds in a test environment before
implementing them in a production environment. Since these thresholds are more
complex to understand than regular arithmetic thresholds, and since it is
possible to tune out "noise" accidentally, it is not impossible to configure
them incorrectly the first time out. I also heartily recommend running a
representative workload against the test environment to insure that threshold
violations are being trapped as expected!
Conclusion
With the addition of Adaptive Thresholds to my arsenal of
DBA tools, Oracle 10gR2 lets me monitor several critical performance metrics
for my database based on their significance as of a specific point in time.
With a few simple mouse clicks in Oracle 10gR2 Enterprise Manager, I can easily
construct the necessary baseline metrics for either a specified historical time
frame or based on a rolling time period for the last several days, weeks, or
even months. Once the baselines are established, a few more mouse clicks set up
the adaptive thresholds that Oracle 10gR2 then monitors for violations. These
adaptive thresholds help to eliminate those irritating "false positives" that
plain arithmetic thresholds tend to detect erroneously, but they also tend to
detect "near-miss" violations that plain arithmetic thresholds most often
ignore.
References and Additional Reading
Even though I've hopefully provided enough technical
information in this article to encourage you to explore with these features, I
also strongly suggest that you first review the corresponding detailed Oracle
documentation before proceeding with any experiments. Actual implementation of
these features should commence only after a crystal-clear understanding exists.
Please note that I've drawn upon the following Oracle 10gR2 documentation for
the deeper technical details of this article:
B14214-01 Oracle Database
New Features Guide
B14231-01 Oracle Database
Administrator's Guide
B16241-01 Oracle Database
Enterprise Manager Concepts
Previous
Back to DBAsupport.com