Synopsis.
Oracle 11g Release 2 (11gR2) provides excellent tools that help isolate and
identify potential root causes of poor I/O performance. This article – the sixth
in this series – illustrates the wealth of I/O performance tuning information
that Oracle 11gR2’s Automatic Workload Repository (AWR) reports provide, how to
use that information to determine if any I/O performance bottlenecks actually
exist, and how to form initial hypotheses to resolve those bottlenecks within
the database’s underlying input/output (I/O) subsystem.
The prior
article in this series continued the demonstration
of how to use Quest Software’s Benchmark Factory tool in concert with Oracle
Database 11gR2 to:
-
Generate repeatable and realistic database application workloads
-
Capture
appropriate metrics from the target Oracle database and operating
system to quantify I/O response times and
throughput
Now
that I’ve at last successfully generated both TPC-H and TPC-E workloads, it’s
time to turn my attention to how well my database’s I/O infrastructure is
actually performing … and that means it’s time to pore over the resulting
performance metrics that Oracle 11gR2
is quite excellent at capturing. In this article I’ll therefore concentrate on:
-
Evaluating AWR reports
and OS statistics for possible indication of any I/O performance
bottlenecks
-
Formulating
initial hypotheses on how well my database’s storage infrastructure
is responding to different application workloads
I/O Performance Analysis: First Steps
One
of the features of Quest’s
Benchmark Factory tool that I really appreciate is the ability to view in
real time how the simulated application workload being generated is affecting
components of my database system’s performance. Figure 6.1.1 below shows an example of the TPC-E 50-User
application workload generation at just about the halfway mark of its execution
cycle. As you can see, several “spikes” have occurred as the workload
progressed, most notably in the average execution time of the workload and the
average throughput in bytes per second.
Figure 6.1.1. Benchmark Factory: Displaying RealTime Statistics
While
this is certainly interesting information, it really tells me very little about
how well the database system is really handling the generated workload’s
demand. To find that out, I’ll turn my attention to what I’ve found is the most
valuable avenue for attacking just about any Oracle database performance issue,
whether perceived or real: the Automatic Workload Repository (AWR)
reports for the time period(s) when sub-optimal or questionable database response
times have been reported.
(Yes,
I know: AWR is a separately licensed feature of the Oracle 11gR2 Diagnostic Pack, but in my experience,
it has become the de facto database
performance tuning tool for most Oracle database shops. While Statspack reports
will certainly provide sufficient
details for I/O performance tuning, AWR reports offer significantly more
detail, are easier to navigate, and are more efficient to execute because they access
performance metric data that’s already been captured within AWR snapshots …
and those snapshots were gathered in real time using direct memory access.)
I’ve
captured the complete contents of the AWR reports for both the TPC-E 50-user workload as well as for the TPC-H 15-user workload, but for now I’ll
focus on separate sections of these reports to demonstrate what specific
information I normally investigate to determine if there are any real I/O
performance-related issues.
Comparing AWR Reports from Different Periods.
It’s not unusual for Oracle DBAs to support competing workloads at different
times of the month, week, and even day – especially for “hybrid” databases that
may have started out as OLTP platforms, but now encapsulate a DSS or OLAP
reporting component. But trying to identify the I/O performance “footprints”
that represent application workloads over different timeframes can be downright
tedious! I’ve found that many Oracle DBAs aren’t aware of one of my favorite AWR
features: its ability to highlight dramatic differences between two different application
workloads by directly comparing their corresponding performance metrics via its
Compare Periods
feature.
To
illustrate, I’ve created an AWR
Compare Period Report that shows the difference between the TPC-E and TPC-H
workloads. What I really like about this feature is that it immediately
highlights the major differences between the application workloads generated
during these two timeframes. For example, note the significant difference
between:
-
The number of physical
reads in the TPC-H workload versus the number of physical writes in the TPC-E workload
-
The number of redo
transactions created during TPC-E versus TPC-H workload execution
-
The amount of logical
reads - almost three times more! - performed from the database’s buffer cache in the TPC-E workload versus
the TPC-H workload
I/O Performance Analysis: Top Five Wait Events
The
first thing I’ll focus on is whether or not there are any I/O related wait events
that need investigation, and I’ll do that by reviewing the Top Five Wait Events section in each of
the AWR reports. I’ve reproduced them for the TPC-E and TPC-H application
workloads in Figure 6.2.1 and Figure 6.2.2 below. (For an excellent
summary of what each of these wait events actually portend, I recommend Section
10.3 of the Oracle Database 11gR2
Performance Tuning Guide.)
|
Event
|
Waits
|
Time(s)
|
Avg wait (ms)
|
% DB time
|
Wait Class
|
|
direct path read
|
248,069
|
46,559
|
188
|
83.28
|
User I/O
|
|
db file sequential read
|
16,846
|
4,249
|
252
|
7.60
|
User I/O
|
|
db file parallel read
|
481
|
1,436
|
2985
|
2.57
|
User I/O
|
|
DB CPU
|
|
1,123
|
|
2.01
|
|
|
enq: KO - fast object checkpoint
|
666
|
900
|
1351
|
1.61
|
Application
|
Figure 6.2.1. TPC-E Workload: Top Five Wait Events
|
Event
|
Waits
|
Time(s)
|
Avg wait (ms)
|
% DB time
|
Wait Class
|
|
direct path write temp
|
86,267
|
17,790
|
206
|
55.13
|
User I/O
|
|
direct path read
|
139,711
|
6,875
|
49
|
21.30
|
User I/O
|
|
direct path read temp
|
1,530,266
|
6,208
|
4
|
19.24
|
User I/O
|
|
DB CPU
|
|
1,367
|
|
4.24
|
|
|
log file sync
|
60
|
4
|
72
|
0.01
|
Commit
|
Figure 6.2.2. TPC-H Workload: Top Five Wait Events
From
these two report sections, it’s pretty obvious that my database is performing
significant amounts of I/O during each of these application workload execution
timeframes…and that’s not necessarily a bad thing. After all, if my
database is spending most of its time executing SQL statements and either
reading data from or writing data to my database’s datafiles, then it’s doing
work rather than waiting for, say, a latch or mutex to be released. What’s
important at this stage of the analysis is to realize that there may indeed be
opportunities for tuning the I/O subsystem or even the database instance itself
to take better advantage of what’s clearly already a pretty well-tuned set of
SQL statements.
I/O Performance Analysis: I/O Profiles
I/O Profile Section. Starting in Oracle 11gR1, I/O-related performance metrics and
statistics are also available, both from within Enterprise Manager for
real-time I/O performance “firefighting” and within AWR snapshots. Oracle 11g
breaks out this information across several dimensions:
-
I/O By Function.
This table shows how read and write I/O activity is distributed across the
various foreground and background processes for the database instance. I’ve
reproduced this report section for the TPC-E application workload in Figure 6.3.1, which makes it immediately
obvious that the majority of the I/O activity is the direct physical loading (via
physical reads) of data blocks into the database buffer cache.
-
I/O By File Type.
As the sample from the TPC-E workload in Figure
6.3.2 shows, this table offers a completely different perspective: which
type of file is absorbing the heaviest read or write workload. For the TPC-E
application workload, I/O activity is obviously concentrated upon reading from
the database’s data files. Also interesting is that small (random) reads
outnumber large (sequential) reads by a factor of almost 150%. – again, not a complete
surprise for an intense OLTP workload.
-
Combined I/O Within
Function and File Type. This table combines the two dimensions for a
closer look at which background processes are most heavily performing I/O
against which type(s) of files.
|
Function Name
|
Reads: Data
|
Reqs per sec
|
Data per sec
|
Writes: Data
|
Reqs per sec
|
Data per sec
|
Waits: Count
|
Avg Tm(ms)
|
|
Direct Reads
|
240.3G
|
183.84
|
182.304
|
0M
|
0.00
|
0M
|
248.1K
|
187.14
|
|
Buffer Cache Reads
|
295M
|
17.27
|
.218603
|
0M
|
0.00
|
0M
|
19.5K
|
379.93
|
|
Others
|
55M
|
2.33
|
.040756
|
91M
|
1.47
|
.067433
|
4384
|
8.60
|
|
DBWR
|
0M
|
0.00
|
0M
|
35M
|
1.87
|
.025936
|
837
|
346.45
|
|
LGWR
|
0M
|
0.00
|
0M
|
20M
|
3.23
|
.014820
|
0
|
|
|
Direct Writes
|
0M
|
0.00
|
0M
|
1M
|
0.01
|
.000741
|
0
|
|
|
Streams AQ
|
0M
|
0.01
|
0M
|
0M
|
0.00
|
0M
|
10
|
69.70
|
|
TOTAL:
|
240.6G
|
203.45
|
182.564
|
147M
|
6.58
|
.108931
|
272.8K
|
198.52
|
Figure 6.3.1. TPC-E Workload: I/O Profile Within Function
|
Filetype Name
|
Reads: Data
|
Reqs per sec
|
Data per sec
|
Writes: Data
|
Reqs per sec
|
Data per sec
|
Small Read
|
Large Read
|
|
Data File
|
240.5G
|
201.17
|
182.528
|
38M
|
1.89
|
.028159
|
308.56
|
187.42
|
|
Control File
|
49M
|
2.22
|
.036310
|
22M
|
1.01
|
.016302
|
3.68
|
37.67
|
|
Flashback Log
|
3M
|
0.01
|
.002223
|
46M
|
0.37
|
.034087
|
15.88
|
21.00
|
|
Other
|
3M
|
0.04
|
.002223
|
21M
|
0.08
|
.015561
|
8.00
|
8.86
|
|
Log File
|
0M
|
0.00
|
0M
|
20M
|
3.24
|
.014820
|
|
|
|
Temp File
|
0M
|
0.00
|
0M
|
0M
|
0.00
|
0M
|
116.50
|
|
|
TOTAL:
|
240.6G
|
203.44
|
182.568
|
147M
|
6.59
|
.108931
|
273.77
|
187.40
|
Figure 6.3.2. TPC-E Workload: I/O Profile Within File Type
|
Function/File Name
|
Reads: Data
|
Reqs per sec
|
Data per sec
|
Writes: Data
|
Reqs per sec
|
Data per sec
|
Waits: Count
|
Avg Tm(ms)
|
|
Direct Reads
|
240.3G
|
183.84
|
182.304
|
0M
|
0.00
|
0M
|
0
|
|
|
Direct Reads (Data File)
|
240.3G
|
183.84
|
182.304
|
0M
|
0.00
|
0M
|
0
|
|
|
Buffer Cache Reads
|
294M
|
17.27
|
.217862
|
0M
|
0.00
|
0M
|
18.5K
|
244.49
|
|
Buffer Cache Reads (Data File)
|
294M
|
17.27
|
.217862
|
0M
|
0.00
|
0M
|
18.5K
|
244.49
|
|
Others
|
52M
|
2.29
|
.038533
|
70M
|
1.39
|
.051872
|
3601
|
7.41
|
|
Others (Control File)
|
49M
|
2.22
|
.036310
|
21M
|
1.01
|
.015561
|
2997
|
3.74
|
|
Others (Flashback Log)
|
3M
|
0.01
|
.002223
|
47M
|
0.37
|
.034828
|
523
|
27.71
|
|
Others (Data File)
|
0M
|
0.06
|
0M
|
2M
|
0.01
|
.001482
|
81
|
12.04
|
|
DBWR
|
0M
|
0.00
|
0M
|
35M
|
1.87
|
.025936
|
0
|
|
|
DBWR (Data File)
|
0M
|
0.00
|
0M
|
35M
|
1.87
|
.025936
|
0
|
|
|
LGWR
|
0M
|
0.00
|
0M
|
20M
|
3.24
|
.014820
|
0
|
|
|
LGWR (Log File)
|
0M
|
0.00
|
0M
|
20M
|
3.24
|
.014820
|
0
|
|
|
Direct Writes
|
0M
|
0.00
|
0M
|
1M
|
0.01
|
.000741
|
0
|
|
|
Direct Writes (Data File)
|
0M
|
0.00
|
0M
|
1M
|
0.01
|
.000741
|
0
|
|
|
Streams AQ
|
0M
|
0.01
|
0M
|
0M
|
0.00
|
0M
|
10
|
69.70
|
|
Streams AQ (Data File)
|
0M
|
0.01
|
0M
|
0M
|
0.00
|
0M
|
10
|
69.70
|
|
TOTAL:
|
240.6G
|
203.41
|
182.561
|
126M
|
6.51
|
.093369
|
22.1K
|
205.84
|
Figure 6.3.3. TPC-E Workload: I/O Profile Within Function and File Type
I’ve
provided a
link to the I/O Profile Section for the TPC-H workload’s corresponding AWR
report.
I/O Performance Analysis: Tablespace and Datafile I/O Profiles
The
final two sections of the AWR report that are most helpful for isolating I/O
bottlenecks describe the I/O patterns for the database’s tablespaces and each
tablespace’s corresponding datafiles. For the TPC-E workload, I’ve captured
just the first five tablespaces that are the “hottest” in terms of combined
read/write activity. This report section makes it simple to see that the TPCE_TRADE_DATA
tablespace is getting pummeled the hardest. This makes sense since that
tablespace contains the data segments for table TPCE.E_TRADE_DATA, and that table
is most heavily accessed during the TPC-E workload:
|
Tablespace
|
Reads
|
Av Reads/s
|
Av Rd(ms)
|
Av Blks/Rd
|
Writes
|
Av Writes/s
|
Buffer Waits
|
Av Buf Wt(ms)
|
|
TPCE_TRADE_DATA
|
248,884
|
184
|
3.24
|
126.11
|
994
|
1
|
244
|
330.12
|
|
TPCE_TRADE_HISTORY_IDX
|
2,581
|
2
|
159.05
|
1.25
|
11
|
0
|
95
|
320.21
|
|
TPCE_SETTLEMENT_DATA
|
2,313
|
2
|
164.38
|
1.29
|
265
|
0
|
25
|
244.40
|
|
TPCE_DAILY_MARKET_DATA
|
2,417
|
2
|
57.19
|
35.45
|
8
|
0
|
1,037
|
126.75
|
|
TPCE_TRADE_HISTORY_DATA
|
2,368
|
2
|
185.03
|
1.25
|
29
|
0
|
14
|
165.71
|
Figure 6.4.1. TPC-E Workload: Five “Hottest” Tablespaces
The
picture is quite different, of course, for the corresponding tablespaces for
the TPCH schema objects. Here, just two of the
several tablespaces were accessed predominantly during that workload’s
execution, and most prominent was the temporary tablespace named TPCH_BIGTEMP. This
is not unexpected, since the TPC-H workload is meant to simulate a DSS
environment, and that means there’s going to be a fair amount of sorting and
aggregation that takes place outside the database’s program global area (PGA):
|
Tablespace
|
Reads
|
Av Reads/s
|
Av Rd(ms)
|
Av Blks/Rd
|
Writes
|
Av Writes/s
|
Buffer Waits
|
Av Buf Wt(ms)
|
|
TPCH_BIGTEMP
|
1,530,257
|
668
|
0.00
|
6.89
|
86,242
|
38
|
0
|
0.00
|
|
TPCH_REF_DATA
|
139,842
|
61
|
0.03
|
123.51
|
0
|
0
|
47
|
50.43
|
|
SYSAUX
|
140
|
0
|
0.00
|
1.86
|
492
|
0
|
0
|
0.00
|
|
UNDOTBS1
|
71
|
0
|
0.00
|
1.00
|
364
|
0
|
0
|
0.00
|
|
SYSTEM
|
0
|
0
|
0.00
|
0.00
|
37
|
0
|
0
|
0.00
|
Figure 6.4.2. TPC-H Workload: Five “Hottest” Tablespaces
Datafile I/O Metrics. Since this section of
the AWR report tends to be among the lengthiest, I’ve provided links to the
extremely granular datafile I/O information for the TPC-E workload and the TPC-H workload, respectively. I’ll use the
information within these report sections in the next article in this series to
probe which datafiles and temporary files might benefit most from faster I/O
devices or removal of other potential I/O bottlenecks.
I/O Performance: Hypotheses For Improvement
Now
that I’ve gathered these metrics, I can begin to make some intelligent
decisions about where to concentrate my I/O performance tuning efforts:
-
Is there any way I can improve the ratio between the number of physical reads to logical reads?
For example, is it possible to expand the amount of server memory that’s
available for the SGA so that a buffer could be retained for a longer period of
time?
-
Could any of the datafiles that correspond to the
tablespaces, which the TPC-E and TPC-H workloads access most heavily benefit from I/O devices with faster I/O capabilities?
-
Finally, is there anything I can do to eliminate
I/O bottlenecks that result from the OS file
system itself?
Next Steps
I’ll
begin to provide answers to the questions I’ve just proposed in the next
article in this series, which will concentrate on how to:
-
Improve I/O response time via intra-database solutions
-
Improve I/O response time through enhancements to the I/O subsystem
References and Additional Reading
Before
you proceed to experiment with any of these new features, I strongly suggest that
you first look over the corresponding detailed Oracle documentation before
trying them out for the first time. I’ve drawn upon the following Oracle
Database 11g Release 2 documents
for this article’s technical details:
E10881-02 Oracle Database 11gR2 New Features
E10595-05 Oracle Database 11gR2 Administrator’s Guide
E10713-03 Oracle Database 11gR2 Concepts
E10820-02 Oracle Database 11gR2 Reference
E10500-02 Oracle Database 11gR2 Storage Administrator’s Guide
Back to DBAsupport.com