Tuning the performance of Oracle Database 11gR2 is still just as much of an art as it is a science. Jim Czuprynski focuses on the detection and analysis of performance bottlenecks within the database's underlying input/output (I/O) subsystem.
Synopsis.
Oracle 11g Release 2 (11gR2) continues the trend toward automated database
performance tuning tools that started in Oracle 10g. However, tuning the
performance of an Oracle 11gR2 database is still just as much of an art as it
is a science. This article - the first in this series - provides a primer that
focuses on one of the most challenging arenas of database performance tuning:
the detection and analysis of performance bottlenecks within the database's
underlying input/output (I/O) subsystem.
As a
fledgling Oracle DBA many years ago, one of the first lessons I learned about
database performance tuning was that there are only three potential types of
bottlenecks to investigate whenever a database application performance issue
rears its head:
Memory.
An Oracle database instance may simply not have enough memory in the right
places to execute all of the SQL statements issued against it, then the
database is said to be memory-bound. For example, while there may be
sufficient memory, it may have been seriously mis-allocated between the
database's Shared Global Area (SGA) and Program Global Area
(PGA). Another possibility is that the memory components within the SGA itself
have not been divided up properly, and the database buffer cache is
significantly undersized for the workload being generated, or the library cache
is oversized as compared to the number of SQL statements being issued.
As of Oracle
11g Release 1 (11gR1),
however, these memory bottlenecks are easily detected through the automated
diagnostic infrastructure that Automatic
Database Diagnostic Monitor (ADDM) provides. Oracle 11gR1's Automatic Memory Management
(AMM) feature can also be activated to detect and automatically adjust memory
allocations as the database instance's workload changes. Finally, if the Oracle
DBA decides against using AMM, Oracle 11g
Enterprise Manager
(EM) provides several elegant GUI-based memory tuning advisors that encompass
the instance's complete memory footprint (SGA + PGA) as well as individual
advisors for the SGA, PGA, Database Buffer Cache, and the Library Cache; there
are also SQL-based memory advisors for the Streams Pool, Large Pool, and Java
Pool.
CPU.
A database application's SQL statements may be poorly written, or they may be
unable to take advantage of efficient execution plans because of missing
optimizer statistics. SQL statements may also be efficiently written but may be
heavily dependent on complex arithmetic computations or complex set
manipulation. In these cases, a database instance may quickly become CPU
bound; in other words, there are simply not enough CPU cycles to complete
the necessary calculations required to return a query's result set or to
process DML statements.
Fortunately,
Oracle 11gR1 has
already made it simpler than ever to detect, analyze, and resolve these types
of "problem" SQL statements. ADDM, in concert with Automatic Workload Repository
(AWR) snapshots and reports, are excellent at identifying the top SQL
statements that are most degrading to the overall database instance's
performance within any specific time frame or workload, and Active Session History
(ASH) reports can analyze an application workload at an even deeper
granularity. Oracle 11g's
SQL Tuning Advisor
and SQL Access Advisor
respectively offer specific tuning recommendations for individual SQL
statements or even entire application workloads. Finally, Oracle
11g's SQL Performance Analyzer
(SPA) and SQL
Plan Management (SPM) features can be used to execute
performance regression analysis for application workloads and even "lock
in" a known, better-performing execution plan when upgrading from a prior
database version to 11g.
I/O.
In my experience, if a database application is experiencing extremely poor
performance, it's more likely to stem from the Oracle database instance being
either significantly memory-bound or CPU-bound, so it makes sense that Oracle
11g's performance tuning tools are aimed primarily at those classes of root
causes. However, it's also possible for a database instance to suffer from poor
I/O performance. A database thus becomes I/O bound when it cannot
respond quickly enough to requests to either read data from and/or write data
to the underlying I/O subsystem.
Unlike
CPU-bound or memory bound systems, however, the identification, analysis,
isolation, and eventual repair of the root causes for poor I/O response time is
significantly more difficulty because of the complex nature of today's storage
platforms. So let me attempt to pull back the curtain on I/O performance tuning
by explaining it from a completely different perspective: that of the storage administrator.
Isolating the Real Problem
Here's the
first big secret that storage administrators know intimately, but Oracle DBAs
sometimes ignore, when tackling an I/O performance tuning issue: No database truly performs I/O
directly and immediately against disk. In fact, even in the
simplest case - say, a single disk drive with a native installation of either the
NTFS or EXT3 file system - disk
I/O is always buffered in the cache memory that's part of each
individual disk's I/O controller.
Expanding
upon this extremely simple case, I/O performance issues can be much more
difficult to accurately diagnose because of various factors, including:
- the
individual speed (e.g. 10K, 15K) and resulting rotational latency of
each drive
- the
number of physical spindles available within the disk array
- the
type of RAID protection chosen (e.g. RAID-5) and the parity
selected (e.g. 6+2, 7+1)
For
mid-range and enterprise-level high-performance storage systems, other
configuration options may introduce additional complexity:
- the number
and capacity (i.e. bandwidth) of the incoming or "front end"
I/O ports that funnel I/O to the storage platform's internal I/O controllers
- the number,
capacity, and speed of the I/O controllers themselves
- the number
and capacity of the outgoing or "back end" I/O ports that
accept the I/O from the storage platform's internal I/O controllers and route
that output to the actual disk drives
- finally,
if the storage subsystem should offer it, the amount of internal cache
memory
Application Workload I/O Profiles
If the
potential complexity of the I/O subsystem doesn't make an Oracle DBA's head
spin, here's another factor: I/O performance tuning is often compounded by
mixed database application workloads. Only a few years back, Oracle recommended
that database application workloads should be isolated at the database level - for
example, by creating separate OLTP and DSS databases, each with their own set
of storage - and that was certainly helpful when attempting to isolate an I/O
performance problem.
In the last
few years, however, the dramatic increase in the maturity of Oracle Real
Application Cluster (RAC) database software in Oracle Release 10g has made it simpler than
ever to deploy RAC database environments in a matter of hours. Oracle RAC
certainly adds to the stability and consistency of database applications because
the loss of one node or instance no longer means that the Oracle database is
completely unavailable. The addition of RAC database
services in Oracle 10g
also meant it was easier than ever to control and serve up sufficient bandwidth
for multiple applications against a single RAC database, but this also means
that it's not uncommon today to encounter a single RAC database that's
servicing multiple application workloads with dramatically different I/O
performance profiles.
The good
news is that there are only four basic types of I/O performance profiles that
any Oracle DBA need be concerned about:
- Random
reads (RR) are found predominantly within OLTP application workloads.
Random reads typically occur when a query requests a small number of blocks,
usually as the result of an indexed search that is concentrated on a small set
of unique values. For example, if an order entry application needs to locate a
specific customer's ordering profile, only a very few blocks - perhaps less
than 10 - might need to be retrieved into the database's buffer cache.
- Likewise,
random writes (RW) are found predominantly within OLTP
application workloads as well. Continuing the example above, if the application
needs to update a customer's profile, place a new order for that customer, add
some new line items to an existing order, or update the customer's paid
balance, only a very few blocks will need to be retrieved, dirtied, and then
eventually written back to disk when the Database Writer (DBWn)
background process flushes those blocks back to disk.
- Sequential
reads (SR), on the other hand, typically occur when a query is
executed and that query needs to return a large result set, especially when a
table scan is required to answer the query, or if a fast full index scan is
needed for value retrieval or set operations. It's important to remember,
however, that sequential read I/O is triggered not only by large query
operations; it also happens during index creation as the index column values
are read, and when Recovery Manager (RMAN) is commanded to back up all the
blocks in an Oracle datafile.
- Finally,
sequential writes are most common during batch update procedures, or
during extraction, transformation and load (ETL) operations for a data
warehouse. But other background operations are also sources of sequential write
workloads, such as when Log Writer (LGWR) writes out redo
entries to the database's online redo logs or when RMAN restores a database
file from a backup source or writes out a backup piece to a backup set.
Here's the
second big secret that our storage administrators keenly understand, sometimes
better than even experienced Oracle DBAs: Ultimately,
all I/O performance tuning revolves around determining what type(s) of
application workload(s) the database is actually performing versus what
workload the underlying storage has been configured to handle most of the time.
So if
the underlying storage platform on which the Oracle database's files exist has
been tuned primarily for random read/write access, then sequential read/write
application workloads (DSS) will most likely perform significantly worse than
an OLTP workload; and vice versa, if the storage has been tuned for
predominantly sequential read/write access (like a data warehouse), OLTP
application workloads are more likely to encounter serious I/O performance
bottlenecks.
Next Steps
Enough
theory! It's time to put at least some of these concepts to work. The next
article in this series will delve more deeply into how to:
- create
extra-database and intra-database I/O performance workloads
- measure
expected and actual I/O performance using established I/O monitoring tools
Additional Resources
ADDM Enhancements in Oracle Database 11g
Oracle 11g Data Guard: Grid Control Management
Oracle 10.1.2.2.1 Identifying I/O Problems Using V$ Views<
Oracle 10.3.2.2 Managing Excessive I/O
Oracle 8.3 I/O Configuration
Back to DBAsupport.com