We have Oracle 10.1.0.3 installed on a Red Hat Linux box. The server has 4GB of physical memory. Oracle has been configured to use shared servers (it wasn't configured by me; I would have used dedicated servers). We have 25 shared servers. Now when I do a top, I see:
As can be seen, most of the memory is being used by Oracle processes; these were traced to shared server processes (some using 1.1G!). Besides changing to dedicated mode, what areas need to be looked at in order to reduce this overhead?
SGA_TARGET is set to 1.4GB, and PGA_AGGREGATE_TARGET to 486MB.
The process size that a utility like top shows is not the actual size of the heap (data). It is a combination of (a) the SGA size, (b) the oracle executable being used by the process, and (c) the actual heap (data). On Sun boxes, you can verify the actual size of the heap portion of the process with the pmap command. For example:
On a Solaris 10 box, here's the output from top:
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
9177 ora9i 1 1 0 1597M 1569M sleep 45:04 0.03% oracle
As you can see, the process size is listed by top at almost 1.6GB.
From pmap, the actual private heap / data size is only 1.4 MB.
Translating the SGA address from the pmap output (ism shmid=0x55): 0x55 in hex translates to 85 in decimal.
$ ipcs -m
m 85 0xdc126148 --rw-r----- ora9i software
ipcs -a should tell you how much memory Oracle has allocated for the SGA.
As you can see, the process size reported by top is not a good indication of the heap portion of the process.
You are better off relying on the UGA and PGA statistics from v$sesstat to see if you are using excessive sql work area memory.
Up to 9i, automatic tuning of work areas was disabled when you use shared servers, since most of the run-time work area allocations were handled within the SGA for shared servers. Starting with 10g, work area allocations that are part of the run-time memory are handled within the PGA, so large sorts, hash joins, or windowing operations on result sets can potentially cause your private heap size to grow.
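As a quick sanity check, you can confirm that automatic PGA management is in effect and see its target (a sketch; parameter names are the standard 10g ones):

```sql
-- workarea_size_policy should be AUTO for automatic work area tuning
select name, value
from   v$parameter
where  name in ('workarea_size_policy', 'pga_aggregate_target');
```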
Thank you very much for the very informative feedback. I understand that because of the changes in 10g, large sorts etc. can take place within the PGA; however, can you point me in the direction of what exactly to look for when I check v$sesstat?
I also noticed that there are 25 shared servers allocated and all of these start up with the instance; is there a gain in reducing the number of shared servers? Along with this, there is only one dispatcher allocated. Surely this needs to increase if the shared server count is to remain at 25?
I am new to shared servers so any advice will be greatly appreciated.
(1) You can do this to check UGA and PGA allocations for a given session:
select a.name, b.value
from   v$statname a, v$sesstat b
where  a.statistic# = b.statistic#
and    b.sid = &sidvalue
and    a.name like '%ga%memory%'
(2) To just check PGA allocations for a given session:
select p.pga_used_mem, p.pga_alloc_mem, p.pga_freeable_mem, p.pga_max_mem
from   v$process p, v$session s
where  s.sid = &sidvalue
and    p.addr = s.paddr
Also keep in mind that database sessions that do direct path reads due to sort I/O (when a sort does not fit in memory), use parallel query slaves, or perform I/O to LOB segments (which are not cached in the buffer cache) add to the PGA's growth, which in turn can affect the process size.
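To see the instance-wide picture rather than a single session, you could also query v$pgastat (a sketch; the statistic names are the standard 10g ones, reported in bytes):

```sql
-- Instance-wide PGA usage, converted to MB
select name, round(value / 1024 / 1024) mb
from   v$pgastat
where  name in ('total PGA allocated',
                'total PGA inuse',
                'maximum PGA allocated');
```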
"I also noticed that there are 25 shared servers allocated and all of these start up with the instance, is there a gain in reducing the number to shared servers?"
I guess it depends on the load / concurrent connections to your database. You can control the initial number of shared server processes versus the total through the shared_servers and max_shared_servers parameters.
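For example (the values here are purely illustrative, not a recommendation for your load):

```sql
-- Start with a small pool and let Oracle spawn more on demand, up to the cap
alter system set shared_servers = 5 scope=both;
alter system set max_shared_servers = 25 scope=both;
```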
This is oracle's recommendation for the number of dispatchers:
"The value of MAX_DISPATCHERS should at least equal the maximum number of concurrent sessions divided by the number of connections for each dispatcher. For most systems, a value of 250 connections for each dispatcher provides good performance. "
You might also want to take a look at 10g's Automatic Shared Server Configuration feature (metalink note 265931.1).
Thank you very much for the feedback. I have dug deeper into the database to investigate this high CPU usage. I ran several AWR reports and found that the execute-to-parse ratio is only 3%. I checked several queries that are run consistently throughout the day and that most likely sap all the CPU. One such query had a cost of 141,913! It was performing 4 full table scans (one of them on a table of 37 million rows!), although most of the cost was on the hash joins the query was performing. Any advice on reducing this (generally)?
I also noticed that stats were not up to date on any of the tables (e.g. the stats for the table with 37 million rows dated back 3 weeks). As a first point of action I have suggested gathering stats on the schema.
It seems that this inefficient SQL code is the reason for the high CPU usage.
I also have a high number of waits on db file sequential read:
Event                     Waits      Total Wait Time (s)  Waits/txn  Wait Class
db file sequential read   3,292,788  8,545                37.10      User I/O
Is this due to inefficient SQL as well, given the high amount of I/O produced by these queries?
Having a low execute-to-parse ratio depends on the type of application being run against the database. If you have an OLTP-type application that parses a statement and re-executes it numerous times within the session, this ratio will tend to be high. On the other hand, if you have a batch processing system that parses, executes, and does not re-execute the same statement, this ratio will likely be low. I would concentrate on soft parse % (which gives you an idea of the amount of hard parsing) and non-parse CPU %, which tells you how much of the CPU time was spent outside of parsing.
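Both percentages can be derived from v$sysstat (a sketch using the standard statistic names):

```sql
-- Soft parse % = (total parses - hard parses) / total parses
select round(100 * (t.value - h.value) / t.value, 2) soft_parse_pct
from   v$sysstat t, v$sysstat h
where  t.name = 'parse count (total)'
and    h.name = 'parse count (hard)';

-- Non-parse CPU % = (total CPU - parse CPU) / total CPU
select round(100 * (t.value - p.value) / t.value, 2) non_parse_cpu_pct
from   v$sysstat t, v$sysstat p
where  t.name = 'CPU used by this session'
and    p.name = 'parse time cpu';
```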
High CPU usage might be indicative of a different problem; in some cases the process might actually be waiting on I/O (a full scan) to complete.
Something must be causing Oracle to favor hash joins. It could be that the tables the optimizer is doing full scans on (including the 37-million-row table) have parallelism enabled, do not have a large number of blocks, or that the outer cardinality of the join is high.
You could always enable the 10053 event for the session to see why Oracle arrived at the plan it did.
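For reference, the event is toggled like this; the problem statement has to be hard parsed (e.g. run for the first time in the session) while the trace is on for the optimizer decisions to be written out:

```sql
alter session set events '10053 trace name context forever, level 1';
-- run (hard parse) the problem query here
alter session set events '10053 trace name context off';
```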
I hope you've enabled monitoring on the tables so that Oracle collects stats only when a certain percentage of the data has changed. Frequent collection of stats can cause plan instability.
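A sketch of what that looks like (the table and schema names are made up; note that in 10g table monitoring is effectively on by default when STATISTICS_LEVEL is TYPICAL):

```sql
-- 9i-style: enable modification monitoring explicitly
alter table big_table monitoring;

-- Gather stats only on objects whose data has changed enough to be stale
begin
  dbms_stats.gather_schema_stats(
    ownname => 'APPUSER',      -- assumed schema name
    options => 'GATHER STALE',
    cascade => true);
end;
/
```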
Narrow down the actual objects where the sequential reads are happening. You could query Active Session History (ASH) to get the specific details. 99% of the time this will point to tweaking the execution plans so that the optimizer picks the right indexes for the SQL statement.
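A sketch of such an ASH query (column names are from 10g; keep in mind v$active_session_history only covers recent samples):

```sql
-- Which objects are behind the 'db file sequential read' waits?
select o.owner, o.object_name, count(*) samples
from   v$active_session_history a, dba_objects o
where  a.event = 'db file sequential read'
and    a.current_obj# = o.object_id
group  by o.owner, o.object_name
order  by samples desc;
```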