Shared server memory usage
We have Oracle 10.1.0.3 installed on a Red Hat Linux box. The server has 4GB of physical memory. Oracle has been configured to use shared servers (it wasn't configured by me; I would have used dedicated servers). We have 25 shared servers. Now when I do a top I see:

195 processes: 190 sleeping, 3 running, 2 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 4.1% 0.0% 65.2% 0.0% 0.7% 29.0% 0.7%
cpu00 3.7% 0.0% 64.6% 0.0% 1.5% 28.5% 1.5%
cpu01 4.5% 0.0% 65.9% 0.0% 0.0% 29.5% 0.0%
Mem: 3921888k av, 3899804k used, 22084k free, 0k shrd, 7960k buff
2963208k actv, 566312k in_d, 55244k in_c
Swap: 2044072k av, 5168k used, 2038904k free, 3408592k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
29453 oracle 16 0 1110M 1.1G 1107M R 28.6 28.9 12:38 0 oracle
8 root 16 0 0 0 0 SW 28.2 0.0 218:07 0 kscand
7 root 16 0 0 0 0 SW 3.0 0.0 212:15 1 kswapd
29455 oracle 15 0 1041M 1.0G 1030M R 2.2 27.0 4:19 0 oracle
29449 oracle 15 0 1136M 1.1G 1124M D 1.8 29.5 33:03 0 oracle
25029 root 20 0 1244 1244 880 R 1.5 0.0 0:00 0 top
29451 oracle 15 0 1124M 1.1G 1112M D 1.1 29.2 35:58 1 oracle
29457 oracle 15 0 1022M 1.0G 1019M D 0.7 26.6 4:06 0 oracle

As can be seen, most of the memory is being used by Oracle processes; these were traced to shared server processes (some using 1.1G!). Besides changing to dedicated mode, what areas need to be looked at in order to reduce this overhead?
SGA_TARGET is set to 1.4GB, and PGA_AGGREGATE_TARGET to 486MB.
Any advice will be much appreciated,
Thanks in advance,
The process size that a utility like top shows is not the actual size of the heap (data). It is a combination of (a) the SGA size, (b) the oracle executable mapped by the process, and (c) the actual heap (data). On Sun boxes, you can verify the actual size of the heap portion of the process with the pmap command. For example:
On a Solaris 10 box, here's the output from top:
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
9177 ora9i 1 1 0 1597M 1569M sleep 45:04 0.03% oracle
As you can see, the process size is listed by top as almost 1.6GB.
Doing pmap:
$ pmap -x 9177
Address Kbytes RSS Anon Locked Mode Mapped File
00010000 49744 48896 - - r-x-- oracle <<- Executable
030B2000 496 344 56 - rwx-- oracle <<- Executable
0312E000 16 8 8 - rwx-- oracle <<- Executable
03132000 1512 632 632 - rwx-- [ heap ] <<- Private Data
20000000 1568768 1568768 - 1568768 rwxsR [ ism shmid=0x55 ] <<- SGA
FE720000 240 216 - - r-x-- libresolv.so.2
FE76C000 16 16 - - rwx-- libresolv.so.2
FE780000 2392 2096 - - r-x-- libvas.so.4.2.0
From pmap, the actual private heap/data size is only roughly 1.5 MB (the [ heap ] mapping).
Translating the SGA segment from the pmap output (ism shmid=0x55): 0x55 hex is 85 decimal, which matches the shared memory id shown by ipcs:
$ ipcs -m
m 85 0xdc126148 --rw-r----- ora9i software
ipcs -a should tell you how much Oracle has allocated for the SGA.
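The same figure can also be cross-checked from inside the database; a quick sanity query against v$sga (sizes converted to MB):

```sql
-- Total SGA size as the instance itself reports it;
-- should line up with the ism segment seen in pmap/ipcs
select round(sum(value)/1024/1024) as sga_mb
from v$sga;
```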
As you can see, the process size reported by top is not a good indication of the heap portion of the process.
You are better off relying on the UGA and PGA statistics from v$sesstat to see if you are using excessive SQL work area memory.
Until 9i, automatic tuning of work areas was disabled when you used shared servers, since most of the run-time work area allocations were handled within the SGA for shared servers. Starting with 10g, work area allocations that are part of the run-time memory are handled within the PGA, so large sort operations, hash joins, or windowing operations on result sets can cause your private heap size to grow.
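To see how close the instance is to its PGA ceiling, v$pgastat gives the instance-wide picture; a small sketch (sizes converted to MB):

```sql
-- Compare total/maximum PGA actually allocated against
-- the pga_aggregate_target ceiling
select name, round(value/1024/1024) as mb
from v$pgastat
where name in ('aggregate PGA target parameter',
               'total PGA allocated',
               'maximum PGA allocated');
```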
Thank you very much for the very informative feedback. I understand that because of the changes in 10g, large sorts etc. can now take place within the PGA; however, can you point me in the direction of what exactly to look for when I check v$sesstat?
I also noticed that there are 25 shared servers allocated and all of them start up with the instance; is there a gain in reducing the number of shared servers? Along with this, there is only one dispatcher allocated. Surely this needs to increase if shared servers are to remain at 25?
I am new to shared servers, so any advice will be greatly appreciated.
UGA and PGA checks:
(1) You can run this to check the UGA and PGA allocations for a given session:

select a.name, b.value
from v$statname a, v$sesstat b
where a.statistic# = b.statistic#
and b.sid = &sidvalue
and a.name like '%ga%memory%';
(2) To check just the PGA allocations for a given session:

select p.pga_used_mem, p.pga_alloc_mem, p.pga_freeable_mem, p.pga_max_mem
from v$process p, v$session s
where s.sid = &sidvalue
and p.addr = s.paddr;
Also keep in mind that database sessions that do direct path reads due to sort I/O (when a sort does not fit in memory), use parallel query slaves, or perform I/O to LOB segments (which are not cached in the buffer cache) add to the PGA's growth, which in turn inflates the process size.
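A quick way to spot such sessions, assuming the standard 'physical reads direct' statistic, is:

```sql
-- Sessions doing direct path reads
-- (disk sorts, PQ slaves, uncached LOB I/O)
select s.sid, s.username, st.value as direct_reads
from v$sesstat st, v$statname n, v$session s
where st.statistic# = n.statistic#
and n.name = 'physical reads direct'
and st.sid = s.sid
and st.value > 0
order by st.value desc;
```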
"I also noticed that there are 25 shared servers allocated and all of them start up with the instance; is there a gain in reducing the number of shared servers?"
I guess it depends on the load / concurrent connections to your database. You can control the initial number of shared server processes vs. the maximum through the shared_servers and max_shared_servers parameters.
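For example (the values 5 and 25 here are purely illustrative; pick numbers that match your load):

```sql
-- Start small and let Oracle grow the pool on demand, up to the cap
alter system set shared_servers = 5 scope=both;
alter system set max_shared_servers = 25 scope=both;

-- Watch whether the pool actually grows under load
select count(*) as active_servers
from v$shared_server
where status <> 'QUIT';
```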
This is Oracle's recommendation for the number of dispatchers:
"The value of MAX_DISPATCHERS should at least equal the maximum number of concurrent sessions divided by the number of connections for each dispatcher. For most systems, a value of 250 connections for each dispatcher provides good performance. "
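Before adding dispatchers, it is worth checking how busy the existing one actually is; a sketch against v$dispatcher (the DISPATCHERS=2 value below is only an example):

```sql
-- Busy rate of each dispatcher; a consistently high value
-- suggests configuring more dispatchers
select name, status,
       round(busy/(busy+idle)*100, 2) as busy_pct
from v$dispatcher;

-- e.g. to run two TCP dispatchers:
-- alter system set dispatchers = '(PROTOCOL=TCP)(DISPATCHERS=2)';
```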
You might also want to take a look at 10g's Automatic Shared Server Configuration feature (metalink note 265931.1).
Thank you very much for the feedback. I have gone deeper into the database to investigate this high CPU usage. I ran several AWR reports and found that the execute-to-parse ratio is only 3%. I checked several queries that are run consistently throughout the day and most likely sap all the CPU. One such query had a cost of 141,913! It was performing 4 full table scans (one of the tables involved has 37 million rows!), with most of the cost on the hash joins the query was performing. Any advice on reducing this (generally)?
I also noticed that stats were not up to date on any of the tables (e.g. the stats for the 37-million-row table dated back 3 weeks). As a first point of action I have suggested gathering statistics on the schema.
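On 10g, dbms_stats is preferred over the old analyze command for this; a minimal sketch (the schema name APP is a placeholder):

```sql
-- Gather fresh optimizer statistics for the whole schema
begin
  dbms_stats.gather_schema_stats(
    ownname          => 'APP',                        -- placeholder schema
    estimate_percent => dbms_stats.auto_sample_size,
    cascade          => true,                         -- include indexes
    degree           => 2);
end;
/
```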
It seems that this inefficient SQL code is the reason for the high CPU usage.
I also have a high number of waits on db file sequential read:
Event                    Waits      Total Wait Time (s)  Waits/txn  Wait Class
db file sequential read  3,292,788  8,545                37.10      User I/O
Is this due to inefficient SQL as well, given the high amounts of I/O produced by these queries?
Thanks in advance,
Having a low execute-to-parse ratio depends on the type of application being run against the database. If you have an OLTP-type application that parses a statement and re-executes it numerous times within the session, this ratio will tend to be high. On the other hand, a batch processing system that parses, executes, and does not re-execute the same statement will likely show a low ratio. I would concentrate on soft parse % (which gives an idea of the amount of hard parsing) and non-parse CPU % (which shows how much of the CPU time was spent doing real work rather than parsing).
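Both figures can be derived from v$sysstat; a quick sketch:

```sql
-- soft parse %    = 1 - (hard parses / total parses)
-- non-parse CPU % = 1 - (parse time cpu / total CPU)
select name, value
from v$sysstat
where name in ('parse count (total)',
               'parse count (hard)',
               'parse time cpu',
               'CPU used by this session');
```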
High CPU usage might be indicative of a different problem; in some cases the process might actually be waiting on the I/O (a full scan) to complete.
Something must be causing Oracle to favor hash joins. It could be that the tables the optimizer is doing full scans on (including the 37-million-row table) have parallelism enabled and not a large number of blocks, or that the outer cardinality of the join is high.
Nested Loop Join cost = outer access cost + (inner access cost * outer cardinality)
Hash join cost = (outer access cost * # of hash partitions) + inner access cost
You can always enable the 10053 event for the session to see why the optimizer arrived at the plan it did.
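For reference, the event is enabled per session like this (the statement must be hard-parsed while the event is set for the trace to be written):

```sql
alter session set events '10053 trace name context forever, level 1';
-- ... run (hard parse) the problem statement here ...
alter session set events '10053 trace name context off';
-- the trace file appears in user_dump_dest
```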
Hopefully you have monitoring enabled on the tables so that Oracle collects stats only when a certain percentage of the data has changed. Overly frequent stats collection can cause plan instability.
Narrow down the actual objects where the sequential reads are happening. You can query Active Session History (ASH) to get the specific details. 99% of the time this will point to tweaking the execution plans so that the optimizer picks the right indexes for the SQL statement.
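A sketch of such an ASH query (note that querying ASH requires the Diagnostics Pack license):

```sql
-- Objects accounting for the db file sequential read samples
select o.owner, o.object_name, count(*) as samples
from v$active_session_history a, dba_objects o
where a.event = 'db file sequential read'
and o.object_id = a.current_obj#
group by o.owner, o.object_name
order by samples desc;
```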