Slow OLTP on 14CPU SunFire

Printable View

Show 40 post(s) from this thread on one page

06-07-2005, 06:16 AM
marist89

Quote:

Originally posted by blguy
We have 5 members per group, each member 256MB.

Did I read that right? Are you saying you are writing to five redo logs at the same time?
06-07-2005, 06:42 AM
pando

That is too many! 2 is more than enough + Hardware mirror
06-07-2005, 07:08 AM
waitecj

blguy - Spotlight says 5 groups, did you mean that? Not 5 members surely?
06-07-2005, 07:44 AM
blguy

Eh? did I get this the wrong way round?

5 redo groups multiplexed.

Code:

DISK1 DISK2 ----- ----- REDO1A REDO1B REDO2A REDO2B REDO3A REDO3B REDO4A REDO4B REDO5A REDO5B

After talking to my UNIX chaps there may be something that *might* not be helping me very much...

Story: (Before I joined the BL, I add)
BL needed to build a test partition on our 6800, so we split the 6800 into two partitions, one for prod with 14 CPU's and a second test with 4 cpus.

We split up the SAN that has 12 T3 arrays (each 500GB RAID5 arrays) between the two systems, 6 T3's each. However, we did not have any non RAID 5 storage for REDO. So we purchased a SUN 3510 array, that was directly attached to the primary partition - dedicated to REDO.

(I may have said before, our UNIX hardware guy's are very good, but don't have any real Oracle background) Somehow, the spec was translated and watered down...

The figure of 140GB (per drive for redo) was quoted (not sure from who) as this is the usual drive we put into the arrays, and this was used as the baseline. So UNIX support purchased the 3510 with 6 146GB drives.

When they went to build the 140GB partitions, a single 146GB drive (after formatting) did not create a 140GB partition on it's own - so two drives were joined and used to create 140GB partition. (130GB + 10GB).

New mount points added, and we moved the REDO's from the R5 array to the new drives. But no change in performance from dedicated drives (same wait times, etc).

As I was spec'ing a new Oracle system, aware of the problems we had, I sat with the UNIX chap and walked through the problems we were having.

It turns out that SUN do 36GB 15K RPM drives for the 3510, instead of the 146GB 10K RPM drives. The drive buffer caches are different too (it seems SUN do small fast drives specifically).

Also - direct attaching the 3510 is not a good idea - although it save a few hundred pounds and the motherboard had a built in controller - you can't beat a dedicated drive controller for critical stuff!

So - lessons learnt? If you're not directly responsible for purchasing the equipment (i.e. you have dedicated hardware people) make ABSOLUTELY SURE they understand ALL your disk requirements - not just disk size, but speed, latency, rpms etc. Otherwise they will just reuse an old quote or will try to cut costs, and bugger up your nice disk management layouts.

I will have to live with the previous purchasing decision for the main system until we can re-purpose the 140GB drives and swap with 36GB drives. Obviously, people want proof that the small drives are what we require (the 36GB drives dont have a very MB/£ ratio), and the dedicated controller is required (technically everyone agrees - but the man with the cash needs to be convinced)

Thanks everyone for your help. If I do get a chance to swap the new drives in I'll update this (or create a new) thread with new stats.

And - if I have any other questions - after asktom (if I can ever get to ask a question) I'll know where to come.

Guy
06-07-2005, 10:59 AM
marist89

Quote:

Originally posted by blguy

And - if I have any other questions - after asktom (if I can ever get to ask a question) I'll know where to come.

After? We don't have a limit on how many questions can be in the queue at one time....
06-07-2005, 11:43 AM
aarroyob

In addition to redo ...

Are your data tablespaces DMT?

Class Waits Time (cs) Time (cs)
------------------ ----------- ---------- ---------
data block 252 899 4

If yes check if you have freelist contention.

Cheers
06-07-2005, 04:59 PM
jgarry

Quote:

The question remains - if you have a hight OLTP system with 100's of commits a second - and you run a "high redo" operation (create index / copy table, update many rows, insert many rows, CTAS) - would you EXPECT at 14CPU machine to stop processing the very light OLTP requests?

Well, if the operation is going through the MTS rather than on a dedicated connection, yes, I would.
06-08-2005, 03:22 AM
blguy

Quote:

Originally posted by jgarry
Well, if the operation is going through the MTS rather than on a dedicated connection, yes, I would.

Can you give me your reasoning for this? What happens?

Why would dedicated connections be better (Because it would prevent....?)

Switching from MTS to dedicated would be quite easy (change to tnsnames on client) - but what impact would that have on the system / response times / connection times / SGA usage?
06-08-2005, 04:20 AM
pando

Quote:

Originally posted by blguy
Can you give me your reasoning for this? What happens?

Why would dedicated connections be better (Because it would prevent....?)

Switching from MTS to dedicated would be quite easy (change to tnsnames on client) - but what impact would that have on the system / response times / connection times / SGA usage?

As I said, if your app is web based you should be using connection pool in the application server
06-08-2005, 05:43 AM
yoursmanish

Re: Slow OLTP on 14CPU SunFire

Hello,
as i followed ur the thread i found that u r using 2 redo groups thus server waits till other group is not archived. And u have three members in each group thus it waits for Log Sync. i think u should use 2 members per group and should have more than 3 groups and make sure that u dont place consecutive groups on same disk controller.

i think this will decrease ur log related waits.

manish
[Hope for the Best, But prepare for the Best]

Quote:

Originally posted by blguy
We have a large (3TB) Oracle 8.1.7 OLTP database running on a 14CPU SunFire server (48GB RAM). This drives a website with around 100 users requests per second (transactional commits per second). We use MTS to manage web server connection requests.

Generally the system runs fine - easy sub second responses. However, this can be very easily upset if we need to do maintenance tasks (data loads and index rebuilds) and we find that this brings the system to a virtual halt.

We are planning a large upgrade and data migration on the system - which will have to include work during working hours (it's a 6 month project). As the DBA, I'm concerned that the system this powerful cannot support OLTP users and our data management tasks.

Is this typical? Do other large OLTP Oracle system have the same "feature" where a 5 minute "CREATE TABLE AS SELECT" command in SQLPLUS turns millisecond web responses into 5+ second responses.

Example - During normal working day, a look at V$SESSION will show 100 sessions, with only 5 or so "ACTIVE" at that very moment. (Each web request is sub second so you don't see them!) However, if I run a "CREATE TABLE newtable AS SELECT * FROM table_with_a_million_rows" then V$SESSION will show up all the active users (60+) and their active session time goes into many seconds. User response becomes very poor (the phone starts ringing off the hook!).

If I run the same in "NOLOGGING" mode, then there is no noticable problem. Is this a LGWR problem?

I know this post does not have many "facts" - I will be willing to supply as requred.

Thanks in advance - Guy

Show 40 post(s) from this thread on one page