I was wondering whether anyone has experienced poor backup times with Oracle databases on RAID 5? We are using NetBackup in conjunction with RMAN to back up a 1.5TB database, and the backup takes about 16 hours to complete. A host of RMAN parameters have been tried - combinations of filesperset and maxopenfiles - with no benefit seen. Our database uses RAID 5 spread over about 100 disks.
We have noticed that no matter what we do, the maximum throughput for the backup is only 50mb/s. We are using an LTO3 tape library which can write at 60mb/s per tape drive, but no matter how many tape drives we allocate we only get 50mb/s as the total throughput - resulting in a backup that takes over 16 hours.
I have monitored v$backup_async_io and noticed that read rates are poor for the most part, as seen in effective_bytes_per_second for TYPE='INPUT'. From that I can only infer that the slower RMAN reads, the slower the backup will be. With a filesystem backup we are able to achieve a rate of about 100mb/s - although this type of backup is different to an RMAN backup, that is quite a difference in rate. Just wondering if there is some way to work out whether the RAID configuration could be the issue here?
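As a sanity check on the figures in this post, here is a minimal Python sketch of the arithmetic. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and that "mb/s" means megabytes per second; the function names are only illustrative. Under those assumptions a sustained 50 MB/s would move 1.5 TB in a little over 8 hours, so a 16-hour run suggests the average over the whole backup is closer to 26 MB/s than to the 50 MB/s peak.

```python
def backup_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_tb terabytes at a sustained rate (decimal units)."""
    size_bytes = size_tb * 1e12
    rate_bytes_per_s = rate_mb_per_s * 1e6
    return size_bytes / rate_bytes_per_s / 3600


def average_rate_mb_per_s(size_tb: float, hours: float) -> float:
    """Average MB/s implied by moving size_tb terabytes in the given hours."""
    return size_tb * 1e12 / (hours * 3600) / 1e6


if __name__ == "__main__":
    # Figures taken from the post: 1.5 TB database, 50 MB/s, 16 hours.
    print(f"1.5 TB at 50 MB/s: {backup_hours(1.5, 50):.1f} hours")
    print(f"1.5 TB in 16 hours: {average_rate_mb_per_s(1.5, 16):.1f} MB/s average")
```

If the real average is well below the peak seen in the GUI, it may be worth checking whether the rate drops for parts of the run rather than only chasing the 50 MB/s ceiling.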
Have you tried allocating more channels?
What do you have BACKUP_TAPE_IO_SLAVES set to? You should ensure that your LARGE_POOL_SIZE is set large enough.
You may also want to try adjusting the BLKSIZE parameter.
RAID 5 is excellent for reads. There are many factors involved in this: processors, network bandwidth and load.
What's your backup strategy? Do you do incremental backups?
Are your LTO3 drives connected to your machine via SCSI or FC-AL?
Hmmm, a couple of things to think about:
1. LTO3 drives have a theoretical sustained rate of about 68MB/s. At 50MB/s each, I'd think they are working pretty hard.
2. Your backplane may be the limiting factor if you're trying to spin X tape drives and Y disk drives.
While the backup is running, I'd measure utilization using iostat to see how hard the disks and tape are working. Only then will you be able to tell what the bottleneck is.
Thanks everybody for the updates. I shall try to address each reply in one go.
So far there are no incremental backups - just full backups. With respect to the large pool, no errors are reported in the alert log file. We have also tried various blksize values, to no avail.
On allocating more channels - oh yeah, I can use 1 channel, 2 channels, or 3 channels (one channel per tape drive), and it makes no difference: the total throughput, I am told, is 50mb/s. The NetBackup admins use the NetBackup GUI tool and see that the total rate is 50mb/s for the full db backup, no matter how many channels are allocated. You would think with 3 drives I'd at least be able to get 150mb/s - however, according to the NetBackup admins the total backup shows a rate of 50mb/s. What appears to happen is that as additional channels are allocated, the total 50mb/s is divided over all the channels: one channel may operate at 15, another at 25 and the third at 10, giving a total of 50. If we stick to two channels, each channel gives a rate of about 25mb/s - again a total of 50mb/s.
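The behaviour described above, where adding channels only splits the same total, is what you would expect when some stage upstream of the tape drives is the cap. A minimal Python sketch of that reasoning follows. The 50 MB/s read ceiling is purely an assumed number to match the observation, 125 MB/s is the theoretical maximum of a gigabit link, and 60 MB/s per drive is the figure quoted for the LTO3 library; the function name is hypothetical.

```python
def pipeline_throughput(read_rate: float, network_rate: float,
                        drive_rate: float, channels: int) -> float:
    """Total MB/s of a read -> network -> tape pipeline.

    The pipeline can go no faster than its slowest stage; only the
    tape stage scales with the number of channels (one drive each).
    """
    tape_capacity = drive_rate * channels
    return min(read_rate, network_rate, tape_capacity)


if __name__ == "__main__":
    for n in (1, 2, 3):
        total = pipeline_throughput(read_rate=50, network_rate=125,
                                    drive_rate=60, channels=n)
        # Average per-channel share if the total is divided evenly.
        print(f"{n} channel(s): total {total} MB/s, about {total / n:.0f} MB/s each")
```

The model does not say which stage is the cap, only that a flat total across 1, 2 and 3 channels points away from the tape drives themselves.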
Which makes you wonder why we can't get over 50mb/s. I'm going to have access to this system to run iostat and see what's going on myself - hopefully that might reveal some sort of io issue.
Jeff: Could you please elaborate on: 'Your backplane may be the limiting factor if you're trying to spin X tape drives and Y disk drives.'
To me the backup being written will only be as good as what RMAN can read (keeping all else sane), so it seems to be a read issue for RMAN. This is a 24x7 billing system and it's under constant load (statspack confirms that). Could it be that the load, plus the additional reads introduced by the RMAN backup, bogs down the i/o subsystem and so limits the amount of data that can be read from storage and shipped off for backup?
I am told that a file placed on the same storage as the db showed up on the NetBackup GUI at 100mb/s - without the use of RMAN, as a NetBackup file backup.
On a side note, a reporting database, which is more or less a clone of the database above, exhibits the same poor backup rate when RMAN is used to back it up.
Oh and yes - the tape library is connected to the db server over a private network dedicated for backups only - gigabit.
Then your limiting factor is how fast you can pump the data to the backup device. Typically, mb/s means megabits/second. If your rate is 50 megabits/second, that is pretty poor. If it is 50 megabytes/second, then you might be doing OK. (Granted, 400 Mbps does not fully utilize gigabit ethernet, but coming from one interface it might be OK.)
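The bits-versus-bytes distinction is easy to check with a few lines of Python. This is only a sketch of the conversion, assuming 8 bits per byte and a nominal 1000 Mbps gigabit link with no protocol overhead (real usable throughput will be somewhat lower); the function names are illustrative.

```python
def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert megabytes/second to megabits/second."""
    return mbytes_per_s * 8


def mbits_to_mbytes(mbits_per_s: float) -> float:
    """Convert megabits/second to megabytes/second."""
    return mbits_per_s / 8


def link_utilization(mbytes_per_s: float, link_mbits_per_s: float = 1000) -> float:
    """Fraction of the nominal link rate used by a given MB/s transfer."""
    return mbytes_to_mbits(mbytes_per_s) / link_mbits_per_s


if __name__ == "__main__":
    print(f"50 MB/s = {mbytes_to_mbits(50):.0f} Mbps "
          f"({link_utilization(50):.0%} of a gigabit link)")
    print(f"Gigabit ceiling = {mbits_to_mbytes(1000):.0f} MB/s")
    print(f"100 MB/s = {link_utilization(100):.0%} of a gigabit link")
```

On these numbers a 100 MB/s file backup would already be close to what a single gigabit interface can carry, while 50 MB/s leaves a good deal of headroom.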
Originally Posted by naqi76: "Could you please elaborate on: 'Your backplane may be the limiting factor if you're trying to spin X tape drives and Y disk drives.'"
Sure, your computer's backplane/bus can only move so much data. If you have a bunch of devices hooked to it locally, the bus speed may limit how much data you can move. However, since you're backing up over the network, I'd say this is most likely not the problem.
Originally Posted by naqi76: "To me the backup being written will only be as good as what rman can read (keeping all else sane)"
Take RMAN out of the picture. Use a plain filesystem copy to back up the datafiles across the network and see what the speed is. (Of course, you can't use this as a real backup, but you'll be able to compare time/throughput.) If RMAN is the problem, you can fix it. If RMAN is not the problem, your network admins need to fix it.
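For the plain-copy comparison, something like the following Python sketch would give a rough MB/s figure. It is not a backup tool and makes several assumptions: the source and destination paths are whatever you pass on the command line, 1 MB = 10^6 bytes, and the helper name `timed_copy` is made up for this example. Reads served from the OS file cache can inflate the result, so a file larger than memory, or one not recently read, gives a fairer number.

```python
import os
import shutil
import sys
import time


def timed_copy(src: str, dst: str, chunk_size: int = 1024 * 1024):
    """Copy src to dst and return (bytes_copied, seconds, mb_per_s)."""
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
        fout.flush()
        os.fsync(fout.fileno())  # make sure the data has really been written
    elapsed = time.perf_counter() - start
    size = os.path.getsize(dst)
    rate = size / 1e6 / elapsed if elapsed > 0 else float("inf")
    return size, elapsed, rate


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python timed_copy.py <source file> <destination file>
    copied, seconds, mb_per_s = timed_copy(sys.argv[1], sys.argv[2])
    print(f"{copied} bytes in {seconds:.1f} s = {mb_per_s:.1f} MB/s")
```

Running it once against a datafile and once against an ordinary file of similar size on the same storage would show whether the two really read at different rates.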
Use RMAN to back up to a local disk and time it. That will tell you how fast RMAN can pump the data. (Note: you will have to make sure the local disk can support that level of write activity, so that the disk is not the bottleneck.)
Jeff, thanks for the input. I made my trip over to the client site today. Remember the 100mb/s (megabytes) backup I was told about on the filesystem? Well, I had a tablespace with sufficient data put in backup mode, then had NetBackup back up the files that belong to that tablespace. What happened? It wasn't 100mb/s, it was indeed 50mb/s, the same rate as RMAN. That's 50 megabytes/s.
The interesting thing is, we started with 2 channels, filesperset=1 and maxopenfiles=1 - this gave us 50mb/s, as usual. I then added an additional channel, kept the filesperset and maxopenfiles settings the same, and that took the rate up to 60mb/s - a 100gb/hr improvement. I then proceeded to open up another channel - 4 channels - and sure enough the rate dropped. It seems the threshold is 3 channels, after which io read rates degrade and hence the write rate falls.
This was all done on a reporting db which at the time did not have much load - statspack shows the load on this system is much less compared to production.
We then tried the same backup settings on production and could only get 50mb/s - to me this indicates an io problem, does it not? I can't generate the iostat output myself (not allowed); the system admins tell me the iostat results are fine and there are no waits etc. What else could I possibly do to show that the io on the system is being maxed out, and hence that RMAN backups (well, even non-RMAN backups) will be slow? That is, if my hunch about the io is correct.
On the network side of things, even though other backups are not Oracle related, the sysadmins tell me that if they create a 4gb file on the same SAN (not an Oracle datafile) and back up that 4gb file, they get a rate of 100mb/s, which is double that of an Oracle backup. I make the distinction here by saying Oracle backup because from my own tests, whether I use RMAN or not, NetBackup only goes at 50mb/s for production, with a slight improvement on reporting at 60mb/s (likely due to less load). So if a file other than an Oracle datafile can go at 100mb/s on the same storage, why do Oracle datafiles suffer, RMAN or no RMAN? Could it be io generated by the system due to heavy load? But that load is not on reporting, and the most I can get out of reporting is an additional 10mb/s (60mb/s), after which the rate begins to drop as more channels or more files are added to the backup. Could the configuration of RAID 5 be questioned here, I wonder?
How fast does the backup to local disk go?
The fact that your sysadmins say they can backup a plain file at 100 MBps means your RAID 5 filesystem is not the bottleneck.
You could be running into a CPU bottleneck. RMAN backups are very CPU intensive and if you are starved for CPU resources, more channels might make it worse. How many CPUs?