What will be Zero Down Time Setup?

**nabeel** · 05-23-2007, 08:12 AM

Hello Friends,

I came across a setup at a friend's workplace and was wondering whether they are doing it right. They now also want to implement a zero-downtime topology, so I thought I would give it a try and draw up a couple of scenarios for this site. I would really appreciate your comments and advice before I work out a final plan for him. Here are the details:

The current setup is as follows:

2 nodes clustered on Red Hat, each node having:
2 x Intel Dual Core 2.67 GHz with 667 MHz bus speed
4 GB RAM

Node1, called Oracle1, runs OVPI and OVSD as 2 Oracle 9i instances and databases, with the following file systems mounted: /U02 and /U03. Each Oracle instance has one user: OVPI in one instance and OVSD in the other.

Node2, called Oracle2, runs OVO (OpenView) and OVNNM as 2 Oracle 10g instances and databases, with the following file systems mounted: /U04 and /U05, and 1 user in each database: openview and ovnnm.

Shared storage with file systems /u01, /u02, /u03, /u04, /u05:
/U01 => Oracle 9i and 10g Binaries
/U02 => Data files for OVPI Database
/U03 => Data files for OVSD Database
/U04 => Data files for OVO Database
/U05 => Data files for OVNNM Database

At any given time each node runs 2 Oracle databases. If one node goes down, its databases are relocated to the other node, which means a single node will then be running 4 Oracle databases: two on 9i and two on 10g.

Now, what are the possibilities for making this setup better and implementing a ZERO downtime topology?

Node1 Oracle1 (9i) will have only 1 database with 2 users, OVPI and OVSD, and Node2 Oracle2 (10g) will have only 1 database with 2 users, OVO and OVNNM. Use RAC? Or Data Guard? What will the failover effect be, and what could be done to have no single point of failure?

I can think of 3 scenarios:

a) Node1 Oracle1 (9i) will have only 1 database with 2 users, OVPI and OVSD, and Node2 Oracle2 (10g) will have only 1 database with 2 users, OVO and OVNNM. Use RAC for load balancing and availability; if one node goes down, the database moves to the available node.

b) Have Node1 running the 10g and 9i databases, and Node2 holding the replica databases, which will be kept up to date through Data Guard.

c) What else is possible here?

I really appreciate your time and ideas.

Sincerely,
NK

**ixion** · 05-23-2007, 10:24 AM

You can minimize your downtime, but not completely eliminate it.

1. Locate the servers in disparate geographic locations. At a minimum the two systems should be in differing data centers and even better if they are located a good distance apart.
Reasoning: avoid system downtime due to fire, water, power failure, natural disaster or human error.

2. If you agree that item 1 is reasonable, then RAC is eliminated as an option, leaving you with DG, Streams, Advanced Replication or another vendor's replication solution, such as Ixion's, Golden Gate, or Quest.

Each solution comes with its own set of advantages, disadvantages and costs.

**nabeel** · 05-23-2007, 10:31 AM

I'm not mainly concerned about a DR site right now; I just wanted to brainstorm on what good options we have to build something approaching MAA.

**marist89** · 05-23-2007, 01:07 PM

There's no such thing as zero downtime.

To limit your downtime, you have to limit your single points of failure. The closest thing to zero downtime is redundant nodes using RAC. In your case you could have one instance or two and separate them out to their primary use. However, then you're not really taking advantage of the scalability of RAC, you just have a passive cluster. However, RAC, by definition, needs shared storage. Your single points of failure here are storage subsystem, environmental, and maybe network.

Or, you could set up DG in maximum protection mode where each node participates in the transaction. Each node would have a dedicated storage subsystem. This is still a passive setup as only one node can actually do the work and failover is probably not automatic. Your single points of failure here are environmental and maybe network. You can limit your environmental risk by locating the standby node in another geographic location, but then you are more dependent on the network.

Or, you could have a Multi-master replication setup where updates happen on both nodes in different geographic locations. Your single point of failure in this case may be the network, or it may be nothing. However, multi-master replication is a complex topic not suited for the everyday DBA.

Or, you could use a combination of the above to achieve your goals. For example, you have two RAC Clusters in two geographic locations. Each RAC Cluster participates in multi-master replication. Each RAC has two standby dbs; one local and one in yet another geographic location.

The question is: are you happy with 99% uptime or 99.9999% uptime, and how much do you want to spend to get it?
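To put those percentages in perspective, here is a rough sketch of how much downtime each availability figure allows per year. It assumes a 365-day year and counts every kind of outage (planned or not) against the target; the function name is only illustrative.

```python
def allowed_downtime_seconds(availability_pct, days=365):
    """Seconds of downtime per period permitted by an availability target."""
    total_seconds = days * 24 * 60 * 60
    return total_seconds * (1 - availability_pct / 100)


for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
    secs = allowed_downtime_seconds(pct)
    print(f"{pct}% uptime -> {secs / 3600:.2f} hours ({secs:.0f} seconds) per year")
```

Under these assumptions 99% leaves roughly 87.6 hours a year, while 99.9999% leaves about 31.5 seconds, which is less than most failovers take.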

**PAVB** · 05-23-2007, 01:22 PM

Excellent post Jeff, you keep raising the bar.

Given the business requirements and the money to pay for it, I would go with your last option: local RAC replicating to a remote Disaster Avoidance location.

**nabeel** · 05-24-2007, 07:57 AM

Originally Posted by marist89

There's no such thing as zero downtime.

I agree, but there are combinations, each with its own cost of course, by which we can eliminate every single point of failure, right?

If we remove

storage subsystem, environmental, and maybe network

then what is left that we can enhance in the current environment? If we have Node1 and Node2 each running 2 DBs, should we put them on RAC? Or have Node1 running all 4 DBs and have Node2 replicated by Data Guard? Do you think this will help?

I was thinking of merging these existing databases into one. I guess there is a script shipped with Oracle that can be used to check compatibility, right?

But does replication at the storage level mean an OS-level thing or something from Oracle? Also, can the hot backup standby technique be achieved with Data Guard? True?

It's an open-ended discussion, and I really wanted input from friends around the globe on this kind of topic.

**marist89** · 05-24-2007, 12:06 PM

Originally Posted by nabeel

If we remove then what is left that we can enhance in the current environment? If we have Node1 and Node2 each running 2 DBs, should we put them on RAC? Or have Node1 running all 4 DBs and have Node2 replicated by Data Guard? Do you think this will help?

First off, if you're using RAC, you have a single point of failure - the storage subsystem. Many of today's high-end storage subsystems have redundant components, but if that storage array goes away because of a botched firmware upgrade or something else, you can kiss your db goodbye.
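A rough way to see why the shared array dominates is to model the cluster as redundant nodes in series with a single storage subsystem. The sketch below assumes failures are independent, and the availability figures are made up purely for illustration, not measurements of any real hardware.

```python
def parallel(*availabilities):
    """Availability of redundant components: up if at least one is up."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down


def series(*availabilities):
    """Availability of a chain of components: up only if every one is up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# Hypothetical figures, chosen only to illustrate the shape of the result.
node = 0.99      # one database server
storage = 0.999  # one storage array

# Two redundant nodes that both depend on the same array.
rac_shared_storage = series(parallel(node, node), storage)

# Two nodes, each with its own dedicated storage.
two_nodes_own_storage = parallel(series(node, storage), series(node, storage))

print(f"two nodes, one shared array : {rac_shared_storage:.6f}")
print(f"two nodes, each own storage : {two_nodes_own_storage:.6f}")
```

However many nodes you add in front of it, the first figure can never exceed the availability of the array itself. The second model is optimistic too: it ignores failover time, replication lag and the network between the nodes.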

In your scenario, you've got one active node (node1) and two inactive nodes (node2 and standby-node1). Seems like a waste to me. Why not have node1 and node2 active all the time and have standby-node1 as your standby database in a different location?

I was thinking of merging these existing databases into one

Sounds like a smart plan to me if you're going to use RAC.

But does replication at the storage level mean an OS-level thing or something from Oracle?

OS replication is where your storage subsystem copies changed blocks to another server. Not sure what the point is since the technology comes bundled with Oracle.

Also, can the hot backup standby technique be achieved with Data Guard? True?

All I can tell you to do is read. They're two separate things.

**ixion** · 05-24-2007, 06:25 PM

Jeff, I mostly agree. Great first post by the way.

Originally Posted by marist89

First off, if you're using RAC, you have a single point of failure - the storage subsystem. Many of today's high-end storage subsystems have redundant components, but if that storage array goes away because of a botched firmware upgrade or something else, you can kiss your db goodbye.

I totally agree with this! In fact I've had shared storage subsystem failures. Even some that have taken several hours to restart. And others that have been so horribly botched that some DB files had to undergo recovery.

Originally Posted by marist89

Sounds like a smart plan to me if you're going to use RAC.

Just be careful and test. Although Oracle touts RAC as a scalable solution, there are pitfalls to RAC as well. One thing that comes to mind is global locking.

Also, what happens in the RAC configuration when you hit a bug? Typically both sides of the cluster must be restarted, especially if the bug is in the lock management layer... So much for zero downtime with RAC.
