If you've been holidaying in Siberia or similar places for about a year, you have probably not talked to an Oracle sales rep about RAC yet. But you will no doubt find a voice mail waiting for you when you turn your mobile phone on again after returning home from the vacation. RAC is being pushed very hard by Oracle. You will get high availability, incredible scalability, a much improved personal life, the ability to partition workloads, buy cheap Linux servers and what have you. It sounds pretty good. How can anyone say no to that kind of offer?
Fun Fact: The GES/GCS code was already in Oracle version 5. Bjørn Engsig, who has worked with Oracle source code since 1983, found out about this and implemented his own, very crude, lock manager on a Danish Unix system running version 5. He got it to work, but only for demonstration purposes: his home-written lock manager basically used database-level locking, which is not really useful.
PING
Oracle had to make sure that a buffer wasn't modified by two different processes at the same time (which one should then be written to disk later?). So instead of just serialising the access to one copy of the block in one buffer (which can be achieved with the combination of hash buckets, chains and latches that we know so well), Oracle had to coordinate several copies in several buffer caches across nodes. This was achieved using a new kind of locking (called Parallel Cache Management or PCM locks), which was coordinated across nodes/instances using the DLM and various background processes. When there was a conflict, ie the same block/buffer was requested by more than one instance, the exclusive lock held by the first holder had to be down-graded to a shared lock held by all holders. This down-grade/sharing could only be done by first making sure that all holders were seeing the same image of the block/buffer. So the copy of the block that was in the buffer cache of the first holder was written to disk, and then that copy of the block was read into the other buffer caches.
The term ping was introduced to describe other instances requesting a buffer held exclusively by one instance. Pinging via disk is slow. If you had an index on a column that kept growing on the right-hand side, the right-most leaf block could get pinged back and forth non-stop between instances. Pinging via disk could kill your system's performance. The workarounds included data partitioning, temporary tablespaces (introduced in 7.3) where each instance had its own latch instead of a shared dictionary lock (the ST lock; remember the ORA-1575?), reverse indexes (7.3), which meant that it was random which leaf block you would hit even if your index values were monotonically increasing, and other tricks.
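Why reversing the key helps against a right-growing index can be shown with a small Python sketch. It is a toy model, not Oracle's implementation: the fixed 4-byte integer key, the 16 leaf blocks and the equal-range mapping from key to leaf block are all assumptions made for illustration. The only thing taken from the technique itself is that the bytes of the key are reversed before it is placed in the index.

```python
from collections import Counter

KEY_BYTES = 4          # assumption: keys are fixed-width 4-byte integers
NUM_LEAF_BLOCKS = 16   # assumption: a tiny index with 16 leaf blocks


def leaf_block(index_key: int) -> int:
    """Toy placement: split the key space into equal ranges, one per leaf block."""
    key_space = 256 ** KEY_BYTES
    return index_key * NUM_LEAF_BLOCKS // key_space


def reverse_key(key: int) -> int:
    """Reverse the byte order of the key, as a reverse index does before storing it."""
    return int.from_bytes(key.to_bytes(KEY_BYTES, "big")[::-1], "big")


def blocks_hit(keys, reverse=False) -> Counter:
    """Count how many of the given keys land in each leaf block."""
    return Counter(leaf_block(reverse_key(k) if reverse else k) for k in keys)


# 256 consecutive values, like a sequence-generated key inserted from several instances
new_keys = range(1_000_000, 1_000_256)

plain = blocks_hit(new_keys)                    # every insert hits the same leaf block
scattered = blocks_hit(new_keys, reverse=True)  # inserts spread over all leaf blocks

print("leaf blocks touched, normal index: ", len(plain))
print("leaf blocks touched, reverse index:", len(scattered))
```

In this model the consecutive keys all fall into one leaf block with a normal index, which is the block that would be pinged between instances, while the reversed keys are spread evenly over all 16 blocks. The price is that range scans on the key no longer read neighbouring leaf blocks.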
By the way: Oracle had introduced their own, generic Lock Manager (LM) mechanism in Oracle 8.0, signalling that they would soon be pretty independent of the DLM code from the various vendors. You could say that the LM was the equivalent of the Oracle source code being OS independent and then having a small layer in the code known as the OSD (Operating System Dependent). With the introduction of the integrated LM, Oracle only had to manage a small OS-dependent layer for each port; the rest was generic code. Respect again to the engineers at Oracle Development.
Second, clever tricks have been put into the code in order to make all sorts of coordination tasks between instances faster, easier and sometimes even avoidable. The best tuning is, as always, not to do it at all. If Oracle doesn't have to send a copy of a buffer across to another instance, it will try not to. Does that mean that RAC will give you a better life? Yes and No. Or as any good consultant will say: "It depends." Here are the things to consider before you go RACing all over the world: Price, availability, scalability, manageability, skills required and troubleshooting.
Price
This section talks about Oracle list prices. Discounts may vary. Oracle Enterprise Edition costs US$40.000,- per CPU or US$800,- per Named User Plus (NUP), as it's called now. RAC costs 50% on top of that, which means US$60.000,- per CPU or US$1.200,- per NUP. As I write this, I'm aware that RAC has been offered at a 50% discount, ie US$10.000,-, on the American market since around January or February. But it's not something officially reflected in the global price list. (By the way: The Partitioning option costs 25% on top of the CPU/NUP price. OLAP and Data Mining are 50% each. Spatial, Advanced Security and Label Security are 25% each. Diagnostics Pack, Tuning Pack, Change Management Pack and Management Pack for SAP R/3 are US$3.000,- and US$60,- per CPU/NUP.)
So let's play around with Larry's vision of cheap Intel-based Linux clusters. Let's buy those two cheap, 4-CPU Intel boxes and put them together in a cluster with Oracle9i and RAC on top:
Price for the hardware: About US$15.000,- or so.
Price for the OS (Linux): About US$0.50,- or thereabout (it depends!)
Price for Oracle w/ RAC: US$480.000,-
So that's half a million to Oracle. Put another way: It's 1 dollar to the box movers for every 32 dollars Oracle gets. Psychologically it's hard for the customers to understand that they have to buy something that expensive to run on such cheap hardware. The gap is too big, and Oracle will need to address it soon. There's nothing like RAC on the market, but that doesn't mean you have to buy RAC. I usually joke that it's like buying a car for US$10.000,- that has all the facilities you need from a good and stable car. Airbags and ABS brakes are US$500.000,- extra, by the way. Well, airbags and ABS are wonderful to have and they increase your security. But it's a lot of money compared to the basic car price.
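The arithmetic behind those numbers is easy to redo for your own configuration. The Python sketch below uses only the list prices quoted above (Enterprise Edition per CPU, the 50% RAC uplift, two 4-CPU boxes, about US$15.000,- of hardware); the function and variable names are made up for the example, and real quotes will differ with discounts.

```python
EE_PER_CPU = 40_000        # Enterprise Edition list price per CPU, US$
RAC_UPLIFT_PCT = 50        # RAC costs 50% on top of the EE price
HARDWARE_TOTAL = 15_000    # two cheap 4-CPU Intel boxes, US$ (approximate)


def license_per_cpu(base: int = EE_PER_CPU, uplift_pct: int = RAC_UPLIFT_PCT) -> int:
    """List price per CPU for Enterprise Edition with the RAC option."""
    return base * (100 + uplift_pct) // 100


def cluster_license(nodes: int, cpus_per_node: int) -> int:
    """Total list price for licensing every CPU in the cluster."""
    return nodes * cpus_per_node * license_per_cpu()


license_total = cluster_license(nodes=2, cpus_per_node=4)
ratio = license_total // HARDWARE_TOTAL

print(f"Per CPU with RAC: US${license_per_cpu():,}")
print(f"Oracle w/ RAC:    US${license_total:,}")
print(f"Oracle dollars per hardware dollar: {ratio}")
```

Changing `nodes`, `cpus_per_node` or the hardware price shows how quickly the license dominates the bill: with these inputs the eight CPUs come to US$480,000, which is 32 times the hardware.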
There are other indirect costs associated with going RAC: You'll need more skills in your organisation, both with respect to RAC and clusters. If your organisation is not familiar with clusters, you'll need to learn a lot, for instance. You'll also have to consider having a development environment (and maybe a test environment) that consists of both a cluster and RAC. Sometimes Oracle will let you run Oracle for free on those systems, sometimes not (it depends). RAC is very cool technology. But it's expensive.
pay full license for the standby nodes if you use them more than 10 days a year. And always full price for the Data Guard nodes. You could of course also create something fancy and creative yourself. We used to do standby databases back in version 6 by applying archive logs manually on another database in constant recovery mode. Lots of issues, of course. But it was done. Or you could use LogMiner to extract DML from the archive logs and apply it on a standby database. Or you could have system triggers that caught all DDL and DML on a system and put them in load files that were then loaded in real time, near real time or much later on another system. Those kinds of alternatives will need a little work, but they have one thing in common: You could even do it with Oracle Standard Edition, which means that the price drops from US$40.000,- per CPU to US$15.000,-.
Could you duplicate your database, eg with Oracle Data Guard, so your users could be running on one database while the other is being patched? I'm sure it's possible, but I can't see how, since Data Guard requires you to be on the exact same patch level on both Oracle and the OS. So it would appear that you need to shut down and patch both databases at the same time. If it's supported to let Data Guard run while upgrading the primary database to a higher version or patch level, I'd be interested in the details. Did you notice what was missing from the list of actions above? The client didn't take a backup before upgrading. For very good reasons Oracle recommends that you perform a full and valid backup before applying patches or upgrading your system. This client didn't, but you should. Oracle might even recommend (again, for very good and valid reasons) that you take a new backup when you're done with your upgrade/patch actions. So there's RAC plus scheduled downtimes. But there are also emergency patches that need to be applied fast. This could be due to an error encountered in the environment or it could be a security patch. The time needed to apply emergency patches is hard to plan. With RAC you get duplicate nodes, duplicate instances and one database. That database can be hit by dictionary corruptions (I'm sure we've all seen one) or it can be hit by the need for patching and upgrading. That's downtime for your whole RAC system.
Scalability
Scalability is of course much better with RAC than with OPS, and you don't need as many fancy tricks in order to make it scale well. But. If you remove a bottleneck in any system (IT or other), a new bottleneck will now be present. It might be smaller than the old bottleneck (hopefully and usually, but not always), but it's still a bottleneck. With RAC, pinging is done using CPU resources. Yes, that's much faster than disk resources. But what if you are strapped for CPU in your system and RAC therefore cannot get enough CPU for the pinging? Mario Broodbakker from Digital/Compaq/HP in Holland has done some interesting benchmarks on RAC that prove two things: It's important to have enough CPU left for the RAC pinging activity. And you can still get into situations where traditional OPS workarounds are needed (data partitioning, etc.) in order to achieve maximum performance. Even the wonderfully complex and mythic GC_FILES_TO_LOCKS parameter can be useful at times. It has deliberately been removed from the documentation because it was seen by Oracle Marketing to send the wrong message.
So you say: "Of course you should have the necessary CPU available for the RAC pinging activities. Hey, you should always size your system professionally." Yes. But what if you suddenly have a bunch of batch jobs or batch-like processes fighting over the available CPU resources (PX, DBMS_JOB, backup, file copying, whatever)? Yes, you can plan for many situations, but sooner or later your system will be running at 100% CPU, and that's when you'll see some really bad performance with RAC. If you're interested in Mario's whitepaper about his RAC testing, let me know, and I'll be happy to send it to you. RAC Development is planning to address several of the issues he has pointed out, or has already addressed them. Yet the lack of required CPU resources is not something Oracle or RAC currently can do anything about.
Manageability
OPS has actually never been that bad to manage. Assign an instance number to each instance, start up and shut down the instances in the same order every time, create simple scripts for these things, and you're pretty much rolling. It's now possible to do very clever things with groups of instances, and OEM has been greatly enhanced to handle RAC, but most customers will still run a two-node cluster with two Oracle instances on it and can do fine with the good, old features from the OPS days. But it still requires more skills and more time to manage RAC than not. Added complexity means additional skills and additional time. You can of course define your way out of it by calling it planned downtime or maintenance or service time instead of plain downtime. The end goal, though, is often to have the database (or rather the applications that depend on the database) available most of the time.
Skills Required
I have already touched on this in several places in the paper, so let's just repeat here that it's not only RAC skills that are needed, but also (and probably most of all) cluster skills if your organisation is going RAC. There's one external Oracle RAC class (three days) and one so-called DSI (Data Server Internals) class for internal Oracle consumption available out there. There are also RAC classes available from various external companies. And there are lots of other people out there who know about OPS and RAC. Listen carefully to the bitter, twisted old men who've worked with OPS. When RAC is pushed to the limit, you could still need to do the same things that were required with OPS.
Troubleshooting
Ah yes, troubleshooting. I've seen many clusters that just froze for no apparent reason in my time. It's always possible to make the OS or cluster software dump a trace/log file when it happens. The resulting trace/log file from the cluster will normally be the size of Texas, and only one or two people in the entire vendor organisation can truly understand them, you will be told. Then the files (often with sizes measured in GB) are shipped to the vendor, and some months later they will report back that it wasn't possible to pinpoint the exact reason for the complete cluster freeze or crash, but that this parameter was probably a bit low and that parameter was probably a bit high. That's what always happens. I have never (really: never) seen a vendor who could correctly diagnose and explain a hanging cluster or a cluster that kept crashing. As to Oracle troubleshooting, I'm not so worried. Oracle will either have a performance problem, which is easy to diagnose using the Wait Interface, or you'll get ORA-600 errors that are fairly easy to diagnose, although you'll need to spend the required 42 hours logging and maintaining an iTAR or SR or whatever the name is these days. In other words: Finding out what's wrong (if anything) in Oracle is much easier than finding out what's wrong with a cluster.
Conclusion
If you have a system that needs to be up and running a few seconds after a crash, you probably need RAC. If you cannot buy a big enough system to deliver the CPU power and/or memory you crave, you probably need RAC. If you need to cover your behind politically in your organisation, you can choose to buy clusters, Oracle, RAC and what have you, and then you can safely say: "We've bought the most expensive equipment known to man. It cannot possibly be our fault if something goes wrong or the system goes down." Otherwise, you probably don't need RAC. Alternatives will usually be cheaper, easier to manage and quite sufficient. Now please prove me wrong.
Mogens Nørgaard (mno@miracleas.dk) was with Oracle Support in Denmark for 10 years (three as an RDBMS analyst, four as head of RDBMS Support and three as head of Premium Services). He is co-founder and technical director of Miracle A/S (www.miracleas.dk), which provides consulting, support and training on Oracle and SQL Server, in Maaloev, Denmark. First claim to fame: First manager within Oracle to demand that his team (about 40 people in Premium) use the YAPP performance diagnostics method created by Anjo Kolk. Second claim to fame: The OakTable Network (www.oaktable.net) was named after his dining table, where some of the better Oracle scientists will gather a couple of times each year. Mogens and his co-director Lasse (lch@miracleas.dk) will use the profits from Miracle A/S to start up a micro brewery that can stop Carlsberg from taking over the world. He believes Carlsberg is the Danish equivalent to the American Budweiser. If nothing else is available, though, he'll drink both.