OakTable
by Mogens Nørgaard, Miracle A/S
I would like to introduce a few (hopefully new) acronyms to you: BAARF, PIP, YODA, CTU and LOBN. If you ever wondered if military service was harmful or not, look at my use of acronyms.
BAARF
On June 5th UNIX SIG had arranged a meeting in Birmingham. Among a string of excellent speakers with some very good presentations, they had also invited me to provide some contrast. Oh, and James Morle was invited to do an update on his excellent Sane SAN paper.
So James and I had a beer the night before the conference, and for whatever reason the discussion turned to RAID-5. Again. We didn’t really have a discussion about the technicalities, because there’s nothing to discuss, really. It was rather about why we’re still discussing it. We’ve been arguing the same points, fighting the same battles, and drinking the same beers in frustration afterwards, for 15 years or more.
Suddenly, back at the hotel, James writes down BARF. “What’s that?” I asked. Turns out it stands for Battle Against Raid Five. Then we had a cognac and decided two things: It should have an extra A for Any. And F should not only stand for Five, but also Four and … eh … Free, since RAID-5, 4 and 3 all have the parity problem. We haven’t worked out what to do about the dreadful RAID-6 which is still around in a few places. So we had another cognac and founded the BAARF Party. And BAARF now simply stands for Battle Against Any Raid F.
We announced the formation of the party at the SIG on the following day. For the BAARF declaration, please go to www.baarf.com where you can also see the logo James created. The logo is also proudly displayed on http://www.miracleas.dk and http://www.oaktable.net . You’re most welcome to use it anywhere you please, of course.
So why are we not prepared to discuss it anymore? Because everything that can be said has been said. No matter what, the laws of nature are still in place. The definitions of RAID are still in place. So we’re not discussing it anymore. Enough is enough. You can either be a member of the BAARF Party (it’s free as in free, not three) or not. Send me an email if you want to become a member. We will in time have splendid BAARF Party conventions for the members.
Our good friend Gaja Vahatneyhatneyhatney (or something to that effect), who’s a director in Oracle Enterprise Manager Development – and the author of the rather good book ‘Oracle Performance 101’ – once wrote something favourably about RAID-3. He shouldn’t have done that. So we are setting up a musical called ‘BAARF: The Musical’, which will have its World Premiere at the Database Forum at Lalandia in October. Gaja himself will star in the musical, which will also feature the shady organisation known as FEVER – Five Evil Vendor’s Eternal Rotherhood – as well as the OakTable Choir. The musical will be performed right after Dave Ensor’s speech at the Gala Dinner. It should rock!
Parity Is Pain (PIP).
With any RAID-F the parity stuff means a lot of extra IO on any write. And don’t ever buy disk systems based on cache, please.
LOBN and Stable systems
The Law Of Bigger Numbers (LOBN) is everywhere, and certainly in IT: If the number is bigger it must be better. Version 8.1.7.4 must be better than 7.3.4.2. 9.2.0.3.0 must be better than 9.2.0.1.0. 64bit must be faster than 32bit, right? Wrong. We do have some special numbering schemes to be aware of in the Oracle world. With 8i we learned that 1=5, 2=6, and 3=7 (8iR1 = 8.1.5, R2 = 8.1.6, R3 = 8.1.7). With 9i we learned that 1=0 and 2=2, so I guess it’s getting easier. On the other hand, the 9.0 release, aka 9iR1, is also often called nine-one when talking about the 9.0.1 release.
But apart from that, the LOBN is common, and for all sorts of reasons we get our surprises every time we upgrade or patch a version.
Life running a database in a production environment should be really, really boring. It’s so boring you wouldn’t believe it when we talk about mainframe environments. Then why isn’t it boring when we talk Oracle on Unix or Windows? Why do we have exciting jobs? Because the systems are not stable. That’s fine if you’re a DBA or SysAdmin, because it means work (as a director of a company that sells database services I’m not exactly complaining either). It’s less fine if you’re a company that just needs their database to run and run and run.
So I introduced my own definition of a stable or robust system at the UNIX SIG in Birmingham. A stable or robust system has run for A Long Time (ALT) without problems and has not been touched at all in its technology stack. I asked for help, and we had a show of hands, in Birmingham, and most of the 92 people present voted for ALT to be defined as 42 weeks rather than 42 days, so that’s part of my definition now.
As soon as you touch anything – really: anything – in the technology stack you cannot call it a stable or robust system any longer. You’ll have to wait ALT again in order to be sure of that. An upgrade from 9.2.0.1.0 to 9.2.0.3.0 is a change in the technology stack. So is a change to a script that changes the way you add partitions to a partitioned table. Or a new version of the JDBC driver. These are real examples, by the way, of systems going South after a smallish change. One of the examples mentioned above meant that a 700-user system with 12 CPUs and 12 GB of RAM and what have you ran out of memory in 30 minutes or less and had to be re-started. That went on for a few weeks.
So you never know what will hit you and when your system will suddenly become unstable. Oracle does regression tests on anything that changes the first three digits in the version number, but not on the fourth and fifth digit. And they don’t do inter-operability testing (testing between products), for instance on iAS. So it’s up to you to test it.
Have you noticed that it’s not possible anymore to find a system you can compare your own system with? 10 years ago that was fairly easy. Now there are so many components and options available that the number of combinations in itself rules out the chance that any two installations are just sort of similar. On top of that, there’s no way you can test whether an upgrade or patch will affect your system’s behaviour or not. Sure, you can test a few applications against it, but that’s all testing of functionality, not of performance or stability/availability. Only time (ALT) can show that. In the next column (the fourth) I’ll write about some of the things one can do in order to at least make it less unpredictable to upgrade or patch your system.
You Patch The Database
Since Oracle has the single database philosophy your system is down when applying patches or doing upgrades. Obvious, isn’t it? You don’t patch the instance, you patch the database. Oracle doesn’t have online patch capabilities, nor rolling upgrades. So you need to shut down Oracle in order to apply a patch or upgrade. Running RAC, for instance, gives you multiple instances, but still only one database.
Yes, you can use standby databases, Data Guard, symmetric replication, etc. But they all require that you run exactly the same version – down to the patch level. So now you have two databases or more that are down due to patching or upgrading. Please prove me wrong – I’d really like to know how it’s done. That’s why most shops will define their way out of it in contracts, making room for patching and upgrades.
But what about emergency patches like security patches or patches needed after putting on a new version of an important application that wrecks the system? Without some sort of online patch or rolling upgrade capability I don’t see how one can ever talk about true 25/8/370 or 24/7/365 Oracle based systems. Rdb and VMS can do it. DB2 and MVS on mainframes can do it. Apparently, SQL Server can now do some clever stuff with applying patch stuff on a system with two replicated databases or something – I haven’t looked into it yet.
YODA
YODA (thanks to Jonathan Lewis for this suggestion) was originally the YAPP ODA model where YAPP is Anjo Kolk’s good, old method for performance measurements in Oracle based systems (or any other system where you can get the timing information out). ODA stands for Overview, Diagnosis, Analysis and refers to the three levels of YAPP measurements you can do: On the system level (Overview), session/job level (Diagnosis) and SQL level (Analysis). It came up as an idea when Martin Berg from Oracle Denmark Consulting and I had a chat about the uselessness of system-wide data.
It is a tough one to swallow for most of us old-timers in the database business, but it’s necessary: You can never (as in 99.5% never) say anything meaningful about the reasons for a performance problem by looking at system-level data such as v$system_event, v$sysstat, bstat/estat, StatsPack, etc. If two experts are looking at the same data and arriving at two very different conclusions I suggest you’re not measuring the correct things, or you’re not measuring at the correct level. It can be witnessed when N economists arrive at N+1 conclusions based on the same numbers from the National Office of Trusted Statistical Observations (NOTSO) or whatever it’s called in various countries. It can also be witnessed when two experts (DBAs, consultants, whatever) arrive at very different conclusions based on the same StatsPack or OS data.
Cary Millsap’s splendid paper on Oracle operational timings is really an eye-opener on this topic – find it on www.hotsos.com or write to me.
About the Author
Mogens Nørgaard was with Oracle Support in Denmark for 10 years (three as an RDBMS analyst, four as head of RDBMS Support and three as head of Premium Services). He is co-founder and technical director of Miracle A/S, which provides consulting, support and training on Oracle and SQL Server, in Maaloev, Denmark. He can be contacted at mno@miracleas.dk
Oracle Scene Issue 15 Autumn 2003 / The UK Oracle User Group Journal
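Going back to Parity Is Pain: the "lot of extra IO on any write" can be put into rough numbers. What follows is a minimal sketch in Python, assuming the textbook case of a small write that changes a single block in a RAID-5 stripe, with no controller cache and no full-stripe write to help. The function names and the per-level I/O counts are illustrative assumptions, not measurements from any particular disk system.

```python
# Textbook physical I/O cost of ONE small logical write (a single block,
# no cache, no full-stripe write). Illustrative figures, not measurements.
SMALL_WRITE_COST = {
    "RAID-1/10": {"reads": 0, "writes": 2},  # write the block to both mirrors
    "RAID-5": {"reads": 2, "writes": 2},     # read old data + old parity, write both back
}


def physical_ios(level, logical_writes):
    """Total physical I/Os generated by a number of small logical writes."""
    cost = SMALL_WRITE_COST[level]
    return logical_writes * (cost["reads"] + cost["writes"])


def updated_parity(old_parity, old_data, new_data):
    """Read-modify-write update: new parity = old parity XOR old data XOR new data."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


def full_parity(blocks):
    """Parity over a whole stripe: byte-wise XOR of all data blocks."""
    parity = bytes(len(blocks[0]))  # starts as all zero bytes
    for block in blocks:
        parity = bytes(a ^ b for a, b in zip(parity, block))
    return parity


if __name__ == "__main__":
    for level in SMALL_WRITE_COST:
        print(level, physical_ios(level, 1000))
```

Under these assumptions, 1,000 small writes cost 2,000 physical I/Os on a mirror and 4,000 on RAID-5. The two reads are what `updated_parity` needs as input: the old data and the old parity have to come off the disks before the new parity can be written.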