Notes on high availability
HA has this mythical 100% uptime idea that business people love to brag about at the golf course. There are many reasons for trying to achieve this high standard, and they include:
- Regulatory: Your industry's regulations require you to meet availability targets.
- Law: Your business may need to be available for consumers because of local laws about your industry.
- Competition: You need to be available for your consumers because if you're not, your direct competitor will be.
So now you're building or rebuilding your plans, and what tools will you need in your tool chest? The answer depends on your need for uptime. When I get to the "Moats" portion of this article, I'll go over some pitfalls to your planning. You will always need:
- CPU that is faster than you think you need
  - Faster CPU reduces the need for downtime while processing and speeds up the backup process.
  - It also absorbs the overhead of capturing live transactions in very high availability situations.
- Backup devices
  - Make 'em as fast as you can afford (within reason)!
  - Remember, with disk today you can create virtual tape drives and back things up at bus speed, that is, the speed of your motherboard. The slowest thing is your media (disk drives, tape drives, and DVD and CD-ROM storage).
- Disk that is faster than you think you need
  - You'll appreciate this: Your disk farm may contain years' worth of disk drives that are slightly out of date. Rule of thumb: Because computers run almost all of the time without rest, you'll get about a three- to five-year life span from your disk. Keep in mind that newer technology also stores things far faster. Mix it with older technology, and you're still going at the speed of your slowest disk.
- Backup (e.g., BRMS)
- Homegrown tools to assist in failing over or recovering
- HA software
- Phones: No business can exist without a telephone where people can reach it.
- Environment: Where will you be placing your people? Is it livable under any circumstance? Great for a temporary location, but wouldn't want to live there? These are things that need to be planned for.
- People: Who will keep your company going? Who will get your company going?
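
The point about mixed old and new hardware can be put into numbers. What follows is only a rough sketch in Python: a backup moves no faster than its slowest stage, so the backup window is the data size divided by that slowest speed. The stage names and every figure here are made up for illustration; plug in measurements from your own shop.

```python
def effective_throughput_mb_s(stage_speeds_mb_s):
    """A backup pipeline moves data no faster than its slowest stage."""
    if not stage_speeds_mb_s:
        raise ValueError("at least one stage is required")
    return min(stage_speeds_mb_s.values())


def backup_window_hours(data_gb, stage_speeds_mb_s):
    """Estimate the hours needed to push data_gb through the pipeline."""
    speed_mb_s = effective_throughput_mb_s(stage_speeds_mb_s)
    seconds = (data_gb * 1024) / speed_mb_s
    return seconds / 3600


# Hypothetical sustained speeds in MB/s, for illustration only.
stages = {
    "old_source_disk": 60.0,
    "system_bus": 2000.0,
    "virtual_tape_on_disk": 250.0,
}

slowest = effective_throughput_mb_s(stages)
window = backup_window_hours(2000, stages)
print(f"Effective speed: {slowest} MB/s")
print(f"Estimated window for 2000 GB: {window:.1f} hours")
```

Swap the old source disk for a faster one in the `stages` dictionary and the window shrinks, while speeding up the bus alone changes nothing.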
Gloats
So what are we going to be able to gloat about when done? Well, when we talk about HA we are really talking about business continuity, and you need people for that.
Be proud of your accomplishments and tell everyone! Really, everyone needs to know where to turn for the various things they'll need while the smoke may be billowing up around you. People need to know the "company hotline," and calling it should be standard operating procedure whenever they need information about current events. All this takes is a voice mail account that can be forwarded to a cell phone if you're truly experiencing a disaster.
"Should I show up?" is a common question following a disaster. Communication while the disaster is happening can be next to impossible, so you need a good phone system that includes alternate cell phones and satellite phones.
You are also going to need a non-nerdy person to type up detailed instructions on how to recover. Why? The person breaking open the "In case of emergency: break glass" box may not be a technical person -- it will be whoever is available or convenient.
Moats
This is the portion that people don't like: What may break down or what to look out for. First off, the creative mind will win this war at times. You're going to need to think of the many things that may not bring your business down, but may dent your functionality. Stupid things like not having toner for a fax machine can stop an important portion of your communications. Many things can be prevented by maintenance that must be performed. These include:
- Operating system fixes
- Up-to-date hardware firmware
- Disk free space
- UPS maintenance: How are the batteries doing? Is it replacement time? What about capacity -- have you planned for at LEAST 25% to 50% more load?
- Dual UPS?
- Generator: Great that you have one, but when was the last time you serviced and started it? How about putting an electrical load on it? What do you want to run on your generator and UPS? Prioritize your power to make it last longer.
- Servers and plugs: I remember a seminar when IBM was showing me a really redundant system with three power supplies, and backup fans. It was great until I asked "Hey, why is there only one plug?"
- Power: You need amps to run big servers. How many servers are on one circuit? What size are the circuits?
- Fuel for the generator: You must plan on how you will re-fuel this crucial part of your infrastructure. I know a company that bought an old Army surplus fuel truck during the aftermath of a hurricane.
- Communications: How will your business get to and from the internet or phones?
- Air conditioners: General health and maintenance is necessary. Do you have a portable A/C hanging around just in case your primary A/C unit dies? Do you have alarms for humidity and temperature in your server room?
- Changing world environment: Are you in a flood zone? Maybe it's time to move?
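
A few of the checks in this list are simple arithmetic that can be scripted. Here is a minimal Python sketch covering three of them: disk free space, UPS headroom, and amps per circuit. The function names, wattages, and voltage are my own assumptions for illustration, and the amps calculation uses the simplified amps = watts / volts, which ignores power factor, so treat the result as a starting point for a conversation with your electrician.

```python
import shutil


def disk_free_percent(path):
    """Percentage of free space on the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.free / usage.total


def ups_capacity_needed_watts(current_load_watts, headroom=0.25):
    """Capacity to plan for: today's load plus headroom (0.25 means 25%)."""
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    return current_load_watts * (1 + headroom)


def circuit_load_amps(device_watts, volts):
    """Total current on one circuit, simplified as amps = watts / volts."""
    return sum(device_watts) / volts


# Hypothetical figures, for illustration only.
print(f"Free disk here: {disk_free_percent('.'):.1f}%")
print(f"UPS at 25% headroom: {ups_capacity_needed_watts(4000, 0.25)} W")
print(f"UPS at 50% headroom: {ups_capacity_needed_watts(4000, 0.50)} W")
print(f"Circuit load: {circuit_load_amps([800, 800, 800], 120)} A")
```

Compare the circuit figure against the breaker size for that circuit, and compare the UPS figures against the rated capacity of the unit you actually own.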
The above list can help you prevent a disaster. However, your biggest enemy is an environment so bad that you need to flee! Why? The police don't care if you have a generator, UPS, and communications. They will escort you out of your building because even they need to evacuate in some disasters. You may also have to flee when your power will be out for weeks. Keeping a generator going that long requires maintenance, and you may need a second or third generator, so can you get one?
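
To get a feel for what a weeks-long outage means for a generator, here is a rough Python sketch. It does two things: it picks which loads to keep with a simple greedy pass by priority, and it estimates how many tanks of fuel an outage would consume. The load names, wattages, tank size, and burn rate are all hypothetical, and a real generator burns fuel at a rate that varies with load, so check the manufacturer's figures before trusting any of this.

```python
import math


def select_loads(loads, generator_watts):
    """Keep the most important loads that fit within generator capacity.

    loads is a list of (name, watts, priority) tuples; a lower priority
    number means more important. This is a simple greedy pass.
    """
    kept, used_watts = [], 0
    for name, watts, _priority in sorted(loads, key=lambda item: item[2]):
        if used_watts + watts <= generator_watts:
            kept.append(name)
            used_watts += watts
    return kept, used_watts


def tanks_of_fuel_needed(outage_days, tank_gallons, gallons_per_hour):
    """Full tanks consumed over the outage, assuming a fixed burn rate."""
    hours_per_tank = tank_gallons / gallons_per_hour
    return math.ceil(outage_days * 24 / hours_per_tank)


# Hypothetical loads and generator figures, for illustration only.
loads = [
    ("servers", 3000, 1),
    ("air_conditioning", 2500, 2),
    ("phones", 200, 1),
    ("office_lighting", 1500, 3),
]

kept, used_watts = select_loads(loads, generator_watts=6000)
print("Running on generator:", kept, f"({used_watts} W)")
print("Tanks for a 14-day outage:", tanks_of_fuel_needed(14, 100, 2.5))
```

The tank count is the number to bring to your fuel supplier when you work out how refueling will happen.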
You may also want to have a "Disaster Kit" containing packing tape, rope, leather gloves, hammer, rubber mallet, screwdrivers, screws, mollies, extra wires, networking wires, phone wires, flashlight, list of important phone numbers, satellite phone, etc. Anything you think you'd need handy if you were hooking up your equipment potentially in the dark.
Your HA plans need to be thought through now because preparation and planning take time. Testing the plan regularly takes time too, and the plans must be updated whenever new technologies or software are added to your environment.
Disasters are a reality these days. Your job is to keep calm, and keep your company functioning. That's what you're, like, paid to do.
This was first published in May 2008