As a manager responsible for your company's computer
system, what's the worst possible nightmare that could happen?
Without a doubt, it would be a computer disaster
in which irretrievable data loss results from an event that was within your control!
Having made this statement, I would like to further say that it is the responsibility of
every computer manager to research and implement a backup strategy that PREVENTS
DATA LOSS and ENSURES THE ABILITY TO RECOVER AFTER A DISASTER!
What could be a worse scenario than trying to explain to your
boss, after a disaster, why adequate safeguards were not in place?
The first step in disaster prevention is to define your requirements. There are two basic
designs for backup: centralized and decentralized. Both methods have advantages and
disadvantages and affect capacity planning. Budget and personnel constraints frequently
push a decision toward centralized rather than decentralized backup simply because an
organization does not have enough qualified system operators to administer backups. On the
other side of the coin, network constraints can push a decision towards decentralized
backups due to lack of sufficient bandwidth to perform network backups. Designing and
implementing the right hardware and software solution is the key to a successful backup
and disaster recovery strategy.
Buying appropriate hardware and software for backup is important but not always a critical
element in a backup strategy. The choices of hardware are often limited by budget. My
advice is to always buy a bigger, faster tape drive than what you think you need now.
It's amazing how fast you can outgrow a tape drive. A good example of this is the
difference in capacity and speed between DDS-2 and DDS-3 4mm technology. DDS-2 drives
offer 4GB native capacity with up to 510KB per second operation. The newer DDS-3 drives
offer 12GB native capacity with 1.2MB per second operation. With smart shopping, the cost
differential between the drives is less than $300. Why pay $1,200 for lesser
capacity and speed when $1,500 buys you top of the line? It's a false economy to buy
a DDS-2 drive when 9GB hard drives are getting cheaper by the week.
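To put rough numbers on that comparison, here is a small sketch that estimates how many tapes and how many hours a full backup of one 9GB hard drive would take on each drive, using only the native figures quoted above. It assumes decimal units (1GB = 1,000,000KB, 1.2MB = 1,200KB), no compression, and a drive that streams at its rated speed the whole time. The function name `backup_estimate` is made up for this illustration.

```python
import math

# Native (uncompressed) figures quoted in the text above.
DRIVES = {
    "DDS-2": {"capacity_gb": 4, "rate_kb_per_s": 510},
    "DDS-3": {"capacity_gb": 12, "rate_kb_per_s": 1200},
}

def backup_estimate(data_gb, capacity_gb, rate_kb_per_s):
    """Return (tapes needed, hours of streaming) for one full backup."""
    tapes = math.ceil(data_gb / capacity_gb)
    # Assumption: decimal units, so 1 GB = 1,000,000 KB.
    seconds = data_gb * 1_000_000 / rate_kb_per_s
    return tapes, seconds / 3600

for name, spec in DRIVES.items():
    tapes, hours = backup_estimate(9, **spec)
    print(f"{name}: {tapes} tape(s), about {hours:.1f} hours")
```

Under these assumptions the DDS-2 drive needs three tapes and close to five hours for a single 9GB disk, while the DDS-3 drive needs one tape and a little over two hours. Real throughput will vary with the data and the host system.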
It's probably accurate to say that just about every backup application
available will do an adequate job of backup; however, not every backup application is
equal when it comes to disaster recovery. Too frequently, backup shoppers pay more
attention to the backup process than to how easy and how fast it is to
restore a down system. Backup happens every night. No one really looks at it when
things are going right; it's only when a critical restore is required that
inadequacies become glaringly obvious! My advice is to look past the interface and
determine whether the backup program offers easy-to-use, flexible operation with all the
functions and features necessary. In today's market, backup software must offer
both traditional file-by-file backup and the latest "image" technology. Only by
combining the two technologies does an organization get the best of both worlds. A
file-by-file backup goes through the operating system with guaranteed file integrity. An
image backup bypasses the operating system and does a bit level backup of a hard drive.
The latter is perfect for quickly restoring a failed operating system hard drive, upgrading
a hard drive to a larger size, and cloning new systems, but is ill-advised for backing up key
files like relational databases.
Reducing or eliminating single points of failure is a highly desirable objective in
designing and implementing a disaster prevention strategy. This can mean buying redundant
hardware (e.g. two tape drives and two SCSI buses) and software that is capable of
switching over a failed backup from one device to another. While many organizations focus
their budget on buying RAID to eliminate disk drive failures, not many spend much money at
all to protect their backups from failing. Another form of backup redundancy is to perform
a full backup to tape and an incremental backup to online disk. This adds the
protection of what is essentially an off-site backup by "pushing" the
incremental backup to a secure network disk drive, either in a fireproof area of the
building or in an adjacent building in a campus environment. Fault tolerance also includes
tape duplication for organizations with two tape drives. Tape duplication can be either a
hardware or software option.
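For readers who want to see what the incremental half of the tape-plus-disk arrangement above involves, the sketch below shows one common way to pick the files for an incremental backup: select everything modified since the last full backup. It is only an illustration, not how any particular backup product works. The function name `files_changed_since` is hypothetical, and comparing modification times is an assumption (some products use archive bits or change journals instead).

```python
import os

def files_changed_since(root, last_full_backup_time):
    """Yield paths under root modified after the given epoch timestamp."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > last_full_backup_time:
                    yield path
            except OSError:
                # The file vanished or is unreadable; skip it in this sketch.
                continue
```

The resulting list is what would be copied to the secure network disk, while the full backup continues to go to tape.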
Another form of fault tolerance that is often overlooked by
companies looking to purchase software is the method by which the backup application
restores files from media. Typically, a quality third-party backup program creates a
catalog of the files it backs up, which is updated after each backup. What is less
appreciated is that some backup programs are literally useless if their catalog is
corrupted or destroyed. What good is an off-site tape if you can't restore anything
from it because the catalog for the tape was destroyed in a natural disaster or was
sabotaged by an unhappy employee? Granted, a utility may be provided to rebuild a catalog
from a tape, but this potentially can take hours to perform before a restore can be
initiated. How can this be avoided? Make sure that the backup application writes a
catalog both to online disk, for fast restores, and to the output media, allowing restores
that are almost as fast as those from online disk. Additionally, having a catalog on the media
allows a tape to be sent to another site for restoration without the time-consuming and
tedious wait that may be required to first build a catalog from the tape.
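The dual-catalog idea can be sketched in a few lines. In this illustration the catalog is a simple mapping from file name to position on the tape, saved as JSON in two places: one path stands in for online disk and the other for the backup media. The JSON layout and the names `write_catalog` and `load_catalog` are assumptions made for the example. A real product would write the media copy to the tape itself in its own format.

```python
import json
from pathlib import Path

def write_catalog(entries, disk_path, media_path):
    """Write identical catalog copies to online disk and to the backup media."""
    payload = json.dumps(entries, indent=2)
    for target in (disk_path, media_path):
        Path(target).write_text(payload)

def load_catalog(disk_path, media_path):
    """Prefer the online-disk copy; fall back to the copy stored on the media."""
    for source in (disk_path, media_path):
        try:
            return json.loads(Path(source).read_text())
        except (OSError, ValueError):
            continue  # missing or corrupted copy: try the next one
    raise RuntimeError("no usable catalog; a rebuild from tape would be needed")
```

If the online copy is lost or damaged, the restore proceeds from the copy on the media. The slow rebuild is needed only when both copies are gone.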
Lastly, let's discuss tapes. How many backups have
failed because the operator simply forgot to put a tape in the drive for nightly backup? A
good software application should be able to test for this condition some number of minutes
before a nightly backup is scheduled to occur and, if found to be a problem, send an email
to the backup administrator on call. If the software tape status check finds a tape is in
the drive and starts the backup, does it check the tape to make sure that it is not
inadvertently overwriting last night's backup because the operator either forgot to
exchange the tape or replaced it with an "unexpired" tape? This creates a
dilemma. A smart backup program should neither overwrite an unexpired tape nor fail
to perform the backup. (As discussed previously, if you are using a program that writes to
both tape and disk, this would not be a problem.) In such a case, the backup program
should space to the end of the "wrong" tape and attempt an automatic append
operation. A typical full backup clears the tape before starting a new backup by starting
from the first block of the tape while an append spaces to the end of the last backup
before starting its write operation.
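The tape checks described above boil down to a three-way decision, sketched here. The `Tape` record, the `plan_nightly_backup` function and the returned action names are all hypothetical. The sketch also assumes that each tape carries an expiration date and that a tape counts as expired on that date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Tape:
    label: str
    expires: date  # data on the tape must be kept until this date

def plan_nightly_backup(tape: Optional[Tape], today: date) -> str:
    """Decide what to do with whatever is (or is not) in the drive."""
    if tape is None:
        return "alert"      # no tape loaded: notify the on-call administrator
    if today < tape.expires:
        return "append"     # unexpired tape: space to end of data, then write
    return "overwrite"      # expired tape: start from the first block
```

Running this check some minutes before the scheduled backup leaves time for the "alert" case to reach the administrator. The "append" case ensures that a wrong tape neither loses last night's data nor skips tonight's backup.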
No disaster recovery plan can exclude off-site media storage. How can an organization
fully protect itself against a natural or man-made disaster without having an off-site
backup available for recreating their computer environment from scratch? Off-site storage
can be a simple daily exchange of tapes: last night's backup tape for the tape from the
night before. At one time, a tape exchange could have been a hundred pounds of Mylar tape on
huge open reels. Now it's more like a small cassette that will fit in your shirt
pocket. As long as that tape is somewhere other than on the immediate premises, the
likelihood of surviving a disaster is greatly enhanced. Every major city in the U.S. has one
or more companies that make a living by picking up tapes from businesses like yours and
storing them in a safe, secure, media-conditioned environment and then returning them for
reuse after some number of days. If off-site media storage is not currently in your backup
strategy, just ask yourself how your business would get back to work after a fire guts
your building and those "secure" tapes in the fireproof safe are rendered
unusable because the natural gas line ruptured and exploded.
Analyzing downtime and data loss versus backup protection costs is a key management
function. I believe that many upper-level managers find funding a worthy backup plan too
expensive until they have suffered through a disaster. An interesting statistic is that the
majority of burglar alarms are installed after the first burglary. Most would agree that
it would have been far cheaper to have installed a prevention and detection system before
rather than after. A business friend of mine had a thief break into
his business and steal all of the disk drives right out of the computer. The point is this:
don't put off designing and implementing a backup and disaster recovery plan until
after the fact. Like death and taxes, every business will suffer a computer disaster and
loss sooner or later. Only by adequately planning against the worst case scenario can you
protect your business against the extreme case of unrecoverable data loss.
Written by Morgan Edwards, President and CEO of UltraBac Software.