For mission-critical applications, such as e-commerce applications, you might
not be able to afford the downtime involved in restoring a server. In that case,
consider using replicated servers or clustering as a supplement to backups. This
adds redundancy, so if one of your servers goes down, users will still be able
to access the same application and data on the remaining servers. It also allows
you to restore the downed server without interrupting service.
Be sure to use time synchronization on your servers. Network Time Protocol (NTP)
is a well-established standard for providing time synchronization services with
implementations on most platforms, including Windows, Unix, and many network
appliances, such as routers and switches. A complete explanation of NTP is
beyond the scope of this article, but in general, you set up an NTP server on
each of your LANs (see Resources). Each of these servers receives its time
information from another parent NTP server on your WAN, or from a public server
on the Internet. The NTP clients, in this case your application servers, request
time information updates periodically from the local NTP server. Without time
synchronization, it's difficult to tell which data is the most current. Also,
accurate time information in log files is critical when you attempt to determine
the sequence of events leading to a failure. This, in turn, helps you prevent
the failure from recurring.
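To make the idea of time synchronization more concrete, here is a minimal sketch of the clock-offset arithmetic an NTP client performs on the four timestamps of one request/reply exchange. The function name and the sample numbers are illustrative assumptions. This is not an NTP client, and in practice you would rely on your platform's NTP implementation.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one exchange.

    t0: client clock when the request was sent
    t1: server clock when the request arrived
    t2: server clock when the reply was sent
    t3: client clock when the reply arrived
    All four values must use the same unit (milliseconds here).
    """
    # Average of the two one-way differences; assumes a symmetric network path.
    offset = ((t1 - t0) + (t2 - t3)) / 2
    # Total time on the wire, excluding the server's processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical exchange: the client clock runs 5,000 ms behind the server.
offset, delay = ntp_offset_and_delay(100_000, 105_100, 105_150, 100_250)
print(offset, delay)
```

A positive offset means the client clock is behind the server and should be moved forward by that amount.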
Review the recommended backup and recovery procedures for each of your software
components. Most commercially available server applications include detailed
white papers or sections in their manuals on backup and recovery options and
procedures. Be sure to examine all these resources when planning and
implementing your own strategy.
Create a Recovery Strategy
Once you have a backup strategy, you need a plan should you ever have to use it.
When you create your recovery strategy, start with a plan that includes
everything required to restore the application completely, from the ground up,
including the network, hardware, and software. This can be daunting for
applications that have been in place for some time. For this reason, you might
want to consider consulting external parties that specialize in backup and
recovery to assist you. Several companies provide a range of support, from
consulting to complete hardware and software solutions and services for disaster
recovery. Determine which spare hardware components can be kept on site so
they're readily available. Ideally you should have a replacement for every
component, but this is rarely affordable. Evaluate each spare part based on
cost, the likelihood of failure, how long it will take to obtain a replacement,
and how important that part is to running your application.
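One way to compare candidate spare parts is to fold the four factors into a single score. The sketch below is only an assumption about how you might weigh them: the `SparePart` fields, the scoring formula, and the sample figures are all hypothetical, and you should adjust them to your own environment.

```python
from dataclasses import dataclass


@dataclass
class SparePart:
    name: str
    cost: float                 # purchase price of the spare
    annual_failure_prob: float  # estimated chance of failure in a year (0 to 1)
    lead_time_days: float       # days to obtain a replacement if none is on site
    importance: float           # 0 (optional) to 1 (application stops without it)


def priority(part):
    # Hypothetical heuristic: expected days of impact avoided per unit of cost.
    return (part.annual_failure_prob * part.lead_time_days * part.importance
            / part.cost)


def rank_spares(parts):
    """Return the parts ordered from most to least worth stocking."""
    return sorted(parts, key=priority, reverse=True)


# Made-up figures for illustration only.
candidates = [
    SparePart("hard drive", 200, 0.05, 2, 1.0),
    SparePart("power supply", 150, 0.03, 3, 1.0),
    SparePart("core router", 5000, 0.02, 10, 1.0),
]
for part in rank_spares(candidates):
    print(part.name)
```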
Recovery involves more than just restoring from backups. Have "what if" sessions
to determine possible types of failures. Then, create plans with all the
necessary steps to recover from each particular failure. Obviously, you won't be
able to think of every possible failure scenario, but consider a full range of
failures, from a hard drive failure to a flood or a fire. Plan how you would
replace all the existing physical facilities, networks, hardware, and software
if a major disaster were to strike. You should also consider building
relationships, or striking agreements, with hardware and software vendors and
network service providers so you can quickly replace everything that might be
lost.
Many of the choices that go into a backup and recovery plan are complex business
decisions. For this reason, most companies can benefit from performing detailed
risk and cost-benefit analyses. As part of this process, you might want to
consider outsourcing all, or some, of your operations to meet your particular
backup and recovery needs more cost effectively. Service providers typically
have redundancy and detailed backup and recovery procedures already in place,
and can provide a higher level of service for less than it would cost you to
do it on your own. Also, as part of your risk analysis, prioritize your
business processes and applications. This helps determine which processes and
applications will be recovered first.
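A cost-benefit analysis of this kind often comes down to simple arithmetic: multiply the estimated loss from one incident by how often you expect it to happen in a year, then compare the reduction a safeguard buys against what the safeguard costs. The sketch below shows that calculation. The function names and all the figures are hypothetical, and a real analysis would draw on your own estimates.

```python
def annualized_loss(single_loss, incidents_per_year):
    """Expected yearly loss from one type of incident."""
    return single_loss * incidents_per_year


def net_benefit(single_loss, rate_before, rate_after, annual_cost):
    """Yearly savings from a safeguard, minus what the safeguard costs."""
    saved = (annualized_loss(single_loss, rate_before)
             - annualized_loss(single_loss, rate_after))
    return saved - annual_cost


# Made-up example: an outage costs 100,000; a clustered setup costing
# 20,000 a year cuts the expected rate from one outage every two years
# to one every eight years.
print(net_benefit(100_000, 0.5, 0.125, 20_000))
```

A positive result suggests the safeguard pays for itself under those estimates; a negative one suggests the money is better spent elsewhere.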
If you haven't done so already, you'll need to get buy-in from upper management
and obtain funding. This can be difficult, as creating and implementing such a
plan is often expensive. However, if you make your case using a few "what if"
disaster scenarios, the value of such a plan usually becomes apparent quickly.
For mission-critical applications, consider using third parties to advise you
on your backup and recovery plan, to provide a complete solution for you in
case of a disaster, or both.
Once your backup and recovery plans are complete, print them out and store
duplicate copies off site. If your backup and recovery plans exist only in an
electronic format, you might not be able to get to them if your IT
infrastructure is down.
Even though you've completed your plan, you're not finished yet. You must test
and refine your plan to ensure it will work when you need it. The testing will
probably take several iterations to work out all the issues you discover during
this time.
Start by creating a test environment to simulate the production environment. The
expense involved with creating this environment can be a hard sell with the bean
counters. However, you can justify the cost by using the test environment for
other purposes, such as quality assurance testing, testing updates and patches,
and possibly for a pilot if you're deploying a new application.
When you create your test environment, make it match your production environment
as closely as you can. Be sure to use backup copies of existing data, so you can
discover as many problems as possible during your testing. The test environment
should be isolated, enabling you to simulate various failures without impacting
your production network.
When you test your new backup and recovery plans, you're sure to discover
several issues you can use to refine your plan. Implement any changes, then
re-test your plans. Keep in mind that whenever you make changes, patches,
updates, or add new features to your application, you'll need to update your
plan, then test it again.
In addition to testing your backups during the initial setup of your backup and
recovery plan, you should test your backups from the production environment
periodically. The best way is to attempt to restore the production backups to
your test environment. This will give you confidence that your backups are
complete, and that they can be used to restore your data if needed. There's
nothing worse than finding out your backups are incomplete or won't restore your
system when you need them most.
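A test restore is more convincing if you check the restored files rather than just confirming the restore job finished. The following sketch compares a reference directory tree against a restored copy by checksum. It assumes a file-level backup and a reference copy that hasn't changed since the backup was taken; the function names are illustrative, and a database restore would need its own consistency checks.

```python
import hashlib
from pathlib import Path


def checksum_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_restore(reference_root, restored_root):
    """Return (missing, changed) lists of files for a restored copy."""
    reference = checksum_tree(reference_root)
    restored = checksum_tree(restored_root)
    missing = sorted(set(reference) - set(restored))
    changed = sorted(name for name in reference
                     if name in restored and reference[name] != restored[name])
    return missing, changed
```

Two empty lists mean every reference file was restored with identical contents.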
You might also want to institute scheduled or surprise recovery drills. This
ensures your staff is familiar with all the steps and procedures in your
recovery plan, and further validates it. Typically, you won't want to simulate
failures on your production network, so these drills are another way to use your
test environment.
It's important that you have some sort of monitoring in place to ensure
automated backups are being performed successfully. There are many software
backup solutions that do this, or you can implement your own if the
off-the-shelf software does not meet your needs (see Table 2).
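If you do write your own monitoring, a simple starting point is to check that the newest backup file is recent and not empty. The sketch below assumes your backup job writes one file per run into a single directory. The function name, the 26-hour threshold, and the size check are hypothetical choices, not features of any particular backup product.

```python
import time
from pathlib import Path


def check_latest_backup(backup_dir, max_age_hours=26, min_bytes=1, now=None):
    """Return a list of problems with the newest backup file (empty means OK)."""
    now = time.time() if now is None else now
    files = [p for p in Path(backup_dir).iterdir() if p.is_file()]
    if not files:
        return ["no backup files found"]

    # The most recently modified file is taken to be the latest backup.
    latest = max(files, key=lambda p: p.stat().st_mtime)
    info = latest.stat()

    problems = []
    age_hours = (now - info.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{latest.name} is {age_hours:.1f} hours old")
    if info.st_size < min_bytes:
        problems.append(f"{latest.name} is only {info.st_size} bytes")
    return problems
```

You could run a check like this on a schedule and alert an operator whenever the returned list is not empty.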
In the unfortunate event you have to put your backup and recovery plan into
action, make the most of it. Perform a complete review of how the staff and the
plan performed, so that you can make adjustments and further refine the details.
This way, if it happens again, you'll be better prepared.
Factor in .NET
Backup and recovery plans don't change significantly in the .NET world, but the
.NET Framework has a few facets you should pay particular attention to. .NET
applications are often distributed—such as when you use XML Web services—and
this can be both an advantage and a cause for concern.
On one hand, it's often easier to duplicate a distributed application's
components, so they can serve as a backup in case the primary component fails.
Also, the different components can reside in different physical locations,
offering some protection against a single point of failure. For example, suppose
you have an e-commerce site that uses a Web service to provide current product
availability. If that Web service runs on two separate servers, your
application can fail over from one to the other automatically.
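The failover logic itself can be quite small. In the sketch below, each endpoint is modeled as a plain callable standing in for a Web service proxy on one server. The function names and the sample request are hypothetical, and production code would catch only connection-level errors and add timeouts.

```python
def call_with_failover(endpoints, request):
    """Try each endpoint in order and return the first successful result."""
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint(request)
        except Exception as exc:  # assumption: any error means "try the next one"
            errors.append(exc)
    raise RuntimeError(f"all {len(endpoints)} endpoints failed: {errors}")


# Stand-ins for the product availability service on two servers.
def primary(sku):
    raise ConnectionError("primary server is down")


def secondary(sku):
    return {"sku": sku, "in_stock": 7}


print(call_with_failover([primary, secondary], "A-100"))
```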
On the other hand, with a distributed application, you might have less control
over how backups and recoveries are performed. If, in the previous example, the
supplier provides the product availability Web service, you're subject to that
third party's backup and recovery procedures. Be sure to consider reliance on
third parties in your own plan, along with what their plans entail. This way you
can add responsive measures to your own plan to compensate for any shortcomings.
For example, you might choose to copy the supplier data locally on a regular
basis. Then, if there were an outage, you wouldn't have the most recent supplier
data available, but it would be better than nothing. Also, consider establishing
Service Level Agreements (SLAs) with the third parties. This gives you a more
concrete sense of what the third party is able to live up to, and might give you
some recourse if it fails to meet the agreement. .NET applications rely heavily
on the network infrastructure, so pay special attention to your internal
network's redundancy and recoverability, and to your Internet connectivity if
your application uses it.
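The local copy of supplier data mentioned above can be kept up to date as a side effect of normal calls. This sketch refreshes a cache file each time the supplier's service answers and falls back to the cached copy when it doesn't. The function name, the JSON cache format, and the `fetch_remote` callable are assumptions for illustration, not part of any supplier's interface.

```python
import json
import time
from pathlib import Path


def get_availability(fetch_remote, cache_path):
    """Return (data, is_stale): live data if possible, else the cached copy."""
    cache = Path(cache_path)
    try:
        data = fetch_remote()
    except Exception:  # assumption: any error counts as a supplier outage
        if not cache.exists():
            raise  # nothing cached yet, so the failure can't be hidden
        saved = json.loads(cache.read_text())
        return saved["data"], True

    # The call succeeded, so refresh the local copy for the next outage.
    cache.write_text(json.dumps({"saved_at": time.time(), "data": data}))
    return data, False
```

The `is_stale` flag lets the application warn users that availability figures might be out of date.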
Finally, you might want to get creative and use Web services to assist in
performing backups across the network for smaller data sets. For example, you
could create a Web service for your company that allows your employees to back
up files from their laptops easily, either connected to the local LAN or through
the Internet. You could also implement Web services to help you monitor your
backup processes. By writing Web services that have a common interface for two
or more different backup applications, you'll be able to aggregate several
different backup logs into a single integrated view. Or, using Web services and
the ubiquity of the Internet, you could store data on a server that in the past
might have had to be stored on a client machine. For example, suppose you have a
sales force in the field that currently gives its sales managers feedback on
potential clients by e-mailing the managers a spreadsheet. You could write a Web
service to capture this information. The service stores the information
centrally, and, thus, makes it much easier to back up.
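The common-interface idea for backup monitoring can be sketched without any Web service plumbing. Below, each backup application is wrapped in an object that exposes the same `recent_runs` method, and one function merges the results into a single view. The `BackupRun` fields, the interface, and the method name are all hypothetical; a real wrapper would call the backup product's own log or API.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass
class BackupRun:
    source: str      # which backup application reported the run
    job: str         # name of the backup job
    finished: str    # ISO 8601 UTC timestamp, so strings sort chronologically
    succeeded: bool


class BackupLogSource(Protocol):
    """The common interface each backup application's wrapper implements."""

    def recent_runs(self) -> Iterable[BackupRun]: ...


def aggregate(sources):
    """Merge runs from every source into one list, oldest first."""
    runs = [run for source in sources for run in source.recent_runs()]
    return sorted(runs, key=lambda run: run.finished)


def failed_runs(sources):
    """Return only the runs that need attention."""
    return [run for run in aggregate(sources) if not run.succeeded]
```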
These examples should provide you with a starting place for implementing or
refining your own backup and recovery plan. The amount of effort and money your
business invests in a backup and recovery plan is a decision that must be based
on your particular circumstances. But remember: In business, and especially in
IT, it always pays to be prepared for the worst.
Todd Walker holds MCSE and MCSD certifications and is CTO for Hunter Stone Inc. in
Columbia, S.C. Hunter Stone is a Microsoft Certified Partner, providing custom
software application development for medium- to large-sized companies based on
the Microsoft .NET platform. Todd can be reached through e-mail at twalker@HunterStone.com.