A Primer On Availability And Disaster Recovery Options In Virtualized Environments

In a world where the spend for technology seems virtually limitless, server virtualization has proven itself as a good way to combat server sprawl and save money. According to the results of the annual Server Virtualization Life Cycle Report published by CDW, virtualization has already found its way into a large number of IT departments. Indeed, no fewer than 90 percent of respondents to this survey said they have implemented server virtualization at some level.

Virtualization’s rapid proliferation, fueled by the promise of savings on hardware, environmental costs, management and administration, is understandable and is expected to continue, says Gartner Group. The results of its 2010 CIO Survey indicated that virtualization is the top priority for more than 1,500 global CIOs who weighed in.

While many of the benefits of virtualization appear undeniable, one serious concern for those who manage information systems has to do with the simple age-old notion of placing all of one’s eggs in one consolidated basket. Also worrisome is the vulnerability associated with an additional layer of technology, specifically the hypervisor, that is instrumental in enabling virtualization.

The reality is that every organization has to deal with unwanted downtime, whether the cause is planned maintenance or, at the opposite end of the continuum, some unexpected catastrophic event. To that end, the Small Business Administration divides businesses into two categories: those that have experienced a disaster and those that are about to experience one. Furthermore, a study by CIO.com revealed that an hour of downtime costs 42 percent of companies $1,000 and 26 percent of companies $10,000, with the upper range exceeding $50,000 per hour.

Understand your exposure
For companies trying to figure out the best way to protect their virtual machines (VMs), here are a few points to consider:

First, in order to avoid spending too much time and money on a solution that far exceeds what might be a relatively simple requirement, figure out your cost of downtime. This process can help you understand how much data and time you can afford to lose if your VMs are interrupted or if they fail altogether.

To figure out how much you have to lose, think about how much money the company would lose if all the transaction data and e-mails from the last 12 to 24 hours were lost, whether there are compliance risks in not being able to produce data, and what it would cost to have employees recreate the last 12 to 24 hours of data. Or, if you want to put a number on what a single outage will cost you, you can use the following formula:

Cost Per Occurrence = (TO + TD) x (HR + LR)

TO = Length of Outage
TD = Time Delta (the length of time since your last backup)
HR = Hourly Rate of Personnel (monthly expense per department divided by the total number of work hours)
LR = Lost Revenue per Hour (applies if the department generates profit—a good rule is to look at profitability over three months and divide by the number of work hours)
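
As a rough illustration, the formula above can be sketched in a few lines of Python. The function names and the dollar and hour figures are hypothetical, chosen only to make the arithmetic easy to follow, and all times are assumed to be in hours.

```python
def hourly_rate(monthly_department_expense: float, work_hours_per_month: float) -> float:
    """HR: monthly expense per department divided by total work hours."""
    if work_hours_per_month <= 0:
        raise ValueError("work_hours_per_month must be positive")
    return monthly_department_expense / work_hours_per_month


def lost_revenue_per_hour(three_month_profit: float, work_hours_in_three_months: float) -> float:
    """LR: profitability over three months divided by work hours in that period."""
    if work_hours_in_three_months <= 0:
        raise ValueError("work_hours_in_three_months must be positive")
    return three_month_profit / work_hours_in_three_months


def cost_per_occurrence(outage_hours: float, hours_since_backup: float,
                        hr: float, lr: float) -> float:
    """Cost Per Occurrence = (TO + TD) x (HR + LR)."""
    if min(outage_hours, hours_since_backup, hr, lr) < 0:
        raise ValueError("all inputs must be non-negative")
    return (outage_hours + hours_since_backup) * (hr + lr)


# Hypothetical department: $40,000 monthly expense over 160 work hours,
# $360,000 profit over three months (480 work hours).
hr = hourly_rate(40_000, 160)               # 250.0 per hour
lr = lost_revenue_per_hour(360_000, 480)    # 750.0 per hour

# Hypothetical incident: 4-hour outage, last backup taken 12 hours earlier.
cost = cost_per_occurrence(4, 12, hr, lr)
print(f"Cost per occurrence: ${cost:,.2f}")  # Cost per occurrence: $16,000.00
```

Note how the time since the last backup (TD) dominates in this example: shortening the backup interval lowers the cost of an outage even if the outage itself lasts just as long.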

Second, before deciding how to protect your data, consider what you are protecting it from. Companies most often protect their data against the following four scenarios:

Loss of a single resource – In this scenario, a single important resource fails or is interrupted. For example, losing a virtual server that end users depend on for product ordering would cripple a business that depends on electronic procurement. Likewise, many businesses would be seriously affected by the loss of one of their primary e-mail servers. Planning for this case usually means providing both backup and availability for the virtual server.
Loss of user data files – This unfortunately common scenario involves the accidental or intentional loss of important data files. The most common solution is to restore the lost data from a backup, but this can involve going back to a previous snapshot of the server, often with data loss.
Planned outages for maintenance or migration – The goal of planned maintenance or migrations on VMs is usually to restore, repair or patch service. Migrations usually mean users won’t have access to applications and data on VMs while the migration is in progress and is tested. If you can tolerate downtime and your IT staff doesn’t mind putting in a night or weekend, the most basic migration software will do. If you require availability and must have users remain online during a migration, more advanced software will allow you to perform migrations with no service interruption.
Loss of an entire facility – In this scenario, entire facilities and all of their resources are unavailable as a result of natural disasters, extended power outages, failure of the facility’s environmental conditioning systems, fire, flood or any other disaster or outage that takes out power. In this situation, it’s best to resume operations at another physical site. The amount of downtime you can tolerate will determine if you require just backup or backup and availability.

Few companies can tolerate days of downtime, so they feel good about having an availability and disaster recovery plan in place; however, they may not feel as great when it comes to sorting through the options. The good news is that if you’ve done your downtime homework, it’s not so bad.

Option 1: Tape backup and recovery
Tape backup can provide for the long-term archival needs of virtual servers: it’s portable, fairly secure and the cost per megabyte is low. Tape backup is right for you if you can tolerate a day or more of downtime while you rebuild a whole VM or a whole server, if you have staff that won’t forget to switch tapes, if you aren’t regulated and if you don’t have a lot to lose in a day’s worth of data.

Option 2: Replication-based backup and recovery
Backup software installs on your production and backup virtual servers and replicates data on a schedule that ranges from once a day, through “snapshots” for point-in-time recovery, to continuous, real-time replication. If your production server fails, you’ve got a copy of your data (and sometimes applications) waiting for you. If you can make a home for your backup server outside of your zip code, you’ll avoid building and local disasters. If you can get it out of your region, you’re even better protected. When evaluating products, look for hardware-independent capabilities. That way, if your production server is actually destroyed, you can recover to a different make and model and won’t have to go looking for an exact copy of the old server. Software-based backup is a great option if you can tolerate a few hours of downtime, don’t want to constantly manage the tape backup process (or can’t, as in the case of branch offices) and can’t afford to lose too much data.

Option 3: Add availability
At the apex of these options is software-based backup and recovery that includes continuous access to data and applications on your VMs in any situation. It’s usually not much more expensive than simple backup and recovery software, but the right product provides a whole other level of comfort if you can’t afford to lose data or time. Most availability options don’t offer a whole lot of granularity, but there are some that offer immediate failover and recovery. A comprehensive product in this category offers everything backup and recovery software does, with one addition: if your virtual server melts down, the backup server steps in immediately with a current copy of your data and applications. With quality software, the switch happens so seamlessly that your end users won’t notice anything out of the ordinary in the server room.

Option 4: The cloud
The newest option is backup and recovery in the cloud. Simply by installing cloud software on your VMs and renting a chunk of space from a cloud provider, you can have a secure virtual data center for a fraction of the cost of a physical data center. If you’re choosing cloud recovery, it’s probably a good idea to pick a software and storage combo that allows you to recover, or start up applications and data, from the cloud.

If you’re running VMs, your eggs are all in one basket, so you need to have a cogent backup and recovery plan. Once you figure out how much downtime you can tolerate, you can set a realistic budget and pick the right solution for putting that plan in place.
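
To make that last step concrete, here is a minimal Python sketch that maps tolerable downtime and data loss to the first three options. The function name and the cutoffs are assumptions for illustration: the 24-hour figure comes from the "day or more" in Option 1, while the one-hour boundary between "a few hours" and "no time to lose" is simply a guess. The cloud option is left out because this article gives no downtime threshold for it.

```python
def suggest_option(tolerable_downtime_hours: float,
                   tolerable_data_loss_hours: float) -> str:
    """Suggest a protection option from tolerances expressed in hours.

    Thresholds are illustrative assumptions, not vendor guidance.
    """
    if tolerable_downtime_hours < 0 or tolerable_data_loss_hours < 0:
        raise ValueError("tolerances must be non-negative")

    # A day or more of downtime and a day's worth of data loss are acceptable.
    if tolerable_downtime_hours >= 24 and tolerable_data_loss_hours >= 24:
        return "Option 1: Tape backup and recovery"

    # A few hours of downtime is acceptable (assumed cutoff: one hour or more).
    if tolerable_downtime_hours >= 1:
        return "Option 2: Replication-based backup and recovery"

    # Little or no downtime is acceptable.
    return "Option 3: Add availability"


# Hypothetical tolerances for three different workloads.
for downtime, data_loss in [(48, 24), (4, 1), (0, 0)]:
    print(downtime, data_loss, "->", suggest_option(downtime, data_loss))
```

A real evaluation would also weigh budget, regulation and staffing, which the option descriptions above mention but this sketch ignores.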
