Determining optimum availability is not an either/or proposition.
The economically appropriate level of availability varies among systems and industries and even across applications within a single enterprise.
The challenge is to strike the right, cost-effective balance. It may be acceptable to occasionally schedule a five-minute shutdown during a slow period to switch over to a backup system when planned maintenance is required. Buying and running the backup hardware and software means incurring some extra costs; however, with the proliferation of eBusiness, globalization and other time pressures, organizations are increasingly unwilling to shut their systems down for an entire maintenance process that may last days. Furthermore, when it comes to losing even an hour of data and application access during peak business hours, very few companies are willing to accept any more than the smallest of risks.
The following is a summary of common hardware and software options available to you. First, here are a few hardware-centric paths you can take:
Backup and restore
The most basic level of data protection is to back up data so that it can be easily restored, if necessary. Backups are generally performed on a regular schedule, such as nightly. To thoroughly protect data integrity, it is generally recommended that the ongoing processing of data be stopped during the backup. Otherwise, if a transaction occurs wholly or partly during a backup process, the backup file may contain an incorrect view of the results of that transaction. However, it takes a long time to back up large databases. In a 24/7 environment, it may not be possible to stop systems long enough to back up even small databases, let alone large ones. Even in a “nine-to-five” business, it may not be possible to back up a particularly large database in a single overnight window.
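The backup-window problem comes down to simple arithmetic. The sketch below is illustrative only: the function names, the database sizes and the 200 MB/s throughput are hypothetical figures chosen for the example, not measurements from any particular system.

```python
def backup_hours(size_gb, throughput_mb_per_s):
    """Estimate how long a full backup takes, in hours.

    Assumes a constant sustained throughput and decimal units
    (1 GB = 1000 MB); real backups rarely run at a constant rate.
    """
    seconds = (size_gb * 1000) / throughput_mb_per_s
    return seconds / 3600


def fits_window(size_gb, throughput_mb_per_s, window_hours):
    """Return True if the estimated backup time fits the available window."""
    return backup_hours(size_gb, throughput_mb_per_s) <= window_hours


# Hypothetical figures: an 8-hour overnight window at 200 MB/s.
small_db_ok = fits_window(500, 200, 8)      # 500 GB database
large_db_ok = fits_window(20000, 200, 8)    # 20 TB database
```

Under these assumed numbers the smaller database fits comfortably in the window, while the larger one would need more than a full day.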
Uninterruptible power supplies
Power outages are the most common cause of abrupt system failures. Thus, uninterruptible power supplies (UPSs) go a long way toward reducing downtime frequency. When configured properly, they can also help to reduce the duration of downtime. In the event of a primary power outage, the internal power source in a UPS can maintain system operation long enough to save main storage. This protects data and simplifies system startup when the power returns.
Fault-tolerant hardware
Some hardware is designed with fault tolerance built in. It comes with internal redundancy and the means to monitor each component’s operating status and seamlessly switch to a backup component when necessary. Inoperable components can typically be replaced without stopping the system.
There are several means of protecting data from disk failure. Redundant array of independent disks (RAID) spreads enough information across multiple disks to allow the disk subsystem controller to recalculate any missing information in case of a disk failure. RAID does not protect against the failure of other disk-related hardware, such as a controller, an I/O processor or a bus, however.
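To show how missing information can be recalculated, here is a minimal sketch of single-parity protection, the idea behind parity-based RAID levels such as RAID 5. It is not how any particular controller is implemented: real controllers work on disk stripes in hardware or firmware, and the function names and sample data here are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Compute the parity block stored alongside the data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recalculate the one missing block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data "disks" holding one equal-sized block each (hypothetical data).
stripe = [b"ORDR", b"INVC", b"SHIP"]
parity = make_parity(stripe)

# Simulate losing the second disk and rebuilding its contents.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields exactly the lost block. A single parity block can cover only one failed disk per stripe.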
Disk mirroring, when properly configured, can eliminate single points of failure. Often used in combination with RAID, this approach requires that data be concurrently written to each unit in a set of identical disks, which adds little CPU overhead or system complexity. However, mirroring all data requires at least twice as many disks.
Non-clustered multiple systems
A multiple-system approach that continuously maintains real-time replica systems offers an exceptionally high level of availability. Even the best-planned backup strategy and most complete DASD (disk drive) mirroring scheme cannot eliminate all types of user processing interruptions. A multiple-system approach can. Multiple-system availability solutions can be quickly implemented using commercially available software and services. With the appropriate software in place, the second system needn’t exactly duplicate the primary system nor be totally dedicated to providing backup protection. It can be used for other processing jobs, limited only by its capacity.
Clustered systems
The functionality of clusters varies among the platforms on which they have been implemented. High availability and continuous operations functions are supported by most implementations. Clustering for availability is similar to the non-clustered, multiple-system approach described in the preceding section. The main difference is where the continuous availability functionality is performed. A non-clustered solution relies on third-party software and/or hardware to perform all of the functionality: replication, failure detection, switchovers and failovers. In a cluster, the operating system takes over some of these functions; this requires cluster-enabling modification of the application (through tools and services) to allow close integration with the operating system. In doing so, a portion of the continuous availability functionality is brought closer to the machine level and is, therefore, more responsive and flexible. The level of functionality assumed by the operating system depends on the platform. At a minimum, it typically provides some failure-detection capabilities and participates to some extent in any failover activities.
Server virtualization
Server virtualization provides a number of benefits, some of which are similar to utilizing clustered systems. Many vendors and analysts also suggest that virtualization automatically provides DR and HA benefits. However, these benefits are often very limited, unless serious effort is made to integrate automated physical and virtual server monitoring and failover processes. For example, many virtual servers are often hosted in a single physical server. While failover or recovery between “co-located” virtual servers may provide some protection against unexpected application downtime, it does nothing to protect against failure of the physical host server or its shared communications, memory or power resources.
Alternate communication paths
Core business functions often depend on a variety of systems and sites. For example, a company may take orders at one facility and fulfill them at a remote warehouse, with order data transmitted between the two over communication lines. If the line is disrupted, orders will have to be sent to the warehouse manually or, worse, they cannot be transmitted at all. Most communication failures are outside of the control of the IT organization. The only way to combat these exposures is to maintain alternate communication paths.
There are also several software-based solutions available that will address your downtime challenges.
Disk-based data recovery solution
With the falling cost of disk storage today, disk offers an increasingly attractive alternative or supplement to tape as a backup and recovery medium. Given the reasonably low cost of bandwidth, backup data can be sent to a remote location, reducing the effects of unplanned downtime.
Disk-based strategies require more than once-a-night bandwidth. Transmitting a large database over network lines in burst mode would consume all of the available bandwidth for a considerable time, leaving none for regular operations. Likewise, the production server’s data capture mechanism would consume the entire disk I/O capacity on that system. A better approach captures data changes on the primary system and transmits and applies them to the remote backup storage continuously or in small batches when most efficient. Depending on the software and hardware used, the backup need not be a readily usable system. This means that the remote site stores a copy of the data, without fully configuring the system and applications. Not all production hardware needs to be duplicated at the backup site. Because the probability of all production servers becoming unavailable at the same time is low, the remote site can be equipped to handle only the load of a few production servers. Should an unplanned event strike a production system, the necessary equipment at the backup site can then be configured to take over.
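The change-capture approach described above can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions: the class and function names are hypothetical, and a real product would capture changes from a journal or log and ship them over a network rather than between two Python objects.

```python
class Primary:
    """Production system that records every change it makes."""

    def __init__(self):
        self.data = {}
        self.pending = []  # captured changes not yet sent to the remote copy

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def drain(self, max_batch):
        """Remove and return up to max_batch captured changes, oldest first."""
        batch = self.pending[:max_batch]
        self.pending = self.pending[max_batch:]
        return batch


class RemoteCopy:
    """Backup storage that only applies changes; it runs no applications."""

    def __init__(self):
        self.data = {}

    def apply(self, batch):
        for key, value in batch:
            self.data[key] = value


def replicate(primary, remote, max_batch=2):
    """Ship pending changes in small batches; return how many were sent."""
    shipped = 0
    while primary.pending:
        batch = primary.drain(max_batch)
        remote.apply(batch)
        shipped += len(batch)
    return shipped


primary, remote = Primary(), RemoteCopy()
primary.write("order-1", "received")
primary.write("order-2", "received")
primary.write("order-1", "shipped")
sent = replicate(primary, remote)
```

Only the three changes cross the line, in order, rather than the whole database. Applying them in capture order is what keeps the remote copy consistent with the primary.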
Even better is continuous data protection (CDP), an any-point-in-time recovery software solution that is seeing increasing adoption. CDP is like TiVo® for data, enabling you to recover or restore data to a point before accidental deletion or corruption occurred.
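One way to picture any-point-in-time recovery is as the replay of a time-ordered change journal up to a chosen moment. The sketch below assumes such a journal exists; the entry layout, the function name and the sample records are hypothetical, and commercial CDP products differ in how they store and index changes.

```python
def restore_as_of(journal, timestamp):
    """Rebuild the data as it stood at the given timestamp.

    journal is a list of (time, key, value) entries sorted by time;
    a value of None records a deletion.
    """
    state = {}
    for entry_time, key, value in journal:
        if entry_time > timestamp:
            break  # ignore everything that happened after the chosen point
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


# Hypothetical journal: the last entry is an accidental deletion.
journal = [
    (1, "invoice-17", "draft"),
    (2, "invoice-17", "approved"),
    (3, "invoice-17", None),
]

before_mistake = restore_as_of(journal, 2)
after_mistake = restore_as_of(journal, 3)
```

Restoring to time 2 brings back the approved invoice, while restoring to time 3 reproduces the state after the deletion.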
High availability solutions
A true high availability software solution provides near zero data loss and near 100 percent business uptime. It offers comprehensive information protection capabilities, an array of automated management functions and the autonomics and redundancies to assure the integrity of a switchover or failover.
High availability means everything is redundant. A second system, preferably in a remote location, provides optimum protection and availability regardless of planned or unplanned events. Sophisticated software provides real-time or near real-time replication of any changes to production business data, system data, applications and other system objects to keep the backup system perfectly synchronized and ready to take over operations at a moment’s notice in case of disaster or other business interruption.
Some tasks can be removed from the production system entirely, as well. For example, since the secondary system contains an up-to-date replica of the production data, it can serve as a platform for creating backup tapes without affecting production operations. Having a redundant system also enables time-consuming reports to be run without interfering with normal processing.
Automatic healing and failover capabilities continuously monitor the health of the production system and automatically initiate a failover whenever necessary. When the secondary system assumes production operations, the software captures any data changes made on that system so that the software can automatically resynchronize the primary system when it comes back online.
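The monitoring and resynchronization behavior can be illustrated with a small state machine. This is a minimal sketch, not any vendor's implementation: the class name, the missed-heartbeat threshold and the method names are assumptions made for the example.

```python
class FailoverMonitor:
    """Tracks primary health, triggers failover, and logs changes for resync."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # assumed threshold before failing over
        self.missed = 0
        self.active = "primary"
        self.resync_log = []

    def heartbeat(self, primary_healthy):
        """Record one health check and return the system now in charge."""
        if self.active != "primary":
            return self.active
        self.missed = 0 if primary_healthy else self.missed + 1
        if self.missed >= self.max_missed:
            self.active = "secondary"  # automatic failover
        return self.active

    def record_change(self, change):
        """Capture changes made while the secondary runs production."""
        if self.active == "secondary":
            self.resync_log.append(change)

    def changes_for_resync(self):
        """Hand over the captured changes once the primary is back online."""
        changes = self.resync_log
        self.resync_log = []
        return changes


monitor = FailoverMonitor(max_missed=2)
monitor.heartbeat(True)
monitor.heartbeat(False)
role = monitor.heartbeat(False)      # second consecutive miss
monitor.record_change("order-1001")  # made on the secondary
to_replay = monitor.changes_for_resync()
```

Requiring several consecutive missed checks before failing over is a common guard against reacting to a single lost heartbeat. The sketch leaves the decision to switch production back to the primary outside the monitor.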
A high availability solution, using a second server in a remote location, provides instant disaster recovery. If a disaster occurs, the solution simply switches users to the backup location. With automated failover capability and IP impersonation, the switch occurs transparently to users.
When maintenance must be done on the primary system, an administrator can initiate a switchover to the backup system instantly. The high availability software will capture any changes and replicate them to the production system when it returns to service.
In the end, the best information availability solution will meet the availability requirements of each system in your organization and achieve optimum RTO and RPO, provide scalability for future growth and offer the most value in terms of total cost of ownership.