File system replication
Breaking new ground in data availability for the enterprise
Introduction
Replicating data between two
storage systems connected by a network is a well-known technique for
protecting against loss of digital data due to disasters that incapacitate
entire storage systems or data centers. In its essence, data replication is
simple. Identical copies of a set of data are established on two separate
storage systems with a network connection between them. One of the copies is
designated as the primary or master copy, and is processed by client
applications. The other is the secondary copy, or replica. Each time a client
or application modifies the contents of the primary copy, the modification is
transmitted to the storage system that hosts the secondary copy, where it is
applied, or replicated, keeping the contents of the primary and secondary data
sets identical, or synchronized.
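A minimal sketch of this write-forwarding cycle follows, in Python. The Store
and ReplicatedStore classes are hypothetical stand-ins invented for
illustration (they are not any product's interface), and they model the
synchronous style of replication, in which a modification is applied to both
copies before the client's write completes.

    # Minimal sketch of synchronous write replication between two stores.
    # Store and ReplicatedStore are hypothetical, for illustration only.

    class Store:
        """A trivial in-memory stand-in for a networked storage system."""
        def __init__(self):
            self.data = {}

        def put(self, key, value):
            self.data[key] = value

    class ReplicatedStore:
        """Applies each modification to the primary copy, then forwards it
        to the replica before acknowledging, so the two copies stay
        identical (synchronized)."""
        def __init__(self, primary, replica):
            self.primary = primary
            self.replica = replica

        def put(self, key, value):
            self.primary.put(key, value)   # modify the master copy
            self.replica.put(key, value)   # replicate the modification
            # Only now would the write be acknowledged to the client.

    pair = ReplicatedStore(Store(), Store())
    pair.put("invoice-42", b"...")
    assert pair.primary.data == pair.replica.data   # copies are in sync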
Typically, the storage system that
hosts the primary data copy is located in a production data center. The system
hosting the replica may be in the same data center, providing protection against
failures of the primary storage system, or at a remote disaster recovery site,
providing a basis for business continuity if a disaster incapacitates the entire
production data center.
Data can be replicated in the form of virtual storage device contents, file
system contents, or data manager-specific objects such as database
transactions or redo logs. This paper contrasts the virtual device and file
system forms of replication, and describes the agámiFSR™ file system
replication option for agámi Information Server (AIS) systems in detail.
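A schematic way to see the difference between these levels is in the shape of
what each one transmits to the replica. The event layouts below are
hypothetical, invented only to contrast the granularities; they mirror no
particular product.

    # What each replication level ships to the replica (illustrative only).

    # Virtual-device replication forwards anonymous block writes: the
    # replica sees device addresses and bytes, with no notion of files.
    block_update = {"device": "vdev0", "lba": 104_857_600, "data": b"..."}

    # File system replication forwards file-level operations, so the
    # replica understands names, directories, offsets, and attributes.
    file_update = {"op": "write", "path": "/exports/db/redo.log",
                   "offset": 8192, "data": b"..."}

    # Data manager-specific replication forwards application objects,
    # for example a database transaction or redo record.
    txn_update = {"op": "commit", "txn_id": 7731,
                  "redo": [("accounts", 42, {"balance": 310.00})]}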
Part I: Data replication basics
If a failure incapacitates the storage system hosting the primary
data copy, or if the data center experiences a disaster that makes it
inaccessible, the replica can be activated to substitute for it, and the
enterprise can resume accessing and processing its critical digital information.
The growing importance of data replication
Enterprises are increasingly reliant on continuous access to
their digital information for success (and survival). As a consequence,
data replication is supplanting restoration of
offline tape backups as the preferred technique for recovering from large-scale
failures and disasters (offline tapes remain the preferred method for storing
long-term data archives, as well as the last line of defense for recovering from
a data corruption event such as a virus, a software defect, or a procedural error).
Intuitively, the reasons for the growing popularity of
replication are easy to understand:
Recovery time.
Backup tapes must be located and restored
to disk storage devices before databases and applications can be recovered
and clients can reconnect. By contrast, a
replica is instantly ready to restart when a
disaster occurs, so database and application recovery can begin immediately.
Recovery point. When backup tapes are restored, the “age” of the
recovered data (called the recovery point) is the difference between
the time of the disaster and the time at which the backup copy was made. For
example, if an enterprise backs up its critical data daily, the restored copy
may be as much as 24 hours out of date. For some slow-changing applications
(e.g., library catalogs), this may be tolerable; for most, however, it is not.
By contrast, the recovery point for a replica is zero (or nearly so): the
replica is within a few updates of the state of live data when a disaster
occurs. The worked example after this list makes the arithmetic concrete.
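A small worked example of the recovery-point arithmetic; the timestamps are
invented for illustration.

    from datetime import datetime, timedelta

    # Recovery point = time of disaster - time of last usable backup.
    last_backup = datetime(2006, 3, 1, 2, 0)    # nightly backup at 02:00
    disaster    = datetime(2006, 3, 1, 23, 30)  # failure late the same day

    recovery_point = disaster - last_backup
    print(recovery_point)        # 21:30:00 -- nearly a day of work lost

    # A synchronously maintained replica trails live data by at most the
    # updates in flight, so its recovery point is (nearly) zero.
    replica_recovery_point = timedelta(seconds=0)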
Instant recovery times and zero recovery points have been a
desirable goal since computers were first used to manipulate data. It is only
recently, however, that technology has made it possible for most enterprises to
approach these ideals affordably. In particular, developments in two underlying
technologies have made replication of online data a feasible option for the
average enterprise:
Networks: Within the data center and its immediate environs,
Ethernet provides very low-cost, reliable interconnections at gigabit-per-second
transfer rates, with ten-gigabit rates available today on the
backbone and on the near horizon for edge connections. Readily available
low-cost dark fiber provides high bandwidth across longer distances.
High-performance connections between storage systems reduce the latency of
replicating data remotely to a point where, for many applications, the
impact is not discernible (the sketch after this list checks the arithmetic).
Disk drives: The evolution of disk drive capacity is one of the
most remarkable stories in computer technology. Today’s disk drives have a
thousand times more capacity than their predecessors of 25 years ago, while
occupying less than a half percent of the space. The cost of this capacity
has dropped accordingly—from tens of dollars per megabyte to less than a
dollar per gigabyte. From a replication standpoint, every new generation of
storage devices makes the cost of
replicating data close to the cost of
storing a single copy using the previous generation’s technology.
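The latency claim in the Networks item above can be sanity-checked with
back-of-envelope arithmetic. The distance, fiber speed, and write size below
are assumptions chosen for illustration, not measurements.

    # Rough check of remote-replication latency (assumed figures).

    distance_km = 100          # primary site to DR site over dark fiber
    fiber_km_per_ms = 200      # light in fiber: ~2/3 c, a common rule of thumb
    round_trip_ms = 2 * distance_km / fiber_km_per_ms
    print(f"propagation round trip: {round_trip_ms:.1f} ms")   # ~1.0 ms

    write_bits = 8 * 64 * 1024   # one 64 KiB write
    link_bits_per_s = 1e9        # gigabit Ethernet
    transmit_ms = write_bits / link_bits_per_s * 1000
    print(f"serialization at 1 Gb/s: {transmit_ms:.2f} ms")    # ~0.52 ms

    # Both figures are small next to a typical disk write, which is why
    # remote replication is often not discernible to applications.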
Thus, affordable base technology exists to update data at a
disaster recovery site as it changes in the production data center. It has
remained for storage system developers to create manageable, reliable,
affordable systems that exploit these base technologies to deliver the required
business benefit of rapid recovery of access to digital assets after large-scale
failures and disasters.
Fitting solutions to problems
When costs are weighed against benefits,
data replication is usually found to be an
appropriate solution to the problems of protecting critical digital data against
loss and restoring access to it after a disaster. When making such calculations,
however, one should consider the whole cost of the solution, not just that of
the storage system and network facilities required to implement it. Recovering
access to digital data after a disaster also requires:
Getting appropriately skilled administrative personnel to the recovery site
At least a small interval of service outage as applications are restarted and
client connections are restored
In most cases, eventual failback, or return of live data and service to
the primary data center
These unavoidable conditions and costs make using an
online data replica to restore access to data after a failure
expensive and disruptive. It is usually best to treat failover
to a distant replica as a solution of last resort, and not as a mechanism for
recovering from faults that can be dealt with locally. For example, individual
disk failures and uncorrectable read errors can be dealt with locally using RAID
data reconstruction. Protection against LAN link and switch failures can be
provided by redundant connections of critical systems to independent physical
networks.
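One way to picture this "last resort" policy is as a dispatch on the scope of
the fault: remedy locally whatever can be remedied locally, and activate the
remote replica only when the whole storage system or site is lost. The fault
names and remedies below are hypothetical illustrations, not any product's
interface.

    # Sketch of the "failover as last resort" policy described above.
    # Fault classes and remedies are invented for illustration.

    LOCAL_REMEDIES = {
        "disk_failure":       "rebuild the drive's contents from RAID redundancy",
        "unrecoverable_read": "reconstruct the block from RAID redundancy",
        "lan_link_failure":   "route via the redundant network path",
    }

    def handle_fault(fault):
        if fault in LOCAL_REMEDIES:
            return LOCAL_REMEDIES[fault]          # deal with it locally
        if fault in ("storage_system_loss", "site_disaster"):
            return "fail over to the remote replica (last resort)"
        return "escalate to an administrator"

    print(handle_fault("disk_failure"))    # handled in place, no failover
    print(handle_fault("site_disaster"))   # true disaster recovery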