Backup vs Archive: 3 Key Differences Between Backup and Archive
It is commonplace to hear the terms backup and archive mentioned together, which is not surprising since both technologies support primary data storage. However, the commonalities end there. I often encounter users who imply that archive is analogous to backup. Simply put, backup and archive are not the same. In this post, we will look at three key differences between the two technologies.
According to SNIA's dictionary, the terms are defined as follows:
Backup: A collection of data stored on (usually removable) non-volatile
storage media for purposes of recovery in case the original copy of data is lost
or becomes inaccessible; also called a backup copy.
Archive: A collection of data objects, perhaps with associated metadata, in a
storage system whose primary purpose is the long-term preservation and retention
of that data.
These definitions make clear that backup and archive serve different purposes (recovery vs. long-term preservation and retention), so let’s review three key differences between the two solutions.
The Data
Backup – When backing up your data, you are protecting both active and
inactive information, which encompasses all of your production data. As part of
the process, you are copying your vital information to a backup target such as
disk or tape. It is critical to recognize that a backup is a copy of production
information and the actual data still resides on the production storage systems.
Thus, if your backup system suffers a catastrophic data loss, your operations
could continue normally since your production data would not be impacted;
however, you would be operating at elevated risk, because no recovery copy would
exist until new backups are created.
Archive – Archive solutions solve a different problem. These technologies are
used to retain older or inactive data for extended periods of time. Archive
systems typically move that information off primary storage to dedicated
systems optimized for low-cost, long-term storage. A key differentiator from
backup is that the data stored in an archive is the actual production data, not
a copy. Hence, the loss of an archive system will result in permanent loss of
that production information unless the archive itself is protected.
Access
Backup – Backup applications have historically been optimized for large-scale
recoveries. Backup data is written in large blocks to dedicated hardware such as
tape libraries or deduplication appliances, a format optimized for moving large
volumes of information quickly. Backup systems are often configured to protect
not just individual data objects but also application and OS files. You can
restore objects of all sizes with a backup system, but because the process is
optimized for large-scale recoveries, retrieving a single file often takes about
the same amount of work as recovering an entire server. In short, a backup
application is the right tool if you want to recover an application or a
complete system.
Archive – Archives are designed around a very different access profile. These
systems typically store individual data objects such as files, databases or
email messages, and usually also capture metadata associated with each item. As
a result, an archive can provide immediate, granular access to stored
information, so retrieving an individual file or email is typically very easy.
(The metadata component can even include full-content search, which further
simplifies access.) However, unlike backup systems, archives cannot provide
full server or volume-level recoveries, since they typically contain only a
subset of enterprise data.
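The difference in access profiles can also be sketched in Python. This is a simplified model, not how any specific product stores data. The hypothetical backup image is a single tar stream that must be walked header by header to find one file. The hypothetical `Archive` class keeps each object separately with its own metadata, so it supports direct lookup and a naive full-content search.

```python
import io
import tarfile


def make_backup_image(files: dict[str, bytes]) -> bytes:
    """Bundle every file into one tar stream, as a backup job might."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def restore_one_from_backup(image: bytes, name: str) -> bytes:
    """Locate one file by walking the image's headers from the beginning."""
    with tarfile.open(fileobj=io.BytesIO(image), mode="r") as tar:
        for member in tar:  # sequential scan through the stream
            if member.name == name:
                return tar.extractfile(member).read()
    raise KeyError(name)


class Archive:
    """Hypothetical archive: each object is stored individually with metadata."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[bytes, dict[str, str]]] = {}

    def store(self, name: str, data: bytes, **metadata: str) -> None:
        self._items[name] = (data, metadata)

    def get(self, name: str) -> bytes:
        # Direct lookup by name: no other objects need to be read.
        return self._items[name][0]

    def search(self, term: str) -> list[str]:
        # Naive case-insensitive search over content and metadata values.
        term = term.lower()
        hits = []
        for name, (data, meta) in self._items.items():
            text = data.decode("utf-8", errors="ignore").lower()
            if term in text or any(term in v.lower() for v in meta.values()):
                hits.append(name)
        return sorted(hits)


if __name__ == "__main__":
    mail = {"mail/0001.eml": b"Q3 budget approved", "mail/0002.eml": b"Lunch on Friday?"}
    image = make_backup_image(mail)
    print(restore_one_from_backup(image, "mail/0002.eml"))  # b'Lunch on Friday?'

    arc = Archive()
    arc.store("mail/0001.eml", mail["mail/0001.eml"], sender="cfo@example.com")
    arc.store("mail/0002.eml", mail["mail/0002.eml"], sender="bob@example.com")
    print(arc.search("budget"))  # ['mail/0001.eml']
    print(arc.search("cfo"))     # ['mail/0001.eml']
```

Real archives typically maintain a persistent index rather than scanning every item on each query. The linear search here only illustrates that per-item metadata makes content-based lookup possible at all.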
Disaster Recovery
Backup – Disaster Recovery (DR) is a core component of backup, and most IT practitioners consider the two processes closely related. Typically, administrators run a backup job to protect their data and then a second process to move that information offsite for DR purposes. (The offsite process typically involves either a copy to tape or replication of the backup data.) These processes are mature and are typically automated together as a single, streamlined protection workflow.
Archive – Maintaining DR for an archive system can be complex and
costly. Many customers rely on replication functionality embedded in the
archive platform for DR. The challenge is that most replication implementations
are proprietary, so organizations must purchase two identical and
often costly archive systems: one for the production environment and the other
for the DR site. Furthermore, the ability to control replication, to roll back
data to previous restore points and to manage bandwidth usage varies widely
depending on the archive system. In short, the DR process for archive systems
is very different from traditional, backup-based DR.
In summary, backup and archive are two processes that solve very different problems, and it is not uncommon to find customers using both in a complementary fashion. Backups are the primary method to protect corporate data and to enable large-scale recoveries when needed. Archives, in contrast, enable cost-effective retention of, and rapid access to, important information for compliance or cost-savings purposes. These different use models have driven very different design choices by backup and archive vendors. However, the distinction between the two frequently confuses customers, and we often find users who rely exclusively on backups as an archive.
Contact us at BackupWorks.com for your next backup and/or archive solution.
Call us at 866 801 2944. We have everything from tape to disk, from LTO-7
autoloaders to LTO-7 tape libraries, from NAS to SAN, and removable disk storage.