Selecting an Archiving Product: Buying Guide
Data archives have become a core element of the storage infrastructure.
Archives today have two purposes: They hold a vast spectrum of data that does
not require frequent access, and they ensure that records subject to regulatory
compliance requirements are retained for as long as required (and then deleted).
Whereas storage area network (SAN) storage emphasizes performance, archival
storage relies on low-cost, high-capacity SATA drives and employs a combination
of RAID and traditional backups to protect data against drive failure. Some
archives are little more than "dumb" disk arrays, but the more sophisticated
archives provide data deduplication for single-instance storage, robust power
conservation features and immutability for data that may be needed as evidence
in litigation.
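To make single-instance storage concrete, here is a minimal Python sketch of file-level deduplication by content hash. The class name `SingleInstanceStore` and its methods are hypothetical; commercial archives deduplicate at the file, block or chunk level in their own ways. Each distinct piece of content is stored once, keyed by its SHA-256 digest, and every archived path simply points at that digest.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical content-addressed store: identical content is kept once."""

    def __init__(self):
        self.blobs = {}    # SHA-256 digest -> stored bytes (one copy per digest)
        self.catalog = {}  # archived path -> digest of its content

    def archive(self, path, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this exact content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.catalog[path] = digest
        return digest

    def retrieve(self, path):
        return self.blobs[self.catalog[path]]

    def stored_bytes(self):
        # Physical bytes held; unreferenced blobs are never purged in this sketch.
        return sum(len(b) for b in self.blobs.values())


store = SingleInstanceStore()
attachment = b"Q3 report" * 1000  # the same 9,000-byte attachment sent to two users
store.archive("alice/inbox/report.pdf", attachment)
store.archive("bob/inbox/report.pdf", attachment)
print(store.stored_bytes())  # 9000: one physical copy serves both paths
```

Hashing the full content means two files share storage only when they are byte-for-byte identical; block-level schemes can also share the common parts of files that differ slightly.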
The hardware used for archival storage is only part of the challenge. Software
plays a central role in a wide variety of archiving tasks: organizing and
optimizing access to email records, indexing and searching data across millions
of files, and setting file-handling policies that govern how data is migrated
into and retained in the archive.
Let's first look at the eight
criteria for evaluating data archiving products.
Which data requires
archiving? Not all data belongs in an archive. Before purchasing any
archiving product, you should perform data classification, which will tell you
what data exists in your organization and which data types should be kept in
an archive to meet regulatory compliance and everyday business needs.
Data classification should not be shouldered by IT alone. Human resources,
legal, accounting and other key departments should be asked to identify the
important applications and file types. Exchange Server records, patient records
or medical imaging files may be appropriate for an archive, but marketing
presentations or user MP3 files are probably not. Another issue is how long to
retain each data type. Knowing what you need to keep and how long to keep it
will help you determine storage requirements and establish scalability
requirements for archive management tools.
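The outcome of data classification can be captured as a retention schedule. The Python sketch below assumes a hypothetical schedule (the categories and periods are placeholders, not recommendations) and uses it to filter out data that does not belong in the archive and to produce a rough upper bound on archive capacity.

```python
from dataclasses import dataclass

# Hypothetical retention schedule from a data classification exercise.
# Periods are in years; None means "does not belong in the archive".
# Real values must come from legal, HR, accounting and other departments.
RETENTION_YEARS = {
    "exchange_mail": 7,
    "patient_record": 10,
    "medical_image": 10,
    "marketing_presentation": None,
    "user_mp3": None,
}


@dataclass
class DataSet:
    name: str
    category: str
    size_tb: float             # size today, in terabytes
    growth_tb_per_year: float


def archivable(datasets):
    """Keep only data sets whose category is scheduled for archiving."""
    selected = []
    for ds in datasets:
        if ds.category not in RETENTION_YEARS:
            # Force a classification decision instead of guessing.
            raise ValueError(f"{ds.name}: unclassified category {ds.category!r}")
        if RETENTION_YEARS[ds.category] is not None:
            selected.append(ds)
    return selected


def capacity_upper_bound_tb(datasets, years):
    """Archive capacity after `years`, assuming linear growth and no deletions."""
    return sum(ds.size_tb + ds.growth_tb_per_year * years for ds in archivable(datasets))


inventory = [
    DataSet("mail", "exchange_mail", 10.0, 2.0),
    DataSet("slides", "marketing_presentation", 5.0, 1.0),
    DataSet("imaging", "medical_image", 20.0, 5.0),
]
print(capacity_upper_bound_tb(inventory, years=3))  # 51.0 (TB)
```

Ignoring deletions makes the estimate deliberately pessimistic; once retention periods are enforced, data older than its period drops out and actual usage should be lower.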
Does the archiving product
accommodate retention and deletion requirements? You cannot evaluate an
archiving product without reviewing its data retention and deletion
capabilities. The archiving tool, as well as the software tools supporting the
archive, must be able to enforce the required retention period for each data
type. Data retention periods are often the same as those for similar
paper-based records and documents. For example, if paper-based employment
records must be kept for seven years, their electronic equivalent is often
retained for the same period. Four caveats related to retention:
- Be sure to identify an appropriate means of deleting data.
- Do not keep data past its accepted deletion date (unless it is being held for
litigation purposes).
- Ensure that you can confirm deletion in a manner acceptable to your compliance
environment.
- Changes to retention periods also affect data that has already been archived.
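The caveats above translate into a few concrete checks. This Python sketch is hypothetical (the record layout, `purge_expired` and the log format are illustrative assumptions): it computes each record's deletion date, skips records under legal hold, and writes a deletion log entry containing a hash of the deleted content. Whether such a log is an acceptable confirmation of deletion depends on your compliance environment.

```python
import hashlib
from datetime import date, datetime, timezone


def deletion_due(archived_on, retention_years):
    """First date on which a record may be deleted."""
    try:
        return archived_on.replace(year=archived_on.year + retention_years)
    except ValueError:
        # Archived on Feb 29 and the target year is not a leap year.
        return archived_on.replace(year=archived_on.year + retention_years, day=28)


def purge_expired(records, today, legal_holds, deletion_log):
    """Delete records past retention, except those under legal hold.

    records: dict record_id -> (archived_on, retention_years, content_bytes)
    legal_holds: set of record_ids that must be kept for litigation
    deletion_log: list that receives one evidence entry per deletion
    """
    for record_id in sorted(records):  # sorted() copies keys, so deleting is safe
        archived_on, years, content = records[record_id]
        if today < deletion_due(archived_on, years):
            continue  # still inside its retention period
        if record_id in legal_holds:
            continue  # litigation hold overrides the deletion date
        fingerprint = hashlib.sha256(content).hexdigest()
        del records[record_id]
        deletion_log.append({
            "record_id": record_id,
            "sha256": fingerprint,  # identifies what was deleted without keeping it
            "deleted_at": datetime.now(timezone.utc).isoformat(),
        })


records = {
    "emp-001": (date(2015, 3, 1), 7, b"offer letter"),
    "emp-002": (date(2015, 3, 1), 7, b"disputed review"),
    "emp-003": (date(2020, 1, 1), 7, b"current contract"),
}
log = []
purge_expired(records, date(2023, 1, 1), legal_holds={"emp-002"}, deletion_log=log)
print(sorted(records), [e["record_id"] for e in log])
# ['emp-002', 'emp-003'] ['emp-001']
```

Because the deletion date is computed from the record's archive date and its current retention period each time the purge runs, a changed retention period automatically applies to data already in the archive.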
What is the level of
integration and automation? Storage administrators cannot migrate, track and
delete every file manually, so any archiving product must provide automated
features. Indexing tools should be able to add meaningful metadata to each file
automatically and integrate with search tools that can sift through that
metadata to locate files requested by users. Policy manager tools should be able
to apply migration and retention policies across file types while restricting
certain data types to certain tiers. Because these policies direct the data
movers that migrate aging data between storage tiers, and also guide retention
and deletion activity, the policy manager must be tightly integrated with the
other tools.
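A policy manager's job can be sketched as a function that maps each file to a target tier, producing a move list for a data mover to execute. The policy table, tier names and file types below are hypothetical examples in Python: the first matching rule wins, and one data type is pinned to primary storage to show a tier restriction.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileInfo:
    path: str
    file_type: str
    last_accessed: date


# Hypothetical policies: (file types, minimum idle days, target tier).
# Evaluated in order; the first matching rule wins.
POLICIES = [
    ({"pst", "eml"}, 90, "archive-disk"),
    ({"dcm"}, 365, "archive-tape"),
]
# Types restricted to primary storage under this example policy.
RESTRICTED_TO_PRIMARY = {"db"}


def target_tier(f, today):
    if f.file_type in RESTRICTED_TO_PRIMARY:
        return "primary"
    idle_days = (today - f.last_accessed).days
    for types, min_idle, tier in POLICIES:
        if f.file_type in types and idle_days >= min_idle:
            return tier
    return "primary"


def plan_migrations(files, current_tier, today):
    """Return (path, from_tier, to_tier) moves for a data mover to execute."""
    moves = []
    for f in files:
        src = current_tier.get(f.path, "primary")
        dest = target_tier(f, today)
        if dest != src:
            moves.append((f.path, src, dest))
    return moves


files = [
    FileInfo("mail/2023.pst", "pst", date(2024, 1, 1)),
    FileInfo("scans/ct-001.dcm", "dcm", date(2024, 1, 1)),
    FileInfo("erp/ledger.db", "db", date(2020, 1, 1)),
]
print(plan_migrations(files, current_tier={}, today=date(2024, 6, 1)))
# [('mail/2023.pst', 'primary', 'archive-disk')]
```

Separating the decision (`target_tier`) from the action (the move list) mirrors the split between policy managers and data movers, which is exactly the boundary where integration has to be tight.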
What is the level of
interoperability and heterogeneity? New archive storage systems must
interoperate with existing tools such as policy managers and data movers, and
new software tools should offer the heterogeneity needed to support the current
archive hardware. The automated features of the archiving hardware and software
must work together seamlessly, so verify the combination in lab testing before
deployment.
Longevity of the archive
technology, media and tools. Archiving poses problems of long-term
standardization and natural media degradation. Storage media may reliably retain
data for only 10 years, and tapes written today will probably not be readable on
standard tape drives available 20 years from now. A similar problem exists with
optical discs (CDs and DVDs) and all types of hard drives. Organizations face a
dilemma: either retain old equipment in order to read old media, or periodically
refresh the data by copying it from old optical discs or hard drives to whatever
new media standard is in use. Backward compatibility is easier to maintain in
software, but changes to the tools can also render older archives unreadable. A
version of email archiving software released in 2028 may not be able to read
Exchange archives produced today.
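One common way to refresh archived data safely is to record a checksum for every file when it is archived and verify against those checksums during each migration. The Python sketch below assumes a simple manifest dictionary (relative path to SHA-256 digest); `build_manifest` and `refresh` are illustrative names, not part of any product. Checking the source first catches media that has already degraded, and checking the copy catches errors introduced during the move.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Stream a file through SHA-256 so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a checksum for every file at archive (ingest) time."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def refresh(old_root, new_root, manifest):
    """Copy archived files to new media, verifying both ends against the manifest.

    Returns {relative_path: problem} for every file that failed a check.
    """
    old_root, new_root = Path(old_root), Path(new_root)
    problems = {}
    for rel, expected in sorted(manifest.items()):
        src = old_root / rel
        if not src.is_file():
            problems[rel] = "missing"
            continue
        if sha256_of(src) != expected:
            problems[rel] = "source degraded"  # recover this file from a backup instead
            continue
        dst = new_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        if sha256_of(dst) != expected:
            problems[rel] = "copy mismatch"
    return problems
```

An empty result means every file in the manifest arrived on the new media intact; anything else lists exactly which files need attention.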
Backup strategies.
Archives are not backups. The files located on a disk-based archive may be the
only working copies of that data in the enterprise. Disk-based archives rely on
RAID for general data protection, but RAID is not a substitute for backup, so
archive platforms are typically included in the backup process. An established
archive may be fully backed up to tape every few months, with delta differencing
used to protect changes to the archive on a daily or weekly basis. Data
reduction techniques such as data deduplication can reduce the total size of
the archive and speed the backup process. Bottom line: Find the most effective
means of protecting your archival data.
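Delta differencing can be illustrated at the file level by comparing checksum manifests taken at the last full backup and now. This is a simplified, hypothetical sketch in Python; backup products may compute deltas at the block or byte level instead, and the in-memory dictionaries stand in for the archive's contents.

```python
import hashlib


def manifest(files):
    """files: dict of path -> content bytes. Returns path -> SHA-256 digest."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def delta(previous, current):
    """File-level differences an incremental (delta) backup needs to capture."""
    return {
        "added": sorted(p for p in current if p not in previous),
        "changed": sorted(p for p in current if p in previous and current[p] != previous[p]),
        "deleted": sorted(p for p in previous if p not in current),
    }


# Hypothetical archive contents at the last full backup and today.
at_full_backup = manifest({"mail/a.eml": b"hello", "mail/b.eml": b"world"})
today = manifest({"mail/a.eml": b"hello", "mail/b.eml": b"world!", "mail/c.eml": b"new"})
print(delta(at_full_backup, today))
# {'added': ['mail/c.eml'], 'changed': ['mail/b.eml'], 'deleted': []}
```

Only the added and changed files need to be copied in the daily or weekly run; the deleted list lets the backup catalog record that those files left the archive.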
Tracking and reporting
features. It's critical to track any activity that occurs with a file and
report that activity to the storage administrator. In some cases, tracking and
reporting merely help an administrator follow normal changes to the hundreds of
millions of files retained in the archive. In other cases, tracking and
reporting are essential elements of storage compliance. This may include
tracking data migration between tiers, flagging search and access attempts to
learn which users are attempting to find data, alerting IT when archived data is
changed, and reporting on deletions to document the appropriate disposition of
obsolete data.
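The tracking and reporting functions described above can be sketched as an audit log plus a summary report. The event fields, action names and the choice to alert on `modify` are assumptions made for illustration in Python, not a specific product's schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    user: str
    action: str   # hypothetical vocabulary: migrate, search, read, modify, delete
    target: str


# Archived data is expected to be immutable, so any modification raises an alert.
ALERT_ACTIONS = {"modify"}


def report(events):
    """Summarize an audit trail for the storage administrator."""
    return {
        "counts": dict(Counter(e.action for e in events)),
        "alerts": [e for e in events if e.action in ALERT_ACTIONS],
        "searchers": sorted({e.user for e in events if e.action == "search"}),
        # Deletion records document the disposition of obsolete data.
        "deletions": [(e.timestamp.isoformat(), e.target) for e in events if e.action == "delete"],
    }


events = [
    AuditEvent(datetime(2024, 5, 1, 9, 0), "alice", "search", "subject:merger"),
    AuditEvent(datetime(2024, 5, 1, 9, 5), "bob", "search", "subject:merger"),
    AuditEvent(datetime(2024, 5, 1, 9, 30), "carol", "modify", "mail/2019/q3.pst"),
    AuditEvent(datetime(2024, 5, 1, 10, 0), "system", "delete", "mail/2016/old.pst"),
]
summary = report(events)
print(summary["counts"], summary["searchers"], summary["deletions"])
```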
Maintenance and TCO.
Ultimately, any archiving platform or tool will cost more than the initial
purchase price. Hardware platforms carry the additional expense of routine
maintenance and potential upgrades. Software tools entail recurring costs such
as annual licensing, patching and updates. By estimating total cost of ownership
(TCO), storage managers can compare the pricing of archiving products more
objectively.
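A simple, undiscounted TCO comparison shows why purchase price alone can mislead. The function name and all figures below are hypothetical, and a real analysis might also discount future costs and include items such as power consumption.

```python
def total_cost_of_ownership(purchase_price, annual_costs, years, one_time_costs=()):
    """Undiscounted TCO: up-front price + recurring costs + one-off costs.

    annual_costs: dict of recurring costs per year (maintenance, licensing, ...)
    one_time_costs: e.g. a planned mid-life hardware upgrade
    """
    recurring = years * sum(annual_costs.values())
    return purchase_price + recurring + sum(one_time_costs)


# Hypothetical five-year comparison of two archiving products.
product_a = total_cost_of_ownership(50_000, {"maintenance": 8_000, "licensing": 5_000}, years=5)
product_b = total_cost_of_ownership(30_000, {"maintenance": 10_000, "licensing": 8_000}, years=5)
print(product_a, product_b)  # 115000 120000: the cheaper purchase costs more over five years
```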