Archival Data - Disk or Tape
Relentless digital data growth is inevitable, as data has
become critical to all aspects of human life over the
course of the past 30 years and promises to play a
much greater role over the next 30 years. Much of this
data will be stored forever, mandating the emergence of
a more intelligent and highly secure long-term storage
infrastructure. Data retention requirements vary widely
based on the type of data, but archival data is rapidly
piling up everywhere. Digital archiving is now a key
strategy for larger enterprises and has become
a required discipline for hyperscale data centers.
Many data types are being stored indefinitely in
anticipation that their potential value will eventually
be unlocked. Industry surveys indicate nearly 60% of
businesses plan to retain data in some digital format for
50 years or more, and much of this data will never be
modified or deleted. For many organizations, facing
terabytes, petabytes and potentially exabytes of
archive data for the first time can force the redesign
of their entire storage strategy and infrastructure. As
businesses, governments, societies, and individuals
worldwide increase their dependence on data, data
preservation and archiving have become critical IT
practices.
Fortunately, the required technologies are now available to manage the archival upheaval.
What is Archival Data?
Archiving involves moving data that is no longer frequently accessed off primary systems to lower cost storage for long-term retention
and protection. Archive retention requirements can extend to 100 years or more. Much archival data is unstructured and includes office documents, video,
audio, images, and basically anything not in a database. Big data is mostly unstructured data, which is difficult to search and analyze with
traditional methods. Fortunately, many of the tools that analyze big data are beginning to exploit metadata and global namespaces to
make search and access capabilities much easier.
How much Data is Archival?
Industry estimates vary, but the amount of data projected to be actually stored in 2025 is
believed to be ~7.5 ZB according to IDC’s 2018 Data Age report. The effects of the global
Covid-19 pandemic on storage demand remain unclear.
What we do know is that approximately 1.1 ZB of total storage capacity was shipped in 2019
across Non-Volatile Memory devices (SSDs), HDDs, and magnetic tape media, with HDDs
making up the majority of the shipped capacity. Today at least 60% of all data can be
classified as archival and it could reach 80% or more by 2025, making it by far the largest
and fastest growing storage class while presenting the next great storage challenge.
Most archival data has never been monetized because its value remains unknown, but companies are just now realizing that digital
archives have great potential value. Companies looking to be relevant between now and 2025 will need to understand the role archive data
can play in their organization’s success and how data archiving strategies will evolve during that period. Given this trajectory, the archival
storage paradigm will need to reinvent itself.
When does Data reach Archival Status?
Archival data will continue to be the largest and fastest growing data
classification segment. As data ages after its creation, the probability of
access, P(a), begins to fall after one month and typically falls below
1%, most often at 90 to 120 days. Some data becomes archival upon creation and
can wait years for access or further analysis, adding to the archival pileup.
Today the most cost-effective solutions for archival data are robotic tape
libraries used in local, cloud and remote locations.
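The aging behavior described above can be turned into a simple tiering rule. The sketch below is illustrative only: the 90-day threshold is taken from the low end of the 90 to 120 day range quoted in the text, and the function name `is_archival_candidate` and the use of time since last access as a stand-in for P(a) are assumptions, not a description of any particular product.

```python
from datetime import datetime, timedelta

# Assumed threshold: low end of the 90 to 120 day range quoted in the text.
ARCHIVE_AGE_DAYS = 90


def is_archival_candidate(last_access: datetime, now: datetime,
                          threshold_days: int = ARCHIVE_AGE_DAYS) -> bool:
    """Return True when data has gone unaccessed for at least threshold_days."""
    return (now - last_access) >= timedelta(days=threshold_days)


now = datetime(2021, 1, 1)
print(is_archival_candidate(datetime(2020, 12, 1), now))  # last accessed 31 days ago
print(is_archival_candidate(datetime(2020, 6, 1), now))   # last accessed 214 days ago
```

A real policy would likely combine age with data type and retention rules, since some data is archival from the moment it is created.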
Backup and Archive Are Very Different Processes
Many businesses continue to confuse the backup and archive processes, often thinking they are the same thing. Backup is the process of
making copies of data that can be used to restore the original if it is damaged, corrupted, or lost in a data loss event.
Archiving is the process of moving data that is no longer actively used, but is required to be retained, to a new location for long-term storage,
freeing up space on the source location. Most archive applications treat archive data as read-only to protect it from modification, while
others treat archive data as read and write capable. Unlike backup, which copies data, archiving moves the original data to a more cost-effective location. Remember: backup occurs
on your time; recovery occurs on company time.
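The copy-versus-move distinction can be shown at the level of a single file. This is a minimal sketch, not how backup or archive software is actually built: the function names `backup` and `archive`, the directory names, and the choice to mark the archived file read-only with file permissions are all assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def backup(source: Path, backup_dir: Path) -> Path:
    """Copy the file; the original stays in place on primary storage."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, backup_dir / source.name))


def archive(source: Path, archive_dir: Path) -> Path:
    """Move the file, freeing space at the source, then mark it read-only."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = Path(shutil.move(str(source), str(archive_dir / source.name)))
    target.chmod(0o444)  # read-only, as most archive applications treat archive data
    return target


# Demonstration in a throwaway directory (all paths are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    primary = Path(tmp) / "primary" / "report.txt"
    primary.parent.mkdir()
    primary.write_text("quarterly results")

    copy = backup(primary, Path(tmp) / "backup")
    print(primary.exists(), copy.exists())    # both the original and the copy exist

    moved = archive(primary, Path(tmp) / "archive")
    print(primary.exists(), moved.exists())   # the original is gone from primary storage
    moved.chmod(0o644)                        # restore write permission so cleanup succeeds
```

After `backup` there are two copies of the data; after `archive` there is still only one, in a new location.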
High capacity disk and tape store most of the world’s archive data. However, coping with the non-stop, rapid accumulation of archival data
cannot be achieved cost-effectively by simply adding capacity with more costly disk drives. From a capital expense
perspective, the cost of acquiring disk drives and keeping them operational can quickly skyrocket. The deployment of additional disk arrays
increases spending (TCO) on administration, data management effort, floor space and energy consumption compared to more cost-efficient
tape solutions as storage demand grows. Unlike disk, tape capacity scales by adding more media, not more drives, making tape
currently the most cost-effective and scalable archival solution.
Comparing Disk and Tape for Data Archiving
Tape and disk are today’s primary options for large-scale data center archiving and share
the archive market. A disk drive can draw from 7 W to 21 W of electrical power
continuously to keep it spinning and cooled, making energy costs a major component
of overall disk TCO. Tape also provides WORM (Write-Once-Read-Many) and encryption
capabilities, enabling an immutable, secure storage medium for valuable archival files. The
tape air gap adds significant protection against cybercrime attacks by
providing an electronically disconnected copy of data that hackers can’t access. The table
below compares tape and disk for key archival functions used to implement an optimized
archive.
Archive Functionality | Tape | Disk
TCO | Favors tape for archive as much as 5-8x over disk and cloud | Much higher TCO, more frequent conversions and upgrades
Long-Life Media | 30 years or more on all new enterprise and LTO tape | ~4-5 years for most HDDs before upgrade or replacement
Reliability | Tape BER (Bit Error Rate) of 1x10^19 versus 1x10^16 for disk | Disk BER has fallen behind tape by three orders of magnitude
Inactive Data Does Not Consume Energy | Most tape data doesn’t consume energy. “If the data isn’t being used, it shouldn’t consume energy” | TCO studies indicate that disk is 90x more expensive for energy than tape and produces 89x more CO2
Highest Security Levels | Encryption and WORM available on all tape, “air gap” prevents hacking | Encryption and WORM are available, not frequently used on disk
Capacity Growth Rates | Roadmaps favor tape over disk for foreseeable future; 300-400 TB cartridges have been demonstrated | Slowing capacity growth as roadmaps project disk capacity to lag tape for foreseeable future
Scale Capacity | Tape can scale by adding cartridges | Disk scales by adding more drives
Data Access Time | LTFS, the Active Archive, TAOS and RAO improve tape file access time | Disk is much faster (ms) than tape (secs/mins) for initial access and provides random-access capability
Data Transfer Rate | 400 MB/sec for TS1160, 360 MB/sec for LTO-8, RAIT multiplies data rates | Approx. 160-220 MB/sec for typical HDDs
Portability - Move Media for DR Without Electricity | Tape media is removable and easily transported to another location in absence of data center electricity | Disks are difficult to physically remove and to safely transport
Cloud Storage Archives | Tape improves cloud reliability and security, lowers storage costs | HDDs become very expensive as CSPs and hyperscale data centers grow
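Two of the figures quoted above lend themselves to quick arithmetic: the 7 W to 21 W continuous draw of a disk drive and the streaming rates in the Data Transfer Rate row. The sketch below is a back-of-the-envelope calculation only. The helper names, the 100 TB example size, and the decimal unit convention (1 TB = 1,000,000 MB) are assumptions, and the transfer times are for a single drive streaming at its quoted rate, ignoring mount, seek and RAIT effects.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_kwh(power_watts: float) -> float:
    """Energy used in one year by a device drawing power_watts continuously."""
    return power_watts * HOURS_PER_YEAR / 1000


def transfer_hours(capacity_tb: float, rate_mb_per_sec: float) -> float:
    """Hours to stream capacity_tb at a sustained rate (1 TB = 1,000,000 MB)."""
    return capacity_tb * 1_000_000 / rate_mb_per_sec / 3600


# Disk power range quoted in the text.
for watts in (7, 21):
    print(f"{watts} W drive: {annual_energy_kwh(watts):.1f} kWh per year")

# Streaming rates quoted in the table; 100 TB is an arbitrary example size.
rates = (("TS1160", 400), ("LTO-8", 360), ("HDD (low)", 160), ("HDD (high)", 220))
for name, rate in rates:
    print(f"{name}: {transfer_hours(100, rate):.1f} hours to stream 100 TB")
```

A drive drawing 7 W around the clock uses roughly 61 kWh per year, and one drawing 21 W roughly 184 kWh, before multiplying by the number of drives in an array.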
Key Point
The tape industry continues to innovate and deliver compelling new features with improved economics and the highest
reliability levels. This has established tape as the most cost-effective choice for archiving and given it a larger role in
backup, business resumption and disaster recovery.
The size of preserved digital archives is now reaching petascale (1x10^15 bytes) and
exascale (1x10^18 bytes) and will approach zettascale (1x10^21 bytes) capacities in the foreseeable
future. A strategy to move low-activity but potentially valuable archival data to the
optimal storage tier immediately yields the greatest cost savings. Archive storage
growth and requirements have no foreseeable limits and could demand a new deep
archive storage tier in the next few years. Unless a new archival technology arrives, the
numerous improvements in tape have made it the clear-cut optimal data archiving
choice for the foreseeable future.