Active Archives - Annual Report 2023
The Active Archive Alliance serves as a vendor-neutral, trusted source for providing
end users with technical expertise and guidance to design and implement modern active
archive strategies that solve data growth challenges through intelligent data management.
Executive Summary
Business and IT leaders well understand the challenges of massive, double-
digit data growth. More devices and applications generate more data from the
edge to the public cloud. Copying and replicating data for protection, the need
to keep data for longer periods, and even the fear of deleting corporate data
add to storage demands. Many industries experience a 40% annual data volume
growth rate.
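To make the compounding effect of a 40% annual growth rate concrete, the following is a minimal arithmetic sketch. The 100 TB starting volume and the function name `projected_volume` are illustrative assumptions, not figures from this report.

```python
def projected_volume(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound a starting data volume forward by a fixed annual growth rate."""
    return start_tb * (1.0 + annual_growth) ** years


# Hypothetical starting point: 100 TB under management today.
for year in range(0, 6):
    print(f"year {year}: {projected_volume(100.0, 0.40, year):.0f} TB")
```

At this rate, volume nearly doubles every two years (1.4 squared is 1.96) and passes five times its starting size within five years.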
Key to handling the data growth challenge, especially in the context of flat or
slowly growing IT budgets, is effective data management. A working definition
of data management involves the processes for gathering and storing data
efficiently, securely, and cost-effectively. Without effective data management,
data growth overwhelms an organization.
Effective data management brings other crucial benefits. Numerous challenges
face IT organizations today, including ransomware threats, budgetary pressures, skill set shortages, and digital transformation.
Intelligent, effective data management addresses these issues through cyber resiliency, reduced costs, simplified data
administration, and data accessibility features.
Beyond problems to be solved, data-driven organizations recognize data as a strategic, enterprise asset. In a future world
where AI and ML workloads permeate and drive business processes and decision-making at all levels, effective data
management becomes imperative. Organizations without intelligent data management processes that feed into business
intelligence workloads risk being left behind by competitors that have them.
And this is where the active archive model serves today’s modern and future enterprises.
Active Archiving solves data growth challenges through:
- An intelligent data management layer to place data where it belongs for cost or performance
- Adaptability to any storage architecture, media, or protocol
- Applicability across the entire data lifecycle, from data creation through archiving and eventual purging
- Security and protection features that safeguard data from threats and risks
An active archive positions organizations to cost-effectively manage their growing data and address industry pressures,
while laying a foundation to profit from tomorrow’s opportunities.
The Heart of an Active Archive - Intelligent Data Management Software
At the center of an active archive resides an intelligent data management system.
This software system plays the central role of automatically placing data where it
belongs for cost, performance, and workload priorities. Using technologies such
as metadata and global namespaces, the data management layer makes data
accessible, searchable, and retrievable on whatever storage platform or media it
may reside.
Among its many features, the intelligent data management layer adds value by:
- Automating the decisions for tiering data to long-term storage
- Automating data management processes such as:
  - Applying data protection and security policies
  - Cleansing data
  - Alerting for anomalous conditions
- Surveying and analyzing the enterprise data landscape
- Discovering data of which IT administrators are not aware
- Presenting visual representations of an organization’s data through charts,
graphs, and dashboards for better decision-making
- Simplifying the skill set needed to oversee and manage large, growing
volumes of data
And the data management software does this work in the background without
affecting performance.
The Active Archive
Integrates Intelligent Software and Scalable Storage for the Optimal Archive Solution
The Benefits of an Active Archive
Access
An essential active archive principle asserts that today’s enterprise needs online
access to legacy data. Some use cases may need fast access; in other instances, a
longer retrieval time is acceptable. The organization determines its access needs
and permissions for users and groups.
Access to an active archive serves the organization through the following:
- Business intelligence: Companies can analyze retained data for insights
into trends and patterns. Monetizing data becomes the ultimate objective to
transform storage costs into profit opportunities.
- Legal requirements: Ongoing access ensures legal teams can search and
retrieve data from cold storage in response to litigation.
- Offloading IT resources: Configuring online access to inactive data lets users
retrieve these files without IT intervention. Self-service access saves time
and money.
Cost Savings
Most data growth comes through unstructured data represented by video, audio,
images, presentations, email, and documents. The likelihood for users to access
this data 30 days after its creation drops considerably; after 100 days, it falls below
1%. Keeping this growing, inactive data on primary storage becomes inefficient
and costly.
Through intelligent data management software, an active archive moves inactive
data to low-cost storage. For some organizations, data management software can
tier older data to warm storage such as hard drives. Then when, by policy, data has
aged sufficiently, files can move to more cost-effective storage such as economy
disk, tape, optical, or even the cloud. Other organizations benefit by moving data
immediately to archival-type storage. An example might be healthcare, where
a medical image is archived immediately, but a cached copy remains on local
storage for 30 days.
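The policy-driven tiering described above can be sketched as a simple rule on how long a file has been idle. This is only an illustration: the `FileRecord` type, the tier names, and the 30- and 100-day thresholds are hypothetical, and a real data management layer would weigh far more than last-access time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileRecord:
    path: str
    last_accessed: date


# Hypothetical thresholds; a real policy would be set by the organization.
WARM_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 100


def choose_tier(record: FileRecord, today: date) -> str:
    """Return the storage tier a file belongs on, based on days since last access."""
    idle_days = (today - record.last_accessed).days
    if idle_days >= ARCHIVE_AFTER_DAYS:
        return "archive"  # e.g. economy disk, tape, optical, or cloud
    if idle_days >= WARM_AFTER_DAYS:
        return "warm"     # e.g. hard drives
    return "primary"


today = date(2023, 6, 1)
files = [
    FileRecord("reports/q1.pptx", date(2023, 5, 25)),
    FileRecord("video/townhall.mp4", date(2023, 4, 10)),
    FileRecord("scans/invoice-0001.pdf", date(2022, 11, 2)),
]
for f in files:
    print(f.path, "->", choose_tier(f, today))
```

An organization that archives immediately, as in the healthcare example, would in effect set the archive threshold to zero and keep a separate short-lived cached copy on local storage.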
By moving inactive data off primary storage, the IT architecture
benefits through a lean, primary storage supporting the most active
data sets. The freed space on primary storage can defer the need
for additional hardware purchases. Data consolidation opens cost-
saving possibilities through reduced labor costs, licensing fees, and
energy costs. Backup costs for primary storage also benefit as backup
software has a reduced workload, needing less time and energy for
its work.
An active archive saves money through data insights. The analysis
provided through data management software helps IT decision-
makers understand how and why their organization uses data. This
understanding, in turn, shapes how administrators optimize their data
for cost savings. Through trend analysis, data intelligence helps IT
leaders plan and budget for storage growth.
Legal teams find an active archive saves money through cost
avoidance. Most corporations must comply with regulatory
requirements to keep and store data in a specific way for a specific
time. An active archive helps companies comply with these data
security requirements and avoid legal costs from non-compliance.
Security
The threat of a successful cyberattack worries organizations around the world.
Over the last two years, ransomware has remained the number one security
concern for chief information security officers. A successful ransomware attack
can result in data loss, business interruptions, revenue losses, fines, and legal fees.
Added up, the average total expense for business recovery from a ransomware
attack is $1.85 million.
An active archive can supply a wide range of security features and cyber resiliency
capabilities to secure and protect data from cyber threats facing today’s businesses
and institutions.
Examples include:
- Encryption
- Multi-factor authentication
- Access Control Lists (ACLs)
- Role-Based Access Control (RBAC)
- Zero-trust security models
Because archival data typically remains
unchanged, administrators may use
WORM (write once, read many) or
view-only mode features to
prevent data from being deleted or
overwritten, safeguarding data's integrity,
availability, and confidentiality.
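As a rough illustration of the write-once behavior, the sketch below models a store that accepts each object a single time and refuses overwrites and deletes. `WriteOnceStore` is a hypothetical in-memory stand-in; real WORM protection is enforced by the storage media or platform, not by application code like this.

```python
class WriteOnceStore:
    """In-memory stand-in for WORM storage: each key can be written exactly once."""

    def __init__(self) -> None:
        self._objects = {}  # maps key (str) to archived content (bytes)

    def put(self, key: str, data: bytes) -> None:
        if key in self._objects:
            raise PermissionError(f"{key!r} is already archived and cannot be overwritten")
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        raise PermissionError("deletes are not permitted on a write-once store")


store = WriteOnceStore()
store.put("scan-001", b"image bytes")
try:
    store.put("scan-001", b"tampered")
except PermissionError as exc:
    print("rejected:", exc)
```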
As data management software moves inactive data onto active archive media,
the exposure target for malware infection decreases for primary storage. Further,
several media technologies, such as tape or optical, feature easy-to-deploy air-gap
defenses, where IT personnel can establish a literal separation from any online
path to prevent unauthorized electronic access.
Storage administrators can employ 3-2-1-1-0 best practices for their
active archive data, which is also a best practice for backup storage:
- Maintain at least 3 copies of data - where the primary archive file
counts as one of these copies
- Store 2 of the copies on different media (e.g., tape and HDD)
- Ensure at least 1 of the copies is stored offsite
- Store at least 1 of the copies offline
- Verify the copies have 0 errors or virus infections
- And periodically test for restoration!
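The checklist above lends itself to a simple automated check. The following is a minimal sketch, assuming each copy of an archived object is described by a hypothetical `ArchiveCopy` record; the field names are illustrative and do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass
class ArchiveCopy:
    media: str          # e.g. "tape", "hdd", "cloud"
    offsite: bool
    offline: bool
    verify_errors: int  # errors found by the last integrity or malware check


def check_3_2_1_1_0(copies: list[ArchiveCopy]) -> dict[str, bool]:
    """Evaluate each part of the 3-2-1-1-0 rule for one archived object."""
    return {
        "3 copies": len(copies) >= 3,
        "2 media types": len({c.media for c in copies}) >= 2,
        "1 offsite": any(c.offsite for c in copies),
        "1 offline": any(c.offline for c in copies),
        "0 errors": all(c.verify_errors == 0 for c in copies),
    }


copies = [
    ArchiveCopy(media="hdd", offsite=False, offline=False, verify_errors=0),
    ArchiveCopy(media="tape", offsite=True, offline=True, verify_errors=0),
    ArchiveCopy(media="cloud", offsite=True, offline=False, verify_errors=0),
]
result = check_3_2_1_1_0(copies)
print(result)
print("compliant:", all(result.values()))
```

Periodic restoration testing falls outside this sketch; a check like this only confirms that the copies exist in the right places, not that they can be restored.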
While organizations should depend on cybersecurity software as a first line of defense against malware, they should presume
that a successful attack can occur at any time. When dealing with massive data growth, which only expands the attack surface
for cyberattacks, these capabilities and practices ensure an organization’s data assets remain secure, protected,
and recoverable.
Active Archives for Emerging Trends
In addition to access, cost, and security benefits, an active archive provides
technical leaders the flexibility to adapt to new industry trends and growth areas
like sustainability, artificial intelligence, and edge computing. This flexibility helps
enterprises thrive with possibilities for market leadership, increased revenue, and
competitive advantages.
Sustainability and Active Archives
From an IT perspective, sustainability concerns itself with how corporations
use IT systems to minimize their negative impact on the environment and
society while maximizing their positive impact. Sustainability focus areas for IT
include energy consumption, e-waste, and supply chain efficiency. Beyond legal
requirements for sustainability, business leaders recognize the public is more
likely to support companies committed to products and services that help the
environment. Further, sustainability practices that save energy and reduce waste
also save money.
Data centers consume a significant amount of the world’s energy, with estimates
as high as 3% of the global energy supply [5]. Servers and storage are of particular
note for energy consumption. Each new year of data growth adds to the energy
needs required to run servers, storage systems, and networking equipment for
processing, storing, and transmitting data. An active archive’s adaptability to an
organization’s priorities makes it well-suited for supporting sustainability goals.
For example, and as noted before, the intelligent data management software
layer automatically places data for cost, performance, and workload priorities.
By tiering inactive data off energy-intensive devices such as flash or performance
HDDs, inactive data can move onto less energy-intensive technologies like
(some) HDDs, tape systems, and optical storage, known for their minimal energy
requirements and low cost per terabyte.
By virtualizing the underlying storage infrastructure, data management software
optimizes storage resources for energy savings. Reporting and analysis from the
software guide decisions about which data to retain or purge for energy and
cost savings. Analysis can help enterprises consolidate their storage resources
to reduce energy consumption. Enterprise leaders can use their reports to link
storage technologies to energy consumption. These analysis features contribute
to reporting requirements organizations may have for sustainability activities.
And so, through an active archive, intelligent data management becomes
intelligent energy management.
Active Archives for AI/ML Frameworks
Artificial intelligence (AI) and machine learning (ML) workloads will permeate
the workplace as an enterprise tool for operations management and decision-
making at all levels. Market research indicates 35% of organizations have
already invested in AI, and 44% plan to invest in AI within the next year.
As effective data management improves AI, effective AI improves data
management. AI expands the intelligent data management software layer
beyond analysis and reporting. AI will bring value to data management and,
therefore, to an active archive through:
- Tailored optimization for primary, backup, and archival storage for
availability, cost, performance, and workload priorities
- Applying and enforcing data security and protection protocols
- Monitoring, detecting, and countering security threats
- Increasing the scope and productivity of IT employees
- Accelerating and automating complex data management processes
- Automating data recovery of critical workloads in the event of cyber
attacks or other disruptions
Through an AI-driven intelligent data management software layer, AI will
automate and autonomize active archives by:
- Automatically cleansing, normalizing, classifying, and making accessible
long-term data and metadata for AI workloads
- Automating metadata tagging, indexing, and cataloging of
inactive data
- Identifying and archiving sensitive information
- Presenting Natural Language Processing (NLP) interfaces for
archive inquiries
- Efficiently allocating compute, storage, and networking resources in
support of sustainability efforts
Ultimately, AI depends on well-organized data for success, which
highlights yet again why effective data management through an active
archive is crucial for an AI future.
Edge Computing and Active Archives
Edge computing moves data processing out of the data center to a location at or near
the source where the data originates. Processing may take place in a device or a small
server. Fueled by IoT, 5G, and advances in small form factor server
technologies, the edge computing market is expected to rise to $116.5 billion by
2030.
Computing at the edge brings IT benefits through fast responses, reduced
network transmissions, and reduced costs. The distributed nature of edge
computing complicates data storage, where edge devices may number from hundreds
to hundreds of thousands. These devices may range from small sensors to small
servers. Enterprises are faced with questions about collecting, processing,
storing, and managing the data generated from edge devices and applications.
An active archive adds value to edge data in many of the same ways it does for data generated within the data center and
cloud. The intelligent data management layer helps IT organizations with managing edge data through:
- Automatically placing edge data where it belongs for
cost, performance, and workload priorities
- Automating data management processes such as:
  - Applying data protection and security policies to
    edge data
  - Tiering edge data for cost and workload priorities
- Analyzing and presenting a visual representation of
edge data through charts, graphs, and dashboards for
better decision-making
Additionally, some edge data may be eligible to be moved to
an active archive environment immediately. Edge data to be
retained can be stored in a WORM or otherwise immutable
format to protect it from malware. IT organizations can use
their active archive to make edge data readily available when
needed for analytic workloads for business insights.
Video Surveillance
One of the many use cases of edge computing involves
video surveillance for public security. Video surveillance
installations are an ideal candidate for collecting, retaining,
and accessing recorded video data volumes.
Estimates put the number of surveillance devices at over a billion worldwide. A large international
airport may generate hundreds of terabytes of video surveillance data each day.
Even an organization with a hundred cameras can generate tens of terabytes
daily.
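A back-of-the-envelope calculation shows how camera count and stream bitrate translate into daily volume. This is a rough sketch that assumes continuous recording at a constant bitrate; the bitrates shown are hypothetical, and actual values depend on resolution, codec, and frame rate.

```python
def daily_video_terabytes(cameras: int, bitrate_mbps: float) -> float:
    """Estimate terabytes recorded per day for cameras streaming continuously."""
    seconds_per_day = 24 * 60 * 60
    bits_per_day = cameras * bitrate_mbps * 1_000_000 * seconds_per_day
    return bits_per_day / 8 / 1e12  # bits -> bytes -> decimal terabytes


# Hypothetical per-camera bitrates in megabits per second.
for mbps in (4.0, 10.0, 25.0):
    print(f"100 cameras at {mbps} Mbps: {daily_video_terabytes(100, mbps):.1f} TB/day")
```

Under these assumptions, a hundred cameras at 10 Mbps each record about 10.8 TB per day, so retention periods measured in months quickly reach petabyte scale.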
An active archive serves as a practical video surveillance
storage solution for organizations requiring long-term video
retention or having large numbers of surveillance cameras.
Management: An active archive’s intelligent data
management policy can tier video files onto HDD and then
to an active archive storage tier.
Costs: Tape technologies bring cost-effective, scalable
solutions for active archive video surveillance footage.
Security: An active archive, through its data management
software, can automatically apply security policies to ensure
regulatory compliance and protection.
Access: With integrated tiering into video management
software (VMS) systems, the ability to search and play back
all recorded video from either tier of video storage becomes
straightforward for the video operator.
Active archive, multi-tier, video storage solutions bring
cost-effective implementations for video surveillance use
cases. These solutions significantly reduce the initial cost of
required hardware, and the overall total cost of ownership
benefits from power cost savings.
The sheer magnitude of data in the
next era of digital business requires
organizations to rethink the scale, cost,
durability, and intelligence underlying
their data management strategies.
The active archive solution answers the need for effective data
management while providing a cost-effective, scalable solution
addressing data growth challenges. Beyond a solution for
data growth and management, an active archive serves the
organization and its stakeholders by advancing the broader
initiatives for digital transformation.