Brookhaven’s Scientific Data and Computing Center Reaches 300PB on Tapes

The largest compilation of nuclear and particle physics data in the USA, all easily accessible – with plans for much more.

The Scientific Data and Computing Center (SDCC) at the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory has reached a major milestone: It now stores more than 300PB of data. That’s far more data than would be needed to store everything written by humankind since the dawn of history – or, if you prefer your media in video format, all the movies ever created.

“This is the largest tape archive in the U.S. for data from nuclear and particle physics (NPP) experiments, and third in terms of scientific data overall,” said Alexei Klimentov, physicist, Brookhaven Lab, who manages SDCC.

“Our 300PB would equal 6 or 7 million movies,” said Tim Chou, engineer and data specialist, Brookhaven Lab. “Since the first movie was made in 1888, humans have generated some 500,000 movies. So, all the feature films ever created would fill only a small percentage of our storage.”

Written history, starting from Sanskrit to today, would fill just 50PB.

“We have 6x more data,” Klimentov said.

The current SDCC cache comes from experiments at the Relativistic Heavy Ion Collider (RHIC), a DOE Office of Science user facility for nuclear physics research that’s been operating at Brookhaven Lab since 2000, and the ATLAS experiment at the Large Hadron Collider (LHC), located at CERN, the European Organization for Nuclear Research. These 2 colliders each smash protons and/or atomic nuclei together at nearly the speed of light, thousands of times/second, to explore questions about the nature of matter and fundamental forces. Detectors collect countless characteristics of the particles that stream from each collision and the conditions under which the data were recorded.

“And amazingly, every single byte of this data is online. It’s not in bolted storage that is not available,” Chou said. “Collaborators around the world can access it, and we will mount it and send it back to them.”

Seamless access on demand

By mounting it, he means pulling the relevant information out of a state-of-the-art, high-tech tape storage library. When RHIC and ATLAS collaborators want a particular dataset – or multiple sets simultaneously – an SDCC robot grabs the appropriate tape(s) and mounts the desired data to disk within seconds. Scientists can tap into that data as if it were on their own desktop, even from halfway around the world.

“We have data available on demand,” Klimentov said. “It’s stored on tape and then staged on disk for physicists to access when they need it, and this is done automatically. It’s really a ‘carousel of data,’ depending on what RHIC or ATLAS physicists want to analyze.”

Yingzi (Iris) Wu, engineer, SDCC, noted that the system requires very good monitoring, some redundancy of data and control paths, and good support of the equipment.

Improved capacity: one of the original storage tapes from the beginning of the data center in 1999, and the latest generation in use today (LTO-9). It would take 900 of the original tapes to hold the data stored on the newer variety.

“We have developed our own software and website to generate plots that let us know what is going on with the data transfers,” she said. “And we added a lot of capabilities for monitoring how the data goes into and out of the High Performance Storage System (HPSS).”

HPSS is a data-management system designed by a consortium of DOE labs and IBM to ensure that components of complex data storage systems – tapes, databases, disks, and other technologies – can ‘talk’ to one another. The consortium developed the software physicists use to access SDCC’s data.

“We install and configure the software and let users use it,” Wu said. “But we always need to improve the way to talk to those systems to get data in and out. Our log system has alerts. If anything is not performing well, it will send out alerts to the team,” she said.

AI and ML algorithms can help detect such anomalies and reduce the operational burden on the computing professionals and engineers who provide support for users of HPSS and SDCC’s storage systems, Klimentov noted. “This is something SDCC staff plan to develop more over the next few years to meet future data demands.”

Energy and cost savings

Why use such a complex 2-tiered tape-to-disk system? The answer is simple: cost.

“When you consider the cost per terabyte of storage, tape is 4 or 5x less expensive than disk,” said Ognian Novakov, engineer, SDCC.

In addition, for data to be available on disk, the disks have to spin in computers, eating up energy and emitting heat – which further increases the energy needed to keep the computers cool. Tape, which is relatively static when not in use, has lower power demands.

“Tape storage is generally designed for deep storage, deep archives. You write the data and almost never read it, unless you need it for recovery or to meet compliance requirements,” Novakov said.

“But in our case, it’s a very dynamic archive,” Novakov said. “The robots frequently access the tape archive to move/stage requested data to disk, then the staged data gets deleted from disk so there’s space for the next request. HPSS plus our tape libraries provide the functionality of an infinite file system.”
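The “infinite file system” behavior Novakov describes is the classic hierarchical-storage pattern: a small disk cache sits in front of a much larger tape tier, and staged copies are evicted to free space for the next request. Here is a toy sketch of that pattern (illustrative only – the class and method names are invented, and HPSS’s real interfaces are far more sophisticated):

```python
from collections import OrderedDict

class TapeFrontedCache:
    """Toy hierarchical store: every dataset lives permanently on 'tape';
    reads stage a copy to a small 'disk' cache, evicting the
    least-recently-used staged copy when the cache is full."""

    def __init__(self, disk_slots):
        self.tape = {}             # permanent copies (never deleted)
        self.disk = OrderedDict()  # staged copies, kept in LRU order
        self.disk_slots = disk_slots

    def archive(self, name, data):
        self.tape[name] = data     # write once to the tape tier

    def read(self, name):
        if name not in self.disk:  # cache miss: stage from tape to disk
            if len(self.disk) >= self.disk_slots:
                self.disk.popitem(last=False)  # evict oldest staged copy
            self.disk[name] = self.tape[name]
        self.disk.move_to_end(name)            # mark as recently used
        return self.disk[name]

store = TapeFrontedCache(disk_slots=2)
for run in ("run001", "run002", "run003"):
    store.archive(run, f"events for {run}")
store.read("run001")
store.read("run002")
store.read("run003")       # disk is full, so run001 is evicted
print(sorted(store.disk))  # ['run002', 'run003']
```

The tape dictionary stands in for the robotic library: nothing is ever lost from it, so the disk tier can be kept small and recycled aggressively, which is exactly the trade the SDCC is making.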

Klimentov noted that the more efficient storage has allowed SDCC to reduce the data volume on disk ‘by a factor of 2.’

“Cutting down on disks has another benefit since they have an average lifetime of just 5 years, compared to tape with a shelf life of about 30 years,” Chou said.

And tape capacity keeps improving. “The storage capacity on tape generally doubles every four to five years,” Novakov said. “We started 26 years ago with 20GB tape cartridges; now we are at 18TB on one cartridge – and it’s even smaller in physical size. By periodically rewriting data from older media to new, we are freeing a lot of slots in the library.”
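The consolidation Novakov describes is simple arithmetic on the quoted capacities (variable names here are illustrative; the 20GB and 18TB figures are from the article):

```python
# Cartridge capacities quoted in the article, in gigabytes.
first_cartridge_gb = 20        # circa 1999
lto9_cartridge_gb = 18_000     # 18TB native, in use today

# How many original cartridges fit on one modern one.
tapes_per_new_cartridge = lto9_cartridge_gb // first_cartridge_gb
print(tapes_per_new_cartridge)  # 900
```

That 900:1 ratio is why rewriting old media onto new cartridges frees so many library slots.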

Meeting ever-increasing data demands

Most of the SDCC’s tape libraries are now located in a facility with power and cooling efficiencies designed specifically for data systems. And there should be enough room for expansion to meet the ever-increasing demand of current experiments as well as those planned for the future.

“RHIC’s newest detector, sPHENIX, with a readout rate of 15,000 events/second, is projected to more than double the data we have now,” said Chou.

After RHIC has completed its science mission toward the end of 2025, it will be transformed into an Electron-Ion Collider (EIC). This new nuclear physics facility is currently in the design stage at Brookhaven and is expected to become operational in the 2030s. Around the same time, a ‘high-luminosity’ upgrade to increase collision rates at the LHC is expected to ramp up the ATLAS experiment’s data output by about ten times!

Plus, SDCC handles smaller data loads for a few other experiments, including the Belle II experiment in Japan, the Deep Underground Neutrino Experiment based at DOE’s Fermi National Accelerator Laboratory, and some experiments at the National Synchrotron Light Source II and Center for Functional Nanomaterials, 2 other DOE Office of Science user facilities at Brookhaven.

“Space wise, we probably have to grow our physical capacity to one-and-a-half or two times our current size [by adding more racks to the existing facility], while in data capacity, we are growing by a factor of 10 or more,” Novakov said.

Chou has spec’d it out: “From our calculations, our existing tape room can probably hold 1.5 or 1.6EB of data with existing old technology. One exabyte is 1,000 petabytes – a billion billion bytes,” he said. “But we know the capacity of tape technology will grow exponentially. We think with technology upgrades, we can hold 3EB without major upgrades to our facility.”
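Chou’s projections reduce to a quick units check. A sketch using the figures quoted above (all numbers from the article, decimal units):

```python
PB = 10**15                  # petabyte
EB = 10**18                  # exabyte = 1,000 petabytes

archive_now = 300 * PB       # today's holdings
room_today = 1.5 * EB        # current room, current tape technology
room_upgraded = 3 * EB       # same room, future tape generations

print(room_today / archive_now)     # 5.0  -> ~5x headroom already
print(room_upgraded / archive_now)  # 10.0 -> ~10x with upgrades
```

So even before any facility expansion, the existing tape room could absorb roughly a decade of the projected order-of-magnitude data growth.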

