
Data De-Duplication FAQ: Top 10 Questions


1. What does the term "data de-duplication" really mean?
There's really no industry-standard definition yet, but we're getting close. Everybody agrees that it's a system for eliminating the need to store redundant data, and most people limit it to systems that look for duplicate data at the block level rather than the file level. That's an important distinction. Imagine 20 copies of a presentation that differ only in their title pages: to a file-level data reduction system, they look like 20 completely different files. A block-level approach would see the commonality between them and use much less storage.

The most powerful data de-duplication uses a variable-length block approach. Products using this approach look at a sequence of data, segment it into variable-length blocks, and when they see a repeated block, they store a pointer to the original instead of storing the block again. Since the pointer takes up less space than the block, you save space. In backup, where the same blocks show up over and over, users can typically store 10 to 50 times more data than on conventional disk.
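
As a rough illustration of the pointer idea, here is a minimal Python sketch. It assumes the data has already been segmented into blocks and uses a SHA-256 digest as the "pointer"; the DedupStore class and its method names are made up for this example and do not describe any particular product.

```python
import hashlib


class DedupStore:
    """Toy block store: every unique block is kept once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def write(self, blocks):
        """Store a sequence of blocks and return one pointer (digest) per block."""
        pointers = []
        for block in blocks:
            digest = hashlib.sha256(block).digest()
            self.blocks.setdefault(digest, block)  # a repeated block is not stored again
            pointers.append(digest)
        return pointers

    def read(self, pointers):
        """Rebuild the original data by following the pointers in order."""
        return b"".join(self.blocks[p] for p in pointers)

    def stored_bytes(self):
        """Bytes held in unique blocks (pointer overhead is ignored in this sketch)."""
        return sum(len(b) for b in self.blocks.values())


# 20 copies of a "presentation": a unique title page plus 10 shared body blocks.
body = [bytes([i]) * 4096 for i in range(10)]
store = DedupStore()
recipes = [store.write([f"Title page {n}".encode()] + body) for n in range(20)]

logical_bytes = sum(len(store.read(r)) for r in recipes)
print(logical_bytes, store.stored_bytes())
```

The 20 files share their body blocks, so the store holds 10 body blocks plus 20 small title pages rather than 20 full copies.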

2. How can data de-duplication be applied to replication?
Replication is the process of sending duplicate data from a source to a target. If you replicate all the backup data, then you need a relatively high-performance network to get the job done. But with de-duplication, the source system (the one sending data) looks for duplicate blocks in the replication stream. If it has already transmitted a block to the target system, it doesn't have to transmit it again; it simply sends a pointer. Since the pointer is much smaller than the block, replication needs much less network bandwidth.
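
The exchange described above can be sketched in a few lines of Python. This is an illustrative model only: the message format ("block" or "ref" tuples), the function names, and the use of SHA-256 digests as pointers are assumptions made for the example, not a real replication protocol.

```python
import hashlib


def replicate(blocks, already_sent):
    """Source side: send a full block the first time, a pointer every time after."""
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest in already_sent:
            yield ("ref", digest, None)      # target already has this block
        else:
            already_sent.add(digest)
            yield ("block", digest, block)   # first sighting: ship the data


def receive(messages, target_store):
    """Target side: keep new blocks, resolve pointers, return the rebuilt stream."""
    parts = []
    for kind, digest, payload in messages:
        if kind == "block":
            target_store[digest] = payload
        parts.append(target_store[digest])
    return b"".join(parts)


sent, target = set(), {}
night1 = [bytes([i]) * 1024 for i in range(8)]
night2 = night1[:7] + [b"\xff" * 1024]       # one block changed since night 1

msgs1 = list(replicate(night1, sent))
msgs2 = list(replicate(night2, sent))
restored1 = receive(msgs1, target)
restored2 = receive(msgs2, target)

full_blocks_night2 = sum(1 for kind, _, _ in msgs2 if kind == "block")
print(full_blocks_night2)
```

On the second night only the changed block crosses the network in full; the other seven travel as 32-byte digests.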

3. What applications does data de-duplication work with? Are there any that it doesn't work with?
When it's being used for backup, it supports all applications (email, databases, print and file applications, etc.) and all qualified backup packages. Variable-block-length de-duplication can find redundant blocks in the backup stream for all of them. Certain file types (some rich media files, for example) don't see much advantage the first time they are sent through de-duplication because the applications that write the files already eliminate redundancy. But if those files are backed up multiple times, or backed up after small changes are made, de-duplication can deliver very large capacity savings.

4. Is there any way to tell how much de-duplication advantage I will get with my data?
There are really four primary variables: how much the data changes (that is, how many new blocks get introduced), how well it compresses, what your backup methodology is (full vs. incremental, for example), and how long you plan to retain the data. Some vendors offer sizing calculators to estimate the effects.

5. What is the real benefit of using data de-duplication?
There are really two. 1) Data de-duplication technology lets you keep more backup data on disk than with any conventional disk backup system, which means you can restore more data faster. 2) It makes it practical to use standard WANs and replication for DR protection, which means users can reduce their tape handling.

6. What is variable-block length data de-duplication? How do you get variable-length blocks and why would I want them?
It's easiest to think of the alternative. If you divided a stream of data into fixed-length segments, then every time data was inserted or removed at one point, all the blocks downstream would shift and so would also change. The system of variable-length blocks allows some of the segments to stretch or shrink while leaving downstream blocks unchanged. This increases the ability of the system to find duplicate data segments, so it saves significantly more space.
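
The difference can be seen in a small Python experiment. This is a simplified sketch, not any vendor's algorithm: it places a block boundary wherever a hash of the last 16 bytes meets a condition, so boundaries follow the content rather than fixed offsets. Real systems generally use a true rolling hash (Rabin fingerprinting is one known technique) instead of rehashing each window; the WINDOW and DIVISOR values and the function names are arbitrary choices for the example.

```python
import hashlib
import random

WINDOW = 16    # bytes of context that decide whether a boundary falls here
DIVISOR = 64   # a boundary roughly every 64 bytes on average


def fixed_chunks(data, size=64):
    """Cut the stream at fixed offsets."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data):
    """Cut the stream wherever the hash of the trailing window is divisible by DIVISOR."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        window = data[end - WINDOW:end]
        h = int.from_bytes(hashlib.md5(window).digest()[:4], "big")
        if h % DIVISOR == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(20000))
edited = data[:100] + b"X" + data[100:]   # insert a single byte near the start

shared_fixed = set(fixed_chunks(data)) & set(fixed_chunks(edited))
shared_variable = set(variable_chunks(data)) & set(variable_chunks(edited))
print(len(shared_fixed), len(shared_variable))
```

With fixed-length blocks, the one-byte insertion shifts every later block, so almost nothing matches. With content-defined boundaries, the blocks realign shortly after the edit and most of them are found again as duplicates.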

7. If the data is divided into blocks, is it safe? How can it be restored?
The technology for using pointers to reference a sequence of data segments has been standard in the industry for decades. You use it every day, and it is safe. Whenever you write a large file to disk, it is stored in blocks on different disk sectors in an order determined by space availability. When you "read" a file, you are really reading pointers in the file's metadata, which point to the various sectors in the right order. Block-based data de-duplication applies a similar kind of technology. In addition, de-duplication vendors typically build in a variety of data integrity checks to verify that the system is sound and the data remains available.
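
One way such an integrity check could work is sketched below in Python. It is an assumption-laden illustration, not a description of any vendor's implementation: each pointer is the SHA-256 digest of its block, a digest of the whole file is recorded alongside the pointers, and both are re-verified on restore. The function names and the metadata layout are hypothetical.

```python
import hashlib


def sha(data):
    return hashlib.sha256(data).hexdigest()


def store_file(blocks, block_store):
    """Save blocks under their digests; return metadata holding the ordered pointers."""
    pointers = []
    for block in blocks:
        key = sha(block)
        block_store[key] = block
        pointers.append(key)
    return {"pointers": pointers, "file_digest": sha(b"".join(blocks))}


def restore_file(meta, block_store):
    """Follow the pointers in order, verifying every block and the final result."""
    parts = []
    for key in meta["pointers"]:
        block = block_store[key]
        if sha(block) != key:
            raise ValueError(f"block {key[:8]} failed its integrity check")
        parts.append(block)
    data = b"".join(parts)
    if sha(data) != meta["file_digest"]:
        raise ValueError("restored file does not match its recorded digest")
    return data


block_store = {}
original = [b"header", b"payload" * 100, b"trailer"]
meta = store_file(original, block_store)
restored = restore_file(meta, block_store)
print(restored == b"".join(original))
```

If a stored block were silently corrupted, the per-block check would raise an error at restore time instead of returning bad data.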

8. Where does data de-duplication take place during the backup process?
There are really two choices. You can send all your backup data to a backup target and perform de-duplication there, or you can perform the de-duplication on the host during backup. Both systems are available and both have advantages. If you de-duplicate on the host during backup, you send less data over your backup connection, but you have to manage software on all the protected hosts, backup slows down because de-duplication adds overhead, and it can slow down other applications running on the host server. If you de-duplicate at the backup target, you send more data over the connection, but you can use any backup software, you only have to manage a single target, and the performance is normally much higher because the hardware system is purpose-built for de-duplication.

9. Can de-duplication technology be used with tape?
No and yes. Data de-duplication needs random access to data blocks for both writing and reading, so it needs to be implemented in a disk-based system. But tape can easily be written from a de-duplication data store, and in fact that is the norm. Most de-duplication customers plan on keeping a few weeks or months of backup data on disk, and then use tape for longer-term storage. When you create a tape from de-duplicated data, the data is re-expanded so that it can be read directly in a tape drive and will not have to be written back to a disk system first.

10. What do data de-duplication solutions really cost?
There's a lot of variability, but there is a pretty good rule-of-thumb starting point. Assuming an average de-duplication advantage of 20:1 (a number widely used in the industry), we have seen list prices in the range of $1/GB. So a system that could retain 20TB of backup data would have a list price of around $20,000, which is much lower than if you protected the same data using conventional disk.
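
The rule of thumb works out as follows. This small Python sketch assumes decimal units (1TB = 1,000GB) and reads the $1/GB figure as a price per GB of retained backup data, which is what the $20,000 example implies; the function name and parameters are made up for illustration.

```python
def dedup_cost_estimate(retained_tb, ratio=20, price_per_retained_gb=1.0):
    """Rule-of-thumb estimate: physical disk needed and approximate list price."""
    retained_gb = retained_tb * 1000              # assumes 1 TB = 1,000 GB
    return {
        "physical_tb": retained_tb / ratio,       # disk actually consumed at ratio:1
        "list_price": retained_gb * price_per_retained_gb,
    }


estimate = dedup_cost_estimate(20)
print(estimate)
```

For 20TB of retained backup data at 20:1, this gives about 1TB of physical disk and a list price of about $20,000.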
