AI, Tape’s Future & The Next Memory Revolution Podcast by Symply with Mark Broadbent from XenData
Transcript
My guest today is Mark Broadbent, CTO of XenData, a
company that's been at the forefront of archiving for a number of decades. Mark,
it's a pleasure to have you in our studio. It's a great pleasure to be here with
you, Nick. Great. Well, this is a really interesting topic. LTO and
archiving is something that's, you know, important to me and I'm sure important
to you. But before we get going, I'm going to hit you with a few quick-fire
questions just to help set the scene for our viewers. So if you can keep
your answers to one word, let's see how we go. Okay. Is 3-2-1 data
protection important for archives? Yes. Do you think tape is dead? No. What will
be the go-to medium for archiving in, say, 10 years? I think it depends. I can't
give you a one-word answer to that question. We'll come back. How about
another one? In a hundred years, will future generations be able to read data
that I write today? Not if you don't rewrite it in some way. And thinking
about AI, do you think AI is just going to be managing all our archive
requirements in the near future? Not on its own. Brilliant. So, we're going
to come back to these questions during our conversation. But
before we kick off, Mark, could you just tell us a
little bit about yourself and your journey at XenData, just to give our listeners
an overview? Well, I can give you kind of my whole journey if you like. No,
absolutely. Professionally at least. So I've been working in the data
storage industry since the late 1990s. I met Phil Storey, who's
my co-founder of XenData, in the late 90s at a company called Plasmon, which no
longer exists. So Phil and I worked together in that company and then in
a German company, and then we decided we would start XenData in 2001. It was
actually the 11th of September 2001 that we decided we were going to
do the business. And so we were sat in my kitchen at home talking about
business plans and so on as the world mayhem was going on elsewhere. And it
was actually quite a good time to start a business because we had nothing to
sell and nobody was buying anything. So it was a perfect match. So we developed
the first version of the product over about a year, started showing it at
international trade shows, started making sales almost immediately, and the
company has kind of grown steadily ever since. Brilliant. And in terms of the
technologies that you use to archive or allow your customers to use to archive,
what are they? We started with tape, and specifically with WORM tape made by Sony
in those days. Little AIT tapes which were 50 gigabytes. Pretty slow, you
know, a bit wee by today's standards. But we started with that technology
because we'd been selling WORM technology with Plasmon, and there are legislative
requirements in some cases where you have to use a write-once technology, and
we thought we could supplant optical discs with tape using the Sony WORM tape.
And we were used to selling into, like, banking applications and email
archiving applications and things like that but then we discovered media and
entertainment. We went to the IBC show in Amsterdam and just went round and
talked to people and thought here is our niche, here is our opportunity. And so
really since, I don't remember what year that was, but 2002, 2003,
something like that, we've been focusing on media and entertainment and
related things. So we have quite a lot of installations in houses of parliament
in various countries, for example, or in big corporate video production
facilities. You know, if you were doing this on a grand scale
we would be archiving this data. Well, you never know. Small, yeah, acorns
and oak trees and all those kind of things. So I think where I'd
like to start talking to you now is about how to
build an archive the kind of strategies behind archiving and maybe first we
start with you know what does digital archiving mean in the world today and you
know, is it the same as it was 10 years ago or five years ago? I think we
have our own definition of what archiving means. We tend to talk about active
archives. So if you think about the long-term archive, where you
just want to keep a copy of the data because it's your intellectual
property and, you know, it's the value of the business and you need to keep
it. But we're kind of more focused on active archives, which is where you're
reusing the content, and so you're accessing the data, and that's kind of
where we focus. Yeah. And I guess that's the differentiation between kind of
pre-digital or older digital archives, that one-copy compliance for legal
or medical or things like that, versus media and entertainment, where really the
media is always kind of churning in that respect. Yes. With medical
records there are different environments, actually. In some cases you have to
keep data for the lifetime of the patient, but actually in many cases
it's not relevant for the lifetime of the patient, because something like an
echocardiogram, for example, changes over time. So there's no point in looking at a
15-year-old one. You know, me 15 years ago, my heart is in a very different state now,
probably. Yeah. With media and entertainment, you know, an organization
has a history that goes back to the foundation of that organization and a lot of
it, if it's an older organization, will have been shot on,
you know, maybe even acetate film. Yeah. Most people now have transcribed
that into some form of digital storage, you know, digital negative kind of
thing, essentially. Yes. And sometimes, you know, you
only have one go at that. So today we're talking, I think, almost entirely
about digital archiving of data files. Okay. And when customers come to you and
they're considering building out these active archives, primarily for media
or other markets, obviously, you know, as we've talked about, with XenData you support
disk, you support tape and you support cloud. And optical? Oh, and
optical. Sorry. And I am going to come back to that in a minute. How
does a customer navigate the treacherous waters, if you will, of what
technology to choose? A tape vendor will say you have to use
tape, the cloud vendor will say you have to use cloud, and never the twain shall
meet. But obviously really there's a compromise and there's a right tool for
different parts of the job. Yeah. And so if you've got an archive of if I've got
an archive of my diving videos, for example, I've got a few terabytes of data. I
wouldn't personally use tape storage. I do because I can because it doesn't
really cost me anything. But if I was paying for it, I wouldn't use
tape storage for archiving a few terabytes of data, because there are far more
effective solutions. If you go to the other end of the scale, if you look at
somebody who's got multi-petabytes, and there are plenty of organizations that
have tens or hundreds of petabytes or more, you're probably not going to use
standard tier S3, for example, because it would just be extraordinarily
expensive. And so the volume of data that you've got plays a very large part in
it. Yeah. You talked about 3-2-1 earlier. And actually, having a copy
on-prem on tape and another copy in an offline storage tier in the cloud
actually meets that requirement of, well, two copies, but in two different places
and on two different storage media. And that actually is really
interesting, and is where I wanted to go. Obviously five or six years ago
everything was about 3-2-1. Everything is now about 3-2-1-1-0. Not quite as catchy,
but as you said, it's three copies of your data, two different mediums, one in
a different location. The extra parts add the fact that you've got to
have a verified copy (it's pointless writing all this data if you're
not verifying it or testing that you can retrieve it) and, of course, one copy
completely offline or air-gapped. So do you see, you know, I see that a lot
of customers are you know specifically when you come to ransomware and we're
going to talk about that a little bit later but they've got a focus on
protecting maybe their workstations and their maybe primary storage, but I'm not
really sure the message gets down to the archive level or the longer term
storage level. I don't know whether you've got any thoughts on that. I'm not
quite sure what you're getting at. So, I guess it's kind of two parts. One
you've sort of already covered, which is that by putting data on-prem and
in the cloud or somewhere else, or if you are on-prem, you're making two
copies and putting tape sets somewhere else. But, you know,
it's really a question of, I speak to some customers and they say, I've archived the
data, and I say, oh great, well, where is it? And they go, I've got a copy on tape and
it's in a box somewhere. And it's like, well, yes, it might be
archived, but it's not protected in any way, because your only copy is on a
tape. So do you see customers that you deal with falling into that trap?
There are all sorts of nuances in there, actually. So absolutely
knowing where your data is. So knowing which tape has this particular file yeah
is or or which cloud storage location has this particular file is paramount
because if anything happens then you need to know exactly what has happened.
Yeah. When we keep multiple copies, our tape copies are exact replicas.
And I think that's important as well. So
if you're contrasting, for example, backup with archive, backup typically
won't do that. Yes. Even if you make two backups, you end up with different data
on different tapes and it then becomes administratively really difficult if you
have to figure out where that file is because you want to retrieve it for some
reason. So I think a crucial part of archive is knowing where the data is. Yeah,
absolutely. And you as you raise a valid point, there's a lot of general
misconception in the market or misunderstanding in the market between backup and
archive. They're often referred to as different sides of the same coin,
but, you know, they are fundamentally different. Could you give our
listeners or viewers just a quick overview of how they differ? Sure. I
mean, I think we've covered a fair amount of that, but backup really is about
keeping copies of your working data. So, you back up your laptop, you back up
your servers, primary storage, etc. Yes. And in the
media and entertainment industry, maybe you back up your edit storage, for
example. But that's for recovery in case of a
failure of some kind. I mean, it might be recovery from ransomware, or it
might be recovery from a hardware failure, but it's a short-term strategy. And
backup you will typically if you're backing up to tape for example you will
typically reuse those tapes. Um archive is about a long-term strategy. It's
about keeping your data forever and knowing where it is. So I guess offering
advice to someone that's looking to build an archive. I guess what are the
common pitfalls? As I said, one of my kind of common pitfalls is people, you
know, making one copy, think it's an archive, think it's data protected. But
what are the kind of other, you know, pitfalls or blind alleys that
customers might go down that, you know, might cause them problems later
down the line? I think you have to look at archive as being a part of your
whole overall workflow. So particularly in the case of something like, as we say,
active archives, you know, if you are going to be reusing your content, how are
you going to reuse it? How are you going to get it from the archive to the edit
storage that you're using, or to a playout server or
whatever it is. And so typically an archive is part of a bigger infrastructure.
Um, and so there'll be a a media asset manager of some kind at the heart of it.
And I think picking vendors that can work together and picking a system
integrator that can put a whole system together for you uh is probably very
important. I don't know whether you see this as well, but I kind of increasingly see
that customers are treating their active archives as more nearline storage.
You know, in the case of XenData, your system,
without going into too much detail, can hold as much data as
you want on a disk cache in front of a tape library or in front of a cloud
service. But in my experience customers that are deploying these kind of
LTO archive solutions are sometimes really using them as nearline. Is that kind of, you know,
an experience that you find yeah they do um we had an instance um quite a few
years ago actually where uh it was a Sony AIT tape and it said it had reached
its design lifetime in a tape alert message, and that was 2,000 cartridge
loads, and we said, why has that happened? And it turned out that they were playing
out adverts directly from tape. So, you know, thinking about how you
want to use the system, the overall design of the system, is always good.
Yeah, we've talked a lot about tape and, as I said at the start, I'm a
real big advocate for tape, even though at the company I work for, Symply, we
do cloud, we do disk and things. But, you know, I spoke to
one guy at IBM and he said it's a bit of a secret society. Everybody kind of
uses it really, you know, whether they access cloud storage or, you know, the
financial records, everything ultimately ends up on an LTO tape somewhere. Even
if you don't know you're using it, you kind of are. And, you know,
LTO is really the cornerstone of a lot of archives. What do you think
keeps that relevant in 2025? You know, what is it about tape?
So the really big thing about tape is that it's inexpensive. Yeah. So as
data volumes increase, and, you know, as camera resolutions increase, as
broadcast standards increase in resolution, we're just in a data explosion,
and the amount of data is exponentially increasing, and because tape is the
lowest-cost method of storing data, that's what keeps it relevant. The
consortium's just announced LTO-10, so we're now up to 30 terabytes native on a
cartridge. So I guess if customers are thinking about it, you can add in
extra capacity, and I know certainly in the case of XenData
you're not a capacity-tax license model, are you? No, our
license models um are very straightforward actually. We fundamentally um we
charge by the amount of hardware that we're controlling. So it would be you know
going from a single tape drive or a single optical drive or whatever up to a
10,000-slot tape library, there's a kind of increase in the licensing costs
across that. And the other thing that I think LTO does well, and it's
I guess another common misconception, people think it's slow, but
obviously if you've got your library on premise then you've actually got quite a
lot of bandwidth, you know, back to your MAM or your applications, and probably
much faster than a connection to the cloud. Yes, it depends on what you're doing
of course, but again, if you look at media and entertainment type archives, big
files, we've been testing an LTO-10 drive, as I know you're very
well aware, and that can transfer 400 megabytes a second to or from the tape.
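
As a rough illustration of what those figures imply, here is a back-of-envelope sketch. The 400 MB/s transfer rate and the 30 TB native cartridge capacity are the numbers quoted in the conversation; the decimal units and the assumption that the drive streams continuously with no load, seek or idle time are simplifications added here, and the function names are hypothetical.

```python
# Back-of-envelope tape throughput, assuming a drive that streams non-stop.
# 400 MB/s and 30 TB come from the conversation; decimal units (1 TB = 1,000,000 MB)
# and zero load/seek time are simplifying assumptions.
DRIVE_MB_PER_S = 400
CARTRIDGE_TB = 30
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000


def tb_per_day(mb_per_s: float) -> float:
    """Terabytes moved in 24 hours at a sustained rate."""
    return mb_per_s * SECONDS_PER_DAY / MB_PER_TB


def hours_to_fill(cartridge_tb: float, mb_per_s: float) -> float:
    """Hours of continuous writing needed to fill one cartridge."""
    return cartridge_tb * MB_PER_TB / mb_per_s / 3600


print(f"{tb_per_day(DRIVE_MB_PER_S):.2f} TB per day")
print(f"{hours_to_fill(CARTRIDGE_TB, DRIVE_MB_PER_S):.1f} hours per cartridge")
```

Under these assumptions a single drive moves roughly 35 TB in a day and fills one cartridge in a little under 21 hours. Real figures will be lower once cartridge loads and positioning are included.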
Yeah. So the the slowness comes from the fact that you if you're in a robotic
library, you have to move the tape from a slot to a drive and then you have to
wind it to the right point on the tape before you start transferring data. Now,
in any kind of active archive, more data is archived than
restored. And when you're archiving you always write it to the end of the tape.
So archiving if you can get the data coming into the system at 400 megabytes a
second or more then you can keep the tape drive fully occupied and you will get
through a phenomenal amount of data in a day. You know, for most
organizations, for archiving, a single tape drive is enough. It's exceptional
to need more than that. So the amount of hardware that you
need really is dominated by the number of restores that you're doing, and that's
one of the things where you want to get the scaling right. Yeah. So you
need to know how you're going to use the system. That goes back to the kind of
planning of the implementation: the balance of data written out against what you
expect to read. Yeah. The other big advantage of LTO, you know, is
not only maybe the cost of the drive and the media per terabyte, it's, you know,
the reduction in the cost of energy. Obviously, if you've got this big
library, you know, or even a small library, you can have, you know, and I guess
here's a question. Do you find, I guess, it's mainly broadcasters that have the
larger libraries and keep the content in the library, whereas maybe more
post-production, you know, they might have a smaller library and
manage more cartridges out of it? Yeah. So we are just about to launch a
product range which is based on an eight-slot medium changer loader. We
prefer that to a standalone tape drive because it can automatically manage
replication. So, you know, you need to have two tapes available to do
replication. But you can use that, and we have customers that have very
small libraries and many petabytes of data stored on a shelf, and that
again is about knowing where the where the data is where the file is. So you
want to retrieve a particular file. Well, the tape will have a barcode label on
it which is machine readable but also human readable. And so you know you want
to retrieve a particular file. Mhm. You will get a message through our event
notification system that says put this tape with this barcode label back into
the library or back into a standalone drive and then you can retrieve the data
and you end up with a very low-cost archive. And I guess that's again that's
where, you know, a larger on-prem archive, if you're looking at maybe like a disk
on-premise object store versus a tape library, I guess, you know, if you needed that
instant access to the data, maybe you only need a couple of hundred terabytes. Maybe if your
archive runs into petabytes you could have this mix of kind of an on-prem
object storage system that connects into your archive, you know. Yeah. Or you
can make the archive into the on-prem object storage system. You can put
an S3 interface onto the front of one of our archives and then it effectively
becomes an object store. Just add more disk. So, you know, if you
take maybe sports teams, to keep to a simplistic explanation,
You keep a season on you know last season stays on disk but it's also on tape
but provides that instant access. Yes. If customers need it. Yeah. And in terms
of the technology of writing to LTO, obviously with LTO-5, the Linear Tape
File System, LTFS, was born out of the consortium, and that
had the ability to make tapes interchangeable. But before that, it's probably
incorrect to say they were proprietary, because they were written in, you know,
tar and pax formats and such, which were kind of open formats. But I guess the
difference is they're not self-describing in the same way as LTFS. Well,
they are self-describing. pax or tar is self-describing, but you have to read
the whole tape to see what's on the tape. The real benefit that LTFS brings is
that there is an index at the front of the tape which can be which is updated
every time you add a file. So I don't know how many technical details you want
me to go into, but they use something called partitioning, which means
that you can divide the tape into essentially multiple sub-tapes, if you like.
With LTFS you have two partitions. One of them is used to store the index and
the other one is used to store the data. And so when you put an
LTFS tape into a drive, you go to the index partition, you read the index, and
you've got the contents of the tape just kind of immediately available. So
looking at your customers at the moment, new customers, are they tending
to go with LTFS archives predominantly? Almost everyone goes with LTFS. There
are edge cases where you might not want to. If you've got a very large number
of small files, the LTFS index overhead becomes a problem. But that's
probably the only really significant case where LTFS wouldn't be appropriate.
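
The difference described here, between walking a tar or pax tape end to end and reading a separate index partition, can be sketched with a toy model. This is not the real LTFS on-tape format (the actual index is an XML document defined by the LTFS specification); the class names, method names and file names below are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class SequentialArchiveTape:
    """Toy tar/pax-style tape: each file's header sits next to its data."""
    records: list = field(default_factory=list)  # (name, payload) in write order

    def add(self, name: str, payload: bytes) -> None:
        self.records.append((name, payload))

    def list_files(self):
        """Return (names, payload_bytes_passed); listing walks the whole tape."""
        names, passed = [], 0
        for name, payload in self.records:
            names.append(name)
            passed += len(payload)  # the data has to pass the head to reach the next header
        return names, passed


@dataclass
class IndexedTape:
    """Toy LTFS-style tape: one partition holds the index, the other the data."""
    index_partition: dict = field(default_factory=dict)  # name -> (offset, length)
    data_partition: bytearray = field(default_factory=bytearray)

    def add(self, name: str, payload: bytes) -> None:
        offset = len(self.data_partition)
        self.data_partition.extend(payload)                  # data is appended at the end
        self.index_partition[name] = (offset, len(payload))  # index updated on every add

    def list_files(self):
        """Return (names, payload_bytes_passed); only the index is consulted."""
        return list(self.index_partition), 0

    def read(self, name: str) -> bytes:
        offset, length = self.index_partition[name]
        return bytes(self.data_partition[offset:offset + length])


seq, idx = SequentialArchiveTape(), IndexedTape()
for name, payload in [("clip_a.mxf", b"x" * 1000), ("clip_b.mxf", b"y" * 2000)]:
    seq.add(name, payload)
    idx.add(name, payload)

print(seq.list_files())
print(idx.list_files())
```

The toy model also hints at the small-file edge case: the index gains an entry for every file, so a tape holding a very large number of tiny files ends up with an index that is large relative to the data it describes.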
Yeah. And just wrapping up this sort of thing on LTO and how it's implemented.
I guess, you know, certain cloud providers that provide hot cloud storage,
and there are a number of them, including Symply, tend to make a big fuss about
migration, the fact you've got to migrate. So I think a lot of that is fear,
uncertainty and doubt from a sales and marketing point of view, but, you know, it
would be just interesting to get your thoughts on migration. You know, that could
be a whole topic in itself in a podcast, but just as a general
overview, you know, the XenData system has strategies to
help with it. So is it all the pain and anxiety that, you know, people make it
out to be? I'm going to answer that in two different ways. When we
were working for Plasmon, we were selling optical discs and we were saying these
have a lifetime of 50 years, and we could kind of prove that they had a lifetime
of 50 years, because you do accelerated aging and you do something
called Arrhenius testing and so on, and you say yes, the data will be there 50 years
later. Well, the data is still there right now. Yeah. But what are you going to
read it with? Yeah. So, and I think that is the problem with every type of
digital storage. You know, there are possibly some exceptions, but how many
digital products can you buy today that are still at version one when they've
been for sale for 5 years, let alone 50? Yeah. So, I think migration is a
necessary evil. It also brings benefits, because if you migrate from LTO-5 to
LTO-10, you end up with a much smaller physical footprint.
You're probably going to have to refresh the hardware anyway, because again,
there are tape drives working that are 50 years old, but it's quite expensive to
keep them working. So the key is to make it easy to do. So you build tools
in. Yeah, our product has tools built in that just make a migration easy. Yeah,
we've stolen a lot of my thunder from sort of questions later on, but we will
come back to migration as a a topic kind of towards the end or really the
relevance of if I've written data now, can I read it in a 100 years time, but
that's really interesting. We're going to pick that up. Security and compliance,
I feel, is a is something that is really important to data protection in
general. Again, I think traditionally in the media environment, maybe pre the
Sony hack, obviously that was a long time ago, but that was a real wakeup call
in the industry that everyone uses password colon one or left workstations
unlocked and there's really nothing, you know, nothing to stop anybody getting
access. Obviously times have changed and that's that's sort of moved on but you
know do you feel that media and entertainment companies you know small to large
are really taking that kind of ransomware threat seriously or do they think well
it's not going to be me I know XenData published an article we were chatting
about this a few weeks ago, on LinkedIn, that you guys have put out an article,
you know, about ransomware. You know, how was that received? So Phil
published a blog about ransomware and about our approach to ransomware.
I don't know how many LinkedIn followers he's got, but it's in the many
hundreds, and half of them read it. Yeah. Half of his entire population on LinkedIn
read that blog. Yeah. We've never seen that before. We have probably at least
three times a month and sometimes more than that, one of our customers will
experience a ransomware attack. Um typically with tape it has all the advantages
of air gap and so on. Especially if you have an offsite copy, which is, you
know, remotely completely inaccessible, you've got the data. Mhm.
Typically what happens in a ransomware attack, and as I said, we've seen a lot,
is that the file is copied and encrypted. Mhm. And then the original file is
deleted. That's what ransomware does. Mhm. With a tape, you can't delete a
file. You can delete the whole tape. You can't delete an individual file on a tape.
So what happens with a tape is that a new version of the file gets written to
the end of the tape or to a new tape. Yeah. Our file system has always had
version control built into it. So you in file explorer on Windows, you right
click on the file and you can get the complete version history of the file. And
so what we do when our customers suffer a ransomware attack is we just roll back
time. Yeah. and our support people have developed strategies for doing that that
are very effective. Yeah. So normally, and as I said, especially if there's an
offline copy, we can recover the archive very quickly. That doesn't
mean the customer recovers quickly, because their whole infrastructure has been
compromised. Yeah. Typically, it's throwing out servers,
or at least throwing out disks and SSDs, and a complete rebuild for them to
be up and running. But interesting to know that, you know, at least from the,
you know, maybe the, as we talked about at the at the beginning, the business
critical information, their lifeblood, if it's media company, the content they
own is ultimately secure on tape. Obviously we've been talking about the you
know the natural resilience of tape to ransomware and malware obviously that
offline copy but do you think in this kind of more modern era again maybe going
back to the, you know, cloud naysayers about tape, you hear this, you know,
"LTO's where media goes to die" and these kind of things that
you see on LinkedIn do you think sometimes that kind of air gapping you know
these companies promise immutable storage which is effectively worm in the cloud
and things do you think you know tape is sometimes being overlooked for these
kind of always on archives and do people fully understand that you know they can
still be locked out of their cloud account you know if you're hit by a
ransomware attack everything's up for grabs really. Yeah. Um well, you know, if
you're storing data in the cloud, ultimately you are storing it on tape. Yeah.
As well, because there is a copy on tape, and if you look at
things like the Glacier type or the Microsoft archival tier, they have
characteristics that look very like tape. I'm not saying I know how they do it,
but, yeah, ultimately that dollar a terabyte is only achievable on tape.
Yes. And maybe this isn't so much for you know we've been largely talking about
media entertainment but uh compliance and obviously we're talking about medical
uh records and you know maybe financial data these kind of things. So in
archiving I guess you have to consider in in these sort of situations who has
access to the data you know like with any primary storage equally important for
your archival storage. And I guess secondly is you know if you're looking at
medical data, maybe your echocardiogram, you know, it's not relevant anymore, how
do you ensure that if you're going to delete it that it's safely deleted,
you know, audit processes and things like that? A tape
is a timeline. A file gets written, a file gets written, a file
gets written, and so all the data on that tape, depending on
how much data you've got, is from about the same time.
Yeah. So when you get to your seven-year deadline or whatever it is, destroy the
tape and you destroy the data. Yeah. And that's pretty simple to do. Yeah.
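
The "tape is a timeline" idea lends itself to a simple retention check. The sketch below is only an illustration of the reasoning, not a description of any XenData feature: the catalog structure, the barcodes and the function names are all hypothetical, and the seven-year period is just the example figure mentioned above. A tape is treated as eligible for destruction only once its newest file is older than the retention period.

```python
from datetime import date


def years_ago(d: date, years: int) -> date:
    """Same calendar day `years` earlier (29 February falls back to the 28th)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def tapes_due_for_destruction(catalog: dict, today: date, retention_years: int = 7) -> list:
    """Barcodes of tapes whose newest file is at or past the retention deadline.

    `catalog` maps a tape barcode to the write dates of the files on that tape.
    """
    cutoff = years_ago(today, retention_years)
    return sorted(
        barcode
        for barcode, write_dates in catalog.items()
        if write_dates and max(write_dates) <= cutoff
    )


# Hypothetical catalog: the barcodes and dates are made up for illustration.
catalog = {
    "A00001L5": [date(2015, 1, 10), date(2015, 3, 2)],
    "A00002L5": [date(2015, 3, 3), date(2021, 6, 1)],
}
print(tapes_due_for_destruction(catalog, today=date(2025, 1, 1)))
```

Because files are written in time order, most tapes look like the first entry, with every file from roughly the same period. A tape that spans a long period, like the second, has to wait until its newest file ages out.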
Yeah. Okay. I know there are companies out there that do it. You can either
sort of shred it or hit it with a big magnetic charge and it kind of disperses
the servo tracks. EMP. Yeah. EMP. So yeah, I know they do it for hard drives and
such. So yeah. Well, I think we're doing really well. Covered a lot of topics,
but one thing that I've started to see more in the news, you know, storage news, at the
moment is what's next for storage technologies, what's next specifically for
archiving. you know, if we've talked about all the data in the cloud, whether
it's probably on Facebook or Google, wherever it sits, AWS, they're putting
they're securing the data on some sort of cold storage tier. And that cold
storage tier is, as you would know, is just ballooning, you know, and I guess
even generative AI is, you know, our friend in this case, you know, even
churning out even more Yeah. even more, you know, media to be archived. You
mentioned obviously you worked at Plasmon, which were, you know, leading
the optical charge back in the late 90s and into the 2000s. Obviously
they're no longer around. Sony had an optical archive system which I know you
said you guys support customers using. You know, what do you think, you know, of
that Blu-ray, for want of a better word, that Blu-ray type disc? Do you think
that's kind of dead now? You know, I don't see
any kind of commercial? Haven't seen really anything. There's different glass
technologies and we'll go on to talk about that, but that kind of compact disc
type. So, the DVD type storage, if you want to call it, that would be a great
type storage. Yeah. It's individual discs, however they're packaged. You
have multiple discs in one cartridge. But the fact that it's a
cartridge-based system means that it has, you know, the
disadvantage of tape in terms of it's not instantly accessible. Mhm. It
struggles to be as quick as tape or to be as inexpensive as tape. Yeah.
I don't really know fundamentally why, but that's just an observation. It could
never really hit the performance, never get to the kind of capacity, although
they did have the advantage of this, you know, 100 year, 50 year, you know, blah
blah blah. If you can find a drive to put it in. Indeed, if you can find a drive
to put it in. And that kind of brings us on to maybe some more challenges that,
you know, I'm going to talk about Cerabyte, I think is how it's called.
And that's a glass system with like a ceramic nano layer, and they use kind of,
yeah, electron beams or, you know, I don't understand the technology, but
they've got some big investment. Western Digital have come in,
investing in them. You know, obviously they've got money to spend on these
kind of projects. Uh and they're talking about a go-to commercial product in a
few years time um you know and being able to store data for thousands of years.
But it kind of strikes me for that, and Project Silica, which is kind of a
Microsoft one, I've heard it kind of dubbed the Superman crystal, kind of, you know,
crystal in a thing and read your data. Again, I think they use lasers as opposed to
you know ceramics and electron beams but you know broadly the same type of
concept of we can store your data for thousands of years sort of type product.
Yes. Um, so there there's a what you're talking about there is a range of
essentially optical products. So holographic based or something like that. Yeah.
I think I think you know talking about that where those guys are going is this
kind of write once, read in 10,000, 5,000 years, which obviously we're going to
come back and talk about how you read it. Yeah. But it kind of strikes me for
those guys um is how would they, you know, that all the automation has to be
built from scratch, I guess. And I guess they have to do that. You know, you're
going to have to have a, you know, a Project Silica library. Well, unless you make it
look like a tape. Yeah. Which is what exactly the next company I'm going to talk
about is a UK company called HoloMem. And that, as I understand it from their
website, is exactly what they've done is they're um they've got a polymer,
they're writing on a tape, and that tape is the same format as an LTO cartridge.
And I believe they're, you know, they're planning to build drives, which could
just be used to replace LTO drives in a in a library. Yes. So, and so hopefully
it'll support partitioning. Yeah. And hopefully it will you'll be able to write
to it with LTFS for example. I see no reason why not. And you know when that
product becomes commercially available we will support it if we can and I and I
expect to be able to. Yeah. Um so that's one way of doing it. But the broader
you know holographic storage story has been around for a long time. I um did some work
with a university group in Germany on holographic storage. They had a product
that could store a gigabyte in a cubic centimeter or something like that. And it
did work. Um and we were thinking about commercializing it and didn't in the end
because we felt that the capacity wasn't there really. Um so as
you said a lot of big companies are putting a lot of money into, I think,
fundamentally optical storage, and that's fine with me because
just being able to store however much data in however much volume does not give
you an archive. Yeah. Um so that's going to need something around it to control
maybe moving individual discs or whatever it is and to make it interface to the
rest of the system so that you know which of these silica discs has got your
file on it. All back to where's my data, where's my file, you know, whether it's
written on X, Y, or Z. So any and all of those technologies as far as I'm
concerned has the potential to be viable. Mhm. And has the potential to be used
in a in an archive and should fit in Yeah. fairly seamlessly into one of our
systems. But and the best strategy for them would be if you were giving them
advice would be to kind of emulate LTO because that's just and that's probably
also where Sony maybe went you maybe had trouble is that you know their
interface was different you know and had to be specifically coded for maybe more
difficult to adopt. Um or no comment I can't I did it. Yeah we did it. Mhm. Um
Sony used an ISO 13346 format which is a standard for optical discs. I sat
on the standardization committee for that format. So I was quite well positioned
to be able to support it. Um but I guess one thing you know and I
hate to keep going back to tape but as you probably guessed it's a pet passion
of mine. We can't forget tape is going to still be around with us. I mean I
think three or four years ago Fujifilm and IBM have already put half a petabyte
on a tape with strontium ferrite and you know that's the kind of road map for uh LTO
14 which is I guess late 2030s early 2040s. So it's not an inconsiderable amount
of tape density really. Yeah. And you know a few years ago I started to worry
that tape would lose its relevance because it would be easy to store all the
data in the world on on discs for example. Why did I worry? Yeah. Yes. It takes
clever technology. Obviously Seagate have got their HAMR, their heat-assisted
recording medium, out there but it's taken them some time to do it. The discs are
finally going to be hitting sort of the channel. I know they've been supplying them
to kind of um you know the hyperscale environments but they're finally going
to be coming through to you and me as it were. Yeah. And there's SMR. Yes. For
discs as well which helps but has its own challenges. It does indeed. So, but
and this is kind of you know again probably another whole podcast is you know
areal density on disc platters versus tapes and kind of it is a whole podcast
but the thing is that all of that research that goes into discs is going into
magnetic storage and what's tape magnetic storage so just briefly before we
leave this um this topic of future archiving of course one thing that I
did omit, but it is a bit way out there although it is being investigated, is you
know molecular storage DNA and I guess you know couple of grams of DNA can store
just truly immense amounts of data but you know I guess that's very specific my
cells has got it back well isn't it yeah so um but I talking actually again I
was talking to a guy at IBM and this is about you know can I read my data in in
in a 100 years and he gave an example of uh the state-of-the-art storage, digital
storage from IBM in 1928 was a punch card. Yes. It had 80 characters about what 80
bytes. I I programmed using them. Oh, you programmed using them. Okay. Um and um
he said that uh by 1959 1960 uh the US National Archive or section had it had um
around 20 million of those cards stored. And I I think they're still stored
today. And that's I think if my maths is right about 1.6 gig. So it's a warehouse,
think Indiana Jones, you know that kind of thing. And this comes back to the
relevance of, you know, can you even read a punch card? But even if you
could read the punch card, what's the point of reading and can you understand
what's written? Can you understand what's written on it and what's the point of
the data? And this comes back to, and as I said you'd sort of
stolen my sort of finish to this part of the section, which was migration is
really the key. It doesn't matter whether you're writing in the cloud or on
tape or optical operating systems change, interface changes, processes change.
Just think where technology has moved from 1928 to here to 2025. You know
migration is dare I say a necessary evil or whatever the word is, you have to
migrate to keep your data and make sure your data remains relevant. Yes. Yeah.
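As a quick check on the punch-card figure quoted above, here is a minimal sketch of the arithmetic. It assumes one byte per character (so 80 bytes per card) and a decimal gigabyte of 10^9 bytes; the variable names are illustrative only.

```python
# Rough capacity of the punch-card archive mentioned in the conversation.
CARDS = 20_000_000      # "around 20 million of those cards"
BYTES_PER_CARD = 80     # assumption: one byte per character, 80 characters per card

total_bytes = CARDS * BYTES_PER_CARD
total_gb = total_bytes / 1e9   # assumption: decimal gigabytes

print(f"{total_bytes:,} bytes is about {total_gb:.1f} GB")
```

Under those assumptions the warehouse of cards holds roughly what a single low-end USB stick does, which is the point being made about migration.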
Uh and the other thing you have to do is is make sure that your data is stored
in and this usually does happen but it's stored in a standardized format. So
there's no point in being able to retrieve the file if you don't understand the
contents of the file or if it is so proprietary that you know the company that
existed 200 years ago but doesn't anymore um you know you standards. Yeah,
absolutely. Well, we're kind of drawing towards the end, but I can't probably
finish up without um talking about the multi-billion dollar elephant in the
room, which is AI. Um, and I'm sure your customers are already using it or
thinking about implementing it. Of course, in in media and entertainment, it's
been implemented in some form for a while. Yes. Taken away kind of laborious
tasks like tagging and um, you know, aiding like facial recognition, object
recognition, speech to text, all those kind of things. I know, you know, there
are applications out there now that will help with shot selections and
even kind of creating timelines. So I guess as a rough percentage, you
know out of your customers you know what sort of percentage are either using AI
or actively, you know, looking at implementing it? I think they're actively
looking at it I think they're using it I would say on a fairly small scale at
the moment um there are challenges so for example um you know you can imagine
wanting to extract metadata from an archive so that it gives
you, opens up the opportunity to search more effectively. Yeah. So face extraction,
face recognition is absolutely a way of doing that. You can put a name to a
face and associate that with a particular file. Um speech to text is another
application, because once you've got text you can search on it. But the
challenges are things like, you know, speech to text is 98% accurate, but the
problem is that the bits where it's not accurate are the bits that you're really
interested in. Yeah. Because there'd be unusual words. They're the place names
for a local TV station or something like that. And so there is ongoing work and
we are doing some work ourselves in this area to try and figure out how
to use it most effectively. And I guess you probably have the situation is that
maybe even early adopters of AI and they're running facial recognition are
running some sort of language model against their archive already or about to
you know in three or four years time with a change in AI is probably going to
have to pull that data back and yeah and do it again or you might be able to you
know I don't know what the correct word is scrape or you know find you know find
more nuanced data that you didn't know. Well, with, um, with some of our more recent
systems, we are producing low resolution copies of video data as the data is
being ingested. The idea of that is um that you can store that on a more
accessible storage tier. So if you produce a 2% proxy for example, you can
probably store that on hot storage of some sort. And then when you need to go
back and reprocess it, you can reprocess the low resolution proxy which is
probably still good enough to extract a face or to extract the speech. So
that's AI in terms of, you know, maybe the sharp end of the production or
content sort of market. What do you think AI could do, playing a
smarter role in actually archiving or retrieval? Um, you know, maybe deciding what
gets archived, where it gets archived to, yeah, dare I say it, what gets
deleted. And I prefer you didn't say that. Um and what would you know what
systems need to be put in place to make you know an operator trust that the AI
is making those kind of right decisions on media? AI accountability is a really
hot topic basically and um delusions and wanting to please and all of this kind
of thing. Um and I so the in terms of the trust thing again I think that is an
active area of research um not something that we're involved with but being able
to make an account for what it's saying and what its sources are. Um, in terms
of using it for some of those other things, if um I many years ago sponsored
somebody to do a PhD in the Plasmon days on uh retrieval patterns and whether you
could look at the pattern of accesses to data and figure out what
was going to come next. Now that person called Dean still works for me today.
Um, back then we couldn't do that. We couldn't find any obvious way to
do it. Fundamentally what AI is doing is pattern matching. And so with the power
that we've got now, maybe you can do that and maybe there's a point to doing
that. I'm not sure that anyone's working on that, but it may be possible to do
something like that. Um, and possibly to tell you which, to answer the storage tier
question. So this this is the sort of thing that we look at quite frequently.
So, we'll keep it hot. Yeah. And that just looks like nobody's interested, but
we'll keep it anyway. We'll keep it very cold. A very interesting area. Yeah.
Yeah. Because there's a lot of software applications out there that, yeah, will
analyze a file system and say we haven't touched this and that, this and that,
you know. So, is it, you know, a great leap for AI to use some sort of algorithm
to learn your patterns and then say, well, you know, Mark doesn't need his
diving videos. He hasn't watched those for, you know, 10 years, you know, five
years really likes when the shark was right there, you know. Um, it's just
something that's interesting. AI seems to be coming creeping into everything is,
you know, yeah, is it is it just a matter of time before it starts taking and I
know I was being a bit joyful with deleting. I don't think anyone trusts it to
delete, but you know, in in doing that kind of Yeah. Again, it's just a
laborious task of someone to kind of hunt around and see what to archive. You
know, fundamentally, if you're, you know, if you're managing a multi-tier
storage system, because that's what we're doing, you need an algorithm that
decides what tier something should be on. And again, our more recent systems can
look at access patterns and know that you've read this file four times in the
last... we haven't released a product that actually uses this yet, but they do know
that you've accessed this file four times in the last month. Yeah. Um, and of
course they know how big the file is and so you can invent an algorithm that um
sort of multiplies the size of the file and divides it by how long it is since
you last looked at it or whatever. Yeah. Um and gives you a ranking for every
file in the system. And you can certainly invent more and more sophisticated
algorithms for doing that. and pattern matching is certainly a part of that. So
from a technology perspective, it sounds like yeah, that's where we'll be going.
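The ranking idea described above can be sketched roughly as follows. This is only an illustration of the "size divided by time since last access" formula mentioned in the conversation, not XenData's actual algorithm. The `FileStats` class, the `score` and `rank` functions, the sample files, and the reading of a higher score as a stronger candidate for a faster tier are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file record; field names are illustrative."""
    name: str
    size_bytes: int
    days_since_last_access: float


def score(f: FileStats) -> float:
    # Size divided by time since last access, as loosely described in the
    # conversation. The floor of one day avoids division by zero.
    return f.size_bytes / max(f.days_since_last_access, 1.0)


def rank(files: list[FileStats]) -> list[FileStats]:
    # Highest score first. Treating a high score as "keep on a faster tier"
    # is an assumption; a real policy could weigh things differently.
    return sorted(files, key=score, reverse=True)


files = [
    FileStats("old_rushes.mxf", 200_000_000_000, 3650),
    FileStats("shark_dive.mov", 40_000_000_000, 3),
    FileStats("promo.mp4", 2_000_000_000, 1),
]

for f in rank(files):
    print(f"{f.name}: {score(f):.3g}")
```

A more sophisticated version could fold in the access counts the systems already record (for example, "four times in the last month"), which is where pattern matching would come in.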
Yeah. So IBC 2026. So uh we come to the end of our time. Um so I just wanted to
wrap up with a few final thoughts, I guess, if you will. Um, uh, probably the
first one I'd like to go with is your customers or in the market, what do you
think the biggest misconceptions are about digital archiving? I think the
biggest mistake that some of our customers make, that's not quite answering your
question, is that they don't use replication. Mhm. And that means that if you
damage a tape for whatever reason, you've lost access to that data. Yeah. And we
have customers to whom that happens. And they only lose one tape. It's not as if
they lose the whole archive, but we have customers that that happens to and you
ask them where the other copy is and they say, "No, we don't keep another copy."
And really, that's the beauty of LTO, you know, whatever, $70, $80 or whatever it
is for an LTO-9 tape. I mean, really, that's what value you're putting on your
data. Yeah. and it kind of comes back to the you know I think that's the common
misconception that I come across is I've archived it is data protected even
though I've kind of got one copy. Um and I guess and you can drop a tape. We've
seen that happen. Somebody drops a tape and it ends up spread all over the
server room. Yeah. Yeah. Not that you should take apart tapes, but there's
all sorts of springs and bits in them and it's very difficult to put it back
together. Yeah. Because I've tried. Yeah. Um, I know we talked about trends in
AI and trends in storage, but is there any particular trend in the market, you
know, that you're seeing that particularly excites you at the moment? It's like
this is new and this is going to be the future? I think a very
interesting thing is the adoption of cloud and exactly how cloud is being
adopted and hybrid cloud and on prem archives offers some real opportunities and
I think that's an area that's developing is it is that in case of you know
proxies or lighter weight versions in in the cloud for collaboration or maybe as
we're talking about yeah AI tools to do their thing with and sort of syncing it
back or you exchanging metadata any kind of um collaboration cloud is a
fantastic use case um you know you could you can put a web server on the front
again of one of our archives and you then get the Media Portal or you get
access to it and you can provide that access to so you know we are a cloud
provider in that sense. Um but for any kind of collaboration, cloud is
extremely powerful, but it is also the offsite copy. Yeah. But um cloud
as a DR copy if it's the only copy that you've you know you've got a you've got
an on-prem copy on disk or something like that and a cloud a cloud copy in in an
archive tier in Glacier or Deep Glacier, the retrieval costs are substantial. Sure.
If you We were talking about this not so long. Yeah. If you need to retrieve a
petabyte of data Mhm. from the cloud, it's going to cost you. Yeah. Not only
just a petabyte, even a few terabytes can be expensive. A tape's worth. Yeah. You
can probably afford. Yeah. It might hurt, but you can afford it. Yeah. But a
thousand tapes worth is going to get rather expensive. And I think you know
without dwelling on yeah as we said cloud is really good for collaboration you
know it it could be conceived as an insurance policy but I know speaking to uh
some other companies that we work with they're saying people aren't just
considering that maybe it's a dollar or $2 a terabyte to store but when they've
got applications that are in this S3 connected everything's connected and those
applications start looking at data that's in these cold archive tiers that's,
you know, S3 command calls which are billed, and your kind of dollar or $2
a terabyte suddenly starts creeping up. Well, and again, active archives, we're
um about to publish a blog on this actually so I'll give a sneak preview: um a
dollar a terabyte on um Glacier, Deep Glacier, if you're in the United
States, that's what it costs. If you're in London, it costs 50% more. If you're
in Singapore, it costs twice as much. If you're in São Paulo, it costs four times
as much. So, location, location, location. Um, but if you then want to retrieve
10% of your archive a month, which is, you know, an active archive, you
know, your costs, again, depending on region, your costs increase quite
substantially. Yeah. So, and it's very difficult. You get surprises with cloud
pricing constantly. People get surprises. You didn't know that there was an
egress charge for example because it's on a different part of the price list or
as you said, um API calls. Now, typically with large files, the number of API
calls isn't that high compared with the storage cost. But there are a lot of
unexpected charges, and you're looking at the bill and there's not
really any way to kind of work out what's going on. Yeah. Exactly. Yeah. So
let's wrap up by going back to our quickfire questions at the start. So 321 data
protection absolutely relevant. Do you think tape is dead? No, it's not. You
stumbled on this one or that says no you stumbled. But can I push you for an
answer on what will be the go-to medium for archiving in 10 years? If I had a
big archive, it would be tape. Tape. Uh, and we were talking about a little bit
unfair, the hundred year archive. We've kind of covered that. And AI might be
managing our archives in 2026 at IBC. We'll see. Mark, it's been an absolute
pleasure to have you here. It's been great to be here really and hope to have
you back soon because I think there's many more topics we can discuss. Yeah.
Great. Perfect.
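As a footnote to the cloud pricing discussion, the regional figures quoted in the conversation can be turned into a rough cost sketch. The base price of one dollar per terabyte and the regional multipliers come from the discussion; treating that price as a monthly figure, the `monthly_cost` function, and the retrieval rate used in the example are assumptions for illustration and are not real price-list values.

```python
# Storage price quoted for the United States; "per month" is an assumption.
BASE_USD_PER_TB_MONTH = 1.0

# Regional multipliers as quoted: London 50% more, Singapore 2x, São Paulo 4x.
REGION_MULTIPLIER = {
    "United States": 1.0,
    "London": 1.5,
    "Singapore": 2.0,
    "São Paulo": 4.0,
}


def monthly_cost(archive_tb: float, region: str,
                 retrieved_fraction: float = 0.0,
                 retrieval_usd_per_tb: float = 0.0) -> float:
    """Storage plus retrieval cost for one month.

    retrieval_usd_per_tb is a hypothetical rate supplied by the caller;
    it stands in for egress, retrieval, and API charges combined.
    """
    storage = archive_tb * BASE_USD_PER_TB_MONTH * REGION_MULTIPLIER[region]
    retrieval = archive_tb * retrieved_fraction * retrieval_usd_per_tb
    return storage + retrieval


# One petabyte (1,000 TB) stored in each region, nothing retrieved.
for region in REGION_MULTIPLIER:
    print(region, monthly_cost(1000, region))

# Same archive in the United States, retrieving 10% a month at a
# made-up rate of 20 USD per TB, purely to show how the bill grows.
active_us = monthly_cost(1000, "United States",
                         retrieved_fraction=0.10, retrieval_usd_per_tb=20.0)
print("Active archive, United States:", active_us)
```

Even with a made-up retrieval rate, the sketch shows the shape of the argument: for an active archive, the retrieval term can outweigh the headline storage price.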