Spectra Logic Backup and Recover Blog

LTFS versus TAR: Which one, or perhaps both?

by Matthew Starr, CTO, Spectra Logic

LTFS, the Linear Tape File System, is sometimes called the “long-term file system.” No matter what you call it, LTFS lets tape behave like removable disk. Having tested various LTFS applications, I can tell you it is shaping up to become the new standard for tape interchange, particularly in LTO-based archives. LTFS is an open standard that uses two partitions on the tape: one holds the directory (index) information and the other holds the associated file data. But what about other open formats, such as TAR (short for “tape archive”), which are also compatible across many platforms? How does LTFS stack up in comparison?
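
To make the two-partition idea concrete, here is a minimal Python sketch. It assumes a toy XML index loosely inspired by the one LTFS keeps on its index partition; the element and attribute names, file names and block numbers are all hypothetical and are not the real LTFS schema. The point it illustrates is that once the small index is loaded, finding a file is a lookup in memory rather than a scan of the tape.

```python
import xml.etree.ElementTree as ET

# Toy index in the spirit of LTFS: metadata is stored apart from file data.
# Element/attribute names and values are HYPOTHETICAL, not the LTFS schema.
# In this toy, partition "b" stands for the data partition.
TOY_INDEX_XML = """
<index>
  <file name="intro.mov" partition="b" startblock="100" length="734003200"/>
  <file name="scene01.mov" partition="b" startblock="800" length="2147483648"/>
</index>
"""


def load_index(xml_text: str) -> dict[str, tuple[str, int, int]]:
    """Parse the index once, as if it had just been read from the index partition."""
    root = ET.fromstring(xml_text.strip())
    return {
        f.get("name"): (f.get("partition"), int(f.get("startblock")), int(f.get("length")))
        for f in root.iter("file")
    }


def locate(index: dict[str, tuple[str, int, int]], name: str) -> tuple[str, int, int]:
    """Return (partition, start block, length) without reading any file data."""
    return index[name]


index = load_index(TOY_INDEX_XML)
print(sorted(index))                 # a full listing, from the index alone
print(locate(index, "scene01.mov"))  # where a drive would seek to read the data
```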

Let’s look at both. TAR is an archive format, usually written to tape and designed around sequential media. LTFS is a format that makes tape look more like random-access media to the user or application consuming the storage. So which is better? It depends on your needs and risk profile. TAR has been around for more than 30 years and is available in source or binary form on nearly any operating system (OS) imaginable; LTFS currently runs on only about three. TAR is self-describing but must be accessed sequentially: you cannot know the full contents of a tape without reading, at a minimum, the header of every archive on it. In other words, to get a complete listing, TAR requires that you read through the whole tape. LTFS, on the other hand, stores its directory, or header, information on a separate partition, so it needs to load only a very small amount of data to fully describe the contents of the entire tape.
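
To illustrate why TAR must be read in sequence, here is a minimal Python sketch. It uses the standard-library tarfile module only to build a small ustar archive in memory as a stand-in for a tape, then lists that archive by hopping from one 512-byte header to the next, which is the walk a drive has to perform. The helper names are hypothetical, the parser handles only short names in plain ustar headers, and a real tape adds positioning and block-size details this sketch ignores.

```python
import io
import tarfile

BLOCK = 512  # tar headers and member data occupy whole 512-byte blocks


def build_archive(files: dict[str, bytes]) -> bytes:
    """Write a small ustar archive to memory (a stand-in for a tape)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_by_scanning(archive: bytes) -> tuple[list[str], int]:
    """List members by walking header to header; return (names, headers read)."""
    names, offset, headers_read = [], 0, 0
    while offset + BLOCK <= len(archive):
        header = archive[offset:offset + BLOCK]
        headers_read += 1
        if header == b"\0" * BLOCK:  # an all-zero block marks the end of the archive
            break
        name = header[0:100].rstrip(b"\0").decode()
        size = int(header[124:136].rstrip(b"\0 ").decode() or "0", 8)  # octal size field
        names.append(name)
        # Skip past the header plus this member's data, rounded up to whole blocks.
        offset += BLOCK + -(-size // BLOCK) * BLOCK
    return names, headers_read


archive = build_archive({"a.txt": b"hello", "b.bin": b"x" * 600})
names, headers_read = list_by_scanning(archive)
print(names, headers_read)
```

There is no shortcut in the format: the location of each header depends on the size recorded in the one before it, so listing the last member means visiting every earlier one.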

There are some downsides to using tape as a random-access device. First and foremost, tape was not designed for random access. Writing millions of small files to an LTFS-formatted tape and then attempting to retrieve every other file can be a recipe for disaster: the drive must stop and reposition between reads instead of streaming, and its performance drops significantly. This is where TAR works really well, because it bundles all of those millions of tiny files into one archive, which is then stored as a single file on a single tape. Plus, TAR can restore data as fast as the drive can read it. If, on the other hand, you are writing hundreds of larger files to tape and want random access to any one of them, LTFS may be just the trick you’re looking for.
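
A back-of-the-envelope model shows why the access pattern matters. In this Python sketch the per-file repositioning penalty (2 s) and the streaming rate (140 MB/s) are assumed placeholder values, not measurements or drive specifications, so substitute figures for your own drives. It compares reading every other file of a million small files one at a time against streaming a single TAR archive that contains all of them.

```python
def random_read_seconds(n_reads: int, file_bytes: int,
                        reposition_s: float, throughput_bps: float) -> float:
    """Each read pays a repositioning penalty, then transfers one file."""
    return n_reads * (reposition_s + file_bytes / throughput_bps)


def streaming_seconds(total_bytes: int, throughput_bps: float) -> float:
    """One sequential pass at full drive speed (e.g., restoring a whole TAR)."""
    return total_bytes / throughput_bps


N_FILES = 1_000_000
FILE_BYTES = 64 * 1024      # ASSUMED small-file size
REPOSITION_S = 2.0          # ASSUMED average stop/locate/restart penalty per file
THROUGHPUT_BPS = 140e6      # ASSUMED streaming rate, bytes per second

one_at_a_time = random_read_seconds(N_FILES // 2, FILE_BYTES, REPOSITION_S, THROUGHPUT_BPS)
whole_tar = streaming_seconds(N_FILES * FILE_BYTES, THROUGHPUT_BPS)
print(f"every other file, one at a time: {one_at_a_time / 86400:.1f} days")
print(f"stream the whole TAR:            {whole_tar / 60:.1f} minutes")
```

Under these assumptions the one-at-a-time reads take days while the full stream takes minutes, and the stream also delivers the files you skipped. With hundreds of large files the repositioning term becomes small relative to transfer time, which is where random access pays off.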

The other advantage LTFS has over TAR is ease of use once the application and drive software stack are installed: LTFS makes the tape look just like a large USB key. TAR, by contrast, must be driven from a command-line interface (for example, tar -tvf /dev/tape1) just to list the contents of the one archive on a single tape.
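
The difference in day-to-day use can be sketched in Python. A mounted LTFS volume is browsed with ordinary file-system calls such as os.walk, while a TAR tape is read as a forward-only stream, which the standard tarfile module supports through its "r|*" mode. The function names are hypothetical, and this is an illustration of the two access styles rather than a tool for either format.

```python
import os
import tarfile


def list_ltfs_mount(mount_point: str) -> list[tuple[str, int]]:
    """LTFS: once mounted, the tape is browsed like any directory tree."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            entries.append((os.path.relpath(path, mount_point), os.path.getsize(path)))
    return sorted(entries)


def list_tar_stream(device_or_path: str) -> list[tuple[str, int]]:
    """TAR: read the archive front to back as a stream, similar to `tar -tvf`."""
    # "r|*" opens a forward-only stream with transparent decompression; no seeking back.
    with tarfile.open(device_or_path, mode="r|*") as tar:
        return [(member.name, member.size) for member in tar if member.isfile()]
```

In use, you would call something like list_ltfs_mount("/mnt/ltfs") or list_tar_stream("/dev/tape1"). Both paths are illustrative, and a real tape device may need tarfile.open’s bufsize argument set to suit the drive’s block size.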

So which one would I use? Both or either, depending on the environment and my needs. I don’t believe you should consider LTFS over TAR as the solution for a petabyte-scale archive. But if you want an easy way to move data from place to place, or are deploying a smaller archive, you should review LTFS’s features and benefits.

CTO Insight: Big Data: Why Tape?

By Matt Starr, Spectra Logic’s CTO

I have watched the tape market grow over the last two years, mostly, it seems, because of the increasing number of archive installations. With much larger system implementations projected through 2014, this growth should continue for the foreseeable future. Military low- and high-altitude video surveillance in countries like Afghanistan, the media and entertainment industry’s push toward 4K video, and the growth in PACS (picture archiving and communication system) medical imaging data are just a few of the many market segments driving the implementation of large archives.

These are areas where deduplication, and disk in general, fall short, precisely because of the raw quantity of data involved: the disk resources required would be enormous and would consume enormous quantities of power, and the time needed to deduplicate the data and later rehydrate it is unacceptable.

EMC’s recent “Big Data” news splash did not mention tape, which kind of shocked me! (It’s only kind of shocking, as EMC is tape-hostile.) Tape is Big Data: 80% of the world’s data is stored on tape [1], and tape is the only medium that can scale to exabytes and still be cost-effective. In fact, tape is the only cost-effective method of storing Big Data. Tape storage is denser than disk storage, costs less up front, and is ten times less expensive to operate over time than a disk-based solution. I am not implying that disk has no part to play in the Big Data world; it is just not well suited to be the “meat” of a storage environment.

So where does disk belong in this Big Data world? First, disk works very well as the cache that interacts directly with users through a file system, WebDAV, FTP, or another front-end interface. Second, disk is the right platform for metadata storage. For far too long, users have been saving data as named files rather than as objects with metadata. As archives grow, object storage and metadata will take the front seat in how data is stored. Lastly, disk has an important role in making stored data searchable: why would you store data if you cannot get it back when you need it? In my opinion, roughly 10% of the total archive capacity should be dedicated to metadata and search. Add another 10% of the total archive as disk cache, and the picture starts to come together: roughly 20% of your total archive should be disk, with the other 80% consisting of long-lived, reliable, cost-effective tape.
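
The 10% / 10% / 80% rule of thumb translates directly into a sizing helper. This Python sketch simply applies those percentages; the function and key names are hypothetical, and the fractions are parameters so they can be adjusted for a particular site.

```python
def size_tiers(total_tb: float, metadata_frac: float = 0.10,
               cache_frac: float = 0.10) -> dict[str, float]:
    """Split total archive capacity into disk and tape tiers (a rule of thumb, not a spec)."""
    if metadata_frac < 0 or cache_frac < 0 or metadata_frac + cache_frac > 1:
        raise ValueError("disk fractions must be non-negative and sum to at most 1")
    metadata_tb = total_tb * metadata_frac  # disk for metadata and search
    cache_tb = total_tb * cache_frac        # disk cache that users interact with
    return {
        "metadata_disk_tb": metadata_tb,
        "cache_disk_tb": cache_tb,
        "tape_tb": total_tb - metadata_tb - cache_tb,  # the remaining ~80%
    }


print(size_tiers(10_000))  # a hypothetical 10 PB (10,000 TB) archive
```

For that hypothetical 10 PB archive, the split comes to about 1,000 TB of metadata and search disk, 1,000 TB of cache disk, and 8,000 TB of tape.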

Reliable? Yes. The facts are absolute and irrefutable: tape is extremely reliable, more reliable than disk. Tape’s bit error rate after error correction is on the order of 1 in 10^17 to 1 in 10^19 bits, which blows disk’s reliability statistics [2] out of the water. Additionally, modern tape libraries have features like Spectra Logic’s Media Lifecycle Management (MLM), which predictively informs the user about the health of each tape as it is being used. Features like this layer additional reliability on top of tape’s already high reliability. Through MLM and other features (stay tuned for a few upcoming announcements this spring), Spectra’s T-Series libraries ensure that the data on tape is intact and recoverable from the archive.
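
To put those bit error rates in perspective, this Python sketch estimates errors when reading a given volume of data. It treats bit errors as independent, which is a simplifying assumption. The tape rates are the ones quoted above; the disk rate of 1 in 10^15 used for comparison is an assumed figure, not one taken from this article, so check the data sheets for your own drives.

```python
import math


def expected_errors(bytes_read: float, bit_error_rate: float) -> float:
    """Expected number of unrecoverable bit errors over bytes_read bytes."""
    return bytes_read * 8 * bit_error_rate


def prob_at_least_one_error(bytes_read: float, bit_error_rate: float) -> float:
    """P(at least one error) = 1 - (1 - BER)^bits, assuming independent bit errors.

    log1p/expm1 keep the result accurate when the BER is tiny.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


PETABYTE = 10**15  # decimal petabyte, in bytes
RATES = [("tape, 1e-17", 1e-17), ("tape, 1e-19", 1e-19), ("disk (ASSUMED), 1e-15", 1e-15)]
for label, ber in RATES:
    print(f"{label:>22}: expected {expected_errors(PETABYTE, ber):.4g} errors, "
          f"P(any) = {prob_at_least_one_error(PETABYTE, ber):.4f}")
```

Reading one petabyte at 1 in 10^17 works out to about 0.08 expected errors, and at 1 in 10^19 about 0.0008, compared with about 8 at the assumed disk rate.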

The architects and developers of data archives will continue to build systems based on disk and tape, not disk alone. When a Big Data archive is based on disk alone, one or more of the following is true:

1. They are not actually a Big Data environment, but want to be (or think they are).
2. They are wasting money and should be answering to their shareholders or voters.
3. They have been misinformed about tape.

In the end, tape is far from dead and will continue to prove itself as the ideal medium in the Big Data world.


[1] Moore, Fred. “When Tape Becomes Mission Critical: A White Paper.” META Group, February 2003. http://findarticles.com/p/articles/mi_m0BRZ/is_2_23/ai_98709768/

[2] Newman, Henry. “Why Enterprise Tape Can’t Get No Respect.” Enterprise Storage Forum, June 17, 2010. Reports that tape reliability is “40,960 times greater than enterprise disk.” http://www.enterprisestorageforum.com/continuity/features/article.php/3888366