Spectra Logic Backup and Recover Blog

Primary Disk Deduplication's Impact on Backup

George Crump posed an interesting question when he asked if primary storage deduplication will kill archive and backup.  It is a great question, and one we should explore.  If you don’t want to read my ramblings, my short answer is no. 

There is a lot more to archive and backup than simply storing a lot of data, something deduplication has proven it can do well.  Backed up and archived data needs to be cataloged, indexed and managed through its life or retention.  That’s one of the reasons we don’t use tar and dump commands much these days.  Snapshots can remove much of the recovery burden from alternate storage devices.  I have seen customers recover almost all single file restores from snapshots.  But they never served as a replacement for backups.   As George said, we sleep better at night when copies of our data are on different systems.  There are lots of reasons for that.  We all worry about a bad firmware load.  If you have all your data on one array (or replicated to an identical one) a bad firmware release could wipe you out.  And of course there are physical failures. No matter how well designed a system is, something external can happen.  In the years I was in the field, I heard some unbelievable external failure stories where a non-IT event started the failure.   (Maybe I should start collecting them).

This leads me to conclude that proper architecture of a data storage environment includes dissimilar storage devices.  Your backup and DR copies need to be independent from production data, to prevent a cascading failure getting every copy.  For archive, the first copy could be on the primary storage platform, but the redundant copies (and all good archive systems maintain a minimum of 2 copies of the data) need the same.  It could be as easy as Spectra nTier disk and Spectra T-Series tape, or it could be more complex.  What it won't be is a single disk array for primary storage, archive and data protection.

Crying Wolf Over Data Breaches: How Active Archive Environments Can Help

The high importance of data protection is top of mind these days – specifically in light of some high profile cases of data loss in the UK. News of some potentially impending legislation this side of the pond has again drawn attention to the issue of how companies look after customer data.

The story that caught my eye is here – and covers news that a European Commission review of data laws will require data-breach notification from a wide range of businesses. Initially this will be aimed at telcos but there are no reasons I can see why the legislation will not be extended to other businesses.
When we talk data breaches we’re often talking about firewalls, DMZs, access control, encryption technology – the standard tools and techniques used to secure data within the corporate network. However, I also think this is very much a storage story as well – specifically in terms of how customers archive sensitive data.
 
If this legislation is passed we will need to find a happy balance between vigilance and pragmatism. What we don’t need is a situation where every single potential data breach is reported, causing panic every time there is the slightest possibility of information falling into the wrong hands. This will result in a situation very much like that faced by the ‘Boy who cried wolf’. People will soon turn off, and then the legislation becomes meaningless.  We need a system whereby organisations have a measured approach to assessing the extent of any potential breach and what data may have been compromised.
If we are going to achieve this balance then companies will have to put in place the procedures and technologies to give them a very granular view of what data is stored where. Helping customers achieve this for archived data is one of the reasons why Spectra Logic   became a founding member of the Active Archive Alliance. AAA has been set up to address some of the barriers which stop IT departments achieving the kind of satisfactory archiving architecture described above.
 
Much of the confusion around archiving has been caused by conflicting messages put out by vendors as well as a lack of integration between technologies at various levels of the overall archiving stack. Active archive environments are a better way to classify, manage and route data. From the point of creation, data in an active archive can be classified as sensitive (if necessary) and then managed within a framework of policies which govern where and how it should be stored, including the level of protection it should be given.
 
If Active Archives can help customers achieve these levels of granularity in the governance of archived data then we should be able to find a balance which makes this forthcoming legislation enforceable and valuable. Ideally we will get to the stage where data-breaches simply cannot happen but that is unrealistic. What should be realistic is having IT departments know exactly what data is where within their infrastructure and how it is being stored. This should ensure that we’re not inundated with ‘false-positive’ warnings and that when a company cries “Wolf!” the villagers lock their doors!

LTO-5 and T-Series Libraries - Another Trusted Milestone at Spectra

With LTO-5 drives released in our T-Series libraries it’s another trusted innovation milestone for us at Spectra. We keep our innovation conveyor cranking along here. As Spectra’s customers trust us to deliver valued tape innovation, millions trust LTO tape (> 3.3M drives sold, > 140M cartridges sold) with their data assets, now and for the long term. 

So I was dumbstruck (“a first”, some might say) when I recently read a commentary which chided the LTO consortium for “only” offering 1.5TB* native capacity on LTO-5. Seriously…
 
Meanwhile, the analyst and customer feedback we’ve received confirms that LTO-5 delivers an excellent blend of tape capacity, performance and feature innovation, as LTO has done since the launch of LTO-1 a decade ago.
 
At Spectra we already knew that LTO-5 would be well received. The popularity of our unique LTO-5 pre-purchase program gave us good insight into the customer expectations. 
 
With each LTO generation doubling capacity (yes, give or take), improving throughput (1TB/hr compressed transfer rate for LTO-5) and providing useful management features (e.g. WORM, encryption, Media Partitioning) - our customers were eager to buy on the promise of future LTO-5 delivery. 
 
Back to the future and we’re delivering LTO-5 to our existing and new T-Series library customers. 
 
There are some deliberate synergies between LTO-5 and Spectra’s T-Series libraries. The increased capacity of LTO-5 combined with Spectra’s uniquely high density T-Series libraries relieves pressure on space-constrained datacenters now, and allows customers to plan confidently for data growth.
Low power and green data storage is synonymous with Spectra’s LTO tape solutions: T-Series libraries are the lowest power tape libraries, allowing customers to minimize their datacenter power budgets and manage their emissions.
 
For Spectra it’s all about customer-focused innovation. LTO-5 presents an excellent platform for Spectra to continue our unique approach to tape management with our BlueScale architecture and to extend the role of tape with new data management applications. 
 
With the trusted LTO roadmap now published to LTO-8, Spectra customers can be confident of continued development of high capacity, high performance, high reliability and cost-effective storage solutions. A trusted LTO roadmap and Spectra’s commitment to innovation: synergy at its best.
 
Now, aren’t we glad that our arithmetic is so much simpler for storage planning with LTO-5 at 1.5TB? Try multiplying by 1.6 when you’re next sizing storage needs…1.5 TB native capacity is good news for those of us who no longer carry slide rules or keep a pencil behind our ear. Now for LTO-6 we can expect 3.2TB native capacity…perhaps I will need my slide rule after all ;-)
 
*LTO-5 capacity was revised, and published, last year to be 1.5TB native. I’ve yet to speak to a customer or analyst who was either surprised or disappointed by the minor revision.    

http://www.spectralogic.com/LTO-5

 

More Storage Space, Please

I am amazed at how easy it is for people to consume all of their available storage space, and how quickly we can do it.  Just looking at my own storage situation, it is remarkable how much “stuff” I have to store: a snow blower, coolers, lots of car parts, bike parts, camping gear and who knows what else.   My old storage shed was recently hit by a tree, so I had to replace it this month.  I purchased the biggest one allowed by city zoning regulations, and already I realize I should have gone bigger.

 
Storing data can be pretty similar.  No matter how much space you have, the data manages to grow to fill it.  Sometimes, the data starts to outgrow the new system before it is even installed.  The growth can seem exponential when you are managing data backup and archive processes, where 1 TB of primary data can grow to 20 TB of backup data; hence the value of deduplication.  Why do we have all of this data to begin with, and what should we do with it long term? That might be the topic for a future blog post, but today, let’s focus on knowing we need space to store it and how to address that need with disk. Data center space constraints and a focus on cost savings cause storage admins to seek efficient, high density backup and archive disk solutions.  Spectra Logic’s new higher capacity nTier 700 disk appliance delivers on these requirements.
 
Most disk arrays have some ability to grow and expand, but at the hidden cost of more rack space.  The Spectra nTier 700 can grow to 60 disks in a single 4U enclosure.  With Spectra Logic’s recent announcement of 2 TB drives in the nTier 700, that totals 120 TB of capacity and 16 GB of memory in a 4U chassis, for as low as $1.00 per GB.
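The enclosure figures are easy to check for yourself. The sketch below uses only the numbers quoted in this post; the function names are illustrative, the result is raw capacity before any RAID overhead, and treating 1 TB as 1,000 GB is an assumption.

```python
# Figures quoted in the post; all names here are illustrative.
DISKS_PER_ENCLOSURE = 60
DRIVE_CAPACITY_TB = 2
ENCLOSURE_HEIGHT_U = 4
PRICE_FLOOR_PER_GB = 1.00  # the "as low as" price, in USD


def raw_capacity_tb(disks=DISKS_PER_ENCLOSURE, drive_tb=DRIVE_CAPACITY_TB):
    """Raw capacity of a fully populated enclosure (no RAID overhead)."""
    return disks * drive_tb


def density_tb_per_u(capacity_tb, height_u=ENCLOSURE_HEIGHT_U):
    """Capacity per rack unit."""
    return capacity_tb / height_u


def price_at_floor(capacity_tb, price_per_gb=PRICE_FLOOR_PER_GB, gb_per_tb=1000):
    """Implied price at the quoted floor (assumes decimal terabytes)."""
    return capacity_tb * gb_per_tb * price_per_gb


capacity = raw_capacity_tb()
print(f"{capacity} TB raw, {density_tb_per_u(capacity):.0f} TB per U, "
      f"${price_at_floor(capacity):,.0f} at the floor price")
```

Swap in your own drive count or drive size to see how much rack space a given capacity target would take.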
 
If you are up against physical size and/or budgetary limitations in your data center, like I am in my back yard, Spectra Logic can help you scale to fit your needs.
 

The Active Archive Concept: why Tape?

The Active Archive Alliance formation was just announced last week. I wanted to take the opportunity to tell you how the concept originated, evolved and solidified, and why Spectra Logic is participating as a founding member.

I delivered a speech for Spectra Logic at the SC09 show in Portland, Oregon in November and realized there is a new market trend, one that represents real customer needs that are not currently being well addressed for data storage and access.  Unstructured file data is rapidly growing (as we all know), but budgets aren’t growing at the pace of data, which leaves customers needing to make a tradeoff between access to the data and budget limitations for new storage purchases. There is a real need for customers to be able to: 1) keep the data being created; 2) access and retrieve the data being created; and 3) do so affordably.

At the SC09 presentation, I covered how far tape had come in functionality, reliability and intelligent feature sets over the past decade. A current HPC customer attended my session and mentioned a side conversation that took place with other attendees just after my Q&A. A prospect was curious to know if recent advancements in tape make it the perfect storage to pair with a file system interface to maintain access to data in their archive. They needed all of their data online and accessible for years into the future, though most of it was infrequently accessed. To this question, one of our current customers explained, “That’s exactly what we’re doing with our Spectra Logic tape library: using tape as storage for large amounts of file system data.”

Through the show, we confirmed two things: using reliable, affordable tape as the data store behind a file system is needed to solve data access problems within available budgets, and most customers don’t realize it is possible to use tape to offload their file system data from primary storage. We realized that we needed to dispel the myth that tape is just for backup and make it known how perfectly it suits file system archive. An idea was born.

Since November, we’ve been working with industry analysts, various partners and other experts on the topic and have since carved a real niche with which to serve our current and future prospects. Spectra Logic is a founder of the new Active Archive Alliance that will help bring best practices and solution education to end users on how to optimize and simplify data archive, and specifically, how to accomplish this with tape!

So why Spectra tape?
 
Spectra Logic has joined the alliance because our tape libraries are the perfect fit in an active archive. Several HPC and M&E customers already use our libraries as their file storage utilizing proprietary file management software. With the recent developments in applications that can run on standard operating systems, a tipping point has emerged. Spectra tape can now provide high availability on the hardware, can perform data integrity and media verification, and is fast enough to be utilized for file archive and access.  When you combine the new application functionalities to address tape for file archive with the latest developments in tape storage itself, Spectra Logic is now in a position to build affordable tape-based active archive solutions for organizations of all sizes. Spectra’s T-Series tape libraries are uniquely suited to these environments due to their high performance, density, scalability, reliability and power efficiency.
 
I hope you will visit the new Active Archive Alliance web site at www.activearchive.com and provide feedback on how we can help you with your own archive solution.
 
If you have any questions about the Active Archive Alliance or Spectra tape libraries, feel free to reach out.
 
Join the conversation:
You can also follow the Active Archive Alliance’s updates on Facebook, LinkedIn and Twitter.

No shake of the dice here - Spectra Lands Its Second Product of the Year for T-Finity

Spectra Logic is excited to announce our second product of the year award for the Spectra T-Finity tape library. Brian Grainger of Spectra attended the second annual Data Intensive Computing Environment (DICE) Data Intensive Impact Awards event yesterday and accepted the award.

According to the press release issued at this week’s DICE Alliance 2010 event, the DICE Technical Advisory Panel  (TAP) selected showcase products and technologies that have enabled progress in HPC data management in locality, movement, manipulation and integrity, as well as power and cooling efficiencies.
 
This is what DICE TAP members had to say:
 
“Our team selected the T-Finity for its versatility in data intensive environments,” said Al Stutz, Avetec CIO and DICE team leader. “The product helps with the demanding archiving and backup environments experienced in the enterprise IT, federal, high performance computing (HPC) and media and entertainment space.”
 
“Spectra Logic’s new T-Finity tape library addresses the demanding storage and productivity requirements of HPC and enterprise markets,” said Steve Conway, IDC’s research vice president of technical computing. “It supplies multiple redundant components and unique features, while requiring only a single management interface, thus raising the bar on management simplicity within data-intensive environments.”

Cheers to our development team. We appreciate the recognition for our continued innovation and dedication to the high performance computing market.

 

Backup, Archive, HSM - What's the Difference Anyway?

Part One of Two

One of the interesting things I have discovered since I have been talking with so many HPC customers is that the term “backup” is seldom used.  You might ask: if they aren’t doing traditional backups, then why would we, a backup solutions provider, want to talk to them? Well, first you need to fully understand the difference between backup and archive.  Archive is a word you will hear more often in the HPC and M&E environments, especially if there is data in excess of the petabyte range and large files that aren’t accessed frequently but need to be kept indefinitely.

In this blog, which is the first of a two part series, I will provide some fundamental information that can help you differentiate backup from archive.  In the subsequent blog, part two, we will peel the covers back on the process that is different from backup and archive and similar to the traditional HSM (Hierarchical Storage Management). This information will prove to be valuable for those HPC or other data intensive customers who may claim that they don’t do backups.  Stay tuned for more on this subject later.

The differences between backup and archive:

Backup: simply refers to the creation of a copy of data and storing it somewhere for restoration in the event the original version of the data is compromised in some way.  We evangelize the concept of backups because we know, and most customers realize, that data can accidentally be deleted, corruption can occur or, even worse, a natural disaster could wipe out the entire data center.

Backup is simply safeguarding or protecting the data that is being used by duplicating that data.  This is usually done in a rotating cycle or through schedules including: daily incrementals kept for seven days, a weekly full kept for a month, a monthly full kept for a year and a yearly full kept for seven years.  Although this process has proven effective and most of the backup applications on the market today are ideal for doing this, problems occur when you start having multiple copies of the same data consuming a lot more hardware than necessary, not to mention the associated costs of running and managing that hardware.
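To see how quickly those copies add up, here is a minimal sketch of the rotation described above. The retention periods come from the schedule in this post; the names, the 30-day month, the 365-day year and the four-weeks-per-month figure are simplifying assumptions, and it ignores any overlap between weekly, monthly and yearly fulls.

```python
from datetime import date, timedelta

# Retention per backup type, taken from the rotation described above.
# Month and year lengths in days are simplifying assumptions.
RETENTION = {
    "daily_incremental": timedelta(days=7),
    "weekly_full": timedelta(days=30),
    "monthly_full": timedelta(days=365),
    "yearly_full": timedelta(days=7 * 365),
}


def expiry_date(backup_type, taken_on):
    """Date on which a backup of the given type leaves the rotation."""
    return taken_on + RETENTION[backup_type]


def steady_state_full_copies(weeks_per_month=4, months_per_year=12, years_kept=7):
    """Full copies on hand once the rotation has been running long enough."""
    return weeks_per_month + months_per_year + years_kept


print(expiry_date("weekly_full", date(2010, 4, 5)))
print(steady_state_full_copies(), "full copies retained at steady state")
```

Under these assumptions a long-running rotation holds a couple of dozen full copies of largely the same data, before counting incrementals.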

With backup – think business continuity

One of the key differences when comparing backup strategies to archiving is the difficulty of singling out select files for long term retention.  Everything in the backup gets lumped into the large full backup at the end of the year or seven years and called an “archive”.  It may in fact be called an archive but a recovery would function more like a backup recovery, which could be very costly and time consuming.  Backup strategies are more for business continuity purposes and not necessarily for long term archiving.

With archive – think long-term retention

Archive: The main difference between an archive and a backup is that an archive refers to a single collection of records or data that is designated for long-term retention.  When the data is moved from the production environment to the archive environment it is tagged or indexed by metadata that assists in quickly locating that particular file or chunk of data through a search mechanism.  This process and the sophisticated software that performs it make locating a single file much more efficient than it would be in a traditional backup.  An archive is generally found in a common file system structure and the determination of where the file is located is a function of the file system.  The file system may have several different storage devices that the archived data is stored on based on a number of attributes such as size, type, last accessed, etc.  This system could be a combination of expensive disk, such as Fibre Channel, less expensive disk, such as SATA or SAS, and tape.  The key is how the data is “structured.”  In most cases, the data may never be accessed again, but it is necessary to keep it for historical purposes, regulatory compliance or an unplanned event.  The goal with creating an archive is to keep it separate from the backup rotation cycle.  It is recommended that a separate copy of the archived data be made and kept in a separate location so there are at least two copies of the final archive.

Many environments will include both backup and archive.  Through the use of sophisticated software features that are available today, customers can establish policies that determine type, size, age, last accessed, remaining disk space and other characteristics of stored data that can automate the process of deciding whether to keep the data in the backup cycle or move it to the archive pool.
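A policy of this kind can be pictured as a small rule over file attributes. The sketch below is purely illustrative: the thresholds, file suffixes and class names are hypothetical and do not reflect any particular product's policy engine.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    days_since_access: int


@dataclass
class ArchivePolicy:
    # All thresholds below are hypothetical examples, not product defaults.
    min_idle_days: int = 180
    min_size_bytes: int = 100 * 1024**2
    archive_suffixes: tuple = (".dat", ".mov")

    def destination(self, f: FileInfo) -> str:
        """Return 'archive' for idle files that are large or of an archived type."""
        idle = f.days_since_access >= self.min_idle_days
        large = f.size_bytes >= self.min_size_bytes
        typed = f.path.lower().endswith(self.archive_suffixes)
        return "archive" if idle and (large or typed) else "backup"


policy = ArchivePolicy()
print(policy.destination(FileInfo("/data/run1.dat", 10, 400)))
print(policy.destination(FileInfo("/home/notes.txt", 10, 400)))
```

Data that stays in the "backup" pool keeps cycling through the rotation; data routed to "archive" leaves it, which is where the reduction in duplicate copies comes from.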

These two functions can be performed within a single library in separate partitions.  The software can then provide notification of what tapes need to be exported based on the function that was performed on those tapes, backup or archive.  I have seen numbers as high as 80% indicating how much data is duplicated within a storage infrastructure because the differences between backup and archive aren’t fully understood.  At the end of the day, knowing the difference and the benefits of backup and archive technologies, when to use them and how to balance the two functions in an environment can drastically reduce the amount of redundancy, complexity and storage operating costs.

In Part Two of this discussion, we will look at how archives that contain production data, no matter how old or infrequently accessed, can still be retrieved online using high density and high speed tape systems and secondary disk systems.  Stay tuned for my next post which will look at enduring access to data. 

Want to talk more? I’ll be in Dearborn Michigan at the IDC HPC User Forum and DICE Alliance 2010 events next week. Contact me at jimm@spectralogic.com.

No Question About it: Sometimes Tape is the Answer

Dear Ms. Meade:
My data crunching company generates 2-3 TB of data per customer, and I need to store that somehow. However, I don’t have room for a tape library. The only thing I can think to do is put the data on some hard drives using Linux-based RAID software, then put the disk in a safety deposit box. Do you have any other suggestions?
Sincerely,
Short on Space

Dear Ms. Space:
You have money and room for terabytes of disk storage, which you will squirrel away in a pretty large safety deposit box, but not a couple of dollars and rack units for a small library? Hmmm.

In pondering a polite answer to this question, Ms. Meade called to mind something similar posted on Slashdot, and is heartened that several intelligent points were discussed in that context. Being one to always encourage others in the path of light, Ms. Meade will summarize these intelligent comments and add to them.

The Short Version, by the way, in case you are averse to reading: Buy an LTO-4 tape drive and LTO-4 tapes. Forget the disk.

The Long Version: As fond as Ms. Meade is of disk, especially Spectra nTier disk, Ms. Meade understands that disk’s greatest asset is the speed at which it retrieves data—NOT its use for secure offline data storage.

Tape is cheaper than disk, even the disk to which you are likely referring. The Tape Equation: you can buy an LTO-4 tape drive for around $1400 (and likely for less), and at $40 per tape, store 800 GB of data; with these and a little compression, you are two tapes away from serious, long-term storage.  Assuming that you have more than a handful of customers annually, this pays for itself pretty rapidly, compared to purchasing cheap (and risky) JBOD.

At $100/TB for hard drives and twenty customers, each with 2 TB of compressed data, the company must shell out $4,000 per year. If, instead, you purchase a tape drive and LTO media, your costs are under that in just the first year: about $2,000 for tape, and another $1,400 for the drive. You’ve paid for the drive in one year. After that, you save $50/TB. (That translates to thousands of dollars annually.)
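For those who prefer to check the sums, here is a minimal sketch of the first-year comparison using only the prices quoted in this letter. The function names are illustrative, 1 TB is taken as 1,000 GB, and compression on the tape side is ignored, so treat the output as a rough estimate.

```python
# Prices and volumes quoted above; names are illustrative.
DISK_COST_PER_TB = 100   # USD
TAPE_COST = 40           # USD per LTO-4 cartridge
TAPE_CAPACITY_GB = 800   # native capacity per cartridge
DRIVE_COST = 1400        # USD, one-time


def annual_data_tb(customers=20, tb_per_customer=2):
    """Data to be stored per year, in TB."""
    return customers * tb_per_customer


def disk_cost(data_tb):
    """Yearly spend on hard drives for the given volume."""
    return data_tb * DISK_COST_PER_TB


def tapes_needed(data_tb):
    """Whole cartridges required (ceiling division, 1 TB = 1000 GB assumed)."""
    return -(-(data_tb * 1000) // TAPE_CAPACITY_GB)


def tape_media_cost(data_tb):
    """Yearly spend on cartridges for the given volume."""
    return tapes_needed(data_tb) * TAPE_COST


data = annual_data_tb()
print("Disk, per year:    $", disk_cost(data))
print("Tape, first year:  $", tape_media_cost(data) + DRIVE_COST)
print("Tape, later years: $", tape_media_cost(data))
```

The drive is a one-time purchase, so only the media cost recurs in later years.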

You may want to consider a tape library; such libraries truly are not space- or budget-hogs. Libraries such as the 4U Spectra T50e may be worth the space and time simply in convenience. This depends a great deal on your business volumes and staffing, and Ms. Meade acknowledges constraints due to current recessionary times. However, to emphasize the point: a relatively lightweight investment such as the purchase of a small library can automate data protection, and most companies that deal in data understand that their business also mandates data-caretaking.

For those unenlightened few who say that LTO tape is not a wise choice because eventually new technologies replace older ones, please consider that LTO has been around this past decade and shows no sign of going away. No migration will be necessary for years to come, given that current generations of LTO tape technology read data on tape that is two generations back, and write one generation back. With new generations about every 3 years, and given the mobility of today’s clientele, the lifespan of at least 6-10 years is likely sufficient for your business requirements.
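The compatibility rule just described can be written down in a few lines. This is a sketch of the rule as stated in this letter (read two generations back, write one generation back); the function names are illustrative, and the specification for any particular drive generation should be checked before relying on it.

```python
def can_read(drive_gen, tape_gen):
    """Rule as stated above: a drive reads its own generation and two back."""
    return tape_gen >= 1 and 0 <= drive_gen - tape_gen <= 2


def can_write(drive_gen, tape_gen):
    """Rule as stated above: a drive writes its own generation and one back."""
    return tape_gen >= 1 and 0 <= drive_gen - tape_gen <= 1


# How long does an LTO-4 cartridge written today stay usable in newer drives?
for drive in (4, 5, 6, 7):
    print(f"LTO-{drive} drive, LTO-4 tape: "
          f"read={can_read(drive, 4)}, write={can_write(drive, 4)}")
```

Under this rule an LTO-4 cartridge remains readable through LTO-6 drives, which at roughly three years per generation is where the 6-10 year figure comes from.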

Frankly, the issue truly cries out for tape, and Ms. Meade is glad to add her voice to those doing the crying out.
 

Question: Can Disk Replace Tape? Answer: Unobtainium

Dear Ms. Meade,
I am charged with architecting a backup system without any single points of failure. Obviously, tape is SO failure-prone that I am not including it at all. How do you think I should configure such a system?
Sincerely,
Tape is Doomed

Dear Doomed,
You are doomed if you rely solely on disk for your data backup.  A possible interpretation of your question may be “How much disk does it take to replace tape?”  The answer is “unobtainium”—that is, you can’t replace tape using disk.

Further, the very concept of single point of failure is terribly funny in a terribly dark way. Failure is inevitable, unless you plan to address human imperfection? What about acts of natural and man-made disaster that may affect the national power grid? Switch problems? What about loose screws, including any screwed-up (or self-perceived screwed over) employee?

Instead, consider asking a question that does have an answer—“How can I reliably protect data?” The answer is “disk and tape.”

Ms. Meade is a major fan of disk with RAID 6, offered in Spectra’s nTier disk. With RAID 6, up to two disks can fail without affecting data integrity. Go disk and go RAID. However, disk (even with RAID 6) can’t be considered failure-proof because it has its own Achilles’ heel (aka single point of failure): the RAID controller. You can have all the data you want on all the spinning disk you want, but if the controller fails, the brains are gone, and the bits and bytes you’ve carefully protected are toast. Whither goest the RAID controller, so goeth the data. Dead controller = permanently decomposed data. So disk alone, even with the marvels of RAID, is not enough to provide true disaster recovery and continuity of operations.

Further, please note that your information about tape as failure-prone is completely wrong. Tape is, it turns out, incredibly reliable.  With tape’s reliability increase of 700% over the last decade, multiple layers of ECC protection, and smart Spectra libraries tracking media and drive health, tape meets and beats disk in terms of reliability. If you’re worried about a single point of failure,  make sure you get two tape drives. Consider the T950 and T-Finity libraries’ global spare feature—which is an installed drive that can be directed to take over in case of a drive failure.

Ms. Meade admits that she is curious about the pointy-haired boss who directed you to create the no-single-point-of-failure unobtainium backup environment….

 

More Innovation from Spectra with BlueScale 11.0

Seems like it was just yesterday when we announced and released our BlueScale 10.6 software. You remember that one don't you, when we introduced our Global Spare drive, an industry first in tape automation? Innovative, eh... 

But if you don't, here's a link to BlueScale, Spectra's library and disk management software. BlueScale delivers a unique and feature-rich approach to tape library management, but don't take my word for it:  

http://www.spectralogic.com/index.cfm?fuseaction=products.showContentAndChildren&CatID=739&src=fly

One sure thing about Spectra is our commitment to continuous innovation. BlueScale is a critical asset for Spectra in delivering customer benefits to mature and emerging market segments.

Why do customers care? Well, if you're buying a tape library it's safe to say that your needs and workflows (not to mention data storage volume etc.) may evolve while you own the library.

With ongoing BlueScale feature innovation we help customers manage their tape automation proactively as change occurs. That's why continuous innovation is an important consideration for forward-thinking customers.

So what about this next BlueScale (11.0) release, what do you get?

In a nutshell you get more useful features to help you manage your data on tape. 

We've enhanced our media lifecycle management (MLM) capabilities with the ability to verify that a tape cartridge is ready to use - before you use it.  MLM PreScan checks the tape leader alignment and threading, and ensures that the tape can be written when your host application needs to use it. 

Rest assured that when you come into the office you won't find that your scheduled job has failed for a trivial, yet commonly experienced, reason.

MLM PostScan has also been added to ensure that your data on tape is there when you need it.  This is done by performing a background verification pass on your tapes.

MLM PreScan and MLM PostScan fit seamlessly into your operations as part of Spectra's integrated and complete media lifecycle management (MLM) capabilities.

But we're not done yet...we've now extended the easy-to-manage, proactive health status reporting capabilities of MLM to tape drives. 

Drive lifecycle management (DLM) shields customers from some of the complexities of day-to-day tape drive management with proactive, easy-to-read tape drive health status, easy-to-use tape drive verification tests and tape drive usage reports. 

Did I say easy? Did I say innovative?

We live to deliver customer benefits through innovation, watch this space...
