
Spectra Logic Backup and Recover Blog

Archive on the Rise

Gartner last month announced the results of an enterprise infrastructure survey conducted with over 1,000 large enterprises (http://www.gartner.com/it/page.jsp?id=1460213), and they make interesting reading. According to respondents, data growth is the biggest data centre hardware infrastructure challenge for large enterprises. Now, this in itself is probably not surprising: vendors, end-users and other industry analysts have been talking about this challenge for some time. The inescapable truth is that storage demands are growing, and the answer lies somewhere between provisioning greater capacity and making more efficient use of the resources available. What is particularly striking is that 62% of respondents reported that they will be investing in data archiving or retirement by the end of 2011.

From Spectra Logic’s perspective it is particularly encouraging to see data archiving and retirement projects cited by respondents as the most popular response to the challenge of data growth. Many of the conversations we had recently with end-users at SNW Europe centred on this theme. Backup is still important to customers (after all, disaster recovery will always be a key capability for IT and the wider business), but archiving is moving up the agenda, and rapidly so. Not only was archiving a hot topic of conversation on the show floor at SNW Europe, but our VP of Marketing & Product Management, Molly Rector, gave a very well received presentation entitled “Active Archive: Data Protection for the Modern Data Center”. Archiving is clearly making the transition from ‘nice to have’ to ‘business imperative’ (Gartner will have other, far cleverer terms for this, I’m sure!).

While this is great news for Spectra Logic in terms of validating our position and viewpoint, it also points to a broader trend: customers are clearly beginning to look more closely at some kind of tiering strategy and/or data categorisation. Previously, archiving and backup have often wrongly been lumped together under an all-encompassing tier sitting beneath production storage. I would hazard a guess that for a lot of end-user organisations ‘tiering’ has not got much more sophisticated than using disk for production / transactional data and tape for everything else. A number of technologies and drivers are forcing organisations to reassess this approach.

We can't overlook the rise of SSD (another hot topic at SNW) in this movement. It is becoming a viable option for enterprises, but current prices suggest that IT departments will have to carefully assess what data resides on that medium. This may be kicking off a trickle-down effect, which starts at the top and works its way down the storage hierarchy, with customers doing much closer mapping of data to storage medium and working out the best fit in terms of cost and performance.

Customers will also be looking at what data can be moved off disk altogether, and this is where archiving – specifically active archiving – comes into play. IT departments that investigate active archives will see that this approach is much less of a trade-off in terms of accessibility and performance when compared to disk than they may think. Customers will probably be shocked at just how much data they have sitting on disk which would be much more appropriately stored within an active archive setup. The data is still online and therefore still of value to the business, but on a much more cost-effective medium.

Everything points to a more sophisticated hierarchical approach to data management. Technologies like deduplication and thin-provisioning will play their part in facing up to the challenges caused by data growth, but ultimately a more radical shape-up of storage architectures is required, with active archives a new and very distinct layer.

Crying Wolf Over Data Breaches: How Active Archive Environments Can Help

Data protection is top of mind these days, specifically in light of some high-profile cases of data loss in the UK. News of some potentially impending legislation this side of the pond has again drawn attention to the issue of how companies look after customer data.

The story that caught my eye is here – and covers news that a European Commission review of data laws will require data-breach notification from a wide range of businesses. Initially this will be aimed at telcos but there are no reasons I can see why the legislation will not be extended to other businesses.
When we talk data breaches we’re often talking about firewalls, DMZs, access control, encryption technology – the standard tools and techniques used to secure data within the corporate network. However, I also think this is very much a storage story as well – specifically in terms of how customers archive sensitive data.
 
If this legislation is passed we will need to find a happy balance between vigilance and pragmatism. What we don’t need is a situation where every single potential data breach is reported, causing panic every time there is the slightest possibility of information falling into the wrong hands. This will result in a situation very much like that faced by the ‘Boy who cried wolf’. People will soon turn off, and then the legislation becomes meaningless.  We need a system whereby organisations have a measured approach to assessing the extent of any potential breach and what data may have been compromised.
If we are going to achieve this balance then companies will have to put in place the procedures and technologies to give them a very granular view of what data is stored where. Helping customers achieve this for archived data is one of the reasons why Spectra Logic became a founding member of the Active Archive Alliance. AAA has been set up to address some of the barriers which stop IT departments achieving the kind of satisfactory archiving architecture described above.
 
Much of the confusion around archiving has been caused by conflicting messages put out by vendors as well as a lack of integration between technologies at various levels of the overall archiving stack. Active archive environments are a better way to classify, manage and route data. From the point of creation, data in an active archive can be classified as sensitive (if necessary) and then managed within a framework of policies which govern where and how it should be stored, including the level of protection it should be given.
 
If Active Archives can help customers achieve these levels of granularity in the governance of archived data then we should be able to find a balance which makes this forthcoming legislation enforceable and valuable. Ideally we will get to the stage where data-breaches simply cannot happen but that is unrealistic. What should be realistic is having IT departments know exactly what data is where within their infrastructure and how it is being stored. This should ensure that we’re not inundated with ‘false-positive’ warnings and that when a company cries “Wolf!” the villagers lock their doors!

Backup, Archive, HSM - What's the Difference Anyway?

Part One of Two

One of the interesting things I have discovered since I have been talking with so many HPC customers is that the term “backup” is seldom used.  You might ask if they aren’t doing traditional backups, then why would we, a backup solutions provider,  want to talk to them. Well, first you need to fully understand the difference between backup and archive.  Archive is a word you will hear more often in the HPC and M&E environments, especially if there is data in excess of the petabyte range and large files that aren’t accessed frequently but need to be kept indefinitely. 

In this blog, which is the first of a two part series, I will provide some fundamental information that can help you differentiate backup from archive.  In the subsequent blog, part two, we will peel the covers back on the process that is different from backup and archive and similar to the traditional HSM (Hierarchical Storage Management). This information will prove to be valuable for those HPC or other data intensive customers who may claim that they don’t do backups.  Stay tuned for more on this subject later.

The differences between backup and archive:

Backup: simply refers to creating a copy of data and storing it somewhere so that it can be restored if the original version is compromised in some way.  We evangelize the concept of backups because we know, and most customers realize, that data can be accidentally deleted or corrupted, or, even worse, that a natural disaster could wipe out the entire data center.

Backup is simply safeguarding or protecting the data that is being used by duplicating that data.  This is usually done in a rotating cycle or through schedules such as: daily incrementals kept for seven days, a weekly full kept for a month, a monthly full kept for a year and a yearly full kept for seven years.  Although this process has proven effective and most of the backup applications on the market today are ideal for doing this, problems occur when multiple copies of the same data start consuming a lot more hardware than necessary, not to mention the associated costs of running and managing that hardware.
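
To put rough numbers on that rotation, here is a minimal Python sketch that counts how many copies are on hand once the schedule has been running long enough to reach a steady state. The names (`Tier`, `SCHEDULE`, `copies_retained`) are made up for illustration, and I am assuming a 30-day month and a 365-day year, so treat the result as a ballpark rather than what any particular backup application would report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One line of the rotation schedule (hypothetical structure)."""
    name: str
    kind: str            # "incremental" or "full"
    interval_days: int   # how often a copy is taken
    retention_days: int  # how long each copy is kept


# The schedule described above. Month = 30 days and year = 365 days
# are simplifying assumptions, not something a real product dictates.
SCHEDULE = [
    Tier("daily", "incremental", 1, 7),
    Tier("weekly", "full", 7, 30),
    Tier("monthly", "full", 30, 365),
    Tier("yearly", "full", 365, 7 * 365),
]


def copies_retained(tier: Tier) -> int:
    """Copies of this tier on hand once the schedule is in steady state."""
    return tier.retention_days // tier.interval_days


def full_copies_retained(schedule) -> int:
    """Total number of full copies kept across all tiers."""
    return sum(copies_retained(t) for t in schedule if t.kind == "full")


if __name__ == "__main__":
    for tier in SCHEDULE:
        print(f"{tier.name:8s} {tier.kind:12s} copies kept: {copies_retained(tier)}")
    print("full copies on hand:", full_copies_retained(SCHEDULE))
```

Under these assumptions the schedule keeps 7 incrementals plus 23 full copies (4 weekly, 12 monthly, 7 yearly). Data that never changes is carried in every one of those fulls, which is exactly where the extra hardware goes.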

With backup – think business continuity

One of the key differences when comparing backup strategies to archiving is the difficulty, with backup, of singling out select files for long-term retention.  Everything in the backup gets lumped into the large full backup at the end of the year or seven years and called an “archive”.  It may in fact be called an archive but a recovery would function more like a backup recovery, which could be very costly and time consuming.  Backup strategies are more for business continuity purposes and not necessarily for long-term archiving.

With archive – think long-term retention

Archive: The main difference between an archive and a backup is that an archive refers to a single collection of records or data that is designated for long-term retention.  When the data is moved from the production environment to the archive environment it is tagged or indexed by metadata that assists in quickly locating that particular file or chunk of data through a search mechanism.  This process and the sophisticated software that performs it make locating a single file much more efficient than it would be in a traditional backup.  An archive is generally found in a common file system structure and the determination of where the file is located is a function of the file system.  The file system may store the archived data on several different storage devices based on a number of attributes such as size, type, last accessed, etc.  This system could be a combination of expensive disk, such as Fibre Channel, less expensive disk, such as SATA or SAS, and tape.  The key is how the data is “structured.”  In most cases, the data may never be accessed again, but it is necessary to keep it for historical purposes, regulatory compliance or an unplanned event.  The goal with creating an archive is to keep it separate from the backup rotation cycle.  It is recommended that a separate copy of the archived data be made and kept in a separate location so there are at least two copies of the final archive.
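
The metadata tagging described above can be pictured with a small Python sketch. This is only a toy in-memory catalog, assuming exact-match tags; the class name `ArchiveCatalog`, the tag names and the location strings such as `tape:T001` are all hypothetical and do not describe how any real archive product stores its index.

```python
from collections import defaultdict


class ArchiveCatalog:
    """Toy metadata index: find archived files by tag, not by scanning media."""

    def __init__(self):
        self._locations = {}             # path -> where the file lives
        self._by_tag = defaultdict(set)  # (key, value) -> set of paths

    def add(self, path, location, **metadata):
        """Record where a file was archived and index it by its metadata."""
        self._locations[path] = location
        for key, value in metadata.items():
            self._by_tag[(key, value)].add(path)

    def find(self, **criteria):
        """Return {path: location} for files matching every criterion."""
        if not criteria:
            return dict(self._locations)
        candidate_sets = [self._by_tag.get(item, set()) for item in criteria.items()]
        matches = set.intersection(*candidate_sets)
        return {path: self._locations[path] for path in sorted(matches)}


if __name__ == "__main__":
    catalog = ArchiveCatalog()
    catalog.add("run42.dat", "tape:T001", project="climate", year=2009)
    catalog.add("run43.dat", "disk:sata1", project="climate", year=2010)
    print(catalog.find(project="climate", year=2009))
```

The point of the sketch is that a lookup touches only the index and returns the one location needed, whereas pulling a single file out of a year-end full backup means working through the backup set itself.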

Many environments will include both backup and archive.  Through the use of sophisticated software features that are available today, customers can establish policies that determine type, size, age, last accessed, remaining disk space and other characteristics of stored data that can automate the process of deciding whether to keep the data in the backup cycle or move it to the archive pool.
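
As a rough illustration of such a policy, the Python sketch below decides whether a file stays in the backup cycle or moves to the archive pool based on its size, days since last access and remaining disk space. Every name and threshold here (`ArchivePolicy`, 180 idle days, the 10% free-space trigger) is an assumption chosen for the example, not a default from any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    path: str
    size_bytes: int
    days_since_access: int


@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical policy knobs; all values are illustrative only."""
    min_idle_days: int = 180                  # normal idle threshold
    min_size_bytes: int = 0                   # ignore files smaller than this
    disk_free_fraction_trigger: float = 0.10  # "disk is nearly full" cutoff
    relaxed_idle_days: int = 90               # threshold when disk is nearly full


def destination(f: FileInfo, policy: ArchivePolicy, disk_free_fraction: float) -> str:
    """Return "archive" or "backup" for a file under the given policy."""
    idle_limit = policy.min_idle_days
    # When primary disk is running low, archive more aggressively.
    if disk_free_fraction < policy.disk_free_fraction_trigger:
        idle_limit = policy.relaxed_idle_days
    if f.days_since_access >= idle_limit and f.size_bytes >= policy.min_size_bytes:
        return "archive"
    return "backup"


if __name__ == "__main__":
    policy = ArchivePolicy()
    old_file = FileInfo("/data/sim/2008/run1.out", 5_000_000_000, 400)
    warm_file = FileInfo("/data/sim/2010/run9.out", 5_000_000_000, 100)
    print(destination(old_file, policy, disk_free_fraction=0.50))
    print(destination(warm_file, policy, disk_free_fraction=0.50))
    print(destination(warm_file, policy, disk_free_fraction=0.05))
```

With plenty of free disk, only the file untouched for 400 days is routed to the archive; once free space drops below the trigger, the 100-day-old file qualifies as well.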

These two functions can be performed within a single library in separate partitions.  The software can then provide notification of what tapes need to be exported based on the function that was performed on those tapes, backup or archive.  I have seen numbers as high as 80% indicating how much data is duplicated within a storage infrastructure because the differences between backup and archive aren’t fully understood.  At the end of the day, knowing the difference and the benefits of backup and archive technologies, when to use them and how to balance the two functions in an environment can drastically reduce the amount of redundancy, complexity and storage operating costs.

In Part Two of this discussion, we will look at how archives that contain production data, no matter how old or infrequently accessed, can still be retrieved online using high density and high speed tape systems and secondary disk systems.  Stay tuned for my next post which will look at enduring access to data. 

Want to talk more? I’ll be in Dearborn, Michigan at the IDC HPC User Forum and DICE Alliance 2010 events next week. Contact me at jimm@spectralogic.com.

Almost Here: Famous Days in History

November 10, 1785:  Netherlands and France sign treaty… ahhhh, storage for all.

November 10, 1801:  Kentucky outlaws dueling… No more fighting for storage!

November 10, 1919: 1st observance of National Book Week… Need lots of storage for all those books.

November 10, 1946:  Communists win many seats at French parliamentary election…  Equal storage for everybody!

November 10, 1950:  Nobel for literature awarded to William Faulkner… Bill knows literature.  We know storage!

November 10, 1954:  Lieutenant Colonel John Stapp travels 632 MPH in a rocket sled… That’s fast.  So is our storage.
 
November 10, 1969:  "Sesame Street" premieres on PBS TV… Simple.  Everybody gets it.  Just like our storage.

November 10, 1982:  IMF lends Mexico $3.8 billion due to threatened bankruptcy… Probably because they bought too much EXPENSIVE storage!

November 10, 1983:  Federal government shut down… Because they didn’t have enough storage?

November 10, 1989:  Germans begin demolishing Berlin Wall…  Achieving storage freedom!

November 10, 2009:  Spectra Logic announces something new…  More storage!  Storage for everybody!

See us tomorrow to find out what the next big thing in storage is and why you should get it.
 
 
 
November 10 dates in history courtesy of www.brainyhistory.com … Except for November 10, 2009 which is courtesy of Spectra Logic.
 

It's Coming!

One score and ten years ago, our forefather brought forth on this continent a groovy little storage company that’s about to whup some… backside.   And in this, our thirtieth year, we’ll do it with the biggest, baddest, box on the block.

Come see us November 16th at Super Computing 09 and find out how.

700 Percent Improvement in Tape Technology? Why, Yes.

Dear Ms. Meade:
I don’t see how you can talk about tape without pain. I’ve been dealing with tape for ten years, ever since we got a big library installed, and I have had nothing but problems. Tape without pain? Hah.
Signed,
You’ve Got to be Kidding

Dear Mr. Kidding,
Did you know that Model T’s were hard to start? Is that the problem you have with your car today?

For that matter, do you find that your ten-year-old computer runs too slowly and just doesn’t have enough memory?

Complaining about antique technology, such as a ten-year-old library, does seem rather silly, so you might want to update your data center and get some current tape and library technology.  The advances over the last decade in tape and its automation are substantial. For example, did you know that with LTO, technological advances have improved tape reliability* by more than 700% over the past decade?

I didn’t think you knew that. Most people don’t.

Tape has not done a very good job at advertising its own wonderfulness. (Yes, I know that I am anthropomorphizing magnetic media.) Tape and reliability are now no longer contradictory terms.  This is particularly true when you add the intelligence of automation to tape backup. For example, Spectra libraries track media health and other media secrets, giving you the inside edge on tape use and usability.

Some wonderful things have happened in the last ten years, including the widespread use of blogs and other types of social — and magnetic — media. You might want to catch up on the latest in technology.

* Beech, Debbie. “Best Practices for backup and long-term data retention” Sylvatica Whitepaper. The evolving role of disk and tape in the data center.  June 2009 

Ms. Meade E.A. Deftly Strikes Again

Dear Ms. Meade:
My tapes are defective. My supplier says that tapes are just like that and she can’t do anything about it.

I am sick of incomplete backups due to failed tapes. What should I do?
Signed,
Tape-Hater

Dear Tape-Hater:
As I understand it, your tapes are bad and you continue to buy them from whomever. Please consider that there may be some correlation between the source of the media and its quality problems. Perhaps we should examine some alternatives. One answer seems a little obvious, but I shall state it nonetheless.

Please investigate Spectra Certified Media with a lifetime guarantee*. Also, evaluate the possibility of using Spectra libraries, which tell you which tapes are failing so you can remove them from circulation before data is at risk or backups fail. That will free you to find something else to do with the time heretofore dedicated to tracking defective tape.  And perhaps you will cease to hate tape, and instead love Spectra Logic. I certainly do. (After all, they pay me.)

(*No text is complete without fine print, or in this case, italic print, so here it is: the lifetime guarantee does NOT mean your lifetime or the lifetime of the data or the lifetime of a planetary body –it means the tape’s lifetime as defined by the manufacturer and so on and so forth. For more details, please check www.SpectraLogic.com.)

 

Painless Backup is Not an Oxymoron


Painless backup is not an oxymoron. Spectra Logic has been continuously inventing features and design that promote painless data protection. These innovations have accompanied the steady improvement in tape technology—drives and media—over the last decade, during which technological advances have improved tape reliability by more than 700%. “Advances in the coating of tape film, read-after-write data verification and powerful error correction codes provide confidence in the integrity of data stored on tape. These robust tape cartridges are coupled with drive technology that features simpler tape paths and servo tracking systems to promote error-free tape handling.”1

Spectra products include both features and fundamental design that have contributed to Spectra Logic’s on-going mission to provide Tape Without Pain™.  The relevant hardware design components include:

  • Integrating connectivity to simplify installation and configuration. Benefit: easier to use/less frustrating to set up.
  • Using less data center real estate through high density/small footprint. Benefit: free data center space for other uses, reducing overhead costs specific to data protection.
  • Grouping media to reduce time spent handling individual tape cartridges—handle ten tapes at a time rather than one at a time. Benefit: much less tape juggling.
  • Designing the library modularly so customers have the option to quickly replace components rather than waiting for engineers to arrive at the site. Benefit: reduce downtime and frustration.
  • Providing a new way to scale libraries that moves existing components into a new chassis, reducing scaling costs and speeding the time it takes to scale a library. Benefit: reduce cost and save time.
  • Inventing nTier disk, which is designed specifically for secondary storage and seamless connection to tape, so that disk can handle tasks it is suited for, and tape can be dedicated to the tasks it does very well. Benefit: reduce frustration.
  • Creating a fail-over method to handle tape drive failure through global spare. Benefit: limit the effect problem drives have on data backup schedules and on data recovery.

To list just a few of the features that support tape without pain:

  • Providing  Media Lifecycle Management (MLM) that tracks tape health, letting IT staff identify failing or ailing tapes before data is at risk. Benefit: substantially increase backup/recovery effectiveness.
  • Providing a Hardware Lifecycle Management feature that monitors hardware use and recommends maintenance as appropriate. Benefit: no surprises.
  • Providing a method of submitting incidents online, so that data about the library and organization is at hand and appended to trouble-tickets when they are opened. Benefit: save time, reduce frustration.
  • Encrypting data while it is backed up and managing keys through a single interface. Benefit: simplifying encryption so IT staff can protect data at rest.
  • Giving customers the option of using remote support for troubleshooting. Benefit: reduce frustration, speed problem identification and time to resolution.

Spectra Logic's commitment over the long-term to backup without pain is supported by the release of features over the years, as shown here: 

Tape Without Pain is an achievable goal through the use of LTO tape technology and Spectra data protection products—libraries and disk.
 

1. Beech, Debbie. “Best Practices for backup and long-term data retention” Sylvatica Whitepaper. The evolving role of disk and tape in the data center. June 2009