
Spectra Logic Backup and Recover Blog

The Data Armageddon: It’s Time to Learn What You Don’t Know

When Thomas Gray inked the phrase, "where ignorance is bliss, 'tis folly to be wise," I don’t think he considered how best to manage data in our present-day data Armageddon. If you are a data manager and you adhere to the "ignorance is bliss" school of thought, I would recommend that you refresh your resume immediately!

I have spoken with too many people who have no idea of what is to come concerning the world’s exponentially growing data. Believe it or not, I talked to a person at the Supercomputing show in Seattle who said his organization is moving all of its data to disk and neglecting the inherent benefits of tape (low cost, high capacity and performance, to name a few). As his data doubles each year, which he said it does, the plan is to keep adding more disk... Really? In his case, I believe he really thinks ignorance is bliss. I offered to share with him how customers with hundreds of terabytes to hundreds of petabytes are managing data with intelligent file systems and using both tape and disk in cost-efficient ways, and he refused to listen because his ignorance has caused him to believe that "tape is dead". Granted, I don’t hear this very often anymore because the HPC community, as a whole, is paving the way for a cost-effective tape-based storage concept we will discuss later, called "Active Archive".

First, I want to address the ignorance of the individuals who have sipped the "tape is dead" Kool-Aid from certain disk vendors over the past 10 years. Growing up as a teenager in the great state of Texas, I listened to AM radio in my first pickup truck. (Yes, all it had was an AM radio!) Anyway, one of my favorite radio talk shows was Mr. Earl Pitts, who addressed controversial topics and would start sharing his straightforward opinion on them by saying (insert Texas accent) "Ya know what makes me sick, you know what makes me so angry I could spit?" or something along those lines. (http://www.youtube.com/watch?v=4DDhrRooNp4) Then he would talk about something that was usually contrary to the American way, since he was a patriot who was always watching out for our true, red-blooded American values. Well, I feel sort of like Earl when someone tells me that they think that tape is of no value, which simply shows their ignorance. I want to say "You know what makes me sick, you know what makes me so angry I could spit?"... Ignorance! He would always end his lesson on values and truth by saying "Wake up America!" Well, when someone tells me "tape is dead", I want to grab them, shake them and say "Wake up!"

 

The reality today, regarding data storage, is that it is not folly to be wise and it is not bliss to be ignorant. Wake up, Storage Admins! I have to admit that most of the people I talk to around the country at trade shows, in meetings, etc., are awake and aware of the ever-present danger of the data explosion. So, needless to say, my blood pressure stays in check and I don’t get angry as often. When I do meet the exceptions, I try to keep things in perspective and just assume that they simply don’t know what they don’t know.

 

My job, and that of my colleagues, both at Spectra and within the tape industry overall, is to educate as many people as possible about how to reduce the cost, complexity and fear of managing exponentially growing data. Spectra is leading the charge to create an awareness of how valuable tape can now be in the data center. Tape is no longer used just for backup. It was great to see so many of our HPC customers at SC11, most of whom don’t even use the terminology of “backup” any longer. As tape continued to mature over the last 10 years by getting 700% more reliable, faster and denser, many of our HPC customers started leveraging the benefits of tape in what we call an “Active Archive”. In other words, they are using tape as disk. An active archive is a combination of open-system applications, varying types of disk, and tape hardware that intelligently monitors and migrates data across multiple storage devices while maintaining fast user accessibility. Traditionally, in the backup world, one could only access tapes and the data on them through a proprietary backup application such as NetBackup, Legato, Commvault, etc. I’m not advocating that corporations discontinue backups altogether, because one should always have a “second” copy of data in the event of a disaster. However, the premise of an active archive is that all data can be online all the time.

Obviously, when someone has hundreds of terabytes or even petabytes, it is cost-prohibitive to try to keep all data online all the time in the traditional way of keeping it all on primary or secondary disk. With an active archive file system, the data can be dynamically distributed across multiple storage platforms, including disk and tape. Policies determine where data resides at any given time, and that location is transparent to the end user. Users simply have a drive letter and directory with all their files as normal. There is nothing proprietary about access to their data, anytime they need it. By extending a file system across high-performing disk, capacity disk and now tape, the need for IT intervention to retrieve an archived file is minimized, if not eliminated. This data management approach is being used by many of our HPC customers, and they are benefiting tremendously by having a searchable, compliant format to store data for the total lifecycle of a file based on policies, industry regulations and laws.
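To make the idea of policy-driven placement a little more concrete, here is a minimal Python sketch. It is only an illustration, not how any particular active archive product works: the tier names, the 30-day and 180-day thresholds, and the FileInfo and choose_tier names are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    """Minimal file description used by the hypothetical placement policy."""
    path: str
    size_bytes: int
    last_accessed: datetime


def choose_tier(f: FileInfo, now: datetime) -> str:
    """Pick a storage tier from the time since the file was last accessed.

    The thresholds below are illustrative assumptions, not vendor defaults.
    """
    idle = now - f.last_accessed
    if idle <= timedelta(days=30):
        return "performance_disk"   # recently used data stays on fast disk
    if idle <= timedelta(days=180):
        return "capacity_disk"      # cooler data moves to cheaper disk
    return "tape"                   # cold data lands on tape, still in the namespace


if __name__ == "__main__":
    now = datetime(2011, 12, 1)
    files = [
        FileInfo("/projects/run42/output.dat", 5_000_000_000, datetime(2011, 11, 20)),
        FileInfo("/projects/run17/output.dat", 5_000_000_000, datetime(2011, 6, 15)),
        FileInfo("/projects/run01/output.dat", 5_000_000_000, datetime(2010, 1, 1)),
    ]
    for f in files:
        # The user-visible path never changes; only the backing tier does.
        print(f.path, "->", choose_tier(f, now))
```

The point of the sketch is that the path a user sees stays the same while the policy decides which device actually holds the bytes. A real policy would typically weigh size, file type and remaining disk space as well.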

I could go on about the benefits of active archive or the inherent values that are characteristic of the tape technologies of today, but I would rather provide some links to more information on both so you can continue your own research and put aside any tendencies you might have to subscribe to the “ignorance is bliss” philosophy!  Tape is here to stay and is poised to solve your storage headaches today and in the future by offering greater efficiency, better reliability and maximum performance. So wake up!  Data Armageddon: tape’s got this one.

For more information on Active Archive, go to www.activearchive.com

For more information on Spectra Logic tape systems, go to www.spectralogic.com

I also welcome your emails to jimm@spectralogic.com

Airplane Talk

As I was bouncing around the country once again, I struck up a conversation with a complete stranger sitting next to me on the plane, which is my usual modus operandi. Without knowing what industry I work in, he brought up the term "high performance computing" within the first minute of our friendly exchange. Come to find out, the gentleman is a defense attorney for helicopter pilots involved in crashes.

During the boarding process, he had his phone glued to his ear as he was engaged in a serious conversation with a couple of aeronautical engineers from Harvard. The engineers were conducting structural research using multi-dimensional modeling techniques on supercomputers to help him build his case in determining why a helicopter recently crashed. It became apparent to me that supercomputers continue to proliferate in our data-driven culture, and play a role in nearly every aspect of our everyday lives.

Scientists, engineers and generally smart people continue to leverage the power of massive and distributed processors for calculation-intensive tasks such as quantum physics problems, weather forecasting, climate research, molecular modeling (computing the structures and properties of chemical compounds, biological macromolecules, polymers and crystals), and physical simulations (such as simulation of airplanes or helicopters in wind tunnels, simulation of the detonation of nuclear weapons, and research into nuclear fusion).

You might be asking, what is the significance of all of this to me, to storage and to Spectra? The way I see it, as supercomputers become more common, more and more data will continue to be created! It also raises a few questions: Where does all that data go and how can it be preserved? How can it be archived in a manner that makes it searchable and usable into the foreseeable future? As I ask those seemingly rhetorical questions, I feel sort of like the famous Sweathog, Arnold Horshack, in Welcome Back, Kotter with my hand raised high in the air saying, "Ooh-ooh-ooooh, pick me Mister Kotter!" Knowing what I know, I am ecstatic about the supercomputing revolution that we are experiencing because a large majority of the data generated, according to just about any of the more educated storage analysts you talk to, is going to be on tape. And again, knowing what I know about Spectra and our track record for growth, profitability, and more importantly innovation over the past 32 years, our name is becoming synonymous with "enterprise" tape since we have the world's most scalable and feature-rich tape system! Even though I just revealed my age with the reference to Welcome Back, Kotter, I couldn't be more excited about the continued growth of the HPC market and the subsequent growth of the data explosion as a direct result of HPC. If you can’t see the HPC market being a tremendous opportunity for continued tape growth because of the inherent characteristics of the most reliable, dense and economical media type, then "up your nose with a rubber hose!" Of course, that is a line from my favorite Sweathog, Vinnie Barbarino! Sorry if you are offended...wink

SC10 Recap and HPC Update

One of my favorite sayings, which I heard on more than one occasion and in various words at SC10 this year, was “It is good to be Spectra!” I heard this from customers, partners, VARs, analysts and other employees, and it is certainly true. The tape war is on and by all accounts, Spectra is winning the war one battle at a time. I heard so many times from so many people that we seem to be the dominant tape company in the market. Tape is who we are. It is what we do. After 31 years of being in business, these truths are starting to manifest themselves in our continued success as a tape company. The past few years, we have been experiencing much of our success and growth in the High Performance Computing industry not because of tape itself, but because of what we can do with tape. Spectra has committed to continued research and development by investing millions into new features and functionality that are a direct result of our HPC customer requirements. We are therefore able to offer direct benefit and value to them.

SC10 continues to be one of Spectra’s best shows of the year.  This year’s event was no exception.  In addition to learning more about current trends in the HPC market in general, our Spectra executives and team were able to meet with hundreds of our HPC customers and potential customers.

As usual, Spectra was at SC10 in full force. We had a very nice and large booth with a great location. In the booth, we showcased a 5-frame T-Finity, T950, T380 and nTier system. We also partnered with IBM HPSS and had a demo system of their software in the booth, which brought a lot of attention and interest. Fortunately for us, the competition was nowhere to be found. I was surprised that there were no other direct competitors at the event with any library systems in their booths. IDC research shows that the high-end HPC market grew by 65% in 2009 and is continuing on that path for many years to come. Storage contributed to the largest percentage of that growth at almost 10% to just over 3 billion! In addition to that, they cite that some of the major data center challenges are power, cooling, real estate and system management. Storage and data management continue to grow in importance. All these factors and challenges are continuing to increase data centers’ usage of tape, not just for backup, but for nearline data archiving. Through Spectra’s innovations, continued product development and commitment to customer service and support, we will continue to gain market share from the competition and continue to validate ourselves as the market leader.

In the spirit of continuing to be the leader in tape storage, Spectra has a roadmap a mile wide and 2 miles deep. We continue to innovate, create and incorporate significant enhancements into our products in order to give our customers bigger, better and faster systems that meet their growing demands for performance and scalability. Much of our continued focus on development is enhancing our T-Finity platform. By quarter two of next year, we will incorporate the IBM Enterprise TS1130 drive in the T-Finity. Many of our HPC customers have been anxiously awaiting this feature and we anticipate that the ability to include both LTO and Jaguar drives in our system will be a significant competitive advantage. In addition to this, we are continuing development of our “fly-over” feature that will interconnect multiple 25-frame T-Finity systems, allowing scalability that surpasses all other systems on the market today by a long shot!

Our marketing efforts this year were second to none.  We really pushed the active archive message and there was certainly a lot of buzz at the show about it. 

There were analyst meetings with the usual suspects such as IDC and Intersect 360.  They always want to spend time with us to find out what we are up to and it is always good to spend time with them to find out what the market is up to based on their research and findings.  Leigh Grainger always does an outstanding job arranging these important meetings and interviews.

Molly Rector’s speech on “Tape Takes on Mass Storage” had about 120 people in attendance, which was very obvious when our booth began buzzing with activity following her presentation. 

In Summary

The HPC industry is continuing to grow and data storage is its fastest-growing component. We have customers that are already planning on having an exabyte of data by 2017. This means that they will continue to double their data storage each year along with other HPC customers. This explosive and exponential growth is going to demand a large-scale system that can accommodate and manage more data every year in a secure, energy-efficient, reliable and space-conserving manner. Spectra understands the growing pains of storage and we are keenly aware of the associated challenges. This is why we continue to develop products that address the challenges and meet the demanding requirements for data storage.
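For a rough feel of what yearly doubling means, the arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: decimal units (1 exabyte = 1,000 petabytes), a clean doubling every year, and 2010 as the starting year (the year of SC10). The function names are made up for this example, and no customer's actual starting capacity is implied.

```python
PB_PER_EB = 1000.0  # assumption: decimal units, 1 exabyte = 1,000 petabytes


def years_to_reach(start_pb: float, target_pb: float = PB_PER_EB) -> int:
    """Count the yearly doublings needed to grow from start_pb to at least target_pb."""
    if start_pb <= 0:
        raise ValueError("start_pb must be positive")
    years = 0
    size = start_pb
    while size < target_pb:
        size *= 2   # one year of growth under the doubling assumption
        years += 1
    return years


def required_start_pb(years: int, target_pb: float = PB_PER_EB) -> float:
    """Starting size that reaches target_pb after the given number of doublings."""
    return target_pb / (2 ** years)


if __name__ == "__main__":
    doublings = 2017 - 2010  # seven yearly doublings between the assumed dates
    print("Growth factor over", doublings, "years:", 2 ** doublings)
    print("Starting size needed (PB):", required_start_pb(doublings))
    print("Years for 10 PB to pass 1 EB:", years_to_reach(10.0))
```

Seven doublings multiply capacity by 128, so under these assumptions a site would need a little under 8 PB in 2010 to be at an exabyte in 2017.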

SC10 in New Orleans:

Best show ever!

Best booth ever!

Best team ever!

Most opportunities ever!

This will, once again, be our best year ever!

Backup, Archive, HSM - What's the Difference Anyway?

Part One of Two

One of the interesting things I have discovered since I have been talking with so many HPC customers is that the term “backup” is seldom used. You might ask: if they aren’t doing traditional backups, then why would we, a backup solutions provider, want to talk to them? Well, first you need to fully understand the difference between backup and archive. Archive is a word you will hear more often in the HPC and M&E environments, especially if there is data in excess of the petabyte range and large files that aren’t accessed frequently but need to be kept indefinitely.

In this blog, which is the first of a two part series, I will provide some fundamental information that can help you differentiate backup from archive.  In the subsequent blog, part two, we will peel the covers back on the process that is different from backup and archive and similar to the traditional HSM (Hierarchical Storage Management). This information will prove to be valuable for those HPC or other data intensive customers who may claim that they don’t do backups.  Stay tuned for more on this subject later.

The differences between backup and archive:

Backup: simply refers to creating a copy of data and storing it somewhere for restoration in the event the original version of the data is compromised in some way. We evangelize the concept of backups because we know, and most customers realize, that data can accidentally be deleted, corruption or data loss could occur, or even worse, a natural disaster could wipe out the entire data center.

Backup is simply safeguarding or protecting the data that is being used by duplicating that data. This is usually done in a rotating cycle or through schedules including daily incrementals, which are kept for seven days; a weekly full kept for a month; a monthly full kept for a year; and a yearly full kept for seven years. Although this process has proven effective and most of the backup applications on the market today are ideal for doing this, problems occur when you start having multiple copies of the same data consuming a lot more hardware than necessary, not to mention the associated costs of running and managing that hardware.
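To see how quickly those copies pile up, here is a back-of-the-envelope Python sketch of the rotation described above. The assumptions are mine and deliberately simple: four weekly fulls per month, every full the same size as the primary data, a hypothetical 5% daily change rate for incrementals, and no deduplication or compression. The names SCHEDULE and retained_capacity_tb are made up for the example.

```python
# Copies retained at steady state under the rotation described above.
# "weekly_full: 4" assumes roughly four weeks in a month.
SCHEDULE = {
    "daily_incremental": 7,   # daily incrementals kept for seven days
    "weekly_full": 4,         # weekly fulls kept for a month
    "monthly_full": 12,       # monthly fulls kept for a year
    "yearly_full": 7,         # yearly fulls kept for seven years
}


def full_copies(schedule: dict = SCHEDULE) -> int:
    """Number of full backups retained at steady state."""
    return sum(count for name, count in schedule.items() if name.endswith("_full"))


def retained_capacity_tb(primary_tb: float,
                         daily_change_rate: float = 0.05,
                         schedule: dict = SCHEDULE) -> float:
    """Estimate backup capacity in TB, assuming no dedup or compression.

    daily_change_rate is a hypothetical fraction of primary data that
    changes each day and therefore lands in each incremental.
    """
    fulls_tb = full_copies(schedule) * primary_tb
    incrementals_tb = schedule["daily_incremental"] * primary_tb * daily_change_rate
    return fulls_tb + incrementals_tb


if __name__ == "__main__":
    primary = 100.0  # TB of primary data, chosen only for illustration
    print("Full copies retained:", full_copies())
    print("Estimated backup capacity (TB):", retained_capacity_tb(primary))
```

Under these assumptions the schedule keeps 23 full copies, so backup capacity runs to more than twenty times the primary data set, which is the duplication problem the paragraph above describes.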

With backup – think business continuity

One of the key differences when comparing backup strategies to archiving is the difficulty of singling out select files for long-term retention. Everything in the backup gets lumped into the large full backup at the end of the year or seven years and called an “archive”. It may in fact be called an archive, but a recovery would function more like a backup recovery, which could be very costly and time-consuming. Backup strategies are more for business continuity purposes and not necessarily for long-term archiving.

With archive – think long-term retention

Archive: The main difference between an archive and a backup is that an archive refers to a single collection of records or data that is designated for long-term retention. When the data is moved from the production environment to the archive environment, it is tagged or indexed with metadata that assists in quickly locating that particular file or chunk of data through a search mechanism. This process and the sophisticated software that performs it make locating a single file much more efficient than it would be in a traditional backup. An archive is generally found in a common file system structure, and the determination of where the file is located is a function of the file system. The file system may have several different storage devices that the archived data is stored on based on a number of attributes such as size, type, last accessed, etc. This system could be a combination of expensive disk, such as Fibre Channel; less expensive disk, such as SATA or SAS; and tape. The key is how the data is “structured.” In most cases, the data may never be accessed again, but it is necessary to keep it for historical purposes, regulatory compliance or an unplanned event. The goal with creating an archive is to keep it separate from the backup rotation cycle. It is recommended that a separate copy of the archived data be made and kept in a separate location so there are at least two copies of the final archive.
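The role of metadata in an archive can be illustrated with a toy Python catalog. This is a sketch under my own assumptions, not a description of any real archive software: the ArchiveRecord and ArchiveCatalog names, the tag keys and the location strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArchiveRecord:
    """One archived file plus the metadata it was tagged with at ingest."""
    path: str
    location: str   # hypothetical device label, e.g. "tape:000123"
    tags: dict      # metadata such as project, type or year


class ArchiveCatalog:
    """Toy metadata index: maps (key, value) pairs to the records carrying them."""

    def __init__(self):
        self._records = []
        self._index = {}

    def add(self, record: ArchiveRecord) -> None:
        """Register a record and index every one of its metadata tags."""
        self._records.append(record)
        for key, value in record.tags.items():
            self._index.setdefault((key, value), []).append(record)

    def find(self, **criteria) -> list:
        """Return the records that match every key=value criterion given."""
        if not criteria:
            return list(self._records)
        matching_ids = None
        for item in criteria.items():
            ids = {id(r) for r in self._index.get(item, [])}
            matching_ids = ids if matching_ids is None else matching_ids & ids
        return [r for r in self._records if id(r) in matching_ids]


if __name__ == "__main__":
    catalog = ArchiveCatalog()
    catalog.add(ArchiveRecord("/archive/sim/run01.dat", "tape:000123",
                              {"project": "rotor", "year": 2009}))
    catalog.add(ArchiveRecord("/archive/sim/run02.dat", "sata:pool2",
                              {"project": "rotor", "year": 2010}))
    catalog.add(ArchiveRecord("/archive/docs/report.pdf", "tape:000124",
                              {"project": "audit", "year": 2010}))
    for rec in catalog.find(project="rotor", year=2010):
        # The search goes through the index; no media has to be scanned.
        print(rec.path, "is on", rec.location)
```

The contrast with a backup is that the lookup goes through the index and straight to one file, instead of restoring a large backup set to find it.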

Many environments will include both backup and archive.  Through the use of sophisticated software features that are available today, customers can establish policies that determine type, size, age, last accessed, remaining disk space and other characteristics of stored data that can automate the process of deciding whether to keep the data in the backup cycle or move it to the archive pool.

These two functions can be performed within a single library in separate partitions. The software can then provide notification of what tapes need to be exported based on the function that was performed on those tapes, backup or archive. I have seen numbers as high as 80% indicating how much data is duplicated within a storage infrastructure because the differences between backup and archive aren’t fully understood. At the end of the day, knowing the difference and the benefits of backup and archive technologies, when to use them and how to balance the two functions in an environment can drastically reduce the amount of redundancy, complexity and storage operating costs.

In Part Two of this discussion, we will look at how archives that contain production data, no matter how old or infrequently accessed, can still be retrieved online using high density and high speed tape systems and secondary disk systems.  Stay tuned for my next post which will look at enduring access to data. 

Want to talk more? I’ll be in Dearborn, Michigan at the IDC HPC User Forum and DICE Alliance 2010 events next week. Contact me at jimm@spectralogic.com.