ESG REPORT
Data De-duplication and Disk-to-Disk Backup Systems
Part II: Business Considerations
By Heidi Biggar January, 2008
Copyright
2008, The Enterprise Strategy Group, Inc. All Rights Reserved.
Table of Contents
Introduction
Business Considerations
Hard Dollars
Value of Increased Retention
Value of Operational Efficiencies
Time to Protection
Total Cost of Recovery
Questions to Ask Vendors
ESG View
Appendix
Market Conditions
Data De-Duplication Defined
All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of the Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at (508)482-0188.
Introduction
There is no question that data de-duplication is a game-changing technology that fundamentally alters the economics of disk backup, making it an even more compelling option than tape for data protection. However, there are some important technological and business issues organizations should consider when evaluating products to ensure a “best fit” today and in the future. This paper, the second in the two-part ESG Series “Data De-duplication and Disk-to-Disk Backup Systems,” examines the business considerations of implementing de-duplication, including the key components of a data de-duplication ROI analysis and a list of questions every organization should ask potential vendors. For information on the technology considerations, please refer to Part I of this Series.
Business Considerations
One of the greatest qualities of data de-duplication is that its value is easy to quantify. If you can reduce the amount of capacity needed to store backup data by 10:1, 30:1, or greater, all you have to do is pull out your calculator and put a dollar amount to the cost-savings. However, while these numbers can be significant and may be enough for some organizations to move forward, they only tell part of the data de-duplication cost-savings story. A complete ROI analysis should include both the hard and soft cost-savings of deploying de-duplication. In fact, the soft costs alone—the value of increased retention, operational efficiencies and time to protection—can be very compelling.
Hard Dollars
The hard dollar costs are easy to determine. The first metric is the difference in capital cost between a D2D backup solution with and without data de-duplication. A D2D backup solution without de-duplication will require more physical capacity to store backups, in some cases 20 or more times what the data de-duplication-enabled solution needs. There are other capital cost savings as well. D2D backup solutions can reduce the amount of tape infrastructure you acquire. Some end-users have totally eliminated tape, while others have reduced the number of tape libraries they maintain. If you want to perform remote replication between your primary and remote sites, then data de-duplication can significantly reduce your WAN bandwidth costs. Because less data is ultimately stored and transmitted, you can replicate data over long distances with far less bandwidth. Since WAN bandwidth is still expensive and a recurring cost, data de-duplication can significantly improve the economics of implementing remote backup and disaster recovery.
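The capacity side of this comparison is simple arithmetic, and a minimal Python sketch can make it concrete. The function names, the 200 TB of retained backup data and the $1,000-per-TB disk price are hypothetical inputs chosen for illustration; they are not ESG figures. The sketch also ignores any price difference between systems with and without de-duplication.

```python
def physical_capacity_tb(logical_backup_tb: float, dedup_ratio: float) -> float:
    """Physical disk needed to hold logical_backup_tb at a dedup_ratio:1 reduction."""
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    return logical_backup_tb / dedup_ratio


def capital_savings(logical_backup_tb: float, dedup_ratio: float, cost_per_tb: float) -> float:
    """Disk cost avoided by de-duplication, assuming the same price per TB either way."""
    cost_without = logical_backup_tb * cost_per_tb
    cost_with = physical_capacity_tb(logical_backup_tb, dedup_ratio) * cost_per_tb
    return cost_without - cost_with


# Hypothetical inputs: 200 TB of retained backups, 20:1 reduction, $1,000 per TB.
print(physical_capacity_tb(200, 20))   # 10.0
print(capital_savings(200, 20, 1000))  # 190000.0
```

A fuller analysis would add tape infrastructure, WAN bandwidth and facility costs as separate terms with your own numbers.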
Note: For more information about data de-duplication and how it works, please see the Appendix in this paper.
FIGURE 1. REMOTE REPLICATION AND DATA DE-DUPLICATION
Source: Enterprise Strategy Group, 2008
It is also important to consider facility costs, which include power and cooling as well as floor space. Since you are using fewer disks, you are creating less heat and drawing less power. Again, at a 20:1 capacity reduction ratio, this can mean significant savings. In some cases, there are data centers that just can’t use up any more power because they are at or near their maximum limits. These companies should certainly evaluate data de-duplication-enabled solutions. Additionally, floor space is at a premium. Data de-duplication-enabled solutions can reduce vertical growth by minimizing the amount of shelf space needed to store backup data. Users can eliminate some or all of their tape libraries by moving to disk, which will free up floor space. Data de-duplication not only enables you to use less capacity to store backup data; it also reduces the processing power, bandwidth and memory required per GB of backup data. This affects all of the aforementioned factors, which is what makes data de-duplication-enabled solutions easy to cost-justify.
Value of Increased Retention
ESG has found that the majority of end-users still use tape backup as their main method of disaster recovery. However, we’ve also found that end-users consider the process of recovering data from tape to be slow, complex and unreliable. These two realities are clearly at odds with one another. Recovering a single file from tape can take several minutes, whereas recovering data from disk is instantaneous. Multiply this by dozens, hundreds and even thousands of files and the performance difference can be several hours, days and even weeks. Consider database tables that span multiple tapes and the process of trying to recover this information quickly. Consider the process of tape interleaving, which improves tape backup performance, but impacts restore performance because server data is spread randomly across the tapes. Additionally, recovery performance is greatly impacted by tape availability—whether it is within the library or offsite in a box somewhere far away. The fact that end-users are unsure whether they can actually recover 100% of their data from tape is another harsh reality. The very purpose of backing up your data is so you can recover if needed. Backing up data onto
disk resolves recovery and reliability issues. Data de-duplication increases the amount of backup data that you can retain and extends the retention period. In effect, by using data de-duplication-enabled D2D backup solutions, you can eliminate the need for ever having to restore from tape again. That should be the objective of every IT organization: removing the slow, error-prone, high-touch tape process and replacing it with modern solutions that provide fast, reliable and automated protection. Data de-duplication-enabled D2D backup solutions eliminate the risks and inefficiencies inherent in recovering from tape, and they meet your true data recovery requirements without the compromises you’ve come to accept with tape.

The cost impact of being able to rapidly and reliably recover data is harder to quantify than capital cost savings, but the implications of failing to do so range from inconvenience to complete data loss. Perhaps one of your employees had to wait a few hours to recover a lost file they were working on. If that file is unrecoverable, there is a price to be paid. In addition, what if that file contained valuable intellectual property that will be extremely difficult to recreate? What if there was specific litigation or an audit that required information within that file? Perhaps there was important information contained within that document that impacted a major business transaction or valuable research. These are the considerations that must be weighed against using outdated and archaic forms of data protection, especially when there are best-in-class D2D backup solutions in existence that address these issues without breaking your budget.
Value of Operational Efficiencies
Just minimizing, not even removing, the manual tasks of managing tape offers an immediate positive impact on productivity. This may be manifested in savings of time previously dedicated to the day-to-day issues of managing the tape rotation process as well as any frantic scrambles to recover data in an emergency.

D2D backup solutions are often used as a complement to tape. Companies often reduce the frequency of backups they perform to tape to once a week or even once a month, while daily backups are sent to disk. In some cases, D2D backup solutions may replace tape systems. ESG has found that a growing number of companies are actually considering this. Much of this will be contingent on the best practices and governance of the company or organization. In some cases, removing tape systems is not an option because of regulations. However, for those companies that are not encumbered by these issues, tape removal is very attractive.

The question arises: how do I protect my data from a major site disaster? If you are currently shipping tapes offsite as your main DR process, then you may need to consider implementing remote replication. Many D2D backup solutions support remote replication to another system at a remote site. As discussed previously, data de-duplication-enabled solutions can do this quickly and cost effectively.

Recovering data from disk is instant; recovering from tape is not. The time to recovery is what is really important. Typically, data recovery is an urgent issue. It could take hours, days or even weeks to recover data fully from tape. This is becoming increasingly unacceptable and, thanks to current technology developments, it is no longer necessary to tolerate it. Data de-duplication-enabled D2D backup solutions allow end-users to retain data for longer periods of time, reducing and potentially eliminating the need to ever recover data from tape again. The end result is faster and more reliable recoveries of data.
Data de-duplication-enabled solutions provide easier management than D2D backup solutions that don’t provide capacity optimization. Since traditional D2D backup solutions require more capacity, the process of managing those systems is inherently more complex. Capacity utilization will have to be monitored more often, backup data will need to be removed or new capacity added more frequently and you will still need to rely heavily on tape for recoveries. In many cases, operational costs outweigh capital costs. More importantly, there are always more projects that need IT personnel’s attention. By removing the mundane and time-consuming process of managing tape, your team can focus on more important pursuits that help the business.
Time to Protection
Time to Protection measures how quickly you can get your data protected. A key value of data de-duplication is that it is easy. End-users don’t have to perform Herculean tasks to get the value out of data de-duplication-enabled solutions. Data de-duplication should be invisible to the backup and recovery process. If it isn’t, then you need to re-evaluate the data de-duplication-enabled solution and consider another avenue. One of the big advantages of data de-duplication-enabled solutions is the ability to replicate data with less bandwidth. This not only reduces cost, but also allows you to transfer and protect data much more quickly. If you had to send all of your backup data over the WAN, it could take several hours or even days. However, with data de-duplication-enabled solutions, the process should be several times faster. Thus, Time to Protection is shorter, and replicated data reaches the safety of the remote site sooner than with approaches that do not use data de-duplication.
Data De-Duplication ROI Analysis
• Disk and Tape Cost Reduction
• Reduced Bandwidth Requirements
• Lower Power and Cooling Consumption
• Smaller Floor Space Footprint
• Reliable Data Recoveries
• Fast Recovery of Data
• Lower Operational Cost – Less Media Handling
• Time to Protection
• Lower Total Cost of Recovery
It isn’t just a matter of how quickly you can protect data: data de-duplication-enabled solutions can enable a level of data protection that isn’t otherwise practical. ESG spoke with an end-user that implemented remote backup from Boston to Los Angeles using a data de-duplication-enabled solution. He said that without data de-duplication, these long-distance backups would be too costly and would take too much time to perform.
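To see why bandwidth reduction shortens Time to Protection, a rough transfer-time estimate helps. The following Python sketch rests on simplifying assumptions: the function name, the 1,000 GB backup and the 100 Mbps link are hypothetical, the link is assumed to be fully and steadily utilized, and the de-duplication ratio is assumed to apply directly to the data sent over the WAN.

```python
def transfer_hours(data_gb: float, link_mbps: float, dedup_ratio: float = 1.0) -> float:
    """Hours to replicate data_gb over a link_mbps WAN link at dedup_ratio:1."""
    if link_mbps <= 0 or dedup_ratio <= 0:
        raise ValueError("link_mbps and dedup_ratio must be positive")
    bits_to_send = (data_gb / dedup_ratio) * 8 * 1000**3   # decimal GB -> bits
    seconds = bits_to_send / (link_mbps * 1000**2)         # Mbps -> bits per second
    return seconds / 3600


# Hypothetical: 1,000 GB of backup data over a 100 Mbps link.
print(round(transfer_hours(1000, 100), 1))       # 22.2
print(round(transfer_hours(1000, 100, 20), 1))   # 1.1
```

Real replication times also depend on latency, protocol overhead and how much of the data is new to the remote site, so treat the result as an upper-level estimate only.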
Total Cost of Recovery
When you add all of these elements together, the cost of recovery using tape or traditional D2D backup solutions compared to data de-duplication-enabled D2D solutions is about as much of a “no-brainer” as you can get in the data center. In summary, the following are the cost savings enabled by data de-duplication:
• Reduced disk capacity for data protection
• Potentially fewer D2D backup storage systems over time
• Fewer tapes or potential elimination of tapes
• Fewer tape libraries or potential elimination of tape libraries
• Reduced power and cooling costs
• More available floor space based on fewer D2D backup and tape systems
• Reduced WAN bandwidth costs
• More reliable data recoveries
• Faster data recoveries
• Fewer people-hours spent on tape and disk administration
• All of the above for each site
The Total Cost of Recovery (TCR) for data de-duplication-enabled D2D backup solutions is clearly far less than tape or D2D backup solutions that do not support capacity optimization.
Questions to Ask Vendors
1. How many customers do you have using your product in production environments today?
The number of customers is important for understanding whether the market is adopting a technology. From a product value perspective, quantity is quality. If a vendor has only 10 customers after 5 years in the market, this is a red flag. If it has hundreds or thousands of customers, then you have market validation. For newer solutions there will be fewer implementations, which is why it’s important to get customer references.

2. Can you provide us a cost saving analysis from companies similar to ours? Please include capital, operational and facilities cost savings.
Vendors often talk about value, but hardly ever show you real numbers. Data de-duplication is easy to quantify, so ask the vendors to provide you with real data. This will help you better understand what cost savings you might obtain by using their products. Having more than one data point is important as well, since there are multiple variables to consider.

3. Can you provide us with some existing customers that we can talk to about working with you and your products?
Talking to other users is always valuable. They can give you insight as to what to expect when you deploy a vendor’s solution. Of course, the vendor will only recommend a happy customer, but that customer will still share a real-life perspective with you.

4. How disruptive will your product be to our environment?
Implementing a new solution that provides real value to your company is always desirable, but at what cost? You need to understand if this new innovative solution will be overly disruptive to your environment.

5. How many hours a week does it take to support your solution?
If the solution is complex and requires a great deal of manual management, then you need to consider whether you have the resources to support it. On the other hand, the solution may require little management, but either way, it’s important to find out. Additionally, ask for this data based on what their current customers are experiencing. Also ask about training: is it required or recommended? If the answer is yes, then that is a red flag. If the product is so easy, why do we need training?

6. What else does the vendor have to offer?
Vendor selection should play a role in the decision-making process. It is important to understand the vendor’s business success and long-term viability, their support capability, how well they communicate with you and what other services or products they could offer to you today and over time. You should also consider positive existing relationships with the vendor and/or system integrator.
ESG View
Disasters will happen. They can range from a file being lost, to a data center being flooded, to an entire building being destroyed. Some of these incidents are common ones, such as file loss or data corruption, systems and infrastructure going down and becoming unavailable, disk drives failing, and user mistakes (remember, humans invented human error). Then there are the somewhat common incidents, including facility disasters involving flooding or fire. Even though they are less likely, there may be major natural disasters to contend with including earthquakes, tornadoes, and hurricanes. And there have been a few recent incidents of large geographic blackouts that take hours or even days to correct.

The fact that most companies and organizations still use tape as their primary defense against these events is troubling. Once, there was an economic rationale for using it, but data de-duplication-enabled solutions invalidate this. There are end-users that are mandated to use tape for governance and regulatory reasons, but they can use data de-duplication-enabled solutions to augment their environments. Companies not so encumbered should certainly consider data de-duplication to complement and even replace their tape systems.

Data de-duplication is a powerful form of virtualization: the ability to logically view and manage physical assets for greater utilization and automation of otherwise manual tasks. Data de-duplication achieves both of these goals by significantly reducing the amount of capacity required to store backup data (5:1, 10:1, 20:1 and beyond). Additionally, data de-duplication reduces or even eliminates the need to manage tapes. Dealing with tape media management is archaic in this digital age. It is analogous to someone still stubbornly hand washing the dishes even though he or she has a dishwasher right next to the sink. Tape will be around for some time to come.
There are still governance and regulatory mandates that ensure its survival. Additionally, incumbency often trumps innovation and the most common change management policy is to not change anything. There is still a great deal of education that also needs to occur. Too few people know about data de-duplication or are aware of its abilities. Data de-duplication is very real and provides excellent value. ESG believes that it will become prevalent over time within D2D backup and all storage and application tiers. However, it is important to not only evaluate data de-duplication capabilities, but the entire product, customer references, market and company success. ESG encourages you to ask the questions outlined in this report in order to leverage the benefits that can certainly be derived by data de-duplication. Data de-duplication changes the data protection landscape and is one of the few categories that offer such a clear “no-brainer” value proposition.
Appendix
Market Conditions
Disk-to-disk (D2D) backup, combined with data de-duplication, is an emerging category within the data protection ecosystem that ESG believes has the potential to change the entire landscape. D2D backup solutions with data de-duplication minimize the disk and/or the bandwidth capacities required to store and move data used for protection purposes. Data de-duplication solutions optimize physical storage and bandwidth by using less of each to protect your data.

Why use less? Perhaps the first and most obvious answer is to reduce cost. By reducing capacity requirements, fewer disks are needed to store the same amount of effective data. It also means that less bandwidth is required to move and copy that data across the WAN. Beyond these cost reductions, there is perhaps an even more important reason to employ data de-duplication. By reducing the amount of storage and bandwidth required to protect data locally and remotely, organizations can significantly improve their levels of data protection and their ability to recover data quickly, reliably and cost effectively. Reducing the cost of the storage required for backup data in turn enables greater data protection and recoverability.

For years, there has been a considerable disparity between the prices of tape and disk-based storage systems. As such, it was an economic “no-brainer” to store backups on tape. In fact, the cost delta between tape and disk was so dramatic that despite the inherent weaknesses of tape, which include complexity, unreliability and slow performance, it is still the preferred medium for storing backup data today.
FIGURE 2. DISK-TO-DISK ADOPTION, US-BASED RATES
Has your organization implemented or plan on deploying a purpose-built disk-to-disk backup solution within the next 12 to 24 months? (Percent of respondents, N = 163)
No, 36%
Yes, 64%
Source: Enterprise Strategy Group, 2006
A major market shift occurred when storage system vendors began supporting low cost, high-density ATA drives and the cost delta between disk and tape started to shrink significantly. Although the capital cost-savings still favored tape, the gap narrowed to a point where the operational impact of tape—including cost of management, unreliability issues and performance—had finally moved the value dial from tape to disk for many end-users. The market responded, and the disk-to-disk (D2D) backup market was born. At first, end-users performed backups to lower cost drives within their existing primary storage systems and this is still a popular process. Additionally, the development of new purpose-built solutions, such as D2D appliances and virtual tape libraries (VTL), created an
entirely new market category. As shown in Figure 2, a recent survey conducted by ESG found that 64% of all respondents either have implemented or intend to implement a purpose-built D2D backup solution. This is a strong validation that these solutions are either replacing or complementing tape libraries. The reasons end-users are embracing D2D backup solutions include improved backup performance, elimination of tape media management issues, scalability, ease of management and cost.

Another, and possibly the most important, advancement in D2D backup is data de-duplication. Our research found that 33% of all respondents consider data de-duplication an important capability in their D2D backup solution. ESG believes that this is an especially large percentage based on the fact that data de-duplication is an emerging technology still requiring a great deal of education and awareness.

Data de-duplication’s value is even more compelling above and beyond the use of high density SATA drives within disk-based storage systems. End-users employing D2D backup solutions with data de-duplication are experiencing backup data capacity reductions of 10, 20 and 30 times, and possibly even more [2]. Consider the economic value of this level of reduction: it not only eliminates any delta between the capital costs of tape versus disk, but arguably swings the pendulum to the other side in disk’s favor. Add to this the operational efficiency, rapid and reliable recoveries and the elimination or reduction in tape management enabled by D2D backup solutions and you’ve got a compelling and evident value proposition. Data de-duplication is a game-changing technology.
It enables D2D backup by lowering the overall cost of these types of solutions. De-duplication reduces the amount of redundant data that is backed up, which results in less capacity required to store that data. Additionally, companies can retain more backup data on disk for longer periods of time, which reduces and potentially eliminates the need to recover data from tape. Where replication is supported, data can be more efficiently and cost-effectively moved between sites for disaster recovery. Data de-duplication offers landscape changing value that is easy to quantify, improves reliability, simplifies management and provides rapid recovery of data.
Data De-Duplication Defined
Though the technology behind it can be quite sophisticated, the concept of data de-duplication is simple. Data de-duplication is the process of examining data to identify any redundancy. In the context of backup data, we can make a strong supposition that there is a great deal of duplicate data. The same data keeps getting backed up over and over again, consuming more storage space and impacting cost, thereby creating a chain of inefficiency. The following example, though simple, illustrates the potential power of de-duplication: Let’s say that a 2 MB image has been embedded in a Word document and e-mailed to dozens of people. Ten of the people who receive that document take that image and embed it in other documents. In fact, the image is proliferated throughout the organization to the point where it has been embedded in 200 other documents. This consumes 400 MB of additional capacity. With data de-duplication, only one copy of the image is stored, saving the 400 MB that would otherwise be consumed. Now consider a 400 MB file that has been sent to multiple users. There might be ten full weekly backup copies of that 400 MB file, resulting in 4,000 MB (4 GB) of consumed storage. Reducing this to just the one unique copy is significant: only 400 MB is stored, saving 3,600 MB (3.6 GB) of the 4 GB that would otherwise be required to back up that same file multiple times.
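The arithmetic in these two examples can be checked with a small Python sketch. It is an illustration under stated assumptions, not a description of how any product works: the image is modeled as a standalone 2 MB object identified by a SHA-256 content hash, and the names dedup_totals and saved_mb are hypothetical. Note that keeping one unique copy of ten 400 MB backups leaves 400 MB stored, so the saving is 3,600 MB.

```python
import hashlib

MB = 1024 * 1024


def dedup_totals(blobs):
    """Return (logical_bytes, physical_bytes) when identical blobs are stored once."""
    unique = {}   # SHA-256 digest -> size of the single stored copy
    logical = 0
    for blob in blobs:
        logical += len(blob)
        unique.setdefault(hashlib.sha256(blob).digest(), len(blob))
    return logical, sum(unique.values())


def saved_mb(size_mb, copies):
    """Capacity saved when `copies` identical items shrink to one stored copy."""
    return size_mb * (copies - 1)


image = bytes(2 * MB)                              # stand-in for the 2 MB image
logical, physical = dedup_totals([image] * 201)    # original plus 200 embedded copies
print((logical - physical) // MB)                  # 400
print(saved_mb(400, 10))                           # 3600
```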
[2] Data de-duplication ratios will vary based on the backup data (amount of redundancy and data change rates), the backup policy (frequency of incremental and full backups) and the data de-duplication technology (size of data files/chunks/segments used).
Another example of data de-duplication’s value, this time at the file level, involves a PowerPoint presentation attached to an e-mail. If the e-mail is sent to multiple recipients and then forwarded to yet another set of recipients, data de-duplication technology can be used to store the presentation only once. Next, consider what happens when one of the e-mail recipients modifies a slide in the presentation and again forwards it to a group of colleagues. Advanced data de-duplication algorithms work at the sub-file level and can be used to store only the data associated with the changed slide. These examples include block or sub-block data de-duplication. This method works much like file-level de-duplication, but identifies common data in “chunks” or “blocks” that are smaller than a file. This method is typically implemented in purpose-built solutions that are dedicated to finding and eliminating duplicate data within a file.

What does all this mean in real-life terms? Through hands-on testing, ESG has found that data de-duplication technologies can provide 10 times, 20 times, 30 times and even greater reduction in capacity needed for backup. This means that companies can store 10 TB to 30 TB of backup data on 1 TB of physical disk capacity, which has potentially tremendous economic benefits. For one thing, it could eliminate any delta between the capital costs of tape versus disk, making disk storage a more viable option. Factor in the operational efficiencies of not having to move, store and manage redundant data thanks to de-duplication and the elimination or reduction of tape management provided by D2D backup solutions, and users can extract real value from de-duplicated D2D backup.
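The modified-slide scenario can be illustrated with a minimal Python sketch of sub-file de-duplication. Everything here is an assumption made for illustration: the ChunkStore class is hypothetical, chunks are a fixed 4,096 bytes, each "slide" is modeled as exactly one chunk, and SHA-256 digests identify chunks. Fixed-size chunking only detects changes made in place; some products use variable-size chunking to cope with inserted data, which this sketch does not attempt.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class ChunkStore:
    """Toy chunk-level de-duplicating store (illustrative only)."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes, stored once

    def put(self, data: bytes) -> list:
        """Store data and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        return recipe

    def get(self, recipe: list) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def physical_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.chunks.values())


# A ten-"slide" presentation, then a copy with one slide changed in place.
slides = [bytes([i]) * CHUNK_SIZE for i in range(10)]
original = b"".join(slides)
slides[3] = bytes([99]) * CHUNK_SIZE
modified = b"".join(slides)

store = ChunkStore()
recipe_original = store.put(original)
recipe_modified = store.put(modified)

print(len(original) + len(modified))   # 81920 logical bytes
print(store.physical_bytes())          # 45056 physical bytes (11 unique chunks)
```

Both versions remain fully recoverable from their recipes, while only the one changed chunk adds to physical capacity.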
FIGURE 3. DATA DE-DUPLICATION
[Figure: Redundant Data -> Data De-Duplication Engine -> Unique Data]
Source: Enterprise Strategy Group, 2008
Data de-duplication ratios will vary based on the types of data involved and the frequency of full backups and retention. As a rule of thumb, ESG believes a 20:1 ratio—when combined with data compression—to be broadly achievable. Though ESG has seen data de-duplication ratios of 89:1 and there is potential for even greater reductions, do not feel disappointed if you do not achieve 20:1 or greater, since reductions of 5:1 or more are still extremely valuable.
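To show how retention and change rate drive the ratio, here is a deliberately simplified Python model. It is an assumption for illustration, not ESG's methodology: it considers only repeated full backups of a constant-size data set, treats a fixed fraction of the data as new in each subsequent full, and applies compression as a flat multiplier. The function name and all inputs are hypothetical.

```python
def estimated_ratio(fulls_retained: int, change_rate: float, compression: float = 1.0) -> float:
    """Rough de-duplication ratio for retained full backups of one data set.

    Logical data is fulls_retained copies of the data set. Physical data is one
    full copy plus the changed fraction from each later full, divided by compression.
    """
    if fulls_retained < 1 or not 0 <= change_rate <= 1 or compression <= 0:
        raise ValueError("invalid input")
    logical = fulls_retained
    physical = (1 + (fulls_retained - 1) * change_rate) / compression
    return logical / physical


# Hypothetical: 2% of the data changes between weekly fulls.
print(round(estimated_ratio(5, 0.02), 1))    # 4.6
print(round(estimated_ratio(20, 0.02), 1))   # 14.5
```

In this model, longer retention and lower change rates raise the ratio, and adding 2:1 compression to the 20-backup case roughly doubles it. Real results depend on the data and the product.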
20 Asylum Street Milford, MA 01757 Tel: 508-482-0188 Fax: 508-482-0218 www.enterprisestrategygroup.com