Making the Most of Your Storage Budget
an Internet.com Storage eBook

Contents
What’s Selling In the Data Storage Market?
Despite Economy, Storage Bargains Hard to Find
How You Can Save on Data Storage Costs
Brother, Can You Spare a Petabyte?
Data Corruption: Dedupe’s Achilles Heel

This content was adapted from Internet.com’s Enterprise Storage Forum Web site. Contributors: Drew Robb, Henry Newman, and Paul Shread.

Making the Most of Your Storage Budget, an Internet.com Storage eBook. © 2009, WebMediaBrands Inc.


What’s Selling In the Data Storage Market?
By Drew Robb

With EMC’s SAN sales falling by about 20 percent in the first quarter and the rest of the data storage market also under pressure, observers could be forgiven for thinking that nothing is selling out there.

But there were bright spots even within EMC’s report (Celerra unified storage arrays, for example, continue to sell at a double-digit rate), and other storage technologies have managed to catch on in the downturn, either because they offer users a way to save on storage costs or because they offer such a compelling value that users are willing to spend on them even in a tough economy.

Economic downturns can also be where dramatic change occurs and buying patterns shift, often permanently. New darlings can emerge and seize the moment, displacing old faithfuls that are no longer regarded as current or cost-effective. And once changed by economic necessity, the new habits that emerge can become permanent.

The big winners, this time, appear to be flash technology and data deduplication, while some areas of the tape and Fibre Channel SAN markets are suffering. But even within those categories, there are specific areas that are thriving. And not surprisingly in the current environment, open source-based storage technologies are gaining ground as a cost-savings measure.

“What’s selling well are items that are essential to keeping business running along with technologies that can help reduce costs, boost efficiency or productivity as opposed to discretionary or nice to have items,” said Greg Schulz, StorageIO Group founder and senior analyst. “Dedupe, flash, and virtual tape libraries are some obvious examples.”

Solid Sales for Solid State

Schulz points to solid state drives (SSDs, or flash technology) as having gained a lot of ground recently, particularly for read- and write-intensive applications, as a way to boost performance and efficiency. This correlates well with what storage vendors are saying.

Jim Cates, senior director of storage development at Sun Microsystems, has noticed a big uptick in interest in SSDs.

“While people used to use striped disk for high IOPS, they are now tending toward SSD,” said Cates. “The price of flash is low enough that they want to use it for high IOPS data instead of DRAM, which is cost-prohibitive.”

Pat Wilkison, vice president of marketing and business development at SSD manufacturer STEC, echoes this view.

But even within the SSD segment, buyers have changed their ways. Instead of purchasing some small flash drives, they now want larger models in order to get the greatest capacity per dollar.

For example, STEC’s ZeusIOPS product line features different versions for different needs. At one extreme is the highest Input/Output Operations per Second (IOPS) model, and at the other extreme is high capacity (40 percent more capacity, but still with decent IOPS). Thus STEC now offers 750GB products and is experiencing what it describes as meaningful demand for 1.5TB sizes, which will begin shipping in the fourth quarter.

“We have noticed a bias toward cost savings and an emerging market that doesn’t need breakneck I/O but wants higher-capacity, as SSD is better than cache,” said Wilkison. “Our high-capacity, lower I/O market has grown from 0 percent to 30 percent of total orders in recent months.”

EMC was the first storage vendor to market with SSDs and has been pushing flash drives as a means of enhancing storage tiering. In the EMC vision, flash becomes the first tier, Fibre Channel disk is tier two and SATA becomes tier three, a model that other vendors are also promoting.

Users look to a few SSD drives for the most heavily utilized data, then a small amount of FC drives for less utilized data, and low-cost, high-capacity SATA for the bulk of information. As that latter information isn’t accessed too often, or isn’t mission-critical, it can comfortably reside on bulk SATA drives.

“It’s rare that they need more than a half-dozen to a dozen flash drives,” said Ken Steinhardt, vice president and CTO of customer operations at EMC. “The last six months have seen an acceleration of the usage of flash for the first tier.”

Deduping the Way to Profits

Steinhardt has also noticed a marked shift toward deduplication technology. The massive amount of duplicate data in any system makes this technology a compelling value proposition, he said.

Just look at Data Domain, the dedupe pioneer that, even in a recession, was still growing at a 30 percent to 40 percent rate before it was acquired by EMC.

EMC itself is pushing dedupe on several fronts, as are a host of others. Quantum, a dedupe partner for both EMC and Dell, is heavily promoting the deduplication capabilities of its DXi series appliances, a strategy that is paying off. Comparing its December quarter results to those in the previous quarter, disk and software sales increased about 44 percent, while tape automation sales were roughly flat.

“Sale of Quantum’s deduplication technology, which goes to market both through our branded DXi products and through our OEMs, was the biggest driver for the 44 percent increase in disk and software sales,” said Steve Whitner, disk product marketing manager for Quantum. “During the last quarter, we also benefited from the rapid market adoption of the EMC Disk Library products that include Quantum’s dedupe software.”

Tape, Fibre Channel Hang On

Quantum and EMC, of course, made their money on old-school storage platforms such as tape and Fibre Channel. While those fields are hurting to some degree, they certainly aren’t all bad. Like mainframes in the 1990s, tape’s demise exists mainly in the heads of competitors and pundits. Far from falling off the cliff, tape technology retains a strong user base.

“Tape continues to be leveraged for bulk data protection, backup and archiving,” said Schulz.

Sun’s Cates noted that declines in tape are being felt in small autoloaders and libraries with fewer than 50 cartridges. That market is being gobbled up by disk. On the other side of the coin, though, Sun is seeing some desire to upload large repositories onto tape as opposed to trying to manage it all on disk.

“We are also experiencing growth in consolidation opportunities, moving several small libraries into one centralized unit, as it is more cost-efficient,” said Cates. “In addition, we are seeing an uptick in enterprise storage systems in general.”

Recent stats compiled by Dell’Oro Group confirm this. Fourth quarter 2008 Fibre Channel sales were overall about even with the prior quarter. Like every other sector of storage, though, there were stronger and weaker elements to the market. Fibre Channel switch sales rose, for example, primarily due to higher prices rather than volume of sales. Users are clearly buying into the latest generation of switches, with their new features available at a premium. This includes 8 Gbps Fibre Channel and Fibre Channel over Ethernet (FCoE).

Late last year, users began trials of Cisco’s Nexus 5000 switch with FCoE software and adapters from Emulex and QLogic, said Tam Dell’Oro, president of Dell’Oro Group.

On the downside, host bus adapter (HBA) numbers were down from both the previous quarter and the year-ago quarter. Dell’Oro believes this is a function of the significant declines reported in the server market.

“The Fibre Channel adapter market is not feeling the rewards of users migrating to the higher-priced, higher-featured products,” Dell’Oro said. “Instead, this market is characterized by an increasing portion of lower-priced blade server adapters.”


Despite Economy, Storage Bargains Hard to Find
By Drew Robb

You’d think with the economy struggling that now would be a good time for a discount on that data storage array you’ve been coveting from the likes of EMC or NetApp, but so far storage pricing seems to be holding up.

Auto dealers and retailers offer big discounts to boost sales when the economy sours, so why aren’t data storage companies doing it? You’d think that with budgets being cut drastically, storage vendors and resellers would be offering fantastic savings on the cost of new hardware or software. But rebate and trade-in deals to rival the automotive industry don’t appear to be on the way just yet.

“On some products and technologies, pricing is holding, particularly for those where there continues to be strong demand,” said Greg Schulz, senior analyst and founder of StorageIO Group.

Mind you, there are some deals to be had, and prices in general are heading southward. One anonymous user witnessed a deal with Compellent and EMC where both were forced to drop their prices significantly on a midrange array due to price pressure from the Sun 7000 series.

Jim Dougherty, lead engineer at Plixer International Inc. of Sanford, Maine, is noticing more price cuts gradually creeping in.

“You are seeing the better deals or leverage on the larger items/quotes,” he said. “The vendors know that the business is out there and will do whatever they can to obtain it.”

Others report a steady if unspectacular decline in prices that began well before the current woes.

“Pricing was already dropping due to the arrival on the market of products built from general purpose server components,” said Jason Williams, CTO and COO of Digitar Inc. of Boise, Idaho, a company that has saved a lot of money with commodity hardware and open source storage. “With more companies embracing those solutions due to financial pressures, traditional proprietary vendors are being forced to lower their prices to stay competitive.”

Schulz, for instance, mentions that 4Gb Fibre Channel SAN products have become more affordable, thanks to the arrival of 8Gb and Fibre Channel over Ethernet (FCoE) lurking just around the corner. He has also noticed a price drop in 10Gb Ethernet ports as well as many midrange storage systems, including those using high-performance Fibre Channel or SAS disk drives.

“From a storage system perspective, particularly for entry-level solutions, some real bang for the buck can be found” in solutions such as the EMC Clariion AX4, Dell MD1000/3000 series, HP MSA2000, IBM DS3000, NetApp FAS2000 and Nexsan SATAbeast systems, among others, said Schulz. “Prudent buyers that can plan and leverage their purchasing plans and capacity plans have great opportunities to leverage current vendor incentives and promotions,” he said.

Similarly, Chris Beck, a network administrator for the City of Fontana, Calif., has observed that products seem to be cheaper than before. He replaced an HP EVA 5000 with a Xiotech Emprise 7000 as the city’s core production storage system.

“The cost of our Emprise 7000 was about a third of the cost that we paid for our EVA 5000 back in 2001, and the EVA 5000 had less than half the capacity back then that it does now,” said Beck. “Even the new EVA 8100 that was our second option was less than half the cost of the original EVA that we purchased.”

Maintenance and Services Deals

One area where the better deals are to be had appears to be in services. According to Schulz and Williams, the bargains are often in multi-year support contracts.

“Anything that will ensure multi-year revenue to a vendor can be used now as a great bargaining tool on the rest of the hardware in the deal,” said Williams.

Tim Chester, CIO of Pepperdine University in Malibu, Calif., agrees. While he isn’t seeing much in the way of hardware price cuts, he is noticing more added value in ongoing deals. This includes free consulting and more help on implementation. He has also noticed far more cold calling from vendors, which bodes well for easier negotiations going forward.

Resellers and consultants are noticing it too.

“Companies are definitely pushing back on vendors wherever possible,” said Chip Nickolett, owner of Comprehensive Consulting Solutions Inc. of Brookfield, Wisc. “Usually there is some threat of discontinuing use or migrating off a product used as leverage to renegotiate an existing multi-year agreement or achieve more favorable terms on renewals.”

And don’t expect too much customer loyalty in the current climate. The likelihood is that users will lose their long-term preferences when a potential usurper provides a low enough offer.

“I will go outside of normal channels to find that price,” said Rainer Mueller, IT analyst for the City of Encinitas, Calif. “I will also play one vendor against another; unfortunately, it has come to that.”

So far he hasn’t seen much in the way of cut-rate storage. What he has found, though, are desktop systems with more bells and whistles at better prices than a year or two ago. He’s also seen some especially aggressive pricing in the antivirus software market.

“Newer companies tend to go after the pricing provided by older, more established companies,” said Mueller. “We received a bid from a newer player that offers us three years of coverage compared to what we paid for one year with the older company.”

Leasing, SaaS and Open Source

All of this may add up to radical changes in buying patterns over the long term. With dollars for outright purchasing growing tighter, leasing may make a comeback.

“I’m seeing and hearing a pickup in leasing activity, which has been rather light to non-existent for the past several years, as a means of stretching dollars and cash flow,” said Nickolett.

Other possible shifts in the market might appear in the areas of open source software and Software as a Service (SaaS).

“SaaS and open source have become vehicles for newer vendors to provide a credible threat,” said Nickolett.

He suggests a complete proof of concept effort to demonstrate the technical capabilities of such alternatives, followed by a plan to migrate 10 percent to 15 percent of your IT footprint to that platform as part of a strategic cost reduction effort.

“You’ll soon have your vendor’s attention,” said Nickolett. “I personally believe that our current economic crisis is the change agent that will drive SaaS and open source to the next level of widespread enterprise adoption.”


How You Can Save on Data Storage Costs
By Drew Robb

Data storage has been a major and growing part of IT budgets for many years, so it’s not surprising that cost-cutters have been taking a hard look at storage costs in the worst economic downturn in more than 50 years.

We’ve compiled a few tips and technologies that can help in the new era of frugality. Some are free but may require some investment in resources, while others promise a rapid return on investment (ROI), in some cases offering a payback in less than six months.

Dedupe Express

Two areas tend to dominate discussions about storage costs: flash-based solid state drives (SSDs) and data deduplication.

Starting with the latter, dedupe has become almost a badge of honor among storage vendors. While Data Domain popularized the technology, it’s hard to find a storage vendor that doesn’t offer it these days. The likelihood is that this technology will eventually become standard for storage and backup purposes.

“Data deduplication means you can squeeze a lot more data into a lot less space,” said Mike Sparkes, product marketing manager for entry disk systems at Quantum. “It helps you save money in numerous ways.”

He gives the example of a small software development company with a single site. By installing a dedupe appliance for $12,000, it saved that amount alone in tape media costs in its first year. When you factor in a reduction in backup management efforts by 250 to 300 hours per year, shorter backup windows (33 percent) and time to restore reduced by 75 percent, this makes dedupe a relatively easy item to sell to even the most tightfisted CFO.

A Data Domain customer, Great River Energy of Maple Grove, Minn., reported data compression rates as high as 100 to one on some applications. As a result, restores were done in half the time and backup administration has been cut down from one day per week to ten minutes, said Joe Gleason, IT Systems Engineer for Great River Energy.

The savings occurred on many fronts, such as a 45 percent reduction in wattage.

“Compared to a scenario in which we would expand on our legacy tape library platform, this solution has provided us with significant power, cooling and data center footprint advantages,” said Gleason.

Flash in the SAN

Like dedupe, flash drives are constantly in the news these days. While they don’t match up on a capacity/cost basis against Fibre Channel (FC) disk, prices are dropping rapidly.

“The price of flash is down 76 percent in the past year,” said Ken Steinhardt, vice president and CTO of customer operations at EMC. “Every day that goes by, it keeps getting cheaper.”

According to Pat Wilkison, vice president of marketing and business development at SSD maker STEC, the most obvious value proposition is to use SSDs to reduce the amount of memory needed by a system. Flash works out at orders of magnitude cheaper than RAM and has enough capacity these days that an entire database can be saved on flash for super fast I/O.

“Flash offers an immediate and tangible ROI, as you gain high performance and require less memory,” said Wilkison.

Tiering Up Over Flash

Some vendors, like EMC, Compellent and Sun Microsystems, have taken SSDs a step further as a new high-performance storage tier. This reduces the need for Fibre Channel disks and speeds performance for the most mission-critical applications.

Steinhardt suggests placing the most heavily utilized data on flash, then offloading it to FC for medium utilization, and running the bulk of data on low-priced SATA drives.

“This new kind of system tiering is being driven by flash and low-cost, high-capacity SATA,” said Steinhardt. “This cuts your power, cooling and space costs, and reduces reliance on a large pool of expensive FC drives.”

SATA-fied Storage Customers

Moosa Matariyeh, an enterprise storage specialist at CDW, takes things a step further and advises those looking to save on storage costs to dump FC and SAS for SATA.

“Migrate data from Fibre Channel, small computer system interface (SCSI) or SAS storage to less expensive SATA,” he said. “This also saves on cooling and power costs as well as rack space, as the SATA drives you would be migrating to have more density.”

He offers an example (costs reflect only the drive street price and do not include any enclosure or additional equipment needed): SAS storage can be as low as $1.25 a GB, while SATA drives from the same manufacturer cost as little as 15 cents per GB, or nearly 90 percent less. Add in power and cooling costs and the total cost of ownership (TCO) can be dramatically less.

“Migration can be done manually, or a software package can be implemented to automatically monitor the age of files,” said Matariyeh. EMC, Symantec and CommVault “are among those offering software packages to manage this functionality.”

Keep Consolidating

Consolidating has been a staple in IT now for most of this decade. And it just keeps right on going. The more you get rid of data center sprawl and Indiana Jones-esque warehouses full of endless rows of servers, the lower your management, power, cooling and space expenses will be. With one caveat: don’t throw out perfectly good equipment to consolidate, as that delays ROI considerably. Wait till gear is at or beyond end of life and then bring in the consolidation cavalry.

“In storage, it is all about consolidation, consolidation, consolidation,” said Shaun Walsh, vice president of corporate marketing at Emulex. “In addition to extending the life of current hardware and lowering administration and storage costs, one of the hidden benefits of aggressive consolidation is saving on the ongoing cost of service and maintenance: service contracts on many older systems often cost more annually than purchasing new storage.”

Use the Windows Storage SIS Feature

CDW’s Matariyeh offers one free tip for controlling unstructured data, which he regards as the biggest issue in storage because it contains so many file types, is coming from different sources and is growing at the fastest rate. Estimates are that 70 to 80 percent of the data in data centers today is unstructured and growing at more than 65 percent a year.

“One simple way to help free up space is to activate a currently existing function in your Windows Storage Servers,” he said. “Single-instance storage (SIS) is a feature built into Windows Storage Server 2003 which will take a look at all the data within the volumes and reduce duplicates to one file.”

For example, if a department sends out a 1 MB PowerPoint document to 30 people and each one saves it in their “My Documents” directory, that is 30 MB of space from a single file. WSS will reduce this down to one copy and point all users to the one copy. That is a 29 MB savings in space from activating a feature already available in the system.

Get Rid of Certain File Types

Sometimes, it’s the simple things that can make daily work experience easier. Storage managers can focus on the easy items that provide the biggest payback, either in reclaimed storage or data protection for business continuity. For example, it isn’t difficult to spot and remove any personal, unnecessary, or large file types from expensive corporate storage resources.

“Such work is often already mandated as part of compliance directives that outlaw files whose name ends with .mpeg, .mpg, .mp3, .wav, .pst, .log, .bak, and so on,” said Stefan Kochishan, director of mainframe product marketing at CA. “These files can then be deleted to increase usable storage space.”

Leasing, the Cloud and Open Source

Leasing is making a comeback as a way to stretch storage dollars, says Chip Nickolett of Comprehensive Consulting Solutions of Brookfield, Wisc.

Open source storage technologies have also been catching on in the weaker economy. Sun has built its Open Storage line on open source software and commodity hardware, while vendors like Zmanda have used open source technologies as a way to break into the storage market.

Nickolett suggests a modest open source implementation as a good way of getting your vendor’s attention.

And lastly, while cloud-based storage services have been slow to catch on in the enterprise space, this year has seen the arrival of a new startup that claims its service can make primary data storage in the cloud a reality.

One of the more interesting uses of a cloud storage service has been Twitter, which uses Amazon’s Simple Storage Service (S3) to store avatar icons. Perhaps finding well targeted uses for online storage services is an avenue worth exploring.


HP Sees Opportunity in Data Deduplication
By Paul Shread

HP sees the bidding war between EMC and NetApp for Data Domain as evidence of the potential of deduplication, and the company says it’s ready to do battle.

“The size of the deal and the bidding between EMC and NetApp was a testament to the size of the market opportunity,” said Kyle Fitze, HP’s marketing director for Storage Platforms. “We want to be there and we want to compete aggressively in the market.”

HP has offered dedupe for more than a year through its partnership with Sepaton, and the company also offers its HP Labs-developed D2D Backup Systems for remote offices and small businesses. HP’s Data Protector software also offers host-based dedupe capabilities, and the company announced a reseller agreement with compression and dedupe specialist Ocarina Networks, which also works on image files, for HP’s NAS offerings.

“This is a space that HP is taking seriously,” said Fitze.

Deduplication technology reduces data, speeds up restores, and helps minimize bandwidth usage during replication, he said. Fitze said HP sees dedupe as part of an overall capacity optimization strategy that also includes thin provisioning, snapshots, pooling and virtualization, as storage users become more focused on freeing “trapped capacity and performance in the existing environment.” Boosting capacity utilization from, say, 25 percent to 50 percent can mean big savings for end users, Fitze said.

Fitze did not provide the number of HP dedupe users, but he said the number is “growing all the time.”

David Shoup, technology manager for the Mohegan Tribe in Connecticut, said the tribal government chose HP’s Sepaton-based Virtual Library System to work with its HP EVA SAN environment. The tribe also switched to HP’s Data Protector software at the same time for centralized management. Shoup said he briefly looked at Data Domain, but the HP VLS made more sense for the HP environment.

“It’s worked rather well for us,” he said, as backups have fallen from more than 24 hours to 8 to 10 hours even as the tribe has backed up more data. He said he’s seeing about a 5.5 to 1 deduplication ratio on changed data.

Asked what he’d like to see added to the VLS, Shoup said he would like to see more automation, but nonetheless said he finds it “straightforward” to operate and is generally a satisfied customer.


Brother, Can You Spare a Petabyte?
By Henry Newman

Let’s face it: Times are tough and there’s a lot of pressure to cut costs. I hear it all the time from my customers.

But it’s not as simple as choosing the cheapest data storage technology. If you care about your data (and if you’re reading this, you probably do), then you need to consider the technology and reliability tradeoffs of storage technologies, whether you’re an enterprise, small business or even an individual home user (my own home backup and data protection scheme borders on the paranoid). Storage costs aren’t just about the price of the hardware or software; they’re about operating and maintenance costs, and the cost of lost or corrupt data.

When I am trying to help customers understand the technology tradeoffs, the first thing I do is to try to understand what their requirements are. Usually I get a glazed look or get told to just solve the problem, and sometimes I’m told that the requirement is for storage that’s as cheap as possible. Very few people actually understand their requirements, and even fewer know how to apply them.

SATA, SAS and Tape

Let’s look at the example of choosing between different types of disk and tape drives. You might say these can all be taken care of by RAID, but there are some important things to consider; I think that even the bean counters don’t want you to put the company’s data at risk.

The biggest issue is the hard error rate of the technology. Every disk and tape drive has a hard error rate, specified as the average number of bits read or written before the drive returns an error saying that the device cannot be accessed. There are lots of reasons for hard errors, such as media errors, head errors and media failures. It doesn’t matter what the cause is; what matters is how often it happens for each of the devices. If you have a hard error with a RAID-5 LUN, the LUN will need to be rebuilt, and hopefully you won’t get another hard error or the data will be lost. With RAID-6, another hard error is still not catastrophic, as you have two parity devices.

You know what they say about lies and statistics, but the hard error rates below come from drive manufacturers for both disk and tape.

Device              Hard error rate   Equivalent   PB           Days to hit     Days to hit
                    (in bits)         in bytes     equivalent   at 120 MB/sec   at 200 MB/sec
Consumer SATA       10E+14            12.5E+13          0.89              92              55
Enterprise SATA     10E+15            12.5E+14          8.88             920             552
Enterprise SAS/FC   10E+16            12.5E+15         88.82           9,198           5,519
LTO                 10E+17            12.5E+16        888.18          91,982          55,189
T10000B             10E+19            12.5E+18     88,817.84       9,198,247       5,518,949

You have to remember that the bit error rate (BER), which is also known as the hard error rate, is completely different from the annualized failure rate (AFR) of the device. One way to look at it is the failure of a single access compared to the failure of the whole device. Sometimes with some RAID controllers, the failure of a single access is the failure of the device, but you have to remember that BER is measured in bits of transfer and AFR is measured in hours. A device can fail just sitting doing nothing, but the BER is based on device usage. If you care about your data, this is a critical issue.

Some lower-end storage systems use consumer-level SATA drives, which if used heavily can fail pretty quickly. The problem is that in RAID devices, sometimes if one device fails, other devices will fail during rebuild. The bottom line is that you need to consider the disk drives and your exposure to data loss as part of any storage decision. Buying the cheapest stuff on the market might get you the storage you want, but it might wind up costing you your data.

The cost per GB for SAS and Fibre Channel drives is much higher than SATA, but few people realize that for important data you should include the reliability calculation as part of the decision-making process. If your data is critically important to your organization, having a BER that’s ten times better is an important consideration; clearly the cost difference per GB between SATA and SAS/FC isn’t nearly as great. Even in tough times, it is important to consider not just the initial cost, but the cost of losing what is important.

Tape Versus Deduped Disk

I have seen no reputable study showing that disk and tape costs per GB are even close. Tape always wins on cost, but do you have to write everything to tape?

Data deduplication has become one of the fastest-growing segments of the storage market, if not the fastest. There are many companies that provide dedupe technology. Some are integrated hardware platforms, while others are just software. Some of the claims of 50 to 1 reduction in the amount of data backed up are realistic in environments such as VMware, but other environments such as media files do not get anywhere near that ratio and compression is often similar or even better.

Dedupe can speed up the backup process if there is enough bandwidth to the dedupe device compared with the bandwidth to tape. With tape latency and other issues, dedupe will likely be a big winner over standard tape backup from a time perspective, and depending on the size of the backup and the number of tapes, tape slots and the cost of the dedupe system, it can even be a cost savings. Of course, the real issue for backup isn’t backing up the data, but restoring it. Keep in mind that the dedupe platform can expand data faster than it can likely write it to the channel.

One of the biggest complaints I hear about tape is that it is slow. The latency for tape to load and thread and be ready hasn’t changed much since the advent of the tape cartridge, but that isn’t the real issue. More often than not, the real issue for backup and tape performance is that tapes are faster today than most networks that they are attached to. Take the following facts. In 2000, LTO uncompressed data rates were 20 MB/sec and most networks were 1Gb, or realistically about 80 MB/sec to 90 MB/sec, so the network was four or more times faster, and about half that with compression.

LTO-4 today boasts a 120 MB/sec uncompressed data rate, 240 MB/sec compressed, and with 10GbE networks to the backup server, you have a little more breathing room, but not much. But the problem is that very few people have an end-to-end 10GbE network, and remember you will be bound by the slowest point on the network. The same is true with tape: if you are using FC-2 with LTO-4, for example, FC-2 has a 200 MB/sec limit and LTO-4 with compression is 240 MB/sec. Add to this that most people put multiple tape drives on the same FC connection and you have a performance issue that is again caused by the network.

This is why, if you are going to use tape (which is, after all, not only cheaper than disk but also more reliable if handled and stored properly), you need to stream the device at full rate, including compression, to use it efficiently, so disk-to-disk-to-tape (D2D2T) is the way to go. Accomplishing this requires either a VTL or backup software that manages a D2D2T framework, and this usually is an added expense for the software. The tradeoff between D2D2T, VTLs and dedupe, or a combination of one or more, is a complex decision that depends on the dedupability of your data, the state of your network, the cost of the additional hardware and software and other factors such as power, training and floor space. One benefit of a D2D2T system could be deduping data before writing to tape, thus saving even more money.

And another factor to consider: if you’re eliminating multiple copies of data, make sure the one you’re keeping is right. Check with your dedupe vendor to make sure they have proper checks for ensuring data integrity and reliability (see Data Corruption: Dedupe’s Achilles Heel).

The disk and tape tradeoffs are pretty clear. Tape is cheaper and potentially more reliable than disk, but you need the right infrastructure to make it efficient. Dedupe has promise for saving on storage costs, but cheap disks carry the potential for data loss. With apologies to Rush, you can’t get Something For Nothing in the data storage market, but hopefully you now know something about spending your money wisely.


Data Corruption: Dedupe’s Achilles Heel
By Henry Newman

Data de-duplication is one of the hottest technologies in storage these days, and users and vendors alike are climbing on the bandwagon. There are vendors building hardware products, others building software products, and some doing both.

on IBM mainframes running MVS, although the potential is far lower than any other system given the amount and number of parity and checksums calculated and checked.

A Swiss Laboratory last year published a paper on data corruption and its sources that is worth reading.
I am not going to compare products or different vendor tech- nologies, but I am going to look at an important issue you You might wonder what all this has to do with data de-du- need to ask your vendor about plication. In a nutshell, if you if you’re considering purchas- de-duplicate your data and ing data de-duplication hard- the hash area for the data de- ware or software, and that is duplication hardware or soft- data corruption. ware gets corrupted, you can lose all of you data. If you’re You might wonder what de- going to get rid of duplicate duplication has to do with data data, it’s critical that the data corruption, and I’ll get to that in you have be right. a minute. But it’s important to note that I’m writing this article For example, what if the data from a generic hardware and comparison hash was data software point of view. Some that was corrupt at the time vendors’ products may or may the data was read, but the not address all or part of the data on the disk is still good? problems I will discuss in this If you read it again, you will article. It’s up to you to under- likely get the correct data. But stand what you are buying and what if the hash data written to ask the vendors the right on disk was bad or went bad, questions. Caveat emptor. would you still be able to read your files? Let’s step through A Trip Down the Data Path these two examples and see what happens. As a reminder, I wrote an article on a data corruption experience I had where I am doing this generically and the examples might or might I compared a few bits and the ASCII characters had changed not work for a set of vendors based on their hardware and dramatically; in fact, most of the bytes went bad in the ex- software. ample I gave. Case 1: Corrupted Data Read The point of the article was that bits occasionally go bad, If you read data from a disk and the data you read was cor- sometimes sooner than later. 
It does not matter if it is high-end rupted for any reason (disk drive, channel, controller, or other enterprise Fibre Channel, which might happen far less often reason) and then started to apply the corrupted data to new than cheap SATA. It might not even be the drives or the con- data, you would have a major problem. When you read the troller; it could be that the memory of the machine corrupted information again from disk to de-duplicate it, it would not be the data or the CPU or something else. The bottom line is that the same. at some point your digital data in the digital world will be cor- rupted. Although the likelihood varies based on the operating If you compare the data that you read with the incoming data, system, the hardware, and the software, it can happen even the data in memory will be bad, so any data that you find a 13 Table of Contents Making the Most of Your Storage Budget, an Internet.com Storage eBook. © 2009, WebMediaBrands Inc. Making the Most of Your Storage Budget match with will be compared with data that will be different the hardware in the data path, including the disk drives, and is next time it is read. So basically any new data from the point of from the same people that brought you the SCSI protocol. the data read with the corrupted read will be compared incor- rectly and therefore be unreadable. There are file systems that do checksums, but if a file system is doing checksums and correcting the data, then you have If the hash is reread then for some reason and is read cor- two issues: rectly, any subsequent data read will be just fine. Other than that, it will be a debugging nightmare, one which I am pretty • The file system must read the data back to the server before sure is unrecoverable and a significant amount of data will be the checksum can be confirmed or rejected. It is not checked lost. 
The scary part is that some of the data is good and some when the data is written to the device by some of the hardware of the data is bad, and figuring that out is likely not possible in the path. without some serious detective work. • The server CPU must calculate the checksum and also con firm it when the file is read back in. There is a significant effect Case 2: Corrupted Data Hash Data on the server doing all of this checksum activity. This includes What if the data on disk gets corrupted and is bad from the increased memory bandwidth requirements and utilized CPU start? This is a similar problem to the first case, except that caches, requiring applications to potentially reload from with Case 1 you have good data, then bad data, and then memory and memory bandwidth usage to increase by the likely good data. With this case, the hash that was created checksum calculation. is in memory and is good, but the hash on disk is bad. That means you have data that was created with a good hash, but This is an issue if you are running applications that use once the hash is read from disk, the data will be bad. The significant server resources. good news, if there is any, is that once the hash is read from disk back into memory, it will be the same, so the problem There are products that have their own file systems and check- should be limited. But you will have data you create that can- sums and address some of my concerns about data corrup- not be un-de-duplicated for the time period that the data was tion, but not all vendors have products that have this func- created with the original in memory hash. So when you go to tionality built into their offerings. This is just one of the areas un-de-duplicate the data months or years later, you will have that you should be concerned about with data de-duplication. bad data until you re-read the hash from disk and then have It should not be the only consideration for the evaluation of good data from that point on. 
Again, this is a debugging night- a vendor’s offering, but it should be one of the high-priority mare and likely impossible to figure out. considerations. Vendors might say that this is your problem when you ask the question, and that your environment should What You Need to Ask Vendors be running something like T10 DIF. Wrong answer. Vendors I am a firm believer in the reality of undetected data corrup- need to be thinking about your hardware and software before tion. It has happened to me and I have seen it happen to oth- you ever ask a question, and if they leave the problem to you, ers, and sooner or later it will happen to you. I am also a firm then I would be running the other way. believer in the new T10 Data Integrity Field standard, which passes an 8 byte checksum from the host to the disk and Data de-duplication is a great tool for some environments, but has the disk confirm the checksum, which should be gener- as with everything complex, it requires some careful planning ally available from a number of vendors likely later this year. I and execution. personally like this standard, as some of it is implemented in 14 Table of Contents Making the Most of Your Storage Budget, an Internet.com Storage eBook. © 2009, WebMediaBrands Inc.
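To make the corruption risk concrete, here is a minimal sketch of a toy, in-memory de-duplication store that re-checks every chunk against its hash when a file is read back. It is not any vendor's implementation and it is not T10 DIF; the class and method names (DedupeStore, write, read, CorruptChunkError), the tiny chunk size, and the choice of SHA-256 are all assumptions made for illustration. It shows two things discussed above: a chunk shared by several files is stored only once, and a single flipped bit in that chunk damages every file that references it, which a verify-on-read check can at least detect.

```python
import hashlib


class CorruptChunkError(Exception):
    """Raised when a stored chunk no longer matches its digest."""


class DedupeStore:
    """Toy in-memory dedupe store; every name here is hypothetical."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}   # digest -> chunk bytes, each unique chunk stored once
        self.recipes = {}  # file name -> ordered list of chunk digests

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if this digest has not been seen before.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.recipes[name] = digests

    def read(self, name):
        out = bytearray()
        for digest in self.recipes[name]:
            chunk = self.chunks[digest]
            # Verify on read: recompute the hash and compare it to the key.
            if hashlib.sha256(chunk).hexdigest() != digest:
                raise CorruptChunkError(f"chunk {digest[:8]} is corrupt in {name}")
            out += chunk
        return bytes(out)


store = DedupeStore(chunk_size=4)
store.write("a.txt", b"AAAABBBB")
store.write("b.txt", b"AAAACCCC")

unique_chunks = len(store.chunks)   # "AAAA" is shared, so only 3 chunks are kept
clean_read = store.read("a.txt")    # verification passes before any corruption

# Simulate silent corruption: flip one bit in the chunk both files share.
shared = hashlib.sha256(b"AAAA").hexdigest()
good = store.chunks[shared]
store.chunks[shared] = bytes([good[0] ^ 0x01]) + good[1:]

damaged = []
for name in ("a.txt", "b.txt"):
    try:
        store.read(name)
    except CorruptChunkError:
        damaged.append(name)

print(unique_chunks, clean_read, damaged)
```

Note that this check runs on the server CPU at read time, so it carries the same costs described in the two bullet points above; it detects the damage but cannot repair it without a second good copy of the chunk.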