2550 73rd Street Urbandale, IA 50322 www.ippathways.com
2009
Storage Sprawl – Manage Storage Better
IP Pathways 9/10/2009
We are living in a virtual world. First it was servers; now it’s storage and desktops. The savings derived from virtualization are real, and the operational and management benefits are greater than imagined. First came server sprawl. Development needed a test box, a new application was rolled out, a file server filled up… and the solution was always to add a new server. Datacenters were full of x86 servers, all requiring their own power, network connectivity and management. The virtual revolution consolidated those servers, driving down power consumption and management overhead, but it requires centralized storage (SAN or NAS) to reap the full benefits. The new monster in IT is the need to add more and more disk to these storage arrays. Where companies once purchased new servers to house new applications, they now turn up virtual servers, and every virtual server needs more I/O and storage. The problem is cost: adding a new server used to cost only a few thousand dollars, while adding a new shelf of SAS or SATA drives can cost twenty thousand dollars or more. Fortunately, this monster can be contained with deduplication, which helps enterprises save money and maximize the investment they are making in their SAN or NAS solution.
THE STORAGE MONSTER
Server virtualization, the segmenting of a single physical machine into multiple virtual servers, is a trend that has gained widespread adoption in just about every enterprise, no matter the size. Provisioning servers and storage has become as easy as a few mouse clicks, which makes virtual server sprawl a real issue; without proper planning, it can take on epic proportions. At the 27th annual Gartner Data Center Conference, Gary Bittman stated that only two or three years ago, server virtualization was mostly being used for test and development purposes. Now the technology is accepted in production environments, with about 70 percent of all datacenters using virtual machines in some sort of production role. Bittman also announced three remarkable predictions about the virtualization industry:
• By 2012, at least 14 percent of the infrastructure and operations architecture of Fortune 1000 companies will be managed and delivered much like a cloud‐computing provider, internally. These "private clouds" are essentially flexible computing networks designed to be like the solutions being offered by public providers such as Google and Amazon.
• Between 2007 and 2011, Bittman expects that the installed base of virtual machines will grow more than tenfold.
• And by 2012, he believes that the majority of x86 server workloads will be running within a virtual machine.
When talking about this hot virtualization technology, Bittman adds, "our key advice is to look beyond simple consolidation and cost savings. Virtualization can be the catalyst to drive many fundamental important changes in architectures, processes, and cultures. Even if short‐term attention needs to be given to cost‐savings, make sure you build a foundation that can be leveraged in a few years. Virtualization 'unlocks' cloud computing potential internally and externally."
According to Bittman, the installed base of virtual machines will grow 10x in the next four years! At the same time, companies of all sizes are virtualizing desktops. The laptop or desktop is removed and replaced with a thin terminal, and the desktop is centralized in the data center. Gartner predicts virtual desktops will be a $65.7 billion market by 2013, up from $1.5 billion in 2009. Gartner estimates that approximately 15% of current traditional desktops will be converted to virtual desktops by 2014, which equals about 66 million virtual desktops. i
Windows 7 may be a large driver of this trend. Most companies have not deployed Windows Vista and still operate their desktops with Windows XP. Microsoft will force organizations’ hands in 2010‐2011 by discontinuing support on the XP platform. This will begin the move to Windows 7, which will likely require upgrades in laptops and desktops, just as the move to XP did years ago. With enterprises facing large capital outlays to upgrade laptops and desktops, virtual desktops stand to benefit.
In a virtual desktop, the desktop resides on the centralized storage (SAN/NAS). The hardware on the client side can be a thin terminal, or the existing desktop can be converted into a thin terminal. As an example, if the average desktop has 10GB of storage and 100 desktops are virtualized, an additional 1TB of disk would need to be added to the SAN/NAS. The economics point toward virtual desktops providing CAPEX and OPEX savings, but the move will drive up storage requirements. This doesn’t even factor in the performance or I/O requirements, which require careful analysis.
Virtualization is only one of the reasons for storage sprawl. Enterprises are required to keep more information electronically and keep it longer. Informed virtualized enterprises fight the storage monster with deduplication, using it to reduce risk, contain and manage costs and maximize current investments.
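The desktop sizing example above is simple arithmetic, and a quick sketch makes it easy to rerun with your own numbers. The function name below is hypothetical, and the calculation assumes decimal units (1TB = 1,000GB) and ignores I/O, snapshots and growth headroom, all of which a real design would need to consider.

```python
def added_storage_tb(desktops, gb_per_desktop):
    """Raw capacity, in decimal terabytes, needed to host the desktops centrally."""
    return desktops * gb_per_desktop / 1000  # assumes 1 TB = 1000 GB


# The example from the text: 100 virtualized desktops at 10 GB each.
print(added_storage_tb(100, 10), "TB of additional SAN/NAS capacity")
```

This is capacity before any deduplication is applied; it only shows how quickly per-desktop storage adds up once it moves to the SAN/NAS.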
According to a CIO Insight study, 28% of CIO respondents list cutting costs as the #1 business priority for 2009. According to the same report, 34% of respondents list achieving ROI on IT investments as their #1 priority in 2009. ii
As server and desktop delivery methods change, more data must be retained electronically, and various regulations require that data to be kept for given periods of time. As a result, enterprise data grows and storage is consumed much faster. “More and more critical data keeps coming in, and it must be stored, managed and protected,” says Joe Shields, co‐owner and co‐founder of IP Pathways.
What does all of this mean for the enterprise? According to ESG, “Primary data growth is expensive, but the biggest contributors to the ‘cost of information’ are all the copies made for data protection purposes. When ESG asked nearly 400 IT decision makers what their greatest data protection challenge was, the top response was ‘keeping pace with the capacity of data to protect.’ IT organizations have standard practices in place to protect all digital data records within the organization. Typically, that means IT makes a copy of a volume, LUN, or file(s) at one or more points in time during the day and saves the copy—locally for operational recovery and at an offsite location for disaster recovery (DR).” iii
Virtualization produces OPEX and CAPEX cost savings by delivering more with less: fewer physical servers, less power consumed and less equipment to manage. While this technological wave has swept and continues to sweep the business world, it has created storage challenges. While the cost of storage has decreased, consumption has increased. Shields warns, “If you don’t make intelligent decisions when architecting your virtual environments, unbudgeted CAPEX will come up everywhere you turn. You will need to pay careful attention to budgets and may have to make tough decisions if all factors are not considered in the initial architecture. Adding a shelf of storage is not like adding a new server.”
CONTAINING THE MONSTER
All of this is forcing manufacturers, VARs and CIOs to be more strategic in their thinking and to spend more time understanding current requirements, growth plans and the features and functionality of potential solutions. CIOs won’t kill the monster, but by making informed decisions and using new technologies such as deduplication, they can contain it and lessen its impact on the organization.
Duplication of data is inevitable. An Excel file is sent to 30 people, and each of them saves it to their home directory and/or hard drive. Those same files are copied for backups and maybe replicated to another site for disaster recovery. There could be 4 copies per user, or 120 copies in total in this case. If that spreadsheet is 500KB in size, that single file has consumed 60MB of storage in the enterprise; multiply that by every file shared this way. ESG estimates that database data is growing at 25% per annum, with unstructured data increasing at two to three times that rate. This growth is fueled by a dependence on digital assets to conduct business and the need to support an increasingly mobile workforce. Collaboration, Web 2.0 applications, and use of messaging systems also contribute to information growth. This is where deduplication can have a huge impact on containing the storage monster.
According to www.whatis.com, “Data deduplication (often called "intelligent compression" or "single‐instance storage") is a method of reducing storage needs by eliminating redundant data. Only one unique instance of the data is actually retained on storage media, such as disk or tape. Redundant data is replaced with a pointer to the unique data copy. For example, a typical email system might contain 100 instances of the same one megabyte (MB) file attachment. If the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is just referenced back to the one saved copy. In this example, a 100 MB storage demand could be reduced to only one MB.” iv
Data deduplication refers to a process that eliminates duplicate data. The process can happen as the data is being stored (inline) or can be run after the data is stored (post‐process). Different vendors handle this differently, some looking at duplicate files and others looking at duplicate blocks.
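The email attachment example in the definition above can be sketched in a few lines of Python. This is only an illustration of the single‐instance idea, not any vendor's implementation: the class name, the use of SHA‐256 as the content fingerprint and the in‐memory dictionaries are all assumptions made for the sketch, and the space used by the index itself is ignored.

```python
import hashlib


class SingleInstanceStore:
    """Toy file-level deduplication: one stored copy per unique file content."""

    def __init__(self):
        self.blobs = {}     # content hash -> the single stored copy
        self.pointers = {}  # file path -> content hash (the "pointer")

    def save(self, path, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # store only the first instance
        self.pointers[path] = digest         # later instances just reference it

    def read(self, path):
        return self.blobs[self.pointers[path]]

    def logical_bytes(self):
        """Space the files would need without deduplication."""
        return sum(len(self.blobs[d]) for d in self.pointers.values())

    def stored_bytes(self):
        """Space actually consumed by unique content."""
        return sum(len(b) for b in self.blobs.values())


store = SingleInstanceStore()
attachment = b"x" * 1_000_000  # a 1 MB attachment (decimal megabyte)
for i in range(100):
    store.save(f"mailbox{i:03d}/attachment.bin", attachment)

print(store.logical_bytes() // 1_000_000, "MB before deduplication")
print(store.stored_bytes() // 1_000_000, "MB after deduplication")
```

Every mailbox can still read its own attachment; only the physical copies have been collapsed into one.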
As duplicate data is detected, the storage array stores only one copy of the data and inserts pointers for the duplicates. An index of all the data is retained so that requests can be directed to the primary copy. Deduplication thus reduces the amount of storage capacity required, because only the unique data is kept on hand.
“Enterprises are experiencing double‐digit growth in their storage footprints annually. Many of our customers virtualize a subset of their servers to start and then love the benefits so they virtualize more. A new application is purchased or an existing application that previously wasn’t supported in a virtual environment now is. The legal department changes data retention requirements or decides that items previously retained in paper form now must be scanned and retained electronically. We are in uncharted waters and CIOs have never before had to budget for this type of storage growth. They have to turn to experts in data storage and maximizing the investments made in that storage. Deduplication is one of the keys in managing this growth,” says Wade Brower, EVP of Sales and Marketing and co‐founder and co‐owner of IP Pathways.
Deduplication is nothing new. It’s been around for years. The first deduplication technologies focused on backups. Incremental backups only stored data that had been added or changed since the last backup. Deduplication has grown since those days and can now be applied to application data, home directories and file shares. Take the example of a PowerPoint presentation that is sent to 20 users. All 20 users save this presentation to a shared drive or their home directory. If the presentation is 1GB in size, 20 copies would consume 20GB of costly storage. Deduplication can eliminate the 19 additional copies and instead place pointers to the primary copy, reducing the storage requirements by 19GB.
There are two main types of data deduplication: file level and block level. File‐level deduplication (also referred to as single‐instance storage) scans the data for exact duplicate files.
If the process finds exact duplicates, pointers are inserted that refer back to the primary copy through the index. Block‐level deduplication takes it to the next level and scans for duplicate blocks of data, operating at the sub‐file level. Files are broken down into chunks, or blocks. Each block is run through a hash algorithm that generates an effectively unique identifier for that block. The identifier is compared to the index; if it already exists, the block has been processed and stored before, so a pointer to the existing block is inserted instead of a second copy.
So the question is: where, when and why do enterprises deduplicate, and with which technology (file‐level or block‐level)? Today most solutions deduplicate the data after it is written to disk, as part of the backup process. Unfortunately, this type of deduplication deals only with the lowest‐cost storage. It makes the most sense to eliminate the duplicate data as early as possible and to do so in a manner that achieves the maximum amount of deduplication. Because block‐level deduplication operates at the sub‐file level, it will eliminate more duplicate data than file‐level deduplication.
“Eliminating duplicate data as early in the storage process as possible can reap tremendous financial rewards for enterprises. The benefits then flow through the entire information lifecycle management and create other related financial rewards,” Shields points out. “Take the case of server virtualization. An enterprise has 40 Windows 2003 R2 servers. What is duplicated in all 40? The operating system. Let’s assume the operating system consumes 20GB of storage. If those 40 servers are virtualized, the operating systems alone will consume 800GB of primary storage. Utilizing block‐level deduplication, only the primary copy of the OS is stored with pointers inserted in the 39 other instances. That 800GB can be reduced to 20GB, producing incredible savings,” explains Shields.
Shields continues, “To illustrate the further associated cost savings, now say the primary storage (SAN/NAS) will be replicated to a secondary site for disaster recovery. Without deduplication, an enterprise would require connectivity, or a ‘pipe’, large enough to handle 800GB of data transfer. If the data is deduplicated on the primary storage, however, only the non‐duplicate data is transferred, bringing the requirement down by more than 95%.
This means a much less expensive “pipe” is required, producing additional cost savings.”
If Gartner is correct and desktop virtualization is the next large virtualization trend, imagine the multiplier effect. An enterprise with 500 desktops that all have Windows XP as the OS could consume 5TB of storage for the OS alone. The cost to store and potentially replicate that data could be considerable. Deduplication could accomplish tremendous cost savings in both CAPEX and OPEX.
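The block‐level process described in this section (break data into blocks, hash each block, check the index, store or point) can be sketched as follows. This is a minimal illustration under stated assumptions, not NetApp's or any other vendor's implementation: the class name, the fixed 4,096‐byte block size and the choice of SHA‐256 are all assumptions, and the "servers" are tiny synthetic images in which a shared 20‐block "operating system" stands in for the 20GB OS in the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


class BlockStore:
    """Toy block-level deduplication using a hash index."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.index = {}    # block hash -> the one stored copy of that block
        self.volumes = {}  # volume name -> ordered list of block hashes (pointers)

    def write(self, name, data):
        refs = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in self.index:  # first time this block is seen
                self.index[digest] = block
            refs.append(digest)           # duplicates become pointers only
        self.volumes[name] = refs

    def read(self, name):
        return b"".join(self.index[d] for d in self.volumes[name])

    def logical_bytes(self):
        return sum(len(self.index[d]) for refs in self.volumes.values() for d in refs)

    def stored_bytes(self):
        return sum(len(block) for block in self.index.values())


# 20 distinct blocks shared by every server, standing in for the common OS.
os_image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(20))

store = BlockStore()
for n in range(40):
    # One block of data that is unique to each server.
    unique = f"server-{n:02d}".encode().ljust(BLOCK_SIZE, b"\0")
    store.write(f"server-{n:02d}", os_image + unique)

print("logical bytes:", store.logical_bytes())
print("stored bytes: ", store.stored_bytes())
```

In this toy run the shared blocks are stored once and each server adds only its own unique block. The same index is what allows replication to send only blocks the secondary site does not already hold, which is the effect Shields describes for the replication “pipe”.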
MAKING SENSE OF DEDUPLICATION
Deduplication just makes sense in a virtual world. Why consume more storage than you have to? Maximizing IT investments makes sense, and deduplication helps to maximize
the investment made in storage and to contain the long‐term growth and associated expense of that storage. First and foremost, deduplication saves CAPEX dollars. Having less data to store means purchasing less storage capacity. “Not only can enterprises purchase less storage today utilizing deduplication technology, but they will require less storage in the future as well. Storage capacity equals disk and associated maintenance (support) costs. Deduplication allows enterprises to purchase less storage today and in the future. That equates to real cost savings today and in the future” says Brower. Just like virtualization reduces the number of physical servers in the data center and the associated cooling and power requirements, deduplication reduces the amount of storage required, which reduces the physical floor space and associated cooling and power requirements. In addition, with less data to manage, the cost associated with maintaining the storage environment can be slashed considerably. To that end, NetApp claims that by eliminating duplicate data, IT administrators can manage almost two times the number of terabytes per full‐time equivalent when compared to other storage environments. Those resources can be redeployed to more strategic projects. And with such efficiencies, enterprises can more effectively introduce applications and get to market faster with new opportunities.
THE IMPORTANCE OF A QUALIFIED VAR
Virtualization isn’t simple, and neither is storage design and administration. Designing a solution is more than simply adding up storage requirements. The landscape is cluttered with VARs authorized to sell VMware and various storage platforms. At last count, VMware had over 200 authorized reseller partners in the Iowa/Nebraska territory alone. Being an authorized reseller does not equal providing the services of a VAR. VAR stands for “value‐added reseller”. The key is the “V”. To provide value, a VAR has to bring expertise to the table. Many enterprises look to VARs to augment the internal IT staff. Virtualization and centralized storage are newer technologies and many IT staffs have no resources with experience architecting, implementing or managing these
newer technologies. The proper design and hardware/software components are critical. The wrong decision can mean poor performance or unforeseen ongoing or future costs. Making informed and intelligent decisions is more critical than ever when dealing with virtualization infrastructures. What’s more, deduplication can improve operational efficiency by simplifying otherwise complex processes, such as backup and disaster recovery. With better recovery time and recovery point objectives, enterprises can lessen the impact of disruptions in service, whether a server outage or a full‐scale natural disaster, and keep operating without missing a beat. Deduplication even helps reduce risk by minimizing the amount of data being handled. According to NetApp, enterprises can see a 50 percent drop in the amount of data being moved across the network and replicated for backup. With less data being stored, the chances of corrupting a critical file or having data fall into the wrong hands are much lower, which goes a long way toward meeting security and compliance goals. “There is a lot to gain and a lot to potentially lose. Money and performance are at stake, so making the right decisions to optimize your storage environment is critical. IP Pathways can be your pathway to informed and meaningful decisions,” says Brower.
DEDUPLICATION IN THE REAL WORLD
“Many enterprises have inefficient storage utilization in place today. There are islands of storage and many organizations have little insight into what comprises their data,” says Shields. Take the case of a financial institution that decided to leverage virtualization and NetApp data deduplication technology. The organization had 18 physical servers in production, all with their own direct‐attached storage. There were file servers, servers that housed applications and development boxes. When a file server ran out of disk space, another file server was added. All of these servers had to be powered, cooled and managed.
The customer worked with IP Pathways to architect a solution utilizing VMware to virtualize the servers and NetApp as the centralized SAN/NAS. The 18 servers were virtualized and consolidated onto 3 hosts. After the servers were virtualized and the environment built, NetApp’s block‐level deduplication allowed the total storage footprint to be reduced from approximately 2TB to less than 1 TB. This allowed the customer to utilize higher‐performing fibre channel disk, but still reduced the overall cost. The customer then chose to replicate the storage to a disaster recovery site where another set of hosts reside. This allowed the customer to achieve a recovery time objective of minutes. Deduplication was critical because it allowed the customer to replicate over a standard T1 connection. Had deduplication not reduced the size of the storage footprint, multiple T1s would have been required. The customer used the savings to fund much of the disaster recovery site infrastructure and achieve their objectives. Without deduplication, a recovery time objective of minutes may not have been fiscally achievable.
NETAPP DEDUPLICATION
NetApp deduplication combines the benefits of granularity, performance, and resiliency to provide you with a significant advantage in the race to provide for ever‐increasing storage capacity demands. Data deduplication is an important new technology in your struggle to control data proliferation. The average UNIX or Windows disk volume contains thousands or even millions of duplicate data objects. As data is created, distributed, backed up, and archived, duplicate data objects are stored unabated across all storage tiers. The end result is inefficient utilization of data storage resources. NetApp is the only tier‐one storage vendor to use block‐level deduplication on production volumes – not just on backed‐up volumes like competing implementations. By eliminating redundant data objects and referencing just the original object, an immediate benefit is obtained through storage space efficiencies. The result is twofold:
• Cost Benefit: Reduced initial storage acquisition cost, or longer intervals between storage capacity upgrades.
• Management Benefit: The ability to store “more” data per storage unit, or retain online data for longer periods of time.
IP PATHWAYS – THE REAL STORAGE AND VIRTUALIZATION EXPERTS
IP Pathways was founded in November of 2007 by Joe Shields, Wade Brower and Jim Strong. The three had all held senior positions in the IT industry, including Sr. Architect, Sr. Sales Management and Sr. Systems Engineering positions. After sitting on the buyer’s side of the table for years, they saw a common theme. As buyers they had all either purchased from, procured services from or partnered with many of the VARs in the Midwest region. Over time they became frustrated. In many instances, the local VAR simply assisted with the procurement, and when issues arose, they were sent back to the manufacturer for support. There seemed to be a void in the marketplace, especially in VAR engineering talent on the systems architecture side of the business. Shields, Brower and Strong decided that by forming IP Pathways, they could fill that void and provide valuable services to customers.
IP Pathways provides IT advice and support to businesses throughout the Midwest. They have expertise in open systems, virtualization, application portability, centralized storage, data protection and disaster recovery planning. The organization provides customers pre‐sales engineering, implementation, post‐implementation support and disaster recovery planning services as well as assisting with day‐to‐day break/fix issues that may arise. IP Pathways represents only the leading providers of virtualization and centralized data storage technologies available in the market today and attains the highest levels of certification. Hardware and software manufacturers represented include Microsoft, VMware, Citrix, NetApp, Hitachi Data Systems, F5 Networks, Force10, Cisco, Symantec, Syncsort and FalconStor. IP Pathways’ engineers take the time to understand customers’ networks, concerns and objectives and formulate solutions that are based on sound engineering principles and utilize industry‐established best practices.
The foundation is always based on years of corporate IT and consulting experience.
i Additional information is available in the Gartner report, “Emerging Technology Analysis: Hosted Virtual Desktops.” The report is available on Gartner’s website, www.gartner.com.
ii “Why CIOs Should Look To Data Deduplication,” ESG Research Report, May 2009, by Lauren Whitehouse and Brian Babineau.
iii “Why CIOs Should Look To Data Deduplication,” ESG Research Report, May 2009, by Lauren Whitehouse and Brian Babineau.
iv Data deduplication definition as defined by www.whatis.com