TheInfoPro Study-- Why Deduplication Technology is Causing a Paradigm Shift in Storage Tiering

Why Deduplication Technology Is Causing a Paradigm Shift in Storage Tiering

A TIP Research Paper

This TheInfoPro (TIP) Research Paper delivers findings from over 289 in-depth interviews with Storage professionals at large enterprises, most of them among the Fortune 1000. A new TIP Storage Study is released every six months. The following research paper is based on findings from studies conducted since December 2004.

Massive Data Growth

Standard IT practice calls for keeping enough backups to recover from the last couple months of data change in case of human error, a virus, rippling errors in a database, or complete system failure. As a result, recovery storage can consume five to ten times more capacity than the primary storage it is protecting. Data growth means more capacity required to support multiple storage tiers – increasing management, cost, and

Issues With Tape

For more than 40 years, tape has been the only cost-effective option for storing massive amounts of backup and archive data. Experts say the odds of recovery from a given tape backup are about 90%. The intense physical logistics of the process, lack of reliability, vast amounts of media that need to be purchased over and over, and the drain on IT staff are all contributing factors that make tape a liability.

Backup storage and data movement should be simple, safe, automated, and online. It should use existing IT infrastructure and standard systems, software, and networks. In other words, it should be like every other part of the IT plan.

Minimizing Tape

By reducing the amount of storage required, deduplication enables disk to be a cost-effective alternative to tape. Data is available online and onsite for longer periods, and restores become fast and reliable. Storing only unique data on disk also means that data can be replicated to remote sites for network-efficient DR and consolidated tape operations.

Top Storage Team Challenges

In TheInfoPro’s research, Storage professionals have consistently named managing storage growth, proper capacity reporting and planning, and backup management as the top pains facing their Storage organizations. The use of a storage target that identifies redundancy in incoming data streams and only saves those segments identified as unique, thereby conserving space, would clearly help with respect to the #1 pain, managing storage growth. In addition, since deduplication is the process of examining data for patterns, then identifying and storing only the unique data, it can help with respect to capacity planning – a key activity in capacity planning is the discovery of usage patterns. And finally, archiving duplicated content reduces the backup load, easing the strain of backup management.

Chart 1: Top Storage Pain Points*
(Bar chart, scale 0% to 60%. Categories, top to bottom: Managing Storage Growth; Proper Capacity Forecasting & Storage Reporting; Backup Administration & Management; Managing Costs; Storage Provisioning; Lack of Integrated Tools; Managing Complexity.)

“Tapeless and deduplication technology are top areas for improvement. We are changing our dependency on tape for recovery to archiving for recovery.” – Storage pro at a $40B+ Financial Services firm

“We back up 100 to 150 TB a night, and we think deduplication will solve this pain.” – Storage pro at a $30B-$40B Financial Services firm

Chart 2: Number of Tapes to Do a Complete Single Data Center Recovery*
(Bar chart, scale 0% to 50%. Categories, top to bottom: Over 1,000 Tapes; 501 to 1,000 Tapes; 201 to 500 Tapes; 51 to 200 Tapes.)

Chart 3: Recovery Staff Time Allocation*
(Bar chart, scale 0% to 25%. Categories, top to bottom: Tape Management; Backup Software Maintenance; Troubleshooting Backup Agents; Coordinating Backup Windows; Troubleshooting Backup Media; Executing a Recovery; Adding a New Backup Host.)

Deduplication means longer onsite retention, and thus less reliance on tape. According to TheInfoPro’s Wave 10 Storage research, almost 50% of Fortune 1000 Storage organizations require more than 1,000 tapes for a single data center recovery. More than 15% of a Backup team’s time is focused on tape management, while close to 60% of Storage teams are backing up over 450 TB of content per month. As the pain of backup administration continues to escalate, deduplication technology maintains the promise of minimizing this pain by reducing necessary hardware, complexity, and manual effort.

*Note difference in the different charts’ scales
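
As an illustration of the storage target described under Top Storage Team Challenges, one that identifies redundancy in incoming data and saves only the segments found to be unique, the following is a minimal Python sketch. It is not any vendor's implementation: the class name DedupStore, the fixed 4-byte segment size, and the use of SHA-256 fingerprints are all assumptions chosen to keep the example short.

```python
import hashlib


class DedupStore:
    """Toy deduplicating storage target (hypothetical, for illustration only)."""

    def __init__(self, segment_size=4):
        self.segment_size = segment_size  # assumed fixed-size segments
        self.segments = {}                # fingerprint -> unique segment bytes
        self.logical_bytes = 0            # total bytes received from backups

    def write(self, data: bytes) -> list:
        """Store a backup stream and return its recipe (ordered fingerprints)."""
        recipe = []
        for start in range(0, len(data), self.segment_size):
            segment = data[start:start + self.segment_size]
            fp = hashlib.sha256(segment).hexdigest()
            if fp not in self.segments:   # save only segments not seen before
                self.segments[fp] = segment
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild a backup from its recipe."""
        return b"".join(self.segments[fp] for fp in recipe)

    def physical_bytes(self) -> int:
        """Bytes actually stored after deduplication."""
        return sum(len(s) for s in self.segments.values())

    def ratio(self) -> float:
        """Logical bytes received divided by physical bytes stored."""
        return self.logical_bytes / self.physical_bytes()


store = DedupStore(segment_size=4)
monday = b"AAAABBBBCCCC"
tuesday = b"AAAABBBBDDDD"   # only the last segment changed since Monday
r1 = store.write(monday)
r2 = store.write(tuesday)
print(store.physical_bytes(), store.logical_bytes, store.ratio())  # 16 24 1.5
```

Two 12-byte backups that share two of their three segments occupy 16 bytes rather than 24. Production systems typically use much larger, often variable-size segments, and the ratios reported in this paper depend on retention and change rate rather than on anything in this sketch.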

Entire contents © 2008, TheInfoPro, Inc. ♦ 108 West 39th Street, 16th Floor ♦ New York, NY 10022 ♦ 212-672-0010

Why Deduplicate Data?

Eliminating redundant data can significantly shrink storage requirements and improve bandwidth efficiency. Enterprises typically store many versions of the same information. In the context of backup and nearline data, there is a great deal of duplicate data. The same data keeps getting stored over and over again, consuming a lot of unnecessary storage space (disk or tape), electricity (to power and cool the disk or tape drives), and bandwidth (for replication). This creates a chain of cost and resource inefficiencies within the organization. In addition, as data retention increases to satisfy regulatory and legal discovery mandates, the situation is exacerbated. Deduplication lowers storage costs since fewer disks are needed, and shortens backup / recovery times since there can be far less data to transfer.

Deduplication Effects and Replication

The effect deduplication has on replication and disaster recovery windows and tape consolidation efforts can be profound. Deduplication means a lot less data needs transmission to keep the DR site up to date, so much less expensive WAN links may be used. For remote offices, reliance on tapes and physical tape transportation can be eliminated.

Replication is fast since there is less data to send – only unique new backup or archive data is replicated between sites. For the most efficient time-to-DR, inline deduplication and replication of deduplicated data will yield the most aggressive and efficient results. In an inline deduplication approach, replication happens during the backup, significantly improving the time by which there is a complete restore point at the DR site.

Storage Initiatives

Chart 4: Top Storage Initiatives*
(Bar chart, scale 0% to 30%. Categories, top to bottom: Tiered Storage Build Out; Consolidation; Backup Redesign; Virtualization Adoption; Technology Refresh.)

Deduplication and Tiering

One of the challenging aspects of deduplication is determining where to deploy the technology. Some end users talk about initially deploying out-of-band or archiving applications, while at the same time, others are quite excited about in-band or online deployments. The trend among the Fortune 1000 is to apply deduplication first to the archive. Not surprisingly, these archive deduplication tiers are projected to be the fastest-growing tiers for 2008. This growth helps contribute to the popularity of tiered storage build-out, the top Storage initiative.

Chart 5: Tier 1, Tier 2, and Archive Tier Capacity and Future Growth*
(Bar chart, scale 0% to 100%. Series: Tier Capacity; Anticipated Growth. Categories: Tier 1; Tier 2; Archive Tier. Data labels that survive in the text, left to right: 50%, 31%, 39%, 44%, 23%.)

“We are very interested in deduplication. It will provide a new tier for recoverability.” – Storage pro at a $1B-$5B Consumer Goods / Retail firm

Backup Redesign

The promise of new systems that actively identify and help manage storage growth, the possibility of minimizing the ever-increasing tape management nightmare, and the expansion of archiving and replication protection are key reasons why Storage teams cited backup redesign as a top Storage initiative. When end users describe their backup redesign goals and motivations, they talk about consolidation and the desire to minimize their dependence on tape, while looking to simplify replication and reduce the necessary level of effort expended on backup. Deduplication technology fits squarely with these goals.

Chart 6: Backup Redesign Goals*
(Bar chart, scale 0% to 50%. Categories, top to bottom: Minimize Time Spent on Backup; Simplifying Replication; Minimize Tape Dependency; Transparent Archiving & Compliance.)

*Note difference in the different charts’ scales
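
The point made under Deduplication Effects and Replication, that only unique new backup or archive data is replicated between sites, can be sketched in a few lines of Python. This is a hedged illustration rather than a description of any product: the function names fingerprint and replicate, the dictionary-based segment stores, and the SHA-256 fingerprints are assumptions introduced here.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    """Content fingerprint used to recognize identical segments (assumed SHA-256)."""
    return hashlib.sha256(segment).hexdigest()


def replicate(local_segments: dict, remote_segments: dict) -> int:
    """Copy to the remote site only the segments it does not already hold.

    Both arguments map fingerprint -> segment bytes.
    Returns the number of bytes that had to cross the WAN.
    """
    bytes_sent = 0
    for fp, segment in local_segments.items():
        if fp not in remote_segments:     # remote already has it: send nothing
            remote_segments[fp] = segment
            bytes_sent += len(segment)
    return bytes_sent


# The local site holds three unique segments; the DR site already has one.
local_site = {fingerprint(s): s for s in (b"AAAA", b"BBBB", b"CCCC")}
dr_site = {fingerprint(b"AAAA"): b"AAAA"}

sent_first = replicate(local_site, dr_site)    # sends BBBB and CCCC
sent_second = replicate(local_site, dr_site)   # nothing new to send
print(sent_first, sent_second)  # 8 0
```

Because the second pass finds every fingerprint already present at the DR site, no data is sent at all, which is the mechanism behind the smaller WAN links the text describes.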


Deduplication Is a Storage Fundamental

The proliferation and preservation of many versions and copies of data propel much of the tremendous data growth most companies are experiencing. IT administrators are left to deal with the consequences. Because deduplication addresses one of the key elements of data growth, it should be at the heart of any data management strategy; it should be baked into the fundamental design of the system. Storage systems vendors who treat deduplication merely as a feature will check off a box on a feature list, but may not, in practice, deliver the benefits deduplication promises.

Data Domain Deduplication Storage

Data Domain has made it very easy by creating a fast, application-independent storage system (attachable as a file server over Ethernet, OST, or as a VTL over Fibre Channel). No client software or other configuration is required. As a result, Data Domain deduplication is transparent to the backup and recovery process. Data Domain systems can easily be used with various data movers and workloads, including non-backup data like email archives, reference data, and engineering revision libraries. More flexibility means that more consolidation is possible using less physical infrastructure, since all the redundancy across each of these data types is being eliminated by the same deduplication process. Deduplication effectiveness will of course be influenced by a number of factors, including how long the data is retained, how quickly it is changing between backup or archive events, and the data or application type.

State of Current Deduplication Environments

Over the last two years, deduplication has started to solidify its role in the data center. As mentioned on the previous pages, deduplication deployments have been targeted for email and semi-structured content with a high probability of duplication. In their product evaluations, end users have mentioned that products that sustain the highest duplication effectiveness are the most valued, as noted in Chart 9. But this does not mean that introducing deduplication can continue without any consideration for the impact on backup windows, recovery windows, and backup software integration, all of which follow closely behind deduplication compression / compaction effectiveness as the most important deduplication functionalities.

For the Storage teams that deployed deduplication technology in 2007, the average repository (compressed) is roughly 20 TB, and has an average effectiveness of 20:1. This represents about 400 TB of content, making the ROI and TCO justification pretty simple – so simple, in fact, that end users are starting to demand deduplication technology in file systems, document managers, email software, block storage arrays, and NAS. The span of 2008 and 2009 will clearly be an interesting time frame, one which will put a greater emphasis on storage arrays with intelligence, in addition to high capacity capability.

Chart 7: Where Deduplication Technology Should Reside – F1000, Midsize Enterprise, Europe Sample
(Bar chart, scale 0% to 60%. Categories, top to bottom: Storage Array; Backup Software; File System; VTL; All of the Above.)

Chart 8: Size of Deduplication Repository (in TB) – F1000, Midsize Enterprise, Europe Sample
(Bar chart, scale 0% to 60%. Categories, top to bottom: Over 30; 21 to 30; 11 to 20; Under 10.)

Chart 9: Most Important Deduplication Functionality – F1000, Midsize Enterprise, Europe Sample
(Bar chart, scale 0% to 60%. Categories, top to bottom: High Deduplication Effectiveness; Optimization for Backup; Backup Software Integration; Optimization for Recovery; Remote Site Based Local Recovery; Low CPU Consumption; Optimization for Replication; Integrated Archiving Awareness.)

“High deduplication is important since we want to remove tape. So the higher deduplication effectiveness, the better the ability to back up across a wide area network. We can get backup to a site where the data does not reside.” – Storage pro at $5-$10B Industrial / Manufacturing firm.

“Deduplication will be imperative for us. We have deployed it in our legacy products. I think the challenges are where the data is lost or corrupted. Throughput, maintaining bandwidth for backups, has been an issue. In the legacy space, scalability has not been an issue. The challenge is where we are going to do dedup. We need a hands-off solution for remote locations. We will need to do dedup on the server.” – Storage pro at a $40B+ Telecom & Technology firm
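
The arithmetic behind the 2007 deployment figures cited under State of Current Deduplication Environments (a roughly 20 TB repository at 20:1 effectiveness representing about 400 TB of content) is worked through below as a small Python sketch. The function names are hypothetical, and the space-saved figure is derived here from the ratio rather than stated in the study.

```python
def logical_content_tb(repository_tb: float, ratio: float) -> float:
    """Content represented by a deduplicated repository at a given N:1 ratio."""
    return repository_tb * ratio


def space_saved_fraction(ratio: float) -> float:
    """Fraction of raw capacity avoided at a given N:1 ratio."""
    return 1 - 1 / ratio


repository_tb = 20   # average compressed repository size from the study
ratio = 20           # average effectiveness of 20:1 from the study

content_tb = logical_content_tb(repository_tb, ratio)
saved = space_saved_fraction(ratio)
print(content_tb)         # 400
print(f"{saved:.0%}")     # 95%
```

At 20:1, every 20 TB written by backup applications consumes about 1 TB of disk, so the repository avoids roughly 95% of the capacity that undeduplicated storage would need.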


Customer Benefits of Data Domain Deduplication Systems

High Capacity, High Throughput, and Green – A 16-controller DDX, using the DD580 controller, provides up to 12.8 TB/hour throughput and 8-20 PB of capacity, depending on backup policy and data change rate. With internal storage, it uses as little as 1.1 watts/TB of power and as little as 9U of a 19” rack space per

Cost-effective Retention and Recovery – 20x-50x data reduction means more data can be stored onsite, increasing retention periods and improving data recoverability – providing disk storage at the price of tape.

Flexible DR Configuration – The DDX Series complements all Data Domain appliances, acting as a hub for recovery images vaulted efficiently from up to 320 smaller sites running Data Domain for DR and tape consolidation.

Ultimate Data Integrity – Data Domain’s Data Invulnerability Architecture provides the best defense against data integrity issues. Continuous recovery verification, along with extra levels of data protection such as dual disk parity RAID (RAID 6), continuously detects and protects against data integrity issues during storage of backup data and throughout the lifecycle of the backup data.

Easy Integration Into Existing Environment – Data Domain systems work with all leading backup and archiving software, and easily integrate into the backup environment using NAS, OST, and / or virtual tape interfaces without any infrastructure change.

Field-proven – Data Domain has more deduplication customers than all other vendors combined. References are available for many applications, industries, and geographies.

Deduplication Technology Adoption Patterns

According to the most recent research, 15% of Storage organizations have deduplication technology already deployed, and 59% of Storage organizations are planning on deploying deduplication technology by the end of 2008. Additionally, of the 15% that deployed the technology in 2007, over half plan on expanding the deployment in 2008. Email archiving systems, disk-to-disk (D2D) snapshots, department file servers, and applications with high levels of duplicated content are the initial targets.

Chart 10: Deduplication Planned Adoption, Wave 8 through Wave 10

                        In Use   In Pilot /    In Near-term   In Long-term   Not in
                        Now      Evaluation    Plan           Plan           Plan
*Wave 8 (Fall 2006)     22%      7%            15%            16%            41%
Wave 9 (Spring 2007)    9%       9%            12%            25%            45%
Wave 10 (Fall 2007)     15%      15%           14%            30%            25%

Chart 11: Deduplication In Use Spending Forecasts

                        Less Money   About the Same   More Money
*Wave 8 (Fall 2006)     58%          36%              7%
Wave 9 (Spring 2007)    29%          29%              43%
Wave 10 (Fall 2007)     17%          28%              56%

Deduplication has sustained the #1 position on the TIP Storage Backup and Recovery Technology Heat Index® for two consecutive waves of research, with no signs of cooling off. This ongoing popularity shows a pattern similar to that of D2D adoption in 2003, where D2D maintained a top Heat Index position for four consecutive waves. Furthermore, deduplication is modernizing D2D with replication intelligence.

Chart 12: Deduplication Heat Index Growth

                                                           Wave 8   Wave 9   Wave 10
Technology                                                 Rank     Rank     Rank
Deduplication*                                             14       1        1
Virtual Tape Library (VTL) for Open Systems                2        3        2
4 Gbps Fibre Channel                                       1        2        3
Remote Block Mirroring / Wide Area Replication (Async)     7        14       4

*Technology was previously categorized as De-Duplication / Capacity Optimized Storage / Single Backup Instance Store


Deduplication With Data Domain

Chart 13: Deduplication Roadmap Vendors, Wave 10
(Stacked bar chart, scale 0% to 50%. Vendors, top to bottom: Data Domain; EMC; Diligent; FalconStor; SEPATON; NEC; Symantec. Series: In Use Now (NOT including pilots); In Pilot / Evaluation; In Near-term Plan (up to Q1 '08); In Long-term Plan (Q2 '08 - Q4 '08).)

TIPNetwork Quotes on Data Domain

“Strengths are in the compression and the reliability has been great. They have exceeded on what they have promised. Great technical innovation.” – Storage pro at a $5B-$10B Industrial / Manufacturing firm

“This is the strongest dedup vendor on the market. They are the only vendor that could substantiate their performance claims. A lot of vendors talked the game, but did not have the product to back it.” – Storage pro at a $5B-$10B Industrial / Manufacturing firm

“This is one of those products that matches the glossy marketing materials. It is very fast and reliable, and does everything it says it does.” – Storage pro at a $1B-$5B Telecom & Technology firm

Chart 14: Data Domain Ratings
(Four boxes per category, on a scale from Poor to Excellent; box patterns shown as they appear in the text.)

Category                     Boxes
Strategic Vision             1 1 1 1
Technical Innovation         1 1 1 1
Brand / Reputation           1 1 0 0
Competitive Positioning      1 1 1 0
Interoperability             1 1 1 1
Features / Functions         1 1 1 1
Product Quality              1 1 1 1
Product Performance          1 1 1 1
Product Reliability          1 1 1 1
Value for Money              1 1 1 0
Delivery as Promised         1 1 1 1
Sales Force                  1 1 1 0
Technical Support            1 1 1 0
Ease of Doing Business       1 1 1 0

Methodology

Vendor Ratings Boxes: The vendor ratings are based on a “normal curve,” with the number of boxes colored blue determined by the distance of each vendor’s score from the mean of all vendors’ scores.

 What Are Best Practices in Choosing a Deduplication Solution?
 • Ensure ease of integration to existing environment.
 • Get industry-specific customer references.
 • Pilot the product / technology.
 • Understand the vendor’s roadmap.

 Data Domain is the leading provider of deduplication storage systems for disk backup and network-based disaster
 recovery. Over 1,500 companies worldwide have purchased Data Domain's storage systems to reduce costs and
 simplify data management. Data Domain delivers the performance, reliability, and scalability to address the data
 protection needs of enterprises from the data center core to the remote offices. Data Domain products integrate into
 existing customer infrastructures and are compatible with leading enterprise backup and archiving software.

