

Why Deduplication Technology Is Causing a
Paradigm Shift in Storage Tiering
A TIP Research Paper
This TheInfoPro (TIP) Research Paper presents findings from over 289 in-depth interviews with Storage professionals at large enterprises, most of them in the Fortune 1000. A new TIP Storage Study is released every six months. The following research paper is based on findings from studies conducted since December 2004.
Massive Data Growth
Standard IT practice calls for keeping enough backups to recover from the last couple of months of data change in case of human error, a virus, rippling errors in a database, or complete system failure. As a result, recovery storage can consume five to ten times more capacity than the primary storage it is protecting.

Top Storage Team Challenges
In TheInfoPro's research, Storage professionals have consistently named managing storage growth, proper capacity reporting and planning, and backup management as the top pains facing their Storage organizations. A storage target that identifies redundancy in incoming data streams and saves only the segments identified as unique, thereby conserving space, would clearly help with the #1 pain, managing storage growth. In addition, because deduplication examines data for patterns and then identifies and stores only the unique data, it can help with capacity planning, since a key activity in capacity planning is the discovery of usage patterns. Finally, archiving duplicated content reduces the backup load, easing the strain of backup management.
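The mechanism described in this section, a storage target that identifies redundancy in incoming data streams and saves only the segments identified as unique, can be illustrated with a short Python sketch. It is a minimal, hypothetical example rather than any vendor's design: the class name `DedupStore`, the fixed 4 KB segment size, and SHA-256 fingerprints are assumptions chosen for brevity (many production systems use variable-size, content-defined segments instead). Two simulated nightly full backups that differ in a single segment show why retained backups cost far less than their logical size.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store (hypothetical): keeps one copy of each unique segment."""

    def __init__(self, segment_size: int = 4096) -> None:
        self.segment_size = segment_size
        self.segments: dict[str, bytes] = {}  # fingerprint -> unique segment bytes
        self.logical_bytes = 0  # total bytes clients have written

    def write(self, data: bytes) -> list[str]:
        """Ingest a stream; return its recipe (ordered list of segment fingerprints)."""
        recipe = []
        for start in range(0, len(data), self.segment_size):
            seg = data[start:start + self.segment_size]
            fp = hashlib.sha256(seg).hexdigest()
            # setdefault stores the segment only if this fingerprint is new,
            # so redundant segments consume no additional space.
            self.segments.setdefault(fp, seg)
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Reassemble a stream from its recipe."""
        return b"".join(self.segments[fp] for fp in recipe)

    @property
    def physical_bytes(self) -> int:
        """Bytes actually held on disk (unique segments only)."""
        return sum(len(seg) for seg in self.segments.values())

    def ratio(self) -> float:
        """Deduplication ratio: logical bytes written / physical bytes stored."""
        return self.logical_bytes / self.physical_bytes if self.physical_bytes else 0.0


# Two nightly "full backups" of 100 distinct 4 KB segments; one segment changes.
monday = b"".join(hashlib.sha256(str(i).encode()).digest() * 128 for i in range(100))
tuesday = monday[:5 * 4096] + b"\x00" * 4096 + monday[6 * 4096:]

store = DedupStore()
r_mon = store.write(monday)
r_tue = store.write(tuesday)
assert store.read(r_tue) == tuesday  # restore is byte-for-byte exact
print(f"{store.logical_bytes} logical, {store.physical_bytes} physical, ratio {store.ratio():.2f}")
# prints: 819200 logical, 413696 physical, ratio 1.98
```

The ratio is only about 2:1 here because just two backups are retained; each additional, mostly unchanged full backup adds logical bytes but almost no physical ones, which is why longer retention pushes the ratio up.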
Data growth means more capacity is required to support multiple storage tiers, increasing management, cost, and complexity.

Chart 1: Top Storage Pain Points* (Managing Storage Growth; Proper Capacity Forecasting & Storage Reporting; Backup Administration & Management; Managing Costs; Storage Provisioning; Lack of Integrated Tools; Managing Complexity; scale 0% to 60%)

“Tapeless and deduplication technology are top areas for improvement. We are changing our dependency on tape for recovery to archiving for recovery.”
– Storage pro at a $40B+ Financial Services firm

“We back up 100 to 150 TB a night, and we think deduplication will solve this pain.”
– Storage pro at a $30B-$40B Financial Services firm

Issues With Tape
For more than 40 years, tape has been the only cost-effective option for storing massive amounts of backup and archive data. Experts say the odds of recovery from a given tape backup are about 90%. The intense physical logistics of the process, the lack of reliability, the vast amounts of media that need to be purchased over and over, and the drain on IT staff all make tape a liability. Backup storage and data movement should be simple, safe, automated, and online. They should use existing IT infrastructure and standard systems, software, and networks. In other words, they should be like every other part of the IT plan.

Chart 2: Number of Tapes to Do a Complete Single Data Center Recovery* (Over 1,000 Tapes; 501 to 1,000 Tapes; 201 to 500 Tapes; 51 to 200 Tapes; scale 0% to 50%)

According to TheInfoPro's Wave 10 Storage research, almost 50% of Fortune 1000 Storage organizations require more than 1,000 tapes for a single data center recovery. More than 15% of a Backup team's time is focused on tape management, while close to 60% of Storage teams are backing up over 450 TB of content per month. As the pain of backup administration continues to escalate, deduplication technology holds the promise of minimizing this pain by reducing the necessary hardware, complexity, and manual effort.

Minimizing Tape
By reducing the amount of storage required, deduplication enables disk to be a cost-effective alternative to tape. Data is available online and onsite for longer periods, and restores become fast and reliable. Storing only unique data on disk also means that data can be replicated to remote sites for network-efficient DR and consolidated tape operations.

Deduplication means longer onsite retention, and thus less reliance on tape.

Chart 3: Recovery Staff Time Allocation* (Tape Management; Backup Software Maintenance; Troubleshooting Backup Agents; Coordinating Backup Windows; Troubleshooting Backup Media; Executing a Recovery; Adding a New Backup Host; scale 0% to 25%)
*Note difference in the different charts’ scales
Entire contents © 2008, TheInfoPro, Inc. ♦108 West 39th Street, 16th Floor ♦New York, NY 10022 ♦212-672-0010 ♦ 2
info@TheInfoPro.net
Why Deduplicate Data?
Eliminating redundant data can significantly shrink storage requirements and improve bandwidth efficiency. Enterprises typically store many versions of the same information, and backup and nearline data in particular contain a great deal of duplicate data. The same data keeps getting stored over and over again, consuming unnecessary storage space (disk or tape), electricity (to power and cool the disk or tape drives), and bandwidth (for replication). This creates a chain of cost and resource inefficiencies within the organization, and the situation is exacerbated as data retention increases to satisfy regulatory and legal discovery mandates. Deduplication lowers storage costs because fewer disks are needed, and shortens backup and recovery times because there can be far less data to transfer.

Storage Initiatives
Chart 4: Top Storage Initiatives* (Tiered Storage Build Out; Consolidation; Backup Redesign; Virtualization Adoption; Technology Refresh; scale 0% to 30%)

Deduplication and Tiering
One of the challenging aspects of deduplication is determining where to deploy the technology. Some end users talk about initially deploying out-of-band or archiving applications, while others are quite excited about in-band or online deployments. The trend among the Fortune 1000 is to apply deduplication first to the archive. Not surprisingly, these archive deduplication tiers are projected to be the fastest-growing tiers for 2008. This growth contributes to the popularity of tiered storage build-out, the top Storage initiative.

Chart 5: Tier 1, Tier 2, and Archive Tier Capacity and Future Growth* (series: Tier Capacity and Anticipated Growth for Tier 1, Tier 2, and Archive Tier; scale 0% to 100%)

“We are very interested in deduplication. It will provide a new tier for recoverability.”
– Storage pro at a $1B-$5B Consumer Goods / Retail firm

Backup Redesign
The promise of new systems that actively identify and help manage storage growth, the possibility of minimizing the ever-increasing tape management nightmare, and the expansion of archiving and replication protection are key reasons why Storage teams cited backup redesign as a top Storage initiative. When end users describe their backup redesign goals and motivations, they talk about consolidation and the desire to minimize their dependence on tape, while looking to simplify replication and reduce the effort expended on backup. Deduplication technology fits squarely with these goals.

Chart 6: Backup Redesign Goals* (Minimize Time Spent on Backup; Simplifying Replication; Minimize Tape Dependency; Transparent Archiving & Compliance; scale 0% to 50%)

Deduplication Effects and Replication
The effect deduplication has on replication, disaster recovery windows, and tape consolidation efforts can be profound. Because far less data must be transmitted to keep the DR site up to date, much less expensive WAN links may be used. For remote offices, reliance on tapes and physical tape transportation can be eliminated altogether. Replication is fast since there is less data to send: only unique new backup or archive data is replicated between sites. For the most efficient time-to-DR, inline deduplication combined with replication of deduplicated data will yield the most aggressive and efficient results. In an inline deduplication approach, replication happens during the backup, significantly shortening the time until a complete restore point exists at the DR site.
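The replication behavior described above, where only unique new backup or archive data crosses the WAN, can be sketched as follows. This is an illustrative model under stated assumptions, not Data Domain's replication protocol: the function names `replicate` and `restore_at_dr`, the dict standing in for the DR site's segment index, and the fixed 4 KB segments are all hypothetical, and the cost of exchanging fingerprints between sites is not counted.

```python
import hashlib

SEGMENT_SIZE = 4096  # assumed fixed segment size for this sketch


def fingerprint(seg: bytes) -> str:
    return hashlib.sha256(seg).hexdigest()


def replicate(backup: bytes, dr_site: dict[str, bytes]) -> tuple[list[str], int]:
    """Replicate one backup to a DR site that may already hold many segments.

    Returns the backup's recipe (ordered fingerprints) and the number of
    segment bytes shipped over the simulated WAN link. Fingerprint lookups
    are modelled as a local dict check; a real system would exchange them
    over the network, which this sketch does not count.
    """
    recipe, bytes_sent = [], 0
    for start in range(0, len(backup), SEGMENT_SIZE):
        seg = backup[start:start + SEGMENT_SIZE]
        fp = fingerprint(seg)
        recipe.append(fp)
        if fp not in dr_site:  # only segments the DR site lacks cross the WAN
            dr_site[fp] = seg
            bytes_sent += len(seg)
    return recipe, bytes_sent


def restore_at_dr(recipe: list[str], dr_site: dict[str, bytes]) -> bytes:
    """Rebuild a backup image from segments held at the DR site."""
    return b"".join(dr_site[fp] for fp in recipe)


dr_segments: dict[str, bytes] = {}
night1 = b"".join(hashlib.sha256(str(i).encode()).digest() * 128 for i in range(50))
night2 = night1[:SEGMENT_SIZE] + b"\xff" * SEGMENT_SIZE + night1[2 * SEGMENT_SIZE:]

_, sent1 = replicate(night1, dr_segments)
recipe2, sent2 = replicate(night2, dr_segments)
print(sent1, sent2)  # prints: 204800 4096
assert restore_at_dr(recipe2, dr_segments) == night2
```

Segments are shipped as they are fingerprinted, loosely mirroring the inline approach in which replication happens during the backup; a post-process design would first land the whole backup on disk and only then deduplicate and replicate it.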
*Note difference in the different charts’ scales
State of Current Deduplication Environments
Over the last two years, deduplication has started to solidify its role in the data center. As mentioned on the previous pages, deduplication deployments have been targeted at email and semi-structured content with a high probability of duplication. In their product evaluations, end users say that the products that sustain the highest deduplication effectiveness are the most valued, as noted in Chart 9. But this does not mean that deduplication can be introduced without any consideration for its impact on backup windows, recovery windows, and backup software integration, all of which follow closely behind compression / compaction effectiveness as the most important deduplication functionalities.

For the Storage teams that deployed deduplication technology in 2007, the average (compressed) repository is roughly 20 TB, with an average effectiveness of 20:1. This represents about 400 TB of content, which makes the ROI and TCO justification simple; so simple, in fact, that end users are starting to demand deduplication technology in file systems, document managers, email software, block storage arrays, and NAS. The 2008-2009 time frame will clearly be an interesting one, putting greater emphasis on storage arrays with intelligence in addition to high capacity.

Chart 7: Where Deduplication Technology Should Reside – F1000, Midsize Enterprise, Europe Sample (Storage Array; Backup Software; File System; VTL; All of the Above; scale 0% to 60%)

Chart 8: Size of Deduplication Repository (in TB) – F1000, Midsize Enterprise, Europe Sample (Over 30; 21 to 30; 11 to 20; Under 10; scale 0% to 60%)

Chart 9: Most Important Deduplication Functionality – F1000, Midsize Enterprise, Europe Sample (High Deduplication Effectiveness; Optimization for Backup; Backup Software Integration; Optimization for Recovery; Remote Site Based Local Recovery; Low CPU Consumption; Optimization for Replication; Integrated Archiving Awareness; scale 0% to 60%)

“High deduplication is important since we want to remove tape. So the higher deduplication effectiveness, the better the ability to back up across a wide area network. We can get backup to a site where the data does not reside.”
– Storage pro at a $5B-$10B Industrial / Manufacturing firm

“Deduplication will be imperative for us. We have deployed it in our legacy products. I think the challenges are where the data is lost or corrupted. Throughput, maintaining bandwidth for backups, has been an issue. In the legacy space, scalability has not been an issue. The challenge is where we are going to do dedup. We need a hands-off solution for remote locations. We will need to do dedup on the server.”
– Storage pro at a $40B+ Telecom & Technology firm

Deduplication Is a Storage Fundamental
The proliferation and preservation of many versions and copies of data propel much of the tremendous data growth most companies are experiencing, and IT administrators are left to deal with the consequences. Because deduplication addresses one of the key elements of data growth, it should be at the heart of any data management strategy; it should be baked into the fundamental design of the system. Storage system vendors who treat deduplication merely as a feature will check off a box on a feature list, but may not, in practice, deliver the benefits deduplication promises.

Data Domain Deduplication Storage
Data Domain has made deduplication very easy by creating a fast, application-independent storage system (attachable as a file server over Ethernet, OST, or as a VTL over Fibre Channel). No client software or other configuration is required. As a result, Data Domain deduplication is transparent to the backup and recovery process. Data Domain systems can easily be used with various data movers and workloads, including non-backup data such as email archives, reference data, and engineering revision libraries. More flexibility means more consolidation is possible using less physical infrastructure, since the redundancy across all of these data types is eliminated by the same deduplication process. Deduplication effectiveness will of course be influenced by a number of factors, including how long the data is retained, how quickly it changes between backup or archive events, and the data or application type.
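This section names retention period and the rate of change between backup events as drivers of deduplication effectiveness, and reports an average repository of roughly 20 TB at 20:1, or about 400 TB of content. Both ideas reduce to simple arithmetic, sketched below in Python. The functions `logical_content_tb` and `estimated_ratio` are hypothetical, and the retention model is a deliberately simplified assumption (one stored full backup plus a fixed fraction of new data per later full), not a formula from the survey.

```python
def logical_content_tb(physical_tb: float, dedup_ratio: float) -> float:
    """Pre-deduplication content represented by a repository of a given size."""
    return physical_tb * dedup_ratio


def estimated_ratio(retained_fulls: int, change_per_backup: float) -> float:
    """Rough deduplication ratio for N retained full backups of one data set.

    Hypothetical, simplified model: the first full backup is stored once, and
    each later full adds only `change_per_backup` (a fraction of one full) of
    new unique data. It ignores compression, incrementals, and redundancy
    across different data sets.
    """
    if retained_fulls < 1 or not 0.0 <= change_per_backup <= 1.0:
        raise ValueError("need at least one full backup and a change fraction in [0, 1]")
    logical = retained_fulls  # measured in units of one full backup
    physical = 1 + (retained_fulls - 1) * change_per_backup
    return logical / physical


# Survey averages quoted in the text: ~20 TB repository at 20:1.
print(logical_content_tb(20, 20))  # prints: 400

# Longer retention raises the ratio; a faster change rate lowers it.
for fulls in (10, 30, 60):
    for change in (0.01, 0.05):
        print(f"{fulls:>3} fulls, {change:.0%} change -> {estimated_ratio(fulls, change):.1f}:1")
```

Under this toy model, holding more full backups of slowly changing data pushes the ratio well past 20:1, while fast-changing data stays much lower, which is one reason the same system can report very different ratios for different data sets.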
Deduplication Technology Adoption Patterns
According to the most recent research, 15% of Storage organizations have already deployed deduplication technology, and 59% of Storage organizations plan to deploy it by the end of 2008. Additionally, of the 15% that deployed the technology in 2007, over half plan to expand the deployment in 2008. Email archiving systems, disk-to-disk (D2D) snapshots, department file servers, and applications with high levels of duplicated content are the initial targets.

Chart 10: Deduplication Planned Adoption, Wave 8 through Wave 10
Wave | In Use Now | In Pilot / Evaluation | In Near-term Plan | In Long-term Plan | Not in Plan
Wave 8 (Fall 2006)* | 22% | 7% | 15% | 16% | 41%
Wave 9 (Spring 2007) | 9% | 9% | 12% | 25% | 45%
Wave 10 (Fall 2007) | 15% | 15% | 14% | 30% | 25%

Chart 11: Deduplication In Use Spending Forecasts
Wave | Less Money | About the Same | More Money
Wave 8 (Fall 2006)* | 58% | 36% | 7%
Wave 9 (Spring 2007) | 29% | 29% | 43%
Wave 10 (Fall 2007) | 17% | 28% | 56%

Deduplication has held the #1 position on the TIP Storage Backup and Recovery Technology Heat Index® for two consecutive waves of research, with no signs of cooling off. This ongoing popularity follows a pattern similar to that of D2D adoption in 2003, when D2D held a top Heat Index position for four consecutive waves. Furthermore, deduplication is modernizing D2D with replication intelligence.

Chart 12: Deduplication Heat Index Growth
Technology | Wave 8 Rank | Wave 9 Rank | Wave 10 Rank
Deduplication* | 14 | 1 | 1
Virtual Tape Library (VTL) for Open Systems | 2 | 3 | 2
4 Gbps Fibre Channel | 1 | 2 | 3
Remote Block Mirroring / Wide Area Replication (Async) | 7 | 14 | 4

*Technology was previously categorized as De-Duplication / Capacity Optimized Storage / Single Backup Instance Store

Customer Benefits of Data Domain Deduplication Systems

High Capacity, High Throughput, and Green – A 16-controller DDX, using the DD580 controller, provides up to 12.8 TB/hour throughput and 8-20 PB of capacity, depending on backup policy and data change rate. With internal storage, it uses as little as 1.1 watts/TB of power and as little as 9U of 19-inch rack space per petabyte.

Cost-effective Retention and Recovery – 20x-50x data reduction means more data can be stored onsite, increasing retention periods and improving data recoverability, providing disk storage at the price of tape.

Flexible DR Configuration – The DDX Series complements all Data Domain appliances, acting as a hub for recovery images vaulted efficiently from up to 320 smaller sites running Data Domain for DR and tape consolidation.

Ultimate Data Integrity – Data Domain's Data Invulnerability Architecture provides the best defense against data integrity issues. Continuous recovery verification, along with extra levels of data protection such as dual disk parity RAID (RAID 6), continuously detects and protects against data integrity issues during storage of backup data and throughout the lifecycle of the backup data.

Easy Integration Into Existing Environment – Data Domain systems work with all leading backup and archiving software, and easily integrate into the backup environment using NAS, OST, and / or virtual tape interfaces without any infrastructure change.

Field-proven – Data Domain has more deduplication customers than all other vendors combined. References are available for many applications, industries, and geographies.
Deduplication With Data Domain

Chart 13: Deduplication Roadmap Vendors, Wave 10 (vendors: Data Domain; EMC; Diligent Technologies; NetApp; FalconStor; SEPATON; NEC; Symantec. Series: In Use Now (NOT including pilots); In Pilot / Evaluation; In Near-term Plan (up to Q1 '08); In Long-term Plan (Q2 '08 - Q4 '08); scale 0% to 50%)

TIPNetwork Quotes on Data Domain

“Strengths are in the compression and the reliability has been great. They have exceeded on what they have promised. Great technical innovation.”
– Storage pro at a $5B-$10B Industrial / Manufacturing firm

“This is the strongest dedup vendor on the market. They are the only vendor that could substantiate their performance claims. A lot of vendors talked the game, but did not have the product to back it.”
– Storage pro at a $5B-$10B Industrial / Manufacturing firm

“This is one of those products that matches the glossy marketing materials. It is very fast and reliable, and does everything it says it does.”
– Storage pro at a $1B-$5B Telecom & Technology firm

Chart 14: Data Domain Ratings (scale: Poor to Excellent; ■ = colored box, □ = uncolored box)
Criterion | Data Domain
Strategic Vision | ■■■■
Technical Innovation | ■■■■
Brand / Reputation | ■■□□
Competitive Positioning | ■■■□
Interoperability | ■■■■
Features / Functions | ■■■■
Product Quality | ■■■■
Product Performance | ■■■■
Product Reliability | ■■■■
Value for Money | ■■■□
Delivery as Promised | ■■■■
Sales Force | ■■■□
Technical Support | ■■■□
Ease of Doing Business | ■■■□

Methodology – Vendor Ratings Boxes: The vendor ratings are based on a “normal curve,” with the number of boxes colored blue determined by the distance of each vendor's score from the mean of all vendors' scores.
What Are Best Practices in Choosing a Deduplication Solution?
• Ensure ease of integration to existing environment.
• Get industry-specific customer references.
• Pilot the product / technology.
• Understand the vendor’s roadmap.
Data Domain is the leading provider of deduplication storage systems for disk backup and network-based disaster
recovery. Over 1,500 companies worldwide have purchased Data Domain's storage systems to reduce costs and
simplify data management. Data Domain delivers the performance, reliability, and scalability to address the data
protection needs of enterprises from the data center core to the remote offices. Data Domain products integrate into
existing customer infrastructures and are compatible with leading enterprise backup and archiving software.