Disk and Tape Storage Cost Models
Richard Moore & David Minor

San Diego Supercomputer Center (SDSC)
University of California San Diego

Presented by: Heba Saadeldeen


  SAN DIEGO SUPERCOMPUTER CENTER

                                   at the UNIVERSITY OF CALIFORNIA, SAN DIEGO
                  Objectives & Outline
• Realistic cost estimates and projections are critical for
  storage users/providers
   • While much info is available on vendor hardware solutions …
   • … little info is available on integrated costs from the storage
     provider’s perspective
• Estimate costs for an at-scale provider to ‘store bits’

• Outline
   • Caveats
   • SDSC’s Storage Infrastructure
   • ‘Bit Storage’ Cost Estimates
       • Tape Archival Storage
       • Disk Storage
   • Projections – with scale of storage facility and into the future
   • Conclusions

             Caveats on Cost Estimates
• Sustainable storage
   • Annual cost w/ media/technology refresh & data migration
• Based on SDSC experience only
   • Include UCSD’s indirect costs – will vary by institution
   • Other providers may have different cost structure
• Based on SATA disk and enterprise-class tape systems
• Cannot be specific about vendor costs or burdening, but
  relative fractions are reasonable
• This is a snapshot as of Jan 2007; costs will decline w/ time
• Paper focuses only on single-copy ‘bit storage’ costs

‘Bit storage’ is only a fraction of the cost to ‘preserve data’

A Three-Stage Model for a Digital Preservation Environment
• Stages: Ingest, Store, Use
• ‘Bit Storage’
   • Capacity
      • Online (disk)
      • Archival (tape)
   • Single-copy reliability
   • Media/technology advances
   • Data migration
• Replication
   • Geographically distributed
   • System diversity

SDSC’s Storage Infrastructure




SDSC’s archive shows exponential growth w/ a
  consistent doubling period of ~15 months
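
A fixed doubling period implies V(t) = V0 * 2^(t / T). The sketch below works through that arithmetic for T = 15 months. The starting volume of 5,000 TB borrows the ~5 PB tape figure quoted elsewhere in this deck purely for illustration; the function names are hypothetical.

```python
def projected_volume(current_tb: float, months_ahead: float,
                     doubling_months: float = 15.0) -> float:
    """Volume after `months_ahead`, assuming a constant doubling period."""
    return current_tb * 2.0 ** (months_ahead / doubling_months)


def annual_growth_factor(doubling_months: float = 15.0) -> float:
    """Multiplier applied to the archive volume over 12 months."""
    return 2.0 ** (12.0 / doubling_months)


if __name__ == "__main__":
    start_tb = 5000.0  # assumption: ~5 PB, used only as an illustrative start
    for months in (0, 15, 30, 60):
        print(f"{months:3d} months: {projected_volume(start_tb, months):,.0f} TB")
    print(f"annual growth factor: {annual_growth_factor():.2f}")
```

A 15-month doubling period works out to roughly 74% growth per year, so capacity planning has to track the exponent rather than a fixed yearly increment.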




  Cost Elements of Bit Storage Estimates
• SDSC’s Cost Estimates Include
   • Annualized capital costs of the media (including disk controllers)
   • Other annualized capital costs
      • Disk: File system servers, Storage area network
      • Archive: Tape libraries, tape drives, disk cache, file system servers
   • Hardware maintenance and software licenses (annual)
   • Facilities costs – space, utilities (annual)
   • Labor to maintain & administer systems, migrate data (annual)
      • Disk: 3 FTEs to administer disk storage & SAN
      • Archive: 3 FTEs to administer archival systems
• Annual costs normalized by:
   • Total SATA disk deployed (~1.8 PB SATA)
   • Current volume of data stored on tape (~5 PB)
• Sustainable rate - $/TB/year
   • Assumed to be long-term storage w/ migration costs
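
The normalization described above can be sketched as follows. This is a minimal illustration, not SDSC's actual model: the class and field names are hypothetical, straight-line annualization over a fixed service life is an assumption, and every dollar figure is a placeholder rather than SDSC data.

```python
from dataclasses import dataclass


@dataclass
class StorageCosts:
    media_capital: float         # purchase cost of media (incl. disk controllers)
    other_capital: float         # servers, SAN, tape libraries, drives, disk cache
    service_years: float         # assumed lifetime before technology refresh
    maintenance_licenses: float  # per year
    facilities: float            # per year (space, utilities)
    labor: float                 # per year (administration, data migration)

    def total_per_year(self) -> float:
        # Assumption: capital is annualized straight-line over the service life.
        annualized_capital = (self.media_capital + self.other_capital) / self.service_years
        return annualized_capital + self.maintenance_licenses + self.facilities + self.labor


def cost_per_tb_year(costs: StorageCosts, capacity_tb: float) -> float:
    """Sustainable rate in $/TB/year for the given deployed capacity."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return costs.total_per_year() / capacity_tb


if __name__ == "__main__":
    # Placeholder inputs chosen for round numbers only.
    example = StorageCosts(
        media_capital=1_000_000, other_capital=500_000, service_years=5,
        maintenance_licenses=100_000, facilities=50_000, labor=150_000,
    )
    print(cost_per_tb_year(example, capacity_tb=1000))
```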

     Clarifications about their cost model
• Discounts are negotiated for capital purchase and
  maintenance
• Indirect burdening on various cost elements is included in these
  costs; these burdens will vary by institution
• Storage system costs are based on several large-scale
  purchases over the last 18 months; there will be a wide range
  of system cost based on the timing, scale, and negotiations.
• Complex sub-issues are not considered, such as resource costs
  associated with each transaction (read/write) and
  networking/bandwidth costs for users to upload/access data




Disk and tape storage cost elements
• Media cost is not the dominant cost (36%/20%)
• Additional capital infrastructure is required (15%/33%)
• Media + other capital is ~half the total cost (51%/53%)
• Labor costs are a significant cost (23%/20%)
• Facilities costs are modest (11%/5%)
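
As a rough worked example, the shares above can be combined with the ~$1500/TB/yr (disk) and ~$500/TB/yr (tape) totals given in the conclusions. Two assumptions are made here and are not stated on the slide: that each pair of percentages is ordered disk/tape, following the slide title, and that the unlisted remainder corresponds to hardware maintenance and software licenses, the one cost element from the earlier list that has no percentage here.

```python
# Percent shares as listed on the slide; disk/tape ordering is an assumption.
SHARES = {
    "disk": {"media": 36, "other_capital": 15, "labor": 23, "facilities": 11},
    "tape": {"media": 20, "other_capital": 33, "labor": 20, "facilities": 5},
}

# Approximate totals in $/TB/yr, taken from the conclusions slide.
TOTAL_PER_TB_YEAR = {"disk": 1500, "tape": 500}


def full_breakdown(kind: str) -> dict:
    """Listed shares plus the inferred remainder (maintenance and licenses)."""
    shares = dict(SHARES[kind])
    shares["maintenance_and_licenses"] = 100 - sum(shares.values())
    return shares


def dollars_per_tb_year(kind: str) -> dict:
    """Convert percent shares into approximate $/TB/yr per cost element."""
    total = TOTAL_PER_TB_YEAR[kind]
    return {name: total * pct / 100 for name, pct in full_breakdown(kind).items()}


if __name__ == "__main__":
    for kind in ("disk", "tape"):
        print(kind, dollars_per_tb_year(kind))
```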


How do costs scale with the size of the storage infrastructure?
• Economies of scale are significant as one moves up to “at-scale”
  installations ($/TB/yr decreases)
    • Vendor negotiations on media, other capital, maintenance
    • Fully utilizing servers, infrastructure and personnel
• Once infrastructure is “at-scale”, economies of scale slow down and the
  cost ($/TB/yr) levels off with installation size
    • Media, supporting capital, maintenance, facilities costs
    • Perhaps some weak economies of scale in these factors
    • Some “linear” costs occur in large quantum steps – e.g. hiring
      additional administrator, larger servers to handle load
• A portion of the cost elements (software licenses) are fixed with
  installation size => decreasing $/TB/yr for these elements
• So with “at-scale” installations, net $/TB/yr will level off and then slowly
  decline
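
The qualitative behavior described above (steep decline at small sizes, then a plateau with a slow decline) can be reproduced with a toy model combining a fixed cost, a linear cost, and a stepwise staffing cost. All parameter values below are hypothetical and chosen only to make the shape visible; none come from SDSC.

```python
import math


def cost_per_tb_year(size_tb: float,
                     fixed: float = 200_000.0,         # e.g. software licenses (hypothetical)
                     linear_per_tb: float = 300.0,     # media, capital, facilities (hypothetical)
                     admin_salary: float = 150_000.0,  # cost of one administrator (hypothetical)
                     tb_per_admin: float = 1_000.0) -> float:
    """Toy $/TB/yr model: fixed + linear + quantized labor, divided by size."""
    if size_tb <= 0:
        raise ValueError("size_tb must be positive")
    # Labor arrives in whole-person steps: one more admin per `tb_per_admin` TB.
    admins = max(1, math.ceil(size_tb / tb_per_admin))
    total = fixed + linear_per_tb * size_tb + admin_salary * admins
    return total / size_tb


if __name__ == "__main__":
    for size in (100, 500, 1_000, 5_000, 10_000):
        print(f"{size:6d} TB: {cost_per_tb_year(size):8.0f} $/TB/yr")
```

In this sketch the fixed term dominates small installations, while at large sizes the rate approaches the linear cost plus labor per TB, with small jumps whenever another administrator is added.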

What about trends in the relative cost of disk/tape storage?
• Historical trends in media costs
   • Actual purchases over SDSC’s 20-year history indicate tape
     media cost/TB declines exponentially with halving time ~3 years
   • Apples-to-apples comparisons are harder for disk, but the halving
     time is shorter
   • If these trends continue, expect costs to converge within a few
     years
• Even as costs converge, there may be good reasons to
  maintain a few large-scale centralized tape archives
   • Notion that there’s less risk to a tape cartridge than spinning disk
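
If both media costs decline exponentially, disk(t) = D0 * 2^(-t/Td) and tape(t) = T0 * 2^(-t/Tt), the two meet when t = log2(D0/T0) / (1/Td - 1/Tt). The sketch below evaluates that expression. Only the ~3-year tape halving time comes from the slide; the disk halving time and the starting cost ratio are hypothetical inputs.

```python
import math


def years_to_converge(cost_ratio: float, disk_halving_years: float,
                      tape_halving_years: float = 3.0) -> float:
    """Years until disk and tape media cost/TB are equal.

    cost_ratio is today's disk cost divided by tape cost (hypothetical input).
    """
    if cost_ratio <= 1.0:
        return 0.0  # disk is already at or below tape cost
    if disk_halving_years >= tape_halving_years:
        raise ValueError("disk must decline faster than tape for costs to converge")
    rate_gap = 1.0 / disk_halving_years - 1.0 / tape_halving_years
    return math.log2(cost_ratio) / rate_gap


if __name__ == "__main__":
    # Hypothetical: disk media costs 4x tape today and halves every 1.5 years.
    print(f"{years_to_converge(4.0, 1.5):.1f} years")
```

The result is sensitive to the gap between the two halving times, which is why the hard-to-measure disk trend matters for the convergence estimate.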




      How will costs change in the future?
• Expect that exponential declines in media costs and other IT equipment
  will continue for a while
• Cost ($/TB/yr) will decline, but how much?
• Critical issue is which cost elements will scale with the declining media
  costs and which will not?
    • Most costs scale w/ media, but labor & facility costs may not scale well
• Cost elements that do not scale well w/ media will dominate future costs,
  even at the ‘bit storage’ level
    • And we expect that for the broader ‘storage’ costs beyond bit storage,
      e.g. file management, labor costs will dominate!
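
The argument above can be illustrated with a two-bucket projection: one bucket halves on a fixed schedule, the other stays flat. The 34% non-scaling share is labor plus facilities for disk (23% + 11%), assuming the first figure of each percentage pair on the cost-elements slide refers to disk. The 3-year halving time applied to everything else is an assumption borrowed from the tape media trend, and holding labor and facilities completely flat is a simplification.

```python
def project_relative_cost(years: float,
                          scaling_share: float = 66.0,  # assumed to track media costs
                          fixed_share: float = 34.0,    # labor + facilities, held flat
                          halving_years: float = 3.0):  # assumption
    """Return (total cost with today = 100, fraction from non-scaling elements)."""
    scaled = scaling_share * 0.5 ** (years / halving_years)
    total = scaled + fixed_share
    return total, fixed_share / total


if __name__ == "__main__":
    for yrs in (0, 3, 6, 9):
        total, frac = project_relative_cost(yrs)
        print(f"year {yrs}: total = {total:5.1f}, non-scaling share = {frac:.0%}")
```

Under these assumptions the non-scaling elements pass half of the total after a single halving period, even though the overall $/TB/yr keeps falling.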




    Comparison with Commercial Services
• Many commercial companies are offering web-accessible
  storage services
• One example - Amazon S3 (aws.amazon.com/s3)
     • Cost structure (~April 2007) - $1800/TB/yr storage + upload
       $100/TB + download $130-180/TB + put/get/list transaction fees
     • # of copies and media not specified, but speculate 2+ disk copies
     • Don’t know the capital/business model
     • No Guarantees - From AWS License Agreement
“Amazon and its affiliates are not responsible for any unauthorized access to, alteration of, or
   the deletion, destruction, damage, loss or failure to store any Content or other data which
   you submit in connection with your account.”



SDSC cost estimates are “in the ballpark” w/ commercial services
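
For a rough sense of the comparison, the April 2007 S3 prices quoted above can be turned into an annual bill. This sketch uses only the per-TB rates from the slide; the put/get/list transaction fees are left out because the slide does not quantify them, and the function name and usage pattern are hypothetical.

```python
def s3_annual_cost(stored_tb: float, uploaded_tb: float = 0.0,
                   downloaded_tb: float = 0.0,
                   download_rate: float = 180.0) -> float:
    """Approximate yearly S3 cost in dollars at the ~April 2007 rates on the slide.

    download_rate should fall in the quoted $130-180/TB range.
    Transaction fees are not included.
    """
    storage_rate = 1800.0  # $/TB/yr
    upload_rate = 100.0    # $/TB
    return (storage_rate * stored_tb
            + upload_rate * uploaded_tb
            + download_rate * downloaded_tb)


if __name__ == "__main__":
    # Hypothetical usage: store 10 TB, uploaded once, read back once in the year.
    print(s3_annual_cost(10, uploaded_tb=10, downloaded_tb=10, download_rate=130.0))
```

If the speculated 2+ disk copies are correct, $1800/TB/yr is about $900 or less per stored copy, compared with SDSC’s ~$1500/TB/yr single-copy disk estimate.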


                                 Conclusions
• Initial caveat … Bit storage costs are only a fraction of the total cost for
  ‘digital preservation’
    • Ingest and use phases not addressed
    • Only a portion of storage phase costs included
• SDSC’s sustainable single-copy ‘bit storage’ costs:
    • ~$500/TB/yr for tape storage
    • ~$1500/TB/yr for disk storage
• Media costs are ~30% of the integrated ‘bit storage’ costs and total capital
  is ~50% of costs for both tape and disk
• Costs ($/TB/yr) decrease, then flatten out and eventually slowly decline w/
  scale of installation
• Costs will decline with time, but critical issue is which elements do not
  scale w/ media/technology advances
• Disk/tape integrated costs are converging

