
									                 ESD-DISDA-RP-3001




    CR1 Silo Engineer Study




                Ken Gacke,
     SAIC Systems Engineer

         DIS Digital Archive
USGS National Center, EROS
                   May 2005
                           Abstract
    This white paper details the status of the U.S. Geological Survey (USGS)
National Center for Earth Resources Observation and Science (EROS) nearline
mass storage system located in Computer Room #1. In fiscal year 2002, the
mass storage system changed Hierarchical Storage Management (HSM) software
from UniTree to SGI's Data Migration Facility (DMF). The DMF solution has been
successful; however, as the EROS compute infrastructure has moved toward
low-cost Linux solutions, it is appropriate to re-evaluate the hardware and
software infrastructure of the mass storage system.




                 Document Change Summary
                           List of Effective Pages
            Page Number                       Issue
            Title                             Original
            Abstract p. I                     Original
            Doc. Change Summary p. II         Original
            Contents p. III                   Original
            p. 4 - 19                         Original

                           Document History
  Document Number        Status/Issue      Publication Date      CCR Number
  Mass Storage System
  Trade Study            Original          May 2005







                                                                        Contents

ABSTRACT ................................................................................................................................................................ I


DOCUMENT CHANGE SUMMARY................................................................................................................. II


CONTENTS............................................................................................................................................................. III


1.0      INTRODUCTION ...........................................................................................................................................4

PURPOSE ......................................................................................................................................................................4
MASS STORAGE HISTORICAL PERSPECTIVE ..........................................................................................................4

2.0      SYSTEM OVERVIEW...................................................................................................................................8

NEARLINE HARDWARE: .............................................................................................................................................8
HIERARCHICAL STORAGE MANAGER (HSM):.......................................................................................................8
COMPUTE SERVER HARDWARE: ...............................................................................................................................9

3.0      SYSTEM UPGRADE OPTIONS ............................................................................................................... 12

NEARLINE HARDWARE:.......................................................................................................................................... 12
HIERARCHICAL STORAGE MANAGER (HSM):.................................................................................................... 14
COMPUTE SERVER HARDWARE: ............................................................................................................................ 16
DISK CACHE: ............................................................................................................................................................ 18
FIBRE SWITCHES: .................................................................................................................................................... 18

4.0      CR1 SILO UPGRADE RECOMMENDATIONS................................................................................... 19







                                1.0 Introduction
Purpose

              This document outlines the current status of the USGS National Center for
          Earth Resources Observation and Science (EROS) nearline mass storage
          system located in Computer Room #1 and compares the current solution with
          alternative hardware/software solutions.
Mass Storage Historical Perspective

             EROS installed a large mass storage system in 1993 to allow users to store
          and retrieve files to tape without operator intervention. The original
          configuration included an SGI server with the UniTree Hierarchical Storage
          Manager (HSM) and a StorageTek ACS 4400 (silo) with 3480 tape technology,
          giving a total system capacity of 1 Terabyte (TB). The StorageTek silo has
          the capacity to store 5,600 tape cartridges and has the flexibility to handle
          mixed media types and various tape drive configurations.
              In order to meet storage requirements, the mass storage system has had
          several iterations of upgrades and now has a total capacity of 1 Petabyte (PB).
          The HSM includes a 500TB SGI Data Migration Facility (DMF) license, and the
          hardware consists of an SGI Origin 300 server with 4x500MHz R12K CPUs, a
          14TB front-end disk cache, and a StorageTek PowderHorn tape library with eight
          StorageTek 9840A tape drives and three StorageTek 9940B tape drives. The
          table below compares the characteristics of the two tape drives.

          Drive Type   Capacity   Performance   Average Data Access   Drive Cost   Media Cost
          9840A        20GB       10 MB/sec     12 sec                $23K         $3.50/GB
          9940B        200GB      30 MB/sec     120 sec               $23K         $.40/GB






    The mass storage system currently stores 160TB; the average monthly
 data archive rate is 8TB, and the average monthly data retrieval is 20TB. The
 graphs below show the usage trends of the mass storage system.
     Graph 1-1 details the data growth from the initial installation of the StorageTek
 tape library. From 1993 – 1996, the data stored on the system was mainly
 AVHRR data. In 1997, the tape drives were upgraded to handle the DOQQ
 dataset, and in 2001 the installation of 9940B tape drives allowed for a high
 storage growth rate that included Urban Area tiled datasets, Seamless static
 Oracle table space backups, the MODIS and EO-1 archives, and numerous
 other datasets.
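 As a quick sanity check on the usage figures above (8TB/month archived,
 20TB/month retrieved, 160TB stored against a 500TB DMF license), the monthly
 volumes can be converted into implied round-the-clock transfer rates and
 remaining license headroom. This is a rough sketch only; it assumes decimal
 units (1 TB = 1,000,000 MB), 30-day months, and a constant growth rate:

```python
# Rough sanity check on the CR1 Silo usage figures quoted above.
# Assumptions (not from the source): decimal units, 30-day months,
# and a constant 8TB/month net archive growth rate.

SECONDS_PER_MONTH = 30 * 86400

def sustained_rate_mb_s(tb_per_month):
    """Average transfer rate implied by a monthly data volume."""
    return tb_per_month * 1_000_000 / SECONDS_PER_MONTH

archive_rate = sustained_rate_mb_s(8)    # 8TB archived per month
retrieve_rate = sustained_rate_mb_s(20)  # 20TB retrieved per month

# Months until the 500TB DMF license is exhausted, starting from
# 160TB stored and growing 8TB/month.
months_of_headroom = (500 - 160) / 8

print(f"archive:  {archive_rate:.1f} MB/sec sustained")
print(f"retrieve: {retrieve_rate:.1f} MB/sec sustained")
print(f"license headroom: {months_of_headroom:.1f} months")
```

 At roughly 3MB/sec of archive and 8MB/sec of retrieval averaged around the
 clock, the volumes are broadly consistent with the transfer rates shown in
 Graph 1-3, and the 500TB license has roughly three and a half years of
 headroom at the current growth rate.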




Graph 1-1 -- CR1 Silo Storage Growth
[Chart: "Nearline Data Storage" -- terabytes stored, Dec-93 through Dec-04,
with series for General Archive, Ortho, PDS, and Total.]




                            Graph 1-2 shows the average monthly data archived and retrieved. In
                         general, the monthly data retrieval exceeds data ingest. The high data retrieval
                         implies that the storage system is not being used purely as an archive device;
                         in many cases the datasets stored are the working copy that is used to generate
                         products and to refresh the offline archive.




Graph 1-2 -- CR1 Silo Average Monthly Data Throughput
[Chart: "Nearline Monthly Average Data Archive/Retrieve" -- terabytes per
month, 1993 through 2005, with series for Data Archived and Data Retrieved.]




           Graph 1-3 shows that transfer rate performance has increased even
        with the high storage growth rate. Performance gains have come from the
        strategic injection of new technologies throughout the system's life
        cycle. One caveat to note is that the average archive performance is
        pulled down by the data compression processing of the weekly offsite
        backups. For example, excluding the offsite backup for 2005, the average
        archive transfer rate is 11MB/sec.


Graph 1-3 -- CR1 Silo Performance
[Chart: "Nearline Average Transfer Rate" -- MB/sec, 1994 through 2005, with
series for Data Archived and Data Retrieved.]



                         2.0 System Overview


            Although the CR1 Silo mass storage system is successfully transferring up to
        2TB on a daily basis, there is a need to re-evaluate the system to take advantage
        of technology advances. The mass storage system can be subdivided into three
        main areas:

Nearline hardware:

            The StorageTek tape library is a 6,000 slot PowderHorn with eight 9840 tape
        drives and three 9940B tape drives. Currently there are 2,900 slots free.
           StorageTek has discontinued production of the PowderHorn library and is
        now shipping the next generation SL8500. It is anticipated that StorageTek will
        provide support for the PowderHorn through 2010.
            StorageTek's current tape drive technology includes the 9840C tape drive,
        which has 2x the capacity and 3x the performance of the current 9840A tape
        drives.
            In the fall of 2005, StorageTek will release the next generation data
        capacity tape drive, Titanium, which stores 500GB per tape and has a 120MB/sec
        transfer rate. The Titanium tape drive replaces the existing 9940B tape
        technology, offering 2.5x the capacity and 4x the performance.
           The table below compares the various tape drive technologies (the 9840A
        and 9940B drives are currently installed on the CR1 Silo).

        Drive Type   Capacity   Performance   Average Data Access   Drive Cost   Media Cost
        9840A        20GB       10MB/sec      8 sec                 NA           $3.50/GB
        9840C        40GB       30MB/sec      8 sec                 $23K         $1.75/GB
        9940B        200GB      30MB/sec      41 sec                $23K         $.40/GB
        Titanium     500GB      120MB/sec     Unknown               ~$23K        Unknown
        LTO-3        400GB      80MB/sec      72 sec                ~$10K        $.26/GB
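        One way to read this table is to convert the rated figures into the time
        needed to stream a full cartridge, which determines how long a drive is
        tied up per full-cartridge pass. The sketch below derives this from the
        table's own capacity and performance numbers; it assumes decimal units
        (1 GB = 1000 MB) and sustained rated speed, which real drives rarely hold:

```python
# Time to write (or read) one full cartridge at the rated transfer speed.
# Capacities and rates come from the drive comparison table above;
# decimal units and sustained rated throughput are assumptions.

DRIVES = {
    # name: (capacity_gb, rate_mb_per_sec)
    "9840A":    (20,  10),
    "9840C":    (40,  30),
    "9940B":    (200, 30),
    "Titanium": (500, 120),
    "LTO-3":    (400, 80),
}

def full_cartridge_minutes(name):
    capacity_gb, rate = DRIVES[name]
    return capacity_gb * 1000 / rate / 60

for name in DRIVES:
    print(f"{name:9s} {full_cartridge_minutes(name):6.1f} min per full cartridge")
```

        The access-class 9840 drives trade capacity for short cartridge passes
        (about 20-35 minutes), while the capacity-class 9940B ties up a drive for
        nearly two hours per full cartridge, which is why the two drive classes
        coexist in the silo.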




Hierarchical Storage Manager (HSM):

           The HSM software is SGI Data Migration Facility (DMF). DMF was originally
        developed by Cray Research and was ported to SGI IRIX when SGI purchased
        Cray. The two companies have once again separated, with SGI retaining the
        DMF product. The CR1 Silo currently has a 500TB license, and ongoing support
        is purchased from SGI at an annual maintenance cost of $18K.
           DMF has been extremely successful in handling the high rate of data
        storage growth and the continued increase in data throughput. DMF support
        costs are low when compared with other HSM packages (ADIC StorNext and
        Sun SAM-FS). The main issue with the current DMF configuration is the
        business stability of SGI.


Compute server hardware:

           The current server is an SGI Origin 300 configured with four R12K 500MHz
        MIPS CPUs, 2GB memory, Gigabit Ethernet, 12 SCSI channels, and four Fibre
        Channel ports. Disk cache consists of 2TB of high performance Fibre Channel
        RAID and 14TB of lower performance SATA RAID.
            While SGI continues to support the IRIX line of servers, the company's future
        is with the Altix Linux server. The Altix server utilizes Intel 64-bit CPUs and can
        scale up to 1,024 processors.
            The 2TB SGI TP9400 RAID storage system is near end of life. The RAID has
        functioned well, but should be replaced. The RAID disk cache and StorageTek
        9940B tape drives are connected to the server through three Brocade switches.
        Fibre Channel technology is moving toward the 4Gb interface; therefore, the
        switches will require replacement.
           The selection of HSM software dictates the server platform. The table below
        shows the server options for the various HSM software packages.


             HSM             Server Option 1      Server Option 2          Server Option 3
             SGI DMF         SGI Origin/IRIX      SGI Altix/Linux
             ADIC StorNext   IA32/Linux (Dell)    IA64/Linux (SGI,         Sun/Solaris
                                                  HP, etc.)
             Sun SAM-FS      Sun/Solaris
             AMASS           Sun/Solaris          SGI/IRIX



            The table below compares four HSM products. DMF is currently being used
        in Computer Room #1, and AMASS is currently being used within Computer
        Room #2. From internal experience, DMF provides higher performance and a
        more reliable HSM than AMASS. ADIC StorNext is the next generation HSM
        and essentially replaces the older AMASS product; AMASS is therefore
        included for comparison only and is not intended as a viable solution.






                    StorNext          AMASS                   SGI DMF                  SUN SAM-FS
  Support           ADIC              Resellers               SGI                      SUN/STK
  Cost – 500TB      $300K             $205K                   $160K                    $350K
  Cost Basis        Data Stored       Robotic Device          Data Stored              Archive Slots
  Maint Cost –      $90K              $30K                    $20K                     $75K
  500TB
  Platforms         SUN, SGI, Linux   SUN, SGI, NT, …         SGI                      SUN
                    …
  Library Support   Many – STK,       Many – STK, ATL, …      STK                      STK, Ampex, ???
                    ATL, …
 Multi Library      Yes               No                      Yes                      Yes
OS Layer             Application on   Within the Unix         Within the XFS file      V-node layer
                     top of the OS    Kernel                  system
Support SANs         Yes – StorNext   No                      Yes – SGI CXFS           Yes – Sanergy
                     File System
Scalability          Good             Poor *                  Good                     Good **

Sys                 Good **           Good                    Good                     Good **
Administration

System Recovery     Good **           Average – Metafile      Good – Database          Good **
                                      Database journal        journal, Metadata
                                      plus backup to          database stored on
                                      archive media           mirrored filesystem,
                                                              nightly backups
Volume Groups       Unknown           2048 volume groups      2048 File Systems –      256 File Systems
                                      – all share same disk   each with own disk       – each with own
                                      cache                   cache                    disk cache
Logging             Good **           Average                 Good                     Good **
Capability

Overall             Good **           Average – Poor          Good                     Good **
Performance                           performance once
                                      Disk Cache is full.
                                      Tape Starvation.
Media               Device Rate       3-4MB/sec               Device rate              Device rate **
Performance

Media Support       Tape/Optical      Tape/Optical            Tape                     Tape/Optical

File Read Ahead     Unknown           Yes, plus media is      No, however, can         Yes
on Archive Media                      left in drive           leave the archive
                                      indefinitely            media within drive
                                                              for specified # sec
Data File Access    File              Block                   File                     Block

Network Access      All protocols     All protocols           All protocols            All protocols
Disk Cache Size     Unknown           Unknown                 Unlimited                Unlimited
Disk Cache          Unknown           Single File             File system (may be      File system (may
                                                              striped or mirrored)     be striped or
                                                                                       mirrored)






                      StorNext         AMASS                   SGI DMF                  SUN SAM-FS
Disk Cache             Site            No purging algorithm    Site Configurable        Site Configurable
Purge                  configurable    – once full, AMASS
                                       deletes oldest file
Indication of Files   Unknown          No                      Yes, but only on the     Yes, sfind
on Disk Cache                                                  local server             command
Multiple Data         Unknown          Yes, up to 4            Yes                      Yes, up to 4
Copies
Trash Can Utility     Unknown          No                   No, however the             Unknown
                                                            ability to restore data
                                                            from nightly backups
                                                            exists.
     * Database consistency check after a crash can take hours (disk cache, etc.)
     ** Indicated within literature







                   3.0 System Upgrade Options


Nearline Hardware:

           Within the nearline storage architecture, there are a multitude of choices for
        upgrading the existing infrastructure. Below is a brief summary of some of the
        major components.
               1) Tape Library – The existing StorageTek PowderHorn will be supported
                  at least through 2010. The new high capacity 500GB per cartridge
                  Titanium tape drive will be supported in the PowderHorn. There is little
                  rationale to upgrade the tape library at this time. While StorageTek
                  has served EROS well, there are alternative tape library vendors such
                  as ADIC. Items that could change the rationale are:
                      a. EROS tape library consolidation – The StorageTek SL8500 can
                         be virtualized making it conceivable that all projects utilize a
                         central resource. Consolidation could include nearline (CR1
                         silo, DAAC), archive (LAM), and backups (CR1 Legato) type
                         applications.
                      b. LTO tape drive technology – LTO technology is less costly
                         than the StorageTek enterprise tape drives. LTO is not
                         supported within the existing StorageTek PowderHorn, but is
                         supported in the next generation SL8500.
                              i. Although LTO has been successfully deployed at EROS
                                 for offline archive, there is concern that the LTO tape
                                 drive/media would not fare well in a heavy usage
                                 nearline environment. The CR1 Silo has experienced
                                 up to 140,000 mounts in a single month, and for FY05
                                 the average number of mounts per month is 40,000.
                              ii. With the StorageTek tape drives there is an advantage
                                  in that we can dump the drive registers to analyze the
                                  health of the drive and/or media.
               2) Capacity Tape Drive – The StorageTek Titanium tape drive is
                  scheduled for release in September 2005. The Titanium tape capacity
                  is 500GB and I/O performance is 120MB/sec. The Titanium tape drive
                  is not backward read or write compatible with the existing 9940B. In
                  addition to purchasing the tape drive, new media would be required
                  and existing data would need to be migrated. With greater than 2,000
                  slots available within the PowderHorn tape library, there is limited
                  rationale for upgrading to the Titanium drive at this time. Items that
                  could change the rationale are:





        a. The CR1 Silo currently has three 9940B tape drives. If usage
           patterns determine that additional tape drives are required, it
           would be difficult to purchase old technology. We would recommend
           purchasing a minimum of two Titanium tape drives while
           maintaining the 9940B drives for a period of time.
        b. A project with large storage requirements that exceed the current
           capacity.
 3) Access Tape Drive – The existing 9840A tape drives have been
    installed for greater than 5 years. The existing 9840A drives are not
    compatible with the new StorageTek SL8500 library. The current
    StorageTek access drive is the 9840C, with 2x the capacity and 3x the
    speed of the 9840A. The 9840C tape drive is backward read compatible
    and utilizes the existing media. With the decline in the cost of disk
    storage, an argument can be made that access tape drives are no
    longer required. To retain the scalability of the mass storage system,
    it is recommended to maintain the data access tape drive technology.
    The table below summarizes the capabilities of each technology as it
    relates to the CR1 silo (i.e., 9840 media can be reused). The 9840C
    tape drive would allow all tape drives to be accessed via the fibre
    channel switch, which will help in maintaining system availability.


                  StorageTek 9840C                Bulk RAID
     Cost         Replace current eight           Assuming an average cost
                  drives with four 9840C          of $5/GB, the $100K would
                  tape drives at a cost of        procure ~20TB.
                  $100K.
     Life         5yr plus                        3-4yr
     Cycle
     Data         Data not on disk cache,         Immediate data access for
     Access       average data access of 30       20TB of storage; however,
                  seconds.                        access to data on 9940B
                                                  tape would be 3-4 minutes.
     Data         Existing 9840 media would       20TB storage. Cost for
     Capacity     provide 73TB storage.           additional storage is
                  Cost for additional storage     $5/GB.
                  is $1.88/GB.
     Reliability  Data stored on tape is not      Data stored within RAID;
                  fault tolerant; however,        however, there is still risk
                  media failure is limited to a   of file system corruption
                  single piece of media           (file system failure,
                  (40GB).                         multiple disk drive failure,
                                                  etc.). Data is also stored
                                                  on tape, but would require
                                                  time (multiple days) to
                                                  recover.
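     The cost trade in this table reduces to effective dollars per gigabyte.
     The sketch below recomputes it from the table's own figures ($100K either
     way; 73TB of reusable 9840 media versus ~20TB of RAID at $5/GB). Treat it
     as illustrative arithmetic assuming decimal GB, not a procurement quote:

```python
# Effective cost per GB of the two $100K options from the table above.
# All figures come from the table; decimal GB (1 TB = 1000 GB) is assumed.

BUDGET = 100_000              # dollars, either option

tape_capacity_gb = 73_000     # existing 9840 media reused, per the table
raid_cost_per_gb = 5.0        # bulk RAID estimate, per the table

tape_cost_per_gb = BUDGET / tape_capacity_gb
raid_capacity_gb = BUDGET / raid_cost_per_gb

print(f"9840C option: {tape_capacity_gb/1000:.0f}TB at ${tape_cost_per_gb:.2f}/GB")
print(f"RAID option:  {raid_capacity_gb/1000:.0f}TB at ${raid_cost_per_gb:.2f}/GB")
print(f"RAID cost to match 73TB: ${tape_capacity_gb * raid_cost_per_gb:,.0f}")
```

     Matching the 73TB of reusable 9840 capacity with RAID would cost roughly
     $365K, which is the core economic argument for retaining access-class tape.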


Nearline Hardware Recommendation

            In the near term, it is recommended to upgrade from the current 9840A tape
        drives to the 9840C tape drive. With the increased performance of the 9840C,
        the number of tape drives can be reduced to four drives. Estimated total cost is
        $100K.
            Long term, the tape library and capacity tape drives will require replacement.
        Estimated cost of a 5,000 slot tape library is $250K. Estimated cost of high
        capacity enterprise tape drive is $23K. The new capacity tape drive will require
        new media to be purchased.
Hierarchical Storage Manager (HSM):

            The HSM upgrade options are listed in the following table, in which cost
        estimates are Level 0 (+/- 50%) to store 500TB.
            The current SGI DMF configuration is the most economical in that the 500TB
        license already exists, and the annual support cost of DMF is substantially less
        than that of the other two HSM packages. The drawback of DMF is the business
        stability of SGI, which continues a downhill slide with receding revenue and has
        not posted a profit in the past ten quarters.
           The ADIC StorNext Data Manager is the HSM companion software to the
        StorNext Filesystem (a clustered file system) that is utilized by both Landsat and
        LPDAAC. The StorNext software is developed on an open systems architecture
        and would fit well within the EROS architecture. However, there are many
        unknowns with the StorNext software; therefore, substantial integration and
        testing time would be required.
            The Sun SAM-FS solution requires a Sun/Solaris server. There are many
        unknowns with the SAM-FS software; therefore, substantial integration and
        testing time would be required.

        Table 3-1 -- 5-Year HSM Costs with 5% Inflation

                      DMF          StorNext       SAM-FS
        Year 1       $23,000.00    $300,000.00    $350,000.00
        Year 2       $24,150.00     $90,000.00     $70,000.00
        Year 3       $25,357.50     $94,500.00     $73,500.00
        Year 4       $26,625.38     $99,225.00     $77,175.00
        Year 5       $27,956.64    $104,186.25     $81,033.75

           Total   $127,089.52    $687,911.25    $651,708.75
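As a cross-check, the figures in Table 3-1 can be reproduced with a short script: year 1 is the initial cost as quoted, and each subsequent year escalates the annual support fee by 5%. Dollar figures are the Level 0 (+/- 50%) estimates from the text.

```python
# Reproduces Table 3-1: year 1 is the initial cost; years 2-5 escalate the
# annual support fee by 5% per year.

def five_year_cost(year1, annual_support):
    """Per-year costs: year 1 as quoted, then support inflated 5%/yr."""
    return [year1] + [annual_support * 1.05 ** k for k in range(4)]

for hsm, year1, support in [("DMF", 23_000, 24_150),
                            ("StorNext", 300_000, 90_000),
                            ("SAM-FS", 350_000, 70_000)]:
    costs = five_year_cost(year1, support)
    print(f"{hsm}: total ${sum(costs):,.2f}")
# DMF: total $127,089.52
# StorNext: total $687,911.25
# SAM-FS: total $651,708.75
```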






HSM Recommendation

          Since DMF is performing adequately and is the most economical option, it is
       recommended to stay with the DMF solution. If SGI falters, its intellectual
       property would likely be picked up by another vendor, and a decision would need
       to be made at that time (DMF would continue to operate, but possibly without
       support). If the current 500TB license requires an upgrade, the HSM options
       should be reevaluated.
           From the discussion topics above, a numerical weight was assigned to the
       importance of each criterion.
             Reliability. Each HSM has a high degree of data reliability. DMF has the
              advantage that it is incorporated within SGI’s standard XFS file system.
             Initial Cost. EROS already has a 500TB DMF license; whereas, the other
              HSM solutions require a new purchase.
             Maintenance Cost. Long-term maintenance is considerably less for
              DMF.
             Performance. SAM-FS is block oriented; therefore, data read access is
              better. DMF performance is good in that it is integrated within XFS,
              especially referencing metadata.
             Data Migration Risk. Data migration not required with DMF; therefore,
              there would be no risk. StorNext and SAM-FS have the ability to decipher
              the DMF database to read data from tape.
             Administration. All three HSMs have similar capabilities.
             SAN Support. DMF integrates directly with SGI’s CXFS clustered file
              system; StorNext Data Manager is built on the StorNext clustered SAN
              file system.
             Leverage Current Infrastructure. DMF is already licensed and the
              StorNext Filesystem is already deployed at EROS; SAM-FS would need
              to be procured.
             Vendor Financial Stability. ADIC and Sun are currently more stable
              than SGI. However, SGI’s DMF has an install base of greater than 300
              sites and is growing by approximately 30 per year.


                                                     Raw Score (1-10)       Weighted (RW x raw)
                Selection Criteria            RW #  StorNext SAM-FS  DMF   StorNext SAM-FS  DMF
                    *Reliability               10       7      7      8       70      70     80
                   *Initial Cost               10       3      3      9       30      30     90
               *Maintenance Cost               10       3      4      9       30      40     90
                  *Performance                  9       8      9      8       72      81     72
              *Data Migration Risk              9       8      8     10       72      72     90
                 Administration                 8       7      7      7       56      56     56
                 Support of SAN                 8       9      7      8       72      56     64
         Leverage Current Infrastructure        8       6      6      9       48      48     72
            Vendor Financial Stability          8       9      7      5       72      56     40
             Total Weighted Score                                            522     509    654

              * => Required Items
          RW # => Relative Weight 1-10
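The weighted totals in the table above can be recomputed directly; each raw score (1-10) is multiplied by its criterion's relative weight (RW #) and summed per HSM.

```python
# Recomputes the weighted totals from the HSM selection criteria table.

weights = [10, 10, 10, 9, 9, 8, 8, 8, 8]   # RW # column, top to bottom

raw_scores = {"StorNext": [7, 3, 3, 8, 8, 7, 9, 6, 9],
              "SAM-FS":   [7, 3, 4, 9, 8, 7, 7, 6, 7],
              "DMF":      [8, 9, 9, 8, 10, 7, 8, 9, 5]}

def weighted_total(scores):
    return sum(w * s for w, s in zip(weights, scores))

for hsm, scores in raw_scores.items():
    print(f"{hsm}: {weighted_total(scores)}")
# StorNext: 522
# SAM-FS: 509
# DMF: 654
```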







Compute server hardware:

             The current HSM server is an SGI Origin 300 with 4x500MHz CPUs running
        the IRIX operating system. The system routinely archives and/or retrieves 2TB of
        data per day. To obtain full tape drive performance, the system is required to
        sustain 250MB/sec. With continued storage growth, system utilization is greater
        than 90%, indicating that additional compute resources are required.
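For context, the routine 2TB/day workload can be converted to an average transfer rate; the 250MB/sec figure is the sustained peak needed to keep the tape drives streaming at full rate, well above the daily average.

```python
# Converts the 2TB/day archive/retrieve volume into an average transfer rate.

MB_PER_TB = 1_000_000          # decimal units, as commonly quoted for storage
SECONDS_PER_DAY = 24 * 60 * 60

average_rate = 2 * MB_PER_TB / SECONDS_PER_DAY
print(f"Average rate for 2TB/day: {average_rate:.1f} MB/sec")  # ~23.1 MB/sec
print("Required sustained peak:  250 MB/sec")
```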
             The current SGI Origin 300 is not fault tolerant, and the SGI support contract
        is based on 9x5 support. Locally, we do not have any spare hardware for the
        Origin 300, so a hardware failure could result in the system being down for
        multiple days. For example, if a hardware failure occurred on a Friday evening
        before a Monday holiday, the system could be down through the following
        Wednesday. The current server is a single C-Brick (CPU) connected to a P-Brick
        (PCI I/O) via a high-speed NumaLink interconnect; if either the C-Brick or the
        P-Brick fails, the system would be down. Options to reduce the exposure to
        system downtime include the following:
               1) SGI provides the capability to operate DMF in a cluster environment;
                  however, it is costly in that it requires a second server along with the
                  CXFS clustered file system, and SGI FailSafe software.
               2) Replace current system with two Origin 350 C-bricks connected with
                  the high-speed NumaLink interconnect. If one of the C-bricks fails, the
                  system could be booted up on a single C-brick. Note that I/O channels
                  on the failed C-brick would also be unavailable.
               3) Procure a set of onsite spares. This would basically require the
                  purchase of a spare system, which could also be used as the
                  development server.
             Assuming the HSM selection is DMF, the server is required to be either an
        SGI Origin running IRIX or an SGI Altix running Linux. The SGI Altix server
        utilizes 64-bit Intel CPUs; therefore, even though it runs Linux, the ITS support
        would differ from the standard Dell/Redhat configurations. All Linux updates
        would be supplied by SGI.


                                Origin/IRIX                    Altix/Linux
             Server Cost        8 CPU system cost              8 CPU system cost
                                estimated at $60K              estimated at $55K
             Installation       Quick installation with        Integration time required with
                                downtime limited to 2-4        new architecture.
                                hours.
             ITS Support        Similar to current system.     ITS cost savings not
                                IRIX is well known by          expected in that OS releases
                                current staff.                 come from SGI.



             HSM Support          Same OS as current              DMF was ported to the Altix
                                  system; therefore, limited      Linux one year ago. The
                                  risk with new server.           DMF metadata database
                                                                  would require a conversion.
             CPU                  Clock speed has                 Intel IA64 faster than the
             Performance          increased by 37%.               MIPS.
             I/O                  I/O performance similar to      I/O performance expected to
             Performance          current system.                 be similar to the Origin
                                                                  server.
             Compilers            Floating license C, C++,        C compiler would need to be
                                  Fortran available.              purchased.
             Local Software       With the same OS, no            Software applications would
             Applications         porting of software is          need to be ported. Minimal
                                  required.                       modifications expected, time
                                                                  estimate of 1 person month.
             Local Hardware Hardware engineers are       Rely more heavily on the
             Engineer       familiar with the SGI Origin SGI field engineer located in
                            architecture.                Omaha.
             Long term            SGI will support IRIX at        75% of SGI sales are now
             support              least through 2010.             the Altix.


Server Recommendation

            Assuming the HSM remains DMF, the server options are limited to an SGI
        Origin server running IRIX or an SGI Altix server running Linux. From the
        weighted table below, it is essentially a toss-up as to which server is preferred.
        The SGI Origin would allow for an easy transition; the SGI Altix would fit better
        with the EROS Linux architecture but would require more effort to integrate.


                                                   Raw Score (1-10)    Weighted (RW x raw)
                   Selection Criteria         RW #   Origin   Altix      Origin    Altix
                           *Cost               10      6       8          60        80
                        Installation            7      9       5          63        35
                       *ITS Support            10      8       8          80        80
                      *HSM Support             10      9       8          90        80
                   CPU Performance              9      7       9          63        81
                    I/O Performance             9      8       8          72        72
                         Compilers              4      9       7          36        28
               Local Software Applications      8      9       6          72        48
                Local Hardware Engineer         7      8       6          56        42
                  Long Term Support            10      3       9          30        90
                 Total Weighted Score                                    622       636
                  * => Required Items
              RW # => Relative Weight 1-10








Disk Cache:

            Additional disk cache improves performance by reducing the amount of data
        that requires tape access. Historically, the CR1 Silo has operated with a disk
        cache ratio of 2%, but with the recent installation of the TP9300S the ratio is
        currently 10%. The higher disk cache ratio is now required because several
        projects, such as Emergency Response and Information Access Data
        Development (IADD), make their data available to users on the Internet.
            The current SGI TP9400 RAID was installed in 2002 and consists of 40x73GB
        fibre channel disk drives. The TP9400 RAID is a high performance storage
        solution that is used to store the DMF metadata database and first tier disk cache.
        The TP9400 RAID is no longer upgradeable, and SGI support is anticipated to
        terminate in 2008. The current annual maintenance cost for the TP9400 RAID is
        $5.6K, which has been paid up through 4/30/2006.
            In April 2005, the high performance disk cache was augmented with the
        installation of an SGI TP9300S RAID with 70x250GB SATA disk drives. The
        TP9300S RAID is intended for bulk storage and has lower performance than the
        TP9400. The TP9300S is configured as both first and second tier disk cache.
        The TP9300S was delivered with a 3yr warranty; therefore, there is no support
        cost until April 2008.


Disk Cache Recommendation

            It is recommended to replace the existing high performance TP9400 RAID
        with new disk storage technology. Because current maintenance is paid through
        4/30/2006, it would be best to delay this upgrade until January 2006. To maintain
        the existing performance level, the new RAID shall be configured with fibre
        channel disk drives. A 14x300GB RAID would reduce the required rack space
        from five trays to a single tray (80% less rack space). The new RAID could be
        delivered with a 3yr warranty to help reduce ongoing support costs.
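A quick sanity check of the capacity and rack space figures, using the drive counts and sizes given in the text (capacities are raw drive totals, before RAID overhead):

```python
# One 14x300GB fibre channel tray replacing the five-tray 40x73GB TP9400.

old_capacity_gb = 40 * 73    # TP9400 across five trays
new_capacity_gb = 14 * 300   # proposed single-tray RAID

print(f"Old: {old_capacity_gb}GB in 5 trays")   # 2920GB
print(f"New: {new_capacity_gb}GB in 1 tray")    # 4200GB
print(f"Rack space reduction: {1 - 1/5:.0%}")   # 80%
```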
Fibre Switches:

            The two 16-port Brocade fibre switches are 2Gb. As 4Gb switches and
        peripherals become the standard, the two existing switches should be replaced
        (FY06 or FY07).







4.0 CR1 Silo Upgrade Recommendations

   The table below summarizes the recommended upgrades for the CR1 Silo
architecture.
 Description            Priority   Time Frame      Est Cost   Notes
 9840C Tape Drives          1      FY05            $100,000   Four drives plus second fibre
                                                              switch. Current maint is paid
                                                              through 10/23/05.
 DMF Server                 2      FY05 or FY06     $60,000   SGI Origin or Altix. Current
                                                              maint is paid through 4/30/06.
 Disk Cache                 2      FY05 or FY06     $80,000   Replacement of high performance
                                                              TP9400 RAID.
 Capacity tape drives       3      FY06 or later   $100,000   Probably StorageTek Titanium,
                                                              but continue to monitor LTO.
 Tape Library               4      FY07 or later   $250,000   Tape library replacement of
                                                              existing PowderHorn.
 HSM Software               5      FY07 or later   $300,000   Monitor HSM options.
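Summing the estimates in the table above gives the total investment implied by the recommendations; grouping the priority 1-2 items as "near-term" is an assumption for illustration.

```python
# Totals the estimated upgrade costs (Level 0 estimates) from the table.

upgrades = {"9840C Tape Drives": (1, 100_000), "DMF Server": (2, 60_000),
            "Disk Cache": (2, 80_000), "Capacity tape drives": (3, 100_000),
            "Tape Library": (4, 250_000), "HSM Software": (5, 300_000)}

near_term = sum(cost for pri, cost in upgrades.values() if pri <= 2)
total = sum(cost for _, cost in upgrades.values())
print(f"Near-term (priority 1-2): ${near_term:,}")  # $240,000
print(f"All recommendations:      ${total:,}")      # $890,000
```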



