CECS FS hard drive crash data recovery

					CECS 398 FS2001
10 October 2001
                                  Teraserver Construction
                                    Problem Definition
Staff: 1)      Rob Neff
       2)      Darrel Sharpe
       3)      Matt Todd
Mentor:        Dr. K. Palaniappan (University of Missouri-Columbia professor)

                       Many researchers in the MU CECS visualization lab manage and
                       manipulate hundreds of large, multi-gigabyte three-dimensional
                       image files with a program called Kolam. Until now, they have
                       not been able to access all of these files easily because there
                       is not enough storage space in a central, accessible location.
                       The visualization lab needs a server with a terabyte (1000
                       gigabytes) of storage that can quickly deliver the image content
                       it holds to computers in the lab and throughout the CECS building.


Goals and Objectives: To create a server that is, in order of importance: A) inexpensive,
                      B) high capacity (a terabyte) to accommodate large files, C)
                      reliable, D) versatile, and E) fast. The basic purpose of the
                      server is to store a large number of large files and to run the
                      program that accesses those files for client systems.

                       The system’s general performance will be officially benchmarked
                       with Dr. Palaniappan’s Kolam software, a large-scale
                       terrain-mapping program that places heavy demands both on the
                       host’s processing power and on its ability to deliver large
                       amounts of data at the same time. Until Kolam is adapted to run
                       on the Teraserver platform, the “unofficial” benchmarks will
                       likely use various third-party applications commonly accepted
                       in the hardware community.

Putting the Teraserver Problem in Context:
                    It is difficult to build an affordable server that holds a
                    terabyte of disk space. Although the web hosts the home pages of
                    many attempts at a Teraserver, most are high-capacity but
                    low-performance solutions, typically single-CPU, low-bandwidth
                    systems. Quality commercially manufactured servers containing a
                    terabyte of storage retail for $7500 to $15,000.
                    Typically, these commercial storage systems use SCSI components
                    to meet the speed and capacity requirements businesses demand,
                    since SCSI components are the fastest and most easily expandable
                    hard drive products, with the lowest CPU utilization.
                    Unfortunately, they are also the most expensive, especially
                    compared to EIDE, the inexpensive storage medium aimed at
                    consumers. The pricing of SCSI components and the unavailability
                    of large SCSI drives (at least at a feasible price) make it
                    impossible, at this point, to build an economical (under $5000)
                    server that meets our storage requirements. For example, an
                    empty SCSI drive array alone would cost at least $9000. Fibre
                    Channel, another option, offers incredible speed but also
                    incredible pocketbook punishment (again a minimum of around
                    $9000 for just the drive array). Quantum, a leading
                    storage-solutions company, offers an “affordable” SCSI-based
                    RAID 5 Teraserver with a small OS footprint, the Snap Server
                    12000, for $14,999. Clearly, building a capable Teraserver for
                    $5000 will be a considerable undertaking.

Feasible Options:
                    Our only real storage option is a series of EIDE (Ultra ATA,
                    UDMA) devices in a RAID configuration. IDE RAID (Redundant Array
                    of Independent Disks) allows multiple drives in an array to
                    behave as one, either increasing performance and capacity or
                    providing redundancy. There are at least ten different RAID
                    modes: the basic levels 0, 1, 2 (which does not exist
                    commercially), 3, 4, 5, 6, and 7, plus modes 10, 53, and 0+1,
                    which are combinations of other modes.

                    The different RAID modes are described below:

   RAID-0 (disk striping): Users see a large, “virtual” drive
    actually composed of an array of drives, with data striped across
    all drives of the array (maximum of eight drives per array). There
    is no loss in usable capacity. This offers the best performance but
    no fault tolerance.
   RAID-1 (disk mirroring): A second set of drives duplicates the
    first set for maximum data protection (50% usable capacity).
    There is no striping. Read performance is improved because
    both sets of disks can be read at the same time, while write
    performance is unchanged. Best fault tolerance.
   RAID-3 (disk striping with parity): For each array, an entire
    disk is reserved for parity checking. RAID-3 uses embedded
    error checking (ECC) information to detect errors. Data
    recovery is accomplished by calculating the exclusive OR
    (XOR) of the information recorded on the other drives. Since
    an I/O operation addresses all drives at the same time, RAID-3
    cannot overlap I/O. For this reason, RAID-3 is best for
    single-user systems with long-record applications.
   RAID-4 (large striping with parity): RAID-4 is similar to
    RAID-3 but applies large stripes to arrays, meaning it can read
    records from any single drive. This large striping allows
    overlapped I/O for read operations. Since every write
    operation has to update the parity drive, no I/O overlapping
    is possible during writes. RAID-4 offers no advantage over
    RAID-5.
   RAID-5 (large striping with rotating parity): For each array, the
    capacity of one disk is reserved for parity, but the parity
    information is spread across all the disks. This is called
    rotating parity, and it removes the RAID-4 bottleneck on the
    parity drive, so both read and write operations can be
    overlapped. RAID-5 stores parity information but not redundant
    data (the parity information can be used to reconstruct data).
    RAID-5 requires at least three and usually five disks for the
    array. It is best for multi-user systems in which performance is
    not critical or which do few write operations.
   RAID-6: Similar to RAID-5 but includes a second parity
    scheme that is distributed across different drives and thus
    offers extremely high fault and drive-failure tolerance. There
    are currently few or no commercial examples.
   RAID-7: This type includes a real-time embedded operating
    system as a controller, caching via a high-speed bus, and other
    characteristics of a stand-alone computer. One vendor offers
    this system.

   RAID-10 (or RAID 1+0): This type offers an array of stripes in
    which each stripe is a RAID-1 array of drives. It offers higher
    performance than RAID-1, but at much higher cost. If one drive
    fails, the array can still operate at 100% capacity, and data is
    lost only if both drives in a mirrored pair fail.
   RAID-01 (or RAID 0+1): The opposite of RAID-10, this type
    offers a mirrored pair (RAID-1) of drive arrays, each consisting
    of a striped array (RAID-0). It offers performance identical to
    RAID-10 but a different level of data integrity: losing any two
    drives on opposite sides of the RAID-1 mirror results in
    complete data loss.
   RAID-53: This type offers an array of stripes in which each
    stripe is a RAID-3 array of disks. It offers higher performance
    than RAID-3, but at much higher cost.
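
To make the capacity figures above concrete, the short Python sketch below computes idealized usable capacity for an array of identical drives. It assumes one drive's worth of parity for RAID-3/4/5, two for RAID-6, and half the drives lost to mirroring for RAID-1, RAID-10, and RAID-0+1; it ignores controller metadata, formatting overhead, and the difference between decimal and binary gigabytes. The function name `usable_capacity_gb` is hypothetical.

```python
# Idealized usable capacity of n identical drives at several RAID levels.
# Assumptions: no controller metadata or filesystem overhead; decimal GB;
# mirrored levels need an even drive count.


def usable_capacity_gb(level: str, n: int, drive_gb: float) -> float:
    """Return usable capacity in GB for `n` drives of `drive_gb` each."""
    if level == "0":
        min_n, data_drives = 2, n           # striping only, no redundancy
    elif level in ("1", "10", "0+1"):
        if n % 2:
            raise ValueError("mirrored levels need an even number of drives")
        min_n = 2 if level == "1" else 4
        data_drives = n // 2                # half the drives hold copies
    elif level in ("3", "4", "5"):
        min_n, data_drives = 3, n - 1       # one drive's worth of parity
    elif level == "6":
        min_n, data_drives = 4, n - 2       # two independent parity schemes
    else:
        raise ValueError(f"unsupported RAID level {level!r}")
    if n < min_n:
        raise ValueError(f"RAID-{level} needs at least {min_n} drives")
    return data_drives * drive_gb


if __name__ == "__main__":
    # Eight 160 GB drives, the configuration listed in the components later on.
    for lvl in ("0", "1", "5", "6", "10"):
        print(f"RAID-{lvl}: {usable_capacity_gb(lvl, 8, 160):.0f} GB usable")
```

With eight 160 GB drives this gives 1280 GB for RAID-0, 640 GB for RAID-1 and RAID-10, 960 GB for RAID-6, and 1120 GB for RAID-5, so RAID-5 (like RAID-3) keeps the array above a terabyte while still tolerating a single drive failure.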
Since RAID modes 3 and 5 are the two modes seriously under
consideration for the Teraserver, we will describe them further.


RAID 3 is essentially striping with a disk devoted to parity
checking. Parity is created on writes and checked during reads.
Although this mode offers high read/write rates and relatively
reliable parity checking (and is therefore highly efficient), it
places severe demands on CPU and main memory resources because of
its fairly complex design. In the following illustration, 2 bytes
are striped across the data disks, with all of the parity bytes on
the same disk.

         RAID-3 Array of Hard Disks
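
The XOR recovery described for RAID-3 can be sketched in a few lines of Python. This is an illustrative model only, not how any particular controller works: data is striped byte by byte across the data disks, the last disk holds the XOR of each stripe row, and a disk already known to have failed is rebuilt by XOR-ing all the surviving disks (the ECC-based error detection mentioned in the RAID-3 description is not modeled). All function names are hypothetical.

```python
from functools import reduce


def xor_columns(chunks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def raid3_write(data: bytes, n_data: int):
    """Stripe `data` byte by byte over n_data disks; append a parity disk."""
    data += bytes((-len(data)) % n_data)          # zero-pad the last stripe row
    disks = [data[i::n_data] for i in range(n_data)]
    return disks + [xor_columns(disks)]           # last disk = dedicated parity


def raid3_rebuild(disks, failed: int) -> bytes:
    """Recover one failed disk as the XOR of every surviving disk."""
    return xor_columns([d for i, d in enumerate(disks) if i != failed])


def raid3_read(disks, length: int) -> bytes:
    """Reassemble the original bytes from the data disks (parity excluded)."""
    out = bytearray()
    for row in zip(*disks[:-1]):
        out.extend(row)
    return bytes(out[:length])


if __name__ == "__main__":
    msg = b"Kolam terrain tile"
    disks = raid3_write(msg, 4)                   # 4 data disks + 1 parity disk
    disks[2] = raid3_rebuild(disks, failed=2)     # pretend disk 2 died
    print(raid3_read(disks, len(msg)))
```

Because the parity disk is the XOR of the data disks, XOR-ing any four of the five disks yields the fifth, which is why the failure of any single drive, parity drive included, is recoverable.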


RAID 5 is an array of independent data disks with distributed
parity blocks. Mode 5 is extremely fast, but it is also extremely
complex and more prone to errors than modes 3 or 1 because of its
parity-checking scheme. As you can see below, bytes are no longer
divided between disks but are grouped together into blocks, and
the parity no longer has its own disk but is spread across all of
the disks.
        RAID-5 Array of Hard Disks
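
The difference from RAID-3 is easiest to see in a block map. The sketch below assumes one simple rotation scheme, chosen for illustration: the parity block for stripe s sits on disk (n - 1 - s) mod n, and data blocks fill the remaining disks from left to right. Real controllers may use a different rotation; `raid5_locate` and `layout` are hypothetical names.

```python
def raid5_locate(block: int, n_disks: int):
    """Map a logical data block to (data disk, stripe, parity disk)."""
    stripe, idx = divmod(block, n_disks - 1)      # n-1 data blocks per stripe
    parity = (n_disks - 1) - (stripe % n_disks)   # parity rotates right to left
    disk = idx if idx < parity else idx + 1       # skip over the parity disk
    return disk, stripe, parity


def layout(n_disks: int, n_stripes: int):
    """Build a stripe-by-disk grid of block labels ("D<k>" data, "P" parity)."""
    grid = [[""] * n_disks for _ in range(n_stripes)]
    for block in range(n_stripes * (n_disks - 1)):
        disk, stripe, parity = raid5_locate(block, n_disks)
        grid[stripe][disk] = f"D{block}"
        grid[stripe][parity] = "P"
    return grid


if __name__ == "__main__":
    for s, row in enumerate(layout(4, 4)):
        print(f"stripe {s}: " + " ".join(f"{cell:>4}" for cell in row))
```

For four disks the rows come out as D0 D1 D2 P, then D3 D4 P D5, then D6 P D7 D8, then P D9 D10 D11: each disk holds parity for one stripe in four, so no single drive becomes the write bottleneck that RAID-4's parity drive does.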


Since the access time of RAID mode 5 is supposedly superior to
that of RAID 3, and their transfer rates are comparable, mode 5
should be the theoretical choice (which is fortunate, considering
the pricing of RAID 3 cards).
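
One way to see why RAID-5 access times should be better is to count how many drives a single read request touches. The self-contained sketch below compares byte-level striping with a dedicated parity disk (RAID-3, which, as described above, checks parity during reads) against block-level striping with rotating parity (RAID-5, using a simple rotation in which stripe s keeps its parity on disk (n - 1 - s) mod n). The 64 KB stripe unit and the 8-drive array are illustrative assumptions, and the function names are hypothetical.

```python
# Count the drives touched by one read of `length` bytes at `offset`.
# Assumptions: n identical drives; RAID-3 stripes single bytes and reads the
# parity disk to check it; RAID-5 stripes `block_size`-byte blocks and rotates
# parity so that stripe s keeps its parity block on disk (n - 1 - s) % n.


def disks_touched_raid3(offset: int, length: int, n_disks: int) -> set:
    data_disks = n_disks - 1                      # last disk is parity
    if length >= data_disks:
        touched = set(range(data_disks))          # request spans every data disk
    else:
        touched = {i % data_disks for i in range(offset, offset + length)}
    touched.add(n_disks - 1)                      # parity is checked on reads
    return touched


def raid5_data_disk(block: int, n_disks: int) -> int:
    stripe, idx = divmod(block, n_disks - 1)
    parity = (n_disks - 1) - (stripe % n_disks)
    return idx if idx < parity else idx + 1


def disks_touched_raid5(offset: int, length: int, n_disks: int,
                        block_size: int) -> set:
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return {raid5_data_disk(b, n_disks) for b in range(first, last + 1)}


if __name__ == "__main__":
    KB = 1024
    for size in (4 * KB, 1024 * KB):
        r3 = len(disks_touched_raid3(0, size, 8))
        r5 = len(disks_touched_raid5(0, size, 8, 64 * KB))
        print(f"{size // KB:>5} KB read: RAID-3 touches {r3} drives, "
              f"RAID-5 touches {r5}")
```

Under these assumptions a 4 KB read occupies all eight drives in RAID-3 but only one in RAID-5, so several small, independent requests (for example, from different clients) can be served by different RAID-5 drives at once; a 1 MB sequential read touches every drive in both layouts, which is consistent with their comparable transfer rates.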
CPU and Motherboard Options:
                   CPU load is a concern because each client will spawn a
                   separate Kolam instance, resulting in high levels of separate
                   and inefficient CPU use. Symmetric multiprocessing (SMP) is a
                   possible solution. Within a $5000 budget, only single- and
                   dual-processor configurations will be considered. A
                   quad-processor configuration would require Intel Xeon
                   processors, sacrificing a great deal of clock speed and money
                   ($1600 for four 800 MHz Xeon CPUs on a 100 MHz bus, plus $1800
                   for a quad board). That leaves the dual configuration, which
                   is more desirable than a single-CPU configuration.

                     The major decision in the dual configuration is between the two
                     usable CPU types: AMD and Intel. Intel CPUs (for the time being,
                     we are limited to Pentium IIIs in dual-CPU configurations)
                     dissipate less heat, have built-in thermal protection, and
                     generally have a better selection of motherboards to choose from.
                            A Pentium III CPU, Front and Back


                     On the other hand, AMD’s CPUs offer higher performance, better
                     price/performance, and much better floating-point performance,
                     and they are far more widely available at higher clock speeds.
                     In practice, the heat difference between the two competitors
                     is negligible.
             An Athlon XP CPU
As mentioned before, AMD’s selection of motherboards is
somewhat limited, particularly in the dual-CPU market. Luckily,
the one current manufacturer of AMD dual-CPU boards is also one
of the most highly esteemed: Tyan. As of this writing, only two
boards are available: the Tyan Thunder K7 and the Tiger MP.
                                   Tyan Thunder K7




                     The Thunder K7 is more expensive and not really worth it
                     feature-wise (it offers a few perks here and there for
                     slim-line servers). Both boards offer 64-bit PCI slots, which
                     are important for the RAID card and the gigabit Ethernet adapter.

                     It would also be desirable to have a hot-swappable system.
                     However, a hot-swapping RAID enclosure for 8 drives would cost
                     at least $2000 by itself, without hard drives or additional
                     hardware. Unfortunately, if the system is not hot-swappable, it
                     must be shut down to replace any failing hard disk. Considering
                     the user base for this server, this is not an entirely bad
                     thing: since there should be fewer than 10 users (probably 2 or
                     fewer at any given time), there should always be a foreseeable
                     window of downtime in which to replace failed or failing drives
                     or components.
RAID Controller Options:
                    The RAID controller suggested by the San Diego Supercomputer
                    Center is the 3Ware Escalade 6000, an earlier edition of the
                    7800 (which supports ATA100 and 64-bit PCI). The Escalade is a
                    more affordable, better documented, and more readily available
                    solution that is very appealing, and it supports up to eight
                    drives.
   Promise SuperTRAK66 (as an example)




Choosing hard drives was, for the time being, relatively simple:
the Maxtor DiamondMax D540X 160 GB was the only EIDE hard drive
able to meet the capacity requirement on one controller, which made
the server design much simpler, both thermally and spatially. Of
course, it helped that many of the Teraserver projects recommended
the drive.

     Maxtor DiamondMax D540X 160GB
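
The claim that the 160 GB drive is the one that fits on a single controller can be checked with simple arithmetic. The sketch assumes RAID-5 (one drive's worth of parity), the 1000 GB target from the problem statement, and the eight-drive limit of the Escalade controller; the 75 GB size is the IBM Deskstar 75GXP listed among the components, used only for comparison. `drives_needed` is a hypothetical helper.

```python
import math

TARGET_GB = 1000        # "a terabyte (1000 gigabytes)"
CONTROLLER_PORTS = 8    # the Escalade supports up to eight drives


def drives_needed(target_gb: float, drive_gb: float, parity_drives: int = 1) -> int:
    """Smallest drive count whose usable capacity reaches target_gb.

    parity_drives=1 models RAID-5; parity_drives=0 models RAID-0.
    """
    return math.ceil(target_gb / drive_gb) + parity_drives


if __name__ == "__main__":
    for drive_gb in (75, 160):
        n = drives_needed(TARGET_GB, drive_gb)
        verdict = "fits" if n <= CONTROLLER_PORTS else "does not fit"
        print(f"{drive_gb} GB drives: {n} needed for RAID-5, "
              f"{verdict} on {CONTROLLER_PORTS} ports")
```

Eight 160 GB drives give 7 × 160 = 1120 GB usable under RAID-5, just over the target, and exactly fill the controller's eight ports; 75 GB drives would need fifteen.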




The rest of the system should follow typical server construction
guidelines, except that it should offer both high-bandwidth
(gigabit) and low-bandwidth (10/100 megabit) network connections.
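
A rough calculation shows why the gigabit connection matters for multi-gigabyte Kolam files. The sketch below computes the best-case time to move a file at a given link speed; the 2 GB file size is a hypothetical example, and the calculation ignores protocol overhead and disk speed, so real transfers will be slower. The `efficiency` parameter is an assumption left for the reader to tune.

```python
def transfer_seconds(file_bytes: float, link_mbps: float,
                     efficiency: float = 1.0) -> float:
    """Time to send file_bytes over a link_mbps link at a fraction of line rate.

    efficiency=1.0 means the full theoretical line rate (no protocol overhead).
    """
    bits = file_bytes * 8
    return bits / (link_mbps * 1_000_000 * efficiency)


if __name__ == "__main__":
    size = 2_000_000_000  # hypothetical 2 GB image file (decimal gigabytes)
    for name, mbps in (("10/100 Ethernet", 100), ("gigabit Ethernet", 1000)):
        print(f"{name}: {transfer_seconds(size, mbps):.0f} s at full line rate")
```

At full line rate a 2 GB file takes about 160 seconds over 100 Mbit/s Ethernet and about 16 seconds over gigabit Ethernet, a tenfold difference that shrinks only if some other component, such as the disk array, becomes the bottleneck.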

All the hard drives place heavy power and space demands on the
system chassis. For the time being, the recommendation of the San
Diego Supercomputer Center, the CalPC Super Server, will suffice: it
is very spacious, has more than enough power for our project, and is
affordable.
                              CalPC Super Server Chassis




                     The Ethernet controllers, while integral to the system’s
                     interaction with the outside world, are more a concern of driver
                     support than of performance. The choice between 3Com’s and
                     Intel’s offerings was therefore difficult, since both have
                     well-developed drivers for Windows and Linux. In the end, Intel
                     proved the better choice for two reasons: its 10/100 card was the
                     all-around price/performance winner, and both of its cards had
                     lower CPU utilization under intensive network activity.

                       Intel PWLA8490T Gigabit Ethernet adapter




Server Operating System Options:
                      Under consideration are Red Hat Linux 7.0 with the XFS file
                      system and Windows 2000 Server with the NT File System (NTFS).
                      While XFS would be ideal under Linux, under Windows 2000 there
                      is no choice but NTFS, which is an extremely fault-tolerant
                      file system. Although theoretically not as fast as XFS, NTFS
                      will probably end up being the choice because 1) poor driver
                      support under Linux would lengthen development time, 2) NTFS
                      has better large-drive support, and 3) for the time being, it
                      would be easier to use Windows 2000 to benchmark and initially
                      set up the server, although the server will probably be
                      migrated to Linux later.

Constraints:
                     As you can see from our options above, we will have to make
                     many key hardware decisions in the coming month. All of these
                     decisions will be made around the following constraints. The
                     system must 1) have about a terabyte of available disk space, 2)
                     cost around $5000, 3) be mountable by MacOS, Windows, and *NIX
                     clients, 4) deliver image data to workstations in a reasonable
                     time, and 5) prove its reliability by staying up for months at a
                     time. There is no constraint on the OS the system itself runs,
                     so long as external clients are able to mount the file share.

                     Since the system is intended to serve massive data files for a
                     graphics rendering program, it must be equipped with as much
                     disk space and CPU power as the allotted budget allows.

Selected approach:   By the time of actual construction, components will be more
                     readily available and more economical, which will allow for
                     variations on this original plan. This should not alter the
                     basic technologies described above; rather, those technologies
                     will be available in newer and more efficient forms. There are
                     no severely scenario-altering devices near an affordable
                     commercial launch at this time (Intel’s Itanium and AMD’s
                     Clawhammer chips will probably not be affordable at launch).


       Components:
                     Workstation model (Kolam runs on client and server):
                     Motherboard:   Tyan Tiger K7 S2460
                     CPUs:          AMD Athlon XP 1800+ (2)
                     RAM:           Crucial/Micron 512 MB PC2100 DDR RAM (4)
                     RAID:          3Ware Escalade 7800
                     Chassis:       CalPC Super Server Chassis
                     HDDs:          Maxtor DiamondMax D540X 160GB (8)
                                    IBM Deskstar 75GXP 75 GB model (1, for OS)
                     Video adapter: GeForce2 GTS-based card with 64 MB of RAM
                     Net adapters:  Intel PWLA8490T Gigabit Ethernet adapter
                                    Intel Pro/100+ Management Adapter
                     CD-ROM:        Pioneer DVD-DR116
                     K/B, mouse:    PS/2-based, to avoid USB devices
                     Comments:      Not recommended at this time. It is difficult
                                    to gauge whether the CPUs could handle both
                                    serving files and running the program locally,
                                    and what the impact on file serving would be.
                                    The system will probably have to run Linux.

                     Server model (Kolam runs on client and server):
                     Motherboard:   Tyan Tiger K7 S2460
                     CPUs:          AMD Athlon XP 1800+ (2)
                     RAM:           Crucial/Micron 512 MB PC2100 DDR RAM (4)
                     RAID:          3Ware Escalade 7800
                     Chassis:       CalPC Super Server Chassis
                     HDDs:          Maxtor DiamondMax D540X 160GB (8)
                                    IBM Deskstar 75GXP 75 GB model (1, for OS)
                     Video adapter: NVIDIA RIVA TNT2 M64
                     Net adapters:  Intel PWLA8490T Gigabit Ethernet adapter
                                    Intel Pro/100+ Management Adapter
                     CD-ROM:        Pioneer DVD-DR116
                     K/B, mouse:    PS/2-based, to avoid USB devices

External RAID enclosure:

         Promise UltraTrak100 TX8

                     Promise UltraTrak100 TX8 (1, $2000)
                     Maxtor DiamondMax D540X 160GB (8)


The system will be using a port of Dr. Palaniappan’s Kolam
software and must be accessible to UNIX, Windows, and Mac
clients. For the time being (before testing), Windows 2000 Server
seems the most suitable candidate for the server OS. File shares
on a Windows 2000 server are accessible from Linux using Samba.
References:   http://www.acnc.com/04_00.html - RAID info
              http://www.pricewatch.com - price feasibility
              http://www.microsoft.com - info on Win2k Server
              http://www.tomshardware.com - info on RAID, CPUs
              http://www.sharkyextreme.com - info on chassis
              http://www.anandtech.com - info on SMP motherboards
              http://www.finitesystems.com/PRODUCT/raid/raidlevel.htm -
                       RAID pictures
              http://staff.sdsc.edu/its/terafile/ - hard disks, RAID controller

				