Storage Device Testing . . . in a Brave New World

Anthony Lavia
President and CEO
www.Flexstar.com

IDEMA Asia
October 2010

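System Level Testing
“Test via the interface…”

[Diagram: the Tester talks to the Storage Device, treated as a “Black Box”, over the Data Interface (SAS/SATA/PCIe/SCSI/FCAL), exchanging bit streams (10100011010).]
Storage device types shown: HDD, SSD, SSS, ODD, Tape, ???
Inside the black box: . . . controller; media (disk or NAND Flash)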




             HDD                  SSD
Media        Rotational Media     NAND Flash
IOPS         100 - 300            40,000 - 150,000
Latency      5.5 msec             0.015 msec
MTBF         300K - 1M hrs        1.5M - 3M hrs
Shock        200 - 400 G's        1,500 G's
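
The IOPS and latency rows are roughly consistent with each other if each I/O is assumed to occupy one full latency period with nothing overlapping. The sketch below works through that arithmetic. The queue-depth-1 assumption is ours; the slide does not say how the figures were measured.

```python
def iops_at_queue_depth_1(latency_ms: float) -> float:
    """I/Os per second when each I/O takes latency_ms and none overlap (assumption)."""
    return 1000.0 / latency_ms


hdd_iops = iops_at_queue_depth_1(5.5)    # HDD latency from the table
ssd_iops = iops_at_queue_depth_1(0.015)  # SSD latency from the table
print(round(hdd_iops), round(ssd_iops))  # prints: 182 66667
```

Both results fall inside the IOPS ranges quoted in the table (100 - 300 and 40,000 - 150,000).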
Storage Test Scripts
Test Type                                    Test Name          HDD SSD DVR DVD CD
    •   Buffer Test (Cache)                  BufferE8-E4.P25     a   a   a   a   a
    •   Command Overhead (Knee)              Mult-Wrt-Rd.p2e     a   a   a
    •   Command Queuing                      SSD2.p2e            a   a   a
    •   Command Time Measurements            CMD-Times.p2e       a   a   a   a   a
    •   Complete Identify Device             IdDec.class         a   a   a   a   a
    •   Contact Start Stop                   10K-CSS.p2e         a       a
    •   Continuous seeks                     Placeholder.?       a       a   a   a
    •   DOD Wipe                             DOD-Wipe.p2e        a   a   a
    •   Generic Smart Commands               Smart.p2e           a   a   a   s   s
    •   Lube Test                            Lub-Tst.p2e         a       a
    •   Multi stream performance (DVR sim)   DVRsim              a   s   a
    •   Pattern Test ATE Read through        TestPat.p2e         a   a   a
    •   Power Management Profiling           PwrMgtPrf.p2e       a   a   a   a   a
    •   Power up / down                      Placeholder-?       a   a   a   a   a
    •   Radial Perf                          RadialPerf.p2e      a   a   a
    •   RW entire drive (multipass option)   Rw-multpass.p2e     a   a   a
    •   Seek Performance                     Simple.p2e          a       a
    •   Streaming                            DVRsim              s   s   a
    •   Surface Scan                         SurfScan.p2e        a   a   a
    •   Time to Ready Sleep/wake             SleepProfile.p2e    a   a   a   a   a
    •   Voltage Margining Test               48HrUpDn.p2e        a   a   a
    •   Voltage Margining to failure         Placeholder-?       a   a   a   a   a
    •   Wear Leveling (SSD)                  Wearlevel.p2e           a
    •   Write Shutoff                        Splicer.p2e         a   a   a   a   a
The Testing Sphere

[Diagram: a sphere of testing activities. Labels, in reading order: Specification; EVT/DVT; RDT; Qualification; Component Verification; Screening; OEM; Manufacturing; Production Reliability; Certification; Quality Assurance; Servicing; RMA; Failure Analysis; Cloning; Re-certification; Data Wipe.]
Typical HDD Testing

• Qualification: 1000 drives for 1000 hours
• Contact Start/Stop: 50,000 cycles
• Zone performance uniformity
• Medium corrosion resistance
• Radial performance
• Fly height dynamics
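
To see what a 1000-drive, 1000-hour qualification can demonstrate, the sketch below converts it to device-hours and to a lower bound on MTBF. It assumes a constant failure rate, zero failures during the run, and no temperature acceleration; none of these assumptions come from the slide. Under them, the standard zero-failure bound is MTBF >= T / (-ln(1 - confidence)).

```python
import math


def zero_failure_mtbf_lower_bound(device_hours: float, confidence: float) -> float:
    """Lower MTBF bound after device_hours of testing with zero failures.

    Assumes exponentially distributed failures (constant failure rate).
    """
    return device_hours / -math.log(1.0 - confidence)


device_hours = 1000 * 1000  # 1000 drives x 1000 hours, from the slide
bound_60 = zero_failure_mtbf_lower_bound(device_hours, 0.60)
bound_90 = zero_failure_mtbf_lower_bound(device_hours, 0.90)
print(device_hours, round(bound_60), round(bound_90))
```

The run accumulates 1,000,000 device-hours, which supports an MTBF of roughly 1.09 million hours at 60% confidence and roughly 434,000 hours at 90% confidence.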
HDD Tests Target Failure Modes

Failure Mode (Root Cause)     Test Process
– Design/Components/DRAM      Pattern reads/writes/verify
– Component Failures          Power cycling
– Head/Media instability      Extended test at temperature
– Design margin               Voltage margining
– Contamination               Temperature Cycling
– Handling, head laps         Voltage margining & 4 corner
– Lube issues                 Multiple full media R/W.
– Motor bearings              Head/Surface defects
– Growing defect lists        W/R adjacent tracks; Dwell at elevated temperature
SSD Failure Modes

Root Cause                     Test Process
– Wear leveling performance    Multiple targeted writes
– Bit Disturbance              Disturb testing / pattern writes
– Endurance                    Temperature Cycling
– Data Retention               Temperature & power cycling
– Write splice                 Power cycling mid writes
– Component Failures           Voltage margining & 4 corner
– Metadata corruption          Random I/O w/ power cycling
– Write performance            Cold write
– Erase failures               Margined erase
– Reallocation errors          RW tests w/ power cycling

SSD testing uses test equipment similar to that used for HDD testing.
Typical SSD Test Suites

Test Element
 1   Multiple writes
 2   Performance verification
 3   Disturb testing / pattern writes
 4   Power cycling
 5   Extended test at temperature
 6   Voltage margining
 7   Four Corners
 8   Voltage margining & 4 corner
 9   Power cycling mid writes
10   Random I/O w/ power cycling
11   Margined erase
12   Fragmentation tests
13   RW tests w/ power cycling
14   Write splice, cold writes

Target Problem                     Test Suite Element Set
Endurance                      =>  1,3,4,5,6,7,8,9,11,13,14
Performance-mfg. variability   =>  2,10,12
Bit failures / Data Retention  =>  3,4,5
Component Failures             =>  3,4,5,6,7,8,9
Write splice                   =>  9,14
Metadata corruption            =>  6,14
Write performance              =>  2,12
Erase failures                 =>  11
Design Margin                  =>  2,3,4,6,7,8,9
Wear leveling performance      =>  1,3
Data Retention                 =>  5,13
Reallocation errors            =>  1,3,10
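
One way to read the table is as a lookup: each target problem selects a set of numbered test elements. The sketch below encodes the 14 elements and a handful of the target problems exactly as listed on the slide. The names `TEST_ELEMENTS`, `SUITES`, and `suite_for` are ours, for illustration only, and are not part of any tester software.

```python
# Numbered test elements, as listed on the slide.
TEST_ELEMENTS = {
    1: "Multiple writes",
    2: "Performance verification",
    3: "Disturb testing / pattern writes",
    4: "Power cycling",
    5: "Extended test at temperature",
    6: "Voltage margining",
    7: "Four Corners",
    8: "Voltage margining & 4 corner",
    9: "Power cycling mid writes",
    10: "Random I/O w/ power cycling",
    11: "Margined erase",
    12: "Fragmentation tests",
    13: "RW tests w/ power cycling",
    14: "Write splice, cold writes",
}

# A subset of the target problems and their element sets, from the slide.
SUITES = {
    "Write splice": [9, 14],
    "Metadata corruption": [6, 14],
    "Write performance": [2, 12],
    "Erase failures": [11],
    "Wear leveling performance": [1, 3],
}


def suite_for(problem: str) -> list:
    """Return the names of the test elements that target the given problem."""
    return [TEST_ELEMENTS[number] for number in SUITES[problem]]


print(suite_for("Write splice"))
```

For "Write splice" this returns the two elements numbered 9 and 14: power cycling mid writes, and write splice with cold writes.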
Evolution of Test Systems

Environmental Test Chamber Series: 32E, 60ET, 64E, 128E, 110E
Burn-in & Ambient Test Series: 128B, 416B - SSD, 160A, 320A
Bench-top Tester Series: Solo, Quattro, Desktop 4U/5U/7U, Desktop HS 16A, Desktop 16A, Desktop 16B
Building Blocks of a Storage Drive Tester

[Diagram labels:]
Programmed Power Module
Multi-core, Threaded, Script Control Computer
Anti-Vibration Fixtures
Scripting/Control S/W
Tester types shown: Bench-top, Burn-In, Environmental
System Network Overview

[Network diagram; the only surviving label is "Internet".]
Software Capabilities
   GUI
     Many levels of drill-down
   Many graphing capabilities
   Oven synchronization
   Extensive automatic reports
   Remote capabilities
   Simple scripting language
     Expandable via robust Java plug-ins
     Test locking ability
   Flexible and expandable
Power Commands
• SET Ch0 Volts=5
• SET Ch1 Volts=A0
• SET Ch=1 mVOLTS=3300
• SET Ch=2 mVOLTS=+5%
• DRIVE POWER ON
• DRIVE POWER OFF
• SET SLEW Ch=1 RATE(x1mS)=25
• PROFILE CURRENT event=C24W rate (us)= 1000
• POWER CONTROL macro=pdnldwave args=sin.wcf ch0 tscale=50 vscale=25 vbias=5000 (User-defined slew curve)
• POWER CONTROL macro=pexecwave args=ch0 (e.g. emulate AC noise)
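
Commands such as `mVOLTS=+5%` express a supply margin as a percentage. The sketch below shows the arithmetic behind that kind of setting. It assumes the percentage is applied to the rail's nominal voltage, which the slide does not spell out. The function name and the example rails are illustrative and are not tester commands.

```python
def margined_mv(nominal_mv: int, percent: float) -> int:
    """Return the supply voltage in millivolts after applying a percentage margin."""
    return round(nominal_mv * (1 + percent / 100))


# Example rails: 3300 mV appears on the slide; 5000 mV is an illustrative value.
for nominal in (3300, 5000):
    low, high = margined_mv(nominal, -5), margined_mv(nominal, +5)
    print(f"{nominal} mV nominal: {low} mV to {high} mV at +/-5%")
```

At +/-5%, a 3300 mV rail is exercised between 3135 mV and 3465 mV, and a 5000 mV rail between 4750 mV and 5250 mV.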
       “Spin Up” Power Profile

HDD @ 23.75W Peak     SSD @ 2.75W Peak
Performance Tests
• TEST SPIN-UP [time to “ready”]

• TEST INDEX [rotational speed]

• TEST SEEK TIMING
• TEST OSC AVG SEEK
• TEST AVERAGE SEEK #seeks=n

• START PERF {Command(s)} END PERF
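
A rotational-speed measurement connects directly to latency. If the index pulse arrives once per revolution (a common arrangement, assumed here), the pulse period gives the RPM, and the average rotational latency is half a revolution. The helper names below are ours and are not part of the scripting language.

```python
def rpm_from_index_period(period_ms: float) -> float:
    """Spindle speed from the time between index pulses (one pulse per revolution assumed)."""
    return 60_000.0 / period_ms


def average_rotational_latency_ms(rpm: float) -> float:
    """On average the target sector is half a revolution away from the head."""
    return 0.5 * 60_000.0 / rpm


for rpm in (5400, 7200):
    print(rpm, round(average_rotational_latency_ms(rpm), 2))
```

A 5400 RPM drive averages about 5.56 ms, close to the 5.5 msec HDD latency quoted in the comparison table earlier in the deck. A 7200 RPM drive averages about 4.17 ms.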
Data Performance
• Vary by Read/Write

• Vary LBA size
Data Rate vs. Current Draw (HDD)

[Plot: Current Draw (y-axis) vs. Data Rate (x-axis); curves: Read Rate, Read i, Write rate, Write i.]
Oven/Script Synchronization

Four Corner Testing
1)   Write hot
2)   Read cold
3)   Write cold
4)   Read hot

*
:HCLOOP
* WRITE HOT
WAIT FOR OVEN STEP=11
* NOW HOT; DO WRITE HOT
{ Timed Loop sec=300
  WRITE RANDOM CYLIN pass=100
} End Timed Loop
* DONE, SKIP TO END OF SOAK
RUN OVEN PROGRAM prog=INC line=11
*
* READ COLD
* WAIT FOR COLD CONDITION
WAIT FOR OVEN STEP=14
* NOW COLD, READ
{ Timed Loop sec=300
  READ RANDOM CYLIN pass=100
} End Timed Loop
* DONE, SKIP TO END OF SOAK
RUN OVEN PROGRAM prog=INC line=14
*
* WRITE COLD
WAIT FOR OVEN STEP=16
* TEST AT TEMPERATURE
{ Timed Loop sec=300
  WRITE RANDOM CYLIN pass=100
} End Timed Loop
* DONE, SKIP TO END OF SOAK
RUN OVEN PROGRAM prog=INC line=16
*
* READ HOT
* WAIT FOR HOT CONDITION
WAIT FOR OVEN STEP=19
* TEST AT TEMPERATURE
{ Timed Loop sec=300
  READ RANDOM CYLIN pass=100
} End Timed Loop
* DONE, SKIP TO END OF SOAK
RUN OVEN PROGRAM prog=INC line=19
*
RETURN FROM SUBROUTINE
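
The script above repeats one pattern four times: wait for the oven to reach a step, run I/O while at temperature, then tell the oven to skip the rest of the soak. The sketch below restates that control flow with stand-in callables. It is an illustration of the sequencing only, not the tester's API. The oven step numbers are taken from the script, and the `passes` argument stands in for the 300-second timed loop.

```python
from typing import Callable, List, Tuple

# (operation, chamber condition, oven program step), taken from the script above.
FOUR_CORNERS: List[Tuple[str, str, int]] = [
    ("WRITE", "hot", 11),
    ("READ", "cold", 14),
    ("WRITE", "cold", 16),
    ("READ", "hot", 19),
]


def run_four_corners(
    wait_for_oven_step: Callable[[int], None],
    do_io: Callable[[str], None],
    skip_to_end_of_soak: Callable[[int], None],
    passes: int = 1,
) -> List[str]:
    """Run each corner in order and return a log of what was exercised."""
    log = []
    for operation, condition, step in FOUR_CORNERS:
        wait_for_oven_step(step)       # block until the chamber reaches this step
        for _ in range(passes):        # stands in for the timed loop
            do_io(operation)
        skip_to_end_of_soak(step)      # release the oven to move on
        log.append(f"{operation} {condition} (oven step {step})")
    return log


# Usage with stand-in callables that only record what was requested.
events: List[str] = []
log = run_four_corners(
    wait_for_oven_step=lambda step: events.append(f"wait {step}"),
    do_io=lambda op: events.append(op),
    skip_to_end_of_soak=lambda step: events.append(f"skip {step}"),
)
print("\n".join(log))
```

Passing the oven and I/O actions in as callables keeps the sequencing testable without a chamber attached.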




Stress servo…vc, bearings
Radial Performance Test

1) Media?
2) Flying height?
3) Servo?
Test for “Write Splice”

• Write Shutoff (Splicer)
  – Complete write operation during power failure.
     • Simulate: turn off the power in the middle of a write operation
         – Sector written completely?
         – Check: Restart DUT => Read sector => Data compare
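
The data-compare step can be sketched as a three-way classification of the sector read back after restart: it holds the new data, it still holds the old data, or it holds a mix of the two (a splice). The toy `SimulatedDut` below is a hypothetical stand-in for a real drive and simply stops copying at the moment power is cut. Whether an "old" result counts as a pass depends on the guarantees the device claims, which the slide does not state.

```python
SECTOR_SIZE = 512


def classify_sector(read_back: bytes, old: bytes, new: bytes) -> str:
    """Compare the sector read after restart against the old and new patterns."""
    if read_back == new:
        return "new"       # write completed despite the power cut
    if read_back == old:
        return "old"       # write never reached the media
    return "spliced"       # partial write: mixture of old and new data


class SimulatedDut:
    """Toy device (hypothetical) that loses power partway through a sector write."""

    def __init__(self, initial: bytes) -> None:
        self.sector = bytearray(initial)

    def write_with_power_cut(self, data: bytes, bytes_before_cut: int) -> None:
        # Only the first bytes_before_cut bytes reach the media.
        self.sector[:bytes_before_cut] = data[:bytes_before_cut]

    def read(self) -> bytes:
        return bytes(self.sector)


old = b"\x00" * SECTOR_SIZE
new = b"\xff" * SECTOR_SIZE
dut = SimulatedDut(old)
dut.write_with_power_cut(new, bytes_before_cut=200)  # power fails mid-sector
print(classify_sector(dut.read(), old, new))         # prints: spliced
```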
Production Test Example
Stress SSD failure modes => cycle heat & cold
Validate h/w components: ASICs, oscillators, …
Performance error rates: BER/UBER
Integrity of interface communication
SSD Test Advantages
• 30 years of storage testing evolution
• Co-opt tests, test processes, and capital equipment
   – Increment and optimize
• SSDs are more amenable to standardized testing
   – No mechanical parts implies:
      1.   Weed out infant mortality => predictable/non-catastrophic failure
      2.   Test/Screen only once in the supply chain
      3.   Test only once in RMA process (e.g. Regional centers)
      4.   More predictable AFR => warranty accruals
      5.   AFR can be correlated to manufacturing/line-integration failures
      6.   Feasible to use test service without purchasing capital equipment
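
The link between AFR and warranty accruals in item 4 can be made concrete. Assuming a constant failure rate and continuous operation (8760 hours per year), AFR = 1 - exp(-8760 / MTBF). The sketch below applies this to MTBF figures from the comparison table earlier in the deck. The unit count and cost per return are made-up inputs for illustration only.

```python
import math

HOURS_PER_YEAR = 8760  # continuous operation assumed


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate under a constant-failure-rate (exponential) model."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def annual_warranty_accrual(units: int, afr: float, cost_per_return: float) -> float:
    """Expected yearly warranty cost: units shipped x failure probability x unit cost."""
    return units * afr * cost_per_return


hdd_afr = afr_from_mtbf(300_000)    # low end of the HDD MTBF range
ssd_afr = afr_from_mtbf(1_500_000)  # low end of the SSD MTBF range
print(f"HDD AFR {hdd_afr:.2%}, SSD AFR {ssd_afr:.2%}")

# Hypothetical fleet: 100,000 units, 50 currency units per return.
print(round(annual_warranty_accrual(100_000, ssd_afr, 50.0)))
```

Under these assumptions the HDD figure works out to about 2.9% per year and the SSD figure to about 0.6% per year.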
Conclusions
• At device-system level, SSD testing is an evolution of HDD testing
• Target SSD components and failure modes
• Standardization of test processes => supply chain cost efficiencies
• Reliability is the key perceived issue
    – Testing is the way to cross this chasm
Thank You

				