							    CS 501: Software Engineering



                Lecture 21

             Reliability II




1                             CS 501 Spring 2003
                 Administration


    Lecture 23
       Lecture 23 on Wednesday, April 16 (evening),
       not Tuesday, April 15.




                Software Reliability

    Failure: Software does not deliver the service expected by
    the user (e.g., mistake in requirements)
    Fault (BUG): Programming or design error whereby the
    delivered system does not conform to specification
    Reliability: Probability of failure-free operation in operational
    use (the complement of the probability of a failure occurring).
    Perceived reliability: Depends upon:
           user behavior
           set of inputs
           pain of failure
                      Faults and Failures

    (a) A mathematical function loops forever because of a rounding error.
    (b) A distributed system hangs because of a concurrency problem.
    (c) After a network is hit by lightning, it crashes on restart.
    (d) A program dies because the programmer typed: x = 1 instead
        of x == 1.
    (e) The President of an organization is paid $5 a month instead of
        $10,005 because the maximum salary allowed by the program
        is $10,000.
    (f) An operating system fails because of a page-boundary error in
        the firmware.

         User Perception of Reliability



    1. A personal computer that crashes frequently v. a machine
       that is out of service for two days.
    2. A database system that crashes frequently but comes back
       quickly with no loss of data v. a system that fails once in
       three years but data has to be restored from backup.
    3. A system that does not fail but has unpredictable periods
       when it runs very slowly.



               Reliability Metrics


    Traditional Measures
    • Mean time between failures
    • Availability (up time)
    • Mean time to repair
    User Perception is Influenced by
    • Distribution of failures
    Hypothetical example: Cars are safer than
    airplanes in accidents (failures) per hour, but less
    safe in failures per mile.
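
    The traditional measures can be computed from a failure log. The
    sketch below is illustrative only: it assumes MTBF is taken as the
    mean period of operation between failures, and the function name and
    sample figures are invented for the example.

```python
def reliability_metrics(uptimes_h, repairs_h):
    """Return (MTBF, MTTR, availability) from observed periods in hours.

    uptimes_h: lengths of the periods of operation that ended in a failure
    repairs_h: time taken to restore service after each failure
    """
    mtbf = sum(uptimes_h) / len(uptimes_h)   # mean time between failures
    mttr = sum(repairs_h) / len(repairs_h)   # mean time to repair
    availability = mtbf / (mtbf + mttr)      # fraction of time in service
    return mtbf, mttr, availability


# Hypothetical log: three failures over roughly 300 hours of operation.
mtbf, mttr, availability = reliability_metrics([100.0, 80.0, 120.0],
                                               [2.0, 1.0, 3.0])
print(f"MTBF={mtbf:.1f} h, MTTR={mttr:.1f} h, availability={availability:.3f}")
# prints: MTBF=100.0 h, MTTR=2.0 h, availability=0.980
```

    The same averages say nothing about the distribution of failures,
    which is why two systems with equal availability can be perceived
    very differently.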


    Reliability Metrics for Distributed Systems


    Traditional metrics are hard to apply in multi-component
    systems:
    • In a big network, at any given moment something will be giving
      trouble, but very few users will see it.
    • A system that has excellent average reliability may give
      terrible service to certain users.
    • There are so many components that system administrators
      rely on automatic reporting systems to identify problem areas.



       Requirements Specification of System
                   Reliability

       Example: ATM card reader

    Failure class              Example                           Metric
    Permanent non-corrupting   System fails to operate with      1 per 1,000 days
                               any card -- reboot
    Transient non-corrupting   System can not read an            1 in 1,000 transactions
                               undamaged card
    Corrupting                 A pattern of transactions         Never
                               corrupts database
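
    A specification written this way can be checked against field data.
    The following is a minimal sketch, assuming failure counts are
    available from a trial period; the function name and the observed
    figures are made up for illustration.

```python
# Thresholds taken from the table above.
MAX_TRANSIENT_RATE = 1 / 1000   # unreadable-card failures per transaction
MAX_PERMANENT_RATE = 1 / 1000   # failures needing a reboot, per day


def meets_requirements(transactions, transient, days, permanent, corrupting):
    """Compare observed failure counts with the specified metrics."""
    return {
        "transient": transient / transactions <= MAX_TRANSIENT_RATE,
        "permanent": permanent / days <= MAX_PERMANENT_RATE,
        "corrupting": corrupting == 0,   # "Never" allows no occurrences
    }


# Made-up observations from a one-year trial.
report = meets_requirements(transactions=50_000, transient=40,
                            days=365, permanent=1, corrupting=0)
print(report)
# prints: {'transient': True, 'permanent': False, 'corrupting': True}
```

    One permanent failure in 365 days already exceeds "1 per 1,000
    days", which shows how long a trial must run before such a metric
    can be demonstrated.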

        Cost of Improved Reliability


    [Figure: cost ($) plotted against up time, with 99% and 100% marked
    on the up-time axis]

    Will you spend your money on new functionality
    or improved reliability?
     Example: Dartmouth Time Sharing (1978)

        A central computer serves the entire campus. Any
        failure is serious.
        Step 1
        Gather data on every failure
        • 10 years of data in a simple database
        • Every failure analyzed:
               hardware
               software (default)
               environment (e.g., power, air conditioning)
               human (e.g., operator error)
     Example: Dartmouth Time Sharing (1978)


        Step 2
        Analyze the data
        • Weekly, monthly, and annual statistics
               Number of failures and interruptions
               Mean time to repair
        • Graphs of trends by component, e.g.,
               Failure rates of disk drives
               Hardware failures after power failures
               Crashes caused by software bugs in each module
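
        Steps 1 and 2 amount to keeping one record per failure and
        summarizing the records regularly. A minimal sketch follows; the
        record layout and the sample entries are assumptions, not the
        original Dartmouth database.

```python
from collections import Counter
from statistics import mean

# Hypothetical failure records: (month, cause, hours to repair).
# A cause of None means no cause was identified, so the failure is counted
# as software, following the "software (default)" rule above.
failures = [
    ("1978-01", "hardware", 3.0),
    ("1978-01", None, 0.5),
    ("1978-02", "environment", 6.0),
    ("1978-02", "software", 1.0),
    ("1978-02", "human", 0.5),
]


def summarize(records):
    """Count failures per cause and per month, and compute mean time to repair."""
    by_cause = Counter(cause or "software" for _, cause, _ in records)
    by_month = Counter(month for month, _, _ in records)
    mttr = mean(hours for _, _, hours in records)
    return by_cause, by_month, mttr


by_cause, by_month, mttr = summarize(failures)
print(by_cause.most_common(), dict(by_month), f"MTTR={mttr:.1f} h")
```

        Grouping the same records by component instead of by cause
        would give the trend graphs listed above.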

     Example: Dartmouth Time Sharing (1978)


         Step 3
         Invest resources where benefit will be maximum, e.g.,
         • Orderly shut down after power failure
         • Priority order for software improvements
         • Changed procedures for operators
         • Replacement hardware




                       Terminology


     Fault avoidance
     Build systems with the objective of creating fault-free
     systems
     Fault tolerance
     Build systems that continue to operate when faults
     occur
     Fault detection (testing and validation)
     Detect faults before the system is put into operation.

         Fault Avoidance: Cleanroom Software
                     Development

     Software development process that aims to develop zero-defect
     software.
     •    Formal specification
     •    Incremental development with customer input
     •    Constrained programming options
     •    Static verification
     •    Statistical testing
     It is always better to prevent defects than to remove them later.
     Example: The four color problem.


                      Fault Tolerance


     General Approach:
     • Failure detection
     • Damage assessment
     • Fault recovery
     • Fault repair
     N-version programming -- Execute independent
     implementations in parallel, compare results, accept the
     most probable.
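
     One way to read "accept the most probable" is a majority vote. The
     sketch below makes that assumption, runs the versions one after
     another rather than in parallel, and uses three invented integer
     square root functions, one of them deliberately faulty.

```python
from collections import Counter


def n_version(implementations, x):
    """Run every version on x and return the result most versions agree on.

    A version that raises an exception loses its vote. If no result is
    backed by a strict majority of all versions, an error is raised.
    """
    results = []
    for version in implementations:
        try:
            results.append(version(x))
        except Exception:
            continue
    if not results:
        raise RuntimeError("all versions failed")
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(implementations) // 2:
        raise RuntimeError("no majority among versions")
    return value


def isqrt_a(n):
    return int(n ** 0.5)


def isqrt_b(n):
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


def isqrt_c(n):
    return n // 2   # deliberate fault


print(n_version([isqrt_a, isqrt_b, isqrt_c], 49))   # prints 7
```

     The vote only helps if the versions fail independently; versions
     built from the same misunderstood specification will agree on the
     wrong answer.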


                   Fault Tolerance


     Basic Techniques:
     • After error continue with next transaction
     • Timers and timeout in networked systems
     • Error correcting codes in data
     • Bad block tables on disk drives
     • Forward and backward pointers
     Report all errors for quality control
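
     The first technique and the final remark can be combined: isolate
     each transaction so that one failure does not stop the rest, and
     log every error. A minimal sketch, in which the handler and the
     sample amounts are hypothetical:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("transactions")


def process_all(transactions, handler):
    """Apply handler to each transaction; a failure in one does not stop the rest."""
    failed = []
    for tx in transactions:
        try:
            handler(tx)
        except Exception as exc:
            # Report the error for later quality control, then move on.
            log.error("transaction %r failed: %s", tx, exc)
            failed.append(tx)
    return failed


def withdraw(amount):
    """Hypothetical handler that rejects non-positive amounts."""
    if amount <= 0:
        raise ValueError("amount must be positive")


failed = process_all([20, -5, 40], withdraw)
print(failed)   # prints [-5]
```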


                       Fault Tolerance



     Backward Recovery:
     • Record system state at specific events (checkpoints). After
       failure, recreate state at last checkpoint.
     • Combine checkpoints with system log that allows
       transactions from last checkpoint to be repeated
       automatically.
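
     Checkpoint plus log can be sketched as follows. The class and its
     methods are invented for illustration, and everything is held in
     memory; a real system would write both the checkpoint and the log
     to stable storage.

```python
import copy


class Ledger:
    """Hypothetical state protected by a checkpoint and a transaction log."""

    def __init__(self):
        self.balances = {}
        self._checkpoint = {}
        self._log = []   # transactions applied since the last checkpoint

    def apply(self, name, delta, record=True):
        if record:
            self._log.append((name, delta))   # log first, then change state
        self.balances[name] = self.balances.get(name, 0) + delta

    def checkpoint(self):
        """Record the current state and start a fresh log."""
        self._checkpoint = copy.deepcopy(self.balances)
        self._log.clear()

    def recover(self):
        """Recreate the state at the last checkpoint, then repeat the log."""
        self.balances = copy.deepcopy(self._checkpoint)
        for name, delta in self._log:
            self.apply(name, delta, record=False)


ledger = Ledger()
ledger.apply("alice", 100)
ledger.checkpoint()
ledger.apply("alice", -30)
ledger.balances = {"alice": -999}   # simulate corruption of the live state
ledger.recover()
print(ledger.balances)              # prints {'alice': 70}
```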




     Software Engineering for Real Time


     The special characteristics of real time computing require
     extra attention to good software engineering principles:
     • Requirements analysis and specification
     • Development of tools
     • Modular design
     • Exhaustive testing
     Heroic programming will fail!


       Software Engineering for Real Time


     Testing and debugging need special tools and environments
     • Debuggers, etc., cannot be used to test real time
       performance
     • Simulation of environment may be needed to test interfaces
       -- e.g., adjustable clock speed
     • General purpose tools may not be available
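
     An adjustable clock can be simulated by passing the clock to the
     code under test instead of reading the system time directly. The
     two classes below are a hypothetical sketch of that idea, not a
     real-time framework.

```python
class SimulatedClock:
    """Stand-in for the system clock whose speed can be adjusted."""

    def __init__(self, speed=1.0):
        self.speed = speed   # simulated seconds per real second
        self.now = 0.0

    def advance(self, real_seconds):
        self.now += real_seconds * self.speed


class Watchdog:
    """Reports expiry when it has not been kicked within the timeout."""

    def __init__(self, clock, timeout):
        self.clock = clock
        self.timeout = timeout
        self.last_kick = clock.now

    def kick(self):
        self.last_kick = self.clock.now

    def expired(self):
        return self.clock.now - self.last_kick > self.timeout


clock = SimulatedClock(speed=10)      # run the test ten times faster
dog = Watchdog(clock, timeout=30)
clock.advance(2)                      # 20 simulated seconds
print(dog.expired())                  # prints False
clock.advance(2)                      # 40 simulated seconds
print(dog.expired())                  # prints True
```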




               Defensive Programming


     Murphy's Law:
        If anything can go wrong, it will.
     Defensive Programming:
     • Redundant code is incorporated to check system state after
       modifications
     • Implicit assumptions are tested explicitly
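
     Both points can be shown on a small routine. In this sketch (the
     function is invented for the example) the assumption that the
     input is sorted is tested explicitly, and the state is checked
     again after the modification.

```python
def insert_sorted(items, value):
    """Insert value into an ascending list, verifying assumptions explicitly."""
    # Implicit assumption made explicit: the input is already sorted.
    if any(a > b for a, b in zip(items, items[1:])):
        raise ValueError("items must be sorted in ascending order")

    before = len(items)
    pos = 0
    while pos < len(items) and items[pos] <= value:
        pos += 1
    items.insert(pos, value)

    # Redundant check of the state after the modification.
    assert len(items) == before + 1
    assert all(a <= b for a, b in zip(items, items[1:]))
    return items


print(insert_sorted([1, 3, 5], 4))   # prints [1, 3, 4, 5]
```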




     Defensive Programming Examples


     • Use boolean variable not integer
     • Test i <= n, not i == n
     • Assertion checking
     • Build debugging code into program with a switch to
       display values at interfaces
     • Error checking codes in data, e.g., checksum or hash
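
     Two of these examples in a short sketch: a loop guarded by an
     inequality, and a checksum attached to data. The function names
     are hypothetical, and CRC-32 is only one possible choice of
     checking code.

```python
import zlib


def count_up(n, step):
    """Count from 0 in increments of step, stopping once n is passed."""
    i, visited = 0, []
    # "i <= n" still terminates when step skips over n; "i != n" would not.
    while i <= n:
        visited.append(i)
        i += step
    return visited


def pack(payload: bytes) -> bytes:
    """Append a CRC-32 checksum so that corruption can be detected later."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unpack(message: bytes) -> bytes:
    """Return the payload, or raise if the stored checksum does not match."""
    payload, stored = message[:-4], int.from_bytes(message[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: data corrupted")
    return payload


print(count_up(10, 3))            # prints [0, 3, 6, 9]
print(unpack(pack(b"hello")))     # prints b'hello'
```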




                Error Avoidance

     Risky programming constructs
     •   Pointers
     •   Dynamic memory allocation
     •   Floating-point numbers
     •   Parallelism
     •   Recursion
     •   Interrupts
     All are valuable in certain circumstances, but
     should be used with discretion
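
     Floating-point numbers illustrate the risk well. A small
     demonstration of why exact comparison should be avoided:

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1   # 0.1 has no exact binary representation

print(total == 1.0)               # prints False
print(math.isclose(total, 1.0))   # prints True
```

     A loop that waits for total == 1.0 would never terminate, while a
     comparison with a tolerance behaves as intended.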


                     Maintenance


     Most production programs are maintained by people
     other than the programmers who originally wrote them.
     (a) What factors make a program easy for somebody
     else to maintain?
     (b) What factors make a program hard for somebody
     else to maintain?




       Static and Dynamic Verification



     Static verification: Techniques of verification that
     do not include execution of the software.
     • May be manual or use computer tools.
     Dynamic verification:
     • Testing the software with trial data.
     • Debugging to remove errors.



          Static Validation & Verification

     Carried out throughout the software development process.

     [Figure: validation & verification, through REVIEWS, applied to the
     requirements specification, the design, and the program]

                   Static Analysis Tools

     Program analyzers scan the source of a program for possible
     faults and anomalies (e.g., Lint for C programs).
     • Control flow: loops with multiple exit or entry points
     • Data use: Undeclared or uninitialized variables, unused
       variables, multiple assignments, array bounds
     • Interface faults: Parameter mismatches, non-use of
       function results, uncalled procedures
     • Storage management: Unassigned pointers, pointer
       arithmetic
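
     A data-use check of this kind can be sketched with Python's
     standard ast module, which parses source without executing it. The
     analyzer below is a toy: it ignores scopes and only reports names
     that are assigned but never read.

```python
import ast


def unused_variables(source):
    """Return names that are assigned in source but never read."""
    assigned, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)
    return sorted(assigned - used)


sample = """
total = 0
count = 0
for value in [1, 2, 3]:
    total = total + value
print(total)
"""
print(unused_variables(sample))   # prints ['count']
```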


         Static Analysis Tools (continued)


     Modern compilers contain many static analysis tools
     • Cross-reference table: Shows every use of a variable,
       procedure, object, etc.
     • Information flow analysis: Identifies input variables on which
       an output depends.
     • Path analysis: Identifies all possible paths through the
       program.




     Static Verification: Program Inspections


     Formal program reviews whose objective is to detect faults
     • Code may be read or reviewed line by line.
     • 150 to 250 lines of code in a 2-hour meeting.
     • Use checklist of common errors.
     • Requires team commitment, e.g., trained leaders
     So effective that it is claimed that it can replace unit testing



     Inspection Checklist: Common Errors

      Data faults: Initialization, constants, array bounds, character
        strings
      Control faults: Conditions, loop termination, compound
        statements, case statements
      Input/output faults: All inputs used; all outputs assigned a
        value
      Interface faults: Parameter numbers, types, and order;
         structures and shared memory
      Storage management faults: Modification of links,
        allocation and de-allocation of memory
      Exceptions: Possible errors, error handlers
                     Fixing Bugs

     Isolate the bug
             Intermittent --> repeatable
             Complex example --> simple example
     Understand the bug
           Root cause
           Dependencies
           Structural interactions
     Fix the bug
            Design changes
            Documentation changes
            Code changes

            Moving the Bugs Around

     Fixing bugs is an error-prone process!
     • When you fix a bug, fix its environment
     • Bug fixes need static and dynamic testing
     • Repeat all tests that have the slightest relevance
       (regression testing)
     Bugs have a habit of returning!
     • When a bug is fixed, add the failure case to the test suite
       for the future.
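
     As an illustration, take the salary example from earlier in the
     lecture ($5 paid instead of $10,005). The sketch assumes the fault
     was a wrap-around at the $10,000 limit; the function and test
     names are hypothetical.

```python
import unittest

OLD_LIMIT = 10_000   # maximum salary the faulty program could represent


def monthly_pay(salary):
    """Return the amount to pay.

    Fixed version. The assumed faulty version returned salary % OLD_LIMIT,
    so a salary of 10,005 was paid as 5.
    """
    if salary < 0:
        raise ValueError("salary must not be negative")
    return salary


class SalaryRegressionTest(unittest.TestCase):
    def test_original_failure_case(self):
        # The input that exposed the failure stays in the suite.
        self.assertEqual(monthly_pay(10_005), 10_005)

    def test_values_around_old_limit(self):
        # Fix the bug's environment: check the neighbouring values too.
        for salary in (OLD_LIMIT - 1, OLD_LIMIT, OLD_LIMIT + 1):
            self.assertEqual(monthly_pay(salary), salary)
```

     Saved in a module, the tests run with python -m unittest and are
     repeated on every later change.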


                   Some Notable Bugs


     • Built-in function in Fortran compiler (e^0 = 0)
     • Japanese microcode for Honeywell DPS virtual memory
     • The microfilm plotter with the missing byte (1:1023)
     • The Sun 3 page fault that IBM paid to fix
     • Left-handed rotation in the graphics package
     Good people work around problems.
     The best people track them down and fix them!

