Software Reliability Engineering:
Techniques and Tools


             CS130
           Winter, 2002


                                    1
                 Source Material
   “Software Reliability and Risk Management:
    Techniques and Tools”, Allen Nikora and Michael
    Lyu, tutorial presented at the 1999 International
    Symposium on Software Reliability Engineering
   Allen Nikora, John Munson, “Determining Fault
    Insertion Rates For Evolving Software Systems”,
    proceedings of the International Symposium on
    Software Reliability Engineering, Paderborn,
    Germany, November, 1998
                                                        2
                      Agenda
Part I:   Introduction
Part II:  Survey of Software Reliability Models
Part III: Quantitative Criteria for Model Selection
Part IV:  Input Data Requirements and Data Collection
          Mechanisms
Part V: Early Prediction of Software Reliability
Part VI: Current Work in Estimating Fault Content
Part VII: Software Reliability Tools



                                                        3
    Part I: Introduction
   Reliability Measurement Goal
   Definitions
   Reliability Theory




                                   4
       Reliability Measurement Goal
   Reliability measurement is a set of mathematical
    techniques that can be used to estimate and
    predict the reliability behavior of software during
    its development and operation.
   The primary goal of software reliability modeling
    is to answer the following question:
    “Given a system, what is the probability that it will
    fail in a given time interval, or, what is the
    expected duration between successive failures?”
                                                            5
              Basic Definitions
   Software Reliability R(t): The probability of
    failure-free operation of a computer program
    for a specified time under a specified
    environment.

   Failure: The departure of program operation
    from user requirements.

   Fault: A defect in a program that causes
    failure.
                                                    6
          Basic Definitions (cont’d)
   Failure Intensity (rate) f(t): The expected number
    of failures experienced in a given time interval.

   Mean-Time-To-Failure (MTTF): Expected value of
    a failure interval.

   Expected total failures m(t): The number of
    failures expected in a time period t.

                                                         7
             Reliability Theory
Let "T" be a random variable representing the
failure time or lifetime of a physical system.
For this system, the probability that it will fail by
time "t" is:


The probability of the system surviving until time
"t" is:


                                                        8
     Reliability Theory (cont’d)
Failure rate - the probability that a failure will
occur in the interval [t1, t2] given that a failure
has not occurred before time t1. This is written
as:

     [ F(t2) − F(t1) ] / [ (t2 − t1) · R(t1) ]
                                                      9
        Reliability Theory (cont’d)
Hazard rate - limit of the failure rate as the length of the
interval approaches zero. This is written as:

     z(t) = lim (Δt → 0) of [ F(t + Δt) − F(t) ] / [ Δt · R(t) ] = f(t) / R(t)

This is the instantaneous failure rate at time t, given that
the system survived until time t. The terms hazard rate
and failure rate are often used interchangeably.


                                                               10
      Reliability Theory (cont’d)
A reliability objective expressed in terms of one
reliability measure can be easily converted into
another measure as follows (assuming an
“average” failure rate, λ, is measured):

     MTTF = 1 / λ
     R(t) = exp(−λ t) ≈ 1 − λ t   (for small λ t)
                                                    11
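To make these conversions concrete, here is a small illustrative Python sketch (not part of the original tutorial) that converts between an average failure rate, MTTF, and reliability over a mission time, assuming an exponential time-to-failure distribution; the numeric values are invented.

import math

def mttf_from_rate(lam):
    """MTTF = 1/lambda for a constant (exponential) failure rate."""
    return 1.0 / lam

def reliability(lam, t):
    """R(t) = exp(-lambda * t) under a constant failure rate."""
    return math.exp(-lam * t)

# Example: a failure rate of 0.01 failures per CPU-hour
lam = 0.01
print(mttf_from_rate(lam))        # 100.0 CPU-hours
print(reliability(lam, 10.0))     # ~0.905 probability of surviving 10 CPU-hours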
Reliability Theory (cont'd)




                              12
Part II: Survey of Software Reliability Models

 •   Software Reliability Estimation Models:
     •   Exponential NHPP Models
          • Jelinski-Moranda/Shooman Model
          • Musa-Okumoto Model
     •   Geometric Model
 • Software Reliability Modeling and
   Acceptance Testing



                                               13
Jelinski-Moranda/Shooman Models
   Jelinski-Moranda model was developed by Jelinski and
    Moranda of McDonnell Douglas Astronautics Company
    for use on Navy NTDS software and a number of
    modules of the Apollo program. The Jelinski-Moranda
    model was published in 1971.

   Shooman's model, discovered independently of Jelinski
    and Moranda's work, was also published in 1971.
    Shooman's model is identical to the JM model.


                                                       14
Jelinski-Moranda/Shooman (cont'd)
Assumptions:
   1. The number of errors in the code is fixed.

   2. No new errors are introduced into the code
      through the correction process.
   3. The number of machine instructions is essentially
      constant.
   4. Detections of errors are independent.

   5. The software is operated in a manner similar to
      the anticipated operational usage.
   6. The error detection rate is proportional to the
      number of errors remaining in the code.
                                                          15
  Jelinski-Moranda/Shooman (cont'd)
  Let τ represent the amount of debugging time spent on
  the system since the start of the test phase.
  From assumption 6, we have:
                       z(τ) = K · r(τ)
  where K is the proportionality constant, and r is the error
  rate (number of remaining errors normalized with respect
  to the number of instructions).
                  r(τ) = ET / IT − εc(τ)
ET = number of errors initially in the program
IT = number of machine instructions in the program
εc = cumulative number of errors fixed in the interval [0, τ]
  (normalized by the number of instructions)
                                                            16
Jelinski-Moranda/Shooman (cont'd)
ET and IT are constant (assumptions 1 and 3).
No new errors are introduced into the correction process
  (assumption 2).

As τ → ∞, εc(τ) → ET/IT, so r(τ) → 0.

The hazard rate becomes:

     z(τ) = K [ ET/IT − εc(τ) ]

(Graph: z plotted against εc(τ) is a straight line starting at K·ET/IT
and reaching zero when εc(τ) = ET/IT.)
                                                           17
Jelinski-Moranda/Shooman (cont'd)

The reliability function becomes:

     R(t) = exp[ −K (ET/IT − εc) · t ]

The expression for MTTF is:

     MTTF = 1 / z = 1 / [ K (ET/IT − εc) ]

                                    18
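The following sketch, with hypothetical parameter values, shows how the Jelinski-Moranda quantities above fit together; in practice K, ET, and IT would be estimated from the observed failure data.

import math

def jm_hazard(K, ET, IT, fixed):
    """z = K * (ET/IT - eps_c): hazard after `fixed` errors have been corrected."""
    eps_c = fixed / IT            # corrected errors, normalized by instruction count
    return K * (ET / IT - eps_c)

def jm_reliability(z, t):
    """R(t) = exp(-z * t) between failures, given the current hazard rate z."""
    return math.exp(-z * t)

# Hypothetical values: 100 initial errors, 50,000 instructions, arbitrary K
K, ET, IT = 0.5, 100, 50_000
z = jm_hazard(K, ET, IT, fixed=20)
print(z, 1.0 / z, jm_reliability(z, t=100.0))   # hazard, MTTF, R(100)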
                  Geometric Model
   Proposed by Moranda in 1975 as a variation of the
    Jelinski-Moranda model.

   Unlike models previously discussed, it does not assume
    that the number of errors in the program is finite, nor does
    it assume that errors are equally likely to occur.

   This model assumes that errors become increasingly
    difficult to detect as debugging progresses, and that the
    program is never completely error free.

                                                                19
        Geometric Model (cont'd)
Assumptions:

   1.   There are an infinite number of total errors.
   2.   All errors do not have the same chance of
        detection.
   3.   The detections of errors are independent.
   4.   The software is operated in a manner similar to
        the anticipated operational usage.
   5.   The error detection rate forms a geometric
        progression and is constant between error
        occurrences.
                                                          20
   Geometric Model (cont'd)

The above assumptions result in the
following hazard rate:
                   z(t) = D · φ^(i−1)
for any time "t" between the (i − 1)st and
the i'th error, where 0 < φ < 1.

The initial value of z(t) = D.
                                             21
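A minimal sketch of the geometric hazard rate, using invented values for D and the ratio (called phi here):

def geometric_hazard(D, phi, i):
    """Hazard rate between the (i-1)st and i'th error: z = D * phi**(i - 1)."""
    return D * phi ** (i - 1)

# Hypothetical values: initial hazard D, ratio phi in (0, 1)
D, phi = 0.02, 0.9
for i in range(1, 5):
    print(i, geometric_hazard(D, phi, i))   # 0.02, 0.018, 0.0162, 0.01458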
            Geometric Model (cont'd)
(Hazard rate graph: z(t) starts at D and steps down geometrically to Dφ, Dφ², …
after each error; the i'th drop has size D·φ^(i−1)·(1 − φ).)
                                                     22
                Musa-Okumoto Model
   The Musa-Okumoto model assumes that the failure intensity function
    decreases exponentially with the number of failures observed:

     λ(t) = λ0 · exp(−θ m(t))

   Since λ(t) = dm(t)/dt, we have the following differential equation:

     dm(t)/dt = λ0 · exp(−θ m(t))     or     exp(θ m(t)) · dm(t)/dt = λ0
                                                                          23
      Musa-Okumoto Model (cont’d)
Note that

     d/dt [ exp(θ m(t)) ] = θ · exp(θ m(t)) · dm(t)/dt

We then obtain

     d/dt [ exp(θ m(t)) ] = λ0 θ
                                    24
       Musa-Okumoto Model (cont’d)
Integrating this last equation yields:

     exp(θ m(t)) = λ0 θ t + C

Since m(0) = 0, C = 1, and the mean value function m(t) is:

     m(t) = ln(λ0 θ t + 1) / θ
                                                              25
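A short sketch of the Musa-Okumoto mean value function and the corresponding failure intensity, with illustrative parameter values (in practice λ0 and θ are estimated from the failure history):

import math

def mo_mean_failures(lam0, theta, t):
    """Musa-Okumoto mean value function: m(t) = ln(lam0*theta*t + 1) / theta."""
    return math.log(lam0 * theta * t + 1.0) / theta

def mo_intensity(lam0, theta, t):
    """Failure intensity lambda(t) = lam0 / (lam0*theta*t + 1), the derivative of m(t)."""
    return lam0 / (lam0 * theta * t + 1.0)

# Hypothetical parameters: initial intensity lam0, decay parameter theta
lam0, theta = 5.0, 0.05
print(mo_mean_failures(lam0, theta, 100.0))   # expected failures by t = 100
print(mo_intensity(lam0, theta, 100.0))       # intensity at t = 100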
Software Reliability Modeling and
      Acceptance Testing
Given a piece of software advertised as having a failure
rate λ, you can see if it meets that failure rate to a
specific level of confidence.
   α is the risk (probability) of falsely saying that the
    software does not meet the failure rate goal.
   β is the risk of saying that the goal is met when it is
    not.
   The discrimination ratio, γ, is the factor you specify
    that identifies acceptable departure from the goal.
    For instance, if γ = 2, the acceptable failure rate lies
    between λ/2 and 2λ.

                                                              26
Software Reliability Modeling and
  Acceptance Testing (cont’d)

(Chart: failure number plotted against normalized failure time — time to failure
times the failure intensity objective. The plane is divided into a “reject” region
at the upper left, a “continue” region in the middle, and an “accept” region at
the lower right.)
                                                            27
 Software Reliability Modeling and
   Acceptance Testing (cont’d)
We can now draw a chart as shown in the previous slide. Define
intermediate quantities A and B as follows:

     A = ln[ (1 − β) / α ]          B = ln[ β / (1 − α) ]

The boundary between the “reject” and “continue” regions is given
by:

     (n · ln γ − A) / (γ − 1)

where n is the number of failures observed. The boundary between the
“continue” and “accept” regions of the chart is given by:

     (n · ln γ − B) / (γ − 1)
                                                                    28
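Assuming the boundary formulas above (a Wald sequential test), the sketch below classifies an observed point (n failures, normalized failure time) as reject, continue, or accept; the risk levels and discrimination ratio are illustrative.

import math

def decision(n, tau, alpha, beta, gamma):
    """Classify a point (n failures observed, normalized failure time tau)."""
    A = math.log((1.0 - beta) / alpha)
    B = math.log(beta / (1.0 - alpha))
    reject_boundary = (n * math.log(gamma) - A) / (gamma - 1.0)  # reject/continue line
    accept_boundary = (n * math.log(gamma) - B) / (gamma - 1.0)  # continue/accept line
    if tau <= reject_boundary:
        return "reject"        # too many failures in too little normalized time
    if tau >= accept_boundary:
        return "accept"        # enough nearly failure-free exposure
    return "continue"

# 10% supplier and consumer risks, discrimination ratio 2
print(decision(n=0, tau=2.5, alpha=0.10, beta=0.10, gamma=2.0))  # accept
print(decision(n=5, tau=0.5, alpha=0.10, beta=0.10, gamma=2.0))  # reject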
Part III: Criteria for Model Selection


    Background
    Non-Quantitative criteria
    Quantitative criteria




                                     29
Criteria for Model Selection - Background
   When software reliability models first appeared, it was felt
    that a process of refinement would produce “definitive”
    models that would apply to all development and test
    situations.
   Current situation
      Dozens of models have been published in the literature
      Studies over the past 10 years indicate that the
        accuracy of the models is variable
      It does not appear possible to analyze the particular
        context in which reliability measurement is to take
        place and decide a priori which model to use.
                                                              30
Criteria for Model Selection (cont’d)

  Non-Quantitative Criteria
   Model Validity

   Ease of measuring parameters

   Quality of assumptions

   Applicability

   Simplicity

   Insensitivity to noise



                                        31
Criteria for Model Selection (cont’d)
Quantitative Criteria for Post-Model Application
   Self-consistency

   Goodness-of-Fit

   Relative Accuracy (Prequential Likelihood
    Ratio)
   Bias (U-Plot)

   Bias Trend (Y-Plot)



                                                   32
  Criteria for Model Selection (cont’d)
Self-consistency - Analysis of a model’s predictive quality can help the
user decide which model(s) to use.
     Simplest question an SRM user can ask is “How reliable is the
       software at this moment?”
     The time to the next failure, Ti, is usually predicted using the
       observed times to failure t1, …, ti−1.
     In general, predictions of Ti can be made using the observed
       times to failure t1, …, ti−K, for K ≥ 1.
The results of predictions made for different values of K can then be
compared. If a model produces “self consistent” results for differing
values of K, this indicates that its use is appropriate for the data on
which the particular predictions were made.

HOWEVER, THIS PROVIDES NO GUARANTEE THAT THE
PREDICTIONS ARE CLOSE TO THE TRUTH.
                                                                          33
Criteria for Model Selection (cont’d)
Goodness-of-fit - Kolmogorov-Smirnov Test
 Uses the absolute vertical distance between
  two CDFs to measure goodness of fit.
 Depends on the fact that the statistic

     Dn = sup over x of | Fn(x) − F0(x) |

  where F0 is a known, continuous CDF, and Fn
  is the sample CDF, is distribution free.


                                                34
Criteria for Model Selection (cont’d)
Goodness-of-fit (cont’d) - Chi-Square Test
 More suited to determining GOF of failure counts data than to
  interfailure times.
 Value given by:

     χ² = Σ from j = 1 to k+1 of (Nj − n·pj)² / (n·pj)

  where:
   n = number of independent repetitions of an experiment in
    which the outcomes are decomposed into k+1 mutually
    exclusive sets A1, A2, ..., Ak+1
   Nj = number of outcomes in the j’th set
   pj = P[Aj]



                                                                  35
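A minimal sketch of the chi-square computation; the observed and model-predicted counts are invented for illustration, and the statistic would be compared against a chi-square critical value with the appropriate degrees of freedom.

def chi_square(observed, expected):
    """Chi-square statistic: sum over cells of (N_j - n*p_j)^2 / (n*p_j)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical failure counts per test week vs. counts predicted by a fitted model
observed = [12, 9, 7, 5, 4, 2]
expected = [11.2, 9.5, 7.8, 5.1, 3.4, 2.0]
print(chi_square(observed, expected))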
    Criteria for Model Selection (cont’d)
Prequential Likelihood Ratio
   The pdf fi(t) for Ti is based on the observations t1, …, ti−1.

   For one-step-ahead predictions of Tj+1, …, Tj+n, the prequential
    likelihood is:

     PLn = Π from i = j+1 to j+n of fi(ti)

   Two prediction systems, A and B, can be evaluated by computing the
    prequential likelihood ratio:

     PLRn = PLn(A) / PLn(B)

   If PLRn approaches infinity as n approaches infinity, B is discarded in
    favor of A.
                                                                              36
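The sketch below computes a prequential likelihood ratio from two hypothetical sequences of one-step-ahead predictive density values, each evaluated at the same observed interfailure times; working in log space avoids numerical underflow.

import math

def log_prequential_likelihood(pdf_values):
    """Sum of log f_i(t_i): each f_i is the one-step-ahead predictive density
    evaluated at the interfailure time actually observed next."""
    return sum(math.log(p) for p in pdf_values)

# Hypothetical one-step-ahead predictive densities from two models, A and B
model_a = [0.012, 0.020, 0.015, 0.018]
model_b = [0.010, 0.014, 0.011, 0.016]
log_plr = log_prequential_likelihood(model_a) - log_prequential_likelihood(model_b)
print(math.exp(log_plr))   # PLR_n > 1 favours model A over model B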
Prequential Likelihood Example

(Figure: predicted pdfs fi, fi+1, fi+2, … plotted against the true pdf. One panel
shows predictions with high bias and low noise; the other shows predictions with
low bias and high noise.)
                                                            37
Criteria for Model Selection (cont’d)
Prequential Likelihood Ratio (cont'd)
When predictions have been made for Tj+1, …, Tj+n, the PLR
is given by:

     PLRn = pA(tj+1, …, tj+n | t1, …, tj) / pB(tj+1, …, tj+n | t1, …, tj)

Using Bayes' Rule, the PLR is rewritten as:

     PLRn = [ P(A | t1, …, tj+n) / P(B | t1, …, tj+n) ] ·
            [ P(B | t1, …, tj) / P(A | t1, …, tj) ]
                                                          38
Criteria for Model Selection (cont’d)
Prequential Likelihood Ratio (cont’d)
This equals:

     PLRn = [ P(A | t1, …, tj+n) / P(B | t1, …, tj+n) ] · [ P(B) / P(A) ]

If the initial conditions were based only on prior belief, the second
factor of the final equation is the prior odds ratio. If the user is
indifferent between models A and B, this ratio has a value of 1.



                                                                        39
 Criteria for Model Selection
            (cont’d)
Prequential Likelihood Ratio (cont’d):
The final equation is then written as:

     PLRn = wA / wB = wA / (1 − wA)

This is the posterior odds ratio, where wA is the posterior belief that
A is true after making predictions with both A and B and comparing
them with actual behavior.



                                                                           40
      Criteria for Model Selection (cont’d)
The “u-plot” can be used to assess the predictive quality of a model
 Given a predictor, F̂i(t), that estimates the probability that the time to
  the next failure is less than t, consider the sequence ui = F̂i(ti),
     where each ui is a probability integral transform of the observed ti
     using the previously calculated predictor based upon t1, …, ti−1.
   If each F̂i were identical to the true, but hidden, Fi, then the ui would
    be realizations of independent random variables with a uniform
    distribution in [0,1].
   The problem then reduces to seeing how closely the sequence of ui
    resembles a random sample from U[0,1].


                                                                               41
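A small sketch of how the u-plot points could be computed from a sequence of ui values (the numbers are invented); plotting the sample CDF of the ui against the line of unit slope gives the u-plot.

def u_plot_points(u_values):
    """Points of the sample CDF of the u_i; for a well-calibrated predictor these
    should lie close to the line of unit slope through the origin."""
    us = sorted(u_values)
    n = len(us)
    return [(u, (i + 1) / n) for i, u in enumerate(us)]

# Hypothetical u_i = F_i(t_i): predicted probabilities evaluated at observed times
u_values = [0.12, 0.35, 0.48, 0.61, 0.77, 0.92]
for u, cdf in u_plot_points(u_values):
    print(f"{u:.2f}  {cdf:.2f}")
# The maximum vertical distance from the diagonal summarizes the predictor's bias.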
U-Plots for JM and LV Models

(Figure: u-plots for the JM and LV models drawn on the unit square, compared
against the line of unit slope.)
                               42
    Criteria for Model Selection (cont’d)
The y-plot:
 Temporal ordering is not shown in a u-plot. The y-plot
  addresses this deficiency
 To generate a y-plot, the following steps are taken:
    Compute the sequence of ui
    For each ui, compute xi = −ln(1 − ui)
    Obtain yi by computing:

     yi = ( Σ from j = 1 to i of xj ) / ( Σ from j = 1 to m of xj )

      for i ≤ m, m representing the number of observations made
   If the ui really do form a sequence of independent random
    variables in [0,1], the slope of the plotted yi will be constant.
                                                                     43
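A matching sketch for the y-plot transformation, reusing the same hypothetical ui values as in the u-plot example:

import math

def y_plot_points(u_values):
    """y_i = (sum of x_j for j <= i) / (sum of all x_j), with x_j = -ln(1 - u_j).
    For truly uniform, independent u_j the y_i increase roughly linearly."""
    xs = [-math.log(1.0 - u) for u in u_values]
    total = sum(xs)
    running, ys = 0.0, []
    for x in xs:
        running += x
        ys.append(running / total)
    return ys

u_values = [0.12, 0.35, 0.48, 0.61, 0.77, 0.92]   # hypothetical u_i
print(y_plot_points(u_values))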
Y-Plots for JM and LV Models

(Figure: y-plots for the JM and LV models drawn on the unit square, compared
against the line of unit slope.)
                               44
Criteria for Model Selection (cont’d)
Quantitative Criteria Prior to Model Application
   Arithmetical Mean of Interfailure Times

   Laplace Test




                                                   45
Arithmetical Mean of Interfailure Times
    Calculate arithmetical mean of interfailure times as
     follows:

     τ(i) = (1/i) · Σ from j = 1 to i of θj

     i = number of observed failures
     θj = j’th interfailure time

    Increasing series of τ(i) suggests reliability growth.
    Decreasing series of τ(i) suggests reliability decrease.

                                                                46
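A sketch of the running arithmetical mean over a hypothetical sequence of interfailure times:

def running_mean_interfailure_times(theta):
    """tau(i) = (1/i) * sum of the first i interfailure times."""
    means, total = [], 0.0
    for i, t in enumerate(theta, start=1):
        total += t
        means.append(total / i)
    return means

# Hypothetical interfailure times (CPU hours); an increasing running mean
# suggests reliability growth.
theta = [2.0, 3.5, 3.0, 6.0, 8.5, 12.0]
print(running_mean_interfailure_times(theta))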
                       Laplace Test
   The occurrence of failures is assumed to follow a non-
    homogeneous Poisson process whose failure intensity is
    decreasing:

     λ(t) = exp(a + b·t),  with b < 0

   Null hypothesis is that occurrences of failures follow a
    homogeneous Poisson process (i.e., b = 0 above).
   For interfailure times θ1, …, θn (with cumulative failure times
    Ti = θ1 + … + θi), the test statistic is computed by:

     u = [ (1/(n−1)) · Σ from i = 1 to n−1 of Ti  −  Tn/2 ] / [ Tn · sqrt(1/(12(n−1))) ]

                                                               47
               Laplace Test (cont’d)

   For interval data with n(i) failures observed in each of k equal-length
    intervals, the test statistic is computed by:

     u = [ Σ from i = 1 to k of (i−1)·n(i)  −  ((k−1)/2) · Σ from i = 1 to k of n(i) ]
         / sqrt( ((k² − 1)/12) · Σ from i = 1 to k of n(i) )

                                                     48
                 Laplace Test (cont’d)
   Interpretation
      Negative values of the Laplace factor indicate decreasing
       failure intensity.
      Positive values suggest an increasing failure intensity.

      Values varying between +2 and -2 indicate stable reliability.

      Significance is that associated with normal distribution; e.g.
          The null hypothesis “H0 : HPP” vs. “H1 : decreasing
           failure intensity” is rejected at the 5% significance level
           for m(T) < -1.645
          The null hypothesis “H0 : HPP” vs. “H1 : increasing
           failure intensity” is rejected at the 5% significance level
           for m(T) > +1.645
          The null hypothesis “H0 : HPP” vs. “H1 : there is a trend”
           is rejected at the 5% significance level for |m(T)| > 1.96
                                                                  49
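A sketch of the Laplace factor for interval (failure count) data, following the interval-data formula above; the weekly counts are invented.

import math

def laplace_factor_counts(counts):
    """Laplace factor for failure counts n(1..k) over k equal-length intervals."""
    k = len(counts)
    total = sum(counts)
    # enumerate gives the 0-based interval index, which equals (interval number - 1)
    weighted = sum(i * n for i, n in enumerate(counts))
    numerator = weighted - (k - 1) / 2.0 * total
    denominator = math.sqrt((k * k - 1) / 12.0 * total)
    return numerator / denominator

# Hypothetical weekly failure counts; a clearly negative factor (< -1.645 at the
# 5% level) indicates decreasing failure intensity, i.e., reliability growth.
print(laplace_factor_counts([10, 9, 7, 6, 4, 3, 2]))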
Part IV: Input Data Requirements and
     Data Collection Mechanisms
   Model Inputs
     Time Between Successive Failures

     Failure Counts and Test Interval Lengths

   Setting up a Data Collection Mechanism
   Minimal Set of Required Data
   Data Collection Mechanism Examples


                                                 50
Input Data Requirements and Data Collection
               Mechanisms
Model Inputs - Time between Successive Failures
 Most of the models discussed in Section II require the
  times between successive failures as inputs.
 Preferred units of time are expressed in CPU time (e.g.,
  CPU seconds between subsequent failures).
    Allows computation of reliability independent of wall-
     clock time.
    Reliability computations in one environment can be
     easily transformed into reliability estimates in another,
     provided that the operational profiles in both
     environments are the same and that the instruction
     execution rates of the original environment and the
     new environment can be related.
                                                                 51
Input Data Requirements and Data Collection
            Mechanisms (cont’d)
Model Inputs - Time between Successive Failures
  (cont’d)
 Advantage - CPU time between successive

  failures tends to more accurately characterize the
  failure history of a software system than calendar
  time. Accurate CPU time between failures can
  give greater resolution than other types of data.
 Disadvantage - CPU time between successive

  failures can often be more difficult to collect than
  other types of failure history data.
                                                     52
Input Data Requirements and Data Collection
            Mechanisms (cont’d)
 Model Inputs ( cont’d) - Failure Counts and Test Interval Lengths
  Failure history can be collected in terms of test interval lengths
   and the number of failures observed in each interval. Several of
   the models described in Section II use this type of input.
  The failure reporting systems of many organizations will more
   easily support collection of this type of data rather than times
   between successive failures. In particular, the use of automated
   test systems can easily establish the length of each test interval.
   Analysis of the test run will then provide the number of failures
   for that interval.
  Disadvantage - failure counts data does not provide the
   resolution that accurately collected times between failures
   provide.

                                                                         53
         Input Data Requirements and Data
           Collection Mechanisms (cont’d)
Setting up a Data Collection Mechanism
1. Establish clear, consistent objectives.
2. Develop a plan for the data collection process. Involve all
   individuals concerned (e.g. software designers, testers,
   programmers, managers, SQA and SCM staff). Address the
   following issues:
    a. Frequency of data collection.
    b. Data collection responsibilities
    c. Data formats
    d. Processing and storage of data
    e. Assuring integrity of data/adherence to objectives
    f.   Use of existing mechanisms to collect data
                                                                 54
      Input Data Requirements and Data
        Collection Mechanisms (cont’d)
Setting up a Data Collection Mechanism (cont’d)
3. Identify and evaluate tools to support data collection effort.
4. Train all parties in use of selected tools.
5. Perform a trial run of the plan prior to finalizing it.
6. Monitor the data collection process on a regular basis (e.g.
   weekly intervals) to assure that objectives are being met,
   determine current reliability of software, and identify problems in
   collecting/analyzing the data.
7. Evaluate the data on a regular basis. Assess software reliability
   as testing proceeds, not only at scheduled release time.
8. Provide feedback to all parties during the data collection/analysis
   effort.
                                                                     55
     Input Data Requirements and Data
       Collection Mechanisms (cont’d)
Minimal Set of Required Data - to measure software
reliability during test, the following minimal set of data
should be collected by a development effort:
     Time between successive failures OR test interval
      lengths/number of failures per test interval.
     Functional area tested during each interval.

     Date on which functionality was added to software
      under test; identifier for functionality added.
     Number of testers vs. time.

     Dates on which testing environment changed, and
      nature of changes.
     Dates on which test method changed.
                                                             56
Part VI: Early Prediction of Software
              Reliability
   Background
   RADC Study
   Phase-Based Model




                                    57
             Part VI: Background
   Modeling techniques discussed in preceding sections
    can be applied only during test phases.
   These techniques do not take into account structural
    properties of the system being developed or
    characteristics of the development environment.
   Current techniques can measure software reliability,
    but model outputs cannot be easily used to choose
    development methods or structural characteristics
    that will increase reliability.
   Measuring software reliability prior to test is an open
    area. Work in this area includes:
      RADC study of 59 projects

      Phase-Based model

      Analysis of complexity
                                                              58
               Part VI: RADC Study
   Study of 59 software development efforts, sponsored by RADC
    in mid 1980s
   Purpose - develop a method for predicting software reliability in
    the life cycle phases prior to test. Acceptable model forms
    were:
      measures leading directly to reliability/failure rate predictions

      predictions that could be translated to failure rates (e.g., error
        density)
   Advantages of error density as a software reliability figure of
    merit, according to participating investigators:
      It appears to be a fairly invariant number.

      It can be obtained from commonly available data.

      It is not directly affected by variables in the environment

      Conversion among error density metrics is fairly
        straightforward.
                                                                            59
       Part VI: RADC Study (cont’d)
   Advantages of error density as a software reliability figure of
    merit (cont’d)
      Possible to include faults by inspection with those found
       during testing and operations, since the time-dependent
       elements of the latter do not need to be accounted for.
   Major disadvantages cited by the investigators are:
      This metric cannot be combined with hardware reliability
       metrics.
      Does not relate to observations in the user environment. It is
       far easier for users to observe the availability of their systems
       than their fault density, and users tend to be far more
       concerned about how frequently they can expect the system
       to go down.
      No assurance that all of the faults have been found.


                                                                       60
    Part VI: RADC Study (cont’d)
Given these advantages and disadvantages, the
investigators decided to attempt prediction of error
density during the early phases of a development effort,
and develop a transformation function that could be
used to interpret the predicted error density as a failure
rate. The driving factor seemed to be that data available
early in life cycle could be much more easily used to
predict error densities rather than failure rates.



                                                         61
     Part VI: RADC Study (cont’d)
Investigators postulated that the following measures representing
development environment and product characteristics could be
used as inputs to a model that would predict the error density,
measured in errors per line of code, at the start of the testing phase.

       A -- Application Type (e.g. real-time control system,
        scientific computation system, information management
        system)
       D -- Development Environment (characterized by
        development methodology and available tools). The types
        of development environments considered are the organic,
        semi-detached, and embedded modes, familiar from the
        COCOMO cost model.


                                                                      62
       Part VI: RADC Study (cont’d)
    Measures of development environment and product
    characteristics (cont’d):

   Requirements and Design Representation Metrics
      SA - Anomaly Management

      ST - Traceability

      SQ - Incorporation of Quality Review results into the software

   Software Implementation Metrics
      SL - Language Type (e.g. assembly, high-order language,
       fourth generation language)
      SS - Program Size

      SM - Modularity

      SU - Extent of Reuse

      SX - Complexity

      SR - Incorporation of Standards Review results into the
       software                                                         63
      Part VI: RADC Study (cont’d)
   Initial error density at the start of test is predicted from the A, D,
    and S measures listed on the preceding slides.

   Initial failure rate:

     λ0 = F · K · W0

      F = linear execution frequency of the program
      K = fault exposure ratio (1.4×10⁻⁷ < K < 10.6×10⁻⁷, with an
          average value of 4.2×10⁻⁷)
      W0 = number of inherent faults
                                                           64
     Part VI: RADC Study (cont’d)
Moreover, F = R/I, where
     R is the average instruction rate
     I is the number of object instructions in the program

I can be further rewritten as IS · QX, where
     IS is the number of source instructions,
     QX is the code expansion ratio (the ratio of machine instruc-
       tions to source instructions, which has an average value of 4
       according to this study).

Therefore, the initial failure rate can be expressed as:

     λ0 = [ R / (IS · QX) ] · K · W0
                                                                   65
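Putting the pieces together, the sketch below computes an initial failure rate from a predicted error density, a program size, and an instruction execution rate; all input values are hypothetical, and K and QX default to the study's average values.

def initial_failure_rate(errors_per_line, ksloc, R, K=4.2e-7, Qx=4.0):
    """lambda_0 = F * K * W0, with F = R / (I_S * Q_X) and W0 from the error density."""
    source_lines = ksloc * 1000
    W0 = errors_per_line * source_lines      # inherent faults predicted at start of test
    F = R / (source_lines * Qx)              # linear execution frequency (passes/second)
    return F * K * W0

# Hypothetical: 6 errors/KSLOC predicted, 50 KSLOC, 2e6 object instructions/second
print(initial_failure_rate(errors_per_line=0.006, ksloc=50, R=2e6))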
       Part VI: Phase-Based Model
   Developed by John Gaffney, Jr. and Charles F. Davis of the
    Software Productivity Consortium
   Makes use of error statistics obtained during technical review of
    requirements, design and the implementation to predict software
    reliability during test and operations.
   Can also use failure data during testing to estimate reliability.
   Assumptions:
      The development effort's current staffing level is directly
        related to the number of errors discovered during a
        development phase.
      The error discovery curve is monomodal.

      Code size estimates are available during early phases of a
        development effort.
      Fagan inspections are used during all development phases.


                                                                        66
        Part VI: Phase-Based Model
The first two assumptions, plus Norden's observation that the
Rayleigh curve represents the "correct" way of applying staff to a
development effort, result in the following expression for the
number of errors discovered during a life cycle phase:

     ΔEt = E · [ exp(−B·(t − 1)²) − exp(−B·t²) ]

            E = Total Lifetime Error Rate, expressed in
       Errors per Thousand Source Lines of Code (KSLOC)
                  t = Error Discovery Phase index
      B = shape constant, defined below in terms of the
            Defect Discovery Phase Constant



                                                                  67
       Part VI: Phase-Based Model
Note that t does not represent ordinary calendar time. Rather, t
represents a phase in the development process. The values of t
and the corresponding life cycle phases are:

                   t = 1 - Requirements Analysis
                       t = 2 - Software Design
                        t = 3 - Implementation
                            t = 4 - Unit Test
                 t = 5 - Software Integration Test
                          t = 6 - System Test
                       t = 7 - Acceptance Test


                                                                   68
     Part VI: Phase-Based Model
τp, the Defect Discovery Phase Constant, is the location of the peak
in a continuous fit to the failure data. This is the point at which 39%
of the errors have been discovered:

     B = 1 / (2·τp²)

                The cumulative form of the model is:

     Vt = E · [ 1 − exp(−B·t²) ]

where Vt is the number of errors per KSLOC that have been
discovered through phase t.

                                                                      69
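A sketch of the phase-based (Rayleigh) error discovery profile using the expressions above; E and τp are illustrative, with the discovery peak placed in the implementation phase.

import math

def errors_discovered_in_phase(E, tau_p, t):
    """Delta E_t = E * (exp(-B*(t-1)**2) - exp(-B*t**2)), with B = 1/(2*tau_p**2)."""
    B = 1.0 / (2.0 * tau_p ** 2)
    return E * (math.exp(-B * (t - 1) ** 2) - math.exp(-B * t ** 2))

# Hypothetical: 20 lifetime errors/KSLOC, discovery peak in phase 3 (implementation)
E, tau_p = 20.0, 3.0
for t in range(1, 8):
    print(t, round(errors_discovered_in_phase(E, tau_p, t), 2))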
Part VI: Phase-Based Model




                             70
      Part VI: Phase-Based Model
This model can also be used to estimate the number of latent errors
in the software. Recall that the number of errors per KSLOC
removed through the n'th phase is:

     Vn = E · [ 1 − exp(−B·n²) ]

The number of errors remaining in the software at that point is:

     E − Vn = E · exp(−B·n²)

times the number of source statements (in KSLOC).

                                                                      71
    Part VII: Current Work in Estimating
                Fault Content
   Analysis of Complexity
   Regression Tree Modeling




                                           72
           Analysis of Complexity
   The need for measurement
   The measurement process
   Measuring software change
   Faults and fault insertion
   Fault insertion rates




                                    73
      Analysis of Complexity (cont’d)
   Recent work has focused on relating measures of
    software structure to fault content.
   Problem - although different software metrics will
    say different things about a software system, they
    tend to be interrelated and can be highly
    correlated with one another (e.g., McCabe
    complexity and line count are highly correlated).



                                                     74
      Analysis of Complexity (cont’d)
   Relative complexity measure, developed by
    Munson and Khoshgoftaar, attempts to handle
    the problem of interdependence and
    multicollinearity among software metrics.
   Technique used is factor analysis, whose purpose
    is to decompose a set of correlated
    measures into a set of eigenvalues and
    eigenvectors.

                                                    75
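A simplified sketch of the idea: standardized raw metrics are decomposed into principal components and recombined into a single relative-complexity score per module. The published technique uses factor analysis with rotation and a measurement baseline, so this is only an approximation, and the metric values are invented.

import numpy as np

def relative_complexity(metrics):
    """Collapse correlated raw metrics (rows = modules, columns = metrics) into a
    single relative-complexity score per module via principal components."""
    z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)   # standardize metrics
    eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))   # eigen-decomposition
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                                         # component scores
    weights = eigvals / eigvals.sum()                            # variance explained
    rho = scores @ weights                                       # weighted composite
    return 50 + 10 * (rho - rho.mean()) / rho.std()              # scale: mean 50, sd 10

# Hypothetical raw metrics for five modules (e.g., LOC, statements, N1, N2, eta1, eta2)
raw = np.array([[120, 80, 300, 210, 40, 25],
                [300, 200, 900, 650, 80, 45],
                [60, 45, 150, 100, 25, 15],
                [500, 340, 1500, 1100, 120, 60],
                [210, 150, 620, 430, 60, 35]], dtype=float)
print(relative_complexity(raw))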
      Analysis of Complexity (cont’d)
   The need for measurement
   The measurement process
   Measuring software change
   Faults and fault insertion
   Fault insertion rates




                                        76
Analysis of Complexity - Measuring Software

(Diagram: each module's source code is fed through complexity metric analysis
(CMA), producing a vector of module characteristics such as LOC, statement
count, Halstead N1 and N2, and η1 and η2.)
                                                   77
Analysis of Complexity - Simplifying Measurements

(Diagram: the raw metric vectors for all modules of a program are reduced by
principal components analysis (PCA/RCM) to a single relative complexity value
per module, e.g., 50, 40, 60, 45, 55.)
                                                                       78
    Analysis of Complexity - Relative
              Complexity
   Relative complexity is a synthesized metric

   Relative complexity is a fault surrogate
     Composed of metrics closely related to
      faults
     Highly correlated with faults




                                                  79
      Analysis of Complexity (cont’d)
   The need for measurement
   The measurement process
   Measuring software change
   Faults and fault insertion
   Fault insertion rates




                                        80
      Analysis of Complexity (cont’d)

Software Evolution
 We assume that we are developing (maintaining) a
  program
   We are really working with many programs over time
   They are different programs in a very real sense
   We must identify and measure each version of each
    program module



                                                         81
   Analysis of Complexity (cont’d)
Evolution of the STS Primary Avionics Software System (PASS)




                                                               82
Analysis of Complexity (cont’d)

The Problem

(Diagram: the module sets of Build N and Build N+1 — most modules (A, B, D–K)
carry over from one build to the next, while module C is replaced by new
modules L and M.)
                                      83
   Analysis of Complexity (cont’d)
Managing fault counts during evolution
 Some faults are inserted during branch builds
    These fault counts must be removed when the branch
     is pruned
 Some faults are eliminated on branch builds
    These faults must be removed from the main sequence
     build
 Fault count should contain only those faults on the main
  sequence to the current build
 Faults attributed to modules not in the current build must
  be removed from the current count


                                                               84
    Analysis of Complexity (cont’d)
Baselining a software system
 Software changes over software builds
 Measurements, such as relative complexity, change
  across builds
   Initial build as a baseline
   Relative complexity of each build
   Measure change in fault surrogate from initial
    baseline


                                                      85
Analysis of Complexity - Measurement Baseline

(Figure: measurement baseline across builds, with two builds marked as Point A
and Point B.)
                                       86
Analysis of Complexity - Baseline
          Components
   Vector of means
   Vector of standard deviations

   Transformation matrix




                                    87
Analysis of Complexity - Comparing Two Builds

(Diagram: source code for build i and build j passes through the measurement
tools and the baseline to produce baselined RCM values for each build; from the
RCM deltas, code churn and code delta measures are computed.)
                                                                      88
Analysis of Complexity - Measuring
             Evolution
   Different modules in different builds
           set of modules not in latest build
           set of modules not in early build
           set of common modules
   Code delta
   Code churn
   Net code churn


                                                 89
      Analysis of Complexity (cont’d)
   The need for measurement
   The measurement process
   Measuring software change
   Faults and fault insertion
   Fault insertion rates




                                        90
Analysis of Complexity - Fault Insertion

(Diagram: going from Build N to Build N+1, some existing faults persist, some
faults are removed by repairs, and new faults are added by the changes.)
                                                     91
       Analysis of Complexity -
    Identifying and Counting Faults
   Unlike failures, faults are not directly observable
   fault counts should be at same level of granularity as
    software structure metrics
   Failure counts could be used as a surrogate for fault
    counts if:
      Number of faults were related to number of failures

      Distribution of number of faults per failure had low
       variance
      The faults associated with a failure were confined
       to a single procedure/function

               Actual situation shown on next slide
                                                              92
Analysis of Complexity - Observed Distribution of Faults per
                          Failure




                                                               93
      Analysis of Complexity - Fault
    Identification and Counting Rules
   Taxonomy based on corrective actions taken in response to
    failure reports
   faults in variable usage
      Definition and use of new variables

      Redefinition of existing variables (e.g. changing type from
        float to double)
      Variable deletion

      Assignment of a different value to a variable

   faults involving constants
      Definition and use of new constants

      Constant definition deletion



                                                                     94
      Analysis of Complexity - Fault
Identification and Counting Rules (cont’d)
      Control flow faults
         Addition of new source code block

         Deletion of erroneous conditionally-executed path(s) within a
          set of conditionally executed statements
         Addition of execution paths within a set of conditionally
          executed statements
         Redefinition of existing condition for execution (e.g. change
          “if i < 9” to “if i <= 9”)
         Removal of source code block

         Incorrect order of execution

         Addition of a procedure or function

         Deletion of a procedure or function


                                                                          95
   Analysis of Complexity (cont’d)
  Control flow fault examples - removing execution
  paths from a code block




Counts as two faults, since two paths were removed

                                                     96
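As a hypothetical illustration of the path-removal rule (not the slide's original example), the change below deletes two erroneous conditionally-executed paths and would therefore be counted as two faults.

# Before the fix: two of the conditional paths are erroneous.
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"        # erroneous path (removed by the fix)
    elif x < 10:
        return "small"       # erroneous path (removed by the fix)
    else:
        return "large"

# After the fix: the two erroneous paths are gone -- counted as two faults.
def classify_fixed(x):
    if x < 0:
        return "negative"
    else:
        return "large"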
   Analysis of Complexity (cont’d)
Control flow examples (cont’d) - addition of conditional execution
paths to code block




Counts as three faults, since three paths were added

                                                                     97
Analysis of Complexity - Estimating
           Fault Content
    The fault potential of a module i is
     directly proportional to its relative
     complexity

    From previous development projects
     develop a proportionality constant, k,
     for total faults
    Faults per module:

     Fi = k · ρi,  where ρi is the relative complexity of module i
                                              98
       Analysis of Complexity -
    Estimating Fault Insertion Rate
   Proportionality constant, k’, representing
    the rate of fault insertion
   For the jth build, total fault insertion



   Estimate for the fault insertion rate




                                                 99
      Analysis of Complexity (cont’d)
   The need for measurement
   The measurement process
   Measuring software change
   Faults and fault insertion
   Fault insertion rates




                                        100
Analysis of Complexity - Relationships
 Between Change in Fault Count and
          Structural Change
Version ID      Cases   Pearson’s R           Spearman Correlation
                        churn      delta      churn      delta
Version 3        10     .376       -.323      .451       -.121
Version 4        12     .661       .700       .793       .562
Version 5        13     .891       -.276      .871       -.233
Versions 3-5     35     .568       .125       .631       .087

(churn = code churn between builds; delta = code delta between builds)
                                                                       101
 Analysis of Complexity -
   Regression Models

The regression models predict the number of faults inserted between
builds j and j+1 from two predictors measured over the same pair of
builds: the code churn and the code delta between builds j and j+1.
                                                  102
Analysis of Complexity - PRESS
 Scores - Linear vs. Nonlinear
            Models




                                 103
Analysis of Complexity - Selecting
   an Adequate Linear Model
   Linear model gives best R2 and PRESS score.
   Is the model based only on code churn an adequate
    predictor at the 5% significance level?




   The R²-adequate test shows that code churn alone is not an
    adequate predictor at the 5% significance level.
                                                           104
Analysis of Complexity - Analysis of Predicted Residuals




                                                   105
       Regression Tree Modeling
Objectives
 Attractive way to encapsulate the knowledge of

  experts and to aid decision making.
 Uncovers structure in data

 Can handle data with complicated and unexplained

  irregularities
 Can handle both numeric and categorical

  variables in a single model.

                                               106
  Regression Tree Modeling (cont’d)
Algorithm
 Determine set of predictor variables (software metrics)
  and a response variable (number of faults).
 Partition the predictor variable space such that each
  partition or subset is homogeneous with respect to the
  dependent variable.
 Establish a decision rule based on the predictor variables
  which will identify the programs with the same number of
  faults.
 Predict the value of the dependent variable which is the
  average of all the observations in the partition.
                                                           107
    Regression Tree Modeling (cont’d)
Algorithm (cont’d)
 Minimize the deviance function given by:

     D = Σ over observations i in a node of (yi − μ)²,
         where μ is the mean response of the node

   Establish stopping criteria based on:
      Cardinality threshold - leaf node is smaller than certain
       absolute size.
      Homogeneity threshold - deviance of leaf node is less
       than some small percentage of the deviance of the root
       node; i.e., leaf node is homogeneous enough.

                                                              108
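A sketch of the approach using scikit-learn's decision tree regressor on invented module data; min_samples_leaf plays the role of the cardinality threshold, and the default split criterion minimizes within-node squared error, i.e., the deviance above.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: rows = modules, columns = metrics (e.g., LOC, cyclomatic
# complexity, Halstead length); response = number of faults found per module.
rng = np.random.default_rng(0)
metrics = rng.integers(10, 500, size=(200, 3)).astype(float)
faults = (metrics[:, 0] / 100 + metrics[:, 1] / 200 + rng.poisson(1, 200)).round()

# min_samples_leaf acts as the cardinality threshold
tree = DecisionTreeRegressor(min_samples_leaf=10, max_depth=4)
tree.fit(metrics, faults)

new_module = np.array([[250.0, 120.0, 300.0]])
print(tree.predict(new_module))    # predicted fault count = mean of its leaf node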
  Regression Tree Modeling (cont’d)
Application
 Software for medical imaging system, consisting of 4500
  modules amounting to 400,000 lines of code written in
  Pascal, FORTRAN, assembly language, and PL/M.
 Random sample of 390 modules from the ones written in
  Pascal and FORTRAN, consisting of about 40,000 lines of
  code.
 Software was developed over a period of five years, and
  had been in use at several hundred sites.
 Number of changes made to the executable code
  documented by Change Reports (CRs) indicates software
  development effort.
                                                       110
    Regression Tree Modeling (cont’d)
   Application (cont’d)
      Metrics

         Total lines of code (TC)

         Number of code lines (CL)
         Number of characters (Cr)

         Number of comments (Cm)

         Comment characters (CC)

         Code characters (Co)
         Halstead’s program length
         Halstead’s estimate of program length metric

         Jensen’s estimate of program length metric
         Cyclomatic complexity metric
         Bandwidth metric
                                                         111
  Regression Tree Modeling (cont’d)

Pruning
 Tree grown using the stopping rules is too elaborate.
 Pruning - equivalent to variable selection in linear
  regression.
 Determines a nested sequence of subtrees of the given
  tree by recursively snipping off partitions with minimal
  gains in deviance reduction.
 Degree of pruning can be determined by using cross-
  validation.


                                                             113
  Regression Tree Modeling (cont’d)

Cross-Validation
 Evaluate the predictive performance of the regression tree
  and degree of pruning in the absence of a separate
  validation set.
 Data are divided into two mutually exclusive sets, viz.,
  learning sample and test sample.
 Learning sample is used to grow the tree, while the test
  sample is used to evaluate the tree sequence.
 Deviance - measure to assess the performance of the
  prediction rule in predicting the number of errors for the
  test sample of different tree sizes.
                                                          114
   Regression Tree Modeling (cont’d)
Performance Analysis
 Two types of errors:
     Predict more faults than the actual number - Type I
      misclassification.
     Predict fewer faults than actual number - Type II error.

 Type II error is more serious.
 Type II error in case of tree modeling is 8.7%, and in case of fault
  density is 13.1%.
 Tree modeling approach is significantly better than the fault density approach.
 Can also be used to classify modules into fault-prone and non fault-
  prone categories.
 Decision rule - classifies the module as fault-prone if the predicted
  number of faults is greater than a certain cut-off value a.
 Choice of a determines the misclassification rate.

                                                                          115
    Part VIII: Software Reliability Tools
   SRMP
   SMERFS
   CASRE




                                        116
      Where Do They Come From?
   Software Reliability Modeling Program (SRMP)
      Bev Littlewood of City University, London

   Statistical Modeling and Estimation of Reliability
    Functions for Software (SMERFS)
      William Farr of Naval Surface Warfare Center
   Computer-Aided Software Reliability Estimation Tool
    (CASRE)
      Allen Nikora, JPL & Michael Lyu, Chinese
       University of Hong Kong



                                                          117
            SRMP Main Features
   Multiple Models (9)
   Model Application Scheme: Multiple Iterations
   Data Format: Time-Between-Failures Data Only
   Parameter Estimation: Maximum Likelihood
   Multiple Evaluation Criteria - Prequential
    Likelihood, Bias, Bias Trend, Model Noise
   Simple U-Plots and Y-Plots


                                                118
          SMERFS Main Features
   Multiple Models (12)
   Model Application Scheme: Single Execution
   Data Format: Failure-Counts and Time-Between
    Failures
   On-line Model Description Manual
   Two parameter Estimation Methods
      Least Square Method
      Maximum Likelihood Method

   Goodness-of-fit Criteria: Chi-Square Test, KS Test
   Model Applicability - Prequential Likelihood, Bias,
    Bias Trend, Model Noise
   Simple Plots
                                                          119
       The SMERFS Tool Main Menu
   Data Input
   Data Edit
   Data Transformation
   Data Statistics
   Plots of the Raw Data
   Model Applicability Analysis
   Executions of the Models
   Analyses of Model Fit
   Stop Execution of SMERFS




                                                      120
              CASRE Main Features
   Multiple Models (12)
   Model Application Scheme: Multiple Iterations
   Goodness-of-Fit Criteria - Chi-Square Test, KS Test
   Multiple Evaluation Criteria - Prequential Likelihood, Bias,
    Bias Trend, Model Noise
   Conversions between Failure-Counts Data and Time-
    Between-Failures Data
   Menu-Driven, High-Resolution Graphical User Interface
   Capability to Make Linear Combination Models

                                                              121
CASRE High-Level Architecture




                                122
                   Further Reading
   A. A. Abdel-Ghaly, P. Y. Chan, and B. Littlewood; "Evaluation of
    Competing Software Reliability Predictions," IEEE Transactions on
    Software Engineering; vol. SE-12, pp. 950-967; Sep. 1986.
   T. Bowen, "Software Quality Measurement for Distributed Systems",
    RADC TR-83-175.
   W. K. Ehrlich, A. Iannino, B. S. Prasanna, J. P. Stampfel, and J. R. Wu,
    "How Faults Cause Software Failures: Implications for Software
    Reliability Engineering", published in proceedings of the International
    Symposium on Software Reliability Engineering, pp 233-241, May 17-
    18, 1991, Austin, TX
   M. E. Fagan, "Advances in Software Inspections", IEEE Transactions on
    Software Engineering, vol SE-12, no 7, July, 1986, pp 744-751
   M. E. Fagan, "Design and Code Inspections to Reduce Errors in
    Program Development," IBM Systems Journal, Volume 15, Number 3,
    pp 182-211, 1976
   W. H. Farr, O. D. Smith, and C. L. Schimmelpfenneg, "A PC Tool for
    Software Reliability Measurement," published in the 1988 Proceedings
    of the Institute of Environmental Sciences, King of Prussia, PA
                                                                         123
           Further Reading (cont’d)
   W. H. Farr, O. D. Smith, "Statistical Modeling and Estimation of
    Reliability Functions for Software (SMERFS) User's Guide," Naval
    Weapons Surface Center, December 1988 (approved for unlimited
    public distribution by NSWC)
   J. E. Gaffney, Jr. and C. F. Davis, "An Approach to Estimating Software
    Errors and Availability," SPC-TR-88-007, version 1.0, March, 1988,
    proceedings of Eleventh Minnowbrook Workshop on Software
    Reliability, July 26-29, 1988, Blue Mountain Lake, NY
   J. E. Gaffney, Jr. and J. Pietrolewicz, "An Automated Model for Software
    Early Error Prediction (SWEEP)," Proceedings of Thirteenth Minnow-
    brook Workshop on Software Reliability, July 24-27, 1990, Blue
    Mountain Lake, NY
   A. L. Goel, S. N. Sahoo, "Formal Specifications and Reliability: An
    Experimental Study", published in proceedings of the International
    Symposium on Software Reliability Engineering, pp 139-142, May 17-
    18, 1991, Austin, TX
   A. Grnarov, J. Arlat, A. Avizienis, "On the Performance of Software
    Fault-Tolerance Strategies", published in the proceedings of the Tenth
    International Symposium on Fault Tolerant Computing (FTCS-10),
    Kyoto, Japan, October, 1980, pp 251-253                                 124
            Further Reading (cont’d)
   K. Kanoun, M. Bastos Martini, J. Moreira De Souza, “A Method for
    Software Reliability Analysis and Prediction - Application to the TROPICO-
    R Switching System”, IEEE Transactions on Software Engineering, April
     1991, pp. 334-344
   J. C. Kelly, J. S. Sherif, J. Hops, "An Analysis of Defect Densities Found
    During Software Inspections", Journal of Systems Software, vol 17, pp
    111-117, 1992
   T. M. Khoshgoftaar and J. C. Munson, "A Measure of Software System
    Complexity and its Relationship to Faults," proceedings of 1992
     International Simulation Technology Conference and 1992 Workshop on
    Neural Networks (SIMTEC'92 - sponsored by the Society for Computer
    Simulation), pp. 267-272, November 4-6, 1992, Clear Lake, TX
   M. Lu, S. Brocklehurst, and B. Littlewood, "Combination of Predictions
    Obtained from Different Software Reliability Growth Models," proceedings
    of the IEEE 10th Annual Software Reliability Symposium, pp 24-33, June
    25-26, 1992, Denver, CO
   M. Lyu, ed., Handbook of Software Reliability Engineering, McGraw-Hill
    and IEEE Computer Society Press, 1996, ISBN 0-07-039400-8
                                                                            125
               Further Reading (cont’d)
   M. Lyu, "Measuring Reliability of Embedded Software: An Empirical Study
    with JPL Project Data," published in the Proceedings of the International
    Conference on Probabilistic Safety Assessment and Management; February
     4-6, 1991, Los Angeles, CA.
   M. Lyu and A. Nikora, "A Heuristic Approach for Software Reliability
    Prediction: The Equally-Weighted Linear Combination Model," published in
    the proceedings of the IEEE International Symposium on Software Reliability
     Engineering, May 17-18, 1991, Austin, TX
    M. Lyu and A. Nikora, "Applying Reliability Models More Effectively", IEEE
     Software, vol. 9, no. 4, pp. 43-52, July, 1992
    M. Lyu and A. Nikora, "Software Reliability Measurements Through
     Combination Models: Approaches, Results, and a CASE Tool," proceedings of
     the 15th Annual International Computer Software and Applications Conference
     (COMPSAC 91), September 11-13, 1991, Tokyo, Japan
   J. McCall, W. Randall, S. Fenwick, C. Bowen, P. Yates, N. McKelvey, M.
    Hecht, H. Hecht, R. Senn, J. Morris, R. Vienneau, "Methodology for Software
    Reliability Prediction and Assessment," Rome Air Development Center
     (RADC) Technical Report RADC-TR-87-171, volumes 1 and 2, 1987
   J. Munson and T. Khoshgoftaar, "The Use of Software Metrics in Reliability
    Models," presented at the initial meeting of the IEEE Subcommittee on
    Software Reliability Engineering, April 12-13, 1990, Washington, DC          126
               Further Reading (cont’d)
   J. C. Munson, "Software Measurement: Problems and Practice," Annals of
    Software Engineering, J. C. Baltzer AG, Amsterdam 1995.
   J. C. Munson, “Software Faults, Software Failures, and Software Reliability
    Modeling”, Information and Software Technology, December, 1996.
   J. C. Munson and T. M. Khoshgoftaar “Regression Modeling of Software
    Quality: An Empirical Investigation,” Journal of Information and Software
    Technology, 32, 1990, pp. 105-114.
   J. Munson, A. Nikora, “Estimating Rates of Fault Insertion and Test
    Effectiveness in Software Systems”, invited paper, published in Proceedings of
    the Fourth ISSAT International Conference on Quality and Reliability in Design,
    Seattle, WA, August 12-14, 1998
   John D. Musa., Anthony Iannino, Kazuhiro Okumoto, Software Reliability:
    Measurement, Prediction, Application; McGraw-Hill, 1987; ISBN 0-07-044093-
    X.
   A. Nikora, J. Munson, “Finding Fault with Faults: A Case Study”, presented at
    the Annual Oregon Workshop on Software Metrics, May 11-13, 1997, Coeur
    d’Alene, ID.
   A. Nikora, N. Schneidewind, J. Munson, "IV&V Issues in Achieving High
    Reliability and Safety in Critical Control System Software," proceedings of the
    Third ISSAT International Conference on Reliability and Quality in Design,
    March 12-14, 1997, Anaheim, CA.                                              127
           Further Reading (cont’d)
   A. Nikora, J. Munson, “Determining Fault Insertion Rates For Evolving
    Software Systems”, proceedings of the Ninth International Symposium
    on Software Reliability Engineering, Paderborn, Germany, November 4-
    7, 1998
    Norman F. Schneidewind, Ted W. Keller, "Applying Reliability Models to
    the Space Shuttle", IEEE Software, pp 28-33, July, 1992
   N. Schneidewind, “Reliability Modeling for Safety-Critical Software”,
    IEEE Transactions on Reliability, March, 1997, pp. 88-98
   N. Schneidewind, "Measuring and Evaluating Maintenance Process
    Using Reliability, Risk, and Test Metrics", proceedings of the
    International Conference on Software Maintenance, September 29-
    October 3, 1997, Bari, Italy.
   N. Schneidewind, "Software Metrics Model for Integrating Quality
     Control and Prediction", proceedings of the 8th International Symposium
    on Software Reliability Engineering, November 2-5, 1997, Albuquerque,
    NM.
   N. Schneidewind, "Software Metrics Model for Quality Control",
    Proceedings of the Fourth International Software Metrics Symposium,
    November 5-7, 1997, Albuquerque, NM.
                                                                         128
            Additional Information
   CASRE Screen Shots
   Further modeling details
      Additional Software Reliability Models

      Quantitative Criteria for Model Selection – the
       Subadditivity Property
      Increasing the Predictive Accuracy of Models




                                                         129
CASRE - Initial Display (screen shot)                                        130
CASRE - Applying Filters (screen shot)                                       131
CASRE - Running Average Trend Test (screen shot)                             132
CASRE - Laplace Test (screen shot)                                           133
CASRE - Selecting and Running Models (screen shot)                           134
CASRE - Displaying Model Results (screen shot)                               135
CASRE - Displaying Model Results (cont’d) (screen shot)                      136
CASRE - Prequential Likelihood Ratio (screen shot)                           137
CASRE - Model Bias (screen shot)                                             138
CASRE - Model Bias Trend (screen shot)                                       139
CASRE - Ranking Models (screen shot)                                         140
CASRE - Model Ranking Details (screen shot)                                  141
CASRE - Model Ranking Details (cont’d) (screen shot)                         142
CASRE - Model Results Table (screen shot)                                    143
CASRE - Model Results Table (cont’d) (screen shot)                           144
CASRE - Model Results Table (cont’d) (screen shot)                           145
       Additional Software Reliability
                  Models
   Software Reliability Estimation Models:
      Exponential NHPP Models

         Generalized Poisson Model

         Non-homogeneous Poisson Process Model

         Musa Basic Model

         Musa Calendar Time Model

         Schneidewind Model

      Littlewood-Verrall Bayesian Model

      Hyperexponential Model


                                                  146
    Generalized Poisson Model

   Proposed by Schafer, Alter, Angus, and
    Emoto for Hughes Aircraft Company under
    contract to RADC in 1979.
   Model is analogous in form to the Jelinski-
    Moranda model but taken within the error
    count framework. The model can be shown to
    reduce to the Jelinski-Moranda model under
    the appropriate circumstances.

                                                  147
Generalized Poisson Model (cont'd)
Assumptions:
   1. The expected number of errors occurring in any
      time interval is proportional to the error content at
      the time of testing and to some function of the
      amount of time spent in error testing.
   2. All errors are equally likely to occur and are
      independent of each other.
   3. Each error is of the same order of severity as any
      other error.
   4. The software is operated in a similar manner as
      the anticipated usage.
   5. The errors are corrected at the ends of the testing
      intervals without introduction of new errors into
      the program.
                                                              148
Generalized Poisson Model (cont'd)
Construction of Model:
   Given testing intervals of length X1, X2,...,Xn

   f i errors discovered during the i'th interval
   At the end of the i'th interval, a total of Mi errors
    have been corrected
   First assumption of the model yields:
         E(f i) = (N - Mi-1)gi (x1, x2, ..., xi)

   where

          is a proportionality constant
         N is the initial number of errors
         gi is a function of the amount of testing time
          spent, previously and currently. gi is usually
          non-decreasing. If gi (x1, x2, ..., xi) = xi, then the
          model reduces to the Jelinski-Moranda model. 149
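A minimal Python sketch of the construction above, using assumed values for φ, N, and the per-interval testing times, and taking gi = xi so that the expression reduces to the Jelinski-Moranda form:

# Generalized Poisson sketch: E(f_i) = phi * (N - M_{i-1}) * g_i(x_1, ..., x_i)
phi = 0.02                     # assumed proportionality constant
N = 100                        # assumed initial number of errors
x = [10, 10, 20, 20, 40]       # assumed testing effort per interval
M_prev = 0                     # errors corrected so far

for i, xi in enumerate(x, start=1):
    g = xi                     # g_i = x_i: the Jelinski-Moranda special case
    expected = phi * (N - M_prev) * g
    print(f"interval {i}: E(f_i) = {expected:.2f}")
    M_prev += round(expected)  # assume the expected errors are found and corrected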
             Schneidewind Model
   Proposed by Norman Schneidewind in 1975.

   Model's basic premise is that as the testing
    progresses over time, the error detection process
    changes. Therefore, recent error counts are usually
    of more use than earlier counts in predicting future
    error counts.

   Schneidewind identifies three approaches to using the
    error count data. These are identified in the following
    slide.
                                                           150
                Schneidewind Model
   First approach is to use all of the error counts for all
    testing intervals.

   Second approach is to use only the error counts from test
    intervals s through m and ignore completely the error
    counts from the first s - 1 test intervals, assuming that
    there have been m test intervals to date.

   Third approach is a hybrid approach which uses the
    cumulative error count for the first s - 1 intervals and the
    individual error counts for the last m - s + 1 intervals.

                                                                   151
        Schneidewind Model (cont'd)
Assumptions:
   1.    The number of errors detected in one interval is
         independent of the error count in another.
   2.    The error correction rate is proportional to the
         number of errors to be corrected.
   3.    The software is operated in a similar manner as
         the anticipated operational usage.
   4.    The mean number of detected errors decreases
         from one interval to the next.
   5.    The intervals are all of the same length.
                                                            152
 Schneidewind Model (cont'd)
Assumptions (cont'd):

   6.   The rate of error detection is proportional to the
       number of errors within the program at the time of
       test. The error detection process is assumed to
       be a nonhomogeneous Poisson process with an
       exponentially decreasing error detection rate.
        This rate of change is:

              d(i) = α exp(-β i)

        for the i'th interval, where α > 0 and β > 0 are
       constants of the model.
                                                            153
Schneidewind Model (cont'd)

 From assumption 6, the cumulative mean number of
 errors is:

       Di = (α/β) [1 - exp(-β i)]

 For the i'th interval, the mean number of errors is:

       mi = Di - Di-1 = (α/β) [exp(-β (i-1)) - exp(-β i)]

                                                       157
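A small Python sketch of these interval quantities, with assumed values for α and β:

import math

alpha, beta = 12.0, 0.25                         # assumed model constants, both > 0

def D(i):
    """Cumulative mean number of errors through interval i."""
    return (alpha / beta) * (1.0 - math.exp(-beta * i))

for i in range(1, 6):
    m_i = D(i) - D(i - 1)                        # m_i = D_i - D_{i-1}
    print(f"interval {i}: D_i = {D(i):6.2f}, m_i = {m_i:5.2f}")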
    Nonhomogeneous Poisson Process
   Proposed by Amrit Goel of Syracuse University and Kazu
    Okumoto in 1979.

   Model assumes that the error counts over non-
    overlapping time intervals follow a Poisson distribution.

   It is also assumed that the expected number of errors in
    an interval of time is proportional to the remaining number
    of errors in the program at that time.


                                                                158
Nonhomogeneous Poisson Process (cont’d)
   Assumptions:
      1.   The software is operated in a similar manner as the
           anticipated operational usage.
       2.   The numbers of errors, (f1, f2, f3, ..., fm) detected in
           each of the respective time intervals [(0, t 1), (t1,
           t2),...,(tm-1,tm)] are independent for any finite
           collection of times t1 < t2 < ... < tm.
      3.   Every error has the same chance of being detected
           and is of the same severity as any other error.
      4.   The cumulative number of errors detected at any
           time t, N(t), follows a Poisson distribution with mean
           m(t). m(t) is such that the expected number of error
            occurrences for any time (t, t + Δt), is proportional to
           the expected number of undetected errors at time t. 159
Nonhomogeneous Poisson Process (cont’d)
    Assumptions (cont’d):

       5.   The expected cumulative number of errors
            function, m(t), is assumed to be a bounded,
            nondecreasing function of t with:

                   m(t) = 0 for t = 0
                   m(t) = a for t = ∞

            where a is the expected total number of errors to
            be eventually detected in the testing process.

                                                                160
Nonhomogeneous Poisson Process (cont’d)
      Assumptions 4 and 5 give the number of errors
      discovered in the interval (t, t + Δt) as:

            m(t + Δt) - m(t) = b [a - m(t)] Δt

      where
      b is a constant of proportionality
      a is the total number of errors in the program
      Solving the above differential equation yields:

            m(t) = a (1 - exp(-b t))

      which satisfies the initial conditions m(0) = 0 and m(∞) = a

                                                        161
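A short Python sketch of the Goel-Okumoto mean value function and its failure intensity, with assumed values for a and b:

import math

a, b = 150.0, 0.05                               # assumed: total expected errors, detection rate

def m(t):
    """Expected cumulative errors by time t: m(t) = a(1 - exp(-b t))."""
    return a * (1.0 - math.exp(-b * t))

def intensity(t):
    """Failure intensity: dm/dt = a b exp(-b t)."""
    return a * b * math.exp(-b * t)

for t in (0, 10, 20, 40, 80):
    print(f"t = {t:3}: m(t) = {m(t):7.2f}, lambda(t) = {intensity(t):.3f}")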
               Musa Basic Model

Assumptions:
  1.   Cumulative number of failures by time t, M(t), follows a
        Poisson process with the following mean value function:

              μ(t) = ν0 [1 - exp(-(λ0/ν0) t)]

        where ν0 is the total expected number of failures, λ0 is the
        initial failure intensity, and t is execution time
  2.   Execution times between failures are piecewise
        exponentially distributed, i.e., the hazard rate for a single
       failure is constant.



                                                                      162
         Musa Basic Model (cont’d)

   Mean value function given on previous slide.
   Failure intensity given by:

              λ(t) = dμ/dt = λ0 exp(-(λ0/ν0) t)

                                                   163
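A brief Python sketch of the basic model in this parameterization; ν0 and λ0 are assumed values, and the last line uses the model's standard result for the additional execution time needed to move from a present failure intensity to an objective intensity:

import math

nu0, lam0 = 200.0, 8.0                  # assumed: total expected failures, initial intensity

def mu(t):
    """Mean failures by execution time t."""
    return nu0 * (1.0 - math.exp(-(lam0 / nu0) * t))

def lam(t):
    """Failure intensity at execution time t."""
    return lam0 * math.exp(-(lam0 / nu0) * t)

lam_P, lam_F = 2.0, 0.5                 # assumed present and objective intensities
delta_t = (nu0 / lam0) * math.log(lam_P / lam_F)
print(f"mu(10) = {mu(10):.1f}, lambda(10) = {lam(10):.2f}")
print(f"additional execution time to reach the objective: {delta_t:.1f}")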
               Musa Calendar Time
   Developed by John Musa of AT&T Bell Labs between
    1975 and 1980.
   This model is one of the most popular software reliability
    models, having been employed both inside and outside of
    AT&T.
   Basic model (see previous slides) is based on the amount
    of CPU time that occurs between successive failures
    rather than on wall clock time.
   Calendar time component of the model attempts to relate
    CPU time to wall clock time by modeling the resources
    (failure identification personnel, failure correction
    personnel, and computer time) that may limit various time
    segments of testing.
                                                            164
       Musa Calendar Time (cont'd)
   Musa's basic model is essentially the same as the Jelinski-
    Moranda model
   The importance of the calendar component of the model is:
      Development of resource allocation (failure identification staff,
       failure correction staff, and computer time)
      Determining the relationship between CPU time and wall
       clock time

       Let tI/ = instantaneous ratio of calendar to execution time
       resulting from effects of failure identification staff
       Let tF/ = instantaneous ratio of calendar to execution time
       resulting from effects of failure correction staff
       Let tC/ = instantaneous ratio of calendar to execution time
       resulting from effects of available computer time
                                                                        165
     Musa Calendar Time (cont'd)
Increment in calendar time is proportional to the average
amount by which the limiting resource constrains testing
over a given execution time segment:


Resource requirements associated with a change in MTTF
     from τ1 to τ2 can be approximated by:
              ΔXK = θK Δτ + μK Δμ
      Δτ = execution time increment
      Δμ = increment of failures experienced
      θK = execution time coefficient of resource expenditure
      μK = failure coefficient of resource expenditure
                                                            166
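A small Python sketch of the resource-expenditure approximation for the three limiting resources; all coefficients, the execution-time increment, and the failure increment are assumed, illustrative values:

# Delta_X_K = theta_K * Delta_tau + mu_K * Delta_mu for each resource K
delta_tau = 40.0       # assumed execution-time increment (e.g., CPU hours)
delta_mu = 12.0        # assumed number of failures experienced in the segment

resources = {
    # resource K: (theta_K, mu_K) -- illustrative coefficients only
    "failure identification staff": (0.5, 1.5),
    "failure correction staff":     (0.0, 6.0),
    "computer time":                (1.0, 2.0),
}

for name, (theta, mu_coeff) in resources.items():
    delta_x = theta * delta_tau + mu_coeff * delta_mu
    print(f"{name:30s}: Delta_X = {delta_x:6.1f}")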
    Littlewood-Verrall Bayesian Model
   Littlewood's model, a reformulation of the Jelinski-
    Moranda model, postulated that all errors do not
    contribute equally to the reliability of a program.

   Littlewood's model postulates that the error rate, assumed
    to be a constant in the Jelinski-Moranda model, should be
    treated as a random variable.

   The Littlewood-Verrall model of 1978 attempts to account
    for error generation in the correction process by allowing
    for the probability that the program could be made less
    reliable by correcting an error.
                                                             167
          Littlewood-Verrall (cont'd)
Assumptions:
1. Successive execution times between failures are independent
    random variables with probability density functions

          f(xi | λi) = λi exp(-λi xi)

                                  where the λi are the error rates

2.   The λi's form a sequence of random variables, each with a gamma
      distribution of parameters α and ψ(i), such that:

          g(λi | α, ψ(i)) = [ψ(i)]^α λi^(α-1) exp(-ψ(i) λi) / Γ(α)

      ψ(i) is an increasing function of i that describes the "quality" of the
     programmer and the "difficulty" of the task.

                                                                         168
       Littlewood-Verrall (cont'd)
Assumptions (cont'd)

      Imposing the constraint P(λi+1 < x) ≥ P(λi < x) for
     any x reflects the intention to make the program
     better by correcting errors. It also reflects the fact
     that sometimes corrections will make the program
     worse.

3.   The software is operated in a similar manner as
     the anticipated operational usage.


                                                              169
  Littlewood-Verrall (cont'd)
 The marginal density of Xi is obtained by integrating over λi:

     f(xi | α, ψ(i)) = ∫ f(xi | λi) g(λi | α, ψ(i)) dλi

                     = α [ψ(i)]^α / [xi + ψ(i)]^(α+1)

 This is a Pareto distribution
                                170
  Littlewood-Verrall (cont'd)
Joint density for the Xi's is given by:

     f[x1, x2, ..., xn | α, ψ(i)] = Π (i = 1 to n) α [ψ(i)]^α / [xi + ψ(i)]^(α+1)

  For ψ(i), Littlewood and Verrall suggest a linear or quadratic form:

     ψ(i) = β0 + β1 i    or    ψ(i) = β0 + β1 i²


                                          171
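A small Python sketch of the Pareto predictive density above and the implied expected interfailure time, using the linear form ψ(i) = β0 + β1·i; α, β0, and β1 are assumed values:

alpha = 3.0                          # assumed shape parameter (> 1 for a finite mean)
beta0, beta1 = 5.0, 2.0              # assumed coefficients of the linear psi(i)

def psi(i):
    return beta0 + beta1 * i         # psi(i) increases with i (reliability growth)

def pdf(x, i):
    """Pareto predictive density of the i'th interfailure time."""
    return alpha * psi(i) ** alpha / (x + psi(i)) ** (alpha + 1)

def mean_time(i):
    """Expected i'th interfailure time: psi(i) / (alpha - 1)."""
    return psi(i) / (alpha - 1)

for i in (1, 5, 10):
    print(f"i = {i:2}: E[X_i] = {mean_time(i):6.2f}, f(1.0) = {pdf(1.0, i):.4f}")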
            Hyperexponential Model
   Extension to classical exponential models. First
    considered by Ohba; addressed in variations by Yamada,
    Osaki, and Laprie et al.

   Basic idea is that different sections of the software
    experience an exponential failure rate. Rates may vary over
    these sections to reflect their different natures.




                                                            172
    Hyperexponential Model (cont’d)
Assumptions:

    Suppose that there are K sections of the software such that within
    each class:
   1.  The rate of fault detection is proportional to the current fault
       content within that section of the software.
   2.  The fault detection rate remains constant over the intervals
       between fault occurrence.
   3.  A fault is corrected instantaneously without introducing new
       faults into the software.
   4.  Every fault has the same chance of being encountered and is of
       the same severity as any other fault.

                                                                    173
  Hyperexponential Model (cont’d)
Assumptions (cont’d):
   1.   The failures, when the faults are detected, are
        independent.
   2.   The cumulative number of failures by time t, m(t), follows a
         Poisson process with the following mean value function:

              m(t) = a Σ (i = 1 to K) pi [1 - exp(-bi t)],    where Σ pi = 1

         and where a is the expected total number of faults, pi is the
         proportion of faults in section i, and bi is the fault detection
         rate for section i.
                                                                       174
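A Python sketch of this mean value function for a hypothetical system with three sections, assuming the hyperexponential form m(t) = a Σ pi (1 - exp(-bi t)); a and the per-section (pi, bi) pairs are assumed values:

import math

a = 120.0                                              # assumed expected total faults
sections = [(0.5, 0.10), (0.3, 0.03), (0.2, 0.005)]    # assumed (p_i, b_i); p's sum to 1

def m(t):
    """m(t) = a * sum_i p_i * (1 - exp(-b_i * t))."""
    return a * sum(p * (1.0 - math.exp(-b * t)) for p, b in sections)

for t in (10, 50, 200, 1000):
    print(f"t = {t:5}: m(t) = {m(t):6.1f}")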
     Criteria for Model Selection
Quantitative Criteria Prior to Model Application
   Subadditive Property Analysis




                                                   175
       Subadditive Property Analysis
   Common definition of reliability growth is that successive
    interfailure times tend to grow larger:

              Ti ≤st Ti+1

   Under the assumption of the interfailure times being
    stochastically independent, we have:

              P(Ti+1 > x) ≥ P(Ti > x)    for all x ≥ 0

                                                             176
Subadditive Property Analysis (cont’d)
   As an alternative to assumption of stochastic
    independence, we can consider that successive
    failures are governed by a non-homogeneous
    Poisson process:
      N(t) = cumulative number of failures observed in
        [0,t]
      H(t) = E[N(t)], mean value of cumulative failures
      h(t) = dH(t)/dt, failure intensity

   Natural definition of reliability growth is that the
    increase in the expected number of failures, H(t),
    tends to become lower. However, there are situations
    where reliability growth may take place on average
    even though the failure intensity fluctuates locally.
                                                            177
Subadditive Property Analysis (cont’d)
   To allow for local fluctuations, we can say that the
    expected number of failures in any initial interval (i.e.,
    of the form [0,t]) is no less than the expected number
    of failures in any interval of the same length occurring
    later (i.e., in [x,x+t]).
   Independent increment property of an NHPP allows
    above definition to be written as:
                        H(t1) + H(t2) ≥ H(t1 + t2)
   When above inequality holds, H(t) is said to be
    subadditive.
   This definition of reliability growth allows there to be
    local intervals in which reliability decreases without
    affecting the overall trend of reliability increase.
                                                                 178
Subadditive Property Analysis (cont’d)
   We can interpret the subadditive property graphically. Consider the
    curve H(x), representing the mean value function for the cumulative
    number of failures, and the line L joining its two end points, (0, 0)
    and (T, H(T)), shown below.

    [Figure: mean value curve H(x) over [0, T] together with the chord L;
     intermediate times t1 and t2 are marked on the time axis]

   Let AH(t) denote the difference between the area delimited by (1) H(x)
    and the coordinate axis and (2) L and the coordinate axis. If H(t) is
    subadditive, then AH(t) ≥ 0 for all t ∈ [0,T]. We call AH(t) the
    subadditivity factor.
                                                                                179
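A numerical Python sketch of the subadditivity factor, using an observed cumulative failure count as an estimate of H(t); the failure-count data are invented for illustration, and the two areas are approximated by trapezoidal integration:

# A_H(t): area under the cumulative-failure curve minus area under the chord
# joining (0, 0) and (T, H(T)), both evaluated over [0, t].
times  = [0, 10, 20, 30, 40, 50, 60]     # assumed observation times
counts = [0,  8, 14, 19, 22, 24, 25]     # assumed cumulative failures (estimate of H)

def area(xs, ys):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2.0 for k in range(len(xs) - 1))

T, H_T = times[-1], counts[-1]
for j in range(1, len(times)):
    t = times[j]
    curve_area = area(times[:j + 1], counts[:j + 1])
    chord = [H_T * x / T for x in times[:j + 1]]       # chord L evaluated at the same points
    A_H = curve_area - area(times[:j + 1], chord)
    print(f"t = {t:3}: A_H(t) = {A_H:7.1f}")           # A_H(t) >= 0: growth on average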
Subadditive Property Analysis (cont’d)
Summary
 AH(t) ≥ 0 over [0,T] implies reliability growth on average over
   [0,T].
 AH(t) ≤ 0 over [0,T] implies reliability decrease on average over
   [0,T].
 AH(t) constant over [0,T] implies stable reliability on average
   over [0,T].
 Changes in the sign of AH(t) indicate reliability trend changes.
 When the derivative with respect to time of AH(t), Ah(t), is ≥ 0 over a
   subinterval [t1,t2], this implies local reliability growth on average
   over [t1,t2].
 Ah(t) ≤ 0 over [t1,t2] implies local reliability decrease on average
   over [t1,t2].
 Changes in the sign of Ah(t) indicate local reliability trend
  changes.
 Transient changes of the failure intensity are not detected by the
  subadditivity property.                                               180
Increasing the Predictive Accuracy of
               Models
     Introduction
     Linear Combination Models
        Statically-Weighted Models

        Statically-Determined/Dynamically-Assigned
          Weights
        Dynamically-Determined/Dynamically-Assigned
          Weights
        Model Application Results

     Model Recalibration

                                                       181
 Problems in Reliability Modeling

Introduction
 Significant differences exist among the performance

   of software reliability models
 When software reliability models were first introduced,

   it was felt that a process of refinement would produce
   a “definitive” model
  The reality is: no single model can be determined a

    priori to be the best model during measurement


                                                        182
             The Reality Is In Between

    [Figure: cumulative defects plotted against elapsed time; the observed
     "Reality" curve lies between a pessimistic projection and an
     optimistic projection]



                                                         183
      Linear Combination Models
Forming Linear Combination Models
(1) Identify a basic set of models (called component
    models)
(2) Select models that tend to cancel out in their biased
    predictions
(3) Keep track of the software failure data with all the
    component models
(4) Apply certain criteria to weigh the selected
    component models and form one or several Linear
    Combination Models for final predictions


                                                            184
A Set of Linear Combination Models
Selected component models : GO, MO, LV
(1) ELC Equally-Weighted LC Model (Statically-Weighted
    model)
(2) MLC Median-Oriented LC Model (Statically-
    Determined/Dynamically-Assigned)
(3) ULC Unequally-Weighted LC Model (Statically-
    Determined/Dynamically-Assigned)
(4) DLC  Dynamically-Weighted LC Model (Dynamically
    Determined and Assigned)
      Weighting is determined by dynamically calculating
       the posterior “prequential likelihood” of each model as
       a meta-predictor
                                                           185
A Set of Linear Combination Models (cont’d)
  Equally-Weighted Linear Combination Model - apply equal weights
  in selecting component models and form the Equally-Weighted
  Linear Combination model for final predictions.
   One possible ELC is:

              ELC = (GO + MO + LV) / 3

  These models were chosen as component models because:
      Their predictive validity has been observed in previous
        investigations.
      They represent different categories of models: exponential
        NHPP, logarithmic NHPP, and inverse-polynomial Bayesian.
      Their predictive biases tend to cancel for the five data sets
        analyzed.
                                                                       186
A Set of Linear Combination Models (cont’d)
  Median-Oriented Linear Combination Model - instead of choosing
  the arithmetic mean as was done for the ELC model, use the
  median value. This model's formulation is:


  where :
            O = optimistic prediction
            M = median prediction
            P = pessimistic prediction




                                                                   187
A Set of Linear Combination Models (cont’d)
 Unequally-Weighted Linear Combination Model - instead of choosing the
 arithmetic mean as was done for the ELC model, use the PERT weighting
  scheme. This model's formulation is:

              ULC = (O + 4M + P) / 6

 where:
          O = optimistic prediction
          M = median prediction
          P = pessimistic prediction



                                                                    188
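A small Python sketch forming the three statically-weighted combinations from one step's component predictions; the GO, MO, and LV prediction values are invented for illustration, and "optimistic" is taken here as the largest predicted time to next failure:

predictions = {"GO": 48.0, "MO": 55.0, "LV": 70.0}     # hypothetical component predictions

ranked = sorted(predictions.values())
P, M, O = ranked[0], ranked[1], ranked[2]              # pessimistic, median, optimistic

elc = sum(predictions.values()) / 3.0                  # equally-weighted combination
mlc = M                                                # median-oriented combination
ulc = (O + 4.0 * M + P) / 6.0                          # PERT-weighted combination

print(f"ELC = {elc:.1f}, MLC = {mlc:.1f}, ULC = {ulc:.1f}")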
A Set of Linear Combination Models (cont’d)
    Dynamically-Weighted LC Model - use changes in one or
     more of the four previously defined criteria (e.g.
     prequential likelihood) to determine weights for each
     component model.
    Changes can be observed over windows of varying size
     and type. Fixed or sliding windows can be used.
    [Figure: two timelines illustrating the reference windows used to compute
     the weights; weight wi is computed from reference window i, wi+1 from
     reference window i+1, and so on, with the windows either fixed or sliding
     forward along the failure history]
    Weights for component models vary over time.
                                                                                                                              189
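A Python sketch of the dynamically-determined weighting idea: each component model's weight is taken proportional to its prequential likelihood (the product of its one-step-ahead predictive densities) over a sliding reference window. The predictive-density values and the window size are invented for illustration:

import math

# Predictive density each model assigned to the interfailure times actually
# observed (invented numbers, most recent observation last).
pred_density = {
    "GO": [0.020, 0.015, 0.030, 0.010],
    "MO": [0.025, 0.020, 0.018, 0.022],
    "LV": [0.010, 0.030, 0.025, 0.028],
}
window = 3                                   # assumed sliding reference-window size

log_pl = {k: sum(math.log(d) for d in v[-window:]) for k, v in pred_density.items()}
max_lp = max(log_pl.values())
raw = {k: math.exp(lp - max_lp) for k, lp in log_pl.items()}
total = sum(raw.values())
weights = {k: r / total for k, r in raw.items()}

print("DLC weights:", {k: round(w, 3) for k, w in weights.items()})
# The DLC prediction is then the weighted sum of the component predictions.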
Data Set 1: Model Comparisons for
         Voyager Project




      Recommended Models: 1-DLC 2-ELC 2-LV

                                             190
Overall Model Comparisons Using All
           Four Criteria




                                 191
Overall Model Comparisons by the
 Prequential Likelihood Measure




                                   192
Summary of Long-Term Predictions




                               193
 Possible Extensions of LC Models
1.   Apply models other than GO, MO, and LV
2.   Apply more than three component models
3.   Apply other meta-predictors for weight
     assignments in DLC-type models
4.   Apply user-determined weighting schemes,
     subject to project criteria and user judgment
5.   Apply combination models as component
     models for a hybrid combination

                                                     194
Increasing the Predictive Accuracy of
               Models
Model Recalibration - Developed by Brocklehurst, Chan, Littlewood, and
Snell. Uses model bias function (u-plot) to reduce bias in model
prediction.
 Let the random variable Ti represent the next time to failure to be
 predicted, based on the observed times to failure t1, t2, ..., ti-1.
 Let Fi(t) represent the true, but hidden, distribution of the random
 variable Ti, and let F̂i(t) represent the prediction of Fi(t). The relationship
 between the estimated and true distributions can be written as:

              Fi(t) = Gi[F̂i(t)]

 where Gi represents the (unknown) error in the model's prediction.
                                                                      195
Increasing the Predictive Accuracy of
           Models (cont’d)
 Model Recalibration (cont’d):

 If Gi were known, it would be possible to determine the true distribution
 of Ti from the inaccurate predictor F̂i(t).

 The key notion in recalibration is that the sequence {Gi} is
 approximately stationary. Experience seems to show that Gi
 changes only slowly in many cases. This opens the possibility of
 approximating Gi with an estimate Gi* and using it to form a new
 prediction:

              Fi*(t) = Gi*[F̂i(t)]
                                                                       196
Increasing the Predictive Accuracy of
           Models (cont’d)
Model Recalibration (cont’d):

 A suitable estimator for Gi is suggested by the observation that Gi
 is the distribution function of Ui = F̂i(Ti). The estimate Gi* is based on
 the u-plot calculated from predictions which have been made prior to Ti:

              Fi*(t) = Gi*[F̂i(t)]

 This new prediction recalibrates the model based on knowledge of the
 accuracy of past predictions. The simplest form of Gi* is the u-plot with
 the steps joined to form a polygon. Smooth versions can be constructed
 using spline techniques.


                                                                        197
Increasing the Predictive Accuracy of
           Models (cont’d)
  Model Recalibration (cont’d):

  To recalibrate a model’s predictions, follow these four steps:
      1.  Check that error in previous predictions is approximately
          stationary. The y-plot can be used for this purpose.
      2.  Find the u-plot for predictions made before Ti, and join up
          the steps on the plot to form a polygon G* (alternatively, use
          a spline technique to construct a smooth version).
       3.  Use the basic model (e.g., the JM, LV, or MO models) to
            make a "raw" prediction F̂i(t).
       4.  Recalibrate the raw prediction using Fi*(t) = G*[F̂i(t)].


                                                                      198
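A compact Python sketch of steps 2 through 4: build the u-plot polygon G* from past u-values and push a raw predicted probability through it. The past u-values and the raw prediction are invented for illustration, and a spline could replace the polygon:

# Past u-values u_j = Fhat_j(t_j) from earlier one-step-ahead predictions (invented).
us = sorted([0.10, 0.22, 0.35, 0.41, 0.58, 0.77, 0.85])
n = len(us)

def G_star(u):
    """u-plot with its steps joined to form a polygon (piecewise-linear empirical CDF)."""
    xs = [0.0] + us + [1.0]
    ys = [0.0] + [(j + 1) / n for j in range(n)] + [1.0]
    for k in range(len(xs) - 1):
        if xs[k] <= u <= xs[k + 1]:
            span = xs[k + 1] - xs[k]
            frac = 0.0 if span == 0 else (u - xs[k]) / span
            return ys[k] + frac * (ys[k + 1] - ys[k])
    return 1.0

raw = 0.40                      # raw prediction Fhat_i(t) for some time t of interest
recalibrated = G_star(raw)      # F*_i(t) = G*[Fhat_i(t)]
print(f"raw = {raw:.2f}, recalibrated = {recalibrated:.2f}")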

				