
           Observations on Cost Modeling and Performance
           Measurement of Long Term Archives
                                            Kathy Fontaine
                         NASA Goddard Space Flight Center
                Earth Science Data Systems Working Groups

                               Greg Hunolt, Bud Booth, Mel Banks
                                                        SGT, Inc.

                                      PV2007 Conference
                                     October 9 - 11, 2007
                 DLR Oberpfaffenhofen - Munich - Germany

                                           CEOS WGISS
                                    October 15 - 19, 2007
                 DLR Oberpfaffenhofen - Munich - Germany
• Review - What is the Cost Estimation Toolkit?
    – Goal and Approach of the CET (Cost Estimation Toolkit)
    – High Level Description of the Data Activity Reference Model
• Experience / Lessons Learned - Building and Maintaining
  the Comparables Database (CDB)
• High Level Description of the Cost Estimation Toolkit
• Application of the CET to Long-Term Archives
• Summary
• Next Steps

             Goal of the CET (Cost Estimation Toolkit) Development
• NASA has always used cost estimating models for planning Earth and
  space science flight projects
    – For estimating costs of instrument packages, spacecraft, mission control
      centers, etc.
• NASA had no tool for estimating life cycle costs of science ground data
  handling capabilities, whether stand-alone or within a flight project.
• The goal of the CET development was to see whether that gap could be
  filled -
    –   Project was begun in 2002.
    –   CET Prototypes were tested and evaluated in 2003 and 2004.
    –   'Operational Beta' versions were completed in 2005, 2006, and 2007.
    –   Initial testing at GSFC and LaRC in 2005, 2006, and 2007 was successful.
    –   The CET 'operational beta' is being evaluated for addition to GSFC's Integrated
        Development Center's package of tools, and was made available as a NASA
        Open Source item in 2007.
•   - for the PI planning a new Data Activity
    – To help the PI consider the full range of items that will contribute to the life cycle
      cost of a new data activity and to produce an estimate that the PI can
      compare to estimates produced by other means.
                                                                          CET Approach
•   Cost Estimation by Analogy Method was Adopted
     – Decision based on Benchmark study and internal testing of existing tools (PRICE, SEER,
       COCOMO, and others);
     – At the time, did not find other acceptable parametric methods for estimating life cycle
       data costs for implementation and maintenance/operations costs;
     – Ensures that estimates will be based on experience with existing science ground data
       handling activities;
     – Requires assembly of information about existing activities, and…
     – Mapping of that information to a common reference model, so that information from
       multiple activities with multiple data providers can be normalized and used together in
       the estimating process.
•   Comparables Database (CDB)
     – The database of information from many existing data activities mapped to the common
       reference model.
•   Data Activity Reference Model
     – Based on reference model developed during 2001 comparative analysis of 19 U.S. and
       international data activities.
     – Includes a set of development, operational and support functions / areas of cost, and
       descriptors for each.
•   CET Estimation by Analogy Implementation
     – The CET uses adaptive regression curve fitting for estimating staffing levels and
       parametric techniques (e.g. cost curves) for non-staff cost items.
                                            Data Activity Reference Model
• A 'Data Activity' is
    – An entity that performs data handling functions that may include ingest,
      product generation, storage/archive, distribution, and support functions (see
    – A data activity's life cycle includes implementation and a period of operations
      (when data activity is performing data handling functions) that may overlap.
    – A data activity can be a 'stand-alone' organization or embedded within a flight
      project or other science or applications project. A 'data center' can include
      more than one distinct data activity.
• Data Activity Reference Model
    – Functions with Descriptors for each…
    – Operating Functions: Ingest, Product Generation, Archive, Search and Order,
      Access and Distribution, User Support.
    – Support Functions: Documentation, Implementation, Sustaining Engineering,
      Engineering Support, Management, Technical Coordination.
    – See paper for more detail on functions, example of descriptors.
    – Compatible with OAIS where the models overlap, see paper for more detail.
                               Concepts: Reference Model, CDB and CET

[Diagram: flow from existing data activities to the CDB, and from PI input to CET output]
• Building the Comparables Database (CDB)
    – Input: information on existing data activities (DAACs, ESIPS, SIPSs, Space
      Science Data Centers, etc.).
    – CDB Building & Maintenance Tool: map the data activity information to the
      Reference Model (General Data Activity Reference Model template).
    – Result: mapped data activity information, year by year, function by function,
      held in the Comparables Database (CDB), Version 2.1 - 29 Data Activities.
• Using the Cost Estimation Toolkit (CET)
    – PI user input: specify mission schedule, etc.; select from menu of functions;
      provide descriptors for each selected function.
    – Cost estimation by analogy, function by function: staff by adaptive regression
      curve; non-staff by parametric methods.
    – CET output: life-cycle costs and staffing levels, graphs, sensitivity analysis.
                                               High Level Description of CET

• Excel-based, uses Visual Basic for Applications, two workbooks, one
  for CET, the other holds the CDB; runs on PC or Macintosh
• Use the CET to
    – Describe a new Data Activity: Menu Driven Sequence of Forms for Selecting
      Functions and Entering Descriptors (example to follow).
    – Produce a life cycle estimate: year by year, functional breakdown, staffing
      profile and costs, costs for non-staff items (example to follow).
    – Run a 'what-if': vary one or more inputs, re-run, produce new estimate and
      comparison with original estimate.
    – Test sensitivity of estimate to a range of variation of a selected descriptor.
    – Produce graphs: select from a number of options (examples to follow).
    – Review and edit/tailor the estimate… tool offers hints such as:
         • Adjust staffing levels to smooth out ups and downs that track workload changes but
           would be impractical to implement;
         • Delete costs for items included in loaded labor rates;
         • Adjust for re-use of existing resources.
                          Data Activity Reference Model:
                              Functions and Descriptors
Data Activity Function            No. of Descriptors to
                                  Describe Each Function
Ingest                                      9
Product Generation                         21
Documentation                               4
Archive                                    16
Distribution                               31
User Support                                6
Management                                  5
Sustaining Engineering                      4
Engineering Support                         3
Implementation                              8

                           Data Activity Reference Model
                   Ingest Function Descriptors (Example)

Total Ingest FTE
Ingest Technical FTE
Ingest Operations FTE
Ingest Function Level of Service (LOS)
External Ingest Interfaces
Product Types Ingested per Year
Ingest Automation LOS
Number of Products Ingested per year
Ingest Volume per Year

CET - Sample Ingest Descriptor Input Form

CET Screen Shot – Archive Form

CET Screen Shot – Processing Form

CET Screen Shot – Sample Output Table

CET - Sample Life Cycle Cost Estimate Output

                                  CET - Graph Example 1

[Pie chart: 3. Sample Activity - Total Mission Life Staffing by Labor Cost Category.
Categories: Admin Support, Development / Engineering, Management, Technical / Science.]


                                      CET - Graph Example 2

[Pie chart: 5. Sample Activity - Average Annual Staffing by Function FTE - Operations Period.
Categories: Archive, Development, Distribution, Eng Support, Ingest, Sustaining Eng,
Tech Coord, User Support.]

                                 CET – Graph Example 3

[Pie chart: 7. Sample Activity - Total Estimated Staff.
Categories: Development / Engineering, Management / Admin, Technical / Science.]


                     Application of the CET to Long-Term Archives
• CET and CDB currently do not directly support Long Term Archives
    – No such NASA requirement currently exists for Earth science data, but they
      could be extended to do so…
• Step 1 – Extend Data Activity Reference Model:
    – Analyze OAIS model (especially Preservation Planning and aspects of Ingest,
      Archival Storage, and Data Management) and existing Long Term Archives
    – Identify specific functions or aspects of functions associated with long term
      archiving that go beyond what the model now includes.
• Step 2 – Extend the Comparables Database:
    – Collect information from a number of existing Long Term Archives
    – Map to the extended Data Activity Reference Model
    – Populate the CDB
• Step 3 – Extend the CET:
    – Add estimation of new factors particular to Long Term Archives
• Extended CET / CDB could then be used to estimate staffing / costs for
  a New Long Term Archive, and could be used to support management
  of existing Long Term Archives.
                                                      Summary
• Yes, the gap could be filled
    – The CET is proving to be a valuable tool for estimating the life cycle costs of
      scientific data processing, archive, and distribution activities.
    – The information collected for the CDB can also be used by such activities to
      monitor their performance.
    – The CET and its database are capable of being extended to encompass long term
      archives, thus providing a quantitative tool for both planning their development
      and monitoring their performance.

• However,
    – Cost estimation by analogy requires, among other things,
         • lots of analogies [many data activities of similar sizes, for instance]
         • lots of maintenance [information must be updated to maintain currency and
         • lots of security [data activity information must not be labeled or otherwise
    – All of the above would require a good, solid set of requirements, a project plan,
      and other necessary management and review structure.

• And so…
                                                   Next Steps

• NASA is preparing to do an in-depth evaluation of the
   – NASA's evolving data systems present a different
     overall picture than was present at the beginning of this
   – It is now time to determine whether 'it should continue
     to be done,' and if so, which pieces and how.
   – Existing user feedback is being incorporated, and will
     continue to be critical to this process.


 Thank You for Your Attention!


Further questions or comments:

Backup Charts

                                                        CET Effort Estimation Process

[Diagram: two processing chains that meet in the overall effort estimate]

Comparables DB – Describes Existing Activities
(effort and workload for multiple activities, year by year, function by function)
    1. Compute annual averages of the workload and effort parameters for each CDB activity.
    2. Intermediate Parameters: compute annual averages of the workload and effort
       parameters across CDB activities.
    3. Parameter by Parameter Effort Estimation: generate effort estimating
       relationships for each workload parameter.

Activity Dataset – Describes a New Activity
(workload and LOS parameters for a single new activity, year by year, function by
function, stream by stream)
    1. Compute the parameters year by year, summed over streams.
    2. Compute a set of year by year effort estimates for each workload parameter.
    3. Overall Effort Estimate: compute year by year effort estimates as the correlation
       weighted average over workload parameters, then apply levels of service.

Form of effort estimate computation:
    Effort[new activity] = f(Workload[new activity]), where f is a function based on the
    CDB activities' effort-workload relationships, developed using the "Curve-Fit" approach.
                                                  Cost Estimating Approach
•   Method is Cost Estimation by Analogy – the data activities in the CDB are assumed
    to be analogs for a new data activity to be estimated.

•   Year by year staff effort for new data activity is estimated from mission and
    expected year by year workload (using "effort estimating relationships" – see next
    chart), then user's projected local labor rates are applied to produce estimates of
    staff costs.

•   Estimating of effort is done function by function, so CDB comparison is with data
    for separate functions rather than with whole data activities.

•   Non-staff item estimates are currently based on CDB history, using inflation
    normalization, parametric approaches, 'cost curves', etc. for projections.
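
The staff cost and inflation normalization steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the constant escalation and inflation rates, and the sample numbers are hypothetical and are not taken from the CET.

```python
def staff_costs(fte_by_year, base_rate, annual_escalation):
    """Apply a projected labor rate to each year's estimated FTE.

    Assumption: the user's labor rate grows by a constant fraction each year.
    """
    costs = []
    for year_index, fte in enumerate(fte_by_year):
        rate = base_rate * (1.0 + annual_escalation) ** year_index
        costs.append(fte * rate)
    return costs


def normalize_to_base_year(cost, cost_year, base_year, inflation):
    """Express a historical non-staff cost in base-year terms.

    Assumption: a single constant annual inflation rate.
    """
    return cost * (1.0 + inflation) ** (base_year - cost_year)


# Hypothetical three-year staffing profile (FTE) and labor rate.
fte_profile = [2.0, 3.5, 3.0]
costs = staff_costs(fte_profile, base_rate=100_000.0, annual_escalation=0.03)
print([round(c) for c in costs])

# A hypothetical purchase made in 2000, expressed in 2002 terms.
print(round(normalize_to_base_year(1000.0, 2000, 2002, 0.02), 2))
```

Keeping the effort estimate (FTE) separate from the labor rate mirrors the approach described above: the analogy supplies the staffing level, and the user supplies local rates.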

                                             CET’s Effort Estimating Process

•   Compute averages of annual workload parameters and staffing levels for each
    functional area for each CDB activity.
•   Compute “Effort Estimating Relationships”, i.e. equations for FTE as function
    of workload parameters
     – Using regression-based curve fitting (see next chart) for operating functions
        (ingest, processing, archive, distribution) and for implementation and sustaining
        engineering, system purchase cost (normalized to base year then projected);
     – Using a 'base plus delta' approach for other non-operating functions – CDB
        averages as base, delta based on comparison of new activity LOS's with CDB
•   Compute year by year staffing for the functional area for the new activity:
     – Use the equations to compute a set of FTE estimates, each based on a specific
        workload parameter;
     – Compute a weighted average for each functional area's staffing categories,
        weighted by the curve fit correlation for each workload parameter;
     – Use the applicable Level of Service parameter(s) to increase the estimate if the new
        activity's LOS is higher than the CDB average, or decrease it if lower.
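
A minimal sketch of the last two steps, the correlation weighted average and the Level of Service adjustment, is given below. The slides do not state the exact formulas, so both functions are assumptions: weights are taken as the absolute correlation of each curve fit, and the LOS adjustment is assumed to be linear with a hypothetical coefficient `los_coeff`.

```python
def weighted_fte(estimates):
    """Combine per-parameter FTE estimates into one value.

    `estimates` is a list of (fte_estimate, correlation) pairs, one per
    workload parameter. Each estimate is weighted by the absolute value
    of its curve-fit correlation (an assumed weighting scheme).
    """
    total_weight = sum(abs(r) for _, r in estimates)
    if total_weight == 0:
        raise ValueError("no usable correlations")
    return sum(fte * abs(r) for fte, r in estimates) / total_weight


def apply_los(fte, new_los, cdb_avg_los, los_coeff=0.1):
    """Raise or lower the estimate according to the LOS difference.

    Assumption: a linear adjustment; `los_coeff` is a hypothetical tuning value.
    """
    return fte * (1.0 + los_coeff * (new_los - cdb_avg_los))


# Hypothetical FTE estimates from three workload parameters.
per_parameter = [(4.0, 0.9), (6.0, 0.6), (5.0, 0.3)]
combined = weighted_fte(per_parameter)
adjusted = apply_los(combined, new_los=3, cdb_avg_los=2)
print(round(combined, 3), round(adjusted, 3))
```

Weighting by correlation lets a well-fitting workload parameter dominate the estimate without discarding the weaker ones.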
                               Regression-based Curve Fitting - Detail
Curve Fitting is used to develop a relationship between workload parameters and
     FTE for the CDB data activities.
    1.    CET takes a set of data, i.e. values for a workload parameter and corresponding
          operational or technical FTE values, performs “cluster” outlier screening.
    2.    CET computes a set of eight curves, using regression: linear, quadratic,
          exponential, logarithmic, power, root, linear-exponential, linear-logarithmic.
    3.    CET eliminates those curves which drive estimated FTE negative or introduce
          double values (i.e. two FTE estimates for one workload value).
    4.    CET computes Pearson correlation coefficient for each curve left.
    5.    CET checks all remaining curves for outliers – points whose departure from
          the curve exceeds a threshold multiple of the standard deviation – and
          eliminates one outlier point (the "worst").
    6.    CET re-computes the curves without the outlier and makes sure each curve's
          correlation is not worse.
    7.    CET repeats steps 5 and 6 until the outliers are gone or the outlier toss
          limit is reached.
    8.    CET selects the re-computed curve with the best correlation value.
    9.    CET uses a limited linear projection if the ADS (Activity Data Set) workload
          exceeds the CDB range.
    10.   CET uses the final curve's equation/coefficients to make year by year estimates
          of FTE.
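
The core of steps 2 to 4 and step 8 can be sketched as follows. This is not the CET's VBA implementation: it fits only four of the eight curve families (linear, logarithmic, exponential, power) using linearizing transforms, assumes positive workload and FTE values, measures correlation between fitted and actual FTE (one possible reading of step 4), and omits the outlier loop of steps 5 to 7. All names and the sample data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pearson(us, vs):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    denom = math.sqrt(sum((u - mu) ** 2 for u in us) * sum((v - mv) ** 2 for v in vs))
    return suv / denom if denom else 0.0


def candidate_curves(xs, ys):
    """Fit four curve families by regression on transformed data (x > 0, y > 0)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    curves = {}
    a, b = fit_line(xs, ys)
    curves["linear"] = lambda x, a=a, b=b: a + b * x
    a, b = fit_line(lx, ys)
    curves["logarithmic"] = lambda x, a=a, b=b: a + b * math.log(x)
    a, b = fit_line(xs, ly)
    curves["exponential"] = lambda x, a=a, b=b: math.exp(a + b * x)
    a, b = fit_line(lx, ly)
    curves["power"] = lambda x, a=a, b=b: math.exp(a) * x ** b
    return curves


def best_curve(xs, ys):
    """Return (name, correlation, function) for the best surviving curve."""
    best = None
    for name, f in candidate_curves(xs, ys).items():
        fitted = [f(x) for x in xs]
        if min(fitted) < 0:        # reject curves that drive FTE negative
            continue
        r = pearson(fitted, ys)    # correlation of fitted with actual FTE
        if best is None or r > best[1]:
            best = (name, r, f)
    return best


# Synthetic workload/FTE pairs that follow a logarithmic trend exactly.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [1.0 + 2.0 * math.log(x) for x in xs]
name, r, curve = best_curve(xs, ys)
print(name, round(r, 4), round(curve(12.0), 3))
```

Fitting through transforms keeps every family a simple linear regression, which is why the quadratic and combined families are left out of this sketch.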
                                     Calibration / Tuning of the CET

•   The premise of the cost estimation by analogy approach is that if the
    CET is calibrated against existing data activities, i.e. tuned to produce
    the best possible overall results for the known existing data activities, it
    will produce a good life cycle cost estimate for a new data activity.
•   The CDB contains information for twenty-nine data activities that can be
    used as test subjects, since CDB information includes mission
    information, workload, staffing, etc.
•   Calibration / tuning is accomplished by adjusting CET controls:
    parameter weighting, outlier removal limits, LOS adjustment
    coefficients, until the 'best' overall performance for the set of CDB data
    activities is achieved.
•   The accuracy of the CET for existing CDB data activities is measured by
    independent testing…

                                               Independent Testing Process
• The data activity used as a test subject is not allowed to influence its own
  test results.
• In the independent testing process
    – The objective is to measure the error of the estimate of a data activity's staffing
      profile (estimate based on its mission and workload).
    – A CDB data activity is selected to be a test subject.
    – An Activity Data Set is prepared for the data activity, which contains the mission
      and workload information a CET user would enter.
    – That activity is removed from the CDB.
    – The CET reads the Activity Data Set, accesses the CDB, and produces an
      estimate for the data activity.
    – The estimated staffing profile is compared with the actual staffing profile to
      determine the error, function by function and for the activity as a whole.
    – The process is repeated for the set of CDB data activities.
    – When all activities have been processed, overall errors across the data activities
      are computed: e.g. overall average absolute error and percentage, and overall
      bias.
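
The independent testing loop described above is a leave-one-out procedure, and a minimal sketch of it follows. A simple linear fit of FTE against one workload value stands in for the full CET estimate, and the percentage error is taken as the mean of per-activity percentages; both are assumptions, as are the function names and the sample data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_out(activities):
    """Estimate each activity from all the others; return signed errors.

    `activities` is a list of (workload, actual_fte) pairs. The test subject
    is removed before fitting, so it cannot influence its own estimate.
    """
    errors = []
    for i, (workload, actual) in enumerate(activities):
        others = activities[:i] + activities[i + 1:]
        a, b = fit_line([w for w, _ in others], [f for _, f in others])
        errors.append((a + b * workload) - actual)
    return errors


def summarize(errors, actuals):
    """Return (average absolute error, average absolute error %, bias)."""
    n = len(errors)
    mean_abs = sum(abs(e) for e in errors) / n
    mean_pct = 100.0 * sum(abs(e) / a for e, a in zip(errors, actuals)) / n
    bias = sum(errors) / n   # signed errors may cancel, exposing over/under-estimation
    return mean_abs, mean_pct, bias


# Hypothetical (workload, actual FTE) pairs for five activities.
activities = [(10.0, 3.0), (20.0, 5.5), (30.0, 7.0), (40.0, 10.5), (50.0, 12.0)]
errors = leave_one_out(activities)
mean_abs, mean_pct, bias = summarize(errors, [f for _, f in activities])
print(round(mean_abs, 2), round(mean_pct, 1), round(bias, 2))
```

Reporting the absolute error and the signed bias separately matters because an estimator can have a bias near zero while individual estimates are still far off.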
                                                Independent Testing Results

• Results are based on testing with 28 CDB sites
• Test Results for the September 2006 version 2.1 of the CET
    – The typical annual error of estimate is 2.46 FTE (average absolute error, so
      positive and negative errors don't cancel). The average typical error % of actual
      is 22.9%.
    – The overall annual average error across the 29 sites is –0.03 FTE, which is
      –0.3%, showing very little overall bias.
    – For the individual estimates for the 29 data activities:
      13 have errors less than 20%, 18 have errors less than 30%; 21 have errors less
      than 50%; and overall smaller activities have greater errors (see next chart).
    – For the CDB activities, the average standard deviation of FTE for a function,
      weighted by the number of activities having the function, is 2.57. This is a
      rough measure of the variability of the information in the CDB.
    – The standard deviation of the typical error for the Version 2.1 CET, 1.66, is well
      within the range of variability of the information in the CDB.

                                          Independent Testing Results, Continued

[Scatter plot: Actual Staff Size vs ATE (Average Typical Error) Percentage.
X axis: Actual Staff Size, FTE (Averaged over Activity Life), 0 to 30. Y axis: ATE Err %.]


If the actual size of an activity was 10 FTE or greater, 14 out of 15, or 93%, had
an ATE of less than 30%.
If the actual size of an activity was less than 10 FTE, 4 out of 13, or 31%, had
an ATE of less than 30%.
     Progress with CET Independent Testing Performance

[Bar chart: Improving Average Typical Error (ATE) for CETs]

CET Version            Date              ATE (FTE)
Working Prototype      May 2003
IOC                    September 2003
Beta Test              May 2004
Version 1              Sept 2004         2.78
Version 2              Oct 2005          2.47
Version 2.1            Sept 2006         2.46


