KC1 - This is a single CSCI within a large ground system. It is made up of 43 KSLOC of C++ code. Error data for this
code has been collected since the beginning of the project, but that data can be associated with individual
modules (methods) only as far back as five years.
1. KC1_product_hierarchy.csv
2. KC1_defect_product_relations.csv
3. KC1_dynamic_defect_data.csv
4. KC1_static_defect_data.csv
5. KC1_product_module_metrics.csv
6. KC1_product_class_metrics.csv
DEFINITIONS OF TERMS:
PRODUCT - Refers to anything to which defect data and metrics data can be associated. In most cases
products will be synonymous with code-related items such as functions and systems/sub-systems.
CSCI - Computer Software Configuration Item. Term applied to a grouping of products usually having
similar functions or capabilities. Usually the term CSCI is synonymous with system or sub-system,
but the definition may vary from project to project.
CSC - Computer Software Component. CSCIs are composed of logical groups of CSCs. This
design element generally consists of files that make up an executable.
CLASS - A template from which objects can be created.
MODULE - Term applied to the lowest-level functional unit to which metrics can be applied. In most
cases this will refer to code-specific items such as functions, but the real-world
representation can vary between projects and programming languages (e.g. functions,
modules, subroutines).
SEVERITY - This value quantifies the impact of the defect on the overall environment, from 1 (most
severe) to 5 (least severe). For example, severity 1 may imply that the defect
caused a loss of functionality without a workaround, whereas severity 5 may mean that the
impact is superficial and did not cause any major disruptions to the system.
PRIORITY - This value quantifies the necessity for this defect to be resolved, from 1 (highest
priority, critical) to 3 (lowest priority). For example, priority 1 means that there is a
critical need for the resolution of a defect, whereas priority 3 may imply that
development or maintenance can delay the change.
1. KC1_product_hierarchy.csv
2,107 Entries
This file contains a representation of the product hierarchy as it exists in the database.
CSC_ID - The unique numeric identifiers of the CSCs.
CLASS_ID - The unique numeric identifiers of the Classes contained by the CSC.
MODULE_ID - The unique numeric identifiers of the modules contained by the Class and CSC.
LANGUAGE - The programming language in which the product was developed.
CSC_ID 2,107 entries, 18 identifiers, numbered between 22202 – 22309
CLASS_ID 2,107 entries, 145 identifiers, numbered between 22323 – 22824
MODULE_ID 2,107 entries, ranging from 22825 – 24951
LANGUAGE All C++
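The hierarchy described above (CSC contains classes, classes contain modules) can be rebuilt from the flat rows. The following is a minimal sketch, assuming the CSV has a header row whose column names match the field names listed above; the function names `build_hierarchy` and `load_hierarchy` are hypothetical helpers, not part of the dataset.

```python
import csv
from collections import defaultdict


def build_hierarchy(rows):
    """Nest flat rows as {CSC_ID: {CLASS_ID: [MODULE_ID, ...]}}.

    `rows` is any iterable of dicts keyed by column name, such as the
    rows produced by csv.DictReader. Identifiers are kept as strings.
    """
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        tree[row["CSC_ID"]][row["CLASS_ID"]].append(row["MODULE_ID"])
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {csc: dict(classes) for csc, classes in tree.items()}


def load_hierarchy(path):
    """Read the hierarchy file at `path` (assumed to have a header row)."""
    with open(path, newline="") as handle:
        return build_hierarchy(csv.DictReader(handle))
```

If the file matches the summary above, `len(load_hierarchy("KC1_product_hierarchy.csv"))` would be expected to equal the 18 CSC identifiers reported, which makes a quick sanity check after loading.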
2. KC1_defect_product_relations.csv
669 Entries
This file directly links products to the defect records with which they are associated.
MODULE_ID - The unique numeric identifier of the module.
DEFECT_ID - The unique numeric identifier of the associated defect.
MODULE_ID 669 entries, 293 identifiers, numbered between 22851 – 24950
DEFECT_ID 669 entries, 109 identifiers, numbered between 2218 – 2661
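Because the summary reports more entries (669) than distinct modules (293) or distinct defects (109), the relation appears to be many-to-many. A minimal sketch of indexing it in both directions follows; it assumes a header row with `MODULE_ID` and `DEFECT_ID` columns, and `index_relations` is a hypothetical helper name.

```python
from collections import defaultdict


def index_relations(rows):
    """Return (defects_by_module, modules_by_defect) as dicts of sets.

    `rows` is an iterable of dicts with MODULE_ID and DEFECT_ID keys,
    for example the rows yielded by csv.DictReader.
    """
    defects_by_module = defaultdict(set)
    modules_by_defect = defaultdict(set)
    for row in rows:
        defects_by_module[row["MODULE_ID"]].add(row["DEFECT_ID"])
        modules_by_defect[row["DEFECT_ID"]].add(row["MODULE_ID"])
    return dict(defects_by_module), dict(modules_by_defect)
```

The size of each set in `defects_by_module` gives the number of distinct defects linked to a module, which can be compared against the ERROR_COUNT column of the module metrics file.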
3. KC1_dynamic_defect_data.csv
10,845 Entries
This file contains defect data that changes throughout the life cycle of that defect. This file essentially
outlines the history of the defect. The dates for ACTIVITY and TRIGGER entries are the dates on which the
defect was opened. The dates for TARGET_PRODUCT_ID and DEFECT_TYPE are the dates on which the
defect was closed.
DEFECT_ID - The unique numeric identifier of the defect.
ENTRY_TYPE - The type of defect history entry represented.
ENTRY_DATE - The date associated with the defect data entry.
ENTRY_SEVERITY - The severity at the time of the entry.
ENTRY_PRIORITY - The priority at the time of the entry.
ENTRY_DATA - The data contained in the entry
DEFECT_ID 10,845 entries, 1,001 unique identifiers, ranging from 1661 – 2661
ENTRY_TYPE 10,845 entries, broken down as follows:
ACTIVITY 1001
CLOSED 1001
DEFECT_TYPE 996
OPENED 1001
STATE_CHANGE 6177
TARGET_PRODUCT_ID 669
ENTRY_DATE 10,816 entries with dates, from 20-Jun-1996 to 23-Apr-2003
ENTRY_SEVERITY 10,845 entries, broken down as follows:
ENTRY_SEVERITY        Entries
1 (Most Severe)       897
2                     7271
3                     2444
4                     158
5 (Least Severe)      75
Total                 10845
ENTRY_PRIORITY 1,215 entries, broken down as follows:
ENTRY_PRIORITY        Entries
1 (Critical)          493
2                     389
3 (Low)               333
Total                 1215
ENTRY_DATA 669 entries, numbered between 22851 – 24950
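Since each defect has one OPENED and one CLOSED entry in the breakdown above, the history can be reduced to a per-defect lifetime. The sketch below is illustrative only: it assumes a header row with the column names listed above and assumes dates are stored in the same `20-Jun-1996` style used in this summary (the actual file format may differ, so `date_format` is a parameter). `defect_lifetimes` is a hypothetical helper name.

```python
from datetime import datetime

# Assumption: dates look like "20-Jun-1996", as in the summary above.
# Parsing "%b" month names depends on the process locale being English/C.
DATE_FORMAT = "%d-%b-%Y"


def defect_lifetimes(rows, date_format=DATE_FORMAT):
    """Return {DEFECT_ID: days between OPENED and CLOSED entries}.

    Rows without a date are skipped, since the summary reports fewer
    dated entries than total entries. Defects lacking either an OPENED
    or a CLOSED date are left out of the result.
    """
    opened, closed = {}, {}
    for row in rows:
        raw_date = (row.get("ENTRY_DATE") or "").strip()
        if not raw_date:
            continue
        when = datetime.strptime(raw_date, date_format).date()
        defect = row["DEFECT_ID"]
        if row["ENTRY_TYPE"] == "OPENED":
            # Keep the earliest open date if several are present.
            opened[defect] = min(when, opened.get(defect, when))
        elif row["ENTRY_TYPE"] == "CLOSED":
            # Keep the latest close date if several are present.
            closed[defect] = max(when, closed.get(defect, when))
    return {d: (closed[d] - opened[d]).days for d in opened if d in closed}
```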
4. KC1_static_defect_data.csv
1,001 Entries
This file contains defect data that remains constant throughout the life cycle of that defect. Priority and severity
values are listed in this file for project KC1 because these values do not change over the life of a defect.
DEFECT_ID - The unique numeric identifier of the defect.
SEVERITY - The severity of the defect.
PRIORITY - The priority of the defect.
CODE_REV - Whether or not the defect had a code review [yes (Y) or no (N)].
DESIGN_REV - Whether or not the defect had a design review [yes (Y) or no (N)].
CLOSE_REASON - The general reason for the closure of the error report.
COST - Relative cost for the fix.
DEFFERRED_ON - The date the error report was deferred.
EST_FIX_HOURS - The estimated number of man hours needed to implement the fix.
EST_SLOC_COUNT - The estimated number of SLOC involved in the fix.
FIX_HOURS - The actual number of man hours the fix took to implement.
HOW_FOUND - The stage in which the defect was found.
MODE - The mode the system was operating in.
PROBLEM_TYPE - Specific reason for closure of error report.
SLOC_COUNT - The actual number of SLOC changed or added.
DEFECT_ID 1,001 unique identifiers, numbered 1661 – 2661
SEVERITY 1,001 entries, broken down as follows:
SEVERITY              Entries
1 (High)              84
2                     670
3                     226
4                     15
5 (Low)               6
Total                 1001
PRIORITY 95 entries, broken down as follows:
PRIORITY              Entries
1 (Critical)          38
2                     29
3 (Low)               28
Total                 95
CODE_REV All entries are blank
DESIGN_REV All entries are blank
CLOSE_REASON 997 entries, broken down as follows:
CLOSE_REASON          Entries
Duplicate             1
Fixed                 798
Obsolete (OBE)        52
Reject                146
Total                 997
COST 401 entries, broken down as follows:
COST                  Entries
HIGH                  18
LOW                   312
MEDIUM                71
Total                 401
DEFFERRED_ON 2 entries, both dated 02-Apr-1999
EST_FIX_HOURS All entries are blank
EST_SLOC_COUNT 2 entries, 0 and 20
FIX_HOURS 990 entries, from 0 – 1010, (average = 18.49)
HOW_FOUND 1,001 entries, broken down as follows:
HOW_FOUND             Entries
Acceptance Test       31
Analysis              85
Customer Use          196
Demo                  2
Engineering Test      222
Inspection            96
Mission Critical      4
Mission Essential     1
Mission Success       1
Planned Test          270
Regression Test       79
Release_I&T           14
Total                 1001
MODE 958 entries, broken down as follows:
MODE                  Entries
DEV01                 3
DEV02                 19
DEV03                 1
DEV04                 46
OPS                   342
Other                 332
SHARED                3
TS1                   98
TS2                   114
Total                 958
PROBLEM_TYPE 996 entries, broken down as follows:
PROBLEM_TYPE          Entries
builds                2
configuration         89
COTS/OS               14
design                39
documentation         8
hardware              3
no fix                1
not a bug             91
Prob w/o fix          1
procedure             2
process               3
scripts               2
source code           633
unknown               2
unreproducible        106
Total                 996
SLOC_COUNT 988 entries, average = 51.56
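Several columns above report fewer entries than the 1,001 defects (for example CLOSE_REASON with 997), which suggests blank cells are excluded from the breakdowns. A minimal sketch of reproducing such a breakdown follows, assuming a header row with the column names listed above and assuming missing values appear as empty cells; `tally` is a hypothetical helper name.

```python
from collections import Counter


def tally(rows, column):
    """Count the non-blank values of `column` across `rows`.

    `rows` is an iterable of dicts keyed by column name. Blank or
    whitespace-only cells are treated as missing and are not counted.
    """
    counts = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts
```

Summing `tally(rows, "CLOSE_REASON").values()` gives the number of non-blank entries, which can be checked against the totals listed in this section.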
5. KC1_product_module_metrics.csv
2,107 Entries
This file contains all of the available metrics values and their associated products. These metrics are module
level.
MODULE - The unique numeric identifier of the product.
LOC_BLANK - The number of blank lines in a module.
BRANCH_COUNT - Branch count metrics.
LOC_CODE_AND_COMMENT - The number of lines which contain both code & comment in a module.
LOC_COMMENTS - The number of lines of comments in a module.
CYCLOMATIC_COMPLEXITY - The cyclomatic complexity of a module.
DESIGN_COMPLEXITY - The design complexity of a module.
ERROR_COUNT - The number of defects associated with a module.
ERROR_DENSITY - The number of defects per 1000 lines of code for a module
[ERROR_DENSITY = 1000*(ERROR_COUNT/LOC_TOTAL)].
ESSENTIAL_COMPLEXITY - The essential complexity of a module.
LOC_EXECUTABLE - The number of lines of executable code for a module (not blank or
comment)
HALSTEAD_CONTENT - The Halstead length content of a module.
HALSTEAD_DIFFICULTY - The Halstead difficulty metric of a module.
HALSTEAD_EFFORT - The Halstead effort metric of a module.
HALSTEAD_ERROR_EST - The Halstead error estimate metric of a module.
HALSTEAD_LENGTH - The Halstead length metric of a module.
HALSTEAD_LEVEL - The Halstead level metric of a module.
HALSTEAD_PROG_TIME - The Halstead programming time metric of a module.
HALSTEAD_VOLUME - The Halstead volume metric of a module.
NUM_OPERANDS - The number of operands contained in a module.
NUM_OPERATORS - The number of operators contained in a module.
NUM_UNIQUE_OPERANDS - The number of unique operands contained in a module.
NUM_UNIQUE_OPERATORS - The number of unique operators contained in a module.
ERROR_REPORT_IN_1_YR - Number of error reports in one year.
ERROR_REPORT_IN_6_MON - Number of error reports in six months.
ERROR_REPORT_IN_2_YRS - Number of error reports in two years.
LOC_TOTAL - The total number of lines for a given module.
MODULE 2,107 unique entries, with identifiers numbered between 22825 – 24951
LOC_BLANK 2,107 entries, from 0 – 58, (average = 1.76)
BRANCH_COUNT 2,107 entries, from 1 – 89, (average = 4.67)
LOC_CODE_AND_COMMENT 2,107 entries, from 0 – 12, (average = 0.13)
LOC_COMMENTS 2,107 entries, from 0 – 44, (average = 0.95)
CYCLOMATIC_COMPLEXITY 2,107 entries, from 1 – 45, (average = 2.84)
DESIGN_COMPLEXITY 2,107 entries, from 1 – 45, (average = 2.55)
ERROR_COUNT 325 entries, from 1 – 7, (average = 1.62)
ERROR_DENSITY 271 entries, from 0 – 750, (average = 72.47)
ESSENTIAL_COMPLEXITY 2,107 entries, from 1 – 26, (average = 1.67)
LOC_EXECUTABLE 2,107 entries, from 0 – 262, (average = 14.54)
HALSTEAD_CONTENT 2,107 entries, from 0 – 193.06, (average = 21.26)
HALSTEAD_DIFFICULTY 2,107 entries, from 0 – 53.75, (average = 6.78)
HALSTEAD_EFFORT 2,107 entries, average = 5247.36
HALSTEAD_ERROR_EST 2,107 entries, from 0 – 2.64, (average = 0.09)
HALSTEAD_LENGTH 2,107 entries, from 0 – 1106, (average = 49.88)
HALSTEAD_LEVEL 2,107 entries, from 0 – 2, (average = 0.32)
HALSTEAD_PROG_TIME 2,107 entries, from 0 – 18044.64, (average = 291.52)
HALSTEAD_VOLUME 2,107 entries, from 0 – 7918.82, (average = 258.94)
NUM_OPERANDS 2,107 entries, from 0 – 428, (average = 18.80)
NUM_OPERATORS 2,107 entries, from 0 – 678, (average = 31.07)
NUM_UNIQUE_OPERANDS 2,107 entries, from 0 – 120, (average = 9.55)
NUM_UNIQUE_OPERATORS 2,107 entries, from 0 – 37, (average = 7.64)
ERROR_REPORT_IN_1_YR 184 entries, from 1 – 5, (average = 1.41)
ERROR_REPORT_IN_6_MON 271 entries, from 1 – 8, (average = 1.61)
ERROR_REPORT_IN_2_YRS 105 entries, from 1 – 4, (average = 1.32)
LOC_TOTAL 2,107 entries, from 1 – 288, (average = 20.39)
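The ERROR_DENSITY formula given in the field definitions, 1000*(ERROR_COUNT/LOC_TOTAL), can be written as a small function. This is a sketch of that formula only; `error_density` is a hypothetical name, and the guard against a non-positive line count is an added assumption (the summary reports LOC_TOTAL values starting at 1).

```python
def error_density(error_count, loc_total):
    """Defects per 1000 lines: 1000 * (ERROR_COUNT / LOC_TOTAL)."""
    if loc_total <= 0:
        # Assumption: a module with no lines has no meaningful density.
        raise ValueError("loc_total must be positive")
    return 1000 * error_count / loc_total
```

As a purely hypothetical illustration (not a row taken from the data), a module with 3 defects in 4 total lines would give `error_density(3, 4) == 750.0`, which shows how very short modules can reach the upper end of the reported range.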
6. KC1_product_class_metrics.csv
145 Entries
This file contains all of the available metrics values and their associated products. These metrics are class level.
MODULE - The unique numeric identifier of the product.
PERCENT_PUB_DATA - The percentage of data that is public and protected data in a class.
ACCESS_TO_PUB_DATA - The number of times that a class’s public and protected data is
accessed.
COUPLING_BETWEEN_OBJECTS - The number of distinct non-inheritance-related classes on which a
class depends.
DEPTH - Depth indicates at what level a class is located within its class
hierarchy.
LACK_OF_COHESION_OF_METHODS - For each data field in a class, the percentage of the methods in the
class using that data field; the percentages are averaged then subtracted from 100%.
NUM_OF_CHILDREN - The number of classes derived from a specified class.
DEP_ON_CHILD - Whether a class is dependent on a descendant.
FAN_IN - This is a count of calls by higher modules.
RESPONSE_FOR_CLASS - A count of methods implemented within a class plus the number of
methods accessible to an object class due to inheritance.
WEIGHTED_METHODS_PER_CLASS - A count of methods implemented within a class (rather than all
methods accessible within the class hierarchy).
MODULE 145 unique entries, with identifiers numbered between 22323 – 22824
PERCENT_PUB_DATA 145 entries, from 0 – 100, (average = 14.40)
ACCESS_TO_PUB_DATA All entries are 0
COUPLING_BETWEEN_OBJECTS 145 entries, from 0 – 24, (average = 8.32)
DEPTH 145 entries, from 1 – 7, (average = 2.0)
LACK_OF_COHESION_OF_METHODS 145 entries, from 0 – 100, (average = 68.72)
NUM_OF_CHILDREN 145 entries, from 0 – 5, (average = 0.21)
DEP_ON_CHILD 145 entries, either 0 (143) or 1 (2)
FAN_IN 145 entries, broken down as follows:
FAN_IN                Entries
0                     68
1                     65
2                     9
3                     3
Total                 145
RESPONSE_FOR_CLASS 145 entries, from 0 – 222, (average = 34.38)
WEIGHTED_METHODS_PER_CLASS 145 entries, from 0 – 100, (average = 17.42)
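The LACK_OF_COHESION_OF_METHODS definition above (per-field percentage of methods using the field, averaged, then subtracted from 100%) can be sketched as follows. This follows the wording of the definition only and is not the tool that produced the dataset; the input shape (a mapping from field name to the set of methods that use it) and the name `lack_of_cohesion` are assumptions made for illustration.

```python
def lack_of_cohesion(field_usage, num_methods):
    """Compute lack of cohesion of methods as a percentage.

    field_usage: dict mapping each data field name to the set of method
                 names in the class that use that field (assumed shape).
    num_methods: total number of methods in the class.

    Returns None when the class has no fields or no methods, because the
    definition above does not say what value applies in that case.
    """
    if not field_usage or num_methods <= 0:
        return None
    # Percentage of the class's methods that use each field.
    percentages = [
        100.0 * len(methods) / num_methods for methods in field_usage.values()
    ]
    # Average the percentages, then subtract from 100%.
    return 100.0 - sum(percentages) / len(percentages)
```

For a hypothetical class with two methods, where one field is used by both methods and another by only one, the per-field percentages are 100 and 50, their average is 75, and the result is 25.0.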