How to measure success

TDT 4735 Project in software engineering

How to measure success?

By: Anders Person and Knut Steinar Engene
Subject supervisor: Maria Letizia Jaccheri
Department of Computer and Information Science at the Faculty of Information Technology, Mathematics and Electrical Engineering, NTNU, Trondheim, Norway. Fall 2004

Abstract

Since the term software engineering was established in the 1960s, much of the software developed has run into problems such as missed deadlines and poor quality. Software development has often been guided by gut feelings and "expert knowledge". In other areas of science, empirical engineering has been a resource in product development, because it helps us understand how and why things work. Empirical software engineering enables the use of statistics, and can therefore back up its claims with significance and probability data. Without empirical engineering, we cannot know which mechanisms drive the costs and benefits of software tools. Without this information, it is hard to determine whether we are basing our actions on credible interpretations or on faulty assumptions.

In this paper, we have conducted an empirical experiment on the organization Gentoo. This is a voluntary organization that produces a distribution called Gentoo Linux. The organization went through a process improvement initiative, and we want to find out whether or not this reorganization improved Gentoo's efficiency.

Acknowledgements

We wish to thank the following people: Our supervisor Thomas Østerlie, for doing a good job as a supervisor and keeping us on our toes. He has also provided us with articles and knowledge about Gentoo and about writing the report. In addition we would like to thank Professor Tor Stålhane for his help with statistics and testing of the hypotheses. Jan Kjeran Kolsrud has also been a resource during the hypothesis testing.
Anders Person
Knut Steinar Engene

Contents

Abstract
Acknowledgements
Figure list
Table list
1 Introduction
  1.1 Motivation
  1.2 Project context
  1.3 Problem definition
  1.4 Report outline
2 Prestudy
  2.1 Empirical Software Engineering (ESE)
    2.1.1 Why do empirical software engineering
    2.1.2 What is ESE?
    2.1.3 Why experiment?
  2.2 How to perform Empirical Software Engineering
    2.2.1 Introduction
    2.2.2 The evolution
    2.2.3 Alternative approaches
    2.2.4 Our approach
  2.3 Open Source Software (OSS)
    2.3.1 An introduction to OSS
  2.4 The evolution of Linux and Gentoo Linux
    2.4.1 Linux
    2.4.2 A technical overview of Linux
    2.4.3 Linux distributions
    2.4.4 Gentoo Linux
    2.4.5 Gentoo Linux in detail
    2.4.6 Gentoo, organizational
      2.4.6.1 Gentoo community
      2.4.6.2 The Herds project
      2.4.6.3 Concurrent Versions Control
  2.5 Reorganization of Gentoo
  3.1 Research agenda
  3.2 Focus
  3.3 Questions
  3.4 Associated research method/process
4 Experiment planning
  4.1 Context selection
  4.2 Hypothesis explanation
  4.4 Variables selection
    4.4.1 Independent variables
    4.4.2 Dependent variables
  4.5 Selection of subjects
  4.6 Experiment design
    4.6.1 Randomization
  4.7 Instrumentation
    4.7.1 Python scripting
  4.8 Validity evaluation
    4.8.1 Conclusion validity
    4.8.2 Internal validity
    4.8.3 Construct validity
    4.8.4 External validity
  4.9 Priority among types of validity threats
5 Experiment operation
  5.1 Introduction
  5.2 Experiment preparation
  5.3 Experiment execution
    5.3.1 Data collection
    5.3.2 Different methods
  5.4 Data validation
    5.4.1 Data source integrity
    5.4.2 Bugzilla
    5.4.3 Manual bug inspection
    5.4.4 The participants
    5.4.5 Information included in the collected data
    5.4.6 Possible improvements
6 Analysis and interpretation
  6.1 Descriptive statistics
    6.1.1 Hypothesis 1
    6.1.2 Hypothesis 2
      6.1.2.1 Before the reorganization
      6.1.2.2 After the reorganization
      6.1.2.3 Total number of open bugs
      6.1.2.4 Plotting solved vs. new bugs per week
    6.1.3 Hypothesis 3
  6.2 Data set reduction
  6.3 Hypothesis testing
    6.3.1 Hypothesis 1
      6.3.1.1 t-Test
      6.3.1.2 Result interpretation
      6.3.1.3 Linear regression
      6.3.1.4 Result interpretation
      6.3.1.5 t-Test II
      6.3.1.6 Brief summary
    6.3.2 Hypothesis 2
      6.3.2.1 t-Test
      6.3.2.2 Result interpretation
      6.3.2.3 Brief summary
    6.3.3 Hypothesis 3
      6.3.3.1 Linear regression
      6.3.3.2 Result interpretation
      6.3.3.3 Brief summary
7 Evaluation and discussion of results
  7.1 Discussing the hypotheses
  7.2 Evaluating question 1
  7.3 Things we could have done differently
8 Conclusions and further work
  8.1 Project evaluation
  8.2 Conclusion
  8.3 Further work
Bibliography
  Online references
Attachment A
Attachment B
Attachment C
Attachment D
Attachment E

Figure list

Figure 1: Overview of validation methods [Zelkowitz & Wallace, 1998]
Figure 2: High-level steps of GQM/MEDEA [L. C. Briand et al., 2002]
Figure 3: Overview of the experiment process [Wohlin et al., 2000]
Figure 4: Linux architecture [Linux, Klingauf]
Figure 5: Gentoo user survey [GWN 08.11.2004]
Figure 6: Data flow diagram, Gentoo
Figure 7: The Gentoo Herds project
Figure 8: Experiment planning
Figure 9: Illustration of independent and dependent variables
Figure 10: Results from an early version of the modified Bugzilla script
Figure 11: Experiment principles [Wohlin 2000]
Figure 12: Experiment operation [Wohlin et al., 2000]
Figure 13: Bug activity log [Bugzilla]
Figure 14: "Moves, Adds and Changes" from GWN published 30th June 2003
Figure 15: Analysis and interpretation
Figure 16: Diagram that illustrates the handling time for each of the inspected bugs
Figure 17: Average handling time
Figure 18: Comparison of the bug handling time
Figure 19: The diagram plots the average handling time per bug on a weekly basis
Figure 20: The picture shows the development in reported and closed bugs on a weekly basis
Figure 21: The picture shows the number of closed bugs divided on the number of new bugs
Figure 22: Number of open bugs each week
Figure 23: Comparison of solved vs. new bugs per week
Figure 24: The evolution of developers from 01.01.2003
Figure 25: New bugs per developer per week
Figure 26: Solved bugs per developer per week
Figure 27: The diagram plots the average handling time per bug on a weekly basis
Figure 28: Comparing linear regression
Figure 29: Handling time plot
Figure 30: Residual plot

Table list

Table 1: Gantt chart (small version)
Table 2: Suggested hypotheses
Table 3: Conclusion validity
Table 4: Internal validity
Table 5: Construct validity
Table 6: External validity
Table 8: Scale for categorizing the bug handling time
Table 9: Final scale for categorizing the bug handling time
Table 10: Shows the distribution of the bugs within each category
Table 11: t-Test: Two-Sample Assuming Unequal Variances
Table 12: Output of the line regression test prior reorganization
Table 13: Output of the line regression test after reorganization
Table 14: t-Test: Two-Sample Assuming Unequal Variances
Table 15: Summary output
1 Introduction

1.1 Motivation

In today's software development society, efficiency, thoroughness and the constant need for improvement are just a few of several factors crucial to success. The competition is tough, and many organisations are struggling to gain market share and keep themselves alive. In the commercial part of the software development community, many techniques and proposals for cost efficiency and process improvement have become available. This work is mainly based on empirical data collected from budgets and financial statements. Therefore, measuring any gained success is fairly easy.

In non-commercial open source software development, measuring improvement initiatives is harder. As in commercial organisations, a streamlined workflow and organisational architecture are needed to obtain a good result. However, some open source communities have a virtual organizational structure. This means that the participants in open source software projects rarely meet physically, and almost all communication takes place on the Internet. In addition, the participants do not receive any salary for their contributions. The organizations rely on voluntary work, so it is difficult to control the participants and the development progress. As mentioned above, obstacles appear when trying to measure success in these projects, because organizational structures and techniques from the commercial world cannot be adopted without adjustment.

There is a large number of open source software (OSS) projects. At Sourceforge, the largest repository for open source applications, more than 91,000 projects are registered [Sourceforge]. This indicates that open source software is here to stay.
1.2 Project context

The project description for this project was given by the Department of Computer and Information Science at the Norwegian University of Science and Technology. The project is part of the 9th semester of the master's program and had to be completed within 13 weeks. We were not given any financial support for the project. We were assigned semiprivate booths with reserved computers. We have worked as a two-man team and have been appointed a supervisor. It has also been possible to consult our teachers. We did not have much experience with OSS, Linux or empirical software engineering when starting on this project.

The project goal was to "…determine the outcome of a real software process improvement initiative in an open source project". It was to be completed "By using state-of-the-art methodology within empirical software engineering,…" and the outcome was to "…determine whether or not this improvement initiative is a success or a failure." [Project assignment, M. L. Jaccheri].

We approached this challenge by first studying empirical software engineering, open source software and the Gentoo Linux project. After gaining this knowledge we performed an empirical experiment in which we tried to determine Gentoo Linux's efficiency before and after the reorganization. This was done by formulating a question to be answered by the outcome of testing three hypotheses.

1.3 Problem definition

The project title is "How to measure success?". It refers to the reorganization done by Gentoo Linux, and to what extent it was a successful initiative [GLEP 4, D. Robbins]. To assess this, we had to determine appropriate ways of measuring the success of the reorganization. After this reorganization, questions were raised. Did this OSS project really benefit from the reorganization? Can a voluntary virtual organization make such a radical change and still come out on top?

This project emphasized the empirical part of software engineering.
Therefore the main task was to conduct an empirical experiment. The report was then written, discussing the results of the experiment. We have also given an introduction to open source software, empirical software engineering and Gentoo Linux.

1.4 Report outline

We have used a report template given to us by the Department of Computer and Information Science at NTNU as a basis, then modified it to fit our project. The rest of the report is organized in the following parts:

2. Prestudy: Introduction to empirical software engineering, open source software, Linux in general and Gentoo Linux.
3. Problem statement: Presents an elaboration of the project, its challenges and our agenda.
4. Experiment planning: Here the project is defined. The hypotheses are discussed, as are variables, instrumentation and validity.
5. Experiment operation: This section presents the experiment preparation, its execution and data validation.
6. Analysis and interpretation: Descriptive statistics are used on the hypotheses, data set reduction is briefly mentioned and the hypotheses are tested.
7. Evaluation and discussion of results: Evaluates the theoretical and practical work. Our view on the project and the work process is described.
8. Conclusion and further work: The project is briefly summarized and we reach a conclusion. Suggestions for further work are presented.

2 Prestudy

The aim of this prestudy is to learn about empirical software engineering, open source software and Gentoo Linux. This was necessary for us to understand the scope of the project. The following chapter is a summary of the articles, books and web pages we have read. It gives an introduction to some of the most important areas of our research.

2.1 Empirical Software Engineering (ESE)

In this section we will present some of the work done in the field of empirical software engineering. The sources we have used are mainly from the syllabus at the section for empirical software engineering at our university [Syllabus].
In addition we have used one of the textbooks from the course "Software Quality and Empirical Work" [Tdt25], which both authors attend.

2.1.1 Why do empirical software engineering

For a product to evolve, it needs testing and experimenting. By doing empirical experiments and analyzing historical data, one might be able to make claims about improvements in future projects. Software is not an exception, and the small amount of software experimentation might hinder its development [Basili et al., 1986]. Empirical software engineering is a good way of doing experiments because it backs up its claims with statistics. There are several different ways of doing empirical research: survey, case study and experiment.

Why do many development projects generate less-than-desirable products? Many of the approaches are chosen on the basis of gut feelings, expert opinions and poor research [Fenton et al., 1994]. Fenton et al. claim that quantitative data and well-designed experimental research should be used to substantiate any claims made for new or changed practices. Observing, making theories and experimenting is a formula that has been successful for other sciences such as medicine [Kitchenham et al., 2002]. Kitchenham et al. believe it is time for software engineering to embrace this practice.

2.1.2 What is ESE?

Empirical software engineering can be defined as collecting data, doing statistical research, and then using the results to reject or not reject a hypothesis. However, empirical engineering is not a complete science with set standards. It is still being developed, and there are several suggested templates that compete with each other to become a standard [Kitchenham et al., 2002, Basili et al., 1986].

Empirical software engineering is still in its early development, and for it to mature it might be useful to perform empirical experiments en masse. This could create trends and indicate what works and what doesn't.
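The "reject or not reject a hypothesis" step in the definition above can be illustrated with a small sketch. It computes the statistic of a two-sample t-test assuming unequal variances (Welch's test), the kind of test named in the table list. The function name `welch_t` and the two samples of handling times are hypothetical, made up purely for illustration; they are not the data or the scripts used in this project.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for a two-sample t-test assuming unequal variances.

    t is the difference in sample means divided by its standard error;
    df is the Welch-Satterthwaite approximation of the degrees of freedom.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical handling times in days (illustration only, not measured data).
before = [12.0, 30.0, 7.0, 45.0, 21.0, 16.0]
after = [9.0, 14.0, 5.0, 22.0, 11.0, 8.0]

t_stat, dof = welch_t(before, after)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.2f}")
```

The null hypothesis (no difference in mean handling time) would be rejected only if the absolute value of t exceeds the critical value of the t distribution for the chosen significance level and the computed degrees of freedom; otherwise it is not rejected.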
Software engineering is not like manufacturing; its technologies are human based. It is hard to build models and verify them with experiments. Reasons for this are the many variables, environments and the evolving technologies [Basili, 1996].

2.1.3 Why experiment?

Experiments can be used to test theories and to explore. Experimentation can help create a base of knowledge about the software in the experiment. This helps determine which theories, tools and methods are adequate. By experimenting, new, useful and unexpected insights may be gained. Whole new areas of investigation can be revealed. Tichy [1998] claims that in areas where engineering progress is slow, experimentation can push through. Experimenting can quickly eliminate fruitless approaches and erroneous assumptions, thereby accelerating progress. It can also orient engineering and theory in promising directions. The experimenting process in itself can also produce results and knowledge, both in the area being experimented in and in the techniques used [Tichy, 1998].

Evidence of new software being superior to old is not often provided. Statements like "Productivity gains of 250%" and "Time to market reduced by half" might seem very tempting, but are often not backed up with statistical data. This makes it difficult to differentiate valid claims from invalid ones [Fenton et al., 1994]. Making such claims and being able to back them up with empirical data can give an advantage in the business market. From a business perspective, it is necessary to develop products and processes that can help create quality systems productively and profitably, e.g., estimate the cost of a project, track its progress and evaluate the quality of a product [Basili, 1996]. These models of process and products should be tailored based upon the data collected within the organization and should be able to continually evolve based upon the organization's evolving experiences [Fenton et al., 1994].
However, empirical experimenting does not come for free. Experiments and data gathering need resources and manpower that could be used elsewhere. There are also direct expenses like equipment and training. Many managers also dread the fact that experiments may need a long time before they start creating profits compared to other types of development. This is especially important in software engineering, as technology pushes business and borders extremely fast.

As Figure 1 [Zelkowitz & Wallace, 1998] shows, few experiments are done. Most of the papers either have no experimentation or use assertion to validate their claims. This looks grim for the software industry, as it seems to deceive itself by posting all sorts of claims without being able to prove them. However, there are several positive trends. The percentage of papers with no experimentation almost halved from 1985 to 1995. Papers based on assertions also decreased in the same time period. The number of papers validated with case studies and lessons learned has also risen. In fact, almost all the validation methods were used to a greater extent in 1995 than in 1985.

Figure 1: Overview of validation methods [Zelkowitz & Wallace, 1998]

2.2 How to perform Empirical Software Engineering

The following section will describe some of the methods and techniques used to execute empirical software engineering.

2.2.1 Introduction

In our preparations for this section of the report, we have studied work from different contributors in the software engineering research community. We have noticed that different theories and proposals are suggested. However, the authors seem to agree upon one thing: the need for further work and emphasis on the empirical part of software engineering [Basili, 1996, Kitchenham et al., 2002].
One of the main challenges is to create a credible empirical discipline for software engineering with satisfying guidelines for the research and reporting processes. It is claimed that empirical studies in software engineering research have not had the same success as in other parts of modern science [D. Perry et al., 2000]. This is widely discussed in different articles, and possible reasons are presented. N. Fenton et al. [1994] claim that software engineering research got off to a bad start. They characterise many of the published articles as “analytical advocacy research” with poor experimental and statistical design. Victor Basili [1996] mentions the differences between software engineering and other fields like physics, medicine and manufacturing, where empirical research is widespread. These differences could be the reason for the lack of success in software engineering. Basili also suggests that the distinctive characteristics of software projects often make it hard to compare different studies.

As software engineering doesn’t have long traditions in the empirical research world, parts of the research community have looked to other fields to get ideas for their work. This has resulted in both guidelines and templates for designing, conducting and evaluating empirical studies. One of the first articles that emphasized the need for experimentation in software engineering was published by Basili, Selby & Hutchens in 1986 [Basili et al., 1986]. This article includes both a framework for analysing and designing experimental work performed in software engineering, and recommendations for performing future experiments. The framework presented consists of four categories: definition, planning, operation and interpretation, each corresponding to a phase of the experimentation process. This article has been the inspiration and source for a lot of the research in the software engineering area.
2.2.2 The evolution

As mentioned above, a lot of the work in the software engineering research community has aimed at developing guidelines and templates for empirical research. The tendency in the past was software engineering driven by technology development and advocacy research. This is not acceptable in the long run if control of the software development is desired. To gain this control, the ability to evaluate new methods and techniques before using them is necessary [Wohlin et al., 2000]. This can be achieved by performing empirical studies like surveys, experiments and case studies, thereby turning software engineering into a science.

This issue is covered by Fenton et al. [1994], where the authors present some suggestions to improve software engineering research practices. They emphasise the importance of claims based on valid evidence. To achieve this, the authors state that:

Five questions should be (but rarely are) asked about any claim arising from software engineering research:
• Is it based on empirical evaluation and data?
• Was the experiment designed correctly?
• Is it based on a toy or a real situation?
• Were the measurements used appropriate to the goals of the experiment?
• Was the experiment run for a long enough time?
[N. Fenton et al., 1994, p. 87]

Further, the authors examine each question in detail and present examples from real projects to illustrate the consequences when these questions are ignored. According to the article, evaluative research must involve realistic projects with realistic subjects. The proposed hypotheses have to be tested against satisfactory data. This is a time-consuming and expensive task, but it is a necessity for any valid empirical analysis. In addition to having satisfactory data, i.e. enough and valid data, the design of the experiment itself has to be correct. One way of avoiding a flawed design is to use appropriate guidelines and to gain experience by carrying out several empirical experiments.
When it comes to the question of toy versus real situations, the authors call attention to the cost of carrying out a large-scale study. Cost and time constraints are often the reasons why many software engineering researchers choose to conduct experiments based on artificial problems in artificial situations. This generates a new problem: the results from toy studies cannot unconditionally be scaled up to larger and more realistic situations. But experiments of this kind are not without value, even if the results they present are not conclusive. They can indicate directions for further investigation, meaning that it is often better to perform a small-scale experiment than none at all.

When defining the measurements used in an experiment, Fenton et al. [1994] emphasize the importance of measuring the correct attributes. If this is not done appropriately, wrong conclusions might be reached. In addition, the choice of scales is crucial. If the combination of scale and statistical technique is wrong, the researcher is in deep water. To support this assertion, the authors present a study performed at IBM on the relationship between faults and failures in software. This study claims that focusing on faults instead of failures can be fatal.

L. C. Briand et al. [2002] take a closer look at measurement definition. They point out that the principles and methods of software measurement are currently being defined and consolidated. They also claim that few of the measurements presented in publications are actually used in industry. This is due to several problems, which the authors point out. The article includes a proposal for defining measures that will be appropriate in software engineering, but the authors do not expect to find any generally valid quantitative laws. This is regarded as an ideal, long-term research goal. The measure definition process proposed in this article is based on the Goal/Question/Metric (GQM) paradigm with some extensions.
The authors have named their proposal GQM/MEDEA (GQM/MEtric DEfinition Approach), which is an exhaustive process. The high-level structure of GQM/MEDEA can be summarised in four steps: setting of the empirical study, definition of measures for the independent attributes, definition of measures for the dependent attributes, and hypothesis refinement and verification. Figure 2 shows this high-level structure.

Figure 2: High-level steps of GQM/MEDEA [L. C. Briand et al., 2002]

1. Setting of empirical study

The first step is the setting of the empirical study, which is done in two main tasks. As the authors indicate: “The definition of the measurement goals and empirical hypotheses are the fundamental phases since all the other steps in our approach are affected by them” [L. C. Briand et al., 2002, p. 1111]. Based on the knowledge regarding the corporate objectives, development environment and available resources, measurement goals are defined. It is essential that the corporate objectives are prioritized to increase the probability of receiving adequate support. This information is merged with information about the specific environment, and results in tactical goals. These goals are more specific than the corporate objectives and, together with information regarding resources, are the foundation of the definition of the measurement goals.

The process of defining the measurement goals is quite exhaustive, and the authors approach it by using GQM. The suggested template includes five goal dimensions that are meant to help the researcher in the task. These goal dimensions are: object of study, purpose, quality focus, viewpoint and environment. Each dimension gives guidance in the determination of the measurement goals. According to the article, "A hypothesis captures one’s own intuitive understanding of the studied phenomena and needs to be explicit so it can be discussed, questioned, and refined" [L. C. Briand et al., 2002, p. 1114].
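The five goal dimensions listed above can be written down as a small record. The following is only a minimal sketch: the class name, the field names, the sentence layout (modelled on the wording commonly used for GQM goals) and the example values are all our assumptions, and none of it is taken from Briand et al. or from the experiment in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementGoal:
    """One measurement goal described by the five GQM goal dimensions."""

    object_of_study: str
    purpose: str
    quality_focus: str
    viewpoint: str
    environment: str

    def as_sentence(self) -> str:
        # The sentence layout is an assumption, not a quotation from the article.
        return (
            f"Analyze {self.object_of_study} for the purpose of {self.purpose} "
            f"with respect to {self.quality_focus} from the viewpoint of "
            f"{self.viewpoint} in the context of {self.environment}."
        )


# Hypothetical example values, chosen only for illustration.
goal = MeasurementGoal(
    object_of_study="the bug-handling process",
    purpose="evaluation",
    quality_focus="efficiency",
    viewpoint="the researcher",
    environment="a volunteer-driven open source project",
)
print(goal.as_sentence())
```

Writing the goal out this way forces every dimension to be filled in before any measure is chosen, which is the guidance role the template is meant to play.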
The process of defining the hypotheses isn’t covered in detail by the authors, but some guidelines are given. In the definition of the empirical hypotheses, the authors emphasise the use of terms of measures. They also define just one hypothesis per issue. This results in hypotheses that differ from the hypotheses defined in a statistical test. A statistical test of hypotheses requires both a null hypothesis and an alternative hypothesis. In addition, the statistical hypotheses are defined in terms of measures. The empirical hypotheses are refined later in the process, when the measures are defined.

2/3. Definition of measures for independent / dependent attributes

After completion of the first step, the definition of measures for independent attributes is next. This phase uses the hypotheses from the previous phase, in addition to the process and product information, to come up with the measures needed. For all the attributes of each of the entities appearing in the empirical hypotheses, appropriate measures must be identified. This is an exhaustive process which includes formalising independent attributes and identifying abstractions for measuring independent attributes. It also instantiates and refines properties for measures of independent attributes. When the measures are defined, the phase is wrapped up with a validation of the measures. The definition of measures for dependent attributes follows an identical path, and is often a bit easier as the dependent attributes are usually more tangible.

4. Hypothesis refinement and verification

The last step in the measure definition process is the hypothesis refinement and verification. After the definition of the measures for the dependent and independent attributes, some refining of the original empirical hypothesis might be necessary. This will hopefully result in more precise hypotheses that are consistent with the initial ones. When this is done, data gathering is the next task.
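To make the difference between an empirical hypothesis and its statistical counterpart concrete, the sketch below turns one empirical hypothesis into a null and an alternative hypothesis and tests them with an exact permutation test. This is an illustration under stated assumptions: the choice of test, the function name, the significance level of 0.05 and the data are ours, and the numbers are made up rather than measured.

```python
from itertools import combinations


def exact_permutation_p_value(before, after):
    """One-sided exact permutation test on two small samples.

    H0: the two periods are exchangeable (the change made no difference).
    H1: values in `before` tend to be larger than values in `after`.
    """
    pooled = list(before) + list(after)
    observed_sum = sum(before)
    extreme = 0
    total = 0
    # Group sizes are fixed, so a larger sum in the first group is
    # equivalent to a larger difference between the group means.
    for group in combinations(pooled, len(before)):
        total += 1
        if sum(group) >= observed_sum:
            extreme += 1
    return extreme / total


# Hypothetical measure: days needed to close a bug report, before and
# after some process change. These values are invented for illustration.
days_before = [12, 15, 14, 16, 13]
days_after = [9, 8, 10, 7, 11]

ALPHA = 0.05  # assumed significance level
p_value = exact_permutation_p_value(days_before, days_after)
reject_h0 = p_value < ALPHA
print(p_value, reject_h0)
```

In these invented numbers every value before the change exceeds every value after it, so only the observed split is as extreme as the data and the p-value is 1/252. The null hypothesis would therefore be rejected at the assumed level.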
It is crucial that the data collected is consistent with the defined measures, and that adequate information is gathered to carry out the empirical validation.

Another issue mentioned above that is important for the research is the duration. If the study isn’t carried on long enough, misleading results may appear. By violating this requirement, the researcher might interpret the data incorrectly and reject the wrong hypotheses. This can for example happen if he/she credits some initiative as the cause of a change in the data, when the change is actually a tendency that started prior to the data gathering.

2.2.3 Alternative approaches

As mentioned above, many articles have focused on the difficulties with performing empirical studies in software engineering and tried to reveal the causes for this. Perry, Porter and Votta [D. E. Perry et al., 2000] have another point of view on this matter. They claim that the main problem is the gap between the studies performed and the goals that these studies try to achieve. To deal with this problem, better design and more credible interpretations must be present, they continue. According to the authors, the structure of an empirical study should include the following components:

• research context
• hypotheses
• experimental design
• threats to validity
• data analysis and presentation
• results and conclusions

In addition to using this structure, the authors claim that the most important thing a researcher can do is to ask insightful questions. They also point out that the quality of many computer science experiments could be improved by involving others with qualifications and experience. This particular issue is covered in other articles as well. A comprehensive effort made by a group of software engineering researchers and statisticians is presented in the article “Preliminary Guidelines for Empirical Research in Software Engineering” [Kitchenham et al., 2002].
The authors have based their work on publications in the medical and psychological fields, and tried to merge this with their own experiences from software engineering. They examine six basic topic areas and present a set of do’s and don’ts. The areas are:

• Experimental context
• Experimental design
• Conduct of the experiment and data collection
• Analysis
• Presentation of results
• Interpretation of results

From this examination the authors have come up with a set of guidelines on how to perform future empirical research in software engineering. But the authors stress that the guidelines alone will not improve the relevance and usefulness of empirical software engineering research.

2.2.4 Our approach

Much work has been done by the empirical software engineering research community. Different approaches have been taken and a wide range of solutions has been proposed. We chose to use “Experimentation in software engineering, an introduction” [Wohlin et al., 2000], which is the textbook in one of our classes, as a template. This book was released a few years ago, and many of the articles we have reviewed above were used as sources and inspiration during the compilation of the textbook. It is also convenient that this book addresses experimentation in software engineering in particular. The experiment process can be divided into five main activities and is illustrated in figure 3 below. There is no requirement that an activity has to be finished before the next one in the model starts, but the order indicates the starting order. The five main activities in Wohlin’s experiment process are quite similar to the basic topic areas that Kitchenham et al. presented in their article [Kitchenham et al., 2002]. We believe that this is a tendency in the community, and the contributors seem to agree upon many of the aspects of empirical software engineering.
Figure 3: Overview of the experiment process [Wohlin et al., 2000]

We will not detail the main activities in this section, but briefly describe some of them in chapters 4, 5 & 6.

2.3 Open Source Software (OSS)

The following section briefly describes open source software and some of its attributes. We continue by comparing OSS with standard commercial software. Then we briefly discuss communication in OSS communities.

2.3.1 An introduction to OSS

"The basic idea behind open source is very simple: When programmers can read, redistribute, and modify the source code for a piece of software, the software evolves. People improve it, people adapt it, and people fix bugs. And this can happen at a speed that, if one is used to the slow pace of conventional software development, seems astonishing" [Opensource].

Official open source definition [Definition]:

1. Free Redistribution
2. Source Code
3. Derived Works
4. Integrity of The Author's Source Code
5. No Discrimination Against Persons or Groups
6. No Discrimination Against Fields of Endeavor
7. Distribution of License
8. License Must Not Be Specific to a Product
9. License Must Not Restrict Other Software
10. License Must Be Technology-Neutral

OSS has gained increased support in recent years [Opensource]. In the beginning it might only have been an alternative for the computer elite and gurus, but recent focus on usability and support has made it a serious competitor to traditional commercial software. As we interpret it, a popular argument against OSS has been the lack of support for both private households and companies. This has overshadowed the fact that the software itself is free. It has therefore long been assumed that the costs and risks of maintaining OSS outweigh the benefits of free downloads. To counter this, commercial companies like RedHat offer installation CDs, manuals and even personal support [RedHat].
This might have been the final push that convinced companies to try out Linux and open source software. Unlike the slow start of open source operating systems, smaller programs like open source FTP servers have flourished in the market for a long time. The Apache Web Server dominates the market with a share of almost 70%, and this number is still increasing [Netcraft]. Although the Apache project differs from many other OSS projects in that its development process was defined before the actual development began, it can be seen as a sign of the OSS invasion of the commercial market.

OSS products often approach users in a very different way compared to commercial software. OSS applications are freely available for downloading on the internet. This means that anyone, anywhere, at any time, can download the software for free and legally employ it for home or corporate use. Traditionally software had to be purchased, and the customer then received a download link, a CD in the mail or a serial number to enter into the trial version. This was logical because the companies that produced the software had the same goal as any other company: to enrich their shareholders.

That leads us to the next question: what motivates the OSS developers? Most of them obviously don’t make money on it, at least not directly. There are many different answers to this question, ranging from bazaar gift exchange [Raymond, 2001] and basic communist philosophies [Glass, 2004] to private-collective activities [Hippel and von Krogh, 2003]. The last of these suggested answers argues that programmers both contribute to the public good and simultaneously obtain private benefits in terms of learning, enjoyment and solutions to their own technical issues: a win-win situation. We believe that there may be other rewards that have not been investigated adequately.
These people spend a lot of time on chat networks like IRC; they have deep relationships and long-term friendships with other people on IRC. Being known as a skilled developer earns respect in these people’s online lives. They have administrator rights on huge channels, and this means ultimate power over everyone else there: hundreds, perhaps thousands of people. This might be compared to a man who is a father at home but, in his parallel work life, owns a large company and rules thousands of people’s lives. Of course a developer might have less power than his mundane counterpart, but in principle there might be similarities.

2.4 The evolution of Linux and Gentoo Linux

In this section we will give a brief introduction to Linux operating systems in general, but focus on Gentoo Linux. The section contains some of the historical background for the evolution of Linux, and a description of the distinct characteristics of Gentoo. The goal is to give the readers some information regarding the context of the empirical experiment performed later in the report.

2.4.1 Linux

Linux is an open-source implementation of the UNIX operating system and was initially created by Linus Torvalds while he was studying at the University of Helsinki in Finland. Torvalds was interested in creating an operating system exceeding the standards of a small OS called MINIX, which is very similar to the powerful, interactive timesharing OS UNIX [Hyperdictionary]. After working on his project for some months, Torvalds made an announcement on Usenet to get feedback on his work. The response was overwhelming, and in September 1991 version 0.01 of Linux was released [Linus]. This was the start of a major open source project which resulted in an operating system with all the expected features like virtual memory, shared libraries, TCP/IP networking and true multitasking.
Linux was originally developed for the Intel 80386 microprocessor, but much of the platform-dependent code was later moved into platform-specific modules. Today Linux has gained middleware-like capabilities and support for a wide range of hardware architectures. This layered architecture is shown below in figure 4. In addition to the fact that it is freely distributed, Linux’s functionality and adaptability are some of the reasons that it has become probably the most popular UNIX-like OS in the world.

Figure 4: Linux architecture [Linux, Klingauf]

2.4.2 A technical overview of Linux

As figure 4 shows, the Linux kernel lies between the hardware and the software applications. The kernel is built up of many sub-elements and includes device driver support, processor and memory management features and support for many different types of file systems [UNIX, W. Knottenbelt]. A large group of developers is constantly improving the kernel and adding new features. Periodically this group releases new stable versions of the Linux kernel, and users can download these versions from servers all over the world.

2.4.3 Linux distributions

A Linux distribution is a complete Linux system. It includes, in addition to the Linux kernel, a selection of packages bundled around it. These packages give Linux a set of compilers, libraries, utilities and other features, resulting in a full-scale, useful operating system. There exists a large number of different distributions, all with their own features and optimizations for different tasks and hardware. There are both commercial and non-commercial distributions on the market, and RedHat, Debian, Mandrake and Gentoo Linux are just a few examples.

To be able to interact with the system, some sort of interface must be present. Linux supports two different sorts of command input: textual command line shells and graphical user interfaces (GUIs).
There has been a change in the composition of the Linux user group from the early days until today. In the beginning, being a highly skilled computer user was almost a requirement to start using the system. Nowadays users with different levels of skill wish to use Linux, and this has resulted in more focus on user-friendliness and graphical environments. Many of the distributions on the market have therefore integrated a great deal of graphical user interfaces. The graphical environment can roughly be separated into two parts: the window manager and the desktop manager. The window manager controls the layout of the windows on the screen, while the desktop manager uses these windows to arrange menu bars, file managers and so on. Gnome and KDE are two of the most popular desktop managers on the market. The textual command line shell is still often used to connect remotely to a Linux server.

2.4.4 Gentoo Linux

The development of Gentoo Linux was initiated by Chief Architect Daniel Robbins in 2000. Robbins started the work because he didn't like the functionality that the other Linux distributions offered. The most fundamental issue for Gentoo "is designing a technology that allows us and others to do what they want to do, without restriction" [Philosophy, D. Robbins]. On Linux Online’s web site, Gentoo Linux has been given the following description: “Gentoo Linux is designed for the developer, power user and enthusiast. It incorporates the latest sources and technologies (such as ReiserFS and the Portage system).” [Linux Online]

Today Gentoo Linux has about 1.0% market share among Linux distributions, and is also the fastest growing GNU/Linux distribution in terms of users [Market share]. The system is available for free over the Internet, and the install file is about 650 MB. Potential users can download a Gentoo LiveCD, a bootable CD that allows them to boot Linux from it.
This software detects the user’s hardware and loads the appropriate drivers during the boot process.

The Gentoo community releases a newsletter every week, and the edition of the 8th of November 2004 presented a user survey. This survey had gathered data from more than 9000 users, and was the first ever done. The figure below shows the results for the question: “What was the most important factor for you when choosing Gentoo?” As the pie chart shows, the package repository and the ability to customize the distribution are the main reasons for the lion’s share of the users.

Figure 5: Gentoo user survey [GWN 08.11.2004]

2.4.5 Gentoo Linux in detail

There are several attributes that distinguish Gentoo Linux from the other Linux distributions available. In the article “Gentoo Linux: The next generation of Linux” [Thiruvathukal, 2004] the author points out some of the features that give Gentoo Linux a competitive advantage compared to other distros. Thiruvathukal especially mentions Gentoo Linux’s use of metadata. This is not unique among the available distros, but Gentoo Linux takes it to another level. Gentoo Linux’s use of metadata gives the user information regarding what version of a package is installed, how that package was built, and whether a newer version is available. Thiruvathukal also mentions that the entire operating system is maintained from source code and that the user only needs to install it once. This is because upgrades are distributed continuously through the Portage system.

Modifiability has high priority in Gentoo Linux, and another feature that distinguishes Gentoo Linux from the other Linux distributions available is the Portage technology. This technology enables the user to build the entire system from source code using his/her choice of optimisation, and Gentoo is therefore called a meta-distribution.
Portage is a package management system which performs different tasks like software distribution, package building and installation, and keeping the user’s system up-to-date [Gentoo Portage]. This is done to remove many of the obstacles that users face with open source software. Take software distribution as an example: the only thing users have to do is type a simple command to get the latest version of the system. As mentioned above, Portage also includes an installer. This feature makes it possible to customise the software and optimize it for the respective user’s hardware. As a result of the features that the Portage technology offers, the people behind Gentoo Linux hope that their system will cover the needs of the users.

The basis of the Portage system is the ebuild scripts. This is the format of the packages stored in the Portage system, and these scripts contain all the information required to download, unpack, compile and install a set of functions. The ebuilds also contain information on how to perform any optional pre/post install/removal or configuration steps. Compiling software from downloaded source code gives Gentoo both advantages and disadvantages. The system potentially executes faster, as the applications only have to support the current system and need not be compatible with all other systems. The downside is that compiling takes time, often about two days for a Gentoo installation. This makes Gentoo very powerful, but it might need better hardware than other Linux distributions.

2.4.6 Gentoo, organizational

Gentoo Linux is an open source software project. The structure of Gentoo differs from the traditional organizational structure in the commercial world of software development. We will try to expose some of these points of distinction in the following sections. Figure 6 shows an overview of the data flow in Gentoo.
Figure 6: Data flow diagram, Gentoo

2.4.6.1 Gentoo community

Since the start in 2000, the Gentoo development community has grown to a group of more than 250 developers [Developer list]. The title of Gentoo ‘developer’ is restricted, and a person can only use this title after being adopted by the Gentoo community. This process can be initiated in different ways. One way of getting approved by Gentoo and becoming a developer is to contribute by fixing bugs and submitting ebuilds, and thereby be recommended. It also happens that Gentoo is in urgent need of people with certain skills, and announces this in its weekly newsletter [GWN 01.11.2004]. People can then apply, and candidates satisfying the requirements are adopted. When a person is adopted, he or she will be evaluated for some time before approval. During this period the new developer is given a mentor who is responsible for guidance, assistance and some evaluation.

To manage all the processes involved in locating and adopting new developers, Gentoo has established a developer recruiter's project. The members of this project have the final word in the selection of new developers. Their decision is based on feedback from the mentors and the results of a test the candidates have to pass [Monteiro et al., 2004], [Gentoo recruiters].

In August 2003 a project called Gentoo BugDay was organised by one of the developers, Brian Jackson. The motivation behind this event was to make a concerted effort to close as many bugs as possible, but also to create a context where users and developers could get to know each other. The participants worked together in an online chat channel on irc.freenode.net, testing, discussing and fixing bugs. The Gentoo Weekly Newsletter also says: “…we may even have scouted a few candidates for future developers” [GWN 04.08.2003]. So it seems that this is also a gateway to being adopted as a developer.
In addition to the developers, a large number of other people contribute to the development and maintenance of Gentoo Linux by reporting bugs and submitting proposals for solving problems. This is one of the advantages of open source software development. A lot of the work for the developers involves writing ebuilds and maintaining them. This is a challenging task, and since Gentoo is OSS, even more obstacles arise if the developers don’t take their share of the workload or, in the worst case, become inactive.
2.4.6.2 The Herds project
In the Gentoo Linux development structure, a sub-project called the Gentoo herds project was introduced to gain better control of the ebuilds. This project aims to ensure that ebuilds are organised in groups that have maintainers, and that all ebuilds get maintainers assigned. Each herd is a collection of closely related ebuilds which a number of maintainers are given the responsibility to maintain. The maintainers are people who contribute to the development, and they’re often assigned to maintain parts of the system that they have written themselves.
Figure 7: The Gentoo Herds Project (diagram relating Gentoo, Developer, Herd, Ebuild and Bug-report, with multiplicities)
Since Gentoo Linux is OSS, and therefore a volunteer-driven distribution, high-quality documentation is vital. This is to ensure that interested users can easily get the information they need to be able to contribute to the further development. To satisfy this requirement, all the documentation is gathered in one place, and users have the opportunity to report bugs or send proposals to a bug-tracking system. A project called the Gentoo Documentation Project handles all these reports.
2.4.6.3 Concurrent Versions Control
Another part of the Gentoo system that is crucial in the OSS development is the Concurrent Versions System (CVS). This is a client/server system designed to keep track of changes made by different users to the same files.
This allows multiple developers to work on the same source code at the same time, and prevents work from being lost [CVS]. Using the tool allows developers situated around the world to store their work in a central repository, and a complete history of the evolution of the system is created. CVS uses the Revision Control System (RCS) that was designed by Walter Tichy [RCS]. This is a software tool for the UNIX system. It allows an individual developer to maintain control over a certain item, such as a source file, while he/she implements and tests it.
Gentoo Linux Enhancement Proposals (GLEPs) are a particular type of text file maintained under CVS control. A GLEP is, according to Gentoo: “a design document providing information to the Gentoo Linux community, or describing a new feature for Gentoo Linux. The GLEP should provide a concise technical specification of the feature and rationale for the feature.” This means that the GLEP is the medium through which information regarding higher architectural subjects is distributed. The structure of a GLEP is quite rigid, and the following criteria are stated at the GLEP website: “For a GLEP to be approved it must meet certain minimum criteria. It must be a clear and complete description of the proposed enhancement. The enhancement must represent a net improvement. The proposed implementation, if applicable, must be solid and must not complicate the distribution unduly. Finally, a proposed enhancement must satisfy the philosophy of Gentoo Linux.” [GLEP]
2.5 Reorganization of Gentoo
On the 24th of June 2003, Daniel Robbins, the chief architect of Gentoo Linux, posted a proposal for a new top-level management structure as a GLEP [GLEP 4]. In this proposal, Robbins points out some issues regarding the difficulty of tracking the status of projects in Gentoo.
Robbins describes the current situation as: “…we have no clearly defined top-level management structure, and no official, regular meetings to communicate status updates between developers serving in critical roles.” He also mentions the problem of not having clearly defined roles and scopes of executive decision-making authority for top-level developers. According to Robbins, this situation results in: “no one knows what is going on, and everyone defers to the Chief Architect for all executive decisions.”
To deal with these problems, Robbins suggests some changes to Gentoo. Firstly, he wants to alter the organizational structure of Gentoo by introducing an official top-level management structure. The chief architect and a chosen group of developers will be members of this management group. The developers will be given the title of “Top-level managers” and be responsible for communicating the status of their projects to the rest of the management group. The exchange of status reports will take place in fixed, weekly meetings. In addition, clearly defined areas of responsibility regarding the daily operations will be created.
On the 30th of June, GWN announced that Gentoo had adopted a new management structure for the Gentoo Linux Project [GWN 30.06.2003]. By adopting it, Gentoo hoped that: “…users will notice benefits as well through improved speed of delivery, increased quality control and other tangible benefits”. The final outcome of Daniel Robbins’ proposal and the motivation behind the reorganization can be found in detail in the Gentoo documentation [Gentoo Management].
3 Problem statement
3.1 Research agenda
Our research agenda is to find out whether or not the reorganization of Gentoo Linux resulted in a more efficient organization. If it did, such a reorganization could be performed to improve other open source projects. If it didn't, this could be a warning for anyone thinking about doing such a reorganization.
A measure of the extent to which this reorganization was a success can also help create data on the costs and benefits of doing a reorganization in an open source environment. We wanted to perform an empirical experiment in order to verify our data statistically.
3.2 Focus
The focus of this project is to do an empirical study of the reorganization in Gentoo Linux. Doing the experiment with empirical engineering enables us to use known statistical templates and tests. This helps us verify to what extent our assumptions are correct. Gentoo Linux was chosen as the object of study because of the PhD work of Thomas Østerlie [Østerlie]. He also works with Gentoo and is assigned as our supervisor.
3.3 Questions
We have stated one question that we want to answer in this project:
Q1: Did the reorganization in Gentoo Linux improve the efficiency of the organization?
We thought it would be difficult to answer this question directly and therefore constructed three hypotheses that we wanted to test, so that we could draw on the results to answer the question. These are listed and discussed in chapter 4.
3.4 Associated research method/process
As stated previously, we have been assigned the empirical research method. Our work process is indicated in the Gantt chart below.
Gantt diagram v0.3 (last update: 09.09.2004), weeks 35-48. Tasks:
Phase 1: Reading / prestudies
Phase 2: Definition
Phase 3: Planning
Phase 4: Operation
Phase 5: Analysis & interpretation
Phase 6: Presentation & package
Phase 7: Completion & refinement
Table 1: Gantt chart (small version)
The diagram has been followed very well, although we had to make some modifications as we discovered that some tasks needed more or less work than assumed. This is a scaled-down version of our Gantt chart. The full version can be found in attachment A.
4 Experiment planning
After defining the experiment, the planning takes place.
The aim of the planning process is to decide how the experiment is to be conducted. This is an important part of the process and a prerequisite for being able to control the experiment.
Figure 8: Experiment planning
An online project is one where the investigation is executed in the field under normal conditions [Wohlin et al., 2000]. This is the opposite of an offline project, where the research might be done after the experiment is complete or in a laboratory. Real-time data is a term we use to describe data that is collected as soon as it is generated.
4.1 Context selection
We do not classify the project as online, as we are not in the process ourselves. We monitor a process that has already happened, and the little real-time data we gather is collected in parallel with the process. According to Wohlin et al. [2000] this reduces the risks. We do not believe that we have much control over the process. We simply observe what people do and have done. The project does not evaluate a special group of people like students or professionals. The subjects come from a variety of backgrounds. We do not know the age of the subjects. The project addresses a real problem, not a toy problem, as we research a real organization. We don’t think the project can easily be generalized to the general software domain. This is because there are many differences between OSS organizations and commercial companies, for example the fact that the "employees" in OSS organizations don’t get paid by the company. However, the project might be valid for OSS projects and OSS-influenced companies.
In our case, several issues in the Gentoo Linux organization are compared before and after it was reorganized. As the old method of doing things ended when the new began, it is hardly possible to use the old methods after the reorganization.
4.2 Hypothesis explanation
When our project was advertised by NTNU, it had the working title "How to measure success?".
At the first briefing we were informed that the scope should be the Gentoo Linux community and the reorganization that was made in June 2003. Below, all the hypothesis proposals are listed with some attributes. An explanation of why we have used them in the report or discarded them is given in attachment B.
The hypotheses were created in a brainstorming process. After a brief introduction to ESE, OSS and Gentoo, we wrote down all the hypotheses that we could think of. Then all the hypotheses were discussed, evaluated and rated. We had some fundamental questions about the reorganization in the Gentoo community that were used as a basis for the chosen hypotheses. We wanted to know what effects this reorganization had on the massive Gentoo community. What was the motivation behind it, and did it fulfill its goals? Based on these questions we brainstormed the following hypotheses:
No. | Suggested hypothesis | Rate* | Source | Candidate
1 | Reorganizing has improved the efficiency of bug handling. | Good | Bugzilla | Yes
2 | Gentoo Linux will continue to exist in the following years. | Poor | Mail lists, market share, interviews | No
3 | Reorganizing led to higher release cycles. | Good | Inspect releases, cvs, mail lists | Yes
4 | Reorganizing led to improved communication. | Poor | Forums, interviews, newsletters | No
5 | Number of developers increased as a result of the reorganization. | Good | Developer records, mail lists, GWN | Yes
6 | Number of users and market share increased after the reorganization. | Medium | Independent sites with objective data | No
7 | Reorganization fulfilled its goals to a reasonable extent. | Bad | Community feedback, forums, mail lists, bugzilla | No
8 | New roles and scopes have simplified decision-making. | Medium | Developer experience, discussion, meeting logs, forums, interviews | No
9 | "The Cabal" and secret mail lists are negative for the OSS community. | Horrible | Fork documents, forums, official statements, meeting logs | No
10 | Meeting deadlines has improved after the reorganization. | Medium | Bugzilla, forums, dev discussions | No
11 | The number of bugs is threatening the future of Gentoo. | Poor | Bugzilla | No
12 | The increase in the total number of unsolved bugs is a threat to Gentoo Linux. | Poor | Bugzilla, articles | No
13 | The increase of unsolved bugs will eventually kill the Gentoo Linux Project. | Poor | Bugzilla | No
14 | In x years the number of unsolved bugs will leave Gentoo Linux as a non-competitive distro. | Bad | Bugzilla | No
15 | The increase of unsolved bugs doesn’t threaten Gentoo Linux. | Difficult | Bugzilla, mail, forums | No
16 | The reorganizing led to a decrease in the average time used to solve bugs. | Good | Bugzilla, forums, interviews | Yes
17 | Reorganization led to a greater share of solved bugs compared to new bugs. | Good | Bugzilla | Yes
Table 2: Suggested hypotheses
* More info on the hypothesis suggestions and rating in attachment B.
4.3 Hypothesis formulation
After the brainstorming, all the hypothesis proposals were debated and evaluated. We tried to find out whether or not the hypotheses were suited for effective data gathering. They were also evaluated on how interesting they were, and on what conclusions might be drawn from them. Then we selected a few that we believed were the best ones and tried to refine them. We wanted the cause to lead to the effect, and therefore some of them were changed. For example, in hypothesis 5, "Number of developers increased as a result of the reorganization”, it seemed difficult to determine that the increase in developers was actually caused by the reorganization. The super-hypothesis was changed to a question that the hypotheses were supposed to answer.
The question will not be evaluated directly, as a definite measurement of effect is outside the scope of this project. However, we wish to evaluate the results from the hypotheses, and then draw our conclusions.
Some of the hypotheses were instantly discarded when they were first discussed, as they were little more than caffeine-fueled digressions that surfaced during the brainstorming. Others were systematically discarded as we discovered that they were too hard to back up with data. The Bugzilla bug system in particular varied in usability: it proved great for some data extraction, but other data was hard to extract. We do believe that the data in question is there; we just don’t have time to develop the tools to get it. The hypothesis we refer to is "Growing number of users report increasingly many bugs, result in more work for developers (who might not increase in numbers at the same ratio)." Testing it would require the Python script to first list all bugs, then open an individual page for each bug, get some data, then open another page and interpret a table.
Having said all that, we also wanted a collection of hypotheses that were real and not trivial. We believe that the ones we have chosen are meaningful and can be generalized to similar projects/companies.
Question 1:
Q1: Did the reorganization in Gentoo Linux improve the efficiency of the organization?
Hypothesis 1:
H1.0: The reorganizing did not lead to a decrease in the average time used to solve bugs.
H1.1: The reorganizing led to a decrease in the average time used to solve bugs.
Hypothesis 2:
H2.0: Reorganization did not lead to a greater share of solved bugs compared to new bugs per week.
H2.1: Reorganization led to a greater share of solved bugs compared to new bugs per week.
Hypothesis 3:
H3.0: Number of developers has no influence on the average time needed to solve bugs on a weekly basis.
H3.1: Number of developers has an influence on the average time needed to solve bugs on a weekly basis.
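As an illustration of the quantity that hypothesis 2 compares, the weekly share of solved bugs relative to new bugs could be computed roughly as in the sketch below. This is a minimal sketch and not the GWN script or data from the study: the function names, the tuple layout and the sample counts are assumptions made only for illustration. The split date is the 30 June 2003 announcement of the new management structure mentioned in section 2.5.

```python
from datetime import date

# Date on which GWN announced the new management structure (section 2.5).
REORGANIZATION = date(2003, 6, 30)


def weekly_share(new_bugs, closed_bugs):
    """Share of solved bugs compared to new bugs for one week."""
    if new_bugs == 0:
        return None  # the share is undefined for a week without new bugs
    return closed_bugs / new_bugs


def mean_share_by_period(weeks):
    """Average the weekly share before and after the reorganization.

    `weeks` is a list of (week_start, new_bugs, closed_bugs) tuples;
    this layout is an assumption of the sketch.
    """
    before, after = [], []
    for week_start, new_bugs, closed_bugs in weeks:
        share = weekly_share(new_bugs, closed_bugs)
        if share is None:
            continue
        (before if week_start < REORGANIZATION else after).append(share)

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(before), mean(after)


# Hypothetical counts, invented only to show the calculation.
sample_weeks = [
    (date(2003, 6, 16), 200, 100),
    (date(2003, 6, 23), 100, 75),
    (date(2003, 6, 30), 200, 150),
    (date(2003, 7, 7), 100, 100),
]
before_mean, after_mean = mean_share_by_period(sample_weeks)
print(before_mean, after_mean)
```

A higher mean after the split would point in the direction of H2.1, but choosing between H2.0 and H2.1 still requires a statistical test on the real weekly data.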
4.4 Variables selection
There are two kinds of variables in an experiment, independent and dependent. The figure below illustrates the information flow in the experiment. Below the figure, the variables are explained and justified.
Figure 9: Illustration of independent and dependent variables.
4.4.1 Independent variables
"An independent variable is a variable in a process that is manipulated and controlled" [Wohlin 2000, p. 33]. We chose "Organization structure" as our independent variable. The project compares data from before and after the reorganization. The structure of the organization should have an effect on the dependent variables because the entire aim of the reorganization was to improve the efficiency of the organization.
4.4.2 Dependent variables
The variables that we want to study to see the effect of the changes in the independent variables are called dependent variables. The effect of the treatments is measured in our dependent variable: "Efficiency". When we talk about efficiency we monitor several aspects of the organization, e.g. time used to solve bugs and Gentoo release cycles. The efficiency is not an exact value and is therefore measured indirectly by looking at different processes.
4.5 Selection of subjects
The automated data gathering will use the entire bug-reporting community as subjects. Therefore we cannot see any difficulties in generalizing this. The data we collect manually will have a far smaller sample pool. However, by examining all subjects within a given time period, it would be possible to generalize this (not flawlessly, though), as the selection gives a partially representative view of the population. The low number of samples will enlarge the errors if generalized. The manual data gathering can be called systematic sampling, as we choose a period of time to sample from, and then sample every n-th period.
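The systematic sampling described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name, the list of days used as sampling frame and the value of n are hypothetical and are not taken from the actual data gathering.

```python
from datetime import date, timedelta


def systematic_sample(periods, n, start=0):
    """Return every n-th period, beginning at index `start`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return periods[start::n]


# Hypothetical sampling frame: ten consecutive days.
first_day = date(2003, 1, 1)
days = [first_day + timedelta(days=i) for i in range(10)]

# Pick every third day from the frame.
sampled = systematic_sample(days, n=3)
print([d.isoformat() for d in sampled])
```

With n = 3 the sketch picks the 1st, 4th, 7th and 10th day of the frame; the same function works for weeks or any other ordered list of periods.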
4.6 Experiment design
To achieve a better understanding of the experiment we’re about to perform, a clear definition must be developed. The type of statistical analysis that we apply later in the project depends, among other factors, on the chosen design. It is therefore an important task to describe the experiment as well as possible and define a design.
When defining the experiment design, the basis is the number of factors and treatments included. A factor is the combination of one or more independent variables that affects the dependent variables. A treatment is one particular value of the factors in the experiment. In our experiment the factor is the organization structure and the treatments are the new and the old structure. Based on this we can determine that our experiment has the “one factor with two treatments” design.
4.6.1 Randomization
The automatic data gathering did not employ any randomization, as it gathered all the available data. When we collected data manually we let the users randomize it for us, and then took the first two bugs reported each day. This might mean that we only get bugs reported by people who are awake at the beginning of each day, but the different time zones should remedy this.
4.7 Instrumentation
Instrumentation is done to provide means for performing the experiment, and to monitor it without affecting the control of the experiment. In our case we needed data that had to be collected in different ways. All the data was available from Bugzilla or GWN, but it had to be accessed differently. We realized that manual data gathering would be too time-consuming in some of the cases and therefore not feasible. As a result we decided to search for ways of gathering the data automatically.
4.7.1 Python scripting
At first we chose not to spend any time learning Python and how to program data-gathering scripts.
On Tuesday, 28 September 2004, we wrote an e-mail to the GWN editor Ulrich Plate, asking which options to select in Bugzilla to recreate the bug data in the newsletters. This mail can be seen in attachment C. He responded by giving us the Python script used by the GWN staff to generate bug data. Aided by our project guide Thomas Østerlie, we were able, at least to some extent, to modify the script to collect the data we wanted. This was not at all planned, but when we received the script we just started to fiddle with it and managed to make it work. We modified it so that it extracted the total number of new bugs in a given period of time (week) and the number of closed bugs in the same time period.
Figure 10: Results from an early version of the modified Bugzilla script.
We also tried to count the total number of currently open bugs on a weekly basis. As shown in figure 10, for "Total open bug reports" between the Bugzilla birth date and a given date, the number of returned bugs is far too low. The last part, "Original total opened", counts the total number of open bugs up to today. This is the query we tried to modify, but we couldn’t get the numbers to match. Initially we thought that this was caused by the fact that the script was run at random hours of the day: if the script was run early, bugs would still be coming in until 24:00. These bugs would not be counted in the numbers stated in GWN. However, when we tried to recreate the GWN data at later dates, the query would find all the bugs from the whole week INCLUDING the ones GWN didn’t have.
Later during this process, as we discussed our problems with Thomas, we discovered that it might not be possible to get the bugs whose status was changed, because the query did not look for status changes during the chosen time period; it checked the CURRENT status of the bugs found in the past (at the specified date). This would mean that it wasn’t possible to find these data unless the query was done in real time during the actual week in question.
More specifically, it would have to be done at the exact time the GWN crew ran the script. If this was correct, it meant that the script wasn't able to do this search. Then we would have to do this manually. Provided that we are able to find this data, checking the numbers for a year or two should be manageable.
Our correspondence with the GWN crew has left us with the impression that having these data accurate isn’t a big priority. They don’t seem to mind whether 7045 or 7050 unsolved bugs are reported, and they can’t really be blamed. On the contrary, we think it is brave to publicly announce Gentoo's inability to solve bugs fast enough, and frankly a little odd; we wouldn’t exactly call it good advertising when GWN reports the growing number of unsolved bugs on a weekly basis.
4.8 Validity evaluation
One important issue appears during the experiment planning, and that is the validity evaluation of the results. This task has to be done during the planning phase to ensure valid experiment results. Without the validity evaluation, one might end up with results that are not valid for the population from which the sample is drawn. In the past, different types of threats to the validity of an experiment have been suggested. In “Experimentation in Software Engineering: An Introduction” [Wohlin et al., 2000] four types of threats are presented. These threats are mapped to different steps of the experiment, as shown in figure 11 below.
Figure 11: Experiment principles [Wohlin 2000]
The figure presents the two areas of an experiment: the theory area and the observation area. In the theory area, the hypotheses that we want to test are defined based on data from the observation area. This will hopefully make it possible to draw some conclusions. The process of drawing these conclusions is divided into four steps, which are shown in the figure as the numbers 1-4. In each of these steps, one type of threat to the validity of the result is present. According to Wohlin et al. the threats are:
"1. Conclusion validity. This validity is concerned with the relationship between the treatment and the outcome. We want to make sure that there is a statistical relationship, i.e. with a given significance.
2. Internal validity. If a relationship is observed between the treatment and the outcome, we must make sure that it is a causal relationship, and that it is not a result of which we have no control or have not measured. In other words that the treatment causes the outcome.
3. Construct validity. This validity is concerned with the relation between theory and observation. If the relationship between cause and effect is causal, we must ensure two things: 1) that the treatment reflects the construct of the cause well (see left part of the figure) and 2) that the outcome reflects the construct of the effect well (see right part of the figure).
4. External validity. The external validity is concerned with generalisation. If there is a causal relationship between the construct of the cause, and the effect, can the result of the study be generalized outside the scope of our study? Is there a relation between the treatment and the outcome?" [Wohlin et al., 2000, p. 63-64]
In the following part of this section we will present a list of threats to the validity of the experiment. In addition, every threat is evaluated to determine if it might cause any problems in our experiment.
The marking used in the tables is as follows:
+: Threats that we believe will not be of any significance
/: Threats that might have an effect, but with low probability
-: Threats that could affect the result, with significant probability
n/a: Threats which are not applicable for our experiment
4.8.1 Conclusion validity
Threat | Mark
Low statistical power | /
Violated assumption of statistical tests | /
Fishing and the error rate | +
Reliability of measures | +
Reliability of treatment implementation | +
Random irrelevancies in experimental setting | +
Random heterogeneity of subjects | +
Table 3: Conclusion validity
• Low statistical power
The statistical power can be expressed as: Power = P(reject H0 | H0 false) = 1 - P(type-II-error). Based on the design of our experiment determined above, we will most likely perform a t-test. This test gives us the ability to determine the confidence in our statements, and thereby ensure high statistical power. However, we should be aware of this threat.
• Violated assumption of statistical tests
Our datasets will most likely be quite large, so any requirements regarding normal distribution should not be an issue. Other requirements might be violated, so we should be aware of this threat to some extent.
• Fishing and the error rate
Since the persons performing the experiment (i.e. Person, Engene) do not have any connections to the organisation investigated in this project, the probability of fishing for a specific result is low. As long as the confidence intervals of our tests are quite rigid, the threat from the error rate should not be extensive.
• Reliability of measures
In this experiment, the data is based on the number of bugs and the number of developers joining/leaving the organisation in a given period of time. These are objective and direct measures of attributes, which increases the reliability.
• Reliability of treatment implementation
The treatment that we are applying in our experiment, i.e.
the structure of the organisation, is quite simple and should not lead to any differences in the implementations.
• Random irrelevancies in experimental setting
As we are using historical data in our experiment, it is hard to determine if there were any elements outside the normal setting that made an impact on the result. But the data is collected from a wide time period, and any minor irrelevancies should not influence the result in a way that will lead to the wrong conclusions.
• Random heterogeneity of subjects
The subjects that take part in the experiment are the developers/maintainers in Gentoo and the users of the distribution. These subjects are chosen by randomisation and should not pose any threats.
4.8.2 Internal validity
Threat | Mark
History | +
Maturation | /
Testing | +
Instrumentation | +
Statistical regression | +
Selection | +
Mortality | /
Ambiguity about direction of causal influence | /
Interactions with selection | n/a
Diffusion or imitation of treatments | n/a
Compensatory equalization of treatments | n/a
Compensatory rivalry | n/a
Resentful demoralization | n/a
Table 4: Internal validity
• History
The data in this experiment is collected from a wide period of time, so potential historical influences should neutralize each other and not affect the final results.
• Maturation
As the experiment collects data from a long period of time, it is possible that some of the developers/maintainers will get bored or lose motivation for the bug fixing.
• Testing
This experiment is based on historical data. There is no danger that the subjects know about the test and therefore perform differently.
• Instrumentation
The data is collected quantitatively by using queries in Bugzilla, so the experiment should not be affected negatively by badly designed instrumentation.
• Statistical regression
In our experiment all subjects involved are one big group, and their participation is included completely.
This should prevent the influence of regression that might be a problem when the subjects are classified into experimental groups.
• Selection
We have included all the persons involved in the bug reporting/fixing in the Gentoo community, and therefore the effect of selection, i.e. that the selected group is not representative of the whole population, will not be present.
• Mortality
As this experiment collects data from a long period of time, we believe that some of the initial developers and users in general have left the Gentoo community. These people might have had a higher motivation for contributing to the community than the new users/developers have. Another issue is the additional developers and users who have joined the community continually. All this might have an effect on the historical data that can influence the experimental results.
• Ambiguity about direction of causal influence
There might be other factors than the reorganisation that affect the outcome of our experiment. This could violate the validity of our statements. As an example, it might be hard to prove that an effect is caused by the reorganisation and nothing else.
• Interactions with selection
As our experiment doesn’t involve multiple groups, the threat due to different behaviours in different groups is not present.
• Diffusion or imitation of treatments
There exist no control groups in this experiment, and the possible threats connected to diffusion or imitation of treatments are not present. In this experiment the whole population is included, and it is the same group that is evaluated before and after the reorganisation.
• Compensatory equalization of treatments
See Diffusion or imitation of treatments above.
• Compensatory rivalry
See Diffusion or imitation of treatments above.
• Resentful demoralization
See Diffusion or imitation of treatments above.
4.8.3 Construct validity
Threat | Mark
Inadequate preoperational explication of constructs | +
Mono-operation bias | +
Mono-method bias | +
Confounding constructs and levels of constructs | /
Interaction of different treatments | /
Interaction of testing and treatment | +
Restricted generalizability across constructs | n/a
Hypothesis guessing | +
Evaluation apprehension | +
Experimenter expectancies | +
Table 5: Construct validity
• Inadequate preoperational explication of constructs
In the selection of the hypotheses made above, we tried to separate the ambiguous and inadequate hypotheses from the rest. As a result, the hypotheses that are chosen are well formulated and the threats avoided.
• Mono-operation bias
The whole Gentoo community is included in the experiment, therefore any possible threats regarding mono-operation bias are avoided. This is also the case with the objects, as every bug from the start of Bugzilla is inspected.
• Mono-method bias
By measuring bugs, release cycle and the pool of developers, we involve different types of measures and observations that can be cross-checked against each other. This avoids the risks tied to mono-method bias.
• Confounding constructs and levels of constructs
As we’re not detailing all the aspects of the bug handling process, some factors like developer experience aren’t measured. This could influence the result, but hopefully the randomisation of subjects will even this out.
• Interaction of different treatments
The subjects involved in our experiment might be involved in other OSS projects as well. It is therefore possible that this has an influence on our results.
• Interaction of testing and treatment
The subjects involved in the experiment don’t know that they are participating in an experiment, and the data was collected after the reorganisation.
• Restricted generalizability across constructs
There is always the possibility that the reorganization did have some negative effects, but that is outside the project’s scope.
• Hypothesis guessing
People don’t know that they are part of an experiment and therefore they will not base their behaviour on our hypothesis.
• Evaluation apprehension
Again, the subjects don’t know about our experiment and will not fear our evaluation and results.
• Experimenter expectancies
Subjects’ unawareness of the experiment prevents them from biasing the results.
4.8.4 External validity
Interaction of selection and treatment
Interaction of setting and treatment
Interaction of history and treatment
+
Table 6: External validity
• Interaction of selection and treatment
We don’t know if the subjects included in the study are representative of other commercial and non-commercial organisations.
• Interaction of setting and treatment
This is not a toy problem; we use the same tools that all the developers use. Bugzilla is an OSS project that is used by a number of OSS projects. However, Gentoo Linux is an open source project. Generalizing the result to industrial practice in both commercial and non-commercial projects might be difficult.
• Interaction of history and treatment
The experiment is run during a long period of time, therefore the data should be representative.
4.9 Priority among types of validity threats
In this project, some of the scope was given to us in advance. This relates to the choice of organisation and process improvement initiative being studied. The aim of this project is not to generalize our result to industrial practice, but to complete an experiment with emphasis on conclusion, internal and construct validity. As a result, the external validity has suffered.
5 Experiment operation
In this chapter we will detail the operational phase of the experiment.
This section documents the task of carrying out the experiment in accordance with the design defined in the previous chapter. The aim is to give the reader adequate information regarding our execution of the experiment and the validation of the data collected.

Figure 12: Experiment operation [Wohlin et al., 2000]

5.1 Introduction
Even a perfectly designed experiment can go seriously wrong if the operational phase is conducted without sufficient accuracy. As figure 12 shows, the operational phase of an experiment consists of three steps: preparation, execution and data validation. Each of these steps is described in the following sections; the preparation step is covered only in part. Since the only participants in this experiment, in the sense of Wohlin et al., are the authors of this report, some aspects of the preparation do not apply, namely the challenges regarding inducements, deception and obtaining consent from the participants [Wohlin et al., 2000].

5.2 Experiment preparation
During this phase, the last preparations prior to the execution of the experiment were completed. The first hypothesis that we wanted to collect data for was hypothesis 2. As mentioned in the previous section, we used a script to collect this data. The only preparation we did in addition to altering the script was therefore to determine the scope of the data collection. As it did not cost us any extra effort to include data from all the weeks Gentoo has been using Bugzilla, we decided to do this. By including all this data we also hoped that any tendencies would become even more distinct.

The preparations for hypothesis 1 were a bit more extensive. The first issue that we looked into was the scope of the experiment. As the goal of the data gathering was to test our hypotheses and see which of them we could reject, any data exposing a possible influence of the reorganisation would be of interest.
Based on this we decided to include data gathered from the 1st of January 2003 until the 31st of July 2004. This data would hopefully give us a basis for comparing the efficiency of bug handling before and after the reorganisation. The reason why we intended to include data more than a year after the reorganisation was to see whether there had been an instant alteration in the bug handling time that faded away over time, or whether the possible alteration still exists.

The next thing we had to decide was the format in which we should collect the data. As Bugzilla provides a table for every bug called Bug activity, as shown below, we had to come up with some way of categorizing the bug handling time.

Figure 13: Bug activity log [Bugzilla]

After some discussion, a scale with range 1-5 was drawn up. This scale is shown below. By using this scale we hoped that the analysis of the data would be simplified. We also believed that a finer granularity would just cause us more work without any valuable increase in accuracy, since we were only trying to expose tendencies in bug handling time.

Bug handling time      Category
Less than 1 day        1
Less than 3 days       2
Less than 7 days       3
Less than 1 month      4
More than a month      5
Table 8: Scale for categorizing the bug handling time

When the period of time and the granularity were defined, we had to decide the size of the sample. As we had no idea how long it would take to gather the data, a test gathering was carried out. The test indicated that collecting one bug per day for 10 days would take one person approximately 7 minutes. At this time we were not sure whether we should collect one or two bugs per day, so some rough estimates were made to help us in the decision making. The estimates concluded that it would take about 6 hours of effective work for one person to collect one bug per day, and 12 hours to collect two bugs. After some discussion we decided to include two bugs per day in the analysis.
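The rough effort estimate described above can be reproduced from the pilot measurement. The following is a minimal sketch, assuming that the pilot timing (7 minutes for 10 days at one bug per day) scales linearly over the whole planned period; the function name and the linear extrapolation are our own illustration, not the procedure the authors used.

```python
from datetime import date

# Pilot measurement reported in the text: collecting one bug per day
# for 10 days took one person about 7 minutes.
PILOT_DAYS = 10
PILOT_MINUTES = 7

def estimated_hours(start, end, bugs_per_day):
    """Extrapolate the pilot timing linearly to the whole period (assumption)."""
    days = (end - start).days + 1            # count both the first and last day
    minutes_per_bug = PILOT_MINUTES / PILOT_DAYS
    return days * bugs_per_day * minutes_per_bug / 60

start, end = date(2003, 1, 1), date(2004, 7, 31)
for n in (1, 2):
    print(n, "bug(s) per day:", round(estimated_hours(start, end, n), 1), "hours")
```

Under these assumptions the estimate lands slightly above the rounded figures quoted in the text, which is consistent with them being rough estimates.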
The decision to collect two bugs per day was based on the fact that 2 bugs would give us a more solid basis for the statistical test. We also decided to make a minor alteration to our scale for bug handling time after the test. It seemed that about 50 percent of the bugs fell into category 5, so we added one more category and ended up with the scale shown below.

Bug handling time      Category
Less than 1 day        1
Less than 3 days       2
Less than 7 days       3
Less than 1 month      4
Less than 3 months     5
More than 3 months     6
Table 9: Final scale for categorizing the bug handling time

As every bug in Bugzilla is given a priority from 1 to 5, we decided to include this when we collected the data. Our motivation for this was to see if there were any significant differences in the bug handling time between bugs with high and low priority.

The last hypothesis we made preparations for was hypothesis 3. To test this hypothesis we needed data showing the evolution of the number of developers in the Gentoo community. In addition, information about the amount of time used to solve a bug was needed. To get the first data we decided to use the Gentoo Weekly Newsletter (GWN) and its section “Moves, Adds and Changes”. The reason for this was that it would be difficult to find this information using the script without major alterations. As we are not particularly experienced in writing scripts, this alteration would have been too difficult for us. Every week GWN presents the previous week's changes in the developer list. An example of this information is shown in the figure below. To gather this information we manually went through all the released newsletters and registered the adds and moves.

Figure 14: “Moves, Adds and Changes” from GWN published 30th June 2003

All the tasks during the preparation phase were accomplished while both Anders and Knut Steinar were present.
We believed that this would reduce the risk of misunderstandings regarding the procedures for the data gathering, and illuminate as many aspects and challenges of the following tasks as possible.

5.3 Experiment execution
After the hypotheses had been created, we refined them so that they depended on data that could be gathered within the scope of this project. Q1 did not directly need any distinct data, as it would be answered based on the conclusions from the hypotheses.

5.3.1 Data collection
In the following section, the data collection for each hypothesis is discussed.

Hypothesis 1
The hypotheses were much more specific in their demands than Q1. To answer the first one we needed to find out how much time was used to solve individual bugs. This information could be found in Bugzilla, but it required quite a few clicks and some scrolling to find the time used on each bug. We decided to do it manually; this is discussed in a previous section. To ensure sufficient validity and accuracy in the data pool, quite a few bugs had to be examined. When discussing how many bugs we wanted to analyze, we considered several aspects. We did not want to spend too much time on this single hypothesis because we felt time was running short. Yet we needed quite a few bugs to make the data representative and generalizable. The process of deciding the research pattern and guidelines is discussed above in the experiment preparation section. The result was that we chose to gather 2 bugs for every day from 01-01-2003 to 18-07-2004. In order to catch up, we decided to sacrifice a weekend. In the end we had examined 1130 bugs. At first we also inspected bugs reported in August, but then we realised that none of these bugs could be in category 6 (more than 3 months old), as there were obviously less than 3 months from August to October. Therefore we chose not to include any bugs found later than 18-07-2004.

Hypothesis 2
Data for the second hypothesis was collected with a Python script.
We obtained this script by chance, as one of the GWN staff sent it to us when we asked him some questions. The script queried Bugzilla and returned the results. We modified it to collect the number of new and closed bugs every week from 04-01-2002 to the current date, and to write the data to an Excel file. The fact that we could use a script to gather this data let us use all the data available, and not only a selection.

Hypothesis 3
In order to come to a conclusion on this hypothesis we calculated how many developers there were in a given week. Then we compared this number with the average time used to solve bugs on a weekly basis. The only place we found data about the number of developers was GWN. This restricted the scope of the experiment, as its duration would run from the date GWN started until fall 2004. The bug-solving data had been collected for the first hypothesis and only required us to calculate the weekly average bug-solving time. The data from GWN was gathered manually, by downloading all the GWN editions, reading them and copying the developer data to an Excel worksheet.

5.3.2 Different methods
The first hypothesis was not very well suited for automatic data gathering. Or perhaps it was, but not for someone with our lack of Python experience. Each bug required several clicks and some scrolling, and then a table had to be interpreted. We did not believe that we could create such a script given the time limits that were upon us. Gathering the data manually was quite troublesome: we spent about 12 hours staring at the screen and copying data from the screen to our worksheet. Initially we wanted to inspect 500 bugs before the reorganization and 500 after. We chose to check 1000 bugs because it was quite a lot of bugs, and although the task of manually inspecting them bordered on madness, we felt that it was manageable. Manual data gathering is very prone to errors.
This, along with the small pool of inspected bugs, is probably the greatest threat to validity. The second hypothesis, however, enabled us to gather all the data automatically. This was of great help, as we did not have to select some weeks to extract samples from. In 60 minutes the script inspected and saved all the bugs from the day Bugzilla was implemented to the current date. We did spend a couple of days editing the script, but the results were worth it. When a script does the data gathering it does not err, provided it has been configured correctly; if it is not properly configured, it will only create bad data.

5.4 Data validation
This section deals with the degree of validity of the different data sources. The purpose of the validation is to ensure that the data is reasonable and that it has been collected correctly.

5.4.1 Data source integrity
During the project we have collected several types of data: number of developers, open bugs, closed bugs and time needed to solve bugs. We believe that the integrity of the data is good, as it comes from GWN and Bugzilla. Although both sources are vulnerable to human errors, we believe that GWN takes pride in not misleading its audience and the developers. The weekly numbers of open and closed bugs are published in GWN. We could not base our data gathering solely on these numbers, as GWN did not start until 23-12-2002. Therefore we used the script to gather data for all of 2003 and up to the current date in 2004. To verify that the script was correct, we compared the published results from GWN to the data that our script produced. The numbers were identical from the start of GWN until the current date; we therefore presume that the script also has correct data for the period before GWN.

5.4.2 Bugzilla
Bugzilla is a database with a web interface where Gentoo users can report bugs. We had to assume that the reported bugs were valid, as we had neither the time nor the knowledge to test this ourselves.
But the fact that developers use Bugzilla as a tool supports our assumption. Still, it is likely that these manual bug reports contain errors such as incomplete descriptions of bugs and misunderstandings. There is also a high probability that many identical bugs are reported multiple times by different users. To prevent this there are guidelines that are supposed to help users describe the bugs correctly. Developers then manually compare bugs and close duplicates. When developers solve bugs they report how much time was used; this data we collected manually. We had to assume that the hours and days reported were correct, as we had no way of querying the developers about time usage on bugs fixed months or years ago.

5.4.3 Manual bug inspection
Our own manual research is also vulnerable to errors. The manual data gathering was very repetitive and, frankly, quite boring. The fact that we completed the gathering during a weekend and in long sessions lowered morale and enthusiasm. We were therefore more susceptible to making errors, as we were tired. However, the data-gathering procedure itself did not leave much room for errors. We increased the date by 1 day, pressed search, chose a bug, scrolled to a link and clicked it, and then read the contents of the time table. The chance of reading the same bug twice is very small, as its link would have changed color if it had been inspected earlier. The manual interpretation of the time table was another possible threat to the validity. We calculated the difference between the start and end dates and gave the bug a number from our scale. The calculation was done in our heads, and although it is fairly easy mathematics, it is still a possible source of error. The GWN inspection for the third hypothesis did not leave much room for errors. The GWN pages were loaded and the developer data copied and pasted into an Excel sheet.
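The mental date arithmetic described for the manual bug inspection (difference between start and end date, mapped onto the final scale of Table 9) can be sketched as a small function. This is only an illustration: the function name is hypothetical, and treating one month as 30 days and three months as 90 days is our assumption, since the report does not state how month boundaries were judged.

```python
from bisect import bisect_right
from datetime import date

# Upper bounds (in days) of categories 1-5 on the final scale (Table 9).
# A month is taken as 30 days here; this is an assumption of the sketch.
UPPER_BOUNDS_DAYS = [1, 3, 7, 30, 90]

def handling_category(opened, closed):
    """Map the time between two dates onto the 1-6 handling-time scale."""
    days = (closed - opened).days
    if days < 0:
        raise ValueError("closed date precedes opened date")
    # bisect_right counts how many upper bounds are <= days.
    return bisect_right(UPPER_BOUNDS_DAYS, days) + 1

print(handling_category(date(2003, 1, 1), date(2003, 1, 1)))    # closed the same day
print(handling_category(date(2003, 1, 1), date(2003, 1, 6)))    # closed after 5 days
print(handling_category(date(2003, 1, 1), date(2003, 4, 15)))   # closed after 104 days
```

A fixed rule of this kind removes the mental calculation as an error source, but it cannot settle which dates in the Bug activity table should count as start and end.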
If there are any errors in the GWN data, they might come from the GWN editors or the tool they use to gather their data. The calculations of the average time needed to solve bugs were done in Excel, with one formula applied to every week. So if the first calculation was wrong, then all of them are. But we have double- and triple-checked the calculation and have not found any errors.

5.4.4 The participants
In this case the participants are the people submitting the bugs, and there is no easy way of knowing how well they understand the Bugzilla interface. But the fact that Bugzilla is doing well as a development tool suggests that most users know how to use it reasonably well. The seriousness of the participants cannot be checked either, but we have found no indications of false reports, and feel forced to assume that most of the bugs are real.

5.4.5 Information included in the collected data
In the experiment we did not include information regarding who did the bug solving. In other words, we did not check whether most of the bugs were solved by a specific group of developers or by other contributors. This might have had an effect on our conclusions. The number of developers might not be the deciding factor when looking at the rate at which bugs are solved. There is a possibility that developers remain in the lists long after they become inactive. As a result, we cannot be certain whether the number of developers is a problem regarding the efficiency of Gentoo.

5.4.6 Possible improvements
In retrospect there are some things we would have done differently. When gathering data manually we did not really have any clear rules on which bugs to count and how to interpret them. This might have caused some deviation, because Knut Steinar sometimes used the search day as the start date for bug treatment, while Anders exclusively used data from the table. As Knut Steinar gathered data both before and after the reorganization, we believe that any deviations will even out.
However, this method differs slightly from Anders' research in 2004. We do not think it will have a large impact, because we look for tendencies and do not use the data directly. This was probably only an issue for 10% of the bugs, and the assigned scale number seldom deviated much. The scale we made was created in about 10 minutes and evaluated in 5; although it seems to work well, it could have been more thoroughly thought through. If possible, the manual bug gathering would be done at a time when one of the project members had not broken his right arm.

6 Analysis and interpretation
In order to draw valid conclusions we must interpret the experiment data. The interpretations have been carried out as shown in the figure below. The aim of this chapter is to present the hypotheses and whether or not they should be rejected.

Figure 15: Analysis and interpretation

6.1 Descriptive statistics
After the data gathering was completed, analyzing the data was next on the schedule. As mentioned above, each hypothesis required different data. To get a feeling for how the different data sets were distributed, a preliminary phase to the hypothesis testing was carried out. In this phase we tried to visualize central tendencies to better understand the nature of the data.

6.1.1 Hypothesis 1
The formulation of this hypothesis is as follows:
H1.0: The reorganizing did not lead to a decrease in the average time used to solve bugs per week.
H1.1: The reorganizing led to a decrease in the average time used to solve bugs per week.
The foundation for the testing of this hypothesis was the handling time of more than 1100 bugs in the determined period of time. These data can be found in attachment D. Initially we visualized the data by making a diagram that showed all the bugs with their handling time. This diagram is shown in figure 16.
Figure 16: Diagram that illustrates the handling time for each of the inspected bugs, 01.01.2003-19.07.2004

The scale on the y-axis is identical to the scale that we introduced prior to the data gathering, and the x-axis indicates the number of the bug. This diagram is hard to interpret and makes it difficult to see any tendencies. Based on this, we decided to calculate the average handling time for each of the periods before and after the reorganization. The initial intention for our project was to evaluate the change in efficiency in the organization as a result of the reorganization. We assumed that analyzing the data prior to and after this initiative collectively would expose some of the central tendencies. Based on the calculations we made, the diagram in figure 17 was generated.

Figure 17: Average handling time (1: prior to reorganization, 2: after reorganization, 3: whole period)

The scale on the y-axis is based on the scale that we introduced prior to the data gathering, only with finer granularity. We found that this diagram clearly exposed a difference in the handling time for the two periods. The handling time after the reorganization seems to have decreased by about 20% compared to the period prior to the initiative. This result made us curious, so we decided to see if this decrease was distributed evenly across the categories of the scale that we had introduced. By doing this, we hoped to find out if there was a certain category of bugs that had been altered. The results of this calculation are shown in table 10.
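As a cross-check, the shares in Table 10 and the decrease of roughly 20% can be recomputed from the per-category bug counts that the table reports. The sketch below pools all bugs in each period, which is an assumption on our part: the report's own averages were computed week by week in Excel, so the pooled means may differ slightly from the figures used elsewhere in this chapter.

```python
# Bug counts per handling-time category (1-6), as reported in Table 10.
prior = [90, 20, 17, 53, 63, 107]
after = [293, 67, 62, 138, 111, 109]

def shares(counts):
    """Fraction of all bugs that falls into each category."""
    total = sum(counts)
    return [c / total for c in counts]

def mean_category(counts):
    """Average category number, weighting each category by its bug count."""
    weighted = sum(cat * c for cat, c in enumerate(counts, start=1))
    return weighted / sum(counts)

m_prior, m_after = mean_category(prior), mean_category(after)
print([round(s, 3) for s in shares(prior)])
print([round(s, 3) for s in shares(after)])
print(round(m_prior, 2), round(m_after, 2))
print(f"relative decrease: {(m_prior - m_after) / m_prior:.1%}")
```

Because the scale is ordinal, the mean category is only a rough indicator; the shares per category are the more direct comparison.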
            Prior to reorganization        After reorganization
Category    Number of bugs    Share        Number of bugs    Share
1           90                0.257        293               0.376
2           20                0.057        67                0.086
3           17                0.049        62                0.079
4           53                0.151        138               0.177
5           63                0.180        111               0.142
6           107               0.306        109               0.140
Table 10: Distribution of the bugs within each category

In addition to the table above, we displayed the same data in figure 18 shown below. There is a distinct change in the share of bugs in category 6, i.e. with a handling time of more than 3 months. Before the reorganisation about 30 percent of all the bugs fell into this category, while the share decreased to less than 15 percent after the reorganisation. There is also a change in the share of bugs in category 1, i.e. bugs with a handling time of less than one day. Prior to the reorganization about 25 percent of the bugs were fixed in less than a day, but this share increased to more than 37 percent after the initiative.

Figure 18: Comparison of the bug handling time (x-axis: handling time category; y-axis: share of bugs; series: prior to and after the reorganisation)

The hypothesis concerns the average handling time, so we also needed some plots regarding this. We decided to plot the average handling time on a weekly basis. This was done based on the average handling time that we calculated in Excel. In other words, the sample size per week was 14 bugs. The plot is shown in figure 19.

Figure 19: The diagram plots the average handling time per bug on a weekly basis. The x-axis shows the week numbers in 2003/2004, starting at 01.01.2003. The y-axis is the scale we created to tag the bug handling times.

This diagram shows that the average handling time has decreased after the reorganisation. But it is also important that the reader notices that this tendency might have started even before the reorganisation.
Unfortunately we did not collect data prior to 2003, but we hope that the statistical test will give us some answers in this matter.

6.1.2 Hypothesis 2
In order to partially answer the defined project question, the following hypothesis was examined.
H2.0: Reorganization did not lead to a greater share of solved bugs compared to new bugs per week.
H2.1: Reorganization led to a greater share of solved bugs compared to new bugs per week.
To decide whether or not this hypothesis should be rejected, we collected data on new and closed bugs. Then we examined the numbers before and after the reorganization. The picture below shows how the numbers of new and closed bugs develop on a weekly basis.

Figure 20: The picture shows the development in reported and closed bugs on a weekly basis. The x-axis is weeks, and the y-axis is the number of incidents.

6.1.2.1 Before the reorganization
The graphs in figure 20 show that prior to the reorganization the numbers of new and solved bugs follow each other quite well. This means that the developers were able to solve about the same number of bugs as the users found. This looks like a sign of a healthy organization that manages its challenges well. There are some large spikes that deviate from the rest of the graph. They were not caused by BugDay efforts, as the BugDay phenomenon did not start until August 2003. There seems to be a connection between releases and these spikes. Release candidates 1-3 came shortly after each of the first 3 peaks. This indicates that developers made "all-out efforts" and solved a lot of bugs so that the candidates could be released to the public. There are also negative peaks in holidays like Christmas.

Figure 21: The picture shows the number of closed bugs divided by the number of new bugs.

Figure 21 supports this trend. Before the reorganization the graph has several spikes above 1, meaning that the developers solve more bugs than the users report.
However, the process of solving bugs seems to be quite disorderly, as the graph has big fluctuations.

6.1.2.2 After the reorganization
The reorganization took place in week 26 of 2003; this is marked by the line. From this point on, the two graphs in figure 20 seem to deviate from each other. More bugs are found than are solved. A pool of unsolved bugs arises, and according to picture 19 it keeps growing. We discovered that the release cycle increased after the reorganization. According to the pre-reorganization results, this should cause an increase in solved bugs. Yet we see the opposite. In figure 21 the ratio of solved to new bugs becomes more stable. This might be good for future strategies, as the workload gets easier to predict. However, it stabilizes below 1, meaning that the Gentoo developers are not able to solve all the reported bugs. This implies that the number of unsolved bugs rises.

6.1.2.3 Total number of open bugs
Although we have not done any extensive research on this topic, we have expressed some concern about the number of unsolved bugs. The diagram below shows how the number of open bugs has almost quadrupled in less than 2 years. However, there is a variable that we have not measured. If Gentoo Linux is measured in lines of code (LOC) and this number has increased, then the number of unsolved bugs might have increased proportionally. This would mean that the ratio between LOC and unsolved bugs might be stable. And as the organization grows and gains manpower, the increased pool of unsolved bugs might not be a problem after all.

Figure 22: Number of open bugs each week (x-axis: date, 06.01.2003 to 06.09.2004; y-axis: total number of open bugs, 0 to 8000)

6.1.2.4 Plotting solved vs. new bugs per week
In figure 23 we have plotted the ratio between solved and new bugs per week.
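The quantity plotted here is simply the number of closed bugs divided by the number of new bugs for each week; a value above 1 means the backlog shrinks that week, a value below 1 means it grows. A minimal sketch follows, using made-up weekly counts purely for illustration (they are not Gentoo data) and a hypothetical helper name.

```python
def solved_new_ratio(closed, new):
    """Weekly ratio of closed to new bugs; None for weeks without new bugs."""
    return [c / n if n else None for c, n in zip(closed, new)]

# Hypothetical weekly counts, for illustration only (not Gentoo data).
new_bugs = [120, 150, 100, 200]
closed_bugs = [130, 120, 100, 150]

for week, ratio in enumerate(solved_new_ratio(closed_bugs, new_bugs), start=1):
    if ratio is None:
        continue                      # ratio undefined when nothing was reported
    trend = "backlog shrinks" if ratio > 1 else "backlog holds or grows"
    print(week, round(ratio, 2), trend)
```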
The plot in figure 23 confirms our earlier assumptions and clearly shows how the workload spikes before the reorganization. This might indicate that the Gentoo project was less streamlined and depended on all-out working sprees. That is not an ideal situation, as Gentoo would be very vulnerable if some developers were suddenly unable to commit to these intensive bug fixing sessions before a release. The plot after the reorganization is very different. The points are far more concentrated, indicating a more continuous working rhythm. We believe this is preferable, as the workload becomes more predictable.

An interesting observation is that, according to the plot, fewer bugs seem to be solved after the reorganization. Although some spikes before the reorganization are far above the rest, the central mass of the plot before the reorganization is higher than that of the plot after. This means that the ratio between solved and new bugs generally decreased after the reorganization.

Figure 23: Comparison of solved vs new bugs per week (x-axis: 1: before reorganization, 2: after reorganization; y-axis: ratio of solved to new bugs, 0 to 2.5)

6.1.3 Hypothesis 3
The last hypothesis has the following formulation:
H3.0: Number of developers has no influence on the average time needed to solve bugs on a weekly basis.
H3.1: Number of developers has an influence on the average time needed to solve bugs on a weekly basis.
This hypothesis needed data regarding the number of developers in Gentoo at a given time, together with the number of solved bugs in the same period. The first thing we did was to create a diagram that would expose the evolution of the number of developers in Gentoo. This was done in Excel and the diagram is shown below. The data are detailed in attachment E. According to the diagram we can claim that there has been a virtually linear increase in the number of developers.
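The weekly developer count behind this diagram was derived from the changes GWN reports each week. One way to turn such weekly changes into a running head count is sketched below; the starting count and the weekly numbers are invented for illustration, and treating the recorded changes as plain "additions" and "departures" is our simplifying assumption about how the GWN section was tallied.

```python
from itertools import accumulate

def developers_per_week(initial, additions, departures):
    """Running developer count from weekly additions and departures."""
    net = [a - d for a, d in zip(additions, departures)]
    # accumulate(..., initial=...) also yields the starting value; drop it.
    return list(accumulate(net, initial=initial))[1:]

# Hypothetical numbers, for illustration only (not taken from GWN).
print(developers_per_week(100, additions=[3, 0, 5, 2], departures=[1, 0, 0, 2]))
```

A count built this way inherits the weakness noted in section 5.4.5: developers who become inactive without being removed from the list are still counted.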
Figure 24: The evolution of the number of developers from 01.01.2003 (x-axis: week; y-axis: number of developers, 0 to 300)

The next step in our process of visualizing the tendencies of the data distribution was to take a closer look at the number of bugs per week. We assumed that the number of new bugs registered per week would have an influence on the average bug handling time. To get better control over this issue, we made the diagram shown in figure 25. This diagram is based on the data we got when we divided the number of new bugs per week by the number of developers for the same week.

Figure 25: New bugs per developer per week

It is a bit hard to interpret this diagram and determine the tendencies. As we do not have any data prior to the 1st of January 2003, it is impossible to tell if there has been a tendency towards a decrease in the number of new bugs per week. The peak in the first week of the diagram might be an outlier and should therefore not be emphasized. But it seems that there has not been any major change in the number of new bugs per developer.

The next thing we did was to examine the average number of bugs each developer solved per week. This was done to see if there had been a change in the contribution of the registered developers. The plot of the examination is shown in figure 26. As the reader will notice, the plot fluctuates quite a lot, both prior to and after the reorganization. This is most likely due to the fluctuation in the number of solved bugs per week. But there seems to be a tendency that each developer solves fewer bugs per week now than before.

Figure 26: Solved bugs per developer per week

To end our preliminary evaluation of the data gathered for hypothesis 3, we took a closer look at the diagram in figure 27.
This diagram shows the weekly average handling time that we produced in the previous section. As mentioned above, it shows that there has been a decrease in the average handling time. When we compare this diagram with the diagram in figure 24, it is possible to see that the two plots appear to be inversely related. But we need a statistical test to determine whether there is a correlation between the data.

Figure 27: The diagram plots the average handling time per bug on a weekly basis. The x-axis shows the week numbers in 2003/2004, starting at 01.01.2003. The y-axis is the scale we created to tag the bug handling times.

6.2 Data set reduction
We had to remove some of the data because we felt they were invalid. The manual bug research originally stretched from 01-01-2003 to 30-08-2004. However, as our scale ranged from “1 day” to “more than 3 months”, any bugs found in August 2004 could not be older than 3 months, as the project itself was done in mid-October. Therefore we removed all bug data after 18-07-2004. This reduced our data pool by about 90 bugs to 1130. We do not believe this has a significant impact on the results. There are some outliers. We have chosen not to remove them, because we found out why they were there, and we felt that they were an important part of the data.

6.3 Hypothesis testing
Hypothesis testing is done to see if it is possible to reject a certain null hypothesis based on a sample from some statistical distribution. We have formulated the null hypotheses negatively, and the intention of the test is to find out if there is any foundation for rejecting these hypotheses. If the null hypothesis is not rejected, nothing can be said about the outcome, while if it is rejected, it can be stated that the hypothesis is false with a given significance.
[Wohlin et al., 2000]

6.3.1 Hypothesis 1
To be able to determine if there is a significant decrease in the average handling time after the reorganisation, an appropriate statistical test must be carried out. The design of our experiment is one factor with two treatments. This gives us some alternatives when it comes to the choice of statistical test. According to Wohlin et al. [Wohlin et al., 2000] there are 4 different tests for this design: t-Test, F-test, Mann-Whitney and Chi-2 test. While the first two are parametric tests, the latter two are non-parametric tests. The parametric tests require that the parameters involved in the model are normally distributed. They also require that the parameters can be measured at least on an interval scale. The parameters involved in our model can be considered approximately normally distributed, given the large number of observations we included. So the first requirement for a parametric test should not be a problem. When it comes to the second requirement, we might get into some trouble. The scale that we introduced for the handling time does not satisfy the criteria for an interval scale [Wohlin et al., 2000]. This violates the second requirement, and we should therefore choose a non-parametric test. But after some discussion with Professor Stålhane, we chose to use the t-Test after all. This was based on the assumption that our scale would not influence the final result in any significant way. More regarding this issue can be found in several papers [T. Dybå, 2001; D. Davis, 1996]. In any case, we should include this in the validation of the test.

6.3.1.1 t-Test
Executing the t-Test is quite straightforward in Excel. First we had to choose which type of t-Test we would like to carry out. The safest choice is to assume that the two datasets have different variances, so we picked that test. Then we had to choose the datasets that we wished to include in the test.
This was done by dividing the average handling time data into two sets: one including the data prior to the reorganization, and the other including the data after. After the definition of the datasets, we chose an alpha value equal to 0,05. The outcome of the test is presented below in table 11. Variable 1 is prior to the reorganization and variable 2 is after.

t-Test: Two-Sample Assuming Unequal Variances

                               Variable 1    Variable 2
Mean                           3,83516484    3,03792208
Variance                       0,27500785    0,42009991
Observations                   26            55
Hypothesized Mean Difference   0
Df                             60
t Stat                         5,90705704
P(T<=t) one-tail               8,7783E-08
t Critical one-tail            1,67064854
P(T<=t) two-tail               1,7557E-07
t Critical two-tail            2,00029717
Standard deviation:            0,52441191    0,64815115

Table 11: t-Test: Two-Sample Assuming Unequal Variances

6.3.1.2 Result interpretation

The first line of the outcome table is the calculated mean of each dataset. Excel also calculates the variance in the sets, and this is stated in the next line. Observations are the number of observations in each set. In this scenario we have hypothesized that the two means are equal, and the hypothesized mean difference is therefore presented as 0. Df states the degrees of freedom.

Since we are only interested in a potential decrease in the average handling time, the one-tail figures are used. In this case, there is almost 100% probability that we will observe a "t Stat" as small as the one in the table, provided that the mean of variable 1 (prior to the reorganisation) is indeed larger than the mean of variable 2 (after the reorganisation). "t Critical one-tail" is the smallest value "t" can take without violating the 95% certainty of the mean of variable 1 being larger than the mean of variable 2.

An alternative way to interpret the data in table 11 is to compare the means of the datasets with the standard deviations. We found the standard deviations by calculating the square root of the variance given in the table. Those values are presented at the bottom of the table.
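The "t Stat" and "Df" rows of table 11 can be re-derived from the mean, variance and observation rows alone. The sketch below assumes that Excel's "Two-Sample Assuming Unequal Variances" option corresponds to Welch's t-test with Welch-Satterthwaite degrees of freedom; the function name `welch_t` is ours, and the decimal commas of the table are written as decimal points.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from summary statistics (mean, sample variance, count)."""
    v1 = var1 / n1  # squared standard error of mean 1
    v2 = var2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary rows of table 11 (variable 1 = prior, variable 2 = after).
t_stat, df = welch_t(3.83516484, 0.27500785, 26,
                     3.03792208, 0.42009991, 55)
print(f"t = {t_stat:.3f}, df = {df:.1f}")  # t is about 5.907, df about 59.7
```

The unrounded degrees of freedom come out just below 60, which agrees with the whole number 60 listed in the table.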
As the difference between the means is larger than a standard deviation, we can claim that there is likely a difference.

6.3.1.3 Linear regression

As mentioned in the section with the descriptive statistics, the tendency of a decrease in the handling time might have started prior to the reorganization. Figure 27 shows the plot of the average handling time, and it is quite clear that there is a decrease, all the way from the first observation. One way of investigating this phenomenon is to execute a linear regression test. This test will give us the slope and the intercept of the least-squares regression line. This is the line that gives the "best fit", i.e. where the sum of the squares of the differences to all data points has the smallest possible value. With these figures it is possible to get the equation of the regression line and display it.

             df    SS            MS           F            Significance F
Regression   1     0,147708006   0,14770801   0,52970693   0,47407484
Residual     23    6,413516484   0,27884854
Total        24    6,56122449

               Coefficients    Standard Error   t Stat        P-value      Lower 95%     Upper 95%
Intercept      3,995714286     0,217725176      18,3521004    3,1239E-15   3,54531606    4,44611251
X Variable 1   -0,010659341    0,01464578       -0,72780968   0,47407484   -0,0409564    0,01963772

Table 12: Output of the linear regression test prior to the reorganization.

6.3.1.4 Result interpretation

This test is constructed to find dependencies between variables, and is described in detail in the next section. We will now focus on the way leading to the regression line equations. The table above is an excerpt from the output of the regression test executed on the data observations prior to the reorganization. The two most important numbers for now are the coefficients of "Intercept" and "X Variable 1". These two numbers are the intercept and slope of the least-squares regression line.
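For readers who want to see what lies behind the "Intercept" and "X Variable 1" coefficients, the following is a minimal sketch of the closed-form least-squares fit for a single explanatory variable. It is not the Excel implementation; the function name and the sample values are hypothetical and are not taken from the project data.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the line that minimises the sum of
    squared vertical distances to the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical weekly averages on the handling-time scale (illustration only).
weeks = [1, 2, 3, 4, 5]
avg_handling = [4.0, 3.9, 3.9, 3.8, 3.9]
slope, intercept = least_squares_line(weeks, avg_handling)
print(f"y = {slope:.4f}*x + {intercept:.4f}")
```

A negative slope corresponds to a handling time that decreases from week to week.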
Based on the intercept and slope in table 12 we arrive at the following equation for the regression line:

y = -0,01066·x + 3,99571

Then we did the same task one more time with the data from the period after the reorganization. The output can be seen in table 13.

             df    SS            MS           F            Significance F
Regression   1     4,328375089   4,32837509   12,6908908   0,00077701
Residual     54    18,41732462   0,34106157
Total        55    22,74569971

               Coefficients    Standard Error   t Stat        P-value      Lower 95%      Upper 95%
Intercept      3,532560297     0,158195911      22,3302883    6,0796E-29   3,215396317    3,84972428
X Variable 1   -0,017200469    0,0048283        -3,56242765   0,00077701   -0,026880635   -0,0075203

Table 13: Output of the linear regression test after the reorganization

This gave us the following equation:

y = -0,01720·x + 3,53256

The next step in our investigation of the regression lines was to display the two lines and compare them. To be able to plot the lines we had to construct some points. This was done in Excel, and the outcome was the two lines shown in figure 28.

Figure 28: Comparing linear regression (regression line prior to the reorganisation and regression line after the reorganisation)

Based on this diagram it is tempting to claim that the average handling time is decreasing at a higher rate after the reorganization. But when we take a closer look at the confidence intervals of the two models, a problem appears. The confidence interval of the slope for the dataset prior to the reorganization totally surrounds the interval for the dataset after the reorganization. This means that we can not statistically claim that the handling time after the reorganization decreases at a higher rate than before.

6.3.1.5 t-Test II

To determine if there is a change in the slope of the two regression lines we perform a t-test. Since we have the slopes and their standard deviations we are able to perform this test. We define

H0: µ1 = µ2 → µd = 0 and H1: µd ≠ 0

We test for the extreme value when µ1 = µ2. The values needed are gathered from tables 12 and 13, and the test results in T = 2,1808:

$$T = \frac{(-0{,}010659341 + 0{,}017200469) - 0}{\sqrt{0{,}014645780^2/25 + 0{,}004828299^2/56}} = 2{,}1808$$

In addition we need to calculate the degrees of freedom. In this case the degrees of freedom are 28.

We now have the figures we need to test the hypothesis. The table of critical values of the t-distribution indicates that we can say with more than 97,5 % probability that the values are different. This means that we can reject H0 with the same probability and claim that the average handling time after the reorganization decreases at a higher rate than before.

6.3.1.6 Brief summary

The first t-Test indicated that the average bug handling time had decreased after the reorganization. But since the decrease had been a tendency prior to the reorganization as well, we had to compare the rate of decrease for the two datasets. This was done with linear regression, which gave us the slope of both regression lines. However, from the regression output alone we were not able to say if the slopes of the two regression lines were statistically different. To check if there had been a break in the total regression line at the reorganization we did another t-Test. Based on this test we can statistically claim with more than 97% confidence that the average handling time after the reorganization decreases at a higher rate than prior to the improvement process. This means that we can reject H1.0.

6.3.2 Hypothesis 2

We chose the t-Test for this hypothesis as well, because we wanted to compare the mean share of solved bugs before and after the reorganization. The t-Test is a good choice when we have two data sets. In this test the two data sets are closed bugs divided by new bugs per week, before and after the reorganization. The large number of observations lets us assume that the data is normally distributed.
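The two data sets described above can be sketched as follows: each week contributes one ratio of closed to new bugs, and each period is then summarised by its mean, sample variance and number of observations, which are the quantities the t-Test works with. The weekly counts below are hypothetical and only illustrate the bookkeeping; the function names are ours.

```python
from statistics import mean, variance

def weekly_ratios(closed_per_week, new_per_week):
    """Closed bugs divided by new bugs for each week.
    Weeks without any new bugs are skipped to avoid division by zero."""
    return [c / n for c, n in zip(closed_per_week, new_per_week) if n > 0]

def summarize(ratios):
    """Mean, sample variance and number of observations of one period."""
    return mean(ratios), variance(ratios), len(ratios)

# Hypothetical weekly counts, NOT the Bugzilla data used in this report.
closed_before, new_before = [45, 50, 38, 60], [50, 52, 48, 62]
closed_after, new_after = [70, 66, 81, 75], [100, 98, 120, 110]

before = summarize(weekly_ratios(closed_before, new_before))
after = summarize(weekly_ratios(closed_after, new_after))
print("before:", before)
print("after: ", after)
```

A ratio below 1 means that fewer bugs were closed than reported in that week, so the backlog of unsolved bugs grows.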
6.3.2.1 t-Test

The test is in this case supposed to answer the following question: "Is the mean of variable 1 larger than the mean of variable 2?".

t-Test: Two-Sample Assuming Unequal Variances
t-Test: comparing break-even for #solved vs #new bugs
Variable 1: Before reorg
Variable 2: After reorg

                               Variable 1    Variable 2
Mean                           0,89241063    0,67347245
Variance                       0,10090659    0,01151934
Observations                   77            68
Hypothesized Mean Difference   0
df                             95
t Stat                         5,69126309
P(T<=t) one-tail               6,9799E-08
t Critical one-tail            1,6610511
P(T<=t) two-tail               1,396E-07
t Critical two-tail            1,98524958
Standard deviation:            0,31765797    0,10732817

Table 14: t-Test: Two-Sample Assuming Unequal Variances

The table above is the result from the t-test that was done in Microsoft Excel. Variable 1 is before the reorganization and variable 2 is after the reorganization.

6.3.2.2 Result interpretation

T Critical one-tail is the smallest value t can have where we are 95% certain that one of the means really is bigger than the other.

• The size of the t value (5,69) is a solid indicator that there is a difference between the variables.
• P(T<=t) one-tail (6,98E-08) tells us that there is a chance of only about 0,000007% that we are wrong in assuming that there is a difference in the number of found and solved bugs per week, before and after the reorganization.
• T Critical two-tail (1,98) is the lowest value t can have while we still claim that variable 1 is bigger than variable 2.
• H2.0 can not be rejected.

6.3.2.3 Brief summary

The t-Test indicates that H2.0 can't be rejected. This means that there has not been an improvement in the number of solved vs. found bugs after the reorganization. H2.1 is then rejected, as the t-test suggests that the reorganization did not lead to a greater share of solved bugs compared to new bugs per week.

6.3.3 Hypothesis 3

Hypothesis 3 is intended to investigate the dependency between the average handling time per bug and the number of developers, on a weekly basis.
The first step in our investigation was to plot the data in a diagram. The input range was the number of developers per week as the x values, and the average handling time per week as the y values.

Figure 29: Handling time plot

6.3.3.1 Linear regression

The diagram in figure 29 shows what seems to be a tendency that the handling time decreases when the number of developers increases. It is therefore likely that there is a linear dependency between these two factors. To test this we decided to perform a linear regression test in Excel with the data we had gathered earlier. The summary output is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0,63410706
R Square             0,402091764
Adjusted R Square    0,394523305
Standard Error       0,555566298
Observations         81

ANOVA
             df    SS            MS           F            Significance F
Regression   1     16,39794849   16,3979485   53,1272986   2,08453E-10
Residual     79    24,38365897   0,30865391
Total        80    40,78160746

               Coefficients   Standard Error   t Stat       P-value      Lower 95%      Upper 95%
Intercept      5,204646834    0,269326267      19,3246908   1,1318E-31   4,668565856    5,74072781
X Variable 1   -0,01061785    0,001456725      -7,2888475   2,0845E-10   -0,013517391   -0,00771831

Table 15: Summary output

6.3.3.2 Result interpretation

To interpret this result we have to take a closer look at the values in the last part of the table. Based on the plots in figure 24 and figure 27 we claimed that there is a dependency between the number of developers and the average handling time. The probability of this assertion being true is 1 minus the P-value for X Variable 1, the slope. This means that there is virtually 100% (1 - 2,0845E-10) probability that there is a dependency between the factors.

The "Regression Statistics" section of the table presents some information regarding the variance in the data. Every model gets more nuanced the more parameters are being used. It's therefore important to take a closer look at the "Adjusted R Square" value.
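The regression statistics in table 15 are linked to the ANOVA rows through a few standard identities, and the sketch below re-derives "R Square", "Adjusted R Square" and "F" from the sums of squares. It assumes the usual textbook definitions (one explanatory variable, 81 observations); the variable names are ours, and decimal commas are written as decimal points.

```python
# Values copied from the ANOVA part of table 15.
ss_regression = 16.39794849
ss_residual = 24.38365897
n_obs = 81          # observations
n_predictors = 1    # number of developers is the only explanatory variable

ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total

# Adjusted R Square penalises R Square for the number of predictors.
df_residual = n_obs - n_predictors - 1
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_residual

# F is the ratio of the mean squares of regression and residual.
f_stat = (ss_regression / n_predictors) / (ss_residual / df_residual)

print(f"R Square          = {r_square:.6f}")
print(f"Adjusted R Square = {adj_r_square:.6f}")
print(f"F                 = {f_stat:.4f}")
```

With only one predictor and 81 observations the adjustment is small: the value drops from roughly 0.402 to roughly 0.395.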
The adjusted R Square corrects for the number of parameters included in the model. In our test, this value is calculated to about 39%. This means that the number of developers explains 39% of the variance in the average handling time. It is therefore reasonable to interpret the data in such a way that there exist other relations that affect the handling time.

Another important issue in the interpretation of linear regression tests is to check the residual plot. Since the data points in the plot in figure 29 are spread around, and not lying along a smooth line, it is essential that they are evenly distributed on both sides of the regression line. If this requirement is not fulfilled, the test will not be valid. The residual plot below shows the distribution in our test.

Figure 30: Residual plot (X Variable 1 residual plot; x-axis: X Variable 1, y-axis: Residuals)

As the 81 observations in our test are distributed with 40 points over the regression line and 41 below it, there exists no reason to reject the linear model.

6.3.3.3 Brief summary

The linear regression test indicates that H3.0 can be rejected. This means that there is a significant probability that there is a relation between the number of developers and the average bug handling time per week. But the test also indicates that there are other factors that affect the handling time.

7 Evaluation and discussion of results

In this chapter we present the hypotheses and discuss their validation and possible rejection.

7.1 Discussing the hypotheses

Hypothesis 1:
H1.0: The reorganization did not lead to a decrease in the average time used to solve bugs per week.
H1.1: The reorganization led to a decrease in the average time used to solve bugs per week.

Hypothesis 2:
H2.0: Reorganization did not lead to a greater share of solved bugs compared to new bugs per week.
H2.1: Reorganization led to a greater share of solved bugs compared to new bugs per week.

Hypothesis 3:
H3.0: Number of developers has no influence on the average time needed to solve bugs on a weekly basis.
H3.1: Number of developers has an influence on the average time needed to solve bugs on a weekly basis.

H1

The t-Test indicates with almost 100% certainty that there is a difference in the average bug handling time. There has been a decrease in the average bug handling time. However, this decrease might not have been a result of the reorganization. It might have been a trend that was present before the period from which we collected data. To clarify this we performed a linear regression on the data before and after the reorganization. It shows that there has been a decrease in the average bug handling time both before and after the reorganization. The graphs in figure 28 indicate that the decrease in average bug handling time has been faster after the reorganization. However, this claim can't be statistically supported by the regression output alone. To be able to come to a conclusion we did a t-Test to find out if the slopes of the two regression lines are different. The t-Test indicates that we can claim with more than 97% certainty that the handling time after the reorganization decreases at a higher rate.

The validity of our data was partially compromised when we introduced a scale that classified the duration needed for a bug to be fixed. Our interpretation is that this won't influence the result of our investigation. People with a mathematical or statistical background might think differently in this matter. But based on the work of researchers like T. Dybå [2001] and D. Davis [1996] we find the validity of our test acceptable.

H2

The graph [Figure 21] and the plot [Figure 23] show that the average number of closed bugs vs found bugs on a weekly basis was higher before the reorganization than it was after.
Again a t-Test shows that there is a difference between the two variables, meaning that there is a difference in the number of found and solved bugs per week before and after the reorganization. This is very strongly suggested by the test, which gives only a chance of about 0,000007% (P = 6,98E-08) that this is wrong. As a result, we are unable to reject H2.0, and are forced to reject H2.1. This means that the reorganization did not lead to a greater share of solved bugs compared to new bugs per week. We believe that the data have a high degree of validity because they have been gathered over a large time span of about two and a half years.

H3

During the investigation of hypothesis 3, the linear regression indicated with almost 100% certainty that there is a dependency between the number of developers and the average handling time. The diagram in section 6.1.3 showed a continuous increase in the number of developers since the beginning of 2003. In addition we have registered a decrease in the average handling time for the same period.

The linear regression test reveals some other interesting issues as well. The number of developers only explains 39% of the variance in the handling time, which means that there most likely exist other factors that affect this parameter. It is outside the scope of our project to investigate this any further, and it doesn't invalidate the testing of our hypothesis.

Based on this test, we are able to reject hypothesis H3.0 and claim that the number of developers has an influence on the average time used to solve bugs. This result will be included in the evaluation of the reorganization.

H3 has partially used data based on the data source used by H1. In addition we have gathered data from GWN. We have not been able to measure the integrity of the GWN data, but as mentioned earlier, we assume that the data is valid.

Some thoughts came up when we looked into the results from hypothesis 3.
During the test of H3 we only investigated the relation between handling time and the number of developers. One other possible issue that appeared is the relation between the number of users that contribute to the development of Gentoo Linux and those who don't. If the increase in the first group doesn't keep up with the increase in the latter, problems might appear. An imbalance between these two groups could result in a workload that is bigger than the contributors can handle. This would result in an increase in the amount of unsolved bugs and in the handling time. We are not able to investigate this issue, as our gathered data doesn't include the required information.

7.2 Evaluating Question 1

Question 1: Did the reorganization in Gentoo Linux improve the efficiency of the organization?

In order to answer this question we look at the hypotheses. We will not answer this question statistically, but draw conclusions based on the results from the hypothesis testing.

• H1.0, rejected → The reorganization led to a decrease in the average time used to solve bugs per week.
• H2.0, not rejected → The reorganization did not lead to a greater share of solved bugs compared to new bugs per week.
• H3.0, rejected → Number of developers has an influence on the average time needed to solve bugs on a weekly basis.

After the reorganization the developers managed to solve bugs faster. However, they were not able to solve a greater share of the bugs. Therefore the number of unsolved bugs has continued to increase. There has been an increase in the number of developers all the time. Based on the result of hypothesis 3, we can't tell if the decrease in the time needed to solve bugs was caused by the reorganization. It might also have been caused by the increased number of developers. Based on the results of these hypotheses we can not give a one-sided answer to question 1. There have been some improvements, but we are not able to tell whether or not they were caused by the reorganization.
7.3 Things we could have done differently

When collecting bug handling time data we should not have introduced a scale. This diminished the value of the data.

We believe that the way we formulated our hypotheses might not have been ideal. It is very difficult to prove that the reorganization was the single and direct cause of any improvements. Perhaps it would be better if H1.0 was formulated like this: There was not a decrease in the average time used to solve bugs per week, after the reorganization. The same goes for H2.0: There has not been an increase in solved bugs compared to new bugs per week, after the reorganization.

We did not like the report template that was suggested. We feel that there was a certain pressure on us to use this template. Later in the project we were told that we could deviate from this template, and so we did. In the future it might be better if the institute and the advisors present different report templates and state that the templates are only suggestions that the students can modify to fit their projects.

We were not able to find out who did most of the bug solving. This means that

8 Conclusions and further work

In this chapter, project evaluation, results and future work are presented.

8.1 Project evaluation

We are very satisfied with what we have accomplished in this project. Doing an experiment using empirical software engineering has been interesting and we have learned a lot. Apart from ESE we have also learned about open source software and Gentoo. OSS has in recent years become more and more debated, and it has been exciting to study such a "hot" subject.

Our supervisor told us that he intentionally had us start on the experiment before we had read a lot about ESE and OSS. We believe there are both pros and cons to this method. However, this forced us to "learn by doing" and not only "learn by reading", and we therefore feel that we have gained a good insight into the subjects that this project is based on.
Having used ESE, we can see why people like Basili believe that it can be a great addition to software experimenting. It seems to be a valuable addition to evaluating software [Basili, 1986].

The collaboration with our supervisor Thomas Østerlie has been good. He has been helpful and guided us when we had problems. He has also supplied us with articles and relevant background information. Yet he limited his involvement to guiding and helping us see things from different angles. He clearly stated that we would have to do the work, and he would supervise. We are happy with the way this relationship has been and believe it has accelerated our learning process. Meetings were conducted at irregular intervals when needed. The meetings were informal, and we think that the absence of formal progress presentations helped us focus on the project itself.

Anders and Knut did not know each other beforehand. We think our teamwork has been good. Differences in opinion have been debated within the team and sometimes been taken to our supervisor for a third opinion. This has been a very democratic and peaceful project without any big arguments.

At first we had four hypotheses we wanted to answer in the project; this was reduced to a question and three hypotheses. Over-ambitiousness and partial irrelevance were the reasons why the fourth hypothesis was removed.

The goal of the project was to decide whether the Gentoo reorganization resulted in process improvements. With some reservations we believe that we have given an indication of Gentoo's efficiency before and after the reorganization.

8.2 Conclusion

During this project we have determined an appropriate way of measuring the success of the improvement initiative performed in an open source organization. This has been done by using empirical software engineering. We performed an experiment in order to find out if the reorganization was successful.
The experiment was supposed to answer whether the reorganization led to a more efficient organization. We did not produce a single answer, as our hypotheses didn't provide us with an unambiguous answer. However, based on our research we present the following claims:

After the reorganization there was a decrease in the average time used to solve bugs per week. This is certainly positive, as bug solving is a very important and time-consuming task in open source software organizations. However, we were not able to statistically determine that the reorganization led to a greater share of solved bugs compared to new bugs per week. Our data tells us that statistically, it looks like the number of developers has an impact on the average time needed to solve bugs on a weekly basis.

These claims are presented with reservations, because during the project we discovered many possible threats to our results.

Overall it seems like some parts of the Gentoo organization have become more efficient after the reorganization. If we were to recommend such a process, the costs should also be taken into account.

This project has shown empirical software engineering in action. We have used what we believe are the best practices in this area.

8.3 Further work

This chapter contains our suggestions on how our research can be taken further.

LOC vs unsolved bugs
As mentioned earlier in chapter 6.1.2, we did not take the size of Gentoo Linux (measured in lines of code) into account when looking at the rising number of unsolved bugs. The ratio between unsolved bugs and the size of the Gentoo software and the number of developers might be constant. This might mean that although the number of unsolved bugs increases, it increases in parallel with the number of developers and LOC in Gentoo.

Trace who solves bugs, a core crew or are bugs solved by "everyone"?
Gentoo increases its number of developers almost every month.
It would be interesting to see if all the developers solve their fair share of bugs, or if only a few solve most of the bugs. If so, the increase in developers might be a negative thing.

Analyze any logging of resources used to perform the reorganization, and see if it was worth it.
Many commercial companies log how much time they use on different projects. Is this the case in Gentoo? If yes, it would be interesting to find out how many resources have been used on the reorganization, and if it was worth the cost.

Survey on the people involved in the reorganization, find out if they believe it was helpful for Gentoo.
Interviews and questionnaires might reveal what the people who performed the reorganization think about it: if they believe it was exclusively positive or if there were drawbacks, what they would have done differently and what Gentoo has gained.

Similarities/differences between this reorganization and a similar reorganization in a commercial company.
Such a project might pit the strengths and weaknesses of OS organizations and commercial companies against each other. A template on how to reorganize an organization/company might even be created, taking the best from both worlds.

Find out if the rate of added developers increased after the reorganization.
→ This could, in addition to the conclusion of H3, explain why H1.0 was rejected.

Bibliography

[Basili et al., 1986] V. R. Basili, R. W. Selby and D. H. Hutchens, "Experimentations in Software Engineering", IEEE Transactions on Software Engineering, vol. SE-12, no. 7, pp. 733-741, July 1986
[Basili, 1996] V. R. Basili, "Editorial", Empirical Software Engineering Journal, vol. 1, no. 2, 1996
[Briand et al., 2002] L. C. Briand, S. Morasca and V. Basili, "An Operational Process for Goal-Driven Definition of Measures", IEEE Transactions on Software Engineering, vol. 28, no. 12, pp. 1106-1125, Dec 2002
[Davis, 1996] D. Davis, "Business Research for Decision Making", Fourth Edition, Belmont, California: Duxbury Press, 1996
[Dybå, 2001] T. Dybå, "Enabling Software Process Improvement: An Investigation of the Importance of Organizational Issues", PhD Thesis, NTNU, IDI Report 7, 2001
[Fenton et al., 1994] N. Fenton, S. L. Pfleeger and R. L. Glass, "Science and Substance: A Challenge to Software Engineers", IEEE Software, pp. 86-95, July 1994
[Glass, 2004] R. L. Glass, "A Look at the Economics of Open Source", Communications of the ACM, vol. 47, no. 2, February 2004
[Hippel & von Krogh, 2003] E. von Hippel and G. von Krogh, "Open Source Software and the 'Private-Collective' Innovation Model: Issues for Organization Science", Organization Science, vol. 14, no. 2, March-April 2003
[Kitchenham et al., 2002] B. A. Kitchenham, S. L. Pfleeger, L. M. Pickard, P. W. Jones, D. C. Hoaglin, K. E. Emam and J. Rosenberg, "Preliminary Guidelines for Empirical Research in Software Engineering", IEEE Transactions on Software Engineering, vol. 28, no. 8, pp. 721-733, Aug 2002
[Monteiro et al., 2004] E. Monteiro, T. Østerlie, K. H. Rolland and Emil Røyrvik, "Keeping it going: The Everyday Practices of Open Source Software", submitted for reviewing, 2004
[Perry et al., 2000] D. E. Perry, A. A. Porter and L. G. Votta, "Empirical Studies of Software Engineering: A Roadmap", The Future of Software Engineering - ICSE2000, Finkelstein, ed., June 2000
[Raymond, 2001] E. Raymond, "The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary", Sebastopol, CA: O'Reilly and Associates, 2001
[Thiruvathukal, 2004] G. K. Thiruvathukal, "Gentoo Linux: The Next Generation of Linux", IEEE Scientific Programming, vol. 6, no. 5, pp. 66-74, 2004
[Tichy, 1998] W. F. Tichy, "Should Computer Scientists Experiment More?", IEEE Computers, vol. 31, no. 5, pp. 32-39, 1998
[Wohlin et al., 2000] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell and A. Wesslen, "Experimentation in software engineering. An introduction", Kluwer Academic Publishers, 2000
[Zelkowitz&Wallace, 1998] M. V. Zelkowitz and D. R. Wallace, "Experimental Models for Validating Technology", IEEE Computers, vol. 31, no. 5, pp. 23-31, 1998

Online references

[Bugzilla] http://www.bugs.gentoo.org (Accessed Oct 2004)
[Definition] http://www.opensource.org/docs/definition.php (Accessed Sept 2004)
[Developer list] http://www.gentoo.org/proj/en/devrel/roll-call/userinfo.xml (Accessed Oct 2004)
[Gentoo Management] http://www.gentoo.org/doc/en/management-structure.xml (Accessed Sept 2004)
[Gentoo Portage] http://www.gentoo.org/main/en/about.xml (Accessed Sept 2004)
[Gentoo recruiters] http://www.gentoo.org/proj/en/devrel/recruiters/index.xml (Accessed Sept 2004)
[GLEP] http://www.gentoo.org/proj/en/glep/glep-0002.html (Accessed Sept 2004)
[GLEP4] http://www.gentoo.org/proj/en/glep/glep-0004.xml (Accessed Sept 2004)
[GWN 30.06.2003] http://www.gentoo.org/news/en/gwn/20030630newsletter.xml (Accessed Sept 2004)
[GWN 04.08.2003] http://www.gentoo.org/news/en/gwn/20030804newsletter.xml (Accessed Sept 2004)
[GWN 01.11.2004] http://www.gentoo.org/news/en/gwn/20041101newsletter.xml (Accessed Nov 2004)
[GWN 08.11.2004] http://www.gentoo.org/news/en/gwn/20030804newsletter.xml (Accessed Nov 2004)
[Hyperdictionary] http://www.hyperdictionary.com/search.aspx?define=minix (Accessed Oct 2004)
[Linus] http://www.linux.org/info/linus.html (Accessed Oct 2004)
[Linux] http://www.cs.tu-bs.de/eis/english/research/current/researchWK.htm (Accessed Oct 2004)
[Linux Online] http://www.linux.org/dist/list.html (Accessed Oct 2004)
[Market share] http://news.netcraft.com/archives/2004/07/12/slight_linux_market_share_loss_for_red_hat.html (Accessed Nov 2004)
[Netcraft] http://news.netcraft.com/archives/2004/11/01/november_2004_web_server_survey.html (Accessed Nov 2004)
[Opensource] http://www.opensource.org (Accessed Sept 2004)
[Philosophy] http://www.gentoo.org/main/en/philosophy.xml (Accessed Sept 2004)
[Project assignment] http://www.idi.ntnu.no/undervisning/prosjektoppgaver.php?utvalg=fordypning&gruppe=SU (Accessed Aug 2004)
[RedHat] http://www.redhat.com (Accessed Sept 2004)
[Syllabus] http://www.idi.ntnu.no/emner/empse/syllabus.html (Accessed Nov 2004)
[Tdt25] http://www.idi.ntnu.no/emner/tdt25/ (Accessed Sept 2004)
[Unix] www.doc.ic.ac.uk/~wjk/UnixIntro/Lecture1.html (Accessed Oct 2004)
[Østerlie] http://www.idi.ntnu.no/~thomasos/ (Accessed Sept 2004)

Attachment A

Attachment B

In this section we detail all the suggested hypotheses and discuss their usability in our project. Each hypothesis was given a rate which indicated our overall view of it. The hypotheses with good rates became candidates for the final considerations.

1: Reorganizing has improved the efficiency of bug handling.
Source: bug reports, Bugzilla
Usable: Yes, as super-hypothesis.
Rate: Good
Dependent variables: Number of bugs / Efficiency
Independent variables: Dates
Subjects: People doing the research (us). / Members of herds, developers.
Requirements: Efficiency must be defined or the hypothesis must be divided into more specific hypotheses.
About: The hypothesis can be checked for validation by comparing bug results before and after the reorganization date. We believe that bug handling is vital to any OSS project and that answering this hypothesis might tell in what direction the Gentoo project is moving. The number of new bugs per week and the number of solved bugs per week will be compared week by week before and after the reorganization date. This is a general hypothesis that might be divided into sub-hypotheses. The hypothesis can be reformulated to: Reorganizing has improved the efficiency of Gentoo development. Efficiency = time needed for a discovered bug to be solved.
Implications from hypothesis result: Is the bug issue under control? What effect does it have on Gentoo's survivability? Was the reorganization worth the effort? Did it fulfill its goals?

2: Gentoo Linux will continue to exist in the following years.
Source: mail lists, market share, interviews
Rate: Poor
Dependent variables: Number of users, market share, reputation.
Independent variables: Future dates, not testable.
Subjects: Users…
Why: The hypothesis is discarded because it requires future data; the number of users and the market share in the future cannot be measured. It might be possible to see future tendencies based on current statistics, but we chose not to do this because we consider it too speculative.

3: Reorganizing led to more frequent release cycles.
Source: Inspect new releases, CVS, mail lists
Rate: Good
Dependent variables: Number of releases.
Independent variables: Dates
Subjects: People doing the research (us).
Requirements: Release frequency before and after the reorganization. Define which releases affect the cycles: all releases, or only major kernel ones.
Why: The hypothesis is based on measurable data. This might be a sub-hypothesis under H#1. Release cycles can be a measurement of productivity, quality and efficiency. Releases must be of high quality; frequent releases of low-quality products are not necessarily positive. Release frequency often mirrors the total community output and efficiency.

4: Reorganizing led to improved communication.
Source: forums, interviews, comments in newsletters
Rate: Poor
Dependent variables: Quality of communication.
Independent variables: Dates/NONE?
Requirements: Clear definition of communication and improvement. Excessive amounts of logs, mails and forum posts. Define the actors involved in the communication.
Why: It is very hard to measure data from empirical studies due to the qualitative nature of the data. The necessary definitions will restrict and narrow the data. Not all data is freely available (mails). Whether communication is good or bad is very subjective and may vary from person to person.

5: Number of developers increased as a result of the reorganization.
Source: developer records, mail lists, newsletters
Rate: Good
Dependent variables: Number of developers.
Independent variables: Date.
Requirements: Number of developers before and after the reorganization. Additions and subtractions of developers on a weekly basis in the same time period.
Why: The number of developers might not directly influence the quality of the software, but an OSS community with many developers (assuming that they are skilled and contribute) should produce faster/cheaper/better than its competitors. A possible change in the number of developers might not be caused by the reorganization. How important is the total number of developers compared to the number of added/lost developers? How does the number of developers compare to the project size? Is there an ideal number of developers for a project of a certain size, or is more always better?

6: Number of users and market share increased after the reorganization.
Source: Independent sites with objective numbers.
Rate: Medium
Dependent variables: Number of users and market share.
Independent variables: Dates
Requirements: We would need accurate numbers of users in the months before and after the reorganization.
Why: It might be difficult to find accurate numbers of users for short time intervals, such as the weeks before and after the reorganization. It might also be difficult to prove that the reorganization is the reason why the market share increased.

7: Reorganization fulfilled its goals to a reasonable extent.
Source: community feedback, forums, mail lists, bug reports
Rate: Bad
Dependent variables: Goal fulfillment
Independent variables:
Requirements: Clear definition of goals in measurable values. Clear definition of reasonable extent.
Why: Unless this is used as a super-hypothesis, it would take forever to make wise definitions. It requires reading tons of non-quantifiable information in order to draw conclusions. The hypothesis is simply too wide to be answered directly. It doesn't immediately spawn sub-hypotheses either.

8: New roles and scopes have simplified decision-making.
Source: Developer experiences, discussion, meeting logs, forums, interviews
Rate: Medium
Dependent variables: Key personnel opinions.
Independent variables: Dates
Requirements: We need to know that simplification actually happened, not whether it was intended to happen; therefore we would probably need to interview developers.
Why: The hypothesis is interesting; new roles and simplified decision-making are some of the key points in the reorganization. However, we believe that in order to answer this hypothesis we should ideally have a fair number of interviews with key personnel. We do not possess these, and we believe that getting them would be quite difficult. Meeting them in person would be difficult, and these people aren't known for dedicating their time to sporadic questionnaires by students. The non-quantifiable data also brings a degree of subjectivity.

9: "The Cabal" and secret mail lists are negative for the OSS community.
Source: Fork documents, forums, official statements, meeting logs
Rate: Horrible
Dependent variables: Qualitative data
Independent variables: ?
Requirements:
Why: "The Cabal" is probably no more than a conspiracy theory; however, it is interesting reading. And there might be some truth in the claim that some developers plan to go commercial. The secret mailing lists are lists where bugs under treatment are discussed; developers don't feel that the public needs to follow these lists.

10: Meeting deadlines has improved after the reorganization.
Source: bug reports, forums, dev discussions
Rate: Medium
Dependent variables: deadline date and delivery date
Independent variables: org structure
Requirements:
Why: Again, definitions of deadlines are required. It might be possible to find data on deadlines and whether they are met. When talking about deadlines we mean bug fixes, and whether they were fixed on time.

11: The number of bugs is threatening the future of Gentoo.
Source: Total number of unsolved bugs increasing? bug list, Bugzilla
Rate: Poor
Dependent variables: number of unsolved bugs
Independent variables: Dates
Requirements:
Why: The thought behind this hypothesis was to look at the number of unsolved bugs that appeared in Gentoo every month and to compare this number with the number of solved bugs every month. The goal would be to see if the total number of bugs increased, and then make predictions of when the unsolved bugs would threaten Gentoo's existence. The data is measurable; it's the predictions and calculations that might be questionable. Another issue is that this has nothing to do with the reorganization. However, it would be interesting to see a graph showing the number of unsolved bugs, and to find other large projects that have failed and see how their bug count developed during the late periods of the project. This hypothesis might be a super-hypothesis for more specific hypotheses.

12: The increase in the total number of unsolved bugs is a threat to Gentoo Linux.
Source: Bugzilla
Rate: Poor
Dependent variables:
Independent variables:
Requirements:
Why: See #11. This might be a sub-hypothesis to #11.

13: The increase of unsolved bugs will eventually kill the Gentoo Linux Project.
Source: Bugzilla, articles
Rate: Poor
Dependent variables:
Independent variables:
Requirements:
Why: See #11. This might be a sub-hypothesis to #11.

14: In x years the number of unsolved bugs will leave Gentoo Linux as a noncompetitive distribution.
Source: Bugzilla
Rate: Bad
Dependent variables:
Independent variables:
Requirements:
Why: See #11. Our analysis of when an OS contains too many bugs to function satisfactorily; earlier work done by others in this field. Requires comparison with similar stranded projects. This might be a sub-hypothesis to #11.

15: The increase of unsolved bugs doesn't threaten Gentoo Linux.
(still competitive due to the quality of the main features...)
Source: Bugzilla, mail, forums, our own reasoning.
Rate: Difficult
Dependent variables: unsolved bugs and number of users
Independent variables: management structure
Requirements:
Why: See #11. Interesting but not unique hypothesis, very hard to measure quantitatively. Difficult to prove a connection between, e.g., users and bugs. Opposite of #11.

16: The reorganization led to a decrease in the average time used to solve bugs.
Source: Bugzilla, forums, interview, etc...
Rate: Good
Dependent variables: time used to solve bugs
Independent variables: management structure and chosen bugs
Requirements: Requires complete bug documentation.
Why: If it's possible to find out how much time is spent fixing each bug, it will be possible to debate this hypothesis. It will demand a LOT of manual searching, but it certainly is an interesting statement. Given that we manage to gather data on this topic, it could be viewed as a direct measure of how effectively the organization as a whole worked before and after the reorganization. Solving bugs involves a number of people, and work has been done to streamline the related communication and procedures. Even if the reorganization only results in a few improved processes, the overall time will go down, and that was the goal.

17: Reorganization led to a greater share of solved bugs compared to new bugs.
Rate: Good
Dependent variables: solved and new bugs
Independent variables: management structure, time periods
Requirements: newsletters, Python script
Why: Did the reorganization decrease the gap between reported new bugs and solved bugs in a given period of time? If this is not the case, then the organization might consider other efforts to increase efficiency. Is this an important issue for the organization, or is it irrelevant? Once Gentoo Linux manages to solve more bugs than are reported, a revolution will happen: the total number of unsolved bugs will decrease and eventually reach zero.
Is this new? Has any other major OSS project achieved this? What have the results been? Did it make a difference at all?

Attachment C

Attachment D

Bug ID 13004 13005 13072 13073 13139 13140 13211 13212 13281 13282 13344 13345 13409 13410 13473 13474 13546 13547 13619 13620 13698 13699 13761 13762 13820 13821 13891 13892 13955 13956 14012 14013 14081 14082 14130 14131 14166 14167 14128 14129 14274 14275 14341 14342 14409 14410 14479 14480 14524
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2
Time 6 6 5 6 6 1 1 4 2 1 4 1 5 1 5 4 5 5 1 6 1 6 6 6 2 2 4 6 2 5 4 4 6 3 1 6 5 6 4 6 5 1 5 6 5 6 6 6 2

Bug ID 14525 14561 14562 14607 14608 14658 14659 14704 14705 14764 14765 14824 14825 14895 14896 14951 14952 15021 15022 15090 15091 15152 15153 15193 15194 15259 15260 15305 15306 15359 15360 15411 15412 15477 15478 15542 15543 15610 15611 15673 15674 15730 15731 15767 15768 15825 15826 15897 15898 15976
Pri 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 5 4 5 5 4 5 5 1 5 2 1 6 2 5 1 4 4 6 2 6 5 5 5 6 6 1 6 5 5 6 1 6 4 2 1 1 6 4 4 4 6 1 6 1 4 6 1 1 1 3

Bug ID 15977 16051 16052 16122 16123 16175 16176 16208 16209 16262 16263 16333 16334 16392 16393 16452 16453 16533 16534 16601 16602 16647 16648 16717 16718 16791 16792 16870 16871 16943 16944 17004 17005 17068 17069 17123 17124 17183 17184 17285 17286 17345 17346 17423 17424 17479 17480 17540 17541 17603
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 4 3 6 6 6 6 5 5 1 5 4 1 6 1 1 5 1 6 1 4 6 5 1 6 5 5 1 6 4 2 1 1 4 6 1 1 4 6 6 1 1 6 1 4 5 6 5 5 1 1

Bug ID 17604 17664 17665 17740 17741 17800 17801 17853 17854 17913 17914 17976 17977 18027 18028 18075 18076 18132 18133 18198 18199 18281 18282 18341 18342 18418 18419 18462 18463 18514 18515 18572 18573 18626 18627 18674 18675 18726 18727 18790 18791 18843 18844 18883 18884 18953 18954 19023 19024 19068
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 5 4 3 6 6 6 1 4 1 2 2 6 6 1 6 6 5 6 1 6 6 6 5 5 4 4 2 4 5 1 6 4 1 5 6 2 5 6 4 6 4 2 4 6 1 1 5 6 5 1

Bug ID 19069 19122 19123 19186 19187 19219 19220 19280 19281 19342 19343 19401 19402 19455 19456 19528 19529 19594 19595 19647 19648 19705 19706 19747 19748 19813 19814 19867 19868 19910 19911 19983 19984 20027 20028 20089 20090 20147 20148 20200 20201 20255 20256 20304 20305 20373 20374 20413 20414 20460
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 5 6 1 6 6 6 6 1 6 6 5 5 6 1 6 4 6 1 6 1 6 1 1 6 1 5 1 5 1 3 1 1 1 6 5 1 5 5 5 3 6 3 1 6 6 3 1 6 4 1

Bug ID 20461 20517 20518 20562 20563 20620 20621 20671 20672 20733 20734 20777 20778 20832 20833 20898 20899 20985 20986 21037 21038 21090 21091 21143 21144 21178 21179 21233 21234 21313 21314 21389 21390 21456 21457 21522 21523 21589 21590 21655 21656 21693 21694 21746 21747 21811 21812 21875 21876 21932
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 5 6 1 3 1 1 6 1 1 1 5 3 3 6 1 1 6 2 6 3 6 6 4 6 1 1 1 1 4 4 4 6 4 1 1 5 4 4 6 1 1 6 6 6 6 6 5 1 4 5

Bug ID 21933 21981 21982 22036 22037 22079 22080 22138 22139 22183 22184 22245 22246 22330 22331 22382 22383 22426 22427 22475 22476 22540 22541 22594 22595 22657 22658 22730 22731 22800 22801 22854 22855 22905 22906 22972 22973 23031 23032 23085 23086 23155 23156 23212 23213 23255 23256 23311 23312 23356
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 3 2 2 1 2 2 2 2
Time 5 4 4 1 5 1 6 2 1 5 1 6 4 6 6 6 4 6 5 6 4 5 1 5 2 2 6 6 4 3 3 4 2 6 5 4 4 1 4 5 3 3 6 4 6 6 1 6 3 1

Bug ID 23357 23437 23438 23508 23509 23565 23566 23634 23635 23676 23677 23738 23739 23798 23799 23855 23856 23903 23904 23938 23939 23959 23960 23984 23985 24032 24033 24092 24093 24148 24149 24217 24218 24268 24269 24328 24329 24372
24373 24433 24434 24504 24505 24569 24570 24641 24642 24716 24717 24789
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 1 4 5 6 1 3 4 6 1 6 5 1 2 1 1 1 1 1 6 6 6 3 3 1 5 4 1 6 4 4 1 1 4 4 6 5 6 6 6 6 5 5 1 4 6 6 5 2 5 4

Bug ID 24790 24857 24858 24949 24950 25034 25035 25100 25101 25166 25167 25231 25232 25305 25306 25350 25351 25408 25409 25485 25486 25555 25556 25608 25609 25685 25686 25746 25747 25803 25804 25854 25855 25933 25934 26025 26026 26107 26108 26179 26180 26248 26249 26316 26317 26377 26378 26464 26465 26522
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 3 4 4 1 3 1 6 4 5 5 1 6 4 2 6 1 5 1 2 6 1 6 6 1 3 4 4 6 5 1 4 1 1 5 1 4 1 5 6 2 2 6 5 1 5 5 1 4 3 2

Bug ID 26523 26594 26595 26660 26661 26701 26702 26777 26778 26844 26845 26900 26901 26971 26972 27033 27034 27095 27096 27162 27163 27212 27213 27268 27269 27344 27345 27396 27397 27459 27460 27513 27514 27581 27582 27633 27634 27683 27684 27760 27761 27841 27842 27899 27900 27984 27985 28036 28037 28100
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 5 1 2 1 4 6 5 5 2 1 5 6 4 1 1 5 1 4 6 5 4 4 1 1 2 2 4 2 6 6 5 4 6 5 5 4 6 5 2 6 4 4 1 5 1 4 4 4 4 6

Bug ID 28101 28154 28155 28241 28242 28329 28330 28421 28422 28497 28498 28586 28587 28672 28673 28758 28759 28847 28848 28947 28948 29036 29037 29069 29070 29174 29175 29246 29247 29315 29316 29407 29408 29489 29490 29562 29563 29644 29645 29762 29727 29787 29788 29874 29875 29973 29974 30046 30047 30140
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 4 2 2 2 2 2 2 2 2 2 2 2 2
Time 4 1 1 3 5 1 4 1 1 5 1 2 1 1 4 1 1 1 1 1 1 6 4 5 1 2 1 1 5 2 4 1 6 2 1 4 6 5 1 2 4 1 3 5 6 2 1 1 1 1

Bug ID 30141 30231 30232 30300 30301 30366 30367 30471 30472 30542 30543 30638 30639 30734 30735 30807 30808 30888 30889 30952 30953 31016 31017 31098 31099 31180
31181 31244 31245 31334 31335 31391 31392 31473 31474 31573 31574 31651 31652 31729 31730 31795 31796 31887 31888 31959 31960 32012 32013 32083
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 3 6 5 1 2 4 1 2 2 6 1 1 1 6 1 1 1 2 2 4 1 4 6 6 1 4 1 4 4 1 2 1 5 3 1 6 1 1 1 1 1 4 1 6 6 2 1 1 1 4

Bug ID 32084 32171 32172 32245 32246 32312 32313 32363 32364 32446 32447 32526 32527 32600 32601 32688 32689 32776 32777 32853 32854 32931 32932 32989 32990 33065 33066 33122 33123 33197 33198 33286 33288 33369 33370 33445 33446 33540 33541 33586 33587 33660 33661 33732 33733 33800 33801 33902 33903 33992
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 1 4 6 6 5 5 3 2 6 6 5 4 6 1 3 4 1 1 5 4 2 3 6 1 3 1 3 5 5 6 5 5 1 6 4 1 5 3 4 1 6 4 2 1 1 2 6 3 1 3

Bug ID 33993 34084 34085 34161 34162 34240 34241 34314 34315 34390 34391 34498 34499 34602 34603 34676 34677 30743 30744 34803 34804 34867 34868 34954 34955 35031 35033 35121 35122 35177 35178 35240 35241 35316 35317 35420 35421 35505 35506 35579 35580 35642 35643 35703 35704 35783 35784 35860 35861 35923
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 1 2
Time 3 2 4 6 1 5 4 5 1 2 4 6 1 1 4 4 1 6 2 2 4 1 5 6 1 1 1 5 1 3 3 4 4 1 4 5 1 6 1 5 2 2 1 2 1 5 4 6 5 3

Bug ID 35924 35978 35979 36044 36045 36103 36104 36159 36160 36218 36219 36282 36285 36351 36352 36410 36411 36471 36472 36506 36507 36566 36567 36623 36624 36704 36705 36789 36790 36875 36876 36930 36931 36997 36998 37077 37078 37166 37167 37275 37276 37375 37376 37462 37463 37575 37576 37659 37660 37766
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 5 5 4 1 5 4 3 5 6 1 3 2 1 2 1 4 1 6 6 4 6 6 2 6 6 1 4 5 6 6 1 2 3 6 1 4 2 1 4 1 3 6 1 6 1 1 2 5 1 1

Bug ID 37767 37865 37866 37945 37946 38053 38054 38143 38144 38259 38260 38376 38377 38488
38489 38577 38578 38681 38682 38785 38786 38896 38897 38995 38996 39124 39125 39230 39231 39317 39318 39426 39427 39543 39544 39641 39642 39732 39733 39845 39846 39943 39944 40043 40044 40153 40154 40247 40248 40352
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 6 6 1 1 6 1 4 5 1 1 4 5 6 1 2 1 1 1 1 5 4 1 2 4 5 6 1 3 6 4 6 4 1 4 3 4 6 1 1 5 5 6 4 1 1 6 1 3 5 2

Bug ID 40353 40447 40448 40562 40563 40686 40687 40807 40808 40944 40945 41083 41084 41206 41207 41337 41338 41447 41448 41539 41540 41634 41635 41753 41754 41866 41867 42007 42008 42116 42117 42225 42226 42332 42333 42453 42454 42553 42554 42711 42712 42849 42850 42962 42963 43066 43067 43175 43176 43256
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2
Time 4 1 1 3 5 4 1 1 3 1 4 4 5 6 5 1 5 1 1 4 5 6 1 1 5 4 4 5 1 3 1 1 3 1 2 6 6 1 1 5 5 4 4 6 6 1 1 3 1 6

Bug ID 43257 43353 43354 43446 43447 43565 43566 43684 43685 43775 43776 43855 43856 43936 43937 44009 44010 44104 44105 44220 44221 44328 44329 44424 44425 44538 44540 44646 44647 44726 44727 44812 44814 44924 44925 45001 45002 45110 45111 45192 45194 45258 45259 45354 45355 45475 45476 45580 45581 45683
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 6 1 6 1 1 6 4 1 1 6 3 1 1 5 1 1 5 1 1 1 1 1 3 1 5 2 5 5 1 1 1 4 4 4 1 5 4 3 3 4 4 4 3 1 4 1 1 2 1 1

Bug ID 45686 45775 45776 45865 45866 45955 45956 46081 46082 46210 46211 46328 46329 46451 46452 46562 46563 46665 46666 46747 46748 46827 46828 46922 46923 47041 47042 47184 47185 47307 47309 47411 47412 47487 47488 47582 47583 47671 47672 47795 47796 47885 47886 48008 48009 48113 48114 48193 48194 48294
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 2 5 4 2 1 6 1 1 1 2 2 1 1 1 1 1 5 1 4 1 1 1 1 1 1 6 1 1 5 1 2 2 3 4 4 3 1 5 4 6 1 1 4 2 1 1 6 3 1 1

Bug ID 48295 48437
48438 48538 48539 48656 48658 48762 48763 48852 48853 48917 48918 49011 49012 49131 49132 49251 49252 49353 49354 49468 49469 49574 49575 49676 49677 49808 49809 49935 49937 50075 50076 50181 50182 50321 50322 50430 50431 50512 50514 50600 50601 50714 50715 50824 50825 50903 50904 51011
Pri 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 5 2 2 2 2 2 2
Time 1 4 1 4 1 1 1 2 1 1 3 3 1 1 5 1 2 5 5 4 1 6 3 5 6 1 1 5 4 1 4 1 1 6 2 1 4 2 3 1 3 4 6 2 1 1 1 5 5 3

Bug ID 51013 51116 51117 51182 51183 51254 51255 51352 51353 51459 51460 51535 51536 51617 51618 51722 51723 51803 51805 51896 51897 51972 51973 52067 52069 52163 52164 52237 52238 52320 52322 52418 52419 52529 52530 52659 52662 52759 52760 52852 52853 52949 52950 53036 53040 53114 53115 53196 53198 53278
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2
Time 5 1 1 4 5 5 6 1 5 1 4 4 6 4 1 4 2 1 1 4 2 3 5 6 1 4 1 4 3 4 4 1 1 5 6 1 5 5 3 5 5 4 1 1 1 4 1 1 4 1

Bug ID 53279 53381 53382 53486 53488 53613 53615 53695 53696 53780 53781 53864 53865 53953 53957 54064 54065 54171 54173 54278 54279 54393 54394 54498 54499 54601 54602 54713 54714 54861 54862 54975 54976 55122 55124 55215 55216 55299 55300 55388 55389 55507 55508 55641 55644 55756 55757 55856 55856 55940
Pri 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Time 6 3 3 5 1 4 1 5 4 4 6 4 1 1 1 1 3 1 5 1 1 3 4 3 1 6 1 1 1 6 1 1 3 1 4 2 1 1 1 1 1 1 5 3 1 5 5 2 3 5

Bug ID 55942 56018 56019 56107 56108 56201 56202 56305 56306 56418 56420 56516 56517 56590 56591 56658 56660 56760 56761 56870 56872 56986 56987 57131 57132 57257 57258 57365 57366 57456 57457
Pri 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2
Time 5 1 1 4 1 2 4 1 3 1 1 4 1 5 1 1 1 1 1 1 1 4 4 4 4 4 1 3 1 2 1

Attachment E

2003
Week      Moves  Adds  Total
Week 1    0      5     102
Week 2    0      0     107
Week 3    0      1     107
Week 4    0      0     108
Week 5    0      2     108
Week 6    2      1     110
Week 7    0      3     109
Week 8    0      5     112
Week 9    2      1     117
Week 10   0      4     116
Week 11   0      2     120
Week 12   0      4     122
Week 13   2      1     126
Week 14   1      7     125
Week 15   0      5     131
Week 16   1      5     136
Week 17   0      11    140
Week 18   2      0     151
Week 19   1      0     149
Week 20   0      0     148
Week 21   0      0     148
Week 22   0      2     148
Week 23   0      2     150
Week 24   2      0     152
Week 25   0      0     150
Week 26   4      4     150
Week 27   0      0     150
Week 28   0      7     150
Week 29   0      8     157
Week 30   0      1     165
Week 31   0      8     166
Week 32   0      4     174
Week 33   0      0     178
Week 34   0      0     178
Week 35   0      0     178
Week 36   0      0     178
Week 37   0      0     178
Week 38   0      6     178
Week 39   0      1     184
Week 40   0      3     185
Week 41   0      0     188
Week 42   0      2     188
Week 43   0      0     190
Week 44   0      3     190
Week 45   0      3     193
Week 46   0      1     196
Week 47   0      0     197
Week 48   0      0     197
Week 49   0      0     197
Week 50   0      0     197
Week 51   0      0     197
Week 52   1      0     197

2004
Week      Moves  Adds  Total
Week 1    0      0     196
Week 2    0      0     196
Week 3    0      9     196
Week 4    0      5     205
Week 5    0      2     210
Week 6    0      0     212
Week 7    0      4     212
Week 8    0      1     216
Week 9    0      1     217
Week 10   0      3     218
Week 11   1      4     221
Week 12   0      1     224
Week 13   0      4     225
Week 14   0      0     229
Week 15   0      5     229
Week 16   3      2     234
Week 17   1      0     233
Week 18   2      0     232
Week 19   0      0     230
Week 20   1      1     230
Week 21   0      2     230
Week 22   0      1     232
Week 23   0      4     233
Week 24   0      0     237
Week 25   1      0     237
Week 26   0      9     236
Week 27   0      2     245
Week 28   0      0     247
Week 29   0      5     247
Week 30   0      3     252
Week 31   0      4     255
Week 32   0      0     259
Week 33   0      0     259
Week 34   0      0     259
Week 35   0      0     259
Week 36   0      0     259
Week 37   0      0     259
Week 38   0      0     259
Week 39   0      0     259
Week 40   0      0     259
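
The weekly totals in Attachment E appear to follow a running balance: each total equals the previous week's total plus that week's Adds minus its Moves. This is our reading of the numbers, not something the report states. The sketch below checks it on the first ten weeks of 2003 from the table; the function name `running_totals` is invented for the example.

```python
def running_totals(start_total, moves, adds):
    """Rebuild weekly totals, applying each week's adds and moves to the next week."""
    totals = [start_total]
    for moved, added in zip(moves, adds):
        totals.append(totals[-1] + added - moved)
    return totals


# Weeks 1-10 of 2003, copied from Attachment E.
moves = [0, 0, 0, 0, 0, 2, 0, 0, 2, 0]
adds = [5, 0, 1, 0, 2, 1, 3, 5, 1, 4]

# Reported totals for weeks 1-11 of 2003, copied from Attachment E.
reported = [102, 107, 107, 108, 108, 110, 109, 112, 117, 116, 120]

rebuilt = running_totals(reported[0], moves, adds)
print(rebuilt == reported)
```

A check like this is a cheap way to catch transcription errors before the developer counts are used to test hypothesis 5.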
