Open Source Data Sources Academy of Management PDW

13 August 2006, Atlanta

James Howison
PhD Candidate
Syracuse University School of Information Studies

Supported by The Syracuse FLOSS project with Prof. Kevin Crowston. (NSF Grants 03-41475 and 04-14468. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.)

Overview
1. Types of data on open source teams
2. Ethical issues
3. Where and how can I get this data?
4. Difficulties in using data
5. Integrating types of data

Slides and References at: http://floss.syr.edu/presentations/FlossDataTutAoM2006/

Open Source Data Tutorial, James Howison, Academy PDW 13 August 2006

What's available?
Project level data
- 'Demographics' (start date, license, etc.)
- Team (founder, roles, etc.)
- Communications (email lists, IRC, etc.)
- Code repositories and release history
Cross project data
- Project lists and counts
- Relative statistics (downloads, activity, etc.)

Ethical Issues with Data Use
- Action in public, intended to be shared and observed
  - But not for research … consider risks
- Anonymized data can easily be traced
- Should your research be available to the community it is based on?

Sources of open source data
- Manual collection & 'spidering'
- Academic data and analysis sets
  - Notre Dame's Sourceforge Dumps
  - FLOSSmole
  - CVSanalY
- Non-academic data and analysis sets
  - OpenBRR
  - Ohloh

Notre Dame Sourceforge dumps
- Greg Madey working with Sourceforge
  - Single interface to academic community
- Monthly dumps of (almost) entire Sourceforge database
  - 'Demographics'
  - Communications (except Mailing Lists!)
  - Bug Tracker details
- Contract with Madey's group needed
- Web form for SQL query, text file download
- Wiki recently set up for community interaction

FLOSSmole
- Collaborative group of academic researchers
- Collective spidering of Sourceforge, Rubyforge, Freshmeat and ObjectWeb
  - Scripts to collect mailing lists from Sourceforge
  - Some data from Savannah and Apache
- Web SQL interface, script access available on request
- Analysis scripts largely available
- Mailing list and blog for communication

CVSanalY
- Gregorio Robles and Libre Software Engineering project from Spain
- Scripts convert code repository (e.g. CVS) logs into relational database
  - "Who's contributed the most code?"
- MySQL dump of all Sourceforge projects available for download
- Scripts can run against any CVS server

Other sources
- Ohloh
  - "Objective metrics"
  - Contributor graphs, COCOMO cost estimates
- Open Business Readiness Rating
  - Attempt at systematic ratings of projects to be used in software specification
  - Aim to share ratings done by different organizations

Data difficulties
- Dirty data
  - Not all projects use all features of repositories
  - Many projects outside your scope (e.g. single person or 'dumped' school projects)
- Highly skewed data (sampling difficulties)
- Non-research data have response bias and low variance
  - Includes Freshmeat ratings or Sourceforge's 'trove' categories
- Manual creation of comparable sets, manual confirmation of data comparability

Integrating Data and Next steps
- Most studies use only one type of data
- I'm currently developing a 'Browser' which combines sources using a simple 'Actor' does 'Action' structure
- Data sharing is good, analysis script sharing is excellent :-)

References
Slides, References and links at: http://floss.syr.edu/presentations/FlossDataTutAoM2006/
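
Illustration (not from the slides): the CVSanalY slide describes loading repository logs into a relational database so that questions like "Who's contributed the most code?" become queries. The sketch below shows that general idea only. The table layout, column names, and sample commit records are hypothetical and are not CVSanalY's actual schema or output; SQLite stands in for MySQL so the example runs without a server.

```python
import sqlite3

# Hypothetical commit records, as might be extracted from a repository log:
# (author, file, lines_added, lines_removed). Invented for illustration.
SAMPLE_COMMITS = [
    ("alice", "src/main.c", 120, 10),
    ("bob", "src/util.c", 40, 5),
    ("alice", "src/util.c", 30, 0),
    ("carol", "README", 8, 2),
]


def build_db(commits):
    """Load commit records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commits ("
        "author TEXT, file TEXT, lines_added INTEGER, lines_removed INTEGER)"
    )
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?, ?)", commits)
    conn.commit()
    return conn


def top_contributors(conn):
    """Return (author, commit count, total lines added), largest first."""
    rows = conn.execute(
        "SELECT author, COUNT(*) AS n_commits, SUM(lines_added) AS added "
        "FROM commits GROUP BY author ORDER BY added DESC, author ASC"
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_db(SAMPLE_COMMITS)
    for author, n_commits, added in top_contributors(conn):
        print(author, n_commits, added)
```

Ranking by lines added is only one possible reading of "most code"; commit counts or net lines changed would rank the skewed contributor distributions mentioned under "Data difficulties" differently.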
