Conceptual Framework for Agent-Based Modeling and Simulation:
The Computer Experiment


Yongqin Gao                 Vincent Freeh            Greg Madey
CSE Department              CS Department            CSE Department
University of Notre Dame    NCSU                     University of Notre Dame


                           NAACSOS Conference
                             Pittsburgh, PA
                              June 25, 2003

                           Supported in part by the
      National Science Foundation - Digital Society & Technology Program
The Computer Experiment
Agent-Based Simulation as a Component of the Scientific Method
[Diagram: Modeling (Hypothesis): Social Network Model of F/OSS; Observation: Analysis of SourceForge Data; Agent-Based Simulation (Experiment): Grow Artificial SourceForge]
                     Outline
•   Investigation: Free/Open Source Software (F/OSS)
•   Conceptual framework(s)
•   Model description
•   ER model
•   BA model
•   BA model with constant fitness
•   BA model with dynamic fitness
•   Summary
Open Source Software (OSS)
• Free …
   –   to view source
   –   to modify
   –   to share
   –   of cost
• Examples
   –   Apache
   –   Perl
   –   GNU
   –   Linux
   –   Sendmail
   –   Python
   –   KDE
   –   GNOME
   –   Mozilla
   –   Thousands more
Free Open Source Software (F/OSS)

• Development
  – Mostly volunteer
  – Global teams
  – Virtual teams
  – Self-organized - often peer-based meritocracy
  – Self-managed - but often a “charismatic” leader
  – Often large numbers of developers, testers, support help, end
    user participation
  – Rapid, frequent releases
  – Mostly unpaid
Typical Charismatic Leaders?
• Larry Wall: Perl
• Linus Torvalds: Linux
• Richard Stallman: GNU Manifesto
• Eric Raymond: The Cathedral and the Bazaar
F/OSS: Significance
• Contradicts traditional wisdom:
    –   Software engineering
    –   Coordination, large numbers
    –   Motivation of developers
    –   Quality
    –   Security
    –   Business strategy
• Almost everything is done electronically and available in digital form
• Opportunity for social science research: large amounts of online data available
• Research issues:
    –   Understanding motives
    –   Understanding processes
    –   Intellectual property
    –   Digital divide
    –   Self-organization
    –   Government policy
    –   Impact on innovation
    –   Ethics
    –   Economic models
    –   Cultural issues
    –   International factors
SourceForge
• VA Software
• Part of OSDN
• Started 12/1999
• Collaboration tools
• 58,685 Projects
• 80,000 Developers
• 590,000 Registered Users
Savannah
• Uses SourceForge Software
• Free Software Foundation
• 1,508 Projects
• 15,265 Registered Users
F/OSS: Importance
• Major component of the e-technology infrastructure, with a major presence in
   –   e-Commerce
   –   e-Science
   –   e-Government
   –   e-Learning
• Apache has over 65% market share of Internet Web servers
• Linux runs on over 7 million computers
• Most Internet e-mail runs on Sendmail
• Tens of thousands of quality products
• Part of the product offerings of companies like IBM and Apple
   –   Apache in WebSphere, Linux on mainframes, FreeBSD in OS X
   –   Corporate employees participating in OSS projects
   Free/Open Source Software
• Seems to challenge traditional economic assumptions
• Model for software engineering
• New business strategies
   –   Cooperation with competitors
   –   Beyond trade associations, shared industry research, and
       standards processes — shared product development!
• Virtual, self-organizing and self-managing teams
• Social issues, e.g., digital divide, international
  participation
• Government policy issues, e.g., US software industry,
  impact on innovation, security, intellectual property
Research Model
[Diagram: three components, "Conceptual Explanatory Model of OSS: Agent-Based Modeling and Simulation", "Combined Data Mining", and "Social Network Analysis: Longitudinal Study of Preferential Attachment and Dynamic Attachment", connected by "Parameter Values", "Structural Features", and "Cross Validation"; goal: "Understanding the Social and Task Dynamics that Predict Developer Behaviors"]
Data Collection – Monthly
• Web crawler (scripts)
    –   Python
    –   Perl
    –   AWK
    –   Sed
•   Monthly
•   Since Jan 2001
•   ProjectID
•   DeveloperID
•   Almost 2 million records
•   Relational database

Sample records:
PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975
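The slide gives the record layout (`PROJ|DEVELOPER`) but not the processing code. The following is a minimal sketch under the assumption that the input is pipe-separated rows like the sample above. It turns those rows into the developer network used in the rest of the talk: developers are nodes, and two developers are linked when they share a project. The function names are hypothetical. The sketch does not reproduce the authors' Python/Perl/AWK/Sed crawler or their relational database.

```python
from itertools import combinations

# Rows copied from the sample on the slide (header line first).
SAMPLE = """PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975"""


def parse_records(text):
    """Return (project_id, developer_id) pairs, skipping the header line."""
    pairs = []
    for line in text.strip().splitlines()[1:]:
        proj, dev = line.strip().split("|")
        pairs.append((int(proj), dev))
    return pairs


def developer_network(pairs):
    """Developers are nodes; two developers are linked if they share a project."""
    members = {}
    for proj, dev in pairs:
        members.setdefault(proj, set()).add(dev)
    adj = {}
    for devs in members.values():
        for dev in devs:
            adj.setdefault(dev, set())  # keep developers with no co-members as nodes
        for a, b in combinations(sorted(devs), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


if __name__ == "__main__":
    net = developer_network(parse_records(SAMPLE))
    for dev in sorted(net):
        print(dev, sorted(net[dev]))
```

Swapping the roles of the two columns gives the project network instead, in which projects are nodes linked by shared developers.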
F/OSS Developers - Social Network Component
Developers are nodes / Projects are links
24 Developers
5 Projects
2 Linchpin Developers
1 Cluster
[Figure: developer network of Projects 6882, 7028, 7597, 9859, and 15850]
Models of the F/OSS Social Network
     (Alternative Hypotheses)
• General model features
   – Agents are nodes on a graph (developers or projects)
   – Behaviors: Create, join, abandon and idle
   – Edges are relationships (joint project participation)
   – Growth of network: random or types of preferential
     attachment, formation of clusters
   – Fitness
   – Network attributes: diameter, average degree, power law,
     clustering coefficient
• Four specific models
   –   ER (random graph)
   –   BA (scale free)
   –   BA (+ constant fitness)
   –   BA (+ dynamic fitness)
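The general model features above describe agents on a growing graph with create, join, abandon, and idle behaviors. These can be sketched as a simple agent loop. This is a hypothetical illustration, not the authors' simulator:

- The behavior probabilities `P_CREATE`, `P_JOIN`, and `P_ABANDON` are assumptions, since the slides do not report them.
- All function names are assumptions.
- The attachment rule is a pluggable function, so the random (ER-style) rule shown here can be replaced by a preferential one.

```python
import random

# Hypothetical behavior probabilities; the slides do not report the values used.
P_CREATE, P_JOIN, P_ABANDON = 0.3, 0.5, 0.1  # the remaining 0.1 means "idle"


def uniform_choice(projects, rng):
    """ER-style (random) rule: every existing project is equally likely."""
    return rng.choice(list(projects))


def simulate(steps, choose_project=uniform_choice, seed=0):
    """Grow a developer-project network; returns {project_id: set of developer_ids}."""
    rng = random.Random(seed)
    projects = {}
    n_devs = 0
    for _ in range(steps):
        n_devs += 1  # one new developer agent enters per step
        for dev in range(n_devs):  # every agent acts once per step
            r = rng.random()
            # With no projects yet, the only possible action is to create one.
            if r < P_CREATE or not projects:
                projects[len(projects)] = {dev}
            elif r < P_CREATE + P_JOIN:
                projects[choose_project(projects, rng)].add(dev)
            elif r < P_CREATE + P_JOIN + P_ABANDON:
                mine = [p for p, devs in projects.items() if dev in devs]
                if mine:
                    projects[rng.choice(mine)].discard(dev)
            # otherwise the agent idles this step
    return projects


if __name__ == "__main__":
    net = simulate(30)
    print(len(net), "projects")
```

Keeping `choose_project` as a parameter means the ER and BA variants differ only in how a project is picked, while the agent behaviors stay fixed.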
ER model – degree distribution
• Degree distribution is binomial, while the empirical data follow a power law
• R² of the power-law fit to the empirical data: 0.9712 for the developer network, 0.9815 for the project network
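The slides report R² values without saying how they were computed. This sketch assumes one common approach: an ordinary least-squares line fitted to log P(k) against log k. Under that assumption, R² measures how close the degree distribution comes to a straight line (a power law) on log-log axes. The function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Map each degree k >= 1 to the fraction of such nodes with that degree."""
    counts = Counter(d for d in degrees if d > 0)  # log(0) is undefined, so drop k = 0
    n = sum(counts.values())
    return {k: c / n for k, c in sorted(counts.items())}


def power_law_r2(dist):
    """OLS fit of log P(k) = a + b log k; returns (slope b, R^2).

    Needs at least two distinct degrees, otherwise the slope is undefined.
    """
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return b, 1 - ss_res / ss_tot


if __name__ == "__main__":
    slope, r2 = power_law_r2(degree_distribution([1, 1, 1, 1, 2, 2, 3, 5]))
    print(f"slope={slope:.3f} R^2={r2:.4f}")
```

A binomial (ER) degree distribution bends away from a straight line on log-log axes, so under this reading it yields a visibly worse fit than power-law data.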
ER model – diameter
• Average degree decreases over time, while it increases in the empirical data
• Diameter increases over time, while it decreases in the empirical data
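Average degree and diameter can both be computed from an adjacency map. This sketch assumes an undirected graph stored as a dict of neighbor sets.

The SourceForge network contains many clusters, so the sketch defines the diameter as the longest finite shortest path and skips pairs that lie in different clusters. The slides do not state which convention the authors used, and the function names are hypothetical.

```python
from collections import deque


def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def eccentricity(adj, source):
    """Longest shortest-path distance from source within its component (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Largest eccentricity over all nodes; unreachable pairs are ignored."""
    return max(eccentricity(adj, u) for u in adj)


if __name__ == "__main__":
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    print(average_degree(path), diameter(path))
```

Running one BFS per node costs O(N·E). That is fine for monthly snapshots of modest size, but sampling or approximation would be needed for very large graphs.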
ER model – clustering coefficient
• Clustering coefficient is relatively low, around 0.4, versus around 0.7 in the empirical data
• Clustering coefficient decreases over time, while it increases in the empirical data
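The clustering coefficient of a node is the fraction of its neighbor pairs that are themselves connected. The network value reported above is presumably an average over nodes. This sketch makes two assumptions:

- The network value is that average (Watts–Strogatz style).
- Nodes with fewer than two neighbors score 0, which is only one of several conventions.

The function names are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, u):
    """Fraction of pairs of u's neighbors that are linked to each other."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for degree-0 and degree-1 nodes
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, u) for u in adj) / len(adj)


if __name__ == "__main__":
    g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
    print(round(average_clustering(g), 4))
```

In a developer network, every project's members form a fully linked group. That tends to push clustering up, which is consistent with the higher empirical values than the ER model produces.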
ER model – cluster distribution
• Cluster distribution in the ER model also follows a power law, with R² = 0.6667 (0.9953 without the major cluster), versus R² = 0.7457 (0.9797 without the major cluster) in the empirical data
• The actual distribution still differs from the empirical data
• The later models (BA and its fitness variants) behave similarly
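The sketch below reads "cluster" as a connected component, which is an assumption. It counts clusters by size and can optionally drop the single largest (major) cluster, mirroring the with/without comparison above. The function names are hypothetical.

```python
from collections import Counter


def component_sizes(adj):
    """Sizes of connected components (clusters), found by iterative DFS."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return sizes


def cluster_distribution(adj, drop_major=False):
    """Map cluster size -> number of clusters, optionally without the largest one."""
    sizes = sorted(component_sizes(adj), reverse=True)
    if drop_major and sizes:
        sizes = sizes[1:]
    return dict(Counter(sizes))


if __name__ == "__main__":
    g = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set(), 7: set()}
    print(cluster_distribution(g), cluster_distribution(g, drop_major=True))
```

Dropping the major cluster matters for the fit because the one giant component is an outlier far from the many small clusters.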
BA model – degree distribution
• Power laws in the degree distribution, similar to the empirical data (+ for simulated data, x for empirical data)
• Developer distribution: R² = 0.9798 for simulated data vs. 0.9712 for empirical data
• Project distribution: R² = 0.6650 for simulated data vs. 0.9815 for empirical data
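Preferential attachment, the core of the BA model, picks an existing node with probability proportional to its degree. The sketch below is the classic one-mode BA growth process started from a small seed clique. The parameter `m` (links per new node), the seed clique, and the function names are assumptions. The authors' developer-project variant may differ.

```python
import random


def preferential_choice(degree, rng):
    """Pick a node with probability proportional to its degree (BA rule)."""
    nodes = list(degree)
    weights = [degree[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def grow_ba(n, m=2, seed=0):
    """Classic BA growth: each new node links to m distinct existing nodes."""
    rng = random.Random(seed)
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}  # seed clique
    degree = {i: len(adj[i]) for i in adj}
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # redraw until m distinct targets are found
            targets.add(preferential_choice(degree, rng))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            degree[t] += 1
        degree[new] = m
    return adj


if __name__ == "__main__":
    g = grow_ba(1000)
    print(max(len(v) for v in g.values()), "= largest degree")
```

Weighting by degree gives the "rich get richer" effect that produces the power-law tail, which a uniform (ER-style) choice cannot produce.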
BA model – diameter and CC
• Small diameter and high clustering coefficient, like the empirical data
• Diameter and clustering coefficient both decrease, like the empirical data
BA model with constant fitness
• Power laws in the degree distribution, similar to the empirical data (+ for simulated data, x for empirical data)
• Developer distribution: R² = 0.9742 for simulated data vs. 0.9712 for empirical data
• Project distribution: R² = 0.7253 for simulated data vs. 0.9815 for empirical data
• Diameter and CC are similar to the simple BA model
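A common way to add fitness to the BA model is the Bianconi–Barabási rule, under which node i is chosen with probability proportional to fitness_i × degree_i. This sketch assumes that form. It also assumes each node's fitness is drawn once from (0, 1] and never changes. The slides confirm neither choice, and the function names are hypothetical.

```python
import random


def fitness_choice(degree, fitness, rng):
    """Pick node i with probability fitness_i * degree_i / sum_j fitness_j * degree_j."""
    nodes = list(degree)
    weights = [fitness[n] * degree[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def grow_with_fitness(n, m=2, seed=0):
    """BA growth where selection is weighted by constant fitness times degree."""
    rng = random.Random(seed)
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}  # seed clique
    degree = {i: m for i in adj}
    # Constant fitness: drawn once per node from (0, 1] and never updated (assumed).
    fitness = {i: 1.0 - rng.random() for i in adj}
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(fitness_choice(degree, fitness, rng))
        adj[new] = targets
        for t in targets:
            adj[t].add(new)
            degree[t] += 1
        degree[new] = m
        fitness[new] = 1.0 - rng.random()
    return adj, fitness


if __name__ == "__main__":
    g, fit = grow_with_fitness(500)
    top = max(g, key=lambda i: len(g[i]))
    print("best-connected node", top, "fitness", round(fit[top], 3))
```

With fitness, a late but "fit" node can overtake older nodes. Because the fitness never changes, though, a node's attractiveness never fades.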
BA model with dynamic fitness
• Power laws in the degree distribution, similar to the empirical data (+ for simulated data, x for empirical data)
• Developer distribution: R² = 0.9695 for simulated data vs. 0.9712 for empirical data
• Project distribution: R² = 0.8051 for simulated data vs. 0.9815 for empirical data
  Advantage of BA with dynamic fitness

• Intuition: fitness should decrease with time.
• Statistics: projects show life-cycle behavior, which cannot be replicated by the BA model with constant fitness but can be replicated by the BA model with dynamic fitness
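The slides say fitness should decrease with time but do not give the decay law. This sketch assumes exponential decay, fitness(t) = fitness0 · exp(-decay · age). Under that assumption, an old node (read here as a project) gradually stops attracting new links, which is one way to produce a life cycle. The decay rate, the initial-fitness distribution, and the function names are all hypothetical.

```python
import math
import random


def dynamic_fitness(eta0, birth, now, decay=0.05):
    """Hypothetical exponential decay: fitness shrinks as a node ages."""
    return eta0 * math.exp(-decay * (now - birth))


def grow_with_dynamic_fitness(n, m=2, decay=0.05, seed=0):
    """BA growth weighted by degree times a fitness that decays with node age."""
    rng = random.Random(seed)
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}  # seed clique
    degree = {i: m for i in adj}
    eta0 = {i: 1.0 - rng.random() for i in adj}  # initial fitness in (0, 1]
    birth = {i: 0 for i in adj}
    for now in range(m + 1, n):  # node id doubles as its birth time
        nodes = list(degree)
        weights = [degree[i] * dynamic_fitness(eta0[i], birth[i], now, decay)
                   for i in nodes]
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(nodes, weights=weights, k=1)[0])
        adj[now] = targets
        for t in targets:
            adj[t].add(now)
            degree[t] += 1
        degree[now] = m
        eta0[now] = 1.0 - rng.random()
        birth[now] = now
    return adj


if __name__ == "__main__":
    g = grow_with_dynamic_fitness(300)
    print(sum(len(v) for v in g.values()) // 2, "edges")
```

Setting `decay=0` recovers the constant-fitness case, so the two variants can be compared with the same code.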
          Conceptual Framework
   Agent-Based Modeling and Simulation
  as Components of the Scientific Method
[Diagram: Hypothesis, Observation, Experiment]
                 Summary
• We use ABM to model and simulate the SourceForge collaboration network.
• A conceptual framework is proposed for agent-based modeling and simulation.
• Case study of this framework: the SourceForge network, modeled with ER, BA, BA with constant fitness, and BA with dynamic fitness.
Thank you