Developing open source GIS: what are the challenges?
Gilberto Câmara
INPE – Brazil
www.terralib.org

Institute for Geoinformation – TU Wien – 16 June 2004
The Promise of Open Source

   When an OSS project reaches a “critical size”, we obtain
    many benefits
   Robustness
       “Given enough eyeballs, all bugs are shallow.”
   Cooperation
       “Somebody finds the problem and somebody else understands
        it” (Linus Torvalds)
   Continuous Improvement
       “Treating your users as co-developers is your least-hassle route
        to rapid code improvement and effective debugging”
Naïve view of open source projects

   Software
       Product of an individual or small group (peer-pressure)
       Based on a “kernel” with “plausible promise”
   Development network
       Large number of developers, single repository
   Open source products
       Viewed as complex, innovative systems (Linux)
   Incentives to participate
       Operate at an individual level (“self-esteem”)
       Wild-west libertarian (“John Waynes of the modern era”)
Idealized model of OS software

   Networks of committed individuals
The Reality of Open Source


   Previous existence of conceptual designs of similar
    products (the potential for reverse engineering)
       Design is the hardest part of software (Fred Brooks)



   Problem granularity (the potential for distributed
    development)
       Effective peer-production requires high granularity
Potential for Reverse Engineering

   Post-mature
       A private company develops a software product.
       Product becomes popular and it becomes part of the “public
        commons”.
       Others develop a public domain equivalent (e.g., OpenOffice)


   Standards-led
       Standards consolidate a technology
       Allow compatible solutions to compete in the marketplace.
       SQL database standard (e.g., MySQL and PostgreSQL).
       POSIX standard (guidance to Linux)
       OpenGIS specifications (e.g., deegree, MapServer, GeoServer)
Potential for Distributed Development

   Parts of a software product
       kernel and additional functions that use it (its periphery).


   Operating systems (Linux)
       well-defined kernel for process control
       periphery consisting of programs such as device drivers,
        applications, compilers and network tools.


   Database management systems
       strong kernel of highly integrated functions (such as the parser,
        scheduler, and optimizer)
       much smaller periphery.
Potential for Distributed Development

   Each type of software product has its own periphery/kernel ratio
       This ratio constrains the potential for distributed development


   Kernel
       Requires a tightly organized, highly skilled programming team.
   Periphery
       Open to a wider pool of programmers with varied skills


   Example
        Out of more than 400 developers, the top 15 programmers of
        the Apache web server contribute 88% of added lines (Mockus,
        2002).
Four Types of Open Source Software


   High reverse engineering, high distribution potential

   High reverse engineering, low distribution potential

   Low reverse engineering, high distribution potential

   Low reverse engineering, low distribution potential
Type 1 – High-High

   High reverse engineering, high distribution potential:

   Archetypical open source projects
       The “Linux” model.


   Developers
       May have a separate job
       Time allocated in agreement with their employer.


   community-led projects.
Type 2 – High-Low

   High reverse engineering, low distribution potential

   Large number of projects
        Databases, office automation tools, web services.

   Large presence of private companies
      products similar to market leaders.
      reduced risk in reverse engineering.
      main design decisions take place within the institution


   Examples
        MySQL and PostgreSQL DBMS,
        GNOME from Ximian

   corporation-led projects.
Type 3 – Low-High

   Low reverse engineering, high distribution potential

   Stable kernel, innovative periphery
       usually there is no commercial counterpart
       share a relatively simple software kernel

   Origin
       academic environments
   Examples
       GRASS GIS software and the R suite of statistical tools.

   collaborative projects
Type 4 – Low-Low

   Low reverse engineering, low distribution potential

   Innovative kernel, small periphery

   Small teams under a public R&D contract
       addressing specific requirements
       aiming to demonstrate novel scientific work.


   High mortality rate
       most of them are restricted to the lifetime of a research grant.


   innovative products.
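Figure: quadrant chart of open source projects
   Vertical axis: potential for reverse engineering
   Horizontal axis: potential for distributed development
   High-Low: MySQL, PostgreSQL, OpenOffice
   High-High: Linux, perl, Apache (close to the centre of the chart)
   Low-High: GRASS, R
   Low-Low: Postgres, NCSA browser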
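Figure: the four quadrants and their project types
   Vertical axis: potential for reverse engineering
   Horizontal axis: potential for distributed development
   High-Low: corporate
   High-High: community-led
   Low-Low: innovative
   Low-High: collaborative
   Challenges?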
Lessons from Open Source Projects

   “It's fairly clear that one cannot code from the ground up
    in bazaar style. One can test, debug and improve in
    bazaar style, but it would be very hard to originate a
    project in bazaar mode. Linus didn't try it. Your nascent
    developer community needs to have something runnable
    and testable to play with” (Eric Raymond)
Moving from the Low-Low Quadrant

   Software in the “Low-Low” quadrant
       Unsustainable in the long run


   Moving from an innovative to a collaborative project
       Sharing innovation
       Transforming a crude prototype into a modular, well designed
        system


   How do you build innovation into a modular design?
Moving from the Low-Low Quadrant

   “Perfection in design is achieved not when there is
    nothing more to add, but rather when there is nothing
    more to take away” (Saint-Exupéry)

   How do you achieve perfection in information science?
       Good scientific foundation
       Usually, sound mathematical abstractions


   What is the situation in GIS?
Do we have a solid foundation for GIS?


   Relational databases
       Data: relations (e.g., id, name, year)
       Algebra: relational algebra (selection, projection, cartesian product,
        union, difference)
       Language: SQL query language (SELECT name FROM faculty WHERE year > 1960)
   GIS
       Data: spatio-temporal data types
       Algebra: spatial algebra (operations on ST types)
       Language: GIS language (?)
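
The relational side of this analogy can be sketched in a few lines of Python. This is only an illustration: the sample rows and the helper names `select` and `project` are hypothetical, and only the query itself comes from the slide. Selection and projection are plain functions over rows, and the SQL statement is their composition.

```python
# Hypothetical sample rows for the relation faculty(id, name, year).
faculty = [
    {"id": 1, "name": "Ana", "year": 1955},
    {"id": 2, "name": "Bruno", "year": 1962},
    {"id": 3, "name": "Carla", "year": 1971},
]


def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes of each row.

    Like SQL's SELECT (without DISTINCT), duplicates are not removed.
    """
    return [{a: row[a] for a in attributes} for row in relation]


# SELECT name FROM faculty WHERE year > 1960
result = project(select(faculty, lambda row: row["year"] > 1960), ["name"])
print(result)
```

The open question on the slide is what the equivalent of `select` and `project` would be for spatio-temporal data types.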
Challenges for geoinformation
   Source: Ghassem Asrar (NASA)
The Road Ahead: Smart Sensors
   Smart Dust: autonomous sensing and communication in a cubic millimeter
   Source: Univ. Berkeley, SmartDust project
Knowledge gap for spatial data
   Source: John McDonald (MDA)
What’s the Current Status of Open Source GIS?
   High-Low products
     Standards-based
     Spatial DBMS: MySQL, PostgreSQL
     OpenGIS + Web: MapServer, deegree


   Low-High products
     Stable kernel, innovation at the periphery
     GRASS and R


   What about GIScience challenges?
       spatio-temporal data models, geographical ontologies, spatial statistics
        and spatial econometrics, dynamic modelling and cellular automata,
        environmental modelling, neural networks for spatial data
TerraLib: Open source GIS library

   Data management
       All data (spatial + attributes) are stored in the database
   Functions
       Spatial statistics, Image Processing, Map Algebra
   Innovation
       Based on state-of-the-art techniques
       Available at the same time as similar commercial products
   Web-based co-operative development
       http://www.terralib.org
Operational Vision of TerraLib



   Figure: a geographic application sends spatial operations to TerraLib
    (an API for spatial operations), which sends spatial operations to a
    DBMS: Oracle Spatial, Access, MySQL or PostgreSQL

 TerraLib  MapObjects + ArcSDE + cell spaces + spatio-temporal models
TerraLib applications

   Cadastral Mapping
        Improving urban management of large Brazilian cities
   Public Health
        Spatial statistical tools for epidemiology and health services
   Social Exclusion
        Indicators of social exclusion in inner-city areas
   Land-use change modelling
        Spatio-temporal models of deforestation in Amazonia
   Emergency action planning
        Oil refineries and pipelines (Petrobras)
TerraCrime
Palm-top
Examples of Web Products
TerraLib Structure
   Figure: layered architecture, from top to bottom
       Interfaces: Java Interface, COM Interface, C++ Interface, OGIS Services
       Functions
       Kernel: Visualization Controls, Spatio-Temporal Data Structures,
        File and DBMS Access
       I/O Drivers
       External Files, DBMS
Spatio-Temporal Data Types
   Events
       Near in space, near in time?
       Figure: events plotted against the axes x, y and time
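
One way to make the question "near in space, near in time?" operational is to list the pairs of events that fall within both a distance threshold and a time lag. The sketch below is a minimal illustration, not a TerraLib function: the event tuples, the thresholds and the name `close_pairs` are all assumptions.

```python
from itertools import combinations
from math import hypot

# Hypothetical events as (x, y, t) tuples; units are arbitrary.
events = [(0.0, 0.0, 0), (1.0, 0.0, 1), (10.0, 10.0, 1), (10.5, 10.0, 30)]


def close_pairs(events, max_dist, max_lag):
    """Return the pairs of events that are near in both space and time."""
    pairs = []
    for a, b in combinations(events, 2):
        near_in_space = hypot(a[0] - b[0], a[1] - b[1]) <= max_dist
        near_in_time = abs(a[2] - b[2]) <= max_lag
        if near_in_space and near_in_time:
            pairs.append((a, b))
    return pairs


pairs = close_pairs(events, max_dist=2.0, max_lag=5)
print(len(pairs))
```

With these sample values, the last two events are close in space but far apart in time, so they do not count as a pair.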
Dynamical Spatial Model


f(I(t)) --F--> f(I(t+1)) --F--> f(I(t+2)) ... f(I(tn))


“A dynamical spatial model is a mathematical
representation of a real-world process when a location
changes in response to external forces” (Burrough)
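
The repeated application of a transition function F can be illustrated with a very small cell-space simulation. This is a sketch under stated assumptions: the rule (a cell changes once at least `threshold` of its four neighbours have changed) and the names `step` and `simulate` are hypothetical, chosen only to show the state at t+1 being computed from the state at t.

```python
def step(grid, threshold=1):
    """Apply F once: a cell becomes 1 if enough of its 4 neighbours are 1."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                continue  # changed cells stay changed
            neighbours = 0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    neighbours += grid[rr][cc]
            if neighbours >= threshold:
                new[r][c] = 1
    return new


def simulate(grid, n_steps):
    """Return the list of states I(t), I(t+1), ..., I(t+n_steps)."""
    states = [grid]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


initial = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
states = simulate(initial, 2)
for state in states:
    print(state)
```

Each call to `step` reads only the previous state, so the whole trajectory is determined by the initial grid and the rule.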
Spatial Simulation




   Figure panels: S2; Reality - Bauru in 1988; S3
Cell Spaces: Old Wine, New Bottle
Regression with Spatial Data: Understanding Deforestation in Amazonia
   Future Deforestation Scenarios
       Hot-spots map for new deforestation
       Regions labelled on the map: Terra do Meio, South of Amazonas State
Modelling anisotropic space




      Spatial relations in Amazonia are not isotropic!
Designing for Extensibility

   Algorithms
       Form the basic core of most successful GIS
       A large number of them do not depend on a particular
        implementation of a data structure
       They rely on only a few fundamental semantic properties of the structure
       Such properties include, for example, the ability to get from one
        element of the data structure to the next, and to compare two
        elements of the data structure.
   Spatial analysis algorithms
       Can be abstracted away from a particular data structure and
        described only in terms of these properties.
Same Algorithm, Different Geometries
Generic GIS Programming

   How to decouple algorithms from data structures?
       Idea: Iterators (“intelligent pointers”)
       Algorithms are not classes!
       “Decide which algorithms you want; parametrize them so
        they work for a variety of suitable types and data
        structures”

   Figure: iterators sit between algorithms and geometries
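
The idea can be sketched in Python, where the iterator protocol plays the role of the "intelligent pointer". This is an illustration only, not TerraLib code: the classes `Line` and `CellSpace` and the function `bounding_box` are hypothetical. The algorithm is a plain function that needs just two things from a geometry: a way to move to the next element, and the ability to compare coordinates.

```python
from typing import Iterable, Iterator, Tuple

Coord = Tuple[float, float]


class Line:
    """Hypothetical geometry that stores its vertices in a list."""

    def __init__(self, vertices):
        self.vertices = list(vertices)

    def __iter__(self) -> Iterator[Coord]:
        return iter(self.vertices)


class CellSpace:
    """Hypothetical regular grid that yields cell centres on demand."""

    def __init__(self, rows, cols, cell_size=1.0):
        self.rows, self.cols, self.cell_size = rows, cols, cell_size

    def __iter__(self) -> Iterator[Coord]:
        for r in range(self.rows):
            for c in range(self.cols):
                yield ((c + 0.5) * self.cell_size, (r + 0.5) * self.cell_size)


def bounding_box(coords: Iterable[Coord]):
    """Generic algorithm: uses only 'next element' and comparison."""
    it = iter(coords)
    x, y = next(it)  # raises StopIteration for an empty geometry
    xmin = xmax = x
    ymin = ymax = y
    for x, y in it:
        xmin, xmax = min(xmin, x), max(xmax, x)
        ymin, ymax = min(ymin, y), max(ymax, y)
    return (xmin, ymin, xmax, ymax)


line_box = bounding_box(Line([(0, 0), (2, 3), (-1, 1)]))
grid_box = bounding_box(CellSpace(rows=2, cols=3))
print(line_box, grid_box)
```

The same `bounding_box` function runs unchanged on a vector geometry and on a cell space, because neither the storage layout nor the class hierarchy is visible to it.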
Scientific Challenges for Innovation in GIS

   How can we design an algebra for ST types?
       What are the spatio-temporal data types?


   How do we design a language for spatial modelling?
       Requires a characterization of measurements
       Cognitively meaningful interfaces


   Representation of Space
       How do we represent anisotropic space?

   Extensibility of Models and Algorithms
       How do we design for extensibility?
Why am I here today at TU Wien?

   Innovation in GISystems
       Requires addressing challenges in GIScience


   Cooperation with Prof. Andrew Frank
       Generic GIS Programming
       Semantics of Geographical Measurements
       Spatio-Temporal Types and Algebras
       Methods for Representation of Anisotropic Space
Result of Sound Scientific Work
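   Figure: the quadrant chart of open source projects, with TerraLib added
       Vertical axis: potential for reverse engineering
       Horizontal axis: potential for distributed development
       High-Low: MySQL, PostgreSQL, OpenOffice
       High-High: Linux, perl, Apache (close to the centre of the chart)
       Low-High: GRASS, R
       Low-Low: Postgres, NCSA browser
       TerraLib: shown in the lower half of the chart, below NCSA browser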
Conclusions

   Open Source software model
       The Linux example is not applicable to all situations
       Moving from the individual level to the organization level


   Geoinformation
       Innovative open source GIS software has a large role
       Sound research is needed to support innovation


   Cooperation in GIScience is fundamental
       The problem is enormous and requires combined R&D efforts
       There are few R&D groups
       Cooperation is the only way to ensure a future for GIScience