Open Source Geospatial
Barry Rowlingson
School of Health and Medicine,
Lancaster University
1
I just want to give you an outline of what open
source geospatial software is, what it can do for
you, and why you should use it.
Open Source Geospatial Software
What is Open Source Geospatial Software?
What is 'source'?
What is 'open'?
What is 'geospatial'?
I assume we all know what 'software' is.
2
We should start by defining our terms. I'll break the
main term down into parts, and I'll define the first
three since I'll assume you all know or at least have
a fair idea of what 'software' is.
Source code. What is it? Well, I'm going to show you
part of a computer program now, and I want you to
work out what it does...
Volts
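[Figure: micrograph of a processor chip, with a plot of voltage against time for address lines A0 and A1 and data lines D0 and D1]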
3
Here's a microscope picture of a computer processor chip,
and on the left is a plot of some voltages. You could stick
a probe on the contacts on this chip and read out these
voltages. When you run a program it is read from
memory into the CPU where the real computing
happens. A pattern of voltages on
the Address lines produces a pattern of voltages on the
Data lines for the data stored at that address. See how
the voltages are discretized in time? That's the processor
speed – in a 3GHz processor you're talking 3 billion cycles per
second.
Now we could change the program at this stage. With some
clever hardware piggybacked to the CPU it's possible to
change a high to a low voltage at just the right time to
modify the program. Some people actually do this, mainly
to work round encryption programs or other protection on
things like the Xbox or PlayStation. Clearly working with
volts and amps is not an easy way to change a program.
Let's convert those volts into another representation... Let's
call a high voltage a 1 and a low voltage a zero...
Bin
000000000000000000000000000000001110100011111100
111111111111111111111111100000111100010000000100
010110010101110110001101011000011111110011000011
000000000000000001001000011001010110110001101100
011011110010000001010111011011110111001001101100
011001000000000000000000010001110100001101000011
001110100010000000101000010001110100111001010101
001010010010000000110100001011100011000100101110
4
So I'm now looking at the program as a long
sequence of binary numbers. A very long string.
Somewhere in this string is all the logic of our
program. But as a binary string it's very easy to lose
our place when scanning it. So we'll break it into
groups of eight binary bits and convert those into
hexadecimal....
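As a rough sketch of that grouping step, here is a small C function that walks a string of 0s and 1s eight bits at a time and writes each group out as two hexadecimal digits. The name bits_to_hex and its error handling are my own choices for illustration, not part of any tool used to make these slides.

```c
#include <stddef.h>
#include <stdio.h>

/* Convert a string of '0'/'1' characters into hexadecimal text,
 * eight bits (one byte) at a time. Returns the number of bytes
 * converted, or -1 on bad input or if `out` is too small.
 * (Hypothetical helper, written for this example only.) */
int bits_to_hex(const char *bits, char *out, size_t out_size)
{
    size_t nbytes = 0;

    if (out_size == 0)
        return -1;                      /* no room even for the terminator */

    while (*bits != '\0') {
        unsigned value = 0;
        for (int i = 0; i < 8; i++) {
            if (bits[i] != '0' && bits[i] != '1')
                return -1;              /* not a whole byte of 0s and 1s */
            value = (value << 1) | (unsigned)(bits[i] - '0');
        }
        if (2 * nbytes + 3 > out_size)
            return -1;                  /* no room for two digits plus NUL */
        snprintf(out + 2 * nbytes, 3, "%02x", value);
        bits += 8;
        nbytes++;
    }
    out[2 * nbytes] = '\0';
    return (int)nbytes;
}
```

Fed the fourth row of the binary slide, this gives 000048656c6c: two zero bytes followed by 48 65 6c 6c, which are the character codes for 'Hell'.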
Hex
0000050 0034 0000 0000 0028 000b 0008 4c8d 0424 e483 fff0 fc71 8955 51e5 ec83 c704 2404 0000 0000 fce8 ffff
0000120 83ff 04c4 5d59 618d c3fc 0000 6548 6c6c 206f 6f57 6c72 0064 4700 4343 203a 4728 554e 2029 2e34 2e31
0000170 2033 3032 3730 3930 3932 2820 7270 7265 6c65 6165 6573 2029 5528 7562 746e 2075 2e34 2e31 2d32 3631
0000240 6275 6e75 7574 2932 0000 732e 6d79 6174 0062 732e 7274 6174 0062 732e 7368 7274 6174 0062 722e 6c65
0000310 742e 7865 0074 642e 7461 0061 622e 7373 2e00 6f72 6164 6174 2e00 6f63 6d6d 6e65 0074 6e2e 746f 2e65
0000360 4e47 2d55 7473 6361 006b 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000430 0000 0000 0000 0000 0000 0000 001f 0000 0001 0000 0006 0000 0000 0000 0034 0000 0026 0000 0000 0000
0000500 0000 0000 0004 0000 0000 0000 001b 0000 0009 0000 0000 0000 0000 0000 0368 0000 0010 0000 0009 0000
0000550 0001 0000 0004 0000 0008 0000 0025 0000 0001 0000 0003 0000 0000 0000 005c 0000 0000 0000 0000 0000
0000620 0000 0000 0004 0000 0000 0000 002b 0000 0008 0000 0003 0000 0000 0000 005c 0000 0000 0000 0000 0000
0000670 0000 0000 0004 0000 0000 0000 0030 0000 0001 0000 0002 0000 0000 0000 005c 0000 000c 0000 0000 0000
0000740 0000 0000 0001 0000 0000 0000 0038 0000 0001 0000 0000 0000 0000 0000 0068 0000 0041 0000 0000 0000
0001010 0000 0000 0001 0000 0000 0000 0041 0000 0001 0000 0000 0000 0000 0000 00a9 0000 0000 0000 0000 0000
0001060 0000 0000 0001 0000 0000 0000 0011 0000 0003 0000 0000 0000 0000 0000 00a9 0000 0051 0000 0000 0000
0001130 0000 0000 0001 0000 0000 0000 0001 0000 0002 0000 0000 0000 0000 0000 02b4 0000 00a0 0000 000a 0000
0001200 0008 0000 0004 0000 0010 0000 0009 0000 0003 0000 0000 0000 0000 0000 0354 0000 0013 0000 0000 0000
0001250 0000 0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000
0001320 0004 fff1 0000 0000 0000 0000 0000 0000 0003 0001 0000 0000 0000 0000 0000 0000 0003 0003 0000 0000
0001370 0000 0000 0000 0000 0003 0004 0000 0000 0000 0000 0000 0000 0003 0005 0000 0000 0000 0000 0000 0000
0001440 0003 0007 0000 0000 0000 0000 0000 0000 0003 0006 0009 0000 0000 0000 0026 0000 0012 0001 000e 0000
0001510 0000 0000 0000 0000 0010 0000 6800 6c65 6f6c 632e 6d00 6961 006e 7570 7374 0000 0014 0000 0501 0000
0001560 0019 0000 0902 0000
5
So here's the program in base-16 numbers,
hexadecimal, grouped into four hex digits each, and
on the left is a running count, or address.
These codes are a bit less abstract than the binary
numbers, since they correspond to instructions that
the processor can execute. But as well as being
instructions they can also be data.
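To see the 'data' side of that, take the words 6548 6c6c 206f 6f57 6c72 0064 from the second row of the dump. The sketch below assumes the dump prints 16-bit little-endian words, as you would expect on an x86 machine, so the low byte of each word is the one that comes first in memory. The function name words_to_bytes is made up for this example.

```c
#include <stddef.h>

/* Unpack 16-bit words, as printed in the dump, into bytes.
 * Assumes little-endian words (low byte stored first), as on x86.
 * `out` must have room for 2 * nwords bytes.
 * (Hypothetical helper, written for this example only.) */
void words_to_bytes(const unsigned short *words, size_t nwords,
                    unsigned char *out)
{
    for (size_t i = 0; i < nwords; i++) {
        out[2 * i]     = (unsigned char)(words[i] & 0xff);        /* low byte first */
        out[2 * i + 1] = (unsigned char)((words[i] >> 8) & 0xff); /* then the high byte */
    }
}
```

Unpacking those six words gives the twelve bytes of the text Hello World plus its terminating zero byte, so that stretch of the program is data, not instructions.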
So let's see what this program looks like as
instructions...
Assembly
.section .rodata
.LC0:
.string "Hello World"
.text
.globl main
.type main, @function
main:
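leal 4(%esp), %ecx
andl $-16, %esp          # align the stack to a 16-byte boundary
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $4, %esp
movl $.LC0, (%esp)       # pass the address of the string
call puts                # print it
addl $4, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp      # restore the original stack pointer
ret
6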
Finally we get to something almost readable. There's a
section of data, which contains the string 'Hello World',
and then a 'main' section which has these funny
instructions in it. This is Assembly language. You
use an Assembler program to convert this to binary.
Suppose I asked you to change this to say 'Good
Morning World', I'm pretty sure you'd know what
needed to be done. But you'd probably not want to
write a whole program like this. Some people do,
but these instructions are specific to a type of
processor, the common Intel x86 processor. To run on
a different processor, like a PowerPC, you need to
rewrite it with different instructions. Some code is
written in this language, but mainly for operating
systems or hardware support.
Mostly you'd write your program in another language and
use a compiler to create the assembly language.
Source
#include <stdio.h>
int main(void) {
    printf("Hello World\n");  /* print the greeting */
    return 0;
}
C (source code) -> compiler -> Assembly Language -> assembler -> Binary
7
Here's the same program written in C. You use a
compiler to convert the C to Assembler instructions,
then an Assembler to convert that into binary, then
the CPU reads that in and (as long as nobody is
poking the CPU with any dodgy voltages) the
program runs.
C is one of many high-level languages. They have big
advantages. You can easily see what they do, and they
are independent of hardware, since all you need is
a compiler that targets your CPU and you're
done.
We have just travelled upstream from the voltages in
the processor to their source – the C program. It's
the preferred form for changing the program.
So, what can you do if you've got the source?
Musical Source
8
Suppose you're a fan of Industrial rock band Nine
Inch Nails, but you think the bass is a bit rubbish
and you want to try and play the bass line yourself.
So you try turning the bass level down on your hifi.
But that gets rid of the bass drum as well, and the
grungy keyboard part.
What can you do?
Well... you could form a Nine Inch Nails tribute
band...
Nine Inch Fails
9
But you'd never sound as good as the original, and
you'd have to work out all the parts yourself.
Hmm that bass player looks familiar...
What you need is access to Trent Reznor's multitrack
tapes.
Multitrack
10
In a studio instruments are recorded on separate
tapes, then fed into a mixer to produce the final
recording. If you had these tapes, and a multitrack
mixer you could cut out the bass guitar. And plug
yourself in.
Musical Source
11
And then you can go 'listen, this is me playing bass
with Nine Inch Nails'. And you can actually do this.
One of the Nine Inch Nails DVDs contains the
individual instrument tracks.
Unfortunately most music in the shops today is sold
with a license that explicitly stops you from playing
with it in any way. You're not even allowed to copy
the original, and you are a million miles from the
multi-track tapes.
And back when software was in its infancy some
people realised that this kind of restriction on
software was going to seriously hinder the
development of good computer programs. So they
developed the philosophy of free and open source
software. Which is in effect like having access to
the master tapes of a recording.
Open Source Software
“I would love to change the
world, but they won't give
me the source code.”
12
Open Source means you're never just given a bunch
of binary numbers and told to run it. You get the
source.
Free Software means you are free to do stuff with it.
If it is Free and Open Source then you have
freedoms to do stuff with the source. Which is a
powerful thing to have. You could, for example,
improve the software by fixing bugs or adding new
features, you could even take parts of one program
and use them in your own fresh creations.
It makes remix mashups of software possible.
Foundations
Free Software! – Richard Stallman
Open Source! – Bruce Perens
13
Software philosophy has its personalities and its
schisms and disagreements. There's one faction,
coming from the somewhat hippy, libertarian strand
of the American political spectrum, that sees
software licensing as impinging on what should be
freedoms and liberties – these are the Free
Software people. The 'free' here refers to freedom,
and very rarely refers to 'free' as in zero-cost,
although that becomes part of the equation almost
in consequence of liberty.
Now obviously businesses are a bit frightened by
anything hippy and woolly like that, so some other
guys who wear suits and ties and cut their hair
turned this into 'Open Source', and worked on how
releasing your software with this kind of license can
benefit businesses, although it may seem like
giving something away.
Free and Open Source
Licensing (http://www.opensource.org/):
Free Redistribution
Source Code Availability
Derived Works
No Discrimination
No Tie-ins
Freedoms (http://www.fsf.org/):
Run the program for any purpose
Study and change the program
Copy the program
Share your changes
14
Summarised, it's like this. The OSI tend to work at
licensing, but the FSF look at freedoms. You don't
have to take one position or the other, there's a lot
of agreement between the two. It's just nice to be
aware that there are these two friendly factions in
the open source/free software movement.
Open Source Licenses
Approval by OSI allows you to call a license
'Open Source' – they own the trademark!
GNU General Public License (GPL)
Artistic License
Mozilla Public License (MPL)
BSD, New BSD, Modified BSD
'copyleft', 'All Rights Reversed'
15
One thing the OSI don't freely distribute is the right to
call something 'Open Source' – they own the
trademark!
They have a big list of licenses that are allowed to
call themselves 'open source', which they list on
their web site. You'll see free software with these
licenses and in some cases it may be important to
know the differences between them. But if you see
the 'OSI approved' logo then you know that what you've
got is something you're going to be free to use and distribute.
You'll see things like the GPL, the various BSD
licenses, puns like 'Copyleft' and 'All Rights
Reversed'.
Then there's 'Public Domain'. This is where you
relinquish any rights over a creative work. You don't
even have copyright on it any more.
It's not just programs
Programs are useless without data
Especially when that data are maps
Do you really want your data locked up in a
secret box that you rent a key for?
Really?
I didn't think so. What you really want is
Open Standards for your data
16
Now programs, especially mapping programs, are
useless without data.
So you have your data in data files which maybe you
get from somewhere. And they are in a file format
which is designed and controlled by a company that
doesn't tell you how the data is encoded. If you
don't keep buying their software, you won't be able
to read the data. And it's not just data you buy, it's
any files you save in that format. It's like having
your jewels locked away in a safe, but you're only
renting a key.
So as well as a push for open source programs,
there's a corresponding push for open file format
standards, so we can all use them, and we never
get locked out of our data.
...and communications
These days programs and data are useless
unless they can work together.
Do you really want your programs talking to
each other using a language that you only
have a lease for?
Really?
I didn't think so. What you really want is
Open Communication Standards
17
But I'll go a bit further. These days, programs and
data are useless unless they can work with other
programs and other data. And by 'work together', I
mean communicate.
Now, your programs need a common language to talk to
each other. And if
control of that language is by a non-public source,
and you are only licensed to use it, you could end
up having to pay to get programs talking to each
other. Just imagine if you needed a license to
speak English?
Just as nobody can stop you talking English, why
should anyone stop computer programs talking in
their way? We need open communication
standards.
Open Standards
EU Commissioner Erkki Liikanen: "Open standards are
important to help create interoperable and affordable
solutions for everybody. They also promote
competition by setting up a technical playing field that
is level to all market players. This means lower costs
for enterprises and, ultimately, the consumer."
W3C Director Tim Berners-Lee: "The decision to make
the Web an open system was necessary for it to be
universal. You can't propose that something be a
universal space and at the same time keep control of
it."
18
Open standards are so useful in all areas of business
that their existence pre-dates their use in
computers. The ISO is an organisation for ratifying
standards, from such things as paper size (such as
'A4'), machine screw specifications, business
processes, and so on. However getting ISO
certification can be a long and expensive process.
So who is going to decide what standards to use?
Because there's a problem with standards...
Open Source Software
“The great thing about
standards is that there are
so many to choose from”
19
Just imagine that when you built a house you could
have it connected to any number of electrical
standards – 240V, 110V, 50Hz, 60Hz – and then
had to buy appliances to match.... Surely this will
just lead to anarchy?
Organisation
Open Source?
20
Well, with the rise of cheap communication via the
internet, the Open Source world has organised
itself very well. There are a number of
organisations working on standards for things –
and very rarely do you find more than one
organisation working on different standards for the
same thing. Quality tends to win. The evidence
seems to be that working with common standards
is a win for everyone. And the more common the
standard, the bigger the win...
So who decides open standards in the world of maps
and spatial data?
The OGC
The Open Geospatial
Consortium, Inc.® (OGC)
is a non-profit,
international, voluntary
consensus standards
organization that is leading
the development of
standards for geospatial
and location-based
services.
21
The OGC is...all that.
Note that they develop standards. Basically
documents, designs, outlines, schemas.
And this is no fly-by-night project. There are some big
names putting money into open standards. Here are a
few of the big member companies who are putting
hundreds of thousands of dollars into the
organisation...
OGC members
Plus dozens of small companies, national and local
government agencies, universities, research institutes etc
22
Big names indeed.
You'll notice some commercial GIS companies, US
govt agencies, misc IT companies, some
aerospace companies and the US Dept of
Homeland Security....
Here I've arranged them in order of how evil I
personally feel they are, with evilness increasing
to the right...
So, what standards does the OGC work on?
OGC Standards
Catalogue Service, CityGML, Coordinate Transformation,
Filter Encoding, Geographic Objects, Geography Markup
Language, Geospatial eXtensible Access Control Markup
Language (GeoXACML), GML in JPEG 2000, Grid
Coverage Service, KML, Location Services (OpenLS),
Observations and Measurements, Sensor Model
Language, Sensor Observation Service, Sensor Planning
Service, Simple Features, Simple Features CORBA,
Simple Features OLE/COM, Simple Features SQL, Styled
Layer Descriptor, Symbology Encoding, Transducer
Markup Language, Web Coverage Service, Web Feature
Service, Web Map Context, Web Map Service, Web
Processing Service, Web Service Common
23
Here's a list of most of the set standards according to
the OGC web site.
As well as these formal standards they also have
various standards in development as well as more
general information.
But the OGC doesn't write any software. And the
members don't necessarily write open source
software. So who is using these standards and
writing open source software?
24
It's the internet. Here's the first map of the workshop,
and it's a map of the internet.
We're all on here, as is every major and minor
company on the planet. And millions of these
people are writing open source software, and of
those some are writing geospatial software.
Let's just look at a few more maps of the internet...
The Internet
25
This is the internet as an underground tube train
map. The bigger sites are bigger stations with more
interconnects, and the different lines connect web
sites with the same theme.
www.xkcd.com
26
Here's a more technical way of mapping the internet,
this time by using the underlying numeric IP address that
sits behind the more familiar domain name. This map
shows how large parts of the number range are
allocated to different organisations.
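If 'number range' sounds odd, it helps to remember that a dotted address is just one 32-bit number written a byte at a time. Here is a minimal sketch of that packing; the function name ipv4_to_number is invented for this example, and 192.0.2.1 is an address reserved for documentation, not a real site.

```c
#include <stdio.h>

/* Pack a dotted-quad IPv4 address such as "192.0.2.1" into one 32-bit
 * number. Returns 0 on success, -1 if the text is not exactly four
 * values in the range 0-255 separated by dots.
 * (Hypothetical helper, written for this example only.) */
int ipv4_to_number(const char *text, unsigned long *number)
{
    unsigned a, b, c, d;
    char extra;

    /* A fifth successful conversion means trailing junk after the address. */
    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;

    *number = ((unsigned long)a << 24) | ((unsigned long)b << 16)
            | ((unsigned long)c << 8)  |  (unsigned long)d;
    return 0;
}
```

With this, 192.0.2.1 becomes 3221225985, and every address starting with 10. falls in one contiguous run of numbers beginning at 167772160. That is why whole blocks of the range can be handed to one organisation.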
www.xkcd.com
27
Here's another more fanciful view. The internet as an
old-style treasure map!
Coming back to open-source software, what stops
the software developers degenerating into chaos?
Well, most programs have a fairly well-defined
hierarchy...
Project Structure
BDFL
Core
Developers
Users
The Program
28
In a community-developed open source project you'll often
find one person, whose idea it was, at the top of the tree,
as a BDFL, or Benevolent Dictator For Life. Working with
the BDFL you might find a small group of 'Core'
developers, and only the BDFL and the Core can change
the code in the official release.
You might also have a less structured and organised
clique of other developers who may not be able to
change the released program without submitting their
changes upstream to the Core.
And because it's open source, everyone can see the code,
so every user has the potential to be a developer. And
even users who can't write programs can help with
documentation, testing, sending bug reports, and so on.
But what if YOU don't like what the BDFL or Core are
doing? And they won't listen to your reason? Well, in that
case, because the code is open source, you can take it
and start your own version – this is called Forking.
Forking
Program X | Users | Program XX
29
Here's what forking looks like. One of the developers
from the original project on the left has taken a copy
and declared himself BDFL of the new project on the
right. He's also managed to take a few developers with
him. Now users have a choice of two programs to use,
and people can work on one or both.
Not only that, but the projects themselves can use each
other's code.
Forks aren't always hostile, sometimes there are two
valid ways of doing something, or perhaps it's an
opportunity for one fork to try something radical while
the other fork remains conservative. Sometimes forks
will merge back into one happy project again. And if a
fork doesn't get any users or developers, it withers
away. But it never really dies as long as the code is in
an archive somewhere...
So that's how projects tend to be organised. Is there a
bigger organisational picture?
Umbrella ella ella
30
http://www.flickr.com/photos/special
Yes!
There are a number of umbrella organisations that
promote the writing of open-source software.
Some are domain-specific, some will take on just
about anything.
They provide various services such as web sites and
mailing lists, publicity, conferences and so on.
Let's have a look at some...
HTTP Server * ActiveMQ * Ant * APR * Archiva * Beehive *
Cayenne * Cocoon * Commons * Continuum * CXF * DB *
Directory * Excalibur * Felix * Forrest * Geronimo * Gump *
Hadoop * Harmony * HiveMind * HttpComponents * iBATIS *
Incubator * Jackrabbit * Jakarta * James * Labs * Lenya *
Logging * Lucene * Maven * Mina * MyFaces * ODE * OFBiz *
OpenEJB * OpenJPA * Perl * POI * Portals * Roller * Santuario
* ServiceMix * Shale * SpamAssassin * STDCXX * Struts *
Synapse * Tapestry * TCL * Tiles * Tomcat * Turbine *
Tuscany * Velocity * Wicket * Web Services * Xalan * Xerces *
XML * XMLBeans * XML Graphics
31
Here's one you may or may not have heard of, but
you've definitely made use of. The Apache Software
Foundation supports all sorts of projects, mostly
those concerned with web site development. I
suspect about 90% of web sites on the internet use
these technologies somewhere.
They also organise a big conference where people
interested in these things gather to discuss it all.
http://savannah.gnu.org/
32
Remember the Free Software Foundation? Those
hippy guys who think software should be free for all
to use and modify and copy? Well they have a web
site set up to promote that. Anyone can register on
the site and then use the facilities to develop and
distribute their software. As long as you release
your software under an acceptable free license of
course.
The Savannah web site gives you a source code
management system, mailing list, bug tracking, web
pages and so on.
Sourceforge
33
A similar site with fewer philosophical restrictions is
SourceForge. They allow free registration and give
you source code management, mailing lists, bug
trackers etc, but are a bit more generous in the type
of software licenses they accept.
Also, the software that runs the SourceForge site isn't
open source itself, which has caused the Free
Software Foundation to have a bit of a screaming
fit. Savannah, of course, runs only with freely
available software in keeping with the FSF
philosophy.
So what does Open Source Geospatial Software
have?
The Open Source Geospatial Foundation
“Created to support and build the highest
quality open source geospatial software.
Our goal is to encourage the use and
collaborative development of community
led projects”.
34
For geospatial software development we have the
Open Source Geospatial Foundation, or OSGeo to
its friends.
OSGeo projects
Web Mapping: deegree, Mapbender, MapBuilder, MapGuide Open Source, MapServer, OpenLayers
Geospatial Libraries: FDO, GDAL/OGR, GEOS, GeoTools, MetaCRS
Metadata Catalog: GeoNetwork
Desktop Applications: GRASS GIS, OSSIM, Quantum GIS, gvSIG
Other Projects: Public Geospatial Data, Education and Curriculum
35
Here's the list of projects that OSGeo nurture. We'll
be looking at many of these in the rest of the
course. There's everything here for corporate and
research geospatial needs, and if you think there
isn't, you can use the tools to build it yourself – or
get a programmer in to do it.
Notice that they also lobby for public access to
geospatial data, and to education. And they've
organised the FOSS4G conferences for the past
few years...
FOSS4G
36
Been there, done that, quite literally got the T-shirt.
This is the biggest open source geospatial
conference in the world – FOSS4G. Over 500
geospatial experts gathered together for
workshops, lectures, code sprints, exhibitions, and
a whole lot of talking.
Conferences of course are a great way to really get
involved in the open source community. It's not
essential of course, you can work with people for
years on projects without actually meeting them,
but it is good to shake hands and buy them a beer.
And also it's a great opportunity for a bit of
sightseeing and photography...
37
www.rowlingson.com
There, a bit of blatant self-promotion. I should quit my
job and be a postcard photographer in Canada.
And it is the community of open source developers
that make open source programs what they are,
and you shouldn't forget that. It has been claimed
that because open source programming seems
counter to capitalism and corporate structures, that
using it and developing it is akin to communism, as
this slightly tongue in cheek poster illustrates...
38
But there is also the capitalist side of Open Source.
You can't sell the software, so sell the service, with
commercial support and training.
Commercial Support
Because sometimes
you can't rely on the
internet
OSGeo Service
Provider Directory
39
There are lots of companies making money from
open source. They sell training, customisation,
graphics, design, bespoke packages and so on.
They'll use open-source components so that you
can't ever get locked in to one provider.
And OSGeo provide a directory of experts willing to
take your money to do this kind of stuff for
geospatial services. For example you can search
for companies that will help you with OpenLayers in
England and you'll find a few - including a little outfit
called Oxford Archaeology...
But it's the community that makes open source what
it is. In fact, it's not communism. I prefer to think of
it like this:
40
Hopefully it's not a community of clones who look
quite as similar as this.
Here endeth the session...
After a short break we'll take a look at some of the
basic foundations of geospatial data itself...