ROADS - First Impressions

                                              ROADS - First Impressions - V0.1 - 21st. February, 1996
                                                            Mark S. Burrell, ADAM Technical Officer

              This is an INTERNAL ADAM document - NOT for external distribution



1. ROADS - First Impressions
SP1f9602

1.1. Introduction
ROADS V0.25 has been installed on EDEN, and a couple of example records have been
added (one to point to the simple UNIX guide, the other to point to the EDEN homepage).

1.2. Accessing ROADS
To add a record (‘template’ in their language) point your browser to
       http://eden.unn.ac.uk/cgi-bin/mktemp.pl

To search the (rather limited!) database point to
       http://eden.unn.ac.uk/cgi-bin/search.pl

If you want to view the actual storage, then the completed templates are stored in
       /web/ROADS/templates/

while the inverted index is within
       /web/ROADS/inverted/

1.3. File Structure
ROADS does not use any form of database file to store record or index information. Instead
it uses the UNIX file structure coupled with flat ASCII files as storage. This is a very
unusual method of storage and very much trades disk space for speed.

The UNIX file structure has very fast access times (after 20 years of development it should be
fast) but this approach can be wasteful of disk space. For us this is not currently a
problem as we have about 9 gigabytes of free disk space at the moment.

When a record is added, this is what happens. An ID for the record is automatically created
and the new record with the ID as its filename is created in the templates directory. On
EDEN this is at

       /web/ROADS/templates

Each search term (as far as I can see, this is every word of text stored within the record) is
then added to the index. For example, if we are indexing the word ‘computer’, this is what happens:

• Check to see if there is a directory within the /web/ROADS/inverted directory that’s called
  ‘co’ (this in ROADS fancy terms is called a bigraph directory).
• If the directory ‘co’ does not exist then create it.
• Enter the directory /web/ROADS/inverted/co
• Check to see if a file called ‘computer’ exists.
• If it doesn’t exist then create it.
• Append the ID of the template to the file.

Technical or what. In other words, check in the file called

       /web/ROADS/inverted/co/computer

to find the IDs of all the records that include the word ‘computer’. By the way, if the word
had been ‘Computer’ the file name would also be ‘Computer’. In other words, if we were
doing a case-insensitive search we would search the files ‘computer’ and ‘Computer’ - both
would be in the sub-directory ‘co’.
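
The storage, indexing and lookup steps described above can be sketched roughly as follows. This is written in Python purely for illustration (ROADS itself is a set of Perl scripts), and it is not taken from the ROADS source: the function names, the rule used to split text into words, and the lower-casing of the two-letter directory name are all assumptions made so that the example is complete.

```python
import os
import re


def bigraph_dir(root, word):
    # Assumption: the two-letter directory name is lower-cased, so that
    # 'computer' and 'Computer' both end up under 'co' as described above.
    return os.path.join(root, "inverted", word[:2].lower())


def index_word(root, word, record_id):
    """Append record_id to the file named after the word."""
    directory = bigraph_dir(root, word)
    os.makedirs(directory, exist_ok=True)  # create 'co' if it does not exist
    # Append mode creates the word file if it does not exist yet.
    with open(os.path.join(directory, word), "a") as f:
        f.write(record_id + "\n")


def add_record(root, record_id, text):
    """Store the record under its ID, then index every word in it."""
    templates = os.path.join(root, "templates")
    os.makedirs(templates, exist_ok=True)
    with open(os.path.join(templates, record_id), "w") as f:
        f.write(text)
    # Assumption: a 'word' is any run of letters and digits.
    for word in set(re.findall(r"[A-Za-z0-9]+", text)):
        index_word(root, word, record_id)


def lookup(root, word, ignore_case=False):
    """Return the set of record IDs indexed under the word."""
    directory = bigraph_dir(root, word)
    if not os.path.isdir(directory):
        return set()
    ids = set()
    for name in os.listdir(directory):
        if ignore_case:
            matches = name.lower() == word.lower()
        else:
            matches = name == word
        if matches:
            with open(os.path.join(directory, name)) as f:
                ids.update(line.strip() for line in f if line.strip())
    return ids
```

Calling add_record with a scratch directory as root builds the templates and inverted directories underneath it; lookup(root, "computer", ignore_case=True) then reads both the ‘computer’ and ‘Computer’ files from the ‘co’ directory.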

1.4. ROADS Templates
As you all know, these are the IAFA templates. It seems easy enough to add extra fields or
change the templates as required - we don’t need to stick with the defaults.

1.5. ROADS Current Status
ROADS is very much in its early stages, and a number of useful features are not
yet in the current version. Currently ROADS cannot handle multiple thesauri (this is in the
ROADS v1 draft), nor can it handle distributed databases (this is aimed for inclusion in
ROADS v2).

Z39.50 compliance is currently being investigated by the ROADS team, but at present
ROADS is not Z39.50 compliant. By the way, Z39.50 is just another fancy name for a
particular protocol that works within the application layer of the ISO OSI (Open Systems
Interconnection) model (it has 7 layers - going from lowest level to highest level they are the
physical, data link, network, transport, session, presentation and application layers. And yes,
I do have this sort of rubbish in my head - good mnemonics help). So think of the application
layer as a sort of information processing layer. Whois++ is just another protocol that does a
similar job (actually, it does (did?) a narrower job - but has branched out). Providing
compliance between different protocols is not technically difficult (more of a pain, really);
the thing you have to watch, though, is ‘matching up template attributes to official Z39.50
profiles’ (according to Jon Knight).

Link checking is in the draft of ROADS v1. Even if it wasn’t it would be relatively easy to
write a Perl script that would do this job.
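
To give an idea of the size of such a script, here is a minimal sketch, in Python rather than Perl for illustration only. It is not how the ROADS v1 draft does link checking: the function names are hypothetical, and it assumes that any http or https URL appearing anywhere in a template's text is a link worth checking.

```python
import re
import urllib.error
import urllib.request

# Assumption: a link is anything that looks like an http(s) URL.
# Trailing punctuation next to a URL may be picked up by this pattern.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(template_text):
    """Return every http(s) URL found in a template's text."""
    return URL_PATTERN.findall(template_text)


def http_status(url, timeout=10):
    """Send a HEAD request; return the status code, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error code
    except (urllib.error.URLError, OSError, ValueError):
        return None  # no answer at all (bad host, timeout, malformed URL)


def check_links(templates, fetch=http_status):
    """templates maps record ID to template text.

    Returns (record_id, url, status) for every link that failed.
    """
    broken = []
    for record_id, text in sorted(templates.items()):
        for url in extract_urls(text):
            status = fetch(url)
            if status is None or status >= 400:
                broken.append((record_id, url, status))
    return broken
```

The fetch argument is there so that the checking logic can be tried out against a table of canned responses without touching the network.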

The performance of the current ROADS access methods, according to Jon Knight, looks
reasonable - even on large databases. They are talking about moving to a compiled language
when the Perl scripts start to get too large. One other thing of interest that they are talking
about is allowing ‘other backend databases to be plugged in’ (Jon Knight).

1.6. Could We Add Charging for Usage?
Yes. We could set up the server such that a username/password is required for access. Use
of the service could then be assessed for possible charging. Also at the same time we could log all
queries so that the way in which people used the service could be examined. This would
require updating the ROADS Perl scripts - but that is not too major a task.
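
As an illustration of how small the change could be, the sketch below logs each query against a username and counts queries per user as a possible basis for charging. It is in Python rather than Perl, and it is only a sketch under assumptions: the function names and the log format are made up, and it assumes the web server does the password check itself and passes the authenticated name to the CGI script in the standard REMOTE_USER environment variable.

```python
import csv
import os
import time
from collections import Counter


def current_user(environ=None):
    """Return the authenticated user name supplied by the web server."""
    if environ is None:
        environ = os.environ
    return environ.get("REMOTE_USER", "anonymous")


def log_query(log_path, username, query, when=None):
    """Append one line per search: timestamp, username, query text."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(when))
    # csv takes care of quoting queries that contain commas.
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([stamp, username, query])


def queries_per_user(log_path):
    """Count logged queries for each user."""
    with open(log_path, newline="") as f:
        return Counter(row[1] for row in csv.reader(f) if len(row) == 3)
```

The search script would call log_query(path, current_user(), query) once per search; the same log file then serves both for examining how people search and for counting usage.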

2. The Competition
The following packages were checked out - to be included in this list a package had to be
‘free’, available on the net, accessible through Perl, and have a significant user base and
support group.

• Glimpse - search engine of Harvest. Looks interesting. If Harvest are using it then it can’t
  be all bad, although I think it’s a relatively simple application.
• mSQL - subset of SQL commands available in this Australian package. Jon Knight
  reports it to be slower than the current ROADS implementation, but I think it’s well worth
  running some tests on it as it seems to have a reasonable number of satisfied users.
• University Ingres - no longer being supported - superseded by Postgres95.
• rdb - a relational database implementation from a 1991 copy of UNIX World.
  Looks like it’s little used, so I think it should be avoided.
• Postgres95 - has taken over from University Ingres and Postgres. This one looks very
  interesting. It sits on top of 10 years of development from the great database gurus of
  Berkeley, so can’t be all bad.

A number of these packages (Glimpse, Postgres95 and mSQL) are worth investigating further
- to see if they would be useful to the project as a backend database.

3. Conclusions
I never thought I’d say this, but I think ROADS is actually adequate for the job - at least in
the short to medium term. It will enable us to get something up and running in a minimum
time period. If we come up with a better database engine at some point then it will prove
relatively simple to extract the data (it’s all in flat files anyway!) and move it into the new
structure.
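
As a rough indication of what extracting the data might involve, here is a sketch in Python. It assumes each template file holds ‘Attribute: value’ lines, one attribute per line, with any line that starts with white space continuing the previous value; the function names are hypothetical, and the real files should be checked against these assumptions first.

```python
import os


def parse_template(text):
    """Turn 'Attribute: value' lines into a dictionary."""
    record = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and last is not None:
            # Assumption: an indented line continues the previous value.
            record[last] += " " + line.strip()
        elif ":" in line:
            # Split on the first colon only, so URLs in values survive.
            name, value = line.split(":", 1)
            last = name.strip()
            record[last] = value.strip()
    return record


def load_all(templates_dir):
    """Read every template file into {record_id: {attribute: value}}."""
    records = {}
    for record_id in sorted(os.listdir(templates_dir)):
        path = os.path.join(templates_dir, record_id)
        if os.path.isfile(path):
            with open(path) as f:
                records[record_id] = parse_template(f.read())
    return records
```

Once the records are in this form, writing them out in whatever layout a new database engine wants is a matter of looping over the dictionary.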

I’m still not convinced about using IAFA records but, again, it does seem easy to use our own
record structure - or our own additions to the IAFA templates. However, what we store is
not up to me - my domain is that of how we store our information and how we make it
accessible.

Also, other systems (‘free’ systems) are available - some of them may well suit our needs
more fully (as could a ‘bought-in’ database system).

4. Recommendations
My recommendations would be the following :-

• Our first priority should be to create a user requirements specification, to enable us to fully
  describe the functions that our information gateway should provide. What do our users
  actually want? At present I feel that too much is clouded in buzzwords and
  generalisations. (Maybe we ourselves won’t even know what our users require until we
  have been running a prototype ROADS service for a number of months.)

• Start storing records within the ROADS system and have a first generation browser
  available for our users (and us!). Moving from this system will always be possible (and
  relatively easy) at a later date.

• Add ‘add-ons’ to ROADS to enable us to examine how searches are being conducted.
  (ROADS V0.30 is meant to have some sort of search log).

• Analyse a number of the above relational systems and produce some speed benchmarks
  for a given number of records (maybe I can have access to record sets from other SBIGs).
  If any one system looks closer to meeting our specification then we should aim at
  producing a prototype system.

• See about getting a demonstration/example copy of Oracle - to examine the possibilities of
  a ‘bought-in’ system.

				