ROADS - First Impressions - V0.1 - 21st. February, 1996
Mark S. Burrell, ADAM Technical Officer
This is an INTERNAL ADAM document - NOT for external distribution

1. ROADS - First Impressions

SP1f9602

1.1. Introduction

ROADS V0.25 has been installed on EDEN, and a couple of example records have been added (one points to the simple UNIX guide, the other to the EDEN homepage).

1.2. Accessing ROADS

To add a record (a ‘template’ in their language), point your browser to
http://eden.unn.ac.uk/cgi-bin/mktemp.pl

To search the (rather limited!) database, point to
http://eden.unn.ac.uk/cgi-bin/search.pl

If you want to view the actual storage, the completed templates are stored in /web/ROADS/templates/ while the inverted index is within /web/ROADS/inverted/

1.3. File Structure

ROADS does not use any form of database file to store record or index information. Instead it uses the UNIX file structure coupled with flat ASCII files as storage. This is a very unusual method of storage and very much pits disk space against speed. The UNIX file structure has very fast access times (after 20 years of development it should be fast) but it can be wasteful of storage space. For us this is not currently a problem as we have about 9 gigabytes of free disk space at the moment.

When a record is added, this is what happens. An ID for the record is automatically created, and the new record is stored in the templates directory with the ID as its filename. On EDEN this is /web/ROADS/templates

Each search term (as far as I can see this is all the text stored within the record) is then added to the index. For example, if we are indexing the word ‘computer’ this is what happens:

- Check to see if there is a directory within the /web/ROADS/inverted directory that’s called ‘co’ (in ROADS fancy terms this is called a bigraph directory).
- If the directory ‘co’ does not exist then create it.
- Enter the directory /web/ROADS/inverted/co
- Check to see if a file called ‘computer’ exists.
- If it doesn’t exist then create it.
- Append the ID of the template to the file.

Technical or what. In other words, check in the file called /web/ROADS/inverted/co/computer to find the IDs of all the records that include the word ‘computer’.

By the way, if the word had been ‘Computer’ the file name would also be ‘Computer’. In other words, if we were doing a case-insensitive search we would search the files ‘computer’ and ‘Computer’ - both would be in the sub-directory ‘co’.

1.4. ROADS Templates

As you all know, these are the IAFA templates. We don’t need to stick with the default templates - it seems easy enough to change them and to add extra fields as required.

1.5. ROADS Current Status

ROADS is very much in its early stages; a number of features that would be useful to have are not yet in the current version.

Currently ROADS cannot handle multiple thesauri (this is in the ROADS v1 draft), nor can it handle distributed databases (this is aimed for inclusion in ROADS v2).

Z39.50 compliance is currently being investigated by the ROADS team, but at present ROADS is not Z39.50 compliant. By the way, Z39.50 is just another fancy name for a particular protocol that works within the application layer of the ISO OSI (Open Systems Interconnection) model. (It has 7 layers - going from lowest level to highest level they are the physical, data link, network, transport, session, presentation and application layers. And yes, I do have this sort of rubbish in my head - good mnemonics help.) So think of the application layer as a sort of information processing layer. Whois++ is just another protocol that does a similar job (actually, it does (did?) a narrow job - but has branched out).
Providing compliance between different protocols is not technically difficult (more of a pain, really). The thing you have to watch, though, is ‘matching up template attributes to official Z39.50 profiles’ (according to Jon Knight).

Link checking is in the draft of ROADS v1. Even if it wasn’t, it would be relatively easy to write a Perl script to do this job.

The performance of the current ROADS access methods, according to Jon Knight, looks reasonable - even on large databases. They are talking about moving to a compiled language when the Perl scripts start to get too large.

One other thing of interest that they are talking about is allowing ‘other backend databases to be plugged in’ (Jon Knight).

1.6. Could We Add Charging for Usage?

Yes. We could set up the server such that a username/password is required for access. Use of the service could then be assessed for possible charging. At the same time we could log all queries so that the way in which people use the service could be examined. This would require updating the ROADS Perl scripts, but that is not too major a task.

2. The Competition

The following packages were checked out. To be included in this list a package had to be ‘free’, available on the net, accessible through Perl, and have a significant user base and support group.

- Glimpse - search engine of Harvest. Looks interesting. If Harvest are using it then it can’t be all bad, although I think it’s a relatively simple application.
- mSQL - subset of SQL commands available in this Australian package. Jon Knight reports it to be slower than the current ROADS implementation, but I think it’s well worth running some tests on it as it seems to have a reasonable number of satisfied users.
- University Ingres - no longer being supported - superseded by Postgres95.
- rdb - an implementation of a relational database from a 1991 copy of UNIX World. It looks like it’s little used, so I think it should be avoided.
- Postgres95 - taken over from University Ingres and Postgres. This one looks very interesting. It sits on top of 10 years of development from the great database gurus of Berkeley, so can’t be all bad.

A number of these packages (Glimpse, Postgres95 and mSQL) are worth investigating further, to see if they would be useful to the project as a backend database.

3. Conclusions

I never thought I’d say this, but I think ROADS is actually adequate for the job - at least in the short to medium term. It will enable us to get something up and running in a minimum time period. If we come up with a better database engine at some point then it will prove relatively simple to extract the data (it’s all in flat files anyway!) and move it into the new structure.

I’m still not convinced about using IAFA records but, again, it does seem easy to use our own record structure - or our own additions to the IAFA templates. However, what we store is not up to me; my domain is how we store our information and how we make it accessible.

Also, other systems (‘free’ systems) are available - some of them may well suit our needs more fully. (As could a ‘bought in’ database system.)

4. Recommendations

My recommendations would be the following:

- Our first priority should be to create a user requirements specification, to enable us to fully describe the functions that our information gateway should provide. What do our users actually want? At present I feel that too much is clouded in buzzwords and generalisations. (Maybe we ourselves won’t even know what our users require until we have been running a prototype ROADS service for a number of months.)
- Start storing records within the ROADS system and have a first generation browser available for our users (and us!). Moving from this system will always be possible (and relatively easy) at a later date.
- Add ‘add-ons’ to ROADS to enable us to examine how searches are being conducted. (ROADS V0.30 is meant to have some sort of search log.)
- Analyse a number of the above relational systems and produce some speed benchmarks for a given number of records (maybe I can have access to record sets from other SBIGs). If any one system looks closer to meeting our specification then we should aim at producing a prototype system.
- See about getting a demonstration/example copy of Oracle, to examine the possibilities of a ‘bought-in’ system.
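
As an illustration of the storage scheme described in section 1.3, here is a minimal sketch of the indexing and lookup steps. It is written in Python rather than the Perl that ROADS itself uses, and it is not taken from the ROADS source. The function names (bigraph_dir, index_record, lookup), the way text is split into terms, and the record IDs are all assumptions made for the example; only the directory layout (a two-letter directory holding one file per word, with record IDs appended to it) follows the description above.

```python
import os
import re
import tempfile


def bigraph_dir(term):
    # Assumption: the bigraph directory is the first two characters of the
    # term, lowercased, so that 'computer' and 'Computer' both land in 'co'.
    return term[:2].lower()


def index_record(inverted_root, record_id, text):
    """Append record_id to the index file of every distinct term in text."""
    # Assumption: terms are runs of letters and digits; how ROADS actually
    # tokenises a template is not stated in this document.
    for term in set(re.findall(r"[A-Za-z0-9]+", text)):
        directory = os.path.join(inverted_root, bigraph_dir(term))
        os.makedirs(directory, exist_ok=True)  # create 'co' if it is missing
        # Opening in append mode creates the term file if it does not exist.
        with open(os.path.join(directory, term), "a") as index_file:
            index_file.write(record_id + "\n")


def lookup(inverted_root, term, ignore_case=False):
    """Return the set of record IDs stored under term."""
    directory = os.path.join(inverted_root, bigraph_dir(term))
    if not os.path.isdir(directory):
        return set()
    ids = set()
    for name in os.listdir(directory):
        if ignore_case:
            matches = name.lower() == term.lower()
        else:
            matches = name == term
        if matches:
            with open(os.path.join(directory, name)) as index_file:
                ids.update(line.strip() for line in index_file if line.strip())
    return ids


if __name__ == "__main__":
    # Hypothetical record IDs and text, indexed into a scratch directory
    # instead of /web/ROADS/inverted.
    root = tempfile.mkdtemp()
    index_record(root, "rec1", "a simple computer guide")
    index_record(root, "rec2", "Computer homepage")
    print(sorted(lookup(root, "computer", ignore_case=True)))
```

The sketch assumes a case-sensitive file system, as on UNIX, so that ‘computer’ and ‘Computer’ are separate files. A case-insensitive search then only has to scan the one bigraph directory, which is why both spellings sharing ‘co’ is convenient.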