Office of Marketing and Communication The University of Queensland
Brisbane Qld 4072 Australia
Telephone (07) 3365 3409
Colleen Clur International +61 7 3365 3409
Facsimile (07) 3365 1488
UQ Search Engine
Developed for the Web Interface Working Party
by Chris Taylor, Executive Manager Information Access Service, UQ Library
and Trevor Burke, Manager, Corporate Web Services, Office of Marketing and Communication
Version 1.00 25 Feb, 2005 Trevor Burke
Version 1.10 28 Feb, 2005 Chris Taylor, Barbara Freeman &
Simple Search
Advanced Search
Access Control/Authentication
Special note – Document and File Management
Specific Requirements
File type support
Spell checking and dictionaries
Searching within results
Result sorting
Multiple search terms - “and” operator
Common words – “+” operator
Automated Stemming (righthand)
Negative terms – “-” operator
Synonym search – “~” operator
“OR” operator
Administration and configuration
Excluding sites and/or pages
Broken link removal
Automated indexing
Submitting sites
UQ Search Engine Requirements 2
The University of Queensland currently uses the free domain search by Google for
‘all of site’ searching.
Google provides an advanced search solution; however, it does not provide two key
options. These are:
the ability to prioritise keywords and sites
the ability to control access to public and private data
This document seeks to outline the principles that can be applied to other search
solutions to achieve search results similar to Google’s and at the same time, include
the additional requirements listed above.
Simple Search
In the case of a simple search, the word ‘simple’ applies only to how easy it is to use.
The results from a simple search are developed from a complex set of rules applied to
the data. The quality (relevance) of the results is the main criterion, rather than the
quantity of results (recall).
The system’s ability to determine what information best meets the user’s needs
depends on an algorithm of associations and weightings. With Google, not all aspects of how
relevance is determined are known, but the following items are part of the calculation:
Frequency of the term/s in the page
Position in the heading
Position in the URL and
The number of other sites linking to it as a reference
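The four signals listed above could be combined into a single score along the following lines. This is only a sketch: the `Page` structure, the weights and the use of a logarithm for link counts are assumptions made for illustration, and it is not Google's formula, which is not fully known.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    heading: str
    body: str
    inbound_links: int  # number of other sites linking to this page


# Hypothetical weights; a real system would tune these against test queries.
WEIGHTS = {"frequency": 1.0, "heading": 3.0, "url": 2.0, "links": 0.5}


def relevance(page: Page, term: str) -> float:
    """Score one page for one search term using the four signals."""
    term = term.lower()
    words = page.body.lower().split()
    # Share of the words in the page body that equal the term.
    frequency = words.count(term) / len(words) if words else 0.0
    in_heading = 1.0 if term in page.heading.lower() else 0.0
    in_url = 1.0 if term in page.url.lower() else 0.0
    # Dampen link counts so heavily linked pages do not swamp the other signals.
    link_score = math.log1p(page.inbound_links)
    return (WEIGHTS["frequency"] * frequency
            + WEIGHTS["heading"] * in_heading
            + WEIGHTS["url"] * in_url
            + WEIGHTS["links"] * link_score)
```

Sorting candidate pages by this score in descending order would place the most relevant items first.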
Any search solution will need to provide search results that are at least as accurate and
relevant as those the Google domain search provides.
In addition, it is important that some sites and/or pages can be prioritised over others
so that the results from these appear at the top of the list. This would be achieved by
keyword association through an index or taxonomy management system.
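One possible shape for keyword association is a table, maintained by content experts, that maps keywords to pages that should appear first. The table contents, the `example.edu` addresses and the function name below are hypothetical, and the choice to promote an associated page even when the ordinary ranking did not return it is an assumption.

```python
# Hypothetical keyword-to-page associations maintained through an admin interface.
PRIORITY_INDEX = {
    "enrolment": ["https://www.example.edu/enrol"],
    "library": ["https://www.example.edu/library"],
}


def apply_priorities(query: str, ranked_urls: list[str]) -> list[str]:
    """Move pages associated with any query keyword to the top of the list.

    The relative order of all other results is left unchanged.
    """
    promoted: list[str] = []
    for word in query.lower().split():
        for url in PRIORITY_INDEX.get(word, []):
            if url not in promoted:
                promoted.append(url)
    rest = [url for url in ranked_urls if url not in promoted]
    return promoted + rest
```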
Advanced Search
An advanced search adopts much of the functionality of the simple search with
additional controls over the data-set searched.
Listed below are the options required for an advanced search:
include all search terms (from basic search)
and/or “exact phrase”
and/or at least one of the words
and/or “does not contain any of these words”
and/or created during a date range
search within certain file types
search within certain websites
search within certain databases or catalogues
More details of these functions are included on the following pages.
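As an illustration of how the advanced search options could combine, the sketch below treats each option as an optional filter that a document must pass. The `Document` and `AdvancedQuery` structures and their field names are assumptions, and searching within databases or catalogues is left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Document:
    url: str
    text: str
    created: date
    file_type: str  # e.g. "html", "pdf"


@dataclass
class AdvancedQuery:
    all_terms: list[str] = field(default_factory=list)    # include all search terms
    exact_phrase: Optional[str] = None                    # "exact phrase"
    any_terms: list[str] = field(default_factory=list)    # at least one of the words
    none_terms: list[str] = field(default_factory=list)   # does not contain these words
    created_from: Optional[date] = None                   # start of date range
    created_to: Optional[date] = None                     # end of date range
    file_types: list[str] = field(default_factory=list)   # restrict to file types
    sites: list[str] = field(default_factory=list)        # restrict to URL prefixes


def matches(doc: Document, q: AdvancedQuery) -> bool:
    """Return True only if the document passes every option that was supplied."""
    text = doc.text.lower()
    words = set(text.split())
    if not all(t.lower() in words for t in q.all_terms):
        return False
    if q.exact_phrase and q.exact_phrase.lower() not in text:
        return False
    if q.any_terms and not any(t.lower() in words for t in q.any_terms):
        return False
    if any(t.lower() in words for t in q.none_terms):
        return False
    if q.created_from and doc.created < q.created_from:
        return False
    if q.created_to and doc.created > q.created_to:
        return False
    if q.file_types and doc.file_type not in q.file_types:
        return False
    if q.sites and not any(doc.url.startswith(s) for s in q.sites):
        return False
    return True
```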
Access Control/Authentication
The new search solution will need the ability to interact with the University-wide
authentication system that uses both Kerberos and LDAP to provide integrated single
sign-on to applications and websites.
It is anticipated that once a user has signed in from any site using single sign-on, they
would then be able to pass through any specific search gateway/s. If the user is not
signed in, the content will remain invisible or identified as password protected.
In addition, privileges relating to the user’s position/s should also be recognised.
This should enable that user, based on their role or level, to search additional or other
information. Elements such as employment status (e.g. employed, contractor or
casual), role (e.g. Director of Studies) and the level of appointment (e.g. HEW level
10) could be the criteria. A compatible roles database using LDAP may be required.
It is anticipated that the system will read and/or write a session cookie to identify who
the user is.
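The behaviour described above, where protected content stays invisible unless the signed-in user holds a suitable role, might be sketched as a filter over the result list. The sketch assumes the user's identity and roles have already been resolved elsewhere (for example from the session cookie and an LDAP roles lookup); the `User` and `Result` structures and the role names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    username: str
    roles: set[str] = field(default_factory=set)  # e.g. {"staff", "hew10"}


@dataclass
class Result:
    url: str
    required_roles: set[str] = field(default_factory=set)  # empty set means public


def visible_results(results: list[Result], user: Optional[User]) -> list[Result]:
    """Keep public results, plus protected ones the signed-in user may see."""
    visible = []
    for result in results:
        if not result.required_roles:
            visible.append(result)  # public content is always shown
        elif user is not None and result.required_roles & user.roles:
            visible.append(result)  # user holds at least one required role
    return visible
```

Passing `user=None` models a visitor who has not signed in. A variant could instead return protected results flagged as password protected, which the text also allows.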
Special note – Document and File Management:
To provide secure access to files in addition to websites and databases, the
system should outline functionality offered for document management. It
should provide an integrated solution for managing binary data stores and/or
Specific Requirements
File type support
File types identified include the following. It is important that over time or as
required, interfaces to other file types can be added.
Microsoft Excel Document
Microsoft Word Document
Adobe PDF Document
Microsoft PowerPoint Document
Various database files (e.g. those with a PHP or ASP interface)
Owing to issues relating to the accuracy and maintenance of metadata, the
ability to interpret a formal metadata schema is not essential. Wherever
possible, non-textual items will include a textual description.
The automated generation of metadata could be advantageous and enable
some additional functionality. It is anticipated that such a system would
enable control through the administration interface. This could be beneficial if
linked to a system to manage taxonomy or site indexing.
Spell checking and dictionaries
It will be important to have a dictionary that can cope with common
The ability to include an additional dictionary or dictionaries is also important.
This will cater for certain academic, scientific, administrative, educational and
UQ specific words and acronyms.
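A rough sketch of how an additional dictionary could sit alongside a base dictionary when suggesting corrections. It uses `difflib.get_close_matches` from the Python standard library as a stand-in for a real spell checker; the word lists and the similarity cutoff are assumptions.

```python
import difflib

# Hypothetical word lists; real dictionaries would be far larger.
BASE_DICTIONARY = {"library", "enrolment", "timetable", "scholarship"}
UQ_DICTIONARY = {"uq", "hew"}  # institution-specific words and acronyms


def suggest(word: str, *dictionaries: set[str], n: int = 3) -> list[str]:
    """Return up to n suggestions, or an empty list if the word is already known."""
    vocabulary = set().union(*dictionaries)
    word = word.lower()
    if word in vocabulary:
        return []
    # cutoff is a similarity threshold between 0 and 1; 0.75 is an arbitrary choice.
    return difflib.get_close_matches(word, sorted(vocabulary), n=n, cutoff=0.75)
```

For example, `suggest("libary", BASE_DICTIONARY, UQ_DICTIONARY)` would offer "library", while a known acronym such as "HEW" produces no suggestions once the additional dictionary is supplied.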
Searching within results
There should be an option to search over the results from the initial search and
then again at each stage to allow continual refinement of the results.
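Searching within results can be viewed as running each new search over the previous result set rather than the full index. A minimal sketch, assuming results are held as a mapping from URL to page text (a hypothetical representation):

```python
def search(documents: dict[str, str], term: str) -> dict[str, str]:
    """Return the documents whose text contains the term (case-insensitive)."""
    term = term.lower()
    return {url: text for url, text in documents.items() if term in text.lower()}


def refine(documents: dict[str, str], terms: list[str]) -> dict[str, str]:
    """Apply each term in turn, searching only within the previous results."""
    results = documents
    for term in terms:
        results = search(results, term)
    return results
```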
Result sorting
The most relevant items to appear first.
Underline terms that are related to the search query.
Provide in-context display of search terms.
Display link title.
Display link address.
Case sensitivity is not required, i.e. Weather, weather and WEATHER all
produce the same search results.
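The in-context display and case-insensitive matching described above could look something like the following. Plain text cannot be underlined, so the sketch marks the matched term with underscores as a stand-in; the function name and the default context width are assumptions.

```python
import re


def snippet(text: str, term: str, width: int = 30) -> str:
    """Return the text around the first case-insensitive match, with the match marked."""
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        return ""
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    return (text[start:match.start()]
            + "_" + match.group(0) + "_"  # stand-in for underlining
            + text[match.end():end])
```

Because the match ignores case, searching for "weather", "Weather" or "WEATHER" yields the same snippet, and the original capitalisation of the page text is preserved in the display.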
Multiple search terms - "and" operator
It is important that more search terms produce a better result, providing a
refined result rather than simply more results. By default the “and” operator
should be included when there is more than one word entered.
Common words - “+” operator
Common words such as “where”, “but”, “is” and “a” should be ignored. Where a
common word is essential, it can be included using a “+”.
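The default “and” behaviour and the handling of common words might be sketched as follows. The stop-word list contains only the examples given in the text, and the function names are hypothetical.

```python
# Only the examples from the text; a real stop-word list would be longer.
STOP_WORDS = {"where", "but", "is", "a"}


def required_terms(query: str) -> list[str]:
    """Split a query into the terms that must all match (implicit "and")."""
    terms = []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            terms.append(token[1:])  # "+" forces a common word to be kept
        elif token not in STOP_WORDS:
            terms.append(token)      # ordinary term; common words are dropped
    return terms


def matches_all(text: str, query: str) -> bool:
    """True if every required term appears as a word in the text."""
    words = set(text.lower().split())
    return all(term in words for term in required_terms(query))
```

Each extra term narrows the result set, so more search terms give a more refined result rather than simply more results.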
The inclusion of quotation marks searches for the exact phrase. (I.e.
Automated Stemming (righthand)
Words are automatically reduced to their stem. For example 'computer',
'computing' and 'computers' are treated equally.
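A very small right-hand stemmer that strips common suffixes is enough to show the idea. This is a simplification for illustration, not a full stemming algorithm such as Porter's; the suffix list and minimum stem length are assumptions.

```python
# Longer suffixes are listed first so that "ers" is tried before "er" and "s".
SUFFIXES = ("ers", "ing", "er", "ed", "s")
MIN_STEM_LENGTH = 3


def stem(word: str) -> str:
    """Strip the first matching suffix, provided enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[:-len(suffix)]
    return word
```

With these rules 'computer', 'computing' and 'computers' all reduce to the same stem, so a search on any one of them would match pages containing the others.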
Negative terms - “-” operator
A “-” (minus sign) in front of a term means that pages containing the word will be
excluded from the search results.
Synonym search – “~” operator
If you want to search not only for your search term but also for its synonyms,
place the tilde sign ("~") immediately in front of your search term.
“OR” operator
To find pages that include either of two search terms, add an uppercase OR
between the terms.
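The “-”, “~” and OR operators could be handled by parsing the query into groups of alternatives plus a set of excluded words. The sketch below is one possible interpretation: the synonym table is hypothetical, and the rule that OR joins a term to the group immediately before it is an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical synonym table; a real system would use a thesaurus.
SYNONYMS = {"car": {"automobile", "vehicle"}}


@dataclass
class ParsedQuery:
    # Every group must match; any one word within a group is enough.
    groups: list[set[str]] = field(default_factory=list)
    excluded: set[str] = field(default_factory=set)


def parse(query: str) -> ParsedQuery:
    parsed = ParsedQuery()
    join_next = False  # True when the previous token was an uppercase OR
    for token in query.split():
        if token == "OR":
            join_next = True
            continue
        if token.startswith("-") and len(token) > 1:
            parsed.excluded.add(token[1:].lower())  # negative term
            join_next = False
            continue
        if token.startswith("~") and len(token) > 1:
            word = token[1:].lower()
            alternatives = {word} | SYNONYMS.get(word, set())  # term plus synonyms
        else:
            alternatives = {token.lower()}
        if join_next and parsed.groups:
            parsed.groups[-1] |= alternatives  # OR: extend the previous group
        else:
            parsed.groups.append(alternatives)  # default "and": start a new group
        join_next = False
    return parsed


def matches(text: str, parsed: ParsedQuery) -> bool:
    words = set(text.lower().split())
    if words & parsed.excluded:
        return False
    return all(group & words for group in parsed.groups)
```

For example, the query `library OR archive ~car -parking` matches a page that mentions either "library" or "archive", and "car" or one of its synonyms, but not "parking".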
Administration and configuration
The most essential part of administration is that once set up and configured, the
system should be virtually maintenance free or as automated as possible.
Any configurations should be easy to make and require a minimum of training.
Content experts without a high level of technical expertise should be able to
administer keywords and taxonomy management.
Excluding sites and/or pages
The ability to exclude sites and pages is required. This is in addition to the
recognition of “robots.txt” instructions.
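A sketch of how an administrator-maintained exclusion list could be applied on top of robots.txt rules, using `urllib.robotparser` from the Python standard library. The exclusion list, the `example.edu` addresses and the crawler name "uq-search" are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusions configured through the administration interface.
EXCLUDED_PREFIXES = ["https://www.example.edu/drafts/"]


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_index(url: str, robots: RobotFileParser, agent: str = "uq-search") -> bool:
    """A page is indexed only if neither the admin list nor robots.txt excludes it."""
    if any(url.startswith(prefix) for prefix in EXCLUDED_PREFIXES):
        return False
    return robots.can_fetch(agent, url)
```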
The search solution should provide reports on search terms and log other
Broken link removal
Where the system determines that a page is no longer available, it should be
removed from the stored data.
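Broken link removal might be sketched as a periodic pass over the index that drops any page failing an availability check. The index representation (URL mapped to stored text) is an assumption, as is the use of an HTTP HEAD request with a status below 400 to mean "available"; the check is passed in as a parameter so it can be replaced.

```python
from typing import Callable
from urllib.request import Request, urlopen


def is_available(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request for the URL succeeds."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts; ValueError covers malformed URLs.
        return False


def prune(index: dict[str, str],
          check: Callable[[str], bool] = is_available) -> list[str]:
    """Remove unavailable pages from the index and return the URLs removed."""
    removed = [url for url in index if not check(url)]
    for url in removed:
        del index[url]
    return removed
```

In practice a page might be removed only after several consecutive failed checks, so that a temporary outage does not empty the index.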
Automated indexing
The full indexing of the site should be done on a daily basis or provide a way
to update sections on request if full indexing requires too much bandwidth. A
manual over-ride so that areas or the whole site is re-indexed should be
Submitting sites
An area to log new sites for searching should be available.
Currently, site usage statistics show that between 10,000 and 15,000 individual
enquiries are made from UQ websites every day. This figure seems low given
the number of staff and students and should therefore be treated as the