DESIRE Peer Review Report

DESIRE II - Development of a European Service for Information on Research and Education II

DESIRE: Peer Review Report

Project Number: RE 4004 (RE)
Project Title: DESIRE II - Development of a European Service for Information on Research and Education II
Deliverable Number: D3.6
Deliverable Title: Automatic classification
Deliverable Type: PU
Deliverable Kind: PR

Principal Reviewer
Name: Padmini Srinivasan
Address: Professor and Director, School of Library and Information Science, The University of Iowa, Iowa City, IA 52246, USA
E-Mail: Padmini-srinivasan@uiowa.edu
Telephone: +1-319-335-5708
Fax: +1-319-335-5374

Credentials (Qualifications/Relevant experience/short CV):
Padmini Srinivasan has been a Professor at the School of Library & Information Science since 1989. Prior to that she was a professor in the Computer Science department of George Mason University. This was preceded in 1986 by a short postdoctoral fellowship at OCLC, Office of Research. Her PhD is from the School of Information Studies, Syracuse University, Syracuse, NY. Srinivasan has been interested in the role of controlled vocabularies in text retrieval. Her earlier work has demonstrated the value of MeSH (Medical Subject Headings) for the automatic expansion of MEDLINE queries. Her present research focus is on exploiting the UMLS (Unified Medical Language System) for query expansion. Her research efforts have included investigations of automatic text categorization as well as cross-language information retrieval. Srinivasan is currently the chair of the ASIS SIG/CR group. This special interest group is very active in classification research and annually offers a workshop on classification research for interested researchers and practitioners.
Project RE 4004 (RE) Page of 5 Deliverable Dn.m - xxx yyy zzz

Summary (1 = poor, 5 = excellent):

Relevant: 5
This project offers the demonstrator as a deliverable and is very relevant not only to the DESIRE project but also to the goals of classification research and development. The ability to automatically classify web documents is critical. Moreover, the present approach, which is to combine the output of the automatic classification with the results of a manual classification, has great potential for effective impact.

State-of-Art: 4
Although the approach is in itself very worthy and has produced excellent results, there are other approaches to automatic classification and categorization that may be considered. For example, various learning algorithms using neural networks and Bayesian methods have been tested towards the same goals. It is of course not necessary to adopt any of these alternative methods, since the literature does not yet identify a clear winner among them. It is also the case that the current approach of using statistical features and positional features has yielded good returns in other projects as well. Combining these with some surface linguistic analysis (extraction and assessment of noun phrases) is very likely to increase the effectiveness of classification.

Meets Objectives: 4
I believe most of the objectives have been met. However, I am unable to determine whether their objective to support "Cross-searching (with a gateway for simultaneous Z39.50 searching, Zebra, Dublin Core)" has been accomplished. At least this is not clear from their presentation.

Clarity: 4
Some of the sections are not very clear. For example, it would be good to have some indication of what differences one can expect to find as one tries different options in the demonstrator system. Otherwise one has to read the report again to find the relevant section. Similarly, it would be nice to give a little more information about the intent behind the various parameters, for proper guidance of the user. But if this is just a demonstration of the underlying approach and the intent is not to design a user interface, then please ignore the previous comment.

Value to Users: 5
I think this demonstration project is of great value to users. Indexers or those who want to acquire links to relevant material on the Web can easily use this system to classify their web pages. More importantly, they can change the parameters according to their local constraints. This allows further experimentation and improvement.

Specific Criticisms:

1. Under what conditions do you expect your present set of "best solution classification method" to remain stable? How will one test the appropriateness of this method over time? What alerts are possible to indicate that the best method is no longer working as well as previously? Or perhaps one can get surprised and find that the method is working even better than originally. In other words, how does one maintain or at least monitor performance?

2. Was it intuition that led to your system of weighting, or was it prior experimentation, or was it related to previous work by other researchers? Somewhat unclear.

3. It is not clear how the list of significant words and noun phrases identified by OCLC (Jean Godby) will be combined with the Ei Classification Demonstrator. But perhaps this is best left to a future review of the project.

4.

Developer Response:

1. The approach presented here is applied to a “frozen” database of records using a stable version of the Ei vocabulary. Our methods and heuristics need to be adapted to a specific service, collection of documents and subject vocabulary.
We will most probably be given the task of doing this for the “living” and changing EELS service. The Ei vocabulary used in EELS changes only once every couple of years. The content of our robot-generated engineering index collected from the net will change a great deal, however. So will the use of metadata and the practices regarding formulations and use of titles, headings and other content and structural elements in web documents. This will require regular monitoring and analysis of the outcome of our classification processes. If we are running these methods for a living service, we will carry out all our different evaluations, especially the expert evaluation (as reported in working paper 2), on a regular basis. This will allow us to monitor performance and to correct and adapt the heuristics and, if necessary, the methods. We certainly do not expect the heuristics to remain stable. Further work in the area of automatic classification and cooperation with related projects might result in rather different high-level methods and solutions. We expect our current very simple approach to remain useful for projects lacking the resources and expertise to develop and improve their own solutions. When we have the opportunity to gather more real service experience, we intend to write a guideline and recommendation document for simple automatic classification.

2. We based our simple weighting algorithm on experience with ranking methods in our distributed Nordic Web Index search services and on testing different solutions. The goal was to keep it very simple and easily adaptable. The rest is heuristics. Every service using our automatic classification methods can change the weighting algorithm to suit the characteristics of its documents, vocabulary and service.

3. The classification demonstrator for single engineering pages does not use noun phrases or significant words. It uses our classification solution based on the matching of the full text of documents against the Ei vocabulary. The identified phrases will be used to provide key phrase browsing in every class in the full browsing structure according to the Ei classification (the full outcomes of our classification as displayed in the demonstration page). Since the Ei classification is not deep enough to provide sufficient substructure to a large document collection, browsing with key phrases provides some kind of detailed topical access to the many pages in the same Ei class. We will add this feature to the full Ei classification of the DESIRE II database as part of our cooperation with OCLC. It will be added to the EELS service as soon as the service providers so decide. In cooperation with OCLC, we will use the phrases and significant words for a classification of our database according to DDC, and compare and evaluate the outcome of this effort against our Ei classification using the full text of the documents.

4. Clarifications regarding all criticisms above are added to the documentation. Done.

Remarks under “State-of-Art”:
We are aware of alternative approaches to automatic classification not applying established library classification systems. In a state-of-the-art report scheduled for May 2000 we will summarise these developments. We believe that a combination of different methods might have the potential for further improvement.

Remarks under “Meets Objectives”:
We provide a cross-searching feature between the manually created quality controlled subject gateway EELS and the robot-generated subject index “All” Engineering at: http://eels.lub.lu.se/aeels/search.html
There is a link to it on the result overview page.
We now express this more clearly in the documentation. Done.

(Free Text Detailed Report 2-3 Pages)

This project is important and relevant not only to the goals of the DESIRE project but also to progress in classification research and development in general. The problem undertaken, which is to automatically classify/categorize web documents in a subject area, is not only a critical goal but also a challenging one. The demonstrator project indicates clearly that the investigators have had significant success in achieving their goals. What is also appealing to me as a researcher is that the investigators have conducted several different types of experiments. These include not only the automatic classification of web documents and their evaluation by experts but also the automatic classification of items that have already been classified by humans. Moreover, their work includes obtaining a good understanding of each classification scheme that they involve. Their present approach of collaboration with other groups such as OCLC is valuable and leads their project in appropriate directions.

It is clear to me that the investigators have achieved significant accomplishments in their demonstration project and through the related experimentation. I wish them further success in their future work.
