					Viewing the sky from
  CALIS/HKCAN beyond Chinese

         by Charlene Chou
 May 23, 2005 at Peking University

Questions to ponder

• With internet technology, what role does the
  library play, now and in the future, in the
  formation of a global library?
• Does Unicode solve all problems of
  multilingual display in library OPAC?
• What is the crosswalk between the library and
  the Web?
• How does CALIS fit into this picture?

• Sharing Chinese cataloging and authority
  files, e.g. HKCAN & CALIS
• Current US and international
  developments with cataloging
• Multilingual display and searching in OPAC
  and on Web
• Crosswalk between library and Web
• Global Library—a multilingual or
  multicultural digital library?
Regional sharing of Chinese files:
• Cross searching of national Chinese name
  authority files
  – Sharing Chinese authority files for NLC, CNAD,
    LC/NAF and HKCAN via one-stop searching
  – Fulfill the models of VIAF/LEAF
  – Bilingual: bridge the gap--limitation of LC/NAF and
    NLC searching for now
  – Expanded use by OCLC
  – Will CALIS be included next?

HKCAN’s impact on LC/NAF

• Modified workflow for non-unique names
  – Verifying Chinese names in vernacular scripts
    at LC and Columbia University
  – LC breaking undifferentiated record based on
    vernacular script in HKCAN
  – Romanized/English form in HKCAN with the
    qualifier of birth date
• CUL adding 4XX field for variant form
• Is CALIS next, esp. with OCLC access?
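The role of vernacular scripts and 4XX variant forms in separating non-unique names can be sketched as a small lookup. This is only an illustration under stated assumptions: the `AuthorityRecord` class, the `build_index` helper, and both name records are invented for the example and are not taken from HKCAN or LC/NAF.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """Simplified name authority record (hypothetical, not full MARC 21)."""
    heading: str                                   # 1XX: authorized heading
    variants: list = field(default_factory=list)   # 4XX: variant forms


def build_index(records):
    """Map every heading and variant form to the records that carry it."""
    index = {}
    for rec in records:
        for form in [rec.heading, *rec.variants]:
            index.setdefault(form, []).append(rec)
    return index


# Invented records: two people who share one romanized name.
records = [
    AuthorityRecord("Wang, Wei, 1950-", ["Wang, Wei", "王偉"]),
    AuthorityRecord("Wang, Wei, 1962-", ["Wang, Wei", "王薇"]),
]
index = build_index(records)

# The romanized form alone is ambiguous; the vernacular form is not.
ambiguous = [rec.heading for rec in index.get("Wang, Wei", [])]
resolved = [rec.heading for rec in index.get("王薇", [])]
print(ambiguous)
print(resolved)
```

Here the birth-date qualifier keeps the two romanized headings distinct, while the vernacular variant is what lets a searcher reach the right person directly.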
Multilingual bibliographic and
authority record in US
• OCLC/Connexion and RLIN
  – Parallel fields have been available in
    bibliographic records for years
  – The key is whether the ILS can display vernacular scripts
  – To include vernacular scripts in authority
    records in the near future?
     • ILS problem for implementing Unicode and others
     • Cataloging policies and economic perceptions to be
      dealt with
Multilingual bibliographic and
authority record
  – MARC21 has the standards for bibliographic
    and authority records
  – Currently has no further plans after 2001-
  – Can be communicated through the full
    Unicode UTF-8 character sets or limited to
    MARC-8 characters?
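The UTF-8 side of this question can be illustrated with a short round trip. This is a minimal sketch, assuming Python: its standard library has no MARC-8 codec, so only UTF-8 is shown, and the helper name `utf8_roundtrip` and the sample string are chosen just for the example.

```python
def utf8_roundtrip(text):
    """Encode text as UTF-8 and check that decoding restores it."""
    encoded = text.encode("utf-8")
    return encoded, encoded.decode("utf-8") == text


name = "北京大學"                 # four CJK characters (sample string)
encoded, ok = utf8_roundtrip(name)

# Each of these characters takes three bytes in UTF-8.
print(len(name), len(encoded), ok)

# A narrower character set simply cannot carry the same string.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```

The point of the comparison is that a record exchanged in a limited character set either drops or must transliterate whatever falls outside that set.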

Current challenges & efforts
for authority control
• Economic concerns in USA
  – LC/PCC access records for Web resources
     • Subject vs. descriptive access
     • Costly for authority control
  – Multi-level policy: controlled vs. uncontrolled headings
• FRANAR—international efforts & standards
  – UBCIM—Minimal Level
  – Shift from ISADN to VIAF & INTERPARTY
     • VIAF: linking is key to retain flexibility and local
Age of convergence for
• ISSN: international harmonization for continuing
•   FRBR and AACR3 (now called RDA after JSC
    meeting in May)
•   FRBR for serials: MFHD/ISSN/FRBR->super
    record or authority record for serials
•   FRBRized display in OPAC: collocation
    – Contents vs. carrier
    – Grouping multi-manifestation records (only 20%
      records in OCLC WorldCat)
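Grouping multi-manifestation records for a collocated display can be sketched as follows. This is an illustration only: real FRBRization uses far richer matching than an author/title key, and the `work_key` and `group_by_work` helpers, the field names, and the years in the sample records are assumptions made for the example.

```python
from collections import defaultdict


def work_key(record):
    """Crude work identifier: normalized author plus normalized title."""
    return (record["author"].casefold().strip(),
            record["title"].casefold().strip())


def group_by_work(records):
    """Collect manifestation-level records under one key per work."""
    groups = defaultdict(list)
    for record in records:
        groups[work_key(record)].append(record)
    return dict(groups)


# Sample records; carriers and years are invented for illustration.
records = [
    {"author": "Cao, Xueqin", "title": "Hong lou meng", "carrier": "print", "year": 1982},
    {"author": "Cao, Xueqin", "title": "Hong lou meng", "carrier": "microform", "year": 1990},
    {"author": "Cao, Xueqin", "title": "Hong lou meng ", "carrier": "online", "year": 2003},
    {"author": "Wu, Cheng'en", "title": "Xi you ji", "carrier": "print", "year": 1980},
]

groups = group_by_work(records)
for (author, title), members in groups.items():
    carriers = [m["carrier"] for m in members]
    print(title, carriers)
```

The content (the work) becomes the display unit, and the carriers appear as choices beneath it.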

Coordinating digitization
projects in US
• Digital Library Federation
  – Clearinghouse for all DL projects in US
  – Sharing and communicating among projects
  – Digitization: Registry of Digital Master
     • CONSER will seek communication for cataloging

Unicode for OPAC display
• ILS vendors: active interest in recent
  – MARC21 spec now supports the Universal Coded
    Character Set
  – Mapping to Unicode from MARC character
    sets completed
  – Unicode fonts: more characters available now
  – Newer operating systems support Unicode

Unicode: which barriers for ILS
to implement?
• Which scripts have been implemented
  – ILS vendors implement Unicode language by
    language: a script such as Arabic may be
    supported without the additional extended
    Arabic characters used in languages such as Persian
• How sorting of particular language is
  – Depending on the language
  – Which encoding formats are supported to
    import/export records in Unicode
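The gap between "Arabic is supported" and "Persian is supported" can be shown with a small check. This is a sketch under a loud assumption: ISO 8859-6 stands in for a system that handles only the basic Arabic repertoire, which is not a claim about how any particular ILS works. The helper name `unsupported_chars` is invented for the example.

```python
def unsupported_chars(text, codec="iso8859_6"):
    """Return the characters of text that the given codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# An Arabic word built from basic Arabic letters (U+0643 U+062A U+0627 U+0628).
arabic_word = "\u0643\u062a\u0627\u0628"

# Four letters used in Persian that lie outside the basic Arabic repertoire
# (U+067E, U+0686, U+0698, U+06AF).
persian_letters = "\u067e\u0686\u0698\u06af"

arabic_missing = unsupported_chars(arabic_word)
persian_missing = unsupported_chars(persian_letters)
print(len(arabic_missing), len(persian_missing))
```

A system can therefore pass a test with Arabic-language records and still fail on Persian ones, which is why support has to be verified script by script and language by language.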

• Not complete solution for language
  support in library system
• A foundation to build and will make
  development for multiple language much
• A solution for now, maybe new technology
  will have a better solution in future?

Impact of search engines on
subject searching in OPAC
• CSULA transaction logs’ analysis
• Next generation of the OPAC interface
  – Concept-based search: allow natural-language
    searching with keyword search first, and focus
    on quick-search need
  – Relevance feedback: bring back related pages
  – Spelling correction
  – Relevance-ranked output

Cross-language searching on
• Multilingual dictionary:
  – Google definitions are multilingual, e.g.
    definitions of voip on the Web in all languages--
    Chinese (Simplified), English, French, German,
    Italian, Russian, and Spanish
• Translation tool for website
  – Babel Fish, etc.
  – Translate entire or partial website by user’s
• User’s choice of language on website
Cross-language searching on
  – How to implement the translation process of the
    search engine?
     • From Translation vendors in multiple languages to
       direct communication with the translators--Individual
       translators specialized in a given language
     • Self-service translation application
  – Localize the content of its multilingual site
  – Currently supports: native interface in 14 languages
    (+ English) & search in 25 languages (+ English)

Cross-language searching on
• Open Language Archives Community (OLAC)--
 Specialized OAI sub-domain
  – Features of the search engine include:
     • A thesaurus of alternative language names
     • Language code searching
     • Keyword-in-context display in search results
     • Search for similarly spelled words
     • Search for similar items
     • Support for standard string search operators and
       domain-specific inline syntax
     • Automatically derived search links for other web search
  – Contribution: the inclusion in the search engine
    results of a metadata quality-centric sorting algorithm
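Two of the features listed above, the thesaurus of alternative language names and the search for similarly spelled words, can be sketched together. This is not OLAC's implementation: the `ALTERNATE_NAMES` table, the `resolve_language` helper, the use of ISO 639-3 codes, and the similarity cutoff are all assumptions made for the example.

```python
import difflib

# Tiny hypothetical thesaurus: alternative names mapped to ISO 639-3 codes.
ALTERNATE_NAMES = {
    "mandarin": "cmn",
    "putonghua": "cmn",
    "cantonese": "yue",
    "tibetan": "bod",
}


def resolve_language(query, cutoff=0.8):
    """Resolve a language name to a code, tolerating small misspellings."""
    key = query.casefold().strip()
    if key in ALTERNATE_NAMES:          # exact hit in the thesaurus
        return ALTERNATE_NAMES[key]
    # Otherwise fall back to the closest spelling above the cutoff, if any.
    close = difflib.get_close_matches(key, list(ALTERNATE_NAMES), n=1, cutoff=cutoff)
    return ALTERNATE_NAMES[close[0]] if close else None


print(resolve_language("Putonghua"))
print(resolve_language("Cantonise"))
print(resolve_language("xyz"))
```

Mapping every name to a code first means that language code searching and name searching end up querying the same key.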
Multilingual searching vs.
• Currently Web browser displays multi-scripts and
    fonts; Babel Fish for translating website;
    Website has language choice; Speech
    recognition software available
•   Machine translation is just one component of
    a complex multilingual system (John White)
•   Multilingual metadata is a cross-cultural
    retrieval--far more than mere cross-language
    searching (Cliff Lynch)

Crosswalk between library
portals and search engines
• Why is Google courting libraries to scan the
    text of their collections?
•   Web OPAC, Hot-link to resources, OpenURL, etc.
    linked to Google, and Amazon, etc?
•   Searching: precision vs. recall
•   Library and Google Scholar
•   Future OPAC: Topic maps, controlled
    vocabulary, preferred languages, available
    translation and edition (RLG & Google), and
    online seller
Semantic Web based FRBR
union catalog
• IFLA’s FRBR: a semantic expression of the relationships
    between items in the library catalog
•   Berners-Lee’s ultimate vision for XML: the Semantic
    Web—layering a logic layer over the top—info on WWW
    to have logical or semantic relationships
•   RDF—triples—two nodes (one subject & one object),
    connected with a predicate relationship->DAML+OIL;
    Topic Maps: triples—topics, associations and scopes
•   MARC XML open to web harvesters, such as OAI
•   FRBR, RDF, DAML+OIL, or Topic Maps: all used to set
    up a FRBR ontology
•   RDF: can serve as a semantic layer to make FRBR
    expressive and be layered over MARC XML
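The triple idea can be sketched without any RDF library by holding subject, predicate, and object as plain tuples. The relationship names follow the FRBR entity chain (work, expression, manifestation, item), but the identifiers, the `objects` and `manifestations_of` helpers, and the sample data are hypothetical and stand in for a real RDF store.

```python
# Each triple is (subject, predicate, object).
triples = [
    ("work:1", "isRealizedThrough", "expression:1"),
    ("expression:1", "isEmbodiedIn", "manifestation:1"),
    ("expression:1", "isEmbodiedIn", "manifestation:2"),
    ("manifestation:1", "isExemplifiedBy", "item:1"),
]


def objects(triples, subject, predicate):
    """All objects linked to a subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def manifestations_of(triples, work):
    """Follow work -> expression -> manifestation links."""
    found = []
    for expression in objects(triples, work, "isRealizedThrough"):
        found.extend(objects(triples, expression, "isEmbodiedIn"))
    return found


print(manifestations_of(triples, "work:1"))
```

Because the relationships live in a layer of their own, the underlying bibliographic records do not need to change for the catalog to answer work-level questions.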
Authority control: can be a
building block for semantic
map and beyond
• Tillett’s perspectives on the future:
  – VIAF: an integral part of a future Semantic
  – Ontology: infrastructure and controlled
    vocabularies; can be used to enable displays
    in the user’s own language and script
  – Help users of the Web for collocation and
    search precision
  – Tools to connect: biographical dictionaries,
    telephone directories, A&I, etc.
Global Library
• Language is an important perspective or a
  major barrier, but there is much more…
• Multilingual searching across diverse
• Five layers of technology (Noriko Kando)
  – The pragmatic layer: cultural and social
    aspects, convention; the semantic layer:
    concept mapping; the lexical layer: language
    identification, indexing; the symbolic layer:
    encoding issues; the physical layer: the
    network
Global Library
• Multilingual, multicultural and multimedia
  digital libraries
  – HKCAN is bilingual but may be multicultural in
    certain contexts; CALIS can be more diverse
• MARBI 2001-DP05: rename as "context-sensitive"
  authority record
  – From multilingual to multi-context?
• Global vs. localized library
  – VIAF/LEAF: "glocalization" movement?
  – Google: multilingual vs. localized site
Future development for CALIS

• OCLC in China or much more…
• Sharing Chinese bibliographic and authority
    records from Asia to other continents
•   Wider range of libraries: beyond academic
•   Multilingual choice for international users
•   Wider range of users: from librarians to students
•   Hotlinks to extended resources

Future development for CALIS

• Leading research in library and information
    science, such as FRBR projects in OCLC
•   Digitizing more historical materials
•   Supporting use of rare book collections
•   Expanding from subject-oriented collections to
    Semantic Maps for Chinese collections and
•   What is the vision of CALIS to participate in the
    global library?

