From Theory to Practice in Digital Libraries:
5S and Educational Applications (NDLTD, CSTC)
Workshop on Digital Libraries
Albuquerque, 7/7-9/99

Edward A. Fox (fox@vt.edu)
Virginia Tech, Blacksburg, VA, USA
 Co-PIs: Marc Abrams, Robert Akscyn, John Eaton, Scott Grissom, Rachelle Heller, Brian Kleiner, Deborah Knox, Gail McMillan, …
 Students (Selected): Robert France, Neill Kipp, Paul Mather, Constantinos Phanouriou, Ohm Sornil, David Watkins, …
 Sponsors: ACM, Adobe, IBM, Microsoft, NSF, OCLC, US Dept. of Education, ...

 NDLTD (join!)
 CSTC, CRIM (contribute, use)
 NUDL (help propose)
   – (campus/distance/lifelong) learning
   – multilingual federated search
   – superstorage system
 5S (to understand and build DLs)
 Introduction
 Building digital libraries
 Digital libraries for computing
 Collaboration (NUDL)
 Conclusion
         Virginia Tech Background
 Largest university in Virginia, land-grant, town
  population 35K plus 25K students
 Blacksburg Electronic Village, since 1992, with
  80% of community on Internet
 Net Work Virginia, largest ATM network, with
  over 600 sites, for education, research, govt
 LMDS, Local Multipoint Distribution Service,
  gigabit wireless networking - 1/3 of Virginia
 Math Emporium, 500 workstations
 Faculty Development Initiative, round 2
 Advanced Communications and Information
  Technology Center, opening summer 2000
 Connects to the library, with a focus on IT
 1/3 high-tech (multimedia) classrooms
 1/3 digital/electronic library (reading room)
 1/3 research labs: 10, including:
  – Digital Library Research Laboratory (DLRL)
  – Center for Applied Technologies in the Humanities
  – HCI; HPC; Multimedia; Visualization (CAVE), ...
Supporting Authors (Teachers and Learners)
[Diagram: Virginia Tech Digital Library IR Collaboration, linking the Faculty Development Initiative, EPub, ETD, Cataloging/MM, and University Libraries with the Model Classroom of the 21st Century (McB 110, extended virtually campuswide): Technology Showcase, ATM, Video Server, Logging/HCI study, Develop MM, Develop sim's]
 Digital Library Machine (object store)
 Parallel computer / storage utility for scale of
  1000 to 1,000,000 gigabytes (terabyte/petabyte)
 Knowledge Systems Incorporated is supplying
  VT-PetaPlex-1 with
  – high speed backbone connection (OC-12)
  – 2.5 terabytes through 100 “Nanoservers”, each with:
    – network connection + IBM 25GB disk +
      233 MHz Pentium II + Linux
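The capacity figures above can be checked with a little arithmetic. A minimal Python sketch, assuming decimal units (1 TB = 1,000 GB) and counting raw disk only, with no redundancy or filesystem overhead; the helper names are hypothetical:

```python
import math

DISK_GB = 25        # one IBM 25GB disk per Nanoserver
NANOSERVERS = 100   # initial VT-PetaPlex-1 configuration


def total_capacity_gb(nodes: int, disk_gb: int = DISK_GB) -> int:
    """Raw storage when each node contributes exactly one disk (assumption)."""
    return nodes * disk_gb


def nodes_needed(target_gb: int, disk_gb: int = DISK_GB) -> int:
    """Smallest node count whose raw capacity reaches target_gb."""
    return math.ceil(target_gb / disk_gb)


if __name__ == "__main__":
    # 100 nodes x 25 GB = 2,500 GB, i.e. the 2.5 terabytes on the slide.
    print(total_capacity_gb(NANOSERVERS) / 1000, "TB")
    # Same-size nodes needed for each end of the 1000 to 1,000,000 GB range.
    for target in (1_000, 1_000_000):
        print(target, "GB needs", nodes_needed(target), "nodes")
```

At 25 GB per node, the petabyte end of the range would take 40,000 such nodes.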
  Digital Libraries --- Objectives
 World literature: 24 hours / 7 days / from the desktop
 Integrated “super” information systems: 5S:
  streams, structures, spaces, scenarios, societies
 Ubiquitous, Higher Quality, Lower Cost
 Education, Knowledge Sharing, Discovery
 Disintermediation -> Collaboration
 Universities Reclaim Property
 Interactive Courseware, Student Works
 Scalable, Sustainable, Usable, Useful
       DLs: Why of Global Interest?
 National projects can preserve antiquities and
  heritage: cultural, historical, linguistic, scholarly
  - ex., Library of Congress, NPS, Smithsonian
 Knowledge and information are essential to
  economic and technological growth, education
 DL - a domain for international collaboration
   – wherein all can contribute and benefit
   – which leverages investment in networking
   – which provides useful content on Internet & WWW
   – which will tie nations and peoples together more
     strongly and through deeper understanding
           How do universities and digital libraries relate?
 Each university should have its own digital library.
 All students will learn how to use and how to “feed”
  digital libraries (and bring those habits to future
  work as needs and skills).
 All digital library problems (e.g., federation,
  flexibility, personalization) appear at universities (so
  they are a good type of testbed, with willing collaborators
  in place for developing solutions).
Digital Libraries --- Virginia Tech
 CS DL Prototype - ENVISION (NSF, ACM)
 TULIP (Elsevier, OCLC)
 DL for CS Education - EI (NSF, ACM)
 NDLTD (SURA, US Dept. of Education)
 WCA (Log) Repository (W3C, OCLC)
 Introduction
 Building digital libraries
 Digital libraries for computing
 Collaboration (NUDL)
 Conclusion
 How to Build a Digital Library

 Understand the problem (using the 5S Framework)
 Solve the problem (using the Star Methodology):
  – design, develop, evaluate,
  – refine, operate
 5S Layers

 Definition: Digital Libraries
  are complex systems that

 help satisfy info needs of users (societies)
 provide info services (scenarios)
 organize info in usable ways (structures)
 present useful info (spaces)
 communicate info with users (streams)
          Definition: 5S Framework
 Societies: interacting people (and computers)
 Scenarios: services, functions, operations,
 Spaces: domains + constraints (e.g., distance,
  adjacency): 2D, vector, probability
 Structures: relations, trees, nodes and arcs
 Streams: sequences of items (text, audio,
  video, network traffic)
   (Chinese 5 Elements: Fire, Wood, Earth, Metal, Water)
                5S: Components
 Societies:   roles, rituals, reasons, relationships,
 Scenarios: acquire, index, consult, administer,
 Spaces: physical, temporal, functional,
  presentational, conceptual
 Structures: architectures, taxonomies, schema,
  grammars, links, objects
 Streams: granularities, protocols, paths, flows,
              5S: Combinations

 Societies + Scenarios = user model
 Societies + Scenarios + Spaces =
  user interface
 Streams + Structures = markup
 Streams + Structures + Scenarios = object
 Structures + Scenarios = DBMS
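Because each combination is a set of S's, the slide can be encoded as a lookup keyed by sets, which also makes containment explicit (e.g., an object adds Streams to the DBMS pair, and Scenarios to markup). A small Python sketch; the function names are hypothetical and only the mapping itself comes from the slide:

```python
from typing import FrozenSet, List, Optional

FIVE_S = {"Societies", "Scenarios", "Spaces", "Structures", "Streams"}

# Mapping copied from the "5S: Combinations" slide.
COMBINATIONS = {
    frozenset({"Societies", "Scenarios"}): "user model",
    frozenset({"Societies", "Scenarios", "Spaces"}): "user interface",
    frozenset({"Streams", "Structures"}): "markup",
    frozenset({"Streams", "Structures", "Scenarios"}): "object",
    frozenset({"Structures", "Scenarios"}): "DBMS",
}


def concept_for(*parts: str) -> Optional[str]:
    """Name the concept for a combination of S's (order does not matter)."""
    unknown = set(parts) - FIVE_S
    if unknown:
        raise ValueError(f"not one of the 5S: {sorted(unknown)}")
    return COMBINATIONS.get(frozenset(parts))


def extensions(concept: str) -> List[str]:
    """Concepts whose combination strictly contains this concept's combination."""
    base: Optional[FrozenSet[str]] = next(
        (k for k, v in COMBINATIONS.items() if v == concept), None
    )
    if base is None:
        raise KeyError(concept)
    return sorted(v for k, v in COMBINATIONS.items() if base < k)


if __name__ == "__main__":
    print(concept_for("Scenarios", "Societies"))   # user model
    print(extensions("DBMS"))                      # ['object']
    print(extensions("user model"))                # ['user interface']
```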
Star Methodology
 Introduction
 Building digital libraries
 Digital libraries for computing
 Collaboration (NUDL)
 Conclusion
   Why of Interest in Computing?
 Next step in fields of DBMS, HT, IR, MM
 Efficiency requires advances in, e.g.,
  – algorithms and data structures (e.g., in DBMS)
  – networking (ex., HTTP-NG)
  – OS (ex., support for streams)
 Effectiveness requires advances in, e.g.,
  – AI (ex., multilingual texts, user adaptation)
  – HCI (ex., visualization, DLs embedded in activities)
 CS sub-communities need repositories; CS
  education can benefit; CS can aid distance education.
         Network Research Group
 NSF 3-year grant on WWW logging,
  characterization, and optimization: Abrams,
  Fox, Pollard (CNS)
 Core member of Web Characterization
  Activity of World-Wide Web Consortium
 Providing DL (with OCLC) to support WCA
  (at http://www.cs.vt.edu/repository/):
  – logs
  – tools
  – publications
                 NRG Tools

WebJamma:        Artificial HTTP traffic generator
WebWatcher:      HTTP traffic monitoring and
                 logging system
CLFmunge:        Anonymizes common log format
HTTPdump:        Protocol decode for tcpdump
Caching proxy simulator
Splus programs
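The slide does not say how CLFmunge anonymizes logs. As an illustration only, one common approach replaces each client host (and any authenticated user name) with a keyed hash, so the same host keeps the same pseudonym throughout a log but cannot be recovered without the key. A Python sketch under that assumption; the regular expression covers only the basic Common Log Format fields (host ident authuser [date] "request" status bytes), and the function names are hypothetical:

```python
import hashlib
import hmac
import re

# host ident authuser [date] "request" status bytes
CLF = re.compile(r'^(\S+) (\S+) (\S+) (\[[^\]]*\] "[^"]*" \d{3} (?:\d+|-))$')


def pseudonym(value: str, key: bytes) -> str:
    """Keyed hash: stable for a given key, not reversible without it."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def anonymize_line(line: str, key: bytes) -> str:
    """Replace the host (and a non-empty authuser) in one CLF line."""
    match = CLF.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    host, ident, user, rest = match.groups()
    if user != "-":
        user = pseudonym(user, key)
    return f"{pseudonym(host, key)} {ident} {user} {rest}"


if __name__ == "__main__":
    sample = ('192.0.2.17 - - [07/Jul/1999:10:15:32 -0600] '
              '"GET /index.html HTTP/1.0" 200 2326')
    print(anonymize_line(sample, key=b"site-secret"))
```

The key matters: an unkeyed hash of an IPv4 address could be reversed simply by hashing all 2^32 possible addresses.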
              SMETE Library
             (from www.dlib.org)
 Context: Global movement toward Digital
  Libraries (see April 1998 CACM)
 NSF effort: Science, Mathematics,
  Engineering, and Technology Education
  Digital Library (focused on undergraduates)
  – 3 workshops, yearly increasing funds / new calls
 SMETE Library likely to operate as distributed
  federation, with separate parts for each key
  discipline, and to lead to a global effort
 NSF “A User-Centered Database from the
  Computer Science Literature” (1991-93)
 Collected bib/typesetter data, converted to SGML
 Scanned thousands of page images
 MARIAN search engine: can be made available
  (also applied to the Virginia Tech library catalog);
  used as part of a prototype object-based DL, with
  tailored visualization interface (L. Nowell)
        MARIAN Layers (top to bottom)
 Users
 User Interface Layer
 User Information Layer
 Search Engine Layer
 Database Layer
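The layering above can be sketched as four objects, each calling only the layer directly beneath it. This is a toy Python illustration, not MARIAN's actual design: the class names follow the slide, but the in-memory storage, the term-count ranking, and the per-user query history are assumptions chosen for brevity:

```python
from collections import defaultdict


class DatabaseLayer:
    """Stores documents by id (stand-in for a real object database)."""

    def __init__(self, docs):
        self.docs = dict(docs)

    def fetch(self, doc_id):
        return self.docs[doc_id]


class SearchEngineLayer:
    """Inverted index; ranks documents by number of matching query terms."""

    def __init__(self, db):
        self.db = db
        self.index = defaultdict(set)
        for doc_id, text in db.docs.items():
            for term in text.lower().split():
                self.index[term].add(doc_id)

    def search(self, query):
        scores = defaultdict(int)
        for term in set(query.lower().split()):
            for doc_id in self.index.get(term, ()):
                scores[doc_id] += 1
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return [(doc_id, self.db.fetch(doc_id)) for doc_id in ranked]


class UserInformationLayer:
    """Keeps per-user state; here, just each user's past queries."""

    def __init__(self, engine):
        self.engine = engine
        self.history = defaultdict(list)

    def query(self, user, text):
        self.history[user].append(text)
        return self.engine.search(text)


class UserInterfaceLayer:
    """Formats results for display."""

    def __init__(self, users):
        self.users = users

    def show(self, user, text):
        return [f"[{doc_id}] {doc}" for doc_id, doc in self.users.query(user, text)]


if __name__ == "__main__":
    db = DatabaseLayer({1: "digital library search", 2: "library catalog",
                        3: "network logging"})
    ui = UserInterfaceLayer(UserInformationLayer(SearchEngineLayer(db)))
    for line in ui.show("user-1", "digital library"):
        print(line)
```

Keeping user state in its own layer means the search engine can stay stateless and be shared by every user.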
   NSF Education Innovation (EI)
 NSF “Interactive Learning with a Digital
  Library in Computer Science” (1993-98)
 45 online courses (esp. Internet, IR, MM,
  Professionalism, overall EI project pages):
  100+K accesses/wk
 Tools: SWAN (visualization), QUIZIT
 Evaluation
  – traditional
  – network logging and analysis
  – tools for visualization
       Digital Library Courseware
 http://ei.cs.vt.edu/~dlib/
 WWW pages or large PDF copy files
 Online quizzes based on book by Michael Lesk
  (Morgan Kaufmann Publishers)
 Contents based on book, with several other
  popular topics added (e.g., agents)
 Separate pages to supplement: Definitions,
  Resources (People, Projects), and References
      Approaches for Education
 Hypermedia resources
  – ETDs: Electronic Theses & Dissertations
  – Knowledge modules (interactive multimedia)
 Digital libraries
  – NUDL: Networked University DL
  – NDLTD: Networked DL of Theses & Dissertations
  – CSTC: CS Teaching Center
  – CRIM: Curriculum Resources in Interactive Multimedia
 Interactive experiences
  – Submit ETD (metadata, PDF, XML)
  – Using interactive multimedia courseware
  – DL use: browse, search, retrieve (text, markup, hypermedia)
         CS -> CSTC -> CRIM
 NSF and ACM Education Committee are funding
  a 2 year project “A Computer Science Teaching
  Center” - CSTC - http://www.cstc.org/
 College of NJ, U. Ill. Springfield, Virginia Tech
 Focus initially on labs, visualization, multimedia
 Multimedia part is also supported by a 2nd grant
  to Virginia Tech and The George Washington
  University: http://www.cstc.org/~crim/ (with
  curricular guidelines also under development)
    CS Teaching Center (CSTC)
 Instead of building large, expensive multimedia
  packages, that become obsolete and are difficult
  to re-use, concentrate on small knowledge units.
 Learners benefit from having well-crafted
  modules that have been reviewed and tested.
 Use digital libraries to build a powerful base of
  support for learners, upon which a variety of
  courses, self-study tutorials & reference resources
  can be built. (See NSF SMETE-Lib Study at
 Introduction
 Building digital libraries
 Digital libraries for computing
 Collaboration (NUDL)
 Conclusion
ETDs Got Your Interest?
[Diagram: ETD Web Site; Graduate Students; U. Laval; Media: Singapore AM, Chronicle of Higher Ed., National Public Radio, NY Times ...]
   Status of the Local Project
 Approved by university governance
  Spring 1996; required starting 1/1/97
 Submission & access software in place
 Submission workshops for students
  (and faculty) occur often: beginner/adv.
 Faculty training as part of Faculty
  Development Initiative
 Over 2000 ETDs in collection
     Institutional Members

Coalition for Networked Information
Committee on Inst. Coop. (CIC)
National Library of Portugal
         US University Members
Air University (Alabama)
Cal Tech
Clemson University
College of William & Mary
Concordia University (Illinois)
East Tenn. State University
Florida Institute of Tech.
Florida International University
Michigan Tech
Naval Postgraduate School (CA)
North Carolina State U.
Penn. State University
Rochester Institute of Tech.
U. of Florida
U. of Georgia
University of Hawaii, Manoa
U. of Iowa
U. of Maine
U. of Oklahoma
U. of South Florida
U. of Tennessee, Knoxville
U. of Tennessee, Memphis
U. of Texas at Austin
U. of Virginia
U. Wisconsin - Madison
Vanderbilt U.
Virginia Tech - required since 1/97
West Virginia U. - required beginning fall 1998
Worcester Polytechnic Inst.
   Australian Project Members

U. New South Wales (lead institution)
U. of Melbourne
U. of Queensland
U. of Sydney
Australian National University
Curtin U. of Technology
Griffith U.
     German Project Members

Humboldt University (lead institution)
3 other universities
5 learned societies
1 computing center
2 major libraries
    Other International Members
Chinese University of Hong Kong
Chungnam National U., Dept of CS (S. Korea)
City University, London (UK)
Darmstadt U. of Tech. (Germany)
Free University of Berlin (Germany - Vet. Med.)
Gyeongsang National U. (Korea)
Indian Institute of Technology, Bombay (India)
Nanyang Technological U. (Singapore, part)
National U. of Singapore (Singapore, part)
*National Library of Portugal
Polytechnic University of Valencia (Spain)
Rhodes U. (South Africa)
St. Petersburg State Tech. U. (Russia)
Univ. de las Américas Puebla (Mexico)
U. Laval; U. of Guelph; U. Waterloo; Wilfrid Laurier U. (Canada)
                Access Statistics

                                   1996      1997      1998
Total successful requests:       37,171   247,573   628,401
Av. successful requests/day:        102       685     1,690
Requests for .PDF files:          4,600    72,854   343,236
Requests for .HTML files:        28,225   129,831   215,896
Distinct hosts served:            9,015    22,725    36,724
Total data transferred:          3,229M   25,953M   74,051M
Av. data transferred/day:            9M       73M      222M
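The table supports a couple of simple derived figures, such as year-over-year growth in requests and the share of requests that were for PDF files. A Python sketch using only the numbers in the table (the helper names are hypothetical):

```python
# Figures copied from the Access Statistics table (requests per year).
STATS = {
    1996: {"total": 37_171, "pdf": 4_600, "html": 28_225},
    1997: {"total": 247_573, "pdf": 72_854, "html": 129_831},
    1998: {"total": 628_401, "pdf": 343_236, "html": 215_896},
}


def growth(stats, field="total"):
    """Ratio of each year's value to the previous year's value."""
    years = sorted(stats)
    return {year: stats[year][field] / stats[prev][field]
            for prev, year in zip(years, years[1:])}


def share(stats, part="pdf"):
    """Fraction of total successful requests accounted for by one file type."""
    return {year: row[part] / row["total"] for year, row in stats.items()}


if __name__ == "__main__":
    for year, factor in growth(STATS).items():
        print(f"{year}: {factor:.2f} times the previous year's requests")
    for year, fraction in share(STATS).items():
        print(f"{year}: {fraction:.0%} of requests were for PDF files")
```

On these figures, total requests grew roughly 6.7-fold from 1996 to 1997 and 2.5-fold from 1997 to 1998, while the PDF share of requests rose from about 12% to about 55%.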
    A Digital Library Case Study
 Domain: graduate education, research
 Genre: ETDs = electronic theses & dissertations
 Submission: http://etd.vt.edu
 Networked Digital Library of Theses & Dissertations (NDLTD): http://www.ndltd.org
 Key ideas:
  – Networked infrastructure
  – Scalability
  – University collaboration
  – Workflow, automation
  – Education is the rationale (8th graders vs. grads)
  – Authors must submit
  – Maximal access: PDF, SGML, MM; MARC, DC, URNs; federated search
 What are the long-term goals?
 400K US students / year getting grad
  degrees are exposed / involved
 200K/yr rich hypermedia ETDs that
  may turn into electronic portfolios
 Dramatic increase in knowledge
  sharing: lit. reviews, bibliographies, …
 Services providing lifelong access for
  students: browse, search, prior
  searches, citation links
 1. Student defends and finalizes ETD (“My Thesis”)
 2. Student gets committee signatures and submits ETD
 3. ETD goes to the Grad School
 4. Library catalogs ETD, and new students have access to the new research
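The workflow above is linear, so it can be modelled as a small state machine in which each step may only follow the previous one. A Python sketch; the stage names, the signature check, and the Graduate School step (treated simply as a stage the ETD passes through) are assumptions, not the Virginia Tech submission software:

```python
from enum import Enum


class Stage(Enum):
    FINALIZED = 1     # student has defended and finalized the ETD
    SUBMITTED = 2     # committee signatures obtained, ETD submitted
    GRAD_SCHOOL = 3   # ETD is with the Graduate School
    CATALOGED = 4     # library has cataloged the ETD
    ACCESSIBLE = 5    # new students have access to the research


# Each stage may only advance to the one listed here.
NEXT = {
    Stage.FINALIZED: Stage.SUBMITTED,
    Stage.SUBMITTED: Stage.GRAD_SCHOOL,
    Stage.GRAD_SCHOOL: Stage.CATALOGED,
    Stage.CATALOGED: Stage.ACCESSIBLE,
}


class ETD:
    def __init__(self, title: str):
        self.title = title
        self.stage = Stage.FINALIZED
        self.committee_signed = False
        self.history = [self.stage]

    def advance(self) -> Stage:
        """Move to the next stage, enforcing the signature requirement."""
        if self.stage not in NEXT:
            raise ValueError(f"{self.title!r} is already {self.stage.name}")
        if self.stage is Stage.FINALIZED and not self.committee_signed:
            raise ValueError("committee signatures are required before submission")
        self.stage = NEXT[self.stage]
        self.history.append(self.stage)
        return self.stage


if __name__ == "__main__":
    etd = ETD("My Thesis")
    etd.committee_signed = True
    while etd.stage in NEXT:
        print(etd.advance().name)
```

Encoding the allowed transitions in one table keeps the workflow rules in a single place, which helps when automating the steps.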


    ETD Initiative (and UMI)
 Students learn about DL, EPub
 TDs become more expressive
 Global TDs become more accessible
 UMI: N. Amer. (T)Ds are accessible, archived
   How can a university get
 Select planning/implementation team
  – Graduate School
  – Library
  – Computing / Information Technology
  – Institutional Research / Educ. Tech.
 Send us a letter, give us contact names
 Adapt Virginia Tech solution
  –   Build interest and consensus
  –   Start trial / allow optional submission
Build Local ETD Site
[Diagram: Workshop/Training; Policies; Digital Library]
           Support Offered

 Software, documentation, tech support
 Email, listservs (etd-l@listserv.vt.edu, -eval,
  -grad, -library, -technical)
 Donations: Adobe, Microsoft
 Evaluation: instruments, analysis
 (Temporary storage / archiving; aid in
  setting up an int’l service & archive)
 http://scholar.lib.vt.edu - solutions/statistics

 Dublin Core spec, MARC crosswalk
 DTDs for SGML, XML(+ <discipline>ML)
 Annotation system (author, friends, notes)
 Routing system (based on Sift)
 Better federated search (w. Z39.50, planned
  with Dienst and Harvest)
 Multilingual WWW site, training materials
  (Spanish recently done in Valencia)
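A Dublin Core to MARC crosswalk is essentially a field-mapping table applied to each record. A Python sketch of the idea; the mapping is a simplified, illustrative subset of MARC 21 tags chosen for this example (it ignores indicators and repeatability rules, e.g. additional creators belong in 700 rather than 100) and is not the NDLTD crosswalk:

```python
from typing import Dict, List, Tuple

# Simplified DC element -> (MARC 21 tag, subfield); illustrative only.
DC_TO_MARC: Dict[str, Tuple[str, str]] = {
    "title": ("245", "a"),        # title statement
    "creator": ("100", "a"),      # main entry, personal name
    "subject": ("653", "a"),      # uncontrolled index term
    "description": ("520", "a"),  # summary/abstract
    "publisher": ("260", "b"),    # publisher name
    "date": ("260", "c"),         # date of publication
    "identifier": ("856", "u"),   # electronic location (URL)
    "language": ("546", "a"),     # language note
}


def crosswalk(dc: Dict[str, List[str]]) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Return (tag, subfield, value) triples sorted by tag, plus unmapped elements."""
    fields, unmapped = [], []
    for element, values in dc.items():
        target = DC_TO_MARC.get(element.lower())
        if target is None:
            unmapped.append(element)
            continue
        tag, code = target
        fields.extend((tag, code, value) for value in values)
    fields.sort(key=lambda f: (f[0], f[1]))  # stable: keeps value order per field
    return fields, sorted(unmapped)


if __name__ == "__main__":
    record = {
        "title": ["A Hypothetical Thesis Title"],
        "creator": ["Doe, Jane"],
        "date": ["1999"],
        "publisher": ["Virginia Tech"],
        "type": ["Text"],
    }
    fields, skipped = crosswalk(record)
    for tag, code, value in fields:
        print(f"{tag} ${code} {value}")
    print("not mapped:", skipped)
```

Reporting unmapped elements, rather than silently dropping them, makes gaps in the mapping table visible during testing.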
 Introduction
 Building digital libraries
 Digital libraries for computing
 Collaboration (NUDL)
 Conclusion
 Networked University Digital Library
  – VT: Library, Grad School, Industrial & Systems Eng.
  – Initial Partners: UK, Singapore, Russia, Korea,
    Greece, Germany, plus Iberoamerican group (Spain,
    Portugal, Argentina, Brazil, Chile, Mexico)
  – Problems: Multilingual search, multimedia
    submissions, requirements/usability, …
 Start with ETDs, then expand to other student
  works, portfolios, data sets, (CS) courseware, ...
National Coverage (red/white)
                        NUDL Partners
   Ricardo A. Baeza-Yates, Universidad de Chile, Chile
   José Luis Brinquete Borbinha, Biblioteca Nacional, Portugal
   José Hilario Canós Cerdá, Universidad Politécnica de Valencia, Spain
   Stavros Christodoulakis, Technical University of Crete, Greece
   Lautaro Guerra Genskowsky, Universidad Técnica Federico Santa María, Chile
   Juan José Goldschtein, Universidad de Belgrano, Argentina
   Peter Diepold, Humboldt University, Germany
   Francisco Javier Jaén Martinez, Spain
   Sung Hyon Myaeng, Chungnam National University, Korea
   Ana Maria Beltran Pavani, Prédio Cardeal Leme, Brazil
   Lim Ee Peng, Nanyang Technological University, Singapore
   Alexander I. Plemnek, St.-Petersburg State Technical University, Russia
   J. Alfredo Sánchez, Universidad de las Américas-Puebla, Mexico
      User Search Support
         (multilingual, XML)

 NDLTD World Federated:
  – Virginia Tech ... (univ)
  – UMI ... (corporate)
  – CIC ... (univ group)
  – Portuguese NL ... (national lib)

Note: All groups shown are connected with NDLTD.
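At its simplest, a federated search sends one query to every member collection in parallel and merges the results, tolerating members that fail. A Python sketch under stated assumptions: each source is a hypothetical callable returning (document id, score) pairs, scores are treated as comparable across sources (real federations built on Z39.50, Dienst, or Harvest rarely guarantee this), and multilingual query translation is left out:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]                  # (document id, score)
Source = Callable[[str], List[Hit]]      # hypothetical member search function


def federated_search(
    query: str, sources: Dict[str, Source]
) -> Tuple[List[Tuple[str, str, float]], List[str]]:
    """Query all sources concurrently; return merged ranked hits and failed sources."""
    merged: List[Tuple[str, str, float]] = []
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(search, query) for name, search in sources.items()}
        for name, future in futures.items():
            try:
                hits = future.result()
            except Exception:            # one member being down must not sink the search
                failed.append(name)
                continue
            merged.extend((name, doc_id, score) for doc_id, score in hits)
    # Highest score first; ties broken by source name, then document id.
    merged.sort(key=lambda h: (-h[2], h[0], h[1]))
    return merged, failed


if __name__ == "__main__":
    members = {
        "vt": lambda q: [("etd-1", 0.9), ("etd-2", 0.4)],
        "umi": lambda q: [("diss-7", 0.7)],
        "offline": lambda q: 1 / 0,      # simulates an unreachable member
    }
    hits, down = federated_search("digital libraries", members)
    for source, doc_id, score in hits:
        print(f"{score:.1f}  {source}:{doc_id}")
    print("unavailable:", down)
```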

 Interface design (simple, 3D, VR)
 Usability studies
 Generic multi-lingual support
 Support for those with disabilities
 Hybrid collection (paper, MARC,
  abstracts, full-text, multimedia)
 Disciplinary classifications, tools
 Visualization of results, collection
SPIRE Visualization
 Introduction
 Building digital libraries
 Digital libraries for computing
 Collaboration (NUDL)
 Conclusion
 NDLTD/NUDL: so every university will
  have its own digital library, gently.
 CS education, and education in general, will
  make extensive use of digital libraries.
 Think of digital libraries as:
  – institutions; hardware/software systems;
  – super-stores; super-information systems;
  – supported by 5S Framework, Star Methodology.
 Interesting research: parallel & distributed, ...


