Leveraging the New Multimedia Module: Automation and Integration
Rebecca Snyder
Smithsonian, NMNH

Automation: Embedded metadata is my friend…

Digitization Project Example: Photographers
1 project. 15 photographers. 18 months. 600,000 images.
One gigantic mess.

Media Staging Area (MeSA)
• Approved users map to the SAN unit like any other networked drive.
• Centralized area for image and other data storage.
• Provides space to assemble, manage, organize and standardize projects.
• The SAN is backed up and maintained by IT.

Project Data Organization and Embedded Metadata: "When once is enough..."
With all project data centralized in the Media Staging Area (MeSA) on the SAN, project personnel can use Adobe Bridge (or a similar product) to apply common metadata to all images within a group, so that it is entered only once. Individual images/files can also be edited with unique metadata.

The Multimedia Module autopopulates its fields from the values entered in the IPTC, Exif and XMP fields. At NMNH, the focus is on the IPTC metadata fields, because the DAM supports IPTC (IIM) and also autopopulates its fields from the embedded metadata. Metadata is entered once and read automatically by two systems.

Collections-Based Images/Files
Images are placed into EMu's Multimedia Module. EMu reads the metadata embedded in the multimedia file headers and autopopulates the corresponding fields, minimizing the need to re-enter data. Each derivative generated by EMu inherits the embedded metadata chosen by the department, which is especially useful when generating images for the web. These autopopulated fields can be searched and reported on (with some CR filtering…).
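The "read the header, autopopulate the fields" step can be sketched as follows. This is a minimal illustration, not how EMu or the DAM actually implement it, and it rests on several assumptions:

- The raw IPTC IIM block has already been pulled out of the file header.
- Only standard-length Record 2 datasets are handled.
- Text is UTF-8.

The target field names (`title`, `creator`, and so on) are hypothetical and are not EMu column names. The helper `iim_dataset` exists only to build sample input.

```python
import struct

# IPTC IIM Record 2 (Application Record) dataset numbers handled by this sketch.
IIM_DATASETS = {
    5: "ObjectName",
    25: "Keywords",          # repeatable dataset
    80: "By-line",
    116: "CopyrightNotice",
    120: "Caption-Abstract",
}

# Hypothetical mapping from IPTC names to target record fields
# (illustrative names only, not actual EMu or DAM field names).
FIELD_MAP = {
    "ObjectName": "title",
    "Keywords": "keywords",
    "By-line": "creator",
    "CopyrightNotice": "rights",
    "Caption-Abstract": "description",
}


def parse_iim(block: bytes) -> dict:
    """Parse a raw IIM block into {IPTC name: str, or list of str for Keywords}."""
    values = {}
    pos = 0
    while pos + 5 <= len(block):
        if block[pos] != 0x1C:
            raise ValueError(f"expected tag marker 0x1C at offset {pos}")
        record, dataset, length = struct.unpack(">BBH", block[pos + 1:pos + 5])
        if length & 0x8000:
            raise ValueError("extended-length datasets are not handled in this sketch")
        if pos + 5 + length > len(block):
            raise ValueError(f"truncated dataset at offset {pos}")
        payload = block[pos + 5:pos + 5 + length]
        pos += 5 + length
        name = IIM_DATASETS.get(dataset) if record == 2 else None
        if name is None:
            continue  # skip datasets this sketch does not map
        # Assumes UTF-8 text; real files may declare another charset in dataset 1:90.
        text = payload.decode("utf-8")
        if name == "Keywords":
            values.setdefault(name, []).append(text)
        else:
            values[name] = text
    return values


def autopopulate(block: bytes) -> dict:
    """Turn embedded IPTC values into a record dict using FIELD_MAP."""
    return {FIELD_MAP[name]: value for name, value in parse_iim(block).items()}


def iim_dataset(record: int, dataset: int, text: str) -> bytes:
    """Encode one standard-length IIM dataset (used here to build sample input)."""
    data = text.encode("utf-8")
    if len(data) >= 0x8000:
        raise ValueError("value too long for a standard-length dataset")
    return struct.pack(">BBBH", 0x1C, record, dataset, len(data)) + data


if __name__ == "__main__":
    block = (iim_dataset(2, 5, "USNM 12345 dorsal view")
             + iim_dataset(2, 25, "Coleoptera")
             + iim_dataset(2, 25, "holotype")
             + iim_dataset(2, 80, "NMNH Imaging"))
    print(autopopulate(block))
```

Because the mapping table is the only place that ties IPTC names to record fields, a department could change which embedded values flow into which fields without touching the parser.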
Non-Collections-Based Images/Files
Once a project is finished, or reaches a designated milestone, informatics staff ingest the completed project folders into the DAM en masse via a backend connection. By this point the folders hold organized and complete data. All assets within each project folder share a common set of metadata. The system reads the embedded metadata and autopopulates the DAM fields. As needed, researchers can log into the DAM front end and add further information.

NO REENTRY OF DATA REQUIRED.
The same file in the DAM: all of the embedded metadata is read automatically by the DAM and the appropriate fields are populated.

Integration: KE is my new best friend…

Collections and Web Records Summary (06/2007):

Unit                  EMu Records          Web Records               Online Types
Anthropology          551,273              417,850                   NA
Botany                783,742              765,911                   95,287
Invertebrate Zoology  914,707              884,216                   64,446
Fishes                312,102              309,572                   17,780
Herps                 558,298              550,775                   12,669
Birds                 411,667              388,943                   3,967
Mammals               558,469              575,671                   3,203
Paleobiology          583,661              582,986 (in dev.)         137,546 (in dev.)
Mineral Sciences      366,621              *filtering not yet set    558 (in dev.)
Entomology            317,800 (in dev. †)  *filtering not yet set    120,000 (in dev.)
Total Live            5,040,540            3,892,938                 197,372

† Approximate.
Includes Entomology Primary Types, Odonates, Mosquitoes.

NMNH Multimedia Records (10/2007):
All live environments: 543,247 multimedia records

A subset of the images to be added within 12-18 months:
• Entomology Type Imaging Project: ~111,000 type specimens, each with 4-6 images (444,000-666,000 images)
• Botany Cacti image collection: ~200,000 images
• Botany Latin American Plant Initiative: ~600,000 images
• Fishes: ~10,000 images

Information Management

File/Digital Asset Management
Benefits of using an external digital asset management system:
• primary focus on managing the physical file
• focus beyond multimedia file formats
• ability to identify files (multimedia, text, PDF, etc.) that you determine are at risk of format obsolescence and bulk-convert them to a new format, while keeping the original and using the new derivative as the "current use" version (i.e., version control)
• version control also provides check-in/check-out functionality
• for SI: integration with the prototype federated searching layer and the planned "Trusted Digital Repository" archiving system

Coarse Planned Integration Flow:
[Diagram components: SI Federated Searching Layer; NMNH EMu; NMNH EMu Web; SI DAM; filters; Linux; UNIX/Texpress; Oracle DBMS; SI TDR (trusted digital repository)]

Envisioned DAM <-> EMu Integration:
• The user ingests a file within the Multimedia Module as usual.
• EMu automatically generates a thumbnail and a ~200x200-pixel image for display on the Multimedia tab (a "quick access" copy).
• Either at the time of save or at set intervals, EMu passes the original image to the DAM, essentially treating the DAM as a remote storage location.
• EMu access and flag settings determine which security/permission model within the DAM is applied to the file.
• When an EMu user requests the original image for download, EMu sends a request to the DAM for the file.

Maintaining EMu access flags in the DAM:

Challenges:
• When should the original image be passed to the DAM? At the time of save? In weekly pushes? When the Data Manager sets a new "DAM ready" flag to yes?
• Writing new embedded metadata to image files… Will KE add this functionality? If so, it must only be data that is not subject to change (i.e., not specimen storage location, etc.).
• It is difficult to add values from fields in other modules. One option is to keep the most commonly used fields as local copies within the Multimedia Module…
• How do we keep DAM and EMu records up to date if metadata changes? The EMu backend could take advantage of the DAM's check-in/check-out feature. EMu would then have to "update resource" to get the new metadata values.

Other Challenges:
How should "set" data be handled? These are images and other media that are individual files but need to be treated and delivered as a set. Examples:
• DICOM: a large number of individual slices that together make up a single CT scan. All must be present to recreate the specimen. It is not practical to import all 1,000+ files separately into EMu and link all 1,000+ multimedia records to the catalog record…
• Extended Depth of Focus (EDF) imaging: an image series used to create a single composite image. Best practice keeps the original images, not just the composite.
• Raw image files: a separate record? A version of the TIFF surrogate?

Contact Information:
Rebecca Snyder
Smithsonian, NMNH
firstname.lastname@example.org
202.633.0754

The "Big Picture"
With standardized metadata used across NMNH systems, data can easily be shared with all other SI systems under the "EDAN" layer, making federated, pan-institutional searches possible.
Ex: A U.S. Senator's office calls and asks for everything the Smithsonian has on his or her state. It is much easier to search once than to query each unit's various systems individually. A working prototype exists with the various SI Library Systems.
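The "search once" idea behind a federated layer can be sketched as a fan-out plus merge. This is a minimal illustration under stated assumptions, not the EDAN API. Each unit system is assumed to expose its own search over records that already share a common metadata schema. The in-memory `UNIT_SYSTEMS` backends, their toy records, and the `search_unit`/`federated_search` functions are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical unit systems holding toy records in a shared schema
# ("title" and "state"); real systems would be queried over the network.
UNIT_SYSTEMS = {
    "NMNH EMu": [
        {"title": "Catalog record A", "state": "NJ"},
        {"title": "Catalog record B", "state": "MD"},
    ],
    "SI DAM": [
        {"title": "Image asset C", "state": "NJ"},
    ],
    "SI Libraries": [
        {"title": "Library record D", "state": "MD"},
    ],
}


def search_unit(system: str, field: str, value: str) -> list[dict]:
    """Stand-in for one system's own search interface; tags hits with their source."""
    return [dict(rec, source=system)
            for rec in UNIT_SYSTEMS[system] if rec.get(field) == value]


def federated_search(field: str, value: str) -> list[dict]:
    """Send one query to every system concurrently and merge the hits."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_unit, name, field, value)
                   for name in UNIT_SYSTEMS]
        results = []
        # Collect in submission order so the merged list is deterministic.
        for future in futures:
            results.extend(future.result())
    return results


if __name__ == "__main__":
    for hit in federated_search("state", "NJ"):
        print(hit["source"], "|", hit["title"])
```

Because every backend here shares one schema, the merge is a plain concatenation. A real federated layer would probably also need per-system query translation, paging, and the access filtering shown in the planned integration flow.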