The CDF Run II Data Catalog and Data Access Modules
P. Calafiura, J. Kowalkowski, S. Lammel, M. Lancaster, F. Ratnikov, E. Sexton-Kennedy, I. Sfiligoi, T. Watts, E. Wicklund
CHEP 2000, Paolo Calafiura, LBNL 1 The CDF Run II Data Catalog...
Data Handling Software Components
S. Lammel - C 366
Data Access Hierarchy
• Data view
– Dataset
– Run Section
• Storage view
– (Tape) Stream
– Fileset/Partition
– File
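The two views above can be related with a small data structure. The following is a minimal sketch, not the CDF implementation: all type and field names (`RunSection`, `File`, `Fileset`, `Stream`, `filesContaining`) are hypothetical, and it assumes each file records the first and last run section it holds.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Data view key: a run section inside a run (hypothetical layout).
struct RunSection { int run; int section; };

// Storage view: a stream holds filesets, a fileset holds files.
struct File {
    std::string name;
    RunSection first;  // first run section stored in the file (assumed)
    RunSection last;   // last run section stored in the file (assumed)
};
struct Fileset { std::string id; std::vector<File> files; };
struct Stream  { std::string name; std::vector<Fileset> filesets; };

// Order run sections by (run, section).
inline bool lessOrEqual(const RunSection& a, const RunSection& b) {
    return a.run < b.run || (a.run == b.run && a.section <= b.section);
}

// Map a data-view key onto the storage view: names of the files
// whose run section range contains the requested run section.
inline std::vector<std::string> filesContaining(const Stream& s,
                                                const RunSection& rs) {
    std::vector<std::string> out;
    for (const Fileset& fs : s.filesets) {
        for (const File& f : fs.files) {
            assert(lessOrEqual(f.first, f.last));  // sanity check on the record
            if (lessOrEqual(f.first, rs) && lessOrEqual(rs, f.last))
                out.push_back(f.name);
        }
    }
    return out;
}
```

A user thinks in datasets and run sections; the lookup translates that request into files without exposing streams or filesets.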
Reading Data
Writing Data
The File Catalog
• Locate file(set)s belonging to a dataset from
– a time range
– a run range
– applying quality cuts, …
• Log output files and filesets info
• Maintain tape management info
• Log job progress (error recovery, checkpoint-restart)
• C++ API
• Command-line and web-based tools
• Distributed access
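The lookup by run range with a quality cut can be illustrated in a few lines. This is a sketch under stated assumptions: the actual DFC schema and C++ API are not shown in these slides, so `FilesetRecord`, `InMemoryCatalog`, `log` and `locate` are hypothetical names, and an in-memory vector stands in for the database.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical catalog record; the real DFC tables are not shown here.
struct FilesetRecord {
    std::string dataset;
    std::string filesetId;
    int firstRun;
    int lastRun;
    bool goodQuality;  // stand-in for a quality cut
};

class InMemoryCatalog {
public:
    // Log an output fileset into the catalog.
    void log(const FilesetRecord& r) { records_.push_back(r); }

    // Ids of the filesets of `dataset` whose run range overlaps
    // [runLo, runHi], optionally keeping only good-quality ones.
    std::vector<std::string> locate(const std::string& dataset,
                                    int runLo, int runHi,
                                    bool requireGood) const {
        assert(runLo <= runHi);
        std::vector<std::string> ids;
        for (const FilesetRecord& r : records_) {
            bool overlaps = r.firstRun <= runHi && r.lastRun >= runLo;
            if (r.dataset == dataset && overlaps &&
                (!requireGood || r.goodQuality))
                ids.push_back(r.filesetId);
        }
        return ids;
    }

private:
    std::vector<FilesetRecord> records_;
};
```

A time-range query would follow the same overlap test with timestamps in place of run numbers.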
The File Catalog Clients
[Diagram: File Catalog clients. Labels: L3 Farm, Data Logger, Offline Farm; Reader, Writer; Raw Data, Filtered Data; DFC; DBManager; Oracle, MSQL]
The DBManager Package
J. Kowalkowski C236 Poster
• DBMS-independent C++ API (calibration, geometry, DFC)
• type-safe mapping of table rows to transient C++ objects
• smart pointers
– lazy instantiation
– caching
– update pointer when new key notified
• pluggable factory to select DBMS at run time
• code generator
– provide binding (Oracle, MSQL, JDBC, text) for predefined queries
– Java-based table description language
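The smart pointer and pluggable factory ideas can be sketched together. This is an illustration only, not the DBManager API: `CalibRow`, `Loader`, `BackendFactory`, `LazyRow` and `notifyKey` are hypothetical names, and a `std::function` stands in for a DBMS-specific binding.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical transient object built from one table row.
struct CalibRow { int key; double gain; };

// A loader stands in for a DBMS-specific binding (Oracle, MSQL, text, ...).
using Loader = std::function<CalibRow(int)>;

// Pluggable factory: backends register by name and are selected at run time.
class BackendFactory {
public:
    void registerBackend(const std::string& name, Loader l) {
        loaders_[name] = std::move(l);
    }
    Loader select(const std::string& name) const {
        auto it = loaders_.find(name);
        assert(it != loaders_.end());  // unknown backend name
        return it->second;
    }
private:
    std::map<std::string, Loader> loaders_;
};

// Smart pointer with lazy instantiation and caching.
class LazyRow {
public:
    LazyRow(Loader loader, int key) : loader_(std::move(loader)), key_(key) {}

    const CalibRow& operator*()  { load(); return *cached_; }
    const CalibRow* operator->() { load(); return cached_.get(); }

    // A new key was notified: drop the cache so the next access reloads.
    void notifyKey(int key) {
        if (key != key_) { key_ = key; cached_.reset(); }
    }
private:
    void load() {
        if (!cached_) cached_.reset(new CalibRow(loader_(key_)));
    }
    Loader loader_;
    int key_;
    std::unique_ptr<CalibRow> cached_;
};
```

The database is queried only on first dereference; repeated accesses with the same key reuse the cached object.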
Data Handling Input Module
• Module of the BaBar/CDF AC++ framework
• Invisible to users
• Select relevant filesets in a logical fashion
• Iterate over them
– stage ahead
– out-of-order
• Maintain state of request for error recovery
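Out-of-order iteration with recoverable state can be sketched as a small state machine per fileset. This is an assumption-laden illustration, not the input module's code: `FilesetRequest`, `State`, `markStaged`, `next`, `markDone` and `remaining` are hypothetical names, and staging itself is left to an external system.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class State { Pending, Staged, Done };

struct Entry { std::string fileset; State state; };

// Tracks the state of one request so that it can be resumed after an error.
class FilesetRequest {
public:
    explicit FilesetRequest(const std::vector<std::string>& filesets) {
        for (const std::string& f : filesets)
            entries_.push_back(Entry{f, State::Pending});
    }

    // The staging system reports that a fileset is available on disk.
    void markStaged(const std::string& f) {
        Entry* e = find(f);
        if (e != nullptr && e->state == State::Pending) e->state = State::Staged;
    }

    // Out-of-order iteration: hand out any fileset that is already staged,
    // regardless of its position in the request. Returns false if none is ready.
    bool next(std::string& out) const {
        for (const Entry& e : entries_)
            if (e.state == State::Staged) { out = e.fileset; return true; }
        return false;
    }

    // Called once a fileset has been fully processed.
    void markDone(const std::string& f) {
        Entry* e = find(f);
        assert(e != nullptr && e->state == State::Staged);
        e->state = State::Done;
    }

    // Error recovery: the filesets that still have to be processed.
    std::vector<std::string> remaining() const {
        std::vector<std::string> out;
        for (const Entry& e : entries_)
            if (e.state != State::Done) out.push_back(e.fileset);
        return out;
    }

private:
    Entry* find(const std::string& f) {
        for (Entry& e : entries_)
            if (e.fileset == f) return &e;
        return nullptr;
    }
    std::vector<Entry> entries_;
};
```

Persisting the result of `remaining()` after each completed fileset would be one way to restart a failed job without reprocessing finished filesets.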
Data Handling Output Module
• AC++ Module
• Close files at target size, but aligned to run section boundaries (keep events from a section together)
• Log output files info into catalog
• Commit blocks of completed files to the DIM
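The file-closing rule can be made concrete with a short sketch. It assumes events arrive grouped by run section and that a file is closed at the first section boundary after the target size is reached; `Event` and `splitIntoFiles` are hypothetical names, not part of the output module.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event summary: the run section it belongs to and its size.
struct Event { int runSection; std::size_t bytes; };

// Returns the number of events written to each output file. A file is
// closed only at a run section boundary, once it has reached targetBytes,
// so events from one section always stay in the same file.
inline std::vector<std::size_t> splitIntoFiles(const std::vector<Event>& events,
                                               std::size_t targetBytes) {
    assert(targetBytes > 0);
    std::vector<std::size_t> counts;
    std::size_t bytesInFile = 0;
    std::size_t eventsInFile = 0;
    for (std::size_t i = 0; i < events.size(); ++i) {
        bool sectionBoundary =
            i > 0 && events[i].runSection != events[i - 1].runSection;
        if (sectionBoundary && bytesInFile >= targetBytes) {
            counts.push_back(eventsInFile);  // close the current file
            bytesInFile = 0;
            eventsInFile = 0;
        }
        bytesInFile += events[i].bytes;
        ++eventsInFile;
    }
    if (eventsInFile > 0) counts.push_back(eventsInFile);  // last, partial file
    return counts;
}
```

With this rule a file may exceed the target size by up to one run section, which is the price of keeping sections intact.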
Status and Outlook
• Interfaces defined between all components
• All components have at least a prototype implementation
• Successful system integration for Mock Data Challenge 1
• T. Watts C 268 (tomorrow)
• Improve performance and reliability