Data Integrity and Quality, and
Scribe: H. V. Jagadish
Moderator: Anthony Tomasic
• Quality Measurement
• Information Recording
• Integrity Maintenance
• Risk Representation
• Other Information Representation
• Enabled Big Science
Quality Measurement
• Much debate within the financial industry
on how to define and measure data quality.
• Seven metrics
– Completeness, conformity, precision,
accuracy, timeliness, association, and
• Three dimensions
– function, methods of measurement, and
definition of quality.
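As a rough illustration of how two of the metrics above might be computed, the sketch below scores completeness and conformity over a handful of records. The slides do not define the metrics, so the definitions used here (share of non-missing values, share of values matching a format rule), the sample records, and the field names are all assumptions made for illustration.

```python
import re

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"trade_id": "T1", "isin": "US0378331005", "price": 101.5},
    {"trade_id": "T2", "isin": "BAD-ISIN", "price": None},
    {"trade_id": "T3", "isin": None, "price": 99.0},
    {"trade_id": "T4", "isin": "DE0007164600", "price": 100.25},
]

# Assumed format rule: 2 letters, 9 alphanumerics, 1 digit.
# Only the shape is checked; the check digit is not validated.
ISIN_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def completeness(rows, field):
    """Fraction of rows in which `field` is present (not None)."""
    return sum(r.get(field) is not None for r in rows) / len(rows)


def conformity(rows, field, pattern):
    """Fraction of non-missing values of `field` that match `pattern`."""
    values = [r[field] for r in rows if r.get(field) is not None]
    if not values:
        return 0.0
    return sum(bool(pattern.fullmatch(v)) for v in values) / len(values)


print(completeness(records, "isin"))              # 3 of 4 rows have an ISIN
print(conformity(records, "isin", ISIN_PATTERN))  # 2 of 3 present ISINs match
```

Separating the two scores keeps a missing value from being counted as a malformed one, which is one possible reading of why the list treats completeness and conformity as distinct metrics.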
Information Recording
• Much will now be recorded and made public.
• Need procedure to make corrections.
• Must represent and record metadata.
• Provenance recording by extending audit
trails already used by banks.
• Time stamp granularity is a concern for fast
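The points above about corrections, metadata, and audit-trail provenance can be sketched as an append-only log in which a correction never overwrites the original entry but points back to it. This is a minimal sketch under stated assumptions: the class names `AuditEntry` and `AuditTrail`, the fields, and the sample keys are hypothetical and do not describe any bank's actual audit system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    seq: int                        # position in the trail
    key: str                        # what the value describes
    value: float
    source: str                     # provenance: who or what reported it
    recorded_at: datetime           # time stamp (UTC)
    corrects: Optional[int] = None  # seq of the entry this one corrects


class AuditTrail:
    """Append-only log: corrections are new entries, never overwrites."""

    def __init__(self):
        self._entries = []

    def record(self, key, value, source, corrects=None):
        if corrects is not None:
            if not 0 <= corrects < len(self._entries):
                raise ValueError("corrected entry does not exist")
            if self._entries[corrects].key != key:
                raise ValueError("correction must target the same key")
        entry = AuditEntry(len(self._entries), key, value, source,
                           datetime.now(timezone.utc), corrects)
        self._entries.append(entry)
        return entry

    def current(self, key):
        """Most recent entry for `key`, or None if nothing was recorded."""
        for entry in reversed(self._entries):
            if entry.key == key:
                return entry
        return None

    def history(self, key):
        """Every entry for `key`, oldest first, including superseded ones."""
        return [e for e in self._entries if e.key == key]


trail = AuditTrail()
first = trail.record("bond-123/price", 101.5, source="feed-A")
trail.record("bond-123/price", 100.5, source="manual-review",
             corrects=first.seq)
print(trail.current("bond-123/price").value)  # 100.5
print(len(trail.history("bond-123/price")))   # 2
```

Keeping the superseded entry means a reader of public data can still see what was originally published and when it was corrected.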
Integrity Maintenance
• Sniff data on the wire for fast alarms.
• Required at multiple layers, particularly at
• Discrepancies across data sources can
catch errors (and fraud).
• Patterns across multiple sources may be
helpful in detecting fraud.
• Need to build logic for what to do when an
error is detected.
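The idea of catching errors through discrepancies across sources can be sketched as a simple reconciliation pass. The function name `find_discrepancies`, the source names, and the tolerance are assumptions for illustration; what to do with a flagged item (alert, quarantine, manual review) is left open, as in the slides.

```python
def find_discrepancies(sources, tolerance=0.01):
    """Compare values reported for the same key by several sources.

    `sources` maps a source name to a dict of key -> numeric value.
    Returns a dict of key -> {source name: value or None} for every key
    where a source is missing the value or the values differ by more
    than `tolerance`.
    """
    all_keys = set()
    for values in sources.values():
        all_keys.update(values)

    flagged = {}
    for key in sorted(all_keys):
        reported = {name: values.get(key) for name, values in sources.items()}
        present = [v for v in reported.values() if v is not None]
        missing = len(present) < len(reported)
        spread = max(present) - min(present)
        if missing or spread > tolerance:
            flagged[key] = reported
    return flagged


# Hypothetical position reports from two sources.
sources = {
    "custodian": {"A": 100.0, "B": 50.0, "C": 10.0},
    "broker": {"A": 100.0, "B": 55.0},
}
for key, reported in find_discrepancies(sources).items():
    print(key, reported)
# B {'custodian': 50.0, 'broker': 55.0}
# C {'custodian': 10.0, 'broker': None}
```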
Risk Representation
• A single number (VAR/Risk Premium)
cannot be rolled up or manipulated.
• Full state space distribution is too costly.
• How to show risk to decision makers in a
manner that helps them act.
• How can you drill down from aggregate
data based on risk factors?
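A small worked example may help show why a single VAR figure cannot simply be rolled up by addition. The numbers are assumptions chosen for illustration: two hypothetical, independent loans that each lose 100 with probability 4%, evaluated at a 95% level.

```python
from itertools import product


def value_at_risk(distribution, level=0.95):
    """Smallest loss x such that P(loss <= x) >= level.

    `distribution` maps each possible loss to its probability.
    """
    cumulative = 0.0
    for loss in sorted(distribution):
        cumulative += distribution[loss]
        if cumulative >= level - 1e-12:  # small slack for float rounding
            return loss
    return max(distribution)


def combine_independent(dist_a, dist_b):
    """Loss distribution of holding both positions, assuming independence."""
    combined = {}
    for (loss_a, p_a), (loss_b, p_b) in product(dist_a.items(), dist_b.items()):
        total = loss_a + loss_b
        combined[total] = combined.get(total, 0.0) + p_a * p_b
    return combined


loan = {0: 0.96, 100: 0.04}  # hypothetical: 4% chance of losing 100
portfolio = combine_independent(loan, loan)

print(value_at_risk(loan))       # 0
print(value_at_risk(portfolio))  # 100
```

Each loan alone has a 95% VAR of 0, so the sum of the individual figures is 0, yet the two-loan portfolio has a 95% VAR of 100. Getting the portfolio number required the full loss distributions, not the two summary figures, which is the tension between the first two bullets above.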
Other Information Representation
• Formula as attribute value.
– Treat it better than a plain text field
• Accounting system used is also “data”.
– Enable queries on accounting rules.
– Can you measure risk exposure of accounting
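To make "formula as attribute value" concrete, the sketch below stores a payoff formula as a parsed expression rather than opaque text, so that the formula itself can be queried. It uses Python's standard `ast` module as a stand-in for whatever a database would provide; the instrument records, the `payoff` attribute, and the formulas are hypothetical.

```python
import ast


def referenced_fields(formula):
    """Names that a formula refers to, found by walking its syntax tree.

    Note: a function call such as max(a, b) would also report "max".
    """
    tree = ast.parse(formula, mode="eval")
    return sorted({node.id for node in ast.walk(tree)
                   if isinstance(node, ast.Name)})


# Hypothetical instruments whose payoff attribute holds a formula.
instruments = [
    {"id": "swap-1", "payoff": "notional * (floating_rate - fixed_rate)"},
    {"id": "bond-7", "payoff": "face_value * coupon_rate"},
]


def instruments_depending_on(field, rows):
    """Ids of instruments whose payoff formula mentions `field`."""
    return [r["id"] for r in rows if field in referenced_fields(r["payoff"])]


print(referenced_fields(instruments[0]["payoff"]))
# ['fixed_rate', 'floating_rate', 'notional']
print(instruments_depending_on("floating_rate", instruments))
# ['swap-1']
```

A plain text field could only support substring search; with the parsed form, a question such as "which instruments depend on the floating rate?" becomes an ordinary query.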
Big Science Questions
Given the opening of the OFR, and
collection of large data that it promises,
what new research questions can we ask?
Big Questions (1 of 2)
• Validate VAR over long time periods.
• Can you model Knightian surprise? Can you predict
which instruments are more at risk?
• Modeling of Tail dynamics
• Pricing when liquidity breaks down and you cannot
create a replicating portfolio
• Characterize empirically when weak efficiency of market
• Can trading in a derivative of a security influence the
value of the underlying asset?
• Tradeoff between capital level required within the bank
and the rate of growth within the economy.
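For the first question above, validating VAR over long periods, one common starting point is to count how often realized losses exceeded the forecast. The sketch below is only that counting step, under assumed inputs: the loss and forecast series are made up, and a real validation would add a statistical test on the exception rate.

```python
def count_exceptions(losses, var_forecasts):
    """Number of periods in which the realized loss exceeded the VAR forecast."""
    if len(losses) != len(var_forecasts):
        raise ValueError("series must have the same length")
    return sum(loss > var for loss, var in zip(losses, var_forecasts))


def exception_rate(losses, var_forecasts):
    """Share of periods with an exception; compare with 1 - confidence level."""
    return count_exceptions(losses, var_forecasts) / len(losses)


# Hypothetical daily losses and the VAR forecast made for each day.
losses = [1.0, 0.5, 3.2, 0.0, 2.1, 0.7, 4.0, 0.3, 1.1, 0.9]
forecasts = [2.0, 2.0, 2.0, 2.5, 2.5, 2.5, 3.0, 3.0, 3.0, 3.0]

print(count_exceptions(losses, forecasts))  # 2
print(exception_rate(losses, forecasts))    # 0.2
```

If the forecasts were 95% VAR figures, a rate near 0.05 over a long history would be consistent with the model, while a much higher rate would suggest that it understates risk.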
Big Questions (2 of 2)
• Structure of financial institution network. Shape of the
graph. Incentive structures of the individuals.
• Visualization of financial stability for policy makers.
Which are reasonable indicators/measures to track?
How do you summarize effectively to communicate – to
policy makers, to the marketplace?
• Herd behavior or asset bubbles. How do you detect?
How do you manage?
• Macro-economic models that include securitization and
financial markets, particularly things that happen outside
the banking system.
• Develop a new risk-based forward-looking accounting