The FRB and XML:
National data and International
Federal Reserve Board
The Fed is a statistical agency as well as a
central bank and regulatory agency.
Lots of data and information are available on
the public website.
Statistical data are varied: monthly industrial production
indexes (non-financial), daily interest and exchange rates (financial),
quarterly financial flows for various sectors of the economy, surveys of
small businesses and consumers, etc.
The different roles are often competing
Sometimes it seems that the statistical agency role is
Data are not always easy to find.
Downloads are not customizable.
Example: Trying to extract one industrial production
series: Requires two text files, cutting and pasting,
All or nothing approach.
Complete – yes. User Friendly – no.
Other agencies making great strides:
Bureau of Economic Analysis has wonderful
tabling capabilities: www.bea.gov
Bureau of Labor Statistics has query screens,
series select screens and frequently requested
Taking an extra step:
We wanted to build something forward looking; XML
was identified early on.
Most flexible, and seems to be the trend for the future.
Financial data already heading that way: FinXML,
FpML (financial product ML), MDDL (Market data
definition language), XBRL (eXtensible Business Reporting Language)
How do we do it?
Build our own XML definitions:
- Pro: would fit our data perfectly
- Con: we’d be the only ones
Use financial definitions:
- Pro: lots of others use them
- Con: we have nonfinancial data
Try SDMX (Statistical Data and Metadata eXchange):
- Pro: designed for time series data
- Con: new kid on the block
But nothing goes smoothly at first:
SDMX is based on ‘key families’ and codelists where
every concept can be represented by a code with a
corresponding definition in a list:
HBBA Int. Rate, Official, Discount rate/Base rate
HBCA Int. Rate, Official, Intra-day loans
SCBA Indust. Production, Motor vehicles, NSA
SCBB Indust. Production, Motor vehicles, SA
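A codelist of this kind behaves like a lookup table from code to definition. The sketch below is only an illustration built from the four entries listed above; the names `CL_TOPIC` and `describe` are hypothetical and are not part of SDMX or of the Fed's systems.

```python
# A codelist maps each code to its definition (entries copied from the slide).
# CL_TOPIC and describe() are illustrative names, not SDMX identifiers.
CL_TOPIC = {
    "HBBA": "Int. Rate, Official, Discount rate/Base rate",
    "HBCA": "Int. Rate, Official, Intra-day loans",
    "SCBA": "Indust. Production, Motor vehicles, NSA",
    "SCBB": "Indust. Production, Motor vehicles, SA",
}


def describe(code: str) -> str:
    """Return the definition for a code, or raise if the code is not listed."""
    try:
        return CL_TOPIC[code]
    except KeyError:
        raise ValueError(f"unknown code: {code!r}") from None


if __name__ == "__main__":
    print(describe("SCBB"))
```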
We think about data differently
The Fed uses mnemonic series names where each
character in our series name has meaning and names
Interest rate series:
- R.*: Rate
- R.I.*: Rate of interest in money and capital markets
- R.I.F.*: Federal Reserve System
- R.I.F.S.*: Short-term or money market
- R.I.F.S.P.*: Private securities
- R.I.F.S.P.FF.: Federal funds
- _N.: Not seasonally adjusted
- .B: Business (Five days, Monday-Friday)
Industrial production series:
- J.*: Indices except of prices
- J.Q.*: Production
- J.Q.I.: Industrial
- _I.*: NAICS-based industry classification
- 02Y: codes from year 2002
- 3361.: Motor Vehicle Manufacturing
- T: thru
- 3363: Motor Vehicle Parts Manufacturing
- _N.: Not seasonally adjusted
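The interest-rate breakdown above can be read as a chain of prefixes, each one narrowing the meaning of the one before it. The sketch below decodes a name that way. The series name `RIFSPFF_N.B` is an assumption, assembled by joining the fragments shown above, and the function and table names are hypothetical.

```python
# Prefix meanings copied from the interest-rate breakdown on the slide.
PREFIXES = {
    "R": "Rate",
    "RI": "Rate of interest in money and capital markets",
    "RIF": "Federal Reserve System",
    "RIFS": "Short-term or money market",
    "RIFSP": "Private securities",
    "RIFSPFF": "Federal funds",
}
ADJUSTMENTS = {"N": "Not seasonally adjusted"}
FREQUENCIES = {"B": "Business (Five days, Monday-Friday)"}


def decode(name: str) -> list[str]:
    """Decode a name of the assumed form TOPIC_ADJ.FREQ into its meanings."""
    body, _, freq = name.rpartition(".")
    topic, _, adj = body.partition("_")
    meanings = []
    # Walk the topic one character at a time and report every known prefix.
    for end in range(1, len(topic) + 1):
        prefix = topic[:end]
        if prefix in PREFIXES:
            meanings.append(PREFIXES[prefix])
    meanings.append(ADJUSTMENTS.get(adj, f"unknown adjustment {adj!r}"))
    meanings.append(FREQUENCIES.get(freq, f"unknown frequency {freq!r}"))
    return meanings


if __name__ == "__main__":
    for line in decode("RIFSPFF_N.B"):
        print(line)
```

The point of the sketch is that the number of meaningful prefixes depends on the series, which is what makes these names awkward to map onto a fixed list of dimensions.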
Fitting a square peg in a round hole…
Data represented by a concrete number of concepts are much
easier to represent with key family dimensions and attributes:
Q.SCBA.GB.92 → Freq.Topic.Country.BIS code
M.HBBA.US.01 → Freq.Topic.Country.BIS code
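With a fixed set of concepts, reading a key is just a matter of pairing each position with its dimension. A minimal sketch, assuming the four dimensions named above and a `.` separator; `KEY_FAMILY` and `parse_key` are hypothetical names.

```python
# Dimension order taken from the slide: Freq.Topic.Country.BIS code
KEY_FAMILY = ("Freq", "Topic", "Country", "BIS code")


def parse_key(key: str, dimensions: tuple[str, ...] = KEY_FAMILY) -> dict[str, str]:
    """Pair each dot-separated value in the key with its dimension."""
    values = key.split(".")
    if len(values) != len(dimensions):
        raise ValueError(
            f"{key!r} has {len(values)} values, expected {len(dimensions)}"
        )
    return dict(zip(dimensions, values))


if __name__ == "__main__":
    print(parse_key("Q.SCBA.GB.92"))
    print(parse_key("M.HBBA.US.01"))
```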
Hierarchical relationships and a varying number of concepts
make life more difficult – a single key family isn’t possible:
JQI_I02YMF_N.M → Topic_Industry_SA.Freq
RIFSPPNA2P2D30_N.B → Topic?_SA.Freq
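The mismatch shows up as soon as the two names are split on their separators: they yield different numbers of parts, so no single fixed list of dimensions fits both. The sketch below assumes `_` and `.` are the only separators, as in the two examples; `split_series_name` and `fits` are hypothetical helpers.

```python
import re


def split_series_name(name: str) -> list[str]:
    """Split a series name on the '_' and '.' separators."""
    return re.split(r"[_.]", name)


def fits(name: str, dimensions: tuple[str, ...]) -> bool:
    """True if the name has exactly one part per dimension."""
    return len(split_series_name(name)) == len(dimensions)


if __name__ == "__main__":
    industrial = ("Topic", "Industry", "SA", "Freq")
    for name in ("JQI_I02YMF_N.M", "RIFSPPNA2P2D30_N.B"):
        print(name, split_series_name(name), fits(name, industrial))
```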
SDMX only provides a framework:
We still needed to build the actual schemas to
describe our data within the SDMX metaschema
Each data release uses its own schema or set of
schemas. Each schema is based on a key family used
to describe the data.
Currently, our schemas are tailored to meet our data
Storage adds further complications:
We need to store data and metadata in a database to
be retrieved with queries.
Native XML databases in their infancy.
We couldn’t find many people storing XML tagged
data in relational databases
So what did we end up with?
Data model is hybrid: tree structure flattened to fit
We store the XML as carefully sliced text in a
relational database and we can build an index
structure that allows us to respond to ad-hoc queries
very efficiently, even for large volumes of data.
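The general idea of keeping XML slices as text in relational rows, indexed by the values people query on, can be sketched with SQLite. This is not the actual DDP schema, which the slides do not show: the table, column, and element names below are all hypothetical, and the stored fragments are empty placeholders.

```python
import sqlite3

# Hypothetical layout: one row per series, with the XML slice kept as text
# and the dimension values duplicated into indexed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fragment (series_name TEXT, freq TEXT, sa TEXT, xml_text TEXT)"
)
conn.execute("CREATE INDEX idx_fragment_dims ON fragment (freq, sa)")
conn.executemany(
    "INSERT INTO fragment VALUES (?, ?, ?, ?)",
    [
        ("JQI_I02YMF_N.M", "M", "N", '<Series name="JQI_I02YMF_N.M"/>'),
        ("RIFSPPNA2P2D30_N.B", "B", "N", '<Series name="RIFSPPNA2P2D30_N.B"/>'),
    ],
)


def extract(freq: str) -> str:
    """Answer an ad-hoc query by concatenating the matching stored slices."""
    cur = conn.execute(
        "SELECT xml_text FROM fragment WHERE freq = ? ORDER BY series_name",
        (freq,),
    )
    body = "".join(text for (text,) in cur)
    return f"<DataSet>{body}</DataSet>"


if __name__ == "__main__":
    print(extract("M"))
```

Because the slices are already valid XML text, a customized extract needs only a relational lookup and string concatenation, with no XML parsing at query time.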
This kind of structure:
Looks like this in SDMX-ML:
<structure:KeyFamily id="CP_OUTST" agency="FRB">
  <structure:Name xml:lang="en">Commercial Paper Outstandings</structure:Name>
  <structure:TimeDimension concept="TIME" codelist="CL_TIME"/>
  <structure:FrequencyDimension concept="FREQ" codelist="CL_FREQ"/>
  <structure:Dimension concept="CP_SA" codelist="CL_CP_SA"/>
  <structure:Dimension concept="CP_IND_TYPE" codelist="CL_CP_IND_TYPE"/>
  <structure:Dimension concept="CP_ORIG" codelist="CL_CP_ORIG"/>
  <structure:Dimension concept="CP_OWN" codelist="CL_CP_OWN"/>
  <structure:Dimension concept="CP_NSASC" codelist="CL_CP_NSASC"/>
  <structure:Attribute concept="UNIT" codelist="CL_UNIT" attachmentLevel="Group"/>
  <structure:Attribute concept="UNIT_MULT" codelist="CL_UNIT_MULT"/>
  <structure:Attribute concept="OBS_STATUS" codelist="CL_OBS_STATUS"/>
  <structure:Attribute concept="SERIES_NAME" attachmentLevel="Series"/>
  <structure:Attribute concept="DESCRIPTION" attachmentLevel="Series"/>
</structure:KeyFamily>
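A key family like this can be read with any XML parser to list its dimensions and their codelists. The sketch below uses Python's standard `xml.etree.ElementTree` on an abbreviated copy of the excerpt. The namespace URI is a placeholder chosen for the example, not the real SDMX one, and `dimension_codelists` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI so the excerpt parses; not the real SDMX namespace.
NS = {"structure": "urn:example:structure"}

# Abbreviated copy of the key family shown on the slide.
KEY_FAMILY_XML = """
<structure:KeyFamily xmlns:structure="urn:example:structure"
                     id="CP_OUTST" agency="FRB">
  <structure:Name xml:lang="en">Commercial Paper Outstandings</structure:Name>
  <structure:FrequencyDimension concept="FREQ" codelist="CL_FREQ"/>
  <structure:Dimension concept="CP_SA" codelist="CL_CP_SA"/>
  <structure:Dimension concept="CP_IND_TYPE" codelist="CL_CP_IND_TYPE"/>
</structure:KeyFamily>
"""


def dimension_codelists(root: ET.Element) -> dict[str, str]:
    """Map each Dimension's concept to the codelist that enumerates it."""
    return {
        el.get("concept"): el.get("codelist")
        for el in root.findall("structure:Dimension", NS)
    }


root = ET.fromstring(KEY_FAMILY_XML)

if __name__ == "__main__":
    print(root.get("id"), root.findtext("structure:Name", namespaces=NS))
    print(dimension_codelists(root))
```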
Which gets stored like this:
And the end result?
The Data Download Project (DDP) is the largest,
most complex application on the Board’s public website.
It’s also the first production application to deliver
customized data extracts in SDMX format.
Do performance testing and verify server load
Polish interface, do usability testing and verify
compliance with Section 508 regulations.
Long run: work with other central banks on common
Release on the unsuspecting public! Target: Third
The last slide…
Thank you for your attention!