Managing master data, part IV
In today’s data-integration mix, where does master-data management fit in?
By Mike Fleckenstein
[For a full PDF of the article or all articles in the series, including all graphics, please e-mail the
editor, Graeme Burton. We will turn around requests within 24 hours. Apologies for any
previous requests for PDFs that may have gone ignored.]
As the world’s economic interactions have become more electronic, so too has the need to
integrate data grown. After all, how can systems communicate consistently unless they understand what
each and every element of data refers to in each application? How can transactions be accurately
recorded if the relevant data is inconsistent, incomplete or inadequately described?
Many technology concepts and offerings have been used to address the various different
approaches to data integration. This final article in the series will explore some of these concepts,
what they are and how they relate to master-data management (MDM).
We will examine how traditional approaches to data integration are developing: offerings in the
areas of data cleansing, data extraction and transformation, and data warehousing have been
around for some time. But how are they evolving and where does MDM fit in? We will also
examine some more recent concepts. For example, we’ve all heard about the service-oriented
architecture (SOA), enterprise-application integration (EAI) and enterprise-information
integration (EII). But how do they relate to master-data management?
In some respects, better data integration is being forced upon us. Regulatory guidelines and
mandates provide a case in point. Examples include the Basel II Accord, which is intended to
improve risk management in the banking system, and, in the US, the Sarbanes-Oxley Act, which
requires publicly listed companies to report detailed financial data, with the CEO’s head on the
line if they get it wrong.
Additionally, many companies realise that they need to better integrate data in order to improve
their own business performance and to serve customers better. Obviously, few organisations can
afford to replace all of their application software at once – it must be done one system at a time
and the more new applications that can ‘plug and play’, the better. It is simply cost-effective to
minimise data-integration efforts in this way because custom-coded integration is expensive to
achieve and poses ongoing maintenance and upgrade challenges, too.
In previous articles we’ve explored everything from candidates for MDM to architectural
approaches for housing master data. We also looked at some real-life case studies. In part IV, we
will examine different approaches to data integration available in the industry as a whole and
their relationship to MDM.
MDM and the data warehouse
Historically, one common approach to integrating data – normally for the purpose of analysis –
has been to construct a data warehouse and to export a copy of the
data there. In a traditional data warehouse, data from several different sources is extracted,
cleansed and transformed into an appropriate format, then loaded into a single database schema.
Bill Inmon, sometimes referred to as ‘the father of data warehousing’, stated that a data
warehouse must be subject-oriented (that is to say, real-world relationships should be reflected in
the data structure); time-variant (changes are tracked over time); non-volatile (data is never over-
written) – and integrated (the data from multiple applications must be consistent).
Typically, the data in data warehouses is batch-loaded, normally every night at the close of
business. Such data warehouses are very useful for analysing historic trends over time and
projecting trends going forward.
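The nightly extract-cleanse-transform-load cycle described above can be sketched in a few lines. This is a minimal illustration only: the source systems, field names and currency conversion below are invented for the example, and real warehouses rely on dedicated ETL tooling.

```python
# Minimal sketch of a nightly batch ETL cycle (illustrative only).
# Source systems, field names and the fx_rate conversion are hypothetical.

def extract(sources):
    """Pull raw records from each source system."""
    for name, records in sources.items():
        for rec in records:
            yield name, rec

def transform(source, rec):
    """Cleanse and map a raw record onto a single warehouse schema."""
    # The two sources name the customer field differently; normalise both.
    name = (rec.get("cust") or rec.get("customer_name")).strip().title()
    return {
        "source": source,
        "customer": name,
        "amount_usd": round(float(rec["amount"]) * rec.get("fx_rate", 1.0), 2),
    }

def load(warehouse, rows):
    """Append-only load: warehouse data is non-volatile (never overwritten)."""
    warehouse.extend(rows)

# Two hypothetical source applications with inconsistent schemas.
sources = {
    "orders_eu": [{"cust": "ACME GMBH", "amount": "100", "fx_rate": 1.1}],
    "orders_us": [{"customer_name": " acme inc ", "amount": "250"}],
}

warehouse = []
load(warehouse, (transform(s, r) for s, r in extract(sources)))
print(warehouse)
```

Note how the inconsistent source schemas only converge in the transform step; until a master-data repository is introduced, that consistency exists in the warehouse alone, not in the source applications.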
However, data warehouses are evolving and becoming more dynamic. There are a number of
reasons for this. First, continual improvements in compression technology enable much more
data to be stored, reducing hardware and input/output costs.
Second – and related to the first – is the proliferation of unstructured data (such as
documentation, letters and invoices) and the need to incorporate this information in a data
warehouse.
For example, in addition to line-by-line details of transactions, an organisation might also need
access to e-mails, notes or other related information. Data warehouses, such as IBM’s DB2
Viper, now accommodate access to unstructured data by making it searchable and integrating it
side by side with structured, relational-database data. Furthermore, some vendors, such as
Sybase, which is particularly strong in the financial-services industry, are delivering new data-
indexing techniques to make data access very fast. In a nutshell, all these offerings are helping to
make data warehouses larger, more comprehensive and faster to access.
Another development is more tightly integrated data. The need for tighter coupling of data is a
natural outcome of having more data in one place, since only then can the data be viewed in its
proper context. Naturally, regulatory mandates and guidelines – such as Sarbanes-Oxley in
corporate America and Basel II in the European banking community – have forced a tighter
coupling of data as well.
Finally, companies are seeking faster feedback to changes in data. While batch updates serve
historical, analytical purposes well, they are insufficient for tweaking operational processes,
which today must respond instantaneously. Think of the benefit a retailer attains by realising that they are
running low on widgets as the consumer is checking out; information can be instantaneously
integrated and reported to alert the supplier. The desire for near real-time access to data stems
from the need to tweak business processes more quickly in a service-oriented world.
To better facilitate these requirements, data warehouses are using master-data management.
While separate from the warehouse, an MDM repository can dynamically share tightly coupled
key data, both with the data warehouse and with source and downstream applications. Table
one summarises some of the trends in data warehousing.
MDM and EAI
Enterprise-application integration fosters data propagation and business-process execution
among distinct applications to make them appear as a single, global application. Its focus is on
the messaging between applications, to integrate operational business functions that involve
several different applications or systems, such as taking an order, creating an invoice and
shipping a product. One common way to accomplish this is to leverage an enterprise-service bus.
This provides a unified interface that enables application developers to more easily tap into
multiple environments without having to custom-code transport messages between these applications.
The intent of EAI can be summarised as providing a common façade for multiple systems,
integrating data and processes across applications, making an environment vendor-independent
and ensuring that data is kept consistent across different applications.
Note the last item in that list – consistent data. EAI’s focus is on managing the message flow
among the disparate systems. Thus, it is left to MDM to ensure that key entities are defined
consistently. Once defined, the MDM repository can be linked to the enterprise-service bus to
accommodate enterprise application integration. While an EAI system may provide for data
transformation, it certainly does not ensure that a given product or customer is dynamically
defined the same way in two or more applications.
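The division of labour described here can be sketched as follows: a message bus carries the traffic, while an MDM repository resolves each application's local key to one golden record before any message is delivered. All class names, keys and records below are hypothetical.

```python
# Illustrative sketch: an enterprise-service bus whose messages are keyed
# to a master-data repository, so every subscribing application sees the
# same canonical customer. All names and structures here are invented.

class MasterDataRepository:
    """Maps each application's local customer key to one golden record."""
    def __init__(self):
        self.golden = {"C-001": {"name": "Acme Inc"}}
        self.xref = {("crm", "42"): "C-001", ("billing", "ACME"): "C-001"}

    def canonical_id(self, app, local_key):
        return self.xref[(app, local_key)]

class ServiceBus:
    """Toy enterprise-service bus: publish/subscribe with MDM key resolution."""
    def __init__(self, mdm):
        self.mdm = mdm
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, app, local_key, payload):
        # Resolve the sender's local key to the shared master identity
        # before the message reaches any other application.
        msg = {"customer_id": self.mdm.canonical_id(app, local_key), **payload}
        for handler in self.subscribers:
            handler(msg)

mdm = MasterDataRepository()
bus = ServiceBus(mdm)
received = []
bus.subscribe(received.append)

bus.publish("crm", "42", {"event": "order_placed"})
bus.publish("billing", "ACME", {"event": "invoice_created"})
```

The point of the sketch is the `publish` method: the bus handles transport, but only the MDM cross-reference makes the CRM's customer "42" and billing's customer "ACME" recognisably the same entity.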
Figure one illustrates how a master-data repository fits into the EAI architecture using an enterprise-service bus.
MDM and EII
In stark contrast to the tighter coupling of data noted above, a parallel trend in data integration
has been to loosen the coupling between data. The idea behind enterprise-information integration
is to provide a uniform query-interface over a virtual schema.
The EII tool seamlessly transforms the initial query into database-specific queries against the underlying data sources.
The end-users can therefore utilise business-intelligence tools and other applications to query a
single, albeit virtual, schema. The loose coupling of data sources enables data in the virtual
schema to be reflected both in terms of the source as well as in integrated terms. The extent to
which data is integrated (that is to say, tightly coupled) determines the extent of the global view.
One issue is how to resolve data-definition differences between heterogeneous systems. For
example, if two companies merge their databases, certain definitions (such as ‘earnings’) in their
respective schemas will conceivably have different meanings. In one database it may mean
profits in dollars, while in the other it might be the number of sales.
Here of course, MDM can help homogenise key corporate data prior to its EII incorporation,
thereby lessening the resulting semantic conflicts. By integrating data prior to its introduction
into the EII umbrella, a company assures better data consistency. Figure two reflects this concept.
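The 'earnings' conflict above can be made concrete with a small sketch of EII-style query federation: one virtual schema, with a per-source mapping that translates the virtual field into each database's own terms. The schemas, field names and mappings are all hypothetical.

```python
# Sketch of enterprise-information integration: a single virtual schema
# queried across heterogeneous sources. The 'earnings' field illustrates
# the semantic conflict in the text; all schemas here are hypothetical.

VIRTUAL_SCHEMA = {"earnings_usd"}

# Each source maps the virtual field onto its own columns and meaning.
SOURCE_MAPPINGS = {
    "db_a": lambda row: row["profit_dollars"],             # earnings = profit in $
    "db_b": lambda row: row["units_sold"] * row["price"],  # earnings = sales value
}

def query_virtual(field, sources):
    """Translate one virtual-schema query into source-specific reads."""
    assert field in VIRTUAL_SCHEMA, f"unknown virtual field: {field}"
    results = {}
    for name, rows in sources.items():
        mapper = SOURCE_MAPPINGS[name]
        results[name] = [mapper(r) for r in rows]
    return results

sources = {
    "db_a": [{"profit_dollars": 1200.0}],
    "db_b": [{"units_sold": 10, "price": 99.0}],
}
print(query_virtual("earnings_usd", sources))
```

Without an agreed master definition, each lambda in `SOURCE_MAPPINGS` is a guess at what 'earnings' means in that source; MDM's role is to settle those definitions before the mappings are written.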
MDM and SOA
Service-oriented architecture (SOA) can be defined as a loosely coupled array of re-usable
software components, exposed as services, which can be integrated with each other and also
invoked as services by other applications. The strength of the SOA concept is flexibility. The key
idea behind it is to enable organisations to put together applications and processes by stringing
together different software components – but the data, more than ever under SOA, must be consistently defined.
Think of placing an online order, for example. This type of software service can be integrated, as
necessary, with finance, marketing, third-party inventory systems and so on. EAI and EII, as
described above, can be part of the SOA structure.
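The online-order example can be sketched as a composition of reusable services. Each function below stands in for an independently deployed service; the services, SKUs and prices are invented for illustration.

```python
# Illustrative SOA-style composition: each function stands in for an
# independently deployed, reusable service, and the order-placement flow
# strings them together. All services and data here are hypothetical.

def inventory_service(sku):
    """Reusable stock-lookup service (toy in-memory version)."""
    stock = {"WIDGET-1": 5}
    return stock.get(sku, 0)

def finance_service(sku, qty):
    """Reusable pricing service (toy in-memory version)."""
    prices = {"WIDGET-1": 9.99}
    return round(prices[sku] * qty, 2)

def place_order(sku, qty):
    """Compose the services; either could be swapped without changing callers."""
    if inventory_service(sku) < qty:
        return {"status": "rejected", "reason": "insufficient stock"}
    return {"status": "accepted", "total": finance_service(sku, qty)}

print(place_order("WIDGET-1", 2))
```

The flexibility the article describes lies in the composition: `place_order` depends only on the services' interfaces, so each component can be replaced or reused by other processes.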
In practice, SOA means different things to different people. To the business manager, it means
the process governance and organisation for project/program management and the business
components that can be tweaked or re-used to reduce cost. The legal team needs to know
whether the service is creating a liability outside the company, and what regulatory issues and
exposure this might cause. To an IT architect, SOA means the overall enterprise design that
enables the IT department to deploy business rapidly. And finally, for the chief information
officer, SOA is simply the IT strategy for delivering business capability: what business functions
are automated at what cost, maturity and return on investment?
The roadmap to SOA is the same as with any other IT effort:
- Understand business services and how they need to be integrated. This requires a close working relationship between IT and the business community (ie. requirements);
- Identify key performance metrics, such as reducing product defects by a certain percentage (ie. design);
- Develop an SOA outline that highlights the benefits in business terms (ie. user acceptance);
- Identify quick wins (ie. implementation).
So how does MDM fit into the SOA construct? Note the second and fourth points in the above
outline. It is impossible to correctly measure key metrics across the enterprise unless they are
consistently defined. So, defining key metrics is an integral part of SOA and MDM is the
underlying construct to accomplish this; it definitely helps to produce quick wins.
MDM, data cleansing and ETL
The above approaches to data integration show that MDM is key
to improving data integration as data is increasingly gleaned from diverse systems. However,
data must first be cleansed, extracted and transformed before it can reside in a master-data
repository – it must be consistent and correct. Extraction, transformation and loading (ETL)
tools, together with data-cleansing tools, can help accomplish much of this task. However, in
order to get data correct there may also be a manual component to cleansing.
For example, take customer data that is stored in a series of fields labelled as follows: address1,
address2, address3, address4 and address5. The first step here is to ensure that the first name is
always in the address1 field, the last name is always in the address2 field and so on. This type of
effort can only be done manually; no cleansing tool can accomplish it. Once manual cleansing
is completed, though, data-cleansing tools can be applied to examine the input data, and to de-
duplicate names and standardise the addresses.
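The automated side of this cleansing can be sketched as follows: nickname normalisation followed by de-duplication on the normalised name. The nickname list and customer records are toy examples; commercial cleansing tools work from far larger reference tables.

```python
# Sketch of automated cleansing after manual field alignment: nickname
# normalisation plus de-duplication. The nickname table and records are
# toy examples, not a real cleansing product's data.

NICKNAMES = {"dick": "richard", "rich": "richard", "bob": "robert"}

def normalise(record):
    """Reduce a record to a canonical (first, last) name pair."""
    first = record["first_name"].strip().lower()
    return (NICKNAMES.get(first, first), record["last_name"].strip().lower())

def deduplicate(records):
    """Keep one record per normalised name pair."""
    seen = {}
    for rec in records:
        seen.setdefault(normalise(rec), rec)
    return list(seen.values())

customers = [
    {"first_name": "Richard", "last_name": "Jones"},
    {"first_name": "Dick ", "last_name": "Jones"},
    {"first_name": "Bob", "last_name": "Smith"},
]
print(deduplicate(customers))  # 'Richard Jones' and 'Dick Jones' collapse to one entry
```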
There are many third-party vendors that can also contribute to the process.
Services range from dynamic postal-code validation, provided by companies such as Dun &
Bradstreet, to vendors performing bulk-cleansing services for pennies per record. Another
example is a nickname list that recognises that ‘Richard’ and ‘Dick’ may be the same person.
All of these types of tools are often bundled in cleansing software and can also be purchased separately.
Data-cleansing tools can be set up to enable continuous cleansing as well. It is relatively easy to
conceive that a customer may be entered into the system multiple times, even once the data has
been initially cleansed and a master-data repository set up.
No tool on the market can prevent this. However, allowing end-users to de-duplicate or merge
these types of entries in an ongoing way helps to ensure that the master-data repository remains clean.
ETL tools can also extract similar data (for example, customer data) from multiple applications,
transform that data into a homogeneous form and deposit it in an integrated way into a single repository.
However, these tools do not actually share a standardised definition of an entity with their source
systems. Some ETL tool vendors are positioning themselves as MDM software providers. The
difference, though, is that these tools simply and routinely transform the data at the time of extraction.
That’s where the management of master data begins. Once extracted and deposited into the
master-data repository, changes to it are dynamically shared with source and downstream
systems. Whether it consists of exposing customer data over the web to enable customers to
manage their profile – which ought to make the data more accurate, while cutting costs – or
whether internal users are managing a product hierarchy, this is where a decision on what
software to use to manage master data can be made.
Bringing it all together
This article has examined a number of approaches towards integrating corporate data. We looked
at some technologies that have been around for a while, such as data warehousing and data
cleansing, and how they are evolving and relate to MDM. We also discussed how MDM relates
to some more recent technology trends, such as EAI, EII and SOA.
In each case MDM aids the integration effort. It is a fundamental requirement for tightly coupled
data and fosters the global view of data for loosely coupled data. Finally, it is difficult to imagine
how a service-oriented environment can function if each service has its own definition of key entities.
Mike Fleckenstein is principal analyst at Project Performance Corporation (PPC). He has 20
years’ experience developing and deploying data solutions for public and private-sector clients
around the world. He is currently involved in leading the MDM and insurance practices at PPC
(www.ppc.com), a Washington DC-based IT consultancy, using best-of-breed technologies and
solutions for data warehousing and master-data management, among others. Prior to joining
PPC, Fleckenstein served as application manager at Medmarc Insurance and ran his own IT
consulting firm, Windsor Systems Inc, which specialised in IT and data solutions.
Table one: Traditional and dynamic data warehousing

Traditional data warehousing:
- Provides a window into past operational data for historical analysis and reporting.
- Consists of multiple un-integrated systems.
- Accesses only a limited number of business processes and systems.
- Supports only structured data.
- Requires specialised skills or knowledge to access and use.

Dynamic data warehousing:
- Provides a window into near-real-time operational and transactional data for both strategic planning and operational decision-making.
- Provides tight integration among enterprise-wide business systems.
- Accesses structured, unstructured and semi-structured data.
- Delivers information to all enterprise constituents within the context of the activities they are performing.