
An Introduction to Data Virtualization in Business Intelligence by DavidWalker18


A brief description of what Data Virtualisation is and how it can be used to support business intelligence applications and development. Originally presented to the ETIS Conference in Riga, Latvia in October 2013.

An Introduction to Data Virtualization in Business Intelligence

David M Walker
Data Management & Warehousing

18 October 2013
What Is Data Virtualization?

•  Wikipedia:
  “Data virtualization is [..] an application to retrieve
  and manipulate data without requiring technical
  details about the data, such as how it is formatted
  or where it is physically located.”
•  Or more simply:
  A solution that sits in front of multiple data
  sources and allows them to be treated as a single
  SQL database
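The "single SQL database" idea can be sketched in a few lines. This is only an illustration, not how any particular DV product works: SQLite's ATTACH stands in for the platform's source connectors, and the schema names (crm, billing), tables and view are hypothetical.

```python
import sqlite3


def build_virtual_layer():
    """Return one connection that exposes two separate databases as one."""
    # The main in-memory database stands in for the DV layer itself.
    conn = sqlite3.connect(":memory:")
    # Each ATTACH stands in for a connector to a separate source system.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")

    conn.execute("CREATE TABLE crm.customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoice (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customer VALUES (?, ?)",
        [(1, "Joe Bloggs"), (2, "Jane Smith")],
    )
    conn.executemany(
        "INSERT INTO billing.invoice VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 75.0)],
    )

    # A TEMP view may reference tables in several attached databases,
    # so consumers see a single relation and never the sources behind it.
    conn.execute(
        """
        CREATE TEMP VIEW customer_revenue AS
        SELECT c.name AS name, SUM(i.amount) AS total
        FROM crm.customer AS c
        JOIN billing.invoice AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_virtual_layer()
    for row in conn.execute("SELECT name, total FROM customer_revenue ORDER BY name"):
        print(row)
```

A reporting tool would only ever issue the final SELECT; where the customer and invoice rows physically live is hidden behind the view.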
Basic Model
•  Consumers
   –  End users access via reporting tools
   –  ETL treats the DV platform as a source
   –  Data publishing (batch / RESTful)
   –  Message based (SOA / publication)
•  Traditional Databases
   –  IBM (DB2 & Netezza)
   –  Microsoft (SQL Server)
   –  Oracle (Oracle & MySQL)
   –  Postgres
   –  Sybase (ASE & IQ)
   –  etc.
•  NoSQL / NewSQL
   –  Apache Hadoop
   –  Cassandra
   –  Mongo
   –  Neo4J
   –  etc.
•  Other Formats
   –  Microsoft Office
   –  Messaging
   –  Flat Files
   –  XML
   –  Web
   –  Cloud
   –  Application APIs
   –  etc.
Advanced Features:
Role Based Access Control & Data Masking
Role Based Authentication
Data Virtualization Platform: manages sensitive information based on a user's role

What User 1 sees:

   First Name   Last Name   DoB           Salary
   Joe          Bloggs      30-Jan-1983   NULL
   Jane         Smith       17-Jun-1978   NULL

What User 2 sees:

   First Name   Last Name   Age
   Joe          Bloggs      30
   Jane         Smith       35

Underlying source data:

   First Name   Last Name   DoB           Salary
   Joe          Bloggs      30-Jan-1983   €60,100
   Jane         Smith       17-Jun-1978   €75,400
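A minimal sketch of the masking shown above, assuming two hypothetical roles named user1 and user2 that match the slide's two users. Real DV platforms configure this declaratively; the function below only illustrates the effect on the result set.

```python
from datetime import date

# The underlying source rows from the slide.
ROWS = [
    {"first_name": "Joe", "last_name": "Bloggs", "dob": date(1983, 1, 30), "salary": 60100},
    {"first_name": "Jane", "last_name": "Smith", "dob": date(1978, 6, 17), "salary": 75400},
]


def age_on(dob, today):
    """Whole years between dob and today."""
    # Subtract one year if this year's birthday has not happened yet.
    not_yet = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - int(not_yet)


def view_for_role(rows, role, today):
    """Return the rows as the given (hypothetical) role is allowed to see them."""
    if role == "user1":
        # Sees the date of birth, but salary is masked to NULL (None).
        return [
            {"first_name": r["first_name"], "last_name": r["last_name"],
             "dob": r["dob"], "salary": None}
            for r in rows
        ]
    if role == "user2":
        # Sees only a derived age; dob and salary columns are not exposed at all.
        return [
            {"first_name": r["first_name"], "last_name": r["last_name"],
             "age": age_on(r["dob"], today)}
            for r in rows
        ]
    raise PermissionError("no access defined for role: " + role)


if __name__ == "__main__":
    presentation_date = date(2013, 10, 18)
    for role in ("user1", "user2"):
        print(role, view_for_role(ROWS, role, presentation_date))
```

Both users query the same single copy of the data; only the projection applied by the platform differs.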
Advanced Features:

Cached Copy of


Advanced Features:
Creating a Canonical Data Model


•  Finance System
•  CRM System
•  Billing System
•  Website
•  Other Systems
But it’s not a Silver Bullet
•  Can be slow
   –  Depending on how much data has to be fetched from remote
      systems to the DV platform – platforms try to be smart to
      reduce this
•  Can impact performance on underlying systems
   –  Lots of BI users making queries on resource sensitive OLTP
      systems is not a good idea
•  Requires Resources
   –  Another set of servers, technologies, etc. to manage, but this
      cost is often offset against the reduction in complexity
•  Not a replacement – it is an additional tool
   –  You will still need ETL and Messaging
BI Use Cases:
Agile Data Mart Design
•  Access data warehouse data quickly and easily
•  Design the data mart you think you want
•  Test it with real data and your actual reporting tool
•  Also possible with data warehouse design

[Diagram, top to bottom: Data Virtualization Platform; A OR B; Data Warehouse]
BI Use Case:
Virtual Data Marts
•  Big Tin Appliance with lots of horsepower?
•  Don’t want to duplicate data in the appliance and consume disk
   space for a data mart but want the star schema for ease of

[Diagram: Data Warehouse]
BI Use Case:
Data Mart Extensions
•  Existing (physical) data mart
•  New data source that needs to be incorporated quickly
•  Create virtual copy of existing data mart and data source
•  Integrate into updated data mart design

[Diagram, top to bottom: Data Platform; Data Mart, New Data]
BI Use Case:
Agile Set Based ELT Design
•  If your normal ETL style is a series of set SQL queries built on
   top of each other then you can quickly prototype ETL before
   moving it into your normal ETL engine to persist execute
   (normally for

[Diagram: Source, Source, Source]
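A sketch of what "set SQL queries built on top of each other" can look like when prototyped as stacked views. SQLite stands in for the DV platform here, and the table and view names (src_orders, stg_valid_orders, agg_customer_sales) are hypothetical.

```python
import sqlite3


def prototype_pipeline():
    """Build a two-step set-based transformation as stacked views."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for a source system table.
    conn.execute(
        "CREATE TABLE src_orders "
        "(order_id INTEGER, customer TEXT, amount REAL, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO src_orders VALUES (?, ?, ?, ?)",
        [
            (1, "acme", 100.0, "complete"),
            (2, "acme", 40.0, "cancelled"),
            (3, "globex", 70.0, "complete"),
            (4, "acme", 25.0, "complete"),
        ],
    )
    # Step 1: a set-based filter over the source.
    conn.execute(
        "CREATE VIEW stg_valid_orders AS "
        "SELECT order_id, customer, amount FROM src_orders "
        "WHERE status = 'complete'"
    )
    # Step 2: an aggregate built on top of step 1, not on the source.
    conn.execute(
        "CREATE VIEW agg_customer_sales AS "
        "SELECT customer, COUNT(*) AS orders, SUM(amount) AS total "
        "FROM stg_valid_orders GROUP BY customer"
    )
    return conn


if __name__ == "__main__":
    conn = prototype_pipeline()
    query = "SELECT customer, orders, total FROM agg_customer_sales ORDER BY customer"
    for row in conn.execute(query):
        print(row)
```

Nothing is persisted while prototyping, so each step can be changed and re-queried immediately. Once the logic is settled, the same SELECT statements can be moved into the regular ETL engine.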
BI Use Case:
Big Data Integration
•  DV Platform connects to Big Data
•  Data Sources are mapped into DV
•  User accesses them via standard tools (SQL, RESTful interfaces,
   etc.)

[Diagram, top to bottom: SQL based tools; SQL Interface; Data Virtualization Platform; Map Reduce, etc. Interface]
BI Use Case:
Source System Analysis
•  Apply your data quality and data profiling tools to all your data
   sources
•  Look for relationships across systems
•  Remove limitations of accessibility by enabling caching so that
   you are not hitting the source system but have fresh data

[Diagram, top to bottom: Data Quality & Profiling Tools; Data Virtualization Platform; Source, Source, Source]
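The caching point can be illustrated with a simple time-to-live cache. This is a sketch under assumptions, not a description of any vendor's cache: the class name CachedSource, the fetch callable and the ttl_seconds parameter are all hypothetical.

```python
import time


class CachedSource:
    """Serve reads from a cached copy, refreshing it once it is too old."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable that queries the source system
        self._ttl = ttl_seconds      # how long a cached copy counts as fresh
        self._clock = clock          # injectable so the behaviour can be tested
        self._data = None
        self._loaded_at = None       # None means nothing has been cached yet

    def read(self):
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if stale:
            # Only this branch touches the source system.
            self._data = self._fetch()
            self._loaded_at = now
        return self._data


if __name__ == "__main__":
    hits = []

    def query_source():
        hits.append(1)
        return ["profiled row"]

    source = CachedSource(query_source, ttl_seconds=300)
    source.read()
    source.read()
    print("source system queried", len(hits), "time(s)")
```

Profiling tools can then read as often as they like, while the source system is queried at most once per TTL window. The TTL sets how fresh the data is guaranteed to be.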
BI Use Case:
Data Masking
•  Currently building two versions of a data mart, one with
   sensitive data in and one without
•  Instead build one and use Role Based Access Control (RBAC) to
   restrict what an individual can see

[Diagram, top to bottom: AND; Physical Data Mart]
BI Use Cases

•  Some examples
  –  Usefulness of each example depends on the
•  Generally an enabler for more agility
  –  Quicker prototyping and integration
•  Will not solve all your problems
  –  And has a cost associated with it (license &
Vendors: What The Analysts Say
•  Forrester Wave: Data Virtualization, Q1 2012
   –  Informatica
   –  IBM
   –  Denodo
       •  EU (Spanish) Origins
   –  Composite
       •  Now part of Cisco
       •  Was OEM’d by Informatica
   –  Microsoft
   –  SAP
   –  And others
•  Gartner
   –  No Magic Quadrant, instead includes Data Virtualization in
      Data Integration
Vendors: Product Positioning

Stand Alone
•  Players
   –  Cisco (Composite)
   –  Denodo
•  Selection
   –  Popular where IBM/Informatica are not already embedded

Integrated
•  Players
   –  IBM
   –  Informatica
•  Selection
   –  Popular with organisations that already have the vendor ETL
      tool