An Introduction to Data Virtualization in Business Intelligence

        An Introduction to Data Virtualization
               in Business Intelligence


               David M Walker
       Data Management & Warehousing
            http://datamgmt.com




          18 October 2013
What Is Data Virtualization?

•  Wikipedia:
  “Data virtualization is [..] an application to retrieve
  and manipulate data without requiring technical
  details about the data, such as how it is formatted
  or where it is physically located.”
•  Or more simply:
  A solution that sits in front of multiple data
  sources and allows them to be treated as a single
  SQL database
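As a rough illustration of the "single SQL database" idea, the hedged sketch below joins a customer table held in one source with an order table held in another, both exposed through the DV layer as ordinary tables. The schema and table names (crm.customers, hadoop.orders) are invented for the example; the exact naming convention varies by product.

    -- Hypothetical names: crm.customers (relational source), hadoop.orders (Hadoop source)
    -- Both appear to the user as ordinary tables in one SQL database
    SELECT c.customer_id,
           c.customer_name,
           SUM(o.order_value) AS total_order_value
    FROM   crm.customers  c
    JOIN   hadoop.orders  o ON o.customer_id = c.customer_id
    WHERE  o.order_date >= DATE '2013-01-01'
    GROUP  BY c.customer_id, c.customer_name;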
Basic Model
•  Consumers of the platform
   –  End users access via a reporting tool
   –  ETL treats the DV platform as a source
   –  Data publishing (batch / RESTful)
   –  Message based (SOA / publication)
•  Data Virtualization Platform
   –  Defines a 'model' of the source systems (similar in concept to a BO Universe)
   –  Models can generally be layered on top of other models (see the sketch below)
•  Sources
   –  Traditional databases: IBM (DB2 & Netezza), Microsoft (SQL Server),
      Oracle (Oracle & MySQL), Postgres, Sybase (ASE & IQ), etc.
   –  NoSQL / NewSQL: Apache Hadoop, Cassandra, Mongo, Neo4J, etc.
   –  Other formats: Microsoft Office, Messaging, Flat Files, XML, Web,
      Cloud, Application APIs, etc.
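A minimal sketch of model layering, assuming the platform lets views be defined over other views (most do, though the DDL shown is generic rather than any specific vendor's syntax); all names are hypothetical.

    -- Base model: a thin view over a source table
    CREATE VIEW model_base.customers AS
    SELECT customer_id, customer_name, country_code
    FROM   crm.customers;

    -- Layered model: built on top of the base model, not on the source directly
    CREATE VIEW model_reporting.customers_uk AS
    SELECT customer_id, customer_name
    FROM   model_base.customers
    WHERE  country_code = 'GB';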
Advanced Features:
Role Based Access Control & Data Masking
•  Role based authentication: the Data Virtualization Platform manages
   sensitive information based on a user's role (a sketch follows below)

   Underlying source table:
      First Name | Last Name | DoB         | Salary
      Joe        | Bloggs    | 30-Jan-1983 | €60,100
      Jane       | Smith     | 17-Jun-1978 | €75,400

   What User 1 sees (salary masked):
      First Name | Last Name | DoB         | Salary
      Joe        | Bloggs    | 30-Jan-1983 | NULL
      Jane       | Smith     | 17-Jun-1978 | NULL

   What User 2 sees (date of birth reduced to an age):
      First Name | Last Name | Age
      Joe        | Bloggs    | 30
      Jane       | Smith     | 35
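A hedged sketch of the masking behaviour in plain SQL. Real DV platforms configure this through role-based policies and route each user to the right projection; the table, schema and role names here are purely illustrative.

    -- source.employees(first_name, last_name, dob, salary) is a hypothetical table

    -- Projection for roles that may see dates but not salaries (User 1 in the slide)
    CREATE VIEW masked.employees_no_salary AS
    SELECT first_name, last_name, dob, CAST(NULL AS DECIMAL(10,2)) AS salary
    FROM   source.employees;

    -- Projection for roles that may see neither salary nor exact date of birth (User 2)
    CREATE VIEW masked.employees_age_only AS
    SELECT first_name,
           last_name,
           -- approximate age (year difference), sufficient for the illustration
           CAST(EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM dob) AS INTEGER) AS age
    FROM   source.employees;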
Advanced Features:
Caching
•  The user sees performance as if all the data was local
•  The Data Virtualization Platform holds a cached copy of the remote
   database table that has poor connectivity
•  The local database table with good connectivity is queried directly,
   while the remote table is served from the cache (a rough analogy is
   sketched below)
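Caching is normally a platform setting rather than something the user codes, so the following is only an analogy: a snapshot table standing in for the platform-managed cache of a poorly connected remote table. All names are hypothetical.

    -- One-off build of the cache from the slow remote source
    CREATE TABLE dv_cache.remote_orders AS
    SELECT * FROM remote_site.sales.orders;

    -- A scheduled refresh keeps the cache acceptably fresh
    TRUNCATE TABLE dv_cache.remote_orders;
    INSERT INTO dv_cache.remote_orders
    SELECT * FROM remote_site.sales.orders;

    -- User queries join the well-connected local table with the cached copy
    SELECT l.product_id, SUM(r.quantity) AS qty
    FROM   local_db.products       l
    JOIN   dv_cache.remote_orders  r ON r.product_id = l.product_id
    GROUP  BY l.product_id;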
Advanced Features:
Creating a Canonical Data Model
•  The user sees the system as a single Canonical Data Model (CDM) and
   not multiple sources
•  Within the Data Virtualization Platform, data from each source is
   mapped to conform to the canonical model (see the sketch below)
•  Sources: Finance System, CRM System, Billing System, Website and
   other systems
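A minimal sketch of mapping two sources onto one canonical customer entity. The column mappings and source names (finance.clients, crm.customers) are assumptions for illustration; a real canonical model would cover far more entities and rules.

    CREATE VIEW cdm.customer AS
    SELECT CAST(client_no AS VARCHAR(20))  AS customer_id,
           client_name                     AS customer_name,
           'FINANCE'                       AS source_system
    FROM   finance.clients
    UNION ALL
    SELECT CAST(cust_id AS VARCHAR(20))    AS customer_id,
           full_name                       AS customer_name,
           'CRM'                           AS source_system
    FROM   crm.customers;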
But it’s not a Silver Bullet
•  Can be slow
   –  Depending on how much data has to be fetched from remote systems
      to the DV platform; platforms try to be smart (e.g. pushing work
      down to the sources) to reduce this
•  Can impact performance on underlying systems
   –  Lots of BI users running queries against resource-sensitive OLTP
      systems is not a good idea
•  Requires Resources
   –  Another set of servers, technologies, etc. to manage, but this
      cost is often offset against the reduction in complexity
      elsewhere.
•  Not a replacement – it is an additional tool
   –  You will still need ETL and Messaging
BI Use Cases:
Agile Data Mart Design
•  Access data warehouse data quickly and easily through the Data
   Virtualization Platform
•  Design the data mart you think you want (candidate design A or B,
   built as views over the Data Warehouse)
•  Test it with real data and your actual reporting tool (see the
   sketch below)
•  Also possible with data warehouse design
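A hedged sketch of the prototyping loop: candidate mart designs are just views over the warehouse, so switching from design A to design B means redefining a view rather than rebuilding tables. The warehouse table names are invented for the example.

    -- Candidate design A: monthly sales by product group
    CREATE VIEW mart_proto.sales_summary AS
    SELECT d.calendar_month, p.product_group, SUM(f.sales_value) AS sales_value
    FROM   dwh.fact_sales  f
    JOIN   dwh.dim_date    d ON d.date_key    = f.date_key
    JOIN   dwh.dim_product p ON p.product_key = f.product_key
    GROUP  BY d.calendar_month, p.product_group;

    -- Point the reporting tool at the view; if design B looks better,
    -- redefine the view (where CREATE OR REPLACE is supported) and re-test
    CREATE OR REPLACE VIEW mart_proto.sales_summary AS
    SELECT d.calendar_week, c.region, SUM(f.sales_value) AS sales_value
    FROM   dwh.fact_sales   f
    JOIN   dwh.dim_date     d ON d.date_key     = f.date_key
    JOIN   dwh.dim_customer c ON c.customer_key = f.customer_key
    GROUP  BY d.calendar_week, c.region;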
BI Use Case:
Virtual Data Marts
•  Big Tin appliance with lots of horse power?
•  Don't want to duplicate data in the appliance and consume disk space
   for a data mart, but want the star schema for ease of use?
•  Let the Data Virtualization Platform present a virtual star schema
   over the Data Warehouse (sketched below)
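A minimal sketch of a virtual star schema: the dimension and fact "tables" are views over warehouse tables already on the appliance, so no data is copied and no extra disk is consumed. Names are hypothetical.

    -- Virtual dimension over the warehouse customer table
    CREATE VIEW vmart.dim_customer AS
    SELECT customer_key, customer_name, region
    FROM   dwh.customer;

    -- Virtual fact over the warehouse transaction table
    CREATE VIEW vmart.fact_sales AS
    SELECT t.customer_key,
           t.transaction_date AS date_key,
           t.amount           AS sales_value
    FROM   dwh.sales_transaction t;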
BI Use Case:
Data Mart Extensions
•  Existing (physical) data mart
•  New data source that needs to be incorporated quickly
•  Create a virtual copy of the existing data mart and of the new data
   source in the Data Virtualization Platform
•  Integrate both into an updated data mart design (see the sketch below)
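A hedged sketch of extending an existing physical mart with a freshly connected source: the updated mart is a view combining the two, so the new data is available before any ETL changes are made. Table names are assumptions.

    CREATE VIEW mart_v2.sales_summary AS
    SELECT m.region,
           m.calendar_month,
           m.sales_value,
           w.web_visits                          -- new measure from the new source
    FROM   mart.sales_summary m                  -- existing physical data mart
    LEFT   JOIN weblogs.monthly_visits w         -- new source, exposed via the DV platform
           ON  w.region         = m.region
           AND w.calendar_month = m.calendar_month;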
BI Use Case:
Agile Set Based ELT Design
•  If your normal ETL style is a series of set-based SQL queries built
   on top of each other, you can quickly prototype the ETL against the
   sources through the Data Virtualization Platform
•  Once proven, move it into your normal ETL engine to persist and
   execute it (normally for performance); a sketch follows below
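A minimal sketch of prototyping set-based ELT as a chain of queries, here written with common table expressions; once the logic is agreed, the same steps would be re-implemented and persisted in the normal ETL engine. All names are illustrative.

    WITH cleaned_orders AS (          -- step 1: standardise the raw source
        SELECT order_id, UPPER(TRIM(country_code)) AS country_code, order_value
        FROM   src_orders.orders
        WHERE  order_value IS NOT NULL
    ),
    enriched_orders AS (              -- step 2: add reference data from another source
        SELECT o.order_id, c.region, o.order_value
        FROM   cleaned_orders  o
        JOIN   src_ref.country c ON c.country_code = o.country_code
    )
    SELECT region, SUM(order_value) AS order_value    -- step 3: the target shape
    FROM   enriched_orders
    GROUP  BY region;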
BI Use Case:
Big Data Integration
•  The DV Platform connects to Big Data sources through their native
   interfaces (Map Reduce, etc.)
•  The data sources are mapped into the DV platform
•  Users access them via standard SQL-based tools through the platform's
   standard interfaces (SQL, RESTful interfaces, etc.); see the example
   below
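Once a Hadoop (or other NoSQL) source is mapped into the DV platform it can be queried like any other table from SQL-based tools; the platform translates the request into the source's native interface. A hedged example with hypothetical names:

    -- clickstream.page_views is assumed to be backed by files in Hadoop,
    -- but the BI tool just issues ordinary SQL against the DV platform
    SELECT page_url, COUNT(*) AS views
    FROM   clickstream.page_views
    WHERE  view_date = DATE '2013-10-18'
    GROUP  BY page_url
    ORDER  BY views DESC;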
BI Use Case:
Source System Analysis
•  Apply your data quality and data profiling tools to all your data
   sources through the Data Virtualization Platform (example below)
•  Look for relationships across systems
•  Remove the limitations of accessibility by enabling caching, so that
   you are not hitting the source system but still have fresh data
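A hedged sketch of a simple profile run through the DV layer rather than against each source directly; real data quality tools would generate far richer statistics. Names are hypothetical.

    -- Basic completeness / cardinality profile of a customer email column in two sources
    SELECT 'CRM'                 AS source_system,
           COUNT(*)              AS row_count,
           COUNT(DISTINCT email) AS distinct_emails,
           SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails
    FROM   crm.customers
    UNION ALL
    SELECT 'BILLING',
           COUNT(*),
           COUNT(DISTINCT email),
           SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END)
    FROM   billing.accounts;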
BI Use Case:
Data Masking
•  Currently building two versions of a data mart, one with sensitive
   data in and one without?
•  Instead build one physical data mart and use Role Based Access
   Control (RBAC) in the Data Virtualization Platform to restrict what
   an individual can see
BI Use Cases

•  Some examples
  –  Usefulness of each example depends on the
     organization
•  Generally an enabler for more agility
  –  Quicker prototyping and integration
•  Will not solve all your problems
   –  And has a cost associated with it (license &
      hardware)
Vendors: What The Analysts Say
•  Forrester Wave: Data Virtualization, Q1 2012
   –  Informatica
   –  IBM
   –  Denodo
      •  EU (Spanish) origins
   –  Composite
      •  Now part of Cisco
      •  Was OEM'd by Informatica
   –  Microsoft
   –  SAP
   –  And others
•  Gartner
   –  No Magic Quadrant; instead includes Data Virtualization in Data
      Integration
Vendors: Product Positioning

Stand Alone
•  Players
   –  Cisco (Composite)
   –  Denodo
•  Selection
   –  Popular where IBM / Informatica are not already embedded

Integrated
•  Players
   –  IBM
   –  Informatica
•  Selection
   –  Popular with organisations that already have the vendor's ETL tool
An Introduction to Data Virtualization
       in Business Intelligence


               David M Walker
       Data Management & Warehousing
            http://datamgmt.com




         THANK YOU - PALDIES

				
About this document: a brief description of what Data Virtualisation is and how it can be used to support business intelligence applications and development. Originally presented to the ETIS Conference in Riga, Latvia, in October 2013.