Document Sample
DBMS_Performance-ColumnOriented_DBMS_for_Business_Intelligence_Applications Powered By Docstoc
DBMS for
The most efficient engine for large data
                  warehouse applications

SADAS (System for static data analysis) is a

algorithms based on the “read only” nature of data, thus adopting techni-
ques that are not applicable in a traditional OLTP (On Line Transaction
Processing) environment.

                       The SADAS allocation in a typical data warehouse architecture

        OLTP                     replication
        EDW                      data quality
         files                                                                    SQL
        source                                       Feeds                      ODBC
       systems                            (Rewrite, append, update)             JDBC

                                                                                                                                               efficiency through a smarter approach
                                                                                         row-based DBMSs • Item information
from 10 up to 100 times includes:                                                                        stored contiguously
Column-oriented A “column-based” organization, which minimizes the
                                                                                            Mario          Rossi     11/04/1961     100.000
   amount of data to be physically accessed during the execution of large
                                                                                            Aldo          Grosso     07/10/1958      50.000

  the I/O workload.                                                                         Pino          Bianchi    25/12/1965     340.000

Redundancy The static nature of data which allows an intensive usage of
                                                                                            Fabio           Alga     05/08/1970     800.000
  redundancy: SADAS creates and manages automatically complex struc-

   grouping indexes, able to reduce the processing time of complex queries.
                                                                                         max (date)                         RESULT
Learn by usage                                                                          sum (values)
                                                                                                                     25/12/1965     1290.000
   capability to detect the most frequently used queries and dynamically
   create and manage additional data structures to reduce even more inquiry
   times. As a result, the inquiry time of the most frequently used queries            SADAS • Attribute values stored contiguously

Intelligent upload The “intelligent upload” utility greatly improves the data            Mario          Rossi       11/04/1961     100.000

   loading time compared to traditional row-based DBMS: this is mainly due
                                                                                          Aldo         Grosso       07/10/1958      50.000
   to the fact that many sorting operations (especially the ones for index
                                                                                          Pino         Bianchi      25/12/1965     340.000
   creations) are performed on compressed data. During an append opera-
   tion instead, all indexes are not dropped and rebuilt, but just updated by            Fabio          Alga        05/08/1970     800.000

   dynamically sorting only the incoming data.

                     Performance                          Operating costs                            DB Management

                     SADAS can return the                                                -           SADAS is very easy to use and
                     result of a query 100 times          tion in the “column-based” model
                     faster if compared to                takes less data processing power           nal DBAs: the management is
                     traditional DBMS. Nowa-              and thus requires less expensive
                     days the average inquiry             hardware and software platforms            creation and maintenance of
                     time of a data warehouse             to run any data warehouse.                 such structures as indexes, views
                                                          Furthermore       other   SADAS            and aggregations.
                     issue to address before
                     approaching a new project.           the Total Cost of Ownership.


                                SADAS can be introduced in any data
                                                                                         ODBC                   JDBC
                                warehouse environment with little

                                SADAS is SQL 99 Standard compatible                 SADAS CALL INTERFACE (SCI)
                                and can be interfaced using:
                                - Direct Call Interface (DLL)
                                - JDBC driver
                                - ODBC driver
                                - Ado connection
                                - Delphi environment
                                - Internal WEB interface


                           Traditional DBMSs drawbacks
                           - The traditional “row-based” approach used in DBMSs is appropriate for OLTP applications,
                              not for OLAP
                           - The growing complexity of design and management of the data warehouse
                           - Response time on batch processes grows as data grows
                           Countermeasures typically adopted
                           - The increasing of the processing power (multiprocessor etc)
                           - The creation of redundant structures (views, pre-aggregations)
                           Known Issues

                           -   The growing management complexity (design, tuning, partitioning, loading, etc)
                           -   The increase of hw and sw costs
                           -   Loss of detail visibility in aggregations

What is SADAS?

amounts of data stored into a read-only data warehouse for the purpose of being analyzed.

Why a “column-based” model?
The traditional row-based approach used in DBMSs is appropriate for OLTP applications, which typically make use of
several attributes of one or few records of a particular table. On the other hand, because OLAP applications typically

How does SADAS improve performances over traditional DBMSs?

se applications.
There are however other important techniques which are used to take advantage of the static nature of the data (which
are essentially read-only):

-   A “learn by usage” approach. The SADAS engine analyzes the characteristics of each processed query in order
    to identify the most common type of inquiries, and dynamically creates additional index structures to reduce their
    execution time.

                          The SADAS allocation in a typical data warehouse architecture
          OLTP                       replication
          EDW                        data quality
           files                                                                     SQL
          source                                         Feeds                     ODBC
         systems                              (Rewrite, append, update)            JDBC

There is a very limited impact on the storage space required because of the SADAS’s compression techniques. Indicati-
vely, SADAS requires a 20% space more than a traditional “row-based” DBMS.
What is the practical consequence of the “Learn by usage” approach?

queries (or similar) will be even faster.
What is the level of improvment which can be obtained against “row-based” architectures?
Tests based on the TPC-H model have shown an improvement which ranges from 5 to 75 times in processing time. This
does not consider the additional advantages which can later be obtained with the “learn by usage” approach.
Is SADAS a potential substitute for traditional “row-based” DBMSs?
Not at all. SADAS is a DBMS which has been designed for OLAP applications and not for OLTP applications.
What is the impact of SADAS introduction?
SADAS has been designed to minimize the
impact on existing OLAP applications. It is
SQL based (SQL-99 standard) and can be
accessed through:
                                                                                 max (date)
-   Direct Call Interface (.DLL)                                                sum (values)

-   JDBC driver
-   ODBC driver
-   Delphi environment
-   Internal Web interface
It is just the matter of replacing the DBMS     row-based DBMSs • Item information             SADAS • Attribute values stored contiguously
                                                         stored contiguously
engine without hurting existing applica-
                                                Mario     Rossi    11/04/1961   100.000
tions or BI tools.                                                                               Mario    Rossi     11/04/1961   100.000

Which are the practical advantages               Aldo    Grosso    07/10/1958   50.000           Aldo    Grosso     07/10/1958   50.000

of using SADAS?
                                                                                                 Pino    Bianchi    25/12/1965   340.000
                                                 Pino    Bianchi   25/12/1965   340.000
There are substantially two kinds of
                                                                                                 Fabio    Alga      05/08/1970   800.000
advantages:                                     Fabio     Alga     05/08/1970   800.000

-   Operational: Users can obtain the same
    results in a much shorter time. In some
    situations this can even determine the                                            RESULT
                                                                         25/12/1965             1290.000
-   Economical: Data warehouses can be
    used on hardware platforms which are
    one or two orders less expensive of
    those needed by traditional DBMSs.
Is SADAS suitable for very large
amount of data?
SADAS has been explicitly designed to
operate on very large set of data (when
traditional DBMS start showing perform-
ances issues). It can work in 64-bits mode
and therefore has no internal limitation                   ODBC                                                    JDBC
over the amounts of data to be processed.
Is SADAS a research project or an
                                                        SADAS CALL INTERFACE (SCI)
industrial product?
The original design of SADAS is the result of
a research project sponsored by the Italian
Ministry of University and Research (MIUR)
and coordinated by the Computer Science
Department of the Pisa University. The
technology has then been developed at
industrial level and it is used in complex
decision support applications.
"Column databases typically outperform all other databases for analytic applica-
                                       Donald Feinberg, Gartner Vice President, 2008


to drop, add or update a column in a columnar database as it is to insert, change
or delete a row in a relational database"
                             Boris Evelson, Forrester Research Principal Analyst, 2008

    SADAS has been developed in 2005 based on the result of a research project partially supported by the Italian Ministry of
    University and Research (MIUR), under FAR Fund DM 297/99, project number 11384.
    The project partners were:
                                                                       The coordinator of the project was Prof. Antonio Albano

    AntonioAlbano is a Professor of Computer Science at the University of Pisa. He is an internationally known
    expert on object oriented database programming languages. He has written books on databases and research papers that
    have been published in leading database conferences and journals. He has served on several international database confe-
    rence program committees.

    LucioGoglia is co-founder of Advanced Systems S.r.l. and CEO since its constitution. He has always paid attention
    to DBMS performance in data warehouse environment, and with his intuition and determination has given a fundamental
    contribution on driving products to success.
    SADAS is his last ambitious project.

    Advanced Systems s.r.l.                                                                         30

    vertical markets.

                                                                                    Centro Meridiana • Via Napoli, 159 • 80013
                                                                                                Casalnuovo di Napoli • Napoli
                                                                                         Tel. 081 8427111 • Fax 081 842 7171
                                                                                                Via Turati, 26 • 20121 Milano
                                                                                       Tel. 02 2901 7449 • Fax 02 2901 4479

Shared By: