                    Designing Real-Time Sensor Data Warehouse Architecture Using MySQL Database

                                                               Jacob Nikom

                                                   MIT Lincoln Laboratory

                                     The MySQL Users Conference 2005

                                                               19 April 2005


MySQL Users Conf.                                    MIT Lincoln Laboratory
04-19-2005
This work was sponsored by the U.S. Army Space and Missile Defense Command under Air Force Contract# F19628-00-C-0002.
Opinions, interpretations, recommendations and conclusions are those of the author and are not necessarily endorsed by the
United States Government.

                              Outline


        •      Introduction

        •      Corporate Information Factory (CIF) and its
               Data Management Architecture (DMA)

        •      Designing ROCC DMA using CIF architecture

        •      Summary




                                        Outline

           • Introduction
                     – Reagan Test Site (RTS) and its instrumentation
                     – What is RTS Operations Coordination Center (ROCC)?
                     – ROCC primary operations
                     – ROCC logical component block diagram
                     – ROCC modernization
                     – New ROCC Data Management Architecture
           • Corporate Information Factory (CIF) and its Data
                    Management Architecture (DMA)
           • Designing ROCC DMA based on CIF architecture
           • Summary


                    Reagan Test Site (RTS) and its Instrumentation
           • The Reagan Test Site (RTS) range instrumentation
                    –   Multiple RF sensors collecting data in several regions of the electromagnetic spectrum
                    –   Multiple optical sensors collecting objects’ metrics and spectral characteristics
                    –   Telemetry systems capable of tracking multiple targets
                    –   Mobile and fixed ground safety instrumentation

                    What is RTS Operations Coordination Center (ROCC)?
• RTS instrumentation is controlled by the ROCC

[Figure: Current DMA. Data analysis algorithms, displays, decision algorithms and flat files connected to the sensors through a network]

• ROCC primary operations
         –      Executes the prepared scenario for the acquisition session
         –      Manages the data flow from multiple sensors
         –      Processes the acquired data
         –      Provides operator displays to track and predict the path of space objects
         –      Stores the acquired data for later analysis and reporting
         –      Facilitates training and simulation of performed activities
                            What kind of system is ROCC?
[Figure: Feedback control system block diagram. Forward path: a comparator subtracts the feedback signal b(t) from the reference input r(t) to form the error signal e(t); the controller turns e(t) into the actuating signal m(t), which drives the plant that produces the controlled variable c(t). Feedback path: a feedback processor derives b(t) from c(t).]

•       Control is the process of making a system variable adhere to a particular value, called
        the reference value
•       A system designed to follow a changing reference is called a tracking control system

        ROCC is a tracking control system following the predefined reference input
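The loop in the block diagram can be illustrated with a minimal discrete-time simulation. This is a sketch under stated assumptions, not the ROCC control law: the controller is assumed to be purely proportional (gain `k_p`), the plant is assumed to be an integrator, and the feedback processor is assumed to pass c(t) through unchanged (b = c). A ramp reference shows the tracking case.

```python
def simulate_tracking(reference, k_p=0.5, dt=1.0):
    """Run a discrete feedback loop; return controlled variable c and error e per step.

    Assumptions (illustrative only): proportional controller, integrating plant,
    identity feedback processor.
    """
    c = 0.0                      # controlled variable c(t), the plant output
    outputs, errors = [], []
    for r in reference:
        b = c                    # feedback processor: identity (assumed)
        e = r - b                # comparator: error signal e(t) = r(t) - b(t)
        m = k_p * e              # controller: actuating signal m(t)
        c = c + m * dt           # plant: integrates the actuating signal (assumed)
        outputs.append(c)
        errors.append(e)
    return outputs, errors

# Constant reference: the error halves each step and dies out.
_, constant_errors = simulate_tracking([10.0] * 20)
# Changing (ramp) reference: the loop tracks it with a constant lag.
_, ramp_errors = simulate_tracking([float(n) for n in range(50)])
```

With the ramp reference this proportional loop settles at a constant lag of slope / (k_p · dt), here 2.0; adding an integral term to the controller would remove that lag.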


                      Current ROCC DMA Block Diagram
       •      ROCC controls the data acquisition, analysis and distribution processes
       •      It maximizes the quality of the delivered data over a specified time

[Figure: Tactical decision control loop. Reference data (planning) feed the data plant (sensors, simulation), which produces the output data (report: data analysis). The loop also contains manual processing & analysis (displays, voice, operators) and automatic real-time processing & analysis (tracking/fusion, classification/identification, trajectory estimation).]

                               ROCC Modernization
           • Obsolete system hardware
                    – Old central processors and boards are no longer supported
                    – Not enough computational power to perform new tasks
                    – Old components and interfaces are incompatible with modern
                      technology
           • Aging system software
                    –   Centralized monolithic architecture
                    –   Flat files for storing data
                    –   Use of old procedural languages
                    –   Alphanumeric displays
           • Modernized system
                    –   Industry-standard 32/64-bit Xeon or Opteron servers
                    –   Software vendor independence: Linux and Java
                    –   Database storage
                    –   Distributed architecture using the publish/subscribe paradigm
                    –   Graphical user interface for visualization tools
                    –   Target data flow rates: 5 MB/s (sustained), 10 MB/s (peak)
                    –   Data accumulation rate: 1 TB/year
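As a quick consistency check on the two targets, the sketch below computes how many hours of continuous acquisition per year the 1 TB/year accumulation rate corresponds to. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and that every acquired byte is retained; binary units would shift the results by roughly 5%.

```python
SECONDS_PER_HOUR = 3600
TB = 10**12  # decimal terabyte (assumed)
MB = 10**6   # decimal megabyte (assumed)

def acquisition_hours_per_year(annual_bytes, rate_bytes_per_s):
    """Hours of continuous acquisition that fill annual_bytes at rate_bytes_per_s."""
    return annual_bytes / rate_bytes_per_s / SECONDS_PER_HOUR

sustained = acquisition_hours_per_year(1 * TB, 5 * MB)   # at the sustained rate
peak = acquisition_hours_per_year(1 * TB, 10 * MB)       # at the peak rate
print(f"sustained: {sustained:.1f} h/year, peak: {peak:.1f} h/year")
```

Under these assumptions, 1 TB/year corresponds to about 56 hours of acquisition at the sustained rate, or about 28 hours at the peak rate, so the two targets together imply acquisition sessions totalling roughly an hour per week rather than continuous operation.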

                    New Data Management Architecture
•     ROCC data management challenges
         –      Support powerful high-precision instrumentation with near-real-time response
         –      Support an intensive and costly data collection process, involving many human
                operators, with a high level of reliability
         –      Support data analysis that leads to changes in the data acquisition environment
         –      Be adequate for a wide range of transaction types, from simple real-time
                record reads and inserts to complex multidimensional analytical queries
         –      Manage a combination of streaming data and traditional data structures
         –      Provide request management, configuration management and data quality
                management capabilities
•     Search for a new data management architecture
         –      The new system represents a conceptual change from the old architecture
         –      Instrumentation and control software traditionally concentrates on algorithm
                development and lacks a good data architecture
         –      Need for a framework supporting the “analysis-decision-execution” paradigm
         –      Enterprise software is a leading implementer of distributed architectures and
                the publish/subscribe paradigm
                                        Outline

           • Introduction
           • Corporate Information Factory (CIF) for Data Management
                    Architecture
                     – What is Corporate Information Factory (CIF)?
                     – CIF data flow diagram
                     – CIF data
                     – CIF layers
                     – CIF logical component block diagram
           • Designing ROCC data management architecture using CIF
                    architecture
           • Summary



                        What is Corporate Information Factory (CIF)? *
    •      An information ecosystem is a model of corporate information processing
              –     “CIF is the physical embodiment of the notion of an information ecosystem”
    •      CIF consists of the following components
              –     External world
              –     Applications
              –     An integration and transformation layer (I & T layer)
              –     An operational data store (ODS)
              –     A data warehouse (DW) with current and historical detailed data
              –     One or more data marts
              –     The Internet and intranets
              –     A metadata repository
              –     An exploration and data mining warehouse
              –     Alternative (secondary) storage
              –     Decision support system (DSS)
    •      The CIF approach can be used to model information processing in any
           organization (a “forest vs. trees” view)
  * W. H. Inmon, Claudia Imhoff, Ryan Sousa, “Corporate Information Factory”, 2nd edition, Wiley, December 18, 2000.
                                 CIF Data Flow Diagram
[Figure: CIF data flow diagram. Data acquisition, primary storage management and data delivery run left to right through six stages: external world (Internet, enterprise resource planning (ERP), enterprise transactions); application layer (eComm, ERP, CRM and BI transaction systems); integration & transform layer; operational layer (ODS, operational reports); warehouse layer (DW, alternative storage); and report & analysis layer (exploration warehouse, data mining warehouse, statistical analysis, eComm/CRM/ERP/BI reporting, DSS applications, and finance, sales, marketing and accounting data marts). Reference data, historical reference data, external data, raw detailed data and metadata management are also shown.]
CRM = Customer Relationship Management
BI = Business Intelligence


                                                 CIF Data
•      External data
           –      Data are defined outside the corporation and could contain erroneous, redundant or unnecessary items
           –      The data format is defined outside the corporation, so reformatting could be required
•      Reference data
           –      Allows standardizing on common names for important and frequently used information
           –      Allows consistent interpretation of corporate data across different departments
           –      Could include aliases for common and frequently referenced names
•      Historical data
           –      Volume of data: the longer the history, the more data
           –      Usefulness of data: recent data are more useful than older data
           –      Granularity of data: older data are likely to be used at a summary level

[Figure: Corporate timeline running from ancient history through recent history and the most current activity to the immediate future; the DW, the ODS and the applications are placed along it in that order]
                                  CIF Layers
[Figure: application-layer transaction systems eComm (tx), ERP (tx), CRM (tx), BI (tx)]

                      • Application layer
                          – Interacting directly with the end user
                          – Gathering detailed transaction data
                          – Auditing and adjusting data
                          – Editing data

                      • Integration and transformation layer
                          – Combining non-integrated data from multiple applications
                          – Transforming external data into corporate data
                          – Creating appropriate metadata
                          – Performing mathematical transformations
                          – Reformatting and resequencing
                                  CIF Layers (Continued)
                      •   Operational layer (ODS)
                           –   Subject-oriented
                           –   Integrated
                           –   Volatile
                           –   Current-valued
                           –   Detailed
                           –   Normalized

                      •   Warehouse layer (data warehouse)
                           –   Subject-oriented
                           –   Integrated
                           –   Nonvolatile
                           –   Time-variant
                           –   Composed of both summary and detailed data
                           –   Summary data optimized for Report & Analysis queries
                           –   Normalized and de-normalized data

                      •   Report & Analysis layer (statistics; eComm, CRM, ERP and BI reporting)
                           –   Statistical analysis
                                – Exploration reporting
                                – Data mining reporting
                           –   DSS analysis and reporting
                                – Finance
                                – Sales
                                – Marketing
                                – Accounting

                        CIF Logical Component Block Diagram
•       The system controls the corporation's resources using real-time and long-term DSS
•       It maximizes the expected profit of the corporation over a specified time

[Figure: Strategic and tactical decision control loops. Reference data (corporate goals) drive the data plant (applications), which produces the output data (corporate report). The tactical loop runs through the real-time DSS (operational data store); the strategic loop runs through the long-term DSS (data warehouse).]

                                         Outline

           • Introduction
           • Corporate Information Factory (CIF) for Data Management
                    Architecture (DMA)
           • Designing ROCC DMA using CIF architecture
                     – ROCC data flow diagram
                     – ROCC data
                     – ROCC layers
                     – ROCC logical component block diagram
                     – Database selection
                     – Three dangers of database design
           • Summary


                                   ROCC Data Flow Diagram
[Figure: ROCC data flow diagram. Data acquisition, operational data and archived data run left to right through the external world (sensor control data); the integration & transform layer (RIBs on multicast middleware); the operational layer (ODS; DSS applications Classifier, Best Choice, Smoother and Data Fusion; planning; Quick Look reports); the warehouse layer (DW, secondary storage); and the report & analysis layer (bias modeling, data mining warehouse, BET, Post overview, Impact, …, Space, data marts; long-term and short-term reporting & analysis). Reference data span the layers.]

RIB = ROCC Interface Box
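The multicast middleware decouples the RIBs that publish sensor objects from the ODS and the DSS applications that consume them. The sketch below shows only that publish/subscribe decoupling, in-process; it is not a network multicast implementation, and the `MessageBus` class, the `"tracks"` topic and the message fields are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

class MessageBus:
    """In-process stand-in for topic-based publish/subscribe middleware (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        # Every handler registered for a topic receives every message published on it.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        # The publisher does not know who the subscribers are; it only names the topic.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)

bus = MessageBus()
ods_rows, classifier_inputs = [], []
bus.subscribe("tracks", ods_rows.append)           # hypothetical ODS writer
bus.subscribe("tracks", classifier_inputs.append)  # hypothetical DSS application
delivered = bus.publish("tracks", {"track_id": 7, "range_km": 812.4})
```

Adding another consumer, for example a new DSS application, only requires another `subscribe` call; the publishing RIB is unchanged.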

                                          ROCC Data

•      External data
           –        Data are defined outside ROCC and could contain erroneous, redundant, or
                    unnecessary items
           –        The data format is defined outside ROCC, so reformatting or object conversion
                    could be required
•      Reference data
           –        Geophysical models and constants needed to interpret external data
           –        Common locations, sensor names, computer names and program names
           –        User names, passwords, access rights and privileges
•      Historical data
           –        Operational data migrated to the warehouse become historical data
           –        Detailed historical data are used to produce summarized historical data
           –        Historical data are only inserted, never updated
•      Planning data
           –        Configuration data for the sensors’ acquisition procedures
           –        Configuration data for the ROCC software components (XML format)
           –        Data for planning specific activities to acquire space objects’ coordinates
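The component configuration data are kept in XML. The sketch below shows how one such file might be read into a plain dictionary with the standard `xml.etree.ElementTree` module; the element and attribute names (`component`, `subscribe`, `publish`, `param`) and the parameter values are hypothetical, not the actual ROCC schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration for one ROCC software component (illustrative schema).
CONFIG_XML = """
<component name="Smoother">
  <subscribe topic="tracks"/>
  <publish topic="smoothed_tracks"/>
  <param name="window" value="5"/>
  <param name="max_gap_s" value="2.5"/>
</component>
"""

def load_component_config(xml_text):
    """Parse a component configuration into a dictionary (values stay strings)."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.get("name"),
        "subscribes": [e.get("topic") for e in root.findall("subscribe")],
        "publishes": [e.get("topic") for e in root.findall("publish")],
        "params": {p.get("name"): p.get("value") for p in root.findall("param")},
    }

config = load_component_config(CONFIG_XML)
```

Parameter values are left as strings here; a real loader would convert them to typed values and validate them against the component's expectations.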
                                     ROCC Layers

  •      External world
                          –   Simultaneous output from multiple sensors at up to 10 MB/s
                          –   Capable of producing data autonomously
                          –   Capable of working under the guidance of DSS applications
                          –   Produces data as streams at high output rates
                          [Figure: feedback from DSS applications]

  •      Integration and transformation layer (RIBs)
                    The RIBs play a vital role in reconciling the content and format of the incoming
                    external data with the internal data requirements
                          –   Convert incoming data into appropriate Java objects
                          –   Create the necessary metadata
                          –   Perform mathematical transformations
                          –   Reformat and resequence data
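The four RIB duties can be sketched as a single conversion step. The sketch below uses Python dataclasses where ROCC uses Java objects, and everything about the data is hypothetical: the binary record layout, the choice of kilometres as the internal unit, and the `arrival_index` metadata field.

```python
import struct
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical wire format of one sensor record (big-endian, no padding):
# sensor id (uint16), sequence number (uint32), time (float64, s), range (float64, m)
RECORD = struct.Struct(">HIdd")

@dataclass(frozen=True)
class Measurement:
    sensor_id: int
    sequence: int
    time_s: float
    range_km: float     # converted to the assumed common internal unit
    arrival_index: int  # metadata: order in which the RIB received the record

def convert(raw_records: Iterable[bytes]) -> List[Measurement]:
    """Decode raw records into objects, convert units, and resequence by time."""
    decoded = []
    for arrival_index, raw in enumerate(raw_records):
        sensor_id, sequence, time_s, range_m = RECORD.unpack(raw)
        decoded.append(Measurement(sensor_id, sequence, time_s,
                                   range_m / 1000.0, arrival_index))
    # Resequencing: records may arrive out of order, so sort by measurement time.
    return sorted(decoded, key=lambda m: (m.time_s, m.sensor_id, m.sequence))

raw = [RECORD.pack(3, 11, 100.2, 812_400.0), RECORD.pack(3, 10, 100.1, 811_900.0)]
objects = convert(raw)
```

Keeping the receive order as metadata preserves the information that resequencing would otherwise discard, which can help when diagnosing network delays.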



                             ROCC Layers (continued)
  •      Operational layer (ODS)
                     –   Subject-oriented
                          Focuses on basic transaction processing: inserts and reads the streams of
                          integrated and transformed sensor data (tracks, IDs, control blocks, etc.)
                     –   Integrated
                          Physical unification and cohesiveness:
                          • Uniform key structures
                          • Table naming conventions
                          • Common physical units and coordinate systems
                          • Data layouts and metadata
                     –   Volatile
                          ODS data can be updated (replaced) as a normal part of processing. After the
                          acquisition session is done, the data are moved to the DW
                     –   Current-valued
                          ODS data values relate to the current event (the current acquisition session).
                          For the next mission, the ODS will be updated and its previous content moved
                          to the DW (data migration)
                     –   Detailed
                          The ODS contains the inserted values of the published sensor objects and is
                          not expected to hold summary data
                     –   Normalized
                          The ODS contains normalized data
                     –   Decision support system (DSS) applications (Classifier, Best Choice, Smoother,
                          Data Fusion)
                          Make real-time operational decisions such as ID assignment, sensor allocation, etc.
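The end-of-session data migration can be sketched as a copy-then-delete pair run in one transaction, so that a failure never leaves a session half-migrated. The sketch uses Python's built-in `sqlite3` in place of MySQL and keeps both tables in one database for simplicity, whereas in ROCC the ODS and the DW run on separate systems; the table and column names are hypothetical. In MySQL the same pattern needs a transactional storage engine such as InnoDB.

```python
import sqlite3

COLUMNS = "session_id INTEGER, track_id INTEGER, t REAL, range_km REAL"

def migrate_session(conn: sqlite3.Connection, session_id: int) -> int:
    """Move one finished acquisition session from the ODS table into the DW table.

    The copy and the delete run in a single transaction, so a failure
    leaves the ODS unchanged and nothing is half-migrated.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        moved = conn.execute(
            "SELECT COUNT(*) FROM ods_tracks WHERE session_id = ?", (session_id,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO dw_tracks SELECT * FROM ods_tracks WHERE session_id = ?",
            (session_id,),
        )
        conn.execute("DELETE FROM ods_tracks WHERE session_id = ?", (session_id,))
    return moved

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE ods_tracks ({COLUMNS})")
conn.execute(f"CREATE TABLE dw_tracks ({COLUMNS})")
with conn:
    conn.executemany("INSERT INTO ods_tracks VALUES (?, ?, ?, ?)",
                     [(1, 7, 100.1, 811.9), (1, 7, 100.2, 812.4), (2, 9, 5.0, 700.0)])
moved = migrate_session(conn, 1)
```

Only session 1 is moved; rows from session 2 stay in the ODS until that session is finished.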

                                     ROCC ODS Specifics
• Data streams of objects
        –      Streams of measurements usually don’t have very complex structures
        –      Object-relational mapping is straightforward and not computationally intensive
• Indices
        –      High-speed insertion does not allow to use indices
        –      Relatively small size of the ODS allows to work without indices
        –      Indices do exist in the DW
• Real-time DSS feedback
        –      Could control the sensors, which in turn influences the input data
        –      Typical analytical application assume that data producer is not changed during
               the query
• Fault-tolerance (primary and secondary ODS)
        [Diagram: primary system (ODS), secondary system (ODS), and archive system (DW),
         each connected to the network]
        Additional benefits
        –      Necessary operations can be performed during the copying
        –      The two operational databases can be used in parallel right after acquisition
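The ODS trade-offs above can be illustrated with a small in-memory Python sketch. It is not MySQL code and not the ROCC implementation; the class name, the record keys (`object_id`, `time`, `range_m`), the `max_rows` limit, and the transformation applied during the copy are all hypothetical. Rows are kept in arrival order with no index, so an insert is a cheap append and every query is a sequential scan, which stays affordable only because the store is bounded. `copy_to` mimics copying from the primary to the secondary ODS while performing an operation on each row.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional

Record = Dict[str, float]  # hypothetical flat record: field name -> value


class OperationalDataStore:
    """Unindexed, append-only store holding only the most recent records."""

    def __init__(self, max_rows: int) -> None:
        # A bounded deque keeps the store small: once max_rows is reached,
        # each new row silently evicts the oldest one.
        self.rows: Deque[Record] = deque(maxlen=max_rows)

    def insert(self, record: Record) -> None:
        # O(1) append; there is no index to maintain on the insert path.
        self.rows.append(record)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Without indices, every query is a full sequential scan.
        return [r for r in self.rows if predicate(r)]

    def copy_to(self, other: "OperationalDataStore",
                transform: Optional[Callable[[Record], Record]] = None) -> None:
        # Copy every row to another store (e.g. primary -> secondary),
        # optionally applying an operation to each row during the copy.
        for r in self.rows:
            row = dict(r)  # copy so the source row is never modified
            other.insert(transform(row) if transform else row)


# Usage: five inserts into a store bounded at three rows (hypothetical data).
primary = OperationalDataStore(max_rows=3)
for t in range(5):
    primary.insert({"object_id": t % 2, "time": float(t), "range_m": 1000.0 + t})

secondary = OperationalDataStore(max_rows=3)
primary.copy_to(secondary, lambda r: {**r, "range_km": r["range_m"] / 1000.0})

recent = primary.query(lambda r: r["object_id"] == 0)
```

The bounded deque stands in for whatever purging policy moves aged rows out of the ODS; in the architecture described here, those rows would already have migrated to the DW.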


                             ROCC Layers (continued)
           •        Historical (data warehouse) layer
                     –   Subject-oriented
                            Organized like the ODS around major ROCC entities, but focused on the
                            modeling and analysis of data
                     –   Integrated
                            Data migrated from the ODS into the DW are integrated with the rest of the DW data
                     –   Time-variant
                            Every datum in the data warehouse is identified with a particular time
                            period. Summarized data are correct only for the period with which the
                            corresponding detailed data are identified
                     –   Non-volatile
                            There are no updates in the warehouse, only inserts. The past cannot be
                            changed, only expanded
                     –   Composed of both summary and detailed data
                            Once detailed data migrate from the ODS into the DW, they become part of
                            history. In addition to the detailed historical data, the DW contains summary
                            data, which are pre-calculated to reduce analytical query times
                     –   ROCC DW specifics
                            The ROCC DW does not use a multidimensional data model yet, only summary
                            tables
    [Diagram label: Data Warehouse]
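The time-variant, non-volatile, and summary-table properties above can be sketched in memory as follows. This is an illustrative analogue, not the ROCC schema: the class name, the `period_s` period width, the sensor name, and the choice of SNR as the summarized metric are hypothetical. Detail rows are only ever appended, every row is tied to a time period, and a per-(sensor, period) summary is maintained at insert time so that analytical questions can be answered without scanning the detail rows.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DetailRow = Tuple[str, float, float]  # (sensor, time in seconds, SNR), hypothetical


class DataWarehouse:
    """Insert-only store: detail rows are never updated or deleted."""

    def __init__(self, period_s: float) -> None:
        self.period_s = period_s  # width of one time period in seconds (assumed)
        self._detail: List[DetailRow] = []
        # Pre-calculated summary table: (sensor, period index) -> [count, snr_sum]
        self._summary: Dict[Tuple[str, int], List[float]] = defaultdict(lambda: [0, 0.0])

    def period_of(self, t: float) -> int:
        # Time-variant: every datum is identified with a particular period.
        return int(t // self.period_s)

    def insert(self, sensor: str, t: float, snr: float) -> None:
        # Non-volatile: the only write is an append. Each insert touches only
        # the summary entry for the row's own (sensor, period).
        self._detail.append((sensor, t, snr))
        agg = self._summary[(sensor, self.period_of(t))]
        agg[0] += 1
        agg[1] += snr

    def mean_snr(self, sensor: str, period: int) -> float:
        # Answered from the summary table, without scanning detail rows.
        count, total = self._summary.get((sensor, period), (0, 0.0))
        if count == 0:
            raise KeyError((sensor, period))
        return total / count

    def detail(self) -> Tuple[DetailRow, ...]:
        # History is exposed read-only: callers cannot change the past.
        return tuple(self._detail)


# Usage: three rows spread over two 60-second periods (hypothetical data).
dw = DataWarehouse(period_s=60.0)
for t, snr in [(5.0, 20.0), (30.0, 22.0), (70.0, 18.0)]:
    dw.insert("radar_a", t, snr)
```

In a MySQL warehouse the same idea would typically be a detail table plus a summary table keyed by sensor and period; the in-memory version only shows the invariants.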

                                  ROCC Layers (continued)

   • Analysis and Reporting layer
       Continuous automatic monitoring of sensor metric performance
       Example: Angle Bias Modeling using the ROCC Data Warehouse

       What is Angle Bias Modeling? Creation of a mathematical model to describe the differences
       between reported and actual antenna pointing positions

       [Diagram: the RIB (sensor data collection) stores sensor data streams in the ODS, which serves
        real-time queries; data migration moves data from the ODS to the Data Warehouse; the Bias
        Modeling Application runs analytical queries against the Data Warehouse and passes bias model
        coefficients to the Sensor Control System, which produces corrected pointing information]

                        Angle Bias Modeling using ROCC
                                Data Warehouse

         Organization of Sensor-Specific Summary Track Data in the Warehouse

   Column group                                        Columns
   Observed data                                       Source, Time, Range, Az, El, Iono Corr, Tropo Corr, SNR
   Truth data (time-aligned, in sensor coordinates)    Range, Az, El
   Residual data                                       Delta Rng, Delta Az, SNR
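The residual columns in the table can be derived row by row from the observed and truth columns. The sketch below assumes a residual means observed minus truth, ranges in meters, and angles in degrees; these conventions, the field names, and the function names are hypothetical. The slides do not say how the ionospheric and tropospheric corrections enter the computation, so they are left out, and no elevation residual is produced because the table lists none.

```python
from dataclasses import dataclass


@dataclass
class ObservedRow:
    source: str
    time: float
    range_m: float
    az_deg: float
    el_deg: float
    snr_db: float


@dataclass
class TruthRow:
    # Assumed already time-aligned and in sensor coordinates, as the table states.
    range_m: float
    az_deg: float
    el_deg: float


@dataclass
class ResidualRow:
    delta_rng_m: float
    delta_az_deg: float
    snr_db: float


def wrap_deg(angle: float) -> float:
    """Map an angle difference into [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


def generate_residual(obs: ObservedRow, truth: TruthRow) -> ResidualRow:
    # Assumed sign convention: residual = observed - truth.
    # Azimuth differences are wrapped so that 359.9 vs 0.1 gives -0.2, not 359.8.
    return ResidualRow(
        delta_rng_m=obs.range_m - truth.range_m,
        delta_az_deg=wrap_deg(obs.az_deg - truth.az_deg),
        snr_db=obs.snr_db,  # carried through for weighting or filtering
    )


# Usage with hypothetical values near the azimuth wrap-around point.
obs = ObservedRow("sensor_1", 12.5, 1000.5, 359.9, 10.0, 15.0)
truth = TruthRow(1000.0, 0.1, 10.02)
res = generate_residual(obs, truth)
```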




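                                    Bias Modeling Application Data Flow

    [Diagram: the Data Warehouse supplies truth data, observed data, and atmospheric data;
     "Generate Residuals" produces residual data; multivariate regression on the residual data
     yields a bias model analytic equation, which goes to the Sensor Control System through the
     strategic decision control loop, and bias model coefficients, which are stored in the Data
     Warehouse; a report is also produced]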
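The regression step of this data flow, from residual data to bias model coefficients, can be sketched as an ordinary least-squares fit. The slides do not give the form of the bias model analytic equation, so the basis terms below (a constant, sin and cos of azimuth, and tan of elevation) are a hypothetical choice, as is fitting only Delta Az. Strictly, one response with several predictors is multiple linear regression; the fit solves the normal equations with pure-Python Gaussian elimination to stay dependency-free.

```python
import math
from typing import Callable, List, Sequence

Basis = Callable[[float, float], float]  # f(az_deg, el_deg) -> regressor value

# Hypothetical basis terms for an azimuth-bias model.
BASIS: List[Basis] = [
    lambda az, el: 1.0,                          # constant offset
    lambda az, el: math.sin(math.radians(az)),   # azimuth-dependent terms
    lambda az, el: math.cos(math.radians(az)),
    lambda az, el: math.tan(math.radians(el)),   # elevation-dependent term
]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal matrix; add data or drop terms")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_bias_model(az: Sequence[float], el: Sequence[float],
                   delta_az: Sequence[float], basis: List[Basis] = BASIS) -> List[float]:
    """Least-squares coefficients c minimizing sum((delta_az - X c)^2)."""
    X = [[f(a, e) for f in basis] for a, e in zip(az, el)]
    k = len(basis)
    # Normal equations: (X^T X) c = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, delta_az)) for i in range(k)]
    return solve(xtx, xty)


# Usage: noise-free synthetic residuals generated from known coefficients.
true_c = [0.01, 0.02, -0.015, 0.005]
az = [10.0 * i for i in range(36)]
el = [5.0 + 2.0 * (i % 20) for i in range(36)]
dy = [sum(c * f(a, e) for c, f in zip(true_c, BASIS)) for a, e in zip(az, el)]
coeffs = fit_bias_model(az, el, dy)
```

With real residuals, a QR- or SVD-based solver such as `numpy.linalg.lstsq` is numerically more robust than forming the normal equations explicitly. The returned values would play the role of the "Bias Model Coefficients" written back to the warehouse.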



                    ROCC Logical Component Block Diagram
  • ROCC controls the RTS resources using tactical and strategic DSS
  • It maximizes the quality of collected data over a specified time period

    [Diagram: components shown are reference data, planning, the data plant (sensors, simulation),
     output data, reports, and data analysis. The tactical real-time DSS (displays, voice, operators;
     tracking/fusion, classification/identification, trajectory estimation) draws on the Operational
     Data Store and closes the tactical decision control loop. The strategic long-term DSS (bias
     modeling, sensor comparison, operators) draws on the Data Warehouse and closes the strategic
     decision control loop.]




                           Database Selection
  •      The same server should work adequately for both the ODS and the DW
  •      A deficiency in sophistication can be mitigated by custom programming

    Comparison criteria        MySQL      Oracle     DB2 (IBM)   SQL Server    PostgreSQL
    (qualitative values)                                         (Microsoft)

    Speed                      High       High       High        High          Low
    Sophistication             Moderate   High       High        High          High
    Reliability                High       High       High        Moderate      Low
    Administration simplicity  High       Low        Low         Moderate      High
    Standardization            High       Moderate   Moderate    Moderate      Moderate
    Savings                    High       Low        Low         Low           High


                    Three dangers of ROCC DMA design


   • “Balkanization” of data
              – Different groups of data have different designs
              – Attempts to fit data definitions to the requirements of existing tools
              – Increases maintenance cost in the long run
   • Dialectism
              – Use of vendor-specific database dialects
              – Deviation from existing SQL standards
              – Locks the user in to a specific vendor
   • “Dirty” repository design
              – Part of the data is stored in the database and another, closely related part
                in the file system
              – Duplication of data between the database and the file system
              – Increases maintenance cost



                                     Outline



                    • Introduction
                    • Corporate Information Factory (CIF) for Data
                      Management Architecture

                    • Designing ROCC data management architecture
                      using CIF Architecture

                    • Summary




                                               Summary

•      Modernization of the ROCC calls for a new type of data management architecture
           –        New high-performance hardware
           –        Significant increase in the volume of data generated and managed
           –        Introduction of new services
•      CIF satisfies the requirements
           –        Designed to support large-scale information systems
           –        Effectively manages different types of information queries
           –        Provides flexibility in distributing data between multiple producers and consumers
•      ODS and DW represent two types of repositories for information requests
           –        ODS supports near real-time storage requirements and targeted queries on detailed
                    (low-granularity) data
           –        DW is used for complex queries against summary-level data
•      ODS and DW are parts of different control loops
           –        ODS provides information for tactical decisions about near real-time data acquisition
           –        DW delivers feedback for strategic decisions leading to system improvements
•      MySQL is a good fit for ODS and DW databases
           –        Good performance for fast queries in the ODS
           –        Capable of storing large amounts of data in the DW
           –        Simple installation and licensing allow many independent servers to run inside one system,
                    serving as ODS, DW, data marts, etc.
           –        Excellent Java support allows seamless integration with the rest of the software


				