Real-Time Sensor Data Warehouse Architecture Using MySQL Database
Jacob Nikom
MIT Lincoln Laboratory
The MySQL Users Conference 2005
19 April 2005
MySQL Users Conf.
MIT Lincoln Laboratory
This work was sponsored by the U.S. Army Space and Missile Defense Command under Air Force Contract# F19628-00-C-0002.
04-19-2005
Opinions, interpretations, recommendations and conclusions are those of the author and are not necessarily endorsed by the United States Government.
Outline
• Introduction
• Corporate Information Factory (CIF) and its Data Management Architecture (DMA)
• Designing ROCC DMA using CIF architecture
• Summary
Outline
• Introduction
– Reagan Test Site (RTS) and its instrumentation
– What is RTS Operations Coordination Center (ROCC)?
– ROCC primary operations
– ROCC logical component block diagram
– ROCC modernization
– New ROCC Data Management Architecture
• Corporate Information Factory (CIF) and its Data Management Architecture (DMA)
• Designing ROCC DMA based on CIF architecture
• Summary
Reagan Test Site (RTS) and its Instrumentation
• The Reagan Test Site (RTS) range instrumentation
– Multiple RF sensors collecting data in several regions of the electromagnetic spectrum
– Multiple optical sensors collecting objects’ metric and spectral characteristics
– Telemetry systems capable of tracking multiple targets
– Mobile and fixed ground safety instrumentation
What is the RTS Operations Coordination Center (ROCC)?
• RTS instrumentation is controlled by the ROCC
[Figure: Current DMA, with components Sensors, Network, Data Analysis Algorithms, Displays, Decision Algorithms, and Flat Files]
• ROCC primary operations
– Executes the prepared scenario for the acquisition session
– Manages the data flow from multiple sensors
– Processes the acquired data
– Provides operator displays to track and predict the path of space objects
– Stores the acquired data for later analysis and reporting
– Facilitates training and simulation of performed activities
What kind of system is ROCC?
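[Figure: Feedback control system block diagram. Forward path: the reference input r(t) enters a comparator; the error signal e(t) = r(t) - b(t) drives the controller, whose actuating signal m(t) drives the plant, producing the controlled variable c(t). Feedback path: a feedback processor converts c(t) into the feedback signal b(t), which the comparator subtracts from r(t).]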
• Control is the process of making a system variable adhere to a particular value, called the reference value
• A system designed to follow a changing reference is called a tracking control system
ROCC is a tracking control system following the predefined reference input
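The loop in the block diagram can be illustrated with a short discrete-time simulation. This is a minimal sketch, not a model of ROCC: it assumes a purely proportional controller, a plant that accumulates its input, and a unity feedback processor (b = c); the function name, gain, and plant constant are hypothetical values chosen only to show how the error signal drives the output toward a changing reference.

```python
def simulate_tracking(reference, kp=0.5, plant_gain=1.0):
    """Run a discrete-time tracking loop and return (outputs, errors).

    Hypothetical assumptions for illustration only:
    - controller is purely proportional: m = kp * e
    - plant accumulates its input: c[k+1] = c[k] + plant_gain * m[k]
    - feedback processor is unity: b = c
    """
    c = 0.0
    outputs, errors = [], []
    for r in reference:
        b = c                    # feedback signal b(t)
        e = r - b                # comparator: error signal e(t) = r(t) - b(t)
        m = kp * e               # controller: actuating signal m(t)
        c = c + plant_gain * m   # plant: updated controlled variable c(t)
        outputs.append(c)
        errors.append(e)
    return outputs, errors


# A constant reference is approached geometrically; a ramp is followed with a lag.
step_out, _ = simulate_tracking([1.0] * 10)
_, ramp_err = simulate_tracking([0.1 * k for k in range(200)])
```

With these settings the ramp error settles at slope / (kp × plant_gain) = 0.1 / 0.5 = 0.2: the loop follows the changing reference, but with a constant lag that a richer controller (for example, one with an integral term) could remove.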
Current ROCC DMA Block Diagram
• ROCC controls the data acquisition, analysis and distribution processes
• Maximizes the quality of delivered data over a specified time
[Figure: Current ROCC DMA as a tactical decision control loop. Reference data (Planning, Simulation) drive the plant (Sensors), whose output data feed the report (Data analysis). Manual processing & analysis is performed by operators using displays and voice; automatic real-time processing & analysis covers Tracking, Fusion, Classification, Identification, and Trajectory Estimation.]
ROCC Modernization
• Obsolete system hardware
– Old central processors and boards are no longer supported
– Not enough computational power to perform new tasks
– Old components and interfaces are incompatible with modern technology
• Aging system software
– Centralized monolithic architecture
– Flat files for storing data
– Use of old procedural languages
– Alphanumeric displays
• Modernized system
– Industry-standard 32/64-bit Xeon or Opteron servers
– Software vendor independence: Linux and Java
– Database-based storage
– Distributed architecture using publish/subscribe paradigm
– Graphical user interface for visualization tools
– Target data flow rates: 5 MB/s (sustained), 10 MB/s (peak)
– Data accumulation rate: 1 TB/year
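The two targets can be related with simple arithmetic. This is a minimal sketch, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and that every stored byte arrives at the stated rate; the function name is hypothetical.

```python
def acquisition_hours_per_year(tb_per_year=1.0, rate_mb_per_s=5.0):
    """Hours of acquisition at a given rate needed to accumulate the yearly volume.

    Assumes decimal units (1 MB = 1e6 bytes, 1 TB = 1e12 bytes) and that every
    stored byte arrives at the given rate; both are simplifications.
    """
    bytes_per_year = tb_per_year * 1e12
    bytes_per_second = rate_mb_per_s * 1e6
    return bytes_per_year / bytes_per_second / 3600.0


sustained_hours = acquisition_hours_per_year()                # about 55.6 hours
peak_hours = acquisition_hours_per_year(rate_mb_per_s=10.0)    # about 27.8 hours
```

Under these assumptions, 1 TB/year corresponds to about 200,000 s (roughly 56 hours) of acquisition at the sustained rate; continuous recording at 5 MB/s would reach 1 TB in about 2.3 days.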
New Data Management Architecture
• ROCC data management challenges
– Support powerful high-precision instrumentation with near real-time response
– Support an intensive and costly data collection process, involving many human operators, with a high level of reliability
– Support data analysis leading to changes in the data acquisition environment
– Handle a wide range of transaction types, from simple real-time record reads and inserts to complex multidimensional analytical queries
– Manage a combination of streaming data and traditional structures
– Provide request management, configuration management, and data quality management capabilities
• Search for a new data management architecture
– The new system represents a conceptual change from the old architecture
– Instrumentation and control software traditionally concentrates on algorithm development and lacks good data architecture
– Need for a framework supporting the “analysis – decision – execution” paradigm
– Enterprise software is a leading implementer of distributed architectures and the publish/subscribe paradigm
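The publish/subscribe idea can be shown with a small in-process sketch: producers publish to a named topic and never need to know who consumes the data. This is a minimal illustration, assuming synchronous topic-based routing; the class, topic, and field names are hypothetical and stand in for the network middleware a real deployment would use.

```python
from collections import defaultdict
from typing import Any, Callable


class MessageBus:
    """Hypothetical topic-based publish/subscribe bus (in-process, synchronous)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        """Register a handler to receive every message published on topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver message to every subscriber of topic; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = MessageBus()
ods_rows, display_rows = [], []
bus.subscribe("tracks", ods_rows.append)       # e.g. a database writer
bus.subscribe("tracks", display_rows.append)   # e.g. an operator display
delivered = bus.publish("tracks", {"track_id": 7, "az": 12.5})
```

Adding a new consumer is one more `subscribe` call; the producer is unchanged, which is the decoupling that makes the paradigm attractive for a distributed system.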
Outline
• Introduction
• Corporate Information Factory (CIF) for Data Management Architecture
– What is Corporate Information Factory (CIF)?
– CIF data flow diagram
– CIF data
– CIF layers
– CIF logical component block diagram
• Designing ROCC data management architecture using CIF architecture
• Summary
What is the Corporate Information Factory (CIF)? *
• An information ecosystem is a model of corporate information processing
– “CIF is the physical embodiment of the notion of an information ecosystem”
• CIF consists of the following components
– External world
– Applications
– An integration and transformation layer (I & T layer)
– An operational data store (ODS)
– A data warehouse (DW) with current and historical detailed data
– One or more data marts
– The Internet and intranets
– A metadata repository
– An exploration and data mining warehouse
– Alternative (secondary) storage
– Decision support system (DSS)
• The CIF approach can be used to model information processing in any organization (“forest vs. trees” view)
* W. H. Inmon, Claudia Imhoff, Ryan Sousa, “Corporate Information Factory”, 2nd edition, Wiley, December 18, 2000.
CIF Data Flow Diagram
[Figure: CIF data flow diagram. Layers, left to right: External world, Application layer, Integration & Transform layer, Operational layer, Warehouse layer, Report & Analysis layer; flows labeled data acquisition, primary storage management, and data delivery. Labels include external data, reference data, historical reference data, Internet, Enterprise Resource Planning (ERP), eComm, CRM, and BI applications in transaction (tx) and reporting (rpt) variants, enterprise transactions, ODS, DW, alternative storage, exploration warehouse, data mining warehouse, statistical analysis, DSS applications, data marts (Finance, Sales, Marketing, Accounting), operational reports, raw detailed data, and metadata management. CRM = Customer Relationship Management; BI = Business Intelligence.]
CIF Data
• External data
– Data is defined outside the corporation and may contain erroneous, redundant, or unnecessary items
– The data format is defined outside the corporation, so reformatting may be required
• Reference data
– Standardizes commonly used names for important and frequently used information
– Allows consistent interpretation of corporate data across different departments
– May include aliases for common, frequently referenced names
• Historical data
– Volume of data: the longer the history, the more data
– Usefulness of data: recent data is more useful than older data
– Granularity of data: older data is likely to be used at the summary level
[Figure: Corporate timeline, from ancient history and recent history through the most current activity to the immediate future, showing the spans covered by the DW, the ODS, and the Applications]
CIF Layers
• Application layer (eComm, ERP, CRM, and BI transaction applications)
– Interacting directly with end users
– Gathering detailed transaction data
– Auditing and adjusting data
– Editing data
• Integration and transformation layer
– Combining non-integrated data from multiple applications
– Transforming external data into corporate data
– Creating appropriate metadata
– Mathematical transformation
– Reformatting and resequencing
CIF Layers (Continued)
• Operational layer (ODS)
– Subject-oriented
– Integrated
– Volatile
– Current-valued
– Detailed
– Normalized
• Warehouse layer (Data Warehouse)
– Subject-oriented
– Integrated
– Nonvolatile
– Time-variant
– Composed of both summary and detailed data
– Summary data optimized for Report & Analysis queries
– Normalized and de-normalized data
• Report & Analysis layer (statistics; eComm, CRM, ERP, and BI reporting)
– Statistical analysis
– Exploration reporting
– Data mining reporting
– DSS analysis and reporting
– Finance
– Sales
– Marketing
– Accounting
CIF Logical Component Block Diagram
• The system controls corporate resources using real-time and long-term DSS
• Maximizes the expected profit of the corporation over a specified time
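[Figure: CIF logical component block diagram. Reference data (Corporate Goals) drive the plant (Corporate Applications), whose output data form the Corporate Report. A tactical decision control loop closes through the real-time DSS and the Operational Data Store; a strategic decision control loop closes through the long-term DSS and the Data Warehouse.]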
Outline
• Introduction
• Corporate Information Factory (CIF) for Data Management Architecture (DMA)
• Designing ROCC DMA using CIF architecture
– ROCC data flow diagram
– ROCC data
– ROCC layers
– ROCC logical component block diagram
– Database selection
– Three dangers of database design
• Summary
ROCC Data Flow Diagram
[Figure: ROCC data flow diagram. Layers, left to right: External world, Integration & Transform layer, Operational layer, Warehouse layer, Report & Analysis layer; flows labeled data acquisition, operational data, and archived data. Labels include reference data, planning data, RIBs, multicast middleware, DSS applications (Classifier, Best Choice, Smoother, Data Fusion), sensor control data, ODS, DW, secondary storage, data mining warehouse, data marts, bias modeling, long-term reporting & analysis, short-term reporting & analysis, Post BET overview, Impact … analysis, Space, and Quick Look reports. RIB = ROCC Interface Box.]
ROCC Data
• External data
– Data is defined outside ROCC and may contain erroneous, redundant, or unnecessary items
– The data format is defined outside ROCC, so reformatting or object conversion may be required
• Reference data
– Comprises geophysical models and constants needed to interpret external data
– Comprises common locations and the names of sensors, computers, and programs
– Comprises user names, passwords, access rights, and privileges
• Historical data
– Operational data migrated to the warehouse becomes historical data
– Detailed historical data is used to produce summarized historical data
– Historical data is only inserted, never updated
• Planning data
– Comprises configuration data for the sensors’ acquisition procedures
– Comprises configuration data for ROCC software components (XML format)
– Comprises data for planning specific activities to acquire space objects’ coordinates
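Reference data such as sensor names is what lets every layer agree on identifiers. A minimal sketch of alias resolution, assuming a simple in-memory table that maps alternate spellings to one canonical name; the sensor names below are hypothetical placeholders, not actual RTS identifiers.

```python
# Hypothetical reference table: canonical sensor names and their known aliases.
SENSOR_ALIASES = {
    "RADAR_A": ["radar-a", "RadarA", "RDR_A"],
    "OPTICS_1": ["optics1", "OPT-1"],
}


def build_alias_index(aliases: dict[str, list[str]]) -> dict[str, str]:
    """Map every alias (and each canonical name) to its canonical name, case-insensitively."""
    index = {}
    for canonical, names in aliases.items():
        for name in [canonical, *names]:
            key = name.lower()
            if key in index and index[key] != canonical:
                raise ValueError(f"alias {name!r} maps to two canonical names")
            index[key] = canonical
    return index


def canonical_sensor(name: str, index: dict[str, str]) -> str:
    """Return the canonical name; raises KeyError for an unknown sensor."""
    return index[name.strip().lower()]


INDEX = build_alias_index(SENSOR_ALIASES)
```

Rejecting ambiguous aliases when the index is built keeps the reference data itself consistent, so lookups elsewhere never have to guess.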
ROCC Layers
• External world
– Simultaneous output from multiple sensors at up to 10 MB/s
– Capable of producing data autonomously
– Capable of working under the guidance of DSS applications (feedback from DSS applications)
– Produces data as streams at high output rates
• Integration and transformation layer (ROCC Interface Boxes, RIBs)
– Plays a vital role in reconciling the content and format of incoming external data with internal data requirements
– Converts incoming data into appropriate Java objects
– Creates necessary metadata
– Mathematical transformation
– Reformatting and resequencing
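A conversion step of this kind can be sketched as follows. The actual RIB message formats are not described here, so this assumes a hypothetical fixed-layout, big-endian binary record (sensor id, time in microseconds, range in meters, azimuth and elevation in milliradians); a Python dataclass stands in for the Java objects, and all names are illustrative.

```python
import math
import struct
from dataclasses import dataclass, field

# Hypothetical wire layout: uint16 sensor id, uint64 time (microseconds),
# float64 range (m), int32 azimuth and int32 elevation (milliradians), big-endian.
RECORD = struct.Struct(">HQdii")


@dataclass
class Measurement:
    sensor_id: int
    time_s: float
    range_m: float
    az_deg: float
    el_deg: float
    metadata: dict = field(default_factory=dict)


def decode(raw: bytes, source: str) -> Measurement:
    """Reformat one external record into an internal object and attach metadata."""
    sensor_id, t_us, rng, az_mrad, el_mrad = RECORD.unpack(raw)
    return Measurement(
        sensor_id=sensor_id,
        time_s=t_us / 1e6,                        # microseconds -> seconds
        range_m=rng,
        az_deg=math.degrees(az_mrad / 1000.0),    # milliradians -> degrees
        el_deg=math.degrees(el_mrad / 1000.0),
        metadata={"source": source, "record_size": RECORD.size},
    )


def resequence(records: list[Measurement]) -> list[Measurement]:
    """Order measurements by time, since streams may arrive out of order."""
    return sorted(records, key=lambda m: m.time_s)
```

Keeping the wire layout in one `struct.Struct` means a format change touches a single line, while the rest of the system sees only the internal object.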
ROCC Layers (continued)
• Operational Layer (ODS)
– Subject-oriented
Focuses on basic transaction processing: inserts and reads streams of integrated and transformed sensor data
• Tracks, IDs, control blocks, etc.
– Integrated
Physical unification and cohesiveness:
• Uniform key structures
• Table naming conventions
• Common physical units and coordinate systems
• Data layouts and metadata
– Volatile
ODS data can be updated (replaced) as a normal part of processing. After the acquisition session is done, the data are moved to the DW
– Current-valued
ODS data values relate to the current event (the current acquisition session). For the next mission the ODS is updated and its content is moved to the DW (data migration)
– Detailed
The ODS contains the inserted values of the published sensor objects and is not expected to hold summary data
– Normalized
The ODS contains normalized data
– Decision Support System applications (Classifier, Best Choice, Smoother, Data Fusion)
Make real-time operational decisions such as ID assignment, sensor allocation, etc.
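Converting each sensor's measurements into common physical units and a common coordinate system is one of the integration steps listed above. A minimal sketch, assuming a flat-earth local east-north-up (ENU) frame and ignoring refraction and Earth curvature; a real system would use a geodetic, Earth-centered frame, and the site-offset function is a hypothetical simplification.

```python
import math


def aer_to_enu(az_deg: float, el_deg: float, range_m: float) -> tuple[float, float, float]:
    """Azimuth (clockwise from north), elevation, range -> local east, north, up in meters."""
    az = math.radians(az_deg)
    el = math.radians(el_deg)
    horizontal = range_m * math.cos(el)   # projection onto the local horizontal plane
    east = horizontal * math.sin(az)
    north = horizontal * math.cos(az)
    up = range_m * math.sin(el)
    return east, north, up


def to_common_frame(site_enu_offset, az_deg, el_deg, range_m):
    """Shift a sensor-local ENU point into a shared frame by the site's ENU offset.

    Hypothetical simplification: all sites are assumed to share one tangent plane.
    """
    e, n, u = aer_to_enu(az_deg, el_deg, range_m)
    de, dn, du = site_enu_offset
    return e + de, n + dn, u + du
```

Once every sensor's points are expressed in the same frame and units, the ODS tables can use one column convention for all sensors.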
ROCC ODS Specifics
• Data streams of objects
– Streams of measurements usually don’t have very complex structures
– Object-relational mapping is straightforward and not computationally intensive
• Indices
– High-speed insertion rules out the use of indices
– The relatively small size of the ODS makes it practical to work without indices
– Indices do exist in the DW
• Real-time DSS feedback
– Can control the sensors, which in turn influence the input data
– Typical analytical applications assume that the data producer does not change during the query
• Fault-tolerance (primary and secondary ODS)
[Figure: Primary System (ODS), Secondary System (ODS), and Archive System (DW), connected by the network]
Additional benefits
• Necessary operations can be performed during the copying
• The two operational databases can be used in parallel right after the acquisition
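A minimal sketch of this write path, assuming records are buffered and written in batches to unindexed tables on both a primary and a secondary ODS. Python's built-in `sqlite3` in-memory databases stand in for the two MySQL servers; the class, table, and column names are hypothetical.

```python
import sqlite3

DDL = "CREATE TABLE track_point (session_id INTEGER, t REAL, az REAL, el REAL, rng REAL)"
INSERT = "INSERT INTO track_point VALUES (?, ?, ?, ?, ?)"


class DualOdsWriter:
    """Buffer records and write each batch to a primary and a secondary ODS.

    No indices are created, so inserts stay cheap; queries scan the (small) table.
    """

    def __init__(self, primary: sqlite3.Connection, secondary: sqlite3.Connection,
                 batch_size: int = 100) -> None:
        self.targets = (primary, secondary)
        self.batch_size = batch_size
        self.buffer: list[tuple] = []
        for conn in self.targets:
            conn.execute(DDL)

    def write(self, record: tuple) -> None:
        """Queue one record; flush when the batch is full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write the buffered batch to both servers, one transaction per server."""
        if not self.buffer:
            return
        for conn in self.targets:
            with conn:  # commits the batch on success
                conn.executemany(INSERT, self.buffer)
        self.buffer.clear()


writer = DualOdsWriter(sqlite3.connect(":memory:"), sqlite3.connect(":memory:"), batch_size=3)
for k in range(7):
    writer.write((1, k * 0.1, 10.0, 20.0, 1000.0 + k))
writer.flush()
```

This sketch does not handle one server failing mid-batch, which a real fault-tolerant design would have to address (for example, by recording which batches each server has acknowledged).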
ROCC Layers (continued)
• Historical (data warehouse) layer
– Subject-oriented
Organized, like the ODS, around major ROCC entities, but focused on the modeling and analysis of data
– Integrated
Data migrated into the DW from the ODS is integrated with the rest of the DW data
– Time-variant
Every datum in the data warehouse is associated with a particular time period. Summarized data is correct only for the period with which the corresponding detailed data is associated
– Non-volatile
There are no updates in the warehouse, only inserts. The past cannot be changed, only expanded
– Composed of both summary and detailed data
Once detailed data from the ODS is migrated into the DW, it becomes part of history. In addition to the detailed historical data, the DW contains summary data, pre-calculated to reduce analytical query times
– ROCC DW specifics
The ROCC DW does not yet use a multidimensional data model, only summarized tables
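The end-of-session migration can be sketched as follows: detailed rows are copied from the ODS into the warehouse (insert only), a per-session summary is pre-calculated, and the ODS is cleared for the next mission. This is a minimal sketch; a single `sqlite3` in-memory database with separately named tables stands in for the separate MySQL ODS and DW servers, and all table and column names are hypothetical.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Unindexed ODS table; indexed detail and summary tables in the DW."""
    conn.executescript("""
        CREATE TABLE ods_track_point (session_id INTEGER, sensor TEXT, t REAL, snr REAL);
        CREATE TABLE dw_track_point  (session_id INTEGER, sensor TEXT, t REAL, snr REAL);
        CREATE INDEX dw_tp_session ON dw_track_point (session_id);
        CREATE TABLE dw_session_summary (
            session_id INTEGER, sensor TEXT, n_points INTEGER,
            t_start REAL, t_end REAL, mean_snr REAL,
            PRIMARY KEY (session_id, sensor));
    """)


def migrate_session(conn: sqlite3.Connection, session_id: int) -> int:
    """Move one finished session from the ODS into the DW; return rows migrated."""
    with conn:  # one transaction: copy, summarize, then clear the ODS
        cur = conn.execute(
            "INSERT INTO dw_track_point SELECT * FROM ods_track_point WHERE session_id = ?",
            (session_id,))
        moved = cur.rowcount
        # Pre-calculated summary, keyed by the session (its time period).
        conn.execute("""
            INSERT INTO dw_session_summary
            SELECT session_id, sensor, COUNT(*), MIN(t), MAX(t), AVG(snr)
            FROM dw_track_point WHERE session_id = ?
            GROUP BY session_id, sensor""", (session_id,))
        conn.execute("DELETE FROM ods_track_point WHERE session_id = ?", (session_id,))
    return moved
```

The primary key on the summary table makes a second migration of the same session fail rather than silently overwrite history, in keeping with the insert-only warehouse.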
ROCC Layers (continued)
• Analysis and Reporting layer
Continuous automatic monitoring of sensor metric performance
Example: Angle Bias Modeling using the ROCC Data Warehouse
What is Angle Bias Modeling? The creation of a mathematical model that describes the differences between reported and actual antenna pointing positions
[Figure: Sensor data collection passes through the RIB, which stores sensor data streams in the ODS (real-time queries); data migration moves the data to the Data Warehouse; the Bias Modeling Application runs analytical queries against the warehouse and passes bias model coefficients to the Sensor Control System, which produces corrected pointing information]
Angle Bias Modeling using ROCC Data Warehouse
Organization of Sensor-Specific Summary Track Data in the Warehouse
Column groups: Observed Data; Truth Data (time-aligned and in sensor coordinates); Residual Data
Columns, in order: Source, Time, Range, Az, El, Iono Corr, Tropo Corr, SNR, Range, Az, El, Delta Rng, Delta Az, SNR
[Figure: Bias Modeling Application data flow. Observed data and truth data from the Data Warehouse, together with atmospheric data, are combined through an analytic equation to generate residuals; multivariate regression on the residual data produces bias model coefficients, which are reported, stored in the Data Warehouse, and passed to the Sensor Control System through the strategic decision control loop.]
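The residual and regression steps can be sketched as follows. This is a minimal illustration, not the ROCC model: it assumes residuals are simply observed minus truth, and fits a hypothetical linear bias model for the azimuth residual, Δaz ≈ c0 + c1·sin(az) + c2·cos(az), by ordinary least squares. The actual model terms and the atmospheric corrections are not specified here, and angle wrap-around at 0/360 degrees is ignored.

```python
import math


def residuals(observed, truth):
    """Residual = observed - truth, element by element (inputs assumed time-aligned)."""
    return [o - t for o, t in zip(observed, truth)]


def least_squares(rows, y):
    """Solve min ||X c - y|| through the normal equations (X^T X) c = X^T y."""
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, n):
            f = a[k][col] / a[col][col]
            for j in range(col, n):
                a[k][j] -= f * a[col][j]
            b[k] -= f * b[col]
    # Back-substitution on the upper-triangular system.
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (b[i] - sum(a[i][j] * c[j] for j in range(i + 1, n))) / a[i][i]
    return c


def fit_azimuth_bias(az_obs_deg, az_truth_deg):
    """Fit the hypothetical model d_az = c0 + c1*sin(az) + c2*cos(az) to azimuth residuals."""
    d_az = residuals(az_obs_deg, az_truth_deg)
    rows = [[1.0, math.sin(math.radians(a)), math.cos(math.radians(a))] for a in az_truth_deg]
    return least_squares(rows, d_az)
```

The normal equations are adequate for a few well-conditioned terms like these; a production fit with many correlated terms would normally use a QR- or SVD-based solver instead.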
ROCC Logical Component Block Diagram
• ROCC controls the RTS resources using tactical and strategic DSS
• Maximizes the quality of collected data over a specified time
[Figure: ROCC logical component block diagram. Reference data (Planning, Simulation) drive the plant (Sensors), whose output data feed the report (Data Analysis). A tactical decision control loop closes through the tactical real-time DSS (displays, voice, and operators; Tracking, Fusion, Classification, Identification, Trajectory Estimation) backed by the Operational Data Store; a strategic decision control loop closes through the strategic long-term DSS (Bias Modeling, Sensor Comparison, operators) backed by the Data Warehouse.]
Database Selection
• The same server should work adequately for both ODS and DW
• A deficiency in sophistication can be mitigated by custom programming
Comparison criteria         MySQL      Oracle     DB2 (IBM)  SQL Server    PostgreSQL
(qualitative values)                                         (Microsoft)
Speed                       High       High       High       High          Low
Sophistication              Moderate   High       High       High          High
Reliability                 High       High       High       Moderate      Low
Administration simplicity   High       Low        Low        Moderate      High
Standardization             High       Moderate   Moderate   Moderate      Moderate
Savings                     High       Low        Low        Low           High
Three dangers of ROCC DMA design
• “Balkanization” of data
– Different groups of data have different designs
– Attempts to fit data definitions into the requirements of an existing tool
– Increases maintenance cost in the long run
• Dialectism
– Use of vendor-specific database dialects
– Deviation from existing SQL standards
– Locks the user in to a specific vendor
• “Dirty” repository design
– Part of the data is stored in the database and another, closely related part in the file system
– Duplication of data between the database and the file system
– Increases maintenance cost
Outline
• Introduction
• Corporate Information Factory (CIF) for Data Management Architecture
• Designing ROCC data management architecture using CIF Architecture
• Summary
Summary
• Modernization of the ROCC calls for a new type of data management architecture
– New high-performance hardware
– Significant increase in the volume of data generated and managed
– Introduction of new services
• CIF satisfies the requirements
– Designed to support large-scale information systems
– Effectively manages different types of information queries
– Provides flexibility in distributing data between multiple producers and consumers
• ODS and DW represent two types of repositories for information requests
– ODS supports near real-time storage requirements and targeted queries against detailed (low-granularity) data
– DW is used for complex queries against summary-level data
• ODS and DW are parts of different control loops
– ODS provides information for tactical decisions about near real-time data acquisition
– DW delivers feedback for strategic decisions leading to system improvements
• MySQL is a good fit for ODS and DW databases
– Good performance for fast queries in ODS
– Capable of storing large amounts of data in the DW
– Simple installation and licensing allow many independent servers to run inside one system, serving as ODS, DW, data marts, etc.
– Excellent Java support allows seamless integration with the rest of the software