http://www.netbeams.org
http://code.google.com/p/netbeams


A Key-Value-based Persistence Model
for Sensor Networks


Candidate: Marcello Alves de Sales Junior

Master of Science in Computer Science

Advisor: Prof. Arno Puder, Ph.D.
Committee Chair: Prof. Marguerite Murphy, Ph.D.

          Department of Computer Science
                       Outline
1. Motivation and State of the Art in Data Persistence for SN

2. Data Persistence in Sensor Networks: a Proposed
   Taxonomy

3. NetBEAMS: A Case Study

4. Technology Selection: Empirical Analysis

5. DSP Data Sensor Platform: Design and Architecture

6. Experimental Results: Correct Behavior and Performance

7. Conclusions and Future Work

8. References
1. Motivation

•NetBEAMS: a component-based sensor network for environmental monitoring [PJS+ 09]

   •Open problem: data persistence for the collected data

•How is data persistence done in the community?

•Biologists are the main users of the system

   •No skills in database systems;

   •But they do have programming skills (e.g., MATLAB)

•What types of database systems?

   •Use the traditional relational data model?

   •[BDD09] proposes domain-specific programming languages
 1. State of the Art in Data
    Persistence for Sensor Networks
•Sensor Networks [ASSC02, RM04]

   •Infrastructure (Topology) and Node Types

   •Size, Lifetime and Quality of Service [ASSC02]
                                               [wp]
•Sensor Network Nodes [RM04]

   •Deployment and Mobility

   •Resources, Cost, Energy, Size, Heterogeneity

   •Communication Mode

   •Coverage and Connectivity
                                                      [snc]
 1. State of the Art in Data
    Persistence for Sensor Networks
•Persistent Storage for Sensor Networks [SRK+ 03]

•How the Collected Data is Used

   •Real-time Data Stream [LNH05, SLM06]
   •Data Archival [SLM06, DML03]                  [sd]

•Storage Location for Collected Data

   •Local or External [SRK+ 03]
   •Data-Centric [SRK+ 03, SLM06]

•Query Processing Used

   •In-Network [SRK+ 03, KGH07]
   •Centralized [KGH07]

•Data Volume Produced                                   [ns]
1. State of the Art in Data
   Persistence for Sensor Networks
•Data Models and Query Engines [Olk09]

•Tabular Data Model: .csv Files (Subversion)

   •Text Comparison
   •No Index
•Relational Data Model: binary files [LKK07]

   •Relational Algebra
   •Data Normalization
   •Structured Query Language (SQL)
       •Modified SQL for Sensor Networks

•Structured Data Model [HRN+ 08, OCR+ 09]
   •XML Schema: document structure
   •XPath: query

[Figure labels: Data Sink, Database System]
 1. State of the Art in Data
    Persistence for Sensor Networks
 
•How is Collected Data Described? Data Provenance [LNH05]
   •Data Stream: 334 55.45 -23.44 119.394 44 1 22 | 5

•Metadata: Data about data

   •What was collected?

       •Temperature = 54.3 : data
       •Scale = 'fahrenheit' : metadata

   •When was the data collected?

       •Valid Time = collected at 10:34am
       •Transaction Time = time of arrival

   •From where was the data collected?

       •GPS Coordinates: (12.342, -145.304)
       •Site: 'lower-pier' : metadata
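The data/metadata split above can be sketched as a plain JavaScript record. The values come from the slide. The field names (`data`, `metadata`, `validTime`, `transactionTime`, `gps`, `site`), the calendar date, and the arrival time are hypothetical. The sketch also shows one practical payoff of keeping both time dimensions: you can measure how long a reading took to reach the data sink.

```javascript
// Hypothetical record layout: the measured value is kept apart from the
// metadata answering what / when / where. Values follow the slide's example.
function makeReading(value, meta) {
  return {
    data: { temperature: value },              // the collected value
    metadata: {
      scale: meta.scale,                       // what: unit of the value
      validTime: meta.validTime,               // when: time of collection
      transactionTime: meta.transactionTime,   // when: time of arrival
      gps: meta.gps,                           // where: coordinates
      site: meta.site,                         // where: named site
    },
  };
}

// Minutes between collection (valid time) and arrival (transaction time).
function arrivalDelayMinutes(reading) {
  const { validTime, transactionTime } = reading.metadata;
  return (transactionTime.getTime() - validTime.getTime()) / 60000;
}

const reading = makeReading(54.3, {
  scale: "fahrenheit",
  validTime: new Date(Date.UTC(2009, 10, 12, 10, 34)),       // date is hypothetical
  transactionTime: new Date(Date.UTC(2009, 10, 12, 10, 36)), // arrival time is hypothetical
  gps: { latitude: 12.342, longitude: -145.304 },
  site: "lower-pier",
});

console.log(arrivalDelayMinutes(reading)); // 2
```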
 1. State of the Art in Data
    Persistence for Sensor Networks
•Problem: the recent oil spill in the San Francisco Bay (Oct 2009) [sfb09]

   •What changed in the collected data?

   •Are there correlations between the collected data and the oil spill?

   •How can the event be described?

[Figure: “Polluted” Data]

•Data Annotation

   •Descriptive Metadata

   •Describing Video Frames from sensor cameras [lcsd09]

   •Tags in Web 2.0 (e.g., YouTube video tags)

                                                                         [an]
 2. Data Persistence in Sensor
Networks: a Proposed Taxonomy
2. Data Persistence in Sensor
   Networks Taxonomy
•Taxonomy (from Greek τάξις, taxis, meaning 'order', 'arrangement', and νόμος, nomos, meaning 'law'):

    •practice and science of classification,

    •using taxonomic units known as taxa (singular taxon)

•Represented by hierarchical diagrams

    •Relationships between the root and branches
  2. Data Persistence in Sensor
     Networks Taxonomy
[Figure: proposed taxonomy of data persistence in sensor networks]

•Purpose of Collected Sensor Data: Real-Time Stream Use, Data Archival
•Data Volume: Small, Medium, Large
•Location of the Sensor Data: Local Storage, External Storage, Data-Centric Storage
•Query Processing Mechanism: In-Network Query Processing, Centralized Query Processing
•Database System Organization: Distributed System (Replication, Database Partition), Centralized System
•Data Model: Schema-Dependent (Relational Model, Structured Model), Schema-less (Tabular Model)
•Data Description: Annotations, Data Provenance (What: Data Identity, When: Time Dimension, Where: Data Location)
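A hierarchical taxonomy maps naturally onto a nested data structure. The sketch below encodes the branches of the taxonomy diagram. Where the diagram's layout is ambiguous (for example, which data models sit under Schema-Dependent), the nesting is our reading of it, not a statement from the thesis. The helpers `leafPaths` and `hasPath` are hypothetical. The closing example checks a partial SF-BEAMS classification that uses only facts stated on the case-study slides (data archival, small data volume).

```javascript
// The taxonomy as nested data: objects are categories, arrays list their taxa.
// Node names follow the diagram; the nesting is read from its layout.
const taxonomy = {
  "Purpose of Collected Sensor Data": ["Real-Time Stream Use", "Data Archival"],
  "Data Volume": ["Small", "Medium", "Large"],
  "Location of the Sensor Data": ["Local Storage", "External Storage", "Data-Centric Storage"],
  "Query Processing Mechanism": ["In-Network Query Processing", "Centralized Query Processing"],
  "Database System Organization": {
    "Distributed System": ["Replication", "Database Partition"],
    "Centralized System": [],
  },
  "Data Model": {
    "Schema-Dependent": ["Relational Model", "Structured Model"],
    "Schema-less": ["Tabular Model"],
  },
  "Data Description": {
    "Annotations": [],
    "Data Provenance": ["What: Data Identity", "When: Time Dimension", "Where: Data Location"],
  },
};

// Every root-to-leaf path, e.g. "Data Model > Schema-less > Tabular Model".
function leafPaths(node, prefix = []) {
  if (Array.isArray(node)) {
    // An empty array marks a category that is itself a leaf.
    return node.length === 0
      ? [prefix.join(" > ")]
      : node.map((leaf) => [...prefix, leaf].join(" > "));
  }
  return Object.entries(node).flatMap(([name, child]) => leafPaths(child, [...prefix, name]));
}

// True when the path (an array of node names starting at a branch) exists.
function hasPath(node, path) {
  if (path.length === 0) return true;
  const [head, ...rest] = path;
  if (Array.isArray(node)) return rest.length === 0 && node.includes(head);
  return Object.prototype.hasOwnProperty.call(node, head) && hasPath(node[head], rest);
}

// Partial classification of SF-BEAMS, limited to what the slides state.
const sfBeams = [
  ["Purpose of Collected Sensor Data", "Data Archival"],
  ["Data Volume", "Small"],
];
console.log(sfBeams.every((path) => hasPath(taxonomy, path))); // true
```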
3. NetBEAMS: A Case Study
 3. NetBEAMS: A Case Study

•NetBEAMS: Data collection using Data Sensor Platform (DSP)

   •Automates operation of SF-BEAMS

•SF-BEAMS: single-star sensor network – data archival

   •Nodes are fixed geographically, single-hop communication

   •Nodes produce data at intervals:
   1, 6 or 15 minutes

   •Heterogeneous Devices,
   Different Sizes, Internal Battery

   •Coverage: Tiburon coast,
   wired or wireless connection

   •1 Data Sink (RTC Labs)
3. NetBEAMS: A Case Study

•YSI 6600EDS V2: COTS Water Quality Monitoring

•13 Measurement parameters

•One year's worth of raw data

   •Max 23.99 Mb at 1/min

   •483,840 samples per year

•5 YSIs in the current deployment
                               [ysi]
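The sample count on this slide can be reproduced with simple arithmetic. One assumption here is our own reading and is not stated on the slide: 483,840 matches one sample per minute over 12 "months" of 28 days (336 days). A 365-day year would give 525,600 samples. The five-sonde figure used later in the experiments follows directly.

```javascript
// Reproduces the workload figures. Assumption (not on the slide): the
// "year" is 12 x 28 = 336 days of one-per-minute sampling.
const SAMPLES_PER_DAY = 24 * 60; // one reading per minute

function samplesFor(days, samplesPerDay = SAMPLES_PER_DAY) {
  return days * samplesPerDay;
}

const perSondeYear = samplesFor(12 * 28); // 483,840, the slide's figure
const fullYear = samplesFor(365);         // 525,600 for a calendar year
const fiveSondes = 5 * perSondeYear;      // 2,419,200 for the 5-YSI deployment

// The slower 15-minute interval over the same 336 days:
const every15Min = samplesFor(336, (24 * 60) / 15); // 32,256

console.log(perSondeYear, fullYear, fiveSondes, every15Min); // 483840 525600 2419200 32256
```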
  3. NetBEAMS: A Case Study
                                                                  
[Figure: SF-BEAMS classification within the proposed taxonomy]
 3. NetBEAMS: A Case Study
•Data Sensor Platform (DSP): OSGi-based Components
   •Data Producers, Data Consumers
3. NetBEAMS: A Case Study


                              Missing Component!

[Figure panels: Collected Data (raw YSI values, e.g. 12.20 192 179 55) and DSP Messages]
  3. NetBEAMS: A Case Study


Functional Requirements

Non-Functional Requirements

   •Open source

   •Support for data-centric storage

   •Free of charge

   •Accessibility (API)

   •Cope with RTC's small volume of data
4. Technology Selection: Empirical Analysis
  4. Technology Selection

•Technologies used in the reviewed literature

•MySQL: used in a Linux cluster for sensor networks [Nik05];

•TinyDB: regularly used for sensor networks [MFHH05, LKK07];

•mongoDB: reported in surveys of new trends [Bai09, BYV+ 09];

•DB2: used in a hybrid approach combining the XML and relational models [CZR03].
 4. Technology Selection

•Why the Relational Model and SQL? [Bai09]

   •Traditional approach: 30 years of use

   •Handles small volumes of data

•Can it accommodate constant changes?

   •Try adding new entities;

   •Try adding new columns;

•Changes to the schema mean having to:

   •Keep the schema normalized

   •Change the software layers

[Figure: taxonomy excerpts, Purpose of Collected Sensor Data (Real-Time Stream Use, Data Archival) and Data Volume (Small, Medium, Large)]
4. Technology Selection based on
   Empirical Analysis
 •Literature review: key-value pair (KVP) databases are associated with cloud computing

 •Schema-less: accommodates changes; control stays in the application layer

    •Data collections: any number of keys and indexes, no referential integrity (“denormalized” data located in the same place)

[Figure: taxonomy excerpts, Data Description (Annotation; Data Provenance: What, When, Where) and Data Model (Schema-less)]
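The contrast between normalized rows and a denormalized key-value document can be made concrete in a few lines. In the sketch below, the table and column names (`sensor_id`, `observation_id`), the IP address, and the `Battery` value are hypothetical. The document keys follow the `sensor` / `observation` layout used by the query scenarios. The helper copies the referenced sensor row into the observation, so a single lookup returns everything. Because the collection is schema-less, a newer document can carry an extra key without any table change.

```javascript
// Hypothetical normalized rows: observations reference sensors by foreign key.
const sensors = [
  { sensor_id: 1, ip_address: "192.168.0.10", latitude: 37.891611, longitude: -122.446446 },
];
const observations = [
  { observation_id: 100, sensor_id: 1, Salinity: 0.01, WaterTemperature: 46.47 },
];

// Builds one self-contained document: the sensor row is embedded, not referenced.
function denormalize(obs, sensorRows) {
  const s = sensorRows.find((row) => row.sensor_id === obs.sensor_id);
  if (!s) throw new Error(`no sensor with id ${obs.sensor_id}`);
  // Everything except the two ids is a measured value.
  const { observation_id, sensor_id, ...values } = obs;
  return {
    message_id: observation_id,
    sensor: {
      ip_address: s.ip_address,
      location: { latitude: s.latitude, longitude: s.longitude },
    },
    observation: values,
  };
}

const doc = denormalize(observations[0], sensors);

// Schema-less: a newer document can carry an extra key without any ALTER TABLE.
const newer = { ...doc, observation: { ...doc.observation, Battery: 12.2 } };
console.log(Object.keys(newer.observation));
```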
4. Technology Selection based on
   Empirical Analysis
•KVP Databases Better Support Horizontal Data Partitioning

    •Data-Centric Storage (concentration of data with similar characteristics) [SRK+ 03]

[Figure: Tiburon, CA, with taxonomy excerpts Database System Organization and Location of the Sensor Data]
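Horizontal partitioning can be illustrated with range-based routing. In range-based sharding, as mongoDB uses, each shard owns a contiguous range of a shard key. The sketch imitates only the routing idea in plain JavaScript and is not the mongoDB implementation. The shard names, the split points, and the choice of latitude as the shard key are all hypothetical. The sketch shows why "less data in a shard" helps: an equality query on the shard key scans a single shard.

```javascript
// Range partitioning sketch: n ascending split points define n + 1 shards.
class RangeRouter {
  constructor(keyOf, boundaries, shardNames) {
    if (shardNames.length !== boundaries.length + 1) {
      throw new Error("need exactly one more shard than boundaries");
    }
    this.keyOf = keyOf;               // extracts the shard key from a document
    this.boundaries = boundaries;
    this.names = shardNames;
    this.shards = new Map(shardNames.map((name) => [name, []]));
  }

  // Name of the shard whose range [lower, upper) contains the key.
  shardFor(key) {
    let i = 0;
    while (i < this.boundaries.length && key >= this.boundaries[i]) i++;
    return this.names[i];
  }

  insert(doc) {
    this.shards.get(this.shardFor(this.keyOf(doc))).push(doc);
  }

  // An equality query on the shard key only has to scan one shard.
  find(key) {
    return this.shards.get(this.shardFor(key)).filter((doc) => this.keyOf(doc) === key);
  }

  sizes() {
    return Object.fromEntries([...this.shards].map(([name, docs]) => [name, docs.length]));
  }
}

// Hypothetical setup: latitude as shard key, split into three bands.
const router = new RangeRouter(
  (doc) => doc.sensor.location.latitude,
  [37.85, 37.9],
  ["shard-south", "shard-middle", "shard-north"]
);
router.insert({ sensor: { location: { latitude: 37.891611 } }, observation: { Salinity: 0.01 } });
router.insert({ sensor: { location: { latitude: 37.8 } }, observation: { Salinity: 0.02 } });
router.insert({ sensor: { location: { latitude: 37.95 } }, observation: { Salinity: 0.03 } });
console.log(router.sizes());
```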
4. Technology Selection based on
   Empirical Analysis
5. DSP Data Sensor Platform:
  Design and Architecture
 5. DSP Data Sensor Platform:
    Design and Architecture
Analysis of Data Persistence for NetBEAMS:
UML Business Process Diagram 
5. DSP Data Sensor Platform:
   Design and Architecture

                         Where


                                 When




                      What
5. DSP Data Sensor Platform:
   Design and Architecture
                       Adding DSP Data Component




                        Adding mongoDB
5. DSP Data Sensor Platform:
   Design and Architecture

DSP MongoCRUDService UML Class Diagram
                                         •CRUD Service
                                           •Create
                                           •Retrieve
                                           •Update
                                           •Delete

                                         •Dependencies on
                                          YSI Data Type
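The four CRUD operations and the dependency on the YSI data type can be sketched with an in-memory stand-in. The real component (MongoCRUDService) talks to mongoDB through the Java driver. The class name `InMemoryCrudService`, its predicate-based method signatures, and the `requireSondeShape` check are hypothetical illustrations, not the DSP API.

```javascript
// Hypothetical in-memory stand-in for the CRUD service.
class InMemoryCrudService {
  constructor(validate) {
    this.validate = validate;   // checks a document against the expected data type
    this.docs = [];
    this.nextId = 1;
  }
  create(doc) {
    this.validate(doc);
    const stored = { _id: this.nextId++, ...doc };
    this.docs.push(stored);
    return stored._id;
  }
  retrieve(predicate) {
    return this.docs.filter(predicate);
  }
  update(predicate, changes) {
    let count = 0;
    this.docs = this.docs.map((d) => {
      if (!predicate(d)) return d;
      count++;
      return { ...d, ...changes };   // shallow merge of the new fields
    });
    return count;
  }
  remove(predicate) {
    const before = this.docs.length;
    this.docs = this.docs.filter((d) => !predicate(d));
    return before - this.docs.length;
  }
}

// Assumed minimal check standing in for the YSI data-type dependency.
function requireSondeShape(doc) {
  for (const key of ["message_id", "sensor", "observation", "time"]) {
    if (!(key in doc)) throw new Error(`missing key: ${key}`);
  }
}

const service = new InMemoryCrudService(requireSondeShape);
const id = service.create({ message_id: "m1", sensor: {}, observation: { Salinity: 0.01 }, time: {} });
service.update((d) => d._id === id, { tag: "oil spill" });
console.log(service.retrieve((d) => d.tag === "oil spill").length); // 1
```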
 5. DSP Data Sensor Platform:
    Design and Architecture
•Deployment of the DSP Data Persistence

   •As external storage on a single server

   •As a data-centric distributed server, using mongoDB database shards.




                 UML Deployment Diagrams
6. Experimental Results: Correct Behavior and Performance
6. Experimental Results: Correct Behavior and Performance
•Goal: Can the proposed data model be deployed in the real world?

•Experiment Setup - Infrastructure

   •Key-value definition: mongoDB Java Driver;

   •Random YSI Sonde data generation: mongoDB Java Driver (R0);

   •Simulation of the External Storage (single mongoDB server) and Data-Centric (mongoDB cluster/shards) deployments, using virtualization technology;

•Workload

   •Data volume compatible with that used by RTC

       •1 YSI = 483,840 documents = first round

       •5 YSIs = 5 × 483,840 = 2,419,200 documents = consecutive rounds
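The R0 step generated random YSI Sonde documents through the mongoDB Java Driver. The JavaScript sketch below imitates only the generation side and leaves out the database insert. The document keys follow the data-model layout (`message_id`, `sensor`, `observation`, `time.valid`, `time.transaction`). The value ranges, IP address, coordinates, start date, and seed are arbitrary assumptions. A seeded generator (the well-known mulberry32 PRNG) makes each round reproducible.

```javascript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let state = seed | 0;
  return function next() {
    state = (state + 0x6d2b79f5) | 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// One random sonde document; readings are one minute apart.
function randomSondeDoc(rand, i, start) {
  const valid = new Date(start.getTime() + i * 60000);
  return {
    message_id: `msg-${i}`,
    sensor: {
      ip_address: "192.168.0.10",                                // hypothetical
      location: { latitude: 37.891611, longitude: -122.446446 },
    },
    observation: {                                               // arbitrary ranges
      WaterTemperature: +(40 + rand() * 30).toFixed(2),
      Salinity: +(rand() * 35).toFixed(2),
      Battery: +(11 + rand() * 2).toFixed(1),
    },
    // Arrival 0-119 s after collection.
    time: { valid, transaction: new Date(valid.getTime() + Math.floor(rand() * 120) * 1000) },
  };
}

// One workload round, e.g. generateRound(483840, seed, start) for a single YSI.
function generateRound(count, seed, start) {
  const rand = mulberry32(seed);
  return Array.from({ length: count }, (_, i) => randomSondeDoc(rand, i, start));
}

console.log(generateRound(2, 42, new Date(Date.UTC(2009, 10, 1)))[0].observation);
```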
6. Experimental Results: Correct Behavior and Performance
•Scenarios: use cases as Agile user stories (Persona, Action, Result)
•(R1) “As a marine biologist, I would like to search observations by filtering values of the sensor device’s properties such as water temperature and salinity, so that I can find the specific values associated with the observation.”;

db.SondeDataContainer.find( { "observation.Salinity" : 0.01, "observation.WaterTemperature" : 46.47 } )

•(R2) “As an oceanographer, I would like to search observations that took place at the geographic coordinates (37.891611, -122.446446), so that I can assess the area around the given coordinates.”;

db.SondeDataContainer.find( { "sensor.location.latitude" : 37.891611, "sensor.location.longitude" : -122.446446 } )

•(U1) “As an estuarine ecologist, I would like to annotate observations from the time the ‘oil spill’ occurred in the San Francisco Bay, so that I can maintain historical evidence of the impact of such an event.”

db.SondeDataContainer.update( { "time.valid" : { "$gte" : new Date(2009,11,12), "$lt" : new Date(2009,11,13) } },
                              { $set : { tag : "oil spill" } },
                              { multi : true } )
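To make the semantics of these calls concrete without a database, here is a plain-JavaScript imitation of the subset they use: equality on dotted keys, plus the `$gte` and `$lt` range operators. The helpers `getPath`, `matches`, `find`, and `updateManySet` are hypothetical and are not the mongoDB query engine. Two details are worth noticing. First, JavaScript `Date` months are zero-based, so `new Date(2009,11,12)` is 12 December 2009. Second, mongoDB's `update()` changes only the first matching document unless the `multi` option is set, so tagging every observation in a time window needs a multi-document update.

```javascript
// Reads a dotted path such as "sensor.location.latitude".
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// Equality on dotted keys plus the $gte / $lt operators used in the scenarios.
function matches(doc, query) {
  return Object.entries(query).every(([path, cond]) => {
    const value = getPath(doc, path);
    const isOperatorObject =
      cond !== null && typeof cond === "object" && !(cond instanceof Date) &&
      Object.keys(cond).some((k) => k.startsWith("$"));
    if (!isOperatorObject) {
      return cond instanceof Date
        ? value instanceof Date && value.getTime() === cond.getTime()
        : value === cond;
    }
    return Object.entries(cond).every(([op, bound]) => {
      if (value === undefined) return false;
      if (op === "$gte") return value >= bound;
      if (op === "$lt") return value < bound;
      throw new Error(`unsupported operator ${op}`);
    });
  });
}

const find = (collection, query) => collection.filter((doc) => matches(doc, query));

// Like update(query, {$set: fields}, {multi: true}) for top-level fields only.
function updateManySet(collection, query, fields) {
  let modified = 0;
  for (const doc of collection) {
    if (matches(doc, query)) {
      Object.assign(doc, fields);
      modified++;
    }
  }
  return modified;
}

// Hypothetical sample documents.
const docs = [
  { observation: { Salinity: 0.01, WaterTemperature: 46.47 },
    sensor: { location: { latitude: 37.891611, longitude: -122.446446 } },
    time: { valid: new Date(2009, 11, 12, 9, 30) } },   // 12 December 2009
  { observation: { Salinity: 0.02, WaterTemperature: 50.1 },
    sensor: { location: { latitude: 37.9, longitude: -122.45 } },
    time: { valid: new Date(2009, 11, 13, 8, 0) } },    // 13 December 2009
];

const r1 = find(docs, { "observation.Salinity": 0.01, "observation.WaterTemperature": 46.47 });
const annotated = updateManySet(
  docs,
  { "time.valid": { $gte: new Date(2009, 11, 12), $lt: new Date(2009, 11, 13) } },
  { tag: "oil spill" }
);
console.log(r1.length, annotated); // 1 1
```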
6. Experimental Results: Correct Behavior and Performance

•Implementation of diverse use cases:

Scenario Type   Language Used   mongoDB Method Call              Use Cases Implemented
Create          Java            db.SondeDataContainer.insert()   R0
Retrieve        JavaScript      db.SondeDataContainer.find()     R1, R2, R3
Update          JavaScript      db.SondeDataContainer.update()   U1
Delete          JavaScript      db.SondeDataContainer.remove()   D1
6. Experimental Results: Correct Behavior and Performance
•Implementation fulfills all the taxonomies

   •1.35 GB of claimed disk space for the volume used; ~25,091 inserts/min

•Simpler implementation of use cases (programming language vs. SQL)

   •Easier data access for programmers and non-experts;

   •Different APIs in different languages

•No database schema: allows changes without refactoring;

•Researchers can share exported data (OPeNDAP)

•No “data plumbing” as required with SQL

•Produced data is reusable and ready for integration with Internet applications;
6. Experimental Results: Correct Behavior and Performance
•The data-centric approach can scale with the available disk space and reduce processing time;

   •Less data in a shard means faster query processing

•Novel approach: no use of it was found in the current literature;

   •New Data Model Taxonomy
7. Conclusions and Future Work
7. Conclusions and Future Work

•The importance of data collection from sensor devices

   •Environmental Sensor Networks: Hazard Alerts

   •How to describe data: Data Provenance guidelines

   •Which data model to use: KVP as a better option

   •Important descriptions: annotations, tags


•Contributions of this work

   •Taxonomy of Data Persistence in Sensor Networks

   •Novel approach to data persistence for sensor networks

   •Implementation design for external or data-centric approaches
   •Implementation Design for External or Data-Centric Approaches
7. Conclusions and Future Work

•Future Work

   •Data-Centric Deployment with MapReduce Application (Technical
   Exploration with mongoDB MapReduce API)
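The MapReduce idea behind this future-work item can be sketched without any database. The sketch below is a plain-JavaScript imitation and not the mongoDB MapReduce API. It computes the average water temperature per sensor. Each shard is reduced on its own, and the partial results are then combined, which is only valid because the reduce step is associative. The job itself, the shard contents, and the IP addresses are hypothetical.

```javascript
// Groups emitted values by key and folds each group with reduce.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);
  const out = {};
  for (const [key, values] of groups) out[key] = values.reduce(reduce);
  return out;
}

// Merges per-shard partial results; valid because reduce is associative.
function combine(partials, reduce) {
  const out = {};
  for (const partial of partials) {
    for (const [key, value] of Object.entries(partial)) {
      out[key] = Object.prototype.hasOwnProperty.call(out, key) ? reduce(out[key], value) : value;
    }
  }
  return out;
}

// Example job: average water temperature per sensor (keyed by IP address).
const mapTemp = (doc, emit) =>
  emit(doc.sensor.ip_address, { sum: doc.observation.WaterTemperature, count: 1 });
const sumCounts = (a, b) => ({ sum: a.sum + b.sum, count: a.count + b.count });
const average = (r) => r.sum / r.count;

const shardA = [
  { sensor: { ip_address: "10.0.0.1" }, observation: { WaterTemperature: 10 } },
  { sensor: { ip_address: "10.0.0.2" }, observation: { WaterTemperature: 30 } },
];
const shardB = [
  { sensor: { ip_address: "10.0.0.1" }, observation: { WaterTemperature: 20 } },
];

const merged = combine(
  [mapReduce(shardA, mapTemp, sumCounts), mapReduce(shardB, mapTemp, sumCounts)],
  sumCounts
);
console.log(average(merged["10.0.0.1"])); // 15
```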
7. Conclusions and Future Work

•Future Work

   •RTC gathers data periodically; much of the data is repeated

      •[WX06] surveys schedulers for sensor networks;

      •[YG08, CHZ09] show the use of data clustering before sending data to the data sink;

      •Creation of a DSP Data Clustering component that runs before data is persisted;

   •Event-based application built on top of YSI Sonde data

      •The “observation.Battery” key carries the battery lifetime;
      •Scheduler system that shows which devices need assistance.
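The battery-driven scheduler could start from a query as simple as the one sketched here. The function keeps the latest reading per sensor and flags the sensors whose `observation.Battery` value is below a threshold. The threshold, its unit, the IP addresses, and the function name `devicesNeedingAssistance` are hypothetical. The slides only say that the `observation.Battery` key carries the battery lifetime.

```javascript
// Keeps the most recent document per sensor, then flags low batteries.
function devicesNeedingAssistance(docs, threshold) {
  const latest = new Map();
  for (const doc of docs) {
    const id = doc.sensor.ip_address;
    const seen = latest.get(id);
    if (!seen || doc.time.valid > seen.time.valid) latest.set(id, doc);
  }
  return [...latest.values()]
    .filter((doc) => doc.observation.Battery < threshold)
    .map((doc) => ({ ip_address: doc.sensor.ip_address, battery: doc.observation.Battery }))
    .sort((a, b) => a.battery - b.battery); // most urgent first
}

const at = (hour) => new Date(Date.UTC(2009, 11, 12, hour)); // hypothetical times
const batteryReadings = [
  { sensor: { ip_address: "192.168.0.10" }, observation: { Battery: 12.5 }, time: { valid: at(8) } },
  { sensor: { ip_address: "192.168.0.10" }, observation: { Battery: 10.9 }, time: { valid: at(9) } },
  { sensor: { ip_address: "192.168.0.11" }, observation: { Battery: 9.0 }, time: { valid: at(8) } },
  { sensor: { ip_address: "192.168.0.11" }, observation: { Battery: 12.4 }, time: { valid: at(9) } },
];

// Only the latest reading counts: .11 recovered, .10 dropped below the threshold.
console.log(devicesNeedingAssistance(batteryReadings, 11.5));
```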
  References

          
•[PJS+ 09] Arno Puder, Teresa Johnson, Kleber Sales, Marcello de Sales, and Dale Davidson. A component-based sensor network for environmental monitoring. In SNA-2009: 1st International Conference on Sensor Networks and Applications, pages 54–60, San Francisco, CA, USA, 2009. The International Society for Computers and Their Applications (ISCA).

•[ASSC02] I. F. Akyildiz, Weilian Su, Y. Sankarasubramaniam, and E. Cayirci. A survey on sensor networks. IEEE Communications Magazine, 40(8):102–114, Aug 2002.




•Images
•[sd] http://www.zess.uni-siegen.de/ipp_home/ipp/research/master-student-topics/
•[snc] http://www.dei.unipd.it/~schenato/pics/SensorNetwork.jpg
•[ns] http://www.imagingnotes.com/ee_assets/enews/SensorWebImageForEnewsJuly2.jpg
•[an] http://eurekr.com/pics/AnnotatinganImageinWPF_A7D8/image.png
•[ysi] http://www.ckjorc.org/cn/admin/news/edit/UploadFile/200681616301130.jpg
                                    http://www.netbeams.org
                          http://code.google.com/p/netbeams


  A Key-Value-based Persistence Model
          for Sensor Networks
          Marcello de Sales
Master of Science in Computer Science
          (msales@sfsu.edu)
                                                                                 ?
           Department of Computer Science




“The brick walls are not there to keep us out. The brick walls are there to give us a chance to show how badly we
         want something. Because the brick walls are there to stop the people who don't want it badly enough.”
                                                                                                Dr. Randy Pausch
DSP in practice = NetBEAMS
Use Cases
 •Data Payload for the YSI Sonde 6600V2

    •SondeDataType: representation for the collected data

    •SondeDataContainer: collection of the collected data
Data Sensor Platform (DSP)
Message Structure
 •DSP Message

    •Header

       •Producer

       •Consumer

    •Body

       •Message Content


 •DSP Messages Container

    •Package of DSP Messages
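The message structure above (a header naming producer and consumer, a body carrying the content, and a container packaging several messages) can be sketched as plain objects. The function names, the `null` convention for an unspecified consumer, and the idea of capping container size are hypothetical illustrations, not the DSP's Java classes.

```javascript
// A DSP-style message: header (producer, consumer) plus body (content).
function createMessage(producer, consumer, content) {
  return {
    header: { producer, consumer: consumer ?? null }, // consumer may be unspecified
    body: { content },
  };
}

// Packages messages into containers of at most maxSize, preserving order.
function packageMessages(messages, maxSize) {
  if (!Number.isInteger(maxSize) || maxSize < 1) {
    throw new RangeError("maxSize must be a positive integer");
  }
  const containers = [];
  for (let i = 0; i < messages.length; i += maxSize) {
    containers.push({ messages: messages.slice(i, i + maxSize) });
  }
  return containers;
}

const messages = [1, 2, 3, 4, 5].map((n) => createMessage("Component A", undefined, { reading: n }));
// Containers are plain data, so they survive a JSON round trip unchanged.
console.log(JSON.stringify(packageMessages(messages, 2)[2]));
```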
Data Sensor Platform (DSP)
Communication Mechanism
 •DSP Broker

    •Local delivery

    •Remote delivery

       •Gateway Component




 •DSP Matcher

    •Filtering based on rules

    •Independent per host

    PRODUCER      CONSUMER      TARGET        GATEWAY
    Component A   -             Component X   -
    Component B   Component C   Component C   -
    Component C   -             Component Y   Component Z (192.168.0.11)
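One way to read the matcher table is sketched below. A rule applies when the message's producer matches and the rule's consumer is either unspecified ("-") or equal to the message's consumer. The rule's gateway then decides between local delivery and remote delivery through the gateway component. Both this interpretation and the encoding (`null` for "-") are assumptions, not the DSP Matcher's actual implementation.

```javascript
// Rules from the matcher table; null consumer = any consumer,
// null gateway = local delivery. This encoding is hypothetical.
const rules = [
  { producer: "Component A", consumer: null, target: "Component X", gateway: null },
  { producer: "Component B", consumer: "Component C", target: "Component C", gateway: null },
  { producer: "Component C", consumer: null, target: "Component Y",
    gateway: { component: "Component Z", host: "192.168.0.11" } },
];

// Returns every delivery the rules prescribe for one message header.
function route(header, ruleTable) {
  return ruleTable
    .filter((r) => r.producer === header.producer &&
                   (r.consumer === null || r.consumer === header.consumer))
    .map((r) => ({
      target: r.target,
      delivery: r.gateway === null ? "local" : "remote",
      via: r.gateway,
    }));
}

console.log(route({ producer: "Component C", consumer: null }, rules));
```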
DSP Data Persistence Component
DSP Data Persistence Component
 3. NetBEAMS: A Case Study
•NetBEAMS Gateway Node
   •YSI Sonde + Gumstix Embedded System + GSM Modem




        Centralized Data Sink
             RTC Labs
  3. NetBEAMS: A Case Study


   DSP Data Persistence component Requirements

•Open-Source

•Support Data-Centric

•Free of charge

•Accessibility (API)

•Cope with RTC's small volume of data
3. NetBEAMS: A Case Study




                            Missing
                            Database
                             System
4. Technology Selection

                          Normalized
                          Relational Model
                          Example



State Instance
4. Technology Selection based on
   Empirical Analysis
 •Alternative Approach: Key-Value Pairs Approach = Sensor Attributes

   •Data concentrated in one single table

   •Does not scale on a single server

   •Hard to use
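Storing sensor attributes as key-value rows inside one relational table resembles the entity-attribute-value (EAV) pattern. The "hard to use" point shows up as soon as you reassemble an observation: every read needs a pivot (a grouping or self-join in SQL). The sketch below illustrates that pivot in JavaScript with hypothetical column names (`observation_id`, `attribute`, `value`).

```javascript
// Hypothetical EAV rows: one row per (observation, attribute) pair.
const eavRows = [
  { observation_id: 1, attribute: "Salinity", value: 0.01 },
  { observation_id: 1, attribute: "WaterTemperature", value: 46.47 },
  { observation_id: 2, attribute: "Salinity", value: 0.02 },
];

// Reassembles one object per observation by grouping rows on observation_id.
function pivot(rows) {
  const byId = new Map();
  for (const { observation_id, attribute, value } of rows) {
    if (!byId.has(observation_id)) byId.set(observation_id, { observation_id });
    byId.get(observation_id)[attribute] = value;
  }
  return [...byId.values()];
}

console.log(pivot(eavRows));
```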
5. DSP Data Sensor Platform:
   Design and Architecture
•Data Model Based on Data Provenance Taxonomy

    
   •What: identifies what was collected; tracks the DSP Message
      •message_id
      •observation.[raw-data-1, raw-data-2, …, raw-data-n]
         where each raw-data-i is an attribute of the given sensor;

   •Where: defines the coverage of the data
      •sensor
          •ip_address
          •location
              •latitude
              •longitude

   •When: tracks when the data was collected and downloaded
      •time
          •transaction
          •fact
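The What/Where/When layout can be turned into a small document builder. The slide names the collection-time key `fact`, while the scenario queries use `time.valid`; this sketch follows the queries. The helper names, the message id, and the IP address are hypothetical.

```javascript
// Builds one document in the What / Where / When layout (hypothetical helper).
function buildSondeDocument(p) {
  return {
    message_id: p.messageId,                   // What: links back to the DSP Message
    observation: { ...p.readings },            // What: raw-data-1 ... raw-data-n
    sensor: {
      ip_address: p.ipAddress,                 // Where: which device produced it
      location: { latitude: p.latitude, longitude: p.longitude },
    },
    time: {
      valid: p.validTime,                      // When: collected by the sensor
      transaction: p.transactionTime,          // When: downloaded / stored
    },
  };
}

// Reads the three provenance dimensions back out of a document.
function provenance(doc) {
  return {
    what: Object.keys(doc.observation),
    where: [doc.sensor.location.latitude, doc.sensor.location.longitude],
    when: doc.time.valid,
  };
}

const example = buildSondeDocument({
  messageId: "msg-1",                          // hypothetical id
  readings: { Salinity: 0.01, WaterTemperature: 46.47 },
  ipAddress: "192.168.0.10",                   // hypothetical address
  latitude: 37.891611,
  longitude: -122.446446,
  validTime: new Date(Date.UTC(2009, 11, 12, 9, 30)),
  transactionTime: new Date(Date.UTC(2009, 11, 12, 9, 31)),
});
console.log(provenance(example).what);
```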
5. DSP Data Sensor Platform:
   Design and Architecture

DSP Data Persistence Component Activator UML Class Diagram

   •Collected messages are delivered to the DSP Data Persistence component

   •Component bootstrap

   •Measurement messages from sensor devices
5. DSP Data Sensor Platform:
   Design and Architecture

DSP Data Flusher UML Class Diagram

   •Concurrent thread
      •Are there transient messages to be flushed?
      •If so, collect them and send them to the CRUD service
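The flusher's loop can be sketched as a buffer that is drained periodically. JavaScript has no threads in this sense, so a timer stands in for the concurrent thread. The class name, the `sink` callback (imagined as a call into the CRUD service), and the retry-on-failure behaviour are assumptions, not the DSP component's design.

```javascript
// Buffers transient messages and periodically hands them to a sink.
class DataFlusher {
  constructor(sink, intervalMs = 1000) {
    this.sink = sink;              // e.g. a function that persists a batch (assumed)
    this.intervalMs = intervalMs;
    this.pending = [];
    this.timer = null;
  }

  enqueue(message) {
    this.pending.push(message);
  }

  // Are there transient messages? If so, send the whole batch to the sink.
  flush() {
    if (this.pending.length === 0) return 0;
    const batch = this.pending;
    this.pending = [];
    try {
      this.sink(batch);
    } catch (err) {
      this.pending = batch.concat(this.pending); // keep them for the next attempt
      throw err;
    }
    return batch.length;
  }

  // A timer plays the role of the concurrent thread.
  start() {
    if (this.timer) return;
    this.timer = setInterval(() => {
      try { this.flush(); } catch (err) { /* retried on the next tick */ }
    }, this.intervalMs);
  }

  stop() {
    clearInterval(this.timer);
    this.timer = null;
  }
}

const persisted = [];
const flusher = new DataFlusher((batch) => persisted.push(...batch));
flusher.enqueue({ message_id: "m1" });
flusher.enqueue({ message_id: "m2" });
console.log(flusher.flush(), persisted.length); // 2 2
```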
 6. Experimental Results: Correct Behavior and Performance

•(R3) “As a marine biologist, I would like to search observations that took place last week, so that I can assess past environmental conditions.”;


db.SondeDataContainer.find( { "time.valid" : { $gte : new Date(2009,11,8), $lt : new Date(2009,11,15) } } )



•(D1) “As an oceanographer, I would like to remove specific observations collected yesterday, so that the research group does not use ‘junk data’.” (Delete)
•(R4) “As a biologist, I would like to export the collected data produced during this month using the OPeNDAP data format, so that I can collaborate with other research groups that use this data format.”; (Export)

•“As a scientist from RTC, I would like to analyze the observed data from yesterday using a spreadsheet, so that I can verify measurements using Microsoft Excel.” (Export)

•Access the data through APIs in programming languages: “As a marine biologist who learned the Python scripting language, I would like to write software that reads; (Programming)
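The R3 and D1 scenarios both depend on relative date ranges ("last week", "yesterday"), and R3 hard-codes one such range. The hypothetical helpers below compute the bounds from the current date, using the same `$gte` / `$lt` shape as the queries. Building dates from calendar fields (rather than subtracting milliseconds) keeps day boundaries at local midnight across daylight-saving changes. With "now" set to 15 December 2009, `lastWeekRange` reproduces R3's bounds, `new Date(2009,11,8)` to `new Date(2009,11,15)`, where months are zero-based. If loaded into a shell, a range could be used as `{ "time.valid": lastWeekRange(new Date()) }`.

```javascript
// Local midnight of the given date.
function startOfDay(d) {
  return new Date(d.getFullYear(), d.getMonth(), d.getDate());
}

// n days before the given date, at local midnight (day overflow is normalized).
function daysBefore(date, n) {
  return new Date(date.getFullYear(), date.getMonth(), date.getDate() - n);
}

// The seven days before today: [today - 7, today).
function lastWeekRange(now) {
  const today = startOfDay(now);
  return { $gte: daysBefore(today, 7), $lt: today };
}

// The whole of yesterday: [today - 1, today).
function yesterdayRange(now) {
  const today = startOfDay(now);
  return { $gte: daysBefore(today, 1), $lt: today };
}

const r3 = lastWeekRange(new Date(2009, 11, 15, 14, 30));
console.log(r3.$gte.getDate(), r3.$lt.getDate()); // 8 15
```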
6. Experimental Results: Correct Behavior and Performance
•Execution through script plus a Java class for data insertion

•Measurements recorded in log files with provenance information

   •Execution time, memory used
   •Execution steps
   •Database server and client execution

•Measured Results

   •Claimed disk space (indexed, long keys): 278.33 MB for 1 YSI; 1.35 GB for 5 YSIs

   •Insertion average: decreased after the execution of the third round, from ~25,091 to ~8,013 documents per minute
DSP Data Persistence Component

						