http://www.netbeams.org
http://code.google.com/p/netbeams
A Key-Value-based Persistence Model
for Sensor Networks
Candidate: Marcello Alves de Sales Junior
Master of Science in Computer Science
Advisor: Prof. Arno Puder, Ph.D.
Committee Chair: Prof. Marguerite Murphy, Ph.D.
Department of Computer Science
Outline
1. Motivation and State of the Art in Data Persistence for SN
2. Data Persistence in Sensor Networks: a Proposed Taxonomy
3. NetBEAMS: A Case Study
4. Technology Selection: Empirical Analysis
5. DSP Data Sensor Platform: Design and Architecture
6. Experimental Results: Correct Behavior and Performance
7. Conclusions and Future Works
8. References
1. Motivation
•NetBEAMS: a component-based sensor network for environmental monitoring [PJS+ 09]
•Open Problem: Data Persistence for collected data
•How is Data Persistence done in the community?
•Biologists are the main users of the system
•No skills in database systems;
•But they do have skills in programming languages (MATLAB)
•What types of database systems?
•Use the traditional relational data model?
•[BDD09] proposes domain-specific programming languages
1. State of the Art in Data
Persistence for Sensor Networks
•Sensor Networks [ASSC02, RM04]
•Infrastructure (Topology) and Node Types
•Size, Lifetime and Quality-of-Service [ASSC02]
[wp]
•Sensor Network Nodes [RM04]
•Deployment and Mobility
•Resources, Cost, Energy, Size, Heterogeneity
•Communication Mode
•Coverage and Connectivity
[snc]
1. State of the Art in Data
Persistence for Sensor Networks
•Persistence Storage for Sensor Networks [SRK+ 03]
•How the Collected Data is Used
•Real-time Data Stream [LNH05, SLM06]
•Data Archival [SLM06, DML03]
[sd]
•Storage Location for Collected Data
•Local or External [SRK+ 03]
•Data-Centric [SRK+ 03, SLM06]
•Query Processing Used
•In-Network [SRK+ 03, KGH07]
•Centralized [KGH07]
•Data Volume Produced
[ns]
1. State of the Art in Data
Persistence for Sensor Networks
•Data Models and Query Engines [Olk09]
•Tabular Data Model: .csv Files (Subversion)
•Text Comparison
•No Index
•Relational Data Model: binary files [LKK07]
•Relational Algebra
•Data Normalization
•Structured Query Language (SQL)
•Modified SQL for Sensor Networks
•Structured Data Model: [HRN+ 08, OCR+ 09]
•XML Schema: document structure
•XML XPath: query
Figure labels: Data Sink, Database System
1. State of the Art in Data
Persistence for Sensor Networks
•How is Collected Data Described? Data Provenance [LNH05]
•Data Stream: 334 55.45 -23.44 119.394 44 1 22 | 5
•Metadata: Data about data
•What was collected?
•Temperature = 54.3 : data
•Scale = 'fahrenheit' : metadata
•When was the data collected?
•Valid Time = Collected at 10:34am
•Transaction Time = Time to Arrive
•From where was the data collected?
•GPS Coordinates: (12.342, -145.304)
•Site: 'lower-pier' : metadata
1. State of the Art in Data
Persistence for Sensor Networks
•Problem: recent oil spill in the San Francisco Bay (Oct 2009) [sfb09]
•What changes in the collected data?
•Are there correlations between the collected data and the oil spill?
"Polluted" Data
•How to describe the event?
•Data Annotation
•Descriptive Metadata
•Describing Video Frames from sensor cameras [lcsd09]
•Tags for Web 2.0, YouTube Video Tag
[an]
2. Data Persistence in Sensor
Networks: a Proposed Taxonomy
2. Data Persistence in Sensor
Networks Taxonomy
•Taxonomy: from Greek τάξις, taxis (meaning 'order', 'arrangement') and νόμος, nomos (meaning 'law', 'science')
•practice and science of classification,
•using taxonomic units known as taxa (singular taxon)
•Represented by hierarchical diagrams
•Relationships between the root and branches
2. Data Persistence in Sensor
Networks Taxonomy
Taxonomy diagram (root and branches):
•Data Volume: Small, Medium, Large
•Database System Organization: Distributed System (Replication, Database Partition), Centralized System
•Query Processing Mechanism: In-Network Query Processing, Centralized Query Processing
•Purpose of Collected Sensor Data: Real-Time Data Stream, Data Archival Use
•Location of the Sensor Data: Local Storage, External Storage, Data-Centric Storage
•Data Model: Schema-Dependent, Schema-less; Relational Model, Structured Model, Tabular Model
•Data Description: Data Annotations, Data Provenance (What: Data Identity, When: Time Dimension, Where: Data Location)
3. NetBEAMS: A Case Study
•NetBEAMS: Data collection using Data Sensor Platform (DSP)
•Automates operation of SF-BEAMS
•SF-BEAMS: single-star sensor network – data archival
•Nodes are fixed geographically, single-hop communication
•Nodes produce data at intervals: 1, 6 or 15 minutes
•Heterogeneous Devices, Different Sizes, Internal Battery
•Coverage: Tiburon coast, wired or wireless connection
•1 Data Sink (RTC Labs)
3. NetBEAMS: A Case Study
•YSI 6600EDS V2: COTS Water Quality Monitoring
•13 Measurement parameters
•1 Year worth of raw data
•Max 23.99 Mb at 1/min
•483,840 samples per year
•5 YSI in current deployment
[ysi]
3. NetBEAMS: A Case Study
SF-BEAMS Classification (taxonomy diagram): Purpose of Collected Sensor Data; Location of the Sensor Data; Database System Organization; Query Processing Mechanism; Data Model; Data Volume; Data Description
3. NetBEAMS: A Case Study
•Data Sensor Platform (DSP): OSGi-based Components
•Data Producers, Data Consumers
3. NetBEAMS: A Case Study
Missing Component!!!!
Collected Data: 12.20 192 179 55 88.40 0.09 0.084 0.059 7.98 -79.6 99.5 8.83 0.4 8.7
DSP Messages
3. NetBEAMS: A Case Study
Functional Requirements / Non-Functional Requirements
•Open-Source
•Support Data-Centric
•Free of charge
•Accessibility (API)
•Cope with RTC Small Volume of Data
4. Technology Selection
Empirical Analysis
4. Technology Selection
•Technologies used by the literature reviewed
•MySQL: used in Linux cluster for sensor networks [Nik05];
•TinyDB: regularly used for sensor networks [MFHH05, LKK07];
•mongoDB: reported in new trends surveys [Bai09, BYV+ 09];
•DB2: used as a hybrid approach of XML and Relational models [CZR03].
4. Technology Selection
•Why Relational Model, SQL? [Bai09]
•Traditional approach: 30 years
•Handles Small Volume of Data
•Accommodate constant changes?
•Try adding new entities;
•Try adding new columns;
•Changes to the schema
•Maintain schema normalized
•Change Software Layers
Diagram labels: Purpose of Collected Sensor Data (Real-Time Data Stream, Data Archival Use); Data Volume (Small, Medium, Large)
4. Technology Selection based on
Empirical Analysis
•Literature Review: Key-Value Pair Databases = Cloud Computing
•Schema-less: Accommodates Changes, Application Layer control
•Data Collections: Any Number of Keys, Indexes, No Referential Integrity ("denormalized" data located in the same place)
Diagram labels: Data Description (Data Annotation; Data Provenance: What: Data Identity, When: Time Dimension, Where: Data Location); Data Model (Schema-less)
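The schema-less point above can be illustrated with a minimal in-memory sketch. It does not use mongoDB; the `collection` array, the `withKey` helper and all values are assumptions made only for illustration.

```javascript
// Hypothetical in-memory "collection": documents need not share the same keys.
const collection = [];

// A first observation with two measured properties (illustrative values).
collection.push({ observation: { WaterTemperature: 46.47, Salinity: 0.01 } });

// A later observation adds a new key and a tag; no schema change is required.
collection.push({
  observation: { WaterTemperature: 47.1, Salinity: 0.02, Battery: 11.9 },
  tag: "oil spill",
});

// The application layer decides how to treat documents that lack a key.
function withKey(docs, key) {
  return docs.filter((doc) => key in doc.observation);
}

const withBattery = withKey(collection, "Battery");
console.log(withBattery.length);
```

Older documents stay untouched when a new key appears, which is the kind of change that would require a schema migration in a normalized relational design.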
4. Technology Selection based on
Empirical Analysis
•KVP Databases Better Support Horizontal Data Partitioning
•Data-Centric Storage (Concentration of data with similar characteristics) [SRK+ 03]
Diagram labels: Tiburon, CA; Database System Organization (Distributed System: Replication, Database Partition; Centralized System); Location of the Sensor Data (Local Storage, External Storage, Data-Centric Storage)
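A rough sketch of horizontal partitioning in the data-centric sense: documents with the same characteristic are grouped into the same partition. This is not how mongoDB implements shards; `partitionBy`, the choice of `sensor.ip_address` as the partition key, and the sample documents are assumptions.

```javascript
// Group documents by a partition key so that similar data lands in the same partition.
function partitionBy(docs, keyOf) {
  const partitions = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    if (!partitions.has(key)) partitions.set(key, []);
    partitions.get(key).push(doc);
  }
  return partitions;
}

// Illustrative documents; the second IP address is made up.
const docs = [
  { sensor: { ip_address: "192.168.0.11" }, observation: { Salinity: 0.01 } },
  { sensor: { ip_address: "192.168.0.12" }, observation: { Salinity: 0.03 } },
  { sensor: { ip_address: "192.168.0.11" }, observation: { Salinity: 0.02 } },
];

const shards = partitionBy(docs, (doc) => doc.sensor.ip_address);

// A query for one sensor now scans only that sensor's partition.
console.log(shards.get("192.168.0.11").length);
```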
4. Technology Selection based on
Empirical Analysis
5. DSP Data Sensor Platform:
Design and Architecture
Analysis of Data Persistence for NetBEAMS:
UML Business Process Diagram
5. DSP Data Sensor Platform:
Design and Architecture
Diagram labels: Where, When, What
5. DSP Data Sensor Platform:
Design and Architecture
Adding DSP Data Component
Adding mongoDB
5. DSP Data Sensor Platform:
Design and Architecture
DSP MongoCRUDService UML Class Diagram
•CRUD Service
•Create
•Retrieve
•Update
•Delete
•Dependencies on YSI Data Type
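The four CRUD operations can be sketched with an in-memory stand-in. This is not the MongoCRUDService class from the diagram, whose methods are not listed on the slide; the class name, method signatures and predicate-based filtering are assumptions.

```javascript
// Hypothetical in-memory stand-in for a CRUD service.
class InMemoryCrudService {
  constructor() {
    this.docs = new Map();
    this.nextId = 1;
  }

  // Create: store a copy of the document under a generated id.
  create(doc) {
    const id = this.nextId++;
    this.docs.set(id, { ...doc, _id: id });
    return id;
  }

  // Retrieve: return every document accepted by the predicate.
  retrieve(predicate) {
    return [...this.docs.values()].filter(predicate);
  }

  // Update: merge `changes` into every matching document; returns the count.
  update(predicate, changes) {
    let count = 0;
    for (const [id, doc] of this.docs) {
      if (predicate(doc)) {
        this.docs.set(id, { ...doc, ...changes });
        count++;
      }
    }
    return count;
  }

  // Delete: remove every matching document; returns the count.
  delete(predicate) {
    let count = 0;
    for (const [id, doc] of this.docs) {
      if (predicate(doc)) {
        this.docs.delete(id);
        count++;
      }
    }
    return count;
  }
}

const service = new InMemoryCrudService();
service.create({ observation: { Salinity: 0.01 } });
service.create({ observation: { Salinity: 0.05 } });
const tagged = service.update((d) => d.observation.Salinity > 0.02, { tag: "check" });
const removed = service.delete((d) => d.tag === "check");
```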
5. DSP Data Sensor Platform:
Design and Architecture
•Deployment of the DSP Data Persistence
•As External Storage Single Server
•As Data-Centric Distributed Server, using mongoDB Database Shards.
UML Deployment Diagrams
6. Experimental Results: Correct
Behavior and Performance
•Goal: Can the proposed data persistence model be deployed in the real world?
•Experiment Setup - Infrastructure
•Key-Value definition – mongoDB Java Driver;
•Random YSI Sonde Data Creation - mongoDB Java Driver (R0);
•Simulates the External Storage and Data-Centric deployments (simple mongoDB and mongoDB Cluster/Shards), using Virtualization Technology;
•Workload
•Data volume compatible with the one used by RTC
•1 YSI = 483,840 documents = First Round
•5 YSI = 5 * 483,840 = 2,419,200 = Consecutive Rounds
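The workload figures above can be checked with a few lines of arithmetic. The variable names are illustrative; the 336-day reading of 483,840 is an observation from the numbers, not something stated on the slides.

```javascript
// Figures taken from the slides.
const documentsPerYsi = 483840; // documents produced by one YSI (first round)
const ysiCount = 5;             // YSIs in the current deployment

const totalDocuments = ysiCount * documentsPerYsi;
console.log(totalDocuments); // 2419200

// Observation (not stated on the slides): at one sample per minute,
// 483,840 samples correspond to 336 days of 1,440 minutes each.
const minutesPerDay = 24 * 60;
const daysCovered = documentsPerYsi / minutesPerDay;
console.log(daysCovered); // 336
```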
6. Experimental Results: Correct
Behavior and Performance
•Scenarios – Use Cases as Agile User Stories – Persona, Action, Result
•(R1) "As a marine biologist, I would like to search observations by filtering values of the sensor device's properties, such as water temperature and salinity, so that I can find specific values associated with the observation.";
db.SondeDataContainer.find( { "observation.Salinity" : 0.01, "observation.WaterTemperature" : 46.47 } )
•(R2) "As an oceanographer, I would like to search observations that took place at the geographic coordinates (37.891611, -122.446446), so that I can assess the area around the given coordinates.";
db.SondeDataContainer.find( { "sensor.location.latitude" : 37.891611, "sensor.location.longitude" : -122.446446 } )
•(U1) "As an estuarine ecologist, I would like to annotate observations from the time the 'oil spill' occurred in the San Francisco Bay, so that I can maintain historical evidence of the impact of such an event."
db.SondeDataContainer.update( { "time.valid" : { "$gte" : new Date(2009,11,12), "$lt" : new Date(2009,11,13) } }, { $set : { tag : "oil spill" } } )
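The dotted keys used in the R1 and R2 filters address fields nested inside a document. A simplified, equality-only sketch of that idea follows; `getPath` and `matches` are hypothetical helpers, not part of mongoDB, and the second sample document is invented.

```javascript
// Resolve a dotted path such as "sensor.location.latitude" inside a nested document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Equality-only matcher: every dotted key in the filter must equal the stored value.
// A simplified illustration, not mongoDB's query engine.
function matches(doc, filter) {
  return Object.entries(filter).every(
    ([path, expected]) => getPath(doc, path) === expected
  );
}

const observations = [
  {
    observation: { Salinity: 0.01, WaterTemperature: 46.47 },
    sensor: { location: { latitude: 37.891611, longitude: -122.446446 } },
  },
  {
    observation: { Salinity: 0.02, WaterTemperature: 45.9 },
    sensor: { location: { latitude: 37.8, longitude: -122.4 } },
  },
];

// R1: filter by sensor properties.
const r1 = observations.filter((d) =>
  matches(d, { "observation.Salinity": 0.01, "observation.WaterTemperature": 46.47 })
);

// R2: filter by geographic coordinates.
const r2 = observations.filter((d) =>
  matches(d, { "sensor.location.latitude": 37.891611, "sensor.location.longitude": -122.446446 })
);
```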
6. Experimental Results: Correct
Behavior and Performance
•Implementation of diverse use cases:
Scenario Type   Language Used   mongoDB Method Call               Use Case Implemented
Create          Java            db.SondeDataContainer.insert()    R0
Retrieve        JavaScript      db.SondeDataContainer.find()      R1, R2, R3
Update          JavaScript      db.SondeDataContainer.update()    U1
Delete          JavaScript      db.SondeDataContainer.remove()    D1
6. Experimental Results: Correct
Behavior and Performance
•Implementation fulfills all the taxonomies
•1.35 GB Claimed Disk Space for Used Volume, ~25,091 Inserts/min
•Simpler Implementation of Use Cases (Programming Language vs. SQL)
•Easier data accessibility for programmers and non-experts;
•Different APIs in different languages
•No database schema: allows changes without refactoring;
•Researchers can share exported data (OPeNDAP)
•No "Data Plumbing" as performed using SQL
•Produced data is reusable and ready for Internet Application Integration;
6. Experimental Results: Correct
Behavior and Performance
•Data-Centric approach can scale in terms of available disk space and decreased processing time;
•Less data in a shard, faster query processing
•Novel approach, since no use of such an approach was found in the current literature;
•New Data Model Taxonomy
7. Conclusions and Future Works
•How Important is Data Collection from Sensor Devices
•Environmental Sensor Networks: Hazard Alerts
•How to describe data: Data Provenance guidelines
•What data model used: KVP as a better option
•Important descriptions: annotations, tags
•Contributions of this work
•Data Persistence in Sensor Networks Taxonomies
•Novel Approach of data persistence for sensor networks
•Implementation Design for External or Data-Centric Approaches
7. Conclusions and Future Works
•Future Works
•Data-Centric Deployment with MapReduce Application (Technical Exploration with mongoDB MapReduce API)
7. Conclusions and Future Works
•Future Works
•RTC gathers data by time period; Data are mostly repeated
•[WX06] surveys schedulers for Sensor Networks;
•[YG08, CHZ09] show the use of Data Clustering before sending data to the data sink;
•Creation of a DSP Data Clustering before persisting data;
•Event-Based application developed on top of YSI Sonde Data
•The "observation.Battery" key carries the battery lifetime;
•Scheduler System that shows which devices need assistance.
References
•[PJS+ 09] Arno Puder, Teresa Johnson, Kleber Sales, Marcello de Sales, and Dale Davidson. A component-based sensor network for environmental monitoring. In SNA-2009: 1st International Conference on Sensor Networks and Applications, pages 54–60, San Francisco, CA, USA, 2009. The International Society for Computers and Their Applications - ISCA.
•[ASSC02] I.F. Akyildiz, Weilian Su, Y. Sankarasubramaniam, and E. Cayirci. A survey on sensor networks. Communications Magazine, IEEE, 40(8):102–114, Aug 2002.
•Images
•[sd] http://www.zess.uni-siegen.de/ipp_home/ipp/research/master-student-topics/
•[snc] http://www.dei.unipd.it/~schenato/pics/SensorNetwork.jpg
•[ns] http://www.imagingnotes.com/ee_assets/enews/SensorWebImageForEnewsJuly2.jpg
•[an] http://eurekr.com/pics/AnnotatinganImageinWPF_A7D8/image.png
•[ysi] http://www.ckjorc.org/cn/admin/news/edit/UploadFile/200681616301130.jpg
Marcello de Sales
Master of Science in Computer Science
(msales@sfsu.edu)
Questions?
Department of Computer Science
"The brick walls are not there to keep us out. The brick walls are there to give us a chance to show how badly we want something. Because the brick walls are there to stop the people who don't want it badly enough."
Dr. Randy Pausch
DSP in practice = NetBEAMS
Use Cases
•Data Payload for the YSI Sonde 6600V2
•SondeDataType: representation for the collected data
•SondeDataContainer: collection of the collected data
Data Sensor Platform (DSP)
Message Structure
•DSP Message
•Header
•Producer
•Consumer
•Body
•Message Content
•DSP Messages Container
•Package of DSP Messages
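The message structure outlined above might be modeled as plain objects, as in the sketch below. The property names (`header`, `producer`, `consumer`, `body`, `content`, `messages`) and both helper functions are assumptions that mirror the slide's outline; the actual DSP types are not shown in the deck.

```javascript
// A DSP message: a header naming producer and consumer, plus a body with the content.
function createDspMessage(producer, consumer, content) {
  return {
    header: { producer, consumer },
    body: { content },
  };
}

// A DSP messages container: a package of DSP messages.
function createMessagesContainer(messages) {
  return { messages: [...messages] };
}

// Illustrative usage with made-up component names and content.
const message = createDspMessage("Component A", "Component X", {
  WaterTemperature: 46.47,
});
const container = createMessagesContainer([message]);
console.log(container.messages.length);
```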
Data Sensor Platform (DSP)
Communication Mechanism
•DSP Broker
•Local delivery
•Remote delivery
•Gateway Component
•DSP Matcher
•Filtering based on rules
•Independent Per Host
PRODUCER      CONSUMER      TARGET        GATEWAY
Component A   -             Component X   -
Component B   Component C   Component C   -
Component C   -             Component Y   Component Z
192.168.0.11
DSP Data Persistence Component
3. NetBEAMS: A Case Study
•NetBEAMS Gateway Node
•YSI Sonde + Gumstix Embedded System + GSM Modem
Centralized Data Sink
RTC Labs
3. NetBEAMS: A Case Study
DSP Data Persistence component Requirements
•Open-Source
•Support Data-Centric
•Free of charge
•Accessibility (API)
•Cope with RTC Small Volume of Data
3. NetBEAMS: A Case Study
Missing Database System
4. Technology Selection
Normalized Relational Model Example
State Instance
4. Technology Selection based on
Empirical Analysis
•Optional Approach: Key-Value Pairs Approach = Sensor Attributes
•Data Concentration in one Single Table
•Does not Scale on Single Server
•Hard to use
5. DSP Data Sensor Platform:
Design and Architecture
•Data Model Based on Data Provenance Taxonomy
•What: identifies what was collected, tracks the DSP Message
•message_id
•observation.[raw-data-1, raw-data-2, …, raw-data-n], where raw-data-i are the attributes of a given sensor;
•Where: defines the coverage of the data
•sensor
•ip_address
•location
•latitude
•longitude
•When: tracks when the data was collected and downloaded
•time
•transaction
•fact
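A document following the What / Where / When layout could look like the sketch below. All values are illustrative. The `message_id` value and the `Battery` reading are made up, and `time.valid` follows the key used in the deck's queries, while the list above names the sub-keys `transaction` and `fact`.

```javascript
const sampleObservation = {
  // What: identity of the DSP message and the raw observation values
  message_id: "example-message-1", // hypothetical identifier
  observation: { WaterTemperature: 46.47, Salinity: 0.01, Battery: 11.9 },

  // Where: coverage of the data
  sensor: {
    ip_address: "192.168.0.11",
    location: { latitude: 37.891611, longitude: -122.446446 },
  },

  // When: collection time and arrival time (month index 10 = November)
  time: {
    valid: new Date(Date.UTC(2009, 10, 7, 10, 34)),
    transaction: new Date(Date.UTC(2009, 10, 7, 10, 35)),
  },
};

// Top-level keys map onto the three provenance questions.
console.log(Object.keys(sampleObservation));
```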
5. DSP Data Sensor Platform:
Design and Architecture
DSP Data Persistence Component Activator UML Class Diagram
•Collected Messages delivered to the DSP Data Persistence
•Component Bootstrap
•Measurement Messages from sensor devices
5. DSP Data Sensor Platform:
Design and Architecture
DSP Data Flusher UML Class Diagram
•Concurrent Thread
•Are there transient messages to be flushed?
•Collect them and send to the CRUD service
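The flusher's check-collect-send cycle can be sketched as follows. The deck describes a concurrent thread; this single-threaded sketch only shows the buffering logic, and the `MessageFlusher` class and `persist` callback are assumptions rather than the DSP Data Flusher's actual interface.

```javascript
// Buffers transient messages and hands them to a persistence callback in batches.
class MessageFlusher {
  constructor(persist) {
    this.persist = persist; // stand-in for the call into the CRUD service
    this.transient = [];
  }

  // Called whenever a message is delivered to the persistence component.
  add(message) {
    this.transient.push(message);
  }

  // Returns the number of messages flushed (0 when nothing is pending).
  flush() {
    if (this.transient.length === 0) return 0;
    const batch = this.transient;
    this.transient = [];
    this.persist(batch);
    return batch.length;
  }
}

const stored = [];
const flusher = new MessageFlusher((batch) => stored.push(...batch));
flusher.add({ message_id: 1 });
flusher.add({ message_id: 2 });
const flushedFirst = flusher.flush();
const flushedSecond = flusher.flush();
```

In a long-running process, `flush()` could be invoked periodically, for example from a timer, to approximate the background behavior described on the slide.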
6. Experimental Results: Correct
Behavior and Performance
•(R3) "As a marine biologist, I would like to search observations that took place last week, so that I can assess past environmental conditions";
db.SondeDataContainer.find( { "time.valid" : { $gte : new Date(2009,11,8), $lt : new Date(2009,11,15) } } )
•(D1) "As an oceanographer, I would like to remove specific observations collected yesterday, so that the research group does not use 'junk data'." (Delete)
•(R4) "As a biologist, I would like to export the collected data produced during this month using the OPeNDAP data format, so that I can collaborate with other research groups that use this data format."; (EXPORT)
•"As a scientist from RTC, I would like to analyze the observed data from yesterday using a spreadsheet, so that I can verify measurements using Microsoft Excel." (EXPORT)
•Access the data through API, Programming Languages: "As a marine biologist who learned the Python scripting language, I would like to write a software that reads; (Programming)
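One detail worth checking in the date-range filters: the JavaScript `Date` constructor takes a zero-based month, so `new Date(2009, 11, 8)` is 8 December 2009. Whether December is the intended month is not stated in the deck. The sketch below shows the behavior together with a half-open range check that mirrors `$gte` / `$lt`; `inRange` is a hypothetical helper.

```javascript
// Month argument is zero-based: 0 = January, 11 = December.
const start = new Date(2009, 11, 8);
const end = new Date(2009, 11, 15);

console.log(start.getMonth()); // 11, i.e. December

// Half-open range check mirroring { $gte: start, $lt: end }.
function inRange(date, from, to) {
  return date >= from && date < to;
}

const november8 = new Date(2009, 10, 8); // month index 10 = November
const december10 = new Date(2009, 11, 10);

console.log(inRange(december10, start, end));
console.log(inRange(november8, start, end));
```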
6. Experimental Results: Correct
Behavior and Performance
•Execution through script plus a Java class for data insertion
•Measurements collected to log files with provenance information
•Execution time, memory used
•Execution steps
•Database server and client execution
•Measured Results
•Claimed Disk Space (Indexed, Long Keys): 1 YSI = 278.33 MB; 5 YSIs = 1.35 GB
•Insertion Average: decreased after the execution of the third round from ~25,091 to ~8,013 documents per minute
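As a rough worked example of what the two insertion rates imply, the sketch below estimates insertion time under the simplifying assumption of a constant rate. That assumption is not from the deck, which reports only the average before and after the third round; `minutesToInsert` is an illustrative helper.

```javascript
// Figures taken from the slides.
const firstRoundDocs = 483840; // 1 YSI
const fiveYsiDocs = 2419200;   // 5 YSIs
const fastRate = 25091;        // ~documents per minute before the slowdown
const slowRate = 8013;         // ~documents per minute after the third round

// Assumes a constant insertion rate, which is a simplification.
function minutesToInsert(documents, docsPerMinute) {
  return documents / docsPerMinute;
}

const firstRoundMinutes = minutesToInsert(firstRoundDocs, fastRate);
const fiveYsiMinutesAtSlowRate = minutesToInsert(fiveYsiDocs, slowRate);

console.log(firstRoundMinutes.toFixed(1));
console.log(fiveYsiMinutesAtSlowRate.toFixed(1));
```

Under these assumptions, one YSI's yearly volume loads in roughly 19 minutes at the faster rate, while the full five-YSI volume would take about five hours if every document were inserted at the slower rate.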
DSP Data Persistence Component
Get documents about "