
Implementation and Research Issues in Query Processing
for Wireless Sensor Networks

Wei Hong                              Sam Madden
Intel Research, Berkeley              MIT
whong@intel-research.net              madden@csail.mit.edu

MDM Tutorial, January 19th, 2004
1
Motivation
• Sensor networks (aka sensor webs, emnets) are here
– Several widely deployed HW/SW platforms
• Low power radio, small processor, RAM/Flash
– Variety of (novel) applications: scientific, industrial, commercial
– Great platform for mobile + ubicomp experimentation

• Real, hard research problems to be solved
– Networking, systems, languages, databases

• We will summarize:
– The state of the art
– Our experiences building TinyDB
– Current and future research directions

[Image: a Berkeley Mote]

2
Sensor Network Apps
Habitat monitoring: storm petrels on Great Duck Island, microclimates on James Reserve.

Earthquake monitoring in shake-test sites.

Vehicle detection: sensors along a road collect data about passing vehicles.

Just the tip of the iceberg -- more tomorrow!

[Images: deployment photos, contrasted with traditional monitoring equipment]
3
Declarative Queries
• Programming Apps is Hard
– Limited power budget
– Lossy, low bandwidth communication
– Require long-lived, zero admin deployments
– Distributed Algorithms
– Limited tools, debugging interfaces
• Queries abstract away much of the complexity
– Burden on the database developers
– Users get:
• Safe, optimizable programs
• Freedom to think about apps instead of details

4
TinyDB: Prototype declarative
query processor
• Platform: Berkeley Motes + TinyOS
• Continuous variant of SQL : TinySQL

• Power- and data-acquisition-based in-network optimization framework
• Extensible interface for aggregates, new
types of sensors

5
Agenda
• Part 1 : Sensor Networks (50 Minutes)
– TinyOS
– NesC
• Short Break
• Part 2: TinyDB (1 Hour)
– Data Model and Query Language
– Software Architecture
• Long Break + Hands On
• Part 3: Sensor Network Database Research
Directions (1 Hour, 10 Minutes)

6
Part 1
• Sensornet Background
• Motes + Mote Hardware
– TinyOS
– Programming Model + NesC
• TinyOS Architecture
– Major Software Subsystems
– Networking Services

7
A Brief History of
Sensornets
• People have used sensors for a long time
• Recent CS history:
– (1998) Pottie + Kaiser: Radio-based networks of sensors
– (1998) Pister et al: Smart Dust
• Initial focus on optical communication
• By 1999, radio-based networks, COTS Dust, “Motes”
– (1999) Estrin + Govindan
• Ad-hoc networks of sensors
– (2000) Culler/Hill et al: TinyOS + Motes
– (2002) Hill / Dust: SPEC, mm^3 scale computing
• Emerging commercial space:
– Crossbow, Ember, Dust, Sensicast, Moteiv, Intel
• UCLA / USC / Berkeley continue to lead research
– Many other players now
– TinyOS/Motes as most common platform
8
Why Now?
• Commoditization of radio hardware
– Cellular and cordless phones, wireless communication
– (some radio pictures, etc.)
• Low cost -> many/tiny -> new applications!

• Real application for ad-hoc network research from the late 90’s

• Coming together of EE + CS communities

9
Motes

Mica Mote:
– 4 MHz, 8-bit Atmel RISC uProc
– 40 kbit/sec radio
– 4 KB RAM, 128 KB program flash, 512 KB data flash
– AA battery pack
– Based on TinyOS
[Images: Mica Mote, Mica2Dot]

10
History of Motes
• Initial research goal wasn’t hardware
– Has since become more of a priority with emerging
hardware needs, e.g.:
• Power consumption
• (Ultrasonic) ranging + localization
– MIT Cricket, NEST Project
• Connectivity with diverse sensors
– UCLA sensor board
– Even so, now on the 5th generation of devices
• Costs down to ~$50/node (Moteiv, Dust)
• Greatly improved radio quality
• Multitude of interfaces: USB, Ethernet, CF, etc.
• Variety of form factors, packages

11
Motes vs. Traditional
Computing
• Lossy, Ad-hoc Radio Communication
• Sensing Hardware
• Severe Power Constraints

12
Radio Communication
• Low Bandwidth Shared Radio Channel
– ~40 kbits/sec on motes
– Much less in practice
• Encoding, Contention for Media Access (MAC)
• Very lossy: 30% base loss rate
– Argues against TCP-like end-to-end retransmission
• And for link-layer retries
• Generally, not well behaved

From Ganesan, et al. “Complex Behavior at Scale.” UCLA/CSD-TR 02-0013


13
Types of Sensors
• Sensors attach via daughtercard
• Weather
– Temperature
– Light x 2 (high-intensity PAR, low-intensity full spectrum)
– Air pressure
– Humidity
• Vibration
– 2- or 3-axis accelerometers
• Tracking
– Microphone (for ranging and acoustic signatures)
– Magnetometer
• GPS

14
Power Consumption and
Lifetime
• Power typically supplied by a small battery
– 1000-2000 mAH
– 1 mAH = 1 milliamp current for 1 hour
• Typically at optimum voltage, current drain rates
– Power = Watts (W) = Amps (A) * Volts (V)
– Energy = Joules (J) = W * time

• Lifetime, power consumption varies by application
– Processor: 5 mA active, 1 mA idle, 5 uA sleeping
– Radio: 5 mA listen, 10 mA xmit/receive, ~20 mS / packet
– Sensors: 1 uA -> 100’s of mA, 1 uS -> 1 S / sample
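As a rough, illustrative calculation (the duty cycle and battery figures below are assumed for the sake of example, not measured): with a 2000 mAH battery, a node that is awake 2% of the time drawing about 20 mA (processor active plus radio listening) and sleeping the rest at about 10 uA has an average draw of 0.02 x 20 mA + 0.98 x 0.01 mA ≈ 0.41 mA, for a lifetime of roughly 2000 mAH / 0.41 mA ≈ 4,900 hours, or about 200 days. Listening continuously (≈ 6 mA average) would drain the same battery in about two weeks, which is why duty cycling dominates lifetime.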

15
Energy Usage in A Typical
Data Collection Scenario
• Each mote collects 1 sample of (light, humidity) data every 10 seconds, forwards it
• Each mote can “hear” 10 other motes
• Process:
– Wake up, collect samples (~1 second)
– Listen to radio for messages to forward (~1 second)
– Forward data

[Charts: Power Consumption Breakdown by hardware element (radio, sensors, processor), and
Processor Energy Breakdown by processing phase (idle, waiting for radio, waiting for sensors, sending)]

16
Sensors: Slow, Power Hungry, Noisy

[Chart: Time of Day vs. Light, comparing a chamber sensor, Sensor 69, and Sensor 69
(median of last 10 readings); light in Lux plotted from roughly 20:09 to 1:26]
17
Programming Sensornets:
TinyOS
• Component Based Programming Model
• Suite of software components
– Timers, clocks, clock synchronization
– Single and multi-hop networking
– Power management
– Non-volatile storage management

18
Programming Philosophy
• Component Based
– “Wiring” components together via interfaces, configurations
• Split-Phased
– Nothing blocks, ever.
– Instead, completion events are signaled.
• Highly Concurrent
– Single thread of “tasks”, posted and scheduled
FIFO
– Events “fired” asynchronously in response to
interrupts.

19
NesC
• C-like programming language with component model
support
– Compiles into GCC-compatible C
• 3 types of files:
– Interfaces
• Set of function prototypes; no implementations or variables
– Modules
• Provide (implement) zero or more interfaces
• Require zero or more interfaces
• May define module variables, scoped to functions in module
– Configurations
• Wire (connect) modules according to requires/provides relationship
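For completeness (the following slides show modules and configurations but not an interface file), here is a minimal sketch of one; it mirrors the IntOutput command/event names used in the split-phase example a few slides ahead, with exact signatures assumed rather than taken verbatim from the TinyOS source:

  interface IntOutput {
    command result_t output(uint16_t value);          // request: output an integer (split phase)
    event result_t outputComplete(result_t success);  // completion event signaled back to the caller
  }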

20
Component Example: Leds
module LedsC {
  provides interface Leds;
}
implementation
{
  uint8_t ledsOn;

  enum {
    RED_BIT = 1,
    GREEN_BIT = 2,
    YELLOW_BIT = 4
  };
  ….
  async command result_t Leds.redOn() {
    dbg(DBG_LED, "LEDS: Red on.\n");
    atomic {
      TOSH_CLR_RED_LED_PIN();
      ledsOn |= RED_BIT;
    }
    return SUCCESS;
  }
  ….
}

21
Configuration Example
configuration CntToLedsAndRfm {
}
implementation {
components Main, Counter, IntToLeds, IntToRfm, TimerC;

Main.StdControl -> Counter.StdControl;
Main.StdControl -> IntToLeds.StdControl;
Main.StdControl -> IntToRfm.StdControl;
Main.StdControl -> TimerC.StdControl;
Counter.Timer -> TimerC.Timer[unique("Timer")];
IntToLeds <- Counter.IntOutput;
Counter.IntOutput -> IntToRfm;
}

22
Split Phase Example
module IntToRfmM { … }
implementation { …
  command result_t IntOutput.output(uint16_t value) {
    IntMsg *message = (IntMsg *)data.data;
    if (!pending) {
      pending = TRUE;
      message->val = value;
      atomic {
        message->src = TOS_LOCAL_ADDRESS;
      }
      if (call Send.send(TOS_BCAST_ADDR, sizeof(IntMsg), &data))
        return SUCCESS;
      pending = FALSE;
    }
    return FAIL;
  }

  event result_t Send.sendDone(TOS_MsgPtr msg, result_t success) {
    if (pending && msg == &data) {
      pending = FALSE;
      signal IntOutput.outputComplete(success);
    }
    return SUCCESS;
  }
}

23
Major Components

• Timers: Clock, TimerC, LogicalTime
• Networking: Send, GenericComm, AMStandard, lib/Route
• Power Management: HPLPowerManagement
• Storage Management: EEPROM, MatchBox

24
Timers

• Clock: Basic abstraction over hardware timers; periodic events, single frequency.
• LogicalTime: Fire an event some number of H:M:S:ms in the future.
• TimerC: Multiplex multiple periodic timers on top of LogicalTime.
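A minimal usage sketch of the multiplexed timer (the wiring line mirrors the earlier configuration example; the TIMER_REPEAT constant and Timer.start/fired signatures follow common TinyOS 1.x conventions and should be treated as assumptions here):

  // In a configuration: MyComp.Timer -> TimerC.Timer[unique("Timer")];
  command result_t StdControl.start() {
    // fire Timer.fired() every 1000 ms
    return call Timer.start(TIMER_REPEAT, 1000);
  }

  event result_t Timer.fired() {
    // periodic work goes here, e.g. post a task that samples a sensor
    return SUCCESS;
  }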

25
Radio Stack
• Interfaces:
– Send
• Broadcast, or to a specific ID
• Split phase
– Receive
• Asynchronous signal
• Implementations:
– AMStandard
• Application-specific messages
• Id-based dispatch
– GenericComm
• AMStandard + serial IO
– lib/Route
• Multihop

Sending an IntMsg:
  IntMsg *message = (IntMsg *)data.data;
  message->val = value;
  atomic {
    message->src = TOS_LOCAL_ADDRESS;
  }
  call Send.send(TOS_BCAST_ADDR, sizeof(IntMsg), &data);

Receiving (wiring equates IntMsg to ReceiveIntMsg):
  event TOS_MsgPtr ReceiveIntMsg.receive(TOS_MsgPtr m) {
    IntMsg *message = (IntMsg *)m->data;
    call IntOutput.output(message->val);
    return m;
  }


26
Multihop Networking
• Standard implementation: “tree-based routing”
• Problems:
– Parent selection (naive sketch below)
– Asymmetric links
– Adaptation vs. stability

[Diagram: routing tree rooted at node A; nodes periodically broadcast route messages R:{…}
and maintain neighbor link-quality tables, e.g. Node D: {B: .75, C: .66, E: .45, F: .82}
and Node C: {A: .5, B: .44, D: .53, F: .35}]
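As a naive illustration of the parent-selection step (plain C, ignoring the asymmetry and stability issues listed above; not the actual TinyOS routing code):

  #include <stdint.h>

  typedef struct { uint16_t addr; uint8_t quality; } Neighbor;   /* link-quality estimate, 0-255 */

  /* Pick the neighbor with the highest estimated link quality as the parent. */
  uint16_t choose_parent(const Neighbor *tbl, int n) {
      int best = 0;
      for (int i = 1; i < n; i++)
          if (tbl[i].quality > tbl[best].quality)
              best = i;
      return tbl[best].addr;
  }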
27
Geographic Routing
• Any-to-any routing via geographic
coordinates
– See “GPSR”, MOBICOM 2000, Karp + Kung.

• Requires coordinate system*
• Requires endpoint coordinates
• Hard to route around local minima (“holes”)
[Diagram: greedy geographic route from node A to node B across the field]
28
*Could be virtual, as in Rao et al “Geographic Routing Without Coordinate Information.” MOBICOM 2003
Power Management
• HPLPowerManagement
– TinyOS sleeps processor when possible
– Observes the radio, sensor, and timer state

• Application managed, for the most part


– App. must turn off subsystems when not in use
– Helper utility: ServiceScheduler
• Periodically calls the “start” and “stop” methods of an app
– More on power management in TinyDB later
– Approach works because:
• single application
• no interactivity requirements

29
Non-Volatile Storage
• EEPROM
– 512K off chip, 32K on chip
– Writes at disk speeds, reads at RAM speeds
– Interface : random access, read/write 256 byte
pages
– Maximum throughput ~10Kbytes / second
• MatchBox Filing System
– Provides a Unix-like file I/O interface
– Single, flat directory
– Only one file being read/written at a time

30
TinyOS: Getting Started
• The TinyOS home page:
– http://webs.cs.berkeley.edu/tinyos
– Start with the tutorials!
• The CVS repository
– http://sf.net/projects/tinyos
• The NesC Project Page
– http://sf.net/projects/nescc
• Crossbow motes (hardware):
– http://www.xbow.com
• Intel Imote
– www.intel.com/research/exploratory/motes.htm.

31
Part 2

The Design and Implementation of TinyDB

32
Part 2 Outline
• TinyDB Overview
• Data Model and Query Language
• TinyDB Java API and Scripting
• Demo with TinyDB GUI
• TinyDB Internals
• Extending TinyDB
• TinyDB Status and Roadmap

33
TinyDB Revisited
• High-level abstraction:
– Data-centric programming
– Interact with sensor network as a whole
– Extensible framework
• Under the hood:
– Intelligent query processing: query optimization, power-efficient execution
– Fault mitigation: TinyDB automatically introduces redundancy, avoids problem areas

Example query:
  SELECT MAX(mag)
  FROM sensors
  WHERE mag > thresh
  SAMPLE PERIOD 64ms

[Diagram: an App poses queries/triggers to the sensor network and receives data back]

34
Feature Overview
• Declarative SQL-like query interface
• Metadata catalog management
• Multiple concurrent queries
• Network monitoring (via queries)
• In-network, distributed query processing
• Extensible framework for attributes,
commands and aggregates
• In-network, persistent storage

35
Architecture

[Diagram: PC side: TinyDB GUI and JDBC/DBMS sit on top of the TinyDB Client API.
Mote side: the TinyDB query processor runs on every node (0-8) of the sensor network,
rooted at node 0.]
36
Data Model
• Entire sensor network as one single, infinitely-long logical
table: sensors
• Columns consist of all the attributes defined in the network
• Typical attributes:
– Sensor readings
– Meta-data: node id, location, etc.
– Internal states: routing tree parent, timestamp, queue length,
etc.
• Nodes return NULL for unknown attributes
• On server, all attributes are defined in catalog.xml
• Discussion: other alternative data models?

37
Query Language (TinySQL)
SELECT <aggregates>, <attributes>
[FROM {sensors | <buffer>}]
[WHERE <predicates>]
[GROUP BY <exprs>]
[SAMPLE PERIOD <const> | ONCE]
[INTO <buffer>]
[TRIGGER ACTION <command>]
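A hedged example combining several of these clauses (the buffer name is made up; per the stored-data slide later in this part, the buffer would first be created with CREATE BUFFER):

  SELECT nodeid, light
  FROM sensors
  WHERE light > 400
  SAMPLE PERIOD 1s
  INTO brightBuf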

38
Comparison with SQL
• Single table in FROM clause
• Only conjunctive comparison predicates in
WHERE and HAVING
• No subqueries
• No column alias in SELECT clause
• Arithmetic expressions limited to column
op constant
• Only fundamental difference: SAMPLE
PERIOD clause

39
TinySQL Examples

1. “Find the sensors in bright nests.”

  SELECT nodeid, nestNo, light
  FROM sensors
  WHERE light > 400
  EPOCH DURATION 1s

Sensors:
  Epoch  Nodeid  nestNo  Light
  0      1       17      455
  0      2       25      389
  1      1       17      422
  1      2       25      405

40
TinySQL Examples (cont.)

2. SELECT AVG(sound)
   FROM sensors
   EPOCH DURATION 10s

3. “Count the number of occupied nests in each loud region of the island.”

   SELECT region, CNT(occupied), AVG(sound)
   FROM sensors
   GROUP BY region
   HAVING AVG(sound) > 200
   EPOCH DURATION 10s

Result (regions w/ AVG(sound) > 200):
  Epoch  region  CNT(…)  AVG(…)
  0      North   3       360
  0      South   3       520
  1      North   3       370
  1      South   3       520
41
Event-based Queries
• ON event SELECT …
• Run query only when interesting events happen
• Event examples
– Button pushed
– Message arrival
– Bird enters nest
• Analogous to triggers but events are user-
defined
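A hedged example in the spirit of the ACQP paper (the bird-detect event, its loc argument, and the dist function are illustrative, not built-ins):

  ON EVENT bird-detect(loc):
    SELECT AVG(light), AVG(temp), event.loc
    FROM sensors AS s
    WHERE dist(s.loc, event.loc) < 10m
    SAMPLE PERIOD 2s FOR 30s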

42
Query over Stored Data
• Named buffers in Flash memory
• Store query results in buffers
• Query over named buffers
• Analogous to materialized views
• Example:
– CREATE BUFFER name SIZE x (field1 type1, field2
type2, …)
– SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO
name
– SELECT field1, field2, … FROM name SAMPLE PERIOD d

43
Using the Java API
• SensorQueryer
– translateQuery() converts TinySQL string into
TinyDBQuery object
– Static query optimization
• TinyDBNetwork
– sendQuery() injects query into network
– abortQuery() stops a running query
– addResultListener() adds a ResultListener that is invoked
for every QueryResult received
– removeResultListener()
• QueryResult
– A complete result tuple, or
– A partial aggregate result, call mergeQueryResult() to
combine partial results
• Key difference from JDBC: push vs. pull
44
Writing Scripts with
TinyDB
• TinyDB’s text interface
– java net.tinyos.tinydb.TinyDBMain –run
“select …”
– Query results printed out to the console
– All motes get reset each time new query
is posed
• Handy for writing scripts with shell,
perl, etc.

45
Using the GUI Tools
• Demo time

46
Inside TinyDB

[Diagram: queries (e.g. SELECT AVG(temp) WHERE light > 400) arrive over the multihop
network; the on-mote query processor chains get(‘temp’) sampling, a filter (light > 400),
and an aggregation operator Agg-avg(temp), and results (e.g. T:1, AVG: 225; T:2, AVG: 250)
flow back out. A schema/catalog table describes each attribute, e.g. name: temp, time to
sample: 50 uS, cost to sample: 90 uJ, calibration table: 3, units: deg. F, error: ±5 deg F,
get function: getTempFunc().]

• ~10,000 lines embedded C code
• ~5,000 lines (PC-side) Java
• ~3,200 bytes RAM (w/ 768 byte heap)
• ~58 kB compiled TinyOS code (3x larger than the 2nd largest TinyOS program)
47
Tree-based Routing

• Tree-based routing in TinyDB
– Used in:
• Query delivery
• Data collection
• In-network aggregation
– Relationship to indexing?

[Diagram: a query Q (SELECT …) is flooded down the routing tree from root A, and results
R:{…} are collected back up along the same tree]
48
Power Management Approach
• Coarse-grained, app-controlled communication scheduling

[Diagram: per-epoch schedule for mote IDs 1-5; each epoch (10s-100s of seconds) starts
with a 2-4 s waking period in which all motes are up, followed by sleep (“zzz”) for the
rest of the epoch]
49
Time Synchronization
• All messages include a 5 byte time stamp indicating system
time in ms
– Synchronize (e.g. set system time to timestamp) with
• Any message from parent
• Any new query message (even if not from parent)
– Punt on multiple queries
– Timestamps written just after preamble is xmitted
• All nodes agree that the waking period begins when (system
time % epoch dur = 0)
– And lasts for WAKING_PERIOD ms

• Adjustment of clock happens by changing duration of sleep cycle, not wake cycle.
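A small illustrative sketch of the wake-up rule above (plain C, not the TinyDB source):

  #include <stdint.h>

  /* The waking period begins whenever (system time % epoch duration) == 0
     and lasts WAKING_PERIOD ms; this returns how long to sleep until then. */
  uint32_t ms_until_next_waking(uint32_t system_time_ms, uint32_t epoch_dur_ms) {
      uint32_t offset = system_time_ms % epoch_dur_ms;
      return (offset == 0) ? 0 : (epoch_dur_ms - offset);
  }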

50
Extending TinyDB
• Why extend TinyDB?
– New sensors → attributes
– New control/actuation → commands
– New data processing logic → aggregates
– New events
• Analogous to concepts in object-
relational databases

51
Adding Attributes
• Types of attributes
– Sensor attributes: raw or cooked sensor
readings
– Introspective attributes: parent,
voltage, ram usage, etc.
– Constant attributes: constant values
that can be statically or dynamically
assigned to a mote, e.g., nodeid, location,
etc.

52
Adding Attributes (cont)
• Interfaces provided by Attr component
– StdControl: init, start, stop
– AttrRegister
• command registerAttr(name, type, len)
• event getAttr(name, resultBuf, errorPtr)
• event setAttr(name, val)
• command getAttrDone(name, resultBuf, error)
– AttrUse
• command startAttr(attr)
• event startAttrDone(attr)
• command getAttrValue(name, resultBuf, errorPtr)
• event getAttrDone(name, resultBuf, error)
• command setAttrValue(name, val)

53
Adding Attributes (cont)
• Steps to adding attributes to TinyDB
1) Create attribute nesC components
2) Wire new attribute components to
TinyDBAttr configuration
3) Reprogram TinyDB motes
4) Add new attribute entries to catalog.xml
• Constant attributes can be added on the
fly through TinyDB GUI
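A rough sketch of step 1 for a hypothetical sensor attribute (the module name NoiseAttrM, the ADC wiring, and the exact type/status constants are assumptions for illustration; the AttrRegister commands and events follow the interface listed on the previous slide):

  module NoiseAttrM {
    provides interface StdControl;
    uses interface AttrRegister;
    uses interface ADC;                  // assumed sensor interface supplying the reading
  }
  implementation {
    char *result;                        // result buffer handed to us by TinyDB

    command result_t StdControl.init() {
      // register a 2-byte attribute named "noise" (type constant assumed)
      return call AttrRegister.registerAttr("noise", UINT16, 2);
    }
    command result_t StdControl.start() { return SUCCESS; }
    command result_t StdControl.stop()  { return SUCCESS; }

    // TinyDB asks for the attribute value: start a split-phase ADC read
    event result_t AttrRegister.getAttr(char *name, char *resultBuf, SchemaErrorNo *err) {
      result = resultBuf;
      return call ADC.getData();
    }

    event result_t AttrRegister.setAttr(char *name, char *value) {
      return FAIL;                       // read-only attribute
    }

    // sample ready: copy it into the result buffer and signal completion
    async event result_t ADC.dataReady(uint16_t data) {
      *(uint16_t *)result = data;
      return call AttrRegister.getAttrDone("noise", result, SCHEMA_RESULT_READY);
    }
  }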

54
Adding Aggregates
• Step 1: wire new nesC components

55
Adding Aggregates (cont)
• Step 2: add entry to catalog.xml
<aggregate>
<name>AVG</name>
<id>5</id>
<temporal>false</temporal>

<readerClass>net.tinyos.tinydb.AverageClass</readerClass>
</aggregate>
• Step 3 (optional): implement reader class in Java
– a reader class interprets and finalizes aggregate state
received from the mote network, returns final result as a
string for display.

56
TinyDB Status
• Latest version released with TinyOS 1.1 (9/03)
– Install the task-tinydb package in TinyOS 1.1 distribution
– First release in TinyOS 1.0 (9/02)
– Widely used by research groups as well as industry pilot
projects
• Successful deployments in Intel Berkeley Lab and
redwood trees at UC Botanical Garden
– Largest deployment: ~80 weather station nodes
– Network longevity: 4-5 months

57
The Redwood Tree
Deployment
• Redwood Grove in UC Botanical
Garden, Berkeley
• Collect dense sensor readings to
monitor climatic variations across
– altitudes,
– angles,
– time,
– forest locations, etc.
• Versus sporadic monitoring points
with 30lb loggers!
• Current focus: study how dense
sensor data affect predictions of
conventional tree-growth models

58
Data from Redwoods

[Charts: Rel. Humidity (%) and Temperature (C) vs. time, 7/7/03 through 7/9/03, for nodes
mounted at several heights in the tree: 36 m; 33 m (node 111); 32 m (110); 30 m (109, 108,
107); 20 m (106, 105, 104); 10 m (103, 102, 101)]
59
TinyDB Roadmap (near
term)
• Support for high frequency sampling
– Equipment vibration monitoring, structural monitoring,
etc.
– Store and forward
– Bulk reliable data transfer
– Scheduling of communications
• Port to Intel Mote
• Deployment in Intel Fab equipment monitoring
application and the Golden Gate Bridge monitoring
application

60
For more information
• http://berkeley.intel-research.net/tinydb
• http://triplerock.cs.berkeley.edu/tinydb

61
Part 3

Database Research Issues in Sensor Networks

62
Sensor Network Research
• Very active research area
– Can’t summarize it all
• Focus: database-relevant research topics
– Some outside of Berkeley
– Other topics that are itching to be scratched
– But, some bias towards work that we find
compelling

63
Topics
• In-network aggregation
• Acquisitional Query Processing
• Heterogeneity
• Intermittent Connectivity
• In-network Storage
• Statistics-based summarization and sampling
• In-network Joins
• Adaptivity and Sensor Networks
• Multiple Queries

64
Topics
• In-network aggregation
• Acquisitional Query Processing
• Heterogeneity
• Intermittent Connectivity
• In-network Storage
• Statistics-based summarization and
sampling
• In-network Joins
• Adaptivity and Sensor Networks
• Multiple Queries
65
Tiny Aggregation (TAG)
• In-network processing of aggregates
– Common data analysis operation
• Aka gather operation or reduction in parallel programming
– Communication reducing
• Operator dependent benefit
– Across nodes during same epoch

• Exploit query semantics to improve efficiency!

Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002. 66


Basic Aggregation
• In each epoch:
– Each node samples local sensors once
– Generates partial state record (PSR)
• local readings
• readings from children
– Outputs PSR during assigned comm. interval
• At end of epoch, PSR for whole network output at root
• New result on each successive epoch
• Extras:
– Predicate-based partitioning via GROUP BY
[Diagram: small routing tree over nodes 1-4, rooted at node 1]

67
Illustration: Aggregation (slides 68-72)

SELECT COUNT(*)
FROM sensors

[Animation over five slides: a routing tree of sensors 1-5 (root = node 1) and a grid of
interval # vs. sensor #. In intervals 4, 3, and 2, nodes deeper in the tree transmit
their partial COUNTs in their assigned communication intervals and their parents
accumulate them; in interval 1 the root outputs the network-wide COUNT of 5, and the
process repeats in the next epoch.]
Aggregation Framework

• As in extensible databases, TinyDB supports any aggregation function conforming to:

  Agg_n = {f_init, f_merge, f_evaluate}
  f_init{a0} → <a0>                      (partial state record, PSR)
  f_merge{<a1>, <a2>} → <a12>
  f_evaluate{<a1>} → aggregate value

• Example: Average
  AVG_init{v} → <v, 1>
  AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
  AVG_evaluate{<S, C>} → S / C

• Restriction: Merge associative, commutative
73
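A minimal C-style sketch of the AVG triple above (struct and function names are illustrative, not the TinyDB aggregate API):

  #include <stdint.h>

  typedef struct { int32_t sum; uint16_t count; } AvgPSR;   /* partial state record */

  AvgPSR avg_init(int16_t v) { AvgPSR p = { v, 1 }; return p; }

  /* merge is associative and commutative, as the framework requires */
  AvgPSR avg_merge(AvgPSR a, AvgPSR b) {
      AvgPSR p = { a.sum + b.sum, (uint16_t)(a.count + b.count) };
      return p;
  }

  int32_t avg_evaluate(AvgPSR p) { return p.sum / p.count; }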
Taxonomy of Aggregates

• TAG insight: classify aggregates according to various functional properties
– Yields a general set of optimizations that can automatically be applied
– Drives an API!

Property                Examples                                     Affects
Partial State           MEDIAN: unbounded, MAX: 1 record             Effectiveness of TAG
Monotonicity            COUNT: monotonic, AVG: non-monotonic         Hypothesis testing, snooping
Exemplary vs. Summary   MAX: exemplary, COUNT: summary               Applicability of sampling, effect of loss
Duplicate Sensitivity   MIN: dup. insensitive, AVG: dup. sensitive   Routing redundancy
74
Use Multiple Parents
• Use graph structure
– Increase delivery probability with no communication overhead
• For duplicate-insensitive aggregates, or
• Aggs expressible as sum of parts (e.g. SELECT COUNT(*))
– Send (part of) aggregate to all parents
• In just one message, via multicast
– Assuming independence, decreases variance

[Diagram: node A with count c splits it as c/n to each of its n = 2 parents B and C,
which forward toward the root R]

Single parent:
  # of parents = n
  P(link xmit successful) = p
  P(success from A -> R) = p^2
  E(cnt) = c * p^2
  Var(cnt) = c^2 * p^2 * (1 - p^2) ≡ V

Splitting across n parents:
  E(cnt) = n * (c/n) * p^2 = c * p^2
  Var(cnt) = n * (c/n)^2 * p^2 * (1 - p^2) = V/n
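A quick numeric check of the formulas above (illustrative values): with c = 100 and p = 0.9, p^2 = 0.81, so E(cnt) = 81 with or without splitting; without splitting Var(cnt) = 100^2 * 0.81 * 0.19 ≈ 1539, and splitting over n = 2 parents halves it to ≈ 770, i.e. the standard deviation of the count drops from about 39 to about 28.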
75
Multiple Parents Results

• Better than previous analysis expected!
• Losses aren’t independent!
• Insight: spreads data over many links

[Charts: “No Splitting” vs. “With Splitting” topologies (no splitting funnels everything
through a single critical link); “Benefit of Result Splitting”: avg. COUNT for a COUNT
query with and without splitting (2500 nodes, lossy radio model, 6 parents per node)]
76
Acquisitional Query
Processing (ACQP)
• TinyDB acquires AND processes data
– Could generate an infinite number of samples
• An acquisitional query processor controls
– when,
– where,
– and with what frequency data is collected!
• Versus traditional systems where data is provided a priori

Madden, Franklin, Hellerstein, and Hong. The Design of an Acquisitional Query Processor. SIGMOD, 2003. 77
ACQP: What’s Different?
• How should the query be processed?
– Sampling as a first class operation
• How does the user control acquisition?
– Rates or lifetimes
– Event-based triggers
• Which nodes have relevant data?
– Index-like data structures
• Which samples should be transmitted?
– Prioritization, summary, and rate control
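For example, the ACQP work lets a user specify a lifetime instead of a sample rate and has TinyDB choose the rate; a hedged example (clause spelling per the SIGMOD 2003 paper, treat the exact syntax as approximate):

  SELECT nodeid, light
  FROM sensors
  LIFETIME 30 days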

78
Operator Ordering: Interleave Sampling + Selection

  SELECT light, mag
  FROM sensors
  WHERE pred1(mag)
    AND pred2(light)
  EPOCH DURATION 1s

• E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ
• At 1 sample / sec, total power savings could be as much as 3.5 mW
– Comparable to the processor!
• Correct ordering (unless pred1 is very selective and pred2 is not):
  sample the cheap light, apply σ(pred2), then sample the costly mag and apply σ(pred1)
• A traditional DBMS assumes both mag and light are already available and just orders
  σ(pred1) and σ(pred2); ACQP interleaves acquisition with selection
79
Exemplary Aggregate Pushdown

  SELECT WINMAX(light, 8s, 8s)
  FROM sensors
  WHERE mag > x
  EPOCH DURATION 1s

• Novel, general pushdown technique
• Mag sampling is the most expensive operation!
• Traditional DBMS plan: sample mag and light, apply σ(mag > x), then γ_WINMAX
• ACQP plan: sample light and apply σ(light > current MAX) first; only if the reading
  could change the window max is the costly mag sampled, σ(mag > x) applied, and the
  tuple fed to γ_WINMAX
80
Topics
• In-network aggregation
• Acquisitional Query Processing
• Heterogeneity
• Intermittent Connectivity
• In-network Storage
• Statistics-based summarization and sampling
• In-network Joins
• Adaptivity and Sensor Networks
• Multiple Queries

81
Heterogeneous Sensor
Networks
• Leverage small numbers of high-end nodes
to benefit large numbers of inexpensive
nodes
• Still must be transparent and ad-hoc
• Key to scalability of sensor networks
• Interesting heterogeneities
– Energy: battery vs. outlet power
– Link bandwidth: Chipcon vs. 802.11x
– Computing and storage: ATMega128 vs. Xscale
– Pre-computed results
– Sensing nodes vs. QP nodes
82
Computing Heterogeneity
with TinyDB
• Separate query processing from sensing
– Provide query processing on a small number of nodes
– Attract packets to query processors based on “service
value”
• Compare the total energy consumption of the network:
– No aggregation
– All aggregation
– Opportunistic aggregation
– HSN proactive aggregation
Mark Yarvis and York Liu, Intel’s Heterogeneous Sensor
Network Project, 83
ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf.
5x7 TinyDB/HSN Mica2
Testbed

84
Data Packet Saving

• How many aggregators are desired?
– 11% aggregators achieve 72% of the max data reduction
• Does placement matter?
– Optimal placement is about 2/3 of the distance from the sink

[Charts: % change in data packet count vs. number of aggregators (1-6, all 35), and vs.
aggregator location (node 25, 27, 29, 31, all 35)]
85
Occasionally Connected Sensornets

[Diagram: a TinyDB server reaches fixed gateways (GTWY) over the internet; mobile
gateways shuttle data between disconnected TinyDB query-processor (QP) patches and the
fixed infrastructure]

86
Occasionally Connected
Sensornets Challenges
• Networking support
– Tradeoff between reliability, power consumption
and delay
– Data custody transfer: duplicates?
– Load shedding
– Routing of mobile gateways
• Query processing
– Operation placement: in-network vs. on mobile
gateways
– Proactive pre-computation and data movement
• Tight interaction between networking and QP

Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant Networks,
http://www.intel-research.net/Publications/Berkeley/081220030852_157.pdf. 87
Distributed In-network
Storage
• Collectively, sensornets have large amounts
of in-network storage
• Good for in-network consumption or
caching
• Challenges
– Distributed indexing for fast query
dissemination
– Resilience to node or link failures
– Graceful adaptation to data skews
– Minimizing index insertion/maintenance cost

88
Example: DIM
• Functionality
– Efficient range queries for multidimensional data
• Approaches
– Divide sensor field into bins
– Locality-preserving mapping from m-d space to geographic locations
– Use geographic routing such as GPSR
• Assumptions
– Nodes know their locations and the network boundary
– No node mobility

[Diagram: events E1 = <0.7, 0.8> and E2 = <0.6, 0.7> and range query Q1 = <.5-.7, .5-1>
mapped onto bins of the sensor field]

Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong, Distributed Index for
Multi-dimensional Data (DIM) in Sensor Networks, SenSys 2003. 89
Statistical Techniques
• Approximations, summaries, and
sampling based on statistics
• Applications:
– Limited bandwidth and large number of
nodes -> data reduction
– Lossiness -> predictive modeling
– Uncertainty -> tracking correlations and
changes over time

90
IDSQ
• Idea: task sensors in order of best
improvement to estimate of some value:
– Choose leader(s)
• Suppress subordinates
• Task subordinates, one at a time
– Until some measure of goodness (error bound) is met
» E.g. “Mahalanobis Distance” -- Accounts for
correlations in axes, tends to favor minimizing
principal axis

See “Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous
Sensor Networks.” Chu, Haussecker and Zhao. Xerox TR P2001-10113. May, 2001. 91
Graphical Representation

• Model location estimate as a point with 2-dimensional Gaussian uncertainty.

[Diagram: uncertainty ellipse with its principal axis; candidate sensors S1 and S2 leave
residuals of equal area, but S2 is preferred because it reduces error along the principal
axis]
92
In-Net Regression
• Linear regression: simple way to predict future values, identify outliers
• Regression can be across local or remote values, multiple dimensions, or with high-degree polynomials
– E.g., node A’s readings vs. node B’s
– Or, location (X,Y) versus temperature
• E.g., over many nodes

[Chart: X vs. Y with curve fit; y = 0.9703x - 0.0067, R² = 0.947]

Guestrin, Thibaux, Bodik, Paskin, Madden. “Distributed Regression: an Efficient
Framework for Modeling Sensor Network Data.” Under submission. 93
In-Net Regression
(Continued)
• Problem: may require data from all sensors to
build model
• Solution: partition sensors into overlapping
“kernels” that influence each other
– Run regression in each kernel
• Requiring just local communication
– Blend data between kernels
– Requires some clever matrix manipulation
• End result: regressed model at every node
– Useful in failure detection, missing value estimation

94
Correlated Attributes
• Data in sensor networks is correlated; e.g.,
– Temperature and voltage
– Temperature and light
– Temperature and humidity
– Temperature and time of day
– etc.

95
Exploiting Correlations in
Query Processing
• Simple idea:
– Given predicate P(A) over expensive attribute A
– Replace it with P’ over cheap attribute A’ such that
P’ evaluates to P
– Problem: unless A and A’ are perfectly correlated, P’
≠ P for all time
• So we could incorrectly accept or reject some readings
• Alternative: use correlations to improve
selectivity estimates in query optimization
– Construct conditional plans that vary predicate
order based on prior observations
96
Exploiting Correlations (Cont.)
• Insight: by observing a (cheap and correlated) variable not involved in the query, it
may be possible to improve query performance
– Improves estimates of selectivities
• Use conditional plans
• Example

[Diagram: conditional plan branching on “time in [6pm, 6am]”. Each predicate costs 100 to
evaluate; with the default selectivity of .5 for both, either ordering has expected cost
150. On the true branch (night): apply Light > 100 Lux first (selectivity .1), then
Temp < 20 °C (selectivity .9), for expected cost 110. On the false branch (day): apply
Temp < 20 °C first (selectivity .1), then Light > 100 Lux (selectivity .9), also expected
cost 110.]
97
In-Network Join Strategies
• Types of joins:
– non-sensor -> sensor
– sensor -> sensor
• Optimization questions:
– Should the join be pushed down?
– If so, where should it be placed?
– What if a join table exceeds the memory
available on one node?

98
Choosing Where to Place
Operators
• Idea : choose a “join node” to run the operator
• Over time, explore other candidate placements
– Nodes advertise data rates to their neighbors
– Neighbors compute expected cost of running the
join based on these rates
– Neighbors advertise costs
– Current join node selects a new, lower cost node

Bonfils + Bonnet, Adaptive and Decentralized Operator Placement for In-Network Query Processing. IPSN 2003. 99
Topics
• In-network aggregation
• Acquisitional Query Processing
• Heterogeneity
• Intermittent Connectivity
• In-network Storage
• Statistics-based summarization and sampling
• In-network Joins
• Adaptivity and Sensor Networks
• Multiple Queries

100
Adaptivity In Sensor
Networks
• Queries are long running
• Selectivities change
– E.g. night vs day
• Network load and available energy vary
• All suggest that some adaptivity is needed
– Of data rates or granularity of aggregation when
optimizing for lifetimes
– Of operator orderings or placements when selectivities
change (c.f., conditional plans for correlations)
• As far as we know, this is an open problem!

101
Multiple Queries and Work
Sharing
• As sensornets evolve, users will run many
queries simultaneously
– E.g., traffic monitoring
• Likely that queries will be similar
– But have different end points, parameters, etc
• Would like to share processing, routing as
much as possible
• But how? Again, an open problem.

102
Concluding Remarks
• Sensor networks are an exciting emerging technology,
with a wide variety of applications

• Many research challenges in all areas of computer science


– Database community included
– Some agreement that a declarative interface is right

• TinyDB and other early work are an important first step

• But there’s lots more to be done!

103
