defense

Document sample shared by xiaoyounan; posted 12/1/2011; language: English; 59 pages
Lower Bound Techniques for Data Structures

Mihai Pătrașcu





Committee:

• Erik Demaine (advisor)

• Piotr Indyk

• Mikkel Thorup

Data Structures

I don’t study stacks, queues and binary search trees!





I do study data structure problems (a.k.a. Abstract Data Types)





Preprocess T = { n numbers }

pred(q): max { y ∈ T | y < q }

=> my work ≤ Complexity of predecessor ≤ O(lg n)/operation

“Augmented binary search trees solve partial sums”

=> my work ≤ Complexity of partial sums ≤ O(lg n)/operation



• memory of w-bit words, w > lg n bits (pointers = words)

• random access to memory

• any operation on CPU registers (arithmetic, bitwise…)

Just prove lower bound on # memory accesses

[Figure: the CPU as a “Black box” attached to an “Array Mem[1..S] of w-bit words”]
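A minimal sketch of what “just count memory accesses” means in this model. It uses hypothetical names (`ProbeCountingMemory`, `pred`, `probe_budget`) and 0-based addresses rather than the slide's Mem[1..S]. Every memory read costs one probe, and comparisons and arithmetic on the CPU side are free. Binary search over the sorted set T then uses O(lg n) probes. That is the upper bound the lower bounds are measured against.

```python
import math


class ProbeCountingMemory:
    """Mem[0..S-1] of w-bit words; every read is one probe (the only cost)."""

    def __init__(self, words, w):
        mask = (1 << w) - 1
        # Each stored value must fit in one w-bit word.
        assert all(0 <= x <= mask for x in words)
        self.w = w
        self._cells = list(words)
        self.probes = 0

    def __len__(self):
        return len(self._cells)

    def read(self, addr):
        self.probes += 1  # charge one memory access
        return self._cells[addr]


def pred(mem, q):
    """max { y in T | y < q } for sorted T stored in mem; None if no such y.

    Comparisons and index arithmetic happen "in CPU registers" and are free.
    """
    lo, hi = 0, len(mem)
    # Invariant: cells [0, lo) hold values < q, cells [hi, n) hold values >= q.
    while lo < hi:
        mid = (lo + hi) // 2
        if mem.read(mid) < q:
            lo = mid + 1
        else:
            hi = mid
    return mem.read(lo - 1) if lo > 0 else None


def probe_budget(n):
    """Probes used by pred(): at most ceil(lg(n+1)) search steps plus 1 final read."""
    return math.ceil(math.log2(n + 1)) + 1
```

For example, storing the 1000 even numbers 0, 2, …, 1998 and asking `pred(mem, 501)` returns 500 after at most 11 probes.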

Why Data Structures?

I want to understand computation.

The gospel:

• data structures L.B.: some
  → understand some nontrivial computational phenomena

• efficient algorithms
  → circuit L.B. not forthcoming

• hard optimization
  → NP-completeness
  → L.B.: one per STOC/FOCS

Other settings:

• streaming L.B.: many
  → not very “computational”, mostly storage / info theory

• space-bounded (P vs L) L.B.: a few, Ω(n √lg n)
  → unnatural questions

• algebraic L.B.: some
  → cool, but not real computing…

• depth 3 circuits with mod-6 gates
  → ??

Why Data Structures?

Weak as some of the lower bounds may be, it’s the area that has gotten farthest towards “understanding computation”.

History*

*Omitted: bounds for succinct data structures.

[Yao, FOCS’78]

[Ajtai’88] -- predecessor (static)

[Bing Xiao, Stanford’92] **

[Miltersen STOC’94]

[Miltersen, Nisan, Safra, Wigderson STOC’95]

[Beame, Fich STOC’99]

[Sen ICALP’01]

(1+ε)-nearest neighbor:

• [Chakrabarti, Chazelle, Gum, Lvov STOC’99]

• [Chakrabarti, Regev FOCS’04]

=== richness lower bounds ===

[Borodin, Ostrovsky, Rabani STOC’99] p.m.

[Barkol, Rabani STOC’00] rand. NN

[Jayram, Khot, Kumar, Rabani STOC’03] p.m.

[Liu’04] det. ANN

[Fredman, Saks’89] -- partial sums, union find (dynamic)

[Ben-Amram, Galil FOCS’91]

[Miltersen, Subramanian, Vitter, Tamassia’93]

[Husfeldt, Rauhe, Skyum’96]

[Fredman, Henzinger’98] planar connectivity

[Husfeldt, Rauhe ICALP’98] nondeterminism

[Alstrup, Husfeldt, Rauhe FOCS’98] marked ancestor

• [Alstrup, Husfeldt, Rauhe SODA’01] dynamic 2D NN

[Alstrup, Ben-Amram, Rauhe STOC’99] union-find

Observations:

• huge influence

• 2nd papers

• result wrong (better upper bound known)

• no journal version; many claims without proof

Three Main Ideas

1. Epochs: dynamic problems (partial sums, union-find, marked ancestor, planar connectivity, …)

2. Asym. Communication, Rectangles: richness lower bounds (partial match, nearest neighbor)

3. Round Elimination: predecessor search, (1+ε)-nearest neighbor


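Review: Epoch Lower Bounds

Marked ancestor: update = mark/unmark node [time $t_u$]; query = # marked ancestors? [time $t_q$]

[Figure: timeline of epochs, oldest first, ending with the query]

#updates: $r^3$, $r^2$, $r^1$, $r^0$

bits written: $t_u w \cdot r^3$, $t_u w \cdot r^2$, $t_u w \cdot r$, $t_u w$

• epoch j: $r^j$ updates

• epochs {0, …, j−1} write $O(t_u w \cdot r^{j-1})$ bits

• pick $r \gg t_u w$ => most updates from epoch j are not known outside epoch j

=> a random query needs to read a cell from epoch j

$t_q = \Omega(\lg n / \lg r) = \Omega(\lg n / \lg(t_u w))$ => $\max\{t_q, t_u\} = \Omega(\lg n / \lg\lg n)$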
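The counting behind these bullets, written out as a short derivation. The concrete choice $r = t_u w \lg n$ and the assumption $w \le \mathrm{polylog}\, n$ are ours. They only make “pick $r \gg t_u w$” specific.

```latex
% Epoch j holds r^j updates; n updates in total give about log_r n epochs.
\[
\#\text{epochs} \approx \log_r n = \frac{\lg n}{\lg r}.
\]
% Younger epochs (written after epoch j) cannot record much about epoch j:
\[
t_u w \sum_{i<j} r^{i} = O\!\left(t_u w \, r^{j-1}\right) = O\!\left(\frac{r^{j}}{\lg n}\right) = o\!\left(r^{j}\right)
\qquad \text{for } r = t_u w \lg n .
\]
% So a random query must probe at least one cell written in each epoch:
\[
t_q \ge \#\text{epochs} = \Omega\!\left(\frac{\lg n}{\lg r}\right).
\]
% If t_u > lg n we are done; otherwise, with w <= polylog n,
\[
\lg r = \lg t_u + \lg w + \lg\lg n = O(\lg\lg n)
\;\Longrightarrow\;
\max\{t_q, t_u\} = \Omega\!\left(\frac{\lg n}{\lg\lg n}\right).
\]
```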

Review: Epoch Lower Bounds

“Big Challenges” [Miltersen’99]

• prove some ω(lg n / lglg n) bound
  Candidate: Ω(lg n) for the partial sums problem

• prove ω(lg n) in the bit-probe model

See also:

[Fredman JACM ’81]
[Fredman JACM ’82]
[Yao SICOMP ’85]
[Fredman, Saks STOC ’89]
[Ben-Amram, Galil FOCS ’91]
[Hampapuram, Fredman FOCS ’93]
[Chazelle STOC ’95]
[Husfeldt, Rauhe, Skyum SWAT ’96]
[Husfeldt, Rauhe ICALP ’98]
[Alstrup, Husfeldt, Rauhe FOCS ’98]

Maintain an array A[n] under:

update(i, Δ): A[i] += Δ

sum(i): return A[0] + … + A[i]
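The O(lg n) upper bound for this problem can be sketched as below. We swap in a Fenwick (binary indexed) tree, an implicit balanced-tree structure, instead of an explicit augmented binary search tree. Both do O(lg n) work per operation. The class name `FenwickTree` is ours.

```python
class FenwickTree:
    """Partial sums over A[0..n-1]: update(i, delta) and prefix sum(i), O(lg n) each."""

    def __init__(self, n):
        self.n = n
        self._tree = [0] * (n + 1)  # 1-indexed; _tree[j] sums A over a block ending at j-1

    def update(self, i, delta):
        """A[i] += delta."""
        j = i + 1
        while j <= self.n:
            self._tree[j] += delta
            j += j & -j  # climb to the next block that also covers index i

    def sum(self, i):
        """Return A[0] + ... + A[i]."""
        total = 0
        j = i + 1
        while j > 0:
            total += self._tree[j]
            j -= j & -j  # drop the lowest set bit: move to the preceding block
        return total
```

Each loop touches at most ⌈lg(n+1)⌉ entries, matching the O(lg n)/operation bound that the Ω(lg n) results show is optimal.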

Our contribution

[P., Demaine SODA’04] Ω(lg n) for partial sums

[P., Demaine STOC’04] Ω(lg n) for dynamic trees, etc.

* very simple proof

* not based on epochs

[P., Tarniţă ICALP’05] Ω(lg n) via epoch argument!!

=> $\Omega(\lg^2 n / \lg^2 \lg n)$ in the bit-probe model

Ω(lg n) via Epoch Arguments?

[Figure: epoch j highlighted on the timeline]

Old: information about epoch j outside j

≤ #cells written by epochs {0, …, j−1}

≤ $O(t_u \cdot r^{j-1})$

New: information about epoch j outside j

≤ #cells read by epochs {0, …, j−1} from epoch j

still ≤ $O(t_u \cdot r^{j-1})$ in the worst case

Foil worst-case by randomizing epoch construction!

#cells read by epochs {0, …, j−1} from epoch j

≤ $O((t_u / \#\text{epochs}) \cdot r^{j-1})$ on average

=> $\max\{t_u, t_q\} = \Omega(\lg n)$

The “Very Simple Ω(lg n) Proof”

Maintain an array A[n] under:

update(i, Δ): A[i] += Δ

sum(i): return A[0] + … + A[i]

The hard instance:

π = random permutation
for t = 1 to n:
    query: sum(π(t))
    Δt = rand()
    update(π(t), Δt)

[Figure: the values Δ1, …, Δ16 written at positions π(1), …, π(16) of A, plotted against time]
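A minimal sketch of the hard instance as an operation generator. It assumes 0-based time and indices, and uses `rng.getrandbits(word_bits)` to stand in for rand(). The names `hard_instance` and `run_naive` are hypothetical. The naive runner only serves as a reference for the answers. It is not a competitive data structure.

```python
import random


def hard_instance(n, word_bits=32, seed=None):
    """Operation sequence for the 'very simple' Omega(lg n) proof.

    pi is a uniformly random permutation of 0..n-1; at time t we first
    query sum(pi[t]) and then write a fresh random word at pi[t].
    Returns a list of ("sum", i) and ("update", i, delta) tuples.
    """
    rng = random.Random(seed)
    pi = list(range(n))
    rng.shuffle(pi)
    ops = []
    for t in range(n):
        ops.append(("sum", pi[t]))
        delta = rng.getrandbits(word_bits)  # uniform random word: incompressible
        ops.append(("update", pi[t], delta))
    return ops


def run_naive(ops, n):
    """Reference answers using a plain array (O(n) per query)."""
    a = [0] * n
    answers = []
    for op in ops:
        if op[0] == "sum":
            answers.append(sum(a[: op[1] + 1]))
        else:
            _, i, delta = op
            a[i] += delta
    return answers
```

Because π is a permutation, every index is updated exactly once, so `A[i] = Δ` and `A[i] += Δ` behave identically on this instance.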

[Figure: the same updates Δ1, …, Δ17 plotted against time]

How can Mac help PC run t = 9,…,12?

Communication ≈ # memory locations

* read during t = 9,…,12

* written during t = 5,…,8

[Figure: the queries at t = 9,…,12 return prefix sums (e.g. Δ1+Δ5+Δ3) that involve the updates Δ5, Δ7, Δ8 made at t = 5,…,8]

How much information needs to be transferred?

At least Δ5, Δ5+Δ7, Δ5+Δ7+Δ8, i.e. at least 3 words

(random values are incompressible)

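The general principle

Lower bound = # down arrows

(down arrow: in sorted order of indices, an update from the earlier period immediately followed by a query from the later period)

How many down arrows? (in expectation)

With k operations in each period, there are 2k−1 adjacent pairs:

E[# down arrows] = (2k−1) ∙ Pr[left one is an update] ∙ Pr[right one is a query] ≈ (2k−1) ∙ ½ ∙ ½ = Ω(k)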
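A small exact check of this expectation, under our reading that a “down arrow” is an adjacent update-then-query pair in sorted index order. The names `down_arrows` and `expected_down_arrows` are hypothetical. Enumerating all arrangements gives exactly k/2, not the slide's approximate (2k−1)/4. The slide treats the two labels as independent, but both values are Ω(k).

```python
from fractions import Fraction
from itertools import combinations


def down_arrows(labels):
    """Count adjacent pairs (in sorted index order) of the form update -> query.

    labels[p] is "U" if the p-th smallest index belongs to an update from the
    earlier period and "Q" if it belongs to a query from the later period.
    """
    return sum(1 for a, b in zip(labels, labels[1:]) if a == "U" and b == "Q")


def expected_down_arrows(k):
    """Exact E[# down arrows] when k U's and k Q's are arranged uniformly at random."""
    total = 0
    count = 0
    for u_positions in combinations(range(2 * k), k):
        chosen = set(u_positions)
        labels = ["U" if p in chosen else "Q" for p in range(2 * k)]
        total += down_arrows(labels)
        count += 1
    return Fraction(total, count)
```

The closed form follows from linearity: (2k−1) ∙ (k/2k) ∙ (k/(2k−1)) = k/2.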

Recap

Communication = # memory locations

* read during pink period

* written during yellow period

Communication between periods of k items = Ω(k)

=> # memory locations read during pink period and written during yellow period = Ω(k)

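Putting it all together

Every load instruction is counted once,

@ lowest_common_ancestor(write time, read time)

[Figure: binary tree over the time axis]

Each tree node with k operations in each half contributes Ω(k); summed over one level of the tree that is Ω(n), and over all lg n levels Ω(n lg n), i.e. Ω(lg n) per operation.

Q.E.D.

⇒ Augmented binary search trees are optimal.

⇒ First “Ω(lg n)” for any dynamic data structure.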
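A sketch of the charging scheme, assuming n is a power of two and time steps are the leaves 0..n−1 of a perfect binary tree. The names `lca_node` and `tree_lower_bound` are hypothetical. The first function finds the node where a read is charged. The second sums the per-node expectation k/2 (the exact value for k updates versus k queries) over the whole tree. The result is (n/4) ∙ lg n, i.e. Ω(lg n) per operation, up to the constant hidden in “Communication ≈ # memory locations”.

```python
from fractions import Fraction


def lca_node(write_time, read_time):
    """Tree node that separates the two times, as (height, index).

    The write lies in the node's left subtree (yellow period) and the read in
    its right subtree (pink period), so each read is charged to exactly one node.
    """
    assert 0 <= write_time < read_time
    height = (write_time ^ read_time).bit_length()  # highest bit where they differ
    return height, read_time >> height


def tree_lower_bound(n):
    """Sum over all nodes of E[# down arrows] = k/2, k = operations per half."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    total = Fraction(0)
    height = 1
    while (1 << height) <= n:
        nodes = n >> height           # nodes at this height
        k = 1 << (height - 1)         # operations in each half of such a node
        total += nodes * Fraction(k, 2)  # contributes n/4 per level
        height += 1
    return total
```

For n = 1024 the sum is 2560 = (1024/4) ∙ 10.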


Review: Communication Complexity

[Figure: the CPU (Alice) runs query(a,b,c) against the database in memory (Bob), space S; each probe sends a lg S-bit address and receives a w-bit word]

Traditional communication complexity:

“total #bits communicated ≥ X”

=> tq∙(lg S + w) ≥ X => tq = Ω(X/w)

But wait! X ≤ CPU input ≤ O(w), so this only gives tq = Ω(1).

Asymmetric communication complexity:

“either Alice sends A bits or Bob sends B bits”

=> either tq∙lg S ≥ A or tq∙w ≥ B

=> tq ≥ min{A/lg S, B/w}
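The two implications above, as a small calculator sketch. The function names are hypothetical, and A, B, S, w are the quantities from the slide. The second function reads the bound the other way. When Bob's side cannot carry it (tq∙w < B), Alice's side forces tq∙lg S ≥ A, i.e. S ≥ 2^(A/tq). This is the shape of the space lower bound S = 2^(Ω(d/tq)) discussed next.

```python
import math


def asymmetric_query_bound(A, B, S, w):
    """Probes needed if "Alice sends >= A bits or Bob sends >= B bits".

    Each probe costs Alice (CPU) lg S bits for the address and Bob (memory)
    w bits for the reply, so t_q >= min{A / lg S, B / w}.
    """
    return min(A / math.log2(S), B / w)


def space_needed(A, B, w, t_q):
    """Minimum space S for a t_q-probe scheme (1 means no constraint).

    If t_q * w >= B, Bob's side can carry the bound and S is unconstrained;
    otherwise t_q * lg S >= A, i.e. S >= 2^(A / t_q).
    """
    if t_q * w >= B:
        return 1
    return 2 ** (A / t_q)
```

For instance, with A = 100, B = 10^6, w = 64, a 10-probe scheme needs at least 2^10 cells.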

Richness Lower Bounds

Prove: “either Alice sends A bits or Bob sends B bits”

Assume Alice sends o(A), Bob sends o(B)

=> big monochromatic rectangle: a $1/2^{o(A)}$ fraction of Alice’s inputs × a $1/2^{o(B)}$ fraction of Bob’s inputs

Show any big rectangle is bichromatic

(standard idea in comm. complex.)

Example: Alice --> q ∈ $\{0,1\}^d$   Bob --> S = n points in $\{0,1\}^d$

Goal: find $\arg\min_{x \in S} \|x - q\|_2$

[Barkol, Rabani] A = Ω(d), B = $\Omega(n^{1-\varepsilon})$

=> tq ≥ min{d/lg S, $n^{1-\varepsilon}$/w}

Richness Lower Bounds

What does this really mean?

[Figure: query time tq versus space S, from Θ(n) to $2^{\Theta(d)}$; the lower bound curve is $S = 2^{\Omega(d/t_q)}$]

upper bound ≈ either:

• exponential space

• near-linear query time

“optimal space lower bound for constant query time”

Also: optimal lower bound for decision trees

Results

Partial match -- database of n strings in $\{0,1\}^d$, query ∈ $\{0,1,*\}^d$

[Borodin, Ostrovsky, Rabani STOC’99]

[Jayram, Khot, Kumar, Rabani STOC’03] A = Ω(d/lg n)

[P. FOCS’08] A = Ω(d)

Nearest Neighbor on hypercube (ℓ1, ℓ2):

deterministic γ-approximate: [Liu’04] A = $\Omega(d/\gamma^2)$

randomized exact: [Barkol, Rabani STOC’00] A = Ω(d)

rand. (1+ε)-approx: [Andoni, Indyk, P. FOCS’06] A = $\Omega(\varepsilon^{-2} \lg n)$

“Johnson-Lindenstrauss space is optimal!”

Approximate Nearest Neighbor in ℓ∞:

[Andoni, Croitoru, P. FOCS’08] “[Indyk FOCS’98] is optimal!”

Limits of Communication Approach

[Figure: query time tq versus space S, from Θ(n) to $2^{\Theta(d)}$, with a curve labelled “branching programs”]

Implication of richness lower bound undervalued!

“Alice must send Ω(A) bits” => tq = Ω(A / lg S)

No separation between S = O(n) and S = $n^{O(1)}$!

Separation of Ω(lg n / lglg n) between S = O(n) and S = $n^{O(1)}$!

Richness Gets You More

CPU(s) --> memory communication:

• one query: lg S

• k queries: $\lg\binom{S}{k} = \Theta\!\left(k \lg \frac{S}{k}\right)$

[Figure: k CPUs, one per problem copy Prob.1, Prob.2, …, Prob.k, sharing one memory]

Any richness lower bound

“Alice must send A or Bob must send B”

===>

k∙Alice must send k∙A

or k∙Bob must send k∙B

On Alice’s side, each round of probes carries $\Theta(k \lg \frac{S}{k})$ bits, so $t_q \cdot \Theta(k \lg \frac{S}{k}) \ge k \cdot A$, i.e.

tq = Ω(A / lg(S/k))
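A numeric sketch of why k parallel queries help, following the slide's accounting that one round of k probes costs $\lg\binom{S}{k}$ bits. `log2_binom` and `direct_sum_query_bound` are hypothetical names. The standard bounds $(S/k)^k \le \binom{S}{k} \le (eS/k)^k$ pin the cost between $k \lg(S/k)$ and $k \lg(eS/k)$. So the per-query bound becomes roughly $A/\lg(S/k)$ instead of $A/\lg S$.

```python
import math


def log2_binom(S, k):
    """lg C(S, k), computed with log-gamma to avoid huge integers."""
    return (math.lgamma(S + 1) - math.lgamma(k + 1) - math.lgamma(S - k + 1)) / math.log(2)


def direct_sum_query_bound(A, S, k):
    """t_q >= k*A / lg C(S, k): k CPUs must send k*A bits in total,
    and (per the slide's accounting) each round of probes carries lg C(S, k) bits."""
    return k * A / log2_binom(S, k)
```

With S = 2^20 cells and k = 2^16 queries, lg(S/k) = 4, so the bound for A = 100 lies between 100/lg(16e) ≈ 18.4 and 100/4 = 25, versus 100/20 = 5 for a single query.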

Three Main Ideas

1. Epochs

2. Asym. Communication, Rectangles

3. Round Elimination

4. Range Queries

Open Hunting Season

Nice trick, but “Ω(lg n / lglg n) with O(n polylg n) space”

is not an impressive argument for the “curse of dimensionality”.

But space $n^{1+o(1)}$ is hugely important in data structures

=> open hunting season for range queries etc.

2D range counting

SELECT count(*)
FROM employees
WHERE salary … y ?

[Figure: employees plotted by salary, axis ticks 69000, 70000, 71000]

The Power of Reductions

2D stabbing

Preprocess S = {n rectangles}

• stab(x,y): is (x,y) inside some R ∈ S?

reachability oracles in butterfly graph

Preprocess G = subgraph of butterfly

• reachable(x,y): is there a path x->y ?

Lopsided Set Disjointness

Alice: set S   Bob: set T

“are S and T disjoint?”

Hint:

S = {one edge out of every node} => n queries from 1st to last level

T = {deleted edges}

S disjoint from T => all queries “yes”
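A minimal sketch of reachability queries from the first to the last level of a butterfly subgraph. It assumes a layout of our own: nodes are (level, row) with rows in [0, 2^d), and from (ℓ, r) there is a straight edge to (ℓ+1, r) and a cross edge to (ℓ+1, r xor 2^ℓ). The helper names are hypothetical. In the full butterfly, each first-level/last-level pair has exactly one path. So in G = butterfly minus the deleted edges T, a query answers “yes” iff none of its path's edges lies in T. That is why S disjoint from T makes all queries “yes”.

```python
def butterfly_path(x, y, d):
    """Edges (level, row_from, row_to) of the unique path from (0, x) to (d, y).

    Step l sets bit l of the current row to bit l of y: a straight edge if the
    bit already matches, a cross edge otherwise.
    """
    edges = []
    row = x
    for level in range(d):
        nxt = (row & ~(1 << level)) | (y & (1 << level))
        edges.append((level, row, nxt))
        row = nxt
    return edges


def reachable(x, y, d, deleted):
    """Is there a path from (0, x) to (d, y) in the butterfly minus `deleted`?"""
    return all(edge not in deleted for edge in butterfly_path(x, y, d))
```

For d = 3, the path from row 0 to row 5 uses edges (0, 0, 1), (1, 1, 1), (2, 1, 5). Deleting (1, 1, 1) cuts it, while deleting the unrelated edge (1, 0, 0) does not.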

Reachability in Butterfly??

marked ancestor problem

update(node): (un)mark node

query(leaf): any marked ancestor?

[Figure: reduction tree from [P. FOCS’08], rooted at lopsided set disjointness (LSD); problems shown: reachability oracles in the butterfly; partial match; (1+ε)-ANN in ℓ1, ℓ2; NN in ℓ1, ℓ2; 3-ANN in ℓ∞; 2D stabbing; 4D reporting; 2D counting; dyn. marked ancestor; worst-case union-find; dyn. trees, graphs; dyn. 1D stabbing; partial sums; dyn. 2D reporting; dyn. NN in 2D]


Packet Forwarding/ Predecessor Search

Preprocess n prefixes of ≤ w bits:

• make a hash-table H with all prefixes of prefixes

• |H| = O(n∙w), can be reduced to O(n)

Given w-bit IP, find longest matching prefix:

• binary search for longest ℓ such that IP[0:ℓ] ∈ H, i.e. O(lg w) hash lookups

[van Emde Boas FOCS’75]

[Waldvogel, Varghese, Turner, Plattner SIGCOMM’97]

[Degermark, Brodnik, Carlsson, Pink SIGCOMM’97]

[Afek, Bremler-Barr, Har-Peled SIGCOMM’99]
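A sketch of the lookup, assuming prefixes and IPs are given as bit strings of '0'/'1' characters and that H is a Python dict. The function names are hypothetical. One detail the slide leaves implicit is handled here by an assumption of ours, in the spirit of the precomputed markers in Waldvogel et al. Each entry of H stores the longest stored prefix that is a prefix of it, so the binary search can return an actual stored prefix. The search is valid because H is closed under taking prefixes, so the set of ℓ with IP[0:ℓ] ∈ H is downward closed.

```python
def build_table(prefixes):
    """H: every prefix of a stored prefix -> longest stored prefix that is a prefix of it."""
    stored = set(prefixes)
    table = {}
    for p in stored:
        for length in range(len(p) + 1):
            q = p[:length]
            if q in table:
                continue
            best = None
            for j in range(len(q), -1, -1):  # longest stored prefix of q, if any
                if q[:j] in stored:
                    best = q[:j]
                    break
            table[q] = best
    return table


def longest_prefix_match(table, ip):
    """Binary search for the longest l with ip[:l] in H: O(lg w) hash lookups."""
    lo, hi = 0, len(ip)
    # Invariant: ip[:lo] is in H (when H is nonempty); no l > hi has ip[:l] in H.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ip[:mid] in table:
            lo = mid
        else:
            hi = mid - 1
    return table.get(ip[:lo])
```

With stored prefixes {1, 101, 11}, the IP 1001 stops at ℓ = 2 (“10” ∈ H, “100” ∉ H) and returns the annotation of “10”, namely “1”.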

Review: Round Elimination

[Figure: the query split into hi and lo halves; subproblems 1, 2, …, k]

“I want to talk to Alice i”

0: continue searching for pred(hi)

1: continue searching for pred(lo)

Message of o(k) bits has negligible info about the typical i

=> can be eliminated for fixed i

The Lemma

Observe: can’t work worst-case!

Traditional fix: introduce 2-sided error

Think outside the box:

• easy proof with a different error model [P.-Thorup, STOC’06]

The Model

• Alice, Bob receive inputs

• they may reject inputs

• if they accept, they start communicating

and must produce a correct output



The point: error probability ½ is trivial

reject probability 0.99999 is still hard



“We regret to inform you that your input has not been accepted for

communication. We receive a large number of inputs, many of them of high

quality, and scheduling constraints unfortunately make it impossible to accept

all of them.”

The Proof

Trie for Alice’s input (x1, …, xk):

• leaves = message sent

• node = set of msgs in subtree

[Figure: trie with levels x1, x2, x3; leaves labelled m1 m2 m2 m1 m3 m2 m1 m3 m1; internal nodes labelled with sets such as {m1, m2, m3} and {m1, m2}]

Say msg size is m = k/2:

• |leaf| = 1, |root| ≤ $2^{k/2}$ => (∀) root-to-leaf path, ½ of nodes have |node| ≥ ½|parent|

• averaging over node-child pairs => (∃) node: ½ its children have |child| > ½|node|
  (fix i, x1, …, x_{i−1})

• thus (∃) msg M: ¼ of children have M ∈ child

=> fix message M; reject ¾ of inputs (x_i) (eliminate)
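The two counting steps above, written out under the slide's assumption m = k/2. The notation v_0, …, v_k for the nodes on a path and h for the number of halving steps is ours.

```latex
% Along any root-to-leaf path v_0 (root), v_1, ..., v_k (leaf) of the trie:
\[
|v_0| \le 2^{m} = 2^{k/2}, \qquad |v_k| = 1, \qquad |v_i| \le |v_{i-1}|.
\]
% Call step i "halving" if |v_i| < |v_{i-1}|/2; let h be the number of halving steps.
\[
1 = |v_k| \le 2^{-h}\,|v_0| \le 2^{k/2-h}
\;\Longrightarrow\; h \le k/2,
\]
% so at least k/2 of the k steps have |v_i| >= |v_{i-1}|/2.
% At a node v where at least half of the children c have |c| > |v|/2,
% pick a message M uniformly from v (each child's set is a subset of v's):
\[
\mathbb{E}_{M}\!\left[\Pr_{c}[M \in c]\right]
= \mathbb{E}_{c}\!\left[\frac{|c|}{|v|}\right]
> \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4},
\]
% hence some fixed message M lies in at least a quarter of the children.
```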

Predecessor Search: Timeline

after [van Emde Boas FOCS’75]

… O(lg w) has to be tight!

[Beame, Fich STOC’99]

slightly better bound with $O(n^2)$ space

… must improve the algorithm for O(n) space!



[P., Thorup STOC’06]

tight Ω(lg w) for space O(n polylg n) !



Idea: * consider multiple queries

* prove round elimination under direct sum

[Figure: three rows of problem copies 1, 2, …, k, labelled in turn “I want to talk to Alice 2”, “I want to talk to Alice 1”, “I want to talk to Alice 2”]

Round Eliminated!

The End









…or Champagne?

The Partial Sums Problem

Textbook solution: “augmented” binary search trees

Running time: O(lg n) / operation

Maintain an array A[n] under:

update(i, Δ): A[i] += Δ

sum(i): return A[0] + … + A[i]


