Tight Lower Bounds for the Distinct Elements Problem

       David Woodruff
            MIT
      dpwood@mit.edu

  Joint work with Piotr Indyk
       The Problem


      4     3     7     3     1     1     0    …
•   Stream of elements a_1, …, a_n each in {1, …, m}
•   Want F0 = # of distinct elements
•   Elements in adversarial order
•   Algorithms given one pass over stream
•   Goal: Minimum-space algorithm
          A Trivial Algorithm

      4     3    7    3     1    1     0    …
    00000000         →          10011011
• Keep m-bit characteristic vector v of stream
   • j in stream ↔ v_j = 1
• F0 = wt(10011011) = 5
• Space = m
                Can we do better?
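
The trivial algorithm can be sketched in a few lines. This is a minimal illustration, assuming elements are used directly as 0-based indices into the vector (the example stream contains 0); the function name is hypothetical.

```python
def distinct_elements_bitvector(stream, m):
    """Exact F0 using an m-bit characteristic vector (the trivial algorithm)."""
    v = [0] * m                  # v[j] == 1 iff j has appeared in the stream
    for a in stream:
        v[a] = 1
    return sum(v), v             # F0 = Hamming weight of v


# Example stream from the slide; elements are treated as indices 0..m-1 here.
f0, v = distinct_elements_bitvector([4, 3, 7, 3, 1, 1, 0], m=8)
print(f0)                                     # 5
print("".join(str(b) for b in reversed(v)))   # 10011011 (position 7 on the left)
```

The space is m bits no matter how few distinct elements appear, which is what the lower bounds below are measured against.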
     Negative Results

• Any algorithm computing F0 exactly must
  use Ω(m) space [AMS96]
• Any deterministic alg. that outputs x with
  |F0 – x| < εF0 must use Ω(m) space
  [AMS96]
• What about randomized approximation
  algorithms?
    Rand. Approx. Algorithms for F0
• O((log log m)/ε^2 + log m log(1/ε)) space alg. outputs x with
          Pr[|F0 – x| < εF0] > ¾ [BJKST02]
  • Lots of hashing tricks

                     Is this optimal?

• Previous lower bounds
   • Ω(log m) [AMS96]
   • Ω(1/ε) [Bar-Yossef]

• Open Problem of [BJKST02]: GAP: 1/ε << 1/ε^2
       Idea Behind Lower Bounds
               Alice                              Bob
         x ∈ {0,1}^m                         y ∈ {0,1}^m

         Stream s(x)                         Stream s(y)
                                 S
         (1 ± ε) F0          Internal        (1 ± ε) F0
         algorithm A    →   state of A   →   algorithm A


• Compute (1 ± ε) F0(s(x) ∘ s(y)) w.p. > ¾
• Idea: If this lets Bob decide f(x,y) w.p. > ¾, space used
  by A is at least f’s rand. 1-way comm. complexity
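
A minimal sketch of how a one-pass algorithm yields a one-way protocol: Alice runs A on s(x), sends A's internal state, and Bob resumes A on s(y). The class `ExactF0` is a hypothetical stand-in that stores every element seen; a real (1 ± ε) algorithm would keep a much smaller state, and that state size is what the communication lower bound constrains.

```python
import json


class ExactF0:
    """Stand-in for algorithm A: exact, so it uses far more state than a real sketch."""

    def __init__(self, seen=()):
        self.seen = set(seen)

    def process(self, stream):
        for a in stream:
            self.seen.add(a)

    def estimate(self):
        return len(self.seen)

    def state(self):
        return json.dumps(sorted(self.seen))   # serialized internal state S


def stream_of(bits):
    """One stream whose characteristic vector is `bits`: the indices of its 1s."""
    return [j for j, b in enumerate(bits) if b]


def alice(x):
    A = ExactF0()
    A.process(stream_of(x))
    return A.state()                           # the only message, Alice -> Bob


def bob(message, y):
    A = ExactF0(json.loads(message))           # resume A from Alice's state
    A.process(stream_of(y))
    return A.estimate()                        # F0(s(x) ∘ s(y))


x = [1, 1, 0, 0, 1, 0]
y = [0, 1, 1, 0, 0, 0]
print(bob(alice(x), y))                        # 4
```

The message length equals the size of A's state, so any lower bound on one-way communication is a lower bound on space.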
    Randomized 1-way comm. complexity
• Boolean function f: X × Y → {0,1}
• Alice has x ∈ X, Bob y ∈ Y. Bob wants f(x,y)
• Only 1 message sent: must be from Alice to Bob
• Comm. cost of protocol = expected length of
  longest message sent over all inputs.
• δ-error randomized 1-way comm. complexity
  of f, R_δ(f), is comm. cost of optimal protocol
  computing f w.p. ≥ 1-δ

• How do we lower bound R_δ(f)?
 The VC Dimension [KNR]
• F = {f : X → {0,1}} family of Boolean
  functions
• f ∈ F is length-|X| “bit string”
• For S ⊆ X, shatter coefficient SC(f_S) of S
  is |{f|_S}_{f ∈ F}| = # distinct bit strings when F
  restricted to S
• SC(F, p) = max_{S ⊆ X, |S| = p} SC(f_S)
• If SC(f_S) = 2^|S|, S shattered by F
• VC Dimension of F, VCD(F), = size of
  largest S shattered by F
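
These definitions can be checked by brute force on a small family. The sketch below is illustrative only: the function names and the toy family of threshold functions are assumptions, not part of the talk.

```python
from itertools import combinations


def restrictions(family, S):
    """Distinct bit strings obtained by restricting each f in the family to S."""
    return {tuple(f[s] for s in S) for f in family}


def shatter_coefficient(family, domain_size, p):
    """SC(F, p): max over S with |S| = p of the number of distinct restrictions."""
    return max(len(restrictions(family, S))
               for S in combinations(range(domain_size), p))


def vc_dimension(family, domain_size):
    """Size of the largest S on which all 2^|S| bit strings appear."""
    vcd = 0
    for p in range(1, domain_size + 1):
        if shatter_coefficient(family, domain_size, p) == 2 ** p:
            vcd = p
    return vcd


# Hypothetical toy family: threshold functions on X = {0, 1, 2, 3},
# each written as its length-|X| bit string.
thresholds = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1)]
print(shatter_coefficient(thresholds, 4, 2))   # 3
print(vc_dimension(thresholds, 4))             # 1
```

No pair of points is shattered because the pattern (1, 0) never occurs for a threshold, so the VC dimension is 1 while SC(F, 2) = 3.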
Shatter Coefficient Theorem

• Notation: For f: X × Y → {0,1}, define:
    f_X = { f_x : Y → {0,1} | x ∈ X },
     where f_x(y) = f(x,y)


• Theorem [BJKS]: For every
  f: X × Y → {0,1}, every p ≥ VCD(f_X),
    R_{1/4}(f) = Ω(log(SC(f_X, p)))
      The Ω(1/ε) Lower Bound [Bar-Yossef]

• Alice has x ∈_R {0,1}^m, wt(x) = m/2
• Bob has y ∈_R {0,1}^m, wt(y) = εm and:
   • Either wt(x ∧ y) = 0      OR    wt(x ∧ y) = εm
        f(x,y) = 0                      f(x,y) = 1
• R_{1/4}(f) = Ω(VCD(f_X)) = Ω(1/ε) [Bar-Yossef]
• s(x), s(y) any streams w/ char. vectors x, y
   • f(x,y) = 1 → F0(s(x) ∘ s(y)) = m/2
   • f(x,y) = 0 → F0(s(x) ∘ s(y)) = m/2 + εm
   • (1+ε’)m/2 < (1 - ε’)(m/2 + εm) for ε’ = Θ(ε)
• Hence, can decide f → F0 alg. uses Ω(1/ε) space
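
The separation condition in the last sub-bullet can be checked with exact arithmetic. A small sketch, assuming illustrative values m = 1000 and ε = 1/10 (not from the talk); rearranging the inequality shows it holds exactly when ε’ < ε/(1+ε).

```python
from fractions import Fraction


def cases_separated(m, eps, eps_prime):
    """True if a (1 ± eps') estimate cannot confuse F0 = m/2 with F0 = m/2 + eps*m."""
    low_case_max = (1 + eps_prime) * Fraction(m, 2)               # largest estimate when f = 1
    high_case_min = (1 - eps_prime) * (Fraction(m, 2) + eps * m)  # smallest estimate when f = 0
    return low_case_max < high_case_min


eps = Fraction(1, 10)
print(cases_separated(1000, eps, eps / 2))   # True:  eps' = 1/20 < eps/(1+eps) = 1/11
print(cases_separated(1000, eps, eps))       # False: eps' = 1/10 > 1/11
```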
   Our Results
• Remainder of talk: Ω(1/ε^2) lower
bound for ε = Ω(m^{-1/(9+k)}) for any k > 0.

  → O((log log m)/ε^2 + log m log(1/ε))
  upper bound almost optimal

  IDEA: Reduce from protocol for
  computing dot product
        The Promise Problem
 • t = Θ(1/ε^2), Y = basis of unit vectors of R^t
                  Alice                 Bob
  x ∈ [0,1]^t                                  y ∈ Y
  ||x|| = 1

                   Promise Problem Π:
     ⟨x,y⟩ = 0                           ⟨x,y⟩ = 2/t^{1/2}
     f(x,y) = 0             OR            f(x,y) = 1


• X = {x ∈ [0,1]^t : ||x|| = 1 and ∃ y ∈ Y s.t. (x,y) ∈ Π}
• We lower bound R_{1/4}(f) via SC(f_X, t)
     Bounding SC(f_X, t)
•   Theorem: SC(f_X, t/4) = 2^{Ω(t)}
•   Proof:
1.  ∀ T ⊂ Y s.t. |T| = t/4, put x_T = (2/t^{1/2}) · Σ_{e ∈ T} e
2.  Define X_1 ⊂ X as X_1 = {x_T | T ⊂ Y, |T| = t/4}
3.  Claim: ∀ s ∈ {0,1}^t w/ wt(s) = t/4, s ∈ truth tab. of f_{X_1}
4.  Proof:
   1. Let s ∈ {0,1}^t with 1s in positions i_1, …, i_{t/4}
   2. Put T = {e_{i_1}, …, e_{i_{t/4}}}. ∀ e ∈ T, ⟨e, x_T⟩ = 2/t^{1/2} = Θ(ε)
   3. ∀ e ∈ Y - T, ⟨e, x_T⟩ = 0
5. There are 2^{Ω(t)} such s.
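
The construction in the proof can be enumerated for a small t. This is only a sanity check under assumed values (t = 8, helper names hypothetical): every weight-t/4 bit string appears as a row of the truth table, so the number of rows is C(t, t/4), which grows as 2^{Ω(t)}.

```python
from itertools import combinations
from math import comb, isclose, sqrt


def x_T(T, t):
    """x_T = (2 / sqrt(t)) * sum of the unit vectors e_i, i in T."""
    return [2 / sqrt(t) if i in T else 0.0 for i in range(t)]


def truth_table_row(x, t):
    """Row of f_x over Y = {e_0, ..., e_{t-1}}: <x, e_i> is just coordinate i of x."""
    return tuple(1 if isclose(x[i], 2 / sqrt(t)) else 0 for i in range(t))


t = 8                                           # small illustrative value, divisible by 4
rows = set()
for T in combinations(range(t), t // 4):
    x = x_T(set(T), t)
    assert isclose(sum(c * c for c in x), 1.0)  # ||x_T|| = 1, so x_T is a valid input
    rows.add(truth_table_row(x, t))

print(len(rows), comb(t, t // 4))               # 28 28
```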
  Bounding R_{1/4}(f)
• Corollary: R_{1/4}(f) = Ω(t) = Ω(1/ε^2)



• Reduction: we need protocol computing
  f with communication = space used by
  any (1 ± ε) F0 approx. alg.
     Reduction
•   Recall:
      • ⟨x,y⟩ = 0 if f(x,y) = 0
      • ⟨x,y⟩ = 2/t^{1/2} if f(x,y) = 1

•   Goal: Reduce “separation” of ⟨x,y⟩ to separation of
    F0(s(x) ∘ s(y)) for streams s(x), s(y) Alice/Bob can
    derive from x,y

•    Use relation: ||y-x||^2 = ||y||^2 + ||x||^2 – 2⟨x,y⟩
    • f(x,y) = 0 → ||y-x|| = 2^{1/2}
    • f(x,y) = 1 → ||y-x|| < 2^{1/2} (1 - 1/t^{1/2}) = 2^{1/2} (1 - Θ(ε))
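
Written out, the two cases follow from ||x|| = ||y|| = 1. The last step uses the elementary inequality √(1 - a) < 1 - a/2 for 0 < a ≤ 1 with a = 2/√t, which assumes t ≥ 4:

```latex
\begin{align*}
\|y - x\|^2 &= \|y\|^2 + \|x\|^2 - 2\langle x, y\rangle = 2 - 2\langle x, y\rangle, \\
f(x,y) = 0:\quad \|y - x\|^2 &= 2
  \;\Rightarrow\; \|y - x\| = \sqrt{2}, \\
f(x,y) = 1:\quad \|y - x\|^2 &= 2 - \frac{4}{\sqrt{t}} = 2\Bigl(1 - \frac{2}{\sqrt{t}}\Bigr)
  \;\Rightarrow\; \|y - x\| = \sqrt{2}\,\sqrt{1 - \tfrac{2}{\sqrt{t}}}
  < \sqrt{2}\Bigl(1 - \frac{1}{\sqrt{t}}\Bigr).
\end{align*}
```

Since t = Θ(1/ε^2), the relative gap 1/√t between the two distances is Θ(ε).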
               Overview of Reduction
  x ∈ [0,1]^t                                          y ∈ E
  ||x|| = 1

     φ(x)       1. Low-distortion embedding             φ(y)
                         φ: l_2^t → l_1^{poly(t)}
                2. Rational Approximation

                3. Scale rationals to integers
                4. Convert integer coords to unary
       x’                                                 y’
                   to get {0,1} vectors x’, y’
 s(x’)                                                      s(y’)
     F0 Alg      State                                 F0 Alg
F0(s(x’) ∘ s(y’)) can decide f(x,y) w.p. ≥ 3/4     F0(s(x’) ∘ s(y’))
Embedding l_2^t into l_1^{poly(t)}
• A (1+η)-distortion embedding φ: l_2^t → l_1^d
  is mapping s.t. ∀ p,q ∈ l_2^t,

      ||p - q||_2 ≤ ||φ(p) - φ(q)||_1 ≤ (1+η) ||p - q||_2

• Theorem [FLM77]: ∀ η ∃ a (1+η)-distortion
  embedding φ: l_2^t → l_1^d with:

      d = O(t · (log 1/η) / η^2)
      Embedding l_2^t into l_1^d
 x ∈ [0,1]^t                                y ∈ E
 ||x|| = 1             Low-distortion
                        embedding
    φ(x)                φ: l_2^t → l_1^d       φ(y)

• Using Theorem [FLM77], Alice/Bob get φ(x),
φ(y) ∈ R^d with d = O(t · (log 1/η) / η^2):

      ||x - y||_2 ≤ ||φ(x) - φ(y)||_1 ≤ (1+η) ||x - y||_2

 • η specified later
Rational Approximation
• z = z(t): N → N; assume z ≥ d

• Approximate each coord. of output of
  embedding by integer multiple of 1/z
      Scaling
• Alice (resp. Bob) multiplies each coord. of her rational
  approximation φ̃(x) (resp. his φ̃(y)) by z

• Obtains s(φ̃(x)) (resp. s(φ̃(y)))

• Claim: coords. are integers in range [-2z, 2z]

• Proof:
  1. |φ̃(·)| ≤ |φ(·)| + d/z ≤ 2
  2. |s(φ̃(·))| = z|φ̃(·)|
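
The rounding and scaling steps amount to one line of arithmetic per coordinate. A minimal sketch, assuming a made-up embedded vector with d = 4 and z = 10 (both hypothetical); rounding to the nearest multiple of 1/z and then multiplying by z leaves an integer.

```python
def round_and_scale(vec, z):
    """Round each coordinate to the nearest multiple of 1/z, then multiply by z."""
    return [round(c * z) for c in vec]      # the integer k stands for the rational k/z


phi_x = [0.73, -0.41, 0.0, 1.27]            # hypothetical embedded vector, d = 4
z = 10                                      # any z >= d
s = round_and_scale(phi_x, z)
print(s)                                    # [7, -4, 0, 13]
print(all(-2 * z <= k <= 2 * z for k in s)) # True: coords lie in [-2z, 2z]
```

Each coordinate moves by at most 1/z during rounding, so the total l1 error over d coordinates is at most d/z, which is where the assumption z ≥ d is used.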
    Converting to Unary
• For i = 1 to d (Alice, on her scaled vector s(φ̃(x)))
    • j ← s(φ̃(x))_i
    • Replace s(φ̃(x))_i with 1^{2z+j} 0^{2z-j}

•   Bob does same for s(φ̃(y))
•   x’, y’ denote new length 4dz bitstrings
•   wt(x’) = |s(φ̃(x))|, wt(y’) = |s(φ̃(y))|
•   Δ(x’,y’) = |s(φ̃(x)) – s(φ̃(y))|
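
The unary step can be sketched as below, with hypothetical scaled vectors and z = 3. It illustrates the key property: the Hamming distance between the two bitstrings equals the l1 distance between the integer vectors.

```python
def to_unary(s, z):
    """Encode each integer j in [-2z, 2z] as 1^(2z+j) 0^(2z-j); output has 4*z*len(s) bits."""
    bits = []
    for j in s:
        assert -2 * z <= j <= 2 * z
        bits += [1] * (2 * z + j) + [0] * (2 * z - j)
    return bits


def hamming(u, v):
    """Number of positions where two equal-length bitstrings differ."""
    return sum(a != b for a, b in zip(u, v))


z = 3
sx, sy = [2, -1, 0], [-3, 4, 0]             # hypothetical scaled integer vectors, d = 3
x1, y1 = to_unary(sx, z), to_unary(sy, z)
l1 = sum(abs(a - b) for a, b in zip(sx, sy))
print(len(x1), hamming(x1, y1), l1)         # 36 10 10
```

Within one coordinate the two encodings are prefixes of 1s of lengths 2z+a and 2z+b, so they differ in exactly |a - b| positions.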
     Reducing Δ(x’,y’) to F0
• Alice (Bob) chooses stream a_{x’} (a_{y’}) with
  char. vector x’ (y’).

• Lemma: If ℓ_1 < wt(x’), wt(y’) < ℓ_2, then:

  ℓ_1 + Δ(x’,y’)/2 < F0(a_{x’} ∘ a_{y’}) < ℓ_2 + Δ(x’,y’)/2

  Follows from fact: F0(a_{x’} ∘ a_{y’}) = wt(x’ ∨ y’)
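
The lemma rests on the identity wt(x’ ∨ y’) = (wt(x’) + wt(y’) + Δ(x’,y’))/2, which can be checked on a small example. The bitstrings below are arbitrary illustrative values, and the helper names are hypothetical.

```python
def stream_of(bits):
    """One stream whose characteristic vector is `bits`: the indices of its 1s."""
    return [j for j, b in enumerate(bits) if b]


def f0(stream):
    """Exact number of distinct elements."""
    return len(set(stream))


x1 = [1, 1, 1, 0, 0, 1, 0, 0]
y1 = [1, 0, 1, 1, 0, 0, 0, 1]

wt_x, wt_y = sum(x1), sum(y1)
delta = sum(a != b for a, b in zip(x1, y1))      # Hamming distance
distinct = f0(stream_of(x1) + stream_of(y1))     # F0 of the concatenated stream
wt_or = sum(a | b for a, b in zip(x1, y1))       # wt(x' OR y')

print(distinct, wt_or, (wt_x + wt_y + delta) // 2)   # 6 6 6
```

Bounding wt(x’) and wt(y’) between ℓ_1 and ℓ_2 in this identity gives the two inequalities of the lemma.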
  Reducing Δ(x’,y’) to F0
• Use lemma to show:




• Set η = Θ(ε), z = Θ((1/ε^5) log(1/ε)) so that two
  cases distinguished by (1 ± Θ(ε)) F0 alg
   Conclusions
• a_{x’}, a_{y’} must be in universe of size ≥
  4zd = Θ(log(1/ε)/ε^9)
• Reduction only valid if 4zd ≤ m
• Ω(1/ε^2) bound for ε = Ω(m^{-1/(9+k)}) ∀ k > 0.

• Recently lower bound improved to:
  • Ω(1/ε^2) for ε ≥ m^{-1/2}, which is optimal
  • Find set of vectors directly in Hamming
  space via involved prob. method argument

				