
Tight Lower Bounds for the Distinct Elements Problem


       David Woodruff
            MIT
      dpwood@mit.edu

  Joint work with Piotr Indyk
       The Problem


      4     3     7     3     1     1     0    …
•   Stream of elements $a_1, \ldots, a_n$, each in $\{1, \ldots, m\}$
•   Want $F_0$ = number of distinct elements
•   Elements arrive in adversarial order
•   Algorithms get one pass over the stream
•   Goal: a minimum-space algorithm
          A Trivial Algorithm

      4     3    7    3     1    1     0    …
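    v before the stream: 00000000    v after the stream: 10011011 (coordinates 7 down to 0)
• Keep the m-bit characteristic vector v of the stream
   • $v_j = 1 \Leftrightarrow j$ appears in the stream
• $F_0 = \mathrm{wt}(10011011) = 5$
• Space = m bits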
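A minimal Python sketch of the trivial algorithm, assuming (as the example suggests) 0-based coordinates, with coordinate 7 printed first. The name `trivial_f0` is hypothetical; a Python int stands in for the m-bit vector.

```python
def trivial_f0(stream, m):
    """Exact F0 using an m-bit characteristic vector v (stored as a Python int).

    Assumes elements are 0-based coordinates in range(m), as in the example.
    """
    v = 0
    for j in stream:
        if not 0 <= j < m:
            raise ValueError(f"element {j} outside universe of size {m}")
        v |= 1 << j              # set v_j = 1; repeated elements leave v unchanged
    return bin(v).count("1"), v  # F0 = wt(v)


f0, v = trivial_f0([4, 3, 7, 3, 1, 1, 0], m=8)
print(f0, format(v, "08b"))      # prints: 5 10011011
```

The state is exactly the m bits of v, which is the Space = m bound above.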
                Can we do better?
     Negative Results

• Any algorithm computing $F_0$ exactly must
  use $\Omega(m)$ space [AMS96]
• Any deterministic algorithm that outputs x with
  $|F_0 - x| < \varepsilon F_0$ must use $\Omega(m)$ space
  [AMS96]
• What about randomized approximation
  algorithms?
    Rand. Approx. Algorithms for F0
• O(log log m/2 + log m log 1/) alg. outputs x with
          Pr[| F0 – x| < F0 ] > ¾ [BJKST02]
  • Lots of hashing tricks
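To illustrate the kind of hashing trick involved, here is a sketch of the k-minimum-values estimator: hash every element, keep only the k smallest distinct hash values, and estimate $F_0$ from the k-th smallest. This is a standard estimator in this line of work, not necessarily the exact algorithm or space bound of [BJKST02]. A keyed BLAKE2b hash stands in for the random hash function (an assumption), and `kmv_estimate` is a hypothetical name.

```python
import hashlib
import heapq


def kmv_estimate(stream, k, seed=0):
    """k-minimum-values F0 estimate: keep the k smallest distinct hash values.

    A keyed BLAKE2b hash stands in for a random hash function; the relative
    error is roughly 1/sqrt(k), so k = O(1/eps^2) targets a (1 +/- eps) estimate.
    """
    key = seed.to_bytes(8, "little")
    space = 1 << 64                  # hash values lie in [1, 2^64]
    heap = []                        # max-heap (negated) of the k smallest hashes
    kept = set()                     # hashes currently stored in the heap
    for x in stream:
        digest = hashlib.blake2b(str(x).encode(), digest_size=8, key=key).digest()
        h = int.from_bytes(digest, "little") + 1
        if h in kept:
            continue                 # repeated element: same hash, ignore it
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:           # smaller than the current k-th smallest hash
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return float(len(heap))      # fewer than k distinct hashes: exact count
    return (k - 1) * space / -heap[0]


print(kmv_estimate(range(20000), k=256))  # close to 20000
```

Repeated elements hash to the same value, so order and multiplicity do not affect the output; only the k stored hash values form the state.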

                     Is this optimal?

• Previous lower bounds
   • $\Omega(\log m)$ [AMS96]
   • $\Omega(1/\varepsilon)$ [Bar-Yossef]

• Open problem of [BJKST02]: close the gap between $1/\varepsilon$ and $1/\varepsilon^2$
       Idea Behind Lower Bounds
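• Alice holds $x \in \{0,1\}^m$; Bob holds $y \in \{0,1\}^m$
• Alice runs a $(1 \pm \varepsilon)$ $F_0$ approximation algorithm A on a stream s(x)
  and sends the internal state S of A to Bob
• Bob resumes A from state S on a stream s(y)

• A computes a $(1 \pm \varepsilon)$ approximation of $F_0(s(x) \circ s(y))$ w.p. > 3/4,
  where $\circ$ denotes concatenation
• Idea: if this lets Bob decide f(x,y) w.p. > 3/4, then the space used
  by A is at least the randomized 1-way communication complexity of f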
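The reduction can be made concrete: Alice's only message is the algorithm's state, so the message length equals the space used. A sketch, assuming the toy exact algorithm from the trivial-algorithm slide plays the role of A; the class and function names are hypothetical.

```python
class CharVectorF0:
    """Toy one-pass F0 algorithm whose entire state is an m-bit vector."""

    def __init__(self, m, state=0):
        self.m = m
        self.v = state               # resume from a received state if given

    def process(self, j):
        self.v |= 1 << j             # mark element j as seen

    def state(self):
        return self.v

    def estimate(self):
        return bin(self.v).count("1")


def one_way_protocol(s_x, s_y, m):
    """Alice streams s(x) and sends A's state; Bob resumes A on s(y)."""
    alice = CharVectorF0(m)
    for j in s_x:
        alice.process(j)
    message = alice.state()          # the single Alice -> Bob message
    bob = CharVectorF0(m, state=message)
    for j in s_y:
        bob.process(j)
    # F0(s(x) o s(y)) and the message size in bits (at most m)
    return bob.estimate(), message.bit_length()


print(one_way_protocol([0, 2], [2, 3], m=4))  # prints: (3, 3)
```

Any streaming algorithm with a serializable state plugs into the same pattern, which is why a communication lower bound for f is a space lower bound for A.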
    Randomized 1-way comm. complexity
• Boolean function $f: X \times Y \to \{0,1\}$
• Alice has $x \in X$, Bob has $y \in Y$. Bob wants f(x,y)
• Only 1 message is sent, and it must go from Alice to Bob
• Communication cost of a protocol = expected length of the
  longest message sent, over all inputs
• The $\delta$-error randomized 1-way communication complexity
  of f, $R_\delta(f)$, is the communication cost of an optimal protocol
  computing f w.p. $\geq 1-\delta$

• How do we lower bound R(f)?
 The VC Dimension [KNR]
• $F = \{f : X \to \{0,1\}\}$ is a family of Boolean
  functions
• Each $f \in F$ is a length-$|X|$ "bit string"
• For $S \subseteq X$, the shatter coefficient $SC(F_S)$ of S
  is $|\{f|_S : f \in F\}|$ = number of distinct bit strings when F is
  restricted to S
• $SC(F, p) = \max_{S \subseteq X,\ |S| = p} SC(F_S)$
• If $SC(F_S) = 2^{|S|}$, S is shattered by F
• The VC dimension of F, VCD(F), is the size of the
  largest S shattered by F
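These definitions can be checked by brute force on tiny examples. A sketch, assuming each function in the family is stored as a tuple of bits indexed by X = range(n); the helper names are hypothetical and the search is exponential, so it is only for small X.

```python
from itertools import combinations


def shatter_coefficient(family, S):
    """SC(F_S): number of distinct bit strings when each f in F is restricted to S."""
    return len({tuple(f[i] for i in S) for f in family})


def sc(family, p):
    """SC(F, p): max of SC(F_S) over all S subset of X with |S| = p."""
    n = len(family[0])
    return max(shatter_coefficient(family, S) for S in combinations(range(n), p))


def vc_dimension(family):
    """Size of the largest S shattered by F (brute force)."""
    n = len(family[0])
    d = 0
    for p in range(1, n + 1):
        # subsets of shattered sets are shattered, so stop at the first failure
        if sc(family, p) == 2 ** p:
            d = p
        else:
            break
    return d


# thresholds on X = {0, 1, 2, 3}: f_a(i) = 1 iff i >= a
thresholds = [tuple(int(i >= a) for i in range(4)) for a in range(5)]
print(sc(thresholds, 2), vc_dimension(thresholds))  # prints: 3 1
```

Thresholds realize only 3 of the 4 patterns on any pair {i < j} (never 1 then 0), so no pair is shattered.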
Shatter Coefficient Theorem

• Notation: For $f: X \times Y \to \{0,1\}$, define
    $f_X = \{ f_x : Y \to \{0,1\} \mid x \in X \}$,
    where $f_x(y) = f(x,y)$

• Theorem [BJKS]: For every
  $f: X \times Y \to \{0,1\}$ and every $p \geq \mathrm{VCD}(f_X)$,
    $R_{1/4}(f) = \Omega(\log SC(f_X, p))$
      The (1/) Lower Bound [Bar-Yossef]

• Alice has x 2R {0,1}m, wt(x) = m/2
• Bob has y 2R {0,1}m, wt(y) = m and:
   • Either wt(x Æ y) = 0      OR    wt(x Æ y) = m
        f(x,y) = 0                      f(x,y) = 1
• R1/4(f) = (VCD(fX)) = (1/) [Bar-Yossef]
• s(x), s(y) any streams w/char. vectors x, y
   • f(x,y) = 1 ! F0(s(x) ± s(y)) = m/2
   • f(x,y) = 0 ! F0(s(x) ± s(y)) = m/2 + m
   • (1+’)m/2 < (1 - ’)(m/2 + m) for ’ = ()
• Hence, can decide f ! F0 alg. uses (1/) space
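A small numeric sketch of this reduction, assuming $\varepsilon' = \varepsilon/4$ (one valid choice of $\Theta(\varepsilon)$; for $\varepsilon < 1$ it satisfies the inequality above) and a midpoint threshold. The instance generator and decision rule are hypothetical helpers, not taken from the slides.

```python
import random


def bar_yossef_instance(m, eps, f, seed=0):
    """Random x with wt(x) = m/2 and y with wt(y) = eps*m; f = 1 iff y lies inside x."""
    rng = random.Random(seed)
    k = int(eps * m)
    universe = list(range(m))
    rng.shuffle(universe)
    x = set(universe[: m // 2])
    if f == 1:
        y = set(rng.sample(sorted(x), k))              # wt(x AND y) = eps*m
    else:
        y = set(rng.sample(universe[m // 2:], k))      # wt(x AND y) = 0
    return x, y


def decide_f(estimate, m, eps):
    """Decide f from any (1 +/- eps/4)-approximation of F0(s(x) o s(y))."""
    eps_p = eps / 4                                    # eps' = Theta(eps)
    top_if_1 = (1 + eps_p) * m / 2                     # largest estimate when f = 1
    bottom_if_0 = (1 - eps_p) * (m / 2 + eps * m)      # smallest estimate when f = 0
    assert top_if_1 < bottom_if_0
    return 1 if estimate < (top_if_1 + bottom_if_0) / 2 else 0


x, y = bar_yossef_instance(1000, 0.1, f=1)
print(len(x | y), decide_f(len(x | y), 1000, 0.1))  # prints: 500 1
```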
   Our Results
• Remainder of talk: an $\Omega(1/\varepsilon^2)$ lower
  bound for $\varepsilon = \Omega(m^{-1/(9+k)})$, for any k > 0.

  $\Rightarrow$ the $O(\log\log m/\varepsilon^2 + \log m \log(1/\varepsilon))$
  upper bound is almost optimal

  IDEA: Reduce from a protocol for
  computing the dot product
        The Promise Problem
 • t = (1/2), Y = basis of unit vectors of Rt
                  Alice                 Bob
  x 2 [0,1]t                                   y2Y
  ||x|| = 1

                   Promise Problem :
     hx,yi = 0                           hx,yi = 2/t1/2
     f(x,y) = 0             OR            f(x,y) = 1


• X = {x 2 [0,1]t, ||x|| = 1 and 9 y 2 Y s.t. (x,y) 2 }
• We lower bound R1/4(f) via SC(fX, t)
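     Bounding $SC(f_X, t)$
•   Theorem: $SC(f_X, t) = 2^{\Omega(t)}$
•   Proof:
1.  For every $T \subset Y$ with $|T| = t/4$, put $x_T = (2/t^{1/2}) \cdot \sum_{e \in T} e$ (so $\|x_T\| = 1$)
2.  Define $X_1 \subset X$ as $X_1 = \{x_T \mid T \subset Y, |T| = t/4\}$
3.  Claim: every $s \in \{0,1\}^t$ with $\mathrm{wt}(s) = t/4$ appears in the truth table of $f_{X_1}$
    (as the row of some $x_T$, with columns indexed by Y)
4.  Proof of claim:
   1. Let $s \in \{0,1\}^t$ have 1s in positions $i_1, \ldots, i_{t/4}$
   2. Put $T = \{e_{i_1}, \ldots, e_{i_{t/4}}\}$. For every $e \in T$, $\langle e, x_T \rangle = 2/t^{1/2}$, so $f(x_T, e) = 1$
   3. For every $e \in Y - T$, $\langle e, x_T \rangle = 0$, so $f(x_T, e) = 0$
5. There are $\binom{t}{t/4} = 2^{\Omega(t)}$ such s.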
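  Bounding $R_{1/4}(f)$
• Corollary: since $\mathrm{VCD}(f_X) \leq |Y| = t$, the Shatter Coefficient Theorem with p = t gives
  $R_{1/4}(f) = \Omega(\log SC(f_X, t)) = \Omega(t) = \Omega(1/\varepsilon^2)$

• Reduction: we need a protocol computing
  f with communication equal to the space used by
  any $(1 \pm \varepsilon)$ $F_0$ approximation algorithm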
     Reduction
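•   Recall:
      • $\langle x,y \rangle = 0$ if f(x,y) = 0
      • $\langle x,y \rangle = 2/t^{1/2}$ if f(x,y) = 1

•   Goal: reduce the "separation" of $\langle x,y \rangle$ to a separation of
    $F_0(s(x) \circ s(y))$ for streams s(x), s(y) that Alice/Bob can
    derive from x, y

•   Use the relation $\|y-x\|^2 = \|y\|^2 + \|x\|^2 - 2\langle x, y \rangle$
    • $f(x,y) = 0 \Rightarrow \|y-x\| = 2^{1/2}$
    • $f(x,y) = 1 \Rightarrow \|y-x\| = (2 - 4/t^{1/2})^{1/2} < 2^{1/2}(1 - 1/t^{1/2}) = 2^{1/2}(1 - \Theta(\varepsilon))$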
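The strict inequality in the f(x,y) = 1 case follows from a one-line estimate, written out below under the standing assumptions $\|x\| = \|y\| = 1$ and $t \geq 4$ (so that $1 - 2/t^{1/2} \geq 0$):

```latex
\|y - x\|^2 = \|y\|^2 + \|x\|^2 - 2\langle x, y\rangle
  = \begin{cases} 2 & \text{if } f(x,y) = 0,\\
                  2 - 4/t^{1/2} & \text{if } f(x,y) = 1. \end{cases}

\text{With } u = 1/t^{1/2} > 0:\quad (1 - u)^2 = 1 - 2u + u^2 > 1 - 2u,
\quad\text{so}\quad \|y - x\| = 2^{1/2}(1 - 2u)^{1/2} < 2^{1/2}(1 - u).

\text{Since } t = \Theta(1/\varepsilon^2),\quad u = 1/t^{1/2} = \Theta(\varepsilon).
```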
               Overview of Reduction
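  Alice: $x \in [0,1]^t$, $\|x\| = 1$                      Bob: $y \in Y$
  Both parties apply the same steps to their own vectors:
    1. Low-distortion embedding $\varphi: \ell_2^t \to \ell_1^{\mathrm{poly}(t)}$
    2. Rational approximation
    3. Scale rationals to integers
    4. Convert integer coordinates to unary
       to get {0,1} vectors x', y'
  Alice runs the $F_0$ algorithm on s(x') and sends its state to Bob, who continues on s(y')
  From $F_0(s(x') \circ s(y'))$, Bob can decide f(x,y) w.p. $\geq 3/4$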
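Embedding $\ell_2^t$ into $\ell_1^{\mathrm{poly}(t)}$
• A $(1+\gamma)$-distortion embedding $\varphi: \ell_2^t \to \ell_1^d$
  is a mapping s.t. for all $p, q \in \ell_2^t$,
    $(1+\gamma)^{-1}\|p - q\|_2 \leq \|\varphi(p) - \varphi(q)\|_1 \leq \|p - q\|_2$

• Theorem [FLM77]: for every $\gamma > 0$ there is a $(1+\gamma)$-
  distortion embedding $\varphi: \ell_2^t \to \ell_1^d$ with
    $d = O(t \cdot \log(1/\gamma)/\gamma^2)$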
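      Embedding $\ell_2^t$ into $\ell_1^d$
• Using Theorem [FLM77], Alice/Bob compute $\varphi(x)$,
  $\varphi(y) \in \mathbb{R}^d$ with $d = O(t \cdot \log(1/\gamma)/\gamma^2)$:
   • $f(x,y) = 0 \Rightarrow \|\varphi(x) - \varphi(y)\|_1 \geq 2^{1/2}/(1+\gamma)$
   • $f(x,y) = 1 \Rightarrow \|\varphi(x) - \varphi(y)\|_1 < 2^{1/2}(1 - \Theta(\varepsilon))$
• $\gamma$ is specified later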
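For intuition, here is a swapped-in randomized construction rather than the proof of [FLM77]: a random Gaussian matrix, using $\mathbb{E}|\langle g, v\rangle| = \sqrt{2/\pi}\,\|v\|_2$ for a standard Gaussian vector g. It preserves distances up to a factor close to 1 on both sides with high probability; rescaling by the upper distortion would give the one-sided normalization above. The choice d = 2000 is ad hoc, and all names are hypothetical.

```python
import math
import random


def gaussian_l2_to_l1(t, d, seed=0):
    """Random linear map phi: R^t -> R^d with ||phi(v)||_1 ~= ||v||_2 (two-sided)."""
    rng = random.Random(seed)
    scale = 1 / (d * math.sqrt(2 / math.pi))   # E|<g, v>| = sqrt(2/pi) * ||v||_2
    G = [[rng.gauss(0.0, 1.0) for _ in range(t)] for _ in range(d)]

    def phi(v):
        return [scale * sum(g_j * v_j for g_j, v_j in zip(row, v)) for row in G]

    return phi


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


t = 16
phi = gaussian_l2_to_l1(t, d=2000)
x = [0.5] * 4 + [0.0] * 12            # x_T with |T| = t/4 = 4
e_in = [1.0] + [0.0] * 15             # <x, e_in> = 2/sqrt(t), so f = 1
e_out = [0.0] * 8 + [1.0] + [0.0] * 7 # <x, e_out> = 0, so f = 0
print(l1(phi(x), phi(e_in)) / l2(x, e_in), l1(phi(x), phi(e_out)) / l2(x, e_out))
```

Here $\|x - e_{in}\|_2 = 1$ and $\|x - e_{out}\|_2 = 2^{1/2}$, matching the two cases of the previous slide; both printed ratios should be close to 1.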
Rational Approximation
• $z = z(t): \mathbb{N} \to \mathbb{N}$; assume $z \geq d$

• Approximate each coordinate of the output of the
  embedding by an integer multiple of 1/z, obtaining $\tilde\varphi(x)$ and $\tilde\varphi(y)$
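      Scaling
• Alice (resp. Bob) multiplies each coordinate of
  $\tilde\varphi(x)$ (resp. $\tilde\varphi(y)$) by z

• She obtains $s(\tilde\varphi(x))$ (resp. he obtains $s(\tilde\varphi(y))$)

• Claim: the coordinates are integers in the range $[-2z, 2z]$

• Proof (assume w.l.o.g. $\varphi(0) = 0$, so $\|\varphi(x)\|_1 \leq \|x\|_2 = 1$):
  1. $\|\tilde\varphi(x)\|_1 \leq \|\varphi(x)\|_1 + d/z \leq 2$, since $z \geq d$
  2. $\|s(\tilde\varphi(x))\|_1 = z\|\tilde\varphi(x)\|_1 \leq 2z$, and each coordinate is an integer
     because each coordinate of $\tilde\varphi(x)$ is a multiple of 1/z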
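A sketch of steps 2 and 3 combined, assuming $\tilde\varphi$ rounds each coordinate to the nearest multiple of 1/z (one valid choice; the slides only say "approximate"). Rounding to the nearest multiple gives per-coordinate error at most 1/(2z), within the 1/z used in the proof. The function name is hypothetical.

```python
def round_and_scale(phi_v, z):
    """Return s_i = z * (nearest multiple of 1/z to phi_v[i]), i.e. round(z * phi_v[i]).

    Each s_i is an integer with |phi_v[i] - s_i / z| <= 1/(2z).
    """
    if z < len(phi_v):
        raise ValueError("the reduction assumes z >= d")
    return [round(c * z) for c in phi_v]


phi_v = [0.26, -0.4, 0.2, 0.1]        # ||phi_v||_1 = 0.96 <= 1, d = 4
s = round_and_scale(phi_v, z=4)
print(s, sum(abs(k) for k in s))      # prints: [1, -2, 1, 0] 4
```

The $\ell_1$ norm of the result is 4, within the claimed bound $2z = 8$.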
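     Converting to Unary
• For i = 1 to d:
    • $j \leftarrow s(\tilde\varphi(x))_i$
    • Replace $s(\tilde\varphi(x))_i$ with $1^{2z+j}0^{2z-j}$

•   Bob does the same for $s(\tilde\varphi(y))$
•   x', y' denote the resulting length-4dz bit strings
•   $\mathrm{wt}(x') = 2dz + \sum_i s(\tilde\varphi(x))_i$, $\mathrm{wt}(y') = 2dz + \sum_i s(\tilde\varphi(y))_i$
•   The Hamming distance is $\Delta(x',y') = \|s(\tilde\varphi(x)) - s(\tilde\varphi(y))\|_1$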
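A direct transcription of the unary step, illustrating that blocks $1^{2z+j}0^{2z-j}$ and $1^{2z+k}0^{2z-k}$ differ in exactly $|j - k|$ positions, so Hamming distance on the bit strings equals $\ell_1$ distance on the integer vectors. The helper names are hypothetical.

```python
def to_unary(s, z):
    """Replace each integer coordinate j in [-2z, 2z] by 1^(2z+j) 0^(2z-j)."""
    bits = []
    for j in s:
        if not -2 * z <= j <= 2 * z:
            raise ValueError(f"coordinate {j} outside [-2z, 2z]")
        bits.extend([1] * (2 * z + j) + [0] * (2 * z - j))
    return bits


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


sx, sy, z = [1, -2, 0], [-1, 2, 0], 2
xp, yp = to_unary(sx, z), to_unary(sy, z)
print(len(xp), sum(xp), hamming(xp, yp))   # prints: 24 11 6
```

Here d = 3 and z = 2, so the strings have length 4dz = 24, weight 2dz + (1 - 2 + 0) = 11, and Hamming distance |1 - (-1)| + |-2 - 2| + 0 = 6.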
     Reducing (x’,y’) to F0
• Alice (Bob) chooses stream ax’ (ay’) with
  char. vector x’ (y’).

• Lemma: If 1 < wt(x’), wt(y’) < 2, then:

  1 + (x’,y’)/2 < F0(ax’ ± ay’) < 2 + (x’,y’)/2

  Follows from fact: F0(ax’ ± ay’) = wt(x’ Ç y’)
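The identity behind the lemma can be checked on random bit strings. A sketch, assuming streams are built by listing each set position, adding some repeats, and shuffling (any order and multiplicity give the same $F_0$); the helper names are hypothetical.

```python
import random


def stream_from_bits(bits, seed=0):
    """A stream whose characteristic vector is `bits`: set positions, with repeats, shuffled."""
    rng = random.Random(seed)
    items = [i for i, b in enumerate(bits) if b]
    stream = items + rng.sample(items, len(items) // 2)   # repeats do not change F0
    rng.shuffle(stream)
    return stream


def f0(stream):
    return len(set(stream))


rng = random.Random(1)
xp = [rng.randint(0, 1) for _ in range(40)]
yp = [rng.randint(0, 1) for _ in range(40)]
a = stream_from_bits(xp, seed=2) + stream_from_bits(yp, seed=3)   # a_{x'} o a_{y'}
delta = sum(p != q for p, q in zip(xp, yp))
print(f0(a), (sum(xp) + sum(yp) + delta) / 2)   # the two values agree
```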
  Reducing (x’,y’) to F0
• Use lemma to show:




• Set $\gamma = \Theta(\varepsilon)$ and $z = \Theta(\log(1/\varepsilon)/\varepsilon^5)$ so that the two
  cases are distinguished by a $(1 \pm \Theta(\varepsilon))$ $F_0$ algorithm
   Conclusions
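• $a_{x'}, a_{y'}$ must lie in a universe of size at least
  $4zd = \Theta(\log(1/\varepsilon)/\varepsilon^9)$
• The reduction is valid only if $4zd \leq m$
• Hence an $\Omega(1/\varepsilon^2)$ bound for $\varepsilon = \Omega(m^{-1/(9+k)})$, for all k > 0.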
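To see where the exponent 9 comes from, the sketch below finds the smallest $\varepsilon$ for which the universe-size condition $4zd \leq m$ holds. It assumes every hidden constant in $4zd = \Theta(\log(1/\varepsilon)/\varepsilon^9)$ equals 1, so the numbers are illustrative only; the function names are hypothetical.

```python
import math


def universe_size(eps):
    """4zd = log(1/eps) / eps^9, with every hidden constant set to 1 (assumption)."""
    return math.log(1 / eps) / eps ** 9


def smallest_valid_eps(m, iters=200):
    """Smallest eps in (0, 1/2] with universe_size(eps) <= m, found by bisection.

    universe_size is decreasing on (0, 1), so the valid eps form an interval.
    """
    lo, hi = 1e-12, 0.5
    if universe_size(hi) > m:
        raise ValueError("m too small for this sketch")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if universe_size(mid) <= m:
            hi = mid
        else:
            lo = mid
    return hi


print(smallest_valid_eps(10 ** 12))
```

The result sits slightly above $m^{-1/9}$ because of the $\log(1/\varepsilon)$ factor, which is why the bound is stated for $\varepsilon = \Omega(m^{-1/(9+k)})$ with any k > 0.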

• The lower bound was recently improved to:
  • $\Omega(1/\varepsilon^2)$ for $\varepsilon \geq m^{-1/2}$, which is optimal
  • The proof finds a set of vectors directly in Hamming
    space via an involved probabilistic method argument

								