
# Kunal Talwar (PowerPoint)


Kunal Talwar
MSR SVC

[Dwork, McSherry, Talwar, STOC 2007]

Compressed Sensing:
If x ∈ R^N is k-sparse,
take M ~ C·k·log(N/k) random Gaussian measurements;
then L1 minimization recovers x.

For what k does this make sense (i.e. M < N)?

How small can C be?
• Privacy motivation
• Coding setting
• Results
• Proof sketch
• Database of information about individuals
  E.g. medical history, census data, customer info.
• Need to guarantee confidentiality of individual entries
• Want to make deductions about the database; learn large-scale trends.
  E.g. learn that drug V increases the likelihood of heart disease
• Do not leak info about individual patients

[Diagram: Curator ⟷ Analyst]
• Simple model (easily justifiable)
• Database: n-bit binary vector x
• Query: vector a
• True answer: dot product a·x
• Response is a·x + e = true answer + noise
• Blatant non-privacy: attacker learns n − o(n) bits of x.

• Theorem: If all responses are within o(√n) of the true answer, then the algorithm is blatantly non-private even against a polynomial-time adversary asking O(n log² n) random questions.
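The query model above is easy to simulate; here is a minimal sketch (the database size, query distribution, and noise bound are illustrative assumptions, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                          # database size (illustrative)
x = rng.integers(0, 2, size=n)    # secret n-bit database

def respond(a, bound=5.0):
    # Curator's response: true answer a.x plus noise of magnitude
    # at most `bound` (standing in for the o(sqrt(n)) error bound).
    return a @ x + rng.uniform(-bound, bound)

a = rng.integers(0, 2, size=n)    # one random 0/1 query
r = respond(a)
print(abs(r - a @ x) <= 5.0)      # noise stays within the bound
```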
Privacy has a Price
• There is no safe way to avoid increasing the noise as the number of queries increases.

Applies to the Non-Interactive Setting
• Any non-interactive solution permitting answers that are “too accurate” to “too many” questions is vulnerable to the DiNi (Dinur-Nissim) attack.

This work: what if most responses have small error, but some can be arbitrarily off?
• Real vector x ∈ Rⁿ
• Matrix A ∈ R^(m×n) with i.i.d. Gaussian entries
• Transmit codeword Ax ∈ R^m
• Channel corrupts message. Receive y = Ax + e
• Decoder must reconstruct x, assuming e has small support
• Small support: at most αm entries of e are non-zero.

[Diagram: Encoder → Channel → Decoder]
min support(e')  such that  y = Ax' + e',  x' ∈ Rⁿ
  (solving this would give the original message x)

min |e'|₁  such that  y = Ax' + e',  x' ∈ Rⁿ
  (this is a linear program; solvable in poly time)
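The LP above can be sketched with `scipy.optimize.linprog` by introducing slack variables t ≥ |y − Ax'|; a hedged demo with illustrative sizes and error count, not the authors' code:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 20, 80                       # m = 4n, as in the Donoho/CRTV setting
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Channel: corrupt a few entries of the codeword Ax arbitrarily.
e = np.zeros(m)
bad = rng.choice(m, size=4, replace=False)
e[bad] = 100 * rng.standard_normal(4)
y = A @ x + e

# LP decoding: min sum(t)  s.t.  -t <= y - A x' <= t,  over z = [x', t].
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[ A, -np.eye(m)],    #  A x' - t <= y
                 [-A, -np.eye(m)]])   # -A x' - t <= -y
b_ub = np.concatenate([y, -y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * (n + m))
x_hat = res.x[:n]
print(np.max(np.abs(x_hat - x)))    # tiny residual: x is recovered
```

Minimizing the ℓ1 norm of the residual y − Ax' lets the few grossly corrupted coordinates absorb all the slack, so the solver returns the transmitted x.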
• Theorem [Donoho / Candes-Rudelson-Tao-Vershynin]: For an error rate α < 1/2000, LP decoding succeeds in recovering x (for m = 4n).

• This talk: how large an error rate α can LP decoding tolerate?
• Let α* = 0.2390318914495168038956510438285657…

• Theorem 1: For any α < α*, there exists c such that if A has i.i.d. Gaussian entries, and if
  - A has m = cn rows,
  - for k = αm, the error e satisfies |e − e_k|₁ < δ for some support-k vector e_k,
then LP decoding reconstructs x' where |x' − x|₂ is O(δ/√n).

• Theorem 2: For any α > α*, LP decoding can be made to fail, even if m grows arbitrarily.
• In the privacy setting: suppose, for α < α*, the curator
  - answers a (1 − α) fraction of questions within error o(√n),
  - answers an α fraction of the questions arbitrarily.
Then the curator is blatantly non-private.

• Theorem 3: Similar LP decoding results hold when the entries of A are randomly chosen from ±1.

• The attack works in the non-interactive setting as well.
• Also leads to error-correcting codes over finite alphabets.
• Theorem 1 (compressed sensing form): For any α < α*, there exists c such that if B has i.i.d. Gaussian entries, and if
  - B has M = (1 − c)N rows,
  - k = αm,
then for any vector x ∈ R^N, given Bx, LP decoding reconstructs x' where

  |x − x'|₂ ≤ (C/√N) · inf_{x_k : |x_k|₀ ≤ k} |x − x_k|₁
 Let * = 0.2390318914495168038956510438285657…

 Theorem 1 (=0): For any <*, there exists c such that if
A has i.i.d. Gaussian entries with m=cn rows, and if the
error vector e has support at most m, then LP decoding
accurately reconstructs x.

 Proof sketch…
• LP decoding is scale- and translation-invariant.
• Thus, without loss of generality, transmit x = 0.
• Thus receive y = Ax + e = e.
• If the LP reconstructs some z ≠ 0, we may rescale so that |z|₂ = 1.
• Call such a z bad for A.
Proof:
• Any fixed z is very unlikely to be bad for A:
  Pr[z bad] ≤ exp(−cm)

• Net argument to extend to all of Rⁿ:
  Pr[∃ bad z] ≤ exp(−c'm)

Thus, with high probability, A is such that LP decoding never fails.
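The net step is the standard covering argument; a sketch, with illustrative constants:

```latex
\Pr[\exists\,\text{bad } z]
  \;\le\; |\mathcal{N}_\gamma|\cdot
          \max_{z\in\mathcal{N}_\gamma}\Pr[z \text{ almost bad}]
  \;\le\; \Big(\tfrac{3}{\gamma}\Big)^{\!n} e^{-cm}
  \;=\; e^{\,n\ln(3/\gamma)-cm}
  \;\le\; e^{-c'm},
```

using a γ-net N_γ of the unit sphere in Rⁿ (of size at most (3/γ)ⁿ), a continuity argument showing that a bad z makes its nearest net point almost bad, and m = cn with c large enough relative to ln(3/γ).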
• Suppose z is bad:
  |Az − e|₁ < |A·0 − e|₁  ⇒  |Az − e|₁ < |e|₁
• Let e have support T (so e is zero on T^c).
• Without loss of generality, e|_T = Az|_T.
• Thus z bad ⇒
  |Az|_{T^c} < |Az|_T  ⇒  |Az|_T > ½·|Az|₁
• A i.i.d. Gaussian ⇒ each entry of Az is an i.i.d. Gaussian.
• Let W = Az; its entries W₁,…,W_m are i.i.d. Gaussians.
• z bad ⇒ Σ_{i ∈ T} |W_i| > ½ Σ_i |W_i|
• Recall: |T| ≤ αm.
• Define S_α(W) to be the sum of magnitudes of the top α fraction of entries of W.
• Thus z bad ⇒ S_α(W) > ½ S₁(W)
Few Gaussians with a lot of mass!
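This phenomenon is easy to see numerically; the sketch below (sample size illustrative) estimates the share of the total L1 mass carried by the top α fraction of |W_i|:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200_000
# magnitudes of i.i.d. standard Gaussians, sorted in decreasing order
W = np.sort(np.abs(rng.standard_normal(m)))[::-1]

def mass_ratio(alpha):
    # S_alpha(W) / S_1(W): share of L1 mass in the top alpha fraction
    return W[: int(alpha * m)].sum() / W.sum()

print(mass_ratio(0.20))   # below 1/2 (alpha < alpha*)
print(mass_ratio(0.28))   # above 1/2 (alpha > alpha*)
```

The crossover sits near the talk's α* ≈ 0.239: below it the top-α entries carry less than half the mass, above it more than half.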
 Let us look at E[S]

 Let w* be such that
w*

E[S*]
 Let * = Pr[|W| ¸ w*]               E[S]     =½ E[S1]

 Then E[S*] = ½ E[S1]

 Moreover, for any  < *, E[S] · (½ – ) E[S1]
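Reading S_α's per-entry expectation as E[|W|·1{|W| ≥ w_α}] for W ~ N(0,1) (an assumption about the definition above, though it matches the slides' constant), the condition E[S_{α*}] = ½ E[S₁] solves in closed form: e^{−w*²/2} = ½, so w* = √(2 ln 2) and α* = Pr[|W| ≥ w*] = erfc(√(ln 2)). A quick check:

```python
import math

w_star = math.sqrt(2 * math.log(2))             # exp(-w*^2 / 2) = 1/2
alpha_star = math.erfc(math.sqrt(math.log(2)))  # Pr[|W| >= w*], W ~ N(0,1)
print(w_star)        # ~1.17741
print(alpha_star)    # 0.23903189144951... as on the slide
```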
 S depends on many independent Gaussians.
 Gaussian Isoperimetric inequality implies:
With high probability, S(W) close to E[S].
S1 similarly concentrated.

 Thus Pr[z is bad] · exp(-cm)                     E[S*]
E[S]   =½ E[S1]
E[S*]
=½ E[S1] E[S]
Now E[S] > ( ½ + ) E[S1]

Similar measure concentration argument shows that any z is
bad with high probability.

Thus LP decoding fails w.h.p. beyond *

Donoho/CRTV experiments used a random error model.
Compressed Sensing:
If x ∈ R^N is k-sparse,
take M ~ C·k·log(N/k) random Gaussian measurements;
then L1 minimization recovers x.

For what k does this make sense (i.e. M < N)?
  k < α*·N ≈ 0.239·N
How small can C be?
  C > (α* log(1/α*))⁻¹ ≈ 2.02
• Tight threshold for Gaussian LP decoding.
• To preserve privacy: lots of error in lots of answers.

• Similar results hold for +1/−1 queries.

• Inefficient attacks can go much further:
  - correct a (½ − ε) fraction of wild errors;
  - correct a (1 − ε) fraction of wild errors in the list-decoding sense.
• Efficient versions of these attacks?
  - Dwork-Yekhanin: (½ − ε) using AG codes.
