Kunal Talwar, MSR SVC. [Dwork, McSherry, Talwar, STOC 2007]

Compressed sensing: if x ∈ R^N is k-sparse, take M ~ C·k·log(N/k) random Gaussian measurements; then L1 minimization recovers x. For what k does this make sense (i.e. M < N)? How small can C be?

Outline: privacy motivation, coding setting, results, proof sketch.

Privacy motivation. A database holds information about individuals, e.g. medical history, census data, customer info. We need to guarantee confidentiality of individual entries, yet we want to make deductions about the database and learn large-scale trends, e.g. learn that drug V increases the likelihood of heart disease, without leaking information about individual patients. (A curator mediates between the database and an analyst.)

Simple model (easily justifiable). Database: an n-bit binary vector x. Query: a vector a. True answer: the dot product a·x. Response: a·x + e = true answer + noise.

Blatant non-privacy: the attacker learns n − o(n) bits of x. Theorem [Dinur-Nissim]: if all responses are within o(√n) of the true answer, then the algorithm is blatantly non-private, even against a polynomial-time adversary asking O(n log² n) random questions.

Privacy has a price: there is no safe way to avoid increasing the noise as the number of queries increases. This applies to the non-interactive setting as well: any non-interactive solution permitting answers that are "too accurate" to "too many" questions is vulnerable to the DiNi attack.

This work: what if most responses have small error, but some can be arbitrarily off?

Coding setting. A real vector x ∈ R^n and an m×n matrix A ∈ R^{m×n} with i.i.d. Gaussian entries. Transmit the codeword Ax ∈ R^m. The channel corrupts the message: receive y = Ax + e. The decoder must reconstruct x, assuming e has small support, meaning at most αm entries of e are non-zero. (Encoder → Channel → Decoder.)

Two decoders:
  minimize support(e')  such that  y = Ax' + e',  x' ∈ R^n    (solving this would give the original message x)
  minimize |e'|₁        such that  y = Ax' + e',  x' ∈ R^n    (this is a linear program, solvable in poly time)

Theorem [Donoho / Candes-Rudelson-Tao-Vershynin]: for an error rate α < 1/2000, LP decoding succeeds in recovering x (for m = 4n).

This talk: how large an error rate α can LP decoding tolerate?

Let α* = 0.2390318914495168038956510438285657…

Theorem 1: For any α < α*, there exists c such that if A has i.i.d. Gaussian entries and m = cn rows, and if for k = αm some support-k vector e_k satisfies |e − e_k|₁ ≤ δ, then LP decoding reconstructs x' with |x' − x|₂ = O(δ/√n).

Theorem 2: For any α > α*, LP decoding can be made to fail, even if m grows arbitrarily.

In the privacy setting: suppose that, for some α < α*, the curator answers a (1 − α) fraction of the questions within error o(√n) and answers an α fraction of the questions arbitrarily. Then the curator is blatantly non-private.

Theorem 3: Similar LP decoding results hold when the entries of A are randomly chosen from ±1. The attack works in the non-interactive setting as well, and also leads to error-correcting codes over finite alphabets.

Theorem 1 (compressed sensing form): For any α < α*, there exists c such that if B has i.i.d. Gaussian entries and M = (1 − 1/c)·N rows, then for k = αN and any vector x ∈ R^N, given Bx, LP decoding reconstructs x' with

  |x − x'|₂ ≤ (C/√N) · inf{ |x − x_k|₁ : x_k has support at most k }.

Theorem 1 (δ = 0): For any α < α*, there exists c such that if A has i.i.d. Gaussian entries with m = cn rows, and if the error vector e has support at most αm, then LP decoding exactly reconstructs x.
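The LP decoder above is easy to exercise numerically. Below is a minimal sketch, not part of the talk: the dimensions, the expansion factor c, and the error rate α are illustrative choices, and scipy's linprog stands in for a generic LP solver. It encodes a random message with a Gaussian A, corrupts an α fraction of the codeword arbitrarily, and recovers the message by solving min |y − Az|₁ written as a linear program.

```python
# Minimal LP-decoding sketch (illustrative parameters, not from the talk).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, c, alpha = 50, 4, 0.1           # message length, expansion factor, error rate (illustrative)
m = c * n                          # codeword length: A has m = cn rows

A = rng.standard_normal((m, n))    # i.i.d. Gaussian encoding matrix
x = rng.standard_normal(n)         # message to transmit

# Corrupt an alpha fraction of the codeword coordinates with arbitrary large errors.
e = np.zeros(m)
bad = rng.choice(m, size=int(alpha * m), replace=False)
e[bad] = 100 * rng.standard_normal(len(bad))
y = A @ x + e                      # received word

# min |y - Az|_1  ==  min sum(t)  subject to  -t <= y - Az <= t, over variables (z, t).
c_obj = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[ A, -np.eye(m)],     #  Az - t <=  y
                 [-A, -np.eye(m)]])    # -Az - t <= -y
b_ub = np.concatenate([y, -y])
res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub, bounds=(None, None), method="highs")

x_hat = res.x[:n]
print("recovery error |x_hat - x|_2 =", np.linalg.norm(x_hat - x))  # tiny when decoding succeeds
```

With an error rate this far below α* and m = 4n, the recovered x_hat typically matches x up to solver tolerance, even though the corrupted coordinates are wildly wrong.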
Proof sketch. LP decoding is scale and translation invariant, so without loss of generality the transmitted message is x = 0, and the received word is y = Ax + e = e. If LP decoding reconstructs some z ≠ 0, we may rescale so that |z|₂ = 1; call such a z bad for A.

Plan: any fixed z is very unlikely to be bad for A: Pr[z bad] ≤ exp(−cm). A net argument extends this to all of R^n: Pr[∃ bad z] ≤ exp(−c'm). Thus, with high probability, A is such that LP decoding never fails.

When is z bad? z bad means |Az − e|₁ < |A·0 − e|₁ = |e|₁. Let e have support T, with |T| ≤ αm. Without loss of generality (the adversary's best choice), e|_T = Az|_T, so |Az − e|₁ = |Az|_{T^c} and |e|₁ = |Az|_T. Thus z bad ⟹ |Az|_{T^c} < |Az|_T ⟹ |Az|_T > ½|Az|₁.

Since A is i.i.d. Gaussian and |z|₂ = 1, each entry of Az is an i.i.d. Gaussian. Let W = Az; its entries W₁,…,W_m are i.i.d. Gaussians. Then z bad ⟹ Σ_{i∈T} |W_i| > ½ Σ_i |W_i|, and recall |T| ≤ αm.

Define S_α(W) to be the sum of the magnitudes of the top α fraction of the entries of W. Thus z bad ⟹ S_α(W) > ½ S₁(W): few Gaussians carrying a lot of the mass.

Look at E[S_α]. Let w* be the threshold at which the expected mass above it is exactly half the total, i.e. E[|W_i|·1{|W_i| ≥ w*}] = ½ E[|W_i|], and set α* = Pr[|W_i| ≥ w*]. Then E[S_{α*}] = ½ E[S₁], and for any α < α*, E[S_α] ≤ (½ − ε) E[S₁] for some ε = ε(α) > 0.

S_α depends on many independent Gaussians, so the Gaussian isoperimetric inequality implies that, with high probability, S_α(W) is close to E[S_α]; S₁ is similarly concentrated. Thus Pr[z is bad] ≤ exp(−cm).

Conversely, for α > α*, E[S_α] > (½ + ε) E[S₁], and a similar measure-concentration argument shows that any fixed z is bad with high probability (for an adversarially chosen error). Thus LP decoding fails w.h.p. beyond α*. (The Donoho/CRTV experiments used a random error model.)

Compressed sensing revisited: if x ∈ R^N is k-sparse, take M ~ C·k·log(N/k) random Gaussian measurements; then L1 minimization recovers x. For what k does this make sense (i.e. M < N)? k < α*·N ≈ 0.239·N. How small can C be? C > (α* log 1/α*)⁻¹ ≈ 2.02. This gives a tight threshold for Gaussian LP decoding.

To preserve privacy: lots of error in lots of answers. Similar results hold for +1/−1 queries. Inefficient attacks can go much further: correct a (½ − ε) fraction of wild errors, or correct a (1 − ε) fraction of wild errors in the list-decoding sense. Efficient versions of these attacks? Dwork-Yekhanin: (½ − ε) using AG codes.
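As a numerical footnote (my own back-of-the-envelope check, not from the talk): plugging the standard Gaussian into the definition of w* and α* from the proof sketch gives the closed forms w* = √(2 ln 2) and α* = erfc(√(ln 2)), which matches the constant 0.23903… quoted in the theorems; taking the log base 2 in the bound on C reproduces the quoted ≈ 2.02.

```python
# Small sketch of the threshold computation (my own check, assuming the definitions of
# w* and alpha* in the proof sketch). For W ~ N(0,1):
#   E[|W| 1{|W| >= w}] = sqrt(2/pi) * exp(-w^2 / 2)   and   E[|W|] = sqrt(2/pi),
# so E[S_{alpha*}] = (1/2) E[S_1] forces exp(-w*^2 / 2) = 1/2, i.e. w* = sqrt(2 ln 2),
# and alpha* = Pr[|W| >= w*] = erfc(sqrt(ln 2)).
import math

w_star = math.sqrt(2 * math.log(2))              # ~1.17741
alpha_star = math.erfc(math.sqrt(math.log(2)))   # ~0.2390318914495168 -- the constant in the theorems

# Lower bound on the compressed-sensing constant C; with the log taken base 2 this
# reproduces the quoted C > (alpha* log 1/alpha*)^(-1) ~ 2.02.
C_lower = 1.0 / (alpha_star * math.log2(1.0 / alpha_star))

print(f"w*      = {w_star:.6f}")
print(f"alpha*  = {alpha_star:.16f}")
print(f"C lower = {C_lower:.4f}")                # ~2.03
```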