Protecting Statistical Databases Against Snoopers

Comparison of two methods

Disclosure vs. Anonymity
- Information disclosure necessary for planning and numerical measurements
- Anonymity necessary for protection of the individual and the public’s trust in systems

Medical Data
Necessary for:
- Measuring effectiveness of current treatments
- Finding sources of common medical mistakes
- Tracking contagious disease
- Government spending planning
- Health insurance companies

Anonymity: Not as Easy as it Looks
- Race
- Birth date
- Profession
- Zip code
- Sex
Complete identification without uniquely identifying information

Outside Factors Affecting Privacy
- Snooper’s supplementary knowledge
- Public data sources
- Rarity

Comparing Two Methods of Protection
- What are the privacy guarantees?
- Can useful information be gained?

Sensitivity-based Noise-adding Algorithm
- Proposed by Dwork, McSherry, Nissim and Smith
- Adds noise to each answer based on the sensitivity of the series of queries
- Amount of privacy based on ε, a coefficient in the noise-generating formula

Sensitivity
How much could changing one row change an answer?
- MEAN
- COUNT
- HISTOGRAMS
The sensitivity of a series of queries is the sum of the sensitivities of the queries.

Coin-flip Algorithm
- Proposed by Mishra and Sandler
- A way for individuals to publish their own personal data
- Amount of privacy based on ε, the bias in the coin-flip

Implementing the Coin-flip Algorithm
- The k possible answers to a query are ordered and numbered
- If an individual’s answer to the query is the ith answer, the profile is a string of k bits where the ith bit is a one and the others are zero
- To sanitize, each bit is flipped with probability ½ + ε/2
- All sanitized profiles resemble a random string of ones and zeros

Example: HIV status
- Ordered possible responses: “POSITIVE, NEGATIVE, UNKNOWN”
- The original profile of an HIV+ individual: “1, 0, 0”
- Results of coin-flips: “STAY, FLIP, STAY”
- Resulting sanitized profile: “1, 1, 0”
- What do we know about the individual from the sanitized profile?

My Research
- Compare the total amount of error generated by histogram / frequency queries
- Hypothesis: The noise-adding algorithm will generate less error for few queries and the coin-flip algorithm will generate less error for many queries
- Research question: Where is the “sweet spot” where the error lines cross on a graph?

[Figure: Sum of Error. Sum of error as percent of n (0% to 4000%) vs. number of frequency queries (1 to 381); series: Coinflip, Noise Addition]
The “sweet spot” first occurs at 101 queries.

[Figure: Sum of Error. Sum of error as percent of n (0% to 4000%) vs. number of frequency queries (1 to 381); series: Coinflip, Noise Addition]
With the smallest histograms first, the first “sweet spot” occurs at 32 queries.
[Figure: Sum of Error. Sum of error as percent of n (0% to 4000%) vs. number of frequency queries (1 to 381); series: Coinflip, Noise Addition]
With the largest histograms first, the first “sweet spot” occurs at 189 queries.

A Second Look
Range of sensitivity: 2 to 136
[Figures: the three Sum of Error charts above, repeated]
- Unordered histograms: at first “sweet spot”, sensitivity = 30
- Smallest histograms first: at first “sweet spot”, sensitivity = 32
- Largest histograms first: at first “sweet spot”, sensitivity = 34

[Figure: Difference in Error. Difference in percent error (-200% to 1600%) vs. sensitivity (2 to 92)]

Conclusions
For histogram / frequency queries, “sweet spots” occur between sensitivity = 30 and sensitivity = 40, so for least error:
- If sensitivity < 30, use the NOISE-ADDING algorithm
- If sensitivity > 40, use the COIN-FLIP algorithm

Quick Bibliography
- Survey: N R Adam and J C Wortmann. Security-control methods for statistical databases: a comparative study. ACM Computing Surveys, 21(4), December 1989.
- Noise-adding algorithm: C Dwork, F McSherry, K Nissim, A Smith. Calibrating noise to sensitivity in private data analysis. 3rd Theory of Cryptography Conference, 2006.
- Coin-flip algorithm: N Mishra, M Sandler. Symposium on Principles of Database Systems, 2006.

Professor Nina Mishra, PhD
Professor Alf Weaver, PhD
REU program at UVa, sponsored by the National Science Foundation
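
Sketch of the two methods
The two mechanisms compared in these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the experiments. All function names are hypothetical. The slides say only that the noise depends on the sensitivity and on ε; the Laplace distribution with scale sensitivity/ε is assumed here, following the cited Dwork et al. paper. The coin-flip part follows the slides directly (one-hot profile, each bit flipped with probability ½ + ε/2). The count estimator at the end is not in the slides; it is derived from that flip probability.

```python
import random


def laplace_noisy_answers(answers, sensitivity, epsilon, rng):
    """Noise-adding sketch: perturb every answer in a series of queries.

    ASSUMPTION: Laplace noise with scale = sensitivity / epsilon, where
    `sensitivity` is the sum of the sensitivities of the queries.
    """
    rate = epsilon / sensitivity  # exponential rate = 1 / Laplace scale
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return [a + rng.expovariate(rate) - rng.expovariate(rate) for a in answers]


def one_hot_profile(answer_index, k):
    """Profile of k bits with a one at the individual's answer."""
    return [1 if i == answer_index else 0 for i in range(k)]


def apply_flips(profile, flips):
    """Flip the bits whose coin came up FLIP (True); keep the rest."""
    return [bit ^ 1 if flip else bit for bit, flip in zip(profile, flips)]


def sanitize(profile, epsilon, rng):
    """Coin-flip sketch: flip each bit with probability 1/2 + epsilon/2."""
    p_flip = 0.5 + epsilon / 2
    flips = [rng.random() < p_flip for _ in profile]
    return apply_flips(profile, flips)


def estimate_counts(sanitized_profiles, epsilon):
    """Estimate how many individuals gave each answer (derived, not from the slides).

    With flip probability p = 1/2 + epsilon/2, the expected number of ones
    in a column is n*p - count*epsilon, so count ~ (n*p - ones) / epsilon.
    """
    n = len(sanitized_profiles)
    k = len(sanitized_profiles[0])
    p_flip = 0.5 + epsilon / 2
    ones = [sum(profile[i] for profile in sanitized_profiles) for i in range(k)]
    return [(n * p_flip - s) / epsilon for s in ones]


# The HIV-status example from the slides: responses ordered as
# POSITIVE, NEGATIVE, UNKNOWN; coin results STAY, FLIP, STAY.
hiv_positive = one_hot_profile(0, 3)                        # [1, 0, 0]
example = apply_flips(hiv_positive, [False, True, False])   # [1, 1, 0]
```

For a histogram over many individuals, each person would call sanitize on their own profile and publish only the result; estimate_counts then works on the published profiles alone. The estimate becomes noisier as ε shrinks, which is the error the slides compare against the noise-adding method.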
