Privacy in Today’s World:
Solutions and Challenges

Rebecca Wright
Stevens Institute of Technology

26 June 2003
Talk Outline
• Overview of privacy landscape

• Privacy-preserving data mining

• Privacy-protecting statistical analysis of large
  databases

• Selective private function evaluation

• Conclusions
Erosion of Privacy
 “You have zero privacy. Get over it.”
                       - Scott McNealy, 1999

• Changes in technology are making privacy harder.
  – reduced cost for data storage
  – increased ability to process lots of data

• Increased need for security may make privacy seem
less critical.
Historical Changes
• Small towns, little movement:
  – very little privacy, social mechanisms helped
   prevent abuse

• Large cities, increased movement:
  – lost social mechanisms, but gained privacy
   through anonymity

• Now:
  – advancing technology is reducing privacy, and the lost
    social mechanisms have not been replaced
What Can We Do?
• Use technology, policy, and education to
  – maintain/increase privacy
  – provide new social mechanisms
  – create new mathematical models for better
    understanding
Problem: we are using old models and old modes of thought
to deal with situations arising from new technology.
What is Privacy?
• May mean different things to different people
  – seclusion: the desire to be left alone
  – property: the desire to be paid for one’s data
  – autonomy: the ability to act freely

• Generally: the ability to control the dissemination
  and use of one’s personal information.
Privacy of Data
• Stored data
  – encryption, computer security, intrusion
   detection, etc.

• Data in transit
  – encryption, network security, etc.

• Release of data
  – current privacy-oriented work: P3P, Privacy Bird,
    EPA, Internet Explorer 6.0, etc.
Internet Explorer V.6
• Block All Cookies: not usable
• High ... Low: reasonable range of behavior, blocking
  some cookies based on their privacy policies
• Accept All Cookies: no privacy

Fairly simple, deals only with cookies, limited info.
Different Types of Data
• Transaction data
  – created by interaction between stakeholder and
    enterprise
  – current privacy-oriented solutions useful
• Authored data
   – created by stakeholder
   – digital rights management (DRM) useful
• Sensor data
   – stakeholders not clear at time of creation
   – presents a real and growing privacy threat
Product Design as Policy Decision
• product decisions by large companies or public
  organizations become de facto policy decisions

• often such decisions are made without conscious
  thought to privacy impacts, and without public
  discussion

• this is particularly true in the United States, where
  there is not much relevant legislation
Example: Metro Cards
• Washington, DC:
  – no record kept of per-card transactions
  – damaged card can be replaced if printed value still visible

• New York City:
  – transactions recorded by card ID
  – damaged card can be replaced if card ID still readable
  – have helped find suspects, corroborate alibis
Privacy Tradeoffs?
• Privacy vs. security: maybe, but giving up one does not
  mean gaining the other (who is this person? is this a
  dangerous person?)

• Privacy vs. usability: reasonable defaults, easy and
  extensive customization, visualization tools

The real tradeoffs are against cost or power, rather than an
inherent conflict with privacy.
Surveillance and Data Mining
• Analyze large amounts of data from diverse
  sources.
• Law enforcement and homeland security:
  – detect and thwart possible incidents before they occur
  – identify and prosecute criminals after incidents occur

• Companies like to do this, too.

  – Marketing, personalized customer service
Privacy-Preserving Data Mining
   Allow multiple data holders to collaborate to
   compute important (e.g. security-related)
   information while protecting the privacy of other
   information.




   Particularly relevant now, with increasing focus on
   security even at the expense of privacy (e.g. TIA).
Advantages of Privacy Protection
 • protection of personal information

 • protection of proprietary or sensitive
   information

 • fosters collaboration between different
   agencies (since they may be more willing to
   collaborate if they need not reveal their
   information)
Cryptographic Approach

• Cryptographic protocols provably reveal nothing except
  the output of the computation.

  – Privacy-preserving computation of decision trees [LP00]

  – Secure computation of approximate Hamming distance
    of two large data sets [FIMNSW01]

  – Privacy-protecting statistical analysis [CIKRRW01]

  – Privacy-preserving association rule mining [KC02]
Randomization Approach
• Randomize the data before computation (which can then
  be either distributed or centralized).

• Induces a tradeoff between privacy and
  computation error.

  – Distribution reconstruction algorithm from randomized
    data [AS00]

  – Association rule mining [ESAG02]
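
As a rough illustration of the randomization idea (a hedged sketch, not the cited [AS00] or [ESAG02] algorithms; the data and noise scale are made up), the following Python snippet perturbs each value with independent zero-mean noise before release; an aggregate such as the mean can still be estimated, at the cost of some error, and larger noise gives more privacy but more error.

```python
# A hedged illustration of the randomization idea (not the cited algorithms):
# perturb each value with independent zero-mean noise before release, then
# estimate an aggregate (here, the mean) from the noisy data.
import random

values = [random.gauss(50, 10) for _ in range(10_000)]   # true (private) data
noise_scale = 25                                          # larger = more privacy, more error

randomized = [v + random.uniform(-noise_scale, noise_scale) for v in values]

true_mean = sum(values) / len(values)
est_mean = sum(randomized) / len(randomized)              # zero-mean noise roughly cancels
print(f"true mean {true_mean:.2f}, estimated mean {est_mean:.2f}")
```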
Comparison of Approaches

[Diagram: the two approaches placed against three costs (inefficiency,
privacy loss, inaccuracy); the cryptographic approach pays mainly in
inefficiency, while the randomization approach pays mainly in
inaccuracy and privacy loss.]
Privacy-Protecting Statistics [CIKRRW01]
  CLIENT: wishes to compute statistics of the servers’ data
  SERVERS: each holds a large database

• Parties communicate using
  cryptographic protocols designed so that:
  – Client learns desired statistics, but learns nothing
    else about data (including individual values or
    partial computations for each database)
  – Servers do not learn which fields are queried, or
    any information about other servers’ data
  – Computation and communication are very efficient
Non-Private and Inefficient Solutions

• Database sends client entire database (violates
  database privacy)

• For sample size m, use SPIR to learn m values
  (violates database privacy)

• Client sends selections to database, database does
  computation (violates client privacy)

• General secure multiparty computation (not
  efficient for large databases)
Secure Multiparty Computation
• Allows k players P_1, P_2, ..., P_k to privately compute
  a function f of their inputs (a secret-sharing sketch of
  the basic idea follows below).

• Overhead is polynomial in size of inputs and
  complexity of f [Yao, GMW, BGW, CCD, ...]
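
To make the idea concrete, here is a minimal additive secret-sharing sketch in Python: k players compute the sum of their private inputs without revealing them. This illustrates one classic building block behind MPC, not the general protocols cited above; the player inputs and modulus are illustrative.

```python
# Minimal additive secret sharing: k players jointly compute the sum of their
# private inputs. Illustrates one building block of MPC, not a general protocol.
import random

M = 2**32                              # all arithmetic is modulo M
inputs = [12, 7, 30]                   # private inputs of players P1..Pk
k = len(inputs)

# Each player splits its input into k random shares that sum to it (mod M)
# and sends one share to each player.
shares = []
for v in inputs:
    parts = [random.randrange(M) for _ in range(k - 1)]
    parts.append((v - sum(parts)) % M)
    shares.append(parts)

# Player j adds the j-th shares it received; a single share reveals nothing.
partials = [sum(shares[i][j] for i in range(k)) % M for j in range(k)]

# Publishing only the partial sums reveals the total sum and nothing else.
assert sum(partials) % M == sum(inputs) % M
```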
Symmetric Private Information Retrieval
• Allows a client with input i to interact with a database
  server with input x = x_1, ..., x_n so that the client
  learns (only) x_i.

• Overhead is polylogarithmic in the size of the database x
  [CMS, GIKM]
Homomorphic Encryption
• Certain computations on encrypted messages
  correspond to other computations on the cleartext
  messages.

• For additive homomorphic encryption,

  – E(m1) · E(m2) = E(m1 + m2)

  – this also implies E(m)^x = E(m · x)

• Paillier encryption is an example.
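
Since Paillier encryption is additively homomorphic, the two identities above can be demonstrated directly. Below is a toy, insecure Paillier sketch in Python with tiny hard-coded primes, purely to illustrate the properties; a real system would use a vetted library and large keys.

```python
# Toy Paillier cryptosystem (insecure, tiny primes) illustrating the additive
# homomorphism above: E(m1)*E(m2) decrypts to m1+m2, and E(m)^x to m*x.
import math
import random

p, q = 293, 433                 # toy primes; real keys would use ~1024-bit primes
n = p * q
n_sq = n * n
g = n + 1                       # standard choice g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n_sq) - 1) // n, -1, n)   # mu = L(g^lam mod n^2)^(-1) mod n

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c: int) -> int:
    return ((pow(c, lam, n_sq) - 1) // n * mu) % n

c1, c2 = encrypt(17), encrypt(25)
assert decrypt((c1 * c2) % n_sq) == 17 + 25     # E(m1)*E(m2) -> m1 + m2
assert decrypt(pow(c1, 3, n_sq)) == 17 * 3      # E(m)^x -> m*x
```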
 Privacy-Protecting Statistics Protocol

 • To learn mean and variance: enough to learn sum
   and sum of squares.

 • Server stores:
     x_1, x_2, ..., x_n
     z_1, z_2, ..., z_n,   where z_i = x_i^2,
   and responds to queries on both.

 • efficient protocol for sum ⇒ efficient protocol for
   mean and variance
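
The reduction in the first bullet is just client-side arithmetic once the two sums are known; a minimal sketch with illustrative numbers:

```python
# Client-side arithmetic behind the first bullet: once the sum S1 and the sum
# of squares S2 of the m selected values are known, mean and (population)
# variance follow directly.
def mean_and_variance(S1, S2, m):
    mean = S1 / m
    variance = S2 / m - mean ** 2       # E[x^2] - (E[x])^2
    return mean, variance

# e.g. selected values 2, 4, 6: S1 = 12, S2 = 56, m = 3
print(mean_and_variance(12, 56, 3))     # (4.0, 2.666...)
```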
 Weighted Sum
Client wants to compute a selected linear combination
of m items:  Σ_{j=1..m} w_j · x_{i_j}

Protocol (the client holds the homomorphic encryption keys E, D):

• Client sets α_i = w_j if i = i_j, and α_i = 0 otherwise,
  then sends E(α_1), ..., E(α_n) to the server.

• Server computes
    v = Π_{i=1..n} E(α_i)^{x_i} = E( Σ_{i=1..n} α_i · x_i )
  and returns v.

• Client decrypts v to obtain
    Σ_{i=1..n} α_i · x_i = Σ_{j=1..m} w_j · x_{i_j}

(A code sketch of this exchange follows.)
Efficiency

• Linear communication and computation (feasible
  in many cases)

• If n is large and m is small, we would like to do
  better.
Selective Private Function Evaluation

• Allows a client to privately compute a function f
  over m inputs x_{i_1}, ..., x_{i_m}

• client learns only f(x_{i_1}, ..., x_{i_m})

• server does not learn i_1, ..., i_m

Unlike general secure multiparty computation, we
 want communication complexity to depend on m,
 not n (more precisely, polynomial in m and
 polylogarithmic in n).
Security Properties
• Correctness: If client and server follow the
  protocol, client’s output is correct.
• Client privacy: malicious server does not learn
  client’s input selection.
• Database privacy:
  – weak: malicious client learns no more than
   output of some m-input function g
  – strong: malicious client learns no more than
   output of specified function f
Solutions based on MPC
• Input selection phase:

  – server obtains a blinded version of each x_{i_j}
• Function evaluation phase:

  – client and server use MPC to compute f on the m
    blinded items
Input selection phase
• Server holds the homomorphic encryption keys D, E and
  computes the encrypted database E(x_1), ..., E(x_n).

• Client retrieves E(x_{i_1}), ..., E(x_{i_m}) using SPIR(m, n).

• Client picks random c_1, ..., c_m and, using the homomorphic
  property, computes and sends E(x_{i_j} - c_j) for each j.

• Server decrypts the received values: s_j = x_{i_j} - c_j.

(A code sketch of the blinding step follows.)
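
A minimal sketch of the blinding step, again reusing encrypt, decrypt, n, and n_sq from the toy Paillier sketch; the SPIR retrieval itself is omitted and both parties run in one process, so the E(x_{i_j}) are simply computed in place as a stand-in.

```python
# Sketch of the blinding step above, reusing the toy Paillier functions.
# SPIR itself is omitted: we assume the client already holds E(x_{i_j}).
import random

x = [5, 9, 2, 7, 4]                       # server's database
selected = [1, 3]                         # client's secret indices i_1, i_2

enc_selected = [encrypt(x[i]) for i in selected]     # stand-in for SPIR output

# Client picks random blinds c_j and sends E(x_{i_j} - c_j) = E(x_{i_j}) * E(-c_j)
c = [random.randrange(0, 50) for _ in selected]
blinded = [(e * encrypt((-cj) % n)) % n_sq for e, cj in zip(enc_selected, c)]

# Server decrypts, learning only the blinded values s_j = x_{i_j} - c_j (mod n)
s = [decrypt(b) for b in blinded]
assert [(sj + cj) % n for sj, cj in zip(s, c)] == [x[i] for i in selected]
```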
Function Evaluation Phase
• Client has c = c_1, ..., c_m
• Server has s = s_1, ..., s_m, where s_j = x_{i_j} - c_j

Use MPC to compute:
   g(c, s) = f(s + c) = f(x_{i_1}, ..., x_{i_m})

• Total communication cost is polylogarithmic in n and
  polynomial in m and |f|.
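
Continuing the blinding sketch above (the MPC machinery itself is out of scope here), the function g the parties would jointly evaluate just unblinds s with c and applies f; with f = sum, for instance:

```python
# Continuing the blinding sketch: g unblinds s with c and applies f.
# The MPC that would evaluate g privately is not shown.
def g(c, s, f):
    unblinded = [(cj + sj) % n for cj, sj in zip(c, s)]   # recovers x_{i_j}
    return f(unblinded)

assert g(c, s, sum) == sum(x[i] for i in selected)        # e.g. f = sum
```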
Distributed Databases
• Same approach works to compute function over a
  distributed database.

  – Input selection phase done in parallel with each
   database server

  – Function evaluation phase done as single MPC

  – Database privacy means only final outcome is
   revealed to client.
Performance
     Complexity                              Security
  1  m·SPIR(n,1,k) + O(k|f|)                 Strong
  2  m·SPIR(n,1,1) + MPC(m,|f|)              Weak
  3  SPIR(n,m,log n) + MPC(m,|f|) + km^2     Weak
  4  SPIR(n,m,k) + MPC(m,|f|)                Honest client only

    Experiments are currently under way to determine whether
these methods are efficient in real-world settings.
Initial Experimental Results
• Initial implementation of linear computation and
  communication solution [H. Subramaniam & Z. Yang]
  – implementation in Java and C++
  – uses Paillier encryption
  – uses synthetic data, with client and server as separate
    processes on the same machine (a 2-year-old Toshiba
    laptop)
Initial Experimental Results
[Figure: total time (minutes) vs. database size, for databases of up to
100,000 records.]
Initial Experimental Results
[Figure: total time and encryption time (minutes) vs. database size, for
databases of up to 100,000 records.]
Conclusions
• Privacy is in danger, but some important progress
  has been made.
• Important challenges ahead:
  – Usable privacy solutions (efficiency and user interface)
  – Sensor data
  – Better use of a hybrid approach: decide what can safely
    be disclosed and what needs only moderate protection,
    and use cryptographic protocols to protect the most
    critical information.
  – Mathematical/formal models to understand and
    compare different solutions.
Research Directions
• Investigate integration of cryptographic approach
  and randomization approach:
  – seek to maintain strong privacy and accuracy of
    cryptographic approach, ...
  – while benefiting from the improved efficiency of the
    randomization approach

• Understand mathematically what the resulting
  privacy and/or accuracy compromises are.

• Technology, policy, and education must work
  together.

				