					Privacy Preserving enhanced
    Collaborative Filtering
       Evan Xiang, HKUST
Motivation for Privacy Preservation
Facebook example
• Bob bought a gift for his wife as a surprise
• The recommender system recommends the
  same product to all of his friends on Facebook,
  including his wife…
• -_-!
                      Outline
• Introduction to the Privacy issue in
  Collaborative Filtering

• Centralized Privacy Preservation
  • Wenliang Du (Syracuse University, NY)


• Distributed Privacy Preservation
  • John Canny (UC Berkeley)


• Discussion, Q&A
            Introduction to CF
• The task in collaborative filtering is to predict
  the utility of items to a particular user (the
  active user) based on a database of user votes
  from a sample or population of other users
  (the user database).

• Trade-off: personalized service vs. privacy preservation
       Collaborative filtering system
• Three main steps
  – Data collection, similarity decisions, and
    producing recommendations
[Figure: a CF system with a server side (database with ratings → calculate similarities → similarity values → predict missing ratings → recommend) and a user side (user, music player).]
Database with Collected Ratings
[Figure: database of collected ratings over products.]
    Similarity Decisions and Recommendations
Memory-Based Algorithms
• Utilize the entire user-item database to generate a prediction
• Online computation
• Use statistical techniques to find the neighbors
    – Pearson correlation
    – Cosine similarity
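As a minimal sketch of memory-based neighbor search, the Python below ranks users by cosine similarity computed over co-rated items at query time. The per-user dictionary layout and the function names are hypothetical; the ratings are the tea/coffee table used in the worked example later in these slides.

```python
from math import sqrt

# Hypothetical layout: user -> {item: rating}; unrated items are simply absent.
RATINGS = {
    "Wim": {"T1": 2, "T2": 1, "T3": 1, "C1": 4, "C3": 4, "C4": 3, "C5": 4, "C6": 3},
    "Jan": {"T1": 1, "T3": 1, "C1": 5, "C2": 5, "C3": 4, "C4": 4, "C5": 3},
    "Pim": {"T1": 4, "T2": 5, "T3": 5, "C1": 2, "C3": 3, "C4": 3, "C5": 2},
    "Aukje": {"T1": 5, "T2": 4, "C1": 1, "C2": 3, "C3": 2, "C5": 2},
}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two rating dicts, restricted to co-rated items."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def nearest_neighbors(active: str, k: int) -> list[str]:
    """Rank all other users by similarity to `active` at query time (online)."""
    scores = {
        other: cosine(RATINGS[active], ratings)
        for other, ratings in RATINGS.items()
        if other != active
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(nearest_neighbors("Aukje", 2))  # ['Pim', 'Jan']
```

On raw ratings every cosine value here is positive, including Jan and Wim, whose tastes run opposite to Aukje's once each user's mean is subtracted; mean-centering is what the Pearson correlation in the worked example adds.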
Model-Based Algorithms
• First use the rating database to build a model (e.g., extract basic rating
  profiles)
• Offline computation
• Use the model for prediction
    – Singular value decomposition (SVD)
    – Principal component analysis (PCA)
    – Bayesian networks, ...
  Pearson Correlation as an example
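Step 1: Determine similarities between users
$$s_{ab} = \frac{\sum_{i \in I_a \cap I_b} (v_{ai} - \bar{v}_a)(v_{bi} - \bar{v}_b)}{\sqrt{\sum_{i \in I_a \cap I_b} (v_{ai} - \bar{v}_a)^2 \sum_{i \in I_a \cap I_b} (v_{bi} - \bar{v}_b)^2}}$$
Step 2: Make a prediction for the empty cell
$$\hat{v}_{ai} = \bar{v}_a + \frac{\sum_{b \in U_i} s_{ab} (v_{bi} - \bar{v}_b)}{\sum_{b \in U_i} |s_{ab}|}$$
where I_a is the set of items rated by user a, v̄_a is a's mean rating, and U_i is the set of users who rated item i.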
                        Example
• Tea and coffee flavors
  – 4 users
  – 9 items (flavors)

             T1   T2   T3   C1   C2   C3   C4   C5   C6
   Wim        2    1    1    4    -    4    3    4    3
   Jan        1    -    1    5    5    4    4    3    -
   Pim        4    5    5    2    -    3    3    2    -
   Aukje      5    4    -    1    3    2    -    2    -
   (- = not rated)
                                  Example
• Subtract each user's average rating
     flavor     T1     T2     T3     C1     C2     C3     C4     C5     C6
     Wim       -0.8   -1.8   -1.8    1.2     -     1.2    0.2    1.2    0.2
     Jan       -2.3     -    -2.3    1.7    1.7    0.7    0.7   -0.3     -
     Pim        0.6    1.6    1.6   -1.4     -    -0.4   -0.4   -1.4     -
     Aukje      2.2    1.2     -    -1.8    0.2   -0.8     -    -0.8     -

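• Compute similarities, e.g., between Pim and Aukje over their co-rated items (T1, T2, C1, C3, C5):
$$s_{\text{Pim,Aukje}} = \frac{0.6 \cdot 2.2 + 1.6 \cdot 1.2 + (-1.4)(-1.8) + (-0.4)(-0.8) + (-1.4)(-0.8)}{\sqrt{(4.8 + 1.4 + 3.2 + 0.6 + 0.6)(0.4 + 2.6 + 2.0 + 0.2 + 2.0)}} \approx 0.83$$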
                              Example
• Use similarities to predict missing ratings

          similarity    Wim     Jan     Pim     Aukje   |  T3 (deviation)
          Wim            1      0.78   -0.96   -0.85    |  -1.8
          Jan            0.78   1      -0.74   -0.77    |  -2.3
          Pim           -0.96  -0.74    1       0.83    |   1.6
          Aukje         -0.85  -0.77    0.83    1       |   -
• Prediction for Aukje, tea T3
$$\hat{v}_{\text{Aukje},T3} = 2.8 + \frac{(-0.85)(-1.8) + (-0.77)(-2.3) + 0.83 \cdot 1.6}{0.85 + 0.77 + 0.83} \approx 4.7$$
                        Privacy Issues
• CF systems are a threat to individual privacy:
   –   Unsolicited marketing
   –   Prying into personal lives
   –   Profiling users to facilitate price discrimination by vendors
   –   Users' profiles can be used in criminal cases
   –   Collected data might be used for government surveillance
• Customer data is an asset and can be sold (and has been sold).
• Collecting high-quality data is difficult because of privacy
  concerns.
                           So,
• The challenge:
  How can users contribute their private data for CF
  without compromising their privacy?
  – Providing sufficiently accurate predictions while
     • Nobody may know users’ ratings
     • Nobody may know who rated what
          Evaluation Metrics

• Privacy Gain

• Accuracy Loss




                            Privacy Gain
Privacy: how difficult it is for the server to guess the users' actual profiles, given access to their online profiles.
[Formula not recoverable. Its symbols denote the actual profile of user u, the online profile of user u, the number of users, the rating frequency of item i, the weight of items added by aggregation, and the weight of items in the online profile.]
 Intuition: the structural difference between two graphs (online and actual),
 viewed as the difference between corresponding edges.
     R. Myers, R. C. Wilson, and E. R. Hancock. Bayesian graph edit distance.
     IEEE Trans. Pattern Anal. Mach. Intell., 22(6), 2000.
              Accuracy Loss
The bipartite graph that contains actual ratings

The bipartite graph available to the server




                   Outline
• Introduction to the Privacy issue in
  Collaborative Filtering

• Centralized Privacy Preservation
  – Wenliang Du (Syracuse University, NY)


• Distributed Privacy Preservation

• Conclusion
 Approach 1: Anonymous Techniques
• Reveal personal information without
  disclosing identities
• No guarantee on the quality of the database
  – A malicious user or a competing company can make
    the database useless.
   Approach 2: Using Cryptography
• Paillier public-key cryptosystem
• Generate keys
  – choose large random primes p, q (private)
  – calculate n = pq and a 'generator' g (public)
• Encrypt message m by
$$\mathcal{E}(m) = g^m r^n \bmod n^2$$
  with r ∈ Z_n^*
               Using Cryptography

 • Homomorphism properties
$$\mathcal{E}(m_1)\,\mathcal{E}(m_2) = g^{m_1 + m_2} (r_1 r_2)^n = \mathcal{E}(m_1 + m_2) \pmod{n^2}$$
$$\mathcal{E}(m_1)^{m_2} = g^{m_1 m_2} (r_1^{m_2})^n = \mathcal{E}(m_1 m_2) \pmod{n^2}$$
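A toy Python sketch of Paillier encryption makes both homomorphic properties concrete. It assumes the common choice g = n + 1 and tiny hard-coded primes for readability (real keys use primes of many hundreds of bits and a vetted library); the decryption step uses the standard λ/μ construction, which the slides do not show.

```python
import math
import random


def keygen(p: int = 293, q: int = 433):
    """Toy key generation; p and q are small primes for illustration only."""
    n = p * q
    g = n + 1                      # common choice of generator
    lam = math.lcm(p - 1, q - 1)   # Carmichael function of n
    # mu = L(g^lam mod n^2)^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m: int, rng=random) -> int:
    n, g = pub
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:     # r must lie in Z_n^*
        r = rng.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2


def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n


pub, priv = keygen()
n2 = pub[0] ** 2
c1, c2 = encrypt(pub, 20), encrypt(pub, 22)
print(decrypt(pub, priv, c1 * c2 % n2))                 # 42: E(m1)E(m2) = E(m1 + m2)
print(decrypt(pub, priv, pow(encrypt(pub, 6), 7, n2)))  # 42: E(m1)^m2 = E(m1 * m2)
```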
          Encrypted inner product
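• User a: $\mathbf{a} = (a_1, \ldots, a_k)$
• User b: $\mathbf{b} = (b_1, \ldots, b_k)$
• User a encrypts its vector and sends it to b
$$\mathcal{E}(\mathbf{a}) = (\mathcal{E}(a_1), \ldots, \mathcal{E}(a_k))$$
• User b computes
$$\prod_{i=1}^{k} \mathcal{E}(a_i)^{b_i} = \prod_{i=1}^{k} \mathcal{E}(a_i b_i) = \mathcal{E}\Big(\sum_{i=1}^{k} a_i b_i\Big) = \mathcal{E}(\mathbf{a} \cdot \mathbf{b})$$
  and sends it back to a
• User a decrypts it to get the inner product a · b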
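A toy Python sketch of this exchange, with a compact Paillier implementation (g = n + 1 and tiny hard-coded primes, so not secure). It assumes integer vectors; real-valued ratings would first have to be scaled to integers, which the slides do not cover. Negative values are encoded modulo n and decoded back, which only works while |a · b| < n/2.

```python
import math
import random

# Toy Paillier with g = n + 1 and tiny primes: illustration only, not secure.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # with g = n + 1, L(g^lam mod n^2) = lam mod n


def enc(m: int) -> int:
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2  # negatives encoded mod n


def dec(c: int) -> int:
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m  # decode back to a signed integer


# User a encrypts its vector and sends only the ciphertexts to b.
a = [2, -1, 3, 0]
enc_a = [enc(x) for x in a]

# User b computes E(a . b) = prod_i E(a_i)^(b_i) without seeing a.
b = [4, 5, -6, 7]
c = 1
for ci, bi in zip(enc_a, b):
    c = c * pow(ci, bi % N, N2) % N2

# User a decrypts and learns only the inner product.
print(dec(c))  # -15
```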
      Encrypted CF: correlation step
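• Rewrite the correlation as three inner products
$$s_{ab} = \frac{\sum_{i \in I_a \cap I_b} (v_{ai} - \bar{v}_a)(v_{bi} - \bar{v}_b)}{\sqrt{\sum_{i \in I_a \cap I_b} (v_{ai} - \bar{v}_a)^2 \sum_{i \in I_a \cap I_b} (v_{bi} - \bar{v}_b)^2}} = \frac{\mathbf{w}_a \cdot \mathbf{w}_b}{\sqrt{(\mathbf{r}_b \cdot \mathbf{w}_a^2)(\mathbf{r}_a \cdot \mathbf{w}_b^2)}}$$
  where
$$w_{ai} = \begin{cases} v_{ai} - \bar{v}_a & \text{if item } i \text{ is rated by user } a \\ 0 & \text{otherwise} \end{cases} \qquad r_{ai} = \begin{cases} 1 & \text{if item } i \text{ is rated by user } a \\ 0 & \text{otherwise} \end{cases}$$
  and $\mathbf{w}_a^2$ denotes the element-wise square.
• The zeros keep items $i \notin I_a \cap I_b$ from contributing to the
  sums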
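The rewrite can be checked in plaintext with a short Python sketch: zero-filling w and r makes full-length inner products equal the sums over co-rated items. It uses the Pim and Aukje rows of the tea/coffee example (None marks an unrated item) and performs no encryption; in the protocol each of the three inner products would be evaluated under Paillier encryption as in the encrypted inner product above. Helper names are illustrative.

```python
from math import sqrt

# Pim and Aukje over items T1, T2, T3, C1..C6; None = not rated.
PIM = [4, 5, 5, 2, None, 3, 3, 2, None]
AUKJE = [5, 4, None, 1, 3, 2, None, 2, None]


def mean_of(v):
    rated = [x for x in v if x is not None]
    return sum(rated) / len(rated)


def w_and_r(v):
    """w_i = v_i - mean if rated, else 0; r_i = 1 if rated, else 0."""
    m = mean_of(v)
    w = [x - m if x is not None else 0.0 for x in v]
    r = [1 if x is not None else 0 for x in v]
    return w, r


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def corr_via_inner_products(va, vb):
    wa, ra = w_and_r(va)
    wb, rb = w_and_r(vb)
    wa2 = [x * x for x in wa]  # element-wise squares
    wb2 = [x * x for x in wb]
    return dot(wa, wb) / sqrt(dot(rb, wa2) * dot(ra, wb2))


def corr_direct(va, vb):
    """Reference: Pearson over co-rated items, each user centered on its own mean."""
    ma, mb = mean_of(va), mean_of(vb)
    pairs = [(x - ma, y - mb) for x, y in zip(va, vb) if x is not None and y is not None]
    num = sum(x * y for x, y in pairs)
    return num / sqrt(sum(x * x for x, _ in pairs) * sum(y * y for _, y in pairs))


print(round(corr_via_inner_products(PIM, AUKJE), 2))  # 0.83
```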
       Encrypted CF: correlation step
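• Protocol (active user a ↔ server ↔ other users b)
  – a sends $\mathcal{E}(\mathbf{w}_a)$, $\mathcal{E}(\mathbf{w}_a^2)$, $\mathcal{E}(\mathbf{r}_a)$ to the server, which copies them to the other users
  – each user b computes $\mathcal{E}(\mathbf{w}_a \cdot \mathbf{w}_b)$, $\mathcal{E}(\mathbf{r}_b \cdot \mathbf{w}_a^2)$, $\mathcal{E}(\mathbf{r}_a \cdot \mathbf{w}_b^2)$ and returns them via the server
  – a decrypts $\mathbf{w}_a \cdot \mathbf{w}_b$ and $(\mathbf{r}_b \cdot \mathbf{w}_a^2)(\mathbf{r}_a \cdot \mathbf{w}_b^2)$ and computes $s_{ab}$
• The active user knows the correlation values, but not to
  whom they belong
• The server knows between whom correlations are computed, but not the
  correlation values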
     Encrypted CF: prediction step
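• Rewrite
$$\hat{v}_{ai} = \bar{v}_a + \frac{\sum_{b \in U_i} s_{ab} (v_{bi} - \bar{v}_b)}{\sum_{b \in U_i} |s_{ab}|} = \bar{v}_a + \frac{\sum_{b \in U} s_{ab} w_{bi}}{\sum_{b \in U} |s_{ab}|\, r_{bi}}$$
• Protocol (active user ↔ server ↔ other users)
  – the active user sends $\mathcal{E}(\mathbf{s}_a)$ and $\mathcal{E}(|\mathbf{s}_a|)$ to the server, which splits them and forwards $\mathcal{E}(s_{ab})$, $\mathcal{E}(|s_{ab}|)$ to each user b
  – each user b returns $\mathcal{E}(s_{ab} w_{bi})$ and $\mathcal{E}(|s_{ab}|\, r_{bi})$
  – the server combines these into $\mathcal{E}(\sum_{b \in U} s_{ab} w_{bi})$ and $\mathcal{E}(\sum_{b \in U} |s_{ab}|\, r_{bi})$, which the active user decrypts to compute $\hat{v}_{ai}$
  – each user b adds a random factor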
        Wim F.J. Verhaegh, Aukje E.M. van Duijnhoven, Pim Tuyls, and Jan Korst,
           Privacy Protection in Memory-Based Collaborative Filtering, Second
           European Symposium on Ambient Intelligence (EUSAI), 2004
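A plaintext Python sketch of the rewritten prediction: because w_bi and r_bi are 0 for users who did not rate item i, summing over all users U gives the same result as summing over U_i. The inputs are the rounded values from the tea/coffee example, with Wim as the active user and item C2 (which Pim has not rated); the encryption and the random factors of the protocol are not modeled, and the function names are hypothetical.

```python
import math

# Rounded values from the example: (similarity to Wim, C2 deviation or None if unrated).
OTHERS = {
    "Jan": (0.78, 1.7),
    "Pim": (-0.96, None),
    "Aukje": (-0.85, 0.2),
}
WIM_MEAN = 2.8


def predict_over_raters(mean_a, others):
    """Original form: sums restricted to users b in U_i (those who rated the item)."""
    raters = [(s, d) for s, d in others.values() if d is not None]
    return mean_a + sum(s * d for s, d in raters) / sum(abs(s) for s, _ in raters)


def predict_over_all(mean_a, others):
    """Rewritten form: sums over all users, with w_bi = r_bi = 0 for non-raters."""
    num = den = 0.0
    for s, d in others.values():
        w_bi = d if d is not None else 0.0
        r_bi = 1 if d is not None else 0
        num += s * w_bi       # in the protocol, user b supplies E(s_ab * w_bi)
        den += abs(s) * r_bi  # ... and E(|s_ab| * r_bi)
    return mean_a + num / den


print(round(predict_over_all(WIM_MEAN, OTHERS), 1))  # 3.5
```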
     Encrypted CF: Extension to
          Sequential Data
• Problem: sequential data are held by various
  parties, even competing companies
  • Horizontal partition: states are held by different parties
  • Vertical partition: traces are split across different parties

• Basic idea: use the homomorphic properties
  • Set up key pairs
  • Separate and exchange encrypted parameters
  • Aggregate encrypted parameters and data
                              Sahin Renckes, Huseyin Polat, Yusuf Oysal,
                           Providing predictions on distributed HMMs with
                            privacy. Artif. Intell. Rev. 28(4): 343-362 (2007)
            Approach 3
  Randomized Perturbation Techniques
[Figure: users 1, …, n each add random noise (+R1, +R2, …, +Rn) to their data before sending it to the central database used for collaborative filtering.]
Huseyin Polat and Wenliang Du. Privacy-Preserving Collaborative Filtering. In the International Journal of Electronic Commerce (IJEC), pages 9-35. Volume 9, Number 4, Summer 2005
             Two Building Blocks
$$A' = A + R = (a_1 + r_1, \ldots, a_n + r_n)$$
$$B' = B + V = (b_1 + v_1, \ldots, b_n + v_n)$$
How to compute, given only A' = A + R and B' = B + V:
$$A \cdot B = \sum_{i=1}^{n} a_i b_i \qquad \mathrm{SUM} = \sum_{i=1}^{n} a_i$$
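          A Collaborative Filtering Scheme
$$p_{aq} = \bar{v}_a + \sigma_a \, P_{aq}, \qquad P_{aq} = \frac{\sum_{i=1}^{n} w_{ai}\, z_{iq}}{\sum_{i=1}^{n} w_{ai}}, \qquad w_{ai} = \sum_k z_{ak}\, z_{ik}, \qquad z_{iq} = \frac{v_{iq} - \bar{v}_i}{\sigma_i}$$
where p_aq is the prediction for the active user a on item q, P_aq is the weighted average of preferences, z_iq is the z-score for item q, v_iq is the rating of user i on item q (v̄_i and σ_i are user i's mean and standard deviation), and w_ai is the similarity weight between the active user and user i.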
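                        Dot Product
$$A' \cdot B' = \sum_{i=1}^{n} (a_i + r_i)(b_i + v_i) = \sum_{i=1}^{n} a_i b_i + \sum_{i=1}^{n} a_i v_i + \sum_{i=1}^{n} r_i b_i + \sum_{i=1}^{n} r_i v_i \approx \sum_{i=1}^{n} a_i b_i$$
      where the random values r_i and v_i have mean 0 and are independent of the data and of each other, so the last three sums have expected value 0
                                             Huseyin Polat and Wenliang Du. Privacy-Preserving Collaborative Filtering. In the International Journal of Electronic Commerce (IJEC), pages 9-35. Volume 9, Number 4, Summer 2005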
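A minimal NumPy simulation of the dot-product building block, assuming synthetic correlated data and uniform noise on [-1, 1] (both choices are illustrative, not taken from the paper). The cross terms only vanish in expectation, so the estimate is approximate; with many entries the relative error is small.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical private columns, one entry per user, correlated with each other.
a = rng.normal(size=n)
b = 0.5 * a + 0.5 * rng.normal(size=n)

# Each entry is disguised with independent zero-mean uniform noise.
a_disguised = a + rng.uniform(-1.0, 1.0, n)
b_disguised = b + rng.uniform(-1.0, 1.0, n)

true_dot = a @ b
estimate = a_disguised @ b_disguised
print(true_dot, estimate, abs(estimate - true_dot) / abs(true_dot))
```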
              Sum of elements
$$\sum_{i=1}^{n} (a_i + r_i) = \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} r_i \approx \sum_{i=1}^{n} a_i$$
       where the random values r_i have mean 0
                                         Huseyin Polat and Wenliang Du. Privacy-Preserving Collaborative Filtering. In the International Journal of Electronic Commerce (IJEC), pages 9-35. Volume 9, Number 4, Summer 2005
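The sum building block in the same style: a NumPy sketch, assuming hypothetical 1-5 ratings (one per user) and uniform noise on [-2, 2]. Individual values are visibly disturbed, while the aggregate changes by a comparatively tiny amount.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

ratings = rng.integers(1, 6, size=n).astype(float)  # hypothetical 1-5 ratings
disguised = ratings + rng.uniform(-2.0, 2.0, n)     # zero-mean noise added by each user

print(ratings.sum(), disguised.sum())
print(np.mean(np.abs(disguised - ratings)))  # typical per-entry disturbance
```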
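              Conducting Collaborative Filtering
$$P_{aq} = \frac{\sum_{i=1}^{n} \left(\sum_k z_{ak} z_{ik}\right) z_{iq}}{\sum_{i=1}^{n} \sum_k z_{ak} z_{ik}} = \frac{\sum_k z_{ak} \left(\sum_{i=1}^{n} z_{ik} z_{iq}\right)}{\sum_k z_{ak} \left(\sum_{i=1}^{n} z_{ik}\right)}$$
 • The active user holds its own z_ak
 • For each k, the server computes the dot product $\sum_{i=1}^{n} z_{ik} z_{iq}$ and the
   sum $\sum_{i=1}^{n} z_{ik}$ on the disguised data and sends them to the active user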
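A NumPy sketch of the whole scheme, under assumptions not in the slides: synthetic dense ratings (a shared item "quality" plus per-user noise), every item rated, and uniform noise on [-1, 1] added to each z-score. The server only computes the two aggregates per item k on disguised data; the active user combines them with its own z_ak locally, and the result stays close to the prediction computed from undisguised data.

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, q = 5000, 20, 0  # item q = 0 is the item to predict

# Hypothetical dense ratings: shared item "quality" plus per-user noise.
quality = rng.normal(3.0, 1.0, n_items)
ratings = quality + rng.normal(0.0, 0.7, (n_users, n_items))

# Each user converts its ratings to z-scores, then disguises them with zero-mean noise.
z = (ratings - ratings.mean(axis=1, keepdims=True)) / ratings.std(axis=1, keepdims=True)
z_disguised = z + rng.uniform(-1.0, 1.0, z.shape)

# The active user rated every item except q and keeps its z-scores locally.
others = np.array([k for k in range(n_items) if k != q])
active = quality[others] + rng.normal(0.0, 0.7, others.size)
mean_a, std_a = active.mean(), active.std()
z_a = (active - mean_a) / std_a


def server_aggregates(Z):
    """For each k != q: the dot product sum_i z_ik z_iq and the sum sum_i z_ik."""
    return Z[:, others].T @ Z[:, q], Z[:, others].sum(axis=0)


def active_user_prediction(dots, sums):
    """p_aq = mean_a + std_a * P_aq, with P_aq combined from the server's aggregates."""
    P_aq = (z_a @ dots) / (z_a @ sums)
    return mean_a + std_a * P_aq


exact = active_user_prediction(*server_aggregates(z))
approx = active_user_prediction(*server_aggregates(z_disguised))
print(exact, approx)
```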
                         Conclusion
            Cryptography vs. Randomized Perturbation

Randomization:
   – Requires only a one-time addition of random numbers
   – Gives an approximation to the desired result
   – The values of individual records are only partly protected
   – Better efficiency, but privacy depends on the data and the predicate
   – Studied by people from the data mining community

Cryptographic:
   – Complex process
   – Gives the precise output
   – Protects everything as much as possible
   – Better privacy, but efficiency depends on the predicate
   – Small community
                    Outline
• Introduction to the Privacy issue in
  Collaborative Filtering

• Centralized Privacy Preservation

• Distributed Privacy Preservation
  – John Canny (UC Berkeley)


• Discussion, Q&A
   Distributed data mining / secure multi-party
computation: the principle explained by secure sum

– Given values x_1, ..., x_n belonging to n
  entities
– compute $S = \sum_{i=1}^{n} x_i$
– such that each entity knows ONLY its own input and
  the result of the computation (the aggregate sum
  of the data)


 John Canny, Collaborative
 Filtering with Privacy, IEEE
Conf. on Security and Privacy,
    Oakland CA, May 2002
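A single-process Python simulation of one classic way to realize secure sum, the ring protocol with a random mask. This is an illustrative choice, not necessarily the construction used in the cited paper. It assumes non-negative integers whose total is below the modulus and no collusion between parties; `secure_sum` is a hypothetical name.

```python
import secrets


def secure_sum(values: list[int], modulus: int = 2**64) -> tuple[int, list[int]]:
    """Simulate the ring-based secure sum.

    Party 1 masks its value with a random R; each later party receives only a
    running total that is uniformly distributed mod `modulus`, adds its own
    value, and passes the total on. Party 1 finally removes R.
    """
    mask = secrets.randbelow(modulus)
    running = (mask + values[0]) % modulus  # party 1 -> party 2
    seen = [running]                        # message received by each successor
    for x in values[1:]:
        running = (running + x) % modulus   # party j adds x_j and passes it on
        seen.append(running)
    total = (running - mask) % modulus      # back at party 1: remove the mask
    return total, seen


total, seen = secure_sum([3, 5, 7, 11])
print(total)  # 26
```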
 Canny: Collaborative filtering with
              privacy
– Each user starts with their own preference data, and
  knowledge of who their peers are in their community.
– By running the protocol, users exchange various
  encrypted messages.
– At the end of the protocol, every user has an
  unencrypted copy of the linear model of the
  community’s preferences.
– They can then use this to extrapolate their own
  ratings
– At no stage does unencrypted information about a
  user's preferences leave their own machine.
– Users outside the community can request a copy of
  the model from any community member, and derive
  recommendations for themselves
                        SVD
• Singular value decomposition is an extremely
  useful tool for many IR and data mining tasks (CF,
  clustering, …)
• The SVD of a matrix A is a factorization $A = UDV^T$.
• If A is a users × items matrix, then $V^T$ gives the
  best least-squares approximations to the rows of A
  in a user-independent way.
• $A^T A V = V D^2$, so computing the SVD is an eigenproblem for $A^T A$
                                SVD
• The SVD of a matrix A is a factorization $A = XY^T = UDV^T$, with $X = UD^{1/2}$ and $Y = VD^{1/2}$
• Given Y, update X:
$$D^{-1} = \big((D^{1/2}V^T)(VD^{1/2})\big)^{-1} = (Y^TY)^{-1}$$
$$AY = XY^TY = (UD^{1/2})(D^{1/2}V^T)(VD^{1/2}) = UD^{1/2}D = XD$$
$$X = XDD^{-1} = AY(Y^TY)^{-1}$$
• Given X, update Y:
$$D^{-1} = \big((D^{1/2}U^T)(UD^{1/2})\big)^{-1} = (X^TX)^{-1}$$
$$X^TA = X^TXY^T = (D^{1/2}U^T)(UD^{1/2})(D^{1/2}V^T) = DD^{1/2}V^T = DY^T$$
$$Y^T = D^{-1}DY^T = (X^TX)^{-1}X^TA$$
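The two update rules can be alternated directly. The NumPy sketch below starts from a random Y and compares the resulting rank-k residual with the optimum from `np.linalg.svd`. It assumes a synthetic, nearly rank-2 matrix so that convergence is fast; `als_factorize` is a hypothetical name, and `np.linalg.solve` is used instead of explicit inverses for numerical stability.

```python
import numpy as np


def als_factorize(A: np.ndarray, k: int, iters: int = 50, seed: int = 0):
    """Alternate X = A Y (Y^T Y)^-1 and Y^T = (X^T X)^-1 X^T A from a random Y."""
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(A.shape[1], k))
    for _ in range(iters):
        X = np.linalg.solve(Y.T @ Y, (A @ Y).T).T  # X = A Y (Y^T Y)^-1
        Y = np.linalg.solve(X.T @ X, X.T @ A).T    # Y^T = (X^T X)^-1 X^T A
    return X, Y


rng = np.random.default_rng(1)
# Hypothetical users x items matrix: rank 2 plus a little noise.
A = 10 * rng.normal(size=(8, 2)) @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(8, 6))
X, Y = als_factorize(A, k=2)

s = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A - X @ Y.T), np.sqrt(np.sum(s[2:] ** 2)))  # nearly equal
```

Each pass projects onto the current column and row spaces, which amounts to subspace iteration on $A^TA$, so the product $XY^T$ approaches the best rank-k approximation.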
         Distributed SVD
[Figure: the rating matrix split by rows across users: user u1 holds row A1*, u2 holds A2*, …, un holds An*; the column side is labeled v.]
Distributed SVD




             John Canny, Collaborative
             Filtering with Privacy, IEEE
            Conf. on Security and Privacy,
                Oakland CA, May 2002
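A plaintext NumPy sketch of why the Y update can be distributed over users who each hold one row of A. Each user computes its own row of X and its contributions to $X^TX$ and $X^TA$; only the sums of these are needed. This illustrates the decomposition of the update equations only. It is not Canny's protocol, which (per the slides) exchanges only encrypted messages, and the function names and data are hypothetical.

```python
import numpy as np


def user_contribution(a_row: np.ndarray, Y: np.ndarray):
    """Run by one user on its own row A_j*: local row of X and its terms of the two sums."""
    x = np.linalg.solve(Y.T @ Y, Y.T @ a_row)  # x_j = A_j* Y (Y^T Y)^-1, as a k-vector
    return np.outer(x, x), np.outer(x, a_row)  # x_j^T x_j and x_j^T A_j*


def aggregate_and_update(contributions):
    """Y^T = (X^T X)^-1 X^T A, where both matrices are sums of per-user terms."""
    XtX = sum(c[0] for c in contributions)  # in a private protocol, summed securely
    XtA = sum(c[1] for c in contributions)
    return np.linalg.solve(XtX, XtA).T


rng = np.random.default_rng(0)
A = rng.normal(size=(10, 5))  # 10 users x 5 items (hypothetical data)
Y = rng.normal(size=(5, 2))   # current shared item factor, k = 2

Y_distributed = aggregate_and_update([user_contribution(row, Y) for row in A])

# Centralized reference using the update equations from the SVD slide.
X = A @ Y @ np.linalg.inv(Y.T @ Y)
Y_central = (np.linalg.inv(X.T @ X) @ X.T @ A).T
print(np.allclose(Y_distributed, Y_central))  # True
```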
                    Outline
• Introduction to the Privacy issue in
  Collaborative Filtering

• Centralized Privacy Preservation

• Distributed Privacy Preservation

• Discussion, Q&A
        Other Issues in Privacy Preservation
Where do we need privacy preservation?
• E-commerce
• Mobile Search Engines
• Social Network Service
• Location based Service
• …

				