					Privacy Preservation for Data Streams
         Feifei Li, Boston University

         Joint work with:
         Jimeng Sun (CMU), Spiros Papadimitriou, George A. Mihaila and
          Ioana Stanoi (IBM T.J. Watson Research Center)

Application (1)

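   [Diagram: Corp. A, Corp. B and Corp. C each hold sensitive data and pass
   it through a step labeled P before it reaches the analytical services.]

   Analytical services: finding trends, clusters, patterns, aggregations.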

Application (2)

   [Diagram: Corp. A publishes data as a service to an information hub;
   the data pass through a step labeled P on the way to Client A and
   Client B.]

   Clients subscribe to the data to identify trends, patterns, classes.


Target Application

   [Figure: four streams (stream 1 to stream 4), each plotted as value over
   time.]

   Goals: identify trends; cluster/classify the streams.

Problem Formulation
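   – N streams $A^1, A^2, \ldots, A^N$ evolve over time; at time t, stream i
     contributes the value $A^i_t$, giving the vector $A_t \in \mathbb{R}^N$
   – After T time steps the data form $A_{T \times N}$, $T \in [1, \infty)$
   – Perturbation: $A_{T \times N} + E_{T \times N} = A^*_{T \times N}$
   – $E$ is online generated noise, one vector at a time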

Problem Formulation (continued)
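   – Given $\sigma^2$, obtain $A^*$ online such that $D(A, A^*) = \sigma^2$ and,
     for a given R, $D(A, \tilde{A})$ is close to $\sigma^2$
   – Reconstruction: R is applied to $A^*_{T \times N}$ to obtain $\tilde{A}_{T \times N}$
   – R is chosen as $\min_R D(A_{T \times N}, \tilde{A}_{T \times N})$
   – Offline and online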

Data Perturbation
   [Figure: each original stream (value over time) plus random i.i.d. noise
   gives the perturbed stream.]

   i.i.d.: independent and identically distributed
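A minimal Python sketch of this baseline, assuming zero-mean Gaussian noise with standard deviation `sigma` (the slide only says the noise is i.i.d.). The name `perturb_stream` and its parameters are illustrative and do not come from the talk. It emits one perturbed vector per incoming vector, which matches the one-vector-at-a-time setting of the problem formulation.

```python
import random


def perturb_stream(stream, sigma, seed=None):
    """Yield A*_t = A_t + E_t for each incoming vector A_t.

    Assumption: every entry of E_t is an independent draw from N(0, sigma^2).
    """
    rng = random.Random(seed)
    for a_t in stream:
        # One fresh noise vector per time step; nothing from the past is reused.
        e_t = [rng.gauss(0.0, sigma) for _ in a_t]
        yield [a + e for a, e in zip(a_t, e_t)]


if __name__ == "__main__":
    original = [[20.0, 21.0, 19.5], [20.1, 21.2, 19.4]]
    for perturbed in perturb_stream(original, sigma=0.5, seed=42):
        print(perturbed)
```

Because the noise ignores how the streams relate to each other, it spreads equally in all directions, which is what the following slides exploit.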

Principal Component Analysis: PCA




   [Figure: principal component analysis with i.i.d. noise]

Principal Component Analysis: PCA




   [Figure: principal component analysis with correlated noise]

 PCA Based Data Reconstruction
   A: original data; A*: perturbed data; A~: reconstructed data

   [Figure: A* is projected onto the principal direction to obtain A~.
   Labels: added noise (σ², utility), removed noise, projection error,
   remaining noise (privacy).]

 PCA Based Data Reconstruction
   A: original data; A*: perturbed data; A~: reconstructed data

   [Figure: the same construction with correlated noise. Labels: added
   noise (σ², utility), projection error, remaining noise (privacy).]
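The contrast between the two reconstruction slides can be checked with a small Python sketch. It rests on several assumptions that are not in the slides: only two streams (so the principal direction has a closed form), a principal direction estimated from uncentered second moments, Gaussian noise, and D taken as the mean squared distance per time step. All function names are illustrative.

```python
import math
import random


def principal_direction(rows):
    """Top principal direction of two-column data (uncentered second moments)."""
    sxx = sum(x * x for x, _ in rows)
    syy = sum(y * y for _, y in rows)
    sxy = sum(x * y for x, y in rows)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # major axis of a 2x2 matrix
    return math.cos(theta), math.sin(theta)


def reconstruct(rows):
    """Project each row onto the principal direction estimated from the rows."""
    ux, uy = principal_direction(rows)
    return [((x * ux + y * uy) * ux, (x * ux + y * uy) * uy) for x, y in rows]


def discrepancy(a, b):
    """Assumed D: mean squared Euclidean distance between matching rows."""
    return sum((p - r) ** 2 + (q - s) ** 2
               for (p, q), (r, s) in zip(a, b)) / len(a)


def demo(T=2000, sigma=1.0, seed=0):
    rng = random.Random(seed)
    d = (1.0 / math.sqrt(5.0), 2.0 / math.sqrt(5.0))  # true principal direction
    A = [((15 + 10 * math.sin(0.01 * t)) * d[0],
          (15 + 10 * math.sin(0.01 * t)) * d[1]) for t in range(T)]
    # i.i.d. noise: independent draws on each stream.
    A_iid = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in A]
    # Correlated noise: same total variance, but aligned with the data.
    A_cor = []
    for x, y in A:
        z = rng.gauss(0, sigma * math.sqrt(2.0))
        A_cor.append((x + z * d[0], y + z * d[1]))
    return {
        "iid_added": discrepancy(A, A_iid),
        "iid_remaining": discrepancy(A, reconstruct(A_iid)),
        "corr_added": discrepancy(A, A_cor),
        "corr_remaining": discrepancy(A, reconstruct(A_cor)),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

In this setup the projection should strip roughly half of the i.i.d. noise, while the noise aligned with the principal direction passes through essentially unchanged.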

Data Perturbation: main idea

   Observations
    – The amount of random noise controls the
      privacy/utility tradeoff
    – i.i.d. (independent and identically distributed) noise
      does not preserve privacy well enough

    Lesson learned
    – Noise should be correlated with the original data
      • Z. Huang et al., SIGMOD 2005

Challenge 1: Dynamic Correlation




Challenge 2: Dynamic Autocorrelation




Online Random Noise for Autocorrelation: Stock




State of the Art

    Privacy preservation
    – Given a utility requirement, maximize the privacy
    Existing work (Z. Huang et al., SIGMOD 2005)
    – Batch mode, static data
    – And many other works (see our paper for a
      detailed literature review)

 Adding Dynamic Correlated Noise
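   [Diagram: streams A1, A2, A3 feed a three-step pipeline at each time t.]

   – Online estimation of the principal components $U_{3 \times 3}$
     (S. Papadimitriou et al., VLDB 2005): update U with the new vector $A_t$
   – Generate noise $E_t$ distributed along U
   – Publish $A^*_t = A_t + E_t$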
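The loop on this slide can be sketched in Python under strong simplifications. This is not the tracking algorithm of the cited VLDB paper: it substitutes an exponentially weighted second-moment matrix, handles only two streams, and places all noise on the single top direction. `OnlineDirection`, `publish`, `decay` and `sigma` are hypothetical names and parameters.

```python
import math
import random


class OnlineDirection:
    """Stand-in tracker: exponentially weighted 2x2 second-moment matrix."""

    def __init__(self, decay=0.98):
        self.decay = decay
        self.sxx = self.sxy = self.syy = 0.0

    def update(self, x, y):
        """Fold in the new vector and return the current top direction."""
        d = self.decay
        self.sxx = d * self.sxx + x * x
        self.sxy = d * self.sxy + x * y
        self.syy = d * self.syy + y * y
        theta = 0.5 * math.atan2(2.0 * self.sxy, self.sxx - self.syy)
        return math.cos(theta), math.sin(theta)


def publish(stream, sigma, seed=0):
    """Yield A_t + E_t with E_t placed along the tracked direction."""
    rng = random.Random(seed)
    tracker = OnlineDirection()
    for x, y in stream:
        ux, uy = tracker.update(x, y)   # step 1: update U
        z = rng.gauss(0.0, sigma)       # step 2: noise coordinate along U
        yield (x + z * ux, y + z * uy)  # step 3: publish the perturbed vector


if __name__ == "__main__":
    # Two perfectly correlated streams: the second is twice the first.
    stream = [(float(t), 2.0 * t) for t in range(1, 6)]
    for point in publish(stream, sigma=0.5):
        print(point)
```

With perfectly correlated input, every published point stays on the same line as the data, so projecting onto that line cannot separate the noise from the signal.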

 Put it into Algorithm: Distribute Noise

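            k=3, U: eigenvectors, V: eigenvalues

   – Split the noise budget $\sigma^2$ over the principal components in
     proportion to their eigenvalues:
     $\sigma_1^2 = \sigma^2 \, V(1) / \lVert V \rVert$, $\sigma_2^2 = \sigma^2 \, V(2) / \lVert V \rVert$, ...
   – The noise is distributed in the principal components' subspace
   – Rotate back to data space with $U^T$
   – The result is added to $A_t$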
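A hedged Python sketch of these steps follows. Two things are assumptions rather than facts from the slide: the eigenvalues are normalized by their sum (the slide's exact normalization did not survive extraction), and the noise coordinates are Gaussian. `U` is passed as a list of k orthonormal eigenvectors, and all names are illustrative.

```python
import math
import random


def noise_variances(sigma2, eigenvalues):
    """Split the budget sigma2 in proportion to the eigenvalues.

    Assumption: normalize by the sum, so the shares add up to sigma2.
    """
    total = sum(eigenvalues)
    return [sigma2 * v / total for v in eigenvalues]


def correlated_noise(sigma2, U, V, seed=0):
    """Draw one noise vector in the subspace spanned by the rows of U.

    U: list of k orthonormal eigenvectors, each of length N.
    V: the k matching eigenvalues.
    """
    rng = random.Random(seed)
    variances = noise_variances(sigma2, V)
    # Noise coordinates in the principal components' subspace.
    coords = [rng.gauss(0.0, math.sqrt(s2)) for s2 in variances]
    # Rotate back to data space: a weighted sum of the eigenvectors.
    n = len(U[0])
    return [sum(c * u[j] for c, u in zip(coords, U)) for j in range(n)]


if __name__ == "__main__":
    U = [[0.6, 0.8], [-0.8, 0.6]]
    V = [3.0, 1.0]
    print(noise_variances(4.0, V))
    print(correlated_noise(4.0, U, V, seed=1))
```

The returned vector is what would be added to the current `A_t`; directions with larger eigenvalues receive a larger share of the noise.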

Why is our algorithm better than the state of the art?

   [Figure: two panels, each showing a global principal component and a
   local principal component. Labels: noise added along the global PC
   (offline); noise removed by online reconstruction.]

Online Reconstruction vs. Offline Reconstruction

    Choice of adversary:
    – Offline reconstruction based on the global principal
      components
    – Online tracking of the principal components, followed by
      local reconstruction
    – Please see the details in the paper

Tracking Autocorrelation
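   – A single stream, e.g. $a = [1\ 2\ 3\ 4\ 5\ 6]^T$, is cut into overlapping
     windows $w_1, w_2, w_3, w_4$
   – The windows are stacked as the rows of W; its columns are treated as
     h streams and its rows advance in time:

     $W = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \\ 4 & 5 & 6 \end{bmatrix}$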
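The window construction can be written in a few lines of Python. The function name `window_matrix` is illustrative, and the window length h = 3 is read off the slide's example.

```python
def window_matrix(a, h):
    """Stack every length-h sliding window of the sequence a as one row."""
    return [list(a[i:i + h]) for i in range(len(a) - h + 1)]


if __name__ == "__main__":
    for row in window_matrix([1, 2, 3, 4, 5, 6], h=3):
        print(row)
```

Each column of the result is the original stream shifted by one step, so correlation between columns of W corresponds to autocorrelation in the stream.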

Distribute Noise
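   – Avoid adding more noise than the allowed threshold
   – The noise must still be autocorrelated with the stream
   – Window matrix:
     $W = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \\ 4 & 5 & 6 \end{bmatrix}$
   – Idea: constrain the next k noise values using the previous h-k noise
     values and the current estimate of U; this becomes a linear system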

Experiments

   Three real data streams
   – Sensor streams, Lab: Light, Humidity,
     Voltage, Temperature. 7712 × 198
   – Chlorine environmental streams:
     4310 × 166
   – Stock streams: 8000 × 2

Perturbation vs. Reconstruction

   Perturbation methods
   – i.i.d-N: i.i.d. noise
   – Offline-N: noise correlated with the global principal components
   – Online-N: SCAN (streaming correlated additive noise) and
     SACAN (streaming auto-correlated additive noise)

   Reconstruction methods
   – Baseline: take the perturbed data as the reconstruction
   – Offline-R: offline reconstruction based on the global principal
     components
   – Online-R: SCOR (streaming correlated online reconstruction) and
     SACOR (streaming auto-correlated online reconstruction)

   Noise (discrepancy) is reported as relative energy, as a percentage of
   the original data streams, i.e., D(A, A*)/||A||
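A small Python sketch of this measure. The slides do not define D or ||A||, so the sketch assumes both are squared Frobenius energies (sums of squared entries), which makes the ratio an energy fraction. `relative_discrepancy` is an illustrative name.

```python
def energy(rows):
    """Sum of squared entries (assumed meaning of 'energy')."""
    return sum(x * x for row in rows for x in row)


def relative_discrepancy(A, A_star):
    """Noise energy as a percentage of the original data's energy."""
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, A_star)]
    return 100.0 * energy(diff) / energy(A)


if __name__ == "__main__":
    A = [[3.0, 4.0]]
    A_star = [[3.0, 4.5]]
    print(relative_discrepancy(A, A_star))  # noise energy 0.25 over 25.0
```

Under this reading, a setting such as "10% noise" would mean that the added noise carries one tenth of the energy of the original streams.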

Reconstruction Error: Online-R vs. Offline-R

   [Figure: reconstruction error at 10% noise, k=10]

Online reconstruction achieves better accuracy because
it minimizes the projection error.

Reconstruction Error: vary k




1. Online reconstruction achieves better accuracy
2. A larger k reduces the projection error

Privacy vs. Discrepancy, online-R: Lab data




Privacy vs. Discrepancy, online-R: Chlorine




Online Random Noise for Autocorrelation: Chlorine




Online Random Noise for Autocorrelation: Stock




Privacy vs. Discrepancy: Online-R (Chlorine)




Privacy vs. Discrepancy: Online-R (Stock)




Running Time Analysis




Future Work

   Combining correlation and autocorrelation
   Other types of data streams beyond
    numeric data, such as categorical data

Questions

   Thank you!



