									On method-specific record linkage for risk
             assessment




                     Jordi Nin
                  Javier Herranz
                   Vicenç Torra
                                                                 Contents


 Disclosure Risk Scenario:
   How an intruder re-identifies an individual
 Preliminaries:
   Protection methods and Record Linkage
 Location record linkage:
   A new way to compute the disclosure risk
 Conclusions and future work:




                                                        Disclosure Risk Scenario

                           Attribute classification


Data set X (n records, a attributes):

    id    Sex     Marital status    Income
    1     Male    Single            13.500
    2     Male    Single            11.000
    ...   ...     ...               ...

Identifiers: Passport number
Quasi-identifiers: Age, postal code
Confidential: Income





                 Re-identification scenario

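Original data set:  X = id || Xnc || Xc
Protected data set: X’ = X’nc || Xc

Identifiers (id) are removed, the non-confidential quasi-identifiers (Xnc) are replaced by their protected version (X’nc), and the confidential attributes (Xc) are released unchanged.

Privacy is ensured because the quasi-identifiers are anonymized.

Data quality is preserved because the confidential attributes are left unmodified.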





                    Record Linkage
Data set 1: records (X1, X2, X3, X4)
Data set 2: records (X’1, X’2, X’3, X’4)

Problem: find the correct mapping between the records of data file 1 and those of data file 2.



Distance-based record linkage (DBRL)
• The nearest pairs of records are considered linked pairs
• It is very easy to tune
• Results depend strongly on the parameters
• Moderate time cost

Probabilistic record linkage (PRL)
• Linked pairs are computed using conditional probabilities
• Tuning is difficult
• Few parameters
• High time cost
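
The nearest-pair idea behind distance-based record linkage can be illustrated with a minimal sketch. It assumes Euclidean distance on numeric attributes and that record i of the protected file was derived from record i of the original file; the function names and the toy data are hypothetical, and attribute standardization, which a real evaluation would normally apply first, is omitted.

```python
import math


def distance_based_linkage(original, protected):
    """For each protected record, return the index of the nearest original record."""
    links = []
    for prot in protected:
        # Nearest original record under Euclidean distance (an assumption).
        best_index = min(
            range(len(original)),
            key=lambda j: math.dist(prot, original[j]),
        )
        links.append(best_index)
    return links


def count_correct_links(links):
    """Count re-identifications, assuming protected record i comes from original record i."""
    return sum(1 for i, j in enumerate(links) if i == j)


# Hypothetical toy data: three records with two numeric attributes each.
original = [(1.0, 10.0), (5.0, 20.0), (9.0, 30.0)]
protected = [(1.5, 11.0), (8.0, 29.0), (5.5, 21.0)]
links = distance_based_linkage(original, protected)
```

Every protected record is compared with every original record, which is the exhaustive comparison that standard record linkage performs.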




                                                               Preliminaries

                            Rank swapping - p

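Algorithm
 For each attribute attrj, where 1 ≤ j ≤ n
   Sort the values of attrj in increasing order
   Swap each value xij with a value xlj chosen at random, where i < l ≤ i+p
   Undo the sorting of attrj
 End for
End algorithm

Properties:
 • Simple
 • Preserves µ and σ of each attribute
 • All original value combinations disappear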
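
A minimal sketch of rank swapping for a single attribute follows. It assumes that p is given as a percentage of the number of records and that each value is swapped at most once; the function name `rank_swap`, the `rng` parameter and the sample values are illustrative choices, not taken from the slides.

```python
import random


def rank_swap(values, p, rng=None):
    """Rank swapping of one attribute; p is a percentage of the number of records."""
    rng = rng or random.Random()
    n = len(values)
    window = max(1, int(p * n / 100))  # maximum rank distance between swapped values

    # order[r] is the position in `values` of the record holding rank r.
    order = sorted(range(n), key=lambda i: values[i])
    ranked = [values[i] for i in order]

    swapped = [False] * n
    for i in range(n):
        if swapped[i]:
            continue
        # Unswapped partners l with i < l <= i + window.
        candidates = [l for l in range(i + 1, min(n, i + window + 1)) if not swapped[l]]
        if candidates:
            l = rng.choice(candidates)
            ranked[i], ranked[l] = ranked[l], ranked[i]
            swapped[i] = swapped[l] = True

    # Undo the sorting so each record gets its swapped value back in place.
    protected = [None] * n
    for rank, position in enumerate(order):
        protected[position] = ranked[rank]
    return protected


# Hypothetical attribute with 10 records; p = 20% gives a window of 2 ranks.
values = [8, 6, 10, 7, 9, 2, 1, 4, 5, 3]
protected = rank_swap(values, p=20, rng=random.Random(0))
```

Because values are only permuted within the attribute, its mean and variance are unchanged, and no value moves by more than the window in rank.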

          Rank swapping - p example

                      p = 20%

8    1
6    2
10   3
7    4
9    5
2    6
1    7
4    8
5    9
3    10

[Figure: rank swapping example; image not available]



                                             Microaggregation - k
[Figure: microaggregation example with k = 3; image not available]

a = 1 (one attribute): an optimal solution can be computed
a > 1 (several attributes): the problem is NP-hard, so heuristics are used



            Optimal univariate Microaggregation

Result 1. When the elements are sorted according to an attribute,
for any optimal partition, the elements in each cluster are
contiguous (non-overlapping clusters exist).

Result 2. All clusters of any optimal partition have between k and
2k-1 elements.

Clusters are built from the nodes on the shortest path of a graph
defined over the sorted elements.

[Figure: example with k = 2 and elements x1, x2, x3, x4; image not available]
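
Results 1 and 2 reduce the search to contiguous clusters of k to 2k-1 sorted elements, which can be sketched as a dynamic program. This is equivalent to a shortest-path computation on a graph whose arcs are the admissible clusters. The sketch assumes the within-cluster sum of squared errors as the cost; the function name and the sample values are hypothetical.

```python
def optimal_univariate_microaggregation(values, k):
    """Return the optimal clusters (lists of sorted values) with sizes in [k, 2k-1]."""
    xs = sorted(values)
    n = len(xs)
    if n < k:
        raise ValueError("need at least k values")

    # Prefix sums give the squared error of any contiguous cluster in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        """Sum of squared errors of the cluster xs[i:j] around its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: minimum cost of partitioning xs[:j]
    prev = [None] * (n + 1)  # prev[j]: start of the last cluster in that partition
    best[0] = 0.0
    for j in range(k, n + 1):
        for size in range(k, min(2 * k - 1, j) + 1):
            i = j - size
            if best[i] + sse(i, j) < best[j]:
                best[j] = best[i] + sse(i, j)
                prev[j] = i

    # Walk the shortest path backwards to recover the clusters.
    clusters, j = [], n
    while j > 0:
        clusters.append(xs[prev[j]:j])
        j = prev[j]
    clusters.reverse()
    return clusters


# Hypothetical values; with k = 2 the three natural groups are recovered.
clusters = optimal_univariate_microaggregation([30, 1, 11, 2, 12, 31, 10], k=2)
```

The protected attribute would then be obtained by replacing every value with the mean of its cluster.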



        MDAV Microaggregation

MDAV is a multivariate heuristic microaggregation method.

[Figure: MDAV with k = 2 applied to X to obtain X’; image not available]
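
The slide does not spell out the MDAV steps. The sketch below follows the commonly described version of the heuristic: repeatedly take the record farthest from the centroid, then the record farthest from that one, and build a cluster of k records around each. Euclidean distance, the function names and the toy records are assumptions made for illustration.

```python
import math


def centroid(records):
    """Component-wise mean of a list of numeric tuples."""
    return tuple(sum(r[d] for r in records) / len(records) for d in range(len(records[0])))


def mdav(records, k):
    """Partition record indices into clusters of at least k records (assumes len(records) >= k)."""
    remaining = list(range(len(records)))
    clusters = []

    def farthest_from(point):
        return max(remaining, key=lambda i: math.dist(records[i], point))

    def take_cluster_around(index):
        # The k remaining records closest to records[index] form a new cluster.
        nearest = sorted(remaining, key=lambda i: math.dist(records[i], records[index]))[:k]
        for i in nearest:
            remaining.remove(i)
        clusters.append(nearest)

    while len(remaining) >= 3 * k:
        r = farthest_from(centroid([records[i] for i in remaining]))
        s = farthest_from(records[r])
        take_cluster_around(r)
        take_cluster_around(s)
    if len(remaining) >= 2 * k:
        r = farthest_from(centroid([records[i] for i in remaining]))
        take_cluster_around(r)
    if remaining:
        clusters.append(list(remaining))  # leftover records form the last cluster
    return clusters


def microaggregate(records, k):
    """Replace every record by the centroid of its MDAV cluster."""
    protected = [None] * len(records)
    for cluster in mdav(records, k):
        c = centroid([records[i] for i in cluster])
        for i in cluster:
            protected[i] = c
    return protected


# Hypothetical data set X with two attributes and k = 2.
X = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5), (5, 6)]
clusters = mdav(X, k=2)
X_protected = microaggregate(X, k=2)
```

Each protected record is a centroid computed from all attributes at once, which is what distinguishes this multivariate method from the univariate case.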




                   Score: Protection method evaluation

Score = 0.5 IL + 0.5 DR

Information loss:
IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)
  IL1 = mean of absolute error
  IL2 = mean variation of average
  IL3 = mean variation of variance
  IL4 = mean variation of covariance
  IL5 = mean variation of correlation

Disclosure risk:
DR = 0.25 DLD + 0.25 PLD + 0.5 ID
  DLD = number of links using DBRL
  PLD = number of links using PRL
  ID = protected values near original
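
The score is a plain weighted average, which a few lines of arithmetic make concrete. The sketch assumes that DLD, PLD and ID are expressed as percentages and that IL1 to IL5 are fractions; the function names and the input values are invented for illustration and are not results from the presentation.

```python
def information_loss(il1, il2, il3, il4, il5):
    """IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)."""
    return 100 * (0.2 * il1 + 0.2 * il2 + 0.2 * il3 + 0.2 * il4 + 0.2 * il5)


def disclosure_risk(dld, pld, interval_disclosure):
    """DR = 0.25 DLD + 0.25 PLD + 0.5 ID."""
    return 0.25 * dld + 0.25 * pld + 0.5 * interval_disclosure


def score(il, dr):
    """Score = 0.5 IL + 0.5 DR; lower values mean a better trade-off."""
    return 0.5 * il + 0.5 * dr


# Hypothetical measurements for one protected file.
il = information_loss(0.1, 0.1, 0.1, 0.1, 0.1)
dr = disclosure_risk(dld=20.0, pld=10.0, interval_disclosure=30.0)
total = score(il, dr)
```

With these made-up inputs, IL is 10, DR is 22.5 and the score is 16.25.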



                                Location Problem Description

      L-RL: Location Record Linkage

Standard record linkage compares every protected record with every original record.

Rank swapping, univariate microaggregation and other methods only use some of the original records to create each protected value.

It is therefore unnecessary to compare all the records.
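
For rank swapping, the restriction can be sketched as follows. The slides do not give the procedure, so this is one plausible reading: a protected value can only come from an original record whose rank in that attribute lies within p of the value's rank, and a record is linked when the intersection of these candidate sets over all attributes contains a single record. The function names and the toy data are hypothetical.

```python
import bisect


def location_candidates(original, protected, p):
    """For each protected record, the set of original records that could have produced it."""
    n = len(original)
    window = max(1, int(p * n / 100))
    candidates = [set(range(n)) for _ in protected]
    for j in range(len(original[0])):
        # Original records sorted by attribute j, and their sorted values.
        order = sorted(range(n), key=lambda i: original[i][j])
        ranked = [original[i][j] for i in order]
        for r, prot in enumerate(protected):
            # Ranks at which the protected value occurs among the original values.
            lo = bisect.bisect_left(ranked, prot[j])
            hi = bisect.bisect_right(ranked, prot[j])
            first, last = max(0, lo - window), min(n, hi + window)
            candidates[r] &= {order[t] for t in range(first, last)}
    return candidates


def link_unique(candidates):
    """Link a protected record only when exactly one candidate remains."""
    return [next(iter(c)) if len(c) == 1 else None for c in candidates]


# Hypothetical files: 5 records, 2 attributes, swaps at rank distance 1 (p = 20%).
original = [(1, 50), (2, 40), (3, 30), (4, 20), (5, 10)]
protected = [(2, 40), (1, 50), (4, 30), (3, 10), (5, 20)]
candidates = location_candidates(original, protected, p=20)
links = link_unique(candidates)
```

In this toy case the true original record always stays in the candidate set, and record 3 is re-identified without comparing it against the whole file.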


                                     Location record linkage

           Method Description




[Figure: linkage between Xext and X’; image not available]




          Example: Rank swapping

p = 20%

Distance: 17, 6, 13, 14, 16, 19, 12, 5, 16

[Figure: rank swapping example; image not available]


                Rank Swapping Experiments

Data sets:
      Census (1080 records & 13 attributes)
      EIA (4092 records & 10 attributes)


Rank swapping configurations:
      p = 2 … 20


Score modifications:
      DR = 0.166 LLD+ 0.166 DLD+ 0.166 PLD+ 0.5 ID
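
The modified weights can be read as follows, assuming (the slide does not say so) that 0.166 stands for 1/6 truncated to three decimals and that LLD is the number of links obtained with L-RL. Under that reading the three record linkage measures share the weight of 0.5 that DLD and PLD carried in the original DR, and the weight of ID is unchanged:

```latex
\[
DR = \tfrac{1}{6}\,LLD + \tfrac{1}{6}\,DLD + \tfrac{1}{6}\,PLD + \tfrac{1}{2}\,ID,
\qquad
3 \times 0.166 + 0.5 = 0.998 \approx 1 .
\]
```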

L-RL: Rank Swapping Linkage Results




[Figure: plot not available]





L-RL: Rank Swapping Score Results




[Figure: plot not available]





             Univariate Microaggregation Experiments

Data sets:
      Census (1080 records & 13 attributes)
      EIA (4092 records & 10 attributes)


Univariate microaggregation configurations:
      k = 10 … 50


Score modifications:
      DR = 0.166 LLD+ 0.166 DLD+ 0.166 PLD+ 0.5 ID

L-RL: Univariate Microaggregation Linkage Results




[Figure: plot not available]





L-RL: Univariate Microaggregation Score Results




[Figure: plot not available]





                       MDAV Experiments

Data sets:
      Census (1080 records & 13 attributes)
      EIA (4092 records & 10 attributes)


MDAV configurations:
      k = 10 … 50


Score modifications:
      DR = 0.166 LLD+ 0.166 DLD+ 0.166 PLD+ 0.5 ID

L-RL: MDAV Linkage Results




[Figure: plot not available]





L-RL: MDAV Score Results




[Figure: plot not available]




                                        Conclusions and future work

                         Conclusions
• We have presented a new type of record linkage designed
to exploit the limitations of some protection methods

• The L-RL method gives a more accurate DR evaluation for
rank swapping and univariate microaggregation

• MDAV is immune to the location problem

                         Future work
• We plan to study the DR of MDAV and other protection
methods using other ad-hoc methods

