									Distributing Data for Secure Data Services


     Vignesh Ganapathy, Dilys Thomas, Tomas Feder,
          Hector Garcia Molina, Rajeev Motwani
                      April 8th, 2011

                 Stanford, TRDDC, TRUST
RoadMap

Motivation for Secure Databases
Column level distribution
     Encryption, Distribution
     Privacy constraints
     Set cover initialization
Query Mediation
     Cost estimation
     Where and Select clause processing
     Query decomposition
Experiments
Related Work
Motivation 1: Data Privacy in Enterprises

Health: Personal medical details, Disease history, Clinical research data
Govt. Agencies: Census records, Economic surveys, Hospital Records
Banking: Bank statement, Loan Details, Transaction history
Finance: Portfolio information, Credit history, Transaction records, Investment details
Manufacturing: Process details, Blueprints, Production data
Outsourcing: Customer data for testing, Remote DB Administration, BPO & KPO
Insurance: Claims records, Accident history, Policy details
Retail Business: Inventory records, Individual credit card details, Audits
Motivation 2: Government Regulations


   Country           Privacy Legislation

   Australia         Privacy Amendment Act of 2000
   European Union    Personal Data Protection Directive 1998
   Hong Kong         Personal Data (Privacy) Ordinance of 1995
   United Kingdom    Data Protection Act of 1998
   United States     Security Breach Information Act (S.B. 1386) of 2002
                     Gramm-Leach-Bliley Act of 1999
                     Health Insurance Portability and Accountability Act of 1996
Motivation 3: Personal Information


Emails
Searches on Google/Yahoo
Profiles on Social Networking sites
Passwords / Credit Card / Personal information at multiple E-commerce sites / Organizations
Documents on the Computer / Network
Losses due to Lack of Privacy: ID-Theft




   • 3% of households in the US affected by ID-Theft

   • US $5-50B losses/year

   • UK £1.7B losses/year

   • AUS $1-4B losses/year
Data Privacy


Value disclosure: What is the value of attribute salary of person X
    Perturbation
         Privacy Preserving OLAP
Identity disclosure: Whether an individual is present in the database table
    Randomization, K-Anonymity etc.
         Data for Outsourcing / Research
Linkage disclosure: Linking columns from multiple sites
Masketeer: A tool for data privacy




  Lodha, Patwardhan, Roy, Sundaram et al.
Two Can Keep a Secret:
A Distributed Architecture for Secure Database Services



  How to distribute data across multiple sites for
  (1) redundancy and
  (2) privacy, so that a single site being compromised
  does not lead to data loss


              Aggarwal, Bawa, Ganesan, Garcia-Molina, Kenthapadi,
                        Motwani, Srivastava, Thomas, Xu
                                  CIDR 2005
Motivation

 • Data outsourcing growing in popularity
   – Cheap, reliable data storage and management
       • 1TB $399 → < $0.5 per GB
       • $5000 – Oracle 10g / SQL Server
       • $68k/year DBAdmin
 • Privacy concerns looming ever larger
    – High-profile thefts (often insiders)
       •   UCLA lost 900k records
       •   Berkeley lost laptop with sensitive information
       •   Acxiom, JP Morgan, Choicepoint
       •   www.privacyrights.org
Present solutions

Application level: Salesforce.com
     On-Demand Customer Relationship Management
     $65/User/Month ---- $995 / 5 Users / 1 Year

Amazon Elastic Compute Cloud
    1 instance = 1.7Ghz x86 processor, 1.75GB RAM, 160GB local disk, 250 Mb/s
         network bandwidth
     Elastic, Completely controlled, Reliable, Secure
    $0.10 per instance hour
    $0.20 per GB of data in/out of Amazon
    $0.15 per GB-Month of Amazon S3 storage used
Google Apps for your domain
     Small businesses, Enterprise, School, Family or Group
Encryption Based Solution


     Client encrypts the data and stores it at the DSP.
     Query Q → Client-side Processor → Q’ → DSP
     DSP → “Relevant Data” → Client-side Processor → Answer

     Problem: Q’ ≈ “SELECT *”
The Power of Two




      Client       DSP1




                   DSP2
The Power of Two



     Query Q → Client-side Processor → Q1 → DSP1
                                     → Q2 → DSP2

     Key: Ensure Cost(Q1) + Cost(Q2) ≈ Cost(Q)
SB1386 Privacy

{ Name, SSN},
  { Name, LicenceNo}
  { Name, CaliforniaID}
  { Name, AccountNumber}
  { Name, CreditCardNo, SecurityCode}
    are all to be kept private.
A set is private if at least one of its elements is “hidden”.
    Element in encrypted form ok
Techniques

Vertical Fragmentation
     Partition attributes across R1 and R2
     E.g., to obey constraint {Name, SSN},
         R1 ← Name, R2 ← SSN
     Use tuple IDs for reassembly. R = R1 JOIN R2
Encoding
   One-time Pad
     For each value v, construct random bit seq. r
     R1 ← v XOR r, R2 ← r
   Deterministic Encryption
     R1 ← E_K(v), R2 ← K
     Can detect equality and push selections with equality predicate
   Random addition
     R1 ← v + r, R2 ← r
     Can push aggregate SUM
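
The one-time pad and random addition encodings above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the 32-bit width (`BITS`), the helper names, and the sample salaries are all assumptions made for the example.

```python
import secrets

BITS = 32  # assumed fixed width for every stored value


def otp_split(v: int) -> tuple[int, int]:
    """One-time pad: R1 stores v XOR r, R2 stores the random pad r."""
    r = secrets.randbits(BITS)
    return v ^ r, r


def otp_join(share1: int, share2: int) -> int:
    """The client recovers v by XOR-ing the two shares."""
    return share1 ^ share2


def add_split(v: int) -> tuple[int, int]:
    """Random addition: R1 stores v + r, R2 stores r."""
    r = secrets.randbits(BITS)
    return v + r, r


salaries = [52000, 61000, 75000]  # hypothetical sample values

# One-time pad round trip.
otp_shares = [otp_split(v) for v in salaries]
recovered = [otp_join(a, b) for a, b in otp_shares]

# Random addition lets each server compute a partial SUM on its own column.
add_shares = [add_split(v) for v in salaries]
sum_r1 = sum(a for a, _ in add_shares)  # computed at server 1
sum_r2 = sum(b for _, b in add_shares)  # computed at server 2
total = sum_r1 - sum_r2                 # client combines the two partial sums
```

Neither share alone reveals a value, yet the client only needs two numbers back from the servers to answer SUM, which is why the aggregate can be pushed down.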
Example

An Employee relation: {Name, DoB, Position, Salary, Gender, Email, Telephone,
    ZipCode}
Privacy Constraints
     {Telephone}, {Email}
     {Name, Salary}, {Name, Position}, {Name, DoB}
     {DoB, Gender, ZipCode}
     {Position, Salary}, {Salary, DoB}
Will use just Vertical Fragmentation and Encoding.
Decomposed Schema
R1:{TID, Name, Email, Telephone, Gender, Salary}
R2:{TID, Name, Email, Telephone, DoB, Position,ZipCode}
Encrypted Attributes E: {Telephone, Email, Name}
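
As a sanity check, the decomposition above can be tested against the privacy constraints of the example. The sketch below assumes that an encrypted attribute counts as hidden at both sites and that a constraint is violated only when one site holds all of its attributes in the clear; the function names are hypothetical.

```python
R1 = {"Name", "Email", "Telephone", "Gender", "Salary"}
R2 = {"Name", "Email", "Telephone", "DoB", "Position", "ZipCode"}
ENCRYPTED = {"Telephone", "Email", "Name"}

CONSTRAINTS = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]


def visible_at(fragment, encrypted):
    """Attributes a single site can read in the clear."""
    return fragment - encrypted


def is_private(constraint, r1, r2, encrypted):
    """True if no single site sees every attribute of the constraint."""
    return not (constraint <= visible_at(r1, encrypted)
                or constraint <= visible_at(r2, encrypted))


violations = [c for c in CONSTRAINTS if not is_private(c, R1, R2, ENCRYPTED)]
print(violations)  # an empty list means every constraint holds
```

Moving Salary from R1 to R2, for instance, would place Position and Salary in the clear at the same site and break the {Position, Salary} constraint.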
Partitioning, Execution

• Partitioning Problem
   – Partition to minimize communication cost for
     given workload
   – Even simplified version hard to approximate
   – Hill Climbing algorithm after starting with
     weighted set cover
• Query Reformulation and Execution
   – Consider only centralized plans
   – Algorithm to partition select and where clause
     predicates between the two partitions
Set Cover + Greedy for partitioning
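
One way to picture a weighted set cover starting point is sketched below. It is a generic greedy rule, not the deck's exact initialization: it assumes each attribute "covers" the privacy constraints it appears in, that hiding an attribute has a cost given by a hypothetical `weight` table, and that a later hill-climbing pass would refine the result.

```python
def greedy_weighted_cover(constraints, weight):
    """Pick attributes until every constraint contains a picked attribute.

    Each step takes the attribute with the lowest weight per newly
    covered constraint (the classic greedy rule for weighted set cover).
    """
    uncovered = [frozenset(c) for c in constraints]
    chosen = []
    while uncovered:
        # Every candidate appears in at least one uncovered constraint,
        # so the division below never hits zero.
        candidates = sorted({a for c in uncovered for a in c})

        def cost_per_cover(attr):
            return weight[attr] / sum(1 for c in uncovered if attr in c)

        best = min(candidates, key=cost_per_cover)
        chosen.append(best)
        uncovered = [c for c in uncovered if best not in c]
    return chosen


constraints = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]
# Hypothetical unit weights; a real workload would weight attributes
# by how costly it is to hide them for the queries that use them.
weight = {a: 1.0 for c in constraints for a in c}

cover = greedy_weighted_cover(constraints, weight)
```

The cover hides more attributes than the decomposed schema shown earlier, because it ignores the option of satisfying a constraint by splitting its attributes across the two sites.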
Cost Estimation
State Definitions



 •   0: condition clause cannot be pushed to either server
 •   1: condition clause can be pushed to Server 1
 •   2: condition clause can be pushed to Server 2
 •   3: condition clause can be pushed to both servers
 •   4: condition clause can be pushed to either server
OR State Evaluation
AND State Evaluation
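
The combination rules are not given in text here, so the following Python sketch rests on stated assumptions rather than on the authors' state tables. It assumes a hypothetical map `EVALUABLE_AT` saying which server can evaluate a predicate on each attribute, treats a compound clause as pushable only to a server that can evaluate both sides, and splits an AND into independently pushed conjuncts when no single server can take it whole. It models states 0 to 3 only; state 4 is left out.

```python
# Assumed placement: where a predicate on each attribute can be evaluated.
EVALUABLE_AT = {
    "Name": {1}, "Gender": {1}, "Salary": {1},
    "DoB": {2}, "Position": {2}, "Zipcode": {2},
}


def servers(expr):
    """Servers able to evaluate the whole expression on their own."""
    if expr[0] == "pred":
        return set(EVALUABLE_AT[expr[1]])
    # AND / OR node: one server must be able to evaluate both children.
    return servers(expr[1]) & servers(expr[2])


def state(expr):
    """Map the server set onto the state numbers 0 to 3."""
    s = servers(expr)
    if s == {1, 2}:
        return 3
    if s == {1}:
        return 1
    if s == {2}:
        return 2
    return 0


def push_down(expr, plan=None):
    """Assign each clause to server 1, server 2, or the client."""
    if plan is None:
        plan = {1: [], 2: [], "client": []}
    if expr[0] == "and" and state(expr) == 0:
        # Conjuncts filter independently, so each side is pushed separately.
        push_down(expr[1], plan)
        push_down(expr[2], plan)
        return plan
    s = servers(expr)
    plan[min(s) if s else "client"].append(expr)
    return plan


name = ("pred", "Name", "Name = 'Tom'")
position = ("pred", "Position", "Position = 'Staff'")
zipcode = ("pred", "Zipcode", "Zipcode = '94305'")
salary = ("pred", "Salary", "Salary > 60000")
where = ("and", ("and", name, position), ("or", zipcode, salary))

plan = push_down(where)
```

Under these assumptions the Name predicate goes to server 1, the Position predicate to server 2, and the OR clause stays at the client because its two sides live on different servers.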
Query Partitioning
Original Query
    SELECT Name, DoB, Salary
    FROM R
    WHERE (Name = 'Tom' AND Position = 'Staff') AND
          (Zipcode = '94305' OR Salary > 60000)

 R1:{TID, Name, Email, Telephone, Gender, Salary}
 R2:{TID, Name, Email, Telephone, DoB, Position, Zipcode}

• Query 1:
    SELECT TID, Name, Salary
    FROM R1
    WHERE Name = 'Tom'

• Query 2:
    SELECT TID, DoB, Zipcode
    FROM R2
    WHERE Position = 'Staff'
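
To see how the client reassembles the answer, here is a small Python sketch of the two sub-queries followed by a join on TID and the leftover OR predicate. The rows are made-up sample data, and Name is kept in plaintext for readability even though the example schema lists it among the encrypted attributes.

```python
# Hypothetical fragments held by the two servers.
R1 = [
    {"TID": 1, "Name": "Tom", "Salary": 70000},
    {"TID": 2, "Name": "Tom", "Salary": 50000},
    {"TID": 3, "Name": "Ann", "Salary": 90000},
    {"TID": 4, "Name": "Tom", "Salary": 40000},
]
R2 = [
    {"TID": 1, "DoB": "1980-01-01", "Position": "Staff", "Zipcode": "10001"},
    {"TID": 2, "DoB": "1975-05-05", "Position": "Staff", "Zipcode": "94305"},
    {"TID": 3, "DoB": "1990-09-09", "Position": "Staff", "Zipcode": "94305"},
    {"TID": 4, "DoB": "1985-03-03", "Position": "Staff", "Zipcode": "10001"},
]


def query1():
    """Runs at server 1: SELECT TID, Name, Salary FROM R1 WHERE Name = 'Tom'."""
    return [r for r in R1 if r["Name"] == "Tom"]


def query2():
    """Runs at server 2: SELECT TID, DoB, Zipcode FROM R2 WHERE Position = 'Staff'."""
    return [
        {"TID": r["TID"], "DoB": r["DoB"], "Zipcode": r["Zipcode"]}
        for r in R2 if r["Position"] == "Staff"
    ]


def answer():
    """Runs at the client: join on TID, then apply the OR predicate."""
    right = {r["TID"]: r for r in query2()}
    rows = []
    for left in query1():
        match = right.get(left["TID"])
        if match is None:
            continue
        if match["Zipcode"] == "94305" or left["Salary"] > 60000:
            rows.append({"Name": left["Name"], "DoB": match["DoB"],
                         "Salary": left["Salary"]})
    return rows


result = answer()
```

Query 1 returns Salary and Query 2 returns Zipcode even though neither appears alone in the final filter, because the client needs both columns to evaluate the OR clause that could not be pushed to a single server.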
Distributed Query Plan
Number of Iterations
Performance Gain Experiment
Iterations Vs Privacy Constraints
Papers

[CIDR05] Two Can Keep A Secret.
[PAIS11] Distributing Data for Secure Databases.
[SIGMOD05] Privacy Preserving OLAP.
[ICDT05] Anonymizing Tables.
[PODS06] Clustering For Anonymity.
[KDD07] Probabilistic Anonymity.
Acknowledgements: Collaborators

Stanford Privacy Group




TRDDC Privacy Group




PORTIA, TRUST, Google
Back Up slides




March 18, 2011

								