Privacy Preserving Set Intersection Based on Bilinear Groups

Yingpeng Sang                    Hong Shen

School of Computer Science
The University of Adelaide,
Adelaide, South Australia 5005, Australia,
Email: {yingpeng.sang, hong.shen}

Abstract

We propose a more efficient privacy preserving set intersection protocol that improves on the previously known result by a factor of $O(N)$ in both computation and communication complexity, where $N$ is the number of parties in the protocol. Our protocol is obtained in the malicious model, in which we assume a probabilistic polynomial-time bounded adversary actively controls a fixed set of $t$ ($t < N/2$) parties. We use a $(t+1, N)$-threshold version of the Boneh-Goh-Nissim (BGN) cryptosystem, whose underlying group supports bilinear maps. The BGN cryptosystem is generally used in applications where the plaintext space should be small, because a Discrete Logarithm (DL) problem still remains after decryption. In our protocol the plaintext space can be as large as the bound given by the security parameter $\tau$, and the intractability of the DL problem is utilized to protect the private datasets. Based on the bilinear map, we also construct some efficient non-interactive proofs. The security of our protocol can be reduced to common assumptions: the random oracle model and the intractability of the subgroup decision and discrete logarithm problems. The computation complexity of our protocol is $O(N S^2 \tau^3)$, where $S$ is the cardinality of each party's dataset, and the communication complexity is $O(N S^2 \tau)$ bits. A similar work by Kissner et al. (2006) needs $O(N^2 S^2 \tau^3)$ computation complexity and $O(N^2 S^2 \tau)$ communication complexity for the same level of correctness as ours.

Keywords: cryptographic protocol, privacy preservation, bilinear groups, set intersection, non-interactive zero-knowledge proof.

Copyright (c) 2008, Australian Computer Society, Inc. This paper appeared at the Thirty-First Australasian Computer Science Conference (ACSC2008), Wollongong, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 74, Gillian Dobbie and Bernard Mans, Ed. Reproduction for academic, not-for profit purposes permitted provided this text is included.

1   Introduction

For datasets distributed across different sources, computing the intersection without leaking the other elements is a frequently required task. One example is an airline company that is required to find the passengers who appear both on its private passenger list and on the government's "do-not-fly" list. Another example is a group of companies deciding whether to form a business alliance based on the percentage of customers who have consumption records with all of them. In these scenarios, neither the companies nor the government are willing to publish any elements of their datasets other than those in the intersection. In this paper, we address this problem as Privacy Preserving Set Intersection (PPSI), in which there are $N$ ($N \geq 2$) parties, each party $P_i$ ($i = 1, ..., N$) has a set (or multiset) $T_i$ with $|T_i| = S$, and each party wants to learn the intersection $TI = T_1 \cap ... \cap T_N$ without gleaning any information on the other parties' private elements except $TI$.

Generally speaking, two types of probabilistic polynomial-time (PPT) bounded adversaries are considered in the research of Secure Multiparty Computation (SMC): semi-honest (passive) and malicious (active). A semi-honest party is assumed to follow the protocol exactly as prescribed, except that it analyzes the records of intermediate computations. A malicious party can deviate arbitrarily from the protocol. Theoretically, if the adversary controls $N/2$ or more parties, a robust protocol that tolerates early quitting of the malicious parties cannot be achieved. More details on the semi-honest and malicious models can be found in other works (Yao 1982, Goldreich et al. 1987, Goldreich 2004). In this paper, we assume that the adversary corrupts a set of fewer than $N/2$ parties before the start, and maliciously controls this fixed set during the execution. Such an adversary is also called non-adaptive. There are also adaptive adversaries that can select the parties they control as the execution proceeds, but they are not considered in this paper.

We also assume that a physical broadcast channel exists for all parties, in the form of a public board whose content changes can be publicly tracked. In practice this broadcast channel can be implemented by Authenticated Byzantine Agreement in a point-to-point network. More details can be found in the works of Lamport et al. (1982) and Dolev et al. (1983).

A PPSI protocol was proposed by Kissner et al. (2005) by constructing a randomized polynomial $Y$ whose roots set contains the intersection set $TI$. In this paper, we construct a simplified $Y$, while still keeping private the relationship between $Y$ and all $T_i$. Specifically, we construct the encrypted $Y$ using a cryptosystem that is semantically secure under the Subgroup Decision Assumption (SDA). Cryptosystems based on SDA generally require a limited plaintext space, because after decryption there is still one discrete log computation needed to recover the plaintext, as found in other works (Yamamura et al. 2001, Boneh et al. 2005). However, in this paper we show how they can be applied to a plaintext space as large as the order of the subgroup on which the cryptosystems are constructed. A few candidate cryptosystems by Yamamura et al. (2001) can be considered for our protocol, but we choose the Boneh-Goh-Nissim (BGN) cryptosystem from Boneh et al. (2005), because it is based on the bilinear map, by which we can construct efficient non-interactive zero-knowledge (NIZK) proofs.

Our PPSI protocol based on the NIZK proofs has $O(N S^2 \tau^3)$ computation complexity, and $O(N S^2 \tau)$
communication complexity. The PPSI protocol by Kissner et al. (2006) needs $O(N^2 S^2 \tau^3)$ computation complexity and $O(N^2 S^2 \tau)$ communication complexity. Though they have stated that the communication complexity may be reduced to $O(N^2 S \tau)$ by applying the cut-and-choose technique, it has not been shown how this can be achieved or what the reduced computation complexity is.

The remainder of the paper is organized as follows: Section 2 discusses some related work. Section 3 formally defines the problem of PPSI. Section 4 lists the basic tools for our protocol. Section 5 constructs the non-interactive proofs required in our protocol. Section 6 proposes the PPSI protocol for the malicious model. In Section 7 we analyze the computation and communication costs of our protocol. Section 8 concludes the paper.

2   Related Work

A solution for the multi-party case of PPSI was first proposed by Freedman et al. (2004). The solution is based on evaluating polynomials representing elements in the sets. Kissner et al. (2005) proposed a more efficient solution for PPSI, in which each polynomial representing a set is multiplied by a random polynomial of the same degree. The degree of the random polynomial was optimized by Sang et al. (2006). All these protocols are based on the semi-honest model.

Kissner et al. (2006) extended their PPSI protocol (Kissner et al. 2005) to the malicious model using zero knowledge proofs. The main idea of their protocol is constructing a polynomial $Y = \sum_{l=1}^{N} (f_l * \sum_{i=1}^{N} r_{i,l})$, where $f_l$ is a polynomial having $P_l$'s dataset $T_l$ as its roots set, and $r_{i,l}$ is a uniformly selected polynomial with the same degree as $f_l$. Specifically, each $P_i$ computes and broadcasts $y_i = f_1 * r_{i,1} + ... + f_N * r_{i,N}$, then the parties sum all $y_i$ to get $Y$. With overwhelming probability, the roots set of $Y$ equals $TI$. The major cost of their protocol is computing the $N^2$ polynomial multiplications $E(f_l * r_{i,l})$, given $E(f_l)$ and $r_{i,l}$, and then proving the correctness of each multiplication. Suppose $f_l = \sum_{k=0}^{S} a_{l,k} x^k$, $E(f_l) = \{E(a_{l,k}) \mid k = 0, ..., S\}$, $r_{i,l} = \sum_{k=0}^{S} b_{i,l,k} x^k$, and $E(\cdot)$ is an additive homomorphic encryption scheme; then each coefficient of $E(f_l * r_{l,k})$ can be computed by the corresponding coefficient-multiplications of $E(f_l)$ and $r_{i,l}$. It is easy to see that $(S+1)^2$ coefficient-multiplications are required in total, and each of them needs one modular exponentiation. The proof of correct coefficient-multiplication is proposed by Cramer et al. (2001), and is of constant size. Thus the protocol of Kissner et al. (2006) is of $O(N^2 S^2)$ size, i.e. the computation complexity is $O(N^2 S^2 \tau^3)$ ($\tau$ is the length of each element, $O(\tau^3)$ is the complexity of one modular exponentiation), and the communication complexity is $O(N^2 S^2 \tau)$ bits.

Kissner et al. (2006) also proposed that the cut-and-choose technique may be used to simplify the proof of correct $E(f_l * r_{l,k})$, but cut-and-choose may compromise the correctness of the proof. For example, in a simplified proof the prover can first broadcast the $2S+1$ coefficients of $E(f_l * r_{l,k})$, the verifiers randomly select a coefficient (by means of selecting a random common challenge, as in Cramer et al. (2001)), and then the prover proves that the multiplications inside the selected coefficient are all correct. This proof is of $O(S)$ size, and reduces the communication complexity of the whole protocol to $O(N^2 S \tau)$ bits, but will not reduce the computation complexity to $O(N^2 S \tau^3)$. At least $O(N S^2 \tau^3)$ computation cost is still required to compute $N$ numbers of $E(f_l * r_{l,k})$ on each party. What's more, in this simplified proof the probability of one coefficient not being chosen by the verifiers is $1 - \frac{1}{2S+1}$. When $S$ is large, the chance of detection is not overwhelming, so a wrong coefficient is unlikely to be found. A malicious party can exploit this by multiplying $r_{l,k}$ with another $E(f_l')$ instead of $E(f_l)$; then $Y$ will not be correctly constructed, and its roots set will not equal $TI$. Some other techniques are needed to simplify the zero knowledge proof.

Sang et al. (2007) also extended their work (Sang et al. 2006) to get a PPSI protocol in the malicious model with $O(t^2 S^2 \tau^3)$ computation complexity, using interactive zero-knowledge proofs from Cramer et al. (2001), but this extension still involves costly polynomial multiplications. The complexity would be ideal if there were no polynomial multiplications in constructing $Y$, i.e., $Y = \sum_{l=1}^{N} (f_l * R_l)$ in which $R_l$ is a random number. However, a malicious party knowing the coefficients of this $Y$ gains an advantage in guessing the coefficients of $f_l$ from an honest party. In this paper we construct this $Y$ in its encrypted form, and evaluate $E(Y)$ without leaking its coefficients.

Hohenberger et al. (2006) proposed a solution to two-party set disjointness testing, the security of which is based on the subgroup decision assumption, the same assumption as in this paper. They also extended their solution to the two-party set intersection problem, but soundness is violated in their extension, because a malicious party can reveal any element as an intersected element to the other party. To ensure soundness, a commonly-shared polynomial $Y$ can be constructed, and each party judges the intersection for himself by evaluating his elements in $Y$, as Kissner et al. (2006), Sang et al. (2007) and this paper have done.

3   Problem Definition

3.1   Assumptions and Major Notations

Suppose $\mathbb{T}$ is the domain of the inputs on all parties. We assume the size of $\mathbb{T}$, i.e. $|\mathbb{T}|$, is sufficiently large to prevent the dictionary attack. Specifically, we assume $|\mathbb{T}| \gg tS$. If $tS$ is comparable with $|\mathbb{T}|$, the adversary controlling $t$ parties may manipulate the $t$ datasets to cover $\mathbb{T}$, and can then learn the intersection of all honest parties' private inputs. We stress that this assumption is practical in the examples of the airline company and the business alliance, where the key of each record can be the customer's identity or passport number, and the domain of the key is much larger than $tS$.

We also assume the parties have negotiated an $S$ that is not privacy-sensitive for any of them. The parties also do not mind leaking the information that all parties have fewer than $S$ elements in their datasets. If a party has fewer than $S$ elements, he can add dummy elements such as $|\mathbb{T}| + 1$ to reach the size $S$. The result of $|\mathbb{T}| + 1$ being in $TI$ leaks no information except that all parties have fewer than $S$ elements.

In Table 1, we define the major notations in this paper.

3.2   Definitions

In SMC, security against both types of adversaries (semi-honest and malicious) is argued by the computational indistinguishability of the views in the ideal model and the real model (as found in the works of Goldreich et al. (1987) and Lindell (2003)).
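
As a plaintext illustration of the polynomial idea recalled in Section 2 (the roots of $Y = \sum_l f_l * R_l$ contain the intersection) and of the dummy-element padding from Section 3.1, the following minimal sketch may help. It is not the protocol of this paper: nothing is encrypted, so it offers no privacy. The modulus `P`, the helper names, and the seeded random generator are all assumptions made for the example.

```python
import random

P = 2**61 - 1  # illustrative prime modulus (an assumption, not from the paper)

def pad_with_dummy(elements, size, dummy):
    """Pad a dataset to exactly `size` entries with a dummy outside the domain."""
    return list(elements) + [dummy] * (size - len(elements))

def poly_from_roots(roots):
    """Coefficients, lowest degree first, of prod_j (x - roots[j]) mod P."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k + 1] = (nxt[k + 1] + c) % P   # contribution of c * x
            nxt[k] = (nxt[k] - c * r) % P       # contribution of c * (-r)
        coeffs = nxt
    return coeffs

def evaluate(coeffs, x):
    """Horner evaluation of the polynomial at x, mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def randomized_sum(polys, rng):
    """Y = sum_l R_l * f_l with a fresh nonzero random scalar R_l per polynomial."""
    y = [0] * max(len(f) for f in polys)
    for f in polys:
        r = rng.randrange(1, P)
        for k, c in enumerate(f):
            y[k] = (y[k] + r * c) % P
    return y

def intersection_via_polynomial(datasets, seed=None):
    """Elements whose evaluation in Y is zero; false positives occur with
    probability about 1/P per tested element."""
    rng = random.Random(seed)
    y = randomized_sum([poly_from_roots(d) for d in datasets], rng)
    return {x for d in datasets for x in d if evaluate(y, x) == 0}
```

An element common to all datasets is a root of every $f_l$ and therefore of $Y$; any other element leaves at least one nonzero term scaled by a random $R_l$, so it evaluates to zero only with negligible probability. If every party pads with the same dummy value, that dummy also shows up as a root, matching the remark in Section 3.1.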
               Table 1: Major Notations
 Notation          Definition
 $N$               Total number of parties
 $P_i$             The $i$-th party
 $T_i$             The set or multiset on $P_i$
 $S$               Total number of elements on each party
 $T_{i,j}$         The $j$-th element on $P_i$, $j = 1, ..., S$
 $T$               $(T_1, ..., T_N)$
 $TI$              $T_1 \cap ... \cap T_N$
 $t$               Total number of colluded parties, $t < N/2$
 $I$               The index set of $t$ colluded parties, $\{i_1, ..., i_t\}$
 $\bar{I}$         The index set of honest parties, $\{i_1, ..., i_{N-t}\}$
 $\tau$            The security parameter which can be 256
 $r \in_R Z_n$     Uniformly select an element $r$ from a group $Z_n$

Definition 1 (Computational indistinguishability) Suppose an ensemble $X = \{X_n\}$ is a sequence of random variables $X_n$ for $n \in \{1, ..., M\}$, ranging over strings of length $poly(n)$. Two ensembles $X = \{X_n\}$ and $Y = \{Y_n\}$ are computationally indistinguishable, denoted by $X \stackrel{c}{\equiv} Y$, if for every PPT algorithm $A$ and every $c > 0$, there exists an integer $K$ such that for all $n \geq K$, $|Pr[A(X_n) = 1] - Pr[A(Y_n) = 1]| < \frac{1}{n^c}$. $Pr[A(x) = 1]$ is the probability that $A$ outputs 1 on input $x$.

Definition 2 (Intersection Function $f$) The intersection function $f$ is an $N$-ary function: $(\{0,1\}^{\tau * S * N}) \rightarrow (\{0,1\})^{S * N}$, i.e., $f(T) = \{f_{ij}(T) \mid i = 1, ..., N, j = 1, ..., S\}$, where $f_{ij}(T) = 1$ if $T_{i,j} \in TI$, and $f_{ij}(T) = 0$ if $T_{i,j} \notin TI$.

Definition 3 (PPSI in the malicious model) Let $\Pi$ be an $N$-party protocol for computing $f$. Let $(I, A)$ be a pair where $A$ is a PPT algorithm representing an adversary in the real model, and $I = \{i_1, ..., i_t\}$ ($t < N/2$) is the index set of parties controlled by $A$. The joint execution of $\Pi$ under $(I, A)$ in the real model, denoted $REAL_{\Pi,I,A}(T)$, is defined as the output sequence of $A$ and the honest parties, resulting from their interaction in the execution of $\Pi$.

Let a pair $(I, B)$, where $B$ is a PPT algorithm, represent an adversary in the ideal model, where there is an available trusted party. The joint execution of $f$ under $(I, B)$ in the ideal model, denoted $IDEAL_{f,I,B}(T)$, is defined as the output pair of $B$ and the honest parties in the ideal execution.

$\Pi$ is said to securely solve the problem of privacy preserving set intersection in the malicious model if for every PPT algorithm $A$ there exists a PPT algorithm $B$ such that the views of $A$ and $B$ are computationally indistinguishable, i.e.,

         $\{IDEAL_{f,I,B}(T)\} \stackrel{c}{\equiv} \{REAL_{\Pi,I,A}(T)\}$.   (1)

4   Basic Tools

4.1   Bilinear Groups

A bilinear group $G_1$ is a group which supports an admissible bilinear map $e : G_1 \times G_1 \rightarrow G_2$, in which $G_2$ has the same order as $G_1$. A bilinear map is said to be admissible if it satisfies the following properties:

    - Bilinear: $\forall u, v \in G_1$, $\forall a, b \in Z$, $e(u^a, v^b) = e(u, v)^{ab}$.
    - Non-degenerate: If $g$ is a generator of $G_1$, then $e(g, g)$ is also a generator of $G_2$.
    - Computable: There is an efficient algorithm to compute $e(u, v)$ for any $u, v \in G_1$.

In this paper we use the bilinear group $G_1$ of composite order $n = q_1 q_2$ for two large primes $q_1$ and $q_2$, which is a subgroup of the additive group of points of an elliptic curve $E$ over $F_p$ ($p + 1$ is divisible by $n$). An admissible bilinear map $e$ for $G_1$ can be the modified Weil pairing or Tate pairing over the curve (as found in the works of Boneh et al. (2003) and Miller (2004)). $G_2$ is an order $n$ subgroup of the multiplicative group of a finite field $F_{p^2}^*$.

4.2   A Threshold Version of Boneh-Goh-Nissim Cryptosystem

We use the BGN cryptosystem (Boneh et al. 2005), based on the composite order bilinear group, for its additive homomorphism and one-time multiplicative homomorphism. Boneh et al. (2005) used its threshold version for electronic election, but the details of the key generation and decryption are not given. Below we give these algorithms.

    - Distributed Key Generation Algorithm $G$: Given a security parameter $\tau$, a key generation algorithm $G(\tau)$ can be run to get a tuple $(q_1, q_2, G_1, G_2, e)$. Specifically, $q_1$ and $q_2$ are two $\tau$-bit random primes. Let $n = q_1 q_2$. Find the smallest positive integer $l \in Z$ such that $p = ln - 1$ is prime and $p = 2 \bmod 3$. In practice, $\tau$ can be 256 to get $p$ of at least 512-bit length. The elliptic curve $y^2 = x^3 + 1$ defined over $F_p$ has $p + 1 = ln$ points in $F_p$. Then $G_1$ is the subgroup of order $n$ in the group of points on the curve. Let $G_2$ be the subgroup of $F_{p^2}^*$ of order $n$. The bilinear map $e : G_1 \times G_1 \rightarrow G_2$ is the modified Weil pairing on the curve. Select two random generators $g, u \in_R G_1$ and set $h = u^{q_2}$. Then $h$ is a random generator of the order $q_1$ subgroup of $G_1$.
      The public key is $(n, G_1, G_2, e, g, h)$. The secret key is $sk = q_1$. $sk$ is distributed among the $N$ parties as $sk_i$ for $i = 1, ..., N$ by Shamir's secret sharing scheme (Shamir 1979). The verification key $vk_i$ is also generated as $h^{sk_i}$. The public key and each share of the secret key can be generated by the technique of Frankel et al. (1998) and Pedersen (1991) without a trusted dealer. For simplicity we assume there is a trusted dealer that can run $G$ as an offline phase.

    - Encryption Algorithm $E$: To encrypt a message $m \in \{0, ..., 2^{\tau-1}\}$, select $r \in_R Z_n$ and compute $C = E(m) = g^m h^r \in G_1$.

    - Partial Decryption Algorithm $D_i$: To decrypt a ciphertext $C$, party $P_i$ computes and broadcasts $D_i(C) = C^{sk_i}$. Each party checks whether $e(C, vk_i) = e(D_i(C), h)$ to verify the correctness of $D_i(C)$. Because $e(C, vk_i) = e(g^m h^r, h^{sk_i}) = e(g, h)^{m \, sk_i} e(h, h)^{r \, sk_i}$, and $e(D_i(C), h) = e(g^{m \, sk_i} h^{r \, sk_i}, h) = e(g, h)^{m \, sk_i} e(h, h)^{r \, sk_i}$, interactive zero knowledge proofs are not needed.

    - Recovery Algorithm: If fewer than $t + 1$ parties pass the verification of partial decryptions, the algorithm fails. Otherwise, let $\mathcal{S}$ be a set of $t + 1$ verified parties, and $\lambda_i$ ($i \in \mathcal{S}$) be the appropriate Lagrange coefficients; the decryption is $D(C) = \prod_{i \in \mathcal{S}} C^{\lambda_i sk_i} = C^{q_1} = g^{m q_1}$.

In our constructions we only need to know whether $m = 0$. For $m \in \{0, ..., 2^{\tau-1}\}$, $m = 0$ if and only if $D(C) = 1$. If $m \neq 0$, $D(C)$ is an element in the order $q_2$ subgroup of $G_1$. To know $m$, a practical way is
computing first the discrete log $m q_1$, then the greatest common divisor $GCD(m q_1, n)$ to get $q_1$. Though the latter computation can be solved by Euclid's algorithm within polynomial time, the former problem is known to be as hard as the Discrete Log (DL) problem over a finite field (as proved by Menezes et al. (1993)). Our construction also utilizes the intractability of recovering $m \neq 0$ from $D(C)$ to protect privacy.

The semantic security of BGN encryption is based on the hardness of the subgroup decision problem, as proved by Boneh et al. (2005): Given $(n, G_1, G_2, e)$ generated by $G$ where $n = q_1 q_2$, and an element $x \in G_1$, output 1 if the order of $x$ is $q_1$ (i.e. $x$ is in the order $q_1$ subgroup of $G_1$), and output 0 otherwise. We assume there is no PPT algorithm that can solve the subgroup decision problem efficiently.

The BGN encryption supports additive homomorphism. Given $C_1 = E(m_1)$ and $C_2 = E(m_2)$, it is easy to compute $E(m_1 + m_2) = C_1 C_2$. It also supports one-time multiplicative homomorphism. Given $u = g^{\alpha}$, $h = g^{\alpha q_2}$ for a random $\alpha \in_R Z_n$, we have $e(g, h) = e(h, g) = e(g, g)^{\alpha q_2}$ and $e(h, h) = e(g, h)^{\alpha q_2}$, so $e(C_1, C_2) = e(g^{m_1} h^{r_1}, g^{m_2} h^{r_2}) = e(g, g)^{m_1 m_2} e(g, h)^{m_1 r_2 + m_2 r_1 + \alpha q_2 r_1 r_2}$. Since $e(g, g)$ is of order $n$ and $e(g, h)$ is of order $q_1$, $e(C_1, C_2)$ is an encryption of $m_1 m_2$ over $F_{p^2}^*$ and the decryption key is $q_1$.

4.3   Calculations on encrypted polynomials

In our protocol, we need to do some calculations on encrypted polynomials. For a polynomial $f(x) = \sum_{i=0}^{m} a_i x^i$, we use $E(f(x))$ to denote the sequence of encrypted coefficients $\{E(a_i) \mid i = 0, ..., m\}$. Given $E(f(x))$, where $E$ is an additive homomorphic encryption scheme, the following computations can be made:

    1) Evaluate $E(f(x))$ at a value $v$: $E(f(v)) = E(a_m v^m + a_{m-1} v^{m-1} + ... + a_0) = E(a_m)^{v^m} E(a_{m-1})^{v^{m-1}} \cdots E(a_0)$.
    2) Given a constant scalar $c$, compute $E(c \cdot f(x)) = \{E(a_m)^c, ..., E(a_0)^c\}$.
    3) Given $E(g(x))$ where $g(x) = \sum_{j=0}^{m} b_j x^j$, compute $E(f(x) + g(x)) = \{E(a_m) E(b_m), ..., E(a_0) E(b_0)\}$.

5   Non-interactive Proofs for the PPSI Protocol

5.1   Main Idea of the PPSI Protocol in the semi-honest model

Briefly, our protocol for PPSI is based on evaluating a randomized polynomial $Y$ whose roots set contains the intersection.

    1) Each $P_i$ computes $f_i = (x - T_{i,1}) \cdots (x - T_{i,S}) = \sum_{k=0}^{S} c_{i,k} x^k$, and broadcasts $E(c_{i,k})$ for $k = 0, ..., S$.
    2) For $i = 0, ..., N - 1$,
       2.1) $P_i$ selects $R_{i,l} \in_R Z_n$, computes $E(R_{i,l} * f_l)$ for $l = i, ..., i + t \bmod N$.
       2.2) $P_i$ sums all $E(R_{i,l} * f_l)$ to get $E(y_i) = E(R_{i,i} * f_i + ... + R_{i,(i+t \bmod N)} * f_{i+t \bmod N})$, and broadcasts $E(y_i)$.
    3) Each $P_i$ sums all $E(y_i)$ to get $E(Y) = E((R_{N-t,0} + ... + R_{N-1,0} + R_{0,0}) * f_0 + ... + (R_{N-t-1,N-1} + ... + R_{N-1,N-1}) * f_{N-1}) = E(\sum_{l=0}^{N-1} (\sum_{i'=(l-t \bmod N)}^{l} R_{i',l}) * f_l)$.
    4) Each $P_i$ evaluates $T_{i,j}$ in $E(Y)$ for $j = 1, ..., S$. $P_i$ decrypts $E(Y(T_{i,j}))$ with the help of $t$ other parties. If the decryption is 1, $T_{i,j} \in TI$; otherwise, it determines $T_{i,j} \notin TI$.

Since $\sum_{i'=(l-t \bmod N)}^{l} R_{i',l}$ in $Y$ is generated by $t + 1$ parties, it is always a random number that cannot be manipulated by the adversary.

An encryption scheme based on the subgroup decision problem, such as the BGN encryption, is necessary for the security of the protocol. If we use some other encryption scheme in which $D(E(Y_{i,j})) = Y(T_{i,j})$, then an adversary controlling $t$ ($t < N/2$) parties can compute the coefficients of $Y$ by Lagrange interpolation, using the $S + 1$ different evaluations he can get. From the coefficients of $Y$, the adversary gains some advantage in guessing the distribution of an honest party's inputs. However, in BGN encryption, the adversary can get only $|TI|$ evaluations, thus he cannot learn any of $Y$'s coefficients.

What's more, a malicious (active) adversary controlling $t$ parties may mount attacks on the protocol. In step 1), after receiving $E(f_i)$ from an honest party $P_i$, he can substitute his inputs with $f_i$ by just broadcasting $E(f_i)$ to others. He can also broadcast zero polynomials; then the roots set of $Y$ will be the intersection of all honest parties, and he can test this intersection in the evaluations. In step 2.2), he can broadcast an arbitrary $E(y_i)$; then the outputs of the protocol will also be arbitrary, because the roots set of $Y$ will have no certain relationship with the original inputs. In step 4) he can evaluate some values on $E(f_i)$ from an honest $P_i$ instead of on $E(Y)$, ask for their decryptions, and then test the inputs of $P_i$. We construct non-interactive zero knowledge (NIZK) proofs to prevent these attacks.

5.2   Proof of Correct Multiplication

Suppose the prover and verifier have the common input $a_1 = E(m) = g^m h^r$, where the prover does not know the plaintext $m$. The prover is required to prove he does a correct multiplication $a_2 = E(mR)$. This Proof of Correct Multiplication (POCM) can be denoted as $POCM\{R, s_1 \mid a_1 = g^m h^r, a_2 = g^{mR} h^{rR+s_1}\}$, which means the prover should prove knowing $R, s_1$ such that $a_2 = a_1^R h^{s_1}$. POCM can be based on the bilinear map $e$ as follows:

    1) The prover generates $R, s_1, x, s_2 \in_R Z_n$, computes $a_2 = a_1^R h^{s_1} = g^{mR} h^{rR+s_1}$, $a_3 = a_1^x h^{s_2} = g^{mx} h^{rx+s_2}$, $a_4 = H(a_1, a_2, a_3, pid, sid)$, $z_1 = x + a_4 R \bmod n$, $z_2 = s_2 + a_4 s_1 \bmod n$, then broadcasts $a_2$, $a_3$, $z_1$ and $z_2$. $H(\cdot)$ is a one-way hash function (e.g. SHA-2), $pid$ is the unique identity of the prover, and $sid$ is the unique identity of the current session to which POCM belongs.
    2) The verifier computes $a' = H(a_1, a_2, a_3, pid, sid)$, outputs '1' if $e(a_1, g^{z_1}) e(g, h^{z_2}) = e(a_2, g^{a'}) e(a_3, g)$, and outputs '0' otherwise.

The correctness of POCM is easy to verify. $e(a_1, g^{z_1}) = e(g^m h^r, g^{x + a_4 R}) = e(g, g)^{mx + m a_4 R} e(g, h)^{rx + r a_4 R}$, $e(g, h^{z_2}) = e(g, h)^{s_2 + a_4 s_1}$, $e(a_2, g^{a'}) = e(g, g)^{mR a'} e(g, h)^{rR a' + s_1 a'}$, and $e(a_3, g) = e(g, g)^{mx} e(g, h)^{rx + s_2}$. Because $a' = a_4$, it is easy to see that $e(a_1, g^{z_1}) e(g, h^{z_2}) = e(a_2, g^{a'}) e(a_3, g)$.

The zero-knowledge property can be based on the
the random oracle $H(\cdot)$. A simulator can randomly select $m'$ from a given domain, select
$r', R', s_1', x', s_2' \in_R Z_n$, and compute $a_1' = g^{m'} h^{r'}$, $a_2' = g^{m'R'} h^{r'R' + s_1'}$,
$a_3' = g^{m'x'} h^{r'x' + s_2'}$, $a_4' = H(a_1', a_2', a_3', pid, sid)$, $z_1' = x' + R' a_4' \bmod n$,
$z_2' = s_2' + s_1' a_4' \bmod n$. Because of the subgroup decision problem, distinguishing the
distribution of $(a_1', a_2', a_3')$ from that of $(a_1, a_2, a_3)$ in the real execution of POCM is
computationally hard. Distinguishing $a_4'$ from $a_4$ is then computationally hard with a random oracle
$H(\cdot)$, and it is also hard to distinguish $(z_1', z_2')$ from $(z_1, z_2)$ because $x, R, s_1, s_2$ in
them are all uniformly selected. Therefore, the views of the simulator and the adversary-controlled
verifier in POCM are computationally indistinguishable.
    In POCM the prover computes 4 exps (i.e. modular exponentiations in $G_2$ or point multiplications in
$G_1$), 4 multis (i.e. modular multiplications in $G_2$ or point additions in $G_1$), and one hash value.
The verifier computes 3 exps, 2 multis, one hash value and 3 pairings. Four messages need to be broadcast.

5.3    Proof of Knowing Plaintext

The Proof of Knowing Plaintext (POKP) can be denoted as $POKP\{m, r \mid a_1 = g^m h^r\}$, which means the
prover should prove that he knows $m, r$ such that $a_1 = E(m) = g^m h^r$. POKP can be based on the
bilinear map $e$ as follows:

 1) The prover generates $x, s \in_R Z_n$, computes $a_2 = g^x h^s$, $a_3 = H(a_1, a_2, pid, sid)$,
    $z_1 = x + a_3 m \bmod n$, $z_2 = s + a_3 r \bmod n$, then broadcasts $a_2, z_1, z_2$.

 2) The verifier computes $a = H(a_1, a_2, pid, sid)$, checks whether
    $e(g^{z_1} h^{z_2}, g) = e(a_1, g^{a}) e(a_2, g)$, outputs ‘1’ if it is the case, and outputs ‘0’
    otherwise.

    POKP can be treated as a special case of POCM, i.e. given a common input $E(1)$, the prover should
prove that he knows $m$ such that $a_1 = E(1 \cdot m)$. Thus the correctness and zero-knowledge property
of POKP are also easy to get. Including the computation of $a_1$, the prover computes 4 exps, 4 multis and
one hash value, and broadcasts 3 messages. The verifier computes one hash value, 3 exps, 2 multis and 3
pairings.

5.4    Proof of Correct Polynomial Evaluation

Suppose $E(Y) = \{E(c_i) \mid i = 0, ..., S\}$ for the polynomial $Y = \sum_{i=0}^{S} c_i x^i$ is the common
input, and the prover is required to prove that he correctly evaluates a value $v$ on $E(Y)$. This Proof
of Correct Polynomial Evaluation (POCPE) can be denoted as
$POCPE\{v \mid \forall i = 0, ..., S, E(c_i) = g^{c_i} h^{\gamma_i}, E(Y(v)) = E(\sum_{i=0}^{S} c_i v^i)\}$.
POCPE can be constructed based on POKP and POCM:

 1) The prover proves knowing the plaintext of $E(v)$ by $POKP\{v, r_1 \mid a_1 = g^{v} h^{r_1}\}$.

 2) For $i = 2, ..., S$, given $a_1 = g^{v} h^{r_1}$ and $a_{i-1} = g^{v^{i-1}} h^{r_{i-1}}$, the prover
    proves knowing some $v$ and $r_i \in_R Z_n$ such that $a_i = g^{v^i} h^{r_i}$. This proof is a
    simplified POCM as follows:

      2.1) The prover generates $r_i \in_R Z_n$, computes $a_i = g^{v^i} h^{r_i}$,
           $b_i = g^{-r_i + v r_{i-1}} a_{i-1}^{r_1} = g^{-r_i + v r_{i-1} + v^{i-1} r_1} h^{r_1 r_{i-1}}$,
           and broadcasts $a_i, b_i$.

      2.2) The verifier checks whether $e(a_1, a_{i-1}) = e(a_i, g) e(b_i, h)$.

 3) Then the prover proves he correctly computes $E(Y(v)) = E(\sum_{i=0}^{S} c_i v^i)$:

      3.1) The prover generates $R \in_R Z_n$, computes
           $c = E(Y(v)) = h^{R} \prod_{i=0}^{S} (E(c_i))^{v^i}
              = g^{\sum_{i=0}^{S} c_i v^i} h^{\sum_{i=0}^{S} \gamma_i v^i + R}$,
           $d = g^{-R} \prod_{i=1}^{S} (E(c_i))^{r_i}
              = g^{-R + \sum_{i=1}^{S} c_i r_i} h^{\sum_{i=1}^{S} \gamma_i r_i}$,
           and broadcasts $c, d$.

      3.2) The verifier checks whether
           $e(c, g) e(d, h) = e(E(c_0), g) \prod_{i=1}^{S} e(E(c_i), a_i)$.

    The correctness and zero-knowledge property of POCPE are also easy to get since POCPE is composed of
POKP and POCM. In sum the prover computes $6S + 2$ exps, $4S + 3$ multis and one hash value, and broadcasts
$2S + 3$ messages. The verifier computes 3 exps, $2S + 2$ multis, one hash value and $2S + 3$ pairings.

6     The PPSI Protocol in the Malicious Model

We add zero-knowledge proofs to the PPSI protocol for the semi-honest model to get a protocol in the
malicious model. A malicious adversary is forced to behave in a semi-honest manner, otherwise he will be
caught cheating by these proofs. In Figure 1, we give the PPSI protocol for the malicious model. As
analyzed in Section 5.1, the malicious behaviors should be prevented as follows:

    1) To prevent the adversary from generating zero polynomials, notice that $c_{i,S}$, the leading
       coefficient of $f_i$, should always be 1. So in step 2.2) of Figure 1, each party sets the leading
       coefficient $E(c_{i,S}) = E(1)$ by itself. Then $E(f_i)$ from $P_i$ has at least one nonzero
       coefficient, and the root set of $Y$ will not equal the intersection of the honest parties.

    2) To prevent the adversary from replacing his inputs by those of an honest party, ideally each party
       should run POKP $S$ times on its encrypted coefficients in step 1) of Figure 1. However, one POKP
       for any encrypted coefficient other than the leading one (say, POKP for $E(c_{i,S-1})$) suffices to
       prove that the party generates its polynomial independently. The adversary may substitute his other
       coefficients (say, $E(c_{i,S-2}), ..., E(c_{i,0})$) with the coefficients received from an honest
       party, but any such substitution will generate a polynomial whose roots are not known to him.

    3) To prevent the adversary from broadcasting arbitrary encryptions in step 2.1) of Figure 1, each
       party should run POCM to prove the multiplications are correct.

    4) To prevent the adversary from evaluating $E(f_i)$ from an honest $P_i$ in step 4.1) of Figure 1,
       each party should prove the evaluations are correct by POCPE.

    In Figure 1, if some party is removed from the verified list in Step 1), only the intersection of the
remaining parties' datasets will be computed. For a party removed in Step 2), any remaining party can be
elected to act his part in the protocol. In Step 3), a removed party will not know the intersection. At
least $t+1$ parties are left in the final decryption, from which they learn the final intersection. The
protocol is also applicable when $N \geq 2$ and $t \geq N/2$, but if fewer than $t+1$ parties remain in
the final decryption, they cannot get the final intersection.
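The verification equations of POKP (Section 5.3) and of step 2 of POCPE (Section 5.4) can be sanity-checked with a small sketch. It is not a cryptographic implementation: it assumes a toy "exponent model" in which an element $g^a h^b$ is stored as the pair (a, b) and a pairing value $e(g,g)^x e(g,h)^y e(h,h)^z$ as the triple (x, y, z), all modulo a small composite. The modulus, the SHA-256 stand-in for $H$, and every function name below are illustrative assumptions, and the model hides nothing, so it only confirms that the algebra balances.

```python
import hashlib
import secrets

N_MOD = 1009 * 1013  # toy composite modulus standing in for n = q1*q2 (assumption)

def g1(a, b):
    """Represent g^a h^b by its exponent pair (a, b); insecure toy model."""
    return (a % N_MOD, b % N_MOD)

def g1_mul(u, v):
    """Multiply two group elements: exponents add."""
    return ((u[0] + v[0]) % N_MOD, (u[1] + v[1]) % N_MOD)

def g1_pow(u, k):
    """Raise a group element to the power k: exponents scale."""
    return ((u[0] * k) % N_MOD, (u[1] * k) % N_MOD)

def pair(u, v):
    """e(g^a h^b, g^c h^d) as exponents of e(g,g), e(g,h), e(h,h)."""
    a, b = u
    c, d = v
    return ((a * c) % N_MOD, (a * d + b * c) % N_MOD, (b * d) % N_MOD)

def gt_mul(s, t):
    """Multiply two pairing values: exponents add."""
    return tuple((x + y) % N_MOD for x, y in zip(s, t))

G = g1(1, 0)
H = g1(0, 1)

def fs_hash(*parts):
    """Stand-in for the random oracle H(.), reduced modulo N_MOD."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N_MOD

def pokp_prove(m, r, pid, sid):
    """Prover side of POKP for a1 = g^m h^r."""
    a1 = g1(m, r)
    x, s = secrets.randbelow(N_MOD), secrets.randbelow(N_MOD)
    a2 = g1(x, s)
    a3 = fs_hash(a1, a2, pid, sid)
    return a1, a2, (x + a3 * m) % N_MOD, (s + a3 * r) % N_MOD

def pokp_verify(a1, a2, z1, z2, pid, sid):
    """Check e(g^z1 h^z2, g) == e(a1, g^a) e(a2, g)."""
    a = fs_hash(a1, a2, pid, sid)
    return pair(g1(z1, z2), G) == gt_mul(pair(a1, g1_pow(G, a)), pair(a2, G))

def power_chain(v, S):
    """Build a_i = g^{v^i} h^{r_i} and b_i as in step 2.1 of POCPE."""
    r = [None] + [secrets.randbelow(N_MOD) for _ in range(S)]
    a = [None] + [g1(pow(v, i, N_MOD), r[i]) for i in range(1, S + 1)]
    b = {i: g1_mul(g1(-r[i] + v * r[i - 1], 0), g1_pow(a[i - 1], r[1]))
         for i in range(2, S + 1)}
    return a, b

def chain_ok(a, b, S):
    """Check e(a_1, a_{i-1}) == e(a_i, g) e(b_i, h) for i = 2..S."""
    return all(pair(a[1], a[i - 1]) == gt_mul(pair(a[i], G), pair(b[i], H))
               for i in range(2, S + 1))
```

An honest transcript from `pokp_prove` passes `pokp_verify`, while changing `z1` breaks the equality; `chain_ok` accepts the chain built by `power_chain`.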
          Inputs: There are $N$ parties, $t$ of which may collude maliciously. Each party has a private
                set of $S$ elements, denoted $T_i$. Each party holds the public key and its own share of the secret key
                for the $(t + 1, N)$-threshold BGN cryptosystem. All parties have a common session identification
                number $sid$ for running the current protocol in the computing system. Each party has a unique
                identity $pid$.

          Output: Each party $P_i$ knows $TI = T_0 \cap ... \cap T_{N-1}$.

          1) Computing $E(f_i)$: For $i = 0, ..., N-1$, $P_i$ computes
                $f_i = (x - T_{i,1}) \cdots (x - T_{i,S}) = \sum_{k=0}^{S} c_{i,k} x^k$,
                encrypts $c_{i,k}$ for $k = 0, ..., S-1$, broadcasts all the encrypted values, and then runs
                $POKP\{c_{i,S-1} : E(c_{i,S-1})\}$ to prove he knows the plaintext of $E(c_{i,S-1})$.

          2) Computing $E(y_i)$: For $i = 0, ..., N-1$,

                2.1) $P_i$ selects $R_{i,l} \in_R Z_n$, computes $E(R_{i,l} * f_l)$ for $l = i, ..., i + t \bmod N$, and then
                     broadcasts the coefficients $E(R_{i,l} c_{l,k})$ for $k = 0, ..., S-1$, and runs
                     $POCM\{R_{i,l} : E(R_{i,l} c_{l,k})\}$ to prove he does the correct multiplication.
                2.2) Each party sets $c_{l,S} = 1$, $E(R_{i,l} c_{l,S}) = E(R_{i,l})$, and sums $E(R_{i,l} * f_l)$ for
                     $l = i, ..., i + t \bmod N$, to get $E(y_i) = E(R_{i,i} * f_i + ... + R_{i,(i+t \bmod N)} * f_{i+t \bmod N})$.

          3) Each $P_i$ sums all $E(y_i)$ to get
                $E(Y) = E((R_{N-t,0} + ... + R_{N-1,0} + R_{0,0}) * f_0 + ... + (R_{N-t-1,N-1} + ... + R_{N-1,N-1}) * f_{N-1})
                      = E(\sum_{k=0}^{S} \beta_k x^k)$.

          4) Decryption and Evaluation: For $i = 0, ..., N-1$,

                4.1) For $j = 1, ..., S$, $P_i$ evaluates $T_{i,j}$ in $E(Y)$ to get $E(\beta_k T_{i,j}^k)$ for $k = 1, ..., S$, then runs
                     $POCPE\{T_{i,j} : E(\beta_k T_{i,j}^k), k = 1, ..., S\}$ to prove the correctness of the evaluation.
                4.2) For $j = 1, ..., S$, all parties of the $t+1$ quorum including $P_i$ compute
                     $E(Y(T_{i,j})) = \sum_{k=0}^{S} E(\beta_k T_{i,j}^k)$.
                     $P_i$ decrypts $E(Y(T_{i,j}))$ with the help of the other $t$ parties. If the decryption is 1, $T_{i,j} \in TI$;
                     otherwise, it determines $T_{i,j} \notin TI$.

               Figure 1: Protocol for Privacy Preserving Set Intersection in the Malicious Model
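The polynomial algebra behind Figure 1 can be illustrated without any cryptography. The sketch below is a minimal plaintext model, assuming a prime modulus `P` in place of the BGN plaintext space and omitting encryption, threshold decryption and all zero-knowledge proofs. The function names are hypothetical. Each party's set becomes a polynomial $f_i$, every $P_i$ masks $f_i, ..., f_{i+t \bmod N}$ with fresh random factors, the masked polynomials are summed into $Y$, and an element is reported as common exactly when $Y$ vanishes on it.

```python
import secrets

P = 2**61 - 1  # prime modulus standing in for the plaintext space (assumption)

def poly_from_roots(roots):
    """Coefficients c_0..c_S (lowest degree first) of prod (x - r) mod P."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] = (nxt[k] - r * c) % P      # contribution of -r * c_k x^k
            nxt[k + 1] = (nxt[k + 1] + c) % P  # contribution of c_k x^(k+1)
        coeffs = nxt
    return coeffs

def poly_eval(coeffs, x):
    """Horner evaluation of the polynomial at x, modulo P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def masked_sum(sets, t):
    """Y = sum over i and l = i..i+t mod N of R_{i,l} * f_l (steps 2 and 3)."""
    n = len(sets)
    size = len(sets[0])  # all sets are assumed to have the same size S
    f = [poly_from_roots(s) for s in sets]
    y = [0] * (size + 1)
    for i in range(n):
        for off in range(t + 1):
            l = (i + off) % n
            mask = secrets.randbelow(P - 1) + 1  # fresh nonzero R_{i,l}
            for k in range(size + 1):
                y[k] = (y[k] + mask * f[l][k]) % P
    return y

def intersection(sets, t):
    """Each party keeps the elements of its own set on which Y vanishes (step 4)."""
    y = masked_sum(sets, t)
    return [{x for x in s if poly_eval(y, x) == 0} for s in sets]
```

For example, `intersection([[1, 2, 3], [2, 3, 4], [3, 2, 9]], 1)` evaluates the same masked polynomial on every party's elements. A non-common element survives only if the random masks happen to cancel, which occurs with probability on the order of 1/P in this model.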

    We briefly show below the correctness and security of our protocol, though these claims can also be
proved mathematically.
    Correctness If $T_{i,j} \in TI$, then $f_l(T_{i,j}) = 0$ for $l = 0, ..., N-1$. Then the evaluation
$Y(T_{i,j}) = 0$ and $D(E(Y(T_{i,j}))) = g^{0 \cdot q_1} = 1$. If $T_{i,j} \notin TI$, then there exists
$f_{l'}(T_{i,j}) \neq 0$ for some $l' \in \{0, ..., N-1\}$. Then
$Y(T_{i,j}) = \sum_{l'} (\sum_{i'=(l'-t \bmod N)}^{l'} R_{i',l'}) * f_{l'}(T_{i,j})$ and
$D(E(Y(T_{i,j}))) = g^{q_1 Y(T_{i,j})}$, whose order is $q_2$. Because each $R_{i',l'} \in_R Z_n$,
overwhelmingly $Y(T_{i,j}) \neq 0$ and $D(E(Y(T_{i,j}))) \neq 1$.
    Security Suppose $A$ is the adversary in the real execution of the PPSI protocol in Figure 1 which
controls parties $P_I = \{P_{i_1}, ..., P_{i_t} \mid t < N/2\}$, and $B$ is the adversary which controls
the same parties in the ideal execution assuming there is a trusted party $T$. Based on the
zero-knowledge properties of POCM, POKP and POCPE, $B$'s views in the simulations of these proofs are
computationally indistinguishable from $A$'s views in the real execution of them. The zero-knowledge
properties of our NIZK proofs are based on the basic assumptions of the random oracle model and the
subgroup decision and discrete logarithm problems, so by the same assumptions, the views of $A$ and $B$
are computationally indistinguishable from each other, which accounts for the security of our PPSI
protocol as defined in Definition 3.

7       Complexity of the PPSI Protocol

The computation complexity of our protocol is mainly due to the encryptions, exponentiations, bilinear
maps and multiplications. One BGN encryption needs 2 exponentiations. One exponentiation in $G_2$ with a
$\tau$-bit exponent (or point multiplication in $G_1$) requires $O(\tau^3)$ computation. One bilinear map,
such as the Weil pairing by Miller's algorithm, has the same complexity as the exponentiation in $G_2$.
One multiplication of two $\tau$-bit elements in $G_2$ (or point addition in $G_1$) has a cost of
$O(\tau^2)$. One element in $G_1$ has $O(\tau)$ bits.
    Computation Complexity: We assume the $N$ parties execute the protocol in parallel, so the computation
cost of one party $P_i$ is representative of the whole protocol's complexity. In Step 1), $P_i$ computes
$E(f_i)$ by $2S$ exps, computes POKP by 2 exps, 3 multis and one hash value, and checks POKP by $3(N-1)$
exps, $3(N-1)$ bilinear maps, $2(N-1)$ multis and $N-1$ hash values. In Step 2.1), $P_i$ computes all POCM
by $4(S+1)(t+1)$ exps, $4(S+1)(t+1)$ multis and $(S+1)(t+1)$ hash values, and checks POCM by
$3S(t+1)(N-1)$ exps, $3S(t+1)(N-1)$ bilinear maps, $2S(t+1)(N-1)$ multis and $S(t+1)(N-1)$ hash values.
In Step 2.2), $P_i$ computes all $y_i$ by $(S+1)tN$ multiplications. In Step 3), $P_i$ computes $Y$ by
$S(N-1)$ multiplications. In Step 4.1), for each evaluation of $T_{i,j}$, $P_i$ computes one POCPE, thus
for $S$ evaluations $P_i$ computes $6S^2 + 2S$ exps, $4S^2 + 3S$ multis and $S$ hash values. $P_i$ also
checks $S(N-1)$ POCPE for other parties, which need $(2S+3)S(N-1)$ bilinear maps, $3S(N-1)$ exps,
$S(N-1)$ hash values and $(2S+3)S(N-1)$ multis. In Step 4.2), $P_i$ computes $S$ decryptions, which need
$S$ exps, $2tS$ bilinear maps, and $tS$ multis. $P_i$ also computes $(N-1)S$ exps for other parties'
decryptions.
    The complexity of one exp (or bilinear map) is $O(\tau)$ times that of one multiplication or hash
value, so the complexity of all exps plus all bilinear maps is representative of the whole protocol's
complexity, as we have done in calculating Kissner et al. (2006)'s computation complexity. We also
consider only the practical instance that $S \gg t$, i.e. $O(StN + NS^2) = O(NS^2)$. The major cost of our
protocol is in Step 4.1), for the computation of $O(NS^2)$ bilinear maps. The whole protocol's
computation complexity is $O(NS^2 \tau^3)$.
    Communication Complexity: In Step 1), $N(S+3)$ elements are broadcast by the $N$ parties. In Step
2.1), $4N(S+1)(t+1)$ elements are broadcast. In Step 4.1), $NS(2S+3)$ elements are broadcast. In Step
4.2), $NtS$ elements are broadcast. The major cost is in Step 4.1); the communication complexity is
$O(NS^2)$ elements, or $O(NS^2 \tau)$ bits.

8   Concluding Remarks

We proposed a more efficient privacy preserving set intersection protocol which improves a previous work
by Kissner et al. (2006) by an $O(N)$ factor in the computation and communication complexities, with the
same level of correctness. Though the cut-and-choose technique may reduce the communication complexity of
Kissner et al. (2006)'s protocol, the correctness may be compromised and the reduced computation
complexity is the same as ours. We proved the security of our protocol in the malicious model assuming an
adversary actively controlling a fixed set of $t$ ($t < N/2$) parties. Our construction is based on the
BGN cryptosystem, which supports bilinear maps. Efficient NIZK proofs for correct multiplication, knowing
the plaintext, and correct polynomial evaluation are also constructed. In the future we will utilize the
bilinear group and NIZK proofs in other privacy preserving set operations.

References

Boneh, D. & Franklin, M. (2003), Identity-Based Encryption from the Weil Pairing, in ‘SIAM Journal of
  Computing’, Vol. 32(3), pp. 586–615.

Boneh, D., Goh, E. & Nissim, K. (2005), Evaluating 2-DNF Formulas on Ciphertexts, in ‘Proc. of TCC’05’,
  Vol. 3378, LNCS, pp. 325–341.

Barreto, P., Kim, H., Lynn, B. & Scott, M. (2002), Efficient Algorithms for Pairing-Based Cryptosystems,
  in ‘Proc. of CRYPTO’02’, Vol. 2442, LNCS, pp. 354–369.

Cramer, R., Damgard, I. & Nielsen, J. (2001), Multiparty Computation from Threshold Homomorphic
  Encryption, in ‘Advances in Cryptology - EUROCRYPT 2001’, Vol. 2045, LNCS, pp. 280–300.

Dolev, D. & Strong, H. (1983), Authenticated Algorithms for Byzantine Agreement, in ‘SIAM J. Comput’,
  Vol. 12(4), pp. 656–665.

Frankel, Y., MacKenzie, P. & Yung, M. (1998), Robust efficient distributed RSA-key generation, in ‘Proc.
  of the 17th annual ACM symposium on Principles of distributed computing’, ACM Press, pp. 320–330.

Freedman, M., Nissim, K. & Pinkas, B. (2004), Efficient Private Matching and Set Intersection, in ‘Proc.
  of Eurocrypt’04’, Vol. 3027, LNCS, pp. 1–19.

Goldreich, O. (2004), Foundations of Cryptography: Volume 2, Basic Applications, Cambridge University
  Press, 2004.

Goldreich, O., Micali, S. & Wigderson, A. (1987), How to Play Any Mental Game, in ‘Proc. of 19th STOC’,
  ACM Press, pp. 218–229.

Hohenberger, S. & Weis, S. (2006), Honest-Verifier Private Disjointness Testing Without Random Oracles,
  in ‘6th International Workshop of Privacy Enhancing Technologies (PET 2006)’, Vol. 4285, LNCS,
  pp. 277–294.

Kissner, L. & Song, D. (2005), Privacy-Preserving Set Operations, in ‘Advances in Cryptology - CRYPTO
  2005’, Vol. 3621, LNCS, pp. 241–257.

Kissner, L. & Song, D. (2006), Privacy-Preserving Set Operations, in ‘Technical Report CMU-CS-05-113’,
  Carnegie Mellon University, June 2006.

Lamport, L., Shostack, R. & Pease, M. (1982), The Byzantine Generals Problem, in ‘ACM Trans. on
  Programming Languages and Systems’, Vol. 4(3), ACM Press, pp. 382–401.

Lindell, Y. (2003), Parallel Coin-Tossing and Constant-Round Secure Two-Party Computation, in ‘Journal of
  Cryptology’, Vol. 16(3), pp. 143–184.

Menezes, A., Vanstone, S. & Okamoto, T. (1993), Reducing elliptic curve logarithms to logarithms in a
  finite field, in ‘IEEE Trans. on Information Theory’, Vol. 39, pp. 1639–1646.

Miller, V. (2004), The Weil Pairing, and Its Efficient Calculation, in ‘Journal of Cryptology’, Vol. 17,
  pp. 235–261.

Pedersen, T. (1991), A Threshold Cryptosystem without a Trusted Party, in ‘Proc. of Eurocrypt 1991’,
  Vol. 547, LNCS, pp. 522–526.

Sang, Y., Shen, H., Tan, Y. & Xiong, N. (2006), Efficient Protocols for Privacy Preserving Matching
  Against Distributed Datasets, in ‘Proc. of the 8th International Conference on Information and
  Communications Security (ICICS ’06)’, Vol. 4307, LNCS, pp. 210–227.

Sang, Y. & Shen, H. (2007), Privacy Preserving Set Intersection Protocol Secure Against Malicious
  Behaviours, accepted by ‘The 8th International Conference on Parallel and Distributed Computing,
  Applications and Technologies (PDCAT 2007)’, Adelaide, Australia, Dec. 2007.

Shamir, A. (1979), How to Share a Secret, in ‘Communications of the ACM’, Vol. 22(11), ACM Press,
  pp. 612–613.

Yamamura, A. & Saito, T. (2001), Private Information Retrieval Based on the Subgroup Membership Problem,
  in ‘Australian Conference on Information Security and Privacy (ACISP’01)’, Vol. 2119, LNCS,
  pp. 206–220.

Yao, A. (1982), Protocols for Secure Computations, in ‘Proc. of the 23rd Annual IEEE Symposium on
  Foundations of Computer Science’, pp. 160–164.
