Privacy Preserving Set Intersection Based on Bilinear Groups

Yingpeng Sang, Hong Shen
School of Computer Science, The University of Adelaide, Adelaide, South Australia 5005, Australia
Email: {yingpeng.sang, hong.shen}@adelaide.edu.au

Abstract

We propose a more efficient privacy preserving set intersection protocol which improves the previously known result by a factor of $O(N)$ in both the computation and communication complexities ($N$ is the number of parties in the protocol). Our protocol is obtained in the malicious model, in which we assume a probabilistic polynomial-time bounded adversary actively controls a fixed set of $t$ ($t < N/2$) parties. We use a $(t+1, N)$-threshold version of the Boneh-Goh-Nissim (BGN) cryptosystem whose underlying group supports bilinear maps. The BGN cryptosystem is generally used in applications where the plaintext space should be small, because there is still a Discrete Logarithm (DL) problem after the decryption. In our protocol the plaintext space can be as large as bounded by the security parameter $\tau$, and the intractability of the DL problem is utilized to protect the private datasets. Based on the bilinear map, we also construct some efficient non-interactive proofs. The security of our protocol can be reduced to the common intractable problems including the random oracle, subgroup decision and discrete logarithm problems. The computation complexity of our protocol is $O(NS^2\tau^3)$ ($S$ is the cardinality of each party's dataset), and the communication complexity is $O(NS^2\tau)$ bits. A similar work by Kissner et al. (2006) needs $O(N^2S^2\tau^3)$ computation complexity and $O(N^2S^2\tau)$ communication complexity for the same level of correctness as ours.

Keywords: cryptographic protocol, privacy preservation, bilinear groups, set intersection, non interactive zero-knowledge proof.

Copyright (c) 2008, Australian Computer Society, Inc. This paper appeared at the Thirty-First Australasian Computer Science Conference (ACSC2008), Wollongong, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 74, Gillian Dobbie and Bernard Mans, Ed. Reproduction for academic, not-for profit purposes permitted provided this text is included.

1 Introduction

For datasets distributed on different sources, computing the intersection without leaking the other elements is a frequently required task. One example is that an airline company is required to find out those passengers who are on both its private passenger list and the government's "do-not-fly" list. Another example is that some companies may decide whether to make a business alliance by the percentage of customers who have consumption records in all of them. In these scenarios, none of the companies or the government is willing to publish elements of their datasets other than those in the intersection. In this paper, we address this problem as Privacy Preserving Set Intersection (PPSI), in which there are $N$ ($N \ge 2$) parties, each party $P_i$ ($i = 1, ..., N$) has a set (or multiset) $T_i$ with $|T_i| = S$, and each party wants to learn the intersection $TI = T_1 \cap ... \cap T_N$ without gleaning any information on the other parties' private elements except $TI$.

Generally speaking, two types of probabilistic polynomial-time (PPT) bounded adversaries are considered in the research of Secure Multiparty Computation (SMC): semi-honest (passive) and malicious (active). A semi-honest party is assumed to follow the protocol exactly as prescribed, except that it analyzes the records of intermediate computations. A malicious party can arbitrarily deviate from the protocol. Theoretically, if the adversary controls $N/2$ or more parties, a robust protocol cannot be achieved that tolerates early quitting of the malicious parties. More details on the semi-honest and malicious models can be found in other works (Yao 1982, Goldreich et al. 1987, Goldreich 2004). In this paper, we assume that the adversary corrupts a set of fewer than $N/2$ parties before the start, and maliciously controls this fixed set during the execution. The adversary we consider is also called non-adaptive. There are also adaptive adversaries that can select the parties they control as the execution proceeds, but they are not considered in this paper.

We also assume a physical broadcast channel exists for all parties, where there is a public board whose content changes can be publicly tracked. In practice this broadcast channel can be implemented by Authenticated Byzantine Agreement in a point-to-point network. More details can be found in the works of Lamport et al. (1982) and Dolev et al. (1983).

A PPSI protocol was proposed by Kissner et al. (2005) by constructing a randomized polynomial $Y$ whose roots set contains the intersection set $TI$. In this paper, we construct a simplified $Y$, while still keeping private the relationship among $Y$ and all $T_i$. Specifically we construct the encrypted $Y$ by a cryptosystem which is semantically secure under the Subgroup Decision Assumption (SDA). Cryptosystems based on SDA generally require a limited plaintext space because after the decryption there is still one discrete log computation to recover the plaintext, as found in other works (Yamamura et al. 2001, Boneh et al. 2005). However, in this paper we show how they can be applied to a plaintext space as large as the order of the subgroup on which the cryptosystems are constructed. A few candidate cryptosystems by Yamamura et al. (2001) can be considered for our protocol, but we choose the Boneh-Goh-Nissim (BGN) cryptosystem from Boneh et al. (2005), because it is based on the bilinear map by which we can construct efficient non-interactive zero-knowledge (NIZK) proofs.

Our PPSI protocol based on the NIZK proofs has $O(NS^2\tau^3)$ computation complexity, and $O(NS^2\tau)$ communication complexity.
The PPSI protocol by Kissner et al. (2006) needs $O(N^2S^2\tau^3)$ computation complexity and $O(N^2S^2\tau)$ communication complexity. Though they have stated that the communication complexity may be reduced to $O(N^2S\tau)$ by applying a cut-and-choose technique, it has not been shown how this can be achieved and what the reduced computation complexity is.

The remainder of the paper is organized as follows: Section 2 discusses some related work. Section 3 formally defines the problem of PPSI. Section 4 lists the basic tools for our protocol. Section 5 constructs the non-interactive proofs required in our protocol. Section 6 proposes the PPSI protocol for the malicious model. In Section 7 we analyze the computation and communication costs of our protocol. Section 8 concludes the whole paper.

2 Related Work

A solution for the multi-party case of PPSI was first proposed by Freedman et al. (2004). The solution is based on evaluating polynomials representing elements in the sets. Kissner et al. (2005) proposed a more efficient solution for PPSI, in which the polynomial representing each set is multiplied by a random polynomial which has the same degree as the former polynomial. The degree of the random polynomial was optimized by Sang et al. (2006). All these protocols are based on the semi-honest model.

Kissner et al. (2006) extended their PPSI protocol (Kissner et al. 2005) to the malicious model using zero knowledge proofs. The main idea of their protocol is constructing a polynomial $Y = \sum_{l=1}^{N} (f_l * \sum_{i=1}^{N} r_{i,l})$, where $f_l$ is a polynomial having $P_l$'s dataset $T_l$ as its roots set, and $r_{i,l}$ is a uniformly selected polynomial with the same degree as $f_l$. Specifically each $P_i$ computes and broadcasts $y_i = f_1 * r_{i,1} + ... + f_N * r_{i,N}$, then the parties sum all $y_i$ to get $Y$. With overwhelming probability, the roots set of $Y$ equals $TI$. The major cost of their protocol is computing the $N^2$ polynomial multiplications $E(f_l * r_{i,l})$, given $E(f_l)$ and $r_{i,l}$, and then proving the correctness of each multiplication. Suppose $f_l = \sum_{k=0}^{S} a_{l,k} x^k$, $E(f_l) = \{E(a_{l,k}) | k = 0, ..., S\}$, $r_{i,l} = \sum_{k=0}^{S} b_{i,l,k} x^k$, and $E(\cdot)$ is an additive homomorphic encryption scheme; then each coefficient of $E(f_l * r_{i,l})$ can be computed by the corresponding coefficient-multiplications of $E(f_l)$ and $r_{i,l}$. In total $(S+1)^2$ coefficient-multiplications are required, and each of them needs one modular exponentiation. The proof of correct coefficient-multiplication is proposed by Cramer et al. (2001), and is of constant size. Thus the protocol of Kissner et al. (2006) is of $O(N^2S^2)$ size, i.e. the computation complexity is $O(N^2S^2\tau^3)$ ($\tau$ is the length of each element, $O(\tau^3)$ is the complexity of one modular exponentiation), and the communication complexity is $O(N^2S^2\tau)$ bits.

Kissner et al. (2006) also proposed that a cut-and-choose technique may be used to simplify the proof of correct $E(f_l * r_{i,l})$, but cut-and-choose may compromise the correctness of the proof. For example, in a simplified proof the prover can first broadcast the $2S+1$ coefficients of $E(f_l * r_{i,l})$, the verifiers randomly select a coefficient (by means of selecting a random common challenge from Cramer et al. (2001)), then the prover proves the multiplications inside the selected coefficient are all correct. This proof is of $O(S)$ size, and reduces the communication complexity of the whole protocol to $O(N^2S\tau)$ bits, but will not reduce the computation complexity to $O(N^2S\tau^3)$. At least $O(NS^2\tau^3)$ computation cost is still required to compute the $N$ products $E(f_l * r_{i,l})$ on each party. What's more, in this simplified proof the probability of one coefficient not being chosen by the verifiers is $1 - \frac{1}{2S+1}$. When $S$ is large, this probability is not negligible. A wrong coefficient is then unlikely to be found, so a malicious party can multiply $r_{i,l}$ with some $E(f_l')$ other than $E(f_l)$; then $Y$ will not be correctly constructed, and its roots set will not equal $TI$. Some other techniques can be found to simplify the zero knowledge proof.

Sang et al. (2007) also extended their work (Sang et al. 2006) to get a PPSI protocol in the malicious model with $O(t^2S^2\tau^3)$ computation complexity, using interactive zero-knowledge proofs from Cramer et al. (2001), but there were still high-cost polynomial multiplications in this extension. The complexity will be ideal if there are no polynomial multiplications in constructing $Y$, i.e., $Y = \sum_{l=1}^{N} (f_l * R_l)$ in which $R_l$ is a random number. However, a malicious party knowing the coefficients of this $Y$ will gain an advantage in guessing the coefficients of $f_l$ from an honest party. In this paper we construct this $Y$ in its encrypted form, and evaluate $E(Y)$ without leaking its coefficients.

Hohenberger et al. (2006) proposed a solution to two-party set disjointness testing, the security of which is based on the subgroup decision assumption, the same assumption as in this paper. They also extended their solution to solve the two-party set intersection problem, but soundness is violated in their extension, because a malicious party can reveal any element as an intersected element to the other party. To ensure soundness, a commonly-shared polynomial $Y$ can be constructed, and each party judges the intersection for himself by evaluating his elements in $Y$, as Kissner et al. (2006), Sang et al. (2007) and this paper have done.

3 Problem Definition

3.1 Assumptions and Major Notations

Suppose $\mathbb{T}$ is the domain of the inputs on all parties. We assume the size of $\mathbb{T}$, i.e. $|\mathbb{T}|$, is sufficiently large to prevent the dictionary attack. Specifically, we assume $|\mathbb{T}| \gg tS$. If $tS$ is comparable with $|\mathbb{T}|$, the adversary controlling $t$ parties may manipulate the $t$ datasets to cover $\mathbb{T}$, and then he can defraud the intersection of all honest parties' private inputs. We stress that this assumption is practical in the examples of the airline company and the business alliance, where the key of each record can be the customer's identity or passport number, and the domain of the key is much larger than $tS$.

We also assume the parties have negotiated an $S$ that is not sensitive private information for any of them. The parties also do not mind leaking the information that all parties have fewer than $S$ elements in their datasets. If a party has fewer than $S$ elements, he can add dummy elements like $|\mathbb{T}| + 1$ to reach the size $S$. The result of $|\mathbb{T}| + 1$ being in $TI$ leaks no information except that all parties have fewer than $S$ elements.

In Table 1, we define the major notations in this paper.

Table 1: Major Notations

Notation            Definition
$N$                 Total number of parties
$P_i$               The $i$-th party
$T_i$               The set or multiset on $P_i$
$S$                 Total number of elements on each party
$T_{i,j}$           The $j$-th element on $P_i$, $j = 1, ..., S$
$T$                 $(T_1, ..., T_N)$
$TI$                $T_1 \cap ... \cap T_N$
$t$                 Total number of colluded parties, $t < N/2$
$I$                 The index set of $t$ colluded parties, $\{i_1, ..., i_t\}$
$\bar{I}$           The index set of honest parties, $\{i_1, ..., i_{N-t}\}$
$\tau$              The security parameter, which can be 256
$r \in_R Z_n$       Uniformly select an element $r$ from a group $Z_n$

3.2 Definitions

In SMC, security against both types of adversaries (semi-honest and malicious) is argued by the computational indistinguishability of the views in the ideal model and the real model (as found in the works of Goldreich et al. (1987), Lindell (2003)).

Definition 1 (Computational indistinguishability) Suppose an ensemble $X = \{X_n\}$ is a sequence of random variables $X_n$ for $n \in \{1, ..., M\}$, ranging over strings of length $poly(n)$. Two ensembles $X = \{X_n\}$ and $Y = \{Y_n\}$ are computationally indistinguishable, denoted by $X \equiv^{c} Y$, if for every PPT algorithm $A$ and every $c > 0$, there exists an integer $K$ such that for all $n \ge K$, $|Pr[A(X_n) = 1] - Pr[A(Y_n) = 1]| < \frac{1}{n^c}$. $Pr[A(x) = 1]$ is the probability that $A$ outputs 1 on input $x$.

Definition 2 (Intersection Function f) The intersection function f is an $N$-ary function $(\{0,1\}^{\tau * S * N}) \to (\{0,1\})^{S * N}$, i.e., $f(T) = \{f_{ij}(T) | i = 1, ..., N, j = 1, ..., S\}$, where $f_{ij}(T) = 1$ if $T_{i,j} \in TI$, and $f_{ij}(T) = 0$ if $T_{i,j} \notin TI$.

Definition 3 (PPSI in the malicious model) Let $\Pi$ be an $N$-party protocol for computing f. Consider a pair $(I, A)$, where $A$ is a PPT algorithm representing an adversary in the real model, and $I = \{i_1, ..., i_t\}$ ($t < N/2$) is the index set of parties controlled by $A$. The joint execution of $\Pi$ under $(I, A)$ in the real model, denoted $REAL_{\Pi,I,A}(T)$, is defined as the output sequence of $A$ and the honest parties, resulting from their interaction in the execution of $\Pi$.

Let a pair $(I, B)$, where $B$ is a PPT algorithm, represent an adversary in the ideal model, where there is an available trusted party. The joint execution of f under $(I, B)$ in the ideal model, denoted $IDEAL_{f,I,B}(T)$, is defined as the output pair of $B$ and the honest parties in the ideal execution.

$\Pi$ is said to securely solve the problem of privacy preserving set intersection in the malicious model, if for every PPT algorithm $A$, there exists a PPT algorithm $B$, such that the views of $A$ and $B$ are computationally indistinguishable, i.e.,

$\{IDEAL_{f,I,B}(T)\} \equiv^{c} \{REAL_{\Pi,I,A}(T)\}$.    (1)

4 Basic Tools

4.1 Bilinear Groups

A bilinear group $G_1$ is a group which supports an admissible bilinear map $e : G_1 \times G_1 \to G_2$ in which $G_2$ has the same order as $G_1$. A bilinear map is said to be admissible if it satisfies the following properties:

- Bilinear: $\forall u, v \in G_1$, $\forall a, b \in Z$, $e(u^a, v^b) = e(u, v)^{ab}$.
- Non-degenerate: If $g$ is a generator of $G_1$, then $e(g, g)$ is also a generator of $G_2$.
- Computable: There is an efficient algorithm to compute $e(u, v)$ for any $u, v \in G_1$.

In this paper we use the bilinear group $G_1$ of composite order $n = q_1 q_2$ for two large primes $q_1$ and $q_2$, which is a subgroup of the additive group of points of an elliptic curve $E$ over $F_p$ ($p + 1$ is divisible by $n$). An admissible bilinear map $e$ for $G_1$ can be the modified Weil pairing or the Tate pairing over the curve (as found in the works of Boneh et al. (2003), Miller (2004)). $G_2$ is an order $n$ subgroup of the multiplicative group of a finite field $F^*_{p^2}$.

4.2 A Threshold Version of Boneh-Goh-Nissim Cryptosystem

We use the BGN cryptosystem (Boneh et al. 2005) based on the composite order bilinear group for its additive homomorphism and one-time multiplicative homomorphism. Boneh et al. (2005) used its threshold version for electronic election, but the details of the key generation and decryption were not given. Below we give these algorithms.

- Distributed Key Generation Algorithm $G$: Given a security parameter $\tau$, a key generation algorithm $G(\tau)$ can be run to get a tuple $(q_1, q_2, G_1, G_2, e)$. Specifically, $q_1$ and $q_2$ are two $\tau$-bit random primes. Let $n = q_1 q_2$. Find the smallest positive integer $l \in Z$ such that $p = ln - 1$ is prime and $p = 2 \bmod 3$. In practice, $\tau$ can be 256 to get $p$ of at least 512-bit length. The elliptic curve $y^2 = x^3 + 1$ defined over $F_p$ has $p + 1 = ln$ points in $F_p$. Then $G_1$ is the subgroup of order $n$ in the group of points on the curve. Let $G_2$ be the subgroup of $F^*_{p^2}$ of order $n$. The bilinear map $e : G_1 \times G_1 \to G_2$ is the modified Weil pairing on the curve. Select two random generators $g, u \in_R G_1$ and set $h = u^{q_2}$. Then $h$ is a random generator of the order $q_1$ subgroup of $G_1$.

  The public key is $(n, G_1, G_2, e, g, h)$. The secret key is $sk = q_1$. $sk$ is distributed among the $N$ parties as $sk_i$ for $i = 1, ..., N$ by Shamir's secret sharing scheme (Shamir 1979). The verification key $vk_i$ is also generated as $h^{sk_i}$. The public key and each share of the secret key can be generated by the technique of Frankel et al. (1998) and Pedersen (1991) without a trusted dealer. For simplicity we assume there is a trusted dealer that can run $G$ as an offline phase.

- Encryption Algorithm $E$: To encrypt a message $m \in \{0, ..., 2^{\tau-1}\}$, select $r \in_R Z_n$ and compute $C = E(m) = g^m h^r \in G_1$.

- Partial Decryption Algorithm $D_i$: To decrypt a ciphertext $C$, party $P_i$ computes and broadcasts $D_i(C) = C^{sk_i}$. Each party checks whether $e(C, vk_i) = e(D_i(C), h)$ to verify the correctness of $D_i(C)$. Because $e(C, vk_i) = e(g^m h^r, h^{sk_i}) = e(g, h)^{m \cdot sk_i} e(h, h)^{r \cdot sk_i}$, and $e(D_i(C), h) = e(g^{m \cdot sk_i} h^{r \cdot sk_i}, h) = e(g, h)^{m \cdot sk_i} e(h, h)^{r \cdot sk_i}$, interactive zero knowledge proofs are not needed here.

- Recovery Algorithm: If fewer than $t + 1$ parties pass the verification of partial decryptions, the algorithm fails. Otherwise, let $\mathcal{S}$ be a set of $t + 1$ verified parties, and let $\lambda_i$ ($i \in \mathcal{S}$) be the appropriate Lagrange coefficients; the decryption is $D(C) = \prod_{i \in \mathcal{S}} C^{\lambda_i sk_i} = C^{q_1} = g^{m q_1}$.

In our constructions we only need to know whether $m = 0$. For $m \in \{0, ..., 2^{\tau-1}\}$, $m = 0$ if and only if $D(C) = 1$. If $m \ne 0$, $D(C)$ is an element in the order $q_2$ subgroup of $G_1$. To know $m$, a practical way is to first compute the discrete log $m q_1$, then the greatest common divisor $GCD(m q_1, n)$ to get $q_1$. Though the latter computation can be solved by Euclid's algorithm within polynomial time, the former problem is known to be as hard as the Discrete Log (DL) problem over a finite field (as proved by Menezes et al. (1993)). Our construction also utilizes the intractability of recovering $m \ne 0$ from $D(C)$ to protect privacy.

The semantic security of BGN encryption is based on the hardness of the subgroup decision problem as proved by Boneh et al. (2005): Given $(n, G_1, G_2, e)$ generated by $G$ where $n = q_1 q_2$, and an element $x \in G_1$, output 1 if the order of $x$ is $q_1$ (i.e. $x$ is in an order $q_1$ subgroup of $G_1$), and output 0 otherwise. We assume there is no PPT algorithm that can solve the subgroup decision problem efficiently.

The BGN encryption supports additive homomorphism. Given $C_1 = E(m_1)$ and $C_2 = E(m_2)$, it is easy to compute $E(m_1 + m_2) = C_1 C_2$. It also supports one-time multiplicative homomorphism. Given $u = g^{\alpha}$, $h = g^{\alpha q_2}$ for a random $\alpha \in_R Z_n$, we have $e(g, h) = e(h, g) = e(g, g)^{\alpha q_2}$ and $e(h, h) = e(g, h)^{\alpha q_2}$, so $e(C_1, C_2) = e(g^{m_1} h^{r_1}, g^{m_2} h^{r_2}) = e(g, g)^{m_1 m_2} e(g, h)^{m_1 r_2 + m_2 r_1 + \alpha q_2 r_1 r_2}$. $e(g, g)$ is of order $n$, $e(g, h)$ is of order $q_1$, and $e(C_1, C_2)$ is an encryption of $m_1 m_2$ over $F^*_{p^2}$ whose decryption key is $q_1$.

4.3 Calculations on encrypted polynomials

In our protocol, we need to do some calculations on encrypted polynomials. For a polynomial $f(x) = \sum_{i=0}^{m} a_i x^i$, we use $E(f(x))$ to denote the sequence of encrypted coefficients $\{E(a_i) | i = 0, ..., m\}$. Given $E(f(x))$, where $E$ is an additive homomorphic encryption scheme, some computations can be made as follows:

1) Evaluate $E(f(x))$ at a value $v$: $E(f(v)) = E(a_m v^m + a_{m-1} v^{m-1} + ... + a_0) = E(a_m)^{v^m} E(a_{m-1})^{v^{m-1}} \cdots E(a_0)$.

2) Given a constant scalar $c$, compute $E(c \cdot f(x)) = \{E(a_m)^c, ..., E(a_0)^c\}$.

3) Given $E(g(x))$ where $g(x) = \sum_{j=0}^{m} b_j x^j$, compute $E(f(x) + g(x)) = \{E(a_m)E(b_m), ..., E(a_0)E(b_0)\}$.

5 Non-interactive Proofs for the PPSI Protocol

5.1 Main Idea of the PPSI Protocol in the semi-honest model

Briefly, our protocol for PPSI is based on evaluating a randomized polynomial $Y$ whose roots set contains the intersection.

1) Each $P_i$ computes $f_i = (x - T_{i,1}) \cdots (x - T_{i,S}) = \sum_{k=0}^{S} c_{i,k} x^k$, and broadcasts $E(c_{i,k})$ for $k = 0, ..., S$.

2) For $i = 0, ..., N-1$,
   2.1) $P_i$ selects $R_{i,l} \in_R Z_n$ and computes $E(R_{i,l} * f_l)$ for $l = i, ..., i + t \bmod N$.
   2.2) $P_i$ sums all $E(R_{i,l} * f_l)$ to get $E(y_i) = E(R_{i,i} * f_i + ... + R_{i,(i+t \bmod N)} * f_{i+t \bmod N})$, and broadcasts $E(y_i)$.

3) Each $P_i$ sums all $E(y_i)$ to get $E(Y) = E((R_{N-t,0} + ... + R_{N-1,0} + R_{0,0}) * f_0 + ... + (R_{N-t-1,N-1} + ... + R_{N-1,N-1}) * f_{N-1}) = E(\sum_{l=0}^{N-1} (\sum_{i'=(l-t \bmod N)}^{l} R_{i',l}) * f_l)$.

4) Each $P_i$ evaluates $T_{i,j}$ in $E(Y)$ for $j = 1, ..., S$. $P_i$ decrypts $E(Y(T_{i,j}))$ with the help of $t$ other parties. If the decryption is 1, $T_{i,j} \in TI$; otherwise, it determines $T_{i,j} \notin TI$.

Since $\sum_{i'=(l-t \bmod N)}^{l} R_{i',l}$ in $Y$ is generated by $t + 1$ parties, it is always a random number that cannot be manipulated by the adversary.

An encryption scheme based on the subgroup decision problem, such as the BGN encryption, is necessary for the security of the protocol. If we used some other encryption scheme in which $D(E(Y(T_{i,j}))) = Y(T_{i,j})$, then an adversary controlling $t$ ($t < N/2$) parties could compute the coefficients of $Y$ by Lagrange interpolation, with the $S + 1$ different evaluations he can get. From the coefficients of $Y$, the adversary would gain some advantage in guessing the distribution of an honest party's inputs. However, with BGN encryption, the adversary can get only $|TI|$ evaluations, thus he cannot learn any of $Y$'s coefficients.

What's more, a malicious (active) adversary controlling $t$ parties may mount attacks on the protocol. In step 1), after receiving $E(f_i)$ from an honest party $P_i$, he can substitute his inputs with $f_i$ by just broadcasting $E(f_i)$ to others. He can also broadcast zero polynomials; then the roots set of $Y$ will be the intersection of all honest parties, and he can test this intersection in the evaluations. In step 2.2), he can broadcast an arbitrary $E(y_i)$; then the outputs of the protocol will also be arbitrary, because the roots set of $Y$ will have no certain relationship with the original inputs. In step 4) he can evaluate some values on $E(f_i)$ from an honest $P_i$ instead of on $E(Y)$, ask for the decryptions of them, and then test the inputs of $P_i$. We construct non-interactive zero knowledge (NIZK) proofs to prevent these attacks.

5.2 Proof of Correct Multiplication

Suppose the prover and verifier have the common input $a_1 = E(m) = g^m h^r$, where the prover does not know the plaintext $m$, and the prover is required to prove he does a correct multiplication $a_2 = E(mR)$. This Proof of Correct Multiplication (POCM) can be denoted as $POCM\{R, s_1 | a_1 = g^m h^r, a_2 = g^{mR} h^{rR+s_1}\}$, which means the prover should prove knowing $R, s_1$ such that $a_2 = a_1^R h^{s_1}$. POCM can be based on the bilinear map $e$ as follows:

1) The prover generates $R, s_1, x, s_2 \in_R Z_n$, computes $a_2 = a_1^R h^{s_1} = g^{mR} h^{rR+s_1}$, $a_3 = a_1^x h^{s_2} = g^{mx} h^{rx+s_2}$, $a_4 = H(a_1, a_2, a_3, pid, sid)$, $z_1 = x + a_4 R \bmod n$, $z_2 = s_2 + a_4 s_1 \bmod n$, then broadcasts $a_2$, $a_3$, $z_1$ and $z_2$. $H(\cdot)$ is a one-way hash function (e.g. SHA-2), $pid$ is the unique identity of the prover, and $sid$ is the unique identity of the current session to which the POCM belongs.

2) The verifier computes $a' = H(a_1, a_2, a_3, pid, sid)$, outputs '1' if $e(a_1, g^{z_1}) e(g, h^{z_2}) = e(a_2, g^{a'}) e(a_3, g)$, and outputs '0' otherwise.

The correctness of POCM is easy to verify. $e(a_1, g^{z_1}) = e(g^m h^r, g^{x + a_4 R}) = e(g, g)^{mx + m a_4 R} e(g, h)^{rx + r a_4 R}$, $e(g, h^{z_2}) = e(g, h)^{s_2 + a_4 s_1}$, $e(a_2, g^{a'}) = e(g, g)^{mRa'} e(g, h)^{rRa' + s_1 a'}$, and $e(a_3, g) = e(g, g)^{mx} e(g, h)^{rx + s_2}$. Because $a' = a_4$, it is easy to see that $e(a_1, g^{z_1}) e(g, h^{z_2}) = e(a_2, g^{a'}) e(a_3, g)$.

The zero-knowledge property can be based on the subgroup decision problem of the BGN cryptosystem and the random oracle $H(\cdot)$.
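The algebra of the POCM exchange in Section 5.2 can be checked numerically with a small sketch. It is only an illustration under loud assumptions: every group element is stored as its discrete logarithm to the base $g$, so the "pairing" is plain multiplication of logs modulo $n$. This toy model has no security at all and is not the elliptic-curve setting of the paper. The primes `Q1`, `Q2`, the constant `ALPHA`, and all function names are hypothetical choices made for the example.

```python
import hashlib
import secrets

# Toy parameters (assumption: tiny primes, for illustration only).
Q1, Q2 = 1009, 1013
N = Q1 * Q2

# Elements of G1 are stored as their discrete log to the base g, and
# elements of G2 as their log to the base e(g, g). Insecure by design.
G = 1                          # g = g^1
ALPHA = 7                      # hypothetical: u = g^ALPHA
H = (ALPHA * Q2) % N           # h = u^q2 = g^(ALPHA * q2)


def mul(a, b):
    """Group multiplication: logs add."""
    return (a + b) % N


def power(a, k):
    """Exponentiation by the scalar k: the log is scaled."""
    return (a * k) % N


def pair(a, b):
    """e(g^a, g^b) = e(g, g)^(a*b): logs multiply."""
    return (a * b) % N


def encrypt(m, r):
    """E(m) = g^m h^r."""
    return mul(power(G, m), power(H, r))


def challenge(*parts):
    """Stand-in for the hash H(.), instantiated with SHA-256."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def pocm_prove(a1, R, pid, sid):
    """Step 1: prover outputs (a2, a3, z1, z2) for a2 = a1^R h^s1."""
    s1, x, s2 = (secrets.randbelow(N) for _ in range(3))
    a2 = mul(power(a1, R), power(H, s1))
    a3 = mul(power(a1, x), power(H, s2))
    a4 = challenge(a1, a2, a3, pid, sid)
    z1 = (x + a4 * R) % N
    z2 = (s2 + a4 * s1) % N
    return a2, a3, z1, z2


def pocm_verify(a1, proof, pid, sid):
    """Step 2: check e(a1, g^z1) e(g, h^z2) == e(a2, g^a') e(a3, g)."""
    a2, a3, z1, z2 = proof
    a = challenge(a1, a2, a3, pid, sid)
    lhs = mul(pair(a1, power(G, z1)), pair(G, power(H, z2)))
    rhs = mul(pair(a2, power(G, a)), pair(a3, G))
    return lhs == rhs
```

In this model, raising a ciphertext to `Q1` removes the `h` component, which mirrors the recovery step $D(C) = C^{q_1} = g^{m q_1}$ of Section 4.2 and lets one confirm that $a_2$ really encrypts $mR$.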
A simulator can randomly 3) Then the prover proves he correctly computes select m from a given domain, select r, R, s1 , x, s2 ∈R S E(Y (v)) = E( i=0 ci v i ): Zn , and compute a1 = g m hr , a2 = g mR hrR+s1 , a3 = g mx hrx+s2 , a4 = H(a1 , a2 , a3 , pid, sid), z1 = x + Ra4 3.1) The prover generates R ∈R Zn , mod n, z2 = s2 + s1 a4 mod n. Because of the sub- computes c =P E(YP (v)) = S vi S i S i group decision problem, distinguishing the distribu- h R i=0 (E(ci )) =g i=0 ci v h i=0 γi v +R , tions (a1 , a2 , a3 ) from (a1 , a2 , a3 ) in the real execution S of POCM is computationally hard. Then distinguish- d = g −R i=1 (E(ci ))ri = PS PS −R+ i=1 ci ri i=1 γi ri , ing a4 from a4 is computationally hard in a random g h and broadcasts oracle H(·), and it is also hard to distinguish (z1 , z2 ) c, d. from (z1 , z2 ) because x, R, s1 , s2 in them are all uni- 3.2) The veriﬁer checks whether e(c, g)e(d, h) = formly selected. Therefore, we can say that the views S of the simulator and the adversary-controlled veriﬁer e(E(c0 ), g) i=1 e(E(ci ), ai ). in POCM are computationally indistinguishable. The correctness and zero-knowledge property of In POCM the prover computes 4 exps (i.e. modu- POCPE are also easy to get since POCPE is com- lar exponentiations in G2 or point multiplications in posed of POKP, POCM. In sum the prover computes G1 ), 4 multis (i.e. modular multiplications in G2 or 6S + 2 exps, 4S + 3 multis and one hash value, broad- point addition in G1 ), and one hash value. The veri- casts 2S + 3 messages. The veriﬁer computes 3 exps, ﬁer computes 3 exps, 2 multis, one hash value and 3 2S + 2 multis, one hash value and 2S + 3 pairings. pairings. 4 messages need to be broadcasted. 
5.3 Proof of Knowing Plaintext 6 The PPSI Protocol in the Malicious Model The Proof of Knowing Plaintext (POKP) can be de- We add zero knowledge proofs into the PPSI protocol noted as P OKP {m, r|a1 = g m hr }, which means the for the semi-honest model to get a protocol in the prover should prove that he knows m, r such that malicious model. A malicious adversary will be forced a1 = E(m) = g m hr . POKP can be based on the to behave in the semi-honest manners, otherwise he bilinear map e as following: will be found cheating by these proofs. In Figure 1, we give the PPSI protocol for the malicious model. 1) The prover generates x, s ∈R Zn , computes a2 = As analyzed in Section 5.1, the malicious behaviors g x hs , a3 = H(a1 , a2 , pid, sid), z1 = x + a3 m should be prevented as following: mod n, z2 = s + a3 r mod n, then broadcasts a2 , z1 , z2 . 1) To prevent the adversary from generating zero polynomials, notice that ci,S , the leading coeﬃ- 2) The veriﬁer computes a = H(a1 , a2 , pid, sid), cient of fi , should always be 1, so in step 2.2) of checks whether e(g z1 hz2 , g) = e(a1 , g a )e(a2 , g), Figure 1, each party will set the leading coeﬃ- outputs ‘1’ if it is the case, and outputs ‘0’ oth- cient E(ci,S ) = E(1) by themselves, then E(fi ) erwise. from Pi will have at least one nonzero coeﬃcient, the roots set of Y will not equal the intersection POKP can be treated as a special case of POCM, of the honest parties. i.e. given a common input E(1), the prover should prove that he knows m such that a1 = E(1 · m). 2) To prevent the adversary from replacing his in- Thus the correctness and zero-knowledge property of puts by an honest party, ideally each party POKP are also easy to get. Including the computa- should run S times of POKP on its encrypted tion of a1 the prover computes 4 exps, 4 multis, one coeﬃcients in step 1) of Figure 1. However, one hash value, broadcasts 3 messages. 
The veriﬁer com- POKP for any encrypted coeﬃcient other than putes one hash value, 3 exps, 2 multis and 3 pairings. the leading one (say, POKP for E(ci,S−1 )) suf- ﬁces to prove the party generates its polyno- mial independently. The adversary may sub- 5.4 Proof of Correct Polynomial Evaluation stitute his other coeﬃcients (say, E(ci,S−2 ), ..., Suppose E(Y ) = {E(ci )|i = 0, ..., S} for polynomial E(ci,0 )) with the coeﬃcients received from an S Y = i=0 ci xi is the common input, the prover is re- honest party, but any of his substitution will gen- quired to prove that he correctly evaluates a value v erate a polynomial whose roots are not known by on E(Y ). This Proof of Correct Polynomial Evalua- him. tion (POCPE) can be denoted as P OCP E{v|∀i = 3) To prevent the adversary from broadcasting ar- S 0, ..., S, E(ci ) = g ci hγi , E(Y (v)) = E( i=0 ci v i )}. bitrary encryptions in step 2.1) of Figure 1, each POCPE can be constructed based on POKP, and party should run POCM to prove the multiplica- POCM: tions are correct. 1) The prover proves knowing the plaintext of E(v) 4) To prevent the adversary from evaluating E(fi ) by P OKP {v, r1 |a1 = g v hr1 }. from an honest Pi in step 4.1) of Figure 1, each party should prove the evaluations are correct by 2) For i = 2, ..., S, given a1 = g v hr1 and ai−1 = POCPE. i−1 g v hri−1 , the prover proves knowing some v and In Figure 1, if there is some party removed from i ri ∈R Zn such that ai = g v hri . This proof is a the veriﬁed list in Step 1), only the intersection of the simpliﬁed POCM as following: remaining parties’ datasets will be computed. For the party removed in Step 2), any remaining party can be 2.1) The prover generates ri ∈R Zn , com- elected to act his part in the protocol. In Step 3), a i putes ai = g v hri , bi = g −ri +vri−1 ar1 = i−1 removed party will not know the intersection. At least i−1 g −ri +vri−1 +v r1 hr1 ri−1 , and broadcasts there are t+1 parties left in the ﬁnal decryption, from ai , bi . 
which they know the ﬁnal intersection. The protocol can also be applicable when N ≥ 2 and t ≥ N/2, but 2.2) The veriﬁer checks whether e(a1 , ai−1 ) = if there are less than t+1 parties remained in the ﬁnal e(ai , g)e(bi , h). decryption, they can not get the ﬁnal intersection. Inputs: There are N parties, t of them may collude in malicious manners. Each party has a private set of S elements, denoted Ti . Each party holds the public key and its own share of the secret key for the (t + 1, N )-threshold BGN cryptosystem. All party have a common session identiﬁcation number sid for running the current protocol in the computing system. Each party has a unique identity pid. Output: Each party Pi knows T I = T0 ∩ ... ∩ TN −1 . Steps: S k 1) Computing E(fi ): For i = 0, ..., N − 1, Pi computes fi = (x − Ti,1 ) · · · (x − Ti,S ) = k=0 ci,k x , encrypts ci,k for k = 0, ..., S −1, broadcasts all the encrypted values, and then runs P OKP {ci,S−1 : E(ci,S−1 )} to prove he knows the plaintext of E(ci,S−1 ). 2) Computing E(yi ): For i = 0, ..., N − 1, 2.1) Pi selects Ri,l ∈R Zn , computes E(Ri,l ∗ fl ) for l = i, ..., i + t mod N , and then broadcasts the coeﬃcient E(Ri,l cl,k ) for k = 0, ..., S − 1, runs P OCM {Ri,l : E(Ri,l cl,k )} to prove he does the correct multiplication. 2.2) each party sets cl,S = 1, E(Ri,l cl,S ) = E(Ri,l ), sums E(Ri,l ∗ fl ) for l = i, ..., i + t mod N , to get E(yi ) = E(Ri,i ∗ fi + ... + Ri,(i+t mod N ) ∗ fi+t mod N ). 3) Each Pi sums all E(yi ) to get E(Y ) = E( (RN −t,0 + ... + RN −1,0 + R0,0 ) ∗ f0 + ... + (RN −t−1,N −1 + S k ... + RN −1,N −1 ) ∗ fN −1 ) = E( k=0 βk x ). 4) Decryption and Evaluation: For i = 0, ..., N − 1, k 4.1) For j = 1, ..., S, Pi evaluates Ti,j in E(Y ) to get E(βk Ti,j ) for k = 1, ..., S, then runs k P OCP E{Ti,j : E(βk Ti,j ), k = 1, ..., S} to prove the correctness of the evaluation. S k 4.2) For j = 1, ..., S, all party of the t+1 quorum including Pi compute E(Y (Ti,j )) = k=0 E(βk Ti,j ). 
$P_i$ decrypts $E(Y(T_{i,j}))$ with the help of the other t parties. If the decryption is 1, $T_{i,j} \in TI$; otherwise, it determines $T_{i,j} \notin TI$.

Figure 1: Protocol for Privacy Preserving Set Intersection in the Malicious Model

We show below briefly the correctness and security of our protocol, though these claims can also be proved mathematically.

Correctness: If $T_{i,j} \in TI$, then $f_l(T_{i,j}) = 0$ for $l = 0, ..., N-1$. Then the evaluation $Y(T_{i,j}) = 0$ and $D(E(Y(T_{i,j}))) = g^{0 \cdot q_1} = 1$. If $T_{i,j} \notin TI$, then there exists $f_l(T_{i,j}) \neq 0$ for some $l \in \{0, ..., N-1\}$. Then $Y(T_{i,j}) = \sum_{l} (\sum_{i'=(l-t \bmod N)}^{l} R_{i',l}) * f_l(T_{i,j})$ and $D(E(Y(T_{i,j}))) = g^{q_1 Y(T_{i,j})}$, whose order is $q_2$. Because each $R_{i',l} \in_R Z_n$, overwhelmingly $Y(T_{i,j}) \neq 0$ and $D(E(Y(T_{i,j}))) \neq 1$.

Security: Suppose A is the adversary in the real execution of the PPSI protocol in Figure 1 which controls parties $P_I = \{P_{i_1}, ..., P_{i_t} \mid t < N/2\}$, and B is the adversary which controls the same parties in the ideal execution, assuming there is a trusted party T. Based on the zero-knowledge properties of POCM, POKP and POCPE, B's views in the simulations of these proofs are computationally indistinguishable from A's views in the real execution of them. The zero-knowledge properties of our NIZK proofs are based on the basic assumptions of the random oracle model and the subgroup decision and discrete logarithm problems, so by the same assumptions, the views of A and B are computationally indistinguishable from each other, which accounts for the security of our PPSI protocol as defined in Definition 3.

7 Complexity of the PPSI Protocol

The computation complexity of our protocol is mainly due to the encryptions, exponentiations, bilinear maps and multiplications. One BGN encryption needs 2 exponentiations. One exponentiation in $G_2$ with a τ-bit exponent (or point multiplication in $G_1$) requires $O(\tau^3)$ computation. One bilinear map, such as the Weil pairing computed by Miller's algorithm, has the same complexity as the exponentiation in $G_2$. One multiplication of two τ-bit elements in $G_2$ (or point addition in $G_1$) has a cost of $O(\tau^2)$. One element in $G_1$ has $O(\tau)$ bits.

Computation Complexity: We assume the N parties execute the protocol in parallel, and the computation cost of one party $P_i$ can be representative of the whole protocol's complexity. In Step 1), $P_i$ computes $E(f_i)$ by 2S exps, computes POKP by 2 exps, 3 multis and one hash value, and checks POKP by 3(N−1) exps, 3(N−1) bilinear maps, 2(N−1) multis and N−1 hash values. In Step 2.1), $P_i$ computes all POCM by 4(S+1)(t+1) exps, 4(S+1)(t+1) multis and (S+1)(t+1) hash values, and checks POCM by 3S(t+1)(N−1) exps, 3S(t+1)(N−1) bilinear maps, 2S(t+1)(N−1) multis and S(t+1)(N−1) hash values. In Step 2.2), $P_i$ computes all $y_i$ by (S+1)tN multiplications. In Step 3), $P_i$ computes Y by S(N−1) multiplications. In Step 4.1), for each evaluation of $T_{i,j}$, $P_i$ computes one POCPE, thus for S evaluations $P_i$ computes $6S^2 + 2S$ exps, $4S^2 + 3S$ multis and S hash values. $P_i$ also checks S(N−1) POCPE for other parties, which needs (2S+3)S(N−1) bilinear maps, 3S(N−1) exps, S(N−1) hash values and (2S+3)S(N−1) multis. In Step 4.2), $P_i$ computes S decryptions, which need S exps, 2tS bilinear maps, and tS multis. $P_i$ also computes (N−1)S exps for other parties' decryptions.

The complexity of one exp (or bilinear map) is $O(\tau)$ times that of one multiplication or hash value, so the complexity of all exps plus all bilinear maps can be representative of the whole protocol's complexity, as we have done in calculating Kissner et al. (2006)'s computation complexity. We also consider only the practical instance that $S \gg t$, i.e. $O(StN + NS^2) = O(NS^2)$. The major cost of our protocol is in Step 4.1), for the computation of $O(NS^2)$ bilinear maps. The whole protocol's computation complexity is $O(NS^2\tau^3)$.

Communication Complexity: In Step 1), N(S+3) elements are broadcast by the N parties. In Step 2.1), 4N(S+1)(t+1) elements are broadcast. In Step 4.1), NS(2S+3) elements are broadcast. In Step 4.2), NtS elements are broadcast. The major cost is in Step 4.1); the communication complexity is $O(NS^2)$ elements, or $O(NS^2\tau)$ bits.

8 Concluding Remarks

We proposed a more efficient privacy preserving set intersection protocol which improves a previous work by Kissner et al. (2006) by an O(N) factor in the computation and communication complexities, with the same level of correctness. Though the cut-and-choose technique may reduce the communication complexity of Kissner et al. (2006)'s protocol, the correctness may be compromised and the reduced computation complexity is the same as ours. We proved the security of our protocol in the malicious model, assuming an adversary actively controlling a fixed set of t (t < N/2) parties. Our construction is based on the BGN cryptosystem, which supports bilinear maps. Efficient NIZK proofs for correct multiplications, knowing the plaintext, and correct polynomial evaluation are also constructed. In the future we will utilize the bilinear group and NIZK proofs in other privacy preserving set operations.

References

Boneh, D. & Franklin, M. (2003), Identity-Based Encryption from the Weil Pairing, in ‘SIAM Journal of Computing’, Vol. 32(3), pp. 586–615.

Boneh, D., Goh, E. & Nissim, K. (2005), Evaluating 2-DNF Formulas on Ciphertexts, in ‘Proc. of TCC’05’, Vol. 3378, LNCS, pp. 325–341.

Barreto, P., Kim, H., Lynn, B. & Scott, M. (2002), Efficient Algorithms for Pairing-Based Cryptosystems, in ‘Proc. of CRYPTO’02’, Vol. 2442, LNCS, pp. 354–369.

Cramer, R., Damgard, I. & Nielsen, J. (2001), Multiparty Computation from Threshold Homomorphic Encryption, in ‘Advances in Cryptology - EUROCRYPT 2001’, Vol. 2045, LNCS, pp. 280–300.

Dolev, D. & Strong, H. (1983), Authenticated Algorithms for Byzantine Agreement, in ‘SIAM J. Comput’, Vol. 12(4), pp. 656–665.

Frankel, Y., MacKenzie, P. & Yung, M. (1998), Robust efficient distributed RSA-key generation, in ‘Proc. of the 17th annual ACM symposium on Principles of distributed computing’, ACM Press, pp. 320–330.

Freedman, M., Nissim, K. & Pinkas, B. (2004), Efficient Private Matching and Set Intersection, in ‘Proc. of Eurocrypt’04’, Vol. 3027, LNCS, pp. 1–19.

Goldreich, O. (2004), Foundations of Cryptography: Volume 2, Basic Applications, Cambridge University Press, 2004.

Goldreich, O., Micali, S. & Wigderson, A. (1987), How to Play Any Mental Game, in ‘Proc. of 19th STOC’, ACM Press, pp. 218–229.

Hohenberger, S. & Weis, S. (2006), Honest-Verifier Private Disjointness Testing Without Random Oracles, in ‘6th International Workshop of Privacy Enhancing Technologies (PET 2006)’, Vol. 4285, LNCS, pp. 277–294.

Kissner, L. & Song, D. (2005), Privacy-Preserving Set Operations, in ‘Advances in Cryptology - CRYPTO 2005’, Vol. 3621, LNCS, pp. 241–257.

Kissner, L. & Song, D. (2006), Privacy-Preserving Set Operations, in ‘Technical Report CMU-CS-05-113’, Carnegie Mellon University, June 2006.

Lamport, L., Shostak, R. & Pease, M. (1982), The Byzantine Generals Problem, in ‘ACM Trans. on Programming Languages and Systems’, Vol. 4(3), ACM Press, pp. 382–401.

Lindell, Y. (2003), Parallel Coin-Tossing and Constant-Round Secure Two-Party Computation, in ‘Journal of Cryptology’, Vol. 16(3), pp. 143–184.

Menezes, A., Vanstone, S. & Okamoto, T. (1993), Reducing elliptic curve logarithms to logarithms in a finite field, in ‘IEEE Trans. on Information Theory’, Vol. 39, pp. 1639–1646.

Miller, V. (2004), The Weil Pairing, and Its Efficient Calculation, in ‘Journal of Cryptology’, Vol. 17, pp. 235–261.

Pedersen, T. (1991), A Threshold Cryptosystem without a Trusted Party, in ‘Proc. of Eurocrypt 1991’, Vol. 547, LNCS, pp. 522–526.

Sang, Y., Shen, H., Tan, Y. & Xiong, N. (2006), Efficient Protocols for Privacy Preserving Matching Against Distributed Datasets, in ‘Proc. of the 8th International Conference on Information and Communications Security (ICICS ’06)’, Vol. 4307, LNCS, pp. 210–227.

Sang, Y. & Shen, H. (2007), Privacy Preserving Set Intersection Protocol Secure Against Malicious Behaviours, accepted by ‘The 8th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2007)’, Adelaide, Australia, Dec. 2007.

Shamir, A. (1979), How to Share a Secret, in ‘Communications of the ACM’, Vol. 22(11), ACM Press, pp. 612–613.

Yamamura, A. & Saito, T. (2001), Private Information Retrieval Based on the Subgroup Membership Problem, in ‘Australian Conference on Information Security and Privacy (ACISP’01)’, Vol. 2119, LNCS, pp. 206–220.

Yao, A. (1982), Protocols for Secure Computations, in ‘Proc. of the 23rd Annual IEEE Symposium on Foundations of Computer Science’, pp. 160–164.
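As a supplementary illustration of the correctness argument for Figure 1, the following is a minimal plaintext sketch of the polynomial idea only: each party's set becomes the roots of $f_i$, the randomized sum $Y$ is formed as in Steps 2) and 3), and an element is reported as common exactly when $Y$ vanishes on it. It is not the protocol itself: there is no encryption, no threshold decryption and no NIZK proof. The prime modulus `P` (the paper's plaintext space is $Z_n$ with composite $n$), the function names and the requirement that all sets have the same size S are assumptions made for the sketch.

```python
import random

P = 2**61 - 1  # assumed prime modulus standing in for the plaintext space


def poly_from_roots(roots):
    """Coefficients (lowest degree first) of prod (x - r) modulo P."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] = (nxt[k] - r * c) % P      # contribution of -r * c x^k
            nxt[k + 1] = (nxt[k + 1] + c) % P  # contribution of c x^(k+1)
        coeffs = nxt
    return coeffs


def evaluate(coeffs, x):
    """Horner evaluation of the polynomial at x modulo P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def combined_polynomial(sets, t, rng):
    """Y = sum_i sum_{l = i..i+t mod N} R_{i,l} * f_l, computed in the clear."""
    n = len(sets)
    polys = [poly_from_roots(s) for s in sets]  # all sets assumed to have size S
    y = [0] * len(polys[0])
    for i in range(n):
        for step in range(t + 1):
            l = (i + step) % n
            r = rng.randrange(1, P)  # random multiplier R_{i,l}
            for k, c in enumerate(polys[l]):
                y[k] = (y[k] + r * c) % P
    return y


def intersection(sets, t, seed=0):
    """Elements on which Y evaluates to zero; correct except with negligible probability."""
    rng = random.Random(seed)
    y = combined_polynomial(sets, t, rng)
    return {x for s in sets for x in s if evaluate(y, x) == 0}
```

For example, `intersection([{1, 2, 3}, {2, 3, 4}, {3, 5, 2}], t=1)` evaluates $Y$ on every input element and keeps those on which it is zero. A non-common element survives only if the random multipliers happen to cancel, which occurs with probability about 1/P.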
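As a supplementary check of the verification equation in step 2.2) of Section 5.4, the sketch below works purely on exponents: a group element $g^x h^y$ is represented by the pair (x, y), and a pairing value by the exponents of $e(g,g)$, $e(g,h)$ and $e(h,h)$. This symbolic model is an assumption made for illustration and is not a pairing implementation or a secure proof system; it only confirms that the algebra of $e(a_1, a_{i-1}) = e(a_i, g)\, e(b_i, h)$ balances. The modulus `N_MOD` and all function names are hypothetical.

```python
import random

N_MOD = 1000003 * 999983  # toy composite modulus standing in for n = q1 * q2


def elem(x, y):
    """Exponent pair (x, y) standing for the group element g^x h^y."""
    return (x % N_MOD, y % N_MOD)


def mul(a, b):
    """Group multiplication: exponents add."""
    return elem(a[0] + b[0], a[1] + b[1])


def power(a, k):
    """Group exponentiation: exponents scale by k."""
    return elem(a[0] * k, a[1] * k)


def pair(a, b):
    """Symbolic symmetric pairing: exponents of e(g,g), e(g,h), e(h,h)."""
    return ((a[0] * b[0]) % N_MOD,
            (a[0] * b[1] + a[1] * b[0]) % N_MOD,
            (a[1] * b[1]) % N_MOD)


def gt_mul(u, v):
    """Multiplication in the target group: exponents add componentwise."""
    return tuple((p + q) % N_MOD for p, q in zip(u, v))


G = elem(1, 0)
H = elem(0, 1)


def prove_powers(v, degree, rng):
    """Commitments a_1..a_S and helper values b_2..b_S from steps 1) and 2.1)."""
    r = [None, rng.randrange(N_MOD)]
    a = [None, elem(v, r[1])]  # a_1 = g^v h^{r_1}
    b = [None, None]
    for i in range(2, degree + 1):
        r.append(rng.randrange(N_MOD))
        a.append(elem(pow(v, i, N_MOD), r[i]))  # a_i = g^{v^i} h^{r_i}
        # b_i = g^{-r_i + v r_{i-1}} * a_{i-1}^{r_1}
        b.append(mul(elem(-r[i] + v * r[i - 1], 0), power(a[i - 1], r[1])))
    return a, b


def verify_powers(a, b):
    """Check e(a_1, a_{i-1}) == e(a_i, g) e(b_i, h) for i = 2..S."""
    return all(pair(a[1], a[i - 1]) == gt_mul(pair(a[i], G), pair(b[i], H))
               for i in range(2, len(a)))
```

Equality of the exponent triples implies equality in an actual bilinear group, so an honest transcript from `prove_powers` should pass `verify_powers`, while changing the committed power in any $a_i$ breaks the $e(g,g)$ component of the check.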