970 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 7, JULY 2009

                          Collusive Piracy Prevention in
                         P2P Content Delivery Networks
                        Xiaosong Lou, Student Member, IEEE, and Kai Hwang, Fellow, IEEE

       Abstract—Collusive piracy is the main source of intellectual property violations within the boundary of a P2P network. Paid clients
       (colluders) may illegally share copyrighted content files with unpaid clients (pirates). Such online piracy has hindered the use of open
       P2P networks for commercial content delivery. We propose a proactive content poisoning scheme to stop colluders and pirates from
       alleged copyright infringements in P2P file sharing. The basic idea is to detect pirates timely with identity-based signatures and time-
       stamped tokens. The scheme stops collusive piracy without hurting legitimate P2P clients by targeting poisoning on detected violators,
       exclusively. We developed a new peer authorization protocol (PAP) to distinguish pirates from legitimate clients. Detected pirates will
       receive poisoned chunks in their repeated attempts. Pirates are thus severely penalized with no chance to download successfully in
       tolerable time. Based on simulation results, we find 99.9 percent prevention rate in Gnutella, KaZaA, and Freenet. We achieved 85-
       98 percent prevention rate on eMule, eDonkey, Morpheus, etc. The scheme is shown less effective in protecting some poison-resilient
       networks like BitTorrent and Azureus. Our work opens up the low-cost P2P technology for copyrighted content delivery. The advantage
       lies mainly in minimum delivery cost, higher content availability, and copyright compliance in exploring P2P network resources.

       Index Terms—Peer-to-peer networks, content poisoning, copyright protection, network security.



The authors are with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089. E-mail: {xlou; kaihwang}@usc.edu.
Manuscript received 28 Sept. 2007; revised 8 Apr. 2008; accepted 10 Sept. 2008; published online 27 Jan. 2009.
Recommended for acceptance by X. Zhang.
For information on obtaining reprints of this article, please send e-mail to: tc@computer.org and reference IEEECS Log No TC-2007-09-0492.
Digital Object Identifier no. 10.1109/TC.2009.26.
0018-9340/09/$25.00 © 2009 IEEE. Published by the IEEE Computer Society.

Peer-to-peer (P2P) networks are most cost-effective in delivering large files to a massive number of users [3], [33]. Unfortunately, today's P2P networks are grossly abused by illegal distributions of music, games, video streams, and popular software. These abuses have not only resulted in heavy financial loss in the media and content industries, but also hindered the legal commercial use of P2P technology.

The main sources of illegal file sharing are peers who ignore copyright laws and collude with pirates. To solve this peer collusion problem, we propose a copyright-compliant system for legalized P2P content delivery. Our goal is to stop collusive piracy within the boundary of a P2P content delivery network. In particular, our scheme is suited to protecting large-scale perishable contents that diminish in value as time elapses.

Traditional content delivery networks (CDNs) [13], [17], [24] use a large number of surrogate content servers over many globally distributed WANs. The content distributors need to replicate or cache contents on many servers. The bandwidth and resources needed to maintain these CDNs are very expensive.

A P2P content network significantly reduces the distribution cost [27], since many content servers are eliminated and open networks are used. P2P networks improve the content availability, as any peer can serve as a content provider. P2P networks are inherently scalable, because more providers lead to faster content delivery.

We use identity-based signatures (IBS) [4] to secure file indexes. IBS offers a level of security similar to that of PKI-based signatures with much less overhead. We apply discriminatory content poisoning against pirates. We focus on protection of decentralized P2P content networks. Protecting centralized P2P networks like Napster is much simpler than the scheme we propose, because of centralized indexing.

Honest or legitimate clients are those that comply with the copyright law and do not share contents freely. Pirates are peers attempting to download content files without payment or authorization. Colluders are those paid clients who share the contents with pirates. Pirates and colluders coexist with the law-abiding clients.

Content poisoning is implemented by deliberate falsification of the file requested by a pirate. The media industry, backed by the Recording Industry Association of America (RIAA) and the Motion Picture Association of America (MPAA), has applied unscreened brute-force content poisoning to deter piracy in open P2P file-sharing networks. However, their prevention results are mixed and controversial.

Overpeer [23] ceased operation in 2005 because its universal poisoning of all peer clients was ineffective. MediaDefender [1] caused a heated controversy in the P2P user community when its operations disrupted many P2P services. The eDonkey overlay was ordered to shut down in 2006 [24] due to uncontrollable piracy activities. The popular BitTorrent and Freenet networks are still facing many lawsuits against their content distribution operations [22].

The media industry applies poisoning against all P2P file-sharing services by brute force. In contrast, our scheme detects unpaid pirates and uses discriminatory content poisoning to deter online piracy. Legitimate clients can still enjoy the flexibility and convenience provided by an open P2P network. Our scheme stops pirates from downloading
LOU AND HWANG: COLLUSIVE PIRACY PREVENTION IN P2P CONTENT DELIVERY NETWORKS                                                    971

TABLE 1. Parameters and Notations Used in Paper

TABLE 2. File Chunking, Hashing, Poisoning, and Download Policies in P2P Content Networks

copyrighted files, even in the presence of colluding peers. We use a reputation scheme to detect these colluders.

A copyright-protected P2P network should benefit both the media industry and Internet user communities [6]. Our work leads to the development of a new generation of CDNs based on P2P technology. Table 1 lists the important symbols and notations used in this paper. These terms are used to secure file indexes, generate access tokens, quantify poisoning effects, describe collusion prevention, and define the performance metrics.

We focus on finding a solution to collusive piracy within the scope of a P2P network. Internetwork piracy between unprotected networks is a much more complex security problem. Our main purpose is to stop colluders from releasing content files freely and to disrupt pirates' efforts to accumulate clean chunks. There are many other forms of online or offline piracy that are beyond the scope of this study.

For example, our protection scheme does not work on a private or enclosed network formed exclusively by pirate hosts. We did not solve the randomized piracy problems using email attachments, FTP download directly between colluders, or replicated CDs or DVDs. At present, these direct point-to-point copyright violation problems are mostly handled by digital rights management (DRM) techniques [21]; even so, the protection results are not considered satisfactory [10], as many hackers have posted DRM cracks on the Internet.

2   RELATED WORK AND OUR APPROACH
First, we review related work on copyrighted P2P content delivery. Then we identify our unique approach to solving the problem in P2P networks.

2.1 Related Work
A P2P network does not require many expensive servers to deliver contents. Instead, contents are distributed and shared among the peers. P2P networks improve on conventional CDNs in content availability and system scalability [3]. Many performance and security issues in P2P networks have been studied, such as in [5], [12], [30], [31], [32]. Digital content protection in P2P networks was also studied in [15], [29].

Electronic publishing was hindered by the rapid growth of copyright violations [14], [26]. The major source of illegal P2P content distribution lies in peer collusion to share copyrighted content with other peers or pirates. Kalker et al. [16] proposed a copyrighted music distribution system over a P2P network. However, the system is ineffective when colluders are undetected.

Digital watermarking is often considered for digital copyright protection [19]. A digital watermark is embedded in the content file so that when a pirated copy is discovered, authorities can find the origin of piracy via a unique watermark in each copy. In a P2P network, all peers are sharing exactly the same file (if not poisoned), which effectively defeats the purpose of watermarking. Thus, watermarking is not a suitable technology for P2P file sharing.

By subdividing a large file into small chunks, P2P content delivery allows a peer to download multiple chunks from different sources. File chunking increases the availability and shortens the download time. Based on the file chunking and hashing protocols built into popular open P2P networks, we classify them into three network families, as shown in Table 2.

P2P networks in the same family have some common features. They are variants or descendants of their predecessors: BitTorrent [2], Gnutella [11], and eMule [18], respectively. These families are primarily distinguished by the file chunking or hashing protocols applied. Presently, none of these P2P networks has been built with satisfactory support for copyright protection.

The BitTorrent family [2] applies the strongest hashing at individual piece or chunk level, which is most resistant to poisoning. The Gnutella family applies file-level hashing, which is easily poisoned. The eMule family [18] applies

part-level hashing with fixed chunking. Our analytical and experimental results show that the eMule family demonstrates a moderate level of resistance to content poisoning.

The ability to detect and identify poisoned chunks differs among the three P2P network families. The BitTorrent family keeps clean chunks and discards poisoned chunks. The Gnutella family must download the entire file before any poisoned chunks can be detected. Because of the part-level hashing, the eMule family will either keep or discard the entire part of 53 chunks.
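
The three detection granularities described above can be illustrated with a small sketch. It is not taken from any of the named clients: the helper names `group_hashes` and `surviving_chunks`, the toy 106-chunk file, and the use of SHA-1 for every family are assumptions made only for illustration. A group of chunks is kept when its digest matches a trusted reference digest, so one poisoned chunk costs the downloader one chunk, one 53-chunk part, or the whole file, depending on the hashing level.

```python
import hashlib

def group_hashes(chunks, group_size):
    """Reference digests over consecutive groups of `group_size` chunks."""
    return [
        hashlib.sha1(b"".join(chunks[i:i + group_size])).hexdigest()
        for i in range(0, len(chunks), group_size)
    ]

def surviving_chunks(received, reference, group_size):
    """Indices of chunks a downloader can keep.

    A group is kept only if its digest matches the reference digest;
    otherwise every chunk in that group is discarded.
    """
    kept = []
    for g, i in enumerate(range(0, len(received), group_size)):
        group = received[i:i + group_size]
        if hashlib.sha1(b"".join(group)).hexdigest() == reference[g]:
            kept.extend(range(i, i + len(group)))
    return kept

# Toy file (assumed sizes): 106 chunks, i.e. two parts of 53 chunks each.
clean = [bytes([i % 256]) * 16 for i in range(106)]
received = list(clean)
received[60] = b"poison" * 3  # a single poisoned chunk inside the second part

results = {}
for name, size in [("chunk-level", 1), ("part-level", 53), ("file-level", len(clean))]:
    reference = group_hashes(clean, size)
    results[name] = len(surviving_chunks(received, reference, size))

print(results)  # {'chunk-level': 105, 'part-level': 53, 'file-level': 0}
```

With chunk-level hashing only the poisoned chunk is lost, with part-level hashing the whole second part is lost, and with file-level hashing nothing survives, which matches the ordering of poison resistance stated in the text.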

2.2 Our Approach and Contributions
Content poisoning is often treated as a security threat to P2P networks [7], [9]. To the best of our knowledge, using selective content poisoning to prevent collusive piracy has not been explored in the past. We offer the very first proactive poisoning approach to curtailing copyright violation in P2P networks. We make the following specific contributions towards P2P content delivery.

2.2.1 Distributed Detection of Colluders and Pirates
We develop a protocol that identifies a peer by its endpoint address. The file index format is changed to incorporate a digital signature based on this identity. A peer authentication protocol is developed to establish the legitimacy of a peer when it downloads and uploads the file. Using IBS, our system enables each peer to identify unauthorized peers or pirates without the need for communication with a central authority.

2.2.2 Proactive Content Poisoning of Detected Pirates
Our protocol requires sending poisoned chunks to any detected pirate requesting a protected file. If all clients simply deny download requests without poisoning, the pirates can still accumulate clean chunks from colluders that are willing to share. With poisoning, pirates are forced to discard even the clean chunks received. This will prolong their download time to a level beyond any practical limit. Experiments show that it is unlikely that a pirate can download a clean copy of the file.

2.2.3 Containment of Peer Collusion to Stage Piracy
Our system differs from any existing P2P copyright protection scheme in that we recognize that peer collusion is inevitable: a paid customer may intentionally collude with pirates; a pirate may also hack into client hosts and turn them into unwilling colluders. Our system is designed so that even with a large number of colluders, a pirate will still suffer from an intolerably long download time. We also present a random collusion detection mechanism to further enhance our system.

2.2.4 Trusted P2P Platform for Copyrighted Content Delivery
Hardware investment for P2P content delivery is much lower than that required in any existing CDN. Our system uses only a few distribution agents to serve a large number of clients. The system is highly scalable, robust to peer and link failures, and easily deployed in Gnutella, KaZaA, eMule networks, etc. All claimed advantages are backed by performance analysis and simulation results.

Fig. 1. A secured P2P platform for copyright-protected content delivery. This open network is accessed by a large number of paid clients, some colluders or pirates, and a few distribution agents. The system design prevents pirates from downloading copyrighted files from colluders.

3    COPYRIGHT-PROTECTED P2P NETWORKS
This section specifies the system architecture, client joining process, pirate poisoning mechanism, and colluder detection that we built into the newly proposed copyright-protection scheme for P2P content distribution in an open network environment.

3.1 Trusted P2P Network Architecture
Our copyright-protected P2P network is depicted conceptually in Fig. 1. The network is built over a large number of peers. Four types of peers coexist in the P2P network: clients (honest or legitimate peers), colluders (paid peers sharing contents with others without authorization), distribution agents (trusted peers operated by content owners for file distribution), and pirates (unpaid clients downloading content files illegally).

To join the system, clients submit their requests to a transaction server that handles purchasing and billing matters. A private key generator (PKG) is installed to generate private keys with IBS for securing communication among the peers. The PKG has a role similar to that of a certificate authority (CA) in PKI services. The difference lies in the fact that the CA generates the public/private key pairs, while the PKG only generates the private key.

The transaction server and PKG are only used initially when peers are joining the P2P network. With IBS, the communication between peers does not require an explicit public key, because the identity of each party is used as the public key. In our system, file distribution and copyright protection are completely distributed.

Based on past experience, the number of peers sharing or requesting the same file at any point of time is in the hundreds. Depending on the variation of the swarm size, only a handful of distribution agents are needed. For example, it is sufficient to use 10 PC-based distribution agents to handle a swarm size of 2,000 peers. These agents authorize peers to download and prevent unpaid peers from getting the same contents.

Paid clients, colluders, and pirates are all mixed up without visible labels. Our copyright-protection network is designed to distinguish them automatically. Each client is

assigned with a bootstrap agent, selected from one of the distribution agents, as its entry point. In current P2P networks, a peer can self-assert its username without verification. Therefore, we use the peer endpoint address (IP address + port number) instead of the username to identify a peer. A peer is considered fully connected if it is reachable via a listening port on its host.

We use the endpoint address of the listening port as a peer identity. For simplicity, we assume that each peer has a statically configured listening port. Currently, most P2P users connect to the Internet via a home network. In such environments, statically configuring the NAT device to forward incoming ports to a peer node is the norm. The constraint occurs when a large number of peers are behind a single NAT device.

Fig. 2 depicts an example: A peer has an IP address leased from its local router. It is listening to port 5678 forwarded by the router. When communicating with the bootstrap agent, the peer announces its listening port number. The bootstrap agent calls an Observe() subroutine, which verifies that the same peer is indeed reachable via the claimed port, although its public IP address is actually the one observed by the agent (68.59.33.62 in Fig. 2). Hence, the peer is identified by the endpoint address p = 68.59.33.62:5678 shown in Fig. 2.

Fig. 2. The bootstrap agent observes end-point address p = 68.59.33.62:5678 in a trust-enhanced P2P network.

The detail of Observe() is as follows: when a peer sends a message to its bootstrap agent through an outgoing port, the agent attaches a random number (nonce) to the reply. The agent then sends a message to the advertised listening port, asking the peer to send back the nonce. If the peer replies correctly, then its endpoint is verified.

The endpoint address is used as the peer's public key. There is no need to encrypt the file body. This reduces the system overhead. Enabling peers behind NAT without a static listening port requires a hole-punching mechanism. The system uses the bootstrap agent to forward the incoming requests. The identities of all agents, except the bootstrap agent, are hidden from clients. This prevents a malicious node from blacklisting or attacking the distribution agents.

3.2 Protection in Peer Joining Process
Fig. 3 illustrates the process for a client to join a P2P network supported by a new peer authorization protocol (PAP). We will formally specify PAP in Section 4.3. Here, we first introduce the handshaking mechanisms used to protect the peer joining process. For a peer to join the network, it first logs in to a transaction server to purchase the content. After the transaction, the client receives a digital receipt containing the content title, client ID, etc. This receipt is encrypted such that only the content owner and the distribution agent can decrypt it.

Fig. 3. The protected peer joining process for copyrighted P2P content delivery. Seven messages are used to secure the communications among four parties involved.

The client receives the address of the bootstrap agent as its point of contact. The joining client authenticates with the bootstrap agent using the digital receipt. The session key assigned by the transaction server secures their communication. Since the bootstrap agent is set up by the content owner, it decrypts the receipt and authenticates the client's identity. The bootstrap agent requests a private key from the PKG and constructs an authorization token accordingly.

Let k be the private key of the content owner and id be the identity of the content owner. We use $E_k(msg)$ to denote the encryption of a message with key k, and $S_k(msg)$ to denote a digital signature of plaintext msg with key k. The client is identified by user ID and the file by file ID.

Each legitimate peer has a valid token. The token is only valid for a short time so that a peer needs to refresh the token periodically. To ensure that peers do not share the content with pirates, the trusted P2P network modifies the file-index format to include a token and an IBS peer signature. Peers use this secured file index in inquiries and download requests. Seven messages are specified below to protect the peer joining process:

   Msg0: Content purchase request;
   Msg1: BootstrapAgentAddress, $E_k$(digital_receipt, BootstrapAgent_session_key);
   Msg2: Adding digital signature $E_k$(digital_receipt);
   Msg3: Authentication request with userID, fileID, $E_k$(digital_receipt);
   Msg4: Private key request with privateKeyRequest (observed peer address);
   Msg5: PKG replies with privateKey;
   Msg6: Assign the authentication token to the client.

Peers identify the pirates by checking the validity of the extra signatures in file indexes. The trusted P2P network applies this protection to share clean contents exclusively among the peers, and uses content poisoning techniques against the pirates. Tokens are time-stamped and need to be refreshed periodically. A colluder detected by our system cannot receive a new token after its current token expires.

3.3 Proactive Content Poisoning
We summarize in Table 3 the key protocols and mechanisms used to construct the trusted P2P system. In this

approach, the modified file index format enables pirate detection. PAP authorizes legitimate download privileges to clients. The content distributor applies content poisoning to disrupt illegal file distribution to unpaid clients. The system is enhanced by randomized collusion detection.

TABLE 3. Mechanisms for Copyright Protection

In our system, a content file must be downloaded fully to be useful. Such a restraint is easily achievable by compressing and encrypting the file with a trivial password that is known to every peer. This encryption does not offer any protection of the content, except to package the entire file for distribution.

Fig. 4 illustrates the proactive content poisoning mechanisms built into our enhanced P2P system. If a pirate sends a download request to a distribution agent or a client, then by protocol definition, it will receive poisoned file chunks. If the download request was sent to a colluder, then it will receive clean file chunks. If a pirate shares the file chunks with another pirate, then it could potentially spread the poison.

Fig. 4. Proactive poisoning mechanisms built in trusted P2P network, where clean chunks (white) and poisoned chunks (shaded) are mixed in file streams received by pirates, but legitimate clients receive only clean chunks.

Therefore, it is critical to send poisoned chunks to pirates, not simply deny their requests. Otherwise, even if all clients deny a pirate's requests, the pirate can still assemble a clean copy from those colluders who have responded with clean chunks.

With poisoning, we exploit the limited poison detection capability of P2P networks and force a pirate to discard the clean chunks downloaded with the poisoned chunks. The rationale behind such poisoning is that if a pirate keeps downloading corrupted files, the pirate will eventually give up the attempt out of frustration.

3.4 Randomized Colluder Detection
Although our system is designed to tolerate the presence of colluders in the network, we show in later sections that reducing the number of colluders will improve system performance. Therefore, we introduce a reputation-based [8] colluder detection mechanism to secure our system.

As reported in the GossipTrust paper [34], the gossip protocol and power nodes play a crucial role in speeding up the reputation aggregation process in a P2P network. Randomized gossiping can reach consensus among all peers in a distributed manner. This approach exploits massive concurrency among millions of active nodes in a very large P2P network. We design below a simplified GossipTrust system to identify colluders.

The idea is to associate each {peer, file} pair with a collusion rate. A rate of "0" means that the peer was never reported as a colluder. Otherwise, the peer receives a collusion report of "1," meaning that it has shared clean content with illegal download requesters. This collusion rate is cumulative, in the way eBay collects a peer's reputation scores. Fig. 5 illustrates the collusion detection process.

Fig. 5. Distribution agent randomly recruits some clients to probe suspected peers. Collusion is reported when a peer replies clean content to an illegal download request.

Distribution agents randomly recruit clients, called decoys, to send illegal download requests to suspected peers. If an illegal request is returned with a clean file chunk, the decoy reports the collusion event. Since the decoy is randomly chosen, there exists a risk that the report is not trustworthy, either by error or by cheating. Thus, we need a reputation system to screen the peers.

To choose honest decoys, we designed a lightweight reputation system. Consider a P2P network with n paid clients. We use a collusion vector $C = \{c_i\}$, where $0 \le c_i \le \varphi$ is the collusion rate of peer i. The collusion threshold $\varphi$ is used to bar detected colluders from getting new tokens.

When a current token expires, the colluder is labeled as a pirate with denied access to the file. We define a trust vector $T = \{t_i\}$, where $t_i = 1 - c_i/\varphi$ for all $1 \le i \le n$. When a decoy i probes a peer j for collusion, it sends j an illegal request and sends report $r_{ij}$ to the agent. The

condition $r_{ij} = 1$ holds when j replies with clean content. The collusion rate for peer j is computed by the following expression:

$$c_j = \min\{c_j + t_i \times r_{ij},\ \varphi\} \quad \text{for all } 1 \le i, j \le n. \tag{1}$$

   Peer j is identified as a colluder when its collusion rate reaches the threshold, i.e., $c_j \ge \varphi$. With this reputation system, a distribution agent weighs each decoy's report against that decoy's trust score to determine the trustworthiness of the reported collusion event. Such a design ensures that a pirate will never be selected as a probing decoy.
   Consider a case when the collusion threshold is set with $\varphi = 2.5$. Consider an honest peer i with an initial collusion rate $c_i = 0$, and thus, a complete trust $t_i = 1$ initially. A suspected client j has collusion rate $c_j = 1.6$. We recruit i to probe j, and i reports with $r_{ij} = 1$. We can identify peer j as a colluder since $c_j = \min[1.6 + 1 \times 1,\ 2.5] = 2.5$. This way, only high-reputation clients are hired as probing decoys. Thus, more credibility is given to ensure the accuracy of colluder detection.

4   PEER AUTHORIZATION PROTOCOL
In a P2P content distribution network, only the content owner can verify the user ID/password pair; peers cannot check each other's identity. Revealing a user's identity to other peers violates his or her privacy. To solve this problem, we developed a PAP protocol. First, we apply IBS to secure file indexing. Then we outline the procedure to generate tokens. Finally, we specify the PAP protocol that authorizes peers to access and download files.

4.1 Secure File Indexing
In a P2P file-sharing network, a file index is used to map a file ID to a peer endpoint address. When a peer requests to download a file, it first queries the indexes that match a given file ID. Then the requester downloads from selected peers pointed to by the indexes. To distinguish pirates from paid clients, we propose to modify the file index to include three interlocking components: an authorization token, a timestamp, and a peer signature.
   Each legitimate client has a valid token assigned by its bootstrap agent. The timestamp indicates the time when a token expires. Thus, the peer needs to refresh the token periodically. This short-lived token is designed for protecting copyright against colluders. The cost at each distribution agent to refresh the client tokens is rather limited, as shown via experiments. The peer signature is signed with the private key generated by the PKG. This signature proves the authenticity of a peer.
   Download requests make explicit references to file indexes. The combined effects of the three extra fields ensure that all references to the file indexes are secured. Peers identify the pirates by checking the validity of the token and the signature in a file index. These features secure the P2P network operations to safeguard the sharing of clean contents among the paid clients.

4.2 File-Level Token Generation
First, both the transaction server and the PKG are fully trusted. Their public keys are known to all peers. The PAP protocol consists of two integral parts: token generation and authorization verification. When a peer joins the P2P network, it first sends an authorization request to the bootstrap agent. All messages between a peer and its bootstrap agent are encrypted using the session key assigned by the transaction server at purchase time.
   The authorization token is generated by Algorithm 1 specified below. A token is a digital signature of a three-tuple: {peer endpoint, file ID, timestamp} signed by the private key of the content owner. Since the bootstrap agent has a copy of the digital receipt sent by the transaction server, verifying the receipt is done locally. The Decrypt(Receipt) function decrypts the digital receipt to identify the file $\lambda$. The Observe(requestor) function returns the endpoint address p. The OwnerSign($\lambda$, p, $t_s$) function returns a token.
   Upon receiving a private key, the bootstrap agent digitally signs the file ID, endpoint address, and timestamp to create the token. The reply message contains a four-tuple: {endpoint address, peer private key, timestamp, token}. The reply message from the bootstrap agent is encrypted using the assigned session key.

Algorithm 1. Token Generation
Input: Digital Receipt
Output: Encrypted authorization token T
Procedures:
if Receipt is invalid,
   deny the request;
else
   $\lambda$ = Decrypt(Receipt);
      // $\lambda$ is file identifier decrypted from receipt //
   p = Observe(requestor);
      // p is endpoint address as peer identity //
   k = PrivateKeyRequest(p);
      // Request a private key for user at p //
   Token T = OwnerSign($\lambda$, p, $t_s$)
      // Sign the token T to access file $\lambda$ //
   Reply = {k, p, $t_s$, T}
      // Reply with key, endpoint address, timestamp, and the token //
   SendtoRequestor{Encrypt(Reply)}
      // Encrypt reply with the session key //
end if

   The cost at each distribution agent to refresh the tokens is rather limited. In our experiments, there are 10 distribution agents to serve 1,000 clients/colluders. Each token refresh requires transmitting at most 2 KB of data and each peer is required to refresh its token every 10 minutes. Per agent, there are 1,000/10 = 100 peers refreshing tokens in 10 minutes. Hence, we need to transmit only 100 × 2 KB = 200 KB to refresh the tokens every 10 minutes. Considering a standard broadband link capacity of 1.5 Mbps, such a low refreshing overhead is negligible.

4.3 The Peer Authorization Protocol
The PAP protocol is formally specified below. A client must verify the download privilege of a requesting peer before

Fig. 6. The PAP enables instant detection of a pirate upon submitting an illegal download request.

clean file chunks are shared with the requestor. If the requestor fails to present proper credentials, the client must send poisoned chunks, as shown in Fig. 6.
   In PAP, a download request carries a token T, file index ($\lambda$, p), timestamp $t_s$, and the peer signature S. If any of the fields are missing, the download is stopped. A download client must have a valid token T and signature S. Two pieces of critical information are needed: public key K of the PKG and the peer endpoint address p.
   Algorithm 2 verifies both token T and signature S. File index ($\lambda$, p) contains the peer endpoint address p and the file ID $\lambda$. Token T also contains the file index information and $t_s$ indicating the expiration time of the token. The Parse(input) function extracts timestamp $t_s$, token T, signature S, and index ($\lambda$, p) from a download request. The function Match(T, $t_s$, K) checks the token T against public key K. Similarly, Match(S, p) grants access if S matches with p.

Algorithm 2. Peer Authorization Protocol
Input: T = token, $t_s$ = timestamp, S = peer signature,
          and ($\lambda$, p) = file index for file $\lambda$ at endpoint p
Output: Peer authorization status
            True: authorization granted
            False: authorization denied
Procedures:
Parse(input) = {T, $t_s$, S, ($\lambda$, p)}
    // Check all credentials from an input request //
p = Observe(requestor);
    // Detect peer endpoint address p //
if {Match(S, p) fails},
    // Fake endpoint address p detected //
    return false;
endif
if {Match(T, $t_s$, K) fails},
    // Invalid or expired token detected //
    return false;
endif
return true;

   When a client downloads a file, it needs to authorize the peer to share the file. Otherwise, downloading from a pirate may be poisoned, as shown in Fig. 4. When responding to queries from honest peers, a client adopts a slightly reduced version of Algorithm 2. Because the inquiry is sent directly to endpoint p, the Observe() procedure is no longer required.

4.4 Adversary and Security Analysis
In contrast to a security-via-obscurity scheme, the PAP protocol is designed to be completely open. We provide an adversary analysis for security assurance of the proposed copyright-protected P2P networks. These assurances ensure that our PAP protocol is secured from common attacks as explained below.

4.4.1 Peer Endpoint Address Is Forgery Proof
Collusive piracy is achievable only if the pirate manages to communicate with other peers. IP spoofing can change a pirate's endpoint address, but the pirate then receives no response. Therefore, spoofing the endpoint address during download is useless to a pirate. A pirate can intercept the token sent to a client, and masquerade its own endpoint address to match with the token. However, using the Observe() subroutine illustrated in Fig. 2, other clients will notice the masqueraded peer identity and fail its endpoint verification.

4.4.2 Authorization Tokens Cannot be Shared by Peers
A token is generated after the verification of a digital receipt. This is used to authorize a client to download the content. It is designed to be a digital signature of a three-tuple: {file ID, endpoint address, timestamp}. Multiple peers cannot share this three-tuple because each peer has a different endpoint address. Sharing the same token on different endpoint addresses will result in signature mismatch. This is applied to stop a pirate from using a stolen token.

4.4.3 Pirates Cannot Poison Legitimate Clients
Our system modifies the file index format to include tokens and signatures. When downloading from other peers, a client checks the file index for valid signatures. It only downloads file chunks from other legitimate clients that publish valid file indexes. Therefore, even if a pirate attempts to poison other peers, no legitimate client will use it as a download source.

4.4.4 Stolen Private Keys Are Useless to Pirates
A pirate may hack into a peer's host to obtain its private keys. A colluder may even share these secrets with a pirate. However, sharing or stealing private keys does not help the pirate at all, because of the use of the endpoint address as public key. Since other clients use the Observe() subroutine to obtain the peer endpoint address, stolen private keys are useless.

5    PROTECTION PERFORMANCE ANALYSIS
In this section, we analyze the performance of the P2P copyright protection scheme. First, we give the condition to secure the file index. Then, we calculate the poisoning rate $\beta$ of receiving a poisoned chunk in response to a pirate's download request. Finally, we estimate the average file download times T by legitimate clients and detected pirates for comparison. The protection success rate $\alpha$ measures the percentage of pirates that fail to download the requested file within a given tolerance threshold.
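The check order of Algorithm 2 (Section 4.3) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `DownloadRequest`, `authorize`, and `choose_chunk` are hypothetical, and the two cryptographic checks are passed in as callables because the paper's IBS primitives are not specified here. The public key K is assumed to be captured inside `match_token`. The toy verifiers at the end exist only to exercise the control flow and provide no security.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DownloadRequest:
    """Fields a requester presents; None models a missing field."""
    token: Optional[bytes]       # T, issued by the bootstrap agent
    file_id: Optional[str]       # file ID from the file index
    endpoint: Optional[str]      # p, endpoint claimed in the file index
    expires_at: Optional[float]  # t_s, token expiry time
    signature: Optional[bytes]   # S, peer signature


def authorize(
    req: DownloadRequest,
    observed_endpoint: str,
    now: float,
    match_signature: Callable[[bytes, str], bool],
    match_token: Callable[[bytes, str, str, float], bool],
) -> bool:
    """Return True only if the request passes every check of Algorithm 2."""
    # Parse: a request with any missing field is rejected outright.
    fields = (req.token, req.file_id, req.endpoint, req.expires_at, req.signature)
    if any(value is None for value in fields):
        return False
    # Observe + Match(S, p): the signature must match the endpoint actually
    # seen on the connection, not merely the one claimed in the index.
    if req.endpoint != observed_endpoint:
        return False
    if not match_signature(req.signature, observed_endpoint):
        return False
    # Match(T, t_s, K): the token must be unexpired and valid for this
    # file, endpoint, and timestamp.
    if now > req.expires_at:
        return False
    if not match_token(req.token, req.file_id, observed_endpoint, req.expires_at):
        return False
    return True


def choose_chunk(authorized: bool) -> str:
    """Clean chunks for authorized peers, poisoned chunks otherwise."""
    return "clean" if authorized else "poisoned"


# Toy verifiers for illustration only; they provide no security.
def toy_signature_ok(signature: bytes, endpoint: str) -> bool:
    return signature == ("sig:" + endpoint).encode()


def toy_token_ok(token: bytes, file_id: str, endpoint: str, expires_at: float) -> bool:
    return token == f"tok:{file_id}:{endpoint}:{expires_at}".encode()
```

Note that a failed check maps to a poisoned reply rather than a refusal, which mirrors the requirement that unauthorized requestors receive poisoned chunks.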

5.1 Secure File Indexes
In current P2P networks, a file index ($\lambda$, p) associates a file identifier $\lambda$ with a peer endpoint address p. In PAP, we replace this index format with a four-tuple:

$$\Phi = \{(\lambda, p),\ T,\ t_s,\ S\}. \tag{2}$$

   This security-enhanced index format cannot be forged. Both T and S are collision-free signatures. A pirate cannot create its own token or signature via brute-force attack. Therefore, a pirate cannot create an index by itself. With Algorithm 2, an attempt to modify any single element of $\Phi$ will fail in token or signature verification or both. Therefore, the enhanced index $\Phi$ is secured.
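The tamper-evidence property of the four-tuple index can be illustrated with a small Python sketch. It rests on loudly stated assumptions: HMAC-SHA256 from the standard library stands in for the paper's owner signature and identity-based peer signature, the two keys are hypothetical demo values, and the choice of which fields the peer signature covers is an assumption, not taken from the paper. Unlike a real IBS scheme, this symmetric stand-in requires the verifier to hold the secret keys, so it demonstrates only that changing any one field breaks verification.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace

# Hypothetical demo secrets; a real deployment would use asymmetric keys.
OWNER_KEY = b"hypothetical-content-owner-key"
PKG_MASTER_KEY = b"hypothetical-pkg-master-key"


def _mac(key: bytes, *fields: object) -> bytes:
    """HMAC-SHA256 over the fields, joined with a separator."""
    message = "|".join(str(field) for field in fields).encode()
    return hmac.new(key, message, hashlib.sha256).digest()


@dataclass(frozen=True)
class SecureIndex:
    file_id: str      # file ID
    endpoint: str     # p
    token: bytes      # T
    expires_at: int   # t_s
    signature: bytes  # S


def make_index(file_id: str, endpoint: str, expires_at: int) -> SecureIndex:
    # Token: stand-in for the owner's signature over the three-tuple.
    token = _mac(OWNER_KEY, endpoint, file_id, expires_at)
    # Peer key derived from the endpoint, mimicking "identity as public key".
    peer_key = _mac(PKG_MASTER_KEY, endpoint)
    # Assumption: the peer signature covers file ID, timestamp, and token.
    signature = _mac(peer_key, file_id, expires_at, token.hex())
    return SecureIndex(file_id, endpoint, token, expires_at, signature)


def verify_index(index: SecureIndex) -> bool:
    expected_token = _mac(OWNER_KEY, index.endpoint, index.file_id, index.expires_at)
    peer_key = _mac(PKG_MASTER_KEY, index.endpoint)
    expected_sig = _mac(peer_key, index.file_id, index.expires_at, index.token.hex())
    token_ok = hmac.compare_digest(index.token, expected_token)
    sig_ok = hmac.compare_digest(index.signature, expected_sig)
    return token_ok and sig_ok
```

With this sketch, `verify_index(make_index(...))` succeeds, while a copy produced by `dataclasses.replace` with any single field changed fails one or both checks.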
   Based on the above discussion and Section 4.4, there exists a one-to-one mapping between $\Phi$ and the client digital receipt. This forgery-proof mapping is the foundation of our PAP protocol because it ensures distributed pirate detection at every client. Securing the digital receipt belongs to the realm of general network security, which is beyond the scope of this paper.
   The reason for using IBS instead of a PKI service is the concern of overhead. In a P2P network with n peers, each peer may need to contact all $n - 1$ peers. If we use a PKI service for signature verification, the total CA communication overhead is $O(n^2)$. With an IBS system, this overhead is reduced to $O(n)$, because a peer needs to contact the PKG only once.

5.2 Chunk Poisoning Rate
Our system has an integral function to randomly detect colluders. However, such detection could never be perfect. It is always possible that some colluders will evade the detection. Therefore, these undetected colluders become the real source of copyright violations. Let collusion rate $\varepsilon$ be the percentage of paid clients acting as undetected colluders. The pirate receives clean content from undetected colluders.
   Under a randomized policy, the piracy rate r is the percent of pirates among all peers in the content delivery network. We define chunk poisoning rate $\beta$ as the probability of a pirate to receive a poisoned chunk. The following two theorems are obtained:

Theorem 1. In a BitTorrent-like network, the chunk poisoning rate $\beta$ is expressed by

$$\beta = (1 - r)(1 - \varepsilon). \tag{3}$$

   For eMule and Gnutella, the chunk poisoning rate $\beta$ is expressed by

$$\beta = 1 - \varepsilon. \tag{4}$$

Proof. In BitTorrent, only an honest client or a distribution agent can poison a pirate. There is no propagation of poisoned chunks among the pirates. The term $(1 - r)$ represents the percentage of nonpirates among all peers. Among these peers, $(1 - \varepsilon)$ is the percent of noncolluding clients. Therefore, $\beta$ is just the product of the two terms.
   A pirate cannot identify poisoned file chunks in eMule and Gnutella. The pirate stores undetected poisoned chunks in its local cache and unknowingly shares them with other pirates. We can express the poisoning rate by $\beta$ = Probability{poisoned by a client or an agent} + Probability{poisoned by a pirate} $= (1 - r)(1 - \varepsilon) + r\beta$, which solves to $\beta = 1 - \varepsilon$. Q.E.D.

Fig. 7. Variation of the poisoning rate in BitTorrent-like networks with respect to variation in piracy rate r and collusion rate $\varepsilon$.

   Fig. 7 plots the poisoning rate $\beta$ in (3) as a function of r and $\varepsilon$. This is a concave upward surface in the 3D space. The peak of the protection surface is at the point when we have a fully trusted P2P network in which r = 0. The lowest point corresponds to a completely pirated network where no law-abiding peer exists.

5.3 Prolonged Download Time by Pirates
In this study, we ignore the accidental corruption of content (content pollution) due to transmission error, etc. To capture the inherent poisoning resistance of a P2P network, we define a piracy penalty $\rho$ as the percent of downloaded chunks that are discarded due to poisoning.
   Let f be the size of a clean content file and d be the actual amount downloaded, including clean and poisoned chunks. The ratio f/d represents the percentage of downloaded chunks that are clean. Thus, we have

$$\rho = 1 - f/d. \tag{5}$$

   Piracy penalty implies extra workload imposed on the pirates that receive some poisoned chunks. In estimating this piracy penalty, we ignore the effects of network topology, traffic congestion, peer locations, etc. To normalize the results, we consider a chunk the smallest unit of a file, whose hash value is used to verify its authenticity and integrity. A chunk is called a piece in BitTorrent. The term chunk is used by the eMule and Gnutella families.
   We model the piracy penalty on all three P2P content networks: Gnutella, eMule, and BitTorrent. In the eMule network, every 53 chunks form a part. Peers compute the hash values of all parts and exchange the part-level hash sets inside the P2P network. Thus, the part hash sets are also susceptible to content poisoning.
   We estimate below the download time of a pirate. This estimation will be verified by simulation experiments. Consider a peer attempting to download a content file of size f. Let b be the average download speed for a peer. A legitimate client does not receive any poisoned chunks. Thus, the download time of a legitimate client is simply calculated by $T_c = f/b$.
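The quantities defined so far (Theorem 1, the piracy penalty of (5), and the client download time $T_c = f/b$) can be evaluated with a short Python sketch. The function names are hypothetical, and the example assumes decimal megabytes (1 MB = 10^6 bytes) and a 1.5 Mbps link. The fixed-point routine is an illustration of the eMule/Gnutella proof step: iterating $\beta \leftarrow (1 - r)(1 - \varepsilon) + r\beta$ approaches $1 - \varepsilon$ for any r below 1.

```python
def poisoning_rate_bittorrent(r: float, eps: float) -> float:
    """Eq. (3): poisoning rate when pirates do not propagate poison."""
    return (1.0 - r) * (1.0 - eps)


def poisoning_rate_emule_gnutella(eps: float) -> float:
    """Eq. (4): poisoning rate when pirates unknowingly re-share poison."""
    return 1.0 - eps


def poisoning_rate_fixed_point(r: float, eps: float, iterations: int = 200) -> float:
    """Iterate beta = (1 - r)(1 - eps) + r * beta from the proof of Theorem 1."""
    beta = 0.0
    for _ in range(iterations):
        beta = (1.0 - r) * (1.0 - eps) + r * beta
    return beta


def piracy_penalty(clean_size: float, downloaded_size: float) -> float:
    """Eq. (5): fraction of downloaded data discarded due to poisoning."""
    return 1.0 - clean_size / downloaded_size


def client_download_time(file_bits: float, speed_bps: float) -> float:
    """T_c = f / b, in seconds, for a client that receives no poison."""
    return file_bits / speed_bps


# Assumed example values: a 700 MB file (decimal megabytes) over 1.5 Mbps.
FILE_BITS = 700 * 10**6 * 8
SPEED_BPS = 1.5 * 10**6
```

Under these assumptions `client_download_time(FILE_BITS, SPEED_BPS)` is about 3,733 seconds, or roughly 62 minutes, which is an idealized lower bound rather than a simulated value.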

   By definition of $\rho$, $1 - \rho$ represents the percent of useful download effort. Every time a pirate attempts to download a file, only a $1 - \rho$ portion of the downloaded file is clean and useful. Therefore, to receive the entire file, the pirate must repeat the download attempts $1/(1 - \rho)$ times. This leads to the following download time estimated for a pirate, $T_p$:

$$T_p = \frac{f}{b\,(1 - \rho)}. \tag{6}$$
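As a quick numeric check of (6), the sketch below computes the pirate download time from the file size, link speed, and piracy penalty. The function names are hypothetical, and treating a penalty of exactly 1 as an infinite download time is a choice made here for the illustration.

```python
import math


def slowdown_factor(penalty: float) -> float:
    """Number of repeated download attempts, 1 / (1 - penalty)."""
    if not 0.0 <= penalty <= 1.0:
        raise ValueError("penalty must lie in [0, 1]")
    if penalty == 1.0:
        return math.inf  # every downloaded chunk is discarded
    return 1.0 / (1.0 - penalty)


def pirate_download_time(file_bits: float, speed_bps: float, penalty: float) -> float:
    """Eq. (6): T_p = f / (b * (1 - penalty)), in seconds."""
    return (file_bits / speed_bps) * slowdown_factor(penalty)
```

For example, a penalty of 0.9 means only one tenth of each attempt is useful, so the pirate needs about ten times the client download time.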

Theorem 2. The pirate is expected to experience the following download times in three existing networks with our proactive copyright-protection system:

$$E[T_p] = \begin{cases} f/(b\,\varepsilon^{m}), & \text{Gnutella family;} \\ f/(b\,\varepsilon^{54}), & \text{eMule family;} \\ f/[\,b\,(r + \varepsilon - r\varepsilon)\,], & \text{BitTorrent family.} \end{cases} \tag{7}$$

Proof. In Gnutella, a poisoned file must be discarded and downloaded again. Thus, $f/d = (1 - \beta)^m$. By (4), we have $\rho = 1 - (1 - \beta)^m = 1 - [1 - (1 - \varepsilon)]^m = 1 - \varepsilon^m$. In an eMule network, a pirate verifies file chunks at the part level. If the part hash set is clean, the piracy penalty is $\rho = 1 - (1 - \beta)^{53}$. For a poisoned hash set, we get 100 percent piracy penalty. Combining the two cases, we have $\rho = \beta + (1 - \beta)[1 - (1 - \beta)^{53}] = 1 - (1 - \beta)^{54} = 1 - \varepsilon^{54}$ by substituting $\beta$ from (4). In BitTorrent, the piracy penalty for the entire file equals the poisoning rate of an individual chunk: $\rho = \beta = (1 - r)(1 - \varepsilon)$ after substituting $\beta$ by (3). Substituting the above results on piracy penalty into (6), we obtain (7). Q.E.D.

   We define a tolerance threshold $\theta$ as the maximum time any pirate can tolerate to download a file. The protection success rate $\alpha$ measures the probability that a pirate fails to download the file successfully within the time frame $\theta$. Let $g(T_p)$ be the probability density function of the download time by a pirate. Then $\alpha$ is defined as follows:

$$\alpha = \mathrm{Prob}[T_p > \theta] = \int_{\theta}^{\infty} g(T_p)\, dT_p. \tag{8}$$

   The above expressions for the poisoning rate $\beta$, piracy penalty $\rho$, expected download times $T_c$ and $T_p$, and protection success rate $\alpha$ are obtained by analytical reasoning. Their accuracy is verified by simulation experiments in Section 6, except in some cases where the pirate download time becomes so large that it cannot be simulated in finite time. For simplicity, we assume that pirates adopt a random peer selection policy. In Section 6, we will discuss other peer selection policies.

6     SIMULATED P2P EXPERIMENTAL RESULTS
It is very difficult to conduct copyright violation experiments in a real-life P2P network. We have to use simulated P2P experiments to verify the analytical results just presented. We simulate P2P network architectures of three major P2P network families introduced in Table 2.
   The trusted P2P features are built into simulators of these three popular P2P networks. The simulation process consists of three stages: First, we measure the chunk poisoning rate $\beta$ on simulated networks. Second, we measure the download time and protection success rate of the simulated P2P networks. Finally, we compare their differences in defense performance and protection overhead experienced.

Fig. 8. Copyright-protected P2P simulation experiment environment.

6.1 Simulation Setting and Experiments
The simulated P2P network environment is illustrated in Fig. 8. The simulator consists of three layers. The lowest layer handles data collection and reporting. The middle layer simulates the behaviors of four peer categories: distribution agents, legitimate clients, peer colluders, and illegal pirates. The upper layer simulates the P2P transport.
   For the eMule simulation, we bypassed the eMule server connections so that the network operates in a completely decentralized mode. We assume that all peers are connected via broadband connections with a bandwidth limit of 1.5 Mbps. For simplicity, we ignored network delays and transmission errors. The simulator excludes the communication between distribution agents and the PKG, which are outside of the P2P network. The network topology is known to all peers.
   The simulator measures the shortest possible download time by a pirate or by a client. In reality, the download time by a pirate should be even longer than the measured value due to delays caused by other network factors. The simulator starts with distribution agents having clean chunks. The chunks are distributed using a rarest-first policy [20].
   We use 10 distribution agents in our simulated networks. By varying the numbers of peers, colluders, and pirates, we simulate the P2P network with different levels of piracy challenges. There are many other peer selection policies used by P2P client software. Essentially, these policies favor one group of peers against another. Therefore, we simply lump their effects into different values of peer collusion rate $\varepsilon$ or of the piracy rate r.
   A P2P network may have millions of peer nodes. Our experience shows that per content file, a peer accesses at most a few hundred peer nodes. Since the trusted P2P offers file-level copyright protection, it is sufficient to simulate a P2P overlay with 2,000 peer nodes. We test the distribution of a 700 MB CD-ROM file. In the eMule network, this file is divided into 4,017 chunks.
   To explore the limits of a trusted P2P network with the proposed copyright protection features, we also tested small files of sizes 10 and 20 MB, and a very large content file of size 4.5 GB (a movie). Based on the estimation in Theorem 2, the download times of large content files by pirates may require millions of years, meaning it is infeasible for a pirate to download a clean file from a trusted P2P network.
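The scale of the Theorem 2 estimates can be explored with the Python sketch below. It works in log10 space because $\varepsilon^m$ underflows ordinary floating point for large m. The function names are hypothetical, m is assumed here to be the number of chunks in the file, and the example inputs (700 MB in decimal megabytes, 1.5 Mbps, $\varepsilon$ = 0.1, r = 0.5) are illustrative choices rather than reported results.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_expected_seconds(
    file_bits: float,
    speed_bps: float,
    family: str,
    eps: float,
    r: float = 0.0,
    m: int = 1,
) -> float:
    """log10 of E[T_p] in seconds, following Eq. (7)."""
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    base = math.log10(file_bits / speed_bps)  # log10 of f / b
    if family == "gnutella":
        return base - m * math.log10(eps)     # f / (b * eps**m)
    if family == "emule":
        return base - 54 * math.log10(eps)    # f / (b * eps**54)
    if family == "bittorrent":
        return base - math.log10(r + eps - r * eps)
    raise ValueError("unknown family: " + family)


def log10_expected_years(*args, **kwargs) -> float:
    """Same estimate expressed as log10 of years."""
    return log10_expected_seconds(*args, **kwargs) - math.log10(SECONDS_PER_YEAR)
```

With the assumed inputs, the BitTorrent estimate is about 6,788 seconds (roughly 113 minutes), while the eMule estimate is on the order of 10^57 seconds, far beyond the "millions of years" scale.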

Fig. 9. Chunk poisoning rate ($\beta$) in two P2P networks under two collusion rates and a fixed piracy rate r = 50 percent. (a) eMule network and (b) Gnutella network.

6.2 Results on Chunk Poisoning Rate
In all experiments, the chunk poisoning rate $\beta$ starts from 100 percent. This reflects the fact that initially only the distribution agents have clean file chunks in local caches. Since distribution agents must poison any request from a detected pirate, the pirate will receive only poisoned chunks. When colluders have the chunks to share, the poisoning rate $\beta$ is lowered to a stable value after about 2 hours. Fig. 9a reports the measured poisoning rate $\beta$ for the first two hours on a simulated eMule network. The y-axis shows the average $\beta$ value over all 1,000 pirates.
   The piracy rate r assumes the value 50 percent. With low $\varepsilon$ = 1 percent, the poisoning rate is fairly close to the ideal case of $\beta$ = 100 percent of poisoning. With a higher $\varepsilon$ = 10 percent, poisoning rate $\beta$ starts at a value close to 1 and fluctuates in a decreasing trend. In about 1.5 hours, $\beta$ converges to 0.9. Fig. 9b plots similar results on the Gnutella network.
   Fig. 10 plots the effects of changing $\varepsilon$ and r on the poisoning rate $\beta$. Fig. 10a reports results of varying the collusion rate $\varepsilon$ with r = 50 percent and 1,000 pirates out of 2,000 peers. Increasing the collusion rate $\varepsilon$ results in a sharp drop of poisoning rate on both eMule and Gnutella.
   As the number of colluders increases to 800 ($\varepsilon = 0.8$), the average $\beta$ is reduced to only 20 percent. The decrease of $\beta$ is linear with respect to the increase of $\varepsilon$, especially in the case of Gnutella. On the eMule network, the measured poisoning rate $\beta$ deviates a little from the theoretical prediction, as dictated by (4).
   The poisoning rate is insensitive to the increase of pirates, as shown in Fig. 10b. These results are plotted for the low collusion rate $\varepsilon$ = 10 percent corresponding to 100 colluders out of 1,000 paid peers. The simulation results show that when the number of pirates increases from 200 to 1,000, the change of $\beta$ is very small.
   This result reinforces Theorem 1, namely, the poisoning rate $\beta$ is independent of the piracy rate r on the eMule network. Although we simulated $\varepsilon$ up to 0.8, such an extremely high collusion rate is unlikely to occur in real-life P2P applications. We incorporate randomized detection to identify colluders. The detected colluders will not obtain a new token, and thus are treated as new pirates after the current token expires.

Fig. 10. Poisoning rate decreases linearly with increasing peer collusion rate, independent of the piracy rate. (a) Effect of $\varepsilon$ under r = 50 percent. (b) Effect of r under $\varepsilon$ = 10 percent.

Fig. 11. Average download time of a 700-MB CD file by legitimate clients in simulated eMule and Gnutella networks.

6.3 Download Time by Legitimate Clients
Fig. 11 plots the average client download times of a content file with a file size f = 700 MB. Because clients are not poisoned, we measure their download time as a basis for studying the download time penalty of pirates. In both eMule and Gnutella, the average download time for clients remains quite flat around 1.5 hours.
   This flat download time implies that legal users do not suffer under peer collusions. In Fig. 11, a paid client on the
Gnutella network downloads in 10 percent less time than a client on the eMule network. The
shorter download time on Gnutella is attributed to the lower complexity of the file
chunking protocol applied in Gnutella.

6.4 Pirate Download Time and Success Rate
Now, we report simulation results on the expected download time by pirates. To explore the
limit of the trusted P2P system, we have experimented on various file sizes. We define the
protection success rate as the fraction of pirates that fail to download the file within a
tolerance threshold of time. The threshold is set at 20 days for a 700 MB CD-ROM image file
and 30 days for a 4.5 GB movie file.

In Fig. 12, we simulated all three P2P network families with 100 paid clients,
900 colluders, and 1,000 pirates. This scenario of 90 percent colluders is very unlikely in
a real-life P2P network. We design this experiment to approximate a worst-case scenario in
which all pirates adopt an aggressive peer selection policy. We measure the percentage of
pirates that fail to download a clean copy within the tolerance threshold as an
approximated success rate. All success-rate curves start with 100 percent initially. With
increasing threshold, we evaluate the success rate of the three networks.

Fig. 12. Protection success rate of a simulated eMule network. Most users have a tolerance
threshold far less than 5 days. (a) 700 MB CD-ROM file and (b) 4.5 GB movie file.

Without poisoning, the average download time for a client is 1.5 hours. The Gnutella
family, including KaZaA and LimeWire, has a near-perfect success rate (higher than
99.9 percent). The eMule network has an average success rate of 85 percent after tolerating
up to 20 days. These success rates are satisfactory, because most pirates will give up
trying after a few days without success.

The Gnutella family, including the KaZaA and LimeWire networks, has the highest success
rate, close to 99.9 percent for both file sizes. With a tolerance threshold of 20 days in
Fig. 12, the eMule network has an average success rate of 98 percent for the 4.5 GB file
and 85 percent for the 700 MB file.

In comparison, an average pirate in a BitTorrent network takes about 100 minutes to
download the 700 MB file, only marginally longer than a paid client. Hence, the success
rate drops rapidly before 2 hours. This implies that our system does not protect BitTorrent
well due to its strong resistance to content poisoning.

Table 4 reports the expected download time Tc by a regular client and Tp expected by a
pirate. These numbers are averaged over 1,000 pirates out of 2,000 peers. Some
extraordinarily large download times by pirates on eMule or Gnutella networks will exceed
hundreds or thousands of years, far beyond what the simulation experiments can do. Their
magnitudes are calculated using (4) and marked as greater than 1,000,000 years in Table 3.
This implies that pirates cannot download the files from Gnutella successfully.

TABLE 4
Expected Download Times by Pirates and Paid Clients in Three Simulated P2P Networks

Among the three P2P network families in Table 1, we find that the Gnutella family is best
protected by our system. The eMule family is the next, and BitTorrent-like networks are the
most poison resistant. In reality, the tolerance threshold of pirates is only a few days at
most. Pirates may try other means to steal from public networks. That kind of attack is
beyond the scope of our P2P protection scheme.

In general, we find that our system protects the eMule and Gnutella families rather
satisfactorily if the tolerance threshold is set to be 20 days for files up to 700 MB. For
the 4.5 GB movie file, the tolerance level may have to be set as long as 4 months. This
implies that all pirates will give up the attempt on P2P networks if they have to wait for
so long and still cannot download a clean copy.

6.5 Overheads in Token and Poison Processing
In our experiments, a distribution agent assigns new tokens to about 100 clients/colluders
every 10 minutes. Fig. 13 plots the traffic distribution of three message types from a
single agent. In Table 4, the regular download time of a 700 MB CD image file to a paid
client is about 90 minutes.

The white bars for actual content delivery dominate the distribution, consuming almost
99.9 percent of the link bandwidth of 1.5 Mbps. The crossed bars are the total byte count
for token distribution. During content uploading, the
token and poisoning overhead is less than 0.1 percent (= 0.1 MB/100 MB), which is
negligible.

Fig. 13. Traffic distribution for uploading a 700-MB CD file using authorization token and
chunk poisoning over 100 peer nodes.

The responsibility of content poisoning is distributed to all paid clients and distribution
agents. This distribution inevitably costs some upload bandwidth by agents and clients.
Compared to clean (unpoisoned) file sharing, the upload bandwidth constitutes the network
overhead for distributing poisoned chunks.

We distinguish in Fig. 14 the bandwidth consumed to deliver clean versus poisoned file
chunks. We report the normalized bandwidth allocation for both types of file chunks in the
eMule network under two peer distributions. The shaded area represents the upload bandwidth
of a client allocated to distribute poisoned chunks. The white area is the bandwidth used to
distribute the content files to legitimate clients. Initially, since no client has any file
chunks to share, the clients receive no request for either.

Fig. 14a shows a high piracy rate of r = 50 percent, meaning that 1,000 pirates are present
out of 2,000 peers in the simulated eMule network. Therefore, almost all upload bandwidth is
used to handle the pirate requests after 1.5 hours. Fig. 14b corresponds to a low piracy
rate of 200 pirates out of 2,000 peers. Thus, a lower portion of the bandwidth is allocated
to distribute poisoned chunks. The poisoning overhead drops to 40 percent after one hour.

Fig. 14. Normalized upload bandwidth for distributing clean and poisoned chunks in a
simulated eMule network. (a) High piracy rate (50 percent). (b) Low piracy rate
(10 percent).

Fig. 15 plots the upload bandwidth allocated for content poisoning at a distribution agent.
The agents dedicate their upload bandwidth to content delivery while client downloads are
in progress. When there are no legitimate requests, distribution agents are fully dedicated
to the task of poisoning pirates. This switching of bandwidth allocation enhances a low-cost
P2P file distribution network.

For a small content file of 10 MB, the switching of bandwidth usage takes place at
2 minutes. The 50 MB file takes 7 minutes to switch from content delivery to complete
poisoning. The 700 MB file takes 90 minutes to switch the bandwidth allocation. The
differences in the switch time correspond to the average download time of a legitimate
client.

Fig. 15. Normalized upload poisoning overhead for delivering files of 10-700 MB over the
ADSL links.

7 CONCLUSIONS AND SUGGESTIONS
Based on the above performance results, the combined use of IBS-based indexing and selective
chunk poisoning is indeed effective in detecting colluders and pirates and in stopping them
from collusive piracy in major families of P2P networks.

7.1 Major Research Findings
Summarized below are the major research findings and discussions on deployment requirements
of our proposed P2P copyright-protection system.

7.1.1 Stop Piracy in the Presence of Colluders
With secure file indexing and assistance from a peer reputation system, colluding peers are
detectable. The system stops piracy by poisoning pirates with excessively long download
overhead, even in the presence of a large number of colluders.
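
The protection success rate that Section 6.4 uses to quantify this effect is the fraction of
pirates that fail to obtain a clean copy within the tolerance threshold. A minimal Python
sketch of that definition follows. The function name, the use of None for a pirate that
never finishes, and the sample download times are illustrative assumptions, not code or
data from the simulation experiments.

```python
from typing import Optional, Sequence


def protection_success_rate(
    download_days: Sequence[Optional[float]], threshold_days: float
) -> float:
    """Fraction of pirates that fail to get a clean copy within the threshold.

    download_days holds one entry per pirate: the time in days needed to
    obtain a clean copy, or None if the pirate never obtained one
    (an assumed encoding, chosen for this sketch only).
    """
    if not download_days:
        raise ValueError("at least one pirate is required")
    failures = sum(
        1 for t in download_days if t is None or t > threshold_days
    )
    return failures / len(download_days)


# Hypothetical download times in days for five pirates.
sample = [0.07, 3.0, 25.0, None, 40.0]

# With a 20-day tolerance threshold, three of the five pirates fail.
rate_20_days = protection_success_rate(sample, 20.0)
print(f"success rate at 20 days: {rate_20_days:.0%}")
```

Raising the threshold can only lower the rate computed this way, which matches the curves in
Fig. 12 starting at 100 percent and falling as the tolerance threshold grows.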
7.1.2 Pirates Detected Immediately upon First Attempt
With the time-stamped authorization token, the PAP protocol enables clients to detect
illegal download attempts from a pirate without communicating with a central authority. Our
scheme heavily penalizes copyright violators without hurting honest clients.

7.1.3 Selecting High-Reputation Peers as Probing Decoys
Using a reputation system, we select trusted clients to act as decoys to probe peers in
collusive piracy. This mechanism accurately detects colluders when they send clean file
chunks to illegal requests. Reducing the number of colluders can improve the protection
against pirates, as demonstrated by experiments.

7.1.4 Gnutella-Like Networks Best Protected by Our Scheme
Applying our protection scheme, the Gnutella family, including Gnutella, Ares, KaZaA,
LimeWire, Freenet, BearShare, etc., demonstrates the highest penalty on pirates because
poison detection is only possible at the file level. Even if only a few chunks are
poisoned, the entire file must be discarded and downloaded repeatedly.

This makes the file download time by pirates intolerably long (1,000 years or longer).
Thus, this family of P2P networks is the most suitable for the proposed copyright
protection scheme, yielding an almost perfect protection success rate (>99.9 percent), as
reported in Section 6.4.

7.1.5 eMule-Like Networks Get Protected Satisfactorily
The eMule-like P2P networks apply weak chunk hashing at the part level. Pirating on eMule
networks experiences very high download overhead, as seen in Table 4. Our
copyright-protection system works with 85-98 percent success rate on the eMule family of
P2P networks, as listed in Table 2.

7.1.6 BitTorrent Network Is Most Resistant to Poisoning
Our system is less effective in protecting poison-resistant P2P networks like BitTorrent,
BNBT, Azureus, etc. Continued research is needed to extend the PAP protocol to overcome
this difficulty.

7.1.7 Protection Scheme Performs Better for Large Files
The proposed system is more effective in protecting large files. Our PAP protocol is not
tailored to protect short music or document files. A popular song in MP3 format is less
than a few MB in size. Content poisoning has less effect on such small files because they
contain only a single chunk or a few chunks.

7.1.8 Negligible Detection and Poisoning Overhead
The proposed PAP protocol detects colluders and pirates, and applies chunk poisoning
selectively. These extra activities add only limited extra workload or traffic to the
network. These overheads are distributed among all distribution agents and clients, making
their effects almost negligible on individual clients.

7.2 Discussions and Suggestions
Our protection scheme gives higher priority to satisfying honest clients. Putting poisoning
tasks at lower priority reduces the upload overhead. Our selective chunk poisoning
outperforms the indiscriminate poisoning practiced by RIAA or MPAA enforcers. This system
is fair to the majority of honest clients who enjoy P2P content delivery services.

Perishable contents, such as real-time broadcasts of news or sports events, are well
protected by our system. For these contents, a hashing mechanism to detect poisoning cannot
be effective, because distributing chunk hashes ahead of content is impossible in real
time. Essentially, all P2P networks including BitTorrent fail to resist poisoning in
perishable contents.

Existing DRM systems [10] focus on enforcing usage rules of digital content such as copy
and replay, while our proposed system provides a protected P2P distribution platform to
control the access of copyrighted content files. Combining DRM with P2P copyright
protection should be the focus of future research.

In a traditional encryption-based protection scheme, once a pirate discovers how the system
works, he can break it and post the hack on the Internet. We enforce copyright protection
via distributed content poisoning. Statistically, a pirate cannot download the file
successfully in tolerable time once the PAP protocol is enforced. Two R/D tasks are
suggested below for further work on copyright protection in P2P content delivery.

7.2.1 Prototyping and Benchmark Experiments Needed in Real-Life Open P2P Networks
Simulation results reported here can only prove the protection concept, but lack sustained
accuracy. Proactive chunk poisoning can be made selectively to reduce the processing
overhead. However, further studies are needed to upgrade the performance of the
copyright-protected system in real-life P2P benchmark applications.

7.2.2 Integration of Trusted P2P System with Reputation System and DRM Scheme
A file-level reputation system poses a new challenge to work with DRM systems in P2P
content delivery. The integration of selective poisoning with a reputation system and DRM
will widen the CDN application domains. Combining DRM and a reputation system to protect
P2P content delivery networks will lead to a total solution of the online piracy problem.

ACKNOWLEDGMENTS
This work was supported by US National Science Foundation ITR Grant ACI-0325409 at the
University of Southern California. The work was carried out at USC Internet and P2P
Computing Laboratory. The authors appreciate the technical support by GridSec team members.

REFERENCES
[1]    N. Anderson, “Peer-to-Peer Poisoners: A Tour of MediaDefender,” Ars Technica,
       Sept. 2007.
[2]    BitTorrent.org, “BitTorrent Protocol Specification,”
       http://www.bittorrent.org/protocol.html, 2006.
[3]    S. Androutsellis-Theotokis and D. Spinellis, “A Survey of Peer-to-Peer Content
       Distribution Technologies,” ACM Computing Surveys, vol. 36, pp. 335-371, 2004.
[4]    D. Boneh and M. Franklin, “Identity-Based Encryption from the Weil Pairing,” Proc.
       Advances in Cryptology (Crypto ’01), pp. 213-229, 2001.
[5]    S. Chen and X.D. Zhang, “Design and Evaluation of a Scalable and Reliable P2P
       Assisted Proxy for On-Demand Streaming Media Delivery,” IEEE Trans. Knowledge and
       Data Eng., vol. 18, no. 5, pp. 669-682, May 2006.
[6]    A.K. Choudhury, N.F. Maxemchuk, S. Paul, and H.G. Schulzrinne, “Copyright Protection
       for Electronic Publishing over Computer Networks,” IEEE Trans. Networking, vol. 9,
       no. 3, pp. 12-20, May/June 1995.
[7]    N. Christin, A.S. Weigend, and J. Chuang, “Content Availability, Pollution and
       Poisoning in File-Sharing P2P Networks,” Proc. ACM Conf. e-Commerce, pp. 68-77, 2005.
[8]    E. Damiani, D.C. di Vimercati, S. Paraboschi, P. Samarati, and F. Violante, “A
       Reputation-Based Approach for Choosing Reliable Resources in Peer-to-Peer Networks,”
       Proc. ACM Conf. Computer and Comm. Security (CCS ’02), pp. 207-216, 2002.
[9]    D. Dumitriu, E. Knightly, A. Kuzmanovic, I. Stoica, and W. Zwaenepoel,
       “Denial-of-Service Resilience in Peer-to-Peer File Sharing Systems,” Proc. Int’l
       Conf. Measurement and Modeling of Computer Systems, pp. 38-49, 2005.
[10]   M. Fetscherin and M. Schmid, “Comparing the Usage of Digital Rights Management
       Systems in the Music, Film, and Print Industry,” Proc. Conf. e-Commerce, 2003.
[11]   J. Frankel and T. Pepper, “The Gnutella Protocol Spec. v0.4,” Revision 1.2,
       http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf, 2000.
[12]   B. Gedik and L. Liu, “A Scalable P2P Architecture for Distributed Information
       Monitoring Applications,” IEEE Trans. Computers, vol. 56, no. 6, pp. 767-782,
       June 2005.
[13]   M. Hofmann and I. Beaumont, Content Networking, Architecture, Protocols, and
       Practice, S.F. Kaufmann, ed. 2005.
[14]   Y. Itakura, M. Yokozawa, and T. Shinohara, “Model Analysis of Digital Copyright
       Piracy on P2P Networks,” Proc. Int’l Symp. Applications and the Internet Workshops
       (SAINT), pp. 84-89, Jan.
[15]   T. Iwata, T. Abe, K. Ueda, and H. Sunaga, “A DRM System Suitable for P2P Content
       Delivery and the Study on Its Implementation,” Proc. Asia-Pacific Conf. Comm. (APCC),
       vol. 2, pp. 806-811, Sept. 2003.
[16]   T. Kalker, D.H.J. Epema, P.H. Hartel, R.L. Lagendijk, and M. Van Steen,
       “Music2share—Copyright-Compliant Music Sharing in P2P Systems,” Proc. IEEE, vol. 92,
       no. 6, pp. 961-970, June 2004.
[17]   B. Krishnamurthy, C. Wills, and Y. Zhang, “On the Use and Performance of Content
       Distribution Networks,” Proc. Special Interest Group on Data Comm. on Internet
       Measurement Workshop (SIGCOMM), Nov. 2001.
[18]   Y. Kulbak and D. Bickson, “The eMule Protocol Specification,” Technical Report
       TR-2005-03, Hebrew Univ., Jan. 2005.
[19]   S.H. Kwok, “Watermark-Based Copyright Protection System Security,” Comm. ACM,
       pp. 98-101, Oct. 2003.
[20]   A. Legout, G. Urvoy-Keller, and P. Michiardi, “Rarest First and Choke Algorithms Are
       Enough,” Proc. ACM Special Interest Group on Data Comm. on Internet Measure
       (SIGCOMM), pp. 203-216, 2006.
[21]   E. Luoma and H. Vahtera, “Current and Emerging Requirements for Digital Rights
       Management Systems Through Examination of Business Networks,” Proc. 37th Ann. Hawaii
       Int’l Conf. System Sciences, 2004.
[22]   D.P. Majoras, O. Swindle, T.B. Leary, and J. Harbour, “Peer-to-Peer File-Sharing
       Technology: Consumer Protection and Competition Issues,” Federal Trade Commission
       Report, June 2005.
[23]   N. Mook, “P2P Flooder Overpeer Cease Operation,” Beta News,
       Ceases_Operation/1134249644, Dec. 2005.
[24]   N. Mook, “P2P Future Darkens as eDonkey Closes,” http://
       Closes/1127953242, Sept. 2005.
[25]   G. Pallis and A. Vakali, “Insight and Perspectives for Content Delivery Networks,”
       Comm. ACM, pp. 101-106, Jan. 2006.
[26]   P. Rodriguez et al., “On the Feasibility of Commercial Legal P2P Content
       Distribution,” SIGCOMM Computer Comm. Rev., vol. 36, no. 1, pp. 75-78, Jan. 2006.
[27]   S. Saroiu et al., “An Analysis of Internet Content Delivery Systems,” SIGOPS
       Operating System Rev., pp. 315-327, 2002.
[28]   M. Srivatsa and L. Liu, “Vulnerabilities and Security Threats in Structured Overlay
       Networks: A Quantitative Analysis,” Proc. 20th Ann. Computer Security Applications
       Conf., 2004.
[29]   J. Sung, J. Jeong, and K. Yoon, “DRM Enabled P2P Architecture,” Proc. Eighth Int’l
       Conf. Advanced Comm. Technology (ICACT), vol. 1, pp. 487-490, Jan. 2006.
[30]   K. Walsh and E.G. Sirer, “Fighting Peer-to-Peer SPAM and Decoys with Object
       Reputation,” Proc. ACM Special Interest Group on Data Comm. (SIGCOMM) Workshop
       Economics of Peer-to-Peer Systems, pp. 138-143, 2005.
[31]   C. Wang, B.A. Alqaralleh, B.B. Zhou, F. Brites, and A.Y. Zomaya, “Self-Organizing
       Content Distribution in a Data Indexed DHT Network,” Proc. Sixth IEEE Int’l Conf.
       P2P Computing, pp. 241-248, 2006.
[32]   L. Xiao, Y. Liu, and L.M. Ni, “Improving Unstructured P2P Systems by Adaptive
       Connection Establishment,” IEEE Trans. Computers, vol. 54, no. 9, pp. 1091-1103,
       Sept. 2005.
[33]   M. Yurkewych, B.N. Levine, and A.L. Rosenberg, “On the Cost-Ineffectiveness of
       Redundancy in Commercial P2P Computing,” Proc. 12th ACM Conf. Computer and Comm.
       Security, 2005.
[34]   R. Zhou and K. Hwang, “GossipTrust for Fast Reputation Aggregation in P2P Networks,”
       IEEE Trans. Knowledge and Data Eng., vol. 20, no. 9, pp. 1282-1295, Sept. 2008.

Xiaosong Lou received the BS degree from Shanghai Jiao Tong University in 1994 and the MS
degree in computer engineering in 2005 from the University of Southern California (USC),
where he received the PhD in computer engineering in 2009. He is currently working at Yahoo
as a performance engineer. His research covers P2P content networking, online video
streaming, and network security. He is a student member of the IEEE.

Kai Hwang received the PhD degree from the University of California, Berkeley, in 1972. He
is a professor of electrical engineering and computer science at the University of Southern
California. He specializes in computer architecture, parallel processing, network security,
Web services, distributed computing, and Internet technology. He has led various research
groups at Purdue and USC, where he has produced over 20 PhD students. Presently, he also
serves as an EMC endowed visiting professor at Tsinghua University, Beijing, China. He has
chaired numerous ACM/IEEE International Conferences and presented over two dozen keynote
addresses in various Conferences. He has lectured worldwide and performed advisory work for
IBM Fishkill, Intel Scalable System Division, MIT Lincoln Lab., JPL at Caltech, ETL in
Japan, Academia Sinica in China, GMD in Germany, and INRIA in France. He is a fellow of the
IEEE. More information regarding Dr. Hwang can be found at
http://GridSec.usc.edu/Hwang.html.
