Analysis of Internet Music Content Distribution by fionan


     Analysis of Internet Music Content Distribution
     DIMI Project - UCLA/Warner Bros.

Wendy Aylsworth       Vince Busam
Charles L. Dages      Sasha Slijepcevic
Warner Bros.          Miodrag Potkonjak
                      Richard Muntz
         Full Project Objectives
o   Statistical analysis of web and user behavior
    toward pirated audio material.
o   Evaluation of watermarking and
    fingerprinting techniques.
o   Determine methods to control distribution of
    digital content.
     Processes and procedures
     Technical controls
     Legal announcements and controls
        Immediate Project Tasks
o   Develop experiment for sampling web data
    and user behavior
o   Data acquisition
o   Data analysis
o   To supply real numbers to the discussion of
    the content distribution on the Internet
     To identify interesting data about the Internet
      users who exchange MP3 files
     To develop tools for acquiring interesting data
     Exploit existing services:
       -   DNS (mapping hostnames into IP addresses and vice versa)
       -   whois (returns the owner of an IP address)
o   Current target data (comments welcomed)
o   Tools
o   MP3 on FTP
o   Napster
o   Future work
                    Target Data
o   Who are MP3 users?
     Are they mostly college students?
     How many are outside the US?
     What types of connection do they have: DSL, cable
      modem, Ethernet from dorms?
     How many songs do they offer for download?
     Who are the most popular artists?
                   Target Data
o   How soon do files show up after or even
    before being officially released?
o   Who are the users who are first to convert
    CDs into MP3 files?
     If pirated copies appear before official release,
      do they originate at the same subset of IP addresses?
o   How many different MP3 versions of the same
    song exist?
     Different versions indicate different pirate
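
One way to count versions of the same song in a crawled file list is to group entries by a normalized title and count distinct file sizes. This is only a sketch: the (filename, size) record layout, the `normalize` rule, and the idea that a different size means a different encoding are all assumptions for illustration, not part of the project tools.

```python
import re
from collections import defaultdict

def normalize(filename):
    """Hypothetical rule: lowercase, drop the extension, collapse punctuation."""
    stem = filename.lower().rsplit(".", 1)[0]
    return re.sub(r"[^a-z0-9]+", " ", stem).strip()

def count_versions(records):
    """Map each normalized title to the number of distinct file sizes seen.

    records: iterable of (filename, size_in_bytes) pairs (assumed layout).
    """
    sizes = defaultdict(set)
    for filename, size in records:
        sizes[normalize(filename)].add(size)
    return {title: len(seen) for title, seen in sizes.items()}
```

File size is a crude proxy; two encodings of equal size would be counted once, and renamed copies of one encoding are merged only if their names normalize to the same string.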
        Internet Access in US by Type of Connection
o   Dial-up access: 93 %
o   Cable: 4.5 %
o   Internet TV: 2.2 %
o   DSL: 0.3 %
     From TRI's report, May 2000
   Getting Information About Users

 o   We want to find out, using DNS and whois:
      Owner of the IP address (college, ISP)
      Connection type (DSL, cable modem, dial-up)
      Geographic location (Europe, …)
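
A rough sketch of how a reverse-DNS hostname might be turned into these three attributes. The keyword table and the rules for recognizing colleges are illustrative assumptions; real hostnames vary widely, and many addresses need a whois lookup instead.

```python
# Illustrative keyword table (assumption): substrings that hint at access type.
CONNECTION_KEYWORDS = {
    "dsl": "DSL",
    "cable": "cable modem",
    "dial": "dial-up",
    "ppp": "dial-up",
}

def classify_host(hostname):
    """Guess owner type, connection type, and country from a hostname."""
    name = hostname.lower().rstrip(".")
    labels = name.split(".")
    tld = labels[-1]
    second = labels[-2] if len(labels) > 1 else ""
    country_tld = len(tld) == 2 and tld.isalpha()

    # .edu, or edu/ac under a country code (for example edu.sg, ac.uk)
    is_college = tld == "edu" or (country_tld and second in ("edu", "ac"))

    connection = "unknown"
    for keyword, kind in CONNECTION_KEYWORDS.items():
        if keyword in name:
            connection = kind
            break

    return {
        "owner": "college" if is_college else "unknown",
        "connection": connection,
        "country": tld if country_tld else None,
    }
```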
                  MP3 on FTP
o   FTP is the traditional method of MP3 sharing
    (before Napster)
o   Search engines crawl FTP servers, indexing
    available MP3 files

o   Users go to a search engine, look for an
    artist/song, then connect to the FTP sites to download
                   FTP Crawler
  o   Gather a list of FTP sites by entering search
      terms into popular FTP search engines

Search terms          List of FTP

  o   Crawl each of these servers, gathering a list of
      all files offered for download
  o   Re-crawl all servers weekly, to allow future
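
The crawl step could look roughly like the sketch below, which walks one server with Python's `ftplib` and returns every file with its size. It assumes anonymous login and a server that supports the MLSD command; older servers would need LIST output parsed instead. The function names are hypothetical and this is not the project's actual crawler.

```python
from ftplib import FTP

def crawl(ftp, path="/"):
    """Recursively collect (full_path, size) for every file under `path`."""
    files = []
    for name, facts in ftp.mlsd(path):
        kind = facts.get("type")
        full = path.rstrip("/") + "/" + name
        if kind == "dir":
            files.extend(crawl(ftp, full))
        elif kind == "file":
            files.append((full, int(facts.get("size", 0))))
        # "cdir" and "pdir" entries (current/parent directory) are skipped
    return files

def crawl_site(host):
    """Log in anonymously and list all files the server offers."""
    with FTP(host, timeout=30) as ftp:
        ftp.login()
        return crawl(ftp)
```

Storing each weekly result under its crawl date would let later runs be compared against earlier ones.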
          Dynamic IP Addresses
o   FTP servers whose IP address changes (dial-up
    access) use dynamic DNS
o is a
    typical file
o does not give any of target data
o   Using DNS, turn the dynamic domain name
    into an IP address
            DNS lookup sequence
o   Using reverse DNS, turn IP addresses into
    native hostnames if possible


o   This shows that the FTP server is in Singapore
o   More specifically, at School of Computing at
    National University of Singapore
     Not all IP addresses have a corresponding DNS name
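
The forward-then-reverse lookup sequence can be sketched with the standard `socket` module. The function names and the injectable resolver parameters are assumptions added here so the logic can be exercised without network access.

```python
import socket

def forward_lookup(name, resolve=socket.gethostbyname):
    """Dynamic domain name -> IPv4 address, or None if it does not resolve."""
    try:
        return resolve(name)
    except socket.gaierror:
        return None

def reverse_lookup(ip, resolve=socket.gethostbyaddr):
    """IP address -> native hostname, or None if there is no reverse entry."""
    try:
        hostname, _aliases, _addresses = resolve(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None

def native_hostname(dynamic_name,
                    forward=socket.gethostbyname,
                    reverse=socket.gethostbyaddr):
    """Run both lookups; returns (ip, hostname), either of which may be None."""
    ip = forward_lookup(dynamic_name, forward)
    hostname = reverse_lookup(ip, reverse) if ip else None
    return ip, hostname
```

A `None` hostname is the case where the whois lookup takes over.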
                  Whois lookups
o   If reverse DNS does not return an answer,
    we try to find the owner of the IP address
    using a whois lookup at ARIN
o   American Registry for Internet Numbers
     Authority for IP addresses in US
     Returns the owner of each IP address

o   whois
     Returns Flashcom-owned netblock
     DSL connection
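
The whois protocol itself is simple: open TCP port 43, send the query followed by CRLF, and read until the server closes the connection. A minimal sketch follows; `whois.arin.net` is ARIN's public server, while the `parse_fields` helper and its assumption that the reply contains "Key: value" lines are illustrative only, since reply formats differ between registries and over time.

```python
import socket

def whois_query(query, server="whois.arin.net", port=43, timeout=10):
    """Send one whois query and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_fields(reply):
    """Collect 'Key: value' lines, keeping the first value seen per key."""
    fields = {}
    for line in reply.splitlines():
        if line.startswith(("#", "%")) or ":" not in line:
            continue  # skip comments and free text
        key, value = line.split(":", 1)
        fields.setdefault(key.strip(), value.strip())
    return fields
```

The owner name found this way (an ISP, a university) is what lets an address be assigned to a connection type when reverse DNS gives nothing.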
         Locations of FTP Servers
o    3800 servers in April, 390 servers in July
o    Reasons: Napster, firewalls at cable modem
     networks, summer break

     [Pie charts: locations of FTP servers, April vs. July; segment labels lost except "Other"]
     April: 43 %, 22 %, 18 %, 13 %, 2 %
     July: 41 %, 28 %, 22 %, 5 %
        Files by Connection Type
o   230,000 MP3 files in April, 340,000 in July,
    mostly illegal by visual inspection of filenames
o   58 sites with over 1000 MP3s in April, 119 in July
o   8500 MP3s on a single site in Canada
     [Pie charts: files by connection type, April vs. July; legend: Cable, Europe, College, Other; label assignment lost]
     April: 52 %, 26 %, 12 %
     July: 40 %, 26 %, 26 %
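
Per-site figures such as the number of sites with over 1000 MP3s can be derived from the crawl output with a simple tally. A sketch, assuming the crawl results are available as (server, path) pairs; that layout and the function names are hypothetical.

```python
from collections import Counter

def mp3_counts(listing):
    """Count MP3 files per server from (server, path) pairs."""
    counts = Counter()
    for server, path in listing:
        if path.lower().endswith(".mp3"):
            counts[server] += 1
    return counts

def large_sites(counts, threshold=1000):
    """Servers offering more than `threshold` MP3 files, sorted by name."""
    return sorted(s for s, n in counts.items() if n > threshold)
```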
o   The most popular application for sharing
o   Developed by San Mateo-based Napster, Inc.
o   Has been imitated by many competitors, but
    still has largest market share
o   Client-client transfer
o   Clients register with a server
o   Clients then transfer files between each other
o   Previously done with Email, ICQ, IRC
o   Napster offers an easy package, allowing this to
    surpass client-server transfers in number of
  Napster Architecture
o   Client to server: IP address, shared files
o   Client to server: search term(s)
o   Server to client: IP addresses of users with
    files containing search terms, file names
o   Client to client: actual file transfer
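
The central-index idea can be illustrated with a toy in-memory model: the server only records which client shares which file names and answers searches, while the files themselves move directly between clients. This is a conceptual sketch with hypothetical names; it does not implement the actual Napster wire protocol.

```python
class IndexServer:
    """Toy central index: knows who shares what, never holds the files."""

    def __init__(self):
        self.shared = {}  # client IP -> list of shared file names

    def register(self, ip, filenames):
        """A client connects and reports its IP address and shared files."""
        self.shared[ip] = list(filenames)

    def unregister(self, ip):
        """A client disconnects; its files leave the index."""
        self.shared.pop(ip, None)

    def search(self, terms):
        """Return (ip, filename) pairs whose name contains every search term."""
        wanted = [t.lower() for t in terms.split()]
        return [
            (ip, name)
            for ip, names in self.shared.items()
            for name in names
            if all(t in name.lower() for t in wanted)
        ]
```

Because each search reply carries the IP addresses of the sharing clients, logging those replies is enough to sample who is connected.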

                  Napster Tools
o   Napster protocol has been reverse-engineered
o   Modified GTK-Napster (Open Source clone)
    to log IP addresses of servers returned in
o   Napster's application forces users to share
    their download directory
     Most users will be registered as servers
     Users can move MP3s out of a shared directory

     Use an unapproved clone without sharing
o   Application continues to run even if the main
    window is closed, which many users do not notice
o   Our data is a sample of connected users
o   Found 63,000 files on 24,000 unique servers


     [Pie chart; legend: Cable, Europe, Other; label assignment lost]
     Segments: 51 %, 18 %, 16 %, 9 %

                  Pending Work
o   Gather more information from Napster
     Snapshots of searches by time
     File format returned by search (bitrate, size...)
     Bandwidth of the connected clients, measured by
      sending “ping” packets while downloading
o   Correlate our data with other studies
     Compare MP3 transfer to record sales near
      colleges in last 2 years
     Study by VNU Entertainment Marketing
      Solutions (4% decrease in stores near colleges)
     Watch MP3 usage as SDMI becomes popular
            Pending Work (cont’d)
o   Inject files with false metadata (title, artist, ID
     Napster
     Set up Web or FTP server
     Measure how far a false version of a song spreads
      on the Internet
o   Develop techniques for fast recognition of a
     How to prove that the song is the one that the title
      claims without downloading the whole file
