
Analysis of Internet Music Content Distribution
DIMI Project - UCLA/Warner Bros.



Wendy Aylsworth, Charles L. Dages (Warner Bros.)
Vince Busam, Sasha Slijepcevic, Miodrag Potkonjak, Richard Muntz (UCLA)
         Full Project Objectives
o   Statistical analysis of web and user behavior toward pirated audio material
o   Evaluation of watermarking and fingerprinting techniques
o   Determination of methods to control distribution of digital content:
     -  Processes and procedures
     -  Technical controls
     -  Legal announcements and controls
        Immediate Project Tasks
o   Develop an experiment for sampling web data and user behavior
o   Data acquisition
o   Data analysis
                           Goals
o   To supply real numbers to the discussion of content distribution on the Internet
     -  To identify interesting data about the Internet users who exchange MP3 files
     -  To develop tools for acquiring this information
     -  To exploit existing services:
          -  DNS (maps host names to IP addresses and vice versa)
          -  whois (returns the owner of an IP address)
                   Outline
o   Current target data (comments welcome)
o   Tools
o   MP3 on FTP
o   Napster
o   Future work
                    Target Data
o   Who are MP3 users?
     -  Are they mostly college students?
     -  How many are outside of the US?
     -  What types of connection do they have: DSL, cable modem, Ethernet from dorms?
     -  How many songs do they offer for download?
     -  Who are the most popular artists?
                   Target Data (cont’d)
o   How soon do files show up after, or even before, their official release?
o   Who are the users who are first to convert CDs into MP3 files?
     -  If pirated copies appear before the official release, do they originate from the same subset of IP addresses?
o   How many different MP3 versions of the same song exist?
     -  Different versions indicate different pirate sources
Internet Access in US by Type of Connection*
     Dial-Up Access   93 %
     Cable             4.5 %
     Internet TV       2.2 %
     DSL               0.3 %
*From TRI’s report, May 2000
   Getting Information About Users


(Figure: example IP address 137.132.94.4)
 o   Using DNS and whois, we want to find out the:
      -  Owner of the IP address (college, ISP)
      -  Connection type (DSL, cable modem, dial-up)
      -  Geographic location (Europe, …)
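Turning a reverse-DNS hostname into one of the categories used in the charts (College, Cable, DSL, Dial-Up, Europe, Other) can be approximated with keyword rules. The sketch below is a minimal illustration only: the patterns, their order, and the list of European country codes are hypothetical assumptions, not the rules used in this study, and real ISP naming conventions vary widely.

```python
import re

# Hypothetical keyword rules, checked in order; real ISP naming varies.
RULES = [
    ("College", re.compile(r"\.edu(\.[a-z]{2})?$|\.ac\.[a-z]{2}$")),
    ("DSL", re.compile(r"dsl")),
    ("Cable", re.compile(r"cable|\.rr\.com$")),
    ("Dial-Up", re.compile(r"dial-?up|ppp")),
]

# Hypothetical, incomplete set of European country-code top-level domains.
EUROPE_TLDS = {"at", "be", "ch", "de", "dk", "es", "fi", "fr", "it", "nl", "no", "se", "uk"}


def classify_host(hostname: str) -> str:
    """Guess a category for a hostname returned by reverse DNS."""
    host = hostname.lower().rstrip(".")
    for label, pattern in RULES:
        if pattern.search(host):
            return label
    # No keyword matched: fall back to geography based on the top-level domain.
    if host.rsplit(".", 1)[-1] in EUROPE_TLDS:
        return "Europe"
    return "Other"
```

With these rules, `classify_host("sun450.comp.nus.edu.sg")` returns `"College"`. Hosts without a reverse-DNS name need the whois fallback instead.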
                  MP3 on FTP
o   FTP is the traditional method of MP3 sharing (before Napster)
o   Search engines crawl FTP servers, indexing available MP3 files:
     -  mp3.lycos.com
     -  2Look4.com
     -  oth.net
o   Users go to a search engine, look for an artist/song, then connect to the FTP site to download it
                   FTP Crawler
  o   Gather a list of FTP sites by entering search terms into popular FTP search engines:
        Search terms -> http://mp3.lycos.com, http://2Look4.com, http://oth.net -> List of FTP servers
  o   Crawl each of these servers, gathering a list of all files offered for download
  o   Re-crawl all servers weekly to allow future analysis
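Weekly re-crawls make it possible to see which files appear or disappear on each server between crawls. A minimal sketch, assuming each crawl of a server has already been reduced to a set of file paths; the crawling itself is not shown, and `diff_snapshots` and `CrawlDiff` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlDiff:
    added: frozenset      # paths present this week but not last week
    removed: frozenset    # paths present last week but gone this week
    unchanged: frozenset  # paths present in both crawls


def diff_snapshots(last_week, this_week) -> CrawlDiff:
    """Compare two weekly crawls of one FTP server (each an iterable of file paths)."""
    old, new = frozenset(last_week), frozenset(this_week)
    return CrawlDiff(added=new - old, removed=old - new, unchanged=old & new)
```

Keeping the raw file lists per week, rather than only the differences, leaves room for analyses not planned at crawl time.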
          Dynamic IP Addresses
o   FTP servers whose IP address changes (dial-up access) use dynamic DNS
o   mp3.dhs.org/mp3/Artist-SongTitle.mp3 is a typical file
o   The name mp3.dhs.org by itself reveals none of the target data
o   Using DNS, turn the dynamic domain name into an IP address:
        mp3.dhs.org  --DNS-->  137.132.94.4
            DNS lookup sequence
o   Using reverse DNS, turn IP addresses into native hostnames if possible:
        137.132.94.4  --reverse DNS-->  sun450.comp.nus.edu.sg
o   This shows that the FTP server is in Singapore
o   More specifically, at the School of Computing at the National University of Singapore
     -  Not all IP addresses have a corresponding DNS name
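The forward-then-reverse lookup sequence can be sketched with Python's standard `socket` module: `socket.gethostbyname` performs the forward lookup, and `socket.gethostbyaddr` the reverse one, raising `socket.herror` when no name is registered for the address. Passing the resolvers in as parameters is a design choice of this sketch (not of the study) so the function can be exercised offline with stub resolvers; `lookup_sequence` is a hypothetical name.

```python
import socket


def lookup_sequence(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve a (possibly dynamic) DNS name to an IP, then try reverse DNS.

    Returns (ip, native_hostname). native_hostname is None when the address
    has no reverse DNS name; a whois lookup is then the fallback.
    """
    ip = forward(name)
    try:
        native, _aliases, _addresses = reverse(ip)
    except (socket.herror, socket.gaierror):
        native = None
    return ip, native
```

Called with the default resolvers, `lookup_sequence("mp3.dhs.org")` queries live DNS, so its result depends on the records at lookup time.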
                  Whois lookups
o   If reverse DNS does not return an answer, we try to find the owner of the IP address using a whois lookup at ARIN
o   ARIN: American Registry for Internet Numbers
     -  Authority for IP addresses in the US
     -  Returns the owner of each IP address
o   whois 209.185.207.136@whois.arin.net
     -  Returns a Flashcom-owned netblock
     -  DSL connection
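The whois protocol itself is simple (RFC 3912): open a TCP connection to port 43, send the query followed by CRLF, and read until the server closes the connection. The sketch below assumes `whois.arin.net` answers a bare IP-address query this way and that the owner appears on a line such as `OrgName:`. That field name reflects ARIN's current output and is an assumption; the format in use at the time of this study may differ.

```python
import socket


def whois_query(query, server="whois.arin.net", port=43, timeout=10.0):
    """Send one query over the plain whois protocol (RFC 3912) and return the reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_field(response, field):
    """Return the value of the first 'Field: value' line (case-insensitive), or None."""
    prefix = field.lower() + ":"
    for line in response.splitlines():
        if line.lower().startswith(prefix):
            return line.split(":", 1)[1].strip()
    return None
```

For example, `extract_field(whois_query("209.185.207.136"), "OrgName")` would return the registered owner, if the response contains that field.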
         Locations of FTP Servers
o    3800 servers in April, 390 servers in July
o    Reasons for the drop: Napster, firewalls at cable modem networks, summer break

(Figure: pie charts of FTP server locations by category (Cable, DSL, Europe, College, Other), April vs. July)
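Percentages like those in the charts come from tallying how many servers fall into each category. A minimal sketch, assuming each server has already been assigned one category label; `category_shares` is a hypothetical helper. Because each share is rounded independently, the rounded shares need not sum to exactly 100.

```python
from collections import Counter


def category_shares(categories):
    """Percentage of items in each category, rounded to whole percent, largest first."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total) for cat, n in counts.most_common()}
```

For three equally sized categories, each share rounds to 33, so the total is 99.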
        Files by Connection Type
o   230,000 MP3 files in April, 340,000 in July, mostly illegal judging by visual inspection of filenames
o   58 sites with over 1000 MP3s in April, 119 in July
o   8500 MP3s on a single site in Canada
(Figure: pie charts of MP3 files by connection type (Cable, DSL, Europe, College, Other), April vs. July)
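Counting sites with more than 1000 MP3s reduces to counting `.mp3` files per crawled site. A sketch, assuming crawl results are held as a mapping from site name to the list of file paths found there; `large_sites` is a hypothetical helper, and matching on the `.mp3` extension alone is a simplifying assumption.

```python
from collections import Counter


def large_sites(file_paths_by_site, threshold=1000):
    """Return (site, mp3_count) for sites with more than `threshold` MP3s, largest first."""
    counts = Counter({
        site: sum(1 for path in paths if path.lower().endswith(".mp3"))
        for site, paths in file_paths_by_site.items()
    })
    return [(site, n) for site, n in counts.most_common() if n > threshold]
```

The first entry of the result is the single largest site, which is how a figure such as the 8500-MP3 site would surface.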
                   Napster
o   The most popular application for sharing MP3s
o   Developed by San Mateo-based Napster, Inc.
o   Has been imitated by many competitors, but still has the largest market share
                    Napster
o   Client-to-client transfer
o   Clients register with a server
o   Clients then transfer files between each other
o   Previously done with email, ICQ, IRC
o   Napster offers an easy package, allowing this approach to surpass client-server transfers in number of users
  Napster Architecture
CONNECT (client to server)
•IP address
•Shared files
REQUEST (client to server)
•Search term(s)
RESPONSE (server to client)
•IP addresses of users with files containing the search terms
•File names
Actual file transfer (client to client, e.g. with 137.132.94.4)
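The division of labor in this architecture, where the server only indexes who shares what while files move directly between clients, can be illustrated with a toy in-memory index. This is not Napster's actual wire protocol; the class name `NapsterIndexSketch` and its methods are hypothetical, and matching every search term as a case-insensitive substring of the file name is an assumption.

```python
class NapsterIndexSketch:
    """Toy model of a central index: it stores only which client shares which files."""

    def __init__(self):
        self.shared = {}  # client IP address -> set of shared file names

    def connect(self, ip, files):
        # CONNECT: a client registers its IP address and its shared files.
        self.shared[ip] = set(files)

    def disconnect(self, ip):
        self.shared.pop(ip, None)

    def search(self, *terms):
        # REQUEST/RESPONSE: (IP, file name) pairs whose name contains every term.
        wanted = [t.lower() for t in terms]
        return sorted(
            (ip, name)
            for ip, names in self.shared.items()
            for name in names
            if all(t in name.lower() for t in wanted)
        )
```

The index never handles file contents: a client that receives `("137.132.94.4", "Artist-SongTitle.mp3")` would contact that address directly for the transfer.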
                  Napster Tools
o   The Napster protocol has been reverse-engineered
o   We modified GTK-Napster (an open-source clone) to log the IP addresses of servers returned in searches
o   Napster's application forces users to share their download directory
     -  Most users will therefore be registered as servers
     -  Users can, however, move MP3s out of the shared directory
     -  Or use an unapproved clone without sharing
o   The application keeps running even after the main window is closed, which many users do not notice
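Once the modified client logs every search result, unique servers and files can be counted from the log. A sketch, assuming a hypothetical log format of one `IP<TAB>file name` line per result; here a file is counted once per (server, file name) pair, which is one plausible reading of how such totals are computed. `summarize_search_log` is a hypothetical name.

```python
def summarize_search_log(lines):
    """Return (unique_files, unique_servers) from 'IP<TAB>file name' log lines."""
    files, servers = set(), set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        ip, _, name = line.partition("\t")
        servers.add(ip)
        files.add((ip, name))  # the same name on two servers counts twice
    return len(files), len(servers)
```

The same server typically shows up in many searches, so deduplicating by IP address is what turns a raw result log into a count of unique servers.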
                       Napster
o   Our data is a sample of connected users
o   Found 63,000 files on 24,000 unique servers

(Figure: pie chart of the Napster sample by category: Cable, Dial-Up, DSL, Europe, College, Other)
                  Pending Work
o   Gather more information from Napster:
     -  Snapshots of searches over time
     -  File format returned by search (bitrate, size, ...)
     -  Bandwidth of connected clients, measured by sending “ping” packets while downloading
o   Correlate our data with other studies:
     -  Compare MP3 transfers to record sales near colleges over the last 2 years
     -  Study by VNU Entertainment Marketing Solutions (4% decrease in sales at stores near colleges)
     -  Watch MP3 usage as SDMI becomes popular
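Besides trusting the bitrate reported by a search, the format can be checked from the first bytes of the file itself, because every MPEG audio frame begins with a 4-byte header. The sketch below handles only MPEG-1 Layer III headers and returns the first plausible one it finds; it does not parse ID3v2 tags and can be fooled by a false sync pattern, so it is an illustration rather than a robust parser. `mpeg1_layer3_info` is a hypothetical name.

```python
# Bitrates (kbit/s) for MPEG-1 Layer III, indexed by the 4-bit bitrate field.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
# Sample rates (Hz) for MPEG-1, indexed by the 2-bit sampling-frequency field.
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def mpeg1_layer3_info(data):
    """Return (bitrate_kbps, sample_rate_hz) of the first MPEG-1 Layer III header, or None."""
    # Require the full 4-byte header to be present.
    for i in range(len(data) - 3):
        b1, b2 = data[i + 1], data[i + 2]
        # 11 sync bits set, version bits 11 (MPEG-1), layer bits 01 (Layer III);
        # the lowest bit of b1 (CRC protection) may be either value.
        if data[i] == 0xFF and (b1 & 0xFE) == 0xFA:
            bitrate = BITRATES_KBPS[b2 >> 4]
            sample_rate = SAMPLE_RATES_HZ[(b2 >> 2) & 0x03]
            if bitrate and sample_rate:
                return bitrate, sample_rate
    return None
```

For instance, the header bytes `FF FB 90 64` decode as 128 kbit/s at 44100 Hz.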
            Pending Work (cont’d)
o   Inject files with false metadata (title, artist, ID tag):
     -  Via Napster
     -  Via a Web or FTP server set up for the purpose
     -  Measure how far a false version of a song spreads on the Internet
o   Develop techniques for fast recognition of a song
     -  How to prove that a song is the one its title claims, without downloading the whole file
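One common carrier for title and artist metadata, including false metadata, is the ID3v1 tag: a fixed 128-byte block at the end of an MP3 file holding `TAG`, then title (30 bytes), artist (30), album (30), year (4), comment (30) and a one-byte genre. The sketch below builds and parses such a tag; it assumes ID3v1 rather than ID3v2, and the helper names are hypothetical. Because the tag occupies the last 128 bytes, it can in principle be read without fetching the rest of the file, although that checks only the claimed metadata, not the audio.

```python
# ID3v1 text fields and their fixed sizes in bytes, in on-disk order after "TAG".
FIELDS = (("title", 30), ("artist", 30), ("album", 30), ("year", 4), ("comment", 30))


def make_id3v1(title="", artist="", album="", year="", comment="", genre=255):
    """Build a 128-byte ID3v1 tag; text is Latin-1, truncated and NUL-padded."""
    values = {"title": title, "artist": artist, "album": album, "year": year, "comment": comment}
    tag = b"TAG"
    for name, size in FIELDS:
        raw = values[name].encode("latin-1", errors="replace")[:size]
        tag += raw.ljust(size, b"\x00")
    return tag + bytes([genre])  # genre 255 conventionally means "none"


def read_id3v1(data):
    """Parse an ID3v1 tag from the last 128 bytes of `data`, or return None."""
    tag = data[-128:]
    if len(tag) != 128 or not tag.startswith(b"TAG"):
        return None
    fields, pos = {}, 3
    for name, size in FIELDS:
        fields[name] = tag[pos:pos + size].rstrip(b"\x00 ").decode("latin-1")
        pos += size
    fields["genre"] = tag[127]
    return fields
```

Appending `make_id3v1(title=..., artist=...)` to an audio file and reading it back with `read_id3v1` shows how easily such labels can be set to anything.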

								