Netflix Project

					Netflix Prize: Predicting Ratings
                    Data
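• mv_00(movieID).txt, one file per movie:
  – First line: the movie ID followed by a colon, e.g. 1:
  – Each following line: userID (1-2,649,429) and rating (1-5)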
• Over 17,000 movie txt files
• Over 400,000 user IDs
• About 2 GB zipped
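
A minimal sketch of reading one movie file in the layout described above. It assumes comma-separated fields on each rating line and ignores any fields after the rating; the function name `parse_movie_file` and the sample values are made up for illustration.

```python
import io


def parse_movie_file(lines):
    """Yield (movie_id, user_id, rating) tuples from the lines of one movie file."""
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Header line such as "1:" names the movie for the lines that follow.
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError("rating line found before the movie header")
        fields = line.split(",")  # assumption: fields are comma-separated
        yield movie_id, int(fields[0]), int(fields[1])


# Illustrative sample only, not real data.
sample = io.StringIO("1:\n1488844,3,2005-09-06\n822109,5,2005-05-13\n")
parsed_rows = list(parse_movie_file(sample))
print(parsed_rows)
```

Because the function is a generator, a whole directory of movie files can be streamed through it without holding every rating in memory at once.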
              Overall Plan
• Compute user similarity from two factors:
  – termFrequency: # of movies the two users have in common
  – documentFrequency: 1/|rating1 – rating2|, where rating1 and
    rating2 are the two users' ratings
• tfdf = (# of movies in common) * 1/|rating1 – rating2|
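
One possible reading of the tfdf score, sketched in Python. The slide does not say how the rating difference is combined across several shared movies, nor what happens when two ratings are equal, so two assumptions are made here: the difference is the mean absolute difference over the shared movies, and a constant `eps` is added to the denominator to avoid dividing by zero. The function name `tfdf` and its parameters are illustrative.

```python
def tfdf(ratings_a, ratings_b, eps=1.0):
    """Similarity of two users, each given as a dict of movieID -> rating."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0
    # Assumption: aggregate |rating1 - rating2| as a mean over shared movies.
    mean_diff = sum(abs(ratings_a[m] - ratings_b[m]) for m in common) / len(common)
    # Assumption: eps keeps the score finite when the ratings agree exactly.
    return len(common) * (1.0 / (mean_diff + eps))


alice = {1: 5, 2: 3, 3: 4}
bob = {1: 4, 2: 3, 4: 2}
print(tfdf(alice, bob))  # two shared movies, mean difference 0.5
```

With this choice, more shared movies and closer ratings both raise the score, which matches the intent of the two factors on the slide.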
                       Plan 1
• Store it all in memory (haha) in Java
• Store a User class with:
   – UserID
   – Array of Movie objects:
      • movieID
      • Rating
• Then keep a matrix of users, each with an
  array of its most similar users, ranked
  by tfdf

• Problem 1 - Memory issues
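
The in-memory layout of Plan 1 could look roughly like the sketch below. It is written in Python rather than Java for brevity, the names `User`, `build_users`, and `fill_top_similar` are hypothetical, and the similarity function is passed in as a parameter so that any scoring rule can be plugged in.

```python
from dataclasses import dataclass, field
import heapq


@dataclass
class User:
    user_id: int
    ratings: dict = field(default_factory=dict)      # movieID -> rating
    top_similar: list = field(default_factory=list)  # [(score, userID), ...]


def build_users(rows):
    """Group (movie_id, user_id, rating) rows into one User per userID."""
    users = {}
    for movie_id, user_id, rating in rows:
        users.setdefault(user_id, User(user_id)).ratings[movie_id] = rating
    return users


def fill_top_similar(users, similarity, k=10):
    """Store each user's k highest-scoring other users."""
    for u in users.values():
        scored = ((similarity(u.ratings, v.ratings), v.user_id)
                  for v in users.values() if v.user_id != u.user_id)
        u.top_similar = heapq.nlargest(k, scored)


# Tiny made-up data set; similarity here is just the count of shared movies.
rows = [(1, 10, 5), (1, 11, 4), (2, 10, 3), (2, 12, 3), (3, 11, 2)]
users = build_users(rows)
fill_top_similar(users, lambda a, b: len(a.keys() & b.keys()), k=1)
print(users[12].top_similar)
```

Note that `fill_top_similar` compares every pair of users, so its cost grows with the square of the user count, in addition to the memory needed to hold every rating.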
                               Plan 2*
• Step 1: using Java, store the data in text files on the hard drive
   – one text file for each user
• Step 2: compute similarity (tfdf)
   – one text file listing the top ten users for each user
• Step 3: predictions
   – Run through the two directories of text files to compute an average
     movie rating prediction

• Problem 2 - Very Slow:
   – Step 1: 3 days – ~5000 movie text files currently
   – Step 2: 1 user every 35 mins | 1 user every 5 mins
   – Step 3: ~10 minutes currently
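
A sketch of the Step 3 prediction, assuming the prediction for a user and movie is the plain average of the ratings that the user's top similar users gave that movie. The fallback to 3.0 when no similar user rated the movie is an assumption, and the function name `predict` and the dictionary layout are illustrative stand-ins for the two directories of text files.

```python
def predict(user_id, movie_id, ratings, neighbours, default=3.0):
    """Average the neighbours' ratings of movie_id, or return default.

    ratings:    dict of userID -> {movieID: rating}
    neighbours: dict of userID -> list of similar userIDs
    """
    votes = [ratings[n][movie_id]
             for n in neighbours.get(user_id, [])
             if movie_id in ratings.get(n, {})]
    # Assumption: fall back to a fixed value when no neighbour rated the movie.
    return sum(votes) / len(votes) if votes else default


example_ratings = {2: {7: 5}, 3: {7: 4}, 4: {8: 5}}
example_neighbours = {1: [2, 3, 4]}
print(predict(1, 7, example_ratings, example_neighbours))
print(predict(1, 9, example_ratings, example_neighbours))
```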
                         Plan 3
• Step 1: Load the text files' data into a database
  using PHP
   – Table: userID | movieID | rating
      • Primary key: (userID, movieID)
• Step 2: Compute Similarity
   – Table: userID | 1st similar userID | 2nd similar userID | etc.
      • Primary key: userID
• Step 3: Predictions
• Problem 3 - Very Slow:
   – Step 1: 4 days – 7000 movie text files currently
   – Step 2: n/a
   – Step 3: n/a
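
A sketch of the Step 1 table and a movies-in-common query. The slides do not name the database, so SQLite (through Python's built-in `sqlite3` module) is used here purely as a stand-in; the index name, the function `movies_in_common`, and the sample rows are made up. The inserts are batched in a single transaction, which is one common way to speed up bulk loading and is not necessarily what the original PHP script did.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ratings (
        userID  INTEGER NOT NULL,
        movieID INTEGER NOT NULL,
        rating  INTEGER NOT NULL,
        PRIMARY KEY (userID, movieID)
    )
""")

# Made-up rows in (userID, movieID, rating) order.
sample_rows = [(10, 1, 5), (10, 2, 3), (11, 1, 4), (11, 3, 2), (12, 2, 3)]
with conn:  # one transaction for the whole batch
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", sample_rows)

# Secondary index so lookups by movie do not scan the whole table.
conn.execute("CREATE INDEX ratings_by_movie ON ratings (movieID)")


def movies_in_common(conn, user_a, user_b):
    """Count the movies rated by both users with a self-join."""
    sql = """
        SELECT COUNT(*)
        FROM ratings AS a
        JOIN ratings AS b ON a.movieID = b.movieID
        WHERE a.userID = ? AND b.userID = ?
    """
    return conn.execute(sql, (user_a, user_b)).fetchone()[0]


print(movies_in_common(conn, 10, 11))
```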
                      Results

• Predicting 3.0 for every rating:
  – RMSE = 1.3149
• Using the similarities I have so far:
  – RMSE = 1.3149 | 384 users
  – RMSE = 1.3149 | 575 users
• http://www.netflixprize.com/leaderboard
  – Grand Prize RMSE = 0.8563
• RMSE:
  – sqrt(avg((actual_rating - predicted_rating)^2))
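
The RMSE formula above, written out as a small Python function. The name `rmse` and the sample ratings are illustrative and are not taken from the project's data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up ratings scored against a constant prediction of 3.0.
print(rmse([1, 5], [3.0, 3.0]))
```

Scoring a constant prediction this way gives a baseline to compare any similarity-based predictions against.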
Future Idea

				
Posted: 9/6/2011