# KDD06 InnerProduct based prediction

```					The \$1,000,000 Netflix Contest is to develop a "ratings prediction program"
that can beat Netflix's own program (called Cinematch) by 10% in predicting what ratings users gave to movies.
I.e., predict rating(M,U) where (M,U) ∈ QUALIFYING(MovieID, UserID).
Netflix uses Cinematch to decide which movies a user will probably like next (based on all past rating history).
All ratings are "5-star" ratings (5 is highest. 1 is lowest. Caution: 0 means “did not rate”).

Unfortunately rating=0 does not mean that the user "disliked" that movie, but that it wasn't rated at all. Most
“ratings” are 0. Therefore, the ratings data sets are NOT vector spaces!

One can approach the Netflix contest problem as a data mining Classification or Prediction problem.

A "history of ratings by users to movies", TRAINING(MovieID, UserID, Rating, Date), is given
with which to train your predictor, which will predict the ratings given to QUALIFYING movie-user pairs.
(Netflix knows the rating given to Qualifying pairs, but you don't.)

Since TRAINING is very large, Netflix also provides a "smaller, but representative subset" of TRAINING,
PROBE(MovieID, UserID) (~2 orders of magnitude smaller than TRAINING).

Netflix gives 5 years to submit QUALIFYING predictions. That contest window is about 1/2 gone now.

A team can submit as many solutions as it wishes, at any time. Each October, Netflix gives \$50,000 to the
team at the top of the so-called Netflix Leaderboard. BellKor has won that twice.

The Netflix Contest (USER versus MOVIE voting)
One can address the prediction or classification problem using several different "approaches".

USER VOTERs (approach 1):
To predict the rating of a pair, (M,U), we take TRAINING as a vector space of user ratings vectors. The users
are the points in the vector space and the movies are the dimensions in that vector space. Since there are
17,770 movies, each user is a tuple of 17,770 ratings if all movies are used as dimensions. That's too many
dimensions! The first dimension pruning: restrict to only those movies that U has rated (= supportU). We
also allow another round of dimension pruning based on correlation with M.

Once the dimensions movie set is pruned, we pick a “Set of Near Neighbor users to U”, (NNS) from the users,
V, who have rated M ( =supportM ). “Near” is defined based on correlation with U. One can think of this step
as the voter pruning step. Note: most correlation calculations involve the other variable also. I.e., the result
of a user pruning depends on the pruned movie set and vice versa. Thus, theoretically, the movie/user pruning
steps could be alternated ad infinitum! Our current approach is to allow an initial global dimension prune,
then the voter prune, then a final dimension prune. You will see these 3 prune steps in the .config files.

We then let voters vote, but they don't necessarily cast the straightforward rating(M,V) vote.

The best way to think about the 3 pruning steps (and there could be more!) is: we prune down the dimensions
so that vector space methods are tractable, ameliorating the curse of dimensionality. The first, which may be
turned off, is a global dimension prune (not based on individual voters). The second is the voter prune, based
on the currently pruned dimensions. The third is a final dimension prune (different for each voter), which gives
the final vector space over which the vote by that voter is calculated. Then we let those VOTERS vote as to the
best rating prediction to be made.

There are many ways to prune, vote, tally, and decide on the final prediction. These choices make up the
.config file.

MOVIE VOTERs (approach 2) is identical, with the roles of Movies (voters) and Users (dimensions) reversed.

The Netflix Contest (Using SLURM to generate a clustering)

SLURM has been set up to run on the Penryn Cluster2 (32 8-processor machines, 1 terabyte of main
memory). One creates a .config file (the name must end in .config) which specifies all the parameters for
the program, then issues:

./mpp-submit -S -i Data/probe-full.txt -c pf.0001/u.00.00/u.00.00.config -t .0001 -d ./pf.0001

The program pulls its parameters from the .config file. -t .0001 sets the SquareError threshold to .0001, and
-d ./pf.0001 sends the results to the ./pf.0001 directory. The program takes as input the file
Data/probe-full.txt (which is not quite the full probe, but close).

mpp-submit -S -i InputFile.txt -c ConfigFile.config -t SqErrThrhld -d Dir

Takes as input:
  InputFile.txt      (MovieID with interleaved UserIDs format or .txt format. See next slide)
  ConfigFile.config  (shows which program to run. In .config format. See next slide)
  SqErrThrhld        (if PredictionSqErr ≤ SqErrThrhld, put pair in Dir/lo-InputFile.txt, else put in Dir/hi-InputFile.txt)
  Dir                (existing directory for the output)

Puts as output (in Dir):
  lo-InputFileName.txt
  hi-InputFileName.txt
  InputFileName.config
  InputFileName.rmse

The Netflix Contest (Using SLURM to generate a clustering)
./mpp-submit -S -i Data/probe-full.txt -c pf.0001/u.00.00/u.00.00.config -t .0001 -d ./pf.0001

InputFile: Data/probe-full.txt

    1:
    30878
    2647871
    1283744
    2488120
    317050
    1904905
    1989766
    14756
    1027056
    1149588
    1394012
    1406595
    2529547
    1682104
    2625019
    2603381
    1774623
    470861
    712610
    1772839
    1059319
    2380848
    548064
    2:
    1959936
    748922
    1131325
    1312846
    2314531
    1636093
    584750
    2418486
    715897
    1172326
    etc.

where 1: and 2: are movieIDs and the others are userIDs. Note, this is an interleaved format of a
2-column DB file, probe-full(movieID,userID).

ConfigFile: pf.0001/u.00.00/u.00.00.config
Program sets parameters as specified in the .config:

    user_voting = enabled
    movie_voting = disabled
    user_vote_weight = 1

    # processed only if user voting enabled.
    [user_voting]
    Prune_Movie_in_SupU      = disabled
    Prune_Users_in_SupM      = enabled
    Prune_Movies_in_CoSupUV  = enabled

    [Prune_Movies_in_SupU]
    method=MoviePrune
    leftside = 0
    width = 30000
    mstrt = 0
    mstrt_mult=0
    ustrt = 0
    ustrt_mult=0
    TSa = -100
    TSb = -100
    Tdvp = -1
    Tdvs = -1
    Tvdp = -1
    Tvds = -1
    TD = -1
    TP = -1
    PPm = .1
    TV = -1
    TSD = -1
    Ch = 1
    Ct = 1

    [Prune_Users_in_SupM]
    method=UserCommonCoSupportPrune
    leftside = 0
    width = 30000
    mstrt = 0
    mstrt_mult=0
    ustrt = 0
    ustrt_mult=0
    TSa = -100
    TSb = -100
    Tdvp = -1
    Tdvs = -1
    Tvdp = -1
    Tvds = -1
    TD = -1
    TP = -1
    PPm = .1
    TV = -1
    TSD = -1
    Ch = 1
    Ct = 1

    [Prune_Movies_in_CoSupUV]
    method=MovieCommonCoSupportPrune
    leftside = 0
    width = 2000
    mstrt = 0
    mstrt_mult=0
    ustrt = 0
    ustrt_mult=0
    TSa = -100
    TSb = -100
    Tdvp = -1
    Tdvs = -1
    Tvdp = -1
    Tvds = -1
    TD = -1
    TP = -1
    PPm = .1
    TV = -1
    TSD = -1
    Ch = 1
    Ct = 1

    (The part for the movie voting params is identical to the user voting part above.)

Only the method, leftside, width, Ch=Choice, and Ct=Count parameters are used at this time.

Using this program, the many "lo-u.xx.xx" and, if movie voting is also enabled, "lo-m.yy.yy" files
constitute what we have called a clustering (though they're not mutually exclusive). Once we have
{lo-z.xx.yy | z = u or m} we can make a submission by: for each qualifying pair (m,u), use correlations
to pick the program to make that prediction.

The Netflix Contest (Using this scheme to predict Qualifying pair ratings)
The above prediction scheme requires the existence of Square Errors (SqErr),
e.g., cluster files lo-u.vv.nn.txt and lo-m.nn.vv.txt are composed of all input pairs such that SqErr ≤ .0001.

To predict rating(M,U) for pairs from Qualifying, we won't have answers, so we won't have SqErrs of our predictions.

So how can we form good clusters then?

Once that's decided, what matchup algorithm should we use to match a cluster (program) to a Qualifying pair to be predicted?

After the clusters are created, we can try the matchup algorithms that worked best for Probe predictions, but
we may want to develop new ones, because the performance of those matchup algorithms may depend on the way the clusters
were created.

We could use the same 288 configs to generate a new config-subset-collection of Qualifying pairs using, e.g.,
some kind of prediction variation instead of thresholded prediction SqErr?

lo-u.vv.nn.txt could be constructed to consist of Qualifying pairs as follows (a variation based method):
Set all answers in Qualifying to 1. Use ./mpp-submit to create clusters as above (threshold=.0001) in a directory, q1.
Set all answers in Qualifying to 2. Use ./mpp-submit to create clusters as above (threshold=.0001) in a directory, q2, etc.
This will create a clustering of 288*5=1440 cluster sets (but, of course, only 288 different program configs).

One could match up a Qualifying pair using count-based correlations, Pearson correlations, 1-perpendicular correlations, or?
One could match up (M,U) with the cluster in which the sum of the M and U counts (or counts relative to cluster size) is max?
Other?

The Netflix Files
{Mi} i=1..17770 given by Netflix as: Mi(uID, Rating, Date).
For each MovieID, Mi, this is a file of all users who rated it, the rating, and the rating date (avg: 5655 u/m).

Training (Mid,Uid,R,D) ordered by Mid:
    mID      uID       rating          day_number
    m1       u1        rm,u            dm,u
    m1       u2
    .
    m17770   u480189   r17770,480189   d17770,480189
100,480,507 rows (avg: 209 m/u). TRAINING in MySQL with key (mID, uID).
11-bit day numbers starting at 1=1/1/99 and ending at 2922=12/31/06.

Training (Uid,Mid,R,D) ordered by Uid:
    uID      mID       rating          day_number
    u1       m1        ru,m            du,m
    u1       m2
    .
    u480189  m17770
TRAINING in MySQL with key (uID, mID).
11-bit day numbers starting at 1=1/1/99 and ending at 2922=12/31/06.

[Figure: TRAINING as an M-U interaction cube (Rolodex Model, m\u). Rows m1 ... mh ... m17770 (17,770 movies),
columns u1 ... uk ... u480189 (or U2649429); cell (mh,uk) holds the rating rmhuk; figure label: 47B.
Bit-sliced TRAINING stores ratings and day_numbers as bit slices, e.g., Pu480189,0 and Pmh,2.]

The Program: Code Structure - the main modules

Module call structure: mpp-mpred.C -> mpp-user.C -> user-vote.C or movie-vote.C -> prune.C

mpp-mpred.C reads a Netflix PROBE file
Mi(Uid) and passes Mi and ProbeSupport(Mi) to
mpp-user.C to make predictions for each pair
(Mi,U), for each U ∈ ProbeSupport(Mi).
It can also call separate instances of mpp-user.C
for many Us, to be processed in parallel (governed
by the number of "slots" specified in the 1st code line).

mpp-user.C loops thru ProbeSupport(M), the ULOOP,
reading in the designated (matched-up) config file, then
writing out a (Mi,U) prediction for each U.
If the user-vote approach is used, mpp-user.C calls user-vote.C,
passing it (M, Support(M), U, Support(U)).
If the movie-vote approach is used, mpp-user.C calls movie-vote.C,
passing it (M, Support(M), U, Support(U)).

user-vote.C does the specified pruning by calling prune.C, then loops through the pruned set of
user voters, V, calculating a vote for each, combining those votes and returning a prediction_vote(M,U).

movie-vote.C does similarly.

What kind of pruning can be specified?
Again, all parameters are specified in a configuration file and the values
specified there are consumed at runtime using, e.g., the call:
mpp -i Input_.txt_file -c config -n 16
where Input_.txt_file is the input Probe subset file and 16 is the number of
parallel threads that mpp-mpred.C will generate (here, 16 movies are
processed in parallel, each sent to a separate instantiation of mpp-user.C).

A sample config file is given later.

There are up to 3 types of pruning used (for pruning down
support(M), the set of all users that rated M, or
pruning down support(U), the set of all movies rated by U):

1. correlation or similarity threshold based pruning
2. count based pruning
3. ID window based pruning

Under correlation or similarity threshold based pruning, and using support(M)=supM for example (pruning support(U) is
similar), we allow any function f: supM × supM → [0, HighValue] to be called a user correlation provided only that
f(u,u)=HighValue for every u in supM. Examples include Pearson_Correlation, Gaussian_of_Distance,
1_perp_Correlation (see appendix of these notes), relative_exact_rating_match_count (Tingda is using),
dimension_of_common_cosupport, and functions based on Standard Deviations.

Under count based pruning, we usually order by one of the correlations above first (into a multimap), then prune down to a
specified count of the most highly correlated.

Under ID window based pruning, we prune down to a window of userIDs within supM (or movieIDs within supU) by
specifying a leftside (a number added to U, so leftside is relative to U as a userID) and a width.

How does one specify prunings?
Again, in a file (this one is named config) there is a section
for specifying the parameters for user-voting and a separate
section for specifying parameters for movie-voting. E.g.,
for movie voting, at the bottom, there are 3 external
prunings possible (0 or more can be chosen):
1. an initial pruning of dimensions to be used (since
   dimensions are users, it prunes supM);
2. a pruning of movie voters, N (in supU);
3. a final pruning of dimensions (CoSupport(M,N)) for the
   specific movie voter, N.
E.g., parameters are specified for this final prune as follows:

[movie_voting Prune_Users_in_CoSupMN]
method     = UserCommonCoSupportPrune   specifies type of prune (see the note on method below)
leftside   = 0       specify leftside (from Uid) of an ID interval prune of supM
width      = 8000    specify the width of an ID interval prune of supM
mstrt      = 0       specify starting movie (intercept and slope) for N loop
mstrt_mult = 0.0
ustrt      = 0       specify starting user (intercept and slope) for V loop
ustrt_mult = 0.0
TSa        = -100    specify PearsonCorr threshold (a=Amal, meaning: use Amal's table lookup)
TSb        = -100    specify PearsonCorr threshold (b=bill, meaning: use bill's formula; note that if there
                     has been prior pruning this will have a different value than Amal's)
Tdvp       = -1      threshold "diff of vectors" population-based std_dev prune
Tdvs       = -1      threshold "diff of vectors" sample-based std_dev prune
Tvdp       = -1      threshold "vector of diffs" population-based std_dev prune
Tvds       = -1      threshold "vector of diffs" sample-based std_dev prune
TD         = -1      threshold (Gaussian of) Euclidean distance based prune
TP         = -1      threshold for (Gaussian of) 1perpendicular distance prune
PPm        = .1      exponent for (Gaussian of) 1perpendicular distance prune
TV         = -1      threshold (Gaussian of) a variation based prune
TSD        = -1      threshold std_dev based prune
Ch         = 1       picks ordering for count-based prune below: 1=Amal_Pearson, 2=Bill_Pearson, etc.
Ct         = 2       threshold for count based prune

Note on method: there are 3 types: UserPrune with a full range of possibilities; UserFastPrune with just
PearsonCorrelation pruning; CommonCoSupportPrune, which orders users, V, according to the size of their
CommonCoSupport with U only (note that this is a correlation of sorts too).

Note: all thresholds are for similarities, not distances; i.e., when we start with a distance we follow it
with the Gaussian to make it a similarity or correlation.

mpp-mpred.C

/** \file
 *
 * This contains the main entry point and contains the code for driving
 * the multi-process shared memory implementation of the vertical PTree
 * based predictor system.
 */

/* Standard includes. */
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <wait.h>
#include <sys/types.h>
#include <time.h>

/* Standard C++ includes. */
#include <fstream>
#include <iostream>
#include <vector>

/* Local C++ includes. */
#include "mppConfig.H"
#include "PredictionConfig.H"
#include "UserSet.H"
#include "MovieSet.H"

#include "mpp.h"

using namespace std;

/* Definition of structures static to this module. */
struct task_table {
    int pid;
    int movie;
    int predictions;
    time_t start;
};

/*
 * The following two global variables define the two sets of PTree's
 * which will be used to carry out the predictions.
 *
 * The UserSet of PTree's have user rating PTree's across the vertical
 * axis of the table. Each rating is encoded using three PTree's.
 * The MovieSet has movie rating PTree's across the vertical axis of
 * the table. Each movie is encoded using three PTree's.
 */
UserSet Users;
MovieSet Movies;

int topMovK = 5,
    verK    = 50;

bool use_pearson_movies = false;

/*
 * The minimum user correlation required to be eligible to participate
 * in voting.
 */
float Minimum_User_Correlation = 0.5;

float corData[17771];

unsigned short int supData[17771];
string probe;

/* External functions. */
extern int Mpred_User_Predict(mppConfig &, unsigned long int, vector <int> &, \
                              PTree &);

/**
 * Internal private function.
 *
 * This function prints the current status of the task table. It is
 * an encapsulation function for reducing the complexity of the
 * job_table function.
 *
 * In the case of either transaction a status table is printed out
 * which reflects the current progress of the simulation.
 *
 * \param max_slots The maximum number of subordinate processes
 *                  which will be managed.
 *
 * \param table     A pointer to the task table which is to
 *                  be changed.
 *
 * \param changed   The slot number in the task table which is
 *                  being updated.
 * \param reason    A character pointer to a description string
 *                  indicating why the table is being updated. */
extern void print_job_table(int max_slots, \
                            struct task_table const * const table, \
                            int const changed, char const * const reason)
{
    auto int entry;
    auto time_t now = time(NULL);

    fprintf(stdout, "Task status change: %s", ctime(&now));
    fputs("\tSlot\t PID\tMovie\tUsers\n", stdout);
    fputs("\t----\t-----\t-----\t-----\n", stdout);

    for (entry= 0; entry < max_slots; ++entry) {
        fprintf(stdout, "\t%-5d\t%5d\t%5d\t%5d", entry, \
                table[entry].pid, table[entry].movie, \
                table[entry].predictions);
        if ( entry == changed )
            fprintf(stdout, "\t<- %s\n", reason);
        else
            fputs("\n", stdout);
    }
    fputs("\n", stdout);
    return;
}

/**
 * Internal private function.
 *
 * This function maintains a table which correlates process ID's with
 * the movies they are processing, the total number of predictions
 * required per movie and the time required to process a movie.
 *
 * Depending on the value of the movie number argument this function
 * either stores the relationship or retrieves the movie associated
 * with the PID.
 *
 * In the case of either transaction a status table is printed out
 * which reflects the current progress of the simulation.
 *
 * \param max_slots    The maximum number of subordinate processes
 *                     which are under management.
 * \param pid          The process ID number.
 * \param movie_number A movie value of zero causes this function to locate
 *                     and return the PID of the subordinate slave process
 *                     which is processing the movie. A non-zero value
 *                     causes the PID to be stored in the relationship array.
 * \param predictions  This argument is only referenced when an update
 *                     is made to the task table. This argument is
 *                     the number of customer predictions to be made
 *                     for the movie being scheduled.
 * \return             No return values are defined.
 */
extern void job_table(int max_slots, int const pid, int const movie_number, \
                      int const predictions)
{
    auto char msg[50];
    auto int lp,
             changed = 0;
    auto time_t now = time(NULL);
    static int movie_count      = 0,
               prediction_count = 0;
    static bool first = true;

    /* Initialize the process table on the first call. */
    if ( first ) {
        size_t amt = max_slots * sizeof(struct task_table);
        table = (struct task_table *) malloc(amt);
        if ( table == NULL ) {
            fputs("Cannot allocate job table.\n", stderr);
            exit(1);
        }

        for (lp= 0; lp < max_slots; ++lp) {
            table[lp].pid         = 0;
            table[lp].movie       = 0;
            table[lp].predictions = 0;
            table[lp].start       = 0;
        }
        first = false;
    }

    /* Add a task to the table. */
    if ( movie_number != 0 ) {
        for (lp= 0; lp < max_slots; ++lp) {
            if ( table[lp].pid == 0 ) {
                changed               = lp;
                table[lp].pid         = pid;
                table[lp].movie       = movie_number;
                table[lp].predictions = predictions;
                table[lp].start       = now;

                print_job_table(max_slots, table, changed, \
                                "Started");
                fflush(stdout);
                return;
            }
        }
    }

    /* Remove a task from the table. */
    for (lp= 0; lp < max_slots; ++lp) {
        if ( table[lp].pid == pid ) {
            auto time_t run_time = time(NULL) - table[lp].start;
            auto float per_user = run_time;
            prediction_count += table[lp].predictions;

            snprintf(msg, sizeof(msg), "Completed: %lu " \
                     "[%.2f/user] secs.", run_time, \
                     per_user/table[lp].predictions);
            print_job_table(max_slots, table, lp, msg);

            table[lp].pid         = 0;
            table[lp].movie       = 0;
            table[lp].predictions = 0;
            table[lp].start       = 0;
            fprintf(stdout, "\tMovies: %5d\tPredictions: %d\n\n", \
                    ++movie_count, prediction_count);
            fflush(stdout);
            return;
        }
    }
}

/**
 * Main program starts here.
 */
int main(int argc, char **argv) {
    /* The following variable controls whether or not movie predictions
     * are to be run in parallel, ie. each in its own process. */
    auto bool have_input = false,
              single_threaded = true;

    char snbufr[10];

    int movie_count = 0;
    int max_process_slots,
        process_count = 0;

    pid_t pid;
    time_t run_start, t1, t2;

    string data_root = PTREEDATA"/";

    string corr_root = data_root + "mv_corr/co_mv_";
    string supp_root = data_root + "mv_supp/sp_mv_";

    string ptree_set_id = data_root + "nf_us_mv_pt";
    string ptree_set_idT = data_root + "nf_mv_us_pt";

    ifstream inFile1;
    ifstream inFile2;

    auto mppConfig config;

    /* Option parsing. */
    auto int gopt;
    while ( (gopt = getopt(argc, argv, "C:c:i:n:")) != EOF )
    {
        switch ( gopt )
        {
            case 'c':
                    fprintf(stderr, "%s: Cannot read " \
                            "standard configuration - " \
                            "%s\n", argv[0], optarg);
                    exit(1);
                }
                break;
            case 'C':
mpp-mpred.C5                 /** Load the rating data as two separate sets of PTree's. */
t1=time(NULL);
"cluster configuration - " \                           fputs("\tUser ptrees - ", stderr);
"%s\n", argv[0], optarg);                              if ( !Users.load_binary() ) {
}                                                                        return 1;
break;                                                             }
case 'i':                                                               fputs("identities - ", stderr);
have_input = true;                                                 if ( !Users.load_identities() ) {
break;                                                                   return 1;
case 'n':                                                               }
max_process_slots = atoi(optarg);
break;                                                             fputs("\tMovie ptrees - ", stderr);
}                                                                           if ( !Movies.load_binary() ) {
return 1;
if ( !have_input ) {                                                              }
fprintf(stderr, "%s: No input file specified.\n", argv[0]);                 fputs("completed.\n", stderr);
return 1; }
t2=time(NULL);
if ( !config.is_standard_config() && !config.is_cluster_config() ) {              fprintf(stderr, "Data load completed, time = %u\n\n", t2 - t1);
fprintf(stderr, "%s: No configuration specified.\n", argv[0]);
return 1;                                }                                 ifstream inFile;
fprintf(stderr, "%s: Vertical Rating Predictor - %s\n\n", argv[0], VERSION);      inFile.open(probe.c_str() );
fputs("Data files:\n", stderr);
fprintf(stderr, "\tid:\t%s\n", ptree_set_id.c_str());                             char str[100];
fprintf(stderr, "\tidT:\t%s\n", ptree_set_idT.c_str());                           int last_movie_id = 0,
fprintf(stderr, "\tsupp:\t%s*\n", supp_root.c_str());                                new_movie_id = 0;
fprintf(stderr, "\tcorr:\t%s*\n\n", corr_root.c_str());                           bool last_movie = true;
fprintf(stderr, "\tInput:\t%s\n\n", probe.c_str());
else                                                                              str1.erase(str1.size()-1);
fprintf(stderr, "Mode: %d way multi-processor\n", \                        new_movie_id = atoi(str1.c_str());
max_process_slots);
if ( config.is_standard_config() ) {                                              /* Start of loop over movies begins here. */
auto PredictionConfig *pcfg = config.get_standard_config();                run_start = time(NULL);
fputs("\nPrediction configuration:\n", stderr);
pcfg->print(stderr);     }                                                 for(int movie_cnt= 0; !inFile.eof(); movie_cnt++) {
vector <int> probeUs;
++movie_count;
last_movie_id = new_movie_id;
mpp-mpred.C6                                                                /* Wait for any child processes to complete. */
if ( process_count == max_process_slots ) {
last_movie = true;                                                                                                                       int status;
pid = wait(&status);
/* Check to see if predictions of movies are                                       if ( pid == -1 ) {
while( last_movie && (inFile>>str) ) {                       * to be single-threaded. If so run the                                           perror("FPP wait failed.");
string str1(str);                                       * movie prediction synchronously and then                                        exit(1);
if (str1.at(str1.size() - 1) == ':') {                  * skip to the next movie. */                                               }
str1.erase(str1.size() - 1);                     if ( single_threaded ) {
new_movie_id = atoi(str1.c_str());                      auto time_t now = time(NULL);                                        --process_count;
last_movie = false;                                     auto float start = now;                                              job_table(max_process_slots, pid, 0, 0);
}                                                             fprintf(stderr, "Starting movie: %d, " \
else                                                               "Users: %d, ", M, probeUs.size());                              if ( WIFEXITED(status) == 0 ) {
probeUs.push_back(atoi(str1.c_str()));                                                                                             fprintf(stderr, "\tError in movie, " \
}                                                                  Mpred_User_Predict(config, M, probeUs, user_list);                              "status = %d\n",          \
WEXITSTATUS(status));
/* M is the movie to be predicted. */                              now = time(NULL);                                                    }
t1 = time(NULL);                                                   fprintf(stderr, "Completed: %2.0f "          \                  }
unsigned long int M = last_movie_id - 1;                                "[%.2f/user] secs.\n\n", now - start, \               }
(now - start)/probeUs.size());                        /* Capture all remaining slave processes. */
/* read the pearson correlations for movies                        continue;                                                  do {
* NOTE using pearson not Perp                              }                                                                      int status;
* Try to find best correlated movie set for                                                                                       pid = wait(&status);
* pmv                                                      /* Start prediction for movie pmv for given                            if ( pid == -1 ) {
*/                                                          * users in probeUser set. Fork a new process and                            fputs("No processes left.\n", stderr);
snprintf(snbufr, sizeof(snbufr), "%d", last_movie_id);       * generate customer predictions in this new fork. */                        process_count = 0;
string sn(snbufr);                                          if ( process_count < max_process_slots ) {                                   continue;
pid = fork();                                                    }
string outCorr1 = corr_root + sn + ".bin";                        if ( pid == -1 ) {                                               --process_count;
inFile1.open( outCorr1.c_str() );                                       perror("FPP fork failed.");                                job_table(max_process_slots, pid, 0, 0);
exit(1);
string outSupp1 = supp_root + sn + ".bin";                        }                                                                  if ( WIFEXITED(status) == 0 ) {
inFile2.open( outSupp1.c_str() );                                                                                                          fprintf(stderr, "\tError in movie, " \
/* Child - process movie and exit. */                                        "status = %d\n",          \
inFile1.read(reinterpret_cast<char*>(&corData), \                  if ( pid == 0 ) {                                                            WEXITSTATUS(status));
17771*sizeof(float));                                           Mpred_User_Predict(config, M, probeUs, \                    }
inFile2.read(reinterpret_cast<char*>(&supData), \                                    user_list);                             } while ( process_count > 0 );
17771*sizeof(short int));                                       _exit(0);                                           inFile.close();
inFile1.close();                                                   }                                                         fputs("\nPredictions completed.\n", stderr);
inFile2.close();                                                                                                             fprintf(stderr, "\tMovies: %d\n", movie_count);
/* Parent - update task table. */                      fprintf(stderr,"\tTime: %d\n", time(NULL) -run_start);
/* Get the list of users who have rated this movie. */             ++process_count;                                          return 0; }
auto PTree user_list = Movies.get_users(M);
job_table(max_process_slots, pid, M, probeUs.size()); }
mpp-user.C1
/** \file
 * This file contains the driver code which
 * implements predictions of recommendations. */

/* Program compilation defines follow.
 *
 * These defines enable and control generation of movie specific logfiles.
 * The MOVIE_LOGGING define needs to be enabled to turn on generation of
 * logfiles. Other defines increase the amount of output generated.
 */
#if 0
#define MOVIE_LOGGING
#endif
#if 0
#define MEMORY_LOGGING
#endif
#if 0
#define VOTE_LOGGING
#endif

// Include files.
#include <stdio.h>
#include <time.h>
// Standard C++ includes.
#include <fstream>
#include <iostream>
#include <vector>
#include <map>
#include <utility>
// Local C++ include files.
#include <PTreeSet.H>
#include "mppConfig.H"
#include "UserSet.H"
#include "MovieSet.H"
/* Standard C include files. */
#include "mpp.h"
using namespace std;
// External variables.
extern int topMovK, verK;
extern bool use_pearson_movies;
extern float Minimum_User_Correlation;
extern float corData[17771];
extern unsigned short int supData[17771];
extern string probe;

// LOGGING: creates and opens the logfile if logging is enabled; otherwise returns NULL.
#if defined(MOVIE_LOGGING)
static inline FILE * open_logfile(string movie_number) {
    auto string logname("./Output/" + probe.substr(probe.find_last_of('/') + 1) + \
                        "_" + movie_number + ".log"); return(fopen(logname.c_str(), "w+")); }
#else
static inline FILE * open_logfile(string movie_number) {return NULL;}
#endif

// Enabling causes nearest-neighbor user voting to print for each prediction.
#if defined(VOTE_LOGGING)
static inline void print_votes( FILE *logfile, int user, double vote, double weight, \
                                double vRt, double VBar, double Ub, double voter_corr) {
    if ( logfile == NULL ) return;
    fprintf(logfile, "\t\tVote: %.2f\tWeight: %.2f\tUser: %d\n", vote, weight, user);
    fprintf(logfile, "\t\t\tvRt: %.2f\tVbar: %.2f\tUb: %.2f\n", vRt, VBar, Ub);
    fprintf(logfile, "\t\t\tCor: %.2f\n\n", voter_corr); return; }
#else
static inline void print_votes( FILE *logfile, int user, double vote, double weight,\
                                double vRt, double VBar, double Ub, double voter_corr){ return; }
#endif

// Enabling prints the amount of memory consumed relative to a given starting point.
#if defined(MEMORY_LOGGING)
static inline void log_memory(FILE *logfile, const char *fmt, void *start) {
    fprintf(logfile, fmt, (char *) sbrk(0) - (char *) start); return; }
#else
static inline void log_memory(FILE *logfile __attribute__ ((unused)), \
                              const char *fmt __attribute__ ((unused)), \
                              void *start __attribute__ ((unused))) { return; }
#endif

extern int Mpred_User_Predict (mppConfig &config, unsigned long int M, \
                               vector <int> & user_list, PTree & M_support)
{
    auto void *movie_memory_start;
    auto char snbufr[10];
    auto time_t start_time = time(NULL);
    auto unsigned long int U;
    auto FILE *predictions;
    auto FILE *logfile;
    auto PredictionConfig *pcfg = NULL;
mpp-user.C2
    // OPEN log and prediction files.
    snprintf(snbufr, sizeof(snbufr), "%lu", Movies.get_identity(M));
    string sn(snbufr);
    string outPredName("./Output/"+probe.substr(probe.find_last_of('/')+1)   \
                       + "_" + sn + ".predict");
    logfile = open_logfile(sn);

    if ( (predictions = fopen(outPredName.c_str(), "w+")) == NULL ) {
        fputs("Cannot open prediction file.\n", stderr);
        return 0;
    }

    fprintf(predictions, "%lu:\n", Movies.get_identity(M));
    if ( logfile != NULL ) fflush(logfile);

    /*
     * Write descriptor to output logfile and the number of the movie
     * to the prediction file.
     */
    if ( logfile != NULL )
        fprintf(logfile, "\nBeginning movie: %5d\tUsers: %d\t"       \
                "PID: %d\n", Movies.get_identity(M), user_list.size(),\
                getpid());

    if ( logfile != NULL )
        movie_memory_start = sbrk(0);

    /* Select eligible clusters for this movie. */
    if ( config.is_cluster_config() )
        config.select_clusters(Movies, M);

    /* Loop over users starts here. */
    for (unsigned int user= 0; user < user_list.size(); ++user) {
        auto double vote = DEFAULT_VOTE,
                    VOTE = DEFAULT_VOTE,
                    vote_wt = 0.0,
                    VOTE_wt = 0.0;

        U = Users.get_index(user_list[user]);
        auto PTree supportM(M_support),
                   supportU = Users.get_movies(U);
        supportM.clearbit(U);
        supportU.clearbit(M);

        if ( supportM.get_count() < 1) {
            fprintf(predictions, "%.2f\n", vote);
            fflush(predictions);
            continue;
        }

        /* Get configuration information. */
        if ( config.is_standard_config() )
            pcfg = config.get_standard_config();
        if ( config.is_cluster_config() ) {
            pcfg = config.select_configuration(Users, U);
            config.show_selection(logfile);
        }

        /* Config file needs: (mpp-user part)
         * External Pruning:
         * 1. Reset support in movie-vote call: yes, no.
         *
         * Voting selection:
         * 2. Set vote_wt: 0 <= vote_wt <= 1
         *             (VOTE_wt = 1 - vote_wt)
         * Forcing in Range:
         * 5. Select 0, 1 or 2 force_vote_in_ranges:
         *    user-vote movie-VOTE
         */

        /* User voting.*/
        if ( pcfg->do_user_voting() )
            vote = user_vote(pcfg, M, supportM, U, supportU);
        //if ( vote < 1 ) vote = 1; else if ( vote > 5 ) vote = 5;

        /* Movie voting. */
        if ( pcfg->do_movie_voting() )
            VOTE = movie_vote(pcfg, M, supportM, U, supportU);
        //if ( VOTE < 1 ) VOTE = 1; else if ( VOTE > 5 ) VOTE = 5;

        /* Set user_vote_weight here. */
        vote_wt = pcfg->get_user_vote_weight();
        VOTE_wt = 1.0 - vote_wt;
        vote = (vote * vote_wt + VOTE * VOTE_wt ) / \
               (vote_wt + VOTE_wt);
mpp-user.C3
//sumSCor=sumSCor/countdimMN; sumPCor=sumPCor/countdimMN; sumDCor=sumDCor/countdimMN; sumdimMN=sumdimMN/countdimMN;
//sumsCor=sumsCor/countdimUV; sumpCor=sumpCor/countdimUV; sumdCor=sumdCor/countdimUV; sumdimUV=sumdimUV/countdimUV;
// vote=(vote*sumdimUV + VOTE*sumdimMN)/(sumdimUV+sumdimMN);
//auto double red=.4; vote=(vote*exp(-pow(Vsdp,2))+VOTE*exp(-red*pow(Nsdp,2)))/(exp(-pow(Vsdp,2))+exp(-red*pow(Nsdp,2)));
//auto double red=1.0; vote=(vote*exp(-pow(Vsdp,2))+VOTE*red*exp(-pow(Nsdp,2)))/(exp(-pow(Vsdp,2))+red*exp(-pow(Nsdp,2)));
// if ( sumsCor>sumSCor + 0.1 ){ vote=( vote*sumdimUV*(2+sumsCor)+VOTE*sumdimMN*(2+sumSCor) )/( sumdimUV*(2+sumsCor)+sumdimMN*(2+sumSCor)); }
// vote=VOTE;
// if ( Nsdp < 2.0 && Vsdp > 2.0 && sumSCor + .5 > sumsCor ) vote=VOTE;
// if ( Nsdp < 0.5 && Vsdp > 2 ) vote=VOTE;
// vote=(vote*exp(-pow(Vsdp,2) ) + VOTE*exp(-pow(Nsdp,2)))/(exp(-pow(Vsdp,2)) + exp(-pow(Nsdp,2))); //.937465(95)

        // Final output occurs here.
        if ( (vote < 1) && (vote != DEFAULT_VOTE) )
            vote = 1;
        if ( (vote > 5) && (vote != DEFAULT_VOTE) )
            vote = 5;         // force vote into range

        fprintf(predictions, "%.2f\n", vote);
        fflush(predictions);

        if (logfile != NULL)
            fprintf(logfile,"\tPrediction #%d: %0.1f\tuser: %u\t"      \
                    "config: %s\n\n", user, vote, Users.get_identity(U), \
                    pcfg->get_name());

    } // ULOOP end

    if (logfile!=NULL) {
        float total_time = time(NULL) - start_time;
        fprintf(logfile,"Ending movie: %d\tTime: %.2f [%.2f/user] " \
                "secs.\t", Movies.get_identity(M), total_time, \
                (float) (total_time/user_list.size()));
        log_memory(logfile, "Memory: %d\n", movie_memory_start);
        fputs("\n", logfile);
        fclose(logfile);
    }

    fclose(predictions);

    return 0;
} // MLOOP end
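
/*
 * Illustrative sketch, not part of the original source: the final
 * combination step of the user loop above, isolated as one function.
 * It blends the user-based vote and the movie-based VOTE using the
 * weights vote_wt and (1 - vote_wt), then forces the result into the
 * 1..5 rating range unless it still equals the default value.
 * The name blend_and_clamp and the default_vote parameter are
 * hypothetical; the value of DEFAULT_VOTE is not shown in this listing,
 * so the caller has to supply it.
 */
static inline double blend_and_clamp(double vote, double VOTE,
                                     double vote_wt, double default_vote)
{
    const double VOTE_wt = 1.0 - vote_wt;

    /* The two weights sum to 1; the division is kept only to mirror
     * the expression used in the driver code. */
    double blended = (vote * vote_wt + VOTE * VOTE_wt) / (vote_wt + VOTE_wt);

    /* Force the vote into range, but leave the default value untouched. */
    if ( blended != default_vote ) {
        if ( blended < 1 ) blended = 1;
        if ( blended > 5 ) blended = 5;
    }
    return blended;
}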
User-vote.C1
/** \file   This file contains the implementation of the user voting function. */

/* Include files. */
#include <stdio.h>
#include <math.h>
#include <PTree.H>
#include "MovieSet.H"
#include "UserSet.H"
#include "mppConfig.H"
#include "PredictionConfig.H"
#include "mpp.h"

/* Config file needs:   (user-vote part)
*      uCor Internal Pruning:
*
* 1. Select 0 or 1 of                            dvCorp,    dvCors,     vdCorp,    vdCors,   pCor,   dCor,     sCor
* 1.1 For selected in 1, set Threshold:          dvThrp,    dvThrs,     vdThrp,    vdThrs,   pThr,   dThr,     sThr,
*                      Threshold defaults are:   0          0           0          0         0       0         0
*
*
*      uCor vote weighting:                      (Default uCor=1. By selecting 1 of these, we reset uCor value to it.)
* 2. Select 0 or 1 of                            dvCorp, dvCors, vdCorp, vdCors, pCor, dCor, sCor
*
*      Standard Deviation Internal Pruning:      (population/sample; difference_of_vectors/vector_of_differences)
*
* 3. Select 0 or more of:                        dUVsdp,              dUVsds,           Vsdp_Usdp,           Vsds_Usds
* 3.1 Foreach selected in 3, set Threshold:      dUVsdpThr,           dUVsdsThr,        Vsdp_UsdpThr,        Vsds_UsdsThr
*                      Threshold defaults are:   0                    0                 0                    0
*
* 3.2 Foreach selected in 3, set pow exp:        dUVsdpExp,           dUVsdsExp,        Vsdp_UsdpExp,        Vsds_UsdsExp
*                 Power Exponent defaults are:   -1                   -1                -1                   -1
*
*      External Pruning:
* 4. Select 0 or more of:                        Prune_Movies_In_SupU,        Prune_Users__In_SupM,      Prune_Movies_In_CoSupUV
* 4.1 Foreach selected in 4, select 1 of:        Prune,                       FastPrune,                 CommonCoSupportPrune
*
* 4.2 Reset non-pruned support in 2nd:           yes,      no.
*
* 4.3 Foreach selected in 4, set parameter:      mstrt, ustrt, TSa, TSb, Tdvp,Tdvs,Tvdp,Tvds,TD,TP,PPm,TV,TSD,Ch, Ct
*                Prune Parameter defaults are:   0      0      -100 -100 -1   -1   -1   -1   -1 -1 .1 -1 -1 1 no def
*
*      Forcing in Range:
* 5. Select 0 or more force_vote_in_range:       in_Voter_LOOP                after_Voter_LOOP               before_return
*/
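
/*
 * Illustrative sketch, not part of the original source: how the
 * statistics named in the configuration comment above (the sample
 * standard deviations, the standard deviation of the rating
 * differences, and the Pearson correlation) can be derived from
 * running sums over the dm movies that users U and V have both rated.
 * The field names mirror the accumulators used in user_vote(); the
 * struct and function names are hypothetical. The Pearson helper is
 * the textbook form with a zero-denominator guard, which is an
 * assumption and not the .0001-offset expression used for sCor.
 */
#include <cmath>

struct CoRatingSums {
    double dm;          /* number of co-rated movies            */
    double smU, smV;    /* sum of U's ratings, sum of V's       */
    double UU, UV, VV;  /* sums of products over co-rated movies */
};

/* Sample standard deviation from a sum and a sum of squares; needs dm > 1. */
static inline double sample_sd(double sum_sq, double sum, double dm)
{
    const double mean = sum / dm;
    return std::sqrt((sum_sq - dm * mean * mean) / (dm - 1));
}

/* Sample standard deviation of the differences V - U; needs dm > 1. */
static inline double diff_sample_sd(const CoRatingSums &s)
{
    const double dsSq = s.VV - 2 * s.UV + s.UU;   /* sum of (V-U)^2 */
    const double d    = (s.smV - s.smU) / s.dm;   /* Vb - Ub        */
    return std::sqrt((dsSq - s.dm * d * d) / (s.dm - 1));
}

/* Pearson correlation of U and V; returns 0 when either variance is 0. */
static inline double pearson(const CoRatingSums &s)
{
    const double Ub  = s.smU / s.dm, Vb = s.smV / s.dm;
    const double num = s.UV - s.dm * Ub * Vb;
    const double den = std::sqrt(s.UU - s.dm * Ub * Ub) *
                       std::sqrt(s.VV - s.dm * Vb * Vb);
    return den > 0 ? num / den : 0.0;
}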
User-vote.C2
/**
 * Public function.
 * This function implements user voting.
 *
 * \param pcfg      A pointer to the class containing the parameters
 *                  which configure the voting.
 * \param M         The movie number for which a prediction is to be
 *                  made.
 * \param supportM  The PTree identifying the support for the movie
 *                  to be predicted.
 * \param U         The identity number of the user for which a
 *                  prediction is to be made.
 * \param supportU  The PTree identifying the support for the user
 *                  for whom a prediction is being made.
 * \return          The recommended prediction.
 */

extern double user_vote(PredictionConfig *pcfg, unsigned long int M, \
                        PTree & supportM,      unsigned long int U, \
                        PTree & supportU)
{
    /* Enabled for boundary based prediction revisions. */
#if 0
    auto double z0IP55=0, z0IP44=0, z0IP33=0, z0IP22=0, z0IP11=0,
                z0IP15=0, z0IP14=0, z0IP13=0, z0IP12=0, z0IP51=0,
                z0IP41=0, z0IP31=0, z0IP21=0, z0IP25=0, z0IP24=0,
                z0IP23=0, z0IP52=0, z0IP42=0, z0IP32=0, z0IP35=0,
                z0IP34=0, z0IP53=0, z0IP43=0, z0IP45=0, z0IP54=0;
#endif

    auto double vote = DEFAULT_VOTE,
                vote_sum = 0,
                vote_cnt = 0;

    auto double Vb,
                Ub,
                dsSq,
                uCor = 1;

    struct pruning *internal_prune;
    struct external_prune *external_prune;

    auto PTree supM = supportM,
               supU = supportU;
    supM.clearbit(U);
    supU.clearbit(M);

    /* External pruning: PRUNE MOVIES supU */
    external_prune = pcfg->get_user_Prune_Movies_in_SupU();
    if ( external_prune->enabled ) {
        if( supU.get_count() > external_prune->params.Ct )
            do_pruning(external_prune, M, U, supM, supU);
        supM.clearbit(U);
        supU.clearbit(M);
        if( (supM.get_count() < 1) || (supU.get_count() < 1) )
            return vote;
    }

    /* Reset user support if requested. */
    if ( pcfg->reset_user_support() ) {
        supM = supportM;
        supM.clearbit(U);
    }

    /* External pruning: Prune Users supM */
    external_prune = pcfg->get_user_Prune_Users_in_SupM();
    if ( external_prune->enabled ) {
        if ( supM.get_count() > external_prune->params.Ct )
            do_pruning(external_prune, M, U, supM, supU);
        supM.clearbit(U);
        supU.clearbit(M);
        if( (supM.get_count() < 1) || (supU.get_count() < 1) )
            return vote;
    }

    /* VN: VLOOP strt (Vs are user voters)*/
    auto unsigned long long int *supMlist = supM.get_indexes();

    for (unsigned long long int v= 0; v < supM.get_count(); ++v) {
        auto unsigned long long int V = supMlist[v];

        auto double MV = Users.get_rating(V, M) - 2,
                    max = 0,
                    smV = 0, smU = 0,
                    UU = 0, UV = 0, VV = 0,
                    dm;
        auto PTree csUV = supU & Users.get_movies(V);
        csUV.clearbit(M);
        dm = csUV.get_count();
        if( dm < 1) continue;
User-vote.C3
        /* turn on only if doing Inner-Product Boundary-Based prediction revisions */
#if 0
        auto double      S1=0, S2=0, S3=0, S4=0, S5=0,
                         C1=0, C2=0, C3=0, C4=0, C5=0,
                         A1=0, A2=0, A3=0, A4=0, A5=0,
                         S11=0, S22=0, S33=0, S44=0, S55=0,
                         C11=0, C22=0, C33=0, C44=0, C55=0,
                         A11=0, A22=0, A33=0, A44=0, A55=0,
                         smN=0, smM=0, NN=0, MN=0, MM=0;
#endif
        /* External pruning: PRUNE MOVIES CoSupUV */
        external_prune = pcfg->get_user_Prune_Movies_in_CoSupUV();
        if ( external_prune->enabled ) {
            if( csUV.get_count() > external_prune->params.Ct )
                do_pruning(external_prune, M, U, supM, csUV);
            csUV.clearbit(M); supM.clearbit(U);
            dm = csUV.get_count();
            if( dm < 1 ) continue;
        }

        /* VN: NLOOP strt (Ns are movie vector_space_dimensions) */
        auto unsigned long long int *csUVlist = csUV.get_indexes();
        for (unsigned long long int n= 0; n < csUV.get_count(); ++n) {
            auto unsigned long long int N = csUVlist[n];
            auto double NU = Users.get_rating(U, N) - 2,
                        NV = Users.get_rating(V, N) - 2;
            if( pow(NU-NV, 2) > max) max = pow(NU-NV, 2);
            smV += NV; smU += NU;
            UU += NU * NU; UV += NU * NV; VV += NV * NV;
            //turn on only if doing Inner-Product Boundary-Based prediction revisions
#if 0
            if(NU==1&&NV>0){S1+=NV;++C1;}else{
            if(NU==2&&NV>0){S2+=NV;++C2;}else{
            if(NU==3&&NV>0){S3+=NV;++C3;}else{
            if(NU==4&&NV>0){S4+=NV;++C4;}else{
            if(NU==5&&NV>0){S5+=NV;++C5;} }}}}
#endif
        }

        Vb = smV / dm;         Ub = smU / dm;
        dsSq = VV - 2*UV + UU;
        vote = MV - Vb + Ub;

        /* SAMPLE-statistic-based pruning through early exit. */
        if( dm > 1) {
            /* method dUVsds */
            internal_prune = pcfg->get_internal_prune(user_dUVsds);
            if ( internal_prune->enabled ) {
                auto double dUVsds,
                            thr = internal_prune->threshold,
                            expnt = internal_prune->exponent;
                dUVsds = pow((dsSq-dm*(Vb-Ub)*(Vb-Ub))/(dm-1),.5);
                if( dUVsds > (thr * pow(dm, expnt)) ) continue;
            }
            /* method Usds_Vsds. NO exponent. */
            internal_prune = pcfg->get_internal_prune(user_Vsds_Usds);
            if ( internal_prune->enabled ) {
                auto double Usds, Vsds, thr=internal_prune->threshold;
                Usds = pow((UU-dm*Ub*Ub)/(dm-1), 0.5);
                Vsds = pow((VV-dm*Vb*Vb)/(dm-1), 0.5);
                if( Vsds > (thr * Usds) ) continue; }
            /* e.g., -10 is exponent. */
            /* e.g., 0 in if statement is threshold. */
            internal_prune = pcfg->get_internal_prune(user_dvCors);
            if ( internal_prune->enabled ) {
                auto double dvCors, Usds, Vsds,
                            thr = internal_prune->threshold,
                            expnt = internal_prune->exponent;
                Usds = pow((UU-dm*Ub*Ub)/(dm-1), 0.5);
                Vsds = pow((VV-dm*Vb*Vb)/(dm-1), 0.5);
                dvCors = exp(expnt * (Vsds-Usds)*(Vsds-Usds));
                if ( dvCors < thr ) continue;
                if ( internal_prune->weight ) uCor = dvCors;   }
            internal_prune = pcfg->get_internal_prune(user_vdCors);
            if ( internal_prune->enabled ) {
                auto double vdCors, dUVsds,
                            thr = internal_prune->threshold,
                            expnt = internal_prune->exponent;
                dUVsds=pow((dsSq-dm*(Vb-Ub)*(Vb-Ub))/(dm-1),.5);
                vdCors = exp(expnt * dUVsds * dUVsds);
                if ( vdCors < thr ) continue;
                if ( internal_prune->weight ) uCor=vdCors; } }
/* User-vote.C4 */
/* POPULATION-statistics-based pruning through early exit. */
if( dm > 0 ) {
    internal_prune = pcfg->get_internal_prune(user_dUVsdp);
    if ( internal_prune->enabled ) {
        auto double dUVsdp,
                    thr = internal_prune->threshold,
                    expnt = internal_prune->exponent;
        dUVsdp=pow(dm*dsSq-(smV-smU)*(smV-smU),.5)/dm;
        if ( dUVsdp > thr * pow(dm, expnt) ) continue;
    }
    /* method Usds_Vsds */
    // Usdp=pow(dm*UU-smU*smU,.5)/dm;
    // Vsdp=pow(dm*VV-smV*smV,.5)/dm;
    // if( Vsdp > 0.5 * Usdp )continue;
    // Threshold is 0.5
    // No exponent
    internal_prune = \
        pcfg->get_internal_prune(user_Vsdp_Usdp);
    if ( internal_prune->enabled ) {
        auto double Usdp, Vsdp,
                    thr = internal_prune->threshold;
        Usdp = pow(dm*UU - smU*smU, 0.5) / dm;
        Vsdp = pow(dm*VV - smV*smV, 0.5) / dm;
        if ( Vsdp > thr * Usdp ) continue;
    }

    // e.g., Threshold: 0.9
    // e.g., Exponent: -10
    // dvCorp=exp(-10 *(Vsdp-Usdp) * (Vsdp-Usdp));
    // if ( dvCorp < .9 ) continue;
    // uCor=dvCorp;
    internal_prune = pcfg->get_internal_prune(user_dvCorp);

    if ( internal_prune->enabled ) {
        auto double dvCorp, Usdp, Vsdp,
                    thr = internal_prune->threshold,
                    expnt = internal_prune->exponent;
        Usdp = pow(dm*UU - smU*smU, 0.5) / dm;
        Vsdp = pow(dm*VV - smV*smV, 0.5) / dm;
        dvCorp = exp(expnt * (Vsdp-Usdp)*(Vsdp-Usdp));
        if ( dvCorp < thr ) continue;
        if ( internal_prune->weight ) uCor = dvCorp;
    }

    internal_prune = pcfg->get_internal_prune(user_vdCorp);
    if ( internal_prune->enabled ) {
        auto double vdCorp, dUVsdp,
                    thr = internal_prune->threshold,
                    expnt = internal_prune->exponent;

        dUVsdp = pow(dm*dsSq-(smV-smU)*(smV-smU), .5) \
                 / dm;
        vdCorp = exp(expnt * dUVsdp * dUVsdp);
        if ( vdCorp < thr) continue;
        if ( internal_prune->weight )
            uCor = vdCorp;
    }
}

/* OTHER Correlation pruning
 * (pearson=s, pureshift=p, distance=d) */
internal_prune = pcfg->get_internal_prune(user_sCor);
if ( internal_prune->enabled ) {
    auto double sCor,
                thr = internal_prune->threshold;

    sCor = (UV - dm*Ub*Vb)/(.0001 + \
           (pow((UU-dm*pow(Ub,2)),0.5))* \
           (.0001+pow((VV-dm*pow(Vb,2)),.5)));
    if ( sCor < thr ) continue;
    if ( internal_prune->weight ) uCor = sCor;
}

internal_prune = pcfg->get_internal_prune(user_pCor);
if ( internal_prune->enabled ) {
    auto double OnePDS,
                pCor = -1,
                thr = internal_prune->threshold,
                expnt = internal_prune->exponent;

    OnePDS = dsSq - dm*pow(Vb-Ub, 2);
    if ( max > 0 )
        pCor=exp(expnt*OnePDS/(pow(max,.75)*pow(dm,.5)));
    if ( pCor < thr ) continue;
    if ( internal_prune->weight ) uCor = pCor;
}
/* User-vote.C5 */
internal_prune = pcfg->get_internal_prune(user_dCor);
if ( internal_prune->enabled ) {
auto double dCor, OnePDS,
thr = internal_prune->threshold;

OnePDS = dsSq - dm*pow(Vb-Ub, 2);
dCor = exp(-dsSq / 100);
if ( dCor < thr ) continue;
if ( internal_prune->weight ) uCor = dCor;
}
/* Turn on for boundary-based prediction revisions. */
#if 0
if(C1>0&&C2+C3+C4+C5>0) {A1=S1/C1; A11=(S2+S3+S4+S5)/(C2+C3+C4+C5); z0IP11+=(A1-((A1+A11)/2))*(MV-((A1+A11)/2));}
if(C1>0&&C2>0)         {A1=S1/C1; A2=S2/C2;                z0IP12+=(A1-((A1+A2 )/2))*(MV-((A1+A2 )/2));}
if(C1>0&&C3>0)         {A1=S1/C1; A3=S3/C3;                z0IP13+=(A1-((A1+A3 )/2))*(MV-((A1+A3 )/2));}
if(C1>0&&C4>0)         {A1=S1/C1; A4=S4/C4;                z0IP14+=(A1-((A1+A4 )/2))*(MV-((A1+A4 )/2));}
if(C1>0&&C5>0)         {A1=S1/C1; A5=S5/C5;                z0IP15+=(A1-((A1+A5 )/2))*(MV-((A1+A5 )/2));}
z0IP51=-z0IP15; z0IP41=-z0IP14; z0IP31=-z0IP13; z0IP21=-z0IP12;

if(C2>0&&C1+C3+C4+C5>0) {A2=S2/C2; A22=(S1+S3+S4+S5)/(C1+C3+C4+C5); z0IP22+=(A2-((A2+A22)/2))*(MV-((A2+A22)/2));}
if(C2>0&& C3>0)       {A2=S2/C2; A3=S3/C3;         z0IP23+=(A2-((A2+A3 )/2))*(MV-((A2+A3 )/2));}
if(C2>0&& C4>0)       {A2=S2/C2; A4=S4/C4;         z0IP24+=(A2-((A2+A4 )/2))*(MV-((A2+A4 )/2));}
if(C2>0&& C5>0)       {A2=S2/C2; A5=S5/C5;         z0IP25+=(A2-((A2+A5 )/2))*(MV-((A2+A5 )/2));}
z0IP32=-z0IP23; z0IP42=-z0IP24; z0IP52=-z0IP25;

if(C3>0&&C1+C2+C4+C5>0) {A3=S3/C3; A33=(S1+S2+S4+S5)/(C1+C2+C4+C5); z0IP33+=(A3-((A3+A33)/2))*(MV-((A3+A33)/2));}
if(C3>0&& C4>0)       {A3=S3/C3; A4=S4/C4;         z0IP34+=(A3-((A3+A4 )/2))*(MV-((A3+A4 )/2));}
if(C3>0&& C5>0)       {A3=S3/C3; A5=S5/C5;         z0IP35+=(A3-((A3+A5 )/2))*(MV-((A3+A5 )/2));}
z0IP43=-z0IP34; z0IP53=-z0IP35;

if(C4>0&&C1+C2+C3+C5>0) {A4=S4/C4; A44=(S1+S2+S3+S5)/(C1+C2+C3+C5); z0IP44+=(A4-((A4+A44)/2))*(MV-((A4+A44)/2));}
if(C4>0&& C5>0)   {A4=S4/C4; A5=S5/C5;             z0IP45+=(A4-((A4+A5 )/2))*(MV-((A4+A5 )/2));}
z0IP54=-z0IP45;

if(C5>0&&C1+C2+C3+C4>0) {A5=S5/C5; A55=(S1+S2+S3+S4)/(C1+C2+C3+C4);             z0IP55+=(A5-((A5+A55)/2))*(MV-((A5+A55)/2));}

//auto double MU = Users.get_rating(U,M)-2; fprintf(stderr,"MU=%1.0f %8.1f %8.1f    %8.1f \n", MU,z0IP55,z0IP11,z0IP51);
//auto double MU = Users.get_rating(U,M)-2; fprintf(stderr,"MU=%1.0f %5.1f %5.1f %5.1f %5.1f %5.1f %5.1f %5.1f %5.1f %5.1f %5.1f %5.1f %5.1f %5.1f %5.1f \
%5.1f\n",MU,z0IP11,z0IP22,z0IP33,z0IP44,z0IP55,z0IP12,z0IP13,z0IP14,z0IP15,z0IP23,z0IP24,z0IP25,z0IP34,z0IP35,z0IP45);
#endif
if ( uCor > 0 ) {
vote_sum += vote*uCor;
vote_cnt += uCor;
/* User-vote.C6 */
} else continue;
/* Check and implement forcing of vote in the user loop. */
if ( pcfg->user_vote_force_in_loop() ) {
if( (vote < 1) && (vote != DEFAULT_VOTE) ) vote = 1;
if( (vote > 5) && (vote != DEFAULT_VOTE) ) vote = 5;
}
}
if ( vote_cnt > 0 ) vote = vote_sum / vote_cnt;
else vote = DEFAULT_VOTE;

/* force_vote_after_Voter_Loop goes here. */
if ( pcfg->user_vote_force_after_loop() ) {
if( (vote < 1) && (vote != DEFAULT_VOTE) ) vote=1;
if( (vote > 5) && (vote != DEFAULT_VOTE) ) vote=5;
}
/* Turn on only if doing Inner-Product Boundary-Based prediction revisions. */
#if 0
//Boundary-Based-Inner-Product vote CHANGE start
if (
z0IP55>-.01                       //&& z0IP55> z0IP33 && z0IP55> z0IP44
&& z0IP51>-.01            //&& z0IP52> .1     && z0IP53> THRZ0        && z0IP54> THRZ0
)      vote=5;
#endif
#if 0                                           //Boundary-Based-Inner-Product vote CHANGE start
auto double FACZ0=-0.1, THRZ0=-0.1 ;
//auto double FACZ0= 0.40, THRZ0=0.7, z0IP51=-z0IP15, z0IP52=-z0IP25, z0IP53=-z0IP35, z0IP54=-z0IP45;
#if 1                                           //Change vote to 5?
if ( true
&& z0IP55> FACZ0 + z0IP11 && z0IP55> FACZ0+z0IP22 && z0IP55> FACZ0+z0IP33 && z0IP55> FACZ0 + z0IP44
&& z0IP51> THRZ0                && z0IP52> THRZ0          && z0IP53> THRZ0           && z0IP54> THRZ0
)      vote=5;
#endif
#if 1                                           //Change vote to 1?
if ( true
&& z0IP11>(FACZ0 )*z0IP22 && z0IP11>(FACZ0 )*z0IP33 && z0IP11>(FACZ0 )*z0IP44 && z0IP11>(FACZ0 )*z0IP55
&& z0IP12> THRZ0                 && z0IP13> THRZ0             && z0IP14> THRZ0            && z0IP15> THRZ0
) vote=1;
#endif
#endif                                            //Boundary-Based-Inner-Product vote CHANGE end
return vote;
}
/* movie-vote.C1 */
/** \file This file contains the implementation of the movie voting algorithm. */
/* Include files. */
#include <stdio.h>
#include <PTree.H>
#include "MovieSet.H"
#include "UserSet.H"
#include "mppConfig.H"
#include "PredictionConfig.H"
#include "mpp.h"

/* Config file needs: (movie-vote part)
 * UCor Internal Pruning:
 * 1.   Select 0 or 1 of:                    DVCorp, DVCors, VDCorp, VDCors, PCor, DCor, SCor
 * 1.1  For the one selected in 1, set Threshold:
 *                                           DVThrp, DVThrs, VDThrp, VDThrs, PThr, DThr, SThr
 *        Threshold defaults are:            0       0       0       0       0     0     0
 * UCor VOTE weighting: (Default is UCor=1. Selecting one of these resets UCor to its value.)
 * 2.   Select 0 or 1 of:                    DVCorp, DVCors, VDCorp, VDCors, PCor, DCor, SCor
 * Standard Deviation Internal Pruning: (population/sample; difference_of_vectors/vector_of_differences)
 * 3.   Select 0 or more of:                 dMNsdp,    dMNsds,    Nsdp_Msdp,    Nsds_Msds
 * 3.1  For each selected in 3, set Threshold:
 *                                           dMNsdpThr, dMNsdsThr, Nsdp_MsdpThr, Nsds_MsdsThr
 *        Threshold defaults are:            0          0          0             0
 * 3.2  For each selected in 3, set pow exp: dMNsdpExp, dMNsdsExp, Nsdp_MsdpExp, Nsds_MsdsExp
 *        Power Exponent defaults are:       -1         -1         -1            -1
 * External Pruning:
 * 4.   Select 0 or more of:                 Prune_Users_In_SupM, Prune_Movies_In_SupU, Prune_Users_In_CoSupMN
 * 4.1  For each selected in 4, select 1 of: Prune, FastPrune, CommonCoSupportPrune
 * 4.2  Reset non-pruned support in 2nd:     yes, no.
 * 4.3  For each selected in 4, set parameter:
 *                                           mstrt, ustrt, TSa,  TSb,  Tdvp, Tdvs, Tvdp, Tvds, TD, TP, PPm, TV, TSD, Ch, Ct
 *        Prune Parameter defaults are:      0      0      -100  -100  -1    -1    -1    -1    -1  -1  .1   -1  -1   1   no def
 * Forcing in Range:
 * 5.   Select 0, 1 or 2 force_vote_in_ranges: in_Voter_LOOP (for each voter), outside_Voter_LOOP (for composite VOTE)
 */

/**
* Public function.
* This function implements movie voting.
* \param pcfg       A pointer to the class containing the parameters
*              which configure the voting.
* \param M         The movie number for which a prediction is to be made
* \param supportM     The PTree identifying the support for the movie to be predicted.
* \param U         The identity number of the user for which a prediction is to be made.
* \param supportU     The PTree identifying the support for the user for whom a prediction is being made.
* \return        The recommended prediction.
*/
/* movie-vote.C2 */
extern double movie_vote(PredictionConfig *pcfg, unsigned long int M, \
                         PTree & supportM, unsigned long int U,    \
                         PTree & supportU)

{
    auto double vote = DEFAULT_VOTE,
                VOTE = DEFAULT_VOTE,
                VOTE_sum = 0, VOTE_cnt = 0;

    auto double Nb, Mb,
                dsSq,
                UCor = 1;
    struct pruning *internal_prune;
    struct external_prune *external_prune;

    auto PTree supM = supportM,
               supU = supportU;
    supM.clearbit(U); supU.clearbit(M);

    /* External pruning: Prune Users supM */
    external_prune = pcfg->get_movie_Prune_Users_in_SupM();
    if ( external_prune->enabled ) {
        if( supM.get_count() > external_prune->params.Ct)
            do_pruning(external_prune, M, U, supM, supU);
        supM.clearbit(U); supU.clearbit(M);
        if ( (supM.get_count() < 1) || (supU.get_count() < 1) )
            return vote;
    }

    /* Reset support if requested. */
    if ( pcfg->reset_movie_support() ) {
        supU = supportU;
        supU.clearbit(M);
    }

    /* External pruning: Prune Movies supU */
    external_prune = pcfg->get_movie_Prune_Movies_in_SupU();
    if ( external_prune->enabled ) {
        if( supU.get_count() > external_prune->params.Ct )
            do_pruning(external_prune, M, U, supM, supU);
        supM.clearbit(U);
        supU.clearbit(M);
        if( (supM.get_count() < 1) || (supU.get_count() < 1) )
            return vote;
    }

    /* NV: NLOOP strt (Ns are movie voters) */
    auto unsigned long long int *supUlist = supU.get_indexes();
    for (unsigned long long int nn= 0; nn < supU.get_count(); ++nn) {
        auto unsigned long long int N = supUlist[nn];

        auto double NU = Users.get_rating(U,N)-2,
                    MAX = 0,
                    smN = 0, smM = 0,
                    MM = 0, MN = 0, NN = 0,
                    dm;

        auto PTree csMN = supM & Movies.get_users(N);
        csMN.clearbit(U);
        dm = csMN.get_count();
        if( dm < 1 ) continue;

        /* External pruning: PRUNE USERS CoSupMN */
        external_prune = pcfg->get_movie_Prune_Users_in_CoSupMN();
        if ( external_prune->enabled ) {
            if( csMN.get_count() > external_prune->params.Ct)
                do_pruning(external_prune, M, U, csMN, supU);
            csMN.clearbit(U);
            supU.clearbit(M);
            dm = csMN.get_count();
            if( dm < 1) continue;
        }

        /* NV: VLOOP strt (Vs are user vector_space_dimensions) */
        auto unsigned long long int *csMNlist = csMN.get_indexes();

        for (unsigned long long int v= 0; v < csMN.get_count(); ++v) {
            auto unsigned long long int V = csMNlist[v];

            auto double MV = Users.get_rating(V,M) - 2,
                        NV = Users.get_rating(V,N) - 2;

            if( pow(MV-NV, 2) > MAX ) MAX = pow(MV-NV, 2);
            smN += NV;    smM += MV;
            MM += MV * MV; MN += NV * MV; NN += NV * NV;
        }

        Nb = smN / dm; Mb = smM / dm;
        dsSq = NN - 2*MN + MM;
        VOTE = NU - Nb + Mb;
        /* movie-vote.C3 */
        /* force_vote_in_Voter_Loop goes here. */
        if ( pcfg->movie_vote_force_in_loop() ) {
            if ( (VOTE < 1) && (VOTE != DEFAULT_VOTE) ) VOTE=1;
            if ( (VOTE > 5) && (VOTE != DEFAULT_VOTE) ) VOTE=5;
        }
        /* SAMPLE-statistic-based pruning through early exit. */
        if( dm > 1 ) {
            /* method dMNsds */
            internal_prune = \
                pcfg->get_internal_prune(movie_dMNsds);
            if ( internal_prune->enabled ) {
                auto double dMNsds,
                            thr = internal_prune->threshold,
                            expnt = internal_prune->exponent;

                dMNsds = pow((dsSq-dm*(Nb-Mb)*(Nb-Mb))/(dm-1),\
                             0.5);
                if( dMNsds > (thr * pow(dm, expnt)) ) continue;
            }

            /* method Msds_Nsds NO exponent. */
            internal_prune = \
                pcfg->get_internal_prune(movie_Nsds_Msds);
            if ( internal_prune->enabled ) {
                auto double Msds, Nsds,
                            thr = internal_prune->threshold;
                Msds = pow((MM-dm*Mb*Mb)/(dm-1), 0.5);
                Nsds = pow((NN-dm*Nb*Nb)/(dm-1), 0.5);
                if ( Nsds > (thr * Msds) ) continue;
            }

            internal_prune = \
                pcfg->get_internal_prune(movie_DVCors);
            if ( internal_prune->enabled ) {
                auto double Msds, Nsds, DVCors,
                            thr = internal_prune->threshold,
                            expnt = internal_prune->exponent;

                Msds = pow(dm*MM - smM*smM, 0.5) / dm;
                Nsds = pow(dm*NN - smN*smN, 0.5) / dm;
                DVCors = exp(expnt * (Nsds-Msds)*(Nsds-Msds));
                if ( DVCors < thr ) continue;
                if ( internal_prune->weight ) UCor = DVCors;
            }

            internal_prune = \
                pcfg->get_internal_prune(movie_VDCors);
            if ( internal_prune->enabled ) {
                auto double VDCors, dMNsds,
                            thr = internal_prune->threshold,
                            expnt = internal_prune->exponent;

                dMNsds=pow((dsSq-dm*(Nb-Mb)*(Nb-Mb))/(dm-1),.5);

                VDCors = exp(expnt * dMNsds * dMNsds);
                if ( VDCors < thr ) continue;
                if ( internal_prune->weight ) UCor = VDCors;
            }
        }
        /* POPULATION-statistics-based pruning through early exit. */
        if ( dm > 0 ) {
            internal_prune = \
                pcfg->get_internal_prune(movie_dMNsdp);
            if ( internal_prune->enabled ) {
                auto double dMNsdp,thr=internal_prune->threshold;

                dMNsdp=pow(dm*dsSq-(smN-smM)*(smN-smM),.5)/dm;
                if ( dMNsdp > (thr * pow(dm,0.9)) ) continue;
            }
            /* method Usds_Vsds */
            internal_prune = \
                pcfg->get_internal_prune(movie_Nsdp_Msdp);
            if ( internal_prune->enabled ) {
                auto double Nsdp, Msdp,
                            thr = internal_prune->threshold;
                Msdp = pow(dm*MM - smM*smM, 0.5) / dm;
                Nsdp = pow(dm*NN - smN*smN, 0.5) / dm;
                if( Nsdp > (thr * Msdp) ) continue;
            }
            internal_prune = \
                pcfg->get_internal_prune(movie_VDCorp);
            if ( internal_prune->enabled ) {
                auto double DVCorp, Msdp, Nsdp,
                            thr = internal_prune->threshold,
                            expnt = internal_prune->exponent;
                Msdp = pow(dm*MM - smM*smM, 0.5) / dm;
                Nsdp = pow(dm*NN - smN*smN, 0.5) / dm;
                DVCorp = exp(expnt * (Nsdp-Msdp)*(Nsdp-Msdp));
                if ( DVCorp < thr ) continue;
                if ( internal_prune->weight ) UCor = DVCorp;
            /* movie-vote.C4 */
            if ( internal_prune->enabled ) {
                auto double VDCorp, dMNsdp,
                            thr = internal_prune->threshold,
                            expnt = internal_prune->exponent;

                dMNsdp=pow(dm*dsSq-(smN-smM)*(smN-smM),.5)/dm;
                VDCorp = exp(expnt * dMNsdp * dMNsdp);
                if ( VDCorp < thr ) continue;
                if ( internal_prune->weight ) UCor = VDCorp;
            }
        }

        /* OTHER Correlation pruning (pearson=s,pureshift=p,distance=d)*/
        internal_prune = pcfg->get_internal_prune(movie_SCor);
        if ( internal_prune->enabled ) {
            auto double SCor, thr=internal_prune->threshold;
            SCor= (MN-dm*Mb*Nb)/(.0001+(pow((MM-dm*pow(Mb,2)),.5))
                  * (.0001+pow((NN-dm*pow(Nb, 2)),.5)));
            if ( SCor < thr ) continue;
            if ( internal_prune->weight ) UCor = SCor;
        }

        /* CHECK for exponent */
        internal_prune = pcfg->get_internal_prune(movie_PCor);
        if ( internal_prune->enabled ) {
            auto double ONEPDS, PCor = 1,
                        thr = internal_prune->threshold;
            ONEPDS = dsSq - dm * pow(Nb-Mb, 2);
            if (MAX>0) PCor=exp(-.1*ONEPDS/(pow(MAX,.75)*pow(dm,.5)));
            if( PCor < thr ) continue;
            if ( internal_prune->weight ) UCor = PCor;
        }

        internal_prune = pcfg->get_internal_prune(movie_DCor);
        if ( internal_prune->enabled ) {
            auto double DCor, ONEPDS,
                        thr = internal_prune->threshold;
            ONEPDS = dsSq - dm*pow(Nb-Mb, 2);
            DCor = exp(-dsSq / 100);
            if ( DCor < thr ) continue;
            if ( internal_prune->weight ) UCor = DCor;
        }
        if (UCor>0) {VOTE_sum += VOTE*UCor; VOTE_cnt+=UCor;
        } else continue;

        /* force_vote_in_Voter_Loop goes here. */
        if ( pcfg->movie_vote_force_in_loop() ) {
            if ( (VOTE < 1) && (VOTE != DEFAULT_VOTE) )
                VOTE=1;
            if ( (VOTE > 5) && (VOTE != DEFAULT_VOTE) )
                VOTE=5;
        }

    }

    if ( VOTE_cnt > 0 )
        VOTE = VOTE_sum / VOTE_cnt;
    else
        VOTE = DEFAULT_VOTE;

    /* force_vote_after_Voter_Loop goes here. */
    if ( pcfg->movie_vote_force_after_loop() ) {
        if ( (VOTE < 1) && (VOTE != DEFAULT_VOTE) )
            VOTE=1;
        if ( (VOTE > 5) && (VOTE != DEFAULT_VOTE) )
            VOTE=5;
    }

    return VOTE;
}
/* Prune.C1 */
/** \file contains implementations of routines
 *       for pruning user and movie voting lists. */

/* Standard C++ include files. */
#include <map>
#include <vector>
#include <unistd.h>
#include <stdlib.h>

/* Local C++ include files. */
#include <PTree.H>
#include "UserSet.H"
#include "MovieSet.H"
#include "mppConfig.H"
#include "mpp.h"

/* Global accessible variables. */
extern float corData[17771];

using namespace std;

/* Shorthand type definition for the correlation map. */
typedef multimap<double, unsigned long long int, greater<double > > map_t;

/* Private function.
 *
 * This function loads a vector with a list of support indexes from
 * the given PTree. The list contains N elements where N is the support
 * count. The actual order of the list is determined by the start and
 * multiplier values passed in from the caller.
 *
 * \param suptree    A reference to the PTree whose support list is to be generated.
 * \param list       A reference to the vector to be loaded with support indexes.
 * \param start      The starting element in the support list which
 *                   will be the 0th element in the completed support list.
 * \param mult       The multiplier value to be used in determining
 *                   the support starting point.
 */
static void load_support_vector(PTree & suptree,                   \
                                vector<unsigned long long int> & list, \
                                unsigned long long int start, double mult)
{
    auto unsigned long long int *indexes = suptree.get_indexes(),
                                supcnt = suptree.get_count();

    /* Set the starting point based on the specified start point
     * and a multiplier if it is specified. If the starting point
     * exceeds the support count, start at the beginning of the
     * support list. */
    start = start + (unsigned long long int) (mult * supcnt);
    if ( start > supcnt )
        start = (unsigned long long int) (mult * supcnt);
    if ( start > supcnt ) start = 0;

    /* The simple case is a start of zero. Return afterwards so the
     * two-pass loops below do not load every index a second time. */
    if ( start == 0 ) {
        for (unsigned long long int lp= 0; lp < supcnt; ++lp)
            list.push_back(indexes[lp]);
        return;
    }
    /* Two loop passes are needed for a non-zero start value. */
    for (unsigned long long int lp= start; lp < supcnt; ++lp)
        list.push_back(indexes[lp]);

    for (unsigned long long int lp= 0; lp < start; ++lp)
        list.push_back(indexes[lp]);
    return;
}

/* Private function.
 * This function checks whether a voting entity falls outside a
 * selection window. A selection window is defined by a minimum (leftside)
 * voter window and a window size.
 * \param voter      The voter being considered.
 *
 * \param pp         A pointer to the structure containing the
 *                   leftside and width parameters for a pruning method.
 * \return           True if the voter lies outside the selection
 *                   window, false otherwise. A false value
 *                   is automatically returned if the width value
 *                   is set to zero. Setting the width value to
 *                   zero thus disables window based selection.
 */
static bool outside_window(unsigned long long int voter, \
                           struct pruning_parameters *pp)
{
    if ( pp->width == 0 ) return false;
    if ( voter < pp->leftside ) return true;
    if ( voter > pp->leftside + pp->width ) return true;
    return false;
}
/* Prune.C2 */
/* Private function.
 * This function implements the final step in 'pruning' of a PTree. It
 * clears the destination PTree and then sets only those bits in the PTree
 * which have been selected by a previous correlation strategy.
 * \param tree       A reference to the PTree which is to reflect the
 *                   contents of the multimap.
 * \param index_map  The map specifying the index bits to be set.
 * \param max_count  Maximum number of indexes to be selected from PTree.
 */
static void load_ptree(PTree & tree, map_t index_map, double max_count)
{
    map_t::iterator index_ptr = index_map.begin();
    if ( index_map.size() < max_count )
        max_count = index_map.size();
    tree.clearall();
    for (unsigned int lp= 0; lp < max_count; ++lp) {
        tree.setbit(index_ptr->second);
        ++index_ptr;
    }
    return;
}

/* Movie prune standard. */
/* movie_vote: Prune */
static void mPrune(unsigned long long int M,
                   PTree & supM, PTree & supU, struct pruning_parameters *pp)
{
    if ( supU.get_count() < (pp->Ct + 1) ) return;
    map_t corRm;
    auto vector<unsigned long long int> support;

    /* moviePRUNE (NV loops) NLOOP start */
    /* Load the support list before looping over it, as fmPruneS does. */
    load_support_vector(supU, support, pp->mstrt, pp->mstrt_mult);
    for (unsigned long long int lp= 0; lp < support.size(); lp++) {
        auto unsigned long long int N = support[lp];

        if ( outside_window(N, pp) ) continue;
        auto double smM = 0, smN = 0, MM = 0, NN = 0, MN = 0,
                    MV, NV, max=0, dm, Mb, Nb, dsSq, OnePDS,
                    Nsdp = 0, Msdp = 0, Nsds = 0, Msds = 0, dMNsdp = 0,
                    dMNsds = 0,
                    mCor = 1, sCor = 1, dCor = 1, pCor = 1, vCor = 1,
                    stdCor = 1, dvCorp = 1, dvCors = 1, vdCorp = 1,
                    vdCors = 1;
        auto PTree csMN = supM&Movies.get_users(N);
        if( csMN.get_count() < 1 ) continue;
        /* moviePRUNE (NV loops) VLOOP start */
        auto vector<unsigned long long int> ilp;
        load_support_vector(csMN, ilp, pp->ustrt, pp->ustrt_mult);
        for (unsigned long long int lp1= 0; lp1 < ilp.size(); ++lp1) {
            auto unsigned long long int V = ilp[lp1];
#if 0
            if ( outside_window(V, pp) ) continue;
#endif
            MV = Movies.get_rating(V, M) - 2;
            NV = Movies.get_rating(V, N) - 2;
            if(pow(MV-NV,2)>max) max=pow(MV-NV,2);
            smM += MV; smN += NV;
            MM += MV*MV; NN += NV*NV; MN += MV*NV;
        }
        dm=csMN.get_count(), Mb=smM/dm,
        Nb=smN/dm, dsSq=NN-2*MN+MM, OnePDS=dsSq-dm*pow(Nb-Mb,2),
        sCor=(MN-dm*Mb*Nb)/(.0001+
             (pow((MM-dm*pow(Mb,2)),.5))*(pow((NN-dm*pow(Nb,2)),.5))),
        dCor=exp(-dsSq/100),
        pCor=1; if(max>0)pCor=exp(-pp->PPm*OnePDS/(.0001+pow(max,.75)*pow(dm,.5)));
        if(dm>0){Nsdp=pow(dm*NN-smN*smN,.5)/dm;
                 Msdp=pow(dm*MM-smM*smM,.5)/dm;
                 dMNsdp=pow(dm*dsSq-(smN-smM)*(smN-smM),.5)/dm;}
        if(dm>1){Nsds=pow((NN-dm*Nb*Nb)/(dm-1),.5);
                 Msds=pow((MM-dm*Mb*Mb)/(dm-1),.5);
                 dMNsds=pow((dsSq-dm*(Nb-Mb)*(Nb-Mb))/(dm-1),.5);}
        dvCorp=exp(-10 * (Nsdp-Msdp) * (Nsdp-Msdp) );
        dvCors=exp(-10 * (Nsds-Msds) * (Nsds-Msds) );
        vdCorp=exp(-10 * dMNsdp * dMNsdp );
        vdCors=exp(-10 * dMNsds * dMNsds );
        if( pp->Ch == 1) mCor = corData[N+1];
        if( pp->Ch == 2) mCor = sCor;     if( pp->Ch == 3) mCor = dCor;
        if( pp->Ch == 4) mCor = pCor; if( pp->Ch == 5) mCor=vCor;
        if( pp->Ch == 6) mCor = stdCor; if( pp->Ch == 7 ) mCor = dvCorp;
        if( pp->Ch == 8 ) mCor = dvCors; if( pp->Ch == 9 ) mCor = vdCorp;
        if( pp->Ch == 0 ) mCor = vdCors;
        // THRESHOLD PRUNING
        if ( corData[N+1] < pp->TSa || sCor < pp->TSb ||     \
             pCor < pp->TP || dCor < pp->TD || vCor < pp->TV || \
             stdCor < pp->TSD || dvCorp < pp->Tdvp ||       \
             dvCors < pp->Tdvs || vdCorp < pp->Tvdp || vdCors < pp->Tvds )
            continue;
        else { auto pair<double,unsigned long long int> entry(mCor,N);
            corRm.insert(entry);
auto double smU=0, smV=0, UU=0, VV=0, UV=0, max=0,
Vsdp=0, Usdp=0, Vsds=0, Usds=0, dUVsdp=0, dUVsds=0,        Prune.C3
}                                                                             mCor=1, sCor=1, dCor=1, pCor=1, vCor=1, stdCor=1,
}                                                                                   dvCorp=1, dvCors=1, vdCorp=1, vdCors=1,
if ( corRm.size() == 0 ) return;                                                    NU, NV, dm, Ub, Vb, dsSq, OnePDS;
    load_ptree(supU, corRm, pp->Ct);
    return;
}

/* movie_vote: FastPrune */
static void fmPruneS(PTree & supU, struct pruning_parameters *pp)
{
    if ( supU.get_count() < pp->Ct + 1 ) return;
    map_t corRm;
    auto vector<unsigned long long int> support;

    /* moviePRUNE (NV loops) NLOOP start */
    load_support_vector(supU, support, pp->mstrt, pp->mstrt_mult);
    for (unsigned long long int lp= 0; lp < support.size(); lp++) {
        auto unsigned long long int N = support[lp];
#if 0
        if ( outside_window(N, pp) ) continue;
#endif
        if( corData[N+1] < pp->TSa ) continue;
        auto pair<double, unsigned long long int> \
            entry(corData[N+1], N); corRm.insert(entry);
    }
    if ( corRm.size() == 0 ) return;
    load_ptree(supU, corRm, pp->Ct);
    return;
}

//userPRUNE (VN loops) start
/* user_vote: Prune */
static void uPrune (unsigned long long int U, PTree & supM, PTree & supU, \
                    struct pruning_parameters *pp)
{
    if ( supM.get_count() < pp->Ct + 1) return;
    map_t corR;
    auto vector<unsigned long long int> support;

    /* userPrune (VN loops) VLOOP start */
    load_support_vector(supM, support, pp->ustrt, pp->ustrt_mult);
    for (unsigned long long int lp= 0; lp < support.size(); lp++) {
        auto unsigned long long int V = support[lp];
        if ( outside_window(V, pp) ) continue;
        auto PTree csUV = supU & Users.get_movies(V);
        if( csUV.get_count() < 1 ) continue;
        /* user PRUNE (VN loops) NLOOP start */
        auto vector<unsigned long long int> ilp;
        load_support_vector(csUV, ilp, pp->mstrt, pp->mstrt_mult);
        for (unsigned long long int lp1= 0; lp1 < ilp.size(); ++lp1) {
            auto unsigned long long int N = ilp[lp1];
#if 0
            if ( outside_window(N, pp) ) continue;
#endif
            NU = Movies.get_rating(U, N) - 2;
            NV = Movies.get_rating(V, N) - 2;
            if ( pow(NU-NV,2) > max ) max=pow(NU-NV, 2);
            smU += NU; smV += NV;
            UU += NU*NU; VV += NV*NV; UV += NU*NV;
        } //user PRUNE (VN loops) NLOOP end
        dm = csUV.get_count();
        Ub = smU/dm; Vb = smV/dm;
        dsSq = VV - 2*UV + UU;
        OnePDS = dsSq - dm*pow(Vb-Ub,2);
        sCor=(UV-dm*Ub*Vb)/((pow((UU-dm*pow(Ub,2)),.5))*(pow((VV-dm*pow(Vb,2)),.5)));
        dCor = exp(-dsSq/100);
        if (max>0) pCor=exp(-pp->PPm*OnePDS/(pow(max,.75)*pow(dm,.5)));
        if(dm>0){ Vsdp=pow(dm*VV-smV*smV,.5)/dm;
                  Usdp=pow(dm*UU-smU*smU,.5)/dm;
                  dUVsdp=pow(dm*dsSq-(smV-smU)*(smV-smU),.5)/dm;}
        if(dm>1){ Vsds=pow((VV-dm*Vb*Vb)/(dm-1),.5);
                  Usds=pow((UU-dm*Ub*Ub)/(dm-1),.5);
                  dUVsds=pow((dsSq-dm*(Vb-Ub)*(Vb-Ub))/(dm-1),.5);}
        dvCorp=exp(-10 * (Vsdp-Usdp) * (Vsdp-Usdp) );
        dvCors=exp(-10 * (Vsds-Usds) * (Vsds-Usds) );
        vdCorp=exp(-10 * dUVsdp * dUVsdp );
        vdCors=exp(-10 * dUVsds * dUVsds );
        if( pp->Ch == 1 ) mCor = sCor; if( pp->Ch == 2 ) mCor = sCor;
        if( pp->Ch == 3 ) mCor = dCor; if( pp->Ch == 4 ) mCor = pCor;
        if( pp->Ch == 5 ) mCor = vCor; if( pp->Ch == 6 ) mCor = stdCor;
        if( pp->Ch == 7 ) mCor = dvCorp; if( pp->Ch == 8 ) mCor = dvCors;
        if( pp->Ch == 9 ) mCor = vdCorp; if( pp->Ch == 0) mCor = vdCors;
        // THRESHOLD PRUNE
        if ( sCor < pp->TSb || pCor < pp->TP || dCor < pp->TD ||
             vCor < pp->TV || stdCor < pp->TSD || dvCorp < pp->Tdvp||
             dvCors < pp->Tdvs|| vdCorp < pp->Tvdp|| vdCors < pp->Tvds)
            continue;
        else { auto pair<double,unsigned long long int> entry(mCor,V);
               corR.insert(entry);
        }
    }
    if ( corR.size() == 0 ) return;
    return; }

Prune.C4

/* user_vote: FastPrune */
static void fuPruneS(unsigned long long int U, PTree & supM, PTree & supU, \
                     struct pruning_parameters *pp)
{
    if ( supM.get_count() < (pp->Ct + 1) ) return;
    map_t corR;
    auto vector<unsigned long long int> support;
    load_support_vector(supM, support, pp->ustrt, pp->ustrt_mult);

    for (unsigned long long int lp= 0; lp < support.size(); lp++) {
        auto unsigned long long int V = support[lp];
        if ( outside_window(V, pp) ) continue;
        auto PTree csUV = supU & Users.get_movies(V);
        if ( csUV.get_count() < 1 ) continue;
        auto double smU = 0, smV = 0,
                    UU = 0, VV = 0, UV = 0, NU, NV;
        /* fast user Prune (VN loops) NLOOP start */
        auto vector<unsigned long long int> ilp;
        load_support_vector(csUV, ilp, pp->mstrt, pp->mstrt_mult);

        for(unsigned long long int lp1= 0; lp1 < ilp.size(); ++lp1) {
            auto unsigned long long int N = ilp[lp1];
#if 0
            if ( outside_window(N, pp) ) continue;
#endif
            NU = Movies.get_rating(U, N) - 2;
            NV = Movies.get_rating(V, N) - 2;
            smU += NU; smV += NV;
            UU += NU * NU; VV += NV * NV; UV += NU * NV;
        }
        auto double dm = csUV.get_count(),
                    Ub = smU / dm, Vb = smV / dm,
                    SCor=(UV-dm*Ub*Vb)/(.00001+(pow((UU-dm*pow(Ub,2)),.5))*
                                               (pow((VV-dm*pow(Vb,2)),.5)));
        if( SCor < pp->TSb ) continue;
        auto pair<double,unsigned long long int> entry(SCor,V);
        corR.insert(entry);
    }
    if ( corR.size() == 0 ) return;
    return; }

/* user_vote: CommonCoSupportPrune */
static void uPrune2(PTree & supM, PTree & supU, struct pruning_parameters *pp)
{   if ( supM.get_count() < pp->Ct+1) return;
    map_t corR;
    auto PTree csUV;
    auto vector<unsigned long long int> support;

    /* CommonCoSup userPRUNE VN loops VLOOP start */
    for (unsigned long long int lp= 0; lp < support.size(); lp++) {
        auto unsigned long long int V = support[lp];
        if ( outside_window(V, pp) ) continue;
        csUV = supU & Users.get_movies(V);
        auto double dm = csUV.get_count();
        auto pair<double, unsigned long long int> entry(dm, V);
        corR.insert(entry);
    }
    auto unsigned int select_count = (unsigned int) pp->Ct;
    auto PTree ccsU = supU;
    map_t::iterator begin = corR.begin();
    supM.clearall();
    if ( corR.size() < pp->Ct ) select_count = corR.size();

    for(unsigned int lp= 0; lp < select_count; ++lp) {
        supM.setbit(begin->second);
        ccsU = ccsU & Users.get_movies(begin->second);
        ++begin; }
    supU = ccsU; return;
}
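
/* Illustrative sketch, not part of prune.C: the sample correlation that
 * fuPruneS accumulates for a user pair (U, V), written as a standalone
 * function. Assumptions: the co-rated ratings arrive as two equal-length
 * vectors (the listing walks a PTree instead), the name pearson_from_sums
 * is hypothetical, an empty input returns 0 (the listing skips such pairs),
 * and the clamp against negative round-off is an addition. The -2 shift
 * and the .00001 guard in the denominator follow the listing. */
#include <cmath>
#include <cstddef>
#include <vector>

static double pearson_from_sums(const std::vector<double> & ratingsU,
                                const std::vector<double> & ratingsV)
{
    double smU = 0, smV = 0, UU = 0, VV = 0, UV = 0;
    const std::size_t n = ratingsU.size() < ratingsV.size()
                              ? ratingsU.size() : ratingsV.size();
    if ( n == 0 ) return 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double NU = ratingsU[i] - 2, NV = ratingsV[i] - 2;
        smU += NU; smV += NV;
        UU += NU * NU; VV += NV * NV; UV += NU * NV;
    }
    const double dm = (double) n, Ub = smU / dm, Vb = smV / dm;
    /* Clamp tiny negative round-off before taking square roots. */
    const double varU = std::fmax(UU - dm * Ub * Ub, 0.0);
    const double varV = std::fmax(VV - dm * Vb * Vb, 0.0);
    return (UV - dm * Ub * Vb) / (.00001 + std::sqrt(varU) * std::sqrt(varV));
}
/* Usage: pearson_from_sums({1,2,3}, {2,4,6}) is just under 1; a voter V
 * would be dropped when this value falls below the threshold pp->TSb. */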
Prune.C5

/* movie_voting: CommonCoSupportPrune */
static void mPrune2(PTree & supM, PTree & supU, struct pruning_parameters *pp)
{
    if ( supU.get_count() < (pp->Ct + 1) ) return;
    map_t corRm;
    auto PTree csMN;
    auto vector<unsigned long long int> support;

    /* moviePRUNE NV loops NLOOP start */
    for (unsigned long long int lp= 0; lp < support.size(); lp++) {
        auto unsigned long long int N = support[lp];
        if ( outside_window(N, pp) ) continue;
        csMN = supM & Movies.get_users(N);
        auto double dm = csMN.get_count();
        auto pair<double, unsigned long long int> entry(dm, N);
        corRm.insert(entry);
    }

    auto unsigned int select_count = (unsigned int) pp->Ct;
    auto PTree ccsM = supM;
    map_t ::iterator begin = corRm.begin();
    supU.clearall();
    if ( corRm.size() < select_count ) select_count = corRm.size();
    for(unsigned int lp= 0; lp < select_count; ++lp) {
        supU.setbit(begin->second);
        ccsM = ccsM & Movies.get_users(begin->second); ++begin;
    }
    supM = ccsM;
    return;
}

/* Internal function.
 * This function dispatches execution to the pruning method which has
 * been selected for an external pruning routine.
 * \param prune        A pointer to the structure defining the
 *                     external pruning to be conducted.
 * \param M            The movie whose rating is to be predicted.
 * \param U            The user who the prediction is to be made for.
 * \param supM         A PTree describing the movie support.
 * \param supU         A PTree describing user support. */
void do_pruning(struct external_prune * const prune, unsigned long int M, \
                unsigned long int U, PTree & supM, PTree & supU)
{
    auto struct pruning_parameters *params = &prune->params;
    switch ( prune->method ) {
        case UserPrune:
            uPrune(U, supM, supU, params);
            break;
        case UserFastPrune:
            fuPruneS(U, supM, supU, params);
            break;
        case UserCommonCoSupportPrune:
            uPrune2(supM, supU, params);
            break;
        case MoviePrune:
            mPrune(M, supM, supU, params);
            break;
        case MovieFastPrune:
            fmPruneS(supU, params);
            break;
        case MovieCommonCoSupportPrune:
            mPrune2(supM, supU, params);
            break;
    }
    return;
}
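
/* Illustrative sketch, not part of prune.C: the "common co-support"
 * selection carried out by uPrune2 and mPrune2, with std::set standing in
 * for PTree bit vectors. Assumptions: map_t iterates from the largest key
 * to the smallest (modeled here as a std::multimap with std::greater), and
 * the names IdSet, intersect and select_by_cosupport are hypothetical.
 * The early return on small support and the outside_window() test in the
 * listing are omitted. */
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <map>
#include <set>
#include <utility>

typedef std::set<unsigned long> IdSet;

static IdSet intersect(const IdSet & a, const IdSet & b)
{
    IdSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.begin()));
    return out;
}

/* candidates: voters to rank (supM in uPrune2);
 * rated_by[v]: ids rated by voter v (Users.get_movies(v) in uPrune2);
 * support: in/out co-rated ids (supU in uPrune2);
 * returns the Ct voters sharing the most ids with support. */
static IdSet select_by_cosupport(const IdSet & candidates,
                                 const std::map<unsigned long, IdSet> & rated_by,
                                 IdSet & support, std::size_t Ct)
{
    std::multimap<double, unsigned long, std::greater<double> > ranked;
    for (unsigned long v : candidates) {
        auto it = rated_by.find(v);
        if ( it == rated_by.end() ) continue;
        const double dm = (double) intersect(support, it->second).size();
        ranked.insert(std::make_pair(dm, v));
    }
    IdSet selected, common = support;
    std::size_t taken = 0;
    for (auto r = ranked.begin(); r != ranked.end() && taken < Ct; ++r, ++taken) {
        selected.insert(r->second);
        /* Shrink the support to what every selected voter has in common. */
        common = intersect(common, rated_by.find(r->second)->second);
    }
    support = common;
    return selected;
}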
run script for processing movie_predict files into
1 movie_prediction file (and also 1 .rmse and 1 .out log file).

cd Output
../mpp-glue1 ../\$1
cd ..
mpp-rmse1 ./\$1

mpp-glue script

Takes all
    InputFileName_movieID1.predict
    …
    InputFileName_movieIDn.predict
in current directory as input (deleted after processing).
Puts as output (in current dir)
    InputFileName.txt.predictions

#! /bin/bash
# This utility 'glues' a set of .predict files for a given run
# of mpp-mpred into a single file. This program is driven
# by the input file used for the prediction run. When it finds
# a movie (delimited by a trailing :) ALL entries in files,
# InputFileName_movieID.predict, in the current directory
# are printed to a file, InputFileName.txt.predictions.
# The utility takes as the single argument, InputFileName
# used for the prediction run

# Verify input file is found.
if [ -z "\$1" ]; then
    echo "Error: Input file not specified.";
    exit 1;
fi;
if [ ! -e "\$1" ]; then
    echo "Error: Input file not found - >\$1<";
    exit 1;
fi;

# Variables global to this module.
declare -r Name=`basename \$1`;
declare -r Output="\$Name.predictions" Logfile="\$Name.logfile";
declare -r Backup="\$Name.backup";
declare Inputfile=\$1;
declare Movie;
declare Predictions Log;
declare Current_Dir;

# Main body of the program occurs here.
# If a directory named Output is present assume
# we should use that directory.
if [ -d "./Output" ]; then
    Current_Dir=`pwd`;
    Inputfile="../\$Inputfile";
    cd Output;
fi;

# Remove any old output files and make sure we have a fresh backup directory.
rm -f \$Output \$Logfile;
if [ -d "\$Backup" ]; then
    echo "Error: Backup directory present.";
    exit 1;
fi;
mkdir \$Backup;

# Loop over prediction input file and generate outputs.
cat \$Inputfile | while read input;
do
    if [ "\$input" != "\${input%%:}" ]; then
        Movie=\${input%%:};
        Predictions="\$Name"_\$Movie.predict;
        Log="\$Name"_\$Movie.log;
        if [ ! -e "\$Predictions" ]; then
            echo "Error: Prediction file not found - " \
                 ">\$Predictions<";
            exit 1;
        fi;
        echo "Processing: \$Movie";
        cat \$Predictions >>\$Output;
#       [ -e "\$Log" ] && cat \$Log >>\$Logfile;
        rm \$Predictions;
# with following commented out, it seems to eliminate backing up.
#       mv \$Predictions \$Backup;
#       if [ \$? -ne 0 ]; then echo "Error: Unable to create predictions backup.";
#       exit 1; fi;
#       if [ -e "\$Log" ]; then mv \$Log \$Backup;
#       if [ \$? -ne 0 ]; then echo "Error: Unable to create logs backup.";
#       exit 1; fi; fi;
    fi;
done;

# All done.
echo -e "\nInputfile: \$Inputfile";
echo -e "\tPredictions:\t\$Output";
echo -e "\tLogfile:\t\$Logfile";
echo -e "\tBackups:\t\$Backup";
echo -e "\nLine count verifications:";
echo -e "\t\$(wc -l \$Inputfile)";
echo -e "\t\$(wc -l \$Output)";
[ -n "\$Current_Dir" ] && cd ..;
exit 0
mpp-rmse1 script

Takes Output/InputFileName.txt.predictions and InputFileName.txt.answers
from current directory as input.
Puts as output (in current dir)
    InputFileName.txt.rmse

#! /bin/bash
# This utility generates an RMSE report based on predictions carried
# out on the 'probe' dataset. It compares a prediction list against
# the set of known files.
# This program is driven by the input file used for the prediction
# run. The majority of the comparative work and generation of the
# RMSE values is done by the PERL script called from this script.
# The PERL script reads both the prediction file
# (Output/InputFileName.txt.predictions) and the list of known answers
# (InputFileName.txt.answers in the current directory).
# When a movie is found it verifies the movie is
# also present in the companion file. This is to ensure there are
# no discrepancies between the two files.
# The utility takes as a single argument the name of the input file
# used for the prediction run.

# Verify input file is found.
if [ -z "\$1" ]; then
    echo "Error: Input file not specified.";
    exit 1;
fi;
if [ ! -e "\$1" ]; then
    echo "Error: Input file not found - >\$1<";
    exit 1;
fi;

# Variables global to this module.
declare -r Startdir=`dirname \$0`;
declare -r Basename=`basename \$1`;
declare -r Predictions="Output/\$Basename.predictions";

if [ ! -e "\$Answers" ]; then
    exit 1;
fi;
if [ ! -e "\$Predictions" ]; then
    exit 1;
fi;

# Main body of the program occurs here.
    | tee "\$Basename.rmse";

exit 0

mpp-rmse1.pl

\$answers = \$ARGV[0]; \$predictions = \$ARGV[1];
\$lp = 0; \$cnt = 0; \$error = 0; \$error_sum = 0;
\$total_error = 0; \$total_cnt = 0; \$last_movie = "";

foreach(@answers) {
    if ( /:\$/ ) {
        if ( \$last_movie ne "" ) {
            printf "\n\tSum: %.5f\tTotal: %-5d\tRMSE: %f\n\n",
                \$error_sum, \$cnt, sqrt(\$error_sum/\$cnt);
            printf "\tRunning RMSE: %f / %d predictions\n\n",
                sqrt(\$total_error/\$total_cnt), \$total_cnt;
            \$error_sum = 0; \$cnt = 0;
        }
        \$last_movie = \$_; print "Movie: \$_\n";
        if ( \$_ ne \$predictions[\$lp] ) {
            print "Movies don't match\n";
            print "\t\$_ vs. \$predictions[\$lp]\n"; exit 1; }
        ++\$lp; next;
    }
    # Correct for an NAN
    if ( \$predictions[\$lp] eq "nan" ) { print "NAN";
        \$predictions[\$lp] = "3.70"; }
    if ( \$predictions[\$lp] eq "corm-nan" ) {
        print "CORM-NAN";
        \$predictions[\$lp] = "3.70"; }
    \$error = (\$_ - \$predictions[\$lp])**2;
    \$error_sum += \$error; \$total_error += \$error;
    ++\$total_cnt; ++\$cnt;
    printf "\t%4d:\tAnswer: %2d\tPrediction: \$predictions[\$lp]\tError: %.5f\n",
        \$cnt -1, \$_, \$error; ++\$lp; }
# Print the RMSE from the last movie.
        sqrt(\$error_sum/\$cnt);
# Then the total RMSE for the run.
print "Prediction summary:\n";
        sqrt(\$total_error/\$total_cnt);
exit 0;
mpp-user-reduce script

mpp-user-reduce -m both Data/probe19.txt .0001 lo19 hi19
Takes input,
    Data/probe19.txt (movieID with interleaved userIDs format or .txt format)
    SqErrThrhld (if SqErr ≤ .0001, put pair in lo19.txt, else put in hi19.txt)
    -m both means both lo and hi will be produced (other options: low or high)
Puts as output
    lo-FileName
    hi-FileName

#! /bin/bash
# Variables global to this module.
declare -r Pgm=`basename \$0`;
declare Mode="both";
# This utility reduces a set of movies to be predicted by outputting
# movies which have an RMSE value greater than a specified threshold.
# This program is driven by the input file used for the prediction
# run. The majority of the comparative work and generation of the
# RMSE values is done by the PERL script called from this script.
# If the first argument to the utility is a -m the next argument
# is interpreted as a mode value. The following arguments are accepted:
#      low: Output only low RMSE pairings.
#      high: Output only high RMSE pairings.
#      both: Output both files.
# The default is for both files to be output.
if [ "\$1" = "-m" ]; then
    case \$2 in
        low)  Mode="low";;
        high) Mode="high";;
        both) Mode="both";;
        *)    echo -e "\$Pgm: Unknown argument to mode switch, \c";
              echo "specify low, high or both.";
              exit 1;;
    esac;
    shift 2;
fi;
# The utility takes four general arguments as follows:
#
#      \$1: Inputfile
#      \$2: RMSE threshold value.
#      \$3: Root name of output file for movies below threshold.
#      \$4: Root name of output file for movies above threshold.
# Verify input file is found.
if [ -z "\$1" -o ! -e "\$1" ]; then
    echo "\$Pgm: Error - Input file not specified.";
    echo
    echo "Command format:"
    echo -e "\t\$Pgm [-m low|high|both] Inputfile Threshold \c";
    echo -e "LowOutFile HighOutfile";
    exit 1;
fi;
if [ -z "\$2" ]; then
    echo "\$Pgm: Error - RMSE threshold not specified.";
    exit 1;
fi;
if [ -z "\$3" ]; then
    echo "\$Pgm: Error - Low output filename not specified.";
    exit 1;
fi;
if [ -z "\$4" ]; then
    echo "\$Pgm: Error - High output filename not specified.";
    exit 1;
fi;
# Variables global to this module which are dependent on command-line options.
declare -r Input=\$1;
declare -r Startdir=`dirname \$0`;
declare -r Basename=`basename \$1`;
declare -r Predictions="Output/\$Basename.predictions";
declare -r Threshold=\$2;
declare -r LowOut=\$3;
declare -r HighOut=\$4;
if [ ! -e "\$Answers" ]; then
    exit 1;
fi;
if [ ! -e "\$Predictions" ]; then
    echo "\$Pgm - Predictions file not found: >\$Predictions<.";
    exit 1;
fi;
# Main body of the program occurs here.
perl -w \$Startdir/mpp-user-reduce.pl \$Input \$Answers \$Predictions \$Threshold \
        \$LowOut \$HighOut \$Mode;
exit 0
mpp-user-reduce.pl

\$Input = \$ARGV[0];
\$Predictions = \$ARGV[2];
\$Threshold = \$ARGV[3];
\$HighOut = \$ARGV[5];
\$Mode = \$ARGV[6];
\$Low_Count = 0;
\$High_Count = 0;

# Subroutine outputs pairing results for a given collection of user/movie ratings;
sub Output_Pairing
{   my(\$file, \$rmse_ptr) = @_;
    # Open input and answer files.
    \$inputfile = \$file . ".txt";
    print "\t\tInput: \$inputfile\n";
        || die "Cannot open new inputfile: \$inputfile";
        || die "Cannot open new answer file: \$answerfile.";

    # The outer loop runs over the movies in a grouping. The inner
    # loop then runs over the set of inputs and answers for that movie.
    foreach ( keys(%{\$rmse_ptr}) ) {
        print NEWINPUT "\$_\n";
        foreach ( @{\$\$rmse_ptr{\$_}} ) {
            (\$user, \$answer) = split;
            print NEWINPUT "\$user\n";
        }
    }
    close(NEWINPUT);
    return;
}

# Main program starts here.
# Load input, answers and predictions into arrays which are stored in
# hashes keyed by movie number.
open(INPUT, \$Input) || die "Cannot open input: \$Input";
while ( <INPUT> ) {chomp; if (/:\$/) {\$key = \$_; \$Input{\$key}=[];}
                   else { push(@{\$Input{\$key}}, \$_); } }
close(INPUT);
while ( <INPUT> ) {chomp; if (/:\$/) {\$key = \$_; \$Answers{\$key}=[];}
                   else { push(@{\$Answers{\$key}}, \$_); } }
close(INPUT);
open(INPUT, \$Predictions)
    || die "Cannot open predictions file: \$Predictions";
while ( <INPUT> ) { chomp;
    if ( /:\$/ ) { \$key = \$_; \$Predictions{\$key} = []; }
    else { push(@{\$Predictions{\$key}}, \$_); }         }
close(INPUT);
my \$lp; my \$error;
    \$movie = \$_;
    @users = @{\$Input{\$movie}};
    @pred = @{\$Predictions{\$movie}};
    for (\$lp= 0; \$lp <= \$#ans; ++\$lp) {
        if (\$pred[\$lp] eq "nan"){print "NAN"; \$predict="3.70";}
        if (\$pred[\$lp] eq "corm-nan"){print "CORM-NAN";\$predict="3.70";}
        \$error = (\$ans[\$lp] - \$predict)**2;
        if ( \$error > \$Threshold ) {
            \$HighRMSE{\$movie} = [] if !defined(\$HighRMSE{\$movie});
            push(@{\$HighRMSE{\$movie}},"\$user \$ans[\$lp]");++\$High_Count;}
        else { \$LowRMSE{\$movie} = [] if !defined(\$LowRMSE{\$movie});
        }    }
# Output new input and predictions files based on the reduced set.
print "Selected movie/user pairings based on RMSE = \$Threshold:\n";
if ( (\$Mode eq "low") or (\$Mode eq "both") ) {
    Output_Pairing(\$LowOut, \%LowRMSE);
    print "\n"; }
if ( (\$Mode eq "high") or (\$Mode eq "both") ) {
    print "\tHigh rmse pairs: ", \$High_Count, "\n";
# All done.
exit 0;
mpp-filter script (for unioning (-M or), intersecting (-M and)
clusters (to check coverage, etc.)

#! /bin/bash
# This is a driver program for implementing a utility for ANDing or
# ORing two input files.
# Variables global to this module.
declare -r Pgm=`basename \$0`;
declare Mode="";
# Parse arguments.
while getopts "M:" Arg;
do
    case \$Arg in
        M)   Mode=\$OPTARG;;
    esac;
done;

if [ -z "\$Mode" ]; then
    echo "\$Pgm: No mode specified.";
    exit 1;
fi;
if [ "\$Mode" != "and" -a "\$Mode" != "or" ]; then
    echo "\$Pgm: Invalid mode specified - \$Mode";
    exit 1;
fi;
# Verify two filenames are present.
shift `expr \$OPTIND - 1`;
if [ \$# -ne 2 ]; then
    echo "\$Pgm: Insufficient filenames specified.";
    exit 1;
fi;
# Call Perl to carry out the boolean filtering operation.
exec perl \$Pgm.pl \$Mode \$*;

mpp-filter.pl

# This script implements boolean filtering operations between two input
# files. The results of the filtering operation are output on stdout.
# Two merge modes are supported:
# AND: A user index is output if it exists for a given movie in both input files.
# OR: A movie/user pair is output if it exists in either input file.
\$Mode = \$ARGV[0];
\$Input1 = \$ARGV[1];
\$Input2 = \$ARGV[2];
# The following subroutine loads a file into an associative array. The
# filename to be read is passed to the subroutine as the first argument.
# A reference to the associative array is passed as the second argument.
# If the filename cannot be opened an error exit is taken from the applic.
{   my \$key,
    \$file = \$_[0], \$hptr = \$_[1];
    open(IN, \$file) || die "Cannot open file: \$file";
    while ( \$_ = <IN> ) {
        chomp;
        if ( /:\$/ ) {
            \$key = \$_;
            \$\$hptr{\$key} = [];
        }
        else {
            push(@{\$\$hptr{\$key}}, \$_);
        }
    }
    close(IN);
    return;
}
# Subroutine outputs a file which has been stored in hashed/array format.
sub Output_File
{
    foreach ( keys(%{\$_[0]}) ) {
        print "\$_\n";

        my @hlist = @{\$_[0]{\$_}};
        foreach ( @hlist ) {
            print "\$_\n";
        }
    }
Directories
drwxr-xr-x  75 perrizo faculty 2.3M Feb 2 13:13 Output
drwxr-xr-x   3 perrizo faculty 4.0K Jan 8 13:28 p19
drwxr-xr-x   5 perrizo faculty 4.0K Jan 8 13:34 p95
drwxr-xr-x   5 perrizo faculty 4.0K Jan 31 10:51 pf
\$ ls -l
-rwxr-xr-x   1 perrizo faculty 259 Nov 29 11:22 cluster-corr           -rw-r--r--        1 perrizo faculty   22K Nov 29 11:22 PredictionConfig.C
-rw-r--r--   1 perrizo faculty 1.2K Nov 29 11:22 cluster-corr.pl        -rw-r--r--        1 perrizo faculty   5.3K Nov 29 11:22 PredictionConfig.H
-rwxr-xr-x   1 perrizo faculty 7.7K Feb 1 12:26 config                 -rw-r--r--        1 perrizo faculty   22K Nov 29 11:25 PredictionConfig.o
-rw-r--r--   1 perrizo faculty 14K Nov 29 11:22 config.c               -rw-r--r--        1 perrizo faculty   19K Feb 2 12:38 prune.C
-rw-r--r--   1 perrizo faculty 1.4K Nov 29 11:22 config.h              -rw-r--r--        1 perrizo faculty   29K Feb 2 12:38 prune.o
-rw-r--r--   1 perrizo faculty 5.6K Nov 29 11:25 config.o              -rw-r--r--        1 perrizo faculty   1.2K Nov 29 11:22 read-user-ptrees.C
-rw-r--r--   1 perrizo faculty 38K Nov 29 11:25 config-parser.c        -rwxr-xr-x        1 perrizo faculty   146 Nov 29 13:59 run
-rw-r--r--   1 perrizo faculty 806 Nov 29 11:22 config-parser.l        -rwxr-xr-x        1 perrizo faculty   74K Dec 16 20:47 show-config
-rw-r--r--   1 perrizo faculty 15K Nov 29 11:25 config-parser.o        -rw-r--r--        1 perrizo faculty   454 Nov 29 11:22 show-config.C
-rw-r--r--   1 perrizo faculty 2.4K Nov 29 11:22 cosupport.C           -rw-r--r--        1 perrizo faculty   2.7K Nov 29 11:25 show-config.o
drwxr-xr-x   2 perrizo faculty 12K Feb 2 12:40 Data                    -rw-r--r--        1 perrizo faculty   6.4K Nov 29 11:22 UserSet.C
drwxr-xr-x   2 perrizo faculty 4.0K Nov 29 11:25 libPTree              -rw-r--r--        1 perrizo faculty   1.5K Nov 29 11:22 UserSet.H
-rw-r--r--   1 perrizo faculty 4.1K Nov 29 11:22 Makefile              -rw-r--r--        1 perrizo faculty   7.2K Nov 29 11:25 UserSet.o
-rwxr-xr-x   1 perrizo faculty 16K Nov 29 11:25 movie-corr             -rw-r--r--        1 perrizo faculty   17K Jan 19 06:57 user-vote.C
-rw-r--r--   1 perrizo faculty 1.3K Nov 29 11:22 movie-corr.C          -rw-r--r--        1 perrizo faculty   9.3K Jan 19 07:07 user-vote.o
-rw-r--r--   1 perrizo faculty 2.3K Nov 29 11:22 MovieCorrelation.C                 \$ ls -l Data
-rw-r--r--   1 perrizo faculty 1.4K Nov 29 11:22 MovieCorrelation.H                 -rw-r--r-- 1 perrizo faculty 67 Dec 18 01:32               p1.txt
-rw-r--r--   1 perrizo faculty 9.6K Nov 29 11:25 MovieCorrelation.o                 -rw-r--r-- 1 perrizo faculty 23 Dec 18 01:32               p1.txt.answers
-rw-r--r--   1 perrizo faculty 3.3K Nov 29 11:25 movie-corr.o                       -rw-r--r-- 1 perrizo faculty 533K Dec 18 01:33             probe-1000.txt
-rw-r--r--   1 perrizo faculty 1.5K Nov 29 11:22 movie-rating.C                     -rw-r--r-- 1 perrizo faculty 146K Dec 18 01:33             probe-1000.txt.answers
-rw-r--r--   1 perrizo faculty 1.5K Nov 29 11:22 movie-set.C                        -rw-r--r-- 1 perrizo faculty 1.9K Dec 18 01:32             probe19.txt
-rw-r--r--   1 perrizo faculty 2.0K Nov 29 11:22 MovieSet.C                         -rw-r--r-- 1 perrizo faculty 611 Dec 18 01:32              probe19.txt.answers
-rw-r--r--   1 perrizo faculty 1.1K Nov 29 11:22 MovieSet.H                         -rw-r--r-- 1 perrizo faculty 23K Dec 18 01:32              probe95.txt
-rw-r--r--   1 perrizo faculty 4.2K Nov 29 11:25 MovieSet.o                         -rw-r--r-- 1 perrizo faculty 6.4K Dec 18 01:32             probe95.txt.answers
-rw-r--r-- 1 perrizo faculty 594K Dec 18 01:32             test-probe-1000.txt
-rw-r--r--   1 perrizo faculty 14K Jan 19 07:07 movie-vote.C                        -rw-r--r-- 1 perrizo faculty 162K Dec 18 01:32             test-probe-1000.txt.answers
-rw-r--r--   1 perrizo faculty 9.7K Jan 19 07:07 movie-vote.o                       -rw-r--r-- 1 perrizo faculty 51K Dec 18 01:32              test-probe-100.txt
-rwxr-xr-x   1 perrizo faculty 303 Nov 29 11:22 mpp                                 -rw-r--r-- 1 perrizo faculty 14K Dec 18 01:32              test-probe-100.txt.answers
-rwxr-xr-x   1 perrizo faculty 1.3K Nov 29 11:22 mpp-cluster-list
-rw-r--r--   1 perrizo faculty 2.5K Nov 29 11:22 mpp-cluster-list.pl                 \$ ls -l libPTree
-rw-r--r--   1 perrizo faculty 1.7K Nov 29 11:22 mppConfig.C                         -rw-r--r-- 1 perrizo faculty 18672 Nov 29 11:25            libPTree.a
-rw-r--r--   1 perrizo faculty 1.1K Nov 29 11:22 mppConfig.H                         -rw-r--r-- 1 perrizo faculty 3192 Nov 29 11:22             Makefile
-rw-r--r-- 1 perrizo faculty 15813 Nov 29 11:22            PTree.C
-rw-r--r--   1 perrizo faculty 2.9K Nov 29 11:25 mppConfig.o                         -rw-r--r-- 1 perrizo faculty 2973 Nov 29 11:22             PTree.H
-rwxr-xr-x   1 perrizo faculty 745 Dec 5 11:32 mpp-filter                            -rw-r--r-- 1 perrizo faculty 11096 Nov 29 11:25            PTree.o
-rw-r--r--   1 perrizo faculty 3.0K Dec 5 11:32 mpp-filter.pl                        -rw-r--r-- 1 perrizo faculty 18135 Nov 29 11:22            PTree-omp.C
-rwxr-xr-x   1 perrizo faculty 2.3K Nov 29 11:22 mpp-glue                            -rw-r--r-- 1 perrizo faculty 3796 Nov 29 11:22             ptree-op-test.C
-rw-r--r--   1 perrizo faculty 591 Nov 29 11:22 mpp.h                                -rw-r--r-- 1 perrizo faculty 488 Nov 29 11:22              ptree-read.C
-rwxr-xr-x   1 perrizo faculty 101K Feb 2 12:38 mpp-mpred                            -rw-r--r-- 1 perrizo faculty 779 Nov 29 11:22              ptree-save.C
-rw-r--r--   1 perrizo faculty 13K Nov 29 11:22 mpp-mpred.C                          -rw-r--r-- 1 perrizo faculty 7485 Nov 29 11:22             PTreeSet.C
-rw-r--r--   1 perrizo faculty 29K Nov 29 11:25 mpp-mpred.o                          -rw-r--r-- 1 perrizo faculty 1179 Nov 29 11:22             PTreeSet.H
-rwxr-xr-x   1 perrizo faculty 1.4K Nov 29 11:22 mpp-rmse                            -rw-r--r-- 1 perrizo faculty 6464 Nov 29 11:25             PTreeSet.o
-rw-r--r-- 1 perrizo faculty 2265 Nov 29 11:22             ptreeset-read.C
-rw-r--r--   1 perrizo faculty 1.5K Nov 29 11:22 mpp-rmse.pl                         -rw-r--r-- 1 perrizo faculty 420 Nov 29 11:22              ptree-test.C
-rw-r--r--   1 perrizo faculty 6.9K Jan 19 06:36 mpp-user.C                          -rw-r--r-- 1 perrizo faculty 16127 Nov 29 11:22            PTree-x86_64.C
-rwxr-xr-x   1 perrizo faculty 1.3K Nov 29 11:22 mpp-user-cluster                    -rw-r--r-- 1 perrizo faculty 16127 Nov 29 11:22            PTree-x86.C
-rw-r--r--   1 perrizo faculty 3.8K Nov 29 11:22 mpp-user-cluster.pl
-rw-r--r--   1 perrizo faculty 11K Jan 19 07:07 mpp-user.o                           \$ ls -l Output ...
-rwxr-xr-x   1 perrizo faculty 2.5K Jan 21 17:38 mpp-user-reduce                     -rw-r--r-- 1 perrizo faculty 32157 Feb 2 13:25 probe-full.txt_9939.predict ...
-rw-r--r--   1 perrizo faculty 3.2K Jan 21 17:38 mpp-user-reduce.pl                  drwxr-xr-x 2 perrizo faculty 901120 Jan 20 06:23 probe-full.txt.backup
-rw-r--r-- 1 perrizo faculty 7059980 Jan 20 06:23 probe-full.txt.predictions
OPT = -O2 \${VECTOR}
Makefile                                                                 ifeq (\${ARCH}, x86_64)
OPT += -msse2
VERSION = 2.6.0                                                           endif
# Default directory where PTree data is stored.                           C_DEBUG = -g -pg
# Overridden below depending on architecture.                              LD_DEBUG = -g -pg
PTREEDATA = /tmp                                                  endif
ifeq (\${COMPILER}, pgroup)
# Set compiler behavior based on architecture.                            CC = pgcc
ARCH := \$(shell uname -m | sed -e s/i686/x86/)                            C++ = pgCC
ifeq (\${ARCH}, x86_64)                                                    OPT = -fast -Minline=levels:10
COMPILER = gcc                                                    C_DEBUG = -g -Minfo #-pg
# COMPILER = gcc4                                                 LD_DEBUG = -g -tp core2-64 #-pg
PTREEDATA = /scratch/perrizo                              endif
endif
ifeq (\${COMPILER}, intel)
ifeq (\${ARCH}, ia64)                                                      CC = icpc
# COMPILER = intel                                                C++ = icpc
COMPILER = gcc4                                                   OPT = -O2
endif                                                                     C_DEBUG = -g -p
LD_DEBUG = -g -p
ifeq (\${ARCH}, x86)                                               endif
COMPILER = gcc                                    endif
endif                                                     INCLUDES = -I./libPTree
CFLAGS = \${OPT} \${WARNINGS} \${INCLUDES}
ifneq (\${COMPILER},)                                      ifdef DEBUG
ifeq (\${COMPILER}, gcc4)                                  CFLAGS += \${C_DEBUG}
CC = /opt/gcc4/bin/gcc                    endif
C++ = /opt/gcc4/bin/g++                   ifdef DEBUG
LDFLAGS += \${LD_DEBUG}
# WARNINGS = -W -Wall -Wchar-subscripts -Wshadow \        endif
-Wpointer-arith -Wwrite-strings -Wmissing-prototypes      OBJS = mpp-mpred.o mpp-user.o mppConfig.o PredictionConfig.o      \
# VECTOR = -ftree-vectorize -ftree-vectorizer-verbose=5    MovieCorrelation.o UserSet.o MovieSet.o movie-vote.o user-vote.o \
prune.o config.o config-parser.o
OPT = -O2 \${VECTOR}                       LIB = ./libPTree/libPTree.a
ifeq (\${ARCH}, x86_64)                    LIBS = -lfl -L ./libPTree -lPTree
OPT += -msse2
endif                                     # Executable target definitions.
all: mpp-mpred show-config movie-corr
C_DEBUG = -g -pg                          mpp-mpred: \${OBJS} \${LIB}
LD_DEBUG = -g -pg                                 \${C++} \${LDFLAGS} -o \$@ \$^ \${LIBS};
endif                                                     cosupport: cosupport.o UserSet.o MovieSet.o \${LIB}
\${C++} \${LDFLAGS} -o \$@ \$^ \${LIBS};
ifeq (\${COMPILER}, gcc)                           tools: movie-rating movie-set
CC = gcc                                  movie-rating: movie-rating.o UserSet.o MovieSet.o \${LIB}
C++ = g++                                         \${C++} \${LDFLAGS} -o \$@ \$^ \${LIBS}
# WARNINGS = -W -Wall -Wchar-subscripts -Wshadow \        movie-set: movie-set.o UserSet.o \${LIB}
-Wpointer-arith -Wwrite-strings -Wmissing-prototypes              \${C++} \${LDFLAGS} -o \$@ \$^ \${LIBS};
# VECTOR = -ftree-vectorize -ftree-vectorizer-verbose=5   movie-corr: movie-corr.o MovieCorrelation.o \${LIB}
if ( argv[1] == NULL ) {
cosupport.C                                                            fputs("Need V specified.\n", stderr);
return 1;
/**                                                                }
* This file contains a driver program to                          auto unsigned long int U = 421582,
* determine the rating given to                                                          M = 0,
* a movie by a user.                                                                     V = strtoul(argv[1], NULL, 10);
*/                                                                auto PTree M_support = Movies.get_users(M);
auto PTree Voters(M_support);
Voters.clearbit(U);
/* Standard include files. */                                      unsigned long long int *voters = Voters.get_indexes();
#include <unistd.h>
#include <stdio.h>                                                 fputs("Voter list:\n", stdout);
#include <string.h>                                                for (size_t voter= 0; voter < Voters.get_count(); ++voter)
#include <math.h>                                                          fprintf(stdout, "%zu: %llu\n", voter, voters[voter]);
fputc('\n', stdout);
auto PTree cosupport;
/* Local include files. */                                         fputs("Voter Map:\n", stdout);
#include "UserSet.H"                                               Voters.dump(stdout);
#include "MovieSet.H"
fputs("U Map:\n", stdout);
(Users.get_movies(U)).dump(stdout);
extern int main(int argc, char *argv[])                            fputs("V Map:\n", stdout);
(Users.get_movies(V)).dump(stdout);
{                                                                  cosupport = Users.get_movies(U) & Users.get_movies(V);
auto MovieSet Movies;                                      fputs("Cosupport Map:\n", stdout);
cosupport.dump(stdout);
auto UserSet Users;                                        cosupport.clearbit(M);

fprintf(stdout, "Cosupport, M= %lu, U = %lu, V = %lu\n", M,U,V);
Vbar = Users.get_mean(V, cosupport),
Vrt = Users.get_rating(V, M);
return 1;
}
auto double vote = Vrt - Vbar + Ubar;
auto unsigned long long int *movies = cosupport.get_indexes();
for (unsigned long int movie= 0; movie < cosupport.get_count(); \
++movie)
return 1;
fprintf(stdout, "\t\t\t\t%lu [%lu]:\tU = %0.2f, V = %0.2f\n",\
}
Movies.get_identity(movies[movie]), movies[movie],   \
Movies.get_rating(U, movies[movie]),                   \
Movies.get_rating(V, movies[movie]));
fprintf(stdout, "\t\t\t%.2f\t[Vrt: %.2f Vbar: %.2f Ubar: %.2f]\n",
return 1;
vote, Vrt, Vbar, Ubar);
}
return 0;
}
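/*
 * A minimal sketch of the vote computed by the cosupport driver above,
 * vote = Vrt - Vbar + Ubar. It is not part of cosupport.C.
 * Assumptions: plain vectors stand in for the PTree-backed UserSet, a rating
 * of 0 means "not rated", and Ubar is U's mean over the same cosupport as
 * Vbar (the line that computes Ubar is not visible in the listing).
 * The names Vote and mean_offset_vote are hypothetical.
 */
#include <cassert>
#include <cstddef>
#include <vector>

struct Vote {
    double ubar;   /* mean of U's ratings over the cosupport */
    double vbar;   /* mean of V's ratings over the cosupport */
    double vote;   /* Vrt - Vbar + Ubar */
    bool   valid;  /* false if V did not rate M or the cosupport is empty */
};

/* u[i] and v[i] are the ratings (0..5) that users U and V gave movie i. */
Vote mean_offset_vote(const std::vector<int> &u, const std::vector<int> &v,
                      std::size_t m)
{
    assert(u.size() == v.size());
    Vote r = { 0.0, 0.0, 0.0, false };
    double usum = 0.0, vsum = 0.0;
    std::size_t n = 0;

    for (std::size_t i = 0; i < u.size(); ++i) {
        if ( i == m )                  /* mirrors cosupport.clearbit(M) */
            continue;
        if ( u[i] > 0 && v[i] > 0 ) {  /* movie rated by both U and V */
            usum += u[i];
            vsum += v[i];
            ++n;
        }
    }
    if ( n == 0 || m >= v.size() || v[m] == 0 )
        return r;

    r.ubar  = usum / n;
    r.vbar  = vsum / n;
    r.vote  = v[m] - r.vbar + r.ubar;
    r.valid = true;
    return r;
}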
movie-corr.C                                                       mpp
/** \file This file implements a program for                        #! /bin/bash
* printing movie-movie correlations.*/                             if [ "\$1" != "-i" ]; then
echo "No input file specified.";
/* Standard include files. */
exit 1;
#include <stdio.h>                                                  fi;
#include <stdlib.h>
#include <unistd.h>                                                 shift;
inputfile="\$1";
/* Local include files. */
#include "MovieCorrelation.H"                                       run_name=`basename \$inputfile`;
rm -f \$run_name.out;
/* Program entry point. */                                          ./mpp-mpred -i \$inputfile \$* >"\$run_name.out" 2>&1 &
extern int main(int argc, char *argv[])
while [ ! -e "\$run_name.out" ];
{     auto bool dump = false;                                       do
auto int gopt;                                                         sleep 1s;
auto unsigned int target=0, movie=0;                           done;
auto MovieCorrelation mvcorr;                                  tail -f "\$run_name.out";
while ((gopt=getopt(argc,argv,"dm:t:"))!=EOF){                 exit;
switch ( gopt ) {
case 'd': dump = true; break;
case 'm': movie = atoi(optarg); break;
case 't': target = atoi(optarg); break;             mpp.h
}                                                        /** \file
}                                                               * This file contains general definitions and
* defines for the PTree
if ( movie == 0 ) {
* based Netflix prediction system.
fputs("movie-corr: No movie specified.\n", stderr);
*/
return 1; }
return 1; }
/* External variable declarations. */
extern UserSet Users;
/* Dump movies and correlations. */
extern MovieSet Movies;
if ( dump ) {
fprintf(stdout, "Correlations for movie: %u\n", movie);
for (unsigned int lp= 0; lp < MOVIE_COUNT; ++lp)
/* Function declarations. */
fprintf(stdout, "\t%5u: %7.4f / %d\n", lp + 1, \
extern void do_pruning(struct external_prune * const prune,
mvcorr.corr(lp), mvcorr.supp(lp)); return 0; }
unsigned long int M, unsigned long int U, \
/* Print correlation of target movie. */
PTree & supM, PTree & supU);
if ( target > 0 ) {
fprintf(stdout,"%-7.4f\n",mvcorr.corr(target-1));return 0;}
double user_vote(PredictionConfig *,
return 0;}
unsigned long int, PTree &, unsigned long int, PTree &);

double movie_vote(PredictionConfig *, unsigned long int,
PTree &, unsigned long int, PTree &);
/*Public method.
MovieCorrelation.C                                         * Implements loading of correlation and support vector for given movie.
* \param index          The index number of the movie to be loaded.
/** \file                                                   * \return    A boolean value is used to indicate the success
* This file contains the implementation of a class        *    or failure of the load. A true value indicates success.*/
* which encapsulates management of correlation info     bool MovieCorrelation::load(unsigned long int index)
* for a particular movie to all other movies.           {
*/                                                               auto char snbufr[10];
auto string root       = PTREEDATA"/mpred-data/",
corr_path = root + "mv_corr/co_mv_",
/* System include files. */                                                    supp_path = root + "mv_supp/sp_mv_";
#include <stdlib.h>                                                auto ifstream corr_file,
supp_file;
/* Standard C++ includes. */                                       /* Sanity check for movie index size. */
#include <string>                                                  if ( index > (MOVIE_COUNT + 1) )
#include <iostream>                                                        return false;
#include <fstream>                                                 movie_index = index;
/* Synthesize the filename of the correlations file and read it. */
/* Local include files. */                                         snprintf(snbufr, sizeof(snbufr), "%lu", movie_index);
#include "MovieCorrelation.H"                                      string sn(snbufr);
string corr_fname = corr_path + sn + ".bin";
using namespace std;                                               corr_file.open(corr_fname.c_str());
if ( corr_file.fail() ) {
corr_file.close();
MovieCorrelation::MovieCorrelation(void)                                   return false;
}
movie_index = 0;                                                         (MOVIE_COUNT + 1)*sizeof(float));
if ( corr_file.fail() ) {
/* Initialize correlation and support count. */                          corr_file.close();
for (unsigned int lp= 0; lp <= MOVIE_COUNT; ++lp) {                      return false;
support[lp]      = 0;                             }
correlations[lp] = 0.0;                           corr_file.close();
}                                                        /* Synthesize the filename of the support file and read it. */
string supp_fname = supp_path + sn + ".bin";
return;                                                   supp_file.open(supp_fname.c_str());
}                                                                  if ( supp_file.fail() ) {
supp_file.close();
return false;
/**                                                                }
*/                                                                                     (MOVIE_COUNT + 1)*sizeof(short int));
if ( supp_file.fail() ) {
MovieCorrelation::~MovieCorrelation(void)                                  supp_file.close();
return false;
{                                                                  }
return;                                                            supp_file.close();
}                                                                  return true;
}
/*
MovieCorrelation.H                                                        * Inline accessor methods for returning movie supports and
* correlations.
#if !defined(MOVIECORRELATION_H)                                           */
#define MOVIECORRELATION_H                                               float inline corr(unsigned int index) {
if ( index >= MOVIE_COUNT )
/* Total number of movies. */                                                              return 0;
#define MOVIE_COUNT 17770                                                         return correlations[index + 1];
}
unsigned short int inline supp(unsigned int index) {
/* Standard include files. */                                                     if ( index >= MOVIE_COUNT )
#include <stdio.h>                                                                         return 0;
return support[index + 1];
/* Local include files. */                                               }
/* Public method for loading the correlation vector for a movie. */
class MovieCorrelation                                           };
#endif
{
private:
/* The index number of the movie whose correlations are loaded. */
unsigned long int movie_index;

/*
* The following array contains the list of correlations for
* a movie to all the other movies. The array is one based
* so a value of one needs to be added to the movie index
* number to retrieve the correlation.
*/
float correlations[MOVIE_COUNT + 1];

/*
* The following array contains the support list for the
* correlations vector. The vector is one based as is the
* correlations vector.
*/
unsigned short int support[MOVIE_COUNT + 1];

public:
/* Void constructor. */
MovieCorrelation(void);

/* Destructor. */
~MovieCorrelation(void);
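/*
 * A minimal sketch of the one-based storage convention described in the
 * header comments above: a zero-based index is shifted by one before it is
 * used, so valid indexes run from 0 to MOVIE_COUNT - 1 and slot 0 is unused.
 * OneBasedTable is a hypothetical stand-in, not the MovieCorrelation class.
 */
#include <cstddef>

const std::size_t MOVIE_COUNT = 17770;   /* value taken from the listing */

class OneBasedTable
{
private:
    float values[MOVIE_COUNT + 1];       /* slots 1..MOVIE_COUNT hold data */

public:
    OneBasedTable(void) {
        for (std::size_t lp = 0; lp <= MOVIE_COUNT; ++lp)
            values[lp] = 0.0f;
    }

    /* Stores a value; returns false if the index is out of range. */
    bool set(std::size_t index, float value) {
        if ( index >= MOVIE_COUNT )
            return false;
        values[index + 1] = value;
        return true;
    }

    /* Returns the stored value, or 0 for an out-of-range index. */
    float get(std::size_t index) const {
        if ( index >= MOVIE_COUNT )
            return 0.0f;
        return values[index + 1];
    }
};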
MovieSet.C                                                       MovieSet.H
/* System include files. */
#include <limits.h>                                               #if !defined(MOVIESET_H)
/* Local include files. */                                        #define MOVIESET_H
#include "MovieSet.H"
/* Variables static to this module. */                            /* Standard include files. */
/* No argument constructor.*/                                     #include <stdio.h>
MovieSet::MovieSet(void)                                          #include <math.h>
: ptree_set()                                            /* Local include files. */
{return;}
/* Destructor.*/                                                  #include "PTreeSet.H"
MovieSet::~MovieSet(void)                                         class MovieSet
{return;}                                                         {
/* Public method calculates rating user provided for movie.       private:
* \param user_index    The identity number of the user.
* \param movie         The identity number of the movie.                PTreeSet ptree_set;
* \return The rating number is returned to the caller.*/        public:
double MovieSet::get_rating(unsigned long int user_index, \               /* Void constructor. */
unsigned long int movie_index)               MovieSet(void);
{
auto double rating = 0;
auto size_t slot = movie_index * 3;                               /* Constructor to initialize an in-memory tree. */
/* Destructor. */
for (int tree= 2, bit= 0; tree >= 0; --tree, ++bit) {              ~MovieSet(void);
if ( ptree_set[slot + tree].is_set(user_index))
rating += pow(2.0, bit);
}                                                                  /* Public inline method to return identity of movie index*/
return rating;                                                     unsigned long int get_identity(unsigned long int offset) {
}                                                                                  return offset + 1;
/* Public method returns PTree describing
* set of users who rated movie*/                                         }

PTree MovieSet::get_users(unsigned long int index)                         /* Public inline method to return index of movie identity*/
{                                                                          unsigned long int get_index(unsigned long int identity) {
auto size_t slot = index * 3;
return ptree_set[slot] | ptree_set[slot+1] | ptree_set[slot+2];                    return identity - 1;
}                                                                          }
/* Public method \param output descriptor- PTree's to be directed*/
/* Public method to return rating of movie by user. */
void MovieSet::dump(FILE *output)
{ for (int lp= 0; lp < ptree_set.size(); ++lp)                             double get_rating(unsigned long int, unsigned long int);
ptree_set[lp].dump(output); return;}
/* Public method loads binary PTree set which has as its                   /* Public method to return set of users rating movie. */
* X-axis user indexes with movie rating PTree's on Y-axis.*/              PTree get_users(unsigned long int);
{
auto char bufr[PATH_MAX];                                          /* Public method to print sparseness of set. */
auto FILE *input;                                                  void dump(FILE *);
for (int pt= 22; pt <= 53331; ++pt) {
snprintf(bufr, sizeof(bufr), \
"%s/mpred-data/nf_us_mv_pt/p%d.pct", PTREEDATA, pt);             /* Public method to load a binary PTree set. */
if ( (input = fopen(bufr, "r")) == NULL )                  bool load_binary(void);
return false;                             };
return false;
fclose(input);
} return true;}
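/*
 * A minimal sketch of the bit-sliced rating layout that MovieSet::get_rating
 * and MovieSet::get_users read above: each movie owns three bit planes, with
 * plane 0 the high-order bit (4) and plane 2 the low-order bit (1), and each
 * plane holds one bit per user. A user rated the movie if any plane has that
 * user's bit set.
 * Assumption: std::vector<bool> stands in for a PTree; the names BitPlane,
 * decode_rating and rated_by are hypothetical.
 */
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<bool> BitPlane;

/* Rebuilds the 0..7 rating of one user from the three planes of a movie. */
unsigned int decode_rating(const BitPlane planes[3], std::size_t user)
{
    unsigned int rating = 0;

    assert(user < planes[0].size());
    for (int p = 0; p < 3; ++p)          /* high-order plane first */
        rating = (rating << 1) | (planes[p][user] ? 1u : 0u);
    return rating;
}

/* ORs the three planes: bit u is set if user u gave any nonzero rating. */
BitPlane rated_by(const BitPlane planes[3])
{
    BitPlane out(planes[0].size(), false);

    for (std::size_t u = 0; u < out.size(); ++u)
        out[u] = planes[0][u] || planes[1][u] || planes[2][u];
    return out;
}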
mppConfig.C                                                    /* Public method causes object to be initialized as standard single
* file configuration.
/** \file contains implementation of class which encapsulates    * \param cfgfile Pointer to buffer containing the name of the
* info needed to configure prediction run. Purpose of           *                      standard configuration file.
* class is to abstract out diff between single config           * \return       If initialization of configuration is successful
* run and a run based on a cluster of configurations. */        *               a boolean true value is returned. Otherwise a
*               false value is returned.*/
/* System include files. */                                    bool mppConfig::read_cluster_config(const char * const cfgfile)
/* Local include files. */                                     {
#include "mppConfig.H"                                                  return false;
}
/* No argument constructor. */
mppConfig::mppConfig(void)

mppConfig.H
{
standard_config = false;
standard = NULL;
cluster_config = false;                               #if !defined(MPPCONFIG_H)
return;                                               #define MPPCONFIG_H
}                                                             /* Standard include files. */
#include <stdio.h>
/* Destructor. */                                             /* Local include files. */
mppConfig::~mppConfig(void)                                   #include "PredictionConfig.H"
{                                                             class mppConfig
if ( standard != NULL )                               {
delete standard;                              private:
return;                                                       bool standard_config,
}                                                                          cluster_config;
PredictionConfig *standard;
/*   Public method causes the object to be initialized        public:
*   as a standard single file configuration.                         /* Void constructor. */
*   \param cfgfile       ptr to buffer containing name of            mppConfig(void);
*                        standard configuration file.
*   \return      If init of configuration is successful              /* Destructor. */
*           a boolean true value is returned. Otherwise a            ~mppConfig(void);
*                        false value is returned. */         /* Public inline accessor methods to determine if a standard
* or cluster configuration is being used. */
bool mppConfig::read_config(const char * const cfgfile)       inline bool is_standard_config(void) {return standard_config;}
{                                                             inline bool is_cluster_config(void) {return cluster_config;}
standard = new PredictionConfig;                     /* Public inline accessor method for the standard configuration. */
if ( standard == NULL )                              inline PredictionConfig*get_standard_config(void){return standard;}
return false;                                /* Public method to read a configuration file. */
return false;                                /* Public method to read a cluster configuration file. */
standard_config = true;                                     /* Public method to print out a configuration. */
return true;                                         void print(FILE *);
}                                                             };
#endif
/* No argument constructor. */
PredictionConfig.C                                                     PredictionConfig::PredictionConfig(void) {
/* Initialize general prediction parameters. */
/* \file File contains implementation of class which encapsulates              name = NULL;
* info which regulates how Movie/User pair predictions are made.*/           user_voting = false;
/* System include files. */                                                    movie_voting = false;
#include <stdlib.h>                                                            user_vote_weight = 1;
#include <string.h>                                                            /* Initialize user voting parameters. */
/* Local include files. */                                                     user_force_vote_in_Voter_Loop     = false;
#include "PredictionConfig.H"                                                  user_force_vote_after_Voter_Loop = false;
extern "C" {#include "config.h"}                                               user_reset_support = false;
/* Internal private function.                                                  user_boundary_override = false;
* This function initializes an internal pruning structure.                   user_facz = 0.0;
* \param p     A pointer to the structure to be initialized. */              user_thrz = 1.0;
static void _init_internal_prune(struct pruning *p)                            _init_internal_prune(&dvCorp);
{                                                                              _init_internal_prune(&dvCors);
p->enabled   = false;                                                 _init_internal_prune(&vdCorp);
p->weight    = false;                                                 _init_internal_prune(&vdCors);
p->threshold = 0.0;                                                   _init_internal_prune(&pCor);
p->exponent = 1.0;                                                    _init_internal_prune(&dCor);
return;                                                               _init_internal_prune(&sCor);
}                                                                              _init_internal_prune(&dUVsdp);
/* Internal private function.                                                  _init_internal_prune(&dUVsds);
* This function initializes a structure defining external pruning.           _init_internal_prune(&Vsdp_Usdp);
* \param p     A pointer to the structure to be initialized. */              _init_internal_prune(&Vsds_Usds);
_init_external_prune(&Prune_Users_in_SupM);
static void _init_external_prune(struct external_prune *p)                     _init_external_prune(&Prune_Movies_in_SupU);
{                                                                              _init_external_prune(&Prune_Movies_in_CoSupUV);
p->enabled = false;                                                    /* Initialize movie voting parameters. */
p->method = UserPrune;                                                 movie_force_vote_in_Voter_Loop       = false;
p->params.mstrt      = 0;                                              movie_force_vote_outside_Voter_Loop = false;
p->params.mstrt_mult = 0.0;                                            movie_boundary_override = false;
p->params.ustrt      = 0;                                              movie_facz = 0.0;
p->params.ustrt_mult = 0.0;                                            movie_thrz = 1.0;
p->params.TSa = -100;                                                  _init_internal_prune(&DVCorp);
p->params.TSb = -100;                                                  _init_internal_prune(&DVCors);
p->params.Tdvp = -1;                                                   _init_internal_prune(&VDCorp);
p->params.Tdvs = -1;                                                   _init_internal_prune(&VDCors);
p->params.Tvdp = -1;                                                   _init_internal_prune(&PCor);
p->params.Tvds = -1;                                                   _init_internal_prune(&DCor);
p->params.TD = -1;                                                     _init_internal_prune(&SCor);
p->params.TP = -1;                                                     _init_internal_prune(&dMNsdp);
p->params.PPm = 0.1;                                                   _init_internal_prune(&dMNsds);
p->params.TV = -1;                                                     _init_internal_prune(&Nsdp_Msdp);
p->params.TSD = -1;                                                    _init_internal_prune(&Nsds_Msds);
p->params.Ch = 1;                                                      _init_external_prune(&Movie_Prune_Users_in_SupM);
p->params.Ct = 2;                                                      _init_external_prune(&Movie_Prune_Movies_in_SupU);
return;                                                                _init_external_prune(&Movie_Prune_Users_in_CoSupMN);
}                                                                              return; }
/*   Internal private function.
*   Initializes the configuration structure for an external pruning method.
*   \param cf            The configuration which is being used.
*   \param sp            A pointer to the external pruning definition
/* Destructor. */                                              *                        structure which is to be initialized.
PredictionConfig::~PredictionConfig(void)                      *   \param name          The name of the external pruning method. */
{ if ( name != NULL ) free(name); return; }
void _set_external_prune(Config cf, struct external_prune *sp, \
/* Internal private fctn determines if config enabled.                                 const char *name)
* \param cf Ptr to config to be tested for the option.      {
* \param var Ptr to name of variable to be tested.                  auto char *val;
* \return    Boolean value returned to indicate whether             auto struct pruning_parameters *pp = &sp->params;
* configuration option has been enabled. True value                 if ( !Config_Set_Section(cf, name) ) return;
* indicates variable is enabled else false returned. */             val = Config_Get(cf, "method");
static bool _is_enabled(Config cf, const char * const var)            if ( strcmp(val, "UserPrune")==0) sp->method=UserPrune;
{
    auto char *p;
    p = Config_Get(cf, var);
    if ( p == NULL ) return false;
    if ( strcmp(p, "enabled") == 0 ) return true;
    return false;
}

/* Internal private function.
 * Initializes config struct for internal pruning method.
 * \param cf        The configuration which is being used.
 * \param sp        Pointer to the structure to be initialized.
 * \param name      Name of the internal pruning method.
 * \param threshold Name of variable containing the threshold.
 * \param weight    Name of variable specifying whether the
 *                  method should be used to set the value of uCor. */
void _set_internal_prune(Config cf, struct pruning *sp,
    const char *name, const char *threshold, const char *weight)
{
    auto char *val;
    sp->enabled = _is_enabled(cf, name);
    if ( !sp->enabled ) return;
    val = Config_Get(cf, threshold);
    if ( val != NULL ) sp->threshold = atof(val);
    sp->weight = _is_enabled(cf, weight);
    return;
}

/* Internal private function.
 * Initializes config structure for standard
 * deviation based pruning method.
 * \param cf        Configuration which is being used.
 * \param sp        Ptr to structure to be initialized.
 * \param name      Name of the internal pruning method.
 * \param threshold Name of variable containing threshold val.
 * \param exponent  Name of variable specifying the exponent
 *                  which should be used for the GAUSSIAN method. */
void _set_stddev_prune(Config cf, struct pruning *sp,
    const char *name, const char *threshold, const char *exponent)
{
    auto char *val;
    sp->enabled = _is_enabled(cf, name);
    if ( !sp->enabled ) return;
    val = Config_Get(cf, threshold);
    if ( val != NULL ) sp->threshold = atof(val);
    val = Config_Get(cf, exponent);
    if ( val != NULL ) sp->exponent = atof(val);
    return;
}

    if ( strcmp(val, "UserFastPrune") == 0 ) sp->method = UserFastPrune;
    if ( strcmp(val, "UserCommonCoSupportPrune") == 0 )
        sp->method = UserCommonCoSupportPrune;
    if ( strcmp(val, "MoviePrune") == 0 ) sp->method = MoviePrune;
    if ( strcmp(val, "MovieFastPrune") == 0 ) sp->method = MovieFastPrune;
    if ( strcmp(val, "MovieCommonCoSupportPrune") == 0 )
        sp->method = MovieCommonCoSupportPrune;

    /* Set the external pruning parameters. */
    val = Config_Get(cf, "mstrt");
    if ( val != NULL ) pp->mstrt = atoll(val);
    val = Config_Get(cf, "mstrt_mult");
    if ( val != NULL ) pp->mstrt_mult = atof(val);
    val = Config_Get(cf, "ustrt");
    if ( val != NULL ) pp->ustrt = atoll(val);
    val = Config_Get(cf, "ustrt_mult");
    if ( val != NULL ) pp->ustrt_mult = atof(val);
    val = Config_Get(cf, "TSa");
    if ( val != NULL ) pp->TSa = atof(val);
    val = Config_Get(cf, "TSb");
    if ( val != NULL ) pp->TSb = atof(val);
    val = Config_Get(cf, "Tdvp");
    if ( val != NULL ) pp->Tdvp = atof(val);
    val = Config_Get(cf, "Tdvs");
    if ( val != NULL ) pp->Tdvs = atof(val);
    val = Config_Get(cf, "Tvdp");
    if ( val != NULL ) pp->Tvdp = atof(val);
    val = Config_Get(cf, "Tvds");
    if ( val != NULL ) pp->Tvds = atof(val);
    val = Config_Get(cf, "TD");
    if ( val != NULL ) pp->TD = atof(val);
    val = Config_Get(cf, "TP");
    if ( val != NULL ) pp->TP = atof(val);
    val = Config_Get(cf, "PPm");
    if ( val != NULL ) pp->PPm = atof(val);
    val = Config_Get(cf, "TV");
    if ( val != NULL ) pp->TV = atof(val);
    val = Config_Get(cf, "TSD");
    if ( val != NULL ) pp->TSD = atof(val);
    val = Config_Get(cf, "Ch");
    if ( val != NULL ) pp->Ch = atof(val);
    val = Config_Get(cf, "Ct");
    if ( val != NULL ) pp->Ct = atof(val);
    return;
}
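/* Illustrative sketch, not part of PredictionConfig.C.
 * It restates the flag and threshold handling of _is_enabled() and
 * _set_internal_prune() against a std::map, so the behaviour can be
 * tried without the Config library.
 * Assumptions: ConfigMap, PruneSketch, is_enabled_sketch and
 * set_prune_sketch are hypothetical names introduced only for this
 * example; the real code reads its values through Config_Get(). */
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

typedef std::map<std::string, std::string> ConfigMap;

struct PruneSketch {
    bool enabled;
    bool weight;
    double threshold;
};

/* A key counts as enabled only when it is present and its value is
 * exactly "enabled"; a missing key or any other value means false. */
static bool is_enabled_sketch(const ConfigMap &cf, const std::string &key)
{
    ConfigMap::const_iterator it = cf.find(key);
    return it != cf.end() && it->second == "enabled";
}

/* Same order of operations as _set_internal_prune(): a disabled method
 * returns early, so its threshold and weight keep their prior values. */
static void set_prune_sketch(const ConfigMap &cf, PruneSketch *sp,
                             const std::string &name,
                             const std::string &threshold,
                             const std::string &weight)
{
    sp->enabled = is_enabled_sketch(cf, name);
    if ( !sp->enabled ) return;
    ConfigMap::const_iterator it = cf.find(threshold);
    if ( it != cf.end() ) sp->threshold = std::atof(it->second.c_str());
    sp->weight = is_enabled_sketch(cf, weight);
}
/* One consequence worth noting: because of the early return, a
 * threshold given for a disabled method is never read. */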
/* Public method used to obtain the parameters associated with an
 * internal pruning type.
 * \param pr  Enumerated type describing the internal pruning method
 *            for which parameter information is to be obtained.
 * \return    Ptr to structure describing how the internal
 *            pruning method is to be implemented. */
struct pruning *PredictionConfig::get_internal_prune
    (enum internal_pruning pr)
{
    switch ( pr ) {
        case user_dvCorp: return &dvCorp;
        case user_dvCors: return &dvCors;
        case user_vdCorp: return &vdCorp;
        case user_vdCors: return &vdCors;
        case user_pCor: return &pCor;
        case user_dCor: return &dCor;
        case user_sCor: return &sCor;

        case movie_DVCorp: return &DVCorp;
        case movie_DVCors: return &DVCors;
        case movie_VDCorp: return &VDCorp;
        case movie_VDCors: return &VDCors;
        case movie_PCor: return &PCor;
        case movie_DCor: return &DCor;
        case movie_SCor: return &SCor;

        /* Standard deviation types */
        case user_dUVsdp: return &dUVsdp;
        case user_dUVsds: return &dUVsds;
        case user_Vsdp_Usdp: return &Vsdp_Usdp;
        case user_Vsds_Usds: return &Vsds_Usds;

        case movie_dMNsdp: return &dMNsdp;
        case movie_dMNsds: return &dMNsds;
        case movie_Nsdp_Msdp: return &Nsdp_Msdp;
        case movie_Nsds_Msds: return &Nsds_Msds;
    }
    return NULL;
}

/* Public method. Parses configuration file and translates the ASCII
 * key/value pairs into appropriate configuration variables.
 * \param file   A character pointer to the file name containing
 *               the configuration to be read.
 * \return       A boolean value is returned to indicate whether
 *               or not the read of the configuration file was
 *               successful. A true value indicates success while
 *               failure is indicated by a false value. */
bool PredictionConfig::read_config(const char * const file)
{
    auto char *val;
    auto Config cf;

    /* Open and parse the configuration file. */
    cf = Config_Init();
    if ( cf == NULL ) return false;
    if ( Config_Parse(cf, file) < 0 ) { Config_Destroy(cf); return false; }

    /* Set general prediction parameters. */
    if ( !Config_Set_Section(cf, "Default") )
        { Config_Destroy(cf); return false; }
    val = Config_Get(cf, "name");
    if ( val != NULL ) name = strdup(val);
    user_voting = _is_enabled(cf, "user_voting");
    movie_voting = _is_enabled(cf, "movie_voting");
    val = Config_Get(cf, "user_vote_weight");
    if ( val != NULL ) user_vote_weight = atof(val);

    /* Process user voting parameters. */
    if ( user_voting && Config_Set_Section(cf, "user_voting") ) {
        user_force_vote_in_Voter_Loop = _is_enabled(cf,
            "force_vote_in_Voter_Loop");
        user_force_vote_after_Voter_Loop = _is_enabled(cf,
            "force_vote_after_Voter_Loop");
        user_reset_support = _is_enabled(cf, "reset_support");
        user_boundary_override = _is_enabled(cf, "boundary_override");
        if ( user_boundary_override ) {
            val = Config_Get(cf, "facz");
            if ( val != NULL ) user_facz = atof(val);
            val = Config_Get(cf, "thrz");
            if ( val != NULL ) user_thrz = atof(val);
        }
        _set_internal_prune(cf, &dvCorp, "dvCorp", "dvThrp", "dvCorpWeight");
        _set_internal_prune(cf, &dvCors, "dvCors", "dvThrs", "dvCorsWeight");
        _set_internal_prune(cf, &vdCorp, "vdCorp", "vdThrp", "vdCorpWeight");
        _set_internal_prune(cf, &vdCors, "vdCors", "vdThrs", "vdCorsWeight");
        _set_internal_prune(cf, &pCor, "pCor", "pThr", "pCorWeight");
        _set_internal_prune(cf, &dCor, "dCor", "dThr", "dCorWeight");
        _set_internal_prune(cf, &sCor, "sCor", "sThr", "sCorWeight");

        _set_stddev_prune(cf, &dUVsdp, "dUVsdp", "dUVsdpThr", "dUVsdpExp");
        _set_stddev_prune(cf, &dUVsds, "dUVsds", "dUVsdsThr", "dUVsdsExp");
        _set_stddev_prune(cf, &Vsdp_Usdp, "Vsdp_Usdp", "Vsdp_UsdpThr",
            "Vsdp_UsdpExp");
        _set_stddev_prune(cf, &Vsds_Usds, "Vsds_Usds", "Vsds_UsdsThr",
            "Vsds_UsdsExp");

        Prune_Movies_in_SupU.enabled = _is_enabled(cf,
            "Prune_Movies_in_SupU");
        Prune_Users_in_SupM.enabled = _is_enabled(cf,
            "Prune_Users_in_SupM");
        Prune_Movies_in_CoSupUV.enabled = _is_enabled(cf,
            "Prune_Movies_in_CoSupUV");
        if ( Prune_Movies_in_SupU.enabled )
            _set_external_prune(cf, &Prune_Movies_in_SupU,
                "user_voting Prune_Movies_in_SupU");
        if ( Prune_Users_in_SupM.enabled )
            _set_external_prune(cf, &Prune_Users_in_SupM,
                "user_voting Prune_Users_in_SupM");
        if ( Prune_Movies_in_CoSupUV.enabled )
            _set_external_prune(cf, &Prune_Movies_in_CoSupUV,
                "user_voting Prune_Movies_in_CoSupUV");
    }

    /* Process movie voting configuration. */
    if ( movie_voting && Config_Set_Section(cf, "movie_voting") ) {
        movie_force_vote_in_Voter_Loop = _is_enabled(cf,
            "force_vote_in_Voter_Loop");
        movie_force_vote_outside_Voter_Loop = _is_enabled(cf,
            "force_vote_outside_Voter_Loop");
        movie_reset_support = _is_enabled(cf, "reset_support");
        movie_boundary_override = _is_enabled(cf, "boundary_override");
        if ( movie_boundary_override ) {
            val = Config_Get(cf, "facz");
            if ( val != NULL ) movie_facz = atof(val);
            val = Config_Get(cf, "thrz");
            if ( val != NULL ) movie_thrz = atof(val);
        }

        _set_internal_prune(cf, &DVCorp, "DVCorp", "DVThrp", "DVCorpWeight");
        _set_internal_prune(cf, &DVCors, "DVCors", "DVThrs", "DVCorsWeight");
        _set_internal_prune(cf, &VDCorp, "VDCorp", "VDThrp", "VDCorpWeight");
        _set_internal_prune(cf, &VDCors, "VDCors", "VDThrs", "VDCorsWeight");
        _set_internal_prune(cf, &PCor, "PCor", "PThr", "PCorWeight");
        _set_internal_prune(cf, &DCor, "DCor", "DThr", "DCorWeight");
        _set_internal_prune(cf, &SCor, "SCor", "SThr", "SCorWeight");

        _set_stddev_prune(cf, &dMNsdp, "dMNsdp", "dMNsdpThr", "dMNsdpExp");
        _set_stddev_prune(cf, &dMNsds, "dMNsds", "dMNsdsThr", "dMNsdsExp");
        _set_stddev_prune(cf, &Nsdp_Msdp, "Nsdp_Msdp", "Nsdp_MsdpThr",
            "Nsdp_MsdpExp");
        _set_stddev_prune(cf, &Nsds_Msds, "Nsds_Msds", "Nsds_MsdsThr",
            "Nsds_MsdsExp");

        Movie_Prune_Users_in_SupM.enabled = _is_enabled(cf,
            "Prune_Users_in_SupM");
        Movie_Prune_Movies_in_SupU.enabled = _is_enabled(cf,
            "Prune_Movies_in_SupU");
        Movie_Prune_Users_in_CoSupMN.enabled = _is_enabled(cf,
            "Prune_Users_in_CoSupMN");
        if ( Movie_Prune_Users_in_SupM.enabled )
            _set_external_prune(cf, &Movie_Prune_Users_in_SupM,
                "movie_voting Prune_Users_in_SupM");
        if ( Movie_Prune_Movies_in_SupU.enabled )
            _set_external_prune(cf, &Movie_Prune_Movies_in_SupU,
                "movie_voting Prune_Movies_in_SupU");
        if ( Movie_Prune_Users_in_CoSupMN.enabled )
            _set_external_prune(cf, &Movie_Prune_Users_in_CoSupMN,
                "movie_voting Prune_Users_in_CoSupMN");
    }
    Config_Destroy(cf);
    return true;
}

    fputs("\t\t\tPruning method: ", output);
    switch ( sp->method ) {
        case UserPrune: fputs("UserPrune\n", output); break;
        case UserFastPrune: fputs("UserFastPrune\n", output); break;
        case UserCommonCoSupportPrune:
            fputs("UserCommonCoSupportPrune\n", output); break;
        case MoviePrune: fputs("MoviePrune\n", output); break;
        case MovieFastPrune: fputs("MovieFastPrune\n", output); break;
        case MovieCommonCoSupportPrune:
            fputs("MovieCommonCoSupportPrune\n", output); break;
    }

    fprintf(output, "\t\t\t\tmstrt: %-llu\tmultiplier: %-7.2f\n",
        pp->mstrt, pp->mstrt_mult);
    fprintf(output, "\t\t\t\tustrt: %-llu\tmultiplier: %-7.2f\n",
        pp->ustrt, pp->ustrt_mult);
    fprintf(output, "\t\t\t\tTSa: %-7.2f\tTSb: %-7.2f\n",
        pp->TSa, pp->TSb);
    fprintf(output, "\t\t\t\tTdvp: %-7.2f\tTdvs: %-7.2f\n",
        pp->Tdvp, pp->Tdvs);
    fprintf(output, "\t\t\t\tTvdp: %-7.2f\tTvds: %-7.2f\n",
        pp->Tvdp, pp->Tvds);
    fprintf(output, "\t\t\t\tTD: %-7.2f\tTP: %-7.2f\n",
        pp->TD, pp->TP);
    fprintf(output, "\t\t\t\tPPm: %-7.2f\n", pp->PPm);
    fprintf(output, "\t\t\t\tTV: %-7.2f\tTSD: %-7.2f\n",
        pp->TV, pp->TSD);
    fprintf(output, "\t\t\t\tCh: %-7.2f\tCt: %-7.2f\n",
        pp->Ch, pp->Ct);
    return;
}
/* Public method prints interpretation of configuration.
 * \param output   File descriptor to be used for output. */
void PredictionConfig::print(FILE *output)
{
    if ( name == NULL ) fputc('\n', output);
    else fprintf(output, "\tName: %s\n\n", name);

    if ( user_voting ) {
        fputs("\tUser voting enabled.\n", output);
        fprintf(output, "\t\tUser vote weight: %-7.2f\n",
            user_vote_weight);
        fputs("\t\tForce vote in voter loop will be ", output);
        if ( user_force_vote_in_Voter_Loop ) fputs("enabled.\n", output);
        else fputs("disabled.\n", output);
        fputs("\t\tForce vote after voter loop will be ", output);
        if ( user_force_vote_after_Voter_Loop ) fputs("enabled.\n", output);
        else fputs("disabled.\n", output);
        fputs("\t\tUser support ", output);
        if ( user_reset_support ) fputs("will be reset.\n", output);
        else fputs("will not be reset.\n", output);
        fputs("\t\tBoundary override ", output);
        if ( user_boundary_override ) {
            fputs("is enabled:\n", output);
            fprintf(output, "\t\t\tfacz: %-7.2f\tthrz: %-7.2f\n",
                user_facz, user_thrz);
        }
        else fputs("not enabled.\n", output);

        _print_internal_prune(&dvCorp, "dvCorp", output);
        _print_internal_prune(&dvCors, "dvCors", output);
        _print_internal_prune(&vdCorp, "vdCorp", output);
        _print_internal_prune(&vdCors, "vdCors", output);
        _print_internal_prune(&pCor,   "pCor",   output);
        _print_internal_prune(&dCor,   "dCor",   output);
        _print_internal_prune(&sCor,   "sCor",   output);

        _print_stddev_prune(&dUVsdp, "dUVsdp", output);
        _print_stddev_prune(&dUVsds, "dUVsds", output);
        _print_stddev_prune(&Vsdp_Usdp, "Vsdp_Usdp", output);
        _print_stddev_prune(&Vsds_Usds, "Vsds_Usds", output);

        _print_external_prune(&Prune_Movies_in_SupU,
            "Prune_Movies_in_SupU", output);
        _print_external_prune(&Prune_Users_in_SupM,
            "Prune_Users_in_SupM", output);
        _print_external_prune(&Prune_Movies_in_CoSupUV,
            "Prune_Movies_in_CoSupUV", output);
        fputc('\n', output);
    }

    if ( movie_voting ) {
        fputs("\tMovie voting enabled.\n", output);
        fprintf(output, "\t\tMovie vote weight: %-7.2f\n",
            1.0 - user_vote_weight);
        fputs("\t\tForce vote in voter loop will be ", output);
        if ( movie_force_vote_in_Voter_Loop )
            fputs("enabled.\n", output);
        else fputs("disabled.\n", output);
        fputs("\t\tForce vote outside voter loop will be ", output);
        if ( movie_force_vote_outside_Voter_Loop )
            fputs("enabled.\n", output);
        else fputs("disabled.\n", output);
        fputs("\t\tMovie support ", output);
        if ( movie_reset_support ) fputs("will be reset.\n", output);
        else fputs("will not be reset.\n", output);
        fputs("\t\tBoundary override ", output);
        if ( movie_boundary_override ) {
            fputs("is enabled:\n", output);
            fprintf(output, "\t\t\tfacz: %-7.2f\tthrz: %-7.2f\n",
                movie_facz, movie_thrz);
        }
        else
            fputs("not enabled.\n", output);

        _print_internal_prune(&DVCorp, "DVCorp", output);
        _print_internal_prune(&DVCors, "DVCors", output);
        _print_internal_prune(&VDCorp, "VDCorp", output);
        _print_internal_prune(&VDCors, "VDCors", output);
        _print_internal_prune(&PCor,   "PCor",   output);
        _print_internal_prune(&DCor,   "DCor",   output);
        _print_internal_prune(&SCor,   "SCor",   output);

        _print_stddev_prune(&dMNsdp, "dMNsdp", output);
        _print_stddev_prune(&dMNsds, "dMNsds", output);
        _print_stddev_prune(&Nsdp_Msdp, "Nsdp_Msdp", output);
        _print_stddev_prune(&Nsds_Msds, "Nsds_Msds", output);

        _print_external_prune(&Movie_Prune_Users_in_SupM,
            "Prune_Users_in_SupM", output);
        _print_external_prune(&Movie_Prune_Movies_in_SupU,
            "Prune_Movies_in_SupU", output);
        _print_external_prune(&Movie_Prune_Users_in_CoSupMN,
            "Prune_Users_in_CoSupMN", output);
        fputc('\n', output);
    }

    return;
}
PredictionConfig.H

#if !defined(PREDICTIONCONFIG_H)
#define PREDICTIONCONFIG_H

/* Standard include files. */
#include <stdio.h>

/* Local include files. */

/* Enumeration types describing various
 * methods of internal pruning. */
enum internal_pruning {
    user_dvCorp, user_dvCors, user_vdCorp,
    user_vdCors, user_pCor, user_dCor, user_sCor,

    movie_DVCorp, movie_DVCors, movie_VDCorp,
    movie_VDCors, movie_PCor, movie_DCor, movie_SCor,

    /* Standard deviation types. */
    user_dUVsdp, user_dUVsds, user_Vsdp_Usdp, user_Vsds_Usds,
    movie_dMNsdp, movie_dMNsds, movie_Nsdp_Msdp, movie_Nsds_Msds
};

/* The following structure is used to generically describe
 * internal or standard deviation pruning methods. */
struct pruning {
    bool enabled;
    bool weight;
    double threshold;
    double exponent;
};

/* The following structure is used to encapsulate parameters
 * which configure the external pruning routines. */
struct pruning_parameters {
    unsigned long long int mstrt, ustrt;
    double mstrt_mult, ustrt_mult,
           TSa, TSb, Tdvp, Tdvs, Tvdp, Tvds,
           TD, TP, PPm, TV, TSD, Ch, Ct;
};

/* The following definitions are used to encapsulate
 * information for the external pruning routines. */
enum prune_type {
    UserPrune, UserFastPrune, UserCommonCoSupportPrune,
    MoviePrune, MovieFastPrune, MovieCommonCoSupportPrune
};

struct external_prune {
    bool enabled;
    enum prune_type method;
    struct pruning_parameters params;
};

class PredictionConfig {
    /* Name of the prediction configuration. */
    char *name;

    /* First section of variables affect global prediction
     * parameters at level of mpp-user.C file. Subsequent
     * var sections will provide params specific to either
     * user or movie voting. */
    bool user_voting, movie_voting;
    double user_vote_weight;

    /* User voting parameters. */
    bool user_force_vote_in_Voter_Loop,
         user_force_vote_after_Voter_Loop;
    bool user_reset_support;
    bool user_boundary_override;
    double user_facz, user_thrz;

    /* Internal vote pruning. */
    struct pruning dvCorp, dvCors, vdCorp,
                   vdCors, pCor, dCor, sCor;

    /* Standard deviation pruning. */
    struct pruning dUVsdp, dUVsds, Vsdp_Usdp, Vsds_Usds;
    struct external_prune Prune_Movies_in_SupU,
        Prune_Users_in_SupM, Prune_Movies_in_CoSupUV;

    /* Movie voting parameters. */
    bool movie_force_vote_in_Voter_Loop,
         movie_force_vote_outside_Voter_Loop;
    bool movie_reset_support;
    bool movie_boundary_override;
    double movie_facz, movie_thrz;

    /* Internal vote pruning. */
    struct pruning DVCorp, DVCors, VDCorp, VDCors,
                   PCor, DCor, SCor;

    /* Standard deviation pruning. */
    struct pruning dMNsdp, dMNsds, Nsdp_Msdp, Nsds_Msds;
    struct external_prune Movie_Prune_Users_in_SupM,
        Movie_Prune_Movies_in_SupU, Movie_Prune_Users_in_CoSupMN;

public:
    /* Void constructor. */
    PredictionConfig(void);
    /* Destructor. */
    ~PredictionConfig(void);

    /* Inline public methods to return type of voting. */
    inline bool do_user_voting(void) { return user_voting; }
    inline bool do_movie_voting(void) { return movie_voting; }

    /* Inline public method to access user vote weight. */
    inline double get_user_vote_weight(void) { return user_vote_weight; }

    /* Inline public methods to determine whether user and
     * movie support should be reset after initial external pruning. */
    inline bool reset_user_support(void) { return user_reset_support; }
    inline bool reset_movie_support(void) { return movie_reset_support; }

    /* Inline public methods to return status of vote forcing. */
    inline bool user_vote_force_in_loop(void) {
        return user_force_vote_in_Voter_Loop; }
    inline bool user_vote_force_after_loop(void) {
        return user_force_vote_after_Voter_Loop; }
    inline bool movie_vote_force_in_loop(void) {
        return movie_force_vote_in_Voter_Loop; }
    inline bool movie_vote_force_after_loop(void) {
        return movie_force_vote_outside_Voter_Loop; }

    /* Inline accessor functions for returning external pruning conf. */
    inline struct external_prune *get_user_Prune_Movies_in_SupU(void) {
        return &Prune_Movies_in_SupU; }
    inline struct external_prune *get_user_Prune_Users_in_SupM(void) {
        return &Prune_Users_in_SupM; }
    inline struct external_prune *get_user_Prune_Movies_in_CoSupUV(void) {
        return &Prune_Movies_in_CoSupUV; }
    inline struct external_prune *get_movie_Prune_Users_in_SupM(void) {
        return &Movie_Prune_Users_in_SupM; }
    inline struct external_prune *get_movie_Prune_Movies_in_SupU(void) {
        return &Movie_Prune_Movies_in_SupU; }
    inline struct external_prune *get_movie_Prune_Users_in_CoSupMN(void) {
        return &Movie_Prune_Users_in_CoSupMN; }

    /* Public method to read a configuration file. */
    bool read_config(const char *);

    /* Public accessor for returning ptr to structure of internal prune. */
    struct pruning *get_internal_prune(enum internal_pruning);

    /* Public method to print out a configuration. */
    void print(FILE *);
};
#endif
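/* Illustrative sketch, not part of PredictionConfig.H.
 * PredictionConfig stores only user_vote_weight; print() reports the
 * movie vote weight as 1.0 - user_vote_weight, and the sample config
 * describes votes being forced into the standard range of 1-5.
 * Assumptions: this listing does not show how the two votes are
 * combined, so the weighted sum below is a guess at the intended use;
 * blend_votes and force_vote are hypothetical helper names. */
#include <algorithm>
#include <cassert>

/* Clamp a vote into the rating range 1..5. */
static double force_vote(double vote)
{
    return std::min(5.0, std::max(1.0, vote));
}

/* Weighted combination in which the movie weight is derived from the
 * user weight, so the two weights always sum to 1. */
static double blend_votes(double user_vote, double movie_vote,
                          double user_vote_weight)
{
    const double movie_vote_weight = 1.0 - user_vote_weight;
    return user_vote_weight * user_vote + movie_vote_weight * movie_vote;
}
/* With user_vote_weight = 0, as in the sample configuration, the
 * blended value is simply the movie vote. */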
UserSet.C

/* System include files. */
#include <limits.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Local include files. */
#include "UserSet.H"

/* Variables static to this module. */

/* No argument constructor. */
UserSet::UserSet(void) : ptree_set()
{ id_numbers = NULL; return; }

/* Destructor. */
UserSet::~UserSet(void)
{ if ( id_numbers != NULL ) free(id_numbers); return; }

/**
 * Public method.
 * Calculates the rating a user provided for a movie.
 * \param user_index  The index number of the user.
 * \param movie_index The index number of the movie.
 * \return            Rating number is returned to the caller. */
double UserSet::get_rating(unsigned long int user_index,
    unsigned long int movie_index)
{
    auto int rating = 0, val = 1;
    auto size_t slot = user_index * 3;

    for (int tree = 2; tree >= 0; --tree) {
        if ( ptree_set[slot+tree].is_set(movie_index) )
            rating += val;
        val <<= 1;
    }
    return rating;
}

/* Public method.
 * Calculates the mean rating of a group of movies by a user.
 * The group of movies is specified by the bit mask in the
 * PTree supplied as an argument to this method.
 * \param user_index  Index number of the user for whom
 *                    the mean is being predicted.
 * \param cosupport   Mask specifying the group of movies for
 *                    which the mean is to be calculated.
 * \return            Rating number returned to caller. */
double UserSet::get_mean(unsigned long int user_index,
    PTree &cosupport)
{
    auto size_t slot = user_index * 3;
    auto double mean = 0;
    auto PTree bitcolumn;

    /* Iterate over the three bit positions which represent
     * movie ratings. Multiply number of 1 bits in
     * bit column by bitvalue of the tree. */
    for (int tree = 2, bit = 0; tree >= 0; --tree, ++bit) {
        bitcolumn = cosupport & ptree_set[slot+tree];
        mean += bitcolumn.get_count() * pow(2.0, bit);
    }

    /* Divide by the number of movies in the cosupport list to
     * complete the mean. */
    return mean / cosupport.get_count();
}

/* Public method returns a PTree describing the set
 * of movies which a user has rated. */
PTree UserSet::get_movies(unsigned long int index)
{
    auto size_t slot = index * 3;
    return ptree_set[slot] | ptree_set[slot+1] | ptree_set[slot+2];
}

/* Public method.
 * This method converts a user identity number into the index value
 * used to reference the PTree's corresponding to this user. */
unsigned long int UserSet::get_index(unsigned long int id_number)
{
    auto unsigned long int identities = ptree_set.size() / 3;
    for (unsigned long int lp = 0; lp < identities; ++lp)
        if ( id_numbers[lp] == id_number ) return lp;
    return 0;
}

/* Public method used to obtain pointers to rating PTree's of given
 * user. Each user has 3 associated PTree's corresponding to one of
 * the three bits used to represent movie ratings. The zeroth PTree
 * represents the high order bit of the rating value.
 * \param user_index  Index number of user whose rating is to be returned.
 * \param bit         Bit position PTree to be returned.
 * \return            NULL is returned if an invalid PTree is requested.
 *                    Else requested PTree is returned to the caller. */
PTree UserSet::get_ptree(unsigned long int user_index, int bit)
{
    auto size_t slot = user_index * 3;
    if ( slot >= ptree_set.size() ) return NULL;
    return ptree_set[slot + bit];
}

/* Public method.
 * This method sets up an array containing the numerical value of
 * user identity numbers corresponding to the PTree slots.
 * \return   A boolean return value is used to indicate success
 *           or failure of the load. */
bool UserSet::load_identities(void)
{
    auto size_t cnt = 0, number_of_identities;
    auto char *p, bufr[PATH_MAX];
    auto FILE *input;

    /* PTreeSet must be loaded. */
    if ( ptree_set.size() == 0 ) return false;

    /* Allocate an array of integers to hold identities. */
    number_of_identities = ptree_set.size() / 3;
    id_numbers = (unsigned long int *)malloc(number_of_identities *
        sizeof(unsigned long int));
    if ( id_numbers == NULL ) return false;

    /* Read file and convert identities to integers. */
    snprintf(bufr, sizeof(bufr),
        "%s/mpred-data/nf_mv_us_pt/user-attributes.txt", PTREEDATA);
    input = fopen(bufr, "r");
    if ( input == NULL ) return false;
    while ( !feof(input) ) {
        if ( fgets(bufr, sizeof(bufr), input) == NULL ) {
            fclose(input);
            return false;
        }
        if ( (p = strrchr(bufr, '\n')) != NULL ) *p = '\0';
        id_numbers[cnt++] = strtoul(bufr, NULL, 10);
        if ( cnt == number_of_identities ) {
            fclose(input);
            return true;
        }
    }
    fclose(input);

    if ( cnt != number_of_identities ) return false;
    return true;
}

/* Public method.
 * This method dumps all component PTree's of the set of movies.
 * \param output   Output descriptor where the PTree's are
 *                 to be directed. */
void UserSet::dump(FILE *output)
{
    for (int lp = 0; lp < ptree_set.size(); ++lp)
        ptree_set[lp].dump(output);
    return;
}

/* Public method to print index/attribute pairings. */
void UserSet::print(FILE *output)
{
    auto unsigned long int identities = ptree_set.size() / 3;
    for (unsigned long int lp = 0; lp < identities; ++lp)
        fprintf(output, "%lu -> %lu\n", lp, id_numbers[lp]);
    return;
}

UserSet.H

#if !defined(USERSET_H)
#define USERSET_H

/* Standard include files. */
#include <stdio.h>

/* Local include files. */
#include "PTreeSet.H"

class UserSet
{
private:
    unsigned long int *id_numbers;
    PTreeSet ptree_set;

public:
    /* Void constructor. */
    UserSet(void);

    /* Constructor to initialize an in-memory tree. */

    /* Destructor. */
    ~UserSet(void);

    /* Public inline method to return the identity of a user index. */
    unsigned long int get_identity(unsigned long int index) {
        return id_numbers[index]; }

    /* Public method to return the index of a user identity. */
    unsigned long int get_index(unsigned long int);

    /* Public method to return the rating of a movie by a user. */
    double get_rating(unsigned long int, unsigned long int);

    /* Public method to return the mean rating of a set of movies. */
    double get_mean(unsigned long int, PTree &);

    /* Public method to return the set of movies rated by a user. */
    PTree get_movies(unsigned long int);

    /* Public method to return rating PTree's. */
    PTree get_ptree(unsigned long int, int);

    /* Public method to load a list of user identities. */
    bool load_identities(void);

    /* Public method to print sparseness of set. */
    void dump(FILE *);

    /* Public method to load a set of PTree's saved in ASCII format. */

    /* Public method to load a binary PTree set. */
    bool load_binary(void);

    /* Public method to print index/attribute pairings. */
    void print(FILE *);
};
#endif
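/* Illustrative sketch, not part of UserSet.C or UserSet.H.
 * UserSet keeps three PTree bit slices per user: slot+0 is the high
 * order bit of the rating and slot+2 the low order bit. get_rating()
 * rebuilds one rating from the three bits, and get_mean() sums
 * (count of 1 bits in cosupport & slice) * bit value over the slices.
 * Assumptions: BitSlice (a std::vector<bool>) stands in for PTree, and
 * UserSlices, make_user, count_and, rating_from_slices and
 * mean_over_cosupport are hypothetical names used only here. */
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<bool> BitSlice;

struct UserSlices {
    BitSlice bit[3];   /* bit[0] = high order bit, bit[2] = low order bit */
};

/* Split a list of ratings (0..5, 0 meaning "not rated") into slices. */
static UserSlices make_user(const std::vector<int> &ratings)
{
    UserSlices u;
    for (int tree = 0; tree < 3; ++tree)
        u.bit[tree].assign(ratings.size(), false);
    for (std::size_t m = 0; m < ratings.size(); ++m) {
        u.bit[0][m] = (ratings[m] & 4) != 0;
        u.bit[1][m] = (ratings[m] & 2) != 0;
        u.bit[2][m] = (ratings[m] & 1) != 0;
    }
    return u;
}

/* Number of positions set in both slices (the count of a & b). */
static std::size_t count_and(const BitSlice &a, const BitSlice &b)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        if ( a[i] && b[i] ) ++n;
    return n;
}

/* Same loop shape as get_rating(): walk from the low order slice up,
 * doubling the bit value at each step. */
static int rating_from_slices(const UserSlices &u, std::size_t movie)
{
    int rating = 0, val = 1;
    for (int tree = 2; tree >= 0; --tree) {
        if ( u.bit[tree][movie] ) rating += val;
        val <<= 1;
    }
    return rating;
}

/* Same arithmetic as get_mean(), plus a guard for an empty mask that
 * the original listing does not have. */
static double mean_over_cosupport(const UserSlices &u,
                                  const BitSlice &cosupport)
{
    const std::size_t total = count_and(cosupport, cosupport);
    if ( total == 0 ) return 0.0;
    double sum = 0.0;
    int val = 1;
    for (int tree = 2; tree >= 0; --tree, val <<= 1)
        sum += static_cast<double>(count_and(cosupport, u.bit[tree])) * val;
    return sum / static_cast<double>(total);
}
/* The mean never visits individual ratings: each slice contributes
 * its bit count times its bit value, which is the point of storing
 * ratings as bit slices. */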
Sample config file

# Sample prediction configuration file.

# Name of the configuration.
name = default

# Allow user voting: enabled or disabled
user_voting = disabled

# Do movie based voting: enabled or disabled
movie_voting = enabled

# User vote weighting. Movie vote weighting will be derived from
# the value of this variable.
user_vote_weight = 0

# User voting configuration.
# This section is only processed if user voting is enabled.
[user_voting]

# The following options specify where and if votes are forced into
# their standard range of 1-5.
force_vote_in_Voter_Loop = disabled
force_vote_after_Voter_Loop = disabled

# The following variable controls whether or not user support is reset
# after user pruning is completed.
reset_support = disabled

# The following variables control Boundary Based prediction overrides.
# The parameters are only evaluated if the boundary based method is
# enabled.
# boundary_override = disabled;
# facz = 0
# thrz = 0

# Internal pruning configuration.
# One or more of the pruning functions can be enabled.
# For each pruning type a default threshold can be set. If not set the
# default value indicated below is used.
# The third variable selects a vote weighting option. If the weight variant
# of the pruning method is enabled the value of uCor is set to that value.
# Note that the last enabled weight will set uCor.

# dvCorp = disabled
# dvThrp = 0
# dvCorpWeight = disabled

# dvCors = disabled
# dvThrs = 0
# dvCorsWeight = disabled

# vdCorp = disabled
# vdThrp = 0
# vdCorpWeight = disabled

# vdCors = disabled
# vdThrs = 0
# vdCorsWeight = disabled

# pCor = disabled
# pThr = 0
# pCorWeight = disabled

# dCor = disabled
# dThr = 0
# dCorWeight = disabled

# sCor = disabled
# sThr = 0
# sCorWeight = disabled

# Standard deviation pruning.
# One or more of the following methods can be selected. The default is
# for all these methods to be disabled.
# Each pruning method has a threshold and exponent value associated with
# it. The default values are noted in the definitions below.

# dUVsdp = disabled
# dUVsdpThr = 0
# dUVsdpExp = -1

# dUVsds = disabled
# dUVsdsThr = 0
# dUVsdsExp = -1

# Vsdp_Usdp = disabled
# Vsdp_UsdpThr = 0
# Vsdp_UsdpExp = -1

# Vsds_Usds = disabled
# Vsds_UsdsThr = 0
# Vsds_UsdsExp = -1
# External pruning configuration                                         # Movie voting configuration.
# The following section selects the use of any combination of three      # This section is only processed if the
# movie_vote variable is set to
Sample config file - pg 2
# pruning functions. By default pruning is disabled.
# Each pruning method is encapsulated in its own section. This allows    # enabled in the Default section.
# a pruning configuration to be turned on and off without disturbing     [movie_voting]
# the pruning configuration.
# Within each pruning section there are six different methods for        # The following options specify where and if votes are forced into
# implementing the pruning. These methods are:                           # their standard range of 1-5.
#     UserPrune, UserFastPrune, UserCommonCoSupportPrune                 force_vote_in_Voter_Loop = disabled
#     MoviePrune, MovieFastPrune, MovieCommonCosupportPrune              force_vote_outside_Voter_Loop = disabled
# There are a total of 15 parameters which select the configuration of
# of the pruning. Default values are noted.                              # The following variable controls whether or not user support is reset
# after user pruning is completed.
Prune_Movies_in_SupU = disabled                                          reset_support = disabled
Prune_Users_in_SupM = disabled
Prune_Movies_in_CoSupUV = disabled                                       # following variables control Boundary Based prediction overrides.
# Parameters are only evaluated if boundary based method is enabled.
[user_voting Prune_Movies_in_SupU]                                       # boundary_override = disabled;
# facz = 0
method = UserPrune                                                       # thrz = 0
leftside = 0
width = 0                                                                # Internal pruning configuration.
mstrt = 0                                                                # One or more of the pruning functions can be enabled.
mstrt_mult = 0.0                                                         # For each pruning type a default threshold can be set. If not set the
ustrt = 0                                                                # default value indicated below is used.
ustrt_mult = 0.0                                                         # Third variable selects a vote weighting option. If weight variant
TSa = -100                                                               # of pruning method is enabled the value of uCor is set to that value.
TSb = -100                                                               # Note that the last enabled weight will set uCor.
Tdvp = -1
Tdvs = -1                                                                # DVCorp = disabled
Tvdp = -1                                                                # DVThrp = 0
Tvds = -1                                                                # DVCorpWeight = disabled
TD = -1
TP = -1                                                                  # DVCors = enabled
[user_voting Prune_Movies        # DVThrs = 0
PPm = .1                                in_CoSupUV]
TV = -1                                                                  # DVCorsWeight = disabled
TSD = -1                                method = UserPrune
Ch = 1                                  leftside = 0                     # VDCorp = enabled
Ct = 2                                  width = 0                        # VDThrp = 0
mstrt = 0                        # VDCorpWeight = disabled
[user_voting Prune_Users_in_SupM]       mstrt_mult = 0.0
ustrt = 0                        # VDCors = enabled
method = UserPrune                      ustrt_mult = 0.0                 # VDThrs = 0
TSa = -100                       # VDCorsWeight = disabled
leftside = 0                            TSb = -100
width = 0                               Tdvp = -1
mstrt = 0                                                                # PCor = enabled
Tdvs = -1                        # PThr = 0
mstrt_mult = 0.0                        Tvdp = -1
ustrt = 0                               Tvds = -1                        # PCorWeight = disabled
TD     = -1
ustrt_mult = 0.0                        TD = -1
TP     = -1       TP = -1                          # DCor = enabled
TSa = -100            PPm = .1
TSb = -100                              PPm = .1                         # DThr = 0
TV = -1           TV = -1                          # DCorWeight = disabled
Tdvp = -1             TSD = -1          TSD = -1
Tdvs = -1             Ch = 1            Ch = 1
Tvdp = -1             Ct = 2                                             # SCor = enabled
Ct = 2                           # SThr = 0
Tvds = -1
# SCorWeight = disabled
# Standard deviation pruning.                                              leftside = 0
# One of more of the following methods can be selected. The default is     width = 0                             Sample config file - pg 3
# for all these methods to be disabled.                                    mstrt = 0
mstrt_mult = 0.0
# Each pruning method has a threshold and exponent value associated with
ustrt = 0
# it. The defaults values are noted in the definitions below.              ustrt_mult = 0.0
TSa   = -100
# dMNsdp = disabled                                                        TSb   = -100
# dMNsdpThr = 0                                                            Tdvp = -1
# dMNsdpExp = -1                                                           Tdvs = -1
Tvdp = -1
# dMNsds = disabled                                                        Tvds = -1
# dMNsdsThr = 0                                                            TD    = -1
TP    = -1
# dMNsdsExp = -1
PPm   = .1
TV    = -1
# Nsdp_Msdp = enabled                                                      TSD   = -1
# Nsdp_MsdpThr = 0                                                         Ch    = 1
# Nsdp_MsdpExp = -1                                                        Ct    = 2

# Nsds_Msds = enabled
# Nsds_MsdsThr = 0                                                         [movie_voting Prune_Movies_in_SupU]
# Nsds_MsdsExp = -1                                                        method = MovieCommonCoSupportPrune
leftside = +40
width     = 10
# External pruning configuration                                           mstrt = 0
# The following section selects the use of any combination of three        mstrt_mult = 0.0
# pruning functions. By default pruning is disabled.                       ustrt = 0
# Each pruning method is encapsulated in its own section. This allows                                     [movie_voting Prune_Users_in_CoSupMN]
ustrt_mult = 0.0
# a pruning configuration to be turned on and off without disturbing       TSa   = -100                   method = UserCommonCoSupportPrune
# the pruning configuration.                                               TSb   = -100                   leftside = 0
# Within each pruning section there are six different methods for          Tdvp = -1                      width = 8000
# implementing the pruning. These methods are:                             Tdvs = -1                      mstrt = 0
Tvdp = -1                      mstrt_mult = 0.0
#     UserPrune, UserFastPrune, UserCommonCoSupportPrune
Tvds = -1                      ustrt = 0
#     MoviePrune, MovieFastPrune, MovieCommonCoSupportPrune                TD    = -1
# There are a total of 15 parameters which select the configuration of                                    ustrt_mult = 0.0
TP    = -1                     TSa = -100
# of the pruning. Default values are noted.                                PPm   = .1                     TSb = -100
TV    = -1                     Tdvp = -1
Prune_Users_in_SupM = disabled                                             TSD   = -1                     Tdvs = -1
Prune_Movies_in_SupU = enabled                                             Ch    = 1                      Tvdp = -1
Prune_Users_in_CoSupMN = enabled                                           Ct    = 2                      Tvds = -1
TD = -1
[movie_voting Prune_Users_in_SupM]                                                                         TP = -1
method = UserPrune                                                                                         PPm = .1
TV = -1
TSD = -1
Ch = 1

```
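The sample configuration file uses a simple layout: `#` comment lines, `key = value` pairs, and bracketed section headers such as `[movie_voting Prune_Movies_in_SupU]`. The following is a minimal sketch of how such a file could be read, not the predictor's actual loader. The names `Config`, `parse_config`, and `is_enabled` are hypothetical, and filing the top-level keys under a section called `Default` is an assumption based on the comment that refers to "the Default section".

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical container: section name -> (key -> raw string value).
using Config = std::map<std::string, std::map<std::string, std::string>>;

// Strip leading and trailing whitespace.
static std::string trim(const std::string &s) {
    const char *ws = " \t\r\n";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Read "key = value" lines grouped under "[section]" headers.
// Blank lines and lines starting with '#' are skipped.
Config parse_config(std::istream &in) {
    Config cfg;
    std::string section = "Default";  // assumed home of top-level keys
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') continue;
        if (line.front() == '[' && line.back() == ']') {
            section = trim(line.substr(1, line.size() - 2));
            cfg[section];             // record the section even if it stays empty
            continue;
        }
        const std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;  // ignore lines without '='
        cfg[section][trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return cfg;
}

// True only if the key is present in the section and set to "enabled".
bool is_enabled(const Config &cfg, const std::string &section,
                const std::string &key) {
    const auto s = cfg.find(section);
    if (s == cfg.end()) return false;
    const auto k = s->second.find(key);
    return k != s->second.end() && k->second == "enabled";
}
```

A caller would open the file with `std::ifstream`, pass it to `parse_config`, and then check, for example, `is_enabled(cfg, "Default", "movie_voting")` before reading the `[movie_voting]` sections. Values are kept as strings so that numeric settings such as `leftside = +40` or `PPm = .1` can be converted by the caller with `std::stoi` or `std::stod` as needed.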