					Feature Selection for Ranking


                                                  SIGIR 2007
Xiubo Geng(1,2), Tie-Yan Liu(1), Tao Qin(1,3), Hang Li(1)
(1) Microsoft Research Asia
(2) Institute of Computing Technology, Chinese Academy of Sciences
(3) Dept. of Electronic Engineering, Tsinghua University, Beijing
Why feature selection is important

   Feature selection can help enhance accuracy in many machine learning problems, e.g. by reducing over-fitting
   Feature selection can also help improve the efficiency of training
Current status
   No feature selection methods have been proposed specifically for ranking; most of the methods used in ranking were developed for classification
   Feature selection methods in classification fall into three categories:
       Filter methods: information gain (IG), chi-square (CHI)
       Wrapper methods
       Embedded methods
Problem
   In ranking, ordered categories are used, while in classification the categories are "flat"
   Evaluation measures differ:
       1) In ranking, precision is usually more important than recall, while in classification both precision and recall are important
       2) In ranking, correctly ranking the top-n instances is most critical, while in classification a correct decision is of equal significance for all instances
So, in this paper
   We propose a novel feature selection method for ranking with the following properties:
       It uses evaluation measures or loss functions in ranking to measure the importance of features
       It considers the similarities between features
       It finds a set of features with maximum importance and minimum similarity
     Feature Selection Method
   Overview
   Goal: select t (1 ≤ t ≤ m) features from the entire feature set {V1, V2, …, Vm}
       (1) Define the importance score of each feature Vi
       (2) Define the similarity between any two features Vi and Vj
       (3) Employ an efficient algorithm to maximize the total importance score and minimize the total similarity score of the selected features
Feature Selection Method
   Importance of a feature
       Use an evaluation measure such as MAP or NDCG, or a loss function (e.g. pairwise ranking errors), to compute the importance score of each feature (a sketch follows below)
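
To make this concrete, here is a minimal sketch (in Python) of scoring one feature by the MAP of the ranking it induces, assuming binary relevance judgments; the helper names (average_precision, feature_importance_map) are illustrative and not from the paper:

    # Importance score of a single feature, taken as the MAP of the ranking
    # that the feature alone induces over each query's documents.
    def average_precision(scores, labels):
        """AP of the ranking induced by scores against binary labels."""
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        hits, precisions = 0, []
        for rank, i in enumerate(order, start=1):
            if labels[i] == 1:
                hits += 1
                precisions.append(hits / rank)
        return sum(precisions) / max(hits, 1)

    def feature_importance_map(feature_values, relevance, queries):
        """w_i = mean AP over queries; feature_values[q] and relevance[q]
        are parallel lists for the documents of query q."""
        aps = [average_precision(feature_values[q], relevance[q]) for q in queries]
        return sum(aps) / len(aps)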
     Feature Selection Method
     Similarity between features
          Measured on the basis of their ranking results, using Kendall's τ per query q:

          τq(vi, vj) = ( #{(ds,dt) ∈ Dq : vi and vj rank ds and dt in the same order}
                        − #{(ds,dt) ∈ Dq : vi and vj rank them in opposite orders} ) / #{Dq}

          Averaging τq(vi, vj) over all queries then gives the similarity ei,j
     Dq denotes the set of instance pairs (ds, dt) associated with query q
     #{∙} represents the number of elements in a set
     ds ≺vi dt means that instance dt is ranked ahead of instance ds by feature vi
     (a sketch of this computation follows below)
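
As an illustration, a minimal sketch (in Python) of the per-query similarity, assuming the standard Kendall's τ over document pairs (ties ignored); averaging the value over queries would give ei,j. The function name is illustrative:

    from itertools import combinations

    # Kendall's tau between the rankings induced by two features on one query:
    # (concordant pairs - discordant pairs) / total number of pairs.
    def kendall_tau(scores_i, scores_j):
        concordant = discordant = 0
        for s, t in combinations(range(len(scores_i)), 2):
            a = scores_i[s] - scores_i[t]   # relative order under feature v_i
            b = scores_j[s] - scores_j[t]   # relative order under feature v_j
            if a * b > 0:
                concordant += 1
            elif a * b < 0:
                discordant += 1
        n_pairs = len(scores_i) * (len(scores_i) - 1) // 2
        return (concordant - discordant) / n_pairs if n_pairs else 0.0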
Feature Selection Method
   Optimization
       Select the feature subset by solving:

       max  Σi wi xi
       min  Σi Σj≠i ei,j xi xj
       s.t. Σi xi = t,  xi ∈ {0, 1}

   t denotes the number of selected features
   xi = 1 (or 0) indicates that feature Vi is selected (or not)
   wi denotes the importance score of feature Vi
   ei,j denotes the similarity between feature Vi and feature Vj
Feature Selection Method
   Linear combination of the two objectives:

       max  Σi wi xi − c · Σi Σj≠i ei,j xi xj
       s.t. Σi xi = t,  xi ∈ {0, 1}

       where c balances total importance against total similarity
   This is a typical 0-1 integer programming problem
   A greedy search algorithm is used to solve it approximately (a sketch of the objective follows below)
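
For concreteness, a minimal sketch (in Python) of evaluating this combined objective for a candidate 0-1 selection vector, assuming the importance scores w and the symmetric similarity matrix e are precomputed; the coefficient c and the double sum over distinct selected pairs are chosen to match the 2c update used by the greedy algorithm on the next slide:

    # Value of: sum_i w_i x_i  -  c * sum_{i != j} e_{i,j} x_i x_j
    def objective(x, w, e, c):
        selected = [i for i, xi in enumerate(x) if xi == 1]
        importance = sum(w[i] for i in selected)
        similarity = sum(e[i][j] for i in selected for j in selected if i != j)
        return importance - c * similarity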
Feature Selection Method - greedy algorithm
   Construct an undirected graph G in which each node Vi carries its importance score wi and each edge (Vi, Vk) carries the similarity ei,k
   Initialize S, the set of selected features, as empty
   For i = 1..t:
       Select the node Vk with the largest weight
       Update the remaining weights: wj = wj − 2c · ek,j for all j ≠ k
       Add Vk to S and remove it (with its edges) from G
   (a minimal sketch of this procedure follows below)
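
A minimal sketch (in Python) of this greedy procedure, assuming precomputed importance scores w and a symmetric similarity matrix e; it follows the weight update wj = wj − 2c·ek,j exactly as written above, and the function name is illustrative:

    # Greedy search: repeatedly pick the remaining feature with the largest
    # (penalized) weight, then penalize every other remaining feature by
    # 2*c times its similarity to the picked one.
    def greedy_feature_selection(w, e, t, c):
        weights = list(w)                  # node weights of graph G
        remaining = set(range(len(w)))     # nodes still in G
        selected = []                      # the selected set S
        for _ in range(t):
            k = max(remaining, key=lambda i: weights[i])   # largest-weight node
            for j in remaining:
                if j != k:
                    weights[j] -= 2 * c * e[k][j]          # wj = wj - 2c*ek,j
            selected.append(k)
            remaining.remove(k)            # remove Vk and its edges from G
        return selected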
         Experiment Settings
   Datasets
       .gov data from the Web track of TREC 2004
           Binary relevance judgments
           44 features (document length, term frequency, IDF, BM25, …)
       OHSUMED data from the TREC-9 filtering task
           Three relevance levels: "definitely relevant", "possibly relevant", and "not relevant"
           26 features
   Evaluation measures
       MAP, NDCG
   Ranking models
       Ranking SVM, RankNet
         Experiment Settings
   Experiments were conducted as follows:
       (1) Run a feature selection method on the training set
       (2) Use the selected features to train a ranking model on the training set, tuning the parameters of the ranking model with the validation set
       (3) Use the obtained ranking model to rank the test set, and evaluate the results in terms of MAP and NDCG
   Algorithms for comparison
       GAS-E: the proposed method with importance defined by an evaluation measure
       GAS-L: the proposed method with importance defined by a loss function
       IG
       CHI
     Experimental Results - .gov




In most cases GAS-L outperforms GAS-E, although not significantly.
The results with GAS-E and GAS-L are more stable than those with IG and CHI.
    Experimental Results - OHSUMED




It can be seen that CHI performs the worst. Both IG and our algorithms achieve good ranking accuracy with fewer than 5 features; as more features are added, our algorithms gradually outperform IG.
Experimental Results
    Discussions
   Two observations
       1) Feature selection improves ranking performance more significantly for the .gov dataset than for the OHSUMED dataset
       2) The proposed algorithms outperform IG and CHI more significantly for the .gov dataset than for the OHSUMED dataset
  Discussions




A likely reason is that the .gov dataset contains more ineffective (noisy) features. In this case, feature selection can help remove the noisy features and thus improve the performance of the final ranking.
   Discussions




The features of the .gov dataset are clustered into many blocks, and for each cluster only a few representative features need to be selected. In the OHSUMED dataset there are only two large blocks, with most features similar to each other. We conclude that if the effects of features vary largely and there are redundant features, our method can work very well.
Contributions

   made clear the limitations of the
    existing feature selection methods
    when applied to ranking
   proposed a novel method to select
    features for ranking
   Experimental results have validated
    the effectiveness and efficiency of the
    proposed method
Future Work

   One could also choose to minimize redundancy among three or four features, rather than only between pairs
   Investigate better optimization methods for feature selection
   Further conduct experiments on larger datasets and with more features

				