
Protein Fold Recognition with Relevance Vector Machines


Patrick Fernie
COMS 6772
Advanced Machine Learning
 Relevance Vector Machine

A Bayesian treatment of a generalized
linear model
Yields a formulation similar to that of a
Support Vector Machine
Hyperparameters Instead of Margin/Costs
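For reference, the model these bullets describe can be written out as below. The notation follows the Tipping papers listed at the end of the deck; the slides themselves do not define these symbols, so treat them as assumed notation.

```latex
% Generalized linear model over kernel basis functions centred on the training points
y(\mathbf{x}; \mathbf{w}) = \sum_{i=1}^{N} w_i \, K(\mathbf{x}, \mathbf{x}_i) + w_0

% Zero-mean Gaussian prior on each weight, with its own precision hyperparameter \alpha_i
p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}\!\left(w_i \mid 0, \alpha_i^{-1}\right)

% Classification: class probability through the logistic sigmoid
p(t = 1 \mid \mathbf{x}, \mathbf{w}) = \sigma\!\left(y(\mathbf{x}; \mathbf{w})\right),
\qquad \sigma(y) = \frac{1}{1 + e^{-y}}
```

The per-weight hyperparameters \alpha_i take the place of the SVM's margin/cost settings: they are estimated from the data rather than chosen by cross-validation.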
   Relevance Vector Machine
SVM                                                   | RVM
Hard binary outputs or point estimates                | Probabilistic outputs
Requires a Mercer kernel                              | Can use an arbitrary kernel
Must determine suitable cost and insensitivity values | “Nuisance” values determined automatically
Sparse (USPS: ~2500)                                  | Sparser (USPS: ~316!)
 Relevance Vector Machine

Can’t Use qp()
Must solve iteratively (Sequential
Minimization Optimization)
As we iterate, many hyperparameter (αi) values become arbitrarily large; this allows
the corresponding basis functions to be pruned
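A minimal sketch of the iteration described above, for the regression case, where the posterior has a closed form. The function names, the fixed noise precision `beta`, and the pruning threshold `prune_at` are assumptions made for illustration, not the author's implementation.

```python
import numpy as np

def posterior(P, alpha, beta, t):
    """Posterior covariance and mean of the active weights for fixed alpha."""
    Sigma = np.linalg.inv(np.diag(alpha) + beta * P.T @ P)
    mu = beta * Sigma @ P.T @ t
    return Sigma, mu

def rvm_regression(Phi, t, beta=100.0, n_iter=200, prune_at=1e6):
    """Re-estimate all alpha_i on each pass, dropping columns whose alpha_i blows up.

    Phi  : (N, M) design matrix, one column per candidate basis function
    t    : (N,) targets
    beta : noise precision, held fixed here for simplicity (an assumption)
    Returns the indices of the surviving columns and their posterior mean weights.
    """
    active = np.arange(Phi.shape[1])   # basis functions still in the model
    alpha = np.ones(Phi.shape[1])      # one precision hyperparameter per weight
    for _ in range(n_iter):
        if active.size == 0:
            break
        Sigma, mu = posterior(Phi[:, active], alpha, beta, t)
        # gamma_i in [0, 1]: how well the data determine weight i
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 0.0, 1.0)
        with np.errstate(divide="ignore", invalid="ignore"):
            alpha = gamma / mu ** 2    # fixed-point re-estimate of each alpha_i
        # A very large alpha_i pins weight i to zero, so its column is pruned
        keep = alpha < prune_at
        active, alpha = active[keep], alpha[keep]
    if active.size == 0:
        return active, np.zeros(0)
    _, mu = posterior(Phi[:, active], alpha, beta, t)
    return active, mu
```

Called with a kernel design matrix (one column per training point), the surviving indices play the role of the relevance vectors. Classification replaces the closed-form posterior with an inner iterative fit, which this sketch omits.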
 Relevance Vector Machine

Faster Algorithm (Still not SVM fast)
Minimizes Number of Active Kernel
Functions to Reduce Computation Time
Analytic Approach to Pruning/Adding
Basis Functions
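The analytic add/prune test can be sketched as follows, using the sparsity factor s_i and quality factor q_i from the Tipping and Faul paper listed at the end of the deck. This is a naive version that rebuilds the covariance for each candidate, assuming a regression likelihood with fixed noise precision `beta`; the function names are hypothetical, and the paper's efficient incremental updates are not reproduced.

```python
import numpy as np

def sparsity_quality(Phi, t, alpha, beta, i):
    """s_i and q_i for basis i, with basis i itself left out of the model."""
    n = Phi.shape[0]
    C = np.eye(n) / beta                      # noise term of the marginal covariance
    for j in range(Phi.shape[1]):
        if j != i and np.isfinite(alpha[j]):  # only bases currently in the model
            C += np.outer(Phi[:, j], Phi[:, j]) / alpha[j]
    C_inv_phi = np.linalg.solve(C, Phi[:, i])
    s = Phi[:, i] @ C_inv_phi                 # overlap with bases already present
    q = t @ C_inv_phi                         # alignment with the unexplained targets
    return s, q

def decide(alpha_i, s, q):
    """Return the action for basis i and its new alpha (inf means 'not in model')."""
    theta = q * q - s
    if theta > 0:
        new_alpha = s * s / theta             # finite optimum: basis belongs in the model
        return ("add" if np.isinf(alpha_i) else "re-estimate"), new_alpha
    return ("delete" if np.isfinite(alpha_i) else "skip"), np.inf
```

Starting with every alpha_i set to infinity (an empty model) and applying `decide` to one candidate at a time keeps the number of active kernel functions small throughout, which is where the speed-up comes from.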
   Protein Fold Recognition

Protein Structure Families
Many Fold Families
Not Necessarily Directly Related by
Protein Sequence
   Protein Fold Recognition

Prime Situation for Machine Learning
NN, SVM, etc.
Large Number of Classes
     Protein Fold Recognition

27 Fold Families
Train Many 2-Class Classifiers
   One vs. Others – False Positives
   Unique One vs. Others – Like One vs. Others,
    with Another Round of Training
   All vs. All – Requires a Lot of Classifiers!
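To make the classifier counts concrete, the sketch below enumerates the two-class problems each scheme needs for 27 fold families. The function names are hypothetical, and majority voting is only one assumed way of combining All vs. All outputs; the slides do not say which rule was used.

```python
from collections import Counter
from itertools import combinations

def one_vs_others_tasks(classes):
    """One two-class problem per fold: that fold against all remaining folds."""
    return [(c, [o for o in classes if o != c]) for c in classes]

def all_vs_all_tasks(classes):
    """One two-class problem per unordered pair of folds."""
    return list(combinations(classes, 2))

def vote(pairwise_winners):
    """Majority vote over the winners reported by the pairwise classifiers."""
    return Counter(pairwise_winners).most_common(1)[0][0]

folds = list(range(1, 28))               # 27 fold families, labelled 1..27
n_ovo = len(one_vs_others_tasks(folds))  # 27 classifiers
n_ava = len(all_vs_all_tasks(folds))     # 27 * 26 / 2 = 351 classifiers
```

One vs. Others needs only 27 classifiers but several of them can claim the same protein (the false positives noted above), while All vs. All avoids that at the cost of 351 classifiers.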
       RVMs & Protein Folds

Why RVMs?
   Probabilistic Outputs
   Sparsity (useful only in assessment)
   True Multiclass Prediction
   No Need to Find “Nuisance” Parameters
      Issues/Future Work
Optimize RVM Classification
Implement True Multiclass
Reduced Greediness and Sequential
Convergence Optimization
Novel Kernels?
References
M. Tipping, “The Relevance Vector Machine”,
M. Tipping, “Sparse Bayesian Learning and the Relevance Vector Machine”, JMLR, 2001 1:211-
M. Tipping and A. Faul, “Fast Marginal Likelihood Maximisation for Sparse Bayesian
C. Ding and I. Dubchak, “Multi-class Protein Fold Recognition Using Support Vector Machines”,
