QA for the Web

Shared by: Juan Agui
posted:
4/28/2009
Final Exam (not cumulative)
Next Tuesday, Dec. 12, 7-8:15 PM
1105 SC (this room)

Miscellaneous Topics, CS446-Fall ’06

Topics Since Midterm
- Statistical Learning
    - Parameterized Models
    - Generative Models
    - Discriminative Models
- Bayes
    - Rule
    - Networks
    - Naïve
- Estimation
    - Likelihood function
    - Maximum Likelihood
    - Maximum A Posteriori
    - Frequentist / Bayesian Statistics
    - Conjugate Priors
- Weak Learning
- Margin Distribution
- ANNs
    - Backpropagation
- Nonparametric classifiers
    - Nearest Neighbor
    - k Nearest Neighbor
    - Kernel Smoothing
- Bias / Variance
- K-means Clustering
- Expectation Maximization
- Jordan's talk
- Logistic Regression
- Information Theory
    - Conditional Information
    - Mutual Information
    - KL Divergence
- Dimensionality Reduction
    - LDA, PCA, MDS, FA
    - Local Linear Embedding
    - Neural Network Derived Features
- Model Selection
    - Fit / Regularization
    - AIC, BIC
    - Kolmogorov Complexity, MDL
- Ensembles
    - Bayes Optimal Classifier
    - Bagging
    - Boosting
- K-fold Cross Validation
- Leave one out cross validation
- Learning Curve
- ROC Curve

Dimensionality Reduction
- Many approaches
- Supervised
    - Feature selection
    - Linear Discriminant Analysis (LDA/Fisher; we’ve seen this already)
    - …
- Unsupervised (what does this mean?)
    - Principal Component Analysis (PCA)
    - Multidimensional Scaling
    - Factor Analysis
    - …
    - Neural net application
- Also some nonlinear dimensionality reduction

Linear Discriminant Analysis (LDA)
- Introduce a “new” feature
- Linear combination of old features
- The new feature should maximize distance between classes and simultaneously minimize variance within classes:
    Maximize $|m_P - m_N|^2 / (s_P^2 + s_N^2)$
    where $m_P$, $m_N$ are the projected class means and $s_P^2$, $s_N^2$ the projected within-class variances of the positive (P) and negative (N) classes
- Project points into subspace and repeat

Unsupervised: do not use class label
- Principal Component Analysis (PCA)
    - Like LDA but maximize variance in X
- Factor Analysis
    - Observed raw features are imagined to be derived
    - Introduce K latent features that “cause” the observed features
    - Estimate them
- Multidimensional Scaling
    - Suppose we know distances between examples
    - Find a map in K dimensions that places the examples and reproduces the desired distances

Principal Component Analysis (PCA)
- Find direction of maximum variance
- Project
- Repeat
[Figure: scatter plot on X and Y axes with directions 1 and 2 marked]
This is an eigenvalue problem; we are looking for eigenvectors (of the data covariance matrix).

Principal Component Analysis
- The eigenvector of the largest eigenvalue is the direction of greatest variance
- The second largest gives the greatest remaining variance, etc.
- The eigenvectors are orthogonal and form the new (linear) features
- Use the eigenvectors of the largest k eigenvalues, or go down to some relative size of eigenvalues
- Previous LDA example:
[Figure: first LDA component vs. first PCA component]
    Is this good?

Multidimensional Scaling (MDS)
- Given distances between examples
- Position examples in a lower dimensional metric space faithful to these distances
- Example: flying times between pairs of cities (Atlanta, Chicago, Denver, Houston, Los Angeles, Miami, New York, San Francisco, Seattle, Washington D.C.)
- Works particularly well here. Why?
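The PCA recipe from the slides (find the direction of maximum variance, project, repeat) can be sketched in plain Python as below. This is only an illustration under stated assumptions: the slides do not say which eigen-solver to use, so power iteration with deflation is swapped in here as one simple way to obtain the leading eigenvectors of the covariance matrix. The function names (`covariance_matrix`, `top_eigenvector`, `pca`, `project`) and the sample data are hypothetical.

```python
import math
import random


def covariance_matrix(rows):
    """Center the data and return (sample covariance matrix, centered rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centered


def top_eigenvector(matrix, iterations=500):
    """Power iteration: approximate the eigenvector of the largest eigenvalue."""
    d = len(matrix)
    rng = random.Random(0)  # fixed seed so the sketch is deterministic
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # nothing left to explain (zero matrix)
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the matching eigenvalue (v has unit length).
    eigenvalue = sum(v[a] * sum(matrix[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return eigenvalue, v


def pca(rows, k):
    """Return the k leading (eigenvalue, direction) pairs and the centered data."""
    cov, centered = covariance_matrix(rows)
    d = len(cov)
    components = []
    for _ in range(k):
        eigenvalue, v = top_eigenvector(cov)
        components.append((eigenvalue, v))
        # Deflation: subtract the variance already explained, then repeat.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return components, centered


def project(centered, direction):
    """Coordinate of each centered example along one direction (a new feature)."""
    return [sum(c[j] * direction[j] for j in range(len(direction)))
            for c in centered]


if __name__ == "__main__":
    data = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 4.8)]
    components, centered = pca(data, k=2)
    for eigenvalue, direction in components:
        print(eigenvalue, direction)
```

On data that lies close to a line, the first eigenvalue should dominate, which is the situation in which keeping only the largest k eigenvalues loses little variance.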
Factor Analysis (FA)
- Assume the observed features are really manifestations of some number k of more primitive factors
- These factors are unobservable
- Assume they are uncorrelated linear combinations of the original features
- In matrix form: $X = \mu + LF + \epsilon$, where
    - $X$ is an example
    - $\mu$ is the mean of each feature
    - $L$ is a matrix of factor loads
    - $F$ are the factors
    - $\epsilon$ is a noise term
[Figure: diagram relating Features, Factors, and Class]
- Similar to PCA except for $\epsilon$, which can aid in post hoc interpretation of the factors

Non-linear Dimensionality Reduction
- Linear is limited: many classifiers easily handle linear transformations anyway
- Potential big gains in nonlinear transformations
- But a rich space with huge potential for overfitting
- Kernel PCA: PCA with kernel functions
- Local Linear Embedding (LLE): currently very popular
- Note that distance is measured along the lower dimensional manifold; assumptions: smoothness, density

Using Neural Networks
- Use hidden layers to learn lower dimensional features
- Couple the example features to both the input and output
- The network learns to reproduce the input features
- Hidden layer is floating
- Limit the number of hidden units
- Hidden nodes learn the best nonlinear transformations that reproduce the input features
[Figure: network with input Features, a Hidden Layer, and output Features]

Model Selection
- Comparing error rate alone on models of different complexity is not fair
- Consider the likelihood function
    - It will tend to prefer a more complex model. Why?
- Overfitting
- Need regularization
- Penalty to compensate for complexity
- Richer model families are more likely to find a good fit by accident

Information Criteria
- $L$ is the likelihood; $k$ is the number of parameters; $N$ is the number of examples
- Prefer the higher scoring models
- Akaike Information Criterion (AIC): $\mathrm{AIC} = \ln(L) - k$ (fit term $\ln(L)$, complexity penalty $k$)
- Bayesian Information Criterion (BIC): $\mathrm{BIC} = \ln(L) - k \ln(N)/2$
- Minimum Description Length (MDL) is the same as BIC
- Kolmogorov complexity
- Learning = Data Compression
- Compression bounds; bound test accuracy from training alone
- Luckiness Framework; PAC-Bayes

Cross Validation
- Good for
    - Setting parameters
    - Choosing models
    - Evaluating a learner
- Data resampling technique
- Different partition sets of the training data are somewhat independent
- Overlap introduces some bias; this can be estimated if necessary
- In statistics: bootstrap, jackknife

Computing a learning curve
- Classifier performance as a function of the amount of training data
- Desire confidences
- Perhaps error bars: usually 95% confidence intervals from the standard error of the mean
- Need multiple runs but have limited data
- Each point is generated by cross validation

Cross Validation
- k-fold cross validation
- Partition the data D into k equal disjoint sets: d1, d2, …, dk
- For i = 1 to k
    - Train on D - di
    - Test on di
- Generates a population of results
    - Can compute average performance
    - Confidence measures
- Most popular / most standard is k = 10
- When k = |D| it is called “leave one out cross validation”

Cross Validation
- Every example gets used as a test example exactly once
- Every example gets used as a training example k-1 times
- Test sets are independent but training sets overlap significantly
- The hypotheses are generated using (k-1)/k of the training data
- With resampling, a “paired” statistical design can be used to compare two or more learners
- Paired tests are statistically stronger since outcome variations due to the test set are identical in each fold

ROC Curve
- Often a classifier can be adjusted to have more false positives or more false negatives
- This can be used to hide weaknesses of the classifier
- Receiver Operating Characteristic curve: probability of a true positive vs. probability of a false positive as sensitivity is increased
[Figure: overlapping score distributions for the - (left peak) and + (right peak) populations; the classification boundary is the vertical line]
- The relevant areas are labeled:
    - TP: true positives = Red + Purple
    - FP: false positives = Pink + Purple
    - TN: true negatives = Dark Blue + Light Blue
    - FN: false negatives = Light Blue
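The ROC idea above (true positive rate against false positive rate as the classifier is made more sensitive) can be sketched as follows. This is a minimal illustration, not the course's code: it assumes the classifier emits a numeric score per example, that labels are 1 for the + class and 0 for the - class, and that "increasing sensitivity" means lowering the score threshold. The function name `roc_points` and the sample scores are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    scores: classifier outputs, higher meaning "more likely positive".
    labels: 1 for the + class, 0 for the - class.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sort by descending score: lowering the threshold admits examples in this order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples tied at the same score cross the boundary together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1  # a + example now falls on the positive side
            else:
                fp += 1  # a - example now falls on the positive side
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


if __name__ == "__main__":
    example_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    example_labels = [1, 1, 0, 1, 0, 0]
    for fpr, tpr in roc_points(example_scores, example_labels):
        print(fpr, tpr)
```

Each returned point corresponds to one position of the vertical classification boundary in the figure; sweeping the boundary from right to left traces the curve from (0, 0) to (1, 1).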
