# CSE 591: Machine learning and applications

Basics of Probability and Linear Algebra, LSI

Jieping Ye
Department of Computer Science & Engineering
Arizona State University
http://www.public.asu.edu/~jye02
## Outline of lecture

- Survey summary
- Project and homework
- Basics of probability
- Basics of linear algebra
- Latent Semantic Indexing
## Summary of Survey

Most interesting topics:

- Support Vector Machines (SVM)
- Manifold learning
- Kernel learning
- Semi-supervised learning
## Project and Homework

- Proposal due on Feb 8.
- Talk to me before you choose a topic; you may change your topic later.
- Group project: 1-3 students form a group. Specify the responsibility of each group member.
- Homework (3): calculation, proof, and implementation.
- Academic Integrity and Student Conduct policies apply.
## Project Topics

- Machine learning techniques for specific applications
- An extensive comparative study of a specific topic
- Literature survey
- Development of novel algorithms

http://www.public.asu.edu/~jye02/CLASSES/Spring-2007/Papers/
## Basics of probability

- An experiment is a well-defined process with observable outcomes.
- The set of all outcomes of an experiment is called the sample space, S.
- An event E is any subset of outcomes from S.
- The probability of an event E (assuming equally likely outcomes) is P(E) = (number of outcomes in E) / (number of outcomes in S).
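The counting definition above can be checked in a few lines of Python. This is a minimal sketch using a hypothetical fair-die example, not one from the lecture:

```python
# Probability as counting: P(E) = |E| / |S| for equally likely outcomes.
# Hypothetical example: rolling one fair six-sided die.
S = {1, 2, 3, 4, 5, 6}            # sample space
E = {x for x in S if x % 2 == 0}  # event: "roll an even number"

p_E = len(E) / len(S)
print(p_E)  # 0.5
```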
## Bayes' Theorem

- Conditional probability: P(A|B) = P(A, B) / P(B).
- Test of independence: A and B are said to be independent if and only if P(A, B) = P(A) P(B).
- Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B).
## Illustration

| Trial | 1 | 2 | 3 | 4 | 5 | 6 |
|-------|---|---|---|---|---|---|
| A     | 0 | 0 | 1 | 1 | 1 | 0 |
| B     | 0 | 1 | 1 | 0 | 1 | 1 |

- P(A=1) = 3/6 = 1/2, P(A=0) = 3/6 = 1/2.
- P(B=1) = 4/6 = 2/3, P(B=0) = 2/6 = 1/3.
- P(A=1, B=1) = 2/6 = 1/3.
- P(A=1 | B=1) = P(A=1, B=1) / P(B=1) = 1/2.
- P(B=1 | A=1) = P(B=1, A=1) / P(A=1) = 2/3.
- P(A=1 | B=1) P(B=1) / P(A=1) = (1/2)(2/3)/(1/2) = 2/3 = P(B=1 | A=1), confirming Bayes' theorem.
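The numbers in this illustration can be reproduced exactly with Python's `fractions` module:

```python
from fractions import Fraction

# The six paired observations of A and B from the illustration.
A = [0, 0, 1, 1, 1, 0]
B = [0, 1, 1, 0, 1, 1]
n = len(A)

p_A1 = Fraction(sum(1 for a in A if a == 1), n)                      # P(A=1) = 1/2
p_B1 = Fraction(sum(1 for b in B if b == 1), n)                      # P(B=1) = 2/3
p_A1_B1 = Fraction(sum(1 for a, b in zip(A, B) if a == b == 1), n)   # P(A=1,B=1) = 1/3

p_A1_given_B1 = p_A1_B1 / p_B1   # 1/2
p_B1_given_A1 = p_A1_B1 / p_A1   # 2/3

# Bayes' theorem: P(B=1|A=1) = P(A=1|B=1) P(B=1) / P(A=1)
assert p_B1_given_A1 == p_A1_given_B1 * p_B1 / p_A1
```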
## Mean, variance, and standard deviation

- The mean of a random variable X is the average value X takes.
- The variance of X is a measure of how dispersed the values that X takes are.
- The standard deviation is simply the square root of the variance.
## Example

- X ∈ {1, 2} with P(X=1) = 0.8 and P(X=2) = 0.2.
- Mean: 0.8 × 1 + 0.2 × 2 = 1.2.
- Variance: 0.8 × (1 − 1.2)² + 0.2 × (2 − 1.2)² = 0.8 × 0.04 + 0.2 × 0.64 = 0.16.
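A sketch of the same computation in Python; the results match the arithmetic above up to floating-point rounding:

```python
# Mean, variance, and standard deviation of the two-point distribution
# X in {1, 2} with P(X=1) = 0.8 and P(X=2) = 0.2 (the example above).
values = [1, 2]
probs = [0.8, 0.2]

mean = sum(p * x for p, x in zip(probs, values))               # ~1.2
var = sum(p * (x - mean) ** 2 for p, x in zip(probs, values))  # ~0.16
std = var ** 0.5                                               # ~0.4
```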
## Normal distribution (for continuous data)

- Univariate normal distribution
- Multivariate normal distribution

Electronic Statistics Textbook (http://www.statsoft.com/textbook/stathome.html)
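The density formulas appeared as figures on the original slide; as a sketch, the standard univariate normal density can be evaluated directly:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Univariate normal density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# At the mean, the standard normal density is 1/sqrt(2*pi) ~ 0.3989.
print(round(normal_pdf(0.0), 4))  # 0.3989
```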
## Eigenvector, Eigenvalue, and orthogonal matrix

- A nonzero vector v is an eigenvector of a square matrix A, with eigenvalue λ, if A v = λ v.
- A square matrix Q is orthogonal if QᵀQ = I, the identity matrix (ones on the diagonal, zeros elsewhere).
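A short NumPy sketch of both ideas, using a hypothetical 2 × 2 symmetric matrix (an assumption for illustration, not an example from the slides):

```python
import numpy as np

# Eigen-decomposition of a symmetric matrix: A v = lambda v.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(A)  # eigh: for symmetric/Hermitian input
print(eigvals)                        # [1. 3.]

# For symmetric A, the eigenvector matrix Q is orthogonal: Q^T Q = I.
Q = eigvecs
print(np.allclose(Q.T @ Q, np.eye(2)))  # True

# Check A v = lambda v for each eigenpair (columns of Q).
for lam, v in zip(eigvals, Q.T):
    assert np.allclose(A @ v, lam * v)
```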
## Matrix norms and trace

- The Frobenius norm of A is the square root of the sum of its squared entries.
- The trace of a square matrix is the sum of its diagonal entries; for a symmetric matrix it equals the sum of the eigenvalues.
## Symmetric and positive definite matrix and QR

- A symmetric matrix A is positive definite if xᵀA x > 0 for every nonzero vector x; equivalently, all its eigenvalues are positive.
- QR decomposition: A = Q R, with Q orthogonal and R upper triangular.
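A NumPy sketch tying these notions together, again with a hypothetical matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Frobenius norm and trace; for symmetric A, trace = sum of eigenvalues.
fro = np.linalg.norm(A, 'fro')  # sqrt(4 + 1 + 1 + 4) = sqrt(10)
tr = np.trace(A)                # 4.0
assert np.isclose(tr, np.linalg.eigvalsh(A).sum())

# A is positive definite: all eigenvalues > 0 (here 1 and 3).
assert np.all(np.linalg.eigvalsh(A) > 0)

# QR decomposition: A = Q R with Q orthogonal and R upper triangular.
Q, R = np.linalg.qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(2))
```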
## Singular Value Decomposition

Any m × n matrix A can be factored as A = U Σ Vᵀ, where U and V are orthogonal and Σ is diagonal with nonnegative entries (the singular values).
## Properties of SVD

- Aₖ is the optimal approximation in terms of the approximation error measured by the Frobenius norm, among all matrices of rank k.
- This forms the basis of LSI (Latent Semantic Indexing) in information retrieval.
## Low rank approximation by SVD

Aₖ = Uₖ Σₖ Vₖᵀ: keep only the k largest singular values and the corresponding columns of U and V.
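A NumPy sketch of the truncation and of the optimality property stated above, on hypothetical random data (the Frobenius error of the rank-k truncation equals the root-sum-square of the discarded singular values):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Full (thin) SVD: A = U @ diag(s) @ Vt, with s sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U @ np.diag(s) @ Vt, A)

# Rank-k truncation A_k = U_k diag(s_1..s_k) V_k^T.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius error of A_k equals sqrt(s_{k+1}^2 + ... ), the discarded part.
err = np.linalg.norm(A - A_k, 'fro')
assert np.isclose(err, np.sqrt((s[k:] ** 2).sum()))
```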
## Applications of SVD

- Pseudo-inverse
- Range, null space, and rank
- Matrix approximation
- Information retrieval: LSI (Latent Semantic Indexing)
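As an illustration of the first application, the pseudo-inverse can be built directly from the SVD (hypothetical random matrix; this is also how `numpy.linalg.pinv` is defined):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

# Pseudo-inverse from the SVD: A^+ = V diag(1/s_i) U^T (nonzero s_i only;
# a random Gaussian matrix has full rank with probability one).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# Matches NumPy's built-in pinv and satisfies A A^+ A = A.
assert np.allclose(A_pinv, np.linalg.pinv(A))
assert np.allclose(A @ A_pinv @ A, A)
```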
## LSI (Latent Semantic Indexing)

Why plain term matching does not work well in information retrieval:

- One term may have multiple meanings (polysemy).
- Different terms may have the same meaning (synonymy).
- We want to capture the concepts instead of the words.
## LSI (Latent Semantic Indexing)

- The LSI approach tries to overcome the deficiencies of term-matching retrieval by treating the unreliability of observed term-document association data as a statistical problem.
- The goal is to find effective models to represent the relationship between terms and documents, so that a set of terms, which is by itself incomplete and unreliable, is replaced by a set of entities that are more reliable indicants.
## Term-Document matrix

- Build the term-document matrix M.
- Apply term-weighting schemes.
- Decompose M by SVD.
- Approximate M using the truncated SVD.
## Truncated SVD

Map each row (term) and column (document) of A into the k-dimensional LSI space.
## Query

- A query q is also mapped into this space: qₖ = qᵀ Uₖ Σₖ⁻¹.
- Compare the similarity in the new space.
- Intuition: dimension reduction through LSI brings together "related" axes in the vector space.
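A minimal end-to-end LSI sketch, with a small hypothetical term-document matrix (an assumption for illustration, not the example from the slides):

```python
import numpy as np

# Hypothetical 5-term x 4-document count matrix. Documents 0-2 are about
# cars; document 3 is about "theory" and "application".
A = np.array([[1.0, 0.0, 1.0, 0.0],   # car
              [0.0, 1.0, 1.0, 0.0],   # automobile
              [1.0, 1.0, 0.0, 0.0],   # motor
              [0.0, 0.0, 0.0, 1.0],   # theory
              [0.0, 0.0, 0.0, 1.0]])  # application

# Truncated SVD: keep the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Each document (column of A) is represented by a row of (S_k Vt_k)^T.
docs_k = (S_k @ Vt_k).T              # shape (4, k)

# Fold a query into the same space: q_k = q^T U_k S_k^{-1}.
q = np.array([0.0, 0.0, 0.0, 1.0, 1.0])  # query: "theory application"
q_k = q @ U_k @ np.linalg.inv(S_k)

# Rank documents by cosine similarity to the query.
def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

sims = [cosine(q_k, d) for d in docs_k]
print(int(np.argmax(sims)))  # 3: the theory/application document ranks first
```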
## Intuition

- E.g., "car" and "automobile" do not match using the traditional term-matching method.
- But they occur with many of the same words, such as motor, model, vehicle, engine, etc.
- Hence they receive similar k-dimensional representations.
## Example

(The worked example appeared as figures on the original slides: a term-document matrix, its SVD, the mapping of terms and documents into the LSI space, the query "Application and Theory", and the ranking of documents by cosine similarity.)
## How to set the value of k?

- LSI is useful only if k << n.
- If k is too large, it doesn't capture the underlying latent semantic space; if k is too small, too much is lost.
- There is no principled way of determining the best k.
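One common heuristic, offered here as an assumption rather than anything from the lecture, keeps the smallest k that retains a fixed fraction of the squared singular-value "energy":

```python
import numpy as np

def choose_k(singular_values, energy=0.90):
    """Smallest k whose leading singular values retain `energy` of sum(s_i^2).

    Assumes `singular_values` is sorted in decreasing order, as returned
    by np.linalg.svd.
    """
    s2 = np.asarray(singular_values) ** 2
    cumulative = np.cumsum(s2) / s2.sum()
    return int(np.searchsorted(cumulative, energy) + 1)

print(choose_k([10.0, 5.0, 1.0, 0.5]))  # 2: (100 + 25) / 126.25 ~ 0.99
```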
## How well does LSI work?

- Effectiveness of LSI compared to regular term matching depends on the nature of the documents.
- Typical improvement: 0 to 30% better precision.
- The advantage is greater for texts in which synonymy and ambiguity are more prevalent, and is best when recall is high.
- The costs of LSI might outweigh the improvement: SVD is computationally expensive, limiting its use for really large document collections.
## References

- Mini tutorial on the Singular Value Decomposition: http://www.cs.brown.edu/research/ai/dynamics/tutorial/Postscript/SingularValueDecomposition.ps
- Basics of linear algebra: http://www.stanford.edu/class/cs229/section/section_lin_algebra.pdf
- Indexing by Latent Semantic Analysis: http://lsi.research.telcordia.com/lsi/papers/JASIS90.pdf
- SVD and LSI Tutorial: http://www.miislita.com/information-retrieval-tutorial/information-retrieval-tutorials.html
## Next class

- Topics: clustering basics, algorithms, and evaluations.
- Readings (available at the class webpage):
  - Cluster analysis for gene expression data: A survey
  - Normalized cuts and image segmentation