lecture 1 - pantherFILE

					CS 657/790

Machine Learning and
Data Mining

Course Introduction
Student Survey
• Please hand in sheet of paper with:
  • Your name and email address
  • Your classification (e.g., 2nd year computer
    science PhD student)
  • Your experience with MATLAB (none, some or
  • Your undergraduate degree (when, what,
  • Your AI experience (courses at UWM or
  • Your programming experience
Course Information

• Course Instructor: Joe Bockhorst
 • email:
 • office: 1155 EMS
 • Course webpage:
 • office hours: ???
    •   Possible times:
         •   before class on Monday (3:30-5:30)
         •   Monday morning
         •   Wednesday morning
         •   after class Monday (7:00-9:00)
Textbook & Reading
• Machine Learning (Tom Mitchell)
  • Bookstore in union, $140 new
  • hard cover: $125 new , $80 used
  • soft cover: < $30

• Read (posted on class web page)
  •   Preface
  •   Chapter 1
  •   Sections 6.1, 6.2, 6.9, 6.10
  •   Sections 8.1, 8.2
Powerpoint Vs
• Powerpoint encourages words over
  pictures (not good)
• But powerpoint can be saved,
  tweaked, easily shared, …
  • Notes posted on course website following
• Your thoughts?
Full Disclosure

•   Slides are a combination of
    1) Jude Shavlik’s notes from UW-Madison
       machine learning course (a professor I had)
    2) Textbook Slides (Google “machine
       learning textbook”)
    3) My notes
Class Email List

 • Is there one?
Course Outline

• 1st half covers supervised learning
  • Algorithms: support vector machines,
    neural networks, probabilistic models …
  • Methodology
• 2nd half covers graphical probability models
  • Powerful statistical models very useful for
    learning in complex and/or noisy settings
Course "Style"

 • Primarily algorithmic & experimental
 • Some theory, both mathematical & conceptual
   (much on statistics)
 • "Hands on" experience, interactive
 • Broad survey of many ML subfields
    •   "symbolic" (rules, decision trees)
    •   "connectionist" (neural nets)
    •   Support Vector Machines
    •   statistical ("Bayes rule")
    •   genetic algorithms (if time)
Two Major Goals

 • to understand what a learning system
   should do

 • to understand how (and how well)
   existing systems work
Background Assumed

• Programming
 • Data structures and algorithms
    •    CS 535
• Math
 • Calculus (partial derivatives)
 • Simple probability & statistics
Assignments in MATLAB
  •   Fast prototyping
  •   Integrated plotting
  •   Widely used in academia (industry too?)
  •   Will save you time in the long run
• Why not MATLAB?
  • Proprietary software
  • Harder to work from home

• Optional Assignment: familiarize yourself
  with MATLAB, use MATLAB help system
Student Computer Labs

• E256, E280, E285, E384, E270
• All have MATLAB installed under
  Windows XP

 • Bi-weekly programming plus perhaps some
   “paper & pencil” homework
    •   "hands on" experience valuable
    •   HW0 – build a dataset
    •   HW1 & HW2 supervised learning algorithms
    •   HW3 & HW4 graphical probability models
 • Midterm exam (after about 8-10 weeks)
 • Final exam
 • Find project of your choosing
    •   during last 4-5 weeks of class

   HW's                 25%
   Project              20%
   Midterm              20%
   Final                30%
   Quality Discussion    5%
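
The weights above combine into a course score as a weighted average. A minimal sketch, assuming every component is scored on a 0-100 scale; the scale, the dictionary keys, and the function name `course_score` are illustrative assumptions, not part of the course policy:

```python
# Weights taken from the grading table above (they sum to 100%).
WEIGHTS = {
    "homeworks": 0.25,
    "project": 0.20,
    "midterm": 0.20,
    "final": 0.30,
    "discussion": 0.05,
}

def course_score(scores):
    """Weighted average of component scores (each assumed to be 0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Made-up component scores for one hypothetical student.
example = {"homeworks": 90, "project": 80, "midterm": 70,
           "final": 85, "discussion": 100}
print(course_score(example))
```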
Late HW's Policy

 • HW's due @ 4pm
 • you have 5 late days to use over the
    •   (Fri 4pm → Mon 4pm is 1 late "day")
 • SAVE UP late days!
    •   extensions only for extreme cases
 • Penalty points after late days exhausted
    •   10% per day
 • Can't be more than one week late
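
The late policy can be sketched as a small function. This is only one reading of the rules above: it assumes free late days are spent automatically before any penalty applies, that "one week" means 7 counted late days, and that `days_late` has already been counted using the course convention (Fri 4pm → Mon 4pm is 1 day). The function name is hypothetical.

```python
def apply_late_policy(days_late, late_days_left):
    """Return (penalty_fraction, late_days_left_after), or None if not accepted.

    Assumptions (not stated in the policy): free late days are used before
    any penalty, and "one week" is taken to mean 7 counted late days.
    """
    if days_late > 7:
        return None  # more than one week late: not accepted
    free = min(days_late, late_days_left)   # late days spent on this HW
    penalized_days = days_late - free       # days beyond the free allowance
    return 0.10 * penalized_days, late_days_left - free

print(apply_late_policy(2, 5))   # covered entirely by free late days
print(apply_late_policy(3, 1))   # one free day, two penalized days
```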
Machine Learning Vs
Data Mining
• Machine Learning: computer
  algorithms that improve automatically
  through experience [Mitchell].
• Data Mining: Extracting knowledge
  from large amounts of data. [Han &
  Kamber] (synonym: knowledge
  discovery in databases (KDD))
       What’s the difference?
       Topics in ML and DM texts
       (Mitchell Vs Han & Kamber)
               [Venn diagram of ML and DM topics]

               Both ML and DM: supervised learning, decision trees, neural nets,
               Bayesian networks, k-nearest neighbor, genetic
               algorithms, unsupervised learning (clustering in DM

               ML only: reinforcement learning, learning theory, evaluating
               learning systems, using domain knowledge, inductive logic
               programming, …

               DM only: Data Warehouse, OLAP, query languages,
               association rules, presentation, …

               We’ll try to cover topics in red
The learning problem

• Learning = improving with experience
       Improve over task T,
       with respect to performance
       measure P,
       based on experience E

• Example: learn to play checkers
       T: Play Checkers
       P: % of games won
       E: games played against self
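
The T / P / E framing can be illustrated on a much smaller problem than checkers. The sketch below is a toy stand-in, not the checkers learner: the task T (decide whether an integer is at least 50), the 1-nearest-neighbor learner, and the chosen training points are all assumptions made for illustration. It shows performance P rising as experience E grows.

```python
def target(x):
    """Hypothetical true concept for the toy task T: is x at least 50?"""
    return x >= 50

def predict(experience, x):
    """1-nearest-neighbor: copy the label of the closest stored example."""
    nearest = min(experience, key=lambda ex: abs(ex[0] - x))
    return nearest[1]

def performance(experience):
    """P: fraction of the integers 0..99 that are classified correctly."""
    correct = sum(predict(experience, x) == target(x) for x in range(100))
    return correct / 100

# E: labeled examples; the second set simply contains more of them.
small_e = [(x, target(x)) for x in (20, 91)]
large_e = [(x, target(x)) for x in (20, 91, 45, 52)]
print(performance(small_e), performance(large_e))
```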
Famous Example:
Discovering Genes
• T: find genes in DNA sequences

• P: % of genes found
• E: experimentally verified genes

* Prediction of Complete Gene Structures in Human Genomic DNA,
Burge & Karlin, J. Molecular Biology, 1997, 268:78-94
Famous Example 2:
Autonomous Vehicle Driving
• T: drive vehicle
• P: reach destination
• E: machine observation of human
ML key to winning DARPA
Grand Challenge
Stanford team won 2005 driverless vehicle race
across Mojave Desert

                           “The robot's software
                           system relied predominately
                           on state-of-the-art AI
                           technologies, such as
                           machine learning and

                           [Winning the DARPA Grand
                           Challenge, Thrun et al., Journal
                           of Field Robotics, 2006]
Why study machine
learning (data mining)?

• Data is plentiful
    • Retail, video, images, speech, text, DNA,
      bio-medical measurements, …
•   Computational power is available
•   Budding Industry
•   ML has great applications
•   ML still relatively immature
Next Time: HW0 – Create
Your Own Dataset
 • Think about this
    •   will need to create it by week after next
 • Google to find:
    •   UCI archive (or UCI KDD archive)
    •   UCI ML archive (UCI machine learning
HW0 – Your “Personal Concept”

 • Step 1: Choose a Boolean (true/false) concept
    • Subjective Judgement
        • Books I like/dislike
        • Movies I like/dislike
        • Web pages I like/dislike
    • “Time will tell” concepts
        • Stocks to buy
        • Medical outcomes
    • Sensory interpretation
        • Face recognition (See text)
        • Handwritten digit recognition
        • Sound recognition
HW0 – Your “Personal Concept”

• Step 2: Choosing a feature space
   • We will use fixed-length feature vectors
      • Choose N features
      • Each feature has Vi possible values
        (together, the N features define a space)
      • Each example is represented by a vector of N feature values
        (i.e., is a point in the feature space)
        e.g.: <red, 50, round>
              (color, weight, shape)

   • Feature Types (in HW0 we will use a subset; see next slide)
      •   Boolean
      •   Nominal
      •   Ordered
      •   Hierarchical
• Step 3: Collect examples (“I/O” pairs)
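
The fixed-length feature vectors from Step 2 can be sketched as follows. The three features and their value sets are illustrative assumptions (weight is discretized to integers here only so the space can be counted); this is not a required HW0 schema.

```python
from math import prod

# Hypothetical feature space with N = 3 features.
FEATURES = {
    "color": ["red", "blue", "green"],
    "weight": list(range(0, 501)),   # integers 0..500 (assumed discretization)
    "shape": ["round", "square", "triangle"],
}

def space_size(features):
    """Number of distinct points in the space: V1 * V2 * ... * VN."""
    return prod(len(values) for values in features.values())

def is_valid_example(example, features):
    """True if the vector has N values and each is legal for its feature."""
    return len(example) == len(features) and all(
        value in allowed for value, allowed in zip(example, features.values())
    )

example = ("red", 50, "round")   # <red, 50, round> from the slide
print(space_size(FEATURES))
print(is_valid_example(example, FEATURES))
```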
Standard Feature Types
for representing training examples
 – source of “domain knowledge”

• Nominal
   • No relationship among possible values
     e.g., color ∈ {red, blue, green} (vs. color = 1000 Hertz)
• Linear (or Ordered)
   • Possible values of the feature are totally ordered
     e.g., size ∈ {small, medium, large} ← discrete
           weight ∈ [0…500] ← continuous
• Hierarchical
   • Possible values are partially ordered in an ISA hierarchy
     e.g., for shape: polygon → {square, triangle},
                      continuous → {circle, ellipse}
        Example Hierarchy
        (KDD* Journal, Vol 5, No. 1-2, 2001, page 17)

        [Figure: product hierarchy, top to bottom:
         Tea (99 Product);
         Dried Cat Food, Canned Cat Food (2302 Product Subclasses);
         Friskies Liver, 250g (~30k Products)]

• Structure of one feature!
• “the need to be able to incorporate hierarchical (knowledge
about data types) is shown in every paper.”
- From the editors' introduction to the special issue (on applications) of the KDD journal, Vol 5, 2001

*   Officially, “Data Mining and Knowledge Discovery”, Kluwer Publishers
Our Feature Types
(for homeworks)

• Discrete
  • tokens (char strings, w/o quote marks and
• Continuous
  • numbers (int’s or float’s)
     • If only a few possible values (e.g., 0 & 1) use discrete
  • i.e., merge nominal and discrete-ordered
    (or convert discrete-ordered into 1,2,…)
  • We will ignore hierarchy info and
    only use the leaf values (it is rare any way)
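
A rough sketch of how the homework feature types might be applied to raw values. The ordering list, the function names, and the cutoff for "only a few possible values" (taken here as at most 2 distinct numbers) are assumptions for illustration, not rules from the assignment.

```python
# Hypothetical ordering for one discrete-ordered feature.
SIZE_ORDER = ["small", "medium", "large"]

def ordered_to_number(value, order):
    """Map a discrete-ordered token to 1, 2, ... by its position in `order`."""
    return order.index(value) + 1

def feature_kind(values):
    """Label a column of values as 'discrete' or 'continuous'.

    Assumption: numeric columns with at most 2 distinct values
    (e.g., 0 and 1) are treated as discrete.
    """
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric and len(set(values)) > 2:
        return "continuous"
    return "discrete"

print(ordered_to_number("medium", SIZE_ORDER))
print(feature_kind([0, 1, 1, 0]), feature_kind([12.5, 50, 300]))
```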
Today’s Topics

• Creating a dataset of
      fixed length feature vectors

• HW0 out on-line
  • Due next Monday
Some Famous Examples
  • Car Steering (Pomerleau)
      Digitized camera image → Learned Function → Steering Angle
  • Medical Diagnosis (Quinlan)
      Medical record (age = 13, sex = M, wgt = 18) → Learned Function → ill vs
  •   DNA Categorization
  •   TV-pilot rating
  •   Chemical-plant control
  •   Backgammon playing
  •   WWW page scoring
  •   Credit application scoring
HW0: Creating your dataset

1. Choose a dataset
  •   based on interest/familiarity
  •   meets basic requirements
      • >1000 examples
      • category (function) learned should be
        binary valued
      • ~500 examples labeled class A,
        other 500 labeled class B
      → Internet Movie Database (IMD)
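
The basic requirements above can be checked mechanically once the labels are collected. A minimal sketch: the function name and the balance tolerance (`max_imbalance`) are assumptions, and 1000 is treated as the minimum size because the slide asks for roughly 500 examples per class.

```python
from collections import Counter

def check_dataset(labels, min_examples=1000, max_imbalance=0.1):
    """Return a list of requirement violations (an empty list means OK)."""
    problems = []
    counts = Counter(labels)
    if len(labels) < min_examples:
        problems.append("too few examples")
    if len(counts) != 2:
        problems.append("target must be binary valued")
    else:
        (_, larger), (_, smaller) = counts.most_common()
        # Assumed tolerance: class sizes may differ by at most 10% of the data.
        if larger - smaller > max_imbalance * len(labels):
            problems.append("classes should be roughly balanced")
    return problems

# Made-up label lists for illustration.
print(check_dataset(["A"] * 520 + ["B"] * 480))
print(check_dataset(["A"] * 900 + ["B"] * 100))
```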
HW0: Creating your dataset

2. IMD has a lot of data that are
   not discrete or continuous or
   binary-valued for target function

   [Diagram: IMD entities and the relations between them]
   • Studio: Country, Name, List of movies
   • Director/Producer: Name, Year of birth, List of movies
   • Actor: Year of birth, Gender, Oscar nominations, List of movies
   • Movie: Title, Genre, Year, Opening Wkend BO receipts,
     List of actors/actresses, Release season
   • Relations: Studio Made Movie; Director/Producer Directed Movie;
     Actor Acted in Movie
HW0: Creating your dataset

3. Choose a boolean or binary-
   valued target function (category)
  •   Opening weekend box office receipts >
      $2 million
  •   Movie is drama? (action, sci-fi,…)
  •   Movies I like/dislike (e.g. Tivo)
HW0: Creating your dataset

4. How to transfer available attributes:
   Other example attributes (select
   predictive features)
  •   Movie
      •   Average age of actors
      •   Number of producers
      •   Percent female actors
  •   Studio
      •   Number of movies made
      •   Average movie gross
      •   Percent movies released in US
HW0: Creating your dataset

 • Director/Producer
   •   Years of experience
   •   Most prevalent genre
   •   Number of award winning movies
   •   Average movie gross
 • Actor
   • Gender
   • Has previous Oscar award or nominations
   • Most prevalent genre
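
Attributes such as "average age of actors" turn a variable-length list into a fixed-length feature vector. A minimal sketch of that aggregation, together with the boolean box-office target from the earlier slide; the record layout, field names, and values are made up for illustration and do not reflect the real IMD format.

```python
# Hypothetical movie record (all values invented for this example).
movie = {
    "title": "Example Movie",
    "opening_weekend_receipts": 3_500_000,   # dollars
    "actors": [
        {"age": 30, "gender": "F"},
        {"age": 40, "gender": "M"},
        {"age": 50, "gender": "F"},
        {"age": 60, "gender": "M"},
    ],
}

def movie_features(m):
    """Collapse the variable-length actor list into fixed-length features."""
    actors = m["actors"]
    avg_age = sum(a["age"] for a in actors) / len(actors)
    pct_female = 100 * sum(a["gender"] == "F" for a in actors) / len(actors)
    return [avg_age, pct_female]

def target(m):
    """Boolean target: opening weekend box office receipts > $2 million."""
    return m["opening_weekend_receipts"] > 2_000_000

print(movie_features(movie), target(movie))
```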
HW0: Creating your dataset
   David Jensen’s group at UMass used Naïve Bayes (NB) to
   predict the following based on attributes they selected and a
   novel way of sampling from the data:
 • Opening weekend box office receipts > $2 million
    • 25 attributes
    • Accuracy = 83.3%
    • Default accuracy = 56%
 • Movie is drama?
    • 12 attributes
    • Accuracy = 71.9%
    • Default accuracy = 51%
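
The reported accuracies are best read against the default accuracy. A minimal sketch, assuming "default accuracy" means the accuracy obtained by always predicting the most common class; the label list below is made up and is not the UMass data.

```python
from collections import Counter

def default_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Made-up labels: 56 of 100 examples fall in the majority class.
labels = [True] * 56 + [False] * 44
print(default_accuracy(labels))
```

A learned classifier is only interesting to the extent that it beats this baseline.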
What Do You Think
Machine Learning Means?
What is Learning?

 Learning denotes changes in the system that
 … enable the system to do the same task …
 more effectively the next time.
                                 - Herbert Simon

 Learning is making useful changes in our minds.
                                  - Marvin Minsky
Major Paradigms of
Machine Learning
 • Inducing Functions from I/O Pairs
   •   Decision trees (e.g., Quinlan’s C4.5 [1993])
   •   Connectionism / neural networks (e.g., backprop)
   •   Nearest-neighbor methods
   •   Genetic algorithms
   •   SVM’s
 • Learning without a Teacher
   • Conceptual clustering
   • Self-organizing systems
   • Discovery systems
   (Not in Mitchell’s textbook; will spend 0-2 lectures on this,
   but also in CS776)
Major Paradigms of
Machine Learning
 • Improving a Multi-Step Problem
   • Explanation-based learning
   • Reinforcement learning (will be covered briefly)

 • Using Preexisting Domain
   Knowledge Inductively
   • Analogical learning
   • Case-based reasoning
   • Inductive/explanatory hybrids
