Educational Data Mining




Ryan S.J.d. Baker
PSLC/HCII
Carnegie Mellon University

Richard Scheines
Professor of Statistics, Machine Learning, and Human-Computer Interaction
Carnegie Mellon University

Ken Koedinger
CMU Director of PSLC
Professor of Human-Computer Interaction & Psychology
Carnegie Mellon University

In this segment…

• We will give a brief overview of classes of Educational Data Mining methods
• Discussing in detail
  • Causal Data Mining
    • An important Educational Data Mining method
  • Bayesian Knowledge Tracing
    • One of the key building blocks of many Educational Data Mining analyses

Baker (under review)

EDM Methods

• Prediction
• Clustering
• Relationship Mining
• Discovery with Models
• Distillation of Data for Human Judgment

Coverage at EDM2008
(of 31 papers; not mutually exclusive)

• Prediction – 45%
• Clustering – 6%
• Relationship Mining – 19%
• Discovery with Models – 13%
• Distillation of Data for Human Judgment – 16%
• None of the Above – 6%

We will talk about three approaches now

• 2 types of Prediction
• 1 type of Relationship Mining

Tomorrow, 9:30am: Discovery with Models
Yesterday: Some examples of Distillation of Data for Human Judgment

Prediction

• Pretty much what it says
• A student is using a tutor right now. Is he gaming the system or not? (“attempting to succeed in an interactive learning environment by exploiting properties of the system rather than by learning the material”)
• A student has used the tutor for the last half hour. How likely is it that she knows the knowledge component in the next step?
• A student has completed three years of high school. What will be her score on the SAT-Math exam?

Two Key Types of Prediction

This slide adapted from slide by Andrew W. Moore, Google
http://www.cs.cmu.edu/~awm/tutorials

Classification

• There is something you want to predict (“the label”)
• The thing you want to predict is categorical
  • The answer is one of a set of categories, not a number
  • CORRECT/WRONG (sometimes expressed as 0,1)
  • HELP REQUEST/WORKED EXAMPLE REQUEST/ATTEMPT TO SOLVE
  • WILL DROP OUT/WON’T DROP OUT
  • WILL SELECT PROBLEM A,B,C,D,E,F, or G

Classification

• Associated with each label is a set of “features”, which you may be able to use to predict the label

KnowledgeComp   pknow  time  totalactions  right
ENTERINGGIVEN   0.704     9             1  WRONG
ENTERINGGIVEN   0.502    10             2  RIGHT
USEDIFFNUM      0.049     6             1  WRONG
ENTERINGGIVEN   0.967     7             3  RIGHT
REMOVECOEFF     0.792    16             1  WRONG
REMOVECOEFF     0.792    13             2  RIGHT
USEDIFFNUM      0.073     5             2  RIGHT
…

Classification

• The basic idea of a classifier is to determine which features, in which combination, can predict the label

KnowledgeComp   pknow  time  totalactions  right
ENTERINGGIVEN   0.704     9             1  WRONG
ENTERINGGIVEN   0.502    10             2  RIGHT
USEDIFFNUM      0.049     6             1  WRONG
ENTERINGGIVEN   0.967     7             3  RIGHT
REMOVECOEFF     0.792    16             1  WRONG
REMOVECOEFF     0.792    13             2  RIGHT
USEDIFFNUM      0.073     5             2  RIGHT
…
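
As a rough illustration of "which features can predict the label", the sketch below fits a decision stump (a one-split decision tree) to the seven rows shown in the table. It is not the method used in the talk: the function name `fit_stump` is made up here, the categorical KnowledgeComp column is left out for simplicity, and seven rows are far too few to say anything about real data.

```python
# Toy rows copied from the slide: (pknow, time, totalactions) -> right
FEATURES = ("pknow", "time", "totalactions")
ROWS = [
    ((0.704, 9, 1), "WRONG"),
    ((0.502, 10, 2), "RIGHT"),
    ((0.049, 6, 1), "WRONG"),
    ((0.967, 7, 3), "RIGHT"),
    ((0.792, 16, 1), "WRONG"),
    ((0.792, 13, 2), "RIGHT"),
    ((0.073, 5, 2), "RIGHT"),
]


def fit_stump(rows):
    """Return (n_correct, feature, threshold, label_above) for the best single split."""
    best = None
    for index, name in enumerate(FEATURES):
        values = sorted({x[index] for x, _ in rows})
        # Candidate thresholds are midpoints between neighbouring observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for above, below in (("RIGHT", "WRONG"), ("WRONG", "RIGHT")):
                n_correct = sum(
                    (above if x[index] > threshold else below) == label
                    for x, label in rows
                )
                if best is None or n_correct > best[0]:
                    best = (n_correct, name, threshold, above)
    return best


stump = fit_stump(ROWS)
print(stump)  # (7, 'totalactions', 1.5, 'RIGHT')
```

On these rows the stump predicts RIGHT whenever totalactions is above 1.5, which happens to match all seven labels. Packages such as WEKA build deeper trees using the same kind of split search.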

Many algorithms you can use

• Decision Trees (e.g. C4.5, J48, etc.)
• Logistic Regression
• Etc, etc
• In your favorite Machine Learning package
  • WEKA
  • RapidMiner
  • KEEL

Regression

• There is something you want to predict (“the label”)
• The thing you want to predict is numerical
  • Number of hints student requests (0, 1, 2, 3...)
  • How long student takes to answer (4.7 s., 8.9 s., 88.2 s., 0.3 s.)
  • What will the student’s test score be (95%, 84%, 33%, 100%)

Regression

• Associated with each label is a set of “features”, which you may be able to use to predict the label

KnowledgeComp   pknow  time  totalactions  numhints
ENTERINGGIVEN   0.704     9             1         0
ENTERINGGIVEN   0.502    10             2         0
USEDIFFNUM      0.049     6             1         3
ENTERINGGIVEN   0.967     7             3         0
REMOVECOEFF     0.792    16             1         1
REMOVECOEFF     0.792    13             2         0
USEDIFFNUM      0.073     5             2         0
…

Regression

• The basic idea of regression is to determine which features, in which combination, can predict the label’s value

KnowledgeComp   pknow  time  totalactions  numhints
ENTERINGGIVEN   0.704     9             1         0
ENTERINGGIVEN   0.502    10             2         0
USEDIFFNUM      0.049     6             1         3
ENTERINGGIVEN   0.967     7             3         0
REMOVECOEFF     0.792    16             1         1
REMOVECOEFF     0.792    13             2         0
USEDIFFNUM      0.073     5             2         0
…

Linear Regression

• The most classic form of regression is linear regression
• Numhints = 0.12*Pknow + 0.932*Time – 0.11*Totalactions
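
To make the equation concrete, the sketch below applies it to the first row of the regression table. The function name `predict_numhints` is made up here, and the coefficients are taken from the slide as written. They look illustrative rather than fitted to the seven rows shown, since the observed value for that row is 0 hints.

```python
def predict_numhints(pknow, time, totalactions):
    """Linear model from the slide: a weighted sum of the three numeric features."""
    return 0.12 * pknow + 0.932 * time - 0.11 * totalactions


# First table row: pknow=0.704, time=9, totalactions=1 (observed numhints: 0)
first_row = predict_numhints(pknow=0.704, time=9, totalactions=1)
print(round(first_row, 3))  # 8.362
```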

Many more complex algorithms…

• Neural Networks
• Support Vector Machines
• Surprisingly, Linear Regression performs quite well in many cases despite being overly simple
  • Particularly when you have a lot of data
  • Which increasingly is not a problem in EDM…

Relationship Mining

• Richard Scheines will now talk about one type of relationship mining, Causal Data Mining

Bayesian Knowledge-Tracing

The algorithm behind the skill bars …

Being improved by Educational Data Mining
Key in many EDM analyses and models

Bayesian Knowledge Tracing

• Goal: For each knowledge component (KC), infer the student’s knowledge state from performance.
• Suppose a student has six opportunities to apply a KC and makes the following sequence of correct (1) and incorrect (0) responses. Has the student learned the rule?

001011

Model Learning Assumptions

• Two-state learning model
  • Each skill is either learned or unlearned
• In problem-solving, the student can learn a skill at each opportunity to apply the skill
• A student does not forget a skill, once he or she knows it
• Only one skill per action

Model Performance Assumptions

• If the student knows a skill, there is still some chance the student will slip and make a mistake.
• If the student does not know a skill, there is still some chance the student will guess correctly.

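Corbett and Anderson’s Model

[Diagram: two states, Not learned and Learned. p(L0) is the initial probability of the Learned state; p(T) labels the transition from Not learned to Learned. A correct response occurs with probability p(G) from Not learned and 1-p(S) from Learned.]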

Two Learning Parameters
p(L0)  Probability the skill is already known before the first opportunity to use the skill in problem solving.
p(T)   Probability the skill will be learned at each opportunity to use the skill.

Two Performance Parameters
p(G)   Probability the student will guess correctly if the skill is not known.
p(S)   Probability the student will slip (make a mistake) if the skill is known.

Bayesian Knowledge Tracing

• Whenever the student has an opportunity to use a skill, the probability that the student knows the skill is updated using formulas derived from Bayes’ Theorem.

Formulas
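
The formulas on this slide did not survive extraction. What follows is the update as it is usually written for this model, using the four parameters p(L0), p(T), p(G) and p(S). The subscript n (the opportunity number) and the label "obs" for the observed response are notation chosen here, not taken from the slide.

```latex
\begin{align*}
P(L_{n-1} \mid \text{correct}_n) &=
  \frac{P(L_{n-1})\,\bigl(1 - P(S)\bigr)}
       {P(L_{n-1})\,\bigl(1 - P(S)\bigr) + \bigl(1 - P(L_{n-1})\bigr)\,P(G)} \\[6pt]
P(L_{n-1} \mid \text{incorrect}_n) &=
  \frac{P(L_{n-1})\,P(S)}
       {P(L_{n-1})\,P(S) + \bigl(1 - P(L_{n-1})\bigr)\,\bigl(1 - P(G)\bigr)} \\[6pt]
P(L_n) &= P(L_{n-1} \mid \text{obs}_n)
        + \bigl(1 - P(L_{n-1} \mid \text{obs}_n)\bigr)\,P(T)
\end{align*}
```

The first two lines are Bayes' Theorem applied to a correct or incorrect response. The third adds the chance of learning at this opportunity, with no term for forgetting, matching the learning assumptions.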

Knowledge Tracing

• How do we know if a knowledge tracing model is any good?
• Our primary goal is to predict knowledge
• But knowledge is a latent trait
• But we can check those knowledge predictions by checking how well the model predicts performance
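
A minimal sketch of that check, run on the 001011 sequence from the earlier slide. It assumes the usual Bayes update for the two-state model. The function name `bkt_trace` and the four parameter values are made up for illustration and are not fitted to any data.

```python
def bkt_trace(responses, p_l0, p_t, p_g, p_s):
    """Return a list of (predicted P(correct) before the response, P(known) after it)."""
    p_known = p_l0
    trace = []
    for correct in responses:
        # Performance prediction: knows the skill and does not slip, or guesses.
        p_correct = p_known * (1 - p_s) + (1 - p_known) * p_g
        # Bayes update of P(known) given the observed response.
        if correct:
            posterior = p_known * (1 - p_s) / p_correct
        else:
            posterior = p_known * p_s / (1 - p_correct)
        # Chance to learn at this opportunity; no forgetting.
        p_known = posterior + (1 - posterior) * p_t
        trace.append((p_correct, p_known))
    return trace


responses = [0, 0, 1, 0, 1, 1]  # the sequence 001011
trace = bkt_trace(responses, p_l0=0.3, p_t=0.2, p_g=0.2, p_s=0.1)  # assumed values
for observed, (p_correct, p_known) in zip(responses, trace):
    print(f"observed={observed}  predicted P(correct)={p_correct:.3f}  P(known) after={p_known:.3f}")
```

Comparing the predicted P(correct) column with the observed 0/1 responses is the performance check the slide describes. The latent P(known) column cannot be checked directly.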

Fitting a Knowledge-Tracing Model

• In principle, any set of four parameters can be used by knowledge-tracing
• But parameters that predict student performance better are preferred

Knowledge Tracing

• So, we pick the knowledge tracing parameters that best predict performance
• Defined as whether a student’s action will be correct or wrong at a given time
• Effectively a classifier
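
One simple way to "pick the parameters that best predict performance" is an exhaustive search over a coarse grid, scoring each candidate by squared error between predicted and observed correctness. The slides do not say which fitting procedure or error measure was used, so both are assumptions here, as are the function names and the grid. The response sequences are hypothetical apart from the first, which is the 001011 example.

```python
from itertools import product


def predicted_correct(responses, p_l0, p_t, p_g, p_s):
    """Return the model's P(correct) before each response in one sequence."""
    p_known = p_l0
    predictions = []
    for correct in responses:
        p_correct = p_known * (1 - p_s) + (1 - p_known) * p_g
        predictions.append(p_correct)
        # Bayes update on the observed response, then a chance to learn.
        if correct:
            posterior = p_known * (1 - p_s) / p_correct
        else:
            posterior = p_known * p_s / (1 - p_correct)
        p_known = posterior + (1 - posterior) * p_t
    return predictions


def squared_error(sequences, params):
    """Sum of squared differences between observed and predicted correctness."""
    return sum(
        (observed - predicted) ** 2
        for responses in sequences
        for observed, predicted in zip(responses, predicted_correct(responses, *params))
    )


def fit_by_grid_search(sequences, grid):
    """Try every (p_l0, p_t, p_g, p_s) combination on the grid and keep the best."""
    return min(product(grid, repeat=4), key=lambda params: squared_error(sequences, params))


# Hypothetical response sequences for one skill, one list per student.
SEQUENCES = [
    [0, 0, 1, 0, 1, 1],
    [0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
GRID = (0.1, 0.3, 0.5, 0.7, 0.9)  # coarse, assumed grid

best_params = fit_by_grid_search(SEQUENCES, GRID)
print(dict(zip(("p_l0", "p_t", "p_g", "p_s"), best_params)))
```

A finer grid or a different search method would slot into `fit_by_grid_search` without changing the scoring function.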

Recent Advances

• Recently, there has been work towards contextualizing the guess and slip parameters (Baker, Corbett, & Aleven, 2008a, 2008b)
• The intuition: Do we really think the chance that an incorrect response was a slip is equal in both of these cases?
  • Student has never gotten action right; spends 78 seconds thinking; answers; gets it wrong
  • Student has gotten action right 3 times in a row; spends 1.2 seconds thinking; answers; gets it wrong

Recent Advances

• In this work, P(G) and P(S) are determined by a model that looks at time, previous history, the type of action, etc.
• Significantly improves predictive power of method
  • Probability of distinguishing correct from incorrect increases by about 15% of potential gain
  • To 71%, so still room for improvement

Uses

• Outside of EDM, can be used to drive tutorial decisions
• Within educational data mining, there are several things you can do with these models

Uses of Knowledge Tracing

• Often key components in models of other constructs
  • Help-Seeking and Metacognition (Aleven et al, 2004, 2008)
  • Gaming the System (Baker et al, 2004, in press)
  • Off-Task Behavior (Baker, 2007)

Uses of Knowledge Tracing

• If you want to understand a student’s strategic/meta-cognitive choices, it is helpful to know whether the student knew the skill
• Gaming the system means something different if a student already knows the step, versus if the student doesn’t know it
• A student who doesn’t know a skill should ask for help; a student who does, shouldn’t

Uses of Knowledge Tracing

• Can be interpreted to learn about skills

Skills from the Algebra Tutor

skill                                      L0     T
AddSubtractTypeinSkillIsolatepositiveIso   0.01   0.01
ApplyExponentExpandExponentsevalradicalE   0.333  0.497
CalculateEliminateParensTypeinSkillElimi   0.979  0.001
CalculatenegativecoefficientTypeinSkillM   0.953  0.001
Changingaxisbounds                         0.01   0.01
Changingaxisintervals                      0.01   0.01
ChooseGraphicala                           0.001  0.306
combineliketermssp                         0.943  0.001

Which skills could probably be removed from the tutor?

skill                                      L0     T
AddSubtractTypeinSkillIsolatepositiveIso   0.01   0.01
ApplyExponentExpandExponentsevalradicalE   0.333  0.497
CalculateEliminateParensTypeinSkillElimi   0.979  0.001
CalculatenegativecoefficientTypeinSkillM   0.953  0.001
Changingaxisbounds                         0.01   0.01
Changingaxisintervals                      0.01   0.01
ChooseGraphicala                           0.001  0.306
combineliketermssp                         0.943  0.001

Which skills could use better instruction?

skill                                      L0     T
AddSubtractTypeinSkillIsolatepositiveIso   0.01   0.01
ApplyExponentExpandExponentsevalradicalE   0.333  0.497
CalculateEliminateParensTypeinSkillElimi   0.979  0.001
CalculatenegativecoefficientTypeinSkillM   0.953  0.001
Changingaxisbounds                         0.01   0.01
Changingaxisintervals                      0.01   0.01
ChooseGraphicala                           0.001  0.306
combineliketermssp                         0.943  0.001
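
One way to read the two questions off the table is to flag skills by their fitted parameters. The sketch below does this with cutoffs chosen purely for illustration (0.9 for "already known", 0.05 for "rarely known and rarely learned"); the slides do not give thresholds. The skill names are kept as they appear in the table, truncation included.

```python
# (L0, T) per skill, copied from the table.
SKILLS = {
    "AddSubtractTypeinSkillIsolatepositiveIso": (0.01, 0.01),
    "ApplyExponentExpandExponentsevalradicalE": (0.333, 0.497),
    "CalculateEliminateParensTypeinSkillElimi": (0.979, 0.001),
    "CalculatenegativecoefficientTypeinSkillM": (0.953, 0.001),
    "Changingaxisbounds": (0.01, 0.01),
    "Changingaxisintervals": (0.01, 0.01),
    "ChooseGraphicala": (0.001, 0.306),
    "combineliketermssp": (0.943, 0.001),
}

HIGH_L0 = 0.9   # assumed cutoff: most students already know the skill
LOW = 0.05      # assumed cutoff: skill is rarely known and rarely learned

# High L0: candidates for removal, since there is little left to teach.
already_known = [name for name, (l0, t) in SKILLS.items() if l0 >= HIGH_L0]

# Low L0 and low T: students start without the skill and seldom acquire it.
needs_better_instruction = [
    name for name, (l0, t) in SKILLS.items() if l0 <= LOW and t <= LOW
]

print("Could probably be removed:", already_known)
print("Could use better instruction:", needs_better_instruction)
```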

END

• This last example is a simple example of Discovery with Models
• Tomorrow at 9:30am, we’ll discuss some more complex examples

