Learning under concept drift: an overview

Zhimin He
iTechs – ISCAS
p   What’s Concept Drift
p   Causes of a Concept Drift
p   Types of Concept Drift
p   Detecting and Handling Concept Drift
p   Implications for Software Engineering Research
p Prediction
  l Xt ∈ R^p is a vector in p-dimensional feature space
    observed at time t, and yt is the corresponding label.
  l We call Xt an instance and a pair (Xt, yt) a labeled
    instance. We refer to instances (X1, ..., Xt) as
    historical data and to instance Xt+1 as the target (or testing)
    instance.
  l The task is to predict a label yt+1 for the target
    instance Xt+1.

p Concept Drift
  l Every instance Xt is generated by a source St.
  l If all the data is sampled from the same source, i.e. S1
    = S2 = ... = St+1 = S, we say that the concept is stable.
  l If Si ≠ Sj for any two time points i and j, we say that
    there is a concept drift.
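
As a minimal sketch of these definitions, the Python snippet below simulates a labeled stream (Xt, yt) whose source changes at a hypothetical time point `change_at`. The two sources, their labeling rules, and every name in the code are assumptions made for illustration.

```python
import random

def source_a(rng):
    """Hypothetical source S_a: the label is 1 when x > 0.5."""
    x = rng.random()
    return x, int(x > 0.5)

def source_b(rng):
    """Hypothetical source S_b: same feature distribution, opposite labeling rule."""
    x = rng.random()
    return x, int(x <= 0.5)

def make_stream(n, change_at, seed=0):
    """Return n labeled instances; S_t = S_a for t < change_at, else S_b."""
    rng = random.Random(seed)
    return [source_a(rng) if t < change_at else source_b(rng) for t in range(n)]

def accuracy(rule, data):
    """Fraction of instances whose label the rule predicts correctly."""
    return sum(rule(x) == y for x, y in data) / len(data)

stream = make_stream(n=200, change_at=100)
historical, later = stream[:100], stream[100:]

def rule(x):
    """A predictor that matches the historical source exactly."""
    return int(x > 0.5)

print(accuracy(rule, historical))  # 1.0: the concept was stable
print(accuracy(rule, later))       # 0.0: the source changed
```

A predictor fitted to the historical data is only useful for Xt+1 as long as the source that generates Xt+1 is the one that generated the history.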
Causes of Concept Drift
p Let X ∈ R^p be an instance in p-dimensional
  feature space, with X ∈ ci, where c1, c2, ..., ck is the
  set of class labels.
p The optimal classifier to classify X → ci is
  determined by the prior probabilities for the
  classes P(ci) and the class-conditional
  probability density functions p(X | ci), i = 1, ..., k.
p Concept / data source:
   l a set of prior probabilities of the classes and class-conditional
     pdfs:
     S = {(P(c1), p(X | c1)), (P(c2), p(X | c2)), ..., (P(ck), p(X | ck))}
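
For reference, the priors and the class-conditional densities combine through Bayes' rule. The block below is the standard Bayesian decision rule written with the symbols used on this slide; it is not an additional result.

```latex
p(c_i \mid X) = \frac{P(c_i)\, p(X \mid c_i)}{p(X)},
\qquad
p(X) = \sum_{j=1}^{k} P(c_j)\, p(X \mid c_j)
```

The optimal classifier assigns X to the class ci with the largest posterior p(ci | X).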
Causes of Concept Drift (cont.)

p Concept drift may occur in three ways:
  l Class priors P(c) might change over time.
  l The distributions of one or several classes p(X | ci)
    might change (virtual drift).
  l The posterior distributions of the class memberships
    p(ci | X) might change (real drift).
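
To make the three quantities concrete, the following sketch computes p(ci | X) from P(ci) and p(X | ci) with Bayes' rule and shows how the posterior at one fixed point moves when either ingredient changes. The one-dimensional Gaussian class-conditionals and all numbers are assumptions chosen for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posterior(x, priors, means, std=1.0):
    """Return p(c_i | x) for every class, computed with Bayes' rule."""
    joint = [p * gaussian_pdf(x, m, std) for p, m in zip(priors, means)]
    evidence = sum(joint)  # p(x)
    return [j / evidence for j in joint]

x = 0.5
before = posterior(x, priors=[0.5, 0.5], means=[0.0, 1.0])
prior_shift = posterior(x, priors=[0.9, 0.1], means=[0.0, 1.0])
density_shift = posterior(x, priors=[0.5, 0.5], means=[0.0, 3.0])

print(before)         # x is equidistant from both class means
print(prior_shift)    # only P(c) changed; class 0 now dominates at x
print(density_shift)  # only p(X | c1) changed; class 0 now dominates at x
```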
Types of Concept Drift

p Types:
  l Sudden drift
  l Gradual drift
  l Incremental drift
  l Reoccurring contexts
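
The four types can be pictured as different schedules for which source is active over time. In the sketch below each function returns the mean of the source that generates the instance at time t, with 0.0 standing for the old source and 1.0 for the new one. The encoding, the midpoint change, and the period are assumptions for illustration only.

```python
import random

def sudden(t, n):
    """The old source (mean 0.0) is replaced by the new one (mean 1.0) at once."""
    return 0.0 if t < n // 2 else 1.0

def gradual(t, n, rng):
    """Both sources stay active; the new one is sampled more and more often."""
    return 1.0 if rng.random() < t / (n - 1) else 0.0

def incremental(t, n):
    """The source itself moves from mean 0.0 to mean 1.0 in many small steps."""
    return t / (n - 1)

def reoccurring(t, period=25):
    """The old source alternates with the new one and keeps coming back."""
    return float((t // period) % 2)

n = 100
rng = random.Random(0)
schedules = {
    "sudden": [sudden(t, n) for t in range(n)],
    "gradual": [gradual(t, n, rng) for t in range(n)],
    "incremental": [incremental(t, n) for t in range(n)],
    "reoccurring": [reoccurring(t) for t in range(n)],
}
for name, means in schedules.items():
    # Print every 10th value to show the shape of each schedule.
    print(f"{name:12s}", [round(m, 2) for m in means[::10]])
```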
Detecting and Handling Concept Drift
p Detecting
  l Monitoring the raw data
  l Monitoring parameters of learners
  l Monitoring prediction errors of learners
p Handling
  l Ensemble learning
  l Instance selection
  l Instance weights
  l Training windows

  l Training windows are naturally suitable for sudden concept
    drift, while ensembles are more flexible in terms of change
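
A small sketch of two of the handling strategies, training windows and instance weights, is given below. The "learner" is deliberately trivial (it estimates the mean of the target), standing in for any model retrained on the selected or weighted data. The window size, the decay factor, and the toy stream are assumptions.

```python
from collections import deque

def full_history_mean(stream):
    """Baseline: every past instance counts equally."""
    return sum(stream) / len(stream)

def window_mean(stream, size):
    """Training window: only the most recent `size` instances are used."""
    window = deque(maxlen=size)
    for value in stream:
        window.append(value)
    return sum(window) / len(window)

def weighted_mean(stream, decay):
    """Instance weights: an instance seen `a` steps ago gets weight decay**a."""
    num = den = 0.0
    for value in stream:
        num = decay * num + value
        den = decay * den + 1.0
    return num / den

# Toy stream with a sudden drift: the target jumps from 0.0 to 1.0 halfway.
stream = [0.0] * 50 + [1.0] * 50

print(full_history_mean(stream))    # stuck between the old and new concept
print(window_mean(stream, size=10)) # has forgotten the old concept entirely
print(weighted_mean(stream, 0.9))   # old instances still count, but very little
```

A short window forgets the old concept completely after a sudden change, which is why windows suit that case; the weighted variant forgets smoothly instead.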
Detecting and Handling Concept Drift (cont.)
p Overall solution for learning under concept drift
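
One way to combine detection and handling is sketched below under stated assumptions; it is not taken from the slides. Prediction errors are monitored with a rule in the spirit of the Drift Detection Method (Gama et al., 2004), and when drift is signaled the model is retrained on a short window of recent instances. The toy stream, the threshold learner, the warm-up length, and the thresholds are all hypothetical.

```python
import math
import random
from collections import deque

class ErrorRateDetector:
    """Signals drift when the running error rate rises well above its best level."""

    def __init__(self, warmup=30, threshold=3.0):
        self.warmup = warmup        # assumed minimum number of observations
        self.threshold = threshold  # assumed multiple of the standard deviation
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.best_p = float("inf")
        self.best_s = float("inf")

    def update(self, error):
        """Feed one 0/1 prediction error; return True if drift is signaled."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.warmup:
            return False
        if p + s < self.best_p + self.best_s:
            self.best_p, self.best_s = p, s
        return p + s > self.best_p + self.threshold * self.best_s

def predict(polarity, x):
    """Threshold rule at 0.5; polarity selects which side is labeled 1."""
    return int(x > 0.5) if polarity else int(x <= 0.5)

def train(instances):
    """Pick the polarity that agrees with most of the given labeled instances."""
    agree = sum(int(x > 0.5) == y for x, y in instances)
    return 1 if 2 * agree >= len(instances) else 0

def make_stream(n=200, change_at=100, seed=0):
    """Toy stream: the labeling rule flips at change_at; every 10th label is noisy."""
    rng = random.Random(seed)
    stream = []
    for t in range(n):
        x = rng.random()
        y = int(x > 0.5) if t < change_at else int(x <= 0.5)
        if t % 10 == 9:
            y = 1 - y  # deterministic label noise
        stream.append((x, y))
    return stream

def run(stream, keep=5):
    """Predict, monitor the error, and retrain on recent data when drift is signaled."""
    detector = ErrorRateDetector()
    recent = deque(maxlen=keep)  # short training window used after a detection
    polarity = 1                 # model assumed to fit the initial concept
    detections = []
    for t, (x, y) in enumerate(stream):
        error = int(predict(polarity, x) != y)
        recent.append((x, y))
        if detector.update(error):
            detections.append(t)
            polarity = train(recent)
            detector.reset()
    return detections

detections = run(make_stream())
print(detections)  # time steps at which drift was signaled
```

On this toy stream the detector is expected to fire once, a few steps after the change at t = 100. A lower threshold would react faster but raise false alarms on noisier data.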
Implications for SE Research
p Concept drift is a fundamental issue for SE
  l Cost estimation, defect prediction…
  l Especially in the cross-company/cross-project context
  l Harmful to the performance of prediction models
p Detecting and handling concept drift is a
  challenging task!
  l Quality problems of SE data, e.g., insufficient data
  l Data generation context is highly unstable.
p Has become an increasingly popular research
  topic in the SE field!
  l E.g., Burak Turhan [JESE 2012], Jayalath Ekanayake
    [MSR 2009, JESE 2011]
1. Indre Zliobaite, “Learning under Concept Drift: an
   Overview,” technical report, 2009.
2. A. Dries and U. Rückert, “Adaptive Concept Drift
   Detection,” Journal of Statistical Analysis and Data
   Mining, 2009.
3. L. Minku, A. White, and X. Yao, “The impact of diversity
   on on-line ensemble learning in the presence of concept
   drift,” IEEE Transactions on Knowledge and Data
   Engineering, 2009.
4. M. Kelly, D. Hand, and N. Adams, “The impact of
   changing populations on classifier performance.”
