AAAI workshop proposal on Educational Data Mining

Workshop topic
This workshop will bring together researchers interested in educational data mining—using
artificial intelligence and statistical analysis to answer questions about how people learn. One
important issue we will address in this workshop is how researchers can learn from the data
collected by computer tutors and standardized tests, including a learner’s responses to questions,
mouse clicks, and mouse movements recorded at a fine time-scale. Computer tutors are now
generating data at a pace exceeding the ability of researchers to analyze it. Tutors often gather up
to a year’s worth of instruction with hundreds or thousands of users. The challenge is to
determine which tools are most appropriate for making sense of these data and to discover which
investigations we can make with a goal of understanding student learning at a fine-grained level.

This workshop will focus on tools and techniques for educational data mining and ask questions
such as: Which AI techniques (e.g., from information retrieval, data mining, or machine
learning) are most appropriate? What are the limitations of each tool? Educational data mining
on data from computer tutors differs from classic data mining in that the software (the computer
tutor) has probably been explicitly instrumented to make mining easier, and there are sometimes
strong, pre-existing psychological theories of how people learn that can augment the techniques.
Do these advantages provide us with any new capabilities?

Although this workshop will accept submissions from all areas of educational data mining, the
primary focus of the workshop will be on techniques for modeling student performance.
Specifically, we are interested in approaches that have not been used before, and in comparing
known approaches with competing techniques. For example, at the AAAI 2005 workshop on
this topic, there was interesting discussion comparing the assumptions of q-matrices and standard
factor analysis. The other specific area of emerging interest is the privacy of students’ records.
One problem with sharing student data within the community is student confidentiality. Therefore,
the workshop is interested both in protocols for recording data so as to prevent identifying
students (while still retaining the ability to merge records from different sources) and in best
practices for storing information on centralized servers to prevent unauthorized access to data.
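
As one illustration of the kind of recording protocol mentioned above, the sketch below replaces
student identifiers with keyed-hash pseudonyms so that records from two sources can still be
merged. It is a minimal sketch under stated assumptions, not a protocol endorsed by the workshop:
the field name student_id, the helper names, and the sample records are all hypothetical, and the
approach assumes the secret key is held by the data collectors and never stored with the shared data.

```python
import hashlib
import hmac


def pseudonymize(student_id: str, secret_key: bytes) -> str:
    """Map a student identifier to a stable pseudonym with HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    can be joined later; without the key the mapping cannot be recomputed.
    """
    normalized = student_id.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability (an illustrative choice)


def deidentify(records, secret_key, id_field="student_id"):
    """Return copies of the records with the identifier field pseudonymized."""
    cleaned = []
    for record in records:
        copy = dict(record)
        copy[id_field] = pseudonymize(record[id_field], secret_key)
        cleaned.append(copy)
    return cleaned


def merge_by_pseudonym(left, right, id_field="student_id"):
    """Inner-join two de-identified record lists on the pseudonym field."""
    index = {record[id_field]: record for record in right}
    return [
        {**record, **index[record[id_field]]}
        for record in left
        if record[id_field] in index
    ]


if __name__ == "__main__":
    # Hypothetical key and data, for illustration only.
    key = b"example-key-kept-off-the-shared-server"
    tutor_log = [{"student_id": "S001", "hints_requested": 4}]
    test_scores = [{"student_id": "s001 ", "score": 87}]

    merged = merge_by_pseudonym(deidentify(tutor_log, key),
                                deidentify(test_scores, key))
    print(merged)
```

Keyed hashing is used here rather than a plain hash because identifiers drawn from a small space
(such as short student numbers) could otherwise be recovered by hashing every candidate value.
Pseudonymizing the identifier alone does not make records anonymous; other fields may still
identify a student.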

Why the topic is interesting now
Educational data mining for studying learning processes is relevant now due to the scaling up of
the number of students using intelligent computer tutors. When the field of computer-based
education was new, the main challenges of artificial intelligence were to construct approaches to
plan tutorial interactions and to build a model of the student’s competencies. Few students used
such systems, and controlled studies were of brief duration and had relatively few users. Today,
studies involving computer tutors have scaled up in scope both longitudinally and in the
number of users. This increase in scale has created a problem: what to do with the data? For the
first time we have the ability to answer educational questions about how to best teach individual
students, or to answer subtle questions about learning. The missing ingredient is the
computational toolkit to organize, visualize, and learn from the data.

This topic is of increasing interest to educational research communities as well. At the past two
IERI (Interagency Educational Research Initiative) PI meetings there have been sessions on
Educational Data Mining. At the 2004 meeting there was a breakout session on the topic. At the
2005 meeting there was a panel. IERI researchers have (for the most part) not connected with the
existing educational data mining research communities. Given the current and increasing interest
in the IERI community, a workshop next summer would be well timed.

Educational data mining at the institutional level is also of particular interest now because of
increasing emphasis on educational standards. These standards have led to an increased
repository of data on standardized tests. These data do not lend themselves to traditional
analyses. Data mining is particularly suited to solving problems in the educational field, where
standard assumptions rarely apply.

Workshop format
We expect that one day is an appropriate length for the workshop. We will have an invited
speaker, and a panel discussion on new directions and applications of educational data mining.
Likely panel members are those in the IERI community who would not normally attend an
artificial intelligence conference. We will solicit papers for presentation at the workshop. The
advantage of the workshop is that it permits an in-depth presentation of the subject matter, with
substantial time for comments afterwards. Therefore, we will have fewer presentations, but more
time for each presentation. Our proposed format is to budget 30 minutes for a presentation, with
10 of those minutes devoted to questions. We will group papers by themes, and have a discussant
lead a session that compares/contrasts the papers in the area and discusses remaining open issues.
We used this format at the AAAI 2005 workshop on this topic and it worked quite well. If we
receive many strong submissions, we will have a poster session to allow informal discussion
among researchers working on related problems.

Organizing committee
Joseph E. Beck
Phone: 412 268 5726; Fax: 412 268 6436
Postal: NSH 4215
        Carnegie Mellon University
        5000 Forbes Avenue
        Pittsburgh, PA 15213

Joseph Beck is a faculty member at the Center for Automated Learning and Discovery at Carnegie
Mellon University. He has a Ph.D. in computer science, works in intelligent tutoring systems and
student modeling, and specializes in educational data mining. He has organized and chaired
workshops on applying machine learning approaches to improving computer tutors at ITS2000,
ITS2004, and AAAI2005.

Tiffany Barnes
Phone: 704 687 6403
Postal: Department of Computer Science
        University of North Carolina at Charlotte
        9201 University City Blvd.
        Charlotte, NC 28223

Tiffany Barnes is an Assistant Professor in the Department of Computer Science at the University
of North Carolina at Charlotte. She received her PhD in Computer Science from North Carolina
State University in December 2003. Her PhD work investigated several aspects of the Q-matrix
method, an innovative tool for data mining and understanding student knowledge and augmenting
tutorial systems to be adaptive. Dr. Barnes's research interests include Artificial Intelligence,
Bioinformatics, Human-Computer Interaction, Data Mining and KDD, Diversity in Technology,
and the use of Technology in Education.

Esma Aimeur
Phone: +1 (514) 343-6794; Fax: +1 (514) 343-5834
Université de Montréal
Département d'informatique et de recherche opérationnelle
Pavillon André-Aisenstadt
C.P. 6128, Succ. Centre-Ville, Montréal (QC)
H3C 3J7 Canada

Esma Aimeur is an Associate Professor in the Department of Computer Science and Operations
Research at the University of Montreal. She received her Master's degree in 1990 and her PhD
in 1994 from the University of Paris 6 in the field of Artificial Intelligence. Her research
areas include Artificial Intelligence (Machine Learning, Knowledge Acquisition, Case-Based
Reasoning …), Intelligent Tutoring Systems (Curriculum, Pedagogical Strategies, Learner
Model …), and Electronic Commerce. She has published about one hundred papers in refereed
international journals and international conferences, and she is and has been a member of the
program and organizational committees of many international conferences.

Possible attendees
The first AAAI workshop on Educational Data Mining (at AAAI2005) had about two dozen
attendees. We’re hoping for higher attendance this year. One reason attendance was low is that a
similar workshop was held at approximately the same time in Europe; as a result there were no
European attendees at the workshop (and some U.S. members couldn’t make the trip to Pittsburgh
due to tight scheduling with the European conference). This year there is no such workshop in
Europe, so attendance should be better. Pre-registrants for last year’s workshop were Andrew
Arnold, Ryan Baker, Tiffany Barnes, Joseph E. Beck, Ted Carmichael, Hao Cen, Kai-min Chang,
Mingyu Feng, Joao Furtado, Janice Gobert, Paul Horwitz, Jeff Johns, Tara Madhysatha, Jack
Mostow, Jiang Su, Titus Winters, and Weixiong Zhang (there were additional attendees at the
workshop). The 2005 workshop did a good job of bringing in attendees who would not have
otherwise attended the AAAI conference (based on polling those there) and who were not aware
of the existing community for this work. One goal of the 2006 workshop is to get higher
participation from IERI members. At the past IERI PI meeting (August 2005) there was a panel
on Educational Data Mining that was well received.
