Analysis of Examination Results Data Using Various Mining Techniques
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 10, No. 8, August 2012
ANALYSIS OF EXAMINATION RESULTS
DATA USING VARIOUS MINING
TECHNIQUES
DEVENDRA SINGH RAJPOOT
Ph.D. Scholar, UIT, RGPV, Bhopal (M.P.)
dsrphd@yahoo.com
Dr. Kanak Saxena
Professor & Head, Computer Applications
SATI, Vidisha (M.P.)
Kanak.saxena@gmail.com
Dr. Anubhuti Khare
Associate Professor, DoEC, UIT
RGPV, Bhopal (M.P.)
anubhutikhare@gmail.com
http://sites.google.com/site/ijcsis/
ISSN 1947-5500
ABSTRACT
The paper applies various data mining techniques, such as statistical techniques, classification and clustering, to examination results data. The domain chosen for this work is the university domain, because educational data mining is an emerging discipline concerned with developing methods for exploring the unique types of data that come from the educational context. Owing to the increasing number of institutions and students, technical educational institutions are becoming increasingly oriented towards performance and its measurement, and are accordingly setting goals and developing strategies for achieving them [02]. This already happens in Europe (for example in Croatia) and in the USA [01], but is still lacking in India. The patterns extracted after applying the mining techniques clearly show the impact of subject contents on the students' careers under variations in the examination policy.
INTRODUCTION:
In our mining system, data preprocessing is the phase in which the data are cleaned of noise by overcoming the difficulties of recognizing students, semesters and branches, so that they can be used as input to the next phase, pattern discovery. In the pattern discovery phase, various mining algorithms are incorporated into the system to mine different types of patterns. In the pattern analysis phase, the mined patterns, which are great in number, are evaluated. The mining system is classified and explained here. Commonly a mining system consists of three parts:
(i) Data Preprocessing
(ii) Pattern Discovery
(iii) Pattern Analysis
General Mechanism: Data -> Pre-process -> Pattern Discovery -> Pattern Analysis -> Predict user behavior
DATA DESCRIPTION:
There are millions of records on students who belong to various courses, years, semesters, etc. From these we took a sample of approximately 2 lakh (200,000) records. When we applied various analytical techniques, we found that the analysis took a very long time and that the data had to be pre-processed every time. Thus, for simplicity, we took a particular semester and a specific range of years, from 2004 to 2008, for only one course. The resulting sample contains about 16574 records. For the complete analysis, the data were chosen from the university data, which have 154 attributes in total. Applying the mining algorithms to the complete data led to execution problems caused by the constraints of the computer system, so we reduced our data set to approximately 16574 records. No doubt the system accumulates a vast amount of information, which is very valuable for analyzing student behavior and could yield valuable information for the educational system, but as discussed earlier, mining the entire data would not be possible. Hence the data considered for the evaluation consist of Engineering III Semester (all disciplines) from the year 2004 to 2008. Interest in performance indicators in technical education has become extremely high; the reason for this lies in the relevant political and social changes of recent years [03,04,05,06,07,08,09,10].
Table 1. Total intake of students of the Technical University in the year 2008.
Course                  Intake
BE (All disciplines)    64430
B. Pharm.               5880
MCA                     5980
B. Arch.                300
Figure 1. Total intake of students of the Technical University in the year 2008 (pie chart).
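The intake figures in Table 1 can be turned into the percentage shares that a pie chart such as Figure 1 would display. The following is a minimal sketch in Python, not part of the original study; the helper name `intake_shares` is hypothetical, and only the four counts are taken from Table 1.

```python
# Intake figures copied from Table 1 (year 2008).
INTAKE_2008 = {
    "BE (All disciplines)": 64430,
    "B. Pharm.": 5880,
    "MCA": 5980,
    "B. Arch.": 300,
}


def intake_shares(counts):
    """Return each course's percentage share of the total intake."""
    total = sum(counts.values())
    return {course: 100.0 * n / total for course, n in counts.items()}


shares = intake_shares(INTAKE_2008)
for course, pct in shares.items():
    # Print one line per course, e.g. the share of BE in the total intake.
    print(f"{course:22s} {pct:6.2f}%")
```

With these counts the total intake is 76590, and engineering (BE) accounts for roughly 84% of it.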
WORK DONE:
Data mining is the process of efficient discovery of non-obvious, valuable patterns from a large collection of data [11]. To better comprehend the students' behavior, statistical data processing is performed. In the first segment, graphs are used to present the basic information on the structure of the students' data; in the second segment, the analysis is carried out using various regression techniques.
For this work we use Weka 3.6.2 because of its important characteristics [12]:
(i) It is a free software system implemented in Java.
(ii) It is open source software that provides a collection of machine learning and data mining algorithms.
(iii) The algorithms and routines can be modified using the same programming language.
PROPOSED METHOD:
With the increasing demand for technology, interest in the technical field is growing day by day, and as a result more students are taking admission in engineering. Job opportunities are greater in engineering than in other courses. Figure 1 shows the number of students who took admission in each course, from which it is clear that students' interest in engineering is higher than in other courses. B. Pharmacy is less in demand owing to the smaller number of colleges, limited seats and fewer job opportunities in this field. Admission in MCA is lower because nowadays students prefer other courses, such as B.Tech. and M.Tech. after a bachelor's degree in engineering, as the number of seats has increased. The fewest admissions are in B. Arch., because students interested in this field tend to choose civil engineering as their subject.
Table 2. Number of students in Engineering and their results from 2004 to 2008.
Exam_Yr    Std_appear    Std_pass    Result %    Overall Result %
2004       7559          2840        37.57       43.64
2005       8148          4130        50.68       52.35
2006       9484          3992        42.09       49.78
2007       15944         6473        40.59       43.51
2008       17731         10475       59.07       52.18
Figure 2. Number of students in Engineering from 2004 to 2008.
RESULT DISCUSSIONS:
Owing to the increase in the number of engineering colleges and in the intake in the state, the number of students appearing in the examinations is also increasing. Table 2 gives the number of students who appeared and the number who passed in these examinations; the overall results also show a declining trend in several of the years. After analysis we found that in three of the five years (2004, 2006 and 2007) the failure rate is higher than the pass rate, that is, more students are failing than clearing the subject Mathematics-III. In 2004, 7559 students appeared in the examination; 2840 cleared it and 4719 failed in Mathematics-III. Likewise, in 2005, 4130 students passed out of 8148 and 4018 failed; in 2006, 3992 students passed out of 9484; in 2007, 6473 students passed out of 15944; and in 2008, 10475 students passed out of 17731.
For this we have used classification techniques. A classifier is a mapping from X to a discrete set of labels Y [13]. These analyses predict the class label; they are based on supervised learning, which is provided with a collection of labeled (i.e. pre-classified) patterns. Classification has been used for discovering groups of students with similar characteristics and reactions to specific pedagogical strategies [14], for predicting students' performance [15], and for assessing the relevance of the examination papers involved in a semester (regular as well as back papers).
Table 3. Correctly classified and incorrectly classified instances for different classification methods and modes of test.
Classification Method    Mode of Test     Correctly Classified Instances    Incorrectly Classified Instances
Decision Table           10 fold          14732                             518
Decision Table           75% splitting    3697                              124
Decision Table           Training set     14768                             482
REPtree                  10 fold          14570                             680
REPtree                  75% splitting    3657                              164
REPtree                  Training set     14570                             680
We performed a total of 6 classification experiments on the university data: the Decision Table and REPtree methods, each with three different test modes (10-fold cross-validation, 75% split, full training set). The results are shown in Table 3 and Figure 3.
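The counts of correctly and incorrectly classified instances reported in Table 3 can be converted into accuracy percentages, which makes the six experiments directly comparable even though the 75% split evaluates fewer instances. The sketch below is illustrative only; the names `TABLE_3` and `accuracy` are hypothetical, and the counts are copied from Table 3.

```python
# (correct, incorrect) instance counts copied from Table 3.
TABLE_3 = {
    ("Decision Table", "10 fold"): (14732, 518),
    ("Decision Table", "75% splitting"): (3697, 124),
    ("Decision Table", "Training set"): (14768, 482),
    ("REPtree", "10 fold"): (14570, 680),
    ("REPtree", "75% splitting"): (3657, 164),
    ("REPtree", "Training set"): (14570, 680),
}


def accuracy(correct, incorrect):
    """Percentage of evaluated instances that were classified correctly."""
    return 100.0 * correct / (correct + incorrect)


for (method, mode), (correct, incorrect) in TABLE_3.items():
    # One line per experiment: method, test mode, accuracy in percent.
    print(f"{method:15s} {mode:14s} {accuracy(correct, incorrect):6.2f}%")
```

Under these counts every experiment has an accuracy between roughly 95.5% and 96.8%, which is in line with the weighted average TP rates listed in Table 5.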
Figure 3. Decision Table and REPtree methods with three different test modes.
The Decision Table method classifies the highest number of instances correctly, 14768, when the data (size 16000 x 27) are taken as the training set. The REPtree method classifies the lowest number of instances correctly, 14570, when the data (size 16000 x 27) are taken as the training set.
Table 4. Kappa statistics for different classification methods and modes of test.
Classification Method    Mode of Test     Kappa Statistics
Decision Table           10 fold          0.9343
Decision Table           75% splitting    0.9369
Decision Table           Training set     0.9388
REPtree                  10 fold          0.9128
REPtree                  75% splitting    0.9157
REPtree                  Training set     0.9128
Figure 4. Kappa statistics for different classification methods and modes of test.
The Decision Table method yields the highest kappa statistic, 0.9388. Kappa is a measure of agreement normalized for chance agreement.
     P(A) - P(E)
K = ---------------
      1 - P(E)
Here P(A) is the percentage agreement (e.g., between the classifier and the ground truth) and P(E) is the chance agreement.
K=1 indicates perfect agreement,
K=0 indicates chance agreement.
Kappa is a chance-corrected measure of agreement between the classifications and the true classes. It is calculated by subtracting the agreement expected by chance from the observed agreement and dividing by the maximum possible agreement beyond chance, 1 - P(E). A value greater than 0 means that the classifier is doing better than chance.
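The kappa formula K = (P(A) - P(E)) / (1 - P(E)) can be evaluated directly from a confusion matrix. The following is a minimal sketch, not the computation performed by the tool used in this work; the function name `cohen_kappa` and the 2 x 2 matrix are hypothetical, since the confusion matrices of the experiments are not reported here.

```python
def cohen_kappa(confusion):
    """Kappa for a square confusion matrix (rows: true class, columns: predicted)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # P(A): observed agreement, the fraction of instances on the diagonal.
    p_a = sum(confusion[i][i] for i in range(n)) / total
    # P(E): chance agreement, from the row and column marginal totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    # Undefined (division by zero) when P(E) == 1, i.e. only one class occurs.
    return (p_a - p_e) / (1 - p_e)


# Hypothetical two-class example: 85 of 100 instances on the diagonal.
example = [[45, 5],
           [10, 40]]
print(round(cohen_kappa(example), 4))
```

For this hypothetical matrix P(A) = 0.85 and P(E) = 0.5, so K = 0.35 / 0.5 = 0.7. A matrix with all instances on the diagonal gives K = 1, and a matrix whose agreement equals the chance level gives K = 0.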
Table 5. Classification factors of Decision Table and REPtree for different test modes (weighted averages).
Classification Method    Mode of Test     TP rate    FP rate    Precision    Recall    F-Measure    Time Taken (seconds)
Decision Table           10 fold          0.966      0.025      0.968        0.966     0.96         27.98
Decision Table           75% splitting    0.968      0.024      0.969        0.968     0.962        29.06
Decision Table           Training set     0.968      0.023      0.97         0.968     0.963        27.77
REPtree                  10 fold          0.955      0.033      0.92         0.955     0.937        36.24
REPtree                  75% splitting    0.957      0.031      0.922        0.957     0.939        38.08
REPtree                  Training set     0.955      0.033      0.92         0.955     0.937        37.23
CONCLUSIONS AND FUTURE WORK:
In this work, examination data have been analysed. The data have been classified using the Decision Table and REPtree methods, and the kappa statistic has been used to assess the agreement of the classifications. The work has been compared with the help of a well-known tool (Weka), which shows good results. In future, more data will be taken to analyse the results.
REFERENCES:
[01] Al-Hawaj, A. Y., Elali, W., Twizell, E. H. (Ed.) (2008): Higher Education in the Twenty-First Century: Issues and Challenges, Taylor & Francis Group, London
[02] Pausits, A., Pellert, A. (2007): Higher Education Management and Development in Central, Southern and Eastern Europe, WAXMANN Verlag, Munster
[03] GFME (2008): The Global Management Education Landscape: Shaping the future of business schools, Global Foundation for Management Education
[04] McKelvey, M., Holmén, M. (Ed.) (2009): Learning to Compete in European Universities: From Social Institution to Knowledge Business, Edward Elgar Publishing, Inc., Massachusetts
[05] NCVVO (2009): Vodič za provedbu samovrjednovanja u osnovnim školama, Nacionalni centar za vanjsko vrednovanje obrazovanja, Zagreb
[06] Vašiček, V., Budimir, V., Letinić, S. (2007): Pokazatelji uspješnosti u visokom obrazovanju, Privredna kretanja i ekonomska politika, 17 (110): str. 51 - 80.
[07] Orsingher, Ch. (Ed.) (2006): Assessing Quality in European Higher Education Institutions: Dissemination, Methods and Procedures, Physica-Verlag: Springer, Heidelberg
[08] Knust, M., Hanft, A. (Ed.) (2009): Continuing Higher Education and Lifelong Learning: An International Comparative Study on Structures, Organisation and Provisions, Springer Science & Business Media, Heidelberg
[09] Deem, R., Hillyard, S., Reed, M. (2007): Knowledge, Higher Education, and the New Managerialism: The Changing Management of UK Universities, Oxford University Press Inc., New York
[10] Michael, S. O., Kretovics, M. A. (Ed.) (2005): Financing Higher Education in a Global Market, Algora Publishing, New York
[11] Klosgen, W., & Zytkow, J. (2002). Handbook of data mining and knowledge discovery. New York: Oxford University Press.
[12] Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques. Morgan Kaufmann.
[13] Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern classification. Wiley Interscience.
[14] Chen, G., Liu, C., Ou, K., & Liu, B. (2000). Discovering decision knowledge from web log portfolio for managing classroom processes by applying decision tree and data cube technology. Journal of Educational Computing Research, 23(3), 305–332.
[15] Minaei-Bidgoli, B., & Punch, W. (2003). Using genetic algorithms for data mining optimization in an educational web-based system. In Genetic and evolutionary computation conference, Chicago, USA (pp. 2252–2263).