Analysis of Examination Results Data Using Various Mining Techniques

					                                   (IJCSIS) International Journal of Computer Science and Information Security,
                                                                                    Vol. 10, No. 8, August 2012

                                   DEVENDRA SINGH RAJPOOT
                               Ph.D. Scholor , UIT, RGPV,Bhoapl (M.P.)

Dr. Kanak Saxena                                                              Dr. Anubhuti Khare
Professor & Head,                                                             Associate Professor,
Computer Applications                                                         DoEC, UIT,
SATI, Vidisha (M.P.)                                                          RGPV,Bhopal (M.P.)                                              


This paper applies several pattern mining techniques from data mining, such as statistical techniques, classification and clustering. The domain we have chosen for this work is the university domain, because educational data mining is an emerging discipline concerned with developing methods for exploring the unique types of data that come from the educational context. With an increasing number of institutions and students, technical educational institutions are becoming increasingly oriented towards performance and its measurement, and are accordingly setting goals and developing strategies for their achievement [02]. This already happens in Europe (for example in Croatia) and in the USA [01], but is still lacking in India. The patterns extracted after applying the mining techniques clearly show the impact of subject contents on the students' careers under variations in the examination policy.

In our mining system, data preprocessing is the phase in which the data are cleaned of noise by overcoming the difficulties of recognizing students, semesters and branches, so that they can be used as input to the next phase, pattern discovery. In the pattern discovery phase, various mining algorithms are incorporated into the system to mine different types of patterns. In the pattern analysis phase, the mined patterns, which are large in number, are evaluated. Commonly a mining system comprises three parts:

   (i)     Data Preprocessing
   (ii)    Pattern Discovery
   (iii)   Pattern Analysis

General Mechanism: Data -> Pattern Discovery -> Pattern Analysis -> Predict user behavior

DATA DESCRIPTION:

The university holds millions of records on students who belong to various courses, years, semesters, etc., from which we took a sample of approximately 2 lakh (200,000) records. When we applied various analytical techniques, we found that the analysis took a very long time and that the data had to be pre-processed every time. For simplicity, we therefore took a particular semester and a specific range of years, 2004 to 2008, with only one course. The resulting sample contains about 16574 records. For the complete analysis, the data chosen from the university consist of 154 attributes in total. Applying the mining algorithms to the complete data causes execution problems because of the constraints of the computer system, so we reduced our data set to approximately 16574 records. The system undoubtedly accumulates a vast amount of information that is very valuable for analyzing student behavior and could create valuable information for the educational system but, as discussed earlier, mining the entire data would not be possible. Hence the data considered for the evaluation consist of Engineering III Semester (all disciplines) from 2004 to 2008. Interest in performance indicators in technical education has become extremely high; the reason lies in the relevant political and social changes of recent years [03,04,05,06,07,08,09,10].

Course                     Intake
BE (All discipline)        64430
B. Pharm.                   5880
MCA                         5980
B. Arch.                     300

Table 1. Total intake of students of the Technical University in the year 2008.

Figure 1. Total intake of students of the Technical University in the year 2008, shown as a pie chart.
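
The proportions behind the pie chart in Figure 1 can be recomputed from the intake figures in Table 1. The following is a minimal sketch in Python. The names INTAKE_2008 and intake_shares are illustrative and not from the original work, and it is an assumption that the slices of Figure 1 correspond to these shares.

```python
# Intake figures copied from Table 1 (Technical University, 2008).
INTAKE_2008 = {
    "BE (All discipline)": 64430,
    "B. Pharm.": 5880,
    "MCA": 5980,
    "B. Arch.": 300,
}


def intake_shares(intake):
    """Return each course's share of the total intake, in percent."""
    total = sum(intake.values())
    return {course: 100.0 * seats / total for course, seats in intake.items()}


if __name__ == "__main__":
    # Print one line per course with its percentage of the total intake.
    for course, share in intake_shares(INTAKE_2008).items():
        print(f"{course:<22}{share:6.2f}%")
```

On the Table 1 figures the total intake is 76590, of which BE accounts for roughly 84% and B. Arch. for well under 1%.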

WORK DONE:

Data mining is the process of efficient discovery of non-obvious, valuable patterns from a large collection of data [11]. To better comprehend the students' behavior, statistical data processing is performed. In the first segment, graphs are used to present the basic information on the structure of the students' data; in the second segment, the analysis is carried out using various regression techniques.

For this work we use Weka 3.6.2 because of its important characteristics [12]:

(i) It is a free software system implemented in Java.
(ii) It is open source software that provides a collection of machine learning and data mining algorithms.
(iii) The algorithms and routines can be modified using the same programming language.

PROPOSED METHOD:

With the increasing demand for technology, interest in the technical field is growing day by day, and as a result more students are taking admission in engineering. Compared to other courses, job opportunities are greater in the engineering field. Figure 1 shows the number of students who took admission, from which it is clear that students' interest in engineering is greater than in other courses. B. Pharm. is less in demand because of the smaller number of colleges, limited seats and fewer job opportunities in this field. Admission in MCA is lower because students nowadays prefer courses such as B.Tech., followed by M.Tech. after the bachelor's degree in engineering, as the number of seats has increased. The fewest admissions are in B. Arch., because students interested in this field tend to choose civil engineering as their subject.



Exam_Yr    Std_appear    Std_pass    Result %    Overall Result %
2004         7559          2840       37.57          43.64
2005         8148          4130       50.68          52.35
2006         9484          3992       42.09          49.78
2007        15944          6473       40.59          43.51
2008        17731         10475       59.07          52.18

Table 2. Number of students in Engineering and their results from 2004 to 2008.

Figure 2. Number of students in Engineering from 2004 to 2008 (legend: Exam_Yr, Std_app_301).

RESULT DISCUSSIONS:

Due to the increase in the number of engineering colleges and in the intake in the state, the number of students appearing in the examinations is also increasing. Table 2 lists the number of students who appeared and the number who passed in these examinations; despite the growing intake, the overall result declined in 2006 and 2007 and does not rise steadily from year to year. After analysis we found that in most years the failure rate exceeds the pass rate, as many students fail to clear the subject Mathematics-III. In 2004, 7559 students appeared in the examination, of whom 2840 passed and 4719 failed Mathematics-III; in 2005, 4130 students passed out of 8148 and 4018 failed; in 2006, 3992 passed out of 9484; in 2007, 6473 passed out of 15944; and in 2008, 10475 passed out of 17731.

For this we have used classification techniques. A classifier is a mapping from an input space X to a discrete set of labels Y [13]. These analyses predict the class label; they are based on supervised learning, which is provided with a collection of labeled, i.e. pre-classified, patterns. Classification has been used for discovering groups of students with similar characteristics and reactions to a specific pedagogical strategy [14], for predicting students' performance [15], and for assessing the relevance of the examination papers involved in a semester (regular as well as back papers).

Classification Method    Mode of Test     Correctly Classified Instances    Incorrectly Classified Instances
Decision Table           10 fold                     14732                              518
Decision Table           75% splitting                3697                              124
Decision Table           Training set                14768                              482
REPtree                  10 fold                     14570                              680
REPtree                  75% splitting                3657                              164
REPtree                  Training set                14570                              680

Table 3. Correctly classified and incorrectly classified instances for different classification methods and modes of test.

We performed a total of six classification experiments on the university data, using the Decision Table and REPtree methods with three different test modes (10-fold cross-validation, 75% split, full training set). The results are shown in Table 3 and Figure 3.
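
The counts in Table 3 are easier to compare as accuracies, since the 75% split is evaluated on fewer instances than the other two test modes. The sketch below, in Python, derives the accuracy of each experiment from the published counts. The names TABLE_3, accuracy and accuracies are illustrative, and the percentages are derived here rather than reported in the original work.

```python
# (method, test mode) -> (correctly classified, incorrectly classified),
# copied from Table 3.
TABLE_3 = {
    ("Decision Table", "10 fold"): (14732, 518),
    ("Decision Table", "75% splitting"): (3697, 124),
    ("Decision Table", "Training set"): (14768, 482),
    ("REPtree", "10 fold"): (14570, 680),
    ("REPtree", "75% splitting"): (3657, 164),
    ("REPtree", "Training set"): (14570, 680),
}


def accuracy(correct, incorrect):
    """Percentage of evaluated instances that were classified correctly."""
    return 100.0 * correct / (correct + incorrect)


def accuracies(table):
    """Map each (method, test mode) pair to its accuracy in percent."""
    return {key: accuracy(*counts) for key, counts in table.items()}


if __name__ == "__main__":
    for (method, mode), acc in accuracies(TABLE_3).items():
        print(f"{method:<16}{mode:<16}{acc:6.2f}%")
```

On these counts the Decision Table evaluated on the training set has the highest accuracy (about 96.84%), and REPtree with 10-fold cross-validation or on the training set has the lowest (about 95.54%). The counts sum to 15250 instances for the 10-fold and training-set modes and to 3821 for the 75% split.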



Figure 3. Decision Table and REPtree methods with three different test modes.

The Decision Table classification method correctly classifies the highest number of instances, 14768, when the data (size 16000 x 27) are taken as the training set. The REPtree classification method correctly classifies the lowest number of instances, 14570, when the data (size 16000 x 27) are taken as the training set.

Classification Method    Mode of Test     Kappa Statistics
Decision Table           10 fold              0.9343
Decision Table           75% splitting        0.9369
Decision Table           Training set         0.9388
REPtree                  10 fold              0.9128
REPtree                  75% splitting        0.9157
REPtree                  Training set         0.9128

Table 4. Kappa statistics for different classification methods and modes of test.

Figure 4. Kappa statistics for different classification methods and modes of test.

The Decision Table classification method yields the highest kappa statistic, 0.9388. Kappa is a measure of agreement normalized for chance agreement:

$$K = \frac{P(A) - P(E)}{1 - P(E)}$$

where P(A) is the observed agreement (e.g., between the classifier and the ground truth) and P(E) is the chance agreement, both expressed as proportions. K = 1 indicates perfect agreement and K = 0 indicates chance agreement. Kappa is thus a chance-corrected measure of agreement between the classifications and the true classes: it is calculated by subtracting the agreement expected by chance from the observed agreement and dividing by the maximum possible agreement beyond chance, 1 - P(E). A value greater than 0 means that the classifier is doing better than chance.
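
To make the kappa formula K = (P(A) - P(E)) / (1 - P(E)) concrete, the following Python sketch computes it from a confusion matrix. The function name cohen_kappa and the matrix EXAMPLE are illustrative only: the matrix values are invented for this example and are not taken from the university data. Estimating P(E) from the row and column totals is the usual convention and is assumed here.

```python
def cohen_kappa(confusion):
    """Kappa for a square confusion matrix (rows: true class, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    n = len(confusion)
    # P(A): observed agreement, the share of instances on the diagonal.
    p_a = sum(confusion[i][i] for i in range(n)) / total
    # P(E): chance agreement, estimated from the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_a - p_e) / (1.0 - p_e)


# Hypothetical two-class confusion matrix (illustrative values only).
EXAMPLE = [[45, 5],
           [10, 40]]

if __name__ == "__main__":
    print(f"kappa = {cohen_kappa(EXAMPLE):.4f}")
```

For the illustrative matrix, P(A) = 0.85 and P(E) = 0.50, so K = 0.70. A matrix with all instances on the diagonal gives K = 1, and one whose diagonal share equals the chance agreement gives K = 0.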



Table 5. Classification factors of Decision Table and REPtree on different test modes. [Columns: Classification Method; Mode of Test (10 fold, 75%, Training set); weighted average TP; weighted average FP; weighted average ...; weighted average recall; weighted average F-...; Time Taken (seconds). Cell values not preserved.]

In this work, an analysis of examination data has been carried out. The data were classified using Decision Table and REPtree, and the kappa statistic was used to compare the two methods. The work was compared with the help of a well-known tool, which shows good results. In future, more data will be taken to analyse the results.

REFERENCES:

[01] Al-Hawaj, A. Y., Elali, W., Twizell, E. H. (Ed.) (2008): Higher Education in the Twenty-First Century: Issues and Challenges, Taylor & Francis Group, London

[02] Pausits, A., Pellert, A. (2007): Higher Education Management and Development in Central, Southern and Eastern Europe, WAXMANN Verlag, Münster

[03] GFME (2008): The Global Management Education Landscape: Shaping the future of business schools, Global Foundation for Management Education

[04] McKelvey, M., Holmén, M. (Ed.) (2009): Learning to Compete in European Universities: From Social Institution to Knowledge Business, Edward Elgar Publishing, Inc., Massachusetts

[05] NCVVO (2009): Vodič za provedbu samovrjednovanja u osnovnim školama, Nacionalni centar za vanjsko vrednovanje obrazovanja, Zagreb

[06] Vašiček, V., Budimir, V., Letinić, S. (2007): Pokazatelji uspješnosti u visokom obrazovanju, Privredna kretanja i ekonomska politika, 17 (110): pp. 51-80

[07] Orsingher, Ch. (Ed.) (2006): Assessing Quality in European Higher Education Institutions: Dissemination, Methods and Procedures, Physica-Verlag: Springer

[08] Knust, M., Hanft, A. (Ed.) (2009): Continuing Higher Education and Lifelong Learning: An International Comparative Study on Structures, Organisation and Provisions, Springer Science & Business Media

[09] Deem, R., Hillyard, S., Reed, M. (2007): Knowledge, Higher Education, and the New Managerialism: The Changing Management of UK Universities, Oxford University Press Inc., New York

[10] Michael, S. O., Kretovics, M. A. (Ed.) (2005): Financing Higher Education in a Global Market, Algora Publishing, New York

[11] Klosgen, W., Zytkow, J. (2002): Handbook of Data Mining and Knowledge Discovery, Oxford University Press, New York

[12] Witten, I. H., Frank, E. (2005): Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann

[13] Duda, R. O., Hart, P. E., Stork, D. G. (2000): Pattern Classification, Wiley Interscience

[14] Chen, G., Liu, C., Ou, K., Liu, B. (2000): Discovering decision knowledge from web log portfolio for managing classroom processes by applying decision tree and data cube technology, Journal of Educational Computing Research, 23(3): pp. 305-332

[15] Minaei-Bidgoli, B., Punch, W. (2003): Using genetic algorithms for data mining optimization in an educational web-based system, Genetic and Evolutionary Computation Conference, Chicago, USA: pp. 2252-2263


                                                                                                                                                                                                             ISSN 1947-5500
