(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 7, July 2011
ISSN 1947-5500, http://sites.google.com/site/ijcsis/

Analysis of Educational Web Pattern Using Adaptive Markov Chain for Next Page Access Prediction

Harish Kumar, PhD scholar, Mewar University, Chittorgarh.
Dr. Anil Kumar Solanki, MIET Meerut.

ABSTRACT
The Internet is growing at an amazing rate as an information gateway and as a medium for the business and education industries. Universities that offer web education rely on web usage analysis to learn about student behavior for web marketing. Web Usage Mining (WUM) integrates the techniques of two popular research fields, Data Mining and the Internet. It attempts to discover useful knowledge from secondary data (Web logs). The resulting patterns are used to analyze visitors' activities on web sites. Many servers manage their own cookies to distinguish server addresses. User navigation patterns take the form of web logs. These navigation patterns are refined, resized and modeled in a new format, a method known as "Loginizing". In this paper we study the navigation patterns obtained from web usage and model them as a Markov chain. The chain works on the higher probability of usage. The Markov chain is modeled for the collection of navigation patterns and used to find the most likely navigation pattern for a web site.

Keywords: Web mining, web usage, web logs, Markov Chain.

INTRODUCTION:
The IT revolution is the fastest emerging revolution seen by the human race. Online education and Web-based information are spreading across the Internet, and the volume of clicks on web sites has reached huge proportions. The Internet and the common use of educational databases have created a great need for KDD methodologies. The Internet is an infinite source of data that can come either from the Web content, represented by the billions of pages publicly available, or from the Web usage, represented by the log information collected daily by all the servers around the world [1][2]. Collecting information through data mining has allowed e-education applications to earn more revenue by making better use of the Internet, which helps students to make more decisions. Knowledge Discovery and Data Mining (KDD) is an interdisciplinary area focusing on methodologies for mining useful information or knowledge from data [1]. Users leave navigation traces, which can be used as a basis for user behavior analysis. In the field of web applications, similar analyses have been carried out successfully with the methods of Web Usage Mining [2][3]. The challenge of extracting knowledge from data draws upon research in statistics, databases, pattern recognition, machine learning, data visualization, optimization, web user behavior and high-performance computing to deliver advanced business intelligence and web discovery solutions [3][4]. It is a powerful technology with great potential to help various industries focus on the most important information in their data warehouses. Data mining can be viewed as a result of the natural evolution of information technology.

In Web usage analysis, the data are the sessions of the site visitors: the activities performed by a user from the moment he enters the site until the moment he leaves it. Web usage mining consists of applying data mining techniques to analyze web users' activity. In educational contexts, it has been used for personalizing e-learning and adapting educational hypermedia, discovering potential browsing problems, automatically recognizing learner groups in exploratory learning environments, and predicting student performance. The discovered patterns are usually represented as collections of web pages, objects or resources that are frequently accessed by groups of users with common needs or interests [10][11].

Users generally visit a web site sequentially: a user visits the home page first, then a second page, then a third, and then finishes his work. In doing so, the user leaves navigation marks on the server. These navigation marks are called a navigation pattern, which can be used to decide the next likely web page request based on statistically significant correlations. If a sequence occurs very frequently, it indicates the most likely traversal pattern. When such patterns occur sequentially, Markov chains have been used to represent the navigation patterns of the web site, because in a Markov chain the present state depends on the previous state. If a web site contains many navigation patterns, a high support threshold is assigned to the "interesting patterns" and the less interesting patterns are ignored. Different levels of a web site therefore need different threshold values.

Important properties of Markov Chain:
1.   A Markov chain is successful in sequence matching and generation.
2.   A Markov model depends on the previous state.
3.   A Markov chain model is generative.
4.   A Markov chain is a discrete-time stochastic process.

Due to the generative nature of the Markov chain, navigation tours can be derived automatically. Sarukkai proposed a technique in which a Markov model predicts the next page accessed by the user [4][2]. Pitkow and Deshpande, and Dongshan and Junyi, proposed various techniques for log mining using Markov models [5][2].

METHODOLOGY:
The Markov model is one of the easiest ways of representing navigation patterns and the navigation tree. Suppose we have an educational web site of a university. The navigation pattern sequences are:
1.   ABCDEF
2.   ACF
3.   ACE
4.   BCD

Navigation Pattern                    Frequency of visit
SABCDEFT                              3
SACFT                                 2
SACET                                 3
SBCDT                                 2
Total No. of web site navigations     10

Table 1: Navigation pattern table
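As a rough illustration, the patterns of Table 1 can be encoded as weighted paths and tallied. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `PATTERNS`, `tally`, `visit_share` and `link_ratio` are hypothetical, and the link ratio shown here divides the number of traversals of a link by the number of visits to the page the link leads to.

```python
from collections import Counter
from fractions import Fraction

# Table 1: navigation pattern -> frequency of visit.
# S and T mark the start and the end of a navigation.
PATTERNS = {"SABCDEFT": 3, "SACFT": 2, "SACET": 3, "SBCDT": 2}


def tally(patterns):
    """Return weighted visit counts per state and traversal counts per link."""
    visits = Counter()
    links = Counter()
    for path, freq in patterns.items():
        for state in path:
            visits[state] += freq
        # Consecutive states in a path form one traversed link.
        for src, dst in zip(path, path[1:]):
            links[(src, dst)] += freq
    return visits, links


visits, links = tally(PATTERNS)

# Share of all page visits that falls on each page (S and T excluded).
pages = "ABCDEF"
total_page_visits = sum(visits[p] for p in pages)
visit_share = {p: Fraction(visits[p], total_page_visits) for p in pages}

# Assumed ratio: traversals of a link / visits to the destination page.
link_ratio = {link: Fraction(n, visits[link[1]]) for link, n in links.items()}

print(total_page_visits)         # 39
print(visit_share["C"])          # 10/39
print(link_ratio[("A", "B")])    # 3/5
print(visits["T"])               # 10
```

With the Table 1 data, the tallies give 8, 5, 10, 5, 6 and 5 visits for pages A to F, 39 page visits in total, and 10 arrivals at T. Exact fractions are used so the results can be compared with hand calculations without rounding.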





The probability of a transition is calculated as the ratio of the number of times the corresponding sequence of pages was traversed to the number of times the hyperlinked page was visited. The page states are accompanied by two other states, a Start state (S) and a Terminal state (T).

The probability of a hyperlink is based on the content of the page being viewed. Navigation control reaches T a total of 10 times. The navigation matrix is as follows:

        A      B      C      D      E      F      T
A       0      3/5    1/2    0      0      0      0
B       0      0      1/2    0      0      0      0
C       0      0      0      1      1/2    2/5    0
D       0      0      0      0      1/2    0      1/5
E       0      0      0      0      0      3/5    3/10
F       0      0      0      0      0      0      1/2
T       0      0      0      0      0      0      1

Table 2: Frequency of each node and their probability.

So we can identify that the total probability of a visit to A is 8/39, B is 5/39, C is 10/39, D is 5/39, E is 6/39 and F is 5/39. Here $NP_{ij}$ is the navigation probability matrix, where $NP_{ij}$ is the probability that the next state will be $j$. The navigation probability is defined as

$$NP_{ij} \in [0, 1]$$

and, for all $j$, $\sum_{i} NP_{ij} = 1$. The initial probability of a state is estimated from the number of times the page was requested by users, so every state has a positive probability.

The traditional Markov model has some limitations, which are as follows.

1.   Low-order Markov models have good coverage but are less accurate due to poor history.
2.   High-order Markov models suffer from high state space complexity.

In a higher-order Markov model, the number of states increases exponentially with the order of the model. This exponential growth in the number of states enlarges the search space and the complexity. Higher-order Markov models also have a low coverage problem. In the proposed model, each request with its time duration is considered as a state. A session is a sequence of such states. The m-step Markov model assumes that the next request depends only on the last m requests. Hence, the probability of the next request is calculated by

$$P(r_{n+1} \mid r_n, \ldots, r_1) = P(r_{n+1} \mid r_n, \ldots, r_{n-m+1}),$$

where $r_i$ is the $i$-th request in a session, $i = 1, 2, \ldots, n$, $r_n$ is the current request, and $r_{n+1}$ is the next request. From this equation, if $m = 1$ (the 1-step model), the next request is determined only by the current request [5]. The matrix CM holds the conditional probabilities of previous occurrences. The state matrix CM is a square matrix. So we need to calculate the probability of each page, and we need to design a model that is dynamic in nature, meaning that prediction is based on the next incoming and outgoing node. The Markov model construction starts with the first row of the table (the first navigation pattern).
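A minimal sketch of how such a row-by-row construction might look is given below. It assumes that each page is kept as a node holding its name, a visit count, and lists of incoming and outgoing links, stored here as dictionaries of traversal counts. The class and method names (`PageNode`, `DynamicMarkovModel`, `add_pattern`, `predict_next`) are hypothetical and chosen only for illustration, and picking the most frequently followed outgoing link is one simple prediction rule, not necessarily the one the authors use.

```python
from dataclasses import dataclass, field


@dataclass
class PageNode:
    """One web page in the model (hypothetical structure for illustration)."""
    name: str
    count: int = 0                                 # how often the page was visited
    inlinks: dict = field(default_factory=dict)    # predecessor page -> traversals
    outlinks: dict = field(default_factory=dict)   # successor page -> traversals


class DynamicMarkovModel:
    """First-order model that is updated one navigation pattern at a time."""

    def __init__(self):
        self.nodes = {}

    def _node(self, name):
        if name not in self.nodes:
            self.nodes[name] = PageNode(name)
        return self.nodes[name]

    def add_pattern(self, path, freq=1):
        """Merge one navigation pattern, e.g. 'SABCDEFT', seen `freq` times."""
        for page in path:
            self._node(page).count += freq
        for src, dst in zip(path, path[1:]):
            out = self._node(src).outlinks
            out[dst] = out.get(dst, 0) + freq
            inn = self._node(dst).inlinks
            inn[src] = inn.get(src, 0) + freq

    def predict_next(self, page):
        """Return the most frequently followed outlink of `page`, or None."""
        node = self.nodes.get(page)
        if node is None or not node.outlinks:
            return None
        return max(node.outlinks, key=node.outlinks.get)


# Build the model from the weighted patterns of Table 1, one row at a time.
model = DynamicMarkovModel()
for pattern, freq in [("SABCDEFT", 3), ("SACFT", 2), ("SACET", 3), ("SBCDT", 2)]:
    model.add_pattern(pattern, freq)

print(model.nodes["C"].count)      # 10
print(model.nodes["C"].outlinks)   # {'D': 5, 'F': 2, 'E': 3}
print(model.predict_next("C"))     # D
print(model.predict_next("A"))     # C
```

Because every call to `add_pattern` updates the counts in place, the prediction for a page can change as new sessions arrive, which is the sense in which this sketch treats the model as dynamic.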




Figure: First Order Dynamic Markov Model (for pattern 1)

Similarly, we create pattern chains for all the above patterns of Table 1.

Figure: First Order Dynamic Markov Model (for pattern 2)

We then summarise the above pattern chains into one model and set the inlinks and outlinks. Each node thus contains the name of the web page, the count of the web page, an inlink list and an outlink list.

Figure: Dynamic Markov Model Node

The inlink list contains pointers to the inlink web pages and the outlink list contains the outlink web pages. Every node contains its frequency as well (as per Table 2). The frequency of a visited node changes whenever the number of inlink pointers increases, that is, whenever the page is visited by a user. This helps us to predict the next web page before control leaves a page or reaches another page. With this dynamic Markov model it is possible to predict the most probable next web page accessed by the user.

CONCLUSION:
The main goal of this paper is to analyze hidden information in large amounts of log data. The paper emphasizes the dynamic Markov chain model among the different processes. We define a novel approach for similar kinds of web access patterns. This approach serves as a foundation for the web usage clustering that was described, and we conclude that web mining methods and clustering techniques are used for self-adaptive and intelligent websites to provide personalized service and performance optimization.

REFERENCES:
[1] Ajith Abraham, "Business Intelligence from Web Usage Mining", Journal of Information & Knowledge Management, Vol. 2, No. 4 (2003), pp. 375-390.
[2] José Borges, Mark Levene, "An Average Linear Time Algorithm for Web Usage Mining", Sept 2003.
[3] Hengshan Wang, Cheng Yang, Hua Zeng, "Design and Implementation of a Web Usage Mining Model Based On Fpgrowth and Prefixspan", Communications of the IIMA, Volume 6, Issue 2.
[4] Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan, "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data", Volume 1, Issue 2, p. 13.
[5] Alice Marques, Orlando Belo, "Discovering Student Web Usage Profiles Using Markov Chains", The Electronic Journal of e-Learning, Volume 9, Issue 1, 2011, pp. 63-74.
[6] Ji He, Man Lan, Chew-Lim Tan, Sam-Yuan Sung, Hwee-Boon Low, "Initialization of Cluster refinement algorithms: a review and comparative study", Proceedings of International Joint Conference on Neural Networks, Budapest, 2004.



[7] Renata Ivancsy, Ferenc Kovacs, "Clustering Techniques Utilized in Web Usage Mining", International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases, Madrid, Spain, February 15-17, 2006, pp. 237-242.
[8] Bradley P. S., Fayyad U. M., "Refining Initial Points for K-means Clustering", Advances in Knowledge Discovery and Data Mining, MIT Press.
[9] Ruoming Jin, Anjan Goswami, Gagan Agrawal, "Fast and exact out-of-core and distributed k-means clustering", Knowledge and Information Systems, Volume 10, Number 1, July 2006.
[10] Bhawna N., Suresh J., "Generating a New Model for Predicting the Next Accessed Web Page in Web Usage Mining", Third International Conference on Emerging Trends in Engineering and Technology, ICETET.2010.56.
[11] Bindu Madhuri, Dr. Anand Chandulal J., Ramya K., Phanidra M., "Analysis of Users' Web Navigation Behavior using GRPA with Variable Length Markov Chains", IJDKP.2011.1201.

AUTHORS PROFILE

Harish Kumar completed his M.Tech (IT) in 2009 at Guru Gobind Singh Indraprastha University, Delhi. He is currently pursuing his PhD at Mewar University, Chittorgarh.

Prof. A.K. Solanki, Director of the Institute, obtained his Ph.D. in Computer Science & Engineering from Bundelkhand University, Jhansi. He has published a good number of international and national research papers in the area of data warehousing and web mining, and is always ready to teach these subjects to his students, which he does with great finesse.


