Analysis of Educational Web Pattern Using Adaptive Markov Chain For Next Page Access Prediction
The International Journal of Computer Science and Information Security (IJCSIS, ISSN 1947-5500) is an open access, international, peer-reviewed, scholarly journal with a focused aim of promoting and publishing original high quality research dealing with theoretical and scientific aspects in all disciplines of Computing and Information Security. The journal is published monthly, and articles are accepted for review on a continual basis. Papers that provide both theoretical analysis and carefully designed computational experiments are particularly welcome. The IJCSIS editorial board consists of several internationally recognized experts and guest editors. Wide circulation is assured because libraries and individuals worldwide subscribe to and reference IJCSIS. The Journal has grown rapidly to its current level of over 1,100 articles published and indexed, with distribution to librarians, universities, research centers, researchers in computing, and computer scientists. Other field coverage includes: security infrastructures; network security: Internet security, content protection, cryptography, steganography and formal methods in information security; multimedia systems, software, information systems, intelligent systems, web services, data mining, wireless communication, networking and technologies, innovation technology and management. (See monthly Call for Papers.) Since 2009, IJCSIS has been published using an open access publication model, meaning that all interested readers can freely access the journal online without the need for a subscription. We wish to make IJCSIS a first-tier journal in the computer science field, with a strong impact factor. On behalf of the Editorial Board and the IJCSIS members, we would like to express our gratitude to all authors and reviewers for their sustained support. The acceptance rate for this issue is 32%. I am confident that the readers of this journal will explore new avenues of research and academic excellence.

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 7, July 2011
Analysis of Educational Web Pattern Using Adaptive Markov Chain for Next Page Access Prediction
Harish Kumar, PhD scholar, Mewar University, Chittorgarh.
Dr. Anil Kumar Solanki, MIET Meerut.
ABSTRACT
The Internet grows at an amazing rate as an information gateway and as a medium for the business and education industries. Universities offering web-based education rely on web usage analysis to understand student behavior for web marketing. Web Usage Mining (WUM) integrates the techniques of two popular research fields, Data Mining and the Internet. Web usage mining attempts to discover useful knowledge from secondary data (web logs). These data patterns are used to analyze visitors' activities on web sites. Many servers manage their cookies for distinguishing server addresses. User navigation patterns are available in the form of web logs. These navigation patterns are refined, resized and modeled in a new format; this method is known as “Loginizing”. In this paper we study the navigation patterns obtained from web usage and model them as a Markov chain. The chain works on the higher probability of usage. The Markov chain is modeled on the collection of navigation patterns and used to find the most likely navigation pattern for a web site.

Keywords: Web mining, web usage, web logs, Markov Chain.

INTRODUCTION:
The IT revolution is the fastest emerging revolution seen by the human race. With the growth of the Internet, online education, Web-based information and the volume of clicks on web sites have reached huge proportions. The Internet and the common use of educational databases have created a great need for KDD methodologies. The Internet is an immense source of data that can come either from Web content, represented by the billions of publicly available pages, or from Web usage, represented by the log information collected daily by servers around the world [1][2]. Information collected through data mining has allowed e-education applications to increase revenue by making better use of the Internet, which in turn helps students make decisions. Knowledge Discovery and Data Mining (KDD) is an interdisciplinary area focusing upon methodologies for mining useful information or knowledge from data [1]. Users leave navigation traces, which can be used as a basis for user behavior analysis. In the field of web applications, similar analyses have been carried out successfully with the methods of Web Usage Mining [2][3]. The challenge of extracting knowledge from data draws upon research in statistics, databases, pattern recognition, machine learning, data visualization, optimization, web user behavior and high-performance computing, to deliver advanced business intelligence and web discovery solutions [3][4]. It is a powerful technology with great potential to help various industries focus on the most important information in their data warehouses. Data mining can be viewed as a result of the natural evolution of information technology. In Web usage analysis, these data are the sessions of the site visitors: the activities performed by a user from the moment he enters the site until the moment he leaves it. Web usage mining consists of applying data mining techniques to analyze web users' activity. In educational contexts, it has been used for personalizing e-learning and adapting educational hypermedia, discovering potential browsing problems, automatically recognizing learner groups in exploratory learning environments, and predicting student performance. The discovered patterns are usually represented as collections of web pages, objects or resources that are frequently accessed by groups of users with common needs or interests [10][11].
Generally a user visits a web site sequentially: the user visits the home page first, then a second page, then a third, and then finishes his work. In doing so the user leaves navigation marks on the server. These navigation marks are called a navigation pattern, which can be used to decide the next likely web page request based on statistically significant correlations. If a sequence occurs very frequently, it indicates the most likely traversal pattern. When patterns occur sequentially, Markov chains have been used to represent the navigation patterns of the web site, because in a Markov chain the present state depends on the previous state. If a web site contains more navigation patterns (“interesting patterns”), a high support threshold is assigned to them and less interesting patterns are ignored. So we can say that at different levels of a web site we need to assign different threshold values.

Important properties of Markov Chain:
1. Markov Chain is successful in sequence matching generation.
2. Markov model depends on the previous state.
3. Markov Chain model is generative.
4. Markov Chain is a discrete-time stochastic process.

Due to the generative nature of Markov chains, navigation tours can be derived automatically. Sarukkai proposed a technique by which a Markov model predicts the next page accessed by the user [4][2]. Pitkow and Deshpande, and Dongshan and Junyi, proposed various techniques for log mining using Markov models [5][2].

METHODOLOGY:
The Markov model is an easy way of representing navigation patterns and the navigation tree. Suppose we have an educational web site of a university.
Navigation pattern sequences are
1. ABCDEF
2. ACF
3. ACE
4. BCD

Navigation Pattern                  Frequency of visit
SABCDEFT                            3
SACFT                               2
SACET                               3
SBCDT                               2
Total No. of web site navigations   10

Table 1: Navigation pattern table
125 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
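As a small illustration, the patterns of Table 1 can be held as a mapping from pattern to frequency and tallied into per-page visit counts. The sketch below is only one possible way to do this; the names `patterns` and `page_visits` are illustrative assumptions and not part of the original method, and S and T are treated as artificial start and terminal markers that are left out of the page counts.

```python
from collections import Counter

# Table 1: navigation pattern -> frequency of visit.
# S and T are the artificial start and terminal states.
patterns = {"SABCDEFT": 3, "SACFT": 2, "SACET": 3, "SBCDT": 2}


def page_visits(patterns):
    """Count how often each real page (not S or T) is visited."""
    visits = Counter()
    for pattern, freq in patterns.items():
        for page in pattern:
            if page not in ("S", "T"):
                visits[page] += freq
    return visits


total_sessions = sum(patterns.values())  # number of navigations
visits = page_visits(patterns)
total_visits = sum(visits.values())      # number of page visits

print("sessions:", total_sessions)
for page in sorted(visits):
    print(f"{page}: {visits[page]}/{total_visits}")
```

Under these assumptions the tally gives 10 sessions and 39 page visits, with page C visited most often (10 times).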
The probability of transition is calculated as the ratio of the number of times the corresponding sequence of pages was traversed to the number of times the hyperlinked page was visited. The page states are complemented by two additional states, a Start state (S) and a Terminal state (T).
The probability of a hyperlink is based on the content of the page being viewed. Navigation control reaches T a total of 10 times. The navigation matrix is as follows:

      A     B     C     D     E     F     T
A     0     3/5   1/2   0     0     0     0
B     0     0     1/2   0     0     0     0
C     0     0     0     1     1/2   2/5   0
D     0     0     0     0     1/2   0     1/5
E     0     0     0     0     0     3/5   3/10
F     0     0     0     0     0     0     1/2
T     0     0     0     0     0     0     1

Table 2: Frequency of each node and its probability.

Counting the page visits in Table 1 gives 39 visits in total (8 + 5 + 10 + 5 + 6 + 5), so the total probability of visit of A is 8/39, B is 5/39, C is 10/39, D is 5/39, E is 6/39 and F is 5/39. Here $NP_{ij}$ is the navigation probability matrix, where $NP_{ij}$ is the probability that the next state will be $j$. The navigation probability is defined by

$NP_{ij} \in [0, 1]$ and, for all $j$, $\sum_{i} NP_{ij} = 1$.

The initial probability of a state is estimated from the number of times the page was requested by users, so every state has a positive probability.
The traditional Markov model has some limitations, which are as follows.
1. Low-order Markov models have good coverage but are less accurate because of their short history.
2. High-order Markov models suffer from high state-space complexity.
In a higher-order Markov model the number of states increases exponentially with the order of the model. This exponential growth in the number of states increases the search space and complexity. Higher-order Markov models also have a low coverage problem. In the proposed model, each request with its time duration is considered a state. A session is a sequence of such states. The m-step Markov model assumes that the next request depends only on the last m requests. Hence, the probability of the next request is calculated by

$P(r_{n+1} \mid r_n, \ldots, r_1) = P(r_{n+1} \mid r_n, \ldots, r_{n-m+1})$,

where $r_i$ is the $i$-th request in a session, $i = 1, 2, \ldots, n$, $r_n$ is the current request, and $r_{n+1}$ is the next request. From this equation, if m = 1 (the 1-step model), the next request is determined only by the current request [5]. The matrix CM holds the conditional probabilities of previous occurrences. The state matrix CM is a square matrix. So we need to calculate the probability of each page. We therefore need to design a model that is dynamic in nature, meaning that prediction is based on the next incoming and outgoing nodes. The Markov model construction starts with the first row of Table 1 (the first navigation pattern).
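The construction and the 1-step (m = 1) case can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class, the function names, and the choice of "most frequent successor" as the prediction rule are all hypothetical. Each page is kept as a node with a visit count and incoming and outgoing link counts, the patterns are added one at a time, and `navigation_probability` uses the ratio described above (traversals of a link divided by visits to the page it leads to).

```python
from collections import defaultdict
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Node:
    """One page: visit count plus incoming and outgoing link counts (assumed layout)."""
    name: str
    count: int = 0
    inlinks: dict = field(default_factory=lambda: defaultdict(int))
    outlinks: dict = field(default_factory=lambda: defaultdict(int))


def build_model(patterns):
    """Add the navigation patterns one at a time, updating the counts as each is seen."""
    nodes = {}
    for pattern, freq in patterns.items():
        for page in pattern:
            nodes.setdefault(page, Node(page)).count += freq
        for src, dst in zip(pattern, pattern[1:]):
            nodes[src].outlinks[dst] += freq
            nodes[dst].inlinks[src] += freq
    return nodes


def navigation_probability(nodes, src, dst):
    """Traversals src -> dst divided by the number of visits to dst."""
    return Fraction(nodes[src].outlinks.get(dst, 0), nodes[dst].count)


def predict_next(nodes, current):
    """1-step prediction (assumed rule): the most frequent successor of `current`."""
    out = nodes[current].outlinks
    return max(out, key=out.get) if out else None


# Table 1, with S and T as start and terminal states.
patterns = {"SABCDEFT": 3, "SACFT": 2, "SACET": 3, "SBCDT": 2}
nodes = build_model(patterns)

print(navigation_probability(nodes, "A", "B"))  # 3/5
print(navigation_probability(nodes, "E", "T"))  # 3/10
print(predict_next(nodes, "C"))                 # D
```

With the data of Table 1 the ratios agree with the entries of Table 2, and the page predicted after C is D, since C is followed by D in 5 of the 10 navigations.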
Figure: First Order Dynamic Markov Model (for pattern 1)
Similarly, we create pattern chains for all the above patterns of Table 1.
Figure: First Order Dynamic Markov Model (for pattern 2)
Summarize the above pattern chains into one model and set the inlinks and outlinks. Each node then contains the name of the web page, the count of the web page, an inlink list and an outlink list.
Figure: Dynamic Markov Model Node
The inlink list contains pointers to the inlink web pages and the outlink list contains the outlink web pages. Every node contains its frequency as well (as per Table 2). The frequency of a visited node changes whenever the number of inlink pointers increases, that is, whenever the page is visited by a user. This helps us predict the next web page before control leaves a page or reaches another page. With this dynamic Markov model it is possible to predict the most probable next web page accessed by the user.

CONCLUSION:
The main goal of this paper is to analyze hidden information in large amounts of log data. The paper emphasizes the dynamic Markov chain model among the different processes. I define a novel approach for similar kinds of web access patterns. This approach serves as a foundation for the web usage clustering that was described, and I conclude that web mining methods and clustering techniques are used in self-adaptive and intelligent websites to provide personalized service and performance optimization.

REFERENCES:
[1] Ajith Abraham, “Business Intelligence from Web Usage Mining”, Journal of Information & Knowledge Management, Vol. 2, No. 4 (2003), pp. 375-390.
[2] José Borges, Mark Levene, “An Average Linear Time Algorithm for Web Usage Mining”, Sept 2003.
[3] Hengshan Wang, Cheng Yang, Hua Zeng, “Design and Implementation of a Web Usage Mining Model Based On Fpgrowth and Prefixspan”, Communications of the IIMA, Volume 6, Issue 2.
[4] Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan, “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data”, Volume 1, Issue 2, page 13.
[5] Alice Marques, Orlando Belo, “Discovering Student Web Usage Profiles Using Markov Chains”, The Electronic Journal of e-Learning, Volume 9, Issue 1, 2011 (pp. 63-74).
[6] Ji He, Man Lan, Chew-Lim Tan, Sam-Yuan Sung, Hwee-Boon Low, “Initialization of Cluster refinement algorithms: a review and comparative study”, Proceedings of the International Joint Conference on Neural Networks, Budapest, 2004.
[7] Renata Ivancsy, Ferenc Kovacs, “Clustering Techniques Utilized in Web Usage Mining”, International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases, Madrid, Spain, February 15-17, 2006 (pp. 237-242).
[8] Bradley P. S., Fayyad U. M., “Refining Initial Points for K-means Clustering”, Advances in Knowledge Discovery and Data Mining, MIT Press.
[9] Ruoming Jin, Anjan Goswami and Gagan Agrawal, “Fast and exact out-of-core and distributed k-means clustering”, Knowledge and Information Systems, Volume 10, Number 1, July 2006.
[10] Bhawna N. and Suresh J., “Generating a New Model for Predicting the Next Accessed Web Page in Web Usage Mining”, Third International Conference on Emerging Trends in Engineering and Technology, ICETET.2010.56.
[11] Bindu Madhuri, Dr. Anand Chandulal J., Ramya K., Phanidra M., “Analysis of Users’ Web Navigation Behavior using GRPA with Variable Length Markov Chains”, IJDKP.2011.1201.
AUTHORS PROFILE
Harish Kumar completed his M.Tech (IT) in 2009 from Guru Gobind Singh Indraprastha University, Delhi. He is currently pursuing his PhD at Mewar University, Chittorgarh.
Prof. A.K. Solanki, Director of the Institute, obtained his Ph.D. in Computer Science & Engineering from Bundelkhand University, Jhansi. He has published a good number of international and national research papers in the area of data warehousing and web mining, and is always ready to teach these subjects to his students, which he does with great finesse.