Analysis of Educational Web Pattern Using Adaptive Markov Chain For Next Page Access Prediction
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 7, July 2011
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

Harish Kumar, PhD scholar, Mewar University, Chittorgarh
Dr. Anil Kumar Solanki, MIET Meerut

ABSTRACT
The Internet grows at an amazing rate as an information gateway and as a medium for the business and education industries. Universities offering web education rely on web usage analysis to obtain student behavior for web marketing. Web Usage Mining (WUM) integrates the techniques of two popular research fields, data mining and the Internet. Web usage mining attempts to discover useful knowledge from secondary data (web logs). These data patterns are used to analyze visitors' activities on web sites. Many servers manage their cookies for distinguishing server addresses. User navigation patterns are recorded in the form of web logs. These navigation patterns are refined, resized and modeled in a new format; this method is known as "Loginizing". In this paper we study navigation patterns from web usage and model them as a Markov chain. The chain works on the higher probability of usage: it is built from the collection of navigation patterns and used to find the most likely navigation pattern for a web site.

Keywords: Web mining, web usage, web logs, Markov chain.

INTRODUCTION:
The IT revolution is the fastest emerging revolution seen by the human race. The Internet now spans online education and web-based information, and the volume of clicks on web sites has reached huge proportions. The Internet and the common use of educational databases have created a huge need for KDD methodologies. The Internet is a vast source of data that can come either from Web content, represented by the billions of pages publicly available, or from Web usage, represented by the log information collected daily by all the servers around the world. The information collected through data mining has allowed e-education applications to generate more revenue by making better use of the Internet, which helps students make more decisions. Knowledge Discovery and Data Mining (KDD) is an interdisciplinary area focusing on methodologies for mining useful information or knowledge from data. Users leave navigation traces, which can be used as a basis for user behavior analysis. In the field of web applications, similar analyses have been carried out successfully by methods of Web Usage Mining. The challenge of extracting knowledge from data draws upon research in statistics, databases, pattern recognition, machine learning, data visualization, optimization, web user behavior and high-performance computing, to deliver advanced business intelligence and web discovery solutions. It is a powerful technology with great potential to help various industries focus on the most important information in their data warehouses. Data mining can be viewed as a result of the natural evolution of information technology.

In Web usage analysis, the data are the sessions of the site visitors: the activities performed by a user from the moment he enters the site until the moment he leaves it. Web usage mining consists of applying data mining techniques to analyze web users' activity. In educational contexts, it has been used for personalizing e-learning and adapting educational hypermedia, discovering potential browsing problems, automatically recognizing learner groups in exploratory learning environments, and predicting student performance. The discovered patterns are usually represented as collections of web pages, objects or resources that are frequently accessed by groups of users with common needs or interests.

Generally a user visits a web site sequentially: first the home page, then a second page, then a third, until the work is finished. In doing so the user leaves navigation marks on the server. These navigation marks are called navigation patterns, and they can be used to decide the next likely web page request based on statistically significant correlations. If a sequence occurs very frequently, it indicates the most likely traversal pattern.

Because these patterns are sequential, Markov chains have been used to represent the navigation patterns of a web site: in a Markov chain the present state depends on the previous state. If a web site contains more navigation patterns ("interesting patterns"), a high support threshold is assigned to it and less interesting patterns are ignored. So we can say that at different levels of a web site we need to assign different threshold values.

Important properties of Markov Chain:
1. A Markov chain is successful in sequence matching and generation.
2. A Markov model depends on the previous state.
3. The Markov chain model is generative.
4. A Markov chain is a discrete-time stochastic process.

Due to the generative nature of Markov chains, navigation tours can be derived automatically. Sarukkai proposed a technique in which a Markov model predicts the next page accessed by the user. Pitkow and Deshpande, and Dongshan and Junyi, proposed various techniques for log mining using Markov models.

METHODOLOGY:
The Markov model is one of the easiest ways of representing navigation patterns and the navigation tree. Suppose we have a web site of a university. The navigation pattern sequences are:
1. ABCDEF
2. ACF
3. ACE
4. BCD

In Table 1 each pattern is written with a leading S and a trailing T, which mark the start and the end of a navigation.

Navigation Pattern               Frequency of visit
SABCDEFT                         3
SACFT                            2
SACET                            3
SBCDT                            2
Total no. of navigations         10

Table 1: Navigation pattern table
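The frequencies in Table 1 are enough to tally how often each page is visited. The following is a minimal Python sketch of that tally, not the authors' implementation; the names `PATTERNS`, `visit_counts` and `visit_probabilities` are illustrative assumptions, and S and T are treated as start and terminal markers rather than as pages.

```python
from collections import Counter
from fractions import Fraction

# Table 1: navigation pattern -> frequency of visit.
# Assumption of this sketch: S and T are markers, not pages.
PATTERNS = {"SABCDEFT": 3, "SACFT": 2, "SACET": 3, "SBCDT": 2}


def visit_counts(patterns):
    """Count page visits, weighting each pattern by its frequency."""
    counts = Counter()
    for pattern, freq in patterns.items():
        for page in pattern:
            if page not in ("S", "T"):
                counts[page] += freq
    return counts


def visit_probabilities(patterns):
    """Share of all page visits that falls on each page."""
    counts = visit_counts(patterns)
    total = sum(counts.values())
    return {page: Fraction(n, total) for page, n in counts.items()}


if __name__ == "__main__":
    for page, prob in sorted(visit_probabilities(PATTERNS).items()):
        print(page, prob)
```

The pages receive 39 visits in total, 8 of them on A and 10 on C. `Fraction` stores values in lowest terms, so the share for E (6 of 39 visits) is held as 2/13.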
From Table 1, the pages are visited 39 times in total, so the probability of visit of A is 8/39, B is 5/39, C is 10/39, D is 5/39, E is 6/39 and F is 5/39. Here $NP_{i,j}$ is the navigation probability matrix, whose entry $NP_{i,j}$ is the probability that the next state from state $i$ will be $j$. The navigation probability is defined such that $NP_{i,j} \in [0, 1]$ and $\sum_{j} NP_{i,j} = 1$.

The probability of a transition is calculated as the ratio of the number of times the corresponding sequence of pages was traversed to the number of times the hyperlinked page was visited. For example, the link from A to B is traversed 3 times and B is visited 5 times, which gives 3/5. Two additional states are added to the page states: a start state (S) and a terminal state (T).

The probability of following a hyperlink is based on the content of the page being viewed. The navigation matrix is given in Table 2. Navigation control reaches T a total of 10 times, once for each navigation in Table 1.

The initial probability of a state is estimated from the number of times the page was requested by users, so every state has a positive probability. The traditional Markov model has some limitations, which are as follows.

1. Low-order Markov models have good coverage but are less accurate because they use little history.
2. High-order Markov models suffer from high state-space complexity.

In a higher-order Markov model the number of states increases exponentially with the order of the model. This exponential growth increases the search space and the complexity. Higher-order Markov models also have a low-coverage problem.

In the proposed model, each request with its time duration is considered as a state. A session is a sequence of such states. The m-step Markov model assumes that the next request depends only on the last m requests.
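The transition ratio and the m-step assumption described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `SESSIONS`, `navigation_probability`, `build_m_step_counts` and `predict_next` are hypothetical, the sessions are the patterns of Table 1 with the start marker S dropped, T is counted as visited once per session, and the predictor simply returns the page that most often followed the last m pages.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Table 1 patterns with their frequencies; T marks the end of a session.
# Assumption of this sketch: the start marker S is left out.
SESSIONS = {"ABCDEFT": 3, "ACFT": 2, "ACET": 3, "BCDT": 2}


def navigation_probability(sessions):
    """Ratio from the text: traversals of i -> j divided by visits to j."""
    visits = Counter()
    traversals = Counter()
    for path, freq in sessions.items():
        for page in path:
            visits[page] += freq
        for src, dst in zip(path, path[1:]):
            traversals[(src, dst)] += freq
    return {link: Fraction(n, visits[link[1]]) for link, n in traversals.items()}


def build_m_step_counts(sessions, m):
    """Count which page follows each window of the last m pages."""
    followers = defaultdict(Counter)
    for path, freq in sessions.items():
        for k in range(m, len(path)):
            followers[path[k - m:k]][path[k]] += freq
    return followers


def predict_next(followers, history, m):
    """Most frequent successor of the last m pages, or None if unseen."""
    window = history[-m:]
    if len(window) < m or window not in followers:
        return None
    return followers[window].most_common(1)[0][0]


if __name__ == "__main__":
    print(navigation_probability(SESSIONS)[("A", "B")])
    print(predict_next(build_m_step_counts(SESSIONS, 1), "AC", 1))
    print(predict_next(build_m_step_counts(SESSIONS, 2), "AC", 2))
```

With m = 1, page C is followed most often by D. With m = 2, the history A, C points to E instead, which shows how a longer history can change the prediction. The price is a larger number of windows to store, in line with the state-space limitation listed above.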
Under this m-step assumption, the probability of the next request is calculated by

$$P(r_{n+1} \mid r_n, \ldots, r_1) = P(r_{n+1} \mid r_n, \ldots, r_{n-m+1}),$$

where $r_i$ is the $i$-th request in a session, $i = 1, 2, \ldots, n$, $r_n$ is the current request, and $r_{n+1}$ is the next request. From this equation, if $m = 1$ (the 1-step model), the next request is determined only by the current request. The matrix CM holds the conditional probabilities of the previous occurrence, and the state matrix CM is a square matrix. So we need to calculate the probability of each page, and we need to design a model that is dynamic in nature, meaning that prediction is based on the next incoming and outgoing node.

      A      B      C      D      E      F      T
A     0      3/5    1/2    0      0      0      0
B     0      0      1/2    0      0      0      0
C     0      0      0      1      1/2    2/5    0
D     0      0      0      0      1/2    0      1/5
E     0      0      0      0      0      3/5    3/10
F     0      0      0      0      0      0      1/2
T     0      0      0      0      0      0      1

Table 2: Frequency of each node and their probability

The Markov model construction starts with the first row of Table 1 (the first navigation pattern).

Figure: First Order Dynamic Markov Model (for pattern 1)

Similarly, we create pattern chains for all the patterns of Table 1.

Figure: First Order Dynamic Markov Model (for pattern 2)

The pattern chains above are summarized into one model, and the inlinks and outlinks are set. Each node therefore contains the name of the web page, the count of the web page, an inlink list and an outlink list.

Figure: Dynamic Markov Model Node

The inlink list contains pointers to the inlink web pages and the outlink list contains the outlink web pages. Every node also contains its frequency (as per Table 2). The frequency of a visited node changes whenever its number of inlink pointers increases, that is, whenever the page is visited by a user. This helps us to predict the next web page before control leaves a page or reaches another page. With this dynamic Markov model it is possible to predict the most probable next web page accessed by the user.

CONCLUSION:
The main goal of this paper is to analyze hidden information in large amounts of log data. The paper emphasizes the dynamic Markov chain model among the different processes. We define a novel approach for similar kinds of web access patterns. This approach serves as a foundation for the web usage clustering that was described, and we conclude that web mining methods and clustering techniques are used for self-adaptive and intelligent web sites to provide personalized service and performance optimization.

REFERENCES:
[1] Ajith Abraham, "Business Intelligence from Web Usage Mining", Journal of Information & Knowledge Management, Vol. 2, No. 4 (2003), pp. 375-390.
[2] José Borges, Mark Levene, "An Average Linear Time Algorithm for Web Usage Mining", Sept 2003.
[3] Hengshan Wang, Cheng Yang, Hua Zeng, "Design and Implementation of a Web Usage Mining Model Based on FPgrowth and PrefixSpan", Communications of the IIMA, Volume 6, Issue 2.
[4] Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan, "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data", Volume 1, Issue 2, p. 13.
[5] Alice Marques, Orlando Belo, "Discovering Student Web Usage Profiles Using Markov Chains", The Electronic Journal of e-Learning, Volume 9, Issue 1, 2011, pp. 63-74.
[6] Ji He, Man Lan, Chew-Lim Tan, Sam-Yuan Sung, Hwee-Boon Low, "Initialization of Cluster Refinement Algorithms: A Review and Comparative Study", Proceedings of the International Joint Conference on Neural Networks, Budapest, 2004.
[7] Renata Ivancsy, Ferenc Kovacs, "Clustering Techniques Utilized in Web Usage Mining", International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases, Madrid, Spain, February 15-17, 2006, pp. 237-242.
[8] Bradley P. S., Fayyad U. M., "Refining Initial Points for K-means Clustering", Advances in Knowledge Discovery and Data Mining, MIT Press.
[9] Ruoming Jin, Anjan Goswami, Gagan Agrawal, "Fast and Exact Out-of-Core and Distributed K-means Clustering", Knowledge and Information Systems, Volume 10, Number 1, July 2006.
[10] Bhawna N. and Suresh J., "Generating a New Model for Predicting the Next Accessed Web Page in Web Usage Mining", Third International Conference on Emerging Trends in Engineering and Technology, ICETET.2010.56.
[11] Bindu Madhuri, Anand Chandulal J., Ramya K., Phanidra M., "Analysis of Users' Web Navigation Behavior using GRPA with Variable Length Markov Chains", IJDKP.2011.1201.

AUTHORS PROFILE
Harish Kumar completed his M.Tech (IT) in 2009 from Guru Gobind Singh Indraprastha University, Delhi. He is currently pursuing his PhD at Mewar University, Chittorgarh.

Prof. A.K. Solanki, Director of the Institute, obtained his Ph.D. in Computer Science & Engineering from Bundelkhand University, Jhansi. He has published a good number of international and national research papers in the area of data warehousing and web mining, and is always ready to teach these subjects to his students, which he does with great finesse.