(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No. 7, October 2010
ISSN 1947-5500, http://sites.google.com/site/ijcsis/

CATEGORIES OF UNSTRUCTURED DATA PROCESSING AND THEIR ENHANCEMENT

Prof. (Dr.) Vinodani Katiyar
Sagar Institute of Technology and Management, Barabanki, U.P. (INDIA)
(drvinodini@gmail.com)

Hemant Kumar Singh
Azad Institute of Engineering & Technology, Lucknow, U.P. (INDIA)
(hemantbib@gmail.com)

ABSTRACT
Web mining is an area of data mining that deals with the extraction of interesting knowledge from the World Wide Web. The central goal of this paper is to provide a past and current evaluation of, and an update on, each of the three types of web mining, i.e. web content mining, web structure mining and web usage mining. The paper also outlines key future research directions.

Keywords: Web mining; web content mining; web usage mining; web structure mining

1. INTRODUCTION
The amount of data kept in computer files and databases is growing at a phenomenal rate. At the same time, users of these data expect more sophisticated information from them. A marketing manager is no longer satisfied with a simple listing of marketing contacts, but wants detailed information about customers' past purchases as well as predictions of future purchases. Simple Structured Query Language (SQL) queries are not adequate to support these increased demands for information, and data mining steps in to meet these needs. Data mining is defined as finding hidden information in a database; alternatively, it has been called exploratory data analysis, data-driven discovery, and deductive learning [7]. In the data mining community there are three types of mining: data mining, web mining, and text mining, and there are many challenging problems [1] in data, web and text mining research. Data mining mainly deals with structured data organized in a database, while text mining mainly handles unstructured data and text. Web mining lies in between and copes with semi-structured and/or unstructured data. It calls for creative use of data mining and/or text mining techniques as well as its own distinctive approaches. Mining web data is one of the most challenging tasks for data mining and data management scholars, because a huge amount of heterogeneous, less structured data is available on the web and it is easy to become overwhelmed by it [2].

According to Oren Etzioni [6], web mining is the use of data mining techniques to automatically discover and extract information from World Wide Web documents and services.

Web mining research can be classified into three categories: web content mining (WCM), web structure mining (WSM), and web usage mining (WUM) [3]. Web content mining refers to the discovery of useful information from web contents, including text, images, audio and video. Web structure mining tries to discover the model underlying the link structure of the web. This model is based on the topology of hyperlinks, with or without descriptions of the links. It can be used to categorize web pages and is useful for generating information such as the similarity and relationships between different websites. Web usage mining refers to the discovery of user access patterns from web servers. Web usage data include web server access logs, proxy server logs, browser logs, user profiles, registration data, user sessions or transactions, cookies, user queries, bookmark data, mouse clicks and scrolls, and any other data resulting from user interaction.

Minos N. Garofalakis, Rajeev Rastogi et al. [4] present a survey of web mining research (1999). They observe that today's search tools are plagued by four problems, namely (1) the abundance problem, that is, the phenomenon of hundreds of irrelevant documents being returned in response to a search query, (2) limited coverage of the web, (3) a limited query interface based on syntactic, keyword-oriented search, and (4) limited customization to individual users,
and they list research issues that still remain to be addressed in the area of web mining.

Bin Wang and Zhijing Liu [5] present a survey of web mining research (2003). With the explosive growth of information sources available on the World Wide Web, it has become increasingly necessary for users to utilize automated tools in order to find, extract, filter, and evaluate the desired information and resources. In addition, with the transformation of the web into the primary tool for electronic commerce, it is essential for organizations and companies that have invested millions in Internet and intranet technologies to track and analyze user access patterns. These factors give rise to the need for server-side and client-side intelligent systems that can effectively mine for knowledge both across the Internet and in particular web localities. Their stated purpose is to provide a past and current evaluation of, and an update on, each of the three types of web mining, i.e. web content mining, web structure mining and web usage mining, and to outline key future research directions.

2. LITERATURE REVIEW
Both Etzioni [6] and Kosala and Blockeel [3] decompose web mining into four subtasks: (a) resource finding; (b) information selection and preprocessing; (c) generalization; and (d) analysis. Qingyu Zhang and Richard S. Segall [2] divide the web mining process into the following five subtasks:
(1) Resource finding and retrieving;
(2) Information selection and preprocessing;
(3) Pattern analysis and recognition;
(4) Validation and interpretation;
(5) Visualization.

The literature in this paper is classified by the three types of web mining: web content mining, web structure mining, and web usage mining. We organize it into five sections: (2.1) literature review for web content mining; (2.2) literature review for web structure mining; (2.3) literature review for web usage mining; (2.4) literature review for web mining surveys; and (2.5) literature review for the semantic web.

2.1 Web Content Mining
Margaret H. Dunham [7] states that web content mining can be thought of as extending the work performed by basic search engines. Web content mining analyzes the content of web resources, and recent advances in multimedia data mining promise to widen access to the image, sound and video content of web resources as well. The primary web resources mined in web content mining are individual pages. Information retrieval is one of the research areas that provides a range of popular and effective, mostly statistical, methods for web content mining. These methods can be used to group, categorize, analyze, and retrieve documents. Content mining methods can also be used for ontology learning, for mapping and merging ontologies, and for instance learning [8].

To reduce the gap between the low-level image features used to index images and the high-level semantic content of images in content-based image retrieval (CBIR) systems or search engines, Zhang et al. [9] suggest applying relevance feedback to refine the query or the similarity measures in the image search process. They present a framework for relevance feedback and semantic learning in which low-level features and keyword explanations are integrated in the image retrieval and feedback processes to improve retrieval performance. They also developed a prototype system that performs better than traditional approaches.

The dynamic nature and size of the Internet can make it difficult to find relevant information. Most users express their information needs via short queries to search engines, and they often have to sift manually through search results ordered by the search engine's relevance ranking, which makes relevance judgment time-consuming. Chen et al. [10] describe a novel representation technique that uses the web structure together with summarization techniques to better represent the knowledge in actual web documents. They name the proposed technique the Semantic Virtual Document (SVD). Used together with a suitable clustering algorithm, SVD achieves automatic content-based categorization of similar web documents and also supports a tree-like
graphical user interface for post-retrieval document browsing, which enhances the relevance judgment process for Internet users. They also introduce a cluster-biased automatic query expansion technique to interpret short queries accurately, and present a prototype called Intelligent Search and Review of Cluster Hierarchy (iSEARCH) for web content mining.

Search engines typically have low precision in response to a query: they retrieve many useless web pages and miss some important ones. Ricardo Campos et al. [11] study the problem of hierarchically clustering web search results and propose the architecture of a meta-search engine called WISE. WISE automatically builds clusters of related web pages, each embodying one meaning of the query. These clusters are then organized hierarchically and labeled with a phrase representing the key concept of the cluster and its corresponding web documents.

Mining search engine query logs is a new method for evaluating a web site's link structure and information architecture. Mehdi Hosseini and Hassan Abolhassani [12] propose a new query-URL co-clustering method for a web site that is useful for evaluating its information architecture and link structure. First, all queries and clicked URLs corresponding to a particular web site are collected from a query log as a bipartite graph, with queries on one side and URLs on the other. Then a new content-free clustering is applied to cluster queries and URLs concurrently. Finally, based on information entropy, the clusters of URLs and of queries are used to evaluate the link structure and the information architecture, respectively.

Data available on the web is classified as structured, semi-structured or unstructured data. Kshitija Pol, Nita Patil et al. [13] present a survey of web content mining. It describes various problems of web content mining and techniques for mining web pages that contain structured and semi-structured data.

2.2 Web Structure Mining
Web information retrieval tools typically use only the text on pages, ignoring the valuable information contained in links. Web structure mining aims to generate structural summaries of web sites and web pages, and its focus is on link information [14]. Through an original hyperlink analysis algorithm called HITS (Hyperlink-Induced Topic Search), Kleinberg [15] introduced the concepts of hubs (pages that refer to many pages) and authorities (pages that are referred to by many pages) [16]. Apart from search ranking, hyperlinks are also useful for finding web communities. A web community is a collection of web pages focused on a particular topic or theme. Most community mining approaches are based on the assumption that each member of a community has more hyperlinks within its community than outside it. In this context, many graph clustering algorithms can be used to mine the community structure of a graph, since they adopt the same assumption: a cluster is a vertex subset such that, for each of its vertices, the number of links connecting the vertex to the cluster is higher than the number of links connecting it to vertices outside the cluster [17].

Fürnkranz [18] describes how the web may be viewed as a directed graph whose nodes are documents and whose edges are the hyperlinks between them. He exploits this graph structure of the World Wide Web to improve retrieval performance and classification accuracy. Many search engines use graph properties in ranking their query results.

The continuous growth in the size and use of the Internet is creating difficulties in the search for information. To help users search for information and to organize the information layout, Smith and Ng [19] suggest using a self-organizing map (SOM) to mine web data and to provide a visual tool that assists user navigation. Based on users' navigation behavior, they develop LOGSOM, a system that uses a SOM to organize web pages into a two-dimensional map. The map provides a meaningful navigation tool and serves as a visual aid for better understanding the structure of the web site and the navigation behavior of its users.

As the size and complexity of websites grow dramatically, it has become increasingly challenging to design websites on which surfers can easily find the information they seek. Fang and Sheng [20] address the design of a web site's portal page. They try to maximize the efficiency, effectiveness, and usage of the portal page by selecting a limited number of hyperlinks from a large set for inclusion in the
portal page. Based on relationships among hyperlinks (i.e. structural relationships that can be extracted from a web site and access relationships that can be discovered from a web log), they propose a heuristic approach to hyperlink selection called Link Selector.

Instead of clustering user navigation patterns by means of a Euclidean distance measure, Hay et al. [21] use the Sequence Alignment Method (SAM) to partition users into clusters. SAM takes into account the order in which web pages are requested and the different lengths of the sequences being clustered. They validate SAM on user traffic data from two different web sites, and the results show that SAM identifies sequences with similar behavioral patterns.

To meet the need for an evolving and organized method of storing references to web objects, Guan and McMullen [37] design a new bookmark structure that allows individuals or groups to access their bookmarks from anywhere on the Internet using a Java-enabled web browser. Their prototype adds features such as the URL, document type, document title, keywords, date added, date last visited, and date last modified, so that bookmarks can be shared among groups of users.

Song and Shepperd [22] view the topology of a web site as a directed graph and mine web browsing patterns for e-commerce. They use vector analysis and fuzzy set theory to cluster users and URLs. Their frequent access path identification algorithm is not based on sequence mining.

2.3 Web Usage Mining
Several surveys of web usage mining exist [3, 23, 24]. Web usage mining is the mining of server logs. Its aim is to extract useful information about user access from the logs, so that sites can improve themselves in a targeted way, serve users better and obtain greater economic benefit. The main areas of research in this domain are web log data preprocessing and the identification of useful patterns in the preprocessed data using mining techniques. Most data used for mining [23] is collected from web servers, clients, proxy servers, or server databases, all of which generate noisy data. Because web mining is sensitive to noise, data cleaning methods are necessary. Jaideep Srivastava and R. Cooley [23] divide data preprocessing into subtasks. They note that the final outcome of preprocessing should be data that allows a particular user's browsing pattern to be identified in the form of page views, sessions, and click streams. Click streams are of particular interest because they allow user navigational patterns to be reconstructed. Over the previous six years, collections of user navigation sessions have been represented by many models, such as the Hypertext Probabilistic Grammar (HPG), the N-gram model and the dynamic clustering-based Markov model [25]. With a footstep graph, a user's click stream data can be visualized, and interesting patterns can be discovered more easily and quickly than with other visualization tools. Recent work by Yannis Manolopoulos, A. Nanopoulos et al. [26] provides a comprehensive discussion of web logs for usage mining and suggests novel ideas for web log indexing. Such preprocessed data enables a variety of mining techniques.

Recently, several web usage mining algorithms [27, 28, 29] have been proposed for mining user navigation behavior. Partitioning was one of the earliest clustering methods used in web usage mining [28]. Web-based recommender systems are very helpful in directing users to target pages within particular web sites, and recommender systems based on web usage mining have been proposed to predict users' intentions and navigation behavior. Semantic knowledge about the underlying domain (discussed in Section 2.5) can be taken into account to improve the quality of the recommendations. Integrating the semantic web with web usage mining can achieve the best recommendations for large, dynamic web sites [30].

As new data is published every day, the web's utility as an information source will continue to grow. The only question is: can web mining keep up with the growth of the WWW? Web usage mining models already exist for modeling user navigation patterns. Our work will be an effort to advance existing web usage mining systems and to present the working principle of such a system. The key technologies in the system design are session identification, data cleaning and web personalization.

2.4 Web Mining
Etzioni [6] first coined the term web mining in 1996. Etzioni starts from the hypothesis that
information on the web is sufficiently structured. He then outlines the subtasks of web mining and describes the web mining process. Web mining may be decomposed into the following subtasks:
1. Resource Discovery: locating unfamiliar documents and services on the web.
2. Information Extraction: automatically extracting specific information from newly discovered web resources.
3. Generalization: uncovering general patterns at individual web sites and across multiple sites.

Kosala and Blockeel [3], who perform research in the area of web mining, suggest the three web mining categories of web content, web structure, and web usage mining.

Han and Chang [32] wrote a paper on data mining for web intelligence. The paper claims that "incorporating data semantics could substantially enhance the quality of keyword-based searches," and it identifies research problems that must be solved to use data mining effectively in developing web intelligence. These problems include mining web search-engine data and analyzing the web's link structure, classifying web documents automatically, mining web page semantic structures and page contents, and mining web dynamics. Web dynamics is the study of how the web changes in terms of its contents, structure, and access patterns.

Barsagade [33] provides a survey paper on web usage mining and pattern discovery. Chau et al. [34] discuss personalized multilingual web content mining. Kolari and Joshi [35] provide an overview of past and current work in the three main areas of web mining research (content, structure, and usage) as well as emerging work in semantic web mining. Scime [45] edits a "Special Issue on Web Content Mining" of the Journal of Intelligent Information Systems (JIIS).

2.5 Semantic Web Mining
The Semantic Web [37] is a web that is able to describe things in a way that computers can understand. Statements are built with syntax rules: the syntax of a language defines the rules for building the language's statements. But how can syntax become semantics? This is what the Semantic Web is all about: describing things in a way that computer applications can understand. The Semantic Web is not about links between web pages. Instead, it describes the relationships between things (like A is a part of B and Y is a member of Z) and the properties of things (like size, weight, age, and price).

Semantic web mining aims at combining two fast-developing research areas: the Semantic Web and web mining. More and more researchers are working on improving the results of web mining by exploiting semantic structures in the web, and on using web mining techniques to build the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself [38]. The Semantic Web is a recent initiative, inspired by Tim Berners-Lee [39], to take the World Wide Web much further and develop it into a distributed system for knowledge representation and computing. The aim of the Semantic Web is not only to support access to information "on the web" through direct links or search engines, but also to support its use. Instead of searching for a document that matches keywords, it should be possible to combine information to answer questions. Instead of retrieving a plan for a trip to Hawaii, it should be possible to automatically construct a travel plan that satisfies certain goals and uses opportunities that arise dynamically. This gives rise to a wide range of challenges. Some of them concern the infrastructure, including the interoperability of systems and the languages for exchanging information rather than data. Many challenges lie in the area of knowledge representation, discovery and engineering. They include:
- extracting knowledge from data and representing it in a form understandable by arbitrary parties;
- intelligent questioning;
- delivering answers to problems, as opposed to answering conventional queries;
- exploiting previously extracted knowledge in this process.

3.0 CONCLUSION
This paper has provided a current evaluation and update of the available web mining research. An extensive literature has been reviewed based on the three types of web mining, namely web content mining, web usage mining, and web structure mining. This paper helps researchers and practitioners effectively accumulate knowledge in the field of web mining and speed its further development.

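Section 2.1 notes that Zhang et al. [9] refine the query with relevance feedback, but gives no update rule. As an illustration only, the classic Rocchio update moves a query feature vector toward examples the user marked relevant and away from those marked non-relevant. Rocchio is a standard relevance feedback rule and not necessarily the one used in [9]. The weights alpha, beta and gamma and the toy feature vectors are illustrative assumptions.

```python
def centroid(vectors, dim):
    """Component-wise mean of equal-length feature vectors (zeros if there are none)."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio feedback: q' = alpha*q + beta*mean(relevant) - gamma*mean(non_relevant).

    The default weights are common textbook values, assumed here for illustration.
    """
    dim = len(query)
    rel = centroid(relevant, dim)
    non = centroid(non_relevant, dim)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]

# Hypothetical 3-dimensional image feature vectors; the values are made up.
query = [0.5, 0.5, 0.0]
relevant = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]]
non_relevant = [[0.0, 0.2, 0.8]]
print(rocchio(query, relevant, non_relevant))  # approximately [1.1, 0.62, -0.12]
```

Many systems clip negative weights to zero after the update. That step is omitted here so that the arithmetic stays visible.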
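Section 2.2 describes Kleinberg's HITS [15] only through its notions of hubs and authorities. The sketch below shows the core mutual-reinforcement iteration on a small, hypothetical link graph: a page's authority score sums the hub scores of the pages linking to it, and a page's hub score sums the authority scores of the pages it links to. The sketch omits the query-dependent step in which HITS first assembles a focused subgraph from search results. The fixed iteration count is an assumption standing in for a convergence test.

```python
import math

def hits(graph, iterations=50):
    """Iterate HITS hub/authority updates on a directed graph.

    graph maps each page to the list of pages it links to.
    Returns (hubs, authorities), each scaled to unit Euclidean length.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    hubs = {n: 1.0 for n in nodes}
    auths = {}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link to each page.
        auths = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                auths[dst] += hubs[src]
        norm = math.sqrt(sum(v * v for v in auths.values())) or 1.0
        auths = {n: v / norm for n, v in auths.items()}
        # Hub update: sum the authority scores of the pages each page links to.
        hubs = {n: sum(auths[t] for t in graph.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hubs.values())) or 1.0
        hubs = {n: v / norm for n, v in hubs.items()}
    return hubs, auths

# Hypothetical link graph: pages "a" and "b" both point to "c" and "d"; "c" points to "d".
links = {"a": ["c", "d"], "b": ["c", "d"], "c": ["d"]}
hubs, auths = hits(links)
print(max(auths, key=auths.get))  # prints d
```

In this toy graph, "d" receives links from three pages and becomes the strongest authority. Pages "a" and "b" point to both good authorities and end up with equal, highest hub scores.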
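Section 2.2 states the usual community assumption: every member of a cluster has more links into the cluster than out of it [17]. That condition can be checked directly for a candidate set of pages. The sketch below treats hyperlinks as undirected edges, which is a simplifying assumption. The graph and the function names are hypothetical.

```python
from collections import defaultdict

def build_undirected(edges):
    """Build an undirected adjacency map from (u, v) hyperlink pairs, ignoring self-links."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def is_link_community(adj, members):
    """True if every member has more neighbours inside `members` than outside it."""
    members = set(members)
    for node in members:
        inside = sum(1 for nb in adj[node] if nb in members)
        outside = len(adj[node]) - inside
        if inside <= outside:
            return False
    return True

# Hypothetical site graph: a triangle {a, b, c} joined to the pair {d, e} by one link c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
adj = build_undirected(edges)
print(is_link_community(adj, {"a", "b", "c"}))  # True
print(is_link_community(adj, {"c", "d"}))       # False
```

A graph clustering algorithm that adopts this assumption searches for vertex subsets passing such a test. The check itself only verifies a given candidate.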
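Section 2.2 reports that Hay et al. [21] compare navigation sessions by the order of requested pages rather than by Euclidean distance. Their Sequence Alignment Method defines its own operation costs. As a simplified stand-in (not SAM itself), plain Levenshtein edit distance with unit-cost insertion, deletion and substitution over page sequences already reflects both page order and differing session lengths. The sessions and the length normalisation below are illustrative assumptions.

```python
def edit_distance(seq_a, seq_b):
    """Unit-cost Levenshtein distance between two page sequences."""
    prev = list(range(len(seq_b) + 1))
    for i, page_a in enumerate(seq_a, start=1):
        curr = [i]
        for j, page_b in enumerate(seq_b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete page_a
                curr[j - 1] + 1,                   # insert page_b
                prev[j - 1] + (page_a != page_b),  # substitute (free if pages match)
            ))
        prev = curr
    return prev[-1]

def normalised_distance(seq_a, seq_b):
    """Scale by the longer sequence so sessions of different lengths are comparable."""
    longest = max(len(seq_a), len(seq_b))
    return edit_distance(seq_a, seq_b) / longest if longest else 0.0

# Hypothetical sessions.
s1 = ["home", "catalog", "item", "cart"]
s2 = ["home", "item", "catalog", "cart"]
s3 = ["home", "catalog", "item", "cart", "checkout"]
print(edit_distance(s1, s2))  # 2: same pages, different order
print(edit_distance(s1, s3))  # 1: one extra page at the end
```

A clustering step could then group sessions whose normalised distance falls below a chosen threshold. Both that step and the threshold are hypothetical here.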
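Section 2.3 names data cleaning and session identification as key preprocessing steps but does not fix a procedure. The sketch below makes several assumptions that do not come from the paper:
- Log records are already parsed into (user id, timestamp in seconds, URL) tuples.
- Requests for embedded resources are noise and are dropped by file extension.
- A new session starts after 30 minutes of inactivity.

The record layout, the extension list and the timeout are all illustrative choices.

```python
from collections import defaultdict

# Illustrative assumptions: which requests count as noise, and the idle timeout.
NOISE_EXTENSIONS = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session

def clean(records):
    """Data cleaning: keep page requests, drop embedded resources such as images."""
    return [r for r in records if not r[2].lower().endswith(NOISE_EXTENSIONS)]

def sessionize(records, timeout=SESSION_TIMEOUT):
    """Session identification: split each user's time-ordered clicks at long idle gaps."""
    by_user = defaultdict(list)
    for user, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, url))
    sessions = []
    for user, clicks in by_user.items():
        current = [clicks[0][1]]
        for (prev_ts, _), (ts, url) in zip(clicks, clicks[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions

# Hypothetical log records: (user id, timestamp in seconds, URL).
log = [
    ("u1", 0, "/index.html"),
    ("u1", 5, "/logo.gif"),
    ("u1", 60, "/products.html"),
    ("u1", 4000, "/index.html"),
    ("u2", 10, "/about.html"),
]
for user, pages in sessionize(clean(log)):
    print(user, pages)
# u1 ['/index.html', '/products.html']
# u1 ['/index.html']
# u2 ['/about.html']
```

Real logs usually also need robot filtering and user identification beyond a single id field. Those steps are left out of this sketch.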
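Section 2.3 lists N-gram and Markov models among the ways of representing navigation sessions [25]. The simplest member of that family is a first-order Markov model. It estimates page-to-page transition counts from sessions and predicts the most likely next page. This is a minimal sketch, not the HPG or the dynamic clustering-based Markov model cited there. The sessions are hypothetical.

```python
from collections import Counter, defaultdict

def train_markov(sessions):
    """Count page-to-page transitions observed in click-stream sessions."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            transitions[current][nxt] += 1
    return transitions

def transition_probability(transitions, current, nxt):
    """Maximum-likelihood estimate of P(next page | current page)."""
    total = sum(transitions[current].values())
    return transitions[current][nxt] / total if total else 0.0

def predict_next(transitions, current):
    """Most frequent successor of `current`, or None if it was never followed."""
    followers = transitions.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical sessions, e.g. the output of a sessionization step.
sessions = [
    ["/index", "/products", "/cart"],
    ["/index", "/products", "/product/42"],
    ["/index", "/about"],
    ["/products", "/cart"],
]
model = train_markov(sessions)
print(predict_next(model, "/products"))                                # /cart
print(round(transition_probability(model, "/index", "/products"), 3))  # 0.667
```

Higher-order N-gram models condition on the last N-1 pages instead of one. This trades sparser counts for more context.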

4.0 FUTURE RESEARCH DIRECTIONS
1. Investigation into Semantic Web applications, such as those for bioinformatics, in which biological data and knowledge bases are interconnected.
2. Applications of intelligent personal assistants or intelligent software agents that automatically accumulate and classify suitable information based on user preferences.
3. Although we have focused on representing knowledge in HTML web documents, there are numerous other file formats that are publicly accessible on the Internet. Also, if both the actual web documents and the corresponding back-link documents were mainly composed of multimedia information (e.g. graphics, audio, etc.), SVD would not be particularly effective in revealing more textual information. It would be worthwhile to research new techniques that include these file formats and multimedia information for knowledge representation.

REFERENCES
[1] Q. Yang and X. Wu, "10 challenging problems in data mining research", Int. J. Inform. Technol. Decision Making, 5(4) (2006) 597–604.
[2] Qingyu Zhang and Richard S. Segall, "Web mining: a survey of current research, techniques, and software", International Journal of Information Technology & Decision Making, Vol. 7, No. 4 (2008) 683–720.
[3] Kosala and Blockeel, "Web mining research: A survey", SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery and Data Mining, ACM, Vol. 2, 2000.
[4] Minos N. Garofalakis, Rajeev Rastogi, et al., "Data Mining and the Web: Past, Present and Future", Proceedings of the 2nd International Workshop on Web Information and Data Management, Kansas City, Missouri, United States, pp. 43–47, 1999.
[5] Bin Wang and Zhijing Liu, "Web Mining Research", Fifth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'03), p. 84, 2003.
[6] O. Etzioni, "The World-Wide Web: Quagmire or Gold Mine?", Communications of the ACM, 39(11):65–68, 1996.
[7] Margaret H. Dunham, "Data Mining: Introductory and Advanced Topics", Pearson Education.
[8] "Semantic Web Mining: State of the art and future directions", Web Semantics: Science, Services and Agents on the World Wide Web, Volume 4, Issue 2, June 2006, pp. 124–143.
[9] H. Zhang, Z. Chen, M. Li and Z. Su, "Relevance feedback and learning in content-based image search", World Wide Web 6(2) (2003) 131–155.
[10] L. Chen, W. Lian and W. Chue, "Using web structure and summarization techniques for web content mining", Inform. Process. Management: Int. J. 41(5) (2005) 1225–1242.
[11] Ricardo Campos, Gael Dias and Celia Nunes, "WISE: Hierarchical Soft Clustering of Web Page Search Results Based on Web Content Mining Techniques", 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), pp. 301–304, 2006.
[12] Mehdi Hosseini and Hassan Abolhassani, "Mining Search Engine Query Log for Evaluating Content and Structure of a Web Site", International Conference on Web Intelligence, 2007.
[13] Kshitija Pol, Nita Patil et al., "A Survey on Web Content Mining and Extraction of Structured and Semistructured Data", Proceedings of the 2008 First International Conference on Emerging Trends in Engineering and Technology.
[14] Sanjay Kumar Madria, Sourav S. Bhowmick, Wee Keong Ng and Ee-Peng Lim, "Research Issues in Web Data Mining", Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery, pp. 303–312, September 1999.
[15] J. M. Kleinberg, "Authoritative sources in a hyperlinked environment", Journal of the ACM, 46(5):604–632, 1999.
[16] Nacim Fateh Chikhi, Bernard Rothenburger and Nathalie Aussenac-Gilles, "A Comparison of Dimensionality Reduction Techniques for Web Structure Mining",
     Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pp. 116–119, 2007.
[17] Lefteris Moussiades and Athena Vakali, "Mining the Community Structure of a Web Site," 2009 Fourth Balkan Conference in Informatics, pp. 239–244, 2009.
[18] J. Fürnkranz, "Web structure mining: Exploiting the graph structure of the worldwide web," ÖGAI-J. 21(2) (2002) 17–26.
[19] K. A. Smith and A. Ng, "Web page clustering using a self-organizing map of user navigation patterns," Decision Support Syst. 35(2) (2003) 245–256.
[20] X. Fang and O. Sheng, "LinkSelector: A web mining approach to hyperlink selection for web portals," ACM Trans. Internet Tech. 4(2) (2004) 209–237.
[21] B. Hay, G. Wets and K. Vanhoof, "Mining navigation patterns using a sequence alignment method," Knowledge Inform. Syst. 6(2) (2004) 150–163.
[22] Q. Song and M. Shepperd, "Mining web browsing patterns for e-commerce," Comput. Indus. 57(7) (2006) 622–630.
[23] Jaideep Srivastava and R. Cooley, "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data," ACM SIGKDD, Vol. 7, No. 2, Jan. 2000.
[24] Subhash K. Shinde and U. V. Kulkarni, "A New Approach for On Line Recommender System in Web Usage Mining," Proceedings of the 2008 International Conference on Advanced Computer Theory and Engineering, pp. 973–977.
[25] Borges and M. Levene, "A dynamic clustering-based Markov model for web usage mining," cs.IR/0406032, 2004.
[26] Yannis Manolopoulos et al., "Indexing web access-logs for pattern queries," Proceedings of the 4th International Workshop on Web Information and Data Management, pp. 63–68, 2002.
[27] B. Liu and K. Chang, "Editorial: Special issue on web content mining," SIGKDD Explorations 6(2) (2004) 1–4.
[28] M. Jalali, N. Mustapha, M. N. Sulaiman and A. Mamat, "Web User Navigation Pattern Mining Approach Based on Graph Partitioning Algorithm," Journal of Theoretical and Applied Information Technology, vol. 4, pp. 1125–1130, 2008.
[29] Zhang Huiying and Liang Wei, "An intelligent algorithm of data pre-processing in Web usage mining," Fifth World Congress on Intelligent Control and Automation (WCICA 2004), vol. 4, pp. 3119–3123, 15–19 June 2004.
[30] Mehdi Hosseini and Hassan Abolhassani, "Mining Search Engine Query Log for Evaluating Content and Structure of a Web Site," Proceedings of the 2007 IEEE/WIC/ACM International Conference on Web Intelligence.
[31] J. Han and C. Chang, "Data mining for web intelligence," Computer (November 2002) 54–60, http://www-faculty.cs.uiuc.edu/~hanj/pdf/computer02.pdf
[32] N. Barsagade, "Web usage mining and pattern discovery: A survey paper," Computer Science and Engineering Dept., CSE Tech Report 8331 (Southern Methodist University, Dallas, Texas, USA, 2003).
[33] R. Chau, C. Yeh and K. Smith, "Personalized multilingual web content mining," KES (2004), pp. 155–163.
[34] P. Kolari and A. Joshi, "Web mining: Research and practice," Comput. Sci. Eng. (July/August 2004) 42–53.
[35] A. Scime, "Guest Editor's Introduction: Special Issue on Web Content Mining," J. Intell. Inform. Syst. 22(3) (2004) 211–213.
[36] W3Schools, "Semantic web tutorial" (2008), http://www.w3schools.com/semweb/default.asp
[37] "Semantic Web Mining: State of the art and future directions," Web Semantics: Science, Services and Agents on the World Wide Web 4(2) (June 2006) 124–143.
[38] T. Berners-Lee and M. Fischetti, Weaving the Web, Harper, San Francisco (1999).
[39] Bettina Berendt, Andreas Hotho, Dunja Mladenic, Maarten van Someren, Myra Spiliopoulou and Gerd Stumme, "A Roadmap for Web Mining: From Web to Semantic Web," DOI: 10.1007/978-3-540-30123-3_1.