(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 4, April 2011

An Intelligent Agent Based Text-Mining System: Presenting Concept through Design Approach
Kaustubh S. Raval (1), Ranjeetsingh S. Suryawanshi (2), Professor Devendra M. Thakore (3)

(1, 2) M.Tech. (Computer Engineering); (3) Department of Computer Engineering
(1, 2, 3) Bharati Vidyapeeth Deemed University, College of Engineering, Pune – 411043.

Abstract – Text mining is a variation on a field called data mining and refers to the process of deriving high-quality information from unstructured text. In text mining, the goal is to discover unknown information, something that may not yet be known to people. The aim here is to design an intelligent agent based text-mining system which reads the text (input) and, based on the keyword, provides the matching documents (in the form of links) or options (statements) according to the user’s query. This paper depicts the design approach for such an intelligent agent based text-mining system.

Keywords – Data Mining, Text Mining, Intelligent Agents

I. INTRODUCTION

First of all, we need basic information about the various terms on which this work is to be carried out.

Data Mining: Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. It derives business intelligence from the data warehouse by using advanced analytical techniques such as neural network heuristics, fuzzy logic, and statistical analysis.

Automated Data Mining: Using automated data mining, we can sweep through databases and discover previously unknown patterns. In their paper [1], Dr. V. Saravanan and J. Rajan proposed an automated data mining system which encompasses familiar data mining algorithms. According to them, the system will automatically select the appropriate data mining technique and the necessary fields from the database at the appropriate time, without expecting the users to specify the techniques and the parameters.

Text Mining: Text mining is a variation on a field called data mining and refers to the process of deriving high-quality information from unstructured text. ‘High quality’ in text mining
usually refers to some combination of relevance, novelty and interestingness [3].

Intelligent Agents: Intelligent agents are software entities that carry out some set of operations on behalf of a user with some degree of independence or autonomy and, in doing so, employ some knowledge or representation of the user’s goals or desires. Software agents are useful in automating repetitive tasks, finding and filtering information, intelligently summarizing complex data, and so on. More importantly, just like their human counterparts, intelligent agents can have the capability to learn from the managers and even make recommendations to them regarding a particular course of action. Agents have several common characteristics, such as their ability to communicate, cooperate, and coordinate with other agents in the system. Each agent is capable of acting autonomously, cooperatively, and collectively to achieve the collective goal of the system. The coordination capability helps manage problem solving so that cooperating agents work together as a single team [9].

The literature study of various research papers and my interest in the field of ‘Data Mining’ motivated me to take this up as my dissertation topic for post-graduation.

The study of an existing biomedical text-mining system named ‘PolySearch’ also provided insight into text-mining systems in general and thus led me to take up ‘Intelligent Software Agent Based Text Mining’ as my dissertation topic.

The working scenario of the ‘Google Search Engine’ has also been a motivating factor in taking up this topic as my dissertation work. The ‘Google Search Engine’ is the best example of an optimized intelligent software agent based text-mining system encompassing the very large domain of the web.

II. SYSTEM DESIGN

System design includes a use-case diagram and a sequence diagram. The use-case diagram depicts how the user interacts with the proposed intelligent agent based system, whereas the sequence diagram depicts the flow of actions carried out by the different agents in the system.

Fig. 1 User Interacting with system

As shown in Fig. 1, the user will type the text; then text miner agent 1, which is keyphrase-based,
will decide the keyword; then the intelligent agent will decide the context for that keyword; then text miner agent 2, which is keyword-based, will decide the meaning of the keyword in that particular context, find the related documents, calculate the weight matrix value, and attach that value to each document. Finally, the intelligent agent will rank the documents based on their weight matrix values.

Fig. 2 Sequence Diagram

Fig. 2 shows the sequence diagram of the system, that is, the interaction between the different agents of the system.

III. SYSTEM DESCRIPTION

System description is the context which includes the details about the overall working of the existing or proposed system.

Why Agents?

Text mining mainly includes the field of information retrieval, which means finding the documents that contain answers to questions rather than finding the answers themselves. To achieve this, statistical measures and methods are used to process the text data automatically and compare it to the given question. The issue here is how to automate the processing of text data, and that is where ‘Agents’ come into the picture.

System Architecture

Fig. 3 shows the architectural diagram for the intelligent agent based text-mining system. It includes all the components required to make the system workable and the relationships and interactions between them. There are mainly three agents, one dataset, the user category, and one cache/log component.

Working of the Intelligent Agent in two phases:

Phase 1:
• Takes the input from Text Miner Agent 1 (that is, the key-phrase/keyword).
• Finds the contexts (documents) for the key-phrase.

Phase 2:
• Takes the input from Text Miner Agent 2, that is, the links and their associated weight matrix values.
• Compares the weight matrix values of the various links and decides which one is the ‘close-to-best-match’ for the user’s query.
• The link with the highest weight matrix value is ranked first, the link with the second highest weight matrix value is ranked second, the link
with the third highest weight matrix value is ranked third, and so on.
• Displays the ranked links to the user.

Fig. 3 Architecture of Intelligent Agent Based Text-Mining System

Phases in working of Intelligent Agent

In the proposed intelligent agent based system, the intelligent agent works in two phases.

In phase 1, the intelligent agent prompts text miner agent 1, which is the ‘Key-order and Key-phrase based agent’, for the required ‘key-phrase’ based on which the various contents need to be determined. Fig. 4 shows the working of the intelligent agent in phase 1 as a flowchart.

Fig. 4 Working of Intelligent agent in phase 1

In phase 2, the intelligent agent takes its input from text miner agent 2, that is, the ‘Keyword based agent’. The input contains the list of links (documents/options) with their associated ‘weight matrix values’. These links are retrieved by checking every context, each containing different documents, in which the ‘key-phrase’ or ‘keyword’ has appeared. Then, using the ‘Decision making algorithm’, the intelligent agent decides which of the many links (documents/options) is the ‘close-to-exact-match’ for the information the user is looking for.

The link (document/option) with the highest ‘weight matrix value’ is taken to be the ‘close-to-best-match’, the link with the second highest ‘weight matrix value’ is the second best match, and so on. These links are then ordered and ranked according to their ‘weight matrix value’ and presented to the user. Fig. 5 shows the working of the intelligent agent in phase 2 as a flowchart.
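The two-phase working of the intelligent agent described in this section can be illustrated with a minimal Python sketch. It is not the implementation of the proposed system: the names WeightedLink, find_contexts and rank_links, and the dictionary used as a context index, are assumptions made only for this illustration. The paper does not state how the weight matrix value is computed, so the sketch treats it as a number already attached to each link by text miner agent 2.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class WeightedLink:
    """A link (document/option) with the weight matrix value attached to it.

    Hypothetical structure: the weight is assumed to come from text miner agent 2.
    """
    url: str
    weight: float


def find_contexts(key_phrase: str, context_index: Dict[str, Set[str]]) -> List[str]:
    """Phase 1 (sketch): return the names of the contexts that contain the key-phrase.

    context_index maps a context name to the lower-case key-phrases known for it;
    this lookup table is an assumption, not something the paper specifies.
    """
    phrase = key_phrase.strip().lower()
    return sorted(name for name, phrases in context_index.items() if phrase in phrases)


def rank_links(links: Iterable[WeightedLink]) -> List[WeightedLink]:
    """Phase 2 (sketch): order links by descending weight matrix value.

    The first element plays the role of the 'close-to-best-match'. sorted() is
    stable, so links with equal weights keep their input order.
    """
    return sorted(links, key=lambda link: link.weight, reverse=True)


if __name__ == "__main__":
    # Illustrative data only; contexts, URLs and weights are made up for the demo.
    index = {
        "medicine": {"virus", "vaccine"},
        "computing": {"virus", "firewall"},
    }
    print(find_contexts("Virus", index))

    candidates = [
        WeightedLink("doc-a", 0.42),
        WeightedLink("doc-b", 0.91),
        WeightedLink("doc-c", 0.67),
    ]
    for position, link in enumerate(rank_links(candidates), start=1):
        print(position, link.url, link.weight)
```

Sorting by a single numeric key is the simplest reading of "ranked according to their weight matrix value"; a real system would also need a rule for ties and for links that appear in more than one context.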
Fig. 5 Working of Intelligent Agent in phase 2

IV. APPLICATIONS

The proposed system would work as the base for some specific fields where there is a requirement for intelligent agent based text mining. Each of these fields has different requirements for the type of information, according to its various uses.

1) Medical Science

In the field of medical science, new medicines and vaccines are being invented day by day, so doctors need to be aware of what is going on in their field. Moreover, doctors are concerned with curing patients properly using medicines and other means.

Thus, the system which is to be developed under this dissertation work will provide the base for a specific ‘Text-Mining System for Medical Science’ and provide an automated way of dealing with the details required for various diseases and their probable solutions.

2) Space Science

New research is always going on in the field of space science, and it is mainly related to astronomy.

Scientists are working to find out the cause of the earth’s birth, how the environment developed on earth, how all the planets were born, and how the perimeters were decided for every planet. All these types of questions require mining a great deal of information, and scientists have to look at each and every aspect of the information very carefully.

Thus, the system which is to be developed can work as the base for a ‘Text-Mining System for Space Science’ and provide useful information to scientists for their research work.

3) Engineering Technologies

Engineering is a field which encompasses various specific fields. All these fields have specific applications, and this requires dealing with a large amount of text content. Engineers in different fields need to find solutions for various technological and technical problems. Dealing with a huge amount of text data is not an easy task, so it is better to have an automated (intelligent agent based) system to perform all this work.

The intelligent agent based text-mining system works with a huge amount of data and retrieves the required data within fractions of a second or within minutes (under ideal conditions). Thus the
intelligent agent based systems can speed up the data retrieval and processing.

Thus, the system which is to be developed can work as the base for a ‘Text-Mining System for Engineering Technologies’ and provide useful information to scientists/engineers for their research work.

Based on these design specifications, the intelligent agent based text-mining system would be developed, in which the intelligent agent needs to incorporate two algorithms:
1) Decision making algorithm – to determine the possible contexts (documents) for the key-phrase/keyword.
2) Ranking algorithm – to rank the documents (options).

REFERENCES

[1] Dr. V. Saravanan and J. Rajan, “A Framework of an Automated Data Mining System using Autonomous Intelligent Agents”, International Conference on Computer Science and Technology, pages 700-704, 2008.
[2] Ranjit Bose and Vijayan Sugumaran, “IDM: An Intelligent Software Based Data Mining Environment”, IEEE, pages 288-2893, 1998.
[3] Vishal Gupta and Gurpreet S. Lehal, “A Survey of Text Mining Techniques and Applications”, Journal of Emerging Technologies in Web Intelligence, vol. 1, pages 60-76, August 2009.
[4] Ah-Hwee Tan, “Text Mining: The state of the art and the challenges”.
[5] J. You and J. Liu, “An Agent Based Visual Data Mining for Intelligent Web Browsing with E-Commerce Applications”, IEEE International Fuzzy Systems Conference, pages 936-939, 2001.
[6] Azuraliza Abu Bakar, Zulaiha Ali Othman, Abdul Razak Hamdan, Rozianiwati Yusof, Ruhaizan Ismail, “Agent Based Data Classification Approach for Data Mining”, IEEE, 2008.
[7] Dae Su Kim, Chang Suk Kim, and Kee Wook Rim, “Modelling and Design of Intelligent Agent System”, International Journal of Control, Automation, and Systems, Vol. 1, No. 2, pages 257-261, June 2003.
[8] Andreas Hotho, Andreas Nurnberger, and Gerhard Paaß, “A brief Survey of text mining”.
[9] Stuart Russell and Peter Norvig, “Artificial Intelligence: A Modern Approach”, Chapter 2: Intelligent Agents.

AUTHORS PROFILE

1) Kaustubh S. Raval graduated (B.E. – Computer Engineering) from Gujarat University, Ahmedabad, Gujarat, in 2009. Currently pursuing M.Tech. (Computer) with specialization in ‘Data Mining’ at Bharati Vidyapeeth Deemed University College of Engineering, Pune.
2) Ranjeetsingh S. Suryawanshi graduated (B.E. – Computer Engineering) from Pune University, Maharashtra, in 2005. Currently pursuing M.Tech. (Computer) with specialization in ‘Data Mining’ at Bharati Vidyapeeth Deemed University College of Engineering, Pune.
3) Professor D. M. Thakore graduated (B.E. – Computer Engineering) from Shivaji University, Sangali, Maharashtra, in 1990. He completed his M.E. (Computer) at Bharati Vidyapeeth University College of Engineering, Pune, in 2004. He is currently pursuing his Ph.D. with specialization in ‘Data Mining/Text Mining’ at Bharati Vidyapeeth Deemed University College of Engineering, Pune.

                                                                                            ISSN 1947-5500
