CS4200-2010-RS-Grp-21-SeMap
University of Moratuwa
Computer Science & Engineering

SeMap – Mapping Dependency Relationships into Semantic Frame Relationships
Software Requirements Specification
Version 1.1

Internal Supervisor: Dr. Shehan Perera
External Supervisor: Dr. Ben Goertzel

Group members:
    070085B   N. H. N. D. de Silva
    070125B   C. S. N. J. Fernando
    070298F   M. K. D. T. Maldeniya
    070548A   D. N. C. Wijeratne
SeMap                                                   Version:   1.1

Software Requirements Specification                     Date:      05/Nov/2010




Revision History

        Date          Version                   Description               Author

 29 / 10 / 2010          1.0                  Draft Document               SeMap
 05 / 11 / 2010          1.1          Approval pending SRS Document        SeMap




                                ©University of Moratuwa, 2010                i|Page
Table of Contents

1. Introduction
    1.1 Purpose
    1.2 Document Conventions
    1.3 Intended Audience
    1.4 Project Scope
    1.5 Definitions, Acronyms, and Abbreviations
    1.6 References
    1.7 Overview
2. Overall Description
    2.1 Project Objective
    2.2 Product Features
    2.3 Users of the System
    2.4 Operating Environment
    2.5 Design and Implementation Constraints
    2.6 User Documentation
    2.7 Assumptions and Dependencies
3. Functional Requirements
4. Architectural Overview
5. Non – Functional Requirements
    5.1 Performance Requirements
    5.2 Software Quality Requirements







1. Introduction

1.1 Purpose

The purpose of this document is to identify the software requirements of the research project
“SeMap” and to provide clear guidelines for the development team and evaluators on the
functionality expected from the project. The objective of SeMap is to develop an improved
framework for mapping semantic relationships drawn from English sentences to sets of
semantic frames. The project consists of two primary phases and an additional phase that is
expected to be carried out once the primary phases are completed.

This document provides a detailed description of both the functional and non-functional
software requirements of SeMap. It is intended to be used when designing the software
architecture and implementing the project.



1.2 Document Conventions

The following document conventions have been used to ensure ease of readability.

          • Major Headings               16pt, Cambria, Dark Blue, Bold
          • Sub Headings                 14pt, Cambria, Blue, Bold
          • Other Headings               13pt, Cambria, Black, Bold, Underline
          • All Body Text                12pt, Cambria, Black



1.3 Intended Audience

This document is intended for the following audience.

   • Project supervisors
   • Course coordinator and support staff
   • Project designers and development team
   • Researchers in the field of general-purpose AI development and machine learning





1.4 Project Scope

As mentioned previously, the project aims to develop an improved framework for mapping
semantic relationships drawn from English sentences to sets of semantic frames. The goal is
reliable performance for relatively simple sentences involving common words and concepts,
and reasonable accuracy for more complex sentences. The project does not focus on word
sense disambiguation beyond what the existing RelEx framework already provides;
consequently, for highly ambiguous words the system will output multiple possible
interpretations.

The project may be viewed as two primary phases and one additional phase that depends on
the success of the first two. The first phase will focus on analyzing the RelEx2Frame
software and incorporating an appropriate standard rule engine into it, with the
modifications necessary to improve performance. It will also include integrating
RelEx2Frame with RelEx and the OpenCog Artificial General Intelligence framework,
establishing a pipeline that translates English into semantic nodes and links. Extensive
testing will be carried out at this point to verify the accuracy and execution speed of the
framework.

The second phase will include researching statistical natural language processing
techniques, identifying and evaluating learning techniques and algorithms that could enable
expanding the rule base for mapping semantic relations to sets of semantic frames, and
implementing a selected model to make the framework automatically extensible.

Depending on the success of the second phase, as well as on time constraints, an attempt may
be made to generate a “commonsense knowledge base” by employing a suitable data mining
algorithm on the results from the mapping framework. The aim of this phase is to employ
the developed framework to identify probabilistic relationships between concepts in human
knowledge.






1.5 Definitions, Acronyms, and Abbreviations


 •   SeMap – Semantic Mapping
 •   AI – Artificial Intelligence
 •   OpenCog – Open Cognition
 •   RelEx – Relationship Extractor; a system developed under the OpenCog project
 •   RelEx2Frame – Relationship Extractor to Frame; a system developed to map dependency
     relationships to frame relationships (the system SeMap is expected to replace)
 •   NLGen – Natural Language Generator; a system developed under the OpenCog project
 •   WordNet/JWNL – WordNet lexical database / Java WordNet Library
 •   OpenNLP – Open Natural Language Processing
 •   getopt – C library function used to parse command-line options
 •   FrameNet – Lexical database of English semantic frames; RelEx2Frame expresses its output
     in terms of FrameNet frames



1.6 References

    •   Ranjit Bose, “Natural Language Processing: Current state and future direction,”
        International Journal of the Computer, the Internet and Management, vol. 12, 2004.
    •   Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural
        Language Processing, 6th ed. Cambridge: The MIT Press, 2003.
    •   David Hart. (2007, Oct.). OpenCog Project. [Online]. Available:
        https://launchpad.net/opencog-project
    •   (2010, Aug.). The Open Cognition Project. [Online]. Available:
        http://wiki.opencog.org/w/The_Open_Cognition_Project
    •   RelEx Developers. (2007, Oct.). RelEx Dependency Relationship Extractor. [Online].
        Available: https://launchpad.net/relex
    •   (2010, May). RelEx Dependency Relationship Extractor. [Online]. Available:
        http://wiki.opencog.org/w/RelEx
    •   (2009, Jun.). RelEx2Frame. [Online]. Available:
        http://wiki.opencog.org/w/RelEx2Frame
    •   (2010, Feb.). RelEx2Atoms. [Online]. Available:
        http://wiki.opencog.org/w/RelEx2Atoms
    •   OpenCog Developers. (2007, Oct.). OpenCog Framework. [Online]. Available:
        https://launchpad.net/opencog
    •   The Stanford Parser: A Statistical Parser. [Online]. Available:
        http://nlp.stanford.edu/software/lex-parser.shtml
    •   Davy Temperley, John Lafferty and Daniel Sleator. (2010, Sep.). Link Grammar Parser.
        [Online]. Available: http://www.abisource.com/projects/link-grammar/
    •   Open Source Rule Engines in Java. [Online]. Available:
        http://java-source.net/open-source/rule-engines
    •   (2009, Mar.). Natural Language Generator. [Online]. Available:
        https://launchpad.net/nlgen
    •   Bonnie Webber. (2010, Jul.). Natural Language: Understanding and Generating Text
        & Speech. [Online]. Available:
        http://www.aaai.org/AITopics/pmwiki/pmwiki.php/AITopics/NaturalLanguage


1.7 Overview

This software requirements specification document is structured in the following manner.

•   Overall Description – A background overview of the system, its dependencies and the
    product features of SeMap are given under this section.
•   Functional Requirements – This section describes the functional requirements of the
    system in sufficient detail to carry out its design.
•   Architectural Overview – The main components of the system, starting from the RelEx
    dependency relations that form its input, are described under this section.
•   Non – Functional Requirements – Performance and software quality requirements for
    the SeMap system are identified under this section.






2. Overall Description
2.1 Project Objective

The key objective of this project is to develop an improved framework for mapping
English-language semantic dependency relationships to sets of semantic frames, with
reasonable accuracy for complex sentences and with an integrated statistical-linguistics-based
artificial intelligence component that allows automatic extensibility.



2.2 Product Features

Primary Features

•   This product will map dependency relationships in English sentences into semantic
    frame relationships. The first main feature is applying a given set of rules, based on
    English grammatical structure and conventions, to given sentences. This process yields
    structured semantic frame relationships that OpenCog-based products can consume.

•   The second main feature will be the ability to learn rules that are not already in the
    rule base, through statistical analysis of the current rule set and the given sentences.


Extended Features

•   The primary extended feature will be the ability to detect probabilistic relationships
    between concepts by running a data mining algorithm on the results (i.e., the set of
    semantic relationships) generated by running the product on a corpus. This feature
    could reveal previously unforeseen relationships between concepts in the English
    language.



2.3 Users of the System

The following users will find this product useful in their fields:
    • NLGen developers
    • General-purpose AI developers
    • Linguists
    • Researchers interested in machine learning



2.4 Operating Environment

•   The research is based on RelEx2Frame, a component of RelEx. Thus RelEx, a syntactic
    dependency extractor and semantic framing generator, will be the development
    environment.

•   RelEx in turn is a narrow artificial intelligence component of OpenCog, an artificial
    general intelligence framework; thus OpenCog can be considered the foundation of the
    development.

•   Both RelEx and OpenCog can be used on Linux or Windows-based operating systems.



2.5 Design and Implementation Constraints

The framework will be developed to receive input from the RelEx dependency relationship
extractor; as such, its input interface will be restricted by the interactions with RelEx.

One of the key goals of this project, as of the overall OpenCog endeavor, is extensibility.
Thus the framework will need to be developed as a module, and the implementation of the
module itself must facilitate extension.

The project requires that an appropriate rule engine be integrated into the framework to
handle the mapping rules. However, developing a rule engine is not an objective of the
project; instead, an existing rule engine is expected to be employed for this purpose. Thus
the architecture and interfaces of the chosen rule engine will impose restrictions on the
design and implementation of the framework.

Further, development work already carried out by other developers on a prototype version
of the framework will impose implementation restrictions if that work is to be incorporated.



2.6 User Documentation

Since the research is carried out under OpenCog, the OpenCog wiki pages related to RelEx
and RelEx2Frame will be updated as necessary.

Documentation on creating new rules using the new rule engine will also be added.




2.7 Assumptions and Dependencies

Our main assumption is that the likely beneficiaries of this research are RelEx users.

Since this project improves on RelEx, the research is highly dependent on it. RelEx in turn
depends on several other packages, such as the Link Grammar Parser, WordNet, JWNL (Java
WordNet Library), the GNU getopt library and OpenNLP.

A suitable free and open source rule engine, such as Drools or OpenRules, will be used in
the research, and that rule engine will be an additional dependency.

Since the system receives its input from the RelEx dependency relationship extractor, it is
assumed that RelEx functions reliably.






3. Functional Requirements

Mapping dependency relationships in English sentences into semantic frame
relationships using the given set of rules.

Like any language, English has a defined grammatical structure. RelEx2Frame currently
contains a set of approximately five thousand hard-coded rules. The product is required to
port these rules to a rule engine so that it can provide the user with the functionality of
decomposing given sentences into a series of semantic relationships.
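As a rough illustration of the porting step, the Java sketch below writes one hard-coded rule out as Drools rule text in the format of the sample rule given in the Architectural Overview. It assumes that each RelEx2Frame rule can be reduced to a list of RelEx relation conditions plus one frame output string. The class DrlRuleWriter is hypothetical; Processor, existence and AppendRule are the project's own names taken from that sample rule, not part of the Drools API. The consequence is emitted as plain Java, because Drools treats the "then" block as Java code.

```java
import java.util.List;

final class DrlRuleWriter {
    /**
     * Emits one Drools rule: bind the (project-specific) Processor fact, test each RelEx
     * condition with an eval conditional element, then append the frame output.
     */
    static String toDrl(String ruleName, List<String> conditions, String frameOutput) {
        StringBuilder drl = new StringBuilder();
        drl.append("rule \"").append(escape(ruleName)).append("\"\n");
        drl.append("    when\n");
        drl.append("        p : Processor()\n");
        for (String condition : conditions) {
            // all evals must hold, so multiple conditions form a conjunction
            drl.append("        eval( p.existence(\"").append(escape(condition)).append("\") )\n");
        }
        drl.append("    then\n");
        drl.append("        p.AppendRule(\"").append(escape(frameOutput)).append("\");\n");
        drl.append("end\n");
        return drl.toString();
    }

    /** Escapes backslashes and double quotes so the text can sit inside a string literal. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.print(toDrl("1", List.of("_predobj(be,$atLocation)"),
                " ^1_Existence:Place($atLocation,$atLocation)"));
    }
}
```

Generating the rule text from the existing rule data, rather than rewriting some five thousand rules by hand, keeps the ported rule base uniform; the generated file would still need to be compiled by Drools against the project's real Processor class.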



Statistical learning based on the current rules

English is a mature language whose grammar extends well beyond the roughly 5000 rules that
are currently hard-coded into RelEx2Frame. The product is required to learn new candidate
rules by applying statistics to the rules it already knows and approximating them logically
to the new sentences it is given, and then to decide on new additions to its vocabulary
and/or rule base.



Detecting probabilistic relationships between concepts

In some instances the context of a sentence conveys concepts that are not literally
expressed; the use of idioms is a classic example. The product is expected to mine the
results (i.e., the set of semantic relationships) generated by running the product on a
corpus, and to produce a list of probabilistic relationships between the concepts in that
set of results.






4. Architectural Overview


[Figure: SeMap architecture. Components shown: RelEx Semantic Dependency Relations;
RelEx2Frame Rule Base; Core Framework for Mapping Semantic Relationships to Semantic Nodes;
Statistical Linguistics Based Learning AI; Human Knowledge Concept Data Mining Component;
“Common Sense” Knowledge Base; OpenCog Artificial General Intelligence Framework: Frame2Atom]


                           RelEx Semantic Dependency Relations



RelEx, the semantic dependency relation extractor developed as a module of the OpenCog
framework, provides the semantic relations of a given English sentence in a standard format
that is compatible with Link Grammar output.





Example:

Sample Sentence:

Alice looked at the cover of Shonen Jump.


RelEx Output:

at(look, cover)
_subj(look, Alice)
tense(look, past)
of(cover, Jump)
DEFINITE-FLAG(cover, T)
noun_number(cover, singular)
_amod(Shonen, Jump)
DEFINITE-FLAG(Shonen, T)
noun_number(Shonen, singular)
DEFINITE-FLAG(Jump, T)
noun_number(Jump, singular)
DEFINITE-FLAG(Alice, T)
gender(Alice, feminine)
noun_number(Alice, singular)
person-FLAG(Alice, T)
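The relation format above is regular enough to parse line by line. The following Java sketch assumes that every RelEx output line has the form name(head, dependent) and turns such output into relation objects that a mapping framework could work with. The Relation and RelExOutputParser classes are hypothetical illustrations, not part of the RelEx API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One RelEx relation such as at(look, cover); a hypothetical class, not part of RelEx. */
final class Relation {
    final String name;      // relation or feature name, e.g. "_subj" or "DEFINITE-FLAG"
    final String head;      // first argument, e.g. "look"
    final String dependent; // second argument, e.g. "Alice"

    Relation(String name, String head, String dependent) {
        this.name = name;
        this.head = head;
        this.dependent = dependent;
    }

    @Override
    public String toString() {
        return name + "(" + head + ", " + dependent + ")";
    }
}

final class RelExOutputParser {
    // name(arg1, arg2); spaces around the arguments are discarded
    private static final Pattern LINE =
            Pattern.compile("^\\s*([\\w-]+)\\(\\s*([^,()]+?)\\s*,\\s*([^,()]+?)\\s*\\)\\s*$");

    /** Parses one relation per line and skips lines that do not have the name(a, b) shape. */
    static List<Relation> parse(String text) {
        List<Relation> relations = new ArrayList<>();
        for (String line : text.split("\\R")) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                relations.add(new Relation(m.group(1), m.group(2), m.group(3)));
            }
        }
        return relations;
    }

    public static void main(String[] args) {
        String output = "at(look, cover)\n_subj(look, Alice )\ntense(look, past)\nDEFINITE-FLAG(Alice , T)";
        for (Relation r : parse(output)) {
            System.out.println(r);
        }
    }
}
```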

RelEx2Frame Rule base

The rule base for mapping semantic relations to semantic nodes will consist of rules
compatible with the Drools rule engine, which has been selected as the rule engine for the
project. The following is a sample mapping rule in the format expected to be implemented.

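rule "1"
    when
        p : Processor()
        // fires when the RelEx output held by p contains _predobj(be, <location>)
        eval( p.existence("_predobj(be,$atLocation)") )
    then
        // the consequence is plain Java code, so no eval() wrapper is used here
        p.AppendRule(" ^1_Existence:Place($atLocation,$atLocation)");
end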

The rule verifies the presence of a combination of semantic relations in a given RelEx
output and maps it to the corresponding semantic node.
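To make the matching step concrete, the Java sketch below shows one way a condition such as _predobj(be,$atLocation) could be checked outside a rule engine: arguments beginning with $ are treated as variables, a backtracking search finds every binding under which all of a rule's conditions hold, and the bound words are substituted into the rule's output template. This is a hand-written illustration of pattern matching with variable binding under those assumptions; in the project, Drools performs the rule matching. The RuleMatcher class and the sample relations are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RuleMatcher {
    private static final Pattern TERM =
            Pattern.compile("([\\w-]+)\\(\\s*([^,()]+?)\\s*,\\s*([^,()]+?)\\s*\\)");
    private static final Pattern VARIABLE = Pattern.compile("\\$\\w+");

    /** Splits "name(a, b)" into {name, a, b}. */
    static String[] term(String s) {
        Matcher m = TERM.matcher(s.trim());
        if (!m.matches()) throw new IllegalArgumentException("Not a relation: " + s);
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    /** Unifies a pattern argument (literal or $variable) with a word, extending the binding. */
    private static boolean unify(String patternArg, String word, Map<String, String> b) {
        if (!patternArg.startsWith("$")) return patternArg.equals(word);
        String bound = b.get(patternArg);
        if (bound == null) { b.put(patternArg, word); return true; }
        return bound.equals(word);
    }

    /** Returns every variable binding under which all conditions hold. */
    static List<Map<String, String>> match(List<String> conditions, List<String> relex) {
        List<Map<String, String>> results = new ArrayList<>();
        search(conditions, 0, relex, new HashMap<>(), results);
        return results;
    }

    private static void search(List<String> conds, int i, List<String> relex,
                               Map<String, String> binding, List<Map<String, String>> out) {
        if (i == conds.size()) { out.add(new HashMap<>(binding)); return; }
        String[] p = term(conds.get(i));
        for (String line : relex) {
            String[] r = term(line);
            Map<String, String> b = new HashMap<>(binding); // fresh copy per candidate
            if (p[0].equals(r[0]) && unify(p[1], r[1], b) && unify(p[2], r[2], b)) {
                search(conds, i + 1, relex, b, out);
            }
        }
    }

    /** Replaces each $variable in the output template with its bound word. */
    static String instantiate(String template, Map<String, String> binding) {
        Matcher m = VARIABLE.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = binding.getOrDefault(m.group(), m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> relex = List.of("_subj(be, book)", "_predobj(be, table)"); // made-up input
        for (Map<String, String> b : match(List.of("_predobj(be,$atLocation)"), relex)) {
            System.out.println(instantiate("^1_Existence:Place($atLocation,$atLocation)", b));
        }
    }
}
```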




Core Mapping Framework

The mapping framework will receive the RelEx output for a given body of text, match it
against the rule base and deliver the relevant semantic nodes. The primary component of the
framework will be the rule engine, which is responsible for “firing” the rules in the rule
base against the RelEx output. The process of verifying the conditions (combinations of
semantic relations) has been divided into a number of independent operations based on the
types of semantic relations RelEx produces, and will also include support for extracting
concept variables from the RelEx output and matching them to values in the concept variable
value store in RelEx.
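The division of matching by relation type, together with the concept variable lookup, can be sketched as follows. The Java code assumes a concept variable store that maps a variable name to the set of words it may stand for, and treats a variable without an entry as unrestricted. RelationIndex, ConceptStore, CoreMapperDemo and the variable $Object are hypothetical names; the actual store in RelEx may be organized differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RelationIndex {
    // relation type -> {head, dependent} pairs taken from one RelEx output
    private final Map<String, List<String[]>> byType = new HashMap<>();

    void add(String type, String head, String dependent) {
        byType.computeIfAbsent(type, k -> new ArrayList<>()).add(new String[] {head, dependent});
    }

    /** Returns only relations of the requested type, so each check scans a short list. */
    List<String[]> ofType(String type) {
        return byType.getOrDefault(type, Collections.emptyList());
    }
}

final class ConceptStore {
    // hypothetical concept variable store: variable name -> words it may stand for
    private final Map<String, Set<String>> values = new HashMap<>();

    void define(String variable, Set<String> words) {
        values.put(variable, words);
    }

    /** A variable with no entry in the store is treated as unrestricted. */
    boolean accepts(String variable, String word) {
        Set<String> allowed = values.get(variable);
        return allowed == null || allowed.contains(word);
    }
}

final class CoreMapperDemo {
    /** Finds words w such that type(head, w) is present and w is an allowed value of variable. */
    static List<String> bindDependent(RelationIndex index, ConceptStore store,
                                      String type, String head, String variable) {
        List<String> words = new ArrayList<>();
        for (String[] r : index.ofType(type)) {
            if (r[0].equals(head) && store.accepts(variable, r[1])) words.add(r[1]);
        }
        return words;
    }

    public static void main(String[] args) {
        RelationIndex index = new RelationIndex();
        index.add("at", "look", "cover");
        index.add("_subj", "look", "Alice");
        index.add("tense", "look", "past");
        ConceptStore store = new ConceptStore();
        store.define("$Object", Set.of("cover", "book"));
        System.out.println(bindDependent(index, store, "at", "look", "$Object"));
    }
}
```

Indexing by relation type means a condition on at(...) never scans tense or noun_number relations, which is one way the independent per-type operations could keep matching cost low as the rule base grows.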



Statistical Linguistics Based Learning AI

This artificial intelligence component will be responsible for analyzing the RelEx output
in the context of the existing rule base and using statistical-linguistics-based learning to
extend both the rule base and the concept variable store. This is an experimental
component that forms the bulk of the research to be carried out in the project; a suitable
architecture is currently being researched.
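Because the learning architecture is still being researched, the following is only one simple statistical idea, not the project's design. It collects the RelEx output of sentences for which no existing rule fired, reduces each sentence to the sorted set of its relation types, and counts how often each such signature occurs. Signatures that recur at least a chosen number of times become candidates for new rule conditions, to be reviewed before being added to the rule base. CandidateRuleMiner and the minCount threshold are hypothetical, and the sentences in main are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class CandidateRuleMiner {
    /**
     * Signature of a sentence: its sorted set of relation types, e.g. "_obj|_subj|tense".
     * Input lines are assumed to look like "type(head, dependent)".
     */
    static String signature(List<String> relexLines) {
        TreeSet<String> types = new TreeSet<>();
        for (String line : relexLines) {
            int paren = line.indexOf('(');
            if (paren > 0) types.add(line.substring(0, paren).trim());
        }
        return String.join("|", types);
    }

    /** Counts signatures over unmatched sentences and keeps those seen at least minCount times. */
    static Map<String, Integer> frequentSignatures(List<List<String>> unmatched, int minCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> sentence : unmatched) {
            counts.merge(signature(sentence), 1, Integer::sum);
        }
        counts.values().removeIf(c -> c < minCount);
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> unmatched = new ArrayList<>();
        unmatched.add(List.of("_subj(eat, Bob)", "_obj(eat, apple)", "tense(eat, past)"));
        unmatched.add(List.of("_subj(read, Ann)", "_obj(read, book)", "tense(read, present)"));
        unmatched.add(List.of("_subj(sleep, Tom)", "tense(sleep, past)"));
        System.out.println(frequentSignatures(unmatched, 2));
    }
}
```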



Human Knowledge Concept Data Mining Component

This data mining module will receive the semantic nodes from the core framework as input and
use them to extract probabilistic relationships among human knowledge concepts, with the
aim of automatically developing a “common sense knowledge base” similar to Cyc. The module
will be optional, since this is a specialized extended function that can be ignored in
normal use of the RelEx framework.
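The data mining algorithm is not yet fixed, so the Java sketch below swaps in a standard association-rule style measure to illustrate what “probabilistic relationships among concepts” could mean. For each ordered pair of concepts A and B that occur together in a sentence's semantic output, it counts co-occurrences (support) and estimates P(B | A) as support divided by the number of sentences containing A (confidence). It assumes the concepts of each sentence have already been extracted from the semantic nodes as a set of strings; ConceptAssociations, the thresholds and the corpus in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ConceptAssociations {
    /** One mined relationship: P(consequent | antecedent) estimated from co-occurrence counts. */
    static final class Association {
        final String antecedent;
        final String consequent;
        final int support;        // sentences containing both concepts
        final double confidence;  // support / sentences containing the antecedent

        Association(String antecedent, String consequent, int support, double confidence) {
            this.antecedent = antecedent;
            this.consequent = consequent;
            this.support = support;
            this.confidence = confidence;
        }

        @Override
        public String toString() {
            return String.format("%s -> %s (support=%d, confidence=%.2f)",
                    antecedent, consequent, support, confidence);
        }
    }

    /** Concept strings are assumed not to contain tab characters (used as a key separator). */
    static List<Association> mine(List<Set<String>> sentences, int minSupport, double minConfidence) {
        Map<String, Integer> single = new HashMap<>();
        Map<String, Integer> pair = new HashMap<>(); // key "a\tb" for the ordered pair (a, b)
        for (Set<String> concepts : sentences) {
            for (String a : concepts) {
                single.merge(a, 1, Integer::sum);
                for (String b : concepts) {
                    if (!a.equals(b)) pair.merge(a + "\t" + b, 1, Integer::sum);
                }
            }
        }
        List<Association> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : pair.entrySet()) {
            String[] ab = e.getKey().split("\t");
            int both = e.getValue();
            double confidence = (double) both / single.get(ab[0]);
            if (both >= minSupport && confidence >= minConfidence) {
                result.add(new Association(ab[0], ab[1], both, confidence));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> corpus = List.of(
                Set.of("rain", "umbrella", "street"),
                Set.of("rain", "umbrella"),
                Set.of("rain", "cloud"),
                Set.of("sun", "beach"));
        for (Association a : mine(corpus, 2, 0.6)) System.out.println(a);
    }
}
```

Pairs are counted in both directions because confidence is asymmetric: if every sentence mentioning umbrella also mentions rain, umbrella -> rain has confidence 1.0 while rain -> umbrella can be much lower.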






5. Non – Functional Requirements
5.1 Performance Requirements

Since the system is expected to be used in applications such as chat bots, text critiquing
and information retrieval, execution time is very important. The existing RelEx2Frame
implementation produces 20 FrameNet outputs in an average of 500 ms. Since the objective of
SeMap is to replace RelEx2Frame, it is required to perform better than this existing level.
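The quoted baseline corresponds to 500 ms / 20 = 25 ms per FrameNet output on average. Assuming the comparison is made per output (the exact measurement procedure is not stated), the Java sketch below times a mapping function over a set of input sentences and reports the mean wall-clock time per produced output against that figure. LatencyCheck and the mapper signature are hypothetical, and the stub mapper in main only stands in for the real framework.

```java
import java.util.List;
import java.util.function.Function;

final class LatencyCheck {
    /** Quoted RelEx2Frame baseline: 20 FrameNet outputs in 500 ms on average. */
    static final double BASELINE_MS_PER_OUTPUT = 500.0 / 20;

    /**
     * Runs the mapper once over every input and returns the mean wall-clock time per
     * produced output, in milliseconds. The mapper returns the frame outputs for one sentence.
     */
    static double msPerOutput(Function<String, List<String>> mapper, List<String> inputs) {
        long outputs = 0;
        long start = System.nanoTime();
        for (String sentence : inputs) {
            outputs += mapper.apply(sentence).size();
        }
        long elapsed = System.nanoTime() - start;
        if (outputs == 0) throw new IllegalStateException("mapper produced no outputs");
        return elapsed / 1e6 / outputs;
    }

    public static void main(String[] args) {
        // stand-in mapper for illustration only; a real check would call the SeMap framework
        Function<String, List<String>> stub = s -> List.of("^1_Existence:Place(" + s + "," + s + ")");
        double ms = msPerOutput(stub, List.of("table", "room", "garden"));
        System.out.printf("%.4f ms/output (baseline %.1f ms/output): %s%n",
                ms, BASELINE_MS_PER_OUTPUT, ms < BASELINE_MS_PER_OUTPUT ? "faster" : "not faster");
    }
}
```

A single timed pass includes JVM warm-up effects; for numbers meant to be compared with the RelEx2Frame baseline, a harness such as JMH (the Java Microbenchmark Harness) with warm-up iterations, run on the same machine and corpus as the baseline, would give more reliable results.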

Performance requirements for the statistical learning AI are currently difficult to define,
as the performance measures can only be identified after a significant amount of research
has been carried out and at least an initial design for phase 2 of the project has been
finalized.



5.2 Software Quality Requirements

Project SeMap needs to adhere to the following software quality requirements for successful
completion.

•   The system should provide reasonably accurate results for relatively simple sentences.

•   The system should be designed to facilitate extensibility for further research and
    development.

•   The system is required to interact with the RelEx dependency relationship extractor;
    thus its input interface should be compatible with the output of RelEx.

•   Standard Java coding conventions should be followed to support code readability and
    maintainability.



