					         QUESTION ANSWERING SYSTEM FROM FAQ PAGES

 

 
A dissertation submitted to the University of Manchester for the degree of MSc in Informatics in the
Faculty of Engineering and Physical Sciences.




                                              2010




                               Anastasia Tsoutsoumpi

                            School of Computer Science
TABLE OF CONTENTS 

Abstract

Declaration

Copyright Statement

Acknowledgements

1.  Introduction
   1.1   Motivation
   1.2   Aim and Objectives
   1.3   Structure of the dissertation

2.  Background
   2.1   Introduction to Question Answering Systems
   2.2   Review of Question Answering Systems
   2.3   Conclusion

3.  System Analysis and Design
   3.1   Characterization of questions
   3.2   Requirements of the system
   3.3   General architecture of the system
   3.4   Creation of the dictionary
   3.5   Design of the database
   3.6   Establishing the theme and class of a question
   3.7   Design of patterns
   3.8   Reasoning and extraction of answers
   3.9   Design of the graphical user interface

4.  Implementation Notes
   4.1   Technologies
   4.2   From a software engineering point of view
   4.3   Structure of the system
   4.4   Functionality of Genia Tagger
   4.5   Data Structures
   4.6   MySQL Database

5.  Results and Evaluation
   5.1   Evaluation Environment
   5.2   Results and Evaluation
   5.3   Discussion

6.  Conclusion and Future Work

REFERENCES

APPENDIX I

APPENDIX II

APPENDIX III

APPENDIX IV


   Word Count:  13757 
 
LIST OF TABLES  
 
Table 2.1 Question Answering System with FAQ answer injection

Table 3.1 RES themes of the questions

Table 3.2 Classes of the questions

Table 3.3 Functional requirements of the system with associated risks and effects

Table 3.4 Non-functional requirements of the system with associated risks and effects

Table 3.5 General risks assigned to the whole system

Table 3.6 Example1 for the establishment of theme(s) and class(es) of a question

Table 3.7 Example2 for the establishment of theme(s) and class(es) of a question

Table 4.1 A view of the system’s basic modules

 
LIST OF FIGURES 
 
Figure 3.1: Abstract model of the system’s architecture

Figure 3.2: Abstract representation of the prototype

Figure 3.3: Architecture of the system

Figure 3.4: Creation of the dictionary

Figure 3.5: Conceptual model of the database

Figure 3.6: Graphical User Interface

Figure 3.7: Graphical User Interface - QA example

Figure 4.1: 3-TIER model of the RES SYSTEM

Figure 4.2: Data flow model of the RES system

Figure 4.3: Noun phrases, theme(s) and class(es) for the question What is ocean energy

Figure 4.4: Pattern for the question What is ocean energy

Figure 4.5: Noun phrases, theme(s) and class(es) for the question Which is the cost of a photovoltaic cell

Figure 4.6: Pattern for the question Which is the cost of a photovoltaic cell

Figure 4.7: Sample output of Genia shallow parsing tool

Figure 4.8: Identifying the theme(s) and class(es) with threshold for terms equal to 0.2

Figure 4.9: Screenshot-1 taken from the MySQL database

Figure 4.10: Screenshot-2 taken from the MySQL database

 

 
ABBREVIATIONS 

FAQ    FREQUENTLY ASKED QUESTIONS 

GUI     GRAPHICAL USER INTERFACE 

POS    PART OF SPEECH TAGGING 

QA      QUESTION ANSWERING 

RES     RENEWABLE ENERGY RESOURCES 

WRB  Wh‐adverb (how, where, when) 

WDT   Wh‐determiner (what, which) 

 
 
 
 
 
 
 
 
 
 
 
 

 
                                             ABSTRACT 

Environmental pollution has increased the importance of renewable energy sources, leading to an
increased demand for information on both renewable energy sources and environmental issues.
However, the present state of technology renders the reliable acquisition of such information
problematic. This dissertation addresses this problem by proposing a methodology for the
development of a question answering prototype for the domain of renewable energy resources (RES)
based on frequently asked questions. Various solutions have been suggested for question answering
systems in restricted domains, but none has been developed for the area of renewable energy
resources. This dissertation describes the creation of a question answering prototype, called RES,
discusses the relevant research on question answering systems and delineates the full life cycle of
the prototype developed. The proposed methodology for the implementation of the system is based on
the semi-automatic creation of a dictionary for renewable energy resources. The system uses a corpus
of frequently asked questions selected after an extensive search on the Web. These questions are
used as the system’s knowledge repository. The processing of the questions is based on three
fundamental concepts: the theme(s), the class(es) and the pattern. We identified 6 themes, each
describing a different type of renewable energy such as solar energy or hydropower, and 14 classes,
each expressing a different aspect of RES such as technology or finances. Patterns are created by
analyzing the output of the Genia shallow parsing tool. To establish the theme(s) and class(es) of
each question, the question is compared with the terms of the dictionary for renewable energy
resources using 3-grams. For the retrieval and ranking of answers, “reasoning” is applied: a set of
rules built on the concept of intersection between the theme(s), the class(es) and the pattern for
any two questions. The system has been evaluated with a small number of questions. The precision of
the system was between 30% and 40%. Furthermore, the dictionary identifies topics for further
development.




 
                                         DECLARATION 


No portion of work referred to in the dissertation has been submitted in support of an application for
another degree or qualification of this or any other university or other institute of learning.




 
                                                     COPYRIGHT 

i. Copyright in text of this dissertation rests with the author. Copies (by any process) either in full, or of
extracts, may be made only in accordance with instructions given by the author. Details may be obtained
from the appropriate Graduate Office. This page must form part of any such copies made. Further copies (by
any process) of copies made in accordance with such instructions may not be made without the permission
(in writing) of the author.

ii. The ownership of any intellectual property rights which may be described in this dissertation is vested in
the University of Manchester, subject to any prior agreement to the contrary, and may not be made available
for use by third parties without the written permission of the University, which will prescribe the terms and
conditions of any such agreement.

iii. Further information on the conditions under which disclosures and exploitation may take place is
available from the Head of the School of Computer Science.








 




 
                                  ACKNOWLEDGEMENT 
                                                   
Several people contributed towards the realization of this project. First and foremost I would like to

mention and warmly thank my supervisor Dr Goran Nenadic for his unflinching support and patience

and for prompting me towards a very rewarding and rich topic. I would also like to thank Professors

Nikolaos Karidas and Alexandros Karakos for the support and encouragement that they gave me in

my decision to pursue a postgraduate degree in the University of Manchester. Last but not least I

would like to thank my parents Giannis and Dimitra and my brother Spiros for their unlimited

support and goodwill throughout the year and even before that; their trust and love have been a

constant inspiration during this year.


                                                   




 
                                          CHAPTER 1
                                     INTRODUCTION




1.1 Motivation

Environmental concerns have become increasingly acute in the past few years. Rising temperatures

and natural disasters have made even the most skeptical aware of these emerging problems and have

sparked a rising interest in the importance and the usage of renewable energy resources. The need

and search for renewable energy resources has become even more acute due to the rapid increase in

urbanization in emerging economies and the consequent higher demand for energy consumption in

various forms. Furthermore, the precariousness of seeking non-renewable sources in an increasingly

vulnerable environment often leads to environmental disasters. Consequently, renewable energy is

now regarded as a plausible choice not just for environmentalists but also for governments and

companies [1,2].


At the same time, debates about the usage of renewable energy sources are becoming increasingly

valuable to the broader public, and in some cases have led politicians to adapt their agendas in order

to address such issues [3].


    This interest has been prompted by a variety of reasons. The first is profound climatic change,

especially soaring temperatures and climate-related catastrophes such as Hurricane Katrina in 2005.

Additionally, accidents such as the Gulf of Mexico oil spill have further underlined how

environmentally damaging non-renewable sources are. Finally, in 2010 the increasing scarcity and

difficulty of extracting non-renewable energy sources led to soaring prices.




 
Rising prices have further contributed to the understanding of the need to adapt to new forms of

energy, since they have had a direct impact on the everyday lives of people around the world [4].


In contrast to coal and oil, renewable sources such as solar and wind power are virtually

inexhaustible and environmentally friendly. Other sources, such as geothermal energy, have an equally

low carbon dioxide emission level. Renewable sources also have the potential of generating a variety

of fuels. For instance, biomass and algae can generate significant amounts of biofuel. Many

European countries are decidedly turning towards clean, renewable energy sources and they have set

as a future target the covering of their needs in energy from renewable sources, investing in projects

like wind farms [5,6].


    It can therefore be argued that the importance of renewable energy sources for the world’s

economies, and the increasing demand for relevant information among internet users, have rendered

the construction of a question answering system for renewable energy resources important for

everyday users who are not experts and are interested in getting well-targeted answers to their

questions. A simple Google¹ search on ‘renewable energy resources’ returned 6,810,000 results,

far too many to read. Also, to the best of our knowledge, there are no published

Question Answering systems related to renewable energy sources.


This lack leads people who seek relevant data and information to use search engines such as

Google or systems like Yahoo Answers². The dissertation will discuss the extent to which such a

system can be implemented. It will further suggest a methodology for the implementation of a

question answering prototype and will also offer an overall evaluation of this system. The

following paragraphs will discuss the aim and the objectives of this master’s project.




                                                            
1  http://www.google.com
2  http://answers.yahoo.com
 
1.2 Aim and objectives

The aim of this dissertation is to design and build a question answering system for the domain of

renewable energy resources which will be based on a corpus of frequently asked questions. The

objectives of the dissertation are listed below.


    1) Development of a terminology for the domain of renewable energy resources.

    2) Development of a corpus of frequently asked questions for the field of renewable energy

       resources.

    3) Design and implementation of a prototype of the question-answering module.

    4) Evaluation.




1.3 Structure of the dissertation


The dissertation is structured as follows.


The second Chapter will discuss the background of question answering systems, and will also

review the work that has been done in this field.


The third Chapter will describe the initial stages of the software lifecycle of the question

answering system to be developed. These stages are concerned with the analysis and the design

of the system. The chapter will initially discuss the system requirements; it will furthermore

describe analytically its architecture and explain in depth the methodology that was followed

during its implementation.




 
Chapter four will discuss the implementation of the system. It will give technical information

concerning, for example, the programming environment and the shallow parsing software tool

that was used. It will furthermore describe several interesting points which arose during the

course of the system’s implementation.


The fifth Chapter will present the results of the question answering prototype which has been

developed. It will furthermore analyse the results and make a general evaluation of the

implemented system.


The sixth and final Chapter will discuss the extent to which the aim of the project has been

realized, present the learning outcomes and make suggestions for future developments

of the project.


Four appendices can be found in the dissertation. The first appendix includes definitions for all

the classes in the dictionary of renewable energy resources. The second contains a

representative sample of the renewable energy sources dictionary.


The third appendix lists a series of questions from the total corpus of frequently asked questions

and answers that was semi-automatically created. The fourth appendix includes the

questions which were used for the initial evaluation of the prototype.




All the data and tools are available at: http://gnode1.mib.man.ac.uk/projects/RES




 



 
                                                                CHAPTER 2
                                                               BACKGROUND




Nowadays the World Wide Web is packed full of RES¹ information. Consequently, everyday users

who, more often than not, approach the tool with a layman’s knowledge of abstract or theoretical

topics, find it increasingly difficult to find answers to their questions when they use search engines

like Google, because the information they require is scattered. For this reason, many question

answering systems have been developed and various approaches have been proposed for the

extraction of the most suitable answers to the users’ queries. This Chapter gives an overview of these

approaches.




2.1 Introduction to question answering systems

The origins of question answering systems can be traced back to 1961, with Baseball and Lunar

being some of the most popular systems of this period [8]. Question answering is a process in

which a query is expressed in natural language and the returned answers are retrieved from relevant

documents or from knowledge in selected sources. These answers are sentences retrieved from the

documents, which are qualified and prioritized. Question answering systems can be classified into

two general categories: open domain question answering systems and closed domain

systems. The former are designed to answer any question, whereas the latter provide answers to

domain-specific questions [8].




                                                            
1  A Google search for ‘renewable energy resources’ returned 6,810,000 results.
 
A typical question answering system consists of components for question analysis, source retrieval,

answer extraction and answer presentation [8]. The question analysis module is used to identify the

type of question. The source retrieval module extracts information from documents, web pages or

databases. The answer extraction module selects the most relevant answers to the question posed to

the system. Finally, the answer is displayed with the aid of a graphical user interface.


Many of the published systems were evaluated at TREC². The repository of such systems is a large

collection of documents. The majority of systems participating in this conference try to adapt the

design of their systems so as to achieve the best possible results. An example of such a system is

IBM’s statistical question answering system [9], which is reported to have improved its results over

the years by applying query expansion. This process is carried out by adding a pre-defined number of

words to the initial question for the selection of answers.


Initially, the questions asked in early systems were factoid. These are questions which

start with words such as who, what, when or where. However, modern systems deal with a variety of

questions. In [8] the different types of questions are listed. There are several question types such as

temporal or spatial questions, definitional and descriptional questions, biographical questions,

questions asking about opinions, as well as multilingual and multimedia questions. The most

common text processing techniques used by question answering systems are the following:

automatic term recognition, shallow parsing and part-of-speech tagging. “Automatic term

recognition is the extraction of technical terms from special language corpora with the use of

computers” [10]. Shallow parsing is defined as the process of recognizing syntactic chunks in a

sentence. For instance, a noun phrase is a common type of chunk. Chunks can be defined as

combined parts of speech. “Part of speech tagging is the task of labeling each word in a sentence

with its appropriate part of speech” [11].


                                                            
2  http://trec.nist.gov
 
2.2 Review of question answering systems

The majority of question answering systems attempt to analyze the questions and retrieve

answers from a corpus of documents. For the processing of questions, many systems use a thesaurus

such as WordNet in order to find keywords in a question [12]. At the same time, the initial query posed

to the system is expanded in order to find an answer. It is believed that the expanded query may

contain an answer word. A system based on this principle was the IBM statistical system [9]. Also,

the selection of the answer in this particular system was based on the theory of maximum entropy [13].

The theory of maximum entropy is also used by the PiQASso question answering system in order to split the

sentences in the documents which are considered to be candidate answers [14].


    However, there are question answering systems which use simple approaches for answer retrieval.

The Answer Bus system [15] is such an example. Answer extraction is based on the similarity of

words between the questions and the answers. It was evaluated by using 200 questions from TREC.

For this system, the average percentage of successful results was 65%.


Another system which uses a rather straightforward methodology for the processing of the questions

and the extraction of the answers is the YorkQA prototype, which is based on syntactic and semantic

information for answer retrieval [16]. This system is based on named entity recognition and on

document indexing. The results of the system’s evaluation were relatively low: the system achieved a

score of 18% of successful answers. The authors suggested that a possible cause of error was that

separate modules returned wrong results.


Additionally, some systems combine the lexical and syntactic analysis of the question with its

semantic representation. An example of such a system is the Javelin question answering system [17].

In this system, the identification of the type of the questions is attempted. The system was finally

evaluated by counting the percentage of successful answers according to the type of questions posed.



 
For instance, questions about locations gave more successful results compared to those asking for a

person’s name.


The method of n-grams [11] for finding an approximate match between two strings is widely

used in question answering systems [18]. Some of them use n-grams in order to find a candidate

answer among a corpus of answers [19]. Also, in the implementation of some other systems like

AskMSR [20], n-grams are the ‘tool’ for retrieving answers from data which is available online. This

system applies query reformulation for the processing of questions. For the AskMSR system the

extraction of answers was based on a decision tree [21].
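
As an illustration of approximate string matching with n-grams, the following minimal Python sketch compares two strings through their sets of character 3-grams. The padding scheme, the Dice coefficient and the function names are assumptions chosen for the example; they are not taken from any of the systems cited above.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a lower-cased string.

    The string is padded with n-1 spaces on each side (an assumption of this
    sketch) so that word beginnings and endings also produce n-grams.
    """
    pad = " " * (n - 1)
    padded = f"{pad}{text.lower().strip()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient between the n-gram sets of two strings (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))
```

Two strings that share most of their 3-grams, such as "solar energy" and "solar energies", obtain a score close to 1, whereas unrelated strings obtain a score close to 0. A threshold on this score can then decide whether two terms are treated as a match.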


Apart from n-grams, other text mining approaches are also used in question answering systems.

These are, for example, sequence mining algorithms; these algorithms are often used to answer

definitional questions. They discover definition patterns from the web, by obtaining all maximal

frequent sequences of words from the set of definitions. Furthermore they identify the best candidate

answer, by obtaining all maximal frequent sequences of words from the set of descriptions [22].


In addition to the systems described above, some question answering systems combine different

approaches. The QED and Wee system is one example. More specifically, the system combines deep

linguistic analysis (QED system) and shallow parsing (Wee system) [23]. The contribution of this

system is the assumption that deep linguistic analysis can improve the quality of exact answers.


Many of the question answering systems mentioned above use the MRR³ as a metric for their

evaluation. Such systems are Javelin, AskMSR and IBM’s statistical question answering

system. However, some others use precision and recall metrics for evaluation. Precision is defined as

the percentage of questions answered correctly out of the total number of questions asked. Recall is the

percentage of questions with an answer in the database for which that answer is retrieved.



                                                            
3  Mean Reciprocal Rank
 
In order for this metric to be applied, questions of the same type should be considered. These are the

metrics that are also used by the FAQ Finder system [24].
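
The three metrics mentioned above can be sketched in a few lines of Python. Precision and recall follow the definitions given in this section; the Mean Reciprocal Rank follows its standard definition (the average of 1/rank of the first correct answer, counting 0 when no correct answer is returned). The function and parameter names are illustrative assumptions.

```python
from typing import Optional, Sequence


def precision(correct: int, asked: int) -> float:
    """Fraction of all asked questions that were answered correctly."""
    return correct / asked if asked else 0.0


def recall(retrieved: int, answerable: int) -> float:
    """Fraction of questions with an answer in the database whose answer was retrieved."""
    return retrieved / answerable if answerable else 0.0


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all questions.

    Each entry is the 1-based rank of the first correct answer for one
    question, or None when no correct answer was returned (contributes 0).
    """
    if not first_correct_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_correct_ranks if rank is not None)
    return total / len(first_correct_ranks)
```

For example, if the first correct answer appears at ranks 1, 2 and 4 for three questions and is missing for a fourth, the MRR is (1 + 1/2 + 0 + 1/4) / 4.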


The FAQ Finder system uses files of frequently asked questions. This system is based on vector similarity

between the questions in the files and the query posed to the system. The pairs of questions and

answers are scattered in files. The basic metrics which are used for the extraction of the answers are

the frequency of common terms in the pairs of question-answers and the semantic similarity among

the pairs’ vectors. This system is also based on the use of a thesaurus, WordNet, in order to relate a

question to the answer. Furthermore, each question from the FAQ file is compared with the question

posed by the user and a score is assigned to it. Finally, the first five questions are returned to the

user. The evaluation of the FAQ Finder system gave very good results in terms of precision (88%) and

satisfactory results in terms of recall (23%).
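
The general idea of scoring each stored question against the user's query and returning the five best matches can be sketched as follows. This is a simplified illustration, assuming plain term-frequency vectors and cosine similarity; it omits the thesaurus-based semantic similarity and is not the actual FAQ Finder implementation. All names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector of a lower-cased text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_faq(query: str, faq: list[tuple[str, str]], top_k: int = 5) -> list[tuple[float, str, str]]:
    """Score every (question, answer) pair against the query; return the top_k best."""
    query_vector = term_vector(query)
    scored = [
        (cosine(query_vector, term_vector(question)), question, answer)
        for question, answer in faq
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

Each returned tuple holds the similarity score, the stored question and its answer, so the caller can display the best-matching questions in ranked order.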


Frequently asked questions are also used for the evaluation of open-domain question answering

systems [25]. For each question five potential answers are considered. Also, for the answers in the

FAQ corpus patterns based on regular expressions are created. However, the creation of patterns

cannot be done automatically. Field experts need to create these patterns. The basic idea of the

proposed methodology is the “injection” of an answer into a random document as shown in Figure

2.1. This document includes a question-answer pair. Subsequently, the question included in

the document is used for the evaluation of the system. The purpose of this process was to observe if

the system was able to find the correct answer, given the fact that for each injected question a track

was kept for its placement in the document. The system does not always return the answer from the

correct document. However, the returned answer may still be correct. The design of the system was

based on a modular approach.




 
           Figure 2.1: Question Answering System with FAQ answer injection (taken from [25])




2.3 Conclusion

This Chapter has referred to a selection of question answering systems that were found after an

extensive search on the published and unpublished works in the domain of question answering

systems. The question answering system which will be constructed will not be a repetition of any of

the above approaches. Instead, we will propose a methodology in which new ideas are injected and

implemented. However, as a means of construction, some common techniques (for example n-grams)

will be used. Also, its evaluation will be based on precision, similarly to the FAQ Finder system.




 




 


                                        CHAPTER 3
                  SYSTEM ANALYSIS AND DESIGN




This chapter analyses the question answering system prototype. More specifically, it presents the

requirements and the architecture of the system along with the methodology that was followed for its

creation. It also delineates the design of the prototype. Starting from the input question which a user

poses to the system, it will describe all the subsequent steps of the question’s processing, up to the

extraction of the most suitable answers.


 


3.1 Characterisation of questions

Renewable energy resources are typically distinguished by the following energy types: geothermal

energy, wind energy, solar energy, ocean energy, hydropower and biomass. Each of the above six

energy types can be derived from a range of natural sources which are found in a variety of

geographical locations around the world. Renewable energy resources are also associated with

technology, and their usage is governed by regulations. It is therefore apparent that renewable

energy has many different aspects. In the framework of this project, a dictionary for renewable

energy resources was semi-automatically created to model the RES domain. In this dictionary all the

terms related to the above energy types were classified in order to express the various different

aspects of renewable energy. We differentiated 14 different aspects that are listed in Appendix I.




 


Each question is characterized by three concepts: theme, class and pattern. The theme sets the

context of a question and indicates the type of renewable energy that is discussed. The class sets

the content of a question and refers to the aspect(s) of renewable energy sources which are discussed

in the question. For the processing of the questions in our system, we have identified 6 potential themes

and 14 classes. The latter have been engineered after a careful study of the dictionary for renewable

energy resources. The themes are shown in Table 3.1 and the classes in Table 3.2.




       Geothermal energy    Wind energy    Solar energy    Ocean energy    Hydropower    Biomass
                                  Table 3.1: RES themes of the questions




       Connectivity and transmission to the grid        Commercial energy production 
       Suitable areas for energy production             Worldwide known resources 
       Geographical locations                           Origin 
       Physical Quantities                              Measurement Units 
       Benefits to the environment                      Environmental issues 
       Technology                                       Usage 
       Rules and Regulations                            Finances 
                                    Table 3.2: Classes of the questions 




Definitions for all classes are provided in Appendix I and samples of the dictionary are provided in
Appendix II. The third concept that characterizes each question is its pattern. A pattern is both a
syntactic and a semantic representation of the question. It is generated by analysing the question
with the aid of a shallow parsing tool. Three example questions are shown below, each with its theme,
class and pattern. Section 3.7 describes the notion of patterns in detail.




     Example 1: What is the cost of solar power?
     •   Theme: Solar energy
     •   Class: Finances (derived from the term cost)
     •   Pattern: [Wh-Word]-[verb]-[noun phrase]-[noun phrase] // [What]-[be]-[cost]-[noun phrase]

     Example 2: What is wind energy?
     •   Theme: Wind energy
     •   Class: It cannot be found
     •   Pattern: [Wh-Word]-[verb]-[noun phrase] // [What]-[be]-[wind energy]

     Example 3: Which is the difference between ocean energy and hydropower?
     •   Theme: Ocean energy, hydropower
     •   Class: It cannot be found
     •   Pattern: [Wh-Word]-[verb]-[noun phrase]-[noun phrase]-[noun phrase] //
         [Which]-[be]-[difference]-[ocean energy]-[hydropower]




3.2 Requirements of the system

Before describing the architecture of the system it is important to identify its requirements and

present an analysis of potential risks. At this point it will be useful to repeat the objectives of the

system to be built.

     •   Development of a terminology system for the domain of renewable energy resources.

     •   Development of a corpus of frequently asked questions (FAQ) pages for the field of renewable energy resources.

     •   Design and implementation of a prototype of the question-answering module.

     •   Evaluation of the system.






The requirements of the system can be divided into functional and non-functional. The
non-functional requirements are not directly related to the specific functions delivered by the system [7].
The risk types relate to the requirements of the prototype to be built, to the time required to
develop the software and to the integration of software components. Each type of risk occurs with a
certain probability, which may be low, moderate or high, and its effects can range from
insignificant through tolerable to serious. Tables 3.3 and 3.4 describe the risks of the functional and
non-functional requirements respectively.


We believe that the risk associated with the application of reasoning has a high probability.
It may materialise if the requirements for the extraction of noun phrases or the identification of
theme(s) and class(es) cannot be met, which would result in failure at the last stage of the
implementation of the RES system. As far as the non-functional requirements are concerned, we
consider that the probability of a component producing faulty output may be high. However, the
effects should be tolerable, since we decided to build the system following a modular approach.
We therefore expect that it will be feasible to identify any errors that occur during
the implementation process.


Table 3.5 describes some more general risks of the whole design. These are the possibility that the
requirements change, the difficulty that may arise during the integration of the various
software components and the underestimation of the time required to develop the software. We
believe that the most serious problem will occur if we underestimate the time needed for
the development of each software component.






                      FUNCTIONAL REQUIREMENTS                              PROBABILITY      EFFECTS 
                                                                              OF RISK 
     Collection of terms for the field of renewable energy resources         MODERATE     INSIGNIFICANT 
     through a wide research on various sources ‐ classification of  
     the above terms and creation of the dictionary of renewable  
     energy resources 
     Design of the database to store the dictionary and the questions          LOW           SERIOUS 
     Receive the input question posed by the user                              LOW           SERIOUS 
     Extraction of the main noun phrases from the input question             MODERATE        SERIOUS 
     Identification of the theme(s)  and class(es)  of the questions by      MODERATE        SERIOUS 
     comparing the noun phrases with the terms in the dictionary of 
     renewable energy resources 
     Extraction of question’s pattern                                        MODERATE        SERIOUS 
     Application of reasoning for the selection of the most related           HIGH             SERIOUS 
     answers to the input question ‐ weights should be assigned to 
     the candidate questions 
     The first three answers should be returned to the user in                LOW             TOLERABLE 
     descending order 
     Design of a graphical user interface and integration  of all the         LOW           INSIGNIFICANT 
     modules 
                Table 3.3: Functional requirements of the system with associated risks and effects 




                NON‐FUNCTIONAL REQUIREMENTS                           PROBABILITY         EFFECTS 
                                                                           OF RISK 
     Repairability                                                          LOW           TOLERABLE 
     Software Reliability                                                   HIGH          TOLERABLE 
     Modularity of the system                                               LOW           TOLERABLE 
            Table 3.4: Non‐functional requirements of the system with associated risks and effects 






                  RISK                        PROBABILITY                        EFFECTS 
      Change of requirements may                    LOW                            SERIOUS 
      occur 
      Integration of software                   MODERATE                        TOLERABLE 
      components cannot happen 
      Time required to develop                     HIGH                            SERIOUS 
      the software is 
      underestimated 
                           Table 3.5: General risks assigned to the whole system




3.3 General architecture of the system

The question answering system to be built will function as follows. The question posed to the
system will be processed by a question answering engine. The dictionary of renewable energy terms
will be used for the processing of the question and the extraction of the candidate answers. The
answers will be retrieved from the repository of frequently asked questions. Figure 3.1 illustrates
this abstract model of the system.




          [Figure: components WEB, FAQ CORPUS, QA ENGINE, DICTIONARY and USER]

                           Figure 3.1: Abstract model of the system’s architecture





Figure 3.2 shows an abstract graphical representation of the whole system. Each question is

processed following three basic stages: the extraction of noun phrases, the establishment of its theme

and class, and the creation of its pattern. The next step is the application of reasoning for the retrieval

of answers from the database and their ranking. Finally, the results are returned to the user.




          [Figure: INPUT QUESTION; noun phrases, theme and class, question pattern; REASONING (with the FAQ CORPUS); RETRIEVAL/RANKING OF ANSWERS]

                           Figure 3.2: Abstract representation of the prototype




Figure 3.3 depicts the system’s architecture. Each input question is analyzed with a shallow parsing
tool. The output of the tool is used for the extraction of noun phrases and the establishment of the
question’s theme(s), class(es) and pattern. A lookup in the dictionary takes place so that
the noun phrases can be compared to the RES themes¹ stored in the dictionary of renewable
energy resources. Furthermore, the noun phrases are compared to the terms included in all 14 classes
of the dictionary. These comparisons give the theme(s) and class(es) of the input
question. Additionally, the question’s pattern is compared to the question patterns already
stored in the database. The results of all the above comparisons are integrated and then combined
through the application of reasoning to extract ranked answers, which are returned to the user.

¹ Geothermal, solar, wind, ocean energy, hydropower and biomass, and their synonyms



          [Figure: components USER, GUI, QUESTION, SHALLOW PARSING (noun phrases, patterns), RES CATEGORIES, LOOK UP, COMPARISON, DICTIONARY, THEME/CLASS, INTEGRATION, QUESTION PATTERNS, QA REPOSITORY, REASONING and RANKED ANSWER]

                                        Figure 3.3: Architecture of the system

                                                             




3.4 Creation of the dictionary

One of the most important steps in the construction of an RES question answering system is the
creation of a dictionary of terms that are relevant to renewable energy resources. The steps followed
for the manual creation of the glossary of renewable energy terms are listed below and illustrated in
Figure 3.4.

      •   Exploring the Web and other resources for the renewable energy types and associated
          synonyms. Renewable energy is classified into six types: geothermal energy, solar energy,
          wind energy, hydropower, biomass and ocean energy.




       •      Searching for scientific documents and articles with the Google search engine. A collection
              of four sources for every energy type was the basis for the initial extraction of terms. This
              stage was also useful for gaining an understanding of the domain of renewable energy
              resources. A further Web search was then conducted, using each term together
              with the renewable energy type it was related to. For example, the term reservoir, which
              was found in documents about geothermal energy, was queried in Google as reservoir
              geothermal energy. A large proportion of the terms were found during this process. These terms
              were extracted semi-automatically² and systematically classified according to the attributes
              they described.



       •      Expansion of the dictionary. A specific format was used in order to query the terms found in

              previous steps on the search engine. The format was [semantic type - type of energy] and was

              applied to all semantic types. An example is [large scale wind farms - wind energy]. The

              returned result gave new terms such as intermittency.



       •      During the process, various Web sources were used for the retrieval of additional terms.

              Sources that were used include, among others, sites which provide technical information as

              well as sites from industry or web locations about RES around the world. The foremost aims

              of this process were the expansion of the dictionary and its validation.



       •      The terms were validated against the vocabulary found in the corpus of electronic
              texts. More specifically, the frequency of a term’s appearance and the context in which it was
              found were the metrics used for validation. The validation was based on
              observation and on the experience gained by studying a number of sources.

                                                            
² The author’s background in the field of electrical engineering was helpful to this process.



                       
          [Figure: 6 energy types; initial search; initial terms; expanded terms & dictionary; validation; dictionary]

                                 Figure 3.4: Creation of the dictionary




3.5 Design of the database

Figure 3.5 illustrates the conceptual model of the database. The database design is based on the
entity-relationship model.




                                                                                  

                             Figure 3.5: Conceptual model of the database  

                                                         



The main entities are listed below.


       •   The term entity stores all the terms found in the RES dictionary.

       •   The energy type entity stores the 6 RES themes described in Section 3.1.

       •   The synonym entity stores the synonyms of the above energy types.

       •   The class entity stores the 14 different aspects of RES described in Section 3.1.

       •   The pattern entity stores the pattern (semantic) corresponding to each question of the corpus.

       •   The question entity stores the content of each question of the corpus.




The relationships between term and class, and between term and energy type, are M-N. This means
that a term can be related to more than one energy type and to more than one class. Conversely,
a class or an energy type can include any number of terms.

Similarly, a question can have more than one class and a particular class can contain any number
of questions. However, each question has only one pattern. An energy type can have more than
one synonym, while each synonym belongs to exactly one energy type.
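
The cardinalities described above can be made concrete with a small in-memory model. The following is only a sketch under stated assumptions: the class and method names (ResModel, ResClass, FaqQuestion, link) are hypothetical and do not come from the prototype, and the actual system stores these entities in database tables rather than Java objects.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative in-memory view of the entities and cardinalities of Section 3.5 (names are hypothetical). */
final class ResModel {

    static final class EnergyType {
        final String name;
        final Set<String> synonyms = new LinkedHashSet<>(); // 1-N: a synonym belongs to one energy type
        final Set<Term> terms = new LinkedHashSet<>();      // M-N with Term
        EnergyType(String name) { this.name = name; }
    }

    static final class ResClass {
        final String name;
        final Set<Term> terms = new LinkedHashSet<>();            // M-N with Term
        final Set<FaqQuestion> questions = new LinkedHashSet<>(); // M-N with FaqQuestion
        ResClass(String name) { this.name = name; }
    }

    static final class Term {
        final String text;
        final Set<EnergyType> energyTypes = new LinkedHashSet<>();
        final Set<ResClass> classes = new LinkedHashSet<>();
        Term(String text) { this.text = text; }
    }

    static final class FaqQuestion {
        final String content;
        final String pattern; // each question has exactly one pattern
        final Set<ResClass> classes = new LinkedHashSet<>();
        FaqQuestion(String content, String pattern) {
            this.content = content;
            this.pattern = pattern;
        }
    }

    /** Records both directions of a term / energy type association. */
    static void link(Term term, EnergyType type) {
        term.energyTypes.add(type);
        type.terms.add(term);
    }

    /** Records both directions of a term / class association. */
    static void link(Term term, ResClass cls) {
        term.classes.add(cls);
        cls.terms.add(term);
    }

    /** Records both directions of a question / class association. */
    static void link(FaqQuestion question, ResClass cls) {
        question.classes.add(cls);
        cls.questions.add(question);
    }
}
```

In a relational schema, each of the three link operations would typically correspond to a row in a junction table, whereas the pattern and the synonym owner would be plain columns.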




3.6 Establishing the theme and class of a question

The establishment of the theme(s) and class(es) of a question is based on the noun phrases
it contains, which are extracted with a shallow parsing tool. All the extracted noun
phrases are compared with the terms in the dictionary of renewable energy resources. Initially, the
aim is to find an exact match with the RES categories. For instance, if a noun phrase is solar
power, we look in the database for a term named solar power. If such a term is found under the theme
solar energy, the noun phrase belongs to that theme and thus the question’s theme is solar energy.



If it is not possible to find an exact match, we compare all the noun phrases - one by one - to all the

terms stored in the dictionary. This time, the aim is to find an approximate match. For this purpose,

the method of 3-grams is used and a threshold is set.


We then retrieve all the terms whose similarity to the noun phrase is greater than or equal to the
threshold, and keep the one with the maximum similarity. If that similarity is equal to 1, an exact
match was found and a term identical to the noun phrase exists in the database. If it is less
than 1, we have an approximate match: the term that is most similar to the noun phrase. In both
cases, the database is searched in order to find the theme(s) and the class(es) which correspond to
this particular term.
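
The approximate matching step can be sketched as follows. The text does not state how the 3-gram similarity is computed, so this sketch assumes the Dice coefficient over sets of character 3-grams; the class name TrigramMatcher and its methods are hypothetical. Because the formula is an assumption, the values it produces will not necessarily coincide with the similarities reported in Tables 3.6 and 3.7.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of approximate term matching with character 3-grams (Dice coefficient is an assumption). */
final class TrigramMatcher {

    /** Returns the set of character 3-grams of the lower-cased, trimmed string. */
    static Set<String> trigrams(String text) {
        String s = text.toLowerCase().trim();
        Set<String> grams = new HashSet<>();
        if (s.length() < 3) {
            // Too short to split: treat the whole string as a single gram.
            if (!s.isEmpty()) {
                grams.add(s);
            }
            return grams;
        }
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    /** Dice coefficient over 3-gram sets: 1.0 for identical strings, 0.0 when no 3-gram is shared. */
    static double similarity(String a, String b) {
        Set<String> gramsA = trigrams(a);
        Set<String> gramsB = trigrams(b);
        if (gramsA.isEmpty() || gramsB.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(gramsA);
        common.retainAll(gramsB);
        return 2.0 * common.size() / (gramsA.size() + gramsB.size());
    }

    /** Returns the term most similar to the noun phrase, or null if no term reaches the threshold. */
    static String bestMatch(String nounPhrase, List<String> terms, double threshold) {
        String best = null;
        double bestScore = -1.0;
        for (String term : terms) {
            double score = similarity(nounPhrase, term);
            if (score >= threshold && score > bestScore) {
                best = term;
                bestScore = score;
            }
        }
        return best;
    }
}
```

A low threshold makes the lookup tolerant of partial phrases such as Buoy for Buoy method, at the cost of more spurious matches; the appropriate value would have to be tuned on the dictionary.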


The same process is repeated for all the noun phrases which are extracted by the shallow parsing

tool. All the theme(s) and the class(es) which are returned from each noun phrase separately give the

theme(s) and the class(es) of the question.
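
The aggregation over noun phrases amounts to a set union, which can be sketched as below. The names ThemeClassAggregator, Entry and Result are hypothetical, the dictionary is represented as a plain map rather than database tables, and the matching of noun phrases to terms is assumed to have been done already. The sample entries are copied from Table 3.7.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: the question's themes and classes are the union over all matched terms. */
final class ThemeClassAggregator {

    /** Hypothetical dictionary entry: the themes and classes recorded for one term. */
    static final class Entry {
        final List<String> themes;
        final List<String> classes;
        Entry(List<String> themes, List<String> classes) {
            this.themes = themes;
            this.classes = classes;
        }
    }

    /** Themes and classes established for a whole question. */
    static final class Result {
        final Set<String> themes = new LinkedHashSet<>();
        final Set<String> classes = new LinkedHashSet<>();
    }

    /** Collects the themes and classes of every matched dictionary term. */
    static Result characterise(List<String> matchedTerms, Map<String, Entry> dictionary) {
        Result result = new Result();
        for (String term : matchedTerms) {
            Entry entry = dictionary.get(term);
            if (entry == null) {
                continue; // a noun phrase without a dictionary match contributes nothing
            }
            result.themes.addAll(entry.themes);
            result.classes.addAll(entry.classes);
        }
        return result;
    }

    /** Two entries copied from Table 3.7, for illustration only. */
    static Map<String, Entry> exampleDictionary() {
        Map<String, Entry> dictionary = new HashMap<>();
        dictionary.put("cost", new Entry(Arrays.asList("all energy types"), Arrays.asList("finances")));
        dictionary.put("Buoy method", new Entry(Arrays.asList("ocean energy"), Arrays.asList("technology")));
        return dictionary;
    }
}
```

For the terms cost and Buoy method, this plain union yields the classes finances and technology, and the themes all energy types and ocean energy. Table 3.7 reports only all energy types as the theme, presumably because it subsumes ocean energy; that collapsing step is not modelled here.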


The following examples are used to illustrate the process for the establishment of the theme(s) and

the class(es) of a question. For the question What is a geothermal closed loop system the process is

described in Table 3.6. For the question Which is the cost of Buoy the process is described in Table

3.7.


                            Q1: What is a geothermal closed loop system?
         Extracted noun phrase(s) from                  geothermal closed loop system 
         the question
         Term in the dictionary                         geothermal closed loop system 
         Similarity                                                  1.0 
         Theme in the dictionary                             geothermal energy 
         Class in the dictionary                                technology 
         Theme of the question                               geothermal energy 
         Class of the question                                   technology 

            Table 3.6: Example 1 for the establishment of theme(s) and class(es) of a question 






                                    Q2: Which is the cost of Buoy?
        Extracted noun phrase(s)                 cost                         Buoy 
        from the question
        Term in the dictionary                     cost                   Buoy method 
        Similarity                                 1.0                        0.266 
        Theme in the dictionary            all energy types               ocean energy 
        Class in the dictionary                 finances                   technology 
        Theme of the question                               all energy types 
        Class of the question                             finances, technology 

            Table 3.7: Example 2 for the establishment of theme(s) and class(es) of a question




3.7 Design of patterns

Apart from the theme(s) and the class(es), each question is also characterized by its pattern. We

believe that the comparison between the user’s question and the questions of the corpus stored in the

database should not rely only on the similarity of theme(s) and class(es). The reason is that input

questions may not include any of the terms stored in the dictionary of renewable energy resources.

Moreover, a question posed to the system may start with words such as which or where, and it may
contain verbs such as use, install or maintain, adverbs like easily and adjectives like hazardous,
all of which indicate the specific information the user is looking for. Ignoring such information
may therefore yield an answer that is not relevant to the input question. For this reason, we
decided to create a pattern for each question that captures all the information described above.
A pattern is a syntactic representation of the question and consists of five fields: the
‘wh-words’, the ‘verb’, the ‘adverb’, the ‘noun phrases’ and the ‘adjectives’. Each of these fields
has a semantic value. A question may include all or only some of these fields, and each field may
contain more than one value. To establish the patterns, the output of the shallow parsing tool is
analyzed.



The aim of this analysis is to obtain the part of speech that corresponds to each word of the
question. The steps below describe how the output of the tool is used.


       •   The words that are recognized as question forms, e.g. WRB or WDT, are inserted in the field
           named ‘wh-words’.

       •   The words that are recognized as verbs, e.g. VBZ, are inserted in the field named ‘verb’.

       •   The words that are recognized as adverbs, e.g. ADV, are inserted in the field named ‘adverb’.

       •   The words that are recognized as noun phrases, e.g. NP, are inserted in the field named ‘noun
           phrases’.

       •   The words that are recognized as adjectives, e.g. ADJP, are inserted in the field named
           ‘adjectives’.
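
The mapping from tags to pattern fields listed above can be sketched as follows. This is a minimal illustration, not the prototype's code: the names PatternBuilder and Tagged are hypothetical, the tagged units are written by hand instead of being read from the shallow parsing tool, and each unit's text is assumed to be already in base form, as in the examples of Section 3.1. Besides the tags named in the list, the sketch also accepts WP, other VB* tags, RB* and JJ*, which is an assumption based on the Penn Treebank tagset.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: fills the five pattern fields from tagged units of a question. */
final class PatternBuilder {

    /** One tagged unit (word or phrase); the tagger call that would produce it is omitted. */
    static final class Tagged {
        final String text;
        final String tag;
        Tagged(String text, String tag) {
            this.text = text;
            this.tag = tag;
        }
    }

    /** Returns the pattern as a map from field name to the values found for that field. */
    static Map<String, List<String>> build(List<Tagged> units) {
        Map<String, List<String>> pattern = new LinkedHashMap<>();
        for (String field : new String[] {"wh-words", "verb", "adverb", "noun phrases", "adjectives"}) {
            pattern.put(field, new ArrayList<String>());
        }
        for (Tagged unit : units) {
            String field = fieldFor(unit.tag);
            if (field != null) {
                pattern.get(field).add(unit.text);
            }
        }
        return pattern;
    }

    /** Maps a tag to a pattern field, or null if the tag is not used in patterns. */
    static String fieldFor(String tag) {
        if (tag.equals("WRB") || tag.equals("WDT") || tag.equals("WP")) {
            return "wh-words";
        }
        if (tag.startsWith("VB")) {
            return "verb";
        }
        if (tag.equals("ADV") || tag.startsWith("RB")) {
            return "adverb";
        }
        if (tag.equals("NP")) {
            return "noun phrases";
        }
        if (tag.equals("ADJP") || tag.startsWith("JJ")) {
            return "adjectives";
        }
        return null;
    }
}
```

Fields for which the question provides no value simply stay empty, which matches the statement that a question may include only some of the five fields.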




3.8 Reasoning and extraction of answers

The extraction and ranking of the answers is the step that follows the extraction of noun
phrases and the establishment of the theme, class and pattern. Reasoning consists of a set of
‘rules’ that detect the stored questions most similar to the one posed by the user. The reasoning
is based on the existence or non-existence of an intersection between the input question and each
question in the database.




The rules are the following:


1. Check whether there is a non-empty intersection between the theme(s) of the input question and
those of a given question in the database. If so, weight x1 is given to the match.

2. Check whether there is a non-empty intersection between the class(es) of the input question and
those of a given question in the database. If so, weight x2 is given to the match.

3. Check whether there is a non-empty intersection between the ‘wh-words’ of the input question and
the ‘wh-words’ of a given question in the database. If so, weight x3 is given to the match.

4. Check whether there is a non-empty intersection between the verbs of the input question and the
verbs of a given question in the database. If they share exactly one verb, weight x4 is given to
the match.

5. If the input question and the given question in the database share more than one verb, weight
x5 is given to the match.

6. Check whether there is a non-empty intersection between the adverbs of the input question and the
adverbs of a given question in the database. If so, weight x6 is given to the match.

7. Check whether there is a non-empty intersection between the noun phrases of the input question
and the noun phrases of a given question in the database. If so, weight x7 is given to the match.

8. Check whether there is a non-empty intersection between the ‘adjectives’ of the input question
and the ‘adjectives’ of a given question in the database. If so, weight x8 is given to the match.

9. Sum up all weights to get the total weight of the database question.

10. Order the questions in descending order of total weight and return the three top-ranked questions.
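
The rules can be sketched in code as a weighted sum followed by a sort. The sketch below rests on several assumptions: the names ReasoningScorer, Question, common, score and topThree are hypothetical, each characterised question is held as a map from field name to a set of values, and the weights x1 to x8 are passed in as an array because their values are not given here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of the weighted-intersection reasoning; the weights x[0]..x[7] stand for x1..x8. */
final class ReasoningScorer {

    /** A characterised question: themes, classes and the five pattern fields. */
    static final class Question {
        final String text;
        private final Map<String, Set<String>> fields = new HashMap<>();

        Question(String text) {
            this.text = text;
        }

        /** Sets the values of one field and returns this question, so calls can be chained. */
        Question with(String field, String... values) {
            fields.put(field, new HashSet<>(Arrays.asList(values)));
            return this;
        }

        Set<String> get(String field) {
            Set<String> values = fields.get(field);
            return values == null ? Collections.<String>emptySet() : values;
        }
    }

    /** Number of values the two questions share in the given field. */
    static int common(Question a, Question b, String field) {
        Set<String> shared = new HashSet<>(a.get(field));
        shared.retainAll(b.get(field));
        return shared.size();
    }

    /** Total weight of a stored question with respect to the input question. */
    static double score(Question input, Question stored, double[] x) {
        double total = 0.0;
        if (common(input, stored, "theme") > 0) total += x[0];
        if (common(input, stored, "class") > 0) total += x[1];
        if (common(input, stored, "wh-words") > 0) total += x[2];
        int verbs = common(input, stored, "verb");
        if (verbs == 1) total += x[3];      // exactly one common verb
        else if (verbs > 1) total += x[4];  // more than one common verb
        if (common(input, stored, "adverb") > 0) total += x[5];
        if (common(input, stored, "noun phrases") > 0) total += x[6];
        if (common(input, stored, "adjectives") > 0) total += x[7];
        return total;
    }

    /** Stored questions in descending order of total weight, truncated to the first three. */
    static List<Question> topThree(Question input, List<Question> corpus, double[] x) {
        List<Question> ranked = new ArrayList<>(corpus);
        ranked.sort((a, b) -> Double.compare(score(input, b, x), score(input, a, x)));
        return ranked.subList(0, Math.min(3, ranked.size()));
    }
}
```

Since each rule only tests whether an intersection exists, two stored questions can easily receive the same total weight; in this sketch such ties keep their original corpus order, which is one of several possible tie-breaking choices.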






3.9 Design of the graphical user interface

The interaction between the user and the system takes place through a simple graphical user
interface. Figure 3.6 shows a sketch of the graphical user interface and Figure 3.7 shows a
question posed to the system together with the returned answers.



          [Figure: window titled RES SYSTEM with an INSERT QUESTION field and an ENTER button]

                           Figure 3.6: Graphical User Interface



                                                                                                  


          [Figure: window titled RES SYSTEM showing: The question was: What is geothermal energy? Answer 1: Geothermal means… Answer 2: Because its source… Answer 3: Unlike ordinary systems… followed by an INSERT QUESTION field and an ENTER button]

                           Figure 3.7: Graphical User Interface - QA example

 



 
                                                               CHAPTER 4
                                                IMPLEMENTATION NOTES




This chapter describes the implementation of the question answering system prototype. It presents
how the system has been implemented in terms of programming environment, database usage,
software components and shallow parsing tools. It also discusses issues that occurred during the
development of the software as well as interesting points of the design, and presents some snippets
of code.


                                                                      


4.1 Technologies

The system has been implemented in the Java programming language, on the J2SE¹ platform, on the
Linux operating system. Java is a powerful and contemporary platform. It supports effective
manipulation of large volumes of data through its Collections framework, including ArrayList.
Furthermore, it is portable, allowing the code to run on any operating system for which a Java
runtime is available, and it allows the integration of various software components, for example a
component responsible for connecting to a web service. Finally, it offers an effective error control
mechanism, which is very important for an application that encompasses a large number of components.


The database used is MySQL.² MySQL is an open-source, lightweight database server which
can perform effectively even when handling a large volume of data. Given that the question
answering system implemented was only a prototype and not a large-scale application, the storage
capabilities offered by the MySQL server were adequate.
                                                            
¹ http://www.oracle.com/us/sun/index.html
² http://dev.mysql.com

 
The selected shallow parsing tool is Genia Tagger.³ It was initially selected because it can be
easily integrated with Java. Moreover, compared to other parsing tools, such as the Stanford
parser,⁴ its output is easier for the software developer to understand and analyze. The fact that
Genia was designed for the biomedical field motivated us to test it on a very different area, that
of renewable energy resources.




4.2 From a software engineering point of view

The question answering prototype represents the business logic tier of a 3-tier
architecture. The upper tier is the graphical user interface, through which the user poses a
question to the system and receives the returned answers. The lower tier is the database, where the
dictionary of renewable energy sources and the corpus of questions and answers are stored.
Figure 4.1 illustrates this architecture. The input question is transferred from the graphical user
interface to the RES Question Answering Engine. At this layer, the question is processed and its
theme(s), class(es) and pattern are established. For this purpose, the engine interacts with the
lower tier, where the dictionary of RES and the corpus of frequently asked questions are stored.
The three questions with the highest score are then selected from all the questions stored in the
database. Finally, these are returned to the user through the graphical user interface, which is the
upper tier of the application.


In addition, the data-flow model of the system is presented. A data-flow model shows how data
flows through a sequence of processing steps [7]. The model is illustrated in Figure 4.2. The input
data of the system is the user’s question. The output data are the three top-ranked answers returned
to the user.


                                                            
3
     http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger
4
     http://nlp.stanford.edu/software/lex-parser.shtml

 
                                                      Graphical User Interface




                                             RES Question Answering Engine




                                                   
                                                       FAQ                   RES 
                                                      CORPUS             DICTIONARY
                                                                                       




                                         Figure 4.1: 3‐TIER model of the RES SYSTEM 




          [Figure 4.2 is a data-flow diagram. Its processing steps are: Input question; GENIA Tagger;
          Analysis of GENIA’s output; Identification of theme(s) & class(es); Establishment of pattern;
          Reasoning; Ranking of questions; Descending order; 3 first answers. Its data stores are the
          RES Dictionary and the FAQ Corpus.]

                                        Figure 4.2: Data flow model of the RES system




 
The RES system was not only tested as a whole; it was also tested throughout the various stages of
its implementation. More specifically, a significant number of tests were conducted for each
individual module of the system. The aim was to discover any defects in the software before the final
integration of all the system’s components. For instance, for the module responsible for analyzing the
output of the Genia tool, we checked whether the noun phrases, the verbs, the adverbs, the adjectives
and the WRB question forms were correctly extracted. The module responsible for establishing the
question’s theme(s) and class(es) was also thoroughly tested. Similarly, a number of tests took place
for the module which ranks the questions in the database.


After the individual components had been tested and integrated, the system was tested as a whole.
Our main concern was to ensure that the input question is processed correctly so as to give the three
top answers as output. However, it should be underlined that during this series of tests we did not
attempt to evaluate the precision of the returned answers. Rather, we checked whether or not the
requirements initially set for the RES system had been met.




4.3 Structure of the system

The system consists of several software components. These components are presented in Table 4.1
and all of them are Java classes. Each class calls a number of other Java classes. A modular approach
was selected for the implementation in order to re-use Java code and to achieve efficient error control
during each step of the implementation. Additionally, this approach gives us the opportunity to assess
the prototype gradually by judging the results of each separate component. The table has a dual
purpose: it underlines the sequential structure of the various components and it gives a global
‘picture’ of the implementation. The first column of Table 4.1 lists the basic Java classes which were
used. The second column describes the functionality of each class.

 
In the third column there is a ‘yes’ indication if the program communicates with the database in

order to read data and a ‘no’ indication when this is not the case. The fourth column includes a ‘yes’

answer when the output of each specific component is visible to the user and a ‘no’ answer

otherwise.




     SOFTWARE MODULE         DESCRIPTION                                 COMMUNICATION   OUTPUT VISIBLE
                                                                         WITH MySQL      TO THE USER
                                                                         DATABASE

     AnalyzeGenia.java       This module extracts noun phrases. Its      NO              NO
                             input is the output of Genia Tagger.

     FindTheme.java          This module finds the theme(s) and the      YES             NO
                             class(es) of the question. Its input is
                             the output of AnalyzeGenia.java. Also,
                             all the terms in the dictionary are
                             retrieved from the database and compared
                             with the noun phrases.

     Patterns.java           This module creates the pattern of the      NO              NO
                             input question. Its input is the output
                             of Genia Tagger and the output of
                             AnalyzeGenia.java.

     RankQuestions.java      This module ranks the questions in the      YES             NO
                             database by applying reasoning. Its input
                             is the output of FindTheme.java and
                             Patterns.java. Also, the patterns for
                             each question stored in the database are
                             retrieved together with the theme(s) and
                             class(es) of the corpus questions.

     SelectQuestions.java    This module puts the ranked questions in    NO              NO
                             order and selects the first three answers
                             corresponding to them. Its input is the
                             output of RankQuestions.java.

     Results.java            This module integrates all the above        YES             YES
                             modules.

     Display.java            This module creates a graphical user        YES             YES
                             interface, takes the input question and
                             returns the results.
     

                                Table 4.1: A view of the system’s basic modules 
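The ordering step attributed to SelectQuestions.java in Table 4.1 can be sketched as follows. This is only an illustration under stated assumptions, not the source code of the prototype: the class TopThreeSketch, the holder type ScoredQuestion and its fields are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class TopThreeSketch {

    // Hypothetical holder for a corpus question id and the score given by the ranking step.
    static class ScoredQuestion {
        final int questionId;
        final double score;

        ScoredQuestion(int questionId, double score) {
            this.questionId = questionId;
            this.score = score;
        }
    }

    // Sorts a copy of the ranked questions in descending order of score and keeps at most three.
    static List<ScoredQuestion> selectTopThree(List<ScoredQuestion> ranked) {
        List<ScoredQuestion> sorted = new ArrayList<ScoredQuestion>(ranked);
        Collections.sort(sorted, new Comparator<ScoredQuestion>() {
            public int compare(ScoredQuestion a, ScoredQuestion b) {
                return Double.compare(b.score, a.score); // higher score first
            }
        });
        return new ArrayList<ScoredQuestion>(sorted.subList(0, Math.min(3, sorted.size())));
    }
}
```

Sorting a copy leaves the list produced by the ranking step untouched, so it could still be inspected when a module is tested in isolation.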

                                                                

                                                                

                                                                



 
For all the questions of the corpus, the theme(s) and the class(es) to which they belong, along with
their pattern, should be established. The Java classes used for the preprocessing of these questions
and their storage in the database are listed below.


    •   StoreTheme.java, stores the theme(s) and the class(es) of the question. Its input is the output

        of AnalyzeGenia.java.


    •   StorePattern.java, stores the pattern of the question. Its input is the output of Patterns.java.


It should be noted that the processing and the storage of the questions took place before the
integration of the software modules described in Table 4.1. The RES prototype has been designed
and implemented to receive the user’s question through a graphical user interface. This graphical
interface is not used for storing additional questions in the corpus or for adding terms to the
dictionary of RES; this process takes place from the command line of the system. Consequently, we
decided to keep the pre-processing and the storage of frequently asked questions separate from the
presentation layer. However, in terms of software, the pre-processing of the corpus questions and
the processing of the user’s input question are identical.


Two examples follow, illustrating the extraction of noun phrases from a question, the creation of the
question’s pattern and the establishment of the theme(s) and class(es). The output of
AnalyzeGenia.java and the output of FindTheme.java for the question What is ocean energy are
shown in Figure 4.3. The output of Patterns.java for the same question is shown in Figure 4.4.
Similarly, the output of AnalyzeGenia.java and the output of FindTheme.java for the question Which
is the cost of a photovoltaic cell are shown in Figure 4.5. The output of Patterns.java for the same
question is shown in Figure 4.6.




 
    Figure 4.3: Noun phrases, theme(s) and class(es) for the question What is ocean energy 

                                                




                                                                                                 
                  Figure 4.4: Pattern for the question What is ocean energy

 
    Figure 4.5: Noun phrases, theme(s) and class(es) for the question Which is the cost of a photovoltaic cell 

                                                          




                   Figure 4.6: Pattern for the question Which is the cost of a photovoltaic cell

 
4.4 Functionality of Genia Tagger

The Genia Tagger is the shallow parsing tool that we use to extract the noun phrases from the input
question. Additionally, the creation of the patterns is based on the analysis of the tool’s output. A
sample output of the Genia Tagger is shown in Figure 4.7. The fourth column of the table indicates
the chunks. More specifically, B-NP marks the beginning of a noun phrase and I-NP marks a word
inside a noun phrase. Similarly, B-VP indicates the beginning of a verb phrase. The third column of
the table gives the part of speech of each word of the question. JJ is used for adjectives, DT for
determiners, NN and NNS for singular and plural nouns, IN for prepositions and WRB for
wh-adverbs, i.e. question forms such as How.




WORD        LEMMA       POS     CHUNK   DOMAIN-TAGS
How         How         WRB     B-NP    O
many        many        JJ      I-NP    O
turbines    turbine     NNS     I-NP    O
are         be          VBP     B-VP    O
needed      need        VBN     I-VP    O
for         for         IN      B-PP    O
a           a           DT      B-NP    O
wind        wind        NN      I-NP    O
park        park        NN      I-NP    O
?           ?           .       O       O
                          Figure 4.7: Sample output of Genia shallow parsing tool 
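The way noun phrases can be read off the chunk column is sketched below. The sketch assumes that each token line of the tagger output has already been split into its five columns; the class ChunkSketch and the method extractNounPhrases are hypothetical names, and the logic is not necessarily that of AnalyzeGenia.java, which may, for example, treat wh-words or determiners differently.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkSketch {

    // Each row is one token line of tagger output split into its columns:
    // word, lemma, POS tag, chunk tag, domain tag.
    static List<String> extractNounPhrases(List<String[]> rows) {
        List<String> phrases = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (String[] row : rows) {
            String word = row[0];
            String chunk = row[3];
            if (chunk.equals("B-NP")) {
                // A new noun phrase begins: close the previous one, if any.
                if (current.length() > 0) phrases.add(current.toString());
                current = new StringBuilder(word);
            } else if (chunk.equals("I-NP") && current.length() > 0) {
                // The word continues the noun phrase that is currently open.
                current.append(' ').append(word);
            } else {
                // Any other chunk tag ends the open noun phrase.
                if (current.length() > 0) phrases.add(current.toString());
                current = new StringBuilder();
            }
        }
        if (current.length() > 0) phrases.add(current.toString());
        return phrases;
    }
}
```

Applied to the rows of Figure 4.7, this grouping yields the two chunks "How many turbines" and "a wind park".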




4.5 Data Structures

The most interesting point of the design was the use of Array Lists to manipulate the data read from
the database. Because of the large number of both the terms and the questions stored, it was deemed
impractical to read the data and store it in simple String arrays.


 
Further to the above, the use of arrays caused many exceptions to be thrown by Java, as the size of
the data was not known at the beginning of the design. Sets and Hash Maps were used for the
comparison of each separate component of the patterns of the questions. Sets in Java follow the basic
rule of sets as defined in mathematics; consequently, they do not allow duplicate entries. Another
reason for using Sets is that intersection is easily implemented in Java by applying operations to
Sets. A Hash Set5 is backed by an instance of a Hash Map; it supports iteration through the Set’s
elements as well as operations on data such as ‘add’ and ‘remove’.
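As a brief illustration of the intersection mentioned above, the following sketch uses the standard retainAll operation of java.util.Set. The class name IntersectionSketch and the sample words in the usage note are assumptions made for the example and are not taken from the prototype.

```java
import java.util.HashSet;
import java.util.Set;

class IntersectionSketch {

    // Returns the elements common to both sets without modifying either argument.
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<String>(a); // copy, because retainAll mutates its receiver
        common.retainAll(b);
        return common;
    }
}
```

For instance, intersecting the verb sets {produce, cost} and {cost, install} of two questions leaves the single shared verb cost.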


The functionality of SelectedTerm.java is based on the use of Array Lists. Its source code is
presented in Figure 4.8. It receives three input parameters. The first is a noun phrase extracted with
the aid of the Genia tool. The second is an Array List holding all the terms read from the dictionary
of RES. The third is the threshold set for the selection of terms from the dictionary of RES. The
comparison between the noun phrases and the terms in the dictionary is done with the use of
3-grams, a similarity metric for strings. If the comparison gives 1.0 as a result, there is an exact
match between the strings; if the result is less than 1.0, the match is only approximate. The
implementation of 3-grams was based on an existing, ready-to-use Java class.6
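One common way to turn 3-grams into a similarity value between 0 and 1 is the Dice coefficient over the sets of character 3-grams of the two strings. The sketch below assumes that formulation; the class actually used in the prototype is not reproduced here and may compute the value differently. TrigramSketch, trigrams and similarity are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

class TrigramSketch {

    // Collects the distinct character 3-grams of a string.
    static Set<String> trigrams(String s) {
        Set<String> grams = new HashSet<String>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    // Dice coefficient over 3-gram sets: 1.0 for identical strings, 0.0 when no 3-gram is shared.
    static double similarity(String a, String b) {
        if (a.equals(b)) return 1.0;
        Set<String> ga = trigrams(a);
        Set<String> gb = trigrams(b);
        if (ga.isEmpty() || gb.isEmpty()) return 0.0; // strings shorter than three characters
        Set<String> shared = new HashSet<String>(ga);
        shared.retainAll(gb);
        return 2.0 * shared.size() / (ga.size() + gb.size());
    }
}
```

Under this assumption, a noun phrase such as wind turbines scores slightly below 1.0 against the dictionary term wind turbine, which is the kind of approximate match that a threshold is meant to accept.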




                                                            
5
     http://download.oracle.com/javase/6/docs/api/java/util/HashSet.html 
6
     http://www.chime.ucl.ac.uk/blog/?p=335 

 
                                                                  
    import java.util.ArrayList;

    // npPhrase is a noun phrase extracted from a question; terms is an Array List holding all the
    // terms of the dictionary. Distance and findMax are other methods of the same class.
    // Returns the dictionary term most similar to npPhrase, or null if no term exceeds the threshold.
    public String SelectedTerm(String npPhrase, ArrayList<String> terms, double threshold)
    {
        ArrayList<Double> similarities = new ArrayList<Double>();   // similarity values of the selected terms
        ArrayList<String> selected_terms = new ArrayList<String>(); // the selected terms, in the same order

        // Iteration through the list of dictionary terms
        for (int i = 0; i < terms.size(); i++) {
            String str = terms.get(i);

            // Distance compares the noun phrase to each term by applying 3-grams
            double result = Distance(npPhrase, str);

            // The threshold is set to 0.2; terms are not acceptable at or below that value
            if (result > threshold) {
                similarities.add(result);
                selected_terms.add(str);
            }
        }

        // No dictionary term is similar enough to the noun phrase
        if (selected_terms.isEmpty()) {
            return null;
        }

        // A search takes place for the maximum similarity in the list;
        // the term stored at the same position is the one selected
        double maximum = findMax(similarities);
        int position = similarities.indexOf(maximum);
        return selected_terms.get(position);
    }




      Figure 4.8: Identifying the theme(s) and class(es) with threshold for terms equal to 0.2 




 
4.6 MySQL Database

In Chapter 3 the conceptual schema of the database was expressed as an ER diagram. The relational
schema of the database is presented below and all the tables created are described. There are 10
tables in the database. Table class stores all 14 classes found in the dictionary. Table
energy_type stores the 6 energy types and table synonyms_of_energytype stores all the possible
synonyms that an energy type may have. Table term stores all the terms in the dictionary. Table
pattern stores the FAQ questions’ patterns. Table question_has_class associates each question
with the class(es) to which it belongs, and table question_has_theme associates each question with
the theme(s) to which it belongs. Additionally, table term_belongsto_class associates each term with
the class(es) to which it belongs. Finally, table term_related_entype associates each term with the
theme(s) to which it belongs.

Figure 4.9 and Figure 4.10 below show screenshots taken from the MySQL database and describe
each separate table.




                       Figure 4.9: Screenshot-1 taken from the MySQL database

 

 
                                                                

    Figure 4.10: Screenshot-2 taken from the MySQL database




 

 




 
                                         CHAPTER 5
                       RESULTS AND EVALUATION




This chapter discusses the testing results for the question answering prototype that was implemented.
Tests were conducted during all the stages of the prototype’s implementation. However, because of
the limited time-frame it was not possible to perform a final evaluation with the participation of
annotators.




5.1 Evaluation Environment

The number of total questions stored in the database is 300. Also 1300 terms have been collected in

the dictionary of renewable energy resources. The weights assigned are listed below.


    •   Intersection of the ‘wh-words’ gives a weight equal to 2.

    •   Intersection of the theme(s) gives a weight equal to 8.

    •   Intersection which returns only one verb gives a weight equal to 1.3.

    •   Intersection which returns more than one verb gives a weight equal to 1.6.

    •   Intersection of the noun phrases gives a weight equal to 1.1.

    •   Intersection of the adjectives gives a weight equal to 0.9.

    •   Intersection of the adverbs gives a weight equal to 0.8.

    •   Intersection of the class(es) gives a weight equal to 0.6.


Also, the n-gram threshold was set to 0.2.
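The way these weights could be applied to a pair of questions is sketched below. Two points are assumptions made for the example: that the weights of the intersecting parts are simply added together, and that the parts of a question are held in a container such as the hypothetical QuestionParts class. The weight values themselves are those listed above.

```java
import java.util.HashSet;
import java.util.Set;

class ScoreSketch {

    // Hypothetical container for the parts of a question that take part in the comparison.
    static class QuestionParts {
        Set<String> whWords = new HashSet<String>();
        Set<String> themes = new HashSet<String>();
        Set<String> classes = new HashSet<String>();
        Set<String> verbs = new HashSet<String>();
        Set<String> nounPhrases = new HashSet<String>();
        Set<String> adjectives = new HashSet<String>();
        Set<String> adverbs = new HashSet<String>();
    }

    // Number of elements the two sets have in common.
    static int shared(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<String>(a);
        common.retainAll(b);
        return common.size();
    }

    // Adds the weight of every part for which the two questions intersect.
    // Summation is an assumption; only the verbs are counted, as one or more than one.
    static double score(QuestionParts input, QuestionParts stored) {
        double total = 0.0;
        if (shared(input.whWords, stored.whWords) > 0) total += 2.0;
        if (shared(input.themes, stored.themes) > 0) total += 8.0;
        int verbs = shared(input.verbs, stored.verbs);
        if (verbs == 1) total += 1.3;
        else if (verbs > 1) total += 1.6;
        if (shared(input.nounPhrases, stored.nounPhrases) > 0) total += 1.1;
        if (shared(input.adjectives, stored.adjectives) > 0) total += 0.9;
        if (shared(input.adverbs, stored.adverbs) > 0) total += 0.8;
        if (shared(input.classes, stored.classes) > 0) total += 0.6;
        return total;
    }
}
```

Under these assumptions, two questions that share a wh-word, a theme and exactly one verb, and nothing else, would receive 2 + 8 + 1.3 = 11.3.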




 
5.2 Results and Evaluation

The initial evaluation of the prototype was conducted with a very small number of questions. The
metric we used for the answers that the system returns is precision. We consider the outcome of the
system successful even when we get only one relevant answer to the question we posed. The overall
precision is the number of correctly answered questions divided by the total number of questions
posed to the system.
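The definition above can be written down as a small computation. In this sketch the class PrecisionSketch and its method are hypothetical, and the array passed to it would hold, for each question posed, the number of relevant answers among the three returned; no figures from the evaluation are reproduced here.

```java
class PrecisionSketch {

    // relevantInTopThree[i] is the number of relevant answers among the three returned for question i.
    // A question counts as correctly answered when at least one returned answer is relevant.
    static double precision(int[] relevantInTopThree) {
        if (relevantInTopThree.length == 0) return 0.0;
        int correct = 0;
        for (int n : relevantInTopThree) {
            if (n >= 1) correct++;
        }
        return (double) correct / relevantInTopThree.length;
    }
}
```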


The questions used can be found in Appendix IV. For each question which was correctly answered,
there was only one correct answer among the three top answers returned. One of the questions gave
no correct answers.




5.3 Discussion

The most important task was to characterize the overall results as good or bad or, alternatively, as
expected or unexpected. However, it should be noted that this assessment cannot be objective, as the
system was not tested either by annotators or by domain experts.


The reasoning approach we have proposed gave us reasonable results. We decided to give the highest

possible weight for the intersection of ‘wh-words’ between two questions. The reason is that such

words are dominant in all the questions of the corpus. The second highest weight was given to two

questions with common theme(s). This approach gives good results in the case that a term in the

input question exists in the dictionary and is not a common term for all or some of the energy types.

Furthermore, we consider the verbs more important than the noun phrases. We believe that a

relatively high percentage of precision can be related to this assumption.




 
Finally, despite the fact that adjectives and adverbs are given less weight, their role in the
extraction of reasonable answers is rather important. For questions which have the same ‘wh-words’,
the same theme and the same verbs, it is the adjectives and adverbs that determine the precision of
the extracted answers.


Also, the threshold of 0.2 that we selected for the identification of the theme and the class
identifies the theme(s) and the class(es) of a question with a precision of 70%. To a great extent
this percentage is due to ambiguous terms, which may belong to more than one theme or class.


Some examples make the above observation clearer. The term photovoltaic system belongs only to
solar energy, wind turbine only to wind energy and algae photo bioreactor only to biomass. On the
other hand, the word river is a common term for geothermal energy, biomass and hydropower. No
matter which theme it is related to, this term belongs to the class called suitable areas for energy
production.


However, a term may belong to more than one class. When this is the case, the term has a different
context in each class. For instance, the term plant can belong to the class suitable areas for energy
production and also to the class commercial energy production. In the first case it describes a living
organism. In the second its meaning is synonymous with power station. When the term has the first
meaning it belongs to the theme biomass. In the second case, its theme can be any of the six possible
themes, given that all kinds of energy can be produced in a power plant.


Consequently, the design should be improved to address such issues. Given that common terms
determine the theme(s) and class(es) of the questions, a solution could be the alteration of the rules
of “reasoning”. Additionally, as we do not use any thesaurus to identify synonyms for the noun
phrases which are extracted from the question, the precision of the answers may be affected.



 
Finally, it must also be underlined that the time-frame given for the implementation and the testing
of the system was limited. Therefore, there is a possibility of minor errors in the evaluation of the
final system, since it was not possible to pose to the system an adequate number of questions
covering the possible combinations of the various terms existing in the renewable energy sources
dictionary. Nevertheless, the individual software components of the system function correctly and
the system’s integration was achieved.




 




 
                                        CHAPTER 6
                   CONCLUSION AND FUTURE WORK


The aim of the MSc project was the development of a question answering prototype for renewable
energy resources based on FAQ pages. To the best of our knowledge, there was no previously
published work in the field of question answering systems for renewable energy. This fact was an
added challenge and a motive towards the realization of the dissertation’s aim. The methodology that
was developed was based on the construction of a semi-automatically created dictionary for
renewable energy resources and on the application of reasoning during the selection of the most
relevant answers from the database.


One of the more challenging issues that we dealt with during the realization of the project was the
extraction of noun phrases and the creation of question patterns, both of which were accomplished
with the use of a shallow parsing tool. A very significant number of tests had to be conducted with
the tool in order to construct an algorithm that would produce patterns correctly for any question.


In general, the initial objectives which we set for the project have been met.


    •   A dictionary of terms for the domain of renewable energy resources has been developed.

    •   A corpus of frequently asked questions for the field of renewable energy resources has been
        created.

    •   A prototype for the question-answering module has been designed and implemented.

    •   An evaluation of the system has been presented and further testing will identify any problems
        that might have been overlooked.




 
However, if I were to build the system again I would choose first of all to expand the dictionary of
renewable energy resources. Further to that, I would apply a combination of metrics for approximate
matching. For instance, 3-grams could be combined with edit distance. Finally, I would try to alter
the reasoning rules as far as the theme(s) and class(es) are concerned.
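A possible shape for such a combination is sketched below. The edit distance is the standard Levenshtein distance; the normalisation to a value between 0 and 1 and the plain average with a 3-gram score are assumptions made purely for illustration, and EditDistanceSketch and its methods are hypothetical names.

```java
class EditDistanceSketch {

    // Classic dynamic-programming Levenshtein distance (insertions, deletions, substitutions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Maps the distance to a similarity in [0, 1], so that it is comparable with a 3-gram score.
    static double editSimilarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }

    // Hypothetical combination: the plain average of a 3-gram score and the edit similarity.
    static double combined(double trigramScore, String a, String b) {
        return (trigramScore + editSimilarity(a, b)) / 2.0;
    }
}
```

Whether an average, a maximum or a weighted sum serves the selection of dictionary terms best would have to be established experimentally.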


As it was not possible to evaluate the prototype with the participation of annotators, exhaustive
testing should be conducted in the future. The purpose will be to estimate the precision and recall
for questions posed to the system both by domain experts and by users with a layman’s knowledge of
the topic.


However, it is certain that the proposed methodology can be enhanced by adding new parameters in
a future project based on the present dissertation’s assumptions. The key to the system’s success
would be the effective organisation of the dictionary, combined with the addition of a significant
number of new terms as well as the expansion of the questions and answers. The following
paragraphs develop some ideas for a future project.


1. The present system can be ‘transformed’ into an active learning system, i.e. a system that could
‘learn’ from the questions that each user poses by developing them, memorising them and storing
them in its database.


2. It could further use a thesaurus such as WordNet, which would enable the finding of synonyms for
the words contained in the user’s questions.


3. In regard to reasoning, the system checks only whether there is an intersection between each part
of the input question’s pattern and the equivalent part of the pattern of each question in the
database. If an intersection is found between the corresponding parts of the input and the stored
question, then a weight is assigned.




 
However, we do not count the number of intersected words, except for the intersection between
verbs. It is exactly at this point that the reasoning approach we proposed may change in the future.
More specifically, weights can be assigned according to the number of intersected words.


4. Potentially, the system may become a web application. In that case security issues and speed in

response to the user’s question should be taken into consideration.


5. Also, additional rules could be added in order to apply “reasoning” for ambiguous terms.




This dissertation has been a most challenging experience, since it gave me the chance to work within
a field of computer science that was new to me: text processing. Working within this framework I
learned a number of new skills, one of the most important of which was how to construct a dictionary
of terms useful for the processing of questions. Further to that, I became acquainted with a growing
and dynamic field of knowledge, that of renewable energy sources. I was also given the chance to get
involved in the full life cycle of software development. This was a very rewarding and useful
experience, since I learned how to identify the requirements for a system to be developed, how to
assign risks to its functional and non-functional requirements and how to meet the deadline by
planning my work efficiently. Moreover, I enhanced my knowledge of the Java programming
language. At the same time I learned how to design a database schema correctly, based on the entity
relationship model. Finally, the challenges and the problems that I was called to solve during the
various stages of the system’s implementation, together with the testing of the system, prepared me
well for my future career both as an engineer and as a software developer.


What probably makes the processing of natural language so interesting for a human is the difficulty
of analyzing it.




 
And what makes the processing of natural language so intriguing for the field of computer science is
the difficulty a human has in turning his or her perception of natural language into a software
application. My involvement in the development of the question answering prototype for renewable
energy resources, based on a corpus of frequently asked questions, was enough to convince me of
this fact.




 




 
REFERENCES 
 

[1] Renewable Energy Might be Slow to Spur U.S. Economy. Available at:
http://www.scientificamerican.com/article.cfm?id=renewable-energy-slow-boost.
Accessed on: 02/09/2010


[2] Increase Renewable Energy Sources Generate more Jobs. Available at:
http://www.alternative-energy-news.info/increase-renewable-energy-sources-generate-more-jobs.
Accessed on: 02/09/2010



[3] Australia May Boost Renewable Energy Projects. Available at:
www.environmentalleader.com/2010/08/23/greens-election-grains-may-boost-renewable-energy-projects/
Accessed on: 02/09/2010


[4] Miller P, Energy conservation. Available at:
www.ngm.nationalgeographic.com.2009/03/energy-conservation/miller-text/1.
Accessed on: 02/09/2010



[5] Exploring the benefits of wind power. Available at:
http://.www.ewea.org/index.php?id=1551. Accessed on: 02/09/2010
 

[6] Renewable energy sources what do we want to achieve? Available at:
http://ec.europa.eu/energy/renewables/index_en.htm. Accessed on: 02/09/2010



[7] Sommerville I, Software Engineering (Reading, Massachusetts, 2004)



[8] Maybury M, New Directions in Question Answering (Cambridge MA, 2004)




 
[9] Ittycheriah A, Franz M and Roukos S, IBM's Statistical Question
Answering System-TREC-10, Proceedings of the Tenth Text Retrieval Conference
(TREC 2001). Available at:
Comminfo.rutgers.edu/`muresan/IR/TREC/Proceedings/t10_proceedings/papapers/trec2001.pdf.
Accessed on: 02/09/2010


[10] Frantzi K.T and Ananiadou S, Automatic term recognition using contextual
cues, Proceedings of 3rd DELOS Workshop, 1997


[11] Manning C. D and Schütze H, Foundations of Statistical Natural Language
Processing (Cambridge MA, 1999)


[12] Nyberg E, Mitamura T, Carbonell J, Callan J, Collins-Thompson K,
Czuba K, Duggan M, Hiyakumoto L, Hu N, Huang Y, Ko J, Lira L. V,
Murtagh S, Pedro V and Svoboda D, The JAVELIN Question-Answering System,
Proceedings of TREC 12, 2002. Available at:
www.cs.cmu.edu/~hyifen/publication/TREC2003.pdf. Accessed on: 02/09/2010


[13] Berger A.L, Della Pietra S.A and Della Pietra V.J, A Maximum Entropy Approach
to Natural Language Processing, Computational Linguistics (1996), pp 39-71


[14] Attardi G, Cisternino A, Formica F, Simi M and Tommasi R, PiQASso:
Pisa Question Answering System, Proceedings of the Tenth Text REtrieval
Conference (TREC, 2001), pp. 599-607


[15] Zheng Z, AnswerBus Question Answering System 2002. Available at:
citeseerx.ist.psu.edu. Accessed on: 02/09/2010


[16] Alfonseca E, De Boni M, Jara-Valencia J.S and Manandhar S, A
prototype Question Answering System using syntactic and semantic information for
answer retrieval, Proceedings of the 10th Text Retrieval Conference (TREC, 2002), pp.
680-686




 
[17] Nyberg E, Mitamura T, Carbonell J, Callan J, Collins-Thompson K,
Czuba K, Duggan M, Hiyakumoto L, Hu N, Huang Y, Ko J, Lira L. V,
Murtagh S, Pedro V and Svoboda D, The JAVELIN Question-Answering System,
Proceedings of TREC 12, 2002. Available at:
www.cs.cmu.edu/~hyifen/publication/TREC2003.pdf. Accessed on: 02/09/2010


[18] Augusto L, Pizzato S and Molla-Aliod D, Extracting Exact Answers Using a
Meta Question Answering System, 2005. Available at: http://www.clt.mq.edu.au.
Accessed on: 02/09/2010


[19] Buscaldi D, Rosso P, Gomez-Soriano J.S and Sanchis E, Answering questions with
an n-gram based passage retrieval engine, Journal of Intelligent Information Systems
(2010), pp. 113-134



[20] Brill E, Lin J, Banko M, Dumais S and Ng A, Data-Intensive Question
Answering, Proceedings of the Tenth Text REtrieval Conference (TREC, 2001), pp.
393-400


[21] Berry M. W, Survey of Text Mining: Clustering, Classification and Retrieval
(New York, 2004)



[22] Denicia-Carral C, Montes-y-Gómez M, Villaseñor-Pineda L and García-Hernández
R, A Text Mining Approach for Definition Question Answering,
Proceedings of the Fifth International Conference on Natural Language Processing
(FinTAL 2006). Available at: citeseerx.ist.psu.edu. Accessed on: 02/09/2010


[23] Ahn K, Bos J, Clark S, Curran J.R, Dalmas T, Leidner J.L, Smillie M.B and
Webber B, Question Answering with QED and WEE at TREC-2004, Proceedings of
the Thirteenth Text Retrieval Conference (TREC, 2004). Available at:
trec.nist.gov/pubs/trec13/papers/uedinburg-syd.bos.qa.pdf. Accessed on: 02/09/2010



[24] Burke R.D, Hammond K.J, Kulyukin V.A, Lytinen S.L, Tomuro N and
Schoenberg S, Question Answering from Frequently Asked Question Files, AI
Magazine 18 (2), 1997, pp. 57-66




 
[25] Burch C and Leidner J, Evaluating Question Answering Systems Using FAQ
Answer Injection, Proceedings of the 6th Annual Computational Linguistics
Research Colloquium (CLUK-6) (Edinburgh, 2003)


 




 
APPENDIX I


ORIGIN: Includes all the terms which describe the natural origin of each separate
renewable energy type. For instance, volcano for geothermal energy, solar radiation
for solar energy and reservoirs for most of the energy types, organic matter for
biomass and water for hydropower and ocean power.



 SUITABLE AREAS FOR ENERGY PRODUCTION: Lists all the terms which
describe the areas in which resources for the production of each separate energy type
can be found. Some examples are rivers, valleys and hills. Furthermore this semantic
type contains terms which delineate the parameters that may determine the choice of
an area’s suitability for energy production. For example, altitude, wind speed, and
Weibull distribution.



PHYSICAL QUANTITIES: Comprises all the terms which describe quantities
related to the natural resources from which each separate energy type originates.
Examples are the kinetic energy of water and the temperature of the earth.



CONNECTIVITY and TRANSMISSION TO THE GRID: Displays all the terms
which describe the way in which the electrical energy produced in power plants or
farms (such as wind farms) is distributed to end consumers. These terms are not solely
technical. The word intermittent is an example. Further to that, this type contains
terms which describe whether a power station, a photovoltaic system or a farm is
connected to the grid or is stand-alone.



TECHNOLOGY: Encompasses all the words and phrases which are related to
systems, equipment and machinery which are used for energy production, along with
their components.




COMMERCIAL ENERGY PRODUCTION: Lists the terms which describe both
industrial large-scale energy production and smaller-scale production. Additionally,
this type includes terms which indicate processes like gasification or energy
conversion. The different types of produced energy, such as mechanical energy, are also
incorporated.

 
BENEFITS FOR THE ENVIRONMENT: Incorporates terms which indicate the
positive environmental impact of each renewable energy type.



 ENVIRONMENTAL ISSUES: Includes terms which indicate the negative
environmental impact of each renewable energy type.



USAGE: Encompasses words and phrases which describe the usage of each
renewable energy type.



 FINANCES: Lists financial terms. Most of them are common for all renewable
energy types.



RULES AND REGULATIONS: Includes terms related to laws and regulations which
apply to renewable energy. It also highlights various associations for each separate
energy type.



MEASUREMENT UNITS: Describes measurement scales and SI (International
system of units) units for each separate renewable energy type.



GEOGRAPHICAL LOCATIONS: Describes locations around the world where
natural resources are found or renewable energy production takes place, for instance
Germany and California. Additionally, this type contains phrases consisting of a word
describing the location and a term like river, dam, ocean, fall, mountain or valley.
Some examples are Columbia river, Niagara Falls and Pacific Ocean.



INTERNATIONALLY KNOWN RESOURCES: Includes ‘prominent’ renewable
energy resources or industrial systems around the world, such as Yellowstone
National Park or Carrizo solar farm.
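
As an illustration of how terms might be grouped under the semantic types above, the following minimal sketch stores a few of the example terms from this appendix and looks up the types of a given term. It is not the dictionary implementation of the prototype: the class SemanticDictionary, the enum SemanticType and all method names are hypothetical, and only three of the types are shown.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; names are not taken from the prototype.
class SemanticDictionary {

    // A subset of the semantic types described in Appendix I.
    enum SemanticType { ORIGIN, SUITABLE_AREAS, GEOGRAPHICAL_LOCATIONS }

    private final Map<SemanticType, Set<String>> terms = new EnumMap<>(SemanticType.class);

    // Register a term under a semantic type (stored trimmed and in lower case).
    void add(SemanticType type, String term) {
        terms.computeIfAbsent(type, k -> new HashSet<>()).add(normalize(term));
    }

    // Return every semantic type under which the term has been registered.
    Set<SemanticType> typesOf(String term) {
        Set<SemanticType> found = EnumSet.noneOf(SemanticType.class);
        for (Map.Entry<SemanticType, Set<String>> e : terms.entrySet()) {
            if (e.getValue().contains(normalize(term))) {
                found.add(e.getKey());
            }
        }
        return found;
    }

    private static String normalize(String term) {
        return term.trim().toLowerCase(Locale.ROOT);
    }

    // Build a tiny dictionary from example terms mentioned in this appendix.
    static SemanticDictionary sample() {
        SemanticDictionary d = new SemanticDictionary();
        d.add(SemanticType.ORIGIN, "volcano");
        d.add(SemanticType.ORIGIN, "solar radiation");
        d.add(SemanticType.SUITABLE_AREAS, "altitude");
        d.add(SemanticType.SUITABLE_AREAS, "wind speed");
        d.add(SemanticType.GEOGRAPHICAL_LOCATIONS, "Germany");
        d.add(SemanticType.GEOGRAPHICAL_LOCATIONS, "Niagara Falls");
        return d;
    }

    public static void main(String[] args) {
        System.out.println(sample().typesOf("Niagara Falls"));
    }
}
```

A set of types is returned rather than a single type because the same term could plausibly be registered under more than one semantic type.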




 
    APPENDIX II
 




                     

 




                     




 
                                     APPENDIX III


        GEOTHERMAL ENERGY: 

        1. Does a Geothermal system heat and cool? 

        2. Do I need separate earth loops for heating and cooling? 

        3. Is a geothermal heat pump difficult to install? 

        4. Do I need to increase the size of my electric service? 

        5. How efficient is a Geothermal system? 

        Questions Available at:  

         http://www.umrgeothermal.com/faqs.html   

        Accessed on: 02/09/2010 

         

        SOLAR ENERGY: 


    1. What is a solar photovoltaic system? 
    2. Is Ontario a good place to install a solar PV system? 
    3. How much is a solar PV system for my home? 
    4. How large is a typical commercial solar PV system? 
    5. How much electricity will a solar PV system produce? 
    6. Do I need any permits when I install a solar PV system? 
    7. How long does a solar PV system last? 
    8. Does the Ontario government provide any incentives to help me buy a solar PV? 
    9. Where do I buy a solar PV system? 
    10. Can I install a solar PV system myself? 


       Questions  Available at:   

http://www.mei.gov.on.ca/en/energy/renewable/index.php?page=solarpvc_faq#whatissolar

        Accessed on: 02/09/2010 


 
                             APPENDIX IV


Input Question: What is geothermal energy?


ANSWERS AND THEIR QUESTIONS:

A1: Geothermal means earth-heat. This heat can be captured from the earth’s interior
and can be harnessed to use as energy.

Q1: What is geothermal energy?




A2: Because its source is the almost unlimited amount of heat generated by the
Earth's core. Even in geothermal areas dependent on a reservoir of hot water, the
volume taken out can be reinjected, making it a sustainable energy source.

Q2: Why is geothermal energy a renewable resource?

A3: Unlike ordinary systems, Geothermal systems do not burn fossil fuel to generate
heat; they simply transfer heat to and from the earth to provide a more efficient,
affordable and environmentally friendly method of heating and cooling. Typically,
electric power is used only to operate the unit's fan, compressor and pump.

Q3: What makes a geothermal system different from ordinary systems?




 
Input Question: Can renewable energy resources used at home?


ANSWERS AND THEIR QUESTIONS:

A1: Because its source is the almost unlimited amount of heat generated by the
Earth's core. Even in geothermal areas dependent on a reservoir of hot water, the
volume taken out can be reinjected, making it a sustainable energy source.

Q1: Why is geothermal energy a renewable resource?




A2: A typical solar PV system ranges from $9,000 to $11,000 per kilowatt (kW)
installed. The least expensive system to install is a grid-tied system while the most
expensive is the off-grid system.

Q2: How much does a commercial solar PV system cost?




A3: Algae need CO2 for their growth. For capturing CO2, algae are fed with
industrial emissions after the removal of sulphur dioxide. Algae utilize the CO2 which
are dissolved in the water (bicarbonates) as a source of carbon and grows. To grow 1
T of algae about 1.8 T of CO2 is required.

Q3: How are algae used for capturing CO2?




 
Input Question: How long do alternative sources last?


ANSWERS AND THEIR QUESTIONS:

A1: Solar PV panels now have a life expectancy of more than 25 years. The inverter
(a component of the electrical system) requires replacement approximately every ten
years. As they age, solar PV systems produce less electricity.

Q1: How long does a solar PV system last?




A2: The PowerBuoy is an offshore wave energy converter, most of which is
submerged below the water's surface. Inside, a piston-like structure moves as the
PowerBuoy bobs with the rise and fall of the waves. This movement drives a
generator, producing electricity, which is sent to the shore by an underwater cable. An
OPT "power plant" will consist of an array of identical PowerBuoys that are
electrically connected to provide the desired power capacity.

Q2: How does the Power Buoy work?




A3: Generating hydroelectricity starts with the annual hydrologic, or water cycle,
providing seasonal rain and runoff from snow pack. The runoff from rain and snow
collects in lakes, streams and rivers and flows to dams downstream. The water funnels
through a dam, into a powerhouse and turns a large wheel called a turbine. The
turbine turns a shaft that rotates a series of magnets past copper coils in a generator to
create electricity. The water then returns to the river. From the powerhouse,
transmission lines carry electricity to communities.

Q3: How does hydropower work?




 
Input Question: What is the cost of renewable energy?


ANSWERS AND THEIR QUESTIONS:

A1: Over the last 20 years, the cost of electricity from utility-scale wind systems has
dropped by more than 80%. In the early 1980s, when the first utility-scale turbines
were installed, wind-generated electricity cost as much as 30 cents per kilowatt-hour.
Now, state-of-the-art wind power plants can generate electricity for less than 5
cents/kWh with the Production Tax Credit in many parts of the U.S., a price that is
competitive with new coal- or gas-fired power plants. The National Renewable Energy
Laboratory (NREL) is working with the wind industry to develop a next generation of
wind turbine technology. The products from this program are expected to generate
electricity at prices that will be lower still.

Q1: How much does wind energy cost?




A2: The most important factors in determining the cost of wind-generated electricity
from a wind farm are: (1) the size of the wind farm; (2) the wind speed at the site; and
(3) the cost of installing the turbines. Each of these factors can have a major
impact. On New England ridgelines, for example, wind farms are likely to be smaller,
to experience lower wind speeds, and to cost more to install than in the flat terrain of
northern Plains states. While wind power may cost less than 5 cents/kWh in the
northern Plains, it may cost 6-7 cents/kWh in New England.

Q2: Why does the cost of wind energy vary from place to place?




A3: Wind is the low-cost emerging renewable energy resource.

Q3: How do utility-scale wind power plants compare in cost to other renewable
energy sources?



