Docstoc

English Query

Document Sample
English Query Powered By Docstoc
					                 Natural Language Interfaces:
     Comparing English Language Front End and English Query




A thesis submitted in partial fulfillment of the requirements for the degree of Master of
                    Science at Virginia Commonwealth University




                                         By
                                   Richa A Bhootra




                                      Director:
                       Dr Lorraine Parker, Associate Professor
                        Virginia Commonwealth University
                                 Richmond, Virginia
                                  December, 2004
                                                                                     ii
                                Acknowledge ment


I would like to thank Dr Lorraine M. Parker for her assistance, without which this
thesis would not have been completed.

I would like to thank all the faculty and staff of the Departme nt of Compute r
Science at Virginia Commonwealth Unive rsity for everything I have learned here.
                                                                                                                                                                             iii
                                                                    Table of Contents

List of Figures.................................................................................................................... v
Abstract.............................................................................................................................. 1
Chapter 1 ........................................................................................................................... 1
   Introduction......................................................................................................................1
       1.1     Natural Language Interfaces ............................................................................................................... 1
       1.2 Overview of Pro ject ....................................................................................................................................... 3
       1.2     Fundamental steps involved in the conversion ................................................................................ 4
Chapter 2 ........................................................................................................................... 8
             English Query.............................................................................................................................................. 8
       2.1 An Overv iew of English Query.................................................................................................................... 8
       2.2 English Query Environ ment ......................................................................................................................... 9
       2.3 Steps to conversion......................................................................................................................................... 9
       2.4 Lexicon in English Query ........................................................................................................................... 10
       2.5 Semantic Dict ionary..................................................................................................................................... 12
       2.6 Synonyms....................................................................................................................................................... 12
       2.7 Working with Relat ionships ....................................................................................................................... 13
          2.7.1 Adding phrasings to a relationship ................................................................................................... 15
          2.7.2 Types of phrasing................................................................................................................................. 16
             Name/ID Phrasing .................................................................................................................................... 16
             Trait Phrasing ............................................................................................................................................ 17
             Adjective Phrasing ................................................................................................................................... 18
             Single ad jective phrasing ........................................................................................................................ 18
             Entity contains adjectives........................................................................................................................ 19
             Measurements ........................................................................................................................................... 19
             Subset Phrasing......................................................................................................................................... 20
             Verb Ph rasings in Relat ionships ............................................................................................................ 20
             Prepositional Phrasings ........................................................................................................................... 21
             Grouped Phrasings Examp les ................................................................................................................. 22
       2.8 Su mmary ........................................................................................................................................................ 23
Chapter 3 ......................................................................................................................... 25
   English Language Front End (ELF)................................................................................25
       3.1 Introduction ................................................................................................................................................... 25
       3.2 How are Lexicon and Semantic dictionary built in ELF? ..................................................................... 25
       3.3 Steps to Conversion...................................................................................................................................... 29
       3.4 Why does ELF perform better? .................................................................................................................. 31

Chapter 4 ......................................................................................................................... 37
   Comparing ELF and EQ.................................................................................................37
       4.1 Introduction ................................................................................................................................................... 37
       4.2 First Test ........................................................................................................................................................ 38
       4.3 Second Test ................................................................................................................................................... 39
          4.3.1 Examp le 1.............................................................................................................................................. 39
          4.3.2 Examp le 2.............................................................................................................................................. 40
          4.3.3 Results of second test .......................................................................................................................... 41
       4.4 Conclusion ..................................................................................................................................................... 42
                                                                                                                                   iv
Chapter 5 ......................................................................................................................... 43
   Future work ....................................................................................................................43
Appendix .......................................................................................................................... 44
   Appendix A.....................................................................................................................45
   Appendix B .....................................................................................................................57
   Appendix C.....................................................................................................................64
   Appendix D.....................................................................................................................68
References ........................................................................................................................ 73
                                                            v


                         List of Figures

1) The architecture of transformation ……………………………….. 5

2) SQL Project Wizard………………………………………………. 10

3) New Dictionary Entry…………………………………………….. 11

4) Adding synonym to customer_phone_entity……………………… 13

5) Adding new relationship page…………………………………….. 15

6) ELF analysis page…………………………………………………. 25

7) ELF custom analysis page…………………………………………. 26

8) ELF field selection page…………………………………………… 28

9) ELF Lexicon lookup………………………………………………. 29

10) Query box in ELF…………………………………………………. 30

11) First Intermediate result window………………………………….. 30

12) Second Intermediate result window……………………………….. 31

13) ER diagram………………………………………………………… 38

14) Results from experiment 1…………………………………………. 39

15 a) Results from EQ after making changes………………………….. 41

15 b) Overall results form experiment 2……………………………….. 41
                                       Abstract

There are many Natural Language Interfaces availab le for commercial use and each claim

to perform better than the other. The two most commonly used Interfaces, Microsoft

English Query and Access English Language Front End (ELF) were selected for

comparison. In this study, experiments are conducted to compare the performance of

these two interfaces on the basis of accuracy. Each Natural Language Interface

automatically extracts database semantics to answer commonly asked questions.

However each system needs to be tailor made to answer all the possible quest ions for a

particular database.
                                        Chapter 1

                                       Introduction

1.1 Natural Language Inte rfaces

The purpose of a Natural language Interface for a database system is to accept requests in

English and attempt to ―understand‖ them. A natural language interface usually has its

own dictionary. This dictionary contains words related to database and its relationships.

In addition to this, the interface also maintains a standard dictionary (e.g. Webster‘s

dictionary). A natural language interface refers to words in its own dictionary as well as

to the words in the standard dictionary, in order to interpret a query. If the interpretation

is successful, the interface generates a SQL query corresponding to the natural language

request and submits it to the DBMS for processing; otherwise, a dialogue is started with

the user to clarify the request.



The area of NLP research is still very experimental and systems so far have been limited

to small domains, where only certain types of sentences can be used. When the systems

are scaled up to cover larger domains, NLP becomes difficult due to the vast amount of

information that needs to be incorporated in order to parse sentences. For example, the

sentence: ―The woman saw the man on the hill with a telescope‖ could have many

different meanings. To understand what the intended meaning is, we have to take into

account the current context, such as the woman is a witness, and any background

information, such as there is a hill near by with a telescope on it. Alternatively the man

could be on the hill, and the woman may be looking through the telescope. All this
                                                                                             2
information is difficult to represent, so restricting the domain of an NLP system is a

practical way to get a manageable subset of English to work with.



The standard approach to database NLP systems is well established. This approach

creates a ‗semantic grammar‘ for each database, and uses this to parse the English

question. The semantic grammar creates a representation of the semantics of a sentence.

After some analysis of the semantic representation, a database query can be generated in

SQL or any other database language.



The drawback of this approach is that the grammar must be tailor- made for each

database. Some systems allow automatic generation of an NLP system for each database,

but in almost all cases there is insufficient information in the database to create a reliable

NLP system.



Many databases cover a small domain so that an English question about the data within it

can be easily analyzed by an NLP system. The database can be consulted and an

appropriate response can be generated.



The need for a Natural Language Interface (NLI) to databases has become increasingly

important as more and more people access information through web browsers, PDA‘s

and cell phones. These people are casual users and it is necessary to have a way that they

can make queries in their own natural language rather than to first learn and then write
                                                                                      3
SQL queries. But the important point is that NLI‘s are only usable if they map natural

language questions to SQL queries correctly.


1.2 Overvie w of Project

Asking questions to database in a natural language is a very convenient and easy method

of data access, especially for casual users who do not understand complicated database

query languages such as SQL. Many commercial products have emerged to generate

Natural Language Systems. Products such as English Language Front End (ELF) [3] and

English Query [6] attempt to generate a Natural Language system for any database, so

that a database can be queried through an interface. Ideally the process of creating a

Natural Language System is simple. This ideal situation is never the case because extra

information is needed. There is still a large amount of work to be done to make these

systems easy to use and more reliable.



There are many Natural Language Interfaces available for commercial use and each claim

to perform better than the other. The two most commonly used Interfaces, Microsoft

English Query and Access English Language Front End (ELF) were selected for

comparison. In this study, experiments are conducted to compare the performance of

these two interfaces on the basis of accuracy. Each Natural Language Interface

automatically extracts database semantics to answer commonly asked questions.

However each system needs to be tailor made to answer all the possible questions for a

particular database.
                                                                                           4
The first set of experiment was to compare the performance of the Interfaces using an

automatically extracted semantic model. This shows the capabilities of these Interfaces to

automatically extract the semantics from a database. The second set of experiment

compares them after making changes to the semantic model. This includes adding or

modifying relationships.


1.2 Fundame ntal steps involved in the conversion

The transformation of a given English query to an eq uivalent SQL form requires some

basic steps. The workings of all Natural language to SQL software packages deal with

these basic steps in some manner.



First there is a dictionary, where all the words that are exp ected to be used in any English

question are declared. These words consist of all the elements (relations, attributes and

values) of the database and their synonyms. Then these words are mapped to the database

system. This implies that the meaning of each word needs to be defined. They may be

called by different names in different systems but these two (i.e. definition and mapping

of the words) form the basis of the conversion. These are domain dependent modules and

have to be there.



The architecture of transformation process is shown in Figure 1. Note that the, domain

dependent modules (the lexical dictionary, Semantic dictionary and interface with data)

are dependant on the data contained in the database. Below is the detailed explanation of

each of these modules.
                                                                                       5
These are the three basic steps for NL to SQL conversion.




                                                            Do main Dependent Modules
                 English Question




                   Parser                                        Lexical dictionary




                             Parse tree




              Semantic Interpreter                            Semantic dictionary & type
                                                                      hierarchy



                             LQL Query




           LQL to SQL Translator                                 Interface with data




                             SQL Query




            DBMS T ran receiver                                   DBMS




                             Query result




                                                                   Database
             Response generator




                                            Figure 1
                                                                                          6
Lexical dictionary: This holds the definition of all the words that may occur in a

question. The first step of any system is to parse an English question and identify all the

words that are found in the lexical dictionary. The Lexical dictionary also contains the

synonyms of root words.



Semantic dictionary: Once the words are extracted from the English question using the

lexical dictionary, they are mapped to the database. The semantic dictionary contains

these mappings.



This process transforms the English question to an internal language (LQL for the

architecture shown in Figure 1) which is then converted to SQL.



During the mapping process words are attached to each other or to the entities or to the

relations. So the output of this step is a function. For example, consider the question

―What is the salary of each manager?‖ Here the attributes, salary and manager are

attached and so the output is the function has_salary (salary, manager).



Inte rface with data: The next step is the conversion of the internal query developed

above to an equivalent SQL statement. This is done somewhat differently by different

systems. This step may be combined with the step above to directly get a SQL statement.

There are basically some predefined rules (depending on the interface) that change the
                                                                                       7
above generated internal language statement into SQL and so Interface with data contain

all those rules.
                                         Chapter 2

                                      Englis h Query

2.1 An Overview of English Query

English Query is a Natural Language Interface that is a part of Microsoft SQL Server 7.0

or higher. An English Query application takes questions asked in English as input,

determines their meaning, and then writes and executes a SQL query. The first step in

building an application is to create a model. A model is the collection of all information

that is known about the objects in the English Query application. A model includes the

specified database objects (such as tables, fields, and joins) and semantic objects (such as

entities, the relationships between them, and additional dictionary entries).


English Query works best with normalized databases. Applying normalization rules to a

database ensures each table represents a single entity, each column defines one unique

attribute, and each row represents one instance of the entity. English Query can best

translate English into SQL when a database is normalized, and results generated from a

normalized database will be more accurate. However, circumstances might mandate a

structure that is not fully normalized. In this case, views can be used to solve problems

that non-normalized databases cause. The English Query domain editor doesn't

automatically import views. To add a view as an entity, select Table from the Insert menu

and enter the name of the view. The English Query Help file provides examples of how

to use views with non-normalized data.
English Query requires primary keys and foreign keys to perform joins between tables to

satisfy user requests. If these keys are not defined in the database, then they need to be

defined in the domain editor. English Query recognizes joins based on primary and

foreign keys, and establishes relationships for these joins with the wizards. English Query

cannot build an application correctly without these keys.


2.2 Englis h Query Environme nt

English Query features a Microsoft® Visual Studio® version 6.0 development

environment. These features are used to create a project (.eqp) and to test the model

(.eqm) with the English Query engine. After the project has been tested, it can be

compiled into an English Query application (.eqd).


An English Query application can be deployed on the Web and works with Web pages

running on a Microsoft® Internet Information Services (IIS) version 3.0 or later.


2.3 Steps to conversion

The basic model for English Query is developed using the SQL Project Wizard. SQL

Project Wizard automatically associates entities with tables and fields in the database.

The SQL Project Wizard displays a list of the potential entities available based on the

tables in the database. To remove potential entities from the model, clear the check box

(figure 2) to exclude those tables from the model.


Once a model is developed, the authoring tool allows testing it against queries that users

will pose. For example, test question for a library database might be ―What books are
                                                                                     10
checked out most often?‖ and ―Which books are currently overdue?‖ If t he authoring

tool encounters any queries it can‘t process, it makes suggestions and allows you to

define relationships manually to improve the model. Once a suggestion is accepted, EQ

learns the proper way to answer questions (e.g. about overdue and frequently checked out

books) and will handle them properly in the future. The runtime engine handles the

English-to-SQL translation.




                                           Figure 2




2.4 Lexicon in English Query

English Query includes a dictionary containing thousands of common English words.

This dictionary provides an English Query application with the terminology needed to

answer most questions posed in English.
                                                                                       11
Unlike other Interfaces, English Query has only one dictionary which is also the Lexicon.

This has words related to entities, attributes and relationships and common English

words. The lexicon is also the semantic dictionary.


Creating entities (with synonyms) and relationships provides most of the specialized

vocabulary required for an application. Words related to tables, attributes and values are

automatically created in the Lexicon. A dictionary entry for a word is created if the word

being defined is not associated with a particular entity or relationship. The new terms

appear under the Dictionary Entries on Semantics tab in the Model Editor. To view the

entries, expand Dictionary Entries, then add, edit, or delete dictionary entries.




                                          Figure 3
                                                                                       12
2.5 Semantic Dictionary

The semantic dictionary contains the mappings and relationships which are used to map

the question to the database. In English Query the Semantic Object tab of the Semantics

tab in the Model Editor represents the Semantic Dictionary. This contains all the entities

and relationships. A model can be refined by adding relationships to the automatically

generated Semantic model by using the Project wizard of English Query.


2.6 Synonyms

Adding synonyms is an important part of creating a model. Synonyms are useful for

situations when some words are not in the database or are not stored in a form that users

may expect. For example, the phone number for a customer may be stored in the database

as "phone," but users may typically refer to it as "phone number." If the synonym, "phone

number‖ is not added to the entity phone, and a user asks a question "List the customers

and their phone numbers" the response generated is that         English Query does not

recognize "phone number".


There are two ways in which a synonym can be added. The first is to add a synonym for

any word in the dictionary. As can be seen in figure (3), whenever a word is added or

viewed in a dictionary, a synonym can be added. The second way is to add a synonym for

attributes or entities.


Consider the example shown in Figure 4. Add a synonym phone number for phone. In the

left pane of the Se mantics tab of the Model Editor window, expand Entities if it is not

already expanded. Expand custome r, and then double-click customer_phone. The
                                                                                       13
Entity dialog box appears (see figure 4). To the right of the Words list, click the tab to

view a list of synonyms for "phone." The list of synonyms does not include phone

number. Click the Words box and type phone number at the cursor.




                                         Figure 4




2.7 Working with Relations hips

Relationships describe how entities relate to one another. Although the initial goal in an

English Query project might be to answer the most common questions users will ask, the

ultimate goal is to identify and model all the relationships between entities in the

database. A semantic model is desired that represents the business for which English

Query is used.
                                                                                        14
To create the complete semantic model, all the relationships among all the entities in the

database have to be identified. These relationships can be exposed by asking questions

that users might ask. Every question asked is actually a proxy—an example that

represents a class of questions that user might ask. When the relationship or phrasing is

modeled that lets English Query answer your specific question, English Query will

probably be able to use that new relationship or phrasing to answer an entire class of

related questions.


After defining the tables and entities in English Query, relationships must be defined. For

example, although the customers and products entities are defined, English Query has no

inherent understanding of how these two entities relate to one another. English Query

doesn't know that customers buy products until the relationship stating that fact is

defined. To add a relationship, double click the Relationships on the Semantics tab. The

new relationship dialog appears (see Figure 5). The entities for the relationship can then

be selected.
                                                                                          15




                                             Figure 5



2.7.1 Adding phrasings to a relationship

Phrasings are a way of expressing relationships among entities. The phrasing that most

closely reflects how users are likely to ask their questions, is selected. Since two entities

can be related in more than one way, each set of entities might have several phrasings. A

model with all the possible relationships and all possible phrasings can be created, but

that task might be too large. To limit the scope of the application, think about the most

likely questions the intended audience might ask. The ways that users might ask these
                                                                                         16
questions should be considered. The model should include the relationships and phrasings

necessary to answer the target questions.


2.7.2 Types of phrasing

Name/ID Phrasing

Almost every entity has a name or ID. The name/ID phrasing is used to let English Query

know which entity (column) contains the name or ID of the target entity. The Project

Wizard will discover almost all of the relationships between entities and their names.


Here are some examples of name/ID relationship phrasings.


"Employee names are the names of employees." This phrasing defines how the entity

employee_name is related to the entity employee. In this case, the employee_name entity

refers to the firstname and lastname columns of the Northwind database's Employees

table.


"Employee IDs are the IDs of employees." As with the employee name, this relationship

tells English Query that the employee_ID entity is related to the employee entity.


These phrasings help English Query respond to the requests like "List the employee

names" and "Show the customer names." If the response generated for this question is


I don't understand the word "customer" in the phrase "customer name,"


then probably a name/ID phrasing is missing that tells English Query that "customer

names are the names of customers."
                                                                                      17
Trait Phrasing

Trait phrasing is used to describe an entity's attributes. For example, English Query can

be told that:


        Employees have birthdates


        Employees have Social Security numbers


        Employees have phone numbers


        Employees have names


These trait phrasings let English Query successfully answer the questions such as:


"List the employees and their birthdates."


"List the employees' phone numbers and birthdates."


"What is the phone number of Mary Smith?"


"What is John Jones' Social Security number?"


"What is the Social Security number of John Jones?"


"What Social Security number does John Jones have?"
                                                                                          18
Adjective Phrasing

Adjectives are used all the time, but many of us never notice them, even when we talk

about old books, good customers, or lazy employees. If the right phrasings are provided,

English Query allows questions that use adjectives. The three types of adjective phrasings

are Single adjective; Entity contains adjectives, and Measurements.


Adjectives are used all the time, but many of us never notice them, even when we talk

about old books, good customers, or lazy employees. If the right phrasings are provided,

English Query allows questions that use adjectives. The three types of adjective phrasings

are Single adjective, Entity contains adjectives, and Measurements.


Single adjective phrasing

An adjective such as old can be added as single adjective for entity "employee‖. English

Query will apply the adjective old to all employees. Unless the circumstance under which

a relationship is true is restricted, the relationship always applies, so all employees would

be old employees.


A Boolean expression can be added which, when true, signifies that an employee is old.

For e.g. the condition birthdate < 'Jan 1, 1960' can be added to define old employees as

those employees whose birth date is before January 1, 1960. Now questions such as

"Who is old?" and "Which employees are old?" can be asked to English query
                                                                                     19
Entity contains adjectives

Many adjectives might describe an entity and so the adjectives will likely be attributes

stored in the database.


Codes like gender, race, and education level are in the Employees table. The gender

column contains the values M and F, but people might refer to gender by the words man,

male, woman, or female. If the database has a table that maps the gender codes to the

gender names, English Query lets allows including these adjectives in the model. If an

adjective phrasing is added to the relationship "Employees have employee genders" then

English Query's semantic model will be accurate and responsive to employee gender

questions.


Then English Query can successfully answer questions or requests such as


"Which employees are men?"


"List the female employees."


Measure ments

In addition, specifying a measurement adjective phrasing allows questions that use the

comparative or superlative forms. Following are the examples of Measurements phrasing


      "Is John older than Mary?" (comparative)


      "Who is the youngest employee?" (superlative)
                                                                                         20
Subset Phrasing

The subset relationship phrasing refers to subsets of entities. For example, if there is an

entity named mountains, a subset phrasing might tell English Query that "Some

mountains are volcanoes."


Other examples of subset phrasings are


Some employees are programmers


Some employees are contractors


Some books are bestsellers


Verb Phrasings in Relations hips

When a relationship between two entities can be expressed by an action word or verb,

then verb phrasing is used to describe it (for example, salespeople sell briefcases from the

warehouse).



When specifying a phrasing, English Query provides the passive equivalent of the phrase.

For example, specifying the verb phrasing ―Salespeople sell customers products ―also

allows users to ask the passive question "Which products were sold to customers by

which salespeople?‖


In general, active voice is used, rather than passive voice. For example, instead of

creating the phrasing ―products are sold to customers by salespeople‖, create the
                                                                                     21
phrasing salespeople sell customers products. By creating the phrasing in the active

voice, English Query understands questions in both the active and passive voices. For

example, it could answer both ―Who sold John a lawnmower?” (Active voice) and ―What

was sold to John by Fred?” (Passive voice).


Prepositional Phrasings

Prepositional phrasing can be added in the format, Subjects are preposition object—for

example, "Books are about subjects," "Employees are in departments," and "Patients are

on medications." You can add as many as three additional prepositional phrases as well;

for example, in "Employees are on projects (for customers)(at locations)(on contracts),"

the parenthetical phrases are the additional prepositional phrases. These phrases are

added one at a time. Adding the phrase "Employees are on projects (for customers)(at

locations)(on contracts)" lets English Query answer the following questions:


Who is on project X?"


"Which employees are on project X?"


"Who is on a project for customer Y?"


"List the employees on a project on contract Z."


"Who is on project X at location Y?"
                                                                                           22
Words commonly used as prepositions include about, above, acros s, before, below,

concerning, down, for, from, in, like of, on, over, past, regarding, since, through, till, to,

toward, under, until, with, and without.


Phrasal prepositions include according to, along with, as to, because of, due to, in case of,

in place of, instead of, up to, and with regard to.


Grouped Phrasings Examples

The following examples show when phrasings need to be grouped to correctly specify the

relationship.


Example 1: Consider a database that contains information about people and their hair

color. One phrasing that describes this relationship is the trait phrasing, such as people

have hair color. However, this phrasing will not answer questions such as, "What is the

color of John's hair?" For this, the phrasings people have hair and hair has color need to

be added.


"Hair", in this case, is an entity that is not represented by a database object.


These two phrasings collectively describe the relationship between people and hair color.

In order for English Query to treat these two phrasings as one logical unit, they need to be

grouped.


Example 2: Consider a table containing suppliers, parts, and colors. The model is

expected to answer questions such as, "Who sells green parts? ‖ This is a single
                                                                                           23
relationship among suppliers, parts, and colors. The following phrasings in a group is

needed: suppliers supply parts (verb phrasing) and parts have colors (adjective phrasing).

Although creating separate relationships for these two phrasings can be considered, this

would not give the correct answer. In this table, the colors of the parts are inherently

dependent on who supplied them. If an independent relationships are created for these

two phrasings, then the question, "Who sells green parts," is necessarily interpreted as,

"Find all of the suppliers and parts in the sales table such that the part also appears in the

sales table with the color green" (in other words, "Who sells parts (in any color) that are

also sold (by any supplier) in green").


2.8 Summary

There are 4 basic steps involved to build an English Query application.


      Determine the questions that end users are most likely to ask.

      Create a basic model using the SQL Project Wizard

      Refine the model to address any questions that cannot be answered using the basic

       model.

      Test the model and refine it until the model successfully returns the data requested

       using English questions


   English Query has a set of tools that database administrators, application developers,

   and Web professionals can use to develop a natural- language interface to a database.

   Using English Query applications, users can perform database queries using English

   questions or statements.
                                                                                24
English Query provides a robust environment for developing an English Query

model. However, because databases tend to be unique and users ask unique questions,

creating a model that answers users' questions can be a complex process. English

Query requires adding a lot of relationships to the model which becomes complicated

and time consuming.
                                         Chapter 3

                          Englis h Language Front End (ELF)


3.1 Introduction

ELF is a commercial system that generates Natural Language Processing System for a

database. It is developed by ELF Software Co. ELF is an interface which works with

Microsoft Access and Visual Basic.


3.2 How are Lexicon and Se mantic dictionary built in ELF?

The lexicon is automatically built in ELF. In other words, ELF takes an existing database

and scans through it so that it can understand both the data and the relationships that are

set up. This process is the Analyze function, and the interface to it is shown in figure 6




                                          Figure 6
For simpler cases, the process the Express Analysis is sufficient. This causes ELF to

automatically read all the information it needs out of the database. Words related to

attributes and relationships of the database are stored into the lexicon dictionary.


There might be situations when certain tables and relationships need to be excluded from

the lexicon. Custom Analysis is selected for such situations. Using this function decisions

can be made in the beginning to help Access ELF decide where to concentrate, what to

evaluate, and what to ignore. The following screen shows custom analysis window where

the tables to be considered can be selected manually.
                                                                                          27
                                          Figure 7

This window contains all the table names. When a table (or query) in the Custom

Analysis window is de-selected, Access ELF is excused from answering any questions

related to these tables. ELF will not make an attempt to look at how fields of these tables

relate to each other, and will not store any of the words related to these table and their

relationships in its own dictionary. Of course, this speeds up the Analysis process.

Depending upon the situations some information is used frequently, occasionally or not

asked at all. For information which is used frequently in searches and if it's a significant

amount of data, it may be wise to reduce processing time by selectively ignoring parts of

the table. To do this, right click on any table in the Custom Analysis window's Da ta Set

list. A listing of the fields in that table will appear, giving the option of "Acknowledging"

(Ack) and/or "Memorizing" (Mem) each one.
                                                                                          28




                                          Figure 8



If the Acknowledge field is not selected then it is similar to ignoring an entire table;

Access ELF acts as if the field does not exist and will not be able to answer questions

about it. If the Acknowledge field is selected but Memorize is de-selected then this means

that Access ELF will know its type and which table it comes from, as well as many other

details such as whether it participates in relationships, whether it seems to be a person's

name or a place, or...well, any of the literally hundreds of things which Access ELF

figures out about a field. The only thing it will not do is to save all the data entries from

that particular field in its own dictionary (the fast, private dictionary we usually refer to

as the "lexicon").
                                                                                          29


During the Analysis process, Access ELF examines the terms used in defining fields and

tables, and uses its built- in dictionary to try to predict what kinds of synonyms that might

be used in queries. It also stores its type and which table it comes from (builds the

Semantic dictionary).




                                          Figure 9

The above figure shows that the word Supplier is a common noun. The synonym most

commonly used for this is Supplier ID.


3.3 Steps to Conve rsion

Now the database system is ready for answering any English query. ELF does this in

three steps. In the query box type the query ―List all customers‖.
                                                                                 30




                                   Figure 10



The first step is always to parse the English question and find the words that are

stored in the lexicon. ELF finds the word CustomerId in the Lexicon.




                                   Figure 11



Then it finds the mapping and associates the table name with the attributes .The SQL

is generated to get the result.
                                                                                         31




                                        Figure 12




3.4 Why does ELF perform better?

The reason ELF is superior to other natural language systems is very simple. All other

Natural Language systems, including EQ, are modeled on languages which are called

"context-free‖ [7]. All programming languages are defined using context-free. For

example:

                       <program>::=<program- heading><block>

            <program- heading>::=PROGRAM<program- identifier><file- list>

                          <program- identifier>::=<identifier>



These definitions usually go on for a number of pages. Using these rules, any legal

program written in the language can be parsed into a tree, where each symbol in the

program is a leaf at the bottom of the tree, and at the root of the tree is the <program>

node itself [8].
                                                                                          32
Each node of the tree is defined by one of the rules in the language definition listing. The

node itself is marked with the label on the left of the rule, and the branches from that

node are the one, or two, or three, etc. labels to the right of the symbol::= (sometimes

written as an arrow).



This is what defines context- free languages. There's always one object to the left of the

arrow, and one or more to the right. Because of this, the structure of the parsed language

string, in this case a computer program corresponds directly to the concatenation of a

series of rules of the language definition.



The reason the ELF system is so powerful is that its parser does not rely on context- free

grammars. Suppose for a moment that instead of writing the first rule as shown above, it

is written as follows:



                                    <program- heading>

                                              <block>

                                      <program> (1 2)



If the 1 and 2 represent objects found in the corresponding positions of the list, then the

rule clearly means the same thing. It just seems to be a little redundant. However, it's not

redundant once the ability to switch the order of the objects is added. For instance, in this

new notation there is a capability of writing:
                                                                                        33
                                         <block>

                                   <program- heading>

                                     <program> (2 1)



If this rule is added to a language, it could be interpreted as saying that the program

heading could now be typed in AFTER the block, instead of before it. The language

parser would produce the same program as before, because it would switch the position

of the two child nodes.



In context- free languages this cannot happen, because the first object to the left o f the

arrow will always be the leftmost child of that node. There is no way to express "switch

the position of the objects".



There's also no way to express "drop one of the nodes", "insert a node that looks like this

between here and there", and most especially, no way to say, take these right-hand-side

objects and create from them MORE THAN ONE node.



Using the ELF system to model language, this can be done and much more. For instance,

a rule could look like this:

                                           <a>

                                           <b>

                                           <c>
                                                                                      34
                                  <d> (<e> (2) 3)       <d> (1)



This means that, upon reading (or building up from the input) <a>, <b> and <c> objects,

the parser could then construct a PAIR of <d> objects, one of which had <b> and <c> for

children (though not even at the same level) and the other one having <a> as its child.



This flexibility is very useful in modeling natural languages like English. For instance,

words get dropped by English speakers, and this kind of parser can stick them right back

in again where they belong.

                                      <I have something>

                                             <that>

                                      <I want you to see>

                                      <sentence> ( 1 2 3 )



If this is a definition there can also be rule:

                                      <I have something>

                                      <I want you to see>

                                             1 that 2



This rule supplies the missing "that". Now here's the real key. One could argue that why

not keep using a context- free system, and instead of adding the rule shown above the

following rule ca be added:
                                                                                           35
                                     <I have something>

                                     <I want you to see>

                                      <sentence> ( 1 2 )



Or, in context free format

             <sentence> ::= <I have something> <that> <I want you to see>

                 <sentence> ::= <I have something> <I want you to see>



The answer is that, now, not only there are two rules, there are two different structures

(parse trees) that get generated by the parser. In the corresponding ELF example, there

are two rules, but what pops out at the end is the same exact result. No matter which input

the user types, the parser itself standardizes the result.



This is important if the parse tree generated is supposed to do something useful, like get

turned into executable code or translated into an SQL statement. Because ELF uses this

powerful system for modeling language, it could use it to do some pretty good tricks. For

instance, programming language compilers will parse the input and then pass the parse

tree to another program that converts it into an executable program. ELF does not follow

this step. Instead, the parser, as it builds the parse tree from the input, swaps out the

words that the user actually typed in, and substitutes the SQL keywords wanted in the

final result. There's nothing that "analyzes" the parse tree. The leaves of the parse tree, by

the time the parse is finished, is the SQL query to be generated.
                                                                                    36


ELF has editing tools that allows to a user to watch the progress of parsing, print out

parse trees as they are being constructed, turn rules on and off during a parse for

debugging purposes, and much more. This is all available from the Debug Dashboard in

ELF.



All programming languages follow context free grammar. The compiler parses the

program based on this grammar. ELF does not follow context free grammar. The

grammar used by ELF is for a natural language rather than a programming language. This

gives more flexibility and so ELF can understand most of the questions asked.
                                        Chapter 4

                                Comparing ELF and EQ


4.1 Introduction

The process of building a Natural Language Application involves determining the

questions that users are most likely to ask .Doing this prior to creating a model helps in

adding relationships and grammar to the model. As a result of creating these

relationships, the application will be able to answer more questions. A NLI automatically

creates a basic model based on the entities and relationships chosen in the wizard. This

model can then be refined to address any questions that cannot be answered using a basic

model.


The same procedure was followed in the evaluation of the ELF and EQ Natural Language

applications. The experiments were performed using the Northwind database sample that

is shipped with MS SQL Server and MS ACCESS. The standard eight tables were

selected. The Figure (13) shows an overview of the tables, fields, and joins in the

Northwind database.


The first step was determining the questions that are to be asked to these Interfaces. A list

of questions was created. These questions involved simple joins, complex join, func tions

like sum, avg and total, and comparisons like less than or greater than. A complete list of

the query is given in the Appendix A.
The basic model for both the Interfaces was then built. These basic models contained the

automatically generated semantics and relationships.




                                         Figure 13


The aim of this project is to evaluate the performance of English Query and ELF and

reach to a conclusion as to which one performs better.


4.2 First Test

The first experiment was to test the questions in both the applications using only the basic

model. This tested the capabilities of ELF and EQ to automatically extract relationships

from the underlying database. Figure 14 shows the results of this test. ELF gives correct

results for most of the questions and English Que ry does not. This is because English
                                                                                         39
Query does not extract all of the relationships and requires refining of the model by

adding relationships


Interface              No of question asked   No of correct result   No of       incorrect
                                                                     results
ELF                    31                     25                     6
EQ                     31                     3                      28

                                          Figure 14



4.3 Second Test

To test the performance of the Interface it is important that the model is refined to answer

all the questions that user might ask. The relationships in the EQ were added for only

those queries which failed the first test. Following are some of examples of how this was

done. A complete list of these relationships is in Appendix B.




4.3.1 Example 1

The query used is ―List sales managers‖

A sales manager is a value of the attribute contact_title in the Suppliers table. Therefore,

the following relationship is added to the model.


       supplier_contact_titles are adjectives describing suppliers


After adding this relationship the query was tested again. The EQ rephrases the question

as


       Which suppliers are sales manager?
                                                                                   40
and the SQL generated is


        select dbo.Suppliers.SupplierID

        from dbo.Suppliers

        where dbo.Suppliers.ContactTitle='Sales Manager'


Now the EQ knows that it has to fetch the ContactTitle from Suppliers table. The SQL

generated is correct and so the result is also correct.


4.3.2 Example 2

The query used is ―List all customers who ordered in July 1996‖

In the first experiment EQ was unable to generate answer for this query. The following

relationship was added


        Customer order products


The EQ now rephrases the question as


        Which customers ordered products in July, 1996?


When the query was tested in EQ the following SQL was generated


        select distinct dbo.Orders.CustomerID

        from dbo.Orders         where dbo.Orders.OrderDate>='19960701'

        and dbo.Orders.OrderDate<'19960801'


The result generated after adding this relationship is correct.
                                                                                         41
4.3.3 Results of second test

Some synonyms were also added to the model like ―units‖ for units_in_stock and

Location for employee_city. By adding these synonyms questions such as ―Find the

products which have at least 20 units in stock?‖ and ―List employees who are located in

London or Seattle‖


The EQ was now able to answer 14 more queries. The performance of EQ increased

significantly after adding relationships and synonyms. The results from experiment 2 are

given in Appendix C and are summarized in Figure 15. Overall results are summarized in

Figure 15(b).


Interface       No of question asked     No of correct result   No      of   incorrect
                                                                results
EQ              16                       15                     1

                                       Figure 15 (a)




Interface             No of question asked    No of correct result   No      of   incorrect
                                                                     results
ELF                   31                      25                     6
EQ                    31                      18                     13

                                       Figure 15 (b)




So overall out of 31 queries EQ was able to answer 18.
                                                                                      42
It can be concluded that English Query scored approximately 58%, and ELF scored 81%.



4.4 Conclusion

The performance of Natural Language Interfaces can be significantly improved by

customizing them for a database. This can be done by adding semantics and relationships.


The results clearly illustrate the overwhelming superiority of the ELF natural language

database query system over English Query. In ELF the basic model was used and no

modifications were made. This shows that ELF is effective and automatically extracts

most of the relationships from database. Whereas EQ builds up a model with only few

basic relationships and so requires a lot of modification and refinement. This is tedious

and involves a lot of work.


As mentioned in chapter 3, the parser in ELF does not rely on context-free grammar

whereas for EQ it does. This is what makes ELF superior. In ELF, the parser, as it builds

the parse tree substitutes the words with the SQL keywords. By the time parsing is

finished, the final SQL query is ready.
                                        Chapter 5

                                       Future work

Experiments should be repeated using a totally different database. This would address the

concern that the results were just because the structure of Northwind favored ELF.



Currently Natural Language Interfaces are used for small domains. The EQ and ELF can

be compared using large domains. It will be interesting to compare the performance of

these two Interfaces for large domains. The time taken to answer a particular query can

also be compared for larger domains.



In this research work only English Query and English Language Front End were

compared. There are other Interfaces available for commercial use like English wizard.

Evaluating and comparing this interface can be a work of interest in future.
           44




Appendix
                                                                                     45
                                      Appendix A

Que ries and results for ELF and EQ whe n run on base model


No    Que ry       Englis h Language Front End                Englis h Query

                   SQL : SELECT DISTINCT                      SQL : select
                   Customers.CustomerID ,                     dbo.Customers.Custo
      List all
                   Customers.CompanyName FROM                 merID
1     the
                   Customers ;                                from dbo.Customers
      customers
                   Result : correct
                                                              Result : correct
                   SELECT DISTINCT                            select
                   Customers.CustomerID ,                     dbo.Customers.Custo
      Show the     Customers.CompanyName FROM                 merID,
      customers    Customers ;                                dbo.Customers.Addres
2
      and their                                               s
      addresses.   Result : correct                           from dbo.Customers

                                                              Result : correct
                   SELECT DISTINCT                            No SQL generated
                   Customers.ContactName ,
                   Customers.CompanyName FROM                 Result : The following
                   Customers WHERE (                          appears:
                   Customers.ContactTitle = "Sales            Help Command
                   Manager" ) ;                               Type: Entity
                                                              Object:
                   Result : correct                           ENTITY:supplier_cont
                                                              act_title
                                                              Help Text: Supplier
      List sales                                              contact titles named
3
      managers                                                Sales Manager are
                                                              supplier contact titles.

                                                              Summary Text:
                                                              supplier contact title is
                                                              an attribute of supplier.
                                                              supplier contact titles
                                                              participate in the
                                                              following
                                                              relationships:
                                                              suppliers have supplier
                                                                                     46
                                                             contact titles




                SELECT DISTINCTROW Products.*                No SQL generated
                FROM Products WHERE
                Products.ProductName = "Northwoods           Result: The following
    Who sells   Cranberry Sauce" ;                           is shown on the screen
    Northwoo                                                 Categories aren't sold
4   ds          Analysis: Displays all the columns of the    by suppliers. Product
    Cranberry   Product table for this product name rather   names are sold by
    Sauce?      than just the supplier name.                 suppliers.

                Result : correct

                SELECT DISTINCT                              No SQL generated
                Customers.CustomerID ,
                Orders.OrderDate ,                           Result: The following
                Customers.ContactName ,                      is shown on the screen
    List all
                Orders.ShipName ,
    customers
                Customers.CompanyName FROM Orders            Based on the
    who
                , Customers , Orders RIGHT JOIN              information I've been
     Ordered
5               Customers ON Orders.CustomerID =             given about this
    in July
                Customers.CustomerID WHERE ( ( (             database, I can't
    1996
                Orders.OrderDate >= #07/01/1996# and         answer:
                Orders.OrderDate < DateAdd ( "m" , 1 ,       "Customers listed in
                #07/01/1996# ) ) ) ) ;                       dates?‖
                                                             I haven't been given
                Result : correct                             any information on
                                                             dates.
                                                                                    47




                SELECT DISTINCT Products.UnitPrice ,        No SQL generated
                Products.ProductName FROM Products
                WHERE ( ( Products.ProductName LIKE         Result: The following
    Give unit   "Tofu*" or Products.ProductName LIKE        is shown on the screen
6   price for   "*[!A-Z0-9]Tofu*" ) ) ;
    Tofu
                Result : correct                            Sorry, I didn't
                                                            understand that.

                SELECT DISTINCT Suppliers.SupplierID        No SQL generated
                , Products.ProductName ,
                Suppliers.CompanyName FROM Products         Result: The following
                , Suppliers , Categories , Products INNER   is shown on the screen
                JOIN Suppliers ON Products.SupplierID =
    List all
                Suppliers.SupplierID , Products INNER       Based on the
    suppliers
                JOIN Categories ON Products.CategoryID      information I've been
7   who
                = Categories.CategoryID WHERE               given about this
    supply
                Categories.CategoryName = "Beverages" ;     database, I can't
    Beverages
                                                            answer:
                Result : correct                            "Which suppliers
                                                            supply Beverages?‖


              SELECT                                        No SQL generated
              Orders.customerId,Orders.OrderId
              ,Employees.FirstName from                     Result: The following
              Orders,Employees where                        is shown on the screen
    List all  Orders.EmployeeId =
    customers Employees.EmployeeId and
8   and their Employees.FirstName = 'Laura' ;               I don't know how to
    orders by                                               connect customers to
    Laura      Result : correct                             unspecified things, so I
                                                            can't answer this
                                                            question.
                                                                                      48
                  SELECT DISTINCT Orders.OrderID ,            No SQL generated
                  Orders.ShipName FROM Shippers ,
                  Orders , Shippers INNER JOIN Orders         Result: The following
                  ON Shippers.ShipperID = Orders.ShipVia      is shown on the screen
                  WHERE Shippers.CompanyName =
                  "Federal Shipping" ; SELECT DISTINCT
                  [elfQ1].OrderID FROM [elfQ1] ; SELECT       Sorry, I didn't
                  [elfQ1].OrderID FROM [elfQ1] ; SELECT       understand that.
                  ( SELECT count ( elfQ2.OrderID ) FROM
     Give total
                  elfQ2 ) AS [Count_Of OrderID
     number of
                  (Distinct/All)] FROM elfRow in
9    orders for
                  'C:\DOCUMENTS AND
     Federal
                  SETTINGS\HOME\APPLICATION
     Shipping
                  DATA\MICROSOFT\ADDINS\elf32.mda'
                  UNION SELECT ( SELECT count (
                  elfQ3.OrderID ) FROM elfQ3 ) FROM
                  elfRow in 'C:\DOCUMENTS AND
                  SETTINGS\HOME\APPLICATION
                  DATA\MICROSOFT\ADDINS\elf32.mda'
                  ;

                   Result : correct
                  SELECT DISTINCT Suppliers.SupplierID        No SQL generated
                  , Products.ProductName ,
                  Suppliers.CompanyName FROM Products         Result: The following
                  , Suppliers , Categories , Products INNER   is shown on the screen
                  JOIN Suppliers ON Products.SupplierID =
                  Suppliers.SupplierID , Products INNER
                  JOIN Categories ON Products.CategoryID      Based on the
     Who
                  = Categories.CategoryID WHERE               information I've been
10   supplies
                  Categories.CategoryName = "Seafood" ;       given about this
     Sea food?
                                                              database, I can't
                  Result : correct                            answer:
                                                              "Which employees
                                                              does Sea have food?‖
                                                              I haven't been given
                                                              any information on
                                                              food.
                  SELECT DISTINCT uppliers.SupplierID,        No SQL generated
                  Suppliers.CompanyName FROM
     List
                  Suppliers WHERE                             Result: The following
11   suppliers
                  (((Suppliers.Country)="France"));           is shown on the screen
     in France
                  Result : correct                            whether France is a
                                                                                   49
                                                             Product or Category




                   SELECT DISTINCT                           No SQL generated
                   Suppliers.CompanyName ,
     Which are
                   Suppliers.SupplierID FROM Suppliers       Result: The following
     the
                   WHERE ( Suppliers.Country = "Germany"     is shown on the screen
12   suppliers
                   );
     in
                   Result : correct                          whether France is a
     Germany
                                                             Product or Category

                   SELECT DISTINCTROW                        select
                   Products.ProductName ,                    dbo.Products.ProductN
                   Products.UnitsInStock FROM Products ;     ame,
      Find the     SELECT DISTINCT                           dbo.Products.UnitsInSt
     products      [elfQ1].ProductName ,                     ock
     which         [elfQ1].UnitsInStock FROM [elfQ1] ,       from dbo.Product
     have at       Products , Products INNER JOIN [elfQ1]           where
13   least 20      ON Products.ProductName =                 dbo.Products.UnitsInSt
     units in      [elfQ1].ProductName ; SELECT              ock>=20
     stock         DISTINCT elfQ2.UnitsInStock ,
                   elfQ2.ProductName FROM elfQ2 ;
                   SELECT DISTINCT elfQ3.* FROM              Result : correct
                   elfQ3 WHERE elfQ3.[UnitsInStock] > 19
                   ;
                   Result : correct
                     SELECT DISTINCTROW                      No SQL generated
                    Products.ProductName ,
                    Products.UnitsInStock FROM Products ;    Result: The following
     Find the       SELECT DISTINCT                          is shown on the screen
     products       [elfQ1].ProductName ,
     which          [elfQ1].UnitsInStock FROM [elfQ1] ,      Sorry, I didn't
     have at        Products , Products INNER JOIN [elfQ1]   understand that
14   least 20       ON Products.ProductName =
     units in       [elfQ1].ProductName ; SELECT
     stock and      DISTINCT elfQ2.UnitsInStock ,
     price is 18    elfQ2.ProductName FROM elfQ2 ;
     dollars        SELECT DISTINCT elfQ3.* FROM
                    elfQ3 WHERE elfQ3.[UnitsInStock] > 19
                    ;
                   Result : incorrect
                                                                                      50
                   SELECT DISTINCT                            No SQL generated
                  Employees.EmployeeID , Employees.City
                  , Employees.LastName FROM Employees         Result: The following
     List         WHERE ( ( Employees.City = "London"         is shown on the screen
     employee     or Employees.City = "Seattle" ) ) ;         Based on the
     s who are                                                information I've been
15
     located in   Result : correct                            given about this
     London or                                                database, I can't
     Seattle                                                  answer:
                                                              "What are the
                                                              unspecified things
                                                              employees are in?"
                  SELECT DISTINCT                             No SQL generated
                  Customers.CustomerID , Orders.OrderID ,
                  Employees.HomePhone ,                       Result: The following
                  Customers.ContactName ,                     is shown on the screen
                  Orders.ShipName ,
                  Customers.CompanyName ,                     Sorry, I didn't
                  Employees.LastName FROM Orders ,            understand that.
     Customer     Employees , Customers , Orders INNER        Please check your
     who has      JOIN Employees ON Orders.EmployeeID         spelling or phrasing.
16
     placed       = Employees.EmployeeID , Orders
     maximum      INNER JOIN Customers ON
     orders       Orders.CustomerID =
                  Customers.CustomerID WHERE (
                  Orders.OrderID >= ( SELECT max ( (
                  OrderID ) ) FROM Orders ) ) ;


                  Result : incorrect
                  SELECT Products.UnitPrice FROM              No SQL generated
                  Products ; SELECT avg (
      What is
                  [elfQ1].UnitPrice ) AS [avg of UnitPrice]   Result: The following
     the
                  FROM [elfQ1] ;                              is shown on the screen
17   average
     price of     Result : correct                            I haven't been given
     products
                                                              any information on
                                                              prices.
                  SELECT DISTINCT                             No SQL generated
     Which is     Products.ProductName ,
     the most     Products.UnitPrice FROM Products ;          Result: The following
18
     expensive    SELECT DISTINCT max (                       is shown on the screen
     product      [elfQ1].UnitPrice ) AS Lim FROM
                  [elfQ1] ; SELECT DISTINCT [elfQ1].*         Sorry, I didn't
                                                                                      51
                  FROM [elfQ1] INNER JOIN elfQ2 ON            understand that
                  [elfQ1].UnitPrice >= elfQ2.Lim ;

                 Result : correct


                  SELECT DISTINCT Orders.OrderID ,            No SQL generated
                  Orders.ShipName ,
                  Shippers.CompanyName FROM Shippers          Result: The following
                  , Orders , Shippers INNER JOIN Orders       is shown on the screen
                  ON Shippers.ShipperID = Orders.ShipVia
     Orders
                  WHERE ( ( Shippers.CompanyName
     that were
                  LIKE "*Speedy*" ) or (                      Based on the
     shipped by
                  Shippers.CompanyName LIKE                   information I've been
19   Speedy
                  "*Express*" ) ) ;                           given about this
     Express in
                                                              database, I can't
     month of
                Result : correct                              answer:
     October
                                                              "Which orders were
                                                              shipped by Speedy
                                                              Express for months
                                                              long of October,
                                                              2003?"
                  SELECT DISTINCTROW [Order                   No SQL generated
                  Details].OrderID , [Order
                  Details].ProductID ,                        Result: The following
                  Products.UnitsInStock FROM [Order           is shown on the screen
                  Details] , Products , Categories , [Order
                  Details] INNER JOIN Products ON
     List the
                  [Order Details].ProductID =                 I haven't been given
     total
                  Products.ProductID , Products INNER         any information on
     number of
20                JOIN Categories ON Products.CategoryID      stocks.
     items in
                  = Categories.CategoryID WHERE
     stock for
                  Categories.CategoryName = "Beverages"
     Beverages
                  ; SELECT count ( elfQ1.OrderID ) AS
                  [count of OrderID] , elfQ1.UnitsInStock ,
                  elfQ1.ProductID FROM elfQ1 group by
                  elfQ1.ProductID , elfQ1.UnitsInStock ;

                 Result : incorrect
                                                                                           52
                    SELECT DISTINCT                               No SQL generated
                    Customers.CompanyName ,
                    Customers.ContactName ,                       Result: The following
     Companie       Customers.ContactTitle FROM Customers         is shown on the screen
     s where        WHERE ( ( ( Customers.ContactTitle
21   owner is       LIKE "Owner*" or
     the contact    Customers.ContactTitle LIKE "*[!A-Z0-
     person         9]Owner*" ) ) ) ;                             I don't understand the
                                                                  word "contact" in the
                    Result : correct                              phrase "contact
                                                                  person".
                    SELECT DISTINCT Orders.OrderID ,              No SQL generated
                    Suppliers.CompanyName ,
                    Orders.OrderDate , Orders.ShipName            Result: The following
                    FROM [Order Details] , Orders , Products      is shown on the screen
                    , Suppliers , [Order Details] INNER JOIN
                    Orders ON [Order Details].OrderID =
                    Orders.OrderID , [Order Details] INNER        Based on the
     Orders
                    JOIN Products ON [Order                       information I've been
     supplied
                    Details].ProductID = Products.ProductID ,     given about this
     by
22                  Products INNER JOIN Suppliers ON              database, I can't
     Speciality
                    Products.SupplierID =                         answer:
     Biscuits in
                    Suppliers.SupplierID WHERE ( (                "Which orders were
     1996
                    Suppliers.CompanyName LIKE                    supplied by Speciality
                    "*Biscuit*" ) and ( ( ( Orders.OrderDate      Biscuits in 1996?‖
                    >= #01/01/1996# and Orders.OrderDate <
                    DateAdd ( "yyyy" , 1 , #01/01/1996# ) ) ) )
                    );

                    Result : correct
                    SELECT DISTINCT                               No SQL generated
                    Employees.EmployeeID ,
                    Employees.HireDate ,                          Result: The following
                    Employees.LastName FROM Employees             is shown on the screen
     List all the   WHERE Employees.HireDate <
     employees      #01/01/1993# ;                                Based on the
23   hired                                                        information I've been
     before         Result : correct                              given about this
     1993                                                         database, I can't
                                                                  answer:
                                                                  "Which employees
                                                                  were hired before
                                                                  1993?‖
                                                                                      53
                  SELECT DISTINCT Suppliers.SupplierID        No SQL generated
                  , Products.ProductName ,
                  Suppliers.CompanyName FROM Products         Result: The following
                  , Suppliers , Categories , Products INNER   is shown on the screen
     List
                  JOIN Suppliers ON Products.SupplierID =
     condiment
                  Suppliers.SupplierID , Products INNER
     s supplied
24                JOIN Categories ON Products.CategoryID      condiments does not
     by
                  = Categories.CategoryID WHERE (             exist in dictionary
     Pavlova
                  Categories.CategoryName = "Condiments"
     Ltd
                  and Suppliers.CompanyName = "Pavlova,
                  Ltd." ) ;

                  Result : correct
                  SELECT DISTINCT                             No SQL generated
                  Products.ProductName FROM Suppliers ,
                  Products , Suppliers INNER JOIN             Result: The following
                  Products ON Suppliers.SupplierID =          is shown on the screen
     List all
                  Products.SupplierID WHERE
     products
                  Suppliers.Country = "Germany" ;             Based on the
25   supplied
                                                              information I've been
     to
                                                              given about this
     Germany
                  Result: incorrect                           database, I can't
                                                              answer:
                                                              "Which products are
                                                              supplied?‖
                  SELECT DISTINCT Suppliers.SupplierID        No SQL generated
                  , Suppliers.ContactName ,
                  Suppliers.CompanyName FROM              Result: The following
                  Suppliers WHERE not ( Suppliers.Country is shown on the screen
                  = "USA" ) ;
     Suppliers                                            Based on the
     who are      Result : correct                        information I've been
26   not                                                  given about this
     located in                                           database, I can't
     USA                                                  answer:
                                                          "Which things
                                                          supply?"
                                                          I haven't been given
                                                          any information on
                                                          things.
                                                                                      54
                   SELECT DISTINCT                            Select
                   Categories.CategoryID ,                    dbo.Categories.Catego
     Give the
                   Categories.CategoryName FROM               ryName,
     names and
                   Categories ;                               dbo.Categories.Catego
27   category
                                                              ryID
     Id for each
                   Result : correct                           from dbo.Categories
     category
                                                              Result : correct
                   SELECT DISTINCTROW                         No SQL generated
                   Products.ProductName , Products.*
                   FROM Products WHERE                        Result: The following
                   Products.ProductName > "Chai" ;            is shown on the screen

                   Result: incorrect
     Which
                                                              Based on the
     products
                                                              information I've been
28   are more
                                                              given about this
     expensive
                                                              database, I can't
     than chai
                                                              answer:
                                                              "How expensive are
                                                              products?".
                                                              I haven't been given
                                                              any information on
                                                              expensiveness.
                   SELECT DISTINCTROW Products.*              No SQL generated
                   FROM Products WHERE
                   Products.ProductName = "Chai" ;            Result: The following
                                                              is shown on the screen
     How           Result :Displays all the columns for
     much          product Chai .But result is correct        Based on the
29
     does Chai                                                information I've been
     cost?                                                    given about this
                                                              database, I can't
                                                              answer:
                                                              "How much does Chai
                                                              cost?‖
     Which         SELECT DISTINCT                            No SQL generated
     customers     Customers.CustomerID FROM Orders ,
     have          Customers , [Order Details] , Products ,   Result: The following
     ordered       Orders INNER JOIN Customers ON             is shown on the screen
30
     both          Orders.CustomerID =
     Konbu         Customers.CustomerID , Orders INNER        Sorry, I didn't
     and Filo      JOIN [Order Details] ON Orders.OrderID     understand that.
     Mix           = [Order Details].OrderID , [Order
                                           55
Details] INNER JOIN Products ON
[Order Details].ProductID =
Products.ProductID WHERE
Products.ProductName = "Konbu" ;
SELECT DISTINCT
Customers.CustomerID FROM Orders ,
Customers , [Order Details] , Products ,
Orders INNER JOIN Customers ON
Orders.CustomerID =
Customers.CustomerID , Orders INNER
JOIN [Order Details] ON Orders.OrderID
= [Order Details].OrderID , [Order
Details] INNER JOIN Products ON
[Order Details].ProductID =
Products.ProductID WHERE
Products.ProductName = "Filo Mix" ;
SELECT DISTINCT elfQZ1.* FROM
[elfQZ2] , [elfQZ1] , [elfQZ2] INNER
JOIN [elfQZ1] ON [elfQZ2].CustomerID
= [elfQZ1].CustomerID ; SELECT
DISTINCT Customers.CustomerID ,
Products.ProductName , [Order
Details].OrderID , Orders.ShipName ,
Customers.CompanyName FROM elfQZ3
, Orders , Customers , [Order Details] ,
Products , Orders INNER JOIN
Customers ON Orders.CustomerID =
Customers.CustomerID , Orders INNER
JOIN [Order Details] ON Orders.OrderID
= [Order Details].OrderID , [Order
Details] INNER JOIN Products ON
[Order Details].ProductID =
Products.ProductID , Orders INNER JOIN
elfQZ3 ON Orders.CustomerID =
elfQZ3.CustomerID WHERE (
Products.ProductName = "Konbu" or
Products.ProductName = "Filo Mix" )
Order by Customers.CustomerID ;


Result : correct
                                                                                  56
                  SELECT DISTINCT [Order                    No Sql generated
                  Details].Quantity , [Order
                  Details].UnitPrice , [Order               The way EQ interprets
                  Details].Discount , [Order                this question is: Show
                  Details].OrderID FROM [Order Details] ;   the products and the
     Give the                                               difference between
     difference   Result : incorrect                        their product unit
     between                                                prices and their order
31
     unit price                                             detail discounts.
     and
     discount                                               Result: The following
                                                            is shown on the screen
                                                            Products don't have
                                                            order detail discounts.
                                                            Order details have
                                                            order detail discounts.
                                                                                         57
                                       Appendix B

Relationships added to EQ based on query that failed

The relationships are added to the basic model of EQ .The following relationships were
added based on the queries that failed. The process of adding the relationship step by step
is mentioned below.

1) Customers order products

   a)   Drag PRODUCT into canvas pane.
   b)   Right click and select ―Add relationship‖
   c)   Add entity ―customer‖ and ―order_date‖
   d)   In the ―When‖ of New Relationship box add ―order_date‖
   e)   Add verb phrasing
   f)   Select ―subject verb object‖
   g)   In the subject box add ―customer‖
   h)   In the verb box add ―order‖
   i)   In direct object list add ―products‖
   j)   ―Customers order products ― appear in the phrasing

   If Unit price needs to be displayed in addition to product Id then set the product entity
   to display it.

   Query supported: List all customers who ordered in July 1996.

2) Suppliers supply categories

   a)   Drag SUPPLIER into canvas pane.
   b)   Right click and select ―Add relationship‖
   c)   Add entity ―categories‖
   d)   Add verb phrasing
   e)   Select ―subject verb object‖
   f)   In the subject box add ―SUPPLIER‖
   g)   In the verb box add ―supply‖
   h)   In direct object list add ―categories‖
   i)   ―Suppliers supply categories ― appear in the phrasing

   Query supported: List all suppliers who supply Beverages.

3) Customers order from employees
    a) Modify the relationship, customers_order_products, so that it becomes
       customers_order_products_from_employees at a specified time.
    b) Double-click customers_order_products.
                                                                                     58
   c)   In the Relationship dialog box, click Add for Entities.
   d)   In the Select Entities dialog box, double-click employee.
   e)   Double-click customers order products in the Phrasings list.
   f)   In the Verb Phrasing dialog box, do the following:
   g)   Click Add prepositional phrase.
   h)   In Prepositions, type from.
   i)   In Object of preposition, select employees.

   Query supported: List all customers for ―Laura‖.

4) shippers ship products
    a) From the left pane of the Semantics tab, drag shipper onto the Canvas pane.
    b) From the left pane of the Semantics tab, drag products onto shipper in the Canvas
        pane.
    c) In the New Relationship dialog box, click Add for Entities.
    d) In the Select Entities dialog box, double-click order_date.
    e) In the When list, select order_date.
    f) Select Add for Phrasings.
    g) In the Select Phrasing dialog box, double-click Verb Phrasing.
    h) In the Verb Phrasing dialog box, do the following:
    i) In Sentence Type, select Subject Verb Object.
    j) In Subject, select shippers.
    k) In Verb, type ship and press ENTER.
    l) In Direct object, select products.
    m) Click OK.

   Query supported: Give total number of orders for Federal Shipping.

5) customers_company_names_are_the_names_of_cutomers

   a) The model already has a relationship, customers have customer_company_names.
      Instead of creating a new relationship, add new phrasing to the existing
      relationship.
   b) From the left pane of the Semantics tab, drag customer_company_name to the
      Canvas pane.
   c) Drag customer from the left pane into the Canvas pane but not onto
      customer_company_name.
   d) Double-click customers_have_customer_company_names.
   e) In the Relationship dialog box, do the following:
   f) Click Add for Phrasings.
   g) In the Select Phrasing dialog box, double-click Name/ID Phrasing.
   h) In the Name/ID Phrasing dialog box, confirm that Entity that is name/ID is
      customer_company_name and that Entity being named is customers.
   i) Click OK.
                                                                                      59
6) categories_categorize_products

This involves subset phrasing

   a) Drag category from the left pane of the Semantics tab onto the Canvas pane.
   b) Drag product from the left pane of the Semantics tab onto in the Canvas pane but
      not onto category.
   c) The graphic in the Canvas pane shows an existing relationship, products have
      categories, exists in the model.
   d) In the Canvas pane, double-click the products_have_categories relationship.
   e) In the Relationship dialog box, select Add for Phrasings.
   f) In the Select Phrasing dialog box, double-click Subset phrasing.
   g) In the Subset Phrasing dialog box, do the following:
           In the Subject box, select products.
           Select Entity that contains category values.
           Select categories from the list.
           Click OK.

   Query supported: Who supplies ―Seafood‖?

7) some_products_are_in_stock

   a) From the left pane of the Semantics tab, drag the product entity onto the Canvas
      pane.
   b) In the Canvas pane, right-click product and choose Add Relationship.
   c) Click Add for Phrasings.
   d) In the Select Phrasing dialog box, double-click Adjective Phrasing.
   e) In the Adjective Phrasing dialog box, do the following:
   f) In the Subject list, select products.
   g) In the Adjective Type box, select Single adjective.
   h) In the Adjective that describes subject box, type in stock.
   i) Click OK.

   Query supported: Find the products which have at least units in stock and the price is
   18 dollars.

8) supplier_contact_titles are adjectives describing suppliers

   a) In the left pane of the Semantics tab of the Model Editor window, expand
      supplier.
   b) Double-click supplier_contact_title.
   c) In the Entity dialog box, select Add values of entity to model.
   d) Click OK.
                                                                                     60
   e) Next create a new relationship, supplier_contact_titles are adjectives describing
      suppliers.
   f) To create the relationship, supplier_contact_titles are adjectives describing
      suppliers
   g) Drag supplier_contact_title onto the Canvas pane.
   h) Drag supplier onto supplier_contact_title in the Canvas pane.
   i) The New Relationship dialog box appears and displays supplier and
      supplier_contact_title in the Entities list.
   j) Note If the New Relationship dialog box does not appear, try dragging supplier
      onto supplier_contact_title again.
   k) To the right of the Phrasings list, click Add.
   l) Double-click Adjective Phrasing.
   m) In the Adjective Phrasing dialog box, select or enter the following:
   n) In the Subject box, select suppliers.
   o) In Adjective Type, select Entity contains adjectives.
   p) In the Entity that contains adjectives box, select supplier_contact_titles.
   q) Click OK.
   r) If supplier_contact_titles are adjectives describing suppliers appears in the
      Phrasings list, click OK.

Query supported: List sales managers.

9) Suppliers sell products

   a) In the left pane of the Semantics tab of the Model Editor window, expand
      Relationships and double-click products have suppliers.
   b) To the right of the Phrasings box, click Add.
   c) Double-click Verb Phrasing.
   d) In the Verb Phrasing dialog box, do the following:
   e) In the Sentence type list, select Subject Verb Object.
   f) In the Subject list, select suppliers.
   g) In the Verb box, type sells.
   h) Note When creating relationships with verb phrases, phrase them in active voice,
      such as, customers buy products, instead of the passive voice, such as, products
      are bought by customers. When specifying relationships in the active voice, you
      get the passive voice automatically, which allows users to ask questions in either
      active or passive voice.
   i) In the Direct object list, select products.
   j) Click OK.

   Query supported: Which supplier sells ―Northwoods Cranberry Sauce ―?

10) Shipper Id‘s are the names of the shippers
                                                                              61
   This is name/id phrasing. The steps to adding this relationship is same as for
   customers_company_names_are_the_names_of_cutomers mentioned above.

   Query supported: Orders that were shipped by ―Speedy Express‖.

11) shipper_company_names_ship_orders
                                                                                62




Query supported: Orders that were shipped by ―Speedy Express‖.



Synonyms

Some of the synonyms were also added to the model. This helped in answering some of
the queries where EQ was not able to understand certain words.

1) Add Location as a synonym for employee_city
                                                                                      63




Query supported: Which employees are located in London or Seattle?

2) Added a synonym ―units‖ for product_unit_in_stock

Query supported: Find the products which have at least 20 units in stock and the price is
18 dollars
                                                                                     64
                                      Appendix C

S                  English
No Query           query
                                  Analysis
                   Interpretation
                   of the query
                                  select dbo.Suppliers.SupplierID
                   Which          from dbo.Suppliers
    List the
                   suppliers are where dbo.Suppliers.ContactTitle='Sales Manager'
1   sales
                   sales
    managers
                   manager?       Result: correct

                                    select distinct dbo.Products.SupplierID
                   Which
    Which                           from dbo.Products
                   supplier sells
    supplier sells                  where dbo.Products.ProductName='Northwoods
                   Northwoods
2   "Northwoods                     Cranberry Sauce'
                   Cranberry
    Cranberry
                   Sauce?
    Sauce"                          Result: correct

                                    select distinct dbo.Orders.CustomerID
                   Which
                                    from dbo.Orders        where
    List all       customers
                                    dbo.Orders.OrderDate>='19960701'
    customers      ordered
3                                           and dbo.Orders.OrderDate<'19960801'
    who ordered    products in
    in July 1996   July, 1996?
                                    Result: correct

                                    select distinct dbo.Products.SupplierID
                                    from dbo.Categories, dbo.Products where
    List all        Which           dbo.Categories.CategoryName='Beverages'
    suppliers      suppliers                and
4
    who supply     supply           dbo.Categories.CategoryID=dbo.Products.CategoryID
    Beverages      Beverages?
                                    Result: correct

                   Show the         select dbo.Employees.LastName,
                   employees        dbo.Employees.FirstName, dbo.Orders.CustomerID
                   named            into #t003
    List all       "Laura" and              from dbo.Employees left outer join dbo.Orders
5   customers      the              on
    for "Laura"    customers of     dbo.Employees.EmployeeID=dbo.Orders.EmployeeID
                   orders for               where dbo.Employees.FirstName='Laura'
                   which they               or dbo.Employees.LastName='Laura'
                   are the          select distinct #t003.FirstName, #t003.LastName,
                                                                                           65
                      employee.        #t003.CustomerID, dbo.Customers.CompanyName
                                              from #t003 left outer join dbo.Customers on
                                       #t003.CustomerID=dbo.Customers.CustomerID

                                       Result: correct
                                       select count(distinct dbo.Orders.OrderID) as "count"
                      What is the              from dbo.Orders, dbo.Shippers
     Give total       total number             where
     number of        of orders that   dbo.Orders.ShipVia=dbo.Shippers.ShipperID
6    Orders for       are shipped              and dbo.Shippers.CompanyName='Federal
     Federal          by Federal       Shipping'
     Shipping         Shipping?
                                       Result : correct

                                       select distinct dbo.Products.SupplierID
                                       from dbo.Categories, dbo.Products
     Who              Who              where dbo.Categories.CategoryName='Sea food'
7    supplies         supplies "Sea            and
     "Sea food"       food"            dbo.Categories.CategoryID=dbo.Products.CategoryID

                                       Result: correct
                                       select dbo.Suppliers.SupplierID from dbo.Suppliers

     List             Show every       Result: correct. This is because in the Supplier table,
8    suppliers in     supplier in      the attribute city is defined as a proper noun. If you
     "France"         france           quote the city name then EQ understands it. Also if a
                                       question is asked based on the value the it needs to be
                                       quoted.
                                       Result: Same as in Experiment 1.
     Find the
                                       The following two relationships already exists:
     products
                                           Products_have_products_unit_in_stock
     which have
                                           Products_have_Product_unit_prices
9    at least 20
     units in stock
                                       These two relationships should have answered the
     and the price
                                       question
     is 18 dollars

                                       select distinct dbo.Products.UnitPrice
     How much         How much         from dbo.Products
10   does a Chai      does Chai        where dbo.Products.ProductName='Chai'
     cost?            cost?
                                       Result: correct
11   Which            Which            select dbo.Employees.FirstName,
                                                                                         66
     employees      employees      dbo.Employees.LastName, dbo.Employees.City
     are located    are in         from dbo.Employees where
     in London or   London or      dbo.Employees.City='London'
     Seattle        are in                or dbo.Employees.City='Seattle'
                    Seattle?
                                    Result: correct
                                   select dbo.Products.ProductName,
                                   isnull(sum(dbo."Order Details".Quantity), 0) as
                                   "Quantity total"
     Give all the   Show the
                                           from dbo.Products left outer join dbo."Order
     products and   products and
                                   Details" on dbo.Products.ProductID=dbo."Order
12   quantity       their total
                                   Details".ProductID
     ordered in     order detail
                                           group by dbo.Products.ProductID,
     July 1996      quantities
                                   dbo.Products.ProductName

                                  Result: correct
                                  select top 1 with ties dbo.Orders.CustomerID,
                    Show the      dbo.Customers.CompanyName, count(*) as "count"
     Customer       customer that from dbo.Orders, dbo.Customers        where
     who has        has placed    dbo.Orders.CustomerID=dbo.Customers.CustomerID
13   placed         the most               group by dbo.Orders.CustomerID,
     maximum        orders and    dbo.Customers.CompanyName order by 3 desc
     orders         their name.
                                  Result: correct

                    What is the    select avg(dbo.Products.UnitPrice) as "UnitPrice
     What is the    average        average" from dbo.Products
     average        product unit
14
     price of       price of       Result: correct
     products?      products?

                                   EQ response:
                                   Based on the information I've been given about this
                                   database, I can't answer:
                                     "How expensive are products?".
                                   I haven't been given any information on
     Which is the
                                   expensiveness.
     most
15
     expensive
                                   Result: incorrect. However if the question is rephrased
     product?
                                   as ―Which product has the highest price‖ then the
                                   result is correct .the EQ interpretation of the question
                                   is ―Show the product whose product unit price is the
                                   highest‖
                                   SQL generated for this is :
                                                                                     67
                                  select top 1 with ties dbo.Products.ProductName,
                                  dbo.Products.UnitPrice
                                           from dbo.Products order by 2 desc

                                  Result: correct

     Orders that    Which orders Result : correct
     were shipped   were shipped
     by "Speedy     by Speedy
16
     Express" in    Express in
     the month of   the month of
     October        October?
                                                                                           68
                                      Appendix D
                           Copy of the e-mail from elfsoft.com

The reason ELF is superior to other natural language systems is very simple. All other
NL systems, including EQ, are based on methods of modelling languages which are
called "context- free". If you have studied any programming languages, you should be
somewhat familiar with this term. It is the way that all programming languages are
defined (usually somewhere in the back of the language guide). They look something like
this:

<program> ::= <program- heading> <block>
<program- heading> ::= PROGRAM <program- identifier> <file- list>
<program- identifier> := <identifier>

etc. etc.
These definitions usually go on for a number of pages.

Using these rules, you can parse any legal program written in the language into a tree,
where each symbol in the whole program is a leaf at the bottom of the tree, and at the top
(what's called the root of the tree) is the <program> node itself.

Each node of the tree is defined by one of the rules in the language definition listing. The
node itself is marked with the label on the left of the rule, and the branches from that
node are the one, or two, or three, etc. labels to the right of the ::= (sometimes written as
an arrow).

This is what defines context-free languages. There's always one object to the left of the
arrow, and one or more to the right. Because of this, the structure of the parsed language
string -- in this case a computer program -- corresponds directly to the concatanation of a
series of rules of the language definition.

I hope you're already familiar with this, otherwise what follows probably won't make
much sense to you.

The reason the ELF system is so powerful is that its parser does not rely on context-free
descriptions.

Suppose for a moment that instead of writing the first rule as shown above, we write it
like this:

<program- heading>
<block>

<program> ( 1 2 )
                                                                                             69

If the 1 and 2 represent the objects found in the corresponding positions of the list, then
the rule clearly means the same thing. It just seems to be a little redundant. However, it's
not redundant once you add the ability to switch the order of the objects. For instance, in
this new notation we would be capable of writing:

<block>
<program- heading>

<program> ( 2 1 )

If we added this rule to our language, it could be interpreted as saying that the progr am
heading could now be typed in AFTER the block, instead of before it. The language
parser would produce the same program as before, because it would switch the position
of the two child nodes.

In context- free languages this can't happen, because the first object to the left of the arrow
will always be the leftmost child of that node. There is no way to express "switch the
position of the objects".

There's also no way to express "drop one of the nodes", "insert a node that looks like this
between here and there", and most especially, no way to say, take these right-hand-side
objects and create from them MORE THAN ONE node.


Using the ELF system to model language, you can do all this and much more. For
instance, a rule could look like this:

<a>
<b>
<c>

<d> ( <e> ( 2 ) 3 )   <d> ( 1 )

This means that, upon reading (or building up from the input) <a>, <b> and <c> objects,
the parser could then construct a PAIR of <d> objects, one of which had <b> and <c> for
children (though not even at the same level) -- and the other one having <a> as its child.


This flexibility is very useful in modelling natural languages like English. For instance,
things get dropped out by English speakers, and this kind of parser can stick them right
back in again where they belong.

<I have something>
                                                                                              70
<that>
<I want you to see>

<sentence> ( 1 2 3 )

If this is a definition (of course it's way too specific for a real rule), you could also have a
rule:

<I have something>
<I want you to see>

1 that 2


This rule supplies the missing "that".

Now here's the real key. You could ask -- well, why couldn't I simply keep using a
context- free system, and instead of adding the rule you just showed me, I'd add this rule:

<I have something>
<I want you to see>

<sentence> ( 1 2 )

Or, in context free format

<sentence> ::= <I have something> <that> <I want you to see>
<sentence> ::= <I have something> <I want you to see>


The answer is that, now, not only do you have two rules, you have two different
structures (parse trees) that get generated by the parser. In the corresponding ELF
example, you have two rules, but what pops out at the end is the same exact result. No
matter which input the user types, the parser itself standardizes the result.

This is important if the parse tree generated is supposed to do something useful, like get
turned into executable code or translated into an SQL statement! What's more, this same
process of standardization and simplification applies at every node, at every step. For
example, certain tests may need to be applied to see if a rule should be allowed to fire. It
would get very complicated if we had to write one rule for the case when the <that>
appeared and another rule for when it didn't. But we know that the <that> will ALWAYS
be there (since the parser creates it, if it isn't there already). So we can just write our test
assuming it's there.
                                                                                               71

Because we had this powerful system for modeling language, we could use it to do some
pretty good tricks. For instance, programming language compilers will parse the input
and then pass the parse tree to another program that walks the tree and converts it into an
executable program. We don't have anything like that step. Instead, the parser, as it builds
the parse tree from the input, swaps out the words that the user actually typed in, and
substitutes the SQL keywords we want in the final result. There's nothing that "analyzes"
the parse tree. The leaves of the parse tree, by the time the parse is finished, is the SQL
query we're looking for.

Because the system has grown to be somewhat complex, it's not easy to trace a parse, or
to understand an entire parse tree. But it can be done, and the Access ELF product has
editing tools that let you watch a parse it progress, print out parse trees as they are being
constructed, turn rules on and off during a parse for debugging purposes, and much more
besides. This is all available from the Debug Dashboard, and some of it is even
documented!

I should also add one other thing. A question naturally arises, if this system is so good,
why don't other products use it. The answer to this is also simple. Every textbook you
will consult on this topic will explain that parsers for non-context free systems cannot be
built, and if they could they would be useless, because the number of operations required
would be impossibly large. This is explained using the term "combinatorial explosion".
The textbooks, in fact, "prove" that such parsers could not exist, as follows. They show
that insoluable problems (what are called NP-complete problems) can be reduced to
parsing problems. ("Insoluable" in a human time- frame, that is.) In other words, if you
have a certain hard problem -- like whether a path through a complex graph is the shortest
possible -- it can be changed mechanically into a question of whether a certain string is a
legal sentence of a given (non-context- free) language.

Therefore it follows that if an efficient parser could decide this question rapidly, the
original question could be answered rapidly.

Now, this assertion is absolutely true. These hard problems can be turned into questions
about [non-context- free language/sentence] pairs; and in fact they can be represented by
ELF-styles grammars; and (just as the books say) these parses (if the questions represent
anything at all non-trivial) will take longer than any of us has time to wait.

However, the professors make an illogical leap from this. They observe that there is a
certain class of problem that, when expressed as a [non-context- free grammar + sentence]
equivalent, remain insoluable. They reason therefore that all problems phrased in [non-
context- free grammar + sentence] form are thus insoluable.

It's just a wrong idea that has got entrenched. Yes, true, there is no way of restricting or
isolating a graph problem, expressed as a parsing problem, which will eliminate the
                                                                                           72
combinatorial complexity. But this is not true of problems that arise originally as real
language structures, not from mathematics. We're not formal mathematicians here, so we
don't have a formal explanation of this, but I think it's not too hard to understand why this
is. When you take, say, a hundred-city travelling- saleman problem, and try to calculate
all the possible routes through the graph, there's your unmanagable number. One of those
paths is the answer, and it's a grain of sand on a large beach. But all those paths actually
exist, regardless of you or the guy who asked you to solve the problem. Language isn't
like that. It's designed simply to carry ideas from one person to another. And the fact is,
we don't have that many ideas! And if the idea being conveyed isn't pretty much
something like we've already chewed over a bit, we won't understand the idea
anyway... So even though this kind of parser is no good for solving graph problems, it's
very good for resolving database queries into SQL. And eventually it will probably be
very good for translating other types of human language into forms that can be handled
by computers and robots.


I hope this helps you with your thesis. If you're really interested in this, you can probably
learn a lot more from playing with the debugging features described at http://www.elf-
software.com/help/accelf/DebugOptions.htm

- Jon Greenblatt @ ELF Software Co.
                                                                                       73
                                     References

[1] www.cs.washington.edu/research/projects/WebWare1/www/precise/precise.html

[2] Knowles, S., A Natural Language Database Interface for SQL-Tutor, Nov 1993.

[3] ELF Software CO. Natural Language Database Interfaces from ELF Software Co.

   available at www.elfsoft.com

[4] Popescu, A.M., Etzioni, O., Kautz, H., Towards a Theory of Natural Language

Interfaces to Databases, Jan 2003

[5] Androutsopoulos, I., Ritchie, G., Thanisch, P., MASQUE/SQL – An Efficient and

Portable Natural Language Query Interface for Relational Databases, Edinburgh, 1993.


[6] Microsoft English Query Tutorials available with standard installation in SQL
SERVER 7.0 or higher

[7] Private communication with elfsoft.com


[8] Johnsonbaugh, Richard, Discrete Mathematics, sixth edition

				
DOCUMENT INFO