Artificial Intelligence and expert system by erpradeepkr


Knowledge Acquisition
Knowledge acquisition includes the elicitation, collection, analysis, modelling and validation
of knowledge for knowledge engineering and knowledge management projects.

Issues in Knowledge Acquisition
Some of the most important issues in knowledge acquisition are as follows:

       Most knowledge is in the heads of experts

       Experts have vast amounts of knowledge

       Experts have a lot of tacit knowledge

            o   They don't know all that they know and use

            o   Tacit knowledge is hard (impossible) to describe

       Experts are very busy and valuable people

       Each expert doesn't know everything

       Knowledge has a "shelf life"

Requirements for KA Techniques
Because of these issues, techniques are required which:

       Take experts off the job for short time periods

       Allow non-experts to understand the knowledge

       Focus on the essential knowledge

       Can capture tacit knowledge

       Allow knowledge to be collated from different experts

       Allow knowledge to be validated and maintained

KA Techniques
Many techniques have been developed to help elicit knowledge from an expert. These are
referred to as knowledge elicitation or knowledge acquisition (KA) techniques. The term "KA
techniques" is commonly used.

The following list gives a brief introduction to the types of techniques used for acquiring,
analysing and modelling knowledge:

        Protocol-generation techniques include various types of interviews
        (unstructured, semi-structured and structured), reporting techniques
        (such as self-report and shadowing) and observational techniques.
         The aim of these techniques is to produce a protocol, i.e. a record of
         behaviour, whether in audio, video or electronic media. Audio recording is
         the usual method, which is then transcribed to produce a transcript.
Various types of interviews can be used to produce a transcript.
Unstructured interviews have a rough agenda but no pre-defined
structure, so that the expert and knowledge engineer are free to explore
the domain. This is an inefficient way of gathering detailed knowledge,
but can prove useful as an initial interview when little is known of the
domain. It also acts as an ice-breaker to establish a rapport between the
expert and knowledge engineer. A semi-structured interview
combines a highly structured agenda with the flexibility to ask
subsequent questions. The questions for a semi-structured interview are
ideally constructed some time before the interview and are sent to the
expert so he/she can start to prepare responses. For an interview lasting
1 hour, around 10-15 questions might be asked. This allows time in
between the set questions for the knowledge engineer to ask
supplementary questions to clarify points and ask for more detail where
necessary. This is often the preferred style of interview as it helps to
focus the expert on the key questions and helps avoid them giving
unnecessary information. Another form of interview is the structured
interview. This allows no flexibility on the part of the knowledge
engineer, whose questions are all pre-established. As such, structured
interviews often involve filling in a matrix or other diagrammatic
representation.

Another family of techniques that produce protocols comprises think-aloud
problem-solving and commentary techniques. These techniques generate protocols
by having the expert provide a running commentary on a typical task
used in the domain. The basic technique here is the self-report, in which
the expert provides a running commentary of their thought processes as
they solve a problem. Experimental evidence has shown that self-reports
can access cognitive processes that cannot be fully recalled without bias
and distortion if explained after the task has been completed. A problem
with the self-report technique is that of cognitive overload, i.e. the
mental effort required by the expert to provide the commentary
interrupts and affects their performance of the task. This is especially
true in dynamic domains where time is critical. One way around this is to
use an off-line reporting technique. Here the expert is shown a protocol
of their task behaviour, typically a video, and asked to provide a running
commentary on what they were thinking and doing. An advantage of this
is that the video can be paused or run at slow speed to allow time for full
explanation. Variants of these reporting techniques involve a second
expert commenting on another expert’s performance.

Teach Back
In the teach back technique, the knowledge engineer describes part of
the knowledge that has been acquired during previous sessions or from
other sources. The expert comments on what the knowledge engineer is
describing to reveal misunderstandings.
     Observational techniques are another way of generating protocols.
     Simply observing and making notes as the expert performs their daily
     activities can be useful, although a time-consuming process. Videotaping
     their task performance can be useful especially if combined with
     retrospective reporting techniques. On the whole, though, simple
     observation techniques are rarely used, as they are an inefficient means
     of capturing the required knowledge.

   Protocol analysis techniques are used with transcripts of interviews or other text-
    based information to identify various types of knowledge, such as goals, decisions,
    relationships and attributes. This acts as a bridge between the use of protocol-
    based techniques and knowledge modelling techniques.

     Protocol Analysis Techniques
       Protocol Analysis involves the identification of basic knowledge
       objects within a protocol, usually a transcript. For most projects, this
       makes use of categories of fundamental knowledge such as
       concepts, attributes, values, tasks and relationships. So, for
       example, an interview transcript would be analysed by highlighting
       all the concepts that are relevant to the project. This would be
       repeated for all the relevant attributes, values, tasks and relationships.

       In some cases, more detailed categories will be used for the
       identification depending on the requirements of the project. For
       instance, if the transcript concerns the task of diagnosis, then such
       categories as symptoms, hypotheses and diagnostic techniques
       would be used for the analysis. Such categories may be taken from
       generic ontologies and problem-solving models.

       The Protocol Tool in PCPACK can be used to analyse a transcript or
       other piece of text.
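As a rough sketch of what protocol analysis involves, a transcript can be scanned for terms already assigned to knowledge categories. The category lists and the transcript below are invented for illustration; they are not the output of PCPACK or any real tool:

```python
# Minimal sketch of protocol analysis: tag occurrences of known
# knowledge objects in a transcript. The category lists below are
# invented examples.
import re

CATEGORIES = {
    "concept":   ["pump", "valve", "pressure sensor"],
    "attribute": ["pressure", "temperature"],
    "task":      ["inspect", "calibrate"],
}

def analyse_protocol(transcript):
    """Return {category: [terms found]} for terms present in the text."""
    found = {}
    lowered = transcript.lower()
    for category, terms in CATEGORIES.items():
        hits = [t for t in terms
                if re.search(r"\b" + re.escape(t) + r"\b", lowered)]
        if hits:
            found[category] = hits
    return found

transcript = "First I inspect the pump, then calibrate the pressure sensor."
print(analyse_protocol(transcript))
```

In a real project the category vocabularies would themselves grow as the analysis proceeds, rather than being fixed in advance as here.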

   Hierarchy-generation techniques, such as laddering, are used to build taxonomies
    or other hierarchical structures such as goal trees and decision networks.
                Laddering Techniques
       Laddering techniques involve the creation, reviewing and
       modification of hierarchical knowledge, often in the form of ladders
       (i.e. tree diagrams).

       Here the expert and knowledge engineer both refer to a ladder
       presented on paper or a computer screen, and add, delete, rename
       or re-classify nodes as appropriate.

       Laddering can also involve a set of predefined probe questions, such
       as "Could you tell me some sub-types of X?", "Could you tell me how
       you can tell that something is X?" and "Why would you prefer X to
       Y?". A leading proponent of this is Dr Gordon Rugg.

       Use of Ladders
       Various forms of ladder can be used.

              A concept ladder is particularly important since the way an
               expert categorises concepts into classes is an important key
                to understanding the way the domain knowledge is structured.

              Laddering using an attribute ladder is another very useful
               technique. By reviewing and appending such a ladder, the
               knowledge engineer can validate and help elicit knowledge of
               the properties of concepts.

              Hierarchies with other relationships can also be used, such
               as composition ladders and process ladders described
               earlier. Validation of the knowledge represented in a ladder
               with another expert is often very quick and efficient.
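The ladder operations described above (adding, renaming and re-classifying nodes) can be sketched as a small data structure; the node names are invented examples:

```python
# Minimal sketch of a ladder (tree) that an expert and knowledge
# engineer might build up and revise together. Node names are
# invented examples.
class Ladder:
    def __init__(self, root):
        self.parent = {root: None}   # node -> parent node

    def add(self, node, parent):
        self.parent[node] = parent

    def rename(self, old, new):
        self.parent[new] = self.parent.pop(old)
        for n, p in self.parent.items():   # re-point children of the old name
            if p == old:
                self.parent[n] = new

    def reclassify(self, node, new_parent):
        self.parent[node] = new_parent

    def children(self, node):
        return sorted(n for n, p in self.parent.items() if p == node)

ladder = Ladder("vehicle")
ladder.add("car", "vehicle")
ladder.add("lorry", "vehicle")
ladder.add("saloon", "car")
ladder.rename("lorry", "truck")       # the expert prefers another term
print(ladder.children("vehicle"))     # -> ['car', 'truck']
```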

   Matrix-based techniques involve the construction of grids indicating such things as
    problems encountered against possible solutions. Important types include the use of
    frames for representing the properties of concepts and the repertory grid technique
    used to elicit, rate, analyse and categorise the properties of concepts.

       These techniques involve the construction and filling-in of a 2-
       dimensional matrix (grid, table). Useful examples are:

              Concepts v Properties (attributes and values)

              Problems v Solutions

              Hypotheses v Diagnostic techniques

              Tasks v Resources
       The elements within the matrix can contain:
                  Symbols (ticks, crosses, question marks)

                 Colours

                 Numbers

                 Text
        The use of frames (see knowledge models) can also be adopted,
        although this would typically be used for validating previously
        acquired knowledge rather than for eliciting knowledge from scratch.

        Timelines (see knowledge models) can also be used to acquire
        time-based knowledge.

        The Matrix Tool in PCPACK allows the creation of most types of matrix.

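A matrix-based technique can be sketched as a 2-dimensional grid whose cells hold symbols, with unanswered cells pointing the knowledge engineer at what to elicit next. The problems and solutions below are invented examples:

```python
# Minimal sketch of a matrix-based technique: a 2-D grid of
# problems against candidate solutions, filled with symbols.
# The problems and solutions are invented examples.
problems  = ["engine won't start", "overheating"]
solutions = ["replace battery", "top up coolant", "check fuses"]

# Cells hold a symbol: "Y" (applicable), "N", or "?" (to ask the expert).
grid = {(p, s): "?" for p in problems for s in solutions}
grid[("engine won't start", "replace battery")] = "Y"
grid[("engine won't start", "top up coolant")]  = "N"
grid[("overheating", "top up coolant")]         = "Y"

def unresolved(grid):
    """Cells still marked '?' show where elicitation should focus."""
    return [cell for cell, mark in grid.items() if mark == "?"]

print(len(unresolved(grid)))  # -> 3
```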
   Sorting techniques are used for capturing the way people compare and order
     concepts, and can lead to the revelation of knowledge about classes, properties and priorities.

        Sorting techniques are a well-known method for capturing the way
        experts compare and order concepts, and can lead to the revelation
        of knowledge about classes, properties and priorities.

        The simplest form is card sorting. Here the expert is given a
        number of cards each displaying the name of a concept. The expert
        has the task of repeatedly sorting the cards into piles such that the
        cards in each pile have something in common. For example, an
        expert in astronomy might sort cards showing the names of planets
         into those that are very large, those that are of medium size and those
        that are relatively small. By naming each pile, the expert gives
        information on the attributes and values they use to denote the
        properties of concepts. Variants of this involve sorting objects or
        photographs rather than cards in domains where simple textual
        descriptors are not easy to use.

        A technique often used in conjunction with sorting techniques is
        triadic elicitation (aka 'Three Card Trick'). This technique prompts
        the expert to generate new attributes. This involves asking the
        expert what is similar and different about three randomly chosen
        concepts, i.e. in what way are two of them similar and different from
        the other. This is a way of eliciting attributes that are not
        immediately and easily articulated by the expert.
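Triadic elicitation lends itself to a simple sketch: repeatedly draw three concepts at random and generate the standard probe question. The concept names below are invented examples:

```python
# Minimal sketch of triadic elicitation: repeatedly draw three
# concepts and prompt for a distinguishing attribute. Concept
# names are invented examples.
import random

def triads(concepts, rounds, seed=0):
    """Yield a probe question for each randomly chosen triad."""
    rng = random.Random(seed)   # fixed seed so the session is repeatable
    for _ in range(rounds):
        a, b, c = rng.sample(concepts, 3)
        yield (f"In what way are two of {a!r}, {b!r} and {c!r} similar, "
               f"and different from the third?")

concepts = ["Mercury", "Venus", "Mars", "Jupiter", "Saturn"]
for prompt in triads(concepts, rounds=2):
    print(prompt)
```

The expert's answers to each prompt would then be recorded as candidate attributes for later analysis.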

   Limited-information and constrained-processing tasks are techniques that either
    limit the time and/or information available to the expert when performing tasks. For
    instance, the twenty-questions technique provides an efficient way of accessing the
    key information in a domain in a prioritised order.
       Limited-information and constrained-processing tasks are techniques
       which either limit the time and/or information available to the expert
       when performing tasks that would normally require a lot of time and
       information to perform. This provides a quick and efficient way of
       establishing the key tasks and information used.

       An interesting variant of this is the twenty-questions technique.
       Here the aim is for the expert to guess something that the
       knowledge engineer is thinking about (as in the parlour game of
       ‘animal, vegetable and mineral’). The expert is allowed to ask
       questions of the knowledge engineer who is only allowed to respond
       yes or no. As the expert asks each question, the knowledge engineer
       notes this down. The questions asked and the order in which they
       are asked give important knowledge such as key properties or
       categories in a prioritised order.
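The question-logging step of the twenty-questions technique can be sketched as follows; the questions themselves are invented examples:

```python
# Minimal sketch of the twenty-questions technique: the order in
# which questions are logged yields a prioritised list of the
# properties the expert probes first. Questions are invented examples.
class QuestionLog:
    def __init__(self):
        self.entries = []           # (question, yes/no answer), in order asked

    def record(self, question, answer):
        self.entries.append((question, answer))

    def prioritised_properties(self):
        """Earlier questions indicate higher-priority properties."""
        return [q for q, _ in self.entries]

log = QuestionLog()
log.record("Is it alive?", "no")
log.record("Is it man-made?", "yes")
log.record("Is it bigger than a car?", "no")
print(log.prioritised_properties()[0])  # -> 'Is it alive?'
```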

   Diagram-based techniques include the generation and use of concept maps, state
    transition networks, event diagrams and process maps. The use of these is
    particularly important in capturing the "what, how, when, who and why" of tasks
    and events.

       These techniques include the generation and use of network
       diagrams, such as concept maps, state transition networks and
       process maps (see types of knowledge models). As with laddering,
       the knowledge engineer elicits knowledge from the expert by mutual
       reference to a diagram on paper or computer screen.

       The use of concept maps has been strongly advocated as a
       comprehensive technique for eliciting many types of knowledge. Use
       of network diagrams has become a mainstream technique when
       acquiring knowledge to develop object-oriented software. For
       example, the industry standard UML (Unified Modelling Language)
       makes use of concept maps (combined with frames) for object
       knowledge, state transition networks for dynamic modelling, and
       process maps for functional modelling. As with laddering, the
       presentation of knowledge in a network format makes validation
       very efficient.

       The ease with which people understand and relate to networks has
       been demonstrated with experimental evidence showing that people
       understand and apply knowledge more easily and readily if a concept
       map notation is used rather than predicate logic.
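A concept map can be sketched as a set of labelled (node, relation, node) triples; the nodes and relation labels below are invented examples:

```python
# Minimal sketch of a concept map stored as labelled
# (node, relation, node) triples. Nodes and relations are invented.
triples = {
    ("kettle", "is used for", "boiling water"),
    ("kettle", "has part", "heating element"),
    ("boiling water", "is part of", "making tea"),
}

def neighbours(node):
    """Return (relation, target) pairs leaving the given node."""
    return sorted((rel, dst) for src, rel, dst in triples if src == node)

print(neighbours("kettle"))
```

During a session, the knowledge engineer and expert would add or correct triples while looking at the drawn diagram; the triples are simply the diagram's underlying data.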
Comparison of KA Techniques
The figure below presents the various techniques described above and shows the types of
knowledge they are mainly aimed at eliciting. The vertical axis on the figure represents the
dimension from object knowledge to process knowledge, and the horizontal axis represents
the dimension from explicit knowledge to tacit knowledge.

Typical Use of KA Techniques
How and when are the many techniques described above used in a knowledge acquisition
project? To illustrate the general process, a simple method will be described. This method
starts with the use of natural techniques, then moves to using more contrived techniques.
It is summarised as follows.

      Conduct an initial interview with the expert in order to (a) scope what knowledge is
       to be acquired, (b) determine the purpose to which the knowledge will be put, (c) gain
       some understanding of key terminology, and (d) build a rapport with the expert.
       This interview (as with all sessions with experts) is recorded on either audiotape or
       videotape.

      Transcribe the initial interview and analyse the resulting protocol. Create a concept
       ladder of the resulting knowledge to provide a broad representation of the
       knowledge in the domain. Use the ladder to produce a set of questions which cover
       the essential issues across the domain and which serve the goals of the knowledge
       acquisition project.
      Conduct a semi-structured interview with the expert using the pre-prepared
       questions to provide structure and focus.
      Transcribe the semi-structured interview and analyse the resulting protocol for the
       knowledge types present. Typically these would be concepts, attributes, values,
       relationships, tasks and rules.

      Represent these knowledge elements using the most appropriate knowledge
       models, e.g. ladders, grids, network diagrams, hypertext, etc. In addition,
       document anecdotes, illustrations and explanations in a structured manner using
       hypertext and template headings.

      Use the resulting knowledge models and structured text with contrived techniques
       such as laddering, think aloud problem-solving, twenty questions and repertory grid
       to allow the expert to modify and expand on the knowledge already captured.

      Repeat the analysis, model building and acquisition sessions until the expert and
       knowledge engineer are happy that the goals of the project have been realised.

      Validate the knowledge acquired with other experts, and make modifications where
       necessary.

This is a very brief coverage of what happens. It does not assume any previous knowledge
has been gathered, nor that any generic knowledge can be applied. In reality, the aim
would be to re-use as much previously acquired knowledge as possible. Techniques have
been developed to assist this, such as the use of ontologies and problem-solving models.
These provide generic knowledge to suggest ideas to the expert such as general classes of
objects in the domain and general ways in which tasks are performed. This re-use of
knowledge is the essence of making the knowledge acquisition process as efficient and
effective as possible. This is an evolving process. Hence, as more knowledge is gathered
and abstracted to produce generic knowledge, the whole process becomes more efficient. In
practice, knowledge engineers often mix this theory-driven (top-down) approach with a
data-driven (bottom-up) approach (discussed later).
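The iterative acquire-model-validate loop described above can be sketched in code. The session functions here are hypothetical stand-ins, not part of any real methodology or tool:

```python
# Minimal sketch of the iterative KA cycle: acquire, model, then
# validate, repeating until the expert is satisfied. The session
# functions are hypothetical placeholders.
def ka_cycle(acquire, build_model, validated, max_rounds=5):
    """Repeat acquisition and modelling until validation succeeds."""
    model = None
    for round_no in range(1, max_rounds + 1):
        protocol = acquire(round_no)
        model = build_model(protocol, model)
        if validated(model):
            return model, round_no
    return model, max_rounds

# Toy stand-ins: each round adds one "knowledge element";
# validation succeeds once three elements are captured.
model, rounds = ka_cycle(
    acquire=lambda r: f"element-{r}",
    build_model=lambda p, m: (m or []) + [p],
    validated=lambda m: len(m) >= 3,
)
print(rounds)  # -> 3
```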


              Knowledge Management
          Knowledge Management is a strategy, framework or system designed
          to help organisations create, capture, analyse, apply, and reuse
          knowledge to achieve competitive advantage.

          A key aspect of Knowledge Management is that knowledge within an
          organisation is treated as a key asset.

          A simple phrase that encapsulates a core aspect of Knowledge
          Management is "getting the right knowledge to the right people at the
          right time in the right format".
              Knowledge Management Methods
               Knowledge Management methods can be categorised into two main types:

                     Those that move knowledge around the organisation

                     Those that help create new knowledge

              Methods that help move knowledge include:

                     Face-to-face communication methods, e.g. peer assist, lessons
                      learnt reviews, knowledge fairs

                     Computer-based communication methods, e.g. email, Lotus
                      Notes, communities of practice

                     Storage-and-retrieval using computer systems, e.g. intranets,
                      knowledge books

                     Knowledge-Based Systems, e.g. expert systems

                      Knowledge Engineering
Knowledge engineering is a field within artificial intelligence that develops knowledge-based
systems. Such systems are computer programs that contain large amounts of knowledge, rules
and reasoning mechanisms to provide solutions to real-world problems.

A major form of knowledge-based system is an expert system, one designed to emulate the
reasoning processes of an expert practitioner (i.e. one having performed in a professional role for
very many years). Typical examples of expert systems include diagnosis of bacterial infections,
advice on mineral exploration and assessment of electronic circuit designs.

Importance of Knowledge Acquisition
The early years of knowledge engineering were dogged by problems. Knowledge engineers found
that acquiring enough high-quality knowledge to build a robust and useful system was a very long
and expensive activity. As such, knowledge acquisition was identified as the bottleneck in building
an expert system. This led to knowledge acquisition becoming a major research field within
knowledge engineering.

The aim of knowledge acquisition is to develop methods and tools that make the arduous task of
capturing and validating an expert’s knowledge as efficient and effective as possible. Experts tend
to be important and busy people; hence it is vital that the methods used minimise the time each
expert spends off the job taking part in knowledge acquisition sessions.
Knowledge Engineering Principles
Since the mid-1980s, knowledge engineers have developed a number of principles, methods and
tools that have considerably improved the process of knowledge acquisition. Some of the key
principles are summarised as follows:

      Knowledge engineers acknowledge that there are different types of knowledge, and that
       the right approach and technique should be used for the knowledge required.

      Knowledge engineers acknowledge that there are different types of experts and expertise,
       such that methods should be chosen appropriately.

      Knowledge engineers recognise that there are different ways of representing knowledge,
       which can aid the acquisition, validation and re-use of knowledge.

      Knowledge engineers recognise that there are different ways of using knowledge, so that
       the acquisition process can be guided by the project aims.

      Knowledge engineers use structured methods to increase the efficiency of the acquisition

Knowledge Engineering Methodologies
Epistemics is involved in three methodologies to support the development of knowledge systems:

           CommonKADS is the methodology that is most commonly followed at
           Epistemics when developing knowledge engineering systems.

           CommonKADS is a complete methodological framework for the
           development of a knowledge based system (KBS). It supports most
           aspects of a KBS development project, such as:

                     Project management

                      Organisational analysis (including problem/opportunity identification)

                     Knowledge acquisition (including initial project scoping)

                     Knowledge analysis and modelling

                     Capture of user requirements

                     Analysis of system integration issues

                     Knowledge system design

           CommonKADS describes KBS development from two perspectives:

                     Result perspective: A set of models, of different aspects of the
                      KBS and its environment, that are continuously improved during a
                     project life-cycle.

                    Project management perspective: A risk-driven generic spiral
                     life-cycle model that can be configured into a process adapted to
                     the particular project.

The SPEDE methodology is a combination of principles, techniques and tools taken from Knowledge
Engineering and adapted for use in Knowledge Management. It provides an effective means to
capture, validate and communicate vital knowledge to provide business benefit.

The SPEDE methodology was developed under the guidance of Rolls-Royce plc and involved staff
from Epistemics acting as consultants. Early versions of PCPACK v4 were tested and developed on
a number of SPEDE projects.

With assistance from Epistemics, Rolls-Royce has run over 100 SPEDE projects, involving the
training of over 150 employees.

Structure and Deliverables
SPEDE has been specifically developed to act as a training course for novice knowledge engineers
or those seconded to a knowledge management activity. SPEDE projects typically involve 1-week
of intensive training followed by 2-3 months of scoping, knowledge acquisition and delivery phases.

The main deliverable of most SPEDE projects is an intranet website. However, previous projects
have delivered quality procedures, process improvement information and expert systems.

Projects using the SPEDE methodology follow a set of procedures coordinated by experienced staff.
All projects have a coach who manages the activities of one or more knowledge engineers on a
daily basis.

All SPEDE projects must pass through a series of gates. These are meetings held at various stages
throughout the project to act as a "go/no go" into the next phase of the project. Each gate
comprises various criteria to ensure the project is on track to meet the objectives and identify any
problems, hazards and actions. There are 5 gates: project launch review, scoping review, technical
review, delivery review and post-delivery review.

                       Methodology and tools Oriented to
                    Knowledge-Based Engineering Applications
MOKA is a methodology for developing Knowledge-Based Engineering applications, i.e. systems
that support design engineers. It is particularly aimed at capturing and applying knowledge within
the aeronautical and automotive industries concerning the design of complex mechanical products.

Whilst huge benefits can be gained by the use of Knowledge-Based Engineering (KBE) technology,
the lack of a recognised methodology has resulted in a significant risk when developing and
maintaining KBE applications. MOKA aims to provide such a methodology, that:

      Reduces the lead times and associated costs of developing KBE applications by 20 - 25%.

      Provides a consistent way of developing and maintaining KBE applications.

      Will form the basis of an international standard.

      Makes use of a software tool to support the use of the methodology.

Need for MOKA
Companies have to manage and reuse engineering knowledge to improve business processes, to
reduce the time taken to find new solutions, to get designs right first time and to retain best practices. The aim
of MOKA is to provide a methodology to capture and formalise engineering knowledge to reuse it,
for example within KBE applications. Development and maintenance of knowledge intensive
software applications is a complex and potentially expensive activity. The number of Knowledge-
Based Engineering (KBE) systems used in the aeronautical and automotive industries has increased
in recent years. Experience has shown that long term risk can be reduced by employing a
systematic methodology that covers the development and maintenance of such systems. The
ESPRIT-IV funded project called MOKA (No. 25418) is intended to satisfy this need by providing
both a methodology and a supporting software tool, both of which are independent of any KBE platform.

MOKA Analysis and Modelling
MOKA identifies two models to be used in the KBE application development lifecycle:

      Informal Model: A structured, natural language representation of engineering knowledge
       using pre-defined forms.

      Formal Model: A graphical, object-oriented representation of engineering knowledge at
       one level of abstraction above application code.
Within each of these models, various knowledge representations are used to help capture, analyse
and structure the knowledge required for KBE applications.

Within the informal model, the main knowledge objects are:

      Entities

           o   Structural Entities (the components of the product being designed)

           o   Functional Entities (the functions of the product and its sub-components)

      Constraints (the design requirements of the product and its sub-components)

      Activities (the tasks performed during the design process)
      Rules (decision points in the design process that affect what tasks to perform)
      Illustrations (examples that illustrate aspects of the product and design)

PCPACK can be used to satisfy the requirements for a supporting software tool for the MOKA
methodology. It supports the capture, analysis, modelling and publishing of design knowledge
using a MOKA framework.

MOKA (Methodology and tools Oriented to Knowledge-Based Engineering Applications) was an
ESPRIT funded project that started in January 1998 and consisted of the following partners:
Aerospatiale Matra (prime), British Aerospace, Daimler-Chrysler, PSA Peugeot Citroen, Knowledge
Technologies International, Decan and Coventry University. A MOKA Interest Group continues to
meet and develop the methodology.

                     Knowledge Modelling
             An important aspect of knowledge acquisition is the use of knowledge
             modelling as a way of structuring projects, acquiring and validating
             knowledge and storing knowledge for future use.

             Knowledge models are structured representations of knowledge using
             symbols to represent pieces of knowledge and relationships between
             them. Knowledge models include:

                    Symbolic character-based languages, such as logic

                    Diagrammatic representations, such as networks and ladders

                    Tabular representations, such as matrices

                    Structured text, such as hypertext.

             Uses of Knowledge Models
             The generation and modification of a knowledge model is an essential
             aspect of knowledge acquisition, as the model helps to clarify the
             language being used and quickly convey information for validation and
             modification where necessary. Thus, the use of knowledge models is of
             great benefit during:

                    knowledge elicitation (from an expert)

                    validation (with the same expert)

                    cross-validation (with another expert)
                    knowledge publication

                     maintenance and updating of the knowledge system

              Most forms of knowledge models are composed of primitive elements
              called knowledge objects.

                         Knowledge Models
The field of Artificial Intelligence may not have produced fully intelligent machines but one of its
major achievements is the development of a range of ways of representing knowledge. A
thorough understanding of different knowledge representations is a vital part (arguably the vital
part) of Artificial Intelligence, since the ease of solving a problem is almost completely
determined by the way the problem is conceptualised and represented. The same is true for the
task of communicating knowledge. A well-chosen analogy or diagram can make all the difference
when trying to communicate a difficult idea to someone, especially a non-expert in the field.

Knowledge engineers make use of a number of ways of representing knowledge when acquiring
knowledge from experts. These are usually referred to as knowledge models.

Three important types of knowledge models are:

       Ladders: Ladders are hierarchical (tree-like) diagrams. Some important types of ladders
        are concept ladder, composition ladder, decision ladder and attribute ladder. Ladders can
        be created and edited using the Ladder Tool in PCPACK.

       Network Diagrams: Network diagrams show nodes connected by arrows. Depending on
        the type of network diagram, the nodes might represent any type of concept, attribute,
        value or task, and the arrows between the nodes any type of relationship. Examples of
        network diagrams include concept maps, process maps and state transition networks.
        Network diagrams can be created and edited using the Diagram Tool in PCPACK.

       Tables and Grids: Tabular representations make use of tables or grids. Important
        types include forms, frames, timelines and matrices/grids. Matrices can be created and edited
        using the Matrix Tool in PCPACK.
Descriptions and examples of the important types of knowledge models are shown below.

Concept Ladder
A concept ladder shows classes of concepts and their sub-types. All relationships in the ladder are
the is a relationship, e.g. car is a vehicle. A concept ladder is more commonly known as a
taxonomy and is vital to representing knowledge in almost all domains. An example of a concept
ladder is shown below.
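The idea behind a concept ladder can be sketched in code. The following minimal Python fragment (concept names are illustrative, not from any particular PCPACK domain) stores the is a links and tests class membership transitively:

```python
# A concept ladder stored as child -> parent "is a" links.
# Concept names here are illustrative.
IS_A = {
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "machine",
}

def is_a(concept, ancestor):
    """Climb the ladder: is `concept` a (transitive) sub-type of `ancestor`?"""
    while concept in IS_A:
        concept = IS_A[concept]
        if concept == ancestor:
            return True
    return False

print(is_a("car", "machine"))   # True: car is a vehicle, vehicle is a machine
print(is_a("car", "truck"))     # False: siblings, not ancestor and descendant
```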
Composition Ladder
A composition ladder shows the way a knowledge object is composed of its constituent parts. All
relationships in the ladder are the has part or part-of relationship, e.g. wheel is part of car. A
composition ladder is a useful way of understanding complex entities such as machines,
organisations and documents. An example of a composition ladder is shown below.
Decision Ladder
A decision ladder shows the alternative courses of action for a particular decision. It also shows
the pros and cons for each course of action, and possibly the assumptions for each pro and con. A
decision ladder is a useful way of representing detailed process knowledge. An example of a
decision ladder is shown below.
Attribute Ladder
An attribute ladder shows attributes and values. All the adjectival values relevant to an attribute
are shown as sub-nodes, but numerical values are not usually shown. For example, the attribute
colour would have as sub-nodes those colours appropriate in the domain as values, e.g. red, blue,
green. An attribute ladder is a useful way of representing knowledge of all the properties that can
be associated with concepts in a domain. An example of an attribute ladder is shown below.

Process Ladder
This ladder shows processes (tasks, activities) and the sub-processes (sub-tasks, sub-activities)
of which they are composed. All relationships are the part of relationship, e.g. boil the kettle is
part of make the tea. A process ladder is a useful way of representing process knowledge. An
example of a process ladder is shown below.
Concept Map
A concept map is a type of diagram that shows knowledge objects as nodes and the relationships
between them as links (usually labelled arrows). Any types of concepts and relationships can be
used. The concept map is very similar to a semantic network used in cognitive psychology. An
example of a concept map is shown below.
Process Map
Another important type of network diagram is the process map. This type of diagram shows the
inputs, outputs, resources, roles and decisions associated with each process or task in a domain.
The process map is an excellent way of representing information about how and when processes,
tasks and activities are performed. An example of a process map is shown below.
State Transition Network
Another important type of network diagram is the state transition network. This type of diagram
comprises two elements: (1) nodes that represent the states that a concept can be in, and (2)
arrows between the nodes showing all the events and processes/tasks that can cause transitions
from one state to another. An example of a state transition network is shown below.

Frame
Frames are a way of representing knowledge in which each concept in a domain is described by a
group of attributes and values using a matrix representation. The left-hand column represents
the attributes associated with the concept and the right-hand column represents the appropriate
values. When the concept is a class, typical (default) values are entered in the right-hand
column. An example of a frame is shown in the table below for the concept Novel.
Timeline
A timeline is a type of tabular representation that shows time along the horizontal axis and such
things as processes, tasks or project phases along the vertical axis. It is very useful for
representing time-based process or role knowledge.

Matrix
A matrix (aka grid) is a type of tabular representation that comprises a 2-dimensional grid with
filled-in grid cells. One example is a problem-solution matrix that shows the problems that can
arise in a particular part of a domain as the rows in the matrix and possible solutions as the
columns. Ticks, crosses or comments in the matrix cells indicate which solution is applicable to
which problem. Another important type of matrix used by knowledge engineers is a focus grid,
described later in this chapter. Examples of different forms of matrix are shown on the page
describing the PCPACK Matrix Tool.

Hypertext
A more recent form of knowledge model is the use of hypertext and web pages. Here
relationships between concepts, or other types of knowledge, are represented by hyperlinks. This
affords the use of structured text by making use of templates, i.e. generic headings. Different
templates can be created for different knowledge types. For example, the template for a task
would include such headings as description, goal, inputs, outputs and resources.

Hypertext pages can be created and edited using the Annotation Tool in PCPACK.
           Knowledge Objects
Philosophers have been thinking about knowledge for thousands of
years. Part of their endeavours has been the identification of various
types of knowledge and classification systems. These typologies have
been adopted by knowledge engineers when analysing texts and
constructing knowledge models.

Declarative and Procedural Knowledge
One well-known distinction is between declarative knowledge
(knowledge of facts) and procedural knowledge (knowledge of how
to do things), or what has been called "knowing that" and "knowing
how". Within knowledge engineering, these two types are often
referred to as object knowledge and process or task knowledge.

Tacit and Explicit Knowledge
Another well-known classification of knowledge is that of tacit
knowledge (cannot be articulated easily) and explicit knowledge
(can be articulated easily). This is particularly important for knowledge
engineers, as special techniques have to be used with an expert to try
to elicit tacit knowledge, which is the hardest and often the most
valuable knowledge to acquire.

Generic and Specific Knowledge
A further way of classifying knowledge is to what extent it is generic
(applies across many situations) or specific (applies to one or a few
situations). Developing ways in which specific knowledge can be made
more generic, and generic knowledge can be made more specific, has
been a major effort in knowledge engineering.

Knowledge Objects
The field of logic has also inspired important knowledge types, notably
concepts, attributes, values, rules and relationships.

When analysing a piece of text, such as a transcript, so that
knowledge models can be created, knowledge engineers try to identify
low-level knowledge objects. Brief definitions of some of the most
important of these are as follows.

Concepts
Concepts are the things that constitute a domain, e.g. physical
objects, ideas, people and organisations. Each concept is described by
its relationships to other concepts in the domain (e.g. in a hierarchy)
and by its attributes and values. From a grammatical perspective,
concepts are usually equivalent to nouns.

Instances
An instance is an instantiated class. For example, "my car" is an
instance of the concept "car".

Instances only have the attributes of their class (including inherited
attributes). They may override any or all of the default values. For
example, the "my car" attribute "maximum speed" may be 90mph,
overriding the default of 100mph for all cars.
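The override behaviour just described can be sketched in Python. The Frame class and slot names here are illustrative, not PCPACK's actual representation; the "my car" example follows the text:

```python
class Frame:
    """A class frame holds default slot values; an instance frame
    overrides any of them and inherits the rest from its parent."""
    def __init__(self, slots=None, parent=None):
        self.slots = dict(slots or {})
        self.parent = parent

    def get(self, attr):
        # Look locally first, then climb to the class for defaults.
        if attr in self.slots:
            return self.slots[attr]
        if self.parent is not None:
            return self.parent.get(attr)
        raise KeyError(attr)

car = Frame({"wheels": 4, "maximum speed": 100})     # class with defaults
my_car = Frame({"maximum speed": 90}, parent=car)    # instance override

print(my_car.get("maximum speed"))   # 90 -- overrides the 100mph default
print(my_car.get("wheels"))          # 4  -- inherited from the class
```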

Processes (Tasks, Activities)
Processes (aka tasks, activities) are sets of actions performed to
satisfy a goal or set of objectives. Some examples are:

       build the house

       design the engine

       plan the project
Processes are described using other knowledge objects, such as
inputs, outputs, resources, roles and decision points.

Attributes and Values
Attributes and values describe the properties of other knowledge objects.

Attributes are the generic properties, qualities or features belonging
to a class of concepts, e.g. weight, cost, age and ability.

Values are the specific qualities of a concept such as its actual weight
or age. Values are associated with a particular attribute and can be
numerical (e.g. 120Kg, 6 years old) or categorical (e.g. heavy, young).
From a grammatical perspective, values are equivalent to adjectives.

Rules
Rules are statements of the form "IF ... THEN ...". Some examples are:

       IF the temperature in the room is hot THEN open the window
        or switch the fan on

       IF the rate of compression of the engine is low THEN increase
        the oil flow

Relationships (Relations)
Relationships represent the way knowledge objects (such as concepts
and tasks) are related to one another. Important examples include is a
to show classification, part of to show composition, and those used in
various knowledge models such as a process map or state transition
network. Relationships are often represented as arrows on diagrams.
From a grammatical perspective, relationships are usually equivalent
to passive verbs.

Knowledge and System Validation
     Part of the ongoing process is testing.
     You have to assure that two things are
      right, the knowledge and the system.
     The knowledge you get from an expert
      may not be right.
     You may translate it incorrectly or they
      may give it to you incorrectly.
     You can compare with the expert (ask
      them, interview them), and you can
      compare with other experts.
     Of course you need to validate the
      system too, by checking it with experts.
Derived Knowledge
     In Rule Based Systems, derived knowledge
      is knowledge that the user neither puts
      directly into the system nor receives from
      the system, but the system uses.
     Pragmatically, a WM item that is on the
      then side of one rule and the if side of
      another is derived.
     Why is derived knowledge important? It
      shows that the system is doing more
      reasoning, and that it is a deeper system.
     What is an example of derived knowledge?
     In noughts and crosses, knowledge that
      the opponent has two in a row is derived
      from the board.
     In the second lab, if A then B, if B then C,
      B is derived knowledge.
     If you're doing a RBS for your coursework,
      you'll need derived knowledge to get a first
      or upper second, and it probably would
      help for a lower second.
     What would be an example of derived
      knowledge in your domains?
     In class work: write a rule that determines
      that X's have two in a row. Write a rule
      that uses that fact.
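One possible sketch of the in-class exercise follows. The rule structure and function names are hypothetical, not a prescribed solution: one "rule" derives the two-in-a-row fact from the raw board, and a second rule uses that derived fact rather than the board itself:

```python
# Derived knowledge in noughts and crosses: rule 1 derives the fact
# "two in a row with the third square empty" from the board; rule 2
# uses that derived fact to pick the completing square.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def derive_two_in_a_row(board, player):
    """Rule 1: derive (two-in-a-row, player, line) facts from the board."""
    facts = []
    for line in LINES:
        marks = [board[i] for i in line]
        if marks.count(player) == 2 and marks.count(" ") == 1:
            facts.append(("two-in-a-row", player, line))
    return facts

def suggest_move(board, player):
    """Rule 2: use the derived fact, not the raw board, to choose a square."""
    for _, _, line in derive_two_in_a_row(board, player):
        for i in line:
            if board[i] == " ":
                return i
    return None

board = ["X", "X", " ",
         "O", "O", " ",
         " ", " ", " "]
print(suggest_move(board, "X"))   # 2 -- completes the top row
```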

Rule Based System Architecture
   This is the runtime architecture
   The user starts the system and interacts
    with it via the user interface.
   This is the command prompt in Clips and
    the session window in Kappa PC.
   The engine is the part of the program that
    actually does stuff.
   It says, run the system, that is, look at
    working memory, see what rules fire, and
    apply them.
   You'll have the same inference engine for
    each rule base.
   The knowledge base is the rules and the
    working memory.
   The rules will remain the same for different
    runs.
   WM changes for each run and during the
    run.
Inference Engine
    The Inference Engine compares the rules
     to working memory.
    It picks a supported rule and fires it.
    There can be more than one supported
     rule, and this is resolved by a conflict
     resolution strategy.
    For example if the rules look like
        o if (X is green) and (X is a fruit) then (X is a Watermelon)
        o if (X is red) and (X is a fruit) then (X is an Apple)
    and working memory says X1 is green, X2
     is red, X1 is a fruit.
    What rule is applied?
    What happens?
    A rule can be supported more than once.
    A fact can be used to support more than
     one rule.
    Undo the rule and add the working
     memory item X2 is a fruit.
    What rule is applied?
    Now what happens?
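The match phase described above can be sketched in Python. The data structures are illustrative, and conflict resolution is reduced to simply listing every supported rule instantiation; real shells use richer strategies:

```python
# Each rule binds a variable X against working memory; a rule is
# "supported" when all of its conditions hold for some binding.
rules = [
    {"if": ["green", "fruit"], "then": "Watermelon"},
    {"if": ["red", "fruit"], "then": "Apple"},
]
wm = {("X1", "green"), ("X2", "red"), ("X1", "fruit")}

def supported(rules, wm):
    """Return every (object, conclusion) pair whose conditions are all
    present in working memory -- the conflict set."""
    objs = {obj for obj, _ in wm}
    matches = []
    for rule in rules:
        for x in sorted(objs):
            if all((x, cond) in wm for cond in rule["if"]):
                matches.append((x, rule["then"]))
    return matches

print(supported(rules, wm))   # [('X1', 'Watermelon')] -- one supported rule

wm.add(("X2", "fruit"))       # add the working memory item "X2 is a fruit"
print(supported(rules, wm))   # now both rules are supported, so a conflict
                              # resolution strategy must pick one to fire
```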

Knowledge Base
    The Knowledge Base consists of rules and
     working memory (WM)
    Rules are if then statements
    On the if side you have conditional
     expressions, (e.g. X is green)
    You can have variables in here, in this case
     X is a variable.
    On the then side you usually have actions.
    That is, you set or modify working memory.
    In Clips you set values by (assert (fact))
    You compare by stating the fact in the if
     part or by using (test (func params)). E.g
     (test (> ?a 3))
    The if part is conjoined by and by default,
     and you can use the and, or, and not
     operators.
    Variables cross expressions, e.g.:
    If (favorite-colour ?Val) then (assert (pick
     Colour ?Val))

Rules vs. Structured Programming
    Structured Programming is like C, C++ and
     Java (pretty much any language you've
     heard of).
    Structured Programming has loops and
     branches.
    Are Rule Based Systems the same as a
     bunch of if then else statements?
    No: the difference is execution vs. branching.
    In C, execution follows a fixed line of
     statements.
    With a Rule Based System, each rule can
     apply at each cycle.
    A Rule Based System is essentially a big
     mutually exclusive if then else statement
     inside a loop
    How do you decide which one to apply?

    Rules are also a form of KR.
    (For that matter programs are dynamic
     forms of KR.)
    Rules enable you to derive more facts.
    Rules are more dynamic than WM.
    They are of course of the form if X then Y
    This is similar to a formal logic.

Logic
     There are lots of forms of logic.
    Logics are systems that are developed to
     try to quantify what knowledge is.
    That's why Aristotle came up with a system
     2500 years ago.
     The logics we are going to talk about are
      relatively simple and involve the concept of
      truth.
     For example London is a City is True, and
      Chris is British is False.
    (For the pedantic, this really is a
     simplification of the world.)

Semantic Nets
     Semantic Nets were invented by Quillian in
      the 1960s.
     Quillian was a Psychologist (at UMich) and
      was trying to define the structure of
      human knowledge.
     Semantic Nets are about relations between
      concepts.
     Semantic Nets use a graph structure so
      that concepts are nodes in the graph.
     The concepts are connected by arcs which
      tell the relationship between the concepts.

Semantic Nets
The major idea is that:

       The meaning of a concept comes from its relationships to other
        concepts, and
       the information is stored by interconnecting nodes with labelled
        arcs.

Why use this data structure?

       It enables attribute values to be retrieved quickly
            o assertions are indexed by the entities
            o binary predicates are indexed by first argument. E.g.
               team(Mike-Hall , Cardiff).
       Properties of relations are easy to describe.
       It allows ease of consideration as it embraces aspects of
        object-oriented programming.

Semantic nets are a "slot and filler" structure, so called because:

       A slot is an attribute value pair in its simplest form.
       A filler is a value that a slot can take -- could be a numeric,
        string (or any data type) value or a pointer to another slot.
       A weak slot and filler structure does not consider the content of
        the representation.

Representation in a Semantic Net

The physical attributes of a person can be represented as in Fig. 9.

Fig. 9 A Semantic Network

These values can also be represented in logic as: isa(person, mammal), instance(Mike-Hall,
person), team(Mike-Hall, Cardiff)

We have already seen how conventional predicates such as lecturer(dave) can be written as
instance(dave, lecturer). Recall that isa and instance represent inheritance and are popular in
many knowledge representation schemes. But we have a problem: how can we have more
than 2-place predicates in semantic nets? E.g. score(Cardiff, Llanelli, 23-6). Solution:
       Create new nodes to represent new objects either contained or
        alluded to in the knowledge, game and fixture in the current
        example.
       Relate information to nodes and fill up slots (Fig. 10).

Fig. 10 A Semantic Network for n-Place Predicate
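The reification step can be sketched as a simple data structure; the node and arc names below are illustrative:

```python
# The 3-place predicate score(Cardiff, Llanelli, 23-6) reified as a
# single "game1" node whose outgoing arcs carry the arguments.
net = {
    "game1": {
        "instance": "fixture",
        "home-team": "Cardiff",
        "away-team": "Llanelli",
        "score": "23-6",
    }
}

# Each argument of the old n-place predicate is now a binary arc from
# the new node, so any question about the match becomes a lookup:
print(net["game1"]["score"])       # 23-6
print(net["game1"]["home-team"])   # Cardiff
```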

As a more complex example consider the sentence: John gave Mary the book. Here we have
several aspects of an event.

Fig. 11 A Semantic Network for a Sentence

Inference in a Semantic Net
Basic inference mechanism: follow links between nodes.

Two methods to do this:

Intersection search
     -- the notion that spreading activation out of two nodes and
     finding their intersection finds relationships among objects. This
     is achieved by assigning a special tag to each visited node.
       Many advantages including entity-based organisation and fast parallel implementation.
       However very structured questions need highly structured networks.

Inheritance
         -- the isa and instance representation provide a mechanism to
         implement this.
Inheritance also provides a means of dealing with default reasoning. E.g. we could represent:

        Emus are birds.
        Typically birds fly and have wings.
        Emus run.

in the following Semantic net:

Fig. 12 A Semantic Network for Default Reasoning
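The emu example can be sketched in Python. Node and property names are illustrative, and for simplicity isa and instance links are treated alike; a local value (emus run) shadows the inherited default (birds fly):

```python
# Default inheritance over isa links: a property is looked up locally
# first, then up the isa chain.
net = {
    "bird": {"isa": None, "locomotion": "fly", "has": "wings"},
    "emu":  {"isa": "bird", "locomotion": "run"},   # override: emus run
    "edna": {"isa": "emu"},                         # a particular emu
}

def lookup(node, prop):
    """Climb the isa chain until a value for prop is found."""
    while node is not None:
        if prop in net[node]:
            return net[node][prop]
        node = net[node]["isa"]
    return None

print(lookup("edna", "locomotion"))   # run   -- the emu override wins
print(lookup("edna", "has"))          # wings -- default inherited from bird
```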

In making certain inferences we will also need to distinguish between the link that defines a
new entity and holds its value and the other kind of link that relates two existing entities.
Consider the example shown where the height of two people is depicted and we also wish to
compare them.

We need extra nodes for the concept as well as its value.

Fig. 13 Two heights

Special procedures are needed to process these nodes, but without this distinction the analysis
would be very limited.
Fig. 14 Comparison of two heights

Extending Semantic Nets
Here we will consider some extensions to Semantic nets that overcome a few problems (see
Exercises) or extend their expression of knowledge.

Partitioned Networks Partitioned Semantic Networks allow for:

       propositions to be made without commitment to truth.
       expressions to be quantified.

Basic idea: Break network into spaces which consist of groups of
nodes and arcs and regard each space as a node.
Consider the following: Andrew believes that the earth is flat. We can encode the proposition
the earth is flat in a space and within it have nodes and arcs that represent the fact (Fig. 15).
We can then have nodes and arcs to link this space to the rest of the network to represent
Andrew's belief.

Fig. 15 Partitioned network

Now consider the quantified expression: Every parent loves their child To represent this we:

       Create a general statement, GS, special class.
       Make node g an instance of GS.
       Every element will have at least 2 attributes:
            o a form that states which relation is being asserted.
            o one or more forall or exists connections -- these
              represent universally or existentially quantified variables
              in such statements, e.g. x and y in: for all x, parent(x)
              implies there exists y such that child(y) and loves(x, y).
Here we have to construct two spaces, one for each of x and y. NOTE: We can express variables as
existentially quantified variables and express the event of love having an agent p and receiver b
for every parent p, which could simplify the network (See Exercises).

Also, if we change the sentence to Every parent loves a child then the node of the object being
acted on (the child) lies outside the form of the general statement. Thus it is not viewed as an
existentially quantified variable whose value may depend on the agent. (See Exercises and the
Rich and Knight book for examples of this.) So we could construct a partitioned network as in
Fig. 16.

Fig. 16   Partitioned network

Production Rules
Production rules are one of the most popular and widely used knowledge representation
languages. Early expert systems used production rules as their main knowledge representation
language. For example, MYCIN, which is also considered one of the first research works in
medical informatics, has production rules as its knowledge representation language.

A production rule system consists of three components: working memory, a rule base and an
interpreter. The working memory contains the information that the system has gained about
the problem thus far. The rule base contains information that applies to all the problems that
the system may be asked to solve. The interpreter solves the control problem, i.e. it decides
which rule to execute on each selection-execute cycle.

Production rules as a knowledge representation language have the following advantages:

         Naturalness of expression
         Modularity
         Restricted syntax
Disadvantages of production rules as a knowledge representation
language include:

        Inefficient
        Less expressive

Frames
Frames can also be regarded as an extension to semantic nets. Indeed it is not clear where
the distinction between a semantic net and a frame lies. Semantic nets were initially used to
represent labelled connections between objects. As tasks became more complex the
representation needed to be more structured, and the more structured the system, the more
beneficial it becomes to use frames. A frame is a collection of attributes or slots and associated
values that describe some real-world entity. Frames on their own are not particularly helpful,
but frame systems are a powerful way of encoding information to support reasoning. Set theory
provides a good basis for understanding frame systems. Each frame represents:

        a class (set), or
        an instance (an element of a class).

Consider the example first discussed in Semantic Nets (Section 6.2.1):


Person:
        isa:            Mammal

Adult-Male:
        isa:            Person

Rugby-Player:
        isa:            Adult-Male

Back:
        isa:            Rugby-Player

Robert-Howley:
        instance:       Back
        Height:         6-0
        Position:       Centre
        Team:           Cardiff-RFC

Cardiff-RFC:
        instance:       Rugby-Team
        Team-Colours:   Black/Blue

Rugby-Team:
        isa:            Team
        Team-size:      15

Figure: A simple frame system

Here the frames Person, Adult-Male, Rugby-Player and Rugby-Team are all classes and the
frames Robert-Howley and Cardiff-RFC are instances.


       The isa relation is in fact the subset relation.
       The instance relation is in fact element of.
       The isa attribute possesses a transitivity property. This implies:
        Robert-Howley is a Back and a Back is a Rugby-Player who in
        turn is an Adult-Male and also a Person.
       Both isa and instance have inverses which are called subclasses
        or all instances.
       There are attributes that are associated with the class or set
        such as cardinality and on the other hand there are attributes
        that are possessed by each member of the class or set.


It is important that this distinction is clearly understood.

Cardiff-RFC can be thought of as a set of players or as an instance of a Rugby-Team.
If Cardiff-RFC were a class then

      its instances would be players
      it could not be a subclass of Rugby-Team otherwise its elements
       would be members of Rugby-Team which we do not want.

Instead we make it a subclass of Rugby-Player and this allows the players to inherit the
correct properties, enabling us also to let Cardiff-RFC inherit information about teams.

This means that Cardiff-RFC is an instance of Rugby-Team.

BUT There is a problem here:

      A class is a set and its elements have properties.
      We wish to use inheritance to bestow values on its members.
      But there are properties that the set or class itself has such as
       the manager of a team.

This is why we need to view Cardiff-RFC as a subset of one class players and an instance of
teams. We seem to have a CATCH 22. Solution: MetaClasses

A metaclass is a special class whose elements are themselves classes.

Now consider our rugby teams as:
Figure: A Metaclass frame system

The basic metaclass is Class, and this allows us to

       define classes which are instances of other classes, and (thus)
       inherit properties from this class.

Inheritance of default values occurs when one element or class is an instance of a class.

Slots as Objects

How can we represent the following properties in frames?

       Attributes such as weight and age being attached and making sense.
       Constraints on values, such as age being less than a hundred.
       Default values
       Rules for inheritance of values such as children inheriting
        parent's names
       Rules for computing values
       Many values for a slot.

A slot is a relation that maps from its domain of classes to its range of values.

A relation is a set of ordered pairs so one relation is a subset of another.

Since a slot is a set, the set of all slots can be represented by a metaclass called Slot, say.

Consider the following:


SLOT:
        isa:                    Class
        instance:               Class

manager:
        instance:               SLOT
        domain:                 Rugby-Team
        range:                  Person
        range-constraint:       (experience x.manager)
        single-valued:          TRUE

Colour:
        instance:               SLOT
        domain:                 Physical-Object
        range:                  Colour-Set
        single-valued:          FALSE

Team-Colours:
        instance:               SLOT
        isa:                    Colour
        domain:                 team-player
        range:                  Colour-Set
        range-constraint:       not Pink
        single-valued:          FALSE

Position:
        instance:               SLOT
        domain:                 Rugby-Player
        range:                  { Back, Forward, Reserve }
        to-compute:             x.position
        single-valued:          TRUE

NOTE the following:

      Instances of SLOT are slots.
      Associated with SLOT are attributes that each instance will inherit.
      Each slot has a domain and range.
      Range is split into two parts one the class of the elements and
       the other is a constraint which is a logical expression if absent it
       is taken to be true.
      If there is a value for default then it must be passed on unless
       an instance has its own value.
      The to-compute attribute involves a procedure to compute its
       value. E.g. in Position where we use the dot notation to assign
       values to the slot of a frame.
      Transfers-through lists other slots from which values can be
       derived through inheritance.

Interpreting frames
A frame system interpreter must be capable of the following in order to exploit the frame slot
representation:
       Consistency checking -- when a slot value is added to the frame,
        checking that the frame is legal using the domain attribute and
        that the value is legal using range and range constraints.
       Propagation of definition values along isa and instance links.
       Inheritance of default values along isa and instance links.
       Computation of value of slot as needed.
       Checking that only the correct number of values is computed.

See Exercises for further instances of drawing inferences etc. from frames.

And-Or Graphs
Useful for certain problems where

       The solution involves decomposing the problem into smaller
        problems.
       We then solve these smaller problems.

Here the alternatives often involve branches where some or all must be satisfied before we
can progress.

For example if I want to learn to play a Frank Zappa guitar solo I could (Fig. 2.2.1)

       Transcribe it from the CD. OR
       Buy the ``Frank Zappa Guitar Book'' AND Read it from there.
Note the use of arcs to indicate that one or more nodes must all be satisfied before the parent
node is achieved. To find solutions using an And-Or GRAPH the best first algorithm is used as
a basis with a modification to handle the set of nodes linked by the AND factor.
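Evaluating such a graph can be sketched recursively. This is a simplified solvability check, not the full AO* algorithm; node names follow the Zappa example and the set of achievable leaves is an assumption:

```python
# An And-Or graph for the example: learn the solo either by transcribing
# it from the CD (OR branch) or by buying the book AND reading it.
graph = {
    "learn-solo": ("OR",  ["transcribe", "use-book"]),
    "use-book":   ("AND", ["buy-book", "read-solo"]),
}
achievable = {"buy-book", "read-solo"}   # assume only these leaves succeed

def solved(node):
    """An OR node is solved if any child is; an AND node only if all are."""
    if node not in graph:                # a leaf: a primitive action
        return node in achievable
    kind, children = graph[node]
    results = [solved(c) for c in children]
    return any(results) if kind == "OR" else all(results)

print(solved("learn-solo"))   # True -- via the AND branch through the book
```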

Best-first search alone is inadequate: it cannot deal with the AND branches well.

AO* Algorithm

   1. Initialise the graph to start node
   2. Traverse the graph following the current path accumulating
      nodes that have not yet been expanded or solved
   3. Pick any of these nodes and expand it; if it has no successors
      assign FUTILITY as its value, otherwise calculate f' for each of the
      successors.
   4. If f' is 0 then mark the node as SOLVED
   5. Change the value of f' for the newly created node to reflect its
      successors by back propagation.
   6. Wherever possible use the most promising routes and if a node
      is marked as SOLVED then mark the parent node as SOLVED.
   7. If starting node is SOLVED or value greater than FUTILITY, stop,
      else repeat from 2.

[2] What is fuzzy logic?

Fuzzy logic is a superset of conventional (Boolean) logic that has been
extended to handle the concept of partial truth -- truth values between
"completely true" and "completely false". It was introduced by Dr. Lotfi
Zadeh of UC/Berkeley in the 1960's as a means to model the uncertainty
of natural language. (Note: Lotfi, not Lofti, is the correct spelling
of his name.)

[3] Where is fuzzy logic used?
Date: 15-APR-93

Fuzzy logic is used directly in very few applications. The Sony PalmTop
apparently uses a fuzzy logic decision tree algorithm to perform
handwritten (well, computer lightpen) Kanji character recognition.

Most applications of fuzzy logic use it as the underlying logic system
for fuzzy expert systems

What do ya mean fuzzy ??!!
Before illustrating the mechanisms which make fuzzy logic machines work, it is important to
realize what fuzzy logic actually is. Fuzzy logic is a superset of conventional (Boolean) logic
that has been extended to handle the concept of partial truth -- truth values between
"completely true" and "completely false". As its name suggests, it is the logic underlying
modes of reasoning which are approximate rather than exact. The importance of fuzzy logic
derives from the fact that most modes of human reasoning, and especially common sense
reasoning, are approximate in nature.
The essential characteristics of fuzzy logic as founded by Lotfi Zadeh are as follows.

      In fuzzy logic, exact reasoning is viewed as a limiting case of
       approximate reasoning.
      In fuzzy logic everything is a matter of degree.
      Any logical system can be fuzzified
      In fuzzy logic, knowledge is interpreted as a collection of elastic
       or, equivalently, fuzzy constraints on a collection of variables.
      Inference is viewed as a process of propagation of elastic
       constraints.

The third statement hence defines Boolean logic as a subset of fuzzy logic.
Fuzzy Sets
Fuzzy Set Theory was formalised by Professor Lotfi Zadeh at the University of California in
1965. What Zadeh proposed is very much a paradigm shift that first gained acceptance in the
Far East and whose successful application has ensured its adoption around the world.

A paradigm is a set of rules and regulations which defines boundaries and tells us what to do
to be successful in solving problems within these boundaries. For example the use of
transistors instead of vacuum tubes is a paradigm shift - likewise the development of Fuzzy
Set Theory from conventional bivalent set theory is a paradigm shift.

Bivalent Set Theory can be somewhat limiting if we wish to describe a 'humanistic' problem
mathematically. For example, Fig 1 below illustrates bivalent sets to characterise the
temperature of a room.

The most obvious limiting feature of bivalent sets, as can be seen
clearly from the diagram, is that they are mutually exclusive: it is not
possible to have membership of more than one set (opinion would
vary widely as to whether 50 degrees Fahrenheit is 'cold' or 'cool',
hence the expert knowledge we need to define our system is
mathematically at odds with the humanistic world). Clearly, it is not
accurate to define a transition from a quantity such as 'warm' to 'hot'
by the application of one degree Fahrenheit of heat. In the real world a
smooth (unnoticeable) drift from warm to hot would occur.
This natural phenomenon can be described more accurately by Fuzzy Set Theory. Fig.2 below
shows how fuzzy sets quantifying the same information can describe this natural drift.

The whole concept can be illustrated with this example. Let's talk
about people and "youthness". In this case the set S (the universe of
discourse) is the set of people. A fuzzy subset YOUNG is also defined,
which answers the question "to what degree is person x young?" To
each person in the universe of discourse, we have to assign a degree
of membership in the fuzzy subset YOUNG. The easiest way to do this
is with a membership function based on the person's age.
young(x) = { 1,                 if age(x) <= 20,
             (30 - age(x))/10,  if 20 < age(x) <= 30,
             0,                 if age(x) > 30 }

A graph of this membership function looks like:
Given this definition, here are some example values:

Person      Age   Degree of youth
Johan        10   1.00
Edwin        21   0.90
Parthiban    25   0.50
Arosha       26   0.40
Chin Wei     28   0.20
Rajkumar     83   0.00

So given this definition, we'd say that the degree of truth of the
statement "Parthiban is YOUNG" is 0.50.
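The piecewise definition above can be checked directly in code. A minimal Python sketch (the function name and formatting are illustrative) that reproduces the table:

```python
def young(age: float) -> float:
    """Membership function for the fuzzy subset YOUNG, as defined above."""
    if age <= 20:
        return 1.0
    if age <= 30:
        return (30 - age) / 10
    return 0.0

# Reproduce the table of example values.
for name, age in [("Johan", 10), ("Edwin", 21), ("Parthiban", 25),
                  ("Arosha", 26), ("Chin Wei", 28), ("Rajkumar", 83)]:
    print(f"{name:10s} {age:3d}  {young(age):.2f}")
```

Running this prints exactly the degrees of youth listed above, e.g. 0.50 for Parthiban.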
Note: Membership functions almost never have as simple a shape as age(x). They will at least
tend to be triangles pointing up, and they can be much more complex than that. Furthermore,
membership functions have so far been discussed as if they are always based on a single
criterion, but this isn't always the case, although it is the most common case. One could, for
example, want the membership function for YOUNG to depend on both a person's age and their
height (Arosha's short for his age). This is perfectly legitimate, and occasionally used in
practice. It's referred to as a two-dimensional membership function. It's also possible to have
even more criteria, or to have the membership function depend on elements from two
completely different universes of discourse.

Fuzzy Set Operations.

        The membership function of the Union of two fuzzy sets A and B
        with membership functions µA and µB respectively is defined as
        the maximum of the two individual membership functions, i.e.
        µA∪B(x) = max[µA(x), µB(x)]. This is called the maximum criterion.
   The Union operation in Fuzzy set theory is the equivalent of the OR operation in
   Boolean algebra.

   The membership function of the Intersection of two fuzzy sets A
   and B with membership functions µA and µB respectively is
   defined as the minimum of the two individual membership
   functions, i.e. µA∩B(x) = min[µA(x), µB(x)]. This is called the
   minimum criterion.

   The Intersection operation in Fuzzy set theory is the equivalent of the AND operation
   in Boolean algebra.
        The membership function of the Complement of a fuzzy set A
        with membership function µA is defined as the negation of the
        specified membership function, i.e. µĀ(x) = 1 - µA(x). This is
        called the negation criterion.

The Complement operation in Fuzzy set theory is the equivalent of the NOT operation in
Boolean algebra.

The following rules, which are common in classical set theory, also apply to Fuzzy set theory.

  De Morgan's laws

      ¬(A ∪ B) = ¬A ∩ ¬B
      ¬(A ∩ B) = ¬A ∪ ¬B

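These operations translate directly into code. A minimal sketch, assuming fuzzy sets are represented as dictionaries mapping elements of the universe to membership degrees (one possible representation among many); the final check illustrates a De Morgan identity:

```python
def f_union(a, b):
    """Maximum criterion: degree of membership in A OR B."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}

def f_intersect(a, b):
    """Minimum criterion: degree of membership in A AND B."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}

def f_complement(a):
    """Negation criterion: degree of membership in NOT A."""
    return {x: 1.0 - mu for x, mu in a.items()}

# Two illustrative fuzzy sets over a temperature universe (degrees F).
cold = {40: 1.0, 50: 0.6, 60: 0.2}
warm = {40: 0.0, 50: 0.4, 60: 0.8}

# De Morgan: the complement of a union is the intersection of complements.
lhs = f_complement(f_union(cold, warm))
rhs = f_intersect(f_complement(cold), f_complement(warm))
print(lhs == rhs)   # True
```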
 Universe of Discourse
     The Universe of Discourse is the range of all possible values for
     an input to a fuzzy system.
 Fuzzy Set
     A Fuzzy Set is any set that allows its members to have different
     grades of membership (membership function) in the interval [0, 1].
 Support
     The Support of a fuzzy set F is the crisp set of all points in the
     Universe of Discourse U such that the membership function of F
     is non-zero.
 Crossover point
     The Crossover point of a fuzzy set is the element in U at which
     its membership function is 0.5.
 Fuzzy Singleton
     A Fuzzy singleton is a fuzzy set whose support is a single point in
     U with a membership function of one.
Fuzzy Rules
Human beings make decisions based on rules. Although we may not be aware of it, all the
decisions we make are based on computer-like if-then statements. If the weather is fine,
then we may decide to go out. If the forecast says the weather will be bad today, but fine
tomorrow, then we make a decision not to go today, and postpone it till tomorrow. Rules
associate ideas and relate one event to another.
Fuzzy machines, which always tend to mimic the behaviour of man, work the same way.
However, the decision and the means of choosing that decision are replaced by fuzzy sets
and the rules are replaced by fuzzy rules. Fuzzy rules also operate using a series of if-then
statements, for instance "if X then A, if Y then B", where A and B are fuzzy sets of X and Y.
Fuzzy rules define fuzzy patches, which is the key idea in fuzzy logic.
A machine is made smarter using a concept designed by Bart Kosko called the Fuzzy
Approximation Theorem (FAT). The FAT theorem states that a finite number of patches
can cover a curve, as seen in the figure below. If the patches are large, then the rules are
sloppy. If the patches are small, then the rules are fine.

                                       Fuzzy Patches
In a fuzzy system this simply means that all our rules can be seen as
patches, and the input and output of the machine can be associated
together using these patches. Graphically, if the rule patches shrink,
our fuzzy subset triangles get narrower. Simple enough? Yes, because
even novices can build control systems that beat the best mathematical
models of control theory. Naturally, it is a math-free system.
Fuzzy Control
Fuzzy control, which directly uses fuzzy rules, is the most important application of fuzzy
theory. Using a procedure originated by Ebrahim Mamdani in the late 1970s, three steps are
taken to create a fuzzy controlled machine:

1) Fuzzification (using membership functions to graphically describe a situation)
2) Rule evaluation (application of fuzzy rules)
3) Defuzzification (obtaining the crisp or actual results)
As a simple example of how fuzzy controllers are constructed, consider the following classic
situation: the inverted pendulum. Here, the problem is to balance a pole on a mobile platform
that can move in only two directions, to the left or to the right. The angle between the
platform and the pendulum and the angular velocity of this angle are chosen as the inputs of
the system. The speed of the platform is chosen as the corresponding output.

Step 1
First of all, the different levels of output (high speed, low speed etc.) of the platform are
defined by specifying the membership functions for the fuzzy sets. The graph of the function
is shown below.

Similarly, the different angles between the platform and the pendulum, and the angular
velocities of specific angles, are also defined.

Note: For simplicity, it is assumed that all membership functions are spread equally. Hence,
this explains why no actual scale is included in the graphs.
Step 2
The next step is to define the fuzzy rules. The fuzzy rules are merely a series of if-then
statements as mentioned above. These statements are usually derived by an expert to achieve
optimum results. Some examples of these rules are:

i) If angle is zero and angular velocity is zero then speed is also zero.
ii) If angle is zero and angular velocity is low then the speed shall be low.

The full set of rules is summarised in the table below. The dashes are for conditions which
have no rules associated with them. This is done to simplify the situation.


                                         angle
 angular
 velocity    | neg. high | neg. low |   zero    | pos. low | pos. high
 neg. high   |     -     |    -     | neg. high |    -     |    -
 neg. low    |     -     |    -     | neg. low  |   zero   |    -
 zero        | neg. high | neg. low |   zero    | pos. low | pos. high
 pos. low    |     -     |   zero   | pos. low  |    -     |    -
 pos. high   |     -     |    -     | pos. high |    -     |    -

(each cell gives the speed output)

An application of these rules is shown using specific values for angle and angular velocity. The
values used for this example are 0.75 and 0.25 for the zero and positive-low angles, and 0.4
and 0.6 for the zero and negative-low angular velocities. These points are shown on the graphs
below.
Consider the rule "if angle is zero and angular velocity is zero, the speed is zero". The actual
value belongs to the fuzzy set zero to a degree of 0.75 for "angle" and 0.4 for "angular
velocity". Since this is an AND operation, the minimum criterion is used, and the fuzzy set
zero of the variable "speed" is cut at 0.4 and the patches are shaded up to that area. This is
illustrated in the figure below.

Similarly, the minimum criterion is used for the other three rules. The following figures show
the result patches yielded by the rules "if angle is zero and angular velocity is negative low,
the speed is negative low", "if angle is positive low and angular velocity is zero, then speed is
positive low" and "if angle is positive low and angular velocity is negative low, the speed is
zero".
The four results overlap and are combined into the following figure.

Step 3: The result of the fuzzy controller so far is a fuzzy set (of speed). In order to
choose an appropriate representative value as the final output (crisp value), defuzzification
must be done. There are numerous defuzzification methods, but the most common one used is
the centre of gravity of the set, as shown below.
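The three steps can be sketched end-to-end in Python. The membership degrees (0.75, 0.25, 0.4, 0.6) and the four rules come from the worked example above; the triangular output sets and their scale are assumptions, since the text deliberately omits scales:

```python
def tri(x, l, c, r):
    """Triangular membership function: feet at l and r, peak 1.0 at c."""
    if x <= l or x >= r:
        return 0.0
    return (x - l) / (c - l) if x <= c else (r - x) / (r - c)

# Output fuzzy sets for speed; the scale is an assumption for illustration.
speed_sets = {
    "neg_low": lambda v: tri(v, -2.0, -1.0, 0.0),
    "zero":    lambda v: tri(v, -1.0,  0.0, 1.0),
    "pos_low": lambda v: tri(v,  0.0,  1.0, 2.0),
}

# Fuzzified inputs, taken directly from the worked example in the text.
angle    = {"zero": 0.75, "pos_low": 0.25}
velocity = {"zero": 0.4,  "neg_low": 0.6}

# The four rules that fire: (angle, angular velocity) -> speed.
rules = [
    ("zero",    "zero",    "zero"),
    ("zero",    "neg_low", "neg_low"),
    ("pos_low", "zero",    "pos_low"),
    ("pos_low", "neg_low", "zero"),
]

def output_membership(v):
    """Rule evaluation: AND uses the minimum criterion; each rule clips
    its consequent set and the rules are aggregated with max."""
    mu = 0.0
    for a, w, s in rules:
        strength = min(angle[a], velocity[w])
        mu = max(mu, min(strength, speed_sets[s](v)))
    return mu

# Defuzzification: centre of gravity over a sampled universe of discourse.
xs = [i / 100 for i in range(-200, 201)]
num = sum(x * output_membership(x) for x in xs)
den = sum(output_membership(x) for x in xs)
crisp_speed = num / den
print(round(crisp_speed, 3))   # negative: net correction toward negative speed
```

Because the negative-low rule fires most strongly (0.6), the centroid lands on the negative side of the speed universe.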
Fuzzy logic has rapidly become one of the most successful of today's technologies for
developing sophisticated control systems. The reason for this is very simple. Fuzzy logic
addresses such applications perfectly, as it resembles human decision making with an ability
to generate precise solutions from certain or approximate information. It fills an important
gap in engineering design methods left vacant by purely mathematical approaches (e.g. linear
control design) and purely logic-based approaches (e.g. expert systems) in system design.
While other approaches require accurate equations to model real-world
behaviors, fuzzy design can accommodate the ambiguities of real-
world human language and logic. It provides both an intuitive method
for describing systems in human terms and automates the conversion
of those system specifications into effective models.
What does it offer?
The first applications of fuzzy theory were primarily industrial, such as process control for
cement kilns. However, as the technology was further embraced, fuzzy logic was used in more
useful applications. In 1987, the first fuzzy logic-controlled subway was opened in Sendai in
northern Japan. Here, fuzzy-logic controllers make subway journeys more comfortable with
smooth braking and acceleration. Best of all, all the driver has to do is push the start button!
Fuzzy logic was also put to work in elevators to reduce waiting time. Since then, the
applications of fuzzy logic technology have virtually exploded, affecting things we use every
day.
Take for example the fuzzy washing machine. Put a load of clothes in it and press start, and
the machine begins to churn, automatically choosing the best cycle. Or the fuzzy microwave:
place chili, potatoes or other food in it and push a single button, and it cooks for the right
time at the proper temperature. The fuzzy car manoeuvres itself by following simple verbal
instructions from its driver. It can even stop itself when there is an obstacle immediately
ahead, using sensors. But practically the most exciting thing about it is the simplicity involved
in operating it.

                                       CHAPTER 3

                             PROPOSITIONAL LOGIC
This chapter is intended for the reader who is more deeply interested in automated logic. Most
individuals may skip this chapter and proceed directly to Chapter 4.
3.1 Introduction
A logic is a mathematical tool for constructing and manipulating symbolic expressions. A logic
is like a computer language. In this chapter propositional and predicate logic are mainly
discussed.

A logic consists of

    1. A formal system of representation.
    2. Syntax of language that describes how to make sentences.
    3. Semantics of language that describe the meaning and
       relationship between sentences.
    4. Proof Theory: rules for deducing the entailment of a sentence.

3.2 Propositional Logic
Propositional logic is a representational language that makes the assumption that the world
can be represented solely in terms of propositions that are true or false. Propositional logic
considers each sentence as a proposition. The syntax and semantics of such a logic are given
below.
3.2.1 Syntax for Propositional Logic
The syntax for propositional logic is quite simple. Symbols for propositional logic are the
propositional constants True and False, propositional symbols such as P, Q, R, S and logical
connectives such as ∧, ∨, ⇒, ¬ and ⇔. The following rules are used while constructing a
sentence:

    1. The logical constants True and False are themselves a sentence.
    2. Propositional symbols such as P, Q, R are themselves a sentence.
    3. A sentence can be formed by using the following symbols.

         ∧ (conjunction)  This is called conjunction and is used for
                          constructing a sentence like P ∧ Q.
         ∨ (disjunction)  This is called disjunction and is used for
                          constructing a sentence like P ∨ Q.
         ¬ (negation)     This is called not and is used in constructing a
                          sentence like ¬P.
         ⇒ (implication)  This is used in constructing sentences like P ⇒ Q
                          and is equivalent to ¬P ∨ Q.
         ⇔ (equivalence)  This is used in constructing a sentence like
                          P ⇔ Q, which means P is equivalent to Q.
Example 3.1

An example syntax for propositional logic is given by the following two sentences:

                "The road is closed. If the road is closed, then the traffic is blocked"

These sentences may be represented in propositional logic.

                        "The road is closed" is represented by a proposition, P.
                      "The traffic is blocked" is represented by a proposition, Q.

Then, the second sentence "If the road is closed, then the traffic is blocked" is represented by
P ⇒ Q.

3.2.2 Semantics in Propositional Logic
The semantics of a sentence give the meaning of the sentence. The terms used for the
semantics of a language are given below.

         A sentence is valid if it is true for all interpretations, e.g.
          (P ∧ Q) ⇒ Q is a valid sentence, as can be seen from the truth table

                               P    Q    (P ∧ Q) ⇒ Q, i.e. ¬(P ∧ Q) ∨ Q

                               F    F                 T

                               F    T                 T

                               T    F                 T

                               T    T                 T
      An interpretation of a formula or sentence under which the
      formula is true is called a model of that formula.
      A formula is said to be unsatisfiable if it is false for every
      interpretation, i.e., it has no model.
      A formula is said to be satisfiable if it is true for some
      interpretation, i.e., there exists a model.
      If a proposition f is true under every interpretation for which the
      knowledge base (KB) is true, then we say KB entails f
         (KB ⊨ f).

3.3 Rules of Inference
There are mainly two inference procedures:

    1. The model theory method.
    2. The proof theory method.

3.3.1 Model Theory Method
The Model Theory method uses the entailment procedure. To find out whether a new sentence

or proposition f is true given the sentences in the knowledge base, we check whether KB ⊨ f.

Example 3.2

For example, if P ⇒ Q and P are the facts in the knowledge base, then to check whether KB

⊨ f, the truth table for every interpretation of the propositions can be written as

                                   P    Q    P ⇒ Q
                                   F    F      T
                                   F    T      T
                                   T    F      F
                                   T    T      T
Then, check whether Q is true whenever every sentence in the knowledge base is true. If so,
then KB entails Q. In the example above, Q is entailed by KB because it is true when P
and P ⇒ Q are both true (as shown in the last row of the table above).

The model theory method is exponential in the number of propositions because we need to
write 2^n interpretations for n propositions. The proof theory method is used because it is
much faster and more practical than the model theory method.
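The model theory method can be sketched as a truth-table enumeration. Sentences are represented here as Python functions of an interpretation (an assumed encoding, chosen for brevity):

```python
from itertools import product

def entails(kb, query, symbols):
    """Model-theory method: check that the query holds in every
    interpretation (truth-table row) that makes all KB sentences true."""
    for values in product([False, True], repeat=len(symbols)):
        interp = dict(zip(symbols, values))
        if all(s(interp) for s in kb) and not query(interp):
            return False
    return True

# KB = {P, P => Q}; does KB entail Q?  (Example 3.2)
kb = [
    lambda i: i["P"],                     # P
    lambda i: (not i["P"]) or i["Q"],     # P => Q, i.e. ~P v Q
]
print(entails(kb, lambda i: i["Q"], ["P", "Q"]))   # True
```

Dropping the fact P from the KB makes the entailment fail, since the row P = F, Q = F then satisfies the KB but not the query.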

3.3.2 Proof Theory Method
The inference rules of the proof theory method are as follows:

         Rules                Sentences in KB               Sentence that is inferred

   Modus Ponens               P, P ⇒ Q                                     Q

   And Elimination            A1 ∧ A2 ∧ ... ∧ An                  A1, A2, ..., An

   And Introduction           A1, A2, ..., An                  A1 ∧ A2 ∧ ... ∧ An

   Or Introduction            Ai                               A1 ∨ A2 ∨ ... ∨ An

   Double Negation            ¬¬A                                          A

   Unit Resolution            A1 ∨ A2, ¬A2                                 A1

   Resolution                 A1 ∨ A2, ¬A2 ∨ A3                        A1 ∨ A3

Example 3.3
Let's take an example of the proof theory method. If P and P ⇒ Q are the facts in the knowledge
base (same as Example 3.2), then using Modus Ponens Q is also true.
Example 3.4
If A ∨ B and ¬B ∨ C are given, then we can prove A ∨ C using the resolution rule (given in
the table above).

3.4 Automated Theorem Proving in Propositional Logic
In automated theorem proving all the sentences in the knowledge base are represented in the
form P1 ∧ P2 ∧ ... ∧ Pn ⇒ Q, where the Pi and Q are propositional variables. The facts in the
knowledge base are represented without antecedents; they are written as propositional
variables.

3.5 Goal Reduction Method
Goal reduction is a method for proving a conjunction of propositional variables from the given
rules. The conjuncts of the conjunction are the goals. Suppose Q is the goal and the rule is
P1 ∧ P2 ∧ ... ∧ Pn ⇒ Q; then proving P1 ∧ P2 ∧ ... ∧ Pn proves Q.

3.6 Algorithms for Theorem Proving
Let n be the number of propositions that remain to be proved, where P1, P2, ..., Pk, ..., Pn are
those propositions and Q1, Q2, ..., Qm are propositions appearing in the antecedents of rules
in the knowledge base.

   1. If n = 0, then stop as the theorem is proved.
   2. If n > 0, then choose some k such that 1 ≤ k ≤ n.
   3. If Pk is a fact in the knowledge base, then (recursively) try to
      prove P1 ∧ P2 ∧ ... ∧ Pk-1 ∧ Pk+1 ∧ ... ∧ Pn.
   4. If Pk is not a fact in the knowledge base, then try to find some
      rule in the knowledge base of the form Q1 ∧ Q2 ∧ ... ∧ Qm ⇒ Pk
      such that Q1 ∧ Q2 ∧ ... ∧ Qm ∧ P1 ∧ ... ∧ Pk-1 ∧ Pk+1 ∧ ... ∧ Pn can be
      (recursively) proved.
   5. If Pk is not a fact and an appropriate rule cannot be found in step
      4, then stop and report failure.
Example 3.5
The above algorithm can be explained in a simple manner. Assume the following two rules and
three facts are given in the knowledge base

       Rule 1: P ∧ Q ⇒ R
       Rule 2: S ∧ T ⇒ Q
       Facts: P, S, T

If R is to be proved, then the automated theorem proving procedure works as follows:

            o   As R is not a fact in the KB, it is necessary to use the
                first rule (P ∧ Q ⇒ R), i.e., step 4 above. To apply Rule 1, P
                and Q need to be proved. Since P is already a fact in the
                knowledge base, only Q needs to be proved, i.e.,
                step 3 above.
            o   Because Q is not a fact in the database, the algorithm
                needs to prove S ∧ T according to Rule 2 (S ∧ T ⇒ Q), using
                step 4 again. As S and T are facts in the knowledge base, Q
                is proved via step 3.
            o   Since both P and Q are now proved, the algorithm stops as
                the theorem is proved in this example, pursuant to step 1.
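The five steps above translate almost line for line into a recursive procedure. This sketch (the data representation is illustrative) runs the Rule 1 / Rule 2 example from the knowledge base above:

```python
def prove(goals, facts, rules):
    """Goal reduction: prove the conjunction of `goals` from `facts`
    and rules given as (antecedent list, consequent) pairs."""
    if not goals:                       # step 1: nothing left, proved
        return True
    pk, rest = goals[0], goals[1:]      # step 2: pick a conjunct Pk
    if pk in facts:                     # step 3: Pk is a known fact
        return prove(rest, facts, rules)
    for body, head in rules:            # step 4: find a rule concluding Pk
        if head == pk and prove(body + rest, facts, rules):
            return True
    return False                        # step 5: report failure

rules = [(["P", "Q"], "R"),   # Rule 1: P ^ Q => R
         (["S", "T"], "Q")]   # Rule 2: S ^ T => Q
facts = {"P", "S", "T"}
print(prove(["R"], facts, rules))   # True
```

Note that this naive sketch can loop forever on rules whose antecedents mention their own consequent; a practical prover would track the goals already being attempted.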

3.7 References

    1. T. Dean, J. Allen, Y. Aloimonos, Artificial Intelligence: Theory and
       Practice, Benjamin/Cummings Publishing Company, Inc, 1995.
    2. S. J. Russell, P. Norvig, Artificial Intelligence: A Modern
       Approach, Englewood Cliffs, N.J.: Prentice Hall, 1995.


                                       CHAPTER 4

                         FUZZIFICATION TECHNIQUE

4.1 Introduction
Fuzzification is the process of changing a real scalar value into a fuzzy value. This is achieved
with the different types of fuzzifiers. There are generally three types of fuzzifiers, which are
used for the fuzzification process; they are

    1. singleton fuzzifier,
    2. Gaussian fuzzifier, and
    3. trapezoidal or triangular fuzzifier.

4.2 Trapezoidal / Triangular Fuzzifiers

For the simplicity of discussion only the triangular and trapezoidal fuzzifiers are presented
here.

Fuzzification of a real-valued variable is done with intuition, experience and analysis of the set
of rules and conditions associated with the input data variables. There is no fixed set of
procedures for the fuzzification.

Example 4.1
Consider a class with 10 students of different heights in the range of 5 feet to 6 feet 2 inches.
Intuition is used to fuzzify this scalar quantity into the fuzzy or linguistic variables tall, short
and medium height. The membership function associated with each scalar quantity as defined
by intuition is




where h is the height, and subscript s denotes short, m denotes medium and t denotes tall. A
graphical representation of the membership function of height is shown in Figure 4.1.
                     Figure 4.1 Membership functions for student height

Table 4.1 gives the height of the 10 students with the membership function associated with
each fuzzy variable, i.e., tall, short and medium for each student. Let's consider a specific
student: Edward. From Equations 4.1, 4.2 and 4.3, or Table 4.1 the membership value of each
fuzzy set for Edward is determined as

                       µs(5.4') = 0        µm(5.4') = 0.5        µt(5.4') = 0
It can be inferred from the above result that Edward is medium by 50%, short by 0%, and tall
by 0%.

                       Table 4.1 Membership functions of the height

                 Student   Student Name   Height (feet)   µshort   µmedium   µtall

                    1      John               5.4           0        0.5       0

                    2      Cathy              5.8           0         0        1

                    3      Lisa               6.0           0         0        1

                    4      Ajay               5.0           1         0        0

                    5      Ram                5.7           0         0       0.5

                    6      Edward             5.4           0        0.5       0

                    7      Peter              5.2           1         0        0

                    8      Victor             5.0           1         0        0

                    9      Chris              6.2           0         0        1

                   10      Sam                5.9           0         0        1
In general, the triangular membership function can be specified by the formula below:

        µ(x) = 0                 for x ≤ L or x ≥ R
        µ(x) = (x - L)/(C - L)   for L < x ≤ C
        µ(x) = (R - x)/(R - C)   for C < x < R

where L and R are the left and right bounds, respectively, and C is the center of the symmetric
triangle as shown in Figure 4.2a. Likewise, the trapezoidal membership may be expressed as

        µ(x) = 0                            for x ≤ L or x ≥ U
        µ(x) = 1                            for |x - C| ≤ W/2
        µ(x) = (x - L)/((C - W/2) - L)      for L < x < C - W/2
        µ(x) = (U - x)/(U - (C + W/2))      for C + W/2 < x < U

where L and U are the lower and upper bounds, respectively, C is the center, and W is the
width of the top side of the symmetric trapezoid as shown in Figure 4.2b.

               a. Triangular
                                                               b. Trapezoidal
                          Figure 4.2 Common membership functions

Example 4.2
To demonstrate the implementation of these two functions, an Excel spreadsheet
(Fuzzification.xls) has been created for the reader to download and experiment with. The Excel
file embeds the formulae for computing membership values (µ) for both the triangular and
trapezoidal fuzzifiers. The user may define the boundaries of the functions and enter
hypothetical x values from which Excel will calculate the corresponding membership values.
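For readers without Excel, the same membership computations can be sketched in Python, following the symmetric triangular and trapezoidal definitions of Figure 4.2 (the parameter names L, C, R, W, U match the text; the exact formulas are one standard reading of the figures):

```python
def triangular(x, L, C, R):
    """Symmetric triangle (Figure 4.2a): 0 at the bounds L and R,
    1 at the centre C."""
    if x <= L or x >= R:
        return 0.0
    return (x - L) / (C - L) if x <= C else (R - x) / (R - C)

def trapezoidal(x, L, C, W, U):
    """Symmetric trapezoid (Figure 4.2b): 0 at the bounds L and U,
    1 on the flat top of width W centred on C."""
    if x <= L or x >= U:
        return 0.0
    if abs(x - C) <= W / 2:
        return 1.0
    if x < C:
        return (x - L) / ((C - W / 2) - L)
    return (U - x) / (U - (C + W / 2))

# Height example: a 'medium' triangle from 5.0 to 6.0 ft peaking at 5.5 ft.
print(triangular(5.5, 5.0, 5.5, 6.0))   # 1.0
print(trapezoidal(5.1, 5.0, 5.5, 0.4, 6.0))
```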

4.3 Remarks
The fuzzification of the input variables should be realistic. Experience and careful procedure
are needed when designing a large fuzzy system if realistic and accurate output is to be
obtained. Wrong fuzzification of the input variable(s) might cause instability and error in the
system.
4.4 Reference

    1. L. H. Tsoukalas, R. E. Uhrig, Fuzzy and Neural Approaches in
       Engineering, John Wiley & Sons, Inc. 1993.


                                        CHAPTER 5

                     FUZZY RULES AND IMPLICATION

5.1 Introduction
Fuzzy systems are built to replace the human expert with a machine using the logic a human
would use to perform the task. Suppose we ask someone how hot it is today. He may tell us
that it is hot, moderately hot or cold. He cannot tell us the exact temperature. Unlike classical
logic, which can only interpret crisp sets such as hot or cold, fuzzy logic has the capability to
interpret natural language. Thus, fuzzy logic can make human-like interpretations and is a
very useful tool in artificial intelligence, machine learning and automation. Fuzzy logic
operates on the basis of rules which are expressed in the form of If-Then constructs, also
known as Horn clauses.

The concept of a linguistic variable was introduced to process natural language. The
linguistic variable discussed later in Example 5.1 is temperature. A linguistic variable can
take verbal values such as hot, moderately hot or cold. The terms "temperature is hot",
"temperature is cold" and "temperature is moderate" are known as fuzzy propositions.

5.2 Fuzzy Proposition
A fuzzy proposition can be an atomic or compound sentence. For example

"Temperature is hot" is an atomic fuzzy proposition.

"Temperature is hot and humidity is low" is a compound fuzzy proposition.

Compound fuzzy propositions are expressed with fuzzy connectives such as and, or and not.

5.3 Syntax for IF and THEN rules
The fuzzy rules are written as

If <fuzzy proposition> then <fuzzy proposition>

The fuzzy proposition can be atomic or compound.
5.4 Method of Implication
The If-Then rules can be interpreted in classical logic by the implication operator. This was
also discussed in Chapter 3. Suppose there is a statement such as "If a Then b"; classical
logic represents this by a ⇒ b. The truth table for this rule is given as

                                     a    b    a ⇒ b, i.e. ¬a ∨ b

                                     F    F         T

                                     F    T         T

                                     T    F         F

                                     T    T         T
The implication operator can also be written as

                                     a ⇒ b ≡ ¬a ∨ b

The above equivalence can easily be shown with the above truth table.

As discussed earlier, the If-Then rules for fuzzy logic can be written as If <fuzzy proposition>
Then <fuzzy proposition>. The propositional variables a and b are replaced by fuzzy
propositions, and the implication can be replaced by fuzzy union, fuzzy intersection and fuzzy
complement. There are many fuzzy implications. Only the two most important fuzzy
implications are discussed here.

5.5 Mamdani Min Implication
Mamdani proposed a fuzzy implication rule for fuzzy control in 1977. It is a simplified version
of the Zadeh implication operator (Zadeh, 1973). The Mamdani fuzzy implication operator is
given as

                       µA⇒B(x, y) = min[µA(x), µB(y)]                      (5.2)

The above rule is clarified in the example below.

Example 5.1
Let temperature be the fuzzy variable, and let one of the rules be

                  "If the temperature is hot or temperature is moderately hot,
                             then the ice cream parlor is crowded."

Here the propositions are "temperature is hot or temperature is moderately hot" and "ice
cream parlor is crowded". The linguistic variables are "temperature" and "ice cream parlor".
The linguistic values for temperature are hot, moderately hot and cold. The membership
function for temperature in the universe of discourse, U, is given below



The ice cream parlor variable can take the linguistic values crowded and unfilled. The
membership function in the universe of discourse, V, is given as



where c represents crowded and nc represents unfilled (not crowded).

The plots for the membership functions of temperature and number of customers in the ice
cream parlor are shown in Figures 5.1 and 5.2, respectively.

                      Figure 5.1 Membership function of temperature
                  Figure 5.2 Membership function of number of customers

To apply the Mamdani implication rule to the above example, the following steps are taken:

   1. The or connective is replaced with the max (union) operator.
   2. The Maximum of the two membership functions is evaluated for
      the antecedent part of the fuzzy rules.
   3. The Mamdani Implication operator (i.e., min operator) is applied
      between the resulting antecedent membership function and the
      consequent membership function.

Suppose the temperature is 75 °F. The or connective is replaced with the union operator.
Using the fuzzy union operator of Equation 2.7 yields

                     µTemp(75 °F) = µHot ∪ µModeratelyHot
                                  = max[µHot(75 °F), µModeratelyHot(75 °F)]
                                  = max[0.167, 0.833]
                                  = 0.833

where µHot(75 °F) and µModeratelyHot(75 °F) are computed using Equations 5.3 and 5.4,
respectively. This is shown graphically in Figure 5.3.
                  Figure 5.3 Application of union operator for the given rule

The Mamdani implication operator of Equation 5.2 is now applied to the rule antecedent and
the rule consequent, which in this example is a crowded ice cream parlor.

                        µA→B[µTemp(75 °F), µc(Number of Customers)]

                                   = µTemp(75 °F) ∩ µc(Number of Customers)

                                   = min[µTemp(75 °F), µc(Number of Customers)]

                                   = min[0.833, µc(Number of Customers)]

where µc(Number of Customers) is shown in Figure 5.2. The dotted line in Figure 5.4 is the
output after the Mamdani implication rule is applied, that is, at each point along the ordinate
axis the minimum of the crowded membership function and the value of µTemp(75 F) is taken.

        Figure 5.4 Membership function of customer after Mamdani implication rule.

5.6 Larsen Product Implication
The Larsen product implication is given by

It uses the arithmetic product between the two membership functions in the universe of
discourses U and V.

Example 5.2
Here the Larsen implication rule is applied to the data of Example 5.1. The overall antecedent
of the fuzzy rule is the maximum of the two fuzzy propositions of the antecedent as calculated
in Example 5.1. The Larsen product implication rule can be expressed as

                        µA→B[µTemp(75 °F), µc(Number of Customers)]

                                   = µTemp(75 °F) · µc(Number of Customers)

                                   = 0.833 · µc(Number of Customers)

where µc(Number of Customers) is shown in Figure 5.2. The dotted line in Figure 5.5 is the
output after the Larsen implication rule is applied, that is, at each point along the ordinate the
product of the temperature antecedent and the crowded ice cream parlor membership function
is taken.

                  Figure 5.5 Larsen Implication rule applied to Example 5.2.

The Larsen implication is computationally more demanding than the Mamdani implication rule.
In Examples 5.1 and 5.2 only one rule has been considered, for simplicity. A fuzzy system may
have many rules. In that case, either the Mamdani or the Larsen implication rule is applied to
each rule. The resulting outputs from the implication rules are then aggregated and defuzzified
to obtain the result. Aggregation and defuzzification are discussed in Chapter 6.
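The two implication rules can be compared with a short numeric sketch. The membership functions below are illustrative assumptions (the book's Equations 5.3 and 5.4 are not reproduced here); the ramps are chosen only so that µHot(75 °F) ≈ 0.167 and µModeratelyHot(75 °F) ≈ 0.833, matching Example 5.1:

```python
def mu_hot(t):
    # assumed ramp: 0 at 70 °F, 1 at 100 °F (gives mu_hot(75) ≈ 0.167)
    return min(max((t - 70.0) / 30.0, 0.0), 1.0)

def mu_moderately_hot(t):
    # assumed ramp: 0 at 50 °F, 1 at 80 °F (gives ≈ 0.833 at 75 °F)
    return min(max((t - 50.0) / 30.0, 0.0), 1.0)

def mu_crowded(n):
    # assumed ramp over the number of customers: 0 at 0, 1 at 100
    return min(max(n / 100.0, 0.0), 1.0)

t = 75.0
# the "or" connective becomes max (union) over the antecedent
antecedent = max(mu_hot(t), mu_moderately_hot(t))        # 0.833...

ns = range(0, 101, 25)
# Mamdani implication clips the consequent with min
mamdani = [min(antecedent, mu_crowded(n)) for n in ns]
# Larsen implication scales the consequent with the product
larsen = [antecedent * mu_crowded(n) for n in ns]
```

Since a·c ≤ min(a, c) on [0, 1], the Larsen output never exceeds the Mamdani output; the product reshapes the whole consequent, while min flattens its top.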

5.7 Remarks
The choice of fuzzy implication rule is very important while designing a fuzzy control system.
Only the two most commonly used implication operators have been considered here for
simplicity. There are however many more implication operators that can be applied while
designing a fuzzy control system.
                                       CHAPTER 6

                       DEFUZZIFICATION TECHNIQUE

6.1 Introduction
A fuzzy logic system is rule-based, with its rules written as if-then (Horn clause) statements.
These rules are stored in the knowledge base of the system. The input to the fuzzy system is a
scalar value that is fuzzified. The set of rules is applied to the fuzzified input. The output of
each rule is fuzzy. These fuzzy outputs need to be converted into a scalar output quantity so
that the nature of the action to be performed can be determined by the system. This process
of converting the fuzzy output is called defuzzification. Before an output is defuzzified, all the
fuzzy outputs of the system are aggregated with a union operator. The union is the max of
the set of given membership functions and can be expressed as

                     µ(x) = max[µ1(x), µ2(x), ..., µn(x)]                        (6.1)

There are many defuzzification techniques but primarily only three of them are in common
use. These defuzzification techniques are discussed below in detail.

6.2 Maximum Defuzzification Technique
This method gives the output with the highest membership function. This defuzzification
technique is very fast but is only accurate for a peaked output. It is given by the algebraic
expression

                     µ(x*) ≥ µ(x)     for all x ∈ X                              (6.2)

where x* is the defuzzified value. This is shown graphically in Figure 6.1.

                      Figure 6.1 Max-membership defuzzification method

6.3 Centroid Defuzzification Technique
This method is also known as center of gravity or center of area defuzzification. This technique
was developed by Sugeno in 1985. It is the most commonly used technique and is very
accurate. The centroid defuzzification technique can be expressed as

                     x* = ∫ µi(x) x dx / ∫ µi(x) dx                              (6.3)

where x* is the defuzzified output, µi(x) is the aggregated membership function and x is the
output variable. The only disadvantage of this method is that it is computationally difficult for
complex membership functions. This method is illustrated in Example 8.3.

6.4 Weighted Average Defuzzification Technique
In this method the output is obtained by the weighted average of the each output of the set of
rules stored in the knowledge base of the system. The weighted average defuzzification
technique can be expressed as


where x* is the defuzzified output, mi is the membership of the output of each rule, and wi is
the weight associated with each rule. This method is computationally faster and easier and
gives fairly accurate result. This defuzzification technique is applied in fuzzy application of
signal validation in Example 7.3 and fuzzy application on power.
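The three techniques can be compared on a sampled aggregated membership function. The triangular shape and the rule values below are illustrative assumptions, not data from the text:

```python
xs = [i * 0.1 for i in range(0, 101)]                     # output universe, 0..10
mu = [max(0.0, 1.0 - abs(x - 6.0) / 3.0) for x in xs]     # aggregated fuzzy output

# 1. Maximum (height) method: the x with the largest membership
x_max = xs[mu.index(max(mu))]

# 2. Centroid (center of gravity): discrete approximation of
#    the ratio of integrals, sum(mu*x) / sum(mu)
x_centroid = sum(m * x for m, x in zip(mu, xs)) / sum(mu)

# 3. Weighted average over rules: x* = sum(m_i * w_i) / sum(m_i),
#    with m_i the rule firing strengths and w_i each rule's output value
m = [0.833, 0.4]          # illustrative firing strengths of two rules
w = [6.0, 2.0]            # illustrative representative outputs
x_wavg = sum(mi * wi for mi, wi in zip(m, w)) / sum(m)
```

For this symmetric triangle the maximum and centroid methods agree (both near 6.0), while the weighted average is pulled toward the second rule's output.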

What Is A Neural Network?
A neural network has been defined as "...a computing system made up of a number of simple,
highly interconnected processing elements, which process information by their dynamic state
response to external inputs."

ANNs are processing devices (algorithms or actual hardware) that are loosely modeled after
the neuronal structure of the mammalian cerebral cortex, but on much smaller scales. A large
ANN might have hundreds or thousands of processor units, whereas a mammalian brain has
billions of neurons, with a corresponding increase in the magnitude of their overall interaction
and emergent behavior. Although ANN researchers are generally not concerned with whether
their networks accurately resemble biological systems, some are. For example, researchers
have accurately simulated the function of the retina and modeled the eye rather well.

The Basics of Neural Networks
Neural networks are typically organized in layers. Layers are made up of a number of
interconnected 'nodes', each of which contains an 'activation function'. Patterns are presented
to the network via the 'input layer', which communicates to one or more 'hidden layers' where
the actual processing is done via a system of weighted 'connections'. The hidden layers then
link to an 'output layer' where the answer is output, as shown in the graphic below.

Most ANNs contain some form of 'learning rule' which modifies the weights of the connections
according to the input patterns that it is presented with. In a sense, ANNs learn by example as
do their biological counterparts; a child learns to recognize dogs from examples of dogs.
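The layered forward flow described above can be sketched as a tiny network. The sizes and weight values are illustrative assumptions, not a trained model:

```python
import math

def sigmoid(z):
    # standard sigmoidal activation function
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    # each hidden node takes a weighted sum of the inputs
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for w in w_hidden]
    # the output node takes a weighted sum of the hidden activations
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden)))

w_hidden = [[0.5, -0.3], [0.8, 0.2]]   # illustrative hidden-layer weights
w_out = [1.0, -1.0]                    # illustrative output-layer weights
y = forward([1.0, 0.0], w_hidden, w_out)   # a value in (0, 1)
```

The pattern enters at the input layer, is transformed by the weighted connections into the hidden layer, and emerges from the single output node.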

Although there are many different kinds of learning rules used by neural networks, this
demonstration is concerned with only one: the delta rule. The delta rule is often utilized by the
most common class of ANNs, called 'backpropagational neural networks' (BPNNs).
Backpropagation is an abbreviation for the backwards propagation of error.

With the delta rule, as with other types of backpropagation, 'learning' is a supervised process
that occurs with each cycle or 'epoch' (i.e. each time the network is presented with a new
input pattern) through a forward activation flow of outputs, and the backwards error
propagation of weight adjustments. More simply, when a neural network is initially presented
with a pattern it makes a random 'guess' as to what it might be. It then sees how far its
answer was from the actual one and makes an appropriate adjustment to its connection
weights. More graphically, the process looks something like this:
Note also that within each hidden layer node is a sigmoidal activation function which polarizes
network activity and helps it to stabilize.

Backpropagation performs a gradient descent within the solution's vector space towards a
'global minimum' along the steepest vector of the error surface. The global minimum is that
theoretical solution with the lowest possible error. The error surface itself is a hyperparaboloid
but is seldom 'smooth', as is depicted in the graphic below. Indeed, in most problems the
solution space is quite irregular, with numerous 'pits' and 'hills' which may cause the network
to settle down in a 'local minimum' which is not the best overall solution.

Since the nature of the error space cannot be known a priori, neural network analysis often
requires a large number of individual runs to determine the best solution. Most learning rules
have built-in mathematical terms to assist in this process which control the 'speed' (beta
coefficient) and the 'momentum' of the learning. The speed of learning is actually the rate of
convergence between the current solution and the global minimum. Momentum helps the
network to overcome obstacles (local minima) in the error surface and settle down at or near
the global minimum.
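A minimal sketch of the delta rule for a single sigmoid unit, including the learning-rate ('speed') and momentum terms described above. The task (logical OR) and the constants are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, rate=0.5, momentum=0.5, epochs=5000):
    w, b = [0.0, 0.0], 0.0
    dw_prev, db_prev = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # delta rule: error scaled by the sigmoid's derivative y(1 - y)
            delta = (target - y) * y * (1.0 - y)
            # 'speed' (learning rate) plus 'momentum' from the previous step
            dw = [rate * delta * xi + momentum * dwp
                  for xi, dwp in zip(x, dw_prev)]
            db = rate * delta + momentum * db_prev
            w = [wi + dwi for wi, dwi in zip(w, dw)]
            b += db
            dw_prev, db_prev = dw, db
    return w, b

# learn logical OR, a linearly separable task one unit can handle
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train(samples)
```

Each cycle makes a forward pass, measures how far the guess is from the target, and adjusts the weights; the momentum term carries part of the previous adjustment forward.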

Once a neural network is 'trained' to a satisfactory level it may be used as an analytical tool on
other data. To do this, the user no longer specifies any training runs and instead allows the
network to work in forward propagation mode only. New inputs are presented to the input
pattern, where they filter into and are processed by the middle layers as though training were
taking place; however, at this point the output is retained and no backpropagation occurs. The
output of a forward propagation run is the predicted model for the data, which can then be
used for further analysis and interpretation.

It is also possible to over-train a neural network, which means that the network has been
trained exactly to respond to only one type of input, much like rote memorization. If this
should happen then learning can no longer occur and the network is referred to as having
been "grandmothered" in neural network jargon. In real-world applications this situation is not
very useful, since one would need a separate grandmothered network for each new kind of
input.
How Do Neural Networks Differ From Conventional Computers?
To better understand artificial neural computing it is important to know first how a
conventional 'serial' computer and its software process information. A serial computer has a
central processor that can address an array of memory locations where data and instructions
are stored. Computations are made by the processor reading an instruction, as well as any
data the instruction requires, from memory addresses; the instruction is then executed and
the results are saved in a specified memory location as required. In a serial system (and a
standard parallel one as well) the computational steps are deterministic, sequential and
logical, and the state of a given variable can be tracked from one operation to another.

In comparison, ANNs are not sequential or necessarily deterministic. There are no complex
central processors; rather there are many simple ones which generally do nothing more than
take the weighted sum of their inputs from other processors. ANNs do not execute
programmed instructions; they respond in parallel (either simulated or actual) to the pattern
of inputs presented to them. There are also no separate memory addresses for storing data.
Instead, information is contained in the overall activation 'state' of the network. 'Knowledge' is
thus represented by the network itself, which is quite literally more than the sum of its
individual units.
What Applications Should Neural Networks Be Used For?
Neural networks are universal approximators, and they work best if the system you are using
them to model has a high tolerance to error. One would therefore not be advised to use a
neural network to balance one's cheque book! However they work very well for:

       capturing associations or discovering regularities within a set of patterns;
       problems where the volume, number of variables or diversity of the data is very great;
       situations where the relationships between variables are vaguely understood; or
       relationships which are difficult to describe adequately with conventional approaches.

What Are Their Limitations?
There are many advantages and limitations to neural network analysis and to discuss this
subject properly we would have to look at each individual type of network, which isn't
necessary for this general discussion. In reference to backpropagational networks however,
there   are    some    specific    issues   potential   users   should    be   aware   of.

Backpropagational neural networks (and many other types of networks) are in a sense the
ultimate 'black boxes'. Apart from defining the general architecture of a network and perhaps
initially seeding it with random numbers, the user has no other role than to feed it input and
watch it train and await the output. In fact, it has been said that with backpropagation, "you
almost don't know what you're doing". Some freely available software packages (NevProp, bp,
Mactivation) do allow the user to sample the network's 'progress' at regular time intervals, but
the learning itself progresses on its own. The final product of this activity is a trained network
that provides no equations or coefficients defining a relationship (as in regression) beyond its
own internal mathematics. The network 'IS' the final equation of the relationship.

Backpropagational networks also tend to be slower to train than other types of networks and
sometimes require thousands of epochs. If run on a truly parallel computer system this issue
is not really a problem, but if the BPNN is being simulated on a standard serial machine (i.e. a
single SPARC, Mac or PC) training can take some time. This is because the machine's CPU must
compute the function of each node and connection separately, which can be problematic in
very large networks with a large amount of data. However, the speed of most current
machines is such that this is typically not much of an issue.

What Are Their Advantages Over Conventional Techniques?
Depending on the nature of the application and the strength of the internal data patterns you
can generally expect a network to train quite well. This applies to problems where the
relationships may be quite dynamic or non-linear. ANNs provide an analytical alternative to
conventional techniques which are often limited by strict assumptions of normality, linearity,
variable independence etc. Because an ANN can capture many kinds of relationships, it allows
the user to quickly and relatively easily model phenomena which otherwise may have been
very difficult or impossible to explain.

What and why?
Neural Networks: a bottom-up attempt to model the functionality of the brain.

Two main areas of activity:

      Biological
          o Try to model biological neural systems
      Computational
           o Artificial neural networks are biologically inspired
             but not necessarily biologically plausible
          o So may use other terms: Connectionism, Parallel
            Distributed Processing, Adaptive Systems Theory.


Neural Networks are inherently parallel and naturally amenable to expression in a parallel
notation and implementation on parallel hardware.

Capacity for Adaptation
In general, neural systems are capable of learning.

Some networks have the capacity to self-organise, ensuring their stability as dynamic
systems.

A self-organising network can take account of a change in the problem that it is solving, or
may learn to resolve the problem in a new manner.
Distributed Memory
In neural networks 'memory' corresponds to an activation map of the neurons. Memory is thus
distributed over many units giving resistance to noise.

In distributed memories, such as neural networks, it is possible to start with noisy data and to
recall the correct data.

Fault Tolerance
Distributed memory is also responsible for fault tolerance.

In most neural networks, if some PEs are destroyed, or their connections altered slightly, then
the behaviour of the network as a whole is only slightly degraded.

The characteristic of graceful degradation makes neural computing systems extremely well
suited for applications where failure of control equipment means disaster.

Capacity for Generalisation
Designers of Expert Systems have difficulty in formulating rules which encapsulate an expert's
knowledge in relation to some problem.

A neural system may learn the rules simply from a set of examples.

The generalisation capacity of a neural network is its capacity to give a satisfactory response
for an input which is not part of the set of examples on which it was trained.

The capacity for generalisation is an essential feature of a classification system.

Certain aspects of generalisation behaviour are interesting because they are intuitively quite
close to human generalisation.

Ease of Construction
Computer simulations of small applications can be implemented relatively quickly.


       Neural systems are inherently parallel but are normally
        simulated on sequential machines.
           o Processing time can rise quickly as the size of the
              problem grows - The Scaling Problem
           o However, a direct hardware approach would lose the
              flexibility offered by a software implementation.
           o In consequence, neural networks have been used to
              address only small problems.
       The performance of a network can be sensitive to the
        quality and type of preprocessing of the input data.
       Neural networks cannot explain the results they obtain;
        their rules of operation are completely unknown.
       Performance is measured by statistical methods giving
        rise to distrust on the part of potential users.
      Many of the design decisions required in developing an
       application are not well understood.

                                      CHAPTER 2


2.1 Introduction
Fuzzy logic is a comprehensive form of classical logic. In this chapter classical logic and
fuzzy logic are discussed and the distinction between them is analyzed.

Fuzzy logic is a superset of classical logic, obtained by introducing a "degree of membership."
The degree of membership allows an input to fall partly between the crisp sets, rather than
strictly inside or outside one. The operators in both logics are similar; only their interpretation
differs.

2.2 Classical Logic
Let X be the universe of discourse, and let the elements contained in X be denoted by x. Let A
and B be sets which contain elements of the universe of discourse, X. The basic operators in
classical set theory are

       Union:          A ∪ B = {x | x ∈ A or x ∈ B}
       Intersection:   A ∩ B = {x | x ∈ A and x ∈ B}
       Complement:     Ā = {x | x ∈ X, x ∉ A}
2.3 Properties of Classical Sets
The important set operators and relations include commutativity, associativity, distributivity,
idempotency, identity and involution, together with the excluded middle laws and De Morgan's
laws.
2.4 Mapping of Classical Set to Fuzzy Set
Classical logic maps an input into a crisp set. Every element in the universe of discourse, X,
either belongs to a set or does not belong to it. For example, whether an element in the
universe of discourse, X, belongs to the set A or does not can be represented by the function

                     χA(x) = 1 if x ∈ A
                     χA(x) = 0 if x ∉ A

The above function is also called the characteristic function. The output is 1 if the element, x,
belongs to set A, and 0 if the element, x, does not belong to set A.

2.5 Fuzzy Sets
Unlike classical set theory that classifies the elements of the set into crisp set, fuzzy set has an
ability to classify elements into a continuous set using the concept of degree of membership.
The characteristic function or membership function not only gives 0 or 1 but can also give
values between 0 and 1.

Example 2.1
Consider the outside ambient temperature. Classical set theory can only classify the
temperature as hot or cold (i.e., either 1 or 0). It cannot interpret a temperature between
20 °F and 100 °F as partly hot. In other words, the characteristic function of classical logic for
the above example is given by

                     χhot(T) = 1 if T ≥ 50 °F
                     χhot(T) = 0 if T < 50 °F

The boundary 50 °F is taken because classical logic cannot interpret intermediate values.

On the other hand, fuzzy logic solves the above problem with a membership function such as

                     µhot(T) = (T − 20)/80,     20 °F ≤ T ≤ 100 °F

The above membership function is shown in Table 2.1. A graph of the membership function for
the fuzzy temperature variable is shown in Figure 2.1. The degree of coldness is taken as the
complement of the degree of hotness.
      Table 2.1 Membership function of temperature

                  Temperature (°F)    Degree of Hotness    Degree of Coldness
                        20                   0                    1
                        30                   0.13                 0.87
                        40                   0.25                 0.75
                        50                   0.375                0.625
                        60                   0.5                  0.5
                        70                   0.625                0.375
                        80                   0.75                 0.25
                        90                   0.875                0.125
                       100                   1                    0

      Figure 2.1 Membership function for the degree of hotness and degree of coldness

The degree of hotness for 30 °F is 0.13 and degree of coldness for 30 °F is 0.87. This means
that 30 °F is hot by 13 percent and cold by 87 percent.
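The table values follow a simple linear ramp, which a short sketch can verify. The ramp form (T − 20)/80 is inferred from Table 2.1 rather than stated explicitly in the text:

```python
def mu_hot(t):
    # linear ramp consistent with Table 2.1: 0 at 20 °F, 1 at 100 °F
    return min(max((t - 20.0) / 80.0, 0.0), 1.0)

def mu_cold(t):
    # degree of coldness is the complement of the degree of hotness
    return 1.0 - mu_hot(t)

# reproduce a few rows of Table 2.1 (30 °F gives 0.125, printed as 0.13)
rows = {30: 0.125, 50: 0.375, 80: 0.75}
checks = {t: mu_hot(t) for t in rows}
```

Every row of the table, including the complement column, falls out of these two one-line functions.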

2.6 Fuzzy Set Representation
The common method of representing a fuzzy set is

                     A = {(x, µA(x)) | x ∈ X}

where x is an element in X and µA(x) is the membership function of set A, which defines the
membership of fuzzy set A in the universe of discourse, X.
The term (x, µA(x)) is a singleton pair. For the temperature example described above the
fuzzy set can be represented as

hot = {(20, 0), (30, 0.125), (40, 0.25), (50, 0.375), (60, 0.5), (70, 0.625), (80, 0.75), (90,
0.875), (100, 1)}.

In the above fuzzy set the third element of the set hot denotes that the temperature 40 °F
belongs to set hot by 0.25.

An alternative method to represent the singleton function is

                     A = Σ µA(xi)/xi

The above representation is for a discrete universe of discourse; here the summation sign
denotes collection rather than arithmetic addition, and the slash separates each membership
value from its element. The fuzzy set representation for a continuous membership function is
given by

                     A = ∫ µA(x)/x
2.7 Fuzzy Operators
Some of the most important fuzzy logic operators are given below.

Union of two fuzzy sets
     The union is the maximum degree of membership of sets A and
     B.
                        µA∪B(x) = max[µA(x), µB(x)]                  (2.7)
Intersection of two fuzzy sets
     The intersection is the minimum degree of membership of sets A
     and B.
                        µA∩B(x) = min[µA(x), µB(x)]                  (2.8)
Complement of a fuzzy set
    The complement of the membership of set A is
                        µĀ(x) = 1 − µA(x)                            (2.9)
Product of two fuzzy sets
     The product of two fuzzy sets in the same universe of discourse
     is the new fuzzy set A·B with a membership function that equals
     product of the membership function of A and the membership
     function of B.
                       µA·B(x) = µA(x)·µB(x)                      (2.10)
Multiplying a fuzzy set by a crisp number
     When a fuzzy set is multiplied by a crisp number, then its
     membership function is given by
                                 µa·A(x) = a µA(x)                              (2.11)
Power of fuzzy set
       The membership function of Aα, where α is a positive number, is
       defined by
                          µAα(x) = [µA(x)]α                          (2.12)
Concentration of the fuzzy set
      The concentration of the fuzzy set over the universe of discourse
      X is given by
                         µCON(A)(x) = [µA(x)]2                       (2.13)
      Concentrating the fuzzy set decreases its fuzziness; the
      membership function admits fewer intermediate degrees of
      membership between the crisp values 0 and 1.
Dilation of the fuzzy set
      The dilation of the fuzzy set over the universe of discourse X is
      given by
                         µDIL(A)(x) = [µA(x)]0.5                     (2.14)
      Dilating the fuzzy set increases the fuzziness of the set; the
      membership function admits more intermediate degrees of
      membership between the crisp values 0 and 1.
Empty fuzzy set
     If the fuzzy set is empty (Ø), then its membership function is
                               µØ(x) = 0                             (2.15)
Normal fuzzy set
     The fuzzy set is called normal if there is at least one element x0
     in the universe of discourse X where the membership function
     equals 1.
                              µA(x0) = 1                             (2.16)
Equality of fuzzy sets
     The fuzzy sets A and B are equal if their membership functions
     are equal at every element of X.
                             µA(x) = µB(x)                           (2.17)
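The operators above can be applied pointwise to sampled membership functions. The two shapes below are illustrative, not taken from the text:

```python
xs = list(range(0, 11))                            # discrete universe of discourse
mu_a = [min(x / 5.0, 1.0) for x in xs]             # ramps up to 1 at x = 5
mu_b = [max(1.0 - x / 5.0, 0.0) for x in xs]       # ramps down to 0 at x = 5

union         = [max(a, b) for a, b in zip(mu_a, mu_b)]   # Eq. 2.7, max
intersection  = [min(a, b) for a, b in zip(mu_a, mu_b)]   # min
complement_a  = [1.0 - a for a in mu_a]                   # complement
product       = [a * b for a, b in zip(mu_a, mu_b)]       # Eq. 2.10
concentration = [a ** 2 for a in mu_a]                    # Eq. 2.13, squaring
dilation      = [a ** 0.5 for a in mu_a]                  # square root
```

Note that concentration never raises a membership value and dilation never lowers one, matching the remarks about decreasing and increasing fuzziness.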

Example 2.2
Let us consider two fuzzy sets A and B with membership functions

The plots of the membership functions µA(x) and µB(x) are shown in Figure 2.2. The union,
intersection, complement, concentration and dilation of these two membership functions are
shown in Figures 2.3 through 2.7, respectively.

             Figure 2.2 Membership functions µA(x) and µB(x) of Example 2.2.

           Figure 2.3 Union of membership functions, µA∪B = max(µA, µB).

           Figure 2.4 Intersection of membership functions, µA∩B = min(µA, µB).

           Figure 2.5 Complement of membership function, µĀ = 1 − µA.

           Figure 2.6 Concentration of membership function, µCON(A)(x) = [µA(x)]2.

           Figure 2.7 Dilation of membership function, µDIL(A)(x) = [µA(x)]0.5.

2.8 Remark
Fuzzy logic is a comprehensive version of classical logic, and an understanding of classical
logic is very important for the understanding of fuzzy logic. The basic difference between them
is that classical logic gives an output of either 0 or 1, whereas fuzzy logic can give a
continuous output. Fuzzy logic systems and applications are discussed in later chapters.
                        A Real-time Expert System Environment
                        for on-line Decision Support Applications

RTXPS is a real-time expert system environment designed for on-line dynamic decision
support and mission critical command, control and communication tasks such as

       emergency management for technological and environmental hazards, including
        early warning for events such as floods, toxic or oil spills, tsunamis, land slides,
        etc.
       complex control and assessment tasks, including coordination of first response,
        recovery, restoration and clean-up operations,
       related teaching and training applications.

RTXPS can be configured to implement any checklist, questionnaire or operating manual based
procedure or protocol. It offers context sensitive support functions based on Artificial
Intelligence technology that can handle the most demanding dynamic situations in distributed
client-server environments, with several parallel action threads. RTXPS provides extensive
support and assistance to the operator, and keeps complete real-time logs for quality
assurance.

Setting and querying of timers or the pending of ACTIONS provide additional features for
real-time control.

RTXPS is based on a time-aware forward chaining inference engine that processes context
sensitive Production Rules; these manage the dynamic problem knowledge base and trigger
ACTIONS.

ACTIONS are communicated to the operator in hypertext format, and can automatically trigger
a wide range of functions including data entry and display, an embedded backward-chaining
expert system, and complex simulation, optimisation modeling and GIS applications.

Support of extensive documentation, logging, reporting and external communication functions,
such as automatic compilation and sending of e-mail or fax messages as well as the automatic
generation and update of web pages for public information access, are important features of
the system.

RTXPS can also link to on-line monitoring and data acquisition systems that can provide
real-time intelligence and feed-back from the field; this can be used not only to update the
problem context dynamically, but also for the re-calibration of dynamic forecasting models.

RTXPS uses a simple near-natural language syntax for its Rules, ACTIONS and Descriptors, the
variables that the Rules operate on. An intuitive SCRIPT language supports the efficient
development of the Knowledge Base for a new application.

Application Examples
RTXPS is the core DSS component of RiskWare, a decision support system for technological risk management, designed for
risk assessment, risk management, and risk training.

RTXPS in an extended implementation is the basis of our CourseWare training system, developed in the A-TEAM advanced
technical training system. The framework can also be used to guide users through other complex tasks such as the EIAxpert
system for screening level EIA.

RTXPS has been implemented as the overall framework for SIGRIC, the Sistema di Gestione Rischio Chimico for the Provincial
Authorities of Pisa in the Regione Toscana, Italy.
RTXPS is also the core DSS component of HITERM, High-Performance Computing for Technological Risk Management, an
Esprit HPCN project with case study applications in Italy, Portugal, and Switzerland.

 Interface with HTML hypertext window and embedded GIS, action logs and real-time clock.
 Various editors support the man-machine dialog and data acquisition for the inference
 engine.

 RTXPS controls a range of dynamic simulation models that feed their results into the expert
 system. Communication functions range from polling remote sensors and access to remote
 databases to automatic fax messages.

                 © Copyright 1995-2005 by:    ESS   Environmental Software and Services GmbH

Technical Specifications
The facts (data) of RTXPS are stored in DESCRIPTORs.

A value is assigned to a DESCRIPTOR either by direct editing or by starting the rule-based inference. The system then uses a
set of alternative methods enumerated in the DESCRIPTOR definition to obtain or update the DESCRIPTOR value in the
current context. The inference engine compiles all necessary information for the appropriate Backward Chaining Rules' input
conditions recursively, evaluates the Backward Chaining Rules, and eventually updates the target DESCRIPTOR.

The complete syntax of a DESCRIPTOR is:

     A <alias_for_descriptor_name>
     T <descriptor_type>
     U <unit>
     V <range> / <range> / <range> / ...
     R <rule#> / <rule#> / ...
     TB <table#> / <table#> / ...
     F <function>
     IF <interface function>
     G <gis_function> <gis_overlay>
     Q <question>
     T <model_type>
     I <input_descriptor> / <input_descriptor> /
     O <output_descriptor> / <output_descriptor> /
     <alternative defs>
     X <window x-coordinate>
     Y <window y-coordinate>
     WIDTH <window width>
     HEIGHT <window height>
     BGCOLOR <window bgcolor>
     BORDER_WIDTH <window borderwidth>
     BORDER_COLOR <window bordercolor>
     FORMAT <value selector format_string>
     DELTA <value selector increment>
     HYPER_INFO <hyperinfo path>
     HYPER_X <hyperinfo x-coordinate>
     HYPER_Y <hyperinfo y-coordinate>
     HYPER_WIDTH <hyperinfo width>
     HYPER_HEIGHT <hyperinfo height>
     HYPER_TWIDTH <hyperinfo backgroundwin width>
     HYPER_THEIGHT <hyperinfo backgroundwin height>
     HYPER_FGCOLOR <hyperinfo foreground color>
     HYPER_BGCOLOR <hyperinfo background color>
     HYPER_KEYCOLOR <hyperinfo keyword color>
     HYPER_HIKEYCOLOR <hyperinfo highlight color>
     HYPER_SWBORDERC <hyperinfo subwindow border color>
A simple example of a DESCRIPTOR from the reservoir expert system is retention_time:

U days
V very_small[   0,  360] /
V small     [ 360, 1080] /
V medium    [1080, 1800] /
V large     [1800, 3600] /
V very_large[3600, 7200] /
R 7777007 /
Q What is the average retention time, in days,
Q for the reservoir? Retention time is the theoretical
Q period the average volume of water spends in the reservoir,
Q estimated as the ratio of volume to throughflow.
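The V lines above map a numeric DESCRIPTOR value onto a symbolic class. A minimal sketch of that mapping, using the retention_time ranges; treating each [lo, hi] interval as half-open is an assumption made here to keep the boundaries unambiguous:

```python
# The V ranges of retention_time, in days.
RANGES = [
    ("very_small",    0,  360),
    ("small",       360, 1080),
    ("medium",     1080, 1800),
    ("large",      1800, 3600),
    ("very_large", 3600, 7200),
]

def symbolic(value, ranges=RANGES):
    """Return the symbolic class for a numeric value, or None if the
    value lies outside all declared ranges."""
    for name, lo, hi in ranges:
        if lo <= value < hi:          # half-open interval (an assumption)
            return name
    return None
```

A reservoir with a 1200-day retention time would thus be classed as medium, and the symbolic class, rather than the raw number, can then be matched against rule conditions.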
A typical use of this inference process is to assist the user in specifying scenario parameters: here the system collects
circumstantial evidence to derive an informed guess where no hard data are available.

Another use of the backward chaining capabilities of the expert system is to provide a synthesis of large volumes of
model-generated data. The chain of models used to simulate an accident scenario can easily generate data volumes on the
order of gigabytes. These should, however, be summarized in a few simple variables, such as the number of people exposed,
the level of exposure, the area contaminated, the estimated material damage, and a rough classification of the accident:
these classifications are needed to trigger the appropriate responses.

The flexibility to use, alternatively or in combination, both qualitative symbolic and quantitative numerical methods in one and
the same application allows the system to be responsive to the information at hand and to the user's requirements and
constraints. This combination of analysis methods, together with the integration of databases, geographical information
systems, and hypertext, makes it possible to efficiently exploit whatever information, data and expertise are available in a
given problem situation.

An example of a DESCRIPTOR from the reservoir expert system that uses an external model (in this particular case the
inflow_model) is mean_annual_inflow:

U Mill.m3
V very_small[0,30] / small[30,150] / medium[150,3000] /
V large[3000,30000] / very_large[30000,300000] /
T local_wait
I hemisphere / east_west / longitude / latitude /
O mean_annual_inflow /
Q What is the long-term average mean annual inflow,
Q in million cubic meters, to the reservoir?
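For a model-type DESCRIPTOR such as this one, the engine resolves the I input descriptors, runs the external model, and stores its O outputs. The following is a hedged sketch of that control flow; the function names, the keyword-argument calling convention, and the inflow_model stand-in are assumptions for illustration, not the actual RTXPS model interface:

```python
def run_model_descriptor(defn, facts, model):
    """Resolve the I inputs, run the external model synchronously
    (as the local_wait type suggests), and store the O outputs."""
    inputs = {name: facts[name] for name in defn["I"]}   # gather I lines
    results = model(**inputs)                            # wait for the model
    for name in defn["O"]:                               # store O lines
        facts[name] = results[name]
    return facts

def inflow_model(hemisphere, east_west, longitude, latitude):
    # Placeholder stand-in: a real inflow model would estimate the
    # long-term inflow from location and climatological data.
    return {"mean_annual_inflow": 150.0}
```

Once the model returns, mean_annual_inflow is available to the inference engine like any other DESCRIPTOR value.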
A model of human problem solving recursively refines and redefines a problem as more information becomes available or
certain alternatives are excluded. This responsiveness to the problem situation and the information at hand, and the ability
to adjust as more information becomes available (that is, in a sense, to learn), are characteristics of intelligent systems.
