Knowledge Acquisition

Knowledge acquisition includes the elicitation, collection, analysis, modelling and validation of knowledge for knowledge engineering and knowledge management projects.

Issues in Knowledge Acquisition

Some of the most important issues in knowledge acquisition are as follows:
- Most knowledge is in the heads of experts
- Experts have vast amounts of knowledge
- Experts have a lot of tacit knowledge
  - They don't know all that they know and use
  - Tacit knowledge is hard (impossible) to describe
- Experts are very busy and valuable people
- Each expert doesn't know everything
- Knowledge has a "shelf life"

Requirements for KA Techniques

Because of these issues, techniques are required which:
- Take experts off the job for short time periods
- Allow non-experts to understand the knowledge
- Focus on the essential knowledge
- Can capture tacit knowledge
- Allow knowledge to be collated from different experts
- Allow knowledge to be validated and maintained

KA Techniques

Many techniques have been developed to help elicit knowledge from an expert. These are referred to as knowledge elicitation or knowledge acquisition (KA) techniques; the term "KA techniques" is commonly used. The following sections give a brief introduction to the types of techniques used for acquiring, analysing and modelling knowledge.

Protocol-generation techniques include various types of interviews (unstructured, semi-structured and structured), reporting techniques (such as self-report and shadowing) and observational techniques. The aim of these techniques is to produce a protocol, i.e. a record of behaviour, whether in audio, video or electronic media. Audio recording is the usual method, which is then transcribed to produce a transcript.

Interviews

Various types of interviews can be used to produce a transcript. Unstructured interviews have a rough agenda but no pre-defined structure, so that the expert and knowledge engineer are free to explore the domain.
This is an inefficient way of gathering detailed knowledge, but can prove useful as an initial interview when little is known of the domain. It also acts as an ice-breaker to establish a rapport between the expert and knowledge engineer.

A semi-structured interview combines a highly structured agenda with the flexibility to ask follow-up questions. The questions for a semi-structured interview are ideally constructed some time before the interview and are sent to the expert so he/she can start to prepare responses. For an interview lasting 1 hour, around 10-15 questions might be asked. This allows time in between the set questions for the knowledge engineer to ask supplementary questions to clarify points and ask for more detail where necessary. This is often the preferred style of interview, as it helps to focus the expert on the key questions and helps avoid them giving unnecessary information.

Another form of interview is the structured interview. This allows no flexibility on the part of the knowledge engineer, whose questions are all pre-established. As such, structured interviews often involve filling in a matrix or other diagrammatic notation.

Commentary

Another family of techniques that produce protocols is think-aloud problem-solving, or commentary. These techniques generate protocols by having the expert provide a running commentary on a typical task used in the domain. The basic technique here is the self-report, in which the expert provides a running commentary on their thought processes as they solve a problem. Experimental evidence has shown that self-reports can access cognitive processes that cannot be recalled after the task has been completed without bias and distortion. A problem with the self-report technique is that of cognitive overload, i.e. the mental effort required by the expert to provide the commentary interrupts and affects their performance of the task.
This is especially true in dynamic domains where time is critical. One way around this is to use an off-line reporting technique. Here the expert is shown a protocol of their task behaviour, typically a video, and asked to provide a running commentary on what they were thinking and doing. An advantage of this is that the video can be paused or run at slow speed to allow time for full explanation. Variants of these reporting techniques involve a second expert commenting on another expert's performance.

Teach Back

In the teach back technique, the knowledge engineer describes part of the knowledge that has been acquired during previous sessions or from other sources. The expert comments on what the knowledge engineer is describing to reveal misunderstandings.

Observation

Observational techniques are another way of generating protocols. Simply observing and making notes as the expert performs their daily activities can be useful, although a time-consuming process. Videotaping their task performance can be useful, especially if combined with retrospective reporting techniques. On the whole, though, simple observation techniques are rarely used, as they are an inefficient means of capturing the required knowledge.

Protocol Analysis Techniques

Protocol analysis techniques are used with transcripts of interviews or other text-based information to identify various types of knowledge, such as goals, decisions, relationships and attributes. This acts as a bridge between the use of protocol-based techniques and knowledge modelling techniques. Protocol analysis involves the identification of basic knowledge objects within a protocol, usually a transcript. For most projects, this makes use of categories of fundamental knowledge such as concepts, attributes, values, tasks and relationships. So, for example, an interview transcript would be analysed by highlighting all the concepts that are relevant to the project.
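The highlighting of knowledge objects in a transcript can be sketched as simple keyword tagging. This is only an illustrative toy: the category word lists and the transcript snippet below are hypothetical, and real protocol analysis is done by a knowledge engineer reading the text, not by string matching.

```python
# A minimal sketch of protocol analysis: tagging knowledge objects in a
# transcript by matching against hand-built category word lists.
# The transcript and word lists are hypothetical examples.
import re

# Categories of fundamental knowledge, as used when marking up a transcript
CATEGORIES = {
    "concept":   {"pump", "valve", "seal"},
    "attribute": {"pressure", "temperature"},
    "task":      {"inspect", "replace"},
}

def analyse_protocol(transcript):
    """Return {category: sorted matching words} found in the transcript."""
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    return {cat: sorted(words & vocab) for cat, vocab in CATEGORIES.items()}

transcript = "First inspect the pump; if the pressure is low, replace the seal."
print(analyse_protocol(transcript))
# → {'concept': ['pump', 'seal'], 'attribute': ['pressure'], 'task': ['inspect', 'replace']}
```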
This would be repeated for all the relevant attributes, values, tasks and relationships. In some cases, more detailed categories will be used for the identification, depending on the requirements of the project. For instance, if the transcript concerns the task of diagnosis, then such categories as symptoms, hypotheses and diagnostic techniques would be used for the analysis. Such categories may be taken from generic ontologies and problem-solving models. The Protocol Tool in PCPACK can be used to analyse a transcript or other piece of text.

Laddering Techniques

Hierarchy-generation techniques, such as laddering, are used to build taxonomies or other hierarchical structures such as goal trees and decision networks. Laddering techniques involve the creation, reviewing and modification of hierarchical knowledge, often in the form of ladders (i.e. tree diagrams). Here the expert and knowledge engineer both refer to a ladder presented on paper or a computer screen, and add, delete, rename or re-classify nodes as appropriate. Laddering can also involve a set of predefined probe questions, such as "Could you tell me some sub-types of X?", "Could you tell me how you can tell that something is X?" and "Why would you prefer X to Y?". A leading proponent of this is Dr Gordon Rugg.

Use of Ladders

Various forms of ladder can be used. A concept ladder is particularly important, since the way an expert categorises concepts into classes is an important key to understanding the way the domain knowledge is conceptualised. Laddering using an attribute ladder is another very useful technique. By reviewing and appending such a ladder, the knowledge engineer can validate and help elicit knowledge of the properties of concepts. Hierarchies with other relationships can also be used, such as the composition ladders and process ladders described earlier. Validation of the knowledge represented in a ladder with another expert is often very quick and efficient.
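A concept ladder of the kind used in laddering sessions can be held as a simple parent-to-sub-types structure, from which the standard probe questions can be generated. A minimal sketch, in which the small vehicle taxonomy is a hypothetical example:

```python
# A minimal sketch of a concept ladder (taxonomy) held as a dict of
# parent -> sub-types, with laddering probe questions generated from it.
# The vehicle taxonomy is a hypothetical example.
LADDER = {
    "vehicle": ["car", "truck"],
    "car": ["saloon", "estate"],
}

def probe_questions(ladder):
    """One 'sub-types of X?' probe per node that has children."""
    return [f"Could you tell me some sub-types of {node}?" for node in ladder]

def is_a(ladder, child, parent):
    """True if child appears (directly or transitively) under parent."""
    subs = ladder.get(parent, [])
    return child in subs or any(is_a(ladder, child, s) for s in subs)

assert is_a(LADDER, "saloon", "vehicle")   # saloon is a car is a vehicle
print(probe_questions(LADDER))
```

In a session, the expert's answers to each probe would add new nodes to the ladder, which is then re-presented for review.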
Matrix-Based Techniques

Matrix-based techniques involve the construction of grids indicating such things as problems encountered against possible solutions. Important types include the use of frames for representing the properties of concepts and the repertory grid technique used to elicit, rate, analyse and categorise the properties of concepts. These techniques involve the construction and filling-in of a 2-dimensional matrix (grid, table). Useful examples are:
- Concepts v Properties (attributes and values)
- Problems v Solutions
- Hypotheses v Diagnostic techniques
- Tasks v Resources

The elements within the matrix can contain:
- Symbols (ticks, crosses, question marks)
- Colours
- Numbers
- Text

The use of frames (see knowledge models) can also be adopted, although this would typically be used for validating previously acquired knowledge rather than for eliciting knowledge from scratch. Timelines (see knowledge models) can also be used to acquire time-based knowledge. The Matrix Tool in PCPACK allows the creation of most types of matrix.

Sorting Techniques

Sorting techniques are a well-known method for capturing the way experts compare and order concepts, and can lead to the revelation of knowledge about classes, properties and priorities. The simplest form is card sorting. Here the expert is given a number of cards, each displaying the name of a concept. The expert has the task of repeatedly sorting the cards into piles such that the cards in each pile have something in common. For example, an expert in astronomy might sort cards showing the names of planets into those that are very large, those of medium size and those that are relatively small. By naming each pile, the expert gives information on the attributes and values they use to denote the properties of concepts.
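The link between card sorting and a concepts-v-properties matrix can be sketched as follows: each named pile corresponds to one value of an attribute, i.e. one column of the grid. The planet grid below is a hypothetical example echoing the astronomy illustration above; the values are illustrative only.

```python
# A minimal sketch of a concepts-v-properties matrix: rows are concepts,
# columns are attributes, cells hold values. Grouping a column's values
# recovers the piles from a card sort. Values are illustrative only.
GRID = {
    "Jupiter": {"size": "very large", "rings": "yes"},
    "Saturn":  {"size": "very large", "rings": "yes"},
    "Earth":   {"size": "medium",     "rings": "no"},
    "Mercury": {"size": "small",      "rings": "no"},
}

def column(grid, attribute):
    """Read one column of the matrix: concept -> value for an attribute."""
    return {concept: row[attribute] for concept, row in grid.items()}

# Which concepts share a value? These are the piles from a card sort.
piles = {}
for concept, value in column(GRID, "size").items():
    piles.setdefault(value, []).append(concept)
print(piles)
# → {'very large': ['Jupiter', 'Saturn'], 'medium': ['Earth'], 'small': ['Mercury']}
```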
Variants of this involve sorting objects or photographs rather than cards in domains where simple textual descriptors are not easy to use. A technique often used in conjunction with sorting techniques is triadic elicitation (aka the 'Three Card Trick'), which prompts the expert to generate new attributes. This involves asking the expert what is similar and different about three randomly chosen concepts, i.e. in what way are two of them similar and different from the other. This is a way of eliciting attributes that are not immediately and easily articulated by the expert.

Limited-Information and Constrained-Processing Tasks

Limited-information and constrained-processing tasks are techniques which limit the time and/or information available to the expert when performing tasks that would normally require a lot of time and information to perform. This provides a quick and efficient way of establishing the key tasks and information used. An interesting variant of this is the twenty-questions technique. Here the aim is for the expert to guess something that the knowledge engineer is thinking about (as in the parlour game of 'animal, vegetable and mineral'). The expert is allowed to ask questions of the knowledge engineer, who is only allowed to respond yes or no. As the expert asks each question, the knowledge engineer notes this down. The questions asked, and the order in which they are asked, give important knowledge such as key properties or categories in a prioritised order.

Diagram-Based Techniques

Diagram-based techniques include the generation and use of concept maps, state transition networks, event diagrams and process maps.
The use of these is particularly important in capturing the "what, how, when, who and why" of tasks and events. These techniques include the generation and use of network diagrams, such as concept maps, state transition networks and process maps (see types of knowledge models). As with laddering, the knowledge engineer elicits knowledge from the expert by mutual reference to a diagram on paper or computer screen. The use of concept maps has been strongly advocated as a comprehensive technique for eliciting many types of knowledge. The use of network diagrams has become a mainstream technique when acquiring knowledge to develop object-oriented software. For example, the industry-standard UML (Unified Modelling Language) makes use of concept maps (combined with frames) for object knowledge, state transition networks for dynamic modelling, and process maps for functional modelling. As with laddering, the presentation of knowledge in a network format makes validation very efficient. The ease with which people understand and relate to networks has been demonstrated by experimental evidence showing that people understand and apply knowledge more easily and readily if a concept map notation is used rather than predicate logic.

Comparison of KA Techniques

The figure below presents the various techniques described above and shows the types of knowledge they are mainly aimed at eliciting. The vertical axis on the figure represents the dimension from object knowledge to process knowledge, and the horizontal axis represents the dimension from explicit knowledge to tacit knowledge.

Typical Use of KA Techniques

How and when are the many techniques described above used in a knowledge acquisition project? To illustrate the general process, a simple method will be described. This method starts with the use of natural techniques, then moves to using more contrived techniques. It is summarised as follows.
1. Conduct an initial interview with the expert in order to (a) scope what knowledge is to be acquired, (b) determine to what purpose the knowledge is to be put, (c) gain some understanding of key terminology, and (d) build a rapport with the expert. This interview (as with all sessions with experts) is recorded on either audiotape or videotape.
2. Transcribe the initial interview and analyse the resulting protocol. Create a concept ladder of the resulting knowledge to provide a broad representation of the knowledge in the domain. Use the ladder to produce a set of questions which cover the essential issues across the domain and which serve the goals of the knowledge acquisition project.
3. Conduct a semi-structured interview with the expert using the pre-prepared questions to provide structure and focus.
4. Transcribe the semi-structured interview and analyse the resulting protocol for the knowledge types present. Typically these would be concepts, attributes, values, relationships, tasks and rules.
5. Represent these knowledge elements using the most appropriate knowledge models, e.g. ladders, grids, network diagrams, hypertext, etc. In addition, document anecdotes, illustrations and explanations in a structured manner using hypertext and template headings.
6. Use the resulting knowledge models and structured text with contrived techniques such as laddering, think-aloud problem-solving, twenty questions and repertory grid to allow the expert to modify and expand on the knowledge already captured.
7. Repeat the analysis, model building and acquisition sessions until the expert and knowledge engineer are happy that the goals of the project have been realised.
8. Validate the knowledge acquired with other experts, and make modifications where necessary.

This is a very brief coverage of what happens. It does not assume any previous knowledge has been gathered, nor that any generic knowledge can be applied. In reality, the aim would be to re-use as much previously acquired knowledge as possible.
Techniques have been developed to assist this, such as the use of ontologies and problem-solving models. These provide generic knowledge to suggest ideas to the expert, such as general classes of objects in the domain and general ways in which tasks are performed. This re-use of knowledge is the essence of making the knowledge acquisition process as efficient and effective as possible. This is an evolving process. Hence, as more knowledge is gathered and abstracted to produce generic knowledge, the whole process becomes more efficient. In practice, knowledge engineers often mix this theory-driven (top-down) approach with a data-driven (bottom-up) approach (discussed later). This provides a precedent for the aim of this thesis: to apply practices from knowledge engineering to the realm of personal knowledge.

Knowledge Management

Knowledge Management is a strategy, framework or system designed to help organisations create, capture, analyse, apply and reuse knowledge to achieve competitive advantage. A key aspect of Knowledge Management is that knowledge within an organisation is treated as a key asset. A simple phrase that encapsulates a core aspect of Knowledge Management is "getting the right knowledge to the right people at the right time in the right format".

Knowledge Management Methods

Knowledge Management methods can be categorised into two main groups:
- Those that move knowledge around the organisation
- Those that help create new knowledge

Methods that help move knowledge include:
- Face-to-face communication methods, e.g. peer assist, lessons learnt reviews, knowledge fairs
- Computer-based communication methods, e.g. email, Lotus Notes, communities of practice
- Storage-and-retrieval using computer systems, e.g. intranets, knowledge books
- Knowledge-Based Systems, e.g. expert systems

Knowledge Engineering

Knowledge engineering is a field within artificial intelligence that develops knowledge-based systems.
Such systems are computer programs that contain large amounts of knowledge, rules and reasoning mechanisms to provide solutions to real-world problems. A major form of knowledge-based system is an expert system: one designed to emulate the reasoning processes of an expert practitioner (i.e. one having performed in a professional role for very many years). Typical examples of expert systems include diagnosis of bacterial infections, advice on mineral exploration and assessment of electronic circuit designs.

Importance of Knowledge Acquisition

The early years of knowledge engineering were dogged by problems. Knowledge engineers found that acquiring enough high-quality knowledge to build a robust and useful system was a very long and expensive activity. As such, knowledge acquisition was identified as the bottleneck in building an expert system. This led to knowledge acquisition becoming a major research field within knowledge engineering. The aim of knowledge acquisition is to develop methods and tools that make the arduous task of capturing and validating an expert's knowledge as efficient and effective as possible. Experts tend to be important and busy people; hence it is vital that the methods used minimise the time each expert spends off the job taking part in knowledge acquisition sessions.

Knowledge Engineering Principles

Since the mid-1980s, knowledge engineers have developed a number of principles, methods and tools that have considerably improved the process of knowledge acquisition. Some of the key principles are summarised as follows:
- Knowledge engineers acknowledge that there are different types of knowledge, and that the right approach and technique should be used for the knowledge required.
- Knowledge engineers acknowledge that there are different types of experts and expertise, such that methods should be chosen appropriately.
- Knowledge engineers recognise that there are different ways of representing knowledge, which can aid the acquisition, validation and re-use of knowledge.
- Knowledge engineers recognise that there are different ways of using knowledge, so that the acquisition process can be guided by the project aims.
- Knowledge engineers use structured methods to increase the efficiency of the acquisition process.

Knowledge Engineering Methodologies

Epistemics is involved in three methodologies to support the development of knowledge systems: CommonKADS, SPEDE and MOKA.

CommonKADS

CommonKADS is the methodology that is most commonly followed at Epistemics when developing knowledge engineering systems. CommonKADS is a complete methodological framework for the development of a knowledge-based system (KBS). It supports most aspects of a KBS development project, such as:
- Project management
- Organisational analysis (including problem/opportunity identification)
- Knowledge acquisition (including initial project scoping)
- Knowledge analysis and modelling
- Capture of user requirements
- Analysis of system integration issues
- Knowledge system design

Perspectives

CommonKADS describes KBS development from two perspectives:
- Result perspective: a set of models of different aspects of the KBS and its environment, which are continuously improved during a project life-cycle.
- Project management perspective: a risk-driven generic spiral life-cycle model that can be configured into a process adapted to the particular project.

SPEDE

The SPEDE methodology is a combination of principles, techniques and tools taken from Knowledge Engineering and adapted for use in Knowledge Management. It provides an effective means to capture, validate and communicate vital knowledge to provide business benefit. The SPEDE methodology was developed under the guidance of Rolls-Royce plc and involved staff from Epistemics acting as consultants. Early versions of PCPACK v4 were tested and developed on a number of SPEDE projects.
With assistance from Epistemics, Rolls-Royce has run over 100 SPEDE projects, involving the training of over 150 employees.

Structure and Deliverables

SPEDE has been specifically developed to act as a training course for novice knowledge engineers or those seconded to a knowledge management activity. SPEDE projects typically involve one week of intensive training followed by 2-3 months of scoping, knowledge acquisition and delivery phases. The main deliverable of most SPEDE projects is an intranet website. However, previous projects have delivered quality procedures, process improvement information and expert systems. Projects using the SPEDE methodology follow a set of procedures coordinated by experienced staff. All projects have a coach who manages the activities of one or more knowledge engineers on a daily basis.

Gates

All SPEDE projects must pass through a series of gates. These are meetings held at various stages throughout the project to act as a "go/no go" into the next phase of the project. Each gate comprises various criteria to ensure the project is on track to meet the objectives and to identify any problems, hazards and actions. There are five gates: project launch review, scoping review, technical review, delivery review and post-delivery review.

MOKA: Methodology and tools Oriented to Knowledge-Based Engineering Applications

MOKA is a methodology for developing Knowledge-Based Engineering applications, i.e. systems that support design engineers. It is particularly aimed at capturing and applying knowledge of the design of complex mechanical products within the aeronautical and automotive industries. Whilst huge benefits can be gained by the use of Knowledge-Based Engineering (KBE) technology, the lack of a recognised methodology has resulted in a significant risk when developing and maintaining KBE applications. MOKA aims to provide such a methodology, one that:
- Reduces the lead times and associated costs of developing KBE applications by 20-25%.
- Provides a consistent way of developing and maintaining KBE applications.
- Will form the basis of an international standard.
- Makes use of a software tool to support the use of the methodology.

Need for MOKA

Companies have to manage and reuse engineering knowledge to improve business processes, to reduce the time taken to find new solutions, to get designs right first time and to retain best practices. The aim of MOKA is to provide a methodology to capture and formalise engineering knowledge so that it can be reused, for example within KBE applications. Development and maintenance of knowledge-intensive software applications is a complex and potentially expensive activity. The number of Knowledge-Based Engineering (KBE) systems used in the aeronautical and automotive industries has increased in recent years. Experience has shown that long-term risk can be reduced by employing a systematic methodology that covers the development and maintenance of such systems. The ESPRIT-IV funded project called MOKA (No. 25418) is intended to satisfy this need by providing both a methodology and a supporting software tool, both of which are independent of any KBE platform.

MOKA Analysis and Modelling

MOKA identifies two models to be used in the KBE application development lifecycle:
- Informal Model: a structured, natural-language representation of engineering knowledge using pre-defined forms.
- Formal Model: a graphical, object-oriented representation of engineering knowledge at one level of abstraction above application code.

Within each of these models, various knowledge representations are used to help capture, analyse and structure the knowledge required for KBE applications.
Within the informal model, the main knowledge objects are:
- Entities
  - Structural Entities (the components of the product being designed)
  - Functional Entities (the functions of the product and its sub-components)
- Constraints (the design requirements of the product and its sub-components)
- Activities (the tasks performed during the design process)
- Rules (decision points in the design process that affect what tasks to perform)
- Illustrations (examples that illustrate aspects of the product and design)

MOKA Tool

PCPACK can be used to satisfy the requirements for a supporting software tool for the MOKA methodology. It supports the capture, analysis, modelling and publishing of design knowledge using a MOKA framework.

Background

MOKA (Methodology and tools Oriented to Knowledge-Based Engineering Applications) was an ESPRIT-funded project that started in January 1998 and consisted of the following partners: Aerospatiale Matra (prime), British Aerospace, Daimler-Chrysler, PSA Peugeot Citroen, Knowledge Technologies International, Decan and Coventry University. A MOKA Interest Group continues to meet and develop the methodology.

Knowledge Modelling

An important aspect of knowledge acquisition is the use of knowledge modelling as a way of structuring projects, acquiring and validating knowledge, and storing knowledge for future use. Knowledge models are structured representations of knowledge using symbols to represent pieces of knowledge and relationships between them. Knowledge models include:
- Symbolic character-based languages, such as logic
- Diagrammatic representations, such as networks and ladders
- Tabular representations, such as matrices
- Structured text, such as hypertext

Uses of Knowledge Models

The generation and modification of a knowledge model is an essential aspect of knowledge acquisition, as the model helps to clarify the language being used and quickly convey information for validation and modification where necessary.
Thus, the use of knowledge models is of great benefit during:
- knowledge elicitation (from an expert)
- validation (with the same expert)
- cross-validation (with another expert)
- knowledge publication
- maintenance and updating of the knowledge system or publication

Most forms of knowledge models are composed of primitive elements called knowledge objects.

Knowledge Models

The field of Artificial Intelligence may not have produced fully intelligent machines, but one of its major achievements is the development of a range of ways of representing knowledge. A thorough understanding of different knowledge representations is a vital part (arguably the vital part) of Artificial Intelligence, since the ease of solving a problem is almost completely determined by the way the problem is conceptualised and represented. The same is true for the task of communicating knowledge. A well-chosen analogy or diagram can make all the difference when trying to communicate a difficult idea to someone, especially a non-expert in the field. Knowledge engineers make use of a number of ways of representing knowledge when acquiring knowledge from experts. These are usually referred to as knowledge models. Three important types of knowledge models are:
- Ladders: Ladders are hierarchical (tree-like) diagrams. Some important types of ladders are the concept ladder, composition ladder, decision ladder and attribute ladder. Ladders can be created and edited using the Ladder Tool in PCPACK.
- Network Diagrams: Network diagrams show nodes connected by arrows. Depending on the type of network diagram, the nodes might represent any type of concept, attribute, value or task, and the arrows between the nodes any type of relationship. Examples of network diagrams include concept maps, process maps and state transition networks. Network diagrams can be created and edited using the Diagram Tool in PCPACK.
- Tables and Grids: Tabular representations make use of tables or grids.
Important types are forms, frames, timelines and matrices/grids. Matrices can be created and edited using the Matrix Tool in PCPACK. Descriptions and examples of the important types of knowledge models are shown below.

Concept Ladder

A concept ladder shows classes of concepts and their sub-types. All relationships in the ladder are the is a relationship, e.g. car is a vehicle. A concept ladder is more commonly known as a taxonomy and is vital to representing knowledge in almost all domains. An example of a concept ladder is shown below.

Composition Ladder

A composition ladder shows the way a knowledge object is composed of its constituent parts. All relationships in the ladder are the has part or part of relationship, e.g. wheel is part of car. A composition ladder is a useful way of understanding complex entities such as machines, organisations and documents. An example of a composition ladder is shown below.

Decision Ladder

A decision ladder shows the alternative courses of action for a particular decision. It also shows the pros and cons for each course of action, and possibly the assumptions behind each pro and con. A decision ladder is a useful way of representing detailed process knowledge. An example of a decision ladder is shown below.

Attribute Ladder

An attribute ladder shows attributes and values. All the adjectival values relevant to an attribute are shown as sub-nodes, but numerical values are not usually shown. For example, the attribute colour would have as sub-nodes those colours appropriate in the domain as values, e.g. red, blue, green. An attribute ladder is a useful way of representing knowledge of all the properties that can be associated with concepts in a domain. An example of an attribute ladder is shown below.

Process Ladder

This ladder shows processes (tasks, activities) and the sub-processes (sub-tasks, sub-activities) of which they are composed. All relationships are the part of relationship, e.g.
boil the kettle is part of make the tea. A process ladder is a useful way of representing process knowledge. An example of a process ladder is shown below.

Concept Map

A concept map is a type of diagram that shows knowledge objects as nodes and the relationships between them as links (usually labelled arrows). Any types of concepts and relationships can be used. The concept map is very similar to the semantic network used in cognitive psychology. An example of a concept map is shown below.

Process Map

Another important type of network diagram is the process map. This type of diagram shows the inputs, outputs, resources, roles and decisions associated with each process or task in a domain. The process map is an excellent way of representing information on how and when processes, tasks and activities are performed. An example of a process map is shown below.

State Transition Network

Another important type of network diagram is the state transition network. This type of diagram comprises two elements: (1) nodes that represent the states that a concept can be in, and (2) arrows between the nodes showing all the events and processes/tasks that can cause transitions from one state to another. An example of a state transition network is shown below.

Frames

Frames are a way of representing knowledge in which each concept in a domain is described by a group of attributes and values using a matrix representation. The left-hand column represents the attributes associated with the concept and the right-hand column represents the appropriate values. When the concept is a class, typical (default) values are entered in the right-hand column. An example of a frame is shown in the table below for the concept Novel.

Timeline

A timeline is a type of tabular representation that shows time along the horizontal axis and such things as processes, tasks or project phases along the vertical axis. It is very useful for representing time-based process or role knowledge.
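The state transition network described above can be represented as a lookup table from (state, event) pairs to next states. A minimal sketch, using a hypothetical kettle domain (the states and events are illustrative only):

```python
# A minimal sketch of a state transition network: states as nodes, and
# events that cause transitions as labelled arrows. The kettle states
# and events are a hypothetical example.
TRANSITIONS = {
    # (state, event) -> next state
    ("empty", "fill"):         "full",
    ("full", "switch on"):     "boiling",
    ("boiling", "switch off"): "full",
}

def run(start, events):
    """Follow a sequence of events through the network."""
    state = start
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

assert run("empty", ["fill", "switch on"]) == "boiling"
```

A sequence of events not in the table raises a KeyError, which is one way the network makes gaps in the elicited knowledge visible for validation.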
Matrix A matrix (aka grid) is a type of tabular representation that comprises a 2-dimensional grid with filled-in grid cells. One example is a problem-solution matrix that shows the problems that can arise in a particular part of a domain as the rows in the matrix and possible solutions as the columns. Ticks, crosses or comments in the matrix cells indicate which solution is applicable to which problem. Another important type of matrix used by knowledge engineers is a focus grid, described later in this chapter. Examples of different forms of matrix are shown on the page describing the PCPACK Matrix Tool. Forms A more recent form of knowledge model is the use of hypertext and web pages. Here relationships between concepts, or other types of knowledge, are represented by hyperlinks. This affords the use of structured text by making use of templates, i.e. generic headings. Different templates can be created for different knowledge types. For example, the template for a task would include such headings as description, goal, inputs, outputs, resources and typical problems. Hypertext pages can be created and edited using the Annotation Tool in PCPACK. Knowledge Objects Philosophers have been thinking about knowledge for thousands of years. Part of their endeavours has been the identification of various types of knowledge and classification systems. These typologies have been adopted by knowledge engineers when analysing texts and constructing knowledge models. Declarative and Procedural Knowledge One well-known distinction is between declarative knowledge (knowledge of facts) and procedural knowledge (knowledge of how to do things), or what has been called "knowing that" and "knowing how". Within knowledge engineering, these two types are often referred to as object knowledge and process or task knowledge. 
Tacit and Explicit Knowledge Another well-known classification of knowledge is that of tacit knowledge (cannot be articulated easily) and explicit knowledge (can be articulated easily). This is particularly important for knowledge engineers, as special techniques have to be used with an expert to try to elicit tacit knowledge, which is the hardest and often the most valuable knowledge to acquire. Generic and Specific Knowledge A further way of classifying knowledge is to what extent it is generic (applies across many situations) or specific (applies to one or a few situations). Developing ways in which specific knowledge can be made more generic, and generic knowledge can be made more specific, has been a major effort in knowledge engineering. Knowledge Objects The field of logic has also inspired important knowledge types, notably concepts, attributes, values, rules and relationships. When analysing a piece of text, such as a transcript, so that knowledge models can be created, knowledge engineers try to identify low-level knowledge objects. Brief definitions of some of the most important of these are as follows. Concepts Concepts are the things that constitute a domain, e.g. physical objects, ideas, people and organisations. Each concept is described by its relationships to other concepts in the domain (e.g. in a hierarchy) and by its attributes and values. From a grammatical perspective, concepts are usually equivalent to nouns. Instances An instance is an instantiated class. For example, "my car" is an instance of the concept "car". Instances only have the attributes of their class (including inherited attributes). They may override any or all of the default values. For example, the "my car" attribute "maximum speed" may be 90mph, overriding the default of 100mph for all cars. Processes (Tasks, Activities) Processes (aka tasks, activities) are sets of actions performed to satisfy a goal or set of objectives. 
Some examples are: build the house, design the engine, plan the project. Processes are described using other knowledge objects, such as inputs, outputs, resources, roles and decision points.

Attributes and Values
Attributes and values describe the properties of other knowledge objects. Attributes are the generic properties, qualities or features belonging to a class of concepts, e.g. weight, cost, age and ability. Values are the specific qualities of a concept, such as its actual weight or age. Values are associated with a particular attribute and can be numerical (e.g. 120 kg, 6 years old) or categorical (e.g. heavy, young). From a grammatical perspective, values are equivalent to adjectives.

Rules
Rules are statements of the form "IF... THEN...". Some examples are: IF the temperature in the room is hot THEN open the window or switch the fan on. IF the rate of compression of the engine is low THEN increase the oil flow.

Relationships (Relations)
Relationships represent the way knowledge objects (such as concepts and tasks) are related to one another. Important examples include 'is a' to show classification, 'part of' to show composition, and those used in various knowledge models such as a process map or state transition network. Relationships are often represented as arrows on diagrams. From a grammatical perspective, relationships are usually equivalent to verbs.

Knowledge and System Validation
Part of the ongoing process is testing. You have to ensure that two things are right: the knowledge and the system. The knowledge you get from an expert may not be right: you may translate it incorrectly, or they may give it to you incorrectly. You can check with the expert (ask them, interview them), and you can compare with other experts. Of course you need to validate the system too.
You need to check with experts and users.

Derived Knowledge
In rule based systems, derived knowledge is knowledge that the user neither puts directly into the system nor receives from the system, but that the system uses. Pragmatically, a WM item that is on the then side of one rule and the if side of another is derived. Why is derived knowledge important? It shows that the system is doing more reasoning; it shows that it is a deeper system. What is an example of derived knowledge? In noughts and crosses, knowledge that the opponent has two in a row is derived from the board. In the second lab, given 'if A then B' and 'if B then C', B is derived knowledge. If you're doing a RBS for your coursework, you'll need derived knowledge to get a first or upper second, and it probably would help for a lower second. What would be an example of derived knowledge in your domains? In-class work: write a rule that determines that X's have two in a row. Write a rule that uses that fact.

Rule Based System Architecture
This is the runtime architecture. The user starts the system and interacts with it via the user interface. This is the command prompt in Clips and the session window in Kappa PC. The inference engine is the part of the program that actually does stuff: it runs the system, that is, it looks at working memory, sees which rules fire, and applies them. You'll have the same inference engine for each rule base. The knowledge base is the rules and the working memory. The rules remain the same for different runs; WM changes for each run and during the run.

Inference Engine
The Inference Engine compares the rules to working memory. It picks a supported rule and fires it. There can be more than one supported rule, and this is resolved by a conflict resolution strategy. For example, if the rules are:
o if (X is green) and (X is a fruit) then (X is a Watermelon)
o if (X is red) and (X is a fruit) then (X is an Apple)
and working memory says X1 is green, X2 is red, X1 is a fruit.
What rule is applied? What happens? A rule can be supported more than once. A fact can be used to support more than one rule. Undo the rule and add the working memory item X2 is a fruit. What rule is applied? Now what happens?

Knowledge Base
The Knowledge Base consists of rules and working memory (WM). Rules are if-then statements. On the if side you have conditional expressions, e.g. (X is green). You can have variables in here; in this case X is a variable. On the then side you usually have assignments, that is, you set or modify working memory items. In Clips you set values by (assert (fact)). You compare by stating the fact in the if part or by using (test (func params)), e.g. (test (> ?a 3)). The if part is conjoined by and by default, and you can use the and, or, and not functions. Variables carry across expressions, e.g.: If (favorite-colour ?Val) then (assert (pick Colour ?Val)).

Rules vs. Structured Programming
Structured Programming is like C, C++ and Java (pretty much any language you've heard of). Structured Programming has loops and functions. Are Rule Based Systems the same as a bunch of if-then-else statements? No. Execution vs. branching: in C it is a fixed line of statements, whereas in a Rule Based System each rule can apply at each cycle. A Rule Based System is essentially a big mutually exclusive if-then-else statement inside a loop. How do you decide which one to apply?

Rules
Rules are also a form of KR. (For that matter, programs are dynamic forms of KR.) Rules enable you to derive more facts. Rules are more dynamic than WM. They are of course of the form if X then Y. This is similar to a formal logic.

Logic
There are lots of forms of logic. Logics are systems that are developed to try to quantify what knowledge is. That's why Aristotle came up with a system 2500 years ago. The logics we are going to talk about are relatively simple and involve the concept of Truth.
For example, London is a City is True, and Chris is British is False. (For the pedantic, this really is a simplification of the world.)

Semantic Nets
Semantic Nets were invented by Quillian in 1969. Quillian was a psychologist (at UMich) and was trying to define the structure of human knowledge. Semantic Nets are about relations between concepts. Semantic Nets use a graph structure, so that concepts are nodes in the graph. The concepts are connected by arcs which tell the relationship between the concepts. The major idea is that: the meaning of a concept comes from its relationship to other concepts, and that the information is stored by interconnecting nodes with labelled arcs. Why use this data structure? It enables attribute values to be retrieved quickly: assertions are indexed by the entities, and binary predicates are indexed by first argument, e.g. team(Mike-Hall, Cardiff). Properties of relations are easy to describe. It allows ease of reasoning, as it embraces aspects of object-oriented programming. Semantic nets are a weak 'slot and filler' structure, so called because: a slot is an attribute-value pair in its simplest form; a filler is a value that a slot can take -- this could be a numeric or string (or any data type) value, or a pointer to another slot. A weak slot and filler structure does not consider the content of the representation.

Representation in a Semantic Net
The physical attributes of a person can be represented as in Fig. 9.
Fig. 9 A Semantic Network
These values can also be represented in logic as: isa(person, mammal), instance(Mike-Hall, person), team(Mike-Hall, Cardiff). We have already seen how conventional predicates such as lecturer(dave) can be written as instance(dave, lecturer). Recall that isa and instance represent inheritance and are popular in many knowledge representation schemes. But we have a problem: how can we have predicates with more than two places in semantic nets? E.g.
score(Cardiff, Llanelli, 23-6). Solution: create new nodes to represent new objects either contained or alluded to in the knowledge (game and fixture in the current example), then relate information to those nodes and fill up slots (Fig. 10).
Fig. 10 A Semantic Network for an n-Place Predicate
As a more complex example consider the sentence: John gave Mary the book. Here we have several aspects of an event.
Fig. 11 A Semantic Network for a Sentence

Inference in a Semantic Net
The basic inference mechanism is to follow links between nodes. There are two methods to do this. Intersection search -- the notion that spreading activation out of two nodes and finding their intersection finds relationships among objects. This is achieved by assigning a special tag to each visited node. It has many advantages, including entity-based organisation and fast parallel implementation; however, very structured questions need highly structured networks. Inheritance -- the isa and instance representation provide a mechanism to implement this. Inheritance also provides a means of dealing with default reasoning. E.g. we could represent: Emus are birds. Typically birds fly and have wings. Emus run. in the following semantic net:
Fig. 12 A Semantic Network for Default Reasoning
In making certain inferences we will also need to distinguish between the kind of link that defines a new entity and holds its value, and the other kind of link that relates two existing entities. Consider the example shown, where the height of two people is depicted and we also wish to compare them. We need extra nodes for the concept as well as its value.
Fig. 13 Two heights
Special procedures are needed to process these nodes, but without this distinction the analysis would be very limited.
Fig. 14 Comparison of two heights

Extending Semantic Nets
Here we will consider some extensions to semantic nets that overcome a few problems (see Exercises) or extend their expression of knowledge.
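The inheritance inference described above can be sketched in a few lines of Python. The net below encodes the emu example; the link names, property names and the instance "edna" are illustrative assumptions, not any particular tool's syntax:

```python
# A sketch of inheritance with default reasoning over a semantic net.
# Each node has an isa or instance link plus local property values;
# lookup walks up the links, so the most specific value found wins.
net = {
    "bird": {"locomotion": "fly", "has": "wings"},
    "emu":  {"isa": "bird", "locomotion": "run"},  # overrides the default
    "edna": {"instance": "emu"},                   # a hypothetical emu
}

def lookup(node, prop):
    """Follow instance/isa links upward until a value for prop is found."""
    while node is not None:
        frame = net[node]
        if prop in frame:
            return frame[prop]
        node = frame.get("instance") or frame.get("isa")
    return None

print(lookup("edna", "locomotion"))  # 'run': the bird default is overridden
print(lookup("edna", "has"))         # 'wings', inherited from bird
```

Because the search stops at the first node that supplies a value, emus run even though birds typically fly, which is exactly the default reasoning behaviour of Fig. 12.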
Partitioned Networks
Partitioned semantic networks allow for: propositions to be made without commitment to truth; expressions to be quantified. The basic idea: break the network into spaces which consist of groups of nodes and arcs, and regard each space as a node. Consider the following: Andrew believes that the earth is flat. We can encode the proposition 'the earth is flat' in a space and within it have nodes and arcs that represent the fact (Fig. 15). We can then have nodes and arcs to link this space to the rest of the network to represent Andrew's belief.
Fig. 15 Partitioned network
Now consider the quantified expression: Every parent loves their child. To represent this we: create a general statement, GS, as a special class; make node g an instance of GS. Every element will have at least 2 attributes: a form that states which relation is being asserted, and one or more forall (∀) or exists (∃) connections -- these represent the quantified variables in such statements, e.g. x, y in parent(x) : child(y) loves(x,y). Here we have to construct two spaces, one for each of x, y. NOTE: We can express variables as existentially quantified variables and express the event of love having an agent p and receiver b for every parent p, which could simplify the network (see Exercises). Also, if we change the sentence to 'Every parent loves a child' then the node of the object being acted on (the child) lies outside the form of the general statement. Thus it is not viewed as an existentially quantified variable whose value may depend on the agent. (See Exercises and the Rich and Knight book for examples of this.) So we could construct a partitioned network as in Fig. 16.
Fig. 16 Partitioned network

Production Rules
Production rules are one of the most popular and widely used knowledge representation languages. Early expert systems used production rules as their main knowledge representation language.
For example, MYCIN, which is also considered one of the first research works in medical informatics, has production rules as its knowledge representation language. A production rule system consists of three components: working memory, rule base and interpreter. The working memory contains the information that the system has gained about the problem thus far. The rule base contains information that applies to all the problems that the system may be asked to solve. The interpreter solves the control problem, i.e., it decides which rule to execute on each select-execute cycle. Production rules as a knowledge representation language have the following advantages: naturalness of expression, modularity, and restricted syntax. Disadvantages of production rules as a knowledge representation language include: inefficiency and lower expressiveness.

Frames
Frames can also be regarded as an extension to semantic nets. Indeed, it is not clear where the distinction between a semantic net and a frame ends. Semantic nets were initially used to represent labelled connections between objects. As tasks became more complex, the representation needed to be more structured, and the more structured the system, the more beneficial it becomes to use frames. A frame is a collection of attributes or slots and associated values that describe some real-world entity. Frames on their own are not particularly helpful, but frame systems are a powerful way of encoding information to support reasoning. Set theory provides a good basis for understanding frame systems. Each frame represents: a class (set), or an instance (an element of a class).
Consider the example first discussed in Semantic Nets (Section 6.2.1):

Person
  isa: Mammal
  Cardinality:

Adult-Male
  isa: Person
  Cardinality:

Rugby-Player
  isa: Adult-Male
  Cardinality:
  Height:
  Weight:
  Position:
  Team:
  Team-Colours:

Back
  isa: Rugby-Player
  Cardinality:
  Tries:

Mike-Hall
  instance: Back
  Height: 6-0
  Position: Centre
  Team: Cardiff-RFC
  Team-Colours: Black/Blue

Rugby-Team
  isa: Team
  Cardinality:
  Team-size: 15
  Coach:

Figure: A simple frame system

Here the frames Person, Adult-Male, Rugby-Player, Back and Rugby-Team are all classes, and the frames Mike-Hall and Cardiff-RFC are instances.

Note
The isa relation is in fact the subset relation, and the instance relation is in fact set membership. The isa attribute possesses a transitivity property. This implies: Mike-Hall is a Back, and a Back is a Rugby-Player, who in turn is an Adult-Male and also a Person. Both isa and instance have inverses, called subclasses and all-instances. There are attributes that are associated with the class or set itself, such as cardinality, and on the other hand there are attributes that are possessed by each member of the class or set.

DISTINCTION BETWEEN SETS AND INSTANCES
It is important that this distinction is clearly understood. Cardiff-RFC can be thought of as a set of players or as an instance of a Rugby-Team. If Cardiff-RFC were a class then its instances would be players; it could not be a subclass of Rugby-Team, otherwise its elements would be members of Rugby-Team, which we do not want. Instead we make it a subclass of Rugby-Player, and this allows the players to inherit the correct properties; to let Cardiff-RFC inherit information about teams, we make it an instance of Rugby-Team. BUT there is a problem here: a class is a set and its elements have properties. We wish to use inheritance to bestow values on its members. But there are properties that the set or class itself has, such as the manager of a team.
This is why we need to view Cardiff-RFC as a subset of one class (players) and an instance of another (teams). We seem to have a CATCH 22. Solution: MetaClasses. A metaclass is a special class whose elements are themselves classes. Now consider our rugby teams as:
Figure: A Metaclass frame system
The basic metaclass is Class, and this allows us to define classes which are instances of other classes, and (thus) inherit properties from those classes. Inheritance of default values occurs when one element or class is an instance of a class.

Slots as Objects
How can we represent the following properties in frames? Attributes such as weight and age being attached and making sense. Constraints on values, such as age being less than a hundred. Default values. Rules for inheritance of values, such as children inheriting parents' names. Rules for computing values. Many values for a slot. A slot is a relation that maps from its domain of classes to its range of values. A relation is a set of ordered pairs, so one relation can be a subset of another. Since a slot is a set, the set of all slots can be represented by a metaclass, called Slot, say. Consider the following:

SLOT
  isa: Class
  instance: Class
  domain:
  range:
  range-constraint:
  definition:
  default:
  to-compute:
  single-valued:

Coach
  instance: SLOT
  domain: Rugby-Team
  range: Person
  range-constraint: (experience x.manager)
  default:
  single-valued: TRUE

Colour
  instance: SLOT
  domain: Physical-Object
  range: Colour-Set
  single-valued: FALSE

Team-Colours
  instance: SLOT
  isa: Colour
  domain: team-player
  range: Colour-Set
  range-constraint: not Pink
  single-valued: FALSE

Position
  instance: SLOT
  domain: Rugby-Player
  range: { Back, Forward, Reserve }
  to-compute: x.position
  single-valued: TRUE

NOTE the following: instances of SLOT are slots. Associated with SLOT are attributes that each instance will inherit. Each slot has a domain and range.
The range is split into two parts: one is the class of the elements, and the other is a constraint, which is a logical expression; if absent, it is taken to be true. If there is a value for default then it must be passed on unless an instance has its own value. The to-compute attribute involves a procedure to compute its value, e.g. in Position, where we use the dot notation to assign values to the slot of a frame. Transfers-through lists other slots from which values can be derived by inheritance.

Interpreting frames
A frame system interpreter must be capable of the following in order to exploit the frame slot representation: consistency checking -- when a slot value is added to a frame, checking the domain attribute and that the value is legal using the range and range constraints; propagation of definition values along isa and instance links; inheritance of default values along isa and instance links; computation of the value of a slot as needed; checking that only the correct number of values is computed. See Exercises for further instances of drawing inferences etc. from frames.

And-Or Graphs
These are useful for certain problems where the solution involves decomposing the problem into smaller problems, which we then solve. Here the alternatives often involve branches where some or all must be satisfied before we can progress. For example, if I want to learn to play a Frank Zappa guitar solo I could (Fig. 2.2.1): transcribe it from the CD, OR buy the "Frank Zappa Guitar Book" AND read it from there. Note the use of arcs to indicate that one or more nodes must all be satisfied before the parent node is achieved. To find solutions using an And-Or graph, the best first algorithm is used as a basis, with a modification to handle the set of nodes linked by the AND factor. On its own, best first search is inadequate: it cannot deal with the AND arcs well.

AO* Algorithm
1. Initialise the graph to the start node.
2. Traverse the graph following the current path, accumulating nodes that have not yet been expanded or solved.
3.
Pick any of these nodes and expand it; if it has no successors assign FUTILITY as its value, otherwise calculate f' for each of the successors.
4. If f' is 0 then mark the node as SOLVED.
5. Change the value of f' for the newly created node to reflect its successors by back propagation.
6. Wherever possible use the most promising routes, and if a node is marked as SOLVED then mark the parent node as SOLVED.
7. If the starting node is SOLVED or its value is greater than FUTILITY, stop; else repeat from 2.

[2] What is fuzzy logic?
Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth -- truth values between "completely true" and "completely false". It was introduced by Dr. Lotfi Zadeh of UC/Berkeley in the 1960s as a means to model the uncertainty of natural language. (Note: Lotfi, not Lofti, is the correct spelling of his name.)

[3] Where is fuzzy logic used? Date: 15-APR-93
Fuzzy logic is used directly in very few applications. The Sony PalmTop apparently uses a fuzzy logic decision tree algorithm to perform handwritten (well, computer lightpen) Kanji character recognition. Most applications of fuzzy logic use it as the underlying logic system for fuzzy expert systems.

What do ya mean fuzzy??!!
Before illustrating the mechanisms which make fuzzy logic machines work, it is important to realize what fuzzy logic actually is. Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth -- truth values between "completely true" and "completely false". As its name suggests, it is the logic underlying modes of reasoning which are approximate rather than exact. The importance of fuzzy logic derives from the fact that most modes of human reasoning, and especially common sense reasoning, are approximate in nature. The essential characteristics of fuzzy logic, as founded by Lotfi Zadeh, are as follows.
In fuzzy logic, exact reasoning is viewed as a limiting case of approximate reasoning. In fuzzy logic everything is a matter of degree. Any logical system can be fuzzified. In fuzzy logic, knowledge is interpreted as a collection of elastic or, equivalently, fuzzy constraints on a collection of variables. Inference is viewed as a process of propagation of elastic constraints. The third statement, hence, defines Boolean logic as a subset of fuzzy logic.

Fuzzy Sets
Fuzzy Set Theory was formalised by Professor Lotfi Zadeh at the University of California in 1965. What Zadeh proposed is very much a paradigm shift that first gained acceptance in the Far East, and its successful application has ensured its adoption around the world. A paradigm is a set of rules and regulations which defines boundaries and tells us what to do to be successful in solving problems within these boundaries. For example, the use of transistors instead of vacuum tubes is a paradigm shift; likewise, the development of Fuzzy Set Theory from conventional bivalent set theory is a paradigm shift. Bivalent set theory can be somewhat limiting if we wish to describe a 'humanistic' problem mathematically. For example, Fig. 1 below illustrates bivalent sets to characterise the temperature of a room. The most obvious limiting feature of bivalent sets, which can be seen clearly from the diagram, is that they are mutually exclusive: it is not possible to have membership of more than one set (opinion would widely vary as to whether 50 degrees Fahrenheit is 'cold' or 'cool', hence the expert knowledge we need to define our system is mathematically at odds with the humanistic world). Clearly, it is not accurate to define a transition from a quantity such as 'warm' to 'hot' by the application of one degree Fahrenheit of heat. In the real world a smooth (unnoticeable) drift from warm to hot would occur. This natural phenomenon can be described more accurately by Fuzzy Set Theory.
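The contrast between a bivalent set and a fuzzy set for a temperature term can be sketched as follows. The breakpoints (60 to 80 degrees Fahrenheit for 'warm') are assumptions chosen purely for illustration:

```python
# A sketch contrasting a bivalent (crisp) set with a fuzzy set for the
# term 'warm'. All breakpoints are illustrative assumptions.

def warm_crisp(t):
    """Bivalent membership: a temperature is either warm or it is not."""
    return 1.0 if 65 <= t <= 75 else 0.0

def warm_fuzzy(t):
    """Triangular fuzzy membership: warmth fades in and out gradually."""
    if t <= 60 or t >= 80:
        return 0.0
    if t <= 70:
        return (t - 60) / 10.0   # ramp up from 60 to 70
    return (80 - t) / 10.0       # ramp down from 70 to 80

for t in (64, 70, 76):
    print(t, warm_crisp(t), warm_fuzzy(t))
```

At 64 degrees the crisp set answers a flat "not warm" (0.0), while the fuzzy set gives a partial membership of 0.4, modelling the smooth drift described above.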
Fig. 2 below shows how fuzzy sets quantifying the same information can describe this natural drift. The whole concept can be illustrated with this example. Let's talk about people and "youthness". In this case the set S (the universe of discourse) is the set of people. A fuzzy subset YOUNG is also defined, which answers the question "to what degree is person x young?" To each person in the universe of discourse, we have to assign a degree of membership in the fuzzy subset YOUNG. The easiest way to do this is with a membership function based on the person's age:

young(x) = 1,                  if age(x) <= 20
           (30 - age(x))/10,   if 20 < age(x) <= 30
           0,                  if age(x) > 30

A graph of this looks like: [graph not reproduced]. Given this definition, here are some example values:

Person      Age   Degree of youth
Johan        10   1.00
Edwin        21   0.90
Parthiban    25   0.50
Arosha       26   0.40
Chin Wei     28   0.20
Rajkumar     83   0.00

So given this definition, we'd say that the degree of truth of the statement "Parthiban is YOUNG" is 0.50. Note: membership functions almost never have as simple a shape as age(x). They will at least tend to be triangles pointing up, and they can be much more complex than that. Furthermore, membership functions have so far been discussed as if they were always based on a single criterion, but this isn't always the case, although it is the most common case. One could, for example, want to have the membership function for YOUNG depend on both a person's age and their height (Arosha's short for his age). This is perfectly legitimate, and occasionally used in practice. It's referred to as a two-dimensional membership function. It's also possible to have even more criteria, or to have the membership function depend on elements from two completely different universes of discourse.

Fuzzy Set Operations
Union
The membership function of the Union of two fuzzy sets A and B, with membership functions μA(x) and μB(x) respectively, is defined as the maximum of the two individual membership functions: μA∪B(x) = max(μA(x), μB(x)).
This is called the maximum criterion. The Union operation in fuzzy set theory is the equivalent of the OR operation in Boolean algebra.

Intersection
The membership function of the Intersection of two fuzzy sets A and B, with membership functions μA(x) and μB(x) respectively, is defined as the minimum of the two individual membership functions: μA∩B(x) = min(μA(x), μB(x)). This is called the minimum criterion. The Intersection operation in fuzzy set theory is the equivalent of the AND operation in Boolean algebra.

Complement
The membership function of the Complement of a fuzzy set A with membership function μA(x) is defined as the negation of the specified membership function: μĀ(x) = 1 - μA(x). This is called the negation criterion. The Complement operation in fuzzy set theory is the equivalent of the NOT operation in Boolean algebra. The following rules, which are common in classical set theory, also apply to fuzzy set theory: De Morgan's laws, associativity, commutativity and distributivity.

Glossary
Universe of Discourse: the range of all possible values for an input to a fuzzy system.
Fuzzy Set: any set that allows its members to have different grades of membership (membership function) in the interval [0,1].
Support: the Support of a fuzzy set F is the crisp set of all points in the Universe of Discourse U such that the membership function of F is non-zero.
Crossover point: the Crossover point of a fuzzy set is the element in U at which its membership function is 0.5.
Fuzzy Singleton: a fuzzy set whose support is a single point in U with a membership function of one.

Fuzzy Rules
Human beings make decisions based on rules. Although we may not be aware of it, all the decisions we make are based on computer-like if-then statements. If the weather is fine, then we may decide to go out. If the forecast says the weather will be bad today, but fine tomorrow, then we make a decision not to go today, and postpone it till tomorrow.
Rules associate ideas and relate one event to another. Fuzzy machines, which always tend to mimic the behaviour of man, work the same way. However, the decision and the means of choosing that decision are replaced by fuzzy sets, and the rules are replaced by fuzzy rules. Fuzzy rules also operate using a series of if-then statements, for instance: if x is A then y is B, where A and B are fuzzy sets on the universes X and Y. Fuzzy rules define fuzzy patches, which is the key idea in fuzzy logic. A machine is made smarter using a concept designed by Bart Kosko called the Fuzzy Approximation Theorem (FAT). The FAT states that a finite number of patches can cover a curve, as seen in the figure below. If the patches are large, then the rules are sloppy. If the patches are small, then the rules are fine.

Fuzzy Patches
In a fuzzy system this simply means that all our rules can be seen as patches, and the input and output of the machine can be associated together using these patches. Graphically, if the rule patches shrink, our fuzzy subset triangles get narrower. Simple enough? Yes, because even novices can build control systems that beat the best math models of control theory. Naturally, it is a math-free system.

Fuzzy Control
Fuzzy control, which directly uses fuzzy rules, is the most important application of fuzzy theory. Using a procedure originated by Ebrahim Mamdani in the 1970s, three steps are taken to create a fuzzy controlled machine:
1) Fuzzification (using membership functions to graphically describe a situation)
2) Rule evaluation (application of fuzzy rules)
3) Defuzzification (obtaining the crisp or actual results)
As a simple example of how fuzzy controls are constructed, consider the following classic situation: the inverted pendulum. Here, the problem is to balance a pole on a mobile platform that can move in only two directions, to the left or to the right.
The angle between the platform and the pendulum, and the angular velocity of this angle, are chosen as the inputs of the system. The speed of the platform is hence chosen as the corresponding output.

Step 1
First of all, the different levels of output (high speed, low speed, etc.) of the platform are defined by specifying the membership functions for the fuzzy sets. The graph of the function is shown below. Similarly, the different angles between the platform and the pendulum, and the angular velocities of specific angles, are also defined. Note: For simplicity, it is assumed that all membership functions are spread equally; hence no actual scale is included in the graphs.

Step 2
The next step is to define the fuzzy rules. The fuzzy rules are merely a series of if-then statements as mentioned above. These statements are usually derived by an expert to achieve optimum results. Some examples of these rules are:
i) If angle is zero and angular velocity is zero, then speed is also zero.
ii) If angle is zero and angular velocity is low, then the speed shall be low.
The full set of rules is summarised in the table below, with the angle across the top, the angular velocity down the side, and the speed in each cell. The dashes are for conditions which have no rules associated with them; this is done to simplify the situation.

Angular velocity \ Angle | negative high | negative low  | zero          | positive low | positive high
negative high            | ---           | ---           | negative high | ---          | ---
negative low             | ---           | ---           | negative low  | zero         | ---
zero                     | negative high | negative low  | zero          | positive low | positive high
positive low             | ---           | zero          | positive low  | ---          | ---
positive high            | ---           | ---           | positive high | ---          | ---

An application of these rules is shown using specific values for the angle and angular velocity. The values used for this example are 0.75 and 0.25 for the zero and positive-low angle sets, and 0.4 and 0.6 for the zero and negative-low angular velocity sets. These points are on the graphs below.
Consider the rule "if angle is zero and angular velocity is zero, then speed is zero". The actual value belongs to the fuzzy set zero to a degree of 0.75 for "angle" and 0.4 for "angular velocity". Since this is an AND operation, the minimum criterion is used, and the fuzzy set zero of the variable "speed" is cut at 0.4 and shaded up to that level. This is illustrated in the figure below. Similarly, the minimum criterion is used for the other three rules. The following figures show the result patches yielded by the rules "if angle is zero and angular velocity is negative low, then speed is negative low", "if angle is positive low and angular velocity is zero, then speed is positive low" and "if angle is positive low and angular velocity is negative low, then speed is zero". The four results overlap and are combined into the following figure.

Step 3
The result of the fuzzy controller so far is a fuzzy set (of speed). In order to choose an appropriate representative value as the final output (a crisp value), defuzzification must be done. There are numerous defuzzification methods, but the most commonly used one is the center of gravity of the set, as shown below.

Fuzzy logic has rapidly become one of the most successful of today's technologies for developing sophisticated control systems. The reason is very simple. Fuzzy logic addresses such applications perfectly, as it resembles human decision making with an ability to generate precise solutions from certain or approximate information. It fills an important gap in engineering design methods left vacant by purely mathematical approaches (e.g. linear control design) and purely logic-based approaches (e.g. expert systems) in system design. While other approaches require accurate equations to model real-world behaviors, fuzzy design can accommodate the ambiguities of real-world human language and logic.
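The three-step Mamdani procedure walked through above can be sketched end to end. This is a minimal illustration, not the full pendulum controller: the triangular set shapes, their centers, and the reduced rule subset are assumptions chosen to mirror the worked example (only the negative-low, zero and positive-low sets are modelled):

```python
# Minimal Mamdani controller sketch: fuzzify -> evaluate rules (min) ->
# aggregate (max) -> defuzzify (centroid). Set shapes and centers are
# illustrative choices, not taken from the figures.
def tri(x, l, c, r):
    """Triangular membership: 0 outside (l, r), peak of 1 at center c."""
    if x <= l or x >= r:
        return 0.0
    return (x - l) / (c - l) if x <= c else (r - x) / (r - c)

# Symmetric triangular sets on [-2, 2]: negative low, zero, positive low.
sets = {"NL": (-2, -1, 0), "Z": (-1, 0, 1), "PL": (0, 1, 2)}
# The four rules of the worked example: (angle set, velocity set) -> speed set.
rules = [("Z", "Z", "Z"), ("Z", "NL", "NL"), ("PL", "Z", "PL"), ("PL", "NL", "Z")]

def control(angle, velocity, n=401):
    # Steps 1-2: fuzzify both inputs, clip each consequent at the rule's
    # firing strength (min), and aggregate all rule outputs with max.
    def agg(x):
        return max(min(tri(angle, *sets[a]), tri(velocity, *sets[v]),
                       tri(x, *sets[s])) for a, v, s in rules)
    # Step 3: centroid defuzzification over a sampled universe of discourse.
    xs = [-2 + 4 * i / (n - 1) for i in range(n)]
    den = sum(agg(x) for x in xs)
    return sum(agg(x) * x for x in xs) / den if den else 0.0
```

With angle 0.25 (zero: 0.75, positive low: 0.25) and velocity -0.6 (zero: 0.4, negative low: 0.6), the firing strengths match the example's values, and the defuzzified speed comes out negative, as the dominant negative-low output set suggests.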
It provides both an intuitive method for describing systems in human terms and automates the conversion of those system specifications into effective models.

What does it offer?
The first applications of fuzzy theory were primarily industrial, such as process control for cement kilns. However, as the technology was further embraced, fuzzy logic was used in more consumer-oriented applications. In 1987, the first fuzzy logic-controlled subway was opened in Sendai in northern Japan. Here, fuzzy-logic controllers make subway journeys more comfortable with smooth braking and acceleration. Best of all, all the driver has to do is push the start button! Fuzzy logic was also put to work in elevators to reduce waiting time. Since then, the applications of Fuzzy Logic technology have virtually exploded, affecting things we use every day. Take, for example, the fuzzy washing machine: put a load of clothes in it and press start, and the machine begins to churn, automatically choosing the best cycle. Place chili, potatoes or other food in a fuzzy microwave and push a single button, and it cooks for the right time at the proper temperature. The fuzzy car maneuvers itself by following simple verbal instructions from its driver; it can even stop itself when there is an obstacle immediately ahead, using sensors. But practically the most exciting thing about it is the simplicity involved in operating it.

CHAPTER 3 PROPOSITIONAL LOGIC
This chapter is intended for the reader who is more deeply interested in automated logic. Most readers may skip this chapter and proceed directly to Chapter 4.

3.1 Introduction
A logic is a mathematical tool for constructing and manipulating symbolic expressions. A logic is like a computer language. In this chapter propositional and predicate logic are mainly discussed. A logic consists of
1. A formal system of representation.
2. Syntax of language that describes how to make sentences.
3.
Semantics of language that describe the meaning of sentences and the relationships between them.
4. Proof theory: rules for deducing the entailment of a sentence.

3.2 Propositional Logic
Propositional logic is a representational language that makes the assumption that the world can be represented solely in terms of propositions that are true or false. Propositional logic considers each sentence as a proposition. The syntax and semantics of such a logic are given below.

3.2.1 Syntax for Propositional Logic
The syntax for propositional logic is quite simple. The symbols of propositional logic are the propositional constants True and False; propositional symbols such as P, Q, R, S; and the logical connectives ∧, ∨, →, ↔ and ¬. The following rules are used while constructing a sentence:
1. The logical constants True and False are themselves sentences.
2. Propositional symbols such as P, Q and R are themselves sentences.
3. A sentence can be formed by using the following connectives:
∧ (and): called conjunction, used in constructing a sentence like P ∧ Q.
∨ (or): called disjunction, used in constructing a sentence like P ∨ Q.
¬ (negation): called not, used in constructing a sentence like ¬P.
→ (implication): used in constructing sentences like P → Q, which is equivalent to ¬P ∨ Q.
↔ (equivalence): used in constructing a sentence like P ↔ Q, which means P is equivalent to Q.

Example 3.1
An example of the syntax of propositional logic is given by the following two sentences: "The road is closed. If the road is closed, then the traffic is blocked." These sentences may be represented in propositional logic. "The road is closed" is represented by a proposition P. "The traffic is blocked" is represented by a proposition Q. Then the second sentence, "If the road is closed, then the traffic is blocked", is represented by P → Q.

3.2.2 Semantics in Propositional Logic
The semantics of a sentence gives the meaning of the sentence.
The terms used for the semantics of a language are given below.

Valid: A sentence is valid if it is true for all interpretations, e.g. P ∧ Q → Q is a valid sentence (note that P ∧ Q → Q is equivalent to ¬(P ∧ Q) ∨ Q), as can be seen from the truth table below.

P | Q | P ∧ Q → Q
F | F | T
F | T | T
T | F | T
T | T | T

Model: An interpretation of a formula or sentence under which the formula is true is called a model of that formula.
Unsatisfiable: A formula is said to be unsatisfiable if it is false for every interpretation.
Satisfiable: A formula is said to be satisfiable if it is true for some interpretation, i.e., there exists a model.
Entailment: If a proposition f is true under every interpretation for which the knowledge base (KB) is true, then we say KB entails f (KB ⊨ f).

3.3 Rules of Inference
There are mainly two inference procedures:
1. The model theory method.
2. The proof theory method.

3.3.1 Model Theory Method
The model theory method uses the entailment procedure. To find out whether a new sentence or proposition follows from the sentences given in the knowledge base, we check whether KB ⊨ f.

Example 3.2
For example, if P → Q and P are the facts in the knowledge base, then to check whether KB ⊨ Q, the truth table for every interpretation of the propositions can be written as

P | Q | P → Q
F | F | T
F | T | T
T | F | F
T | T | T

Then, check whether Q is true whenever every sentence in the knowledge base is true. If so, then KB entails Q. In the example above, Q is entailed by KB because it is true when both P and P → Q are true (as shown in the last row of the table above). The model theory method is exponential in the number of propositions, because we need to write 2^n interpretations for n propositions. The proof theory method is used because it is much faster and more practical than the model theory method.

3.3.2 Proof Theory Method
The inference rules of the proof theory method are as follows:

Rule             | Sentences in KB            | Sentence that is inferred
Modus Ponens     | P, P → Q                   | Q
And Elimination  | A1 ∧ A2 ∧ ... ∧ An         | A1, A2, ..., An
And Introduction | A1, A2, ..., An            | A1 ∧ A2 ∧ ... ∧ An
Or Introduction  | A1                         | A1 ∨ A2 ∨ ... ∨ An
Double Negation  | ¬¬A                        | A
Unit Resolution  | A1 ∨ A2, ¬A2               | A1
Resolution       | A1 ∨ A2, ¬A2 ∨ A3          | A1 ∨ A3

Example 3.3
Let's take an example of the proof theory method. If P and P → Q are the facts in the knowledge base (same as Example 3.2), then using modus ponens Q is also true.

Example 3.4
If A ∨ B and ¬B ∨ C are given, then we can prove A ∨ C using the resolution rule (given in the table above).

3.4 Automated Theorem Proving in Propositional Logic
In automated theorem proving, all the sentences in the knowledge base are represented in the form P1 ∧ P2 ∧ ... ∧ Pn → Q, where the Pi and Q are propositional variables. The facts in the knowledge base are represented without antecedents; they are written as plain propositional symbols.

3.5 Goal Reduction Method
Goal reduction is a method for proving a conjunction of propositional variables from the given rules. The conjuncts of the conjunction are the goals. Suppose Q is the goal and the rule is P1 ∧ P2 ∧ ... ∧ Pn → Q; then proving P1 ∧ P2 ∧ ... ∧ Pn proves Q.

3.6 Algorithm for Theorem Proving
Let n be the number of propositions in the antecedents of the rules in the knowledge base that remain to be proved, and let P1, P2, ..., Pk, ..., Pn, Q1, Q2, ..., Qm be the propositions.
1. If n = 0, then stop, as the theorem is proved.
2. If n > 0, then choose some k such that 1 ≤ k ≤ n.
3. If Pk is a fact in the knowledge base, then (recursively) try to prove P1 ∧ P2 ∧ ... ∧ Pk-1 ∧ Pk+1 ∧ ... ∧ Pn.
4. If Pk is not a fact in the knowledge base, then try to find some rule in the knowledge base of the form Q1 ∧ Q2 ∧ ... ∧ Qm → Pk such that Q1 ∧ Q2 ∧ ... ∧ Qm ∧ P1 ∧ P2 ∧ ... ∧ Pk-1 ∧ Pk+1 ∧ ... ∧ Pn can be (recursively) proved.
5. If Pk is not a fact and an appropriate rule cannot be found in step 4, then stop and report failure.

Example 3.5
The above algorithm can be explained in a simple manner.
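The proof procedure can also be sketched as a small backward-chaining prover (a sketch only; the data structures are illustrative, and the knowledge base below is the one used in Example 3.5):

```python
# Backward-chaining proof sketch for rules of the form P1 ^ ... ^ Pn -> Q.
# `rules` pairs each consequent with the list of its antecedents; `facts`
# is a set of proposition symbols given without antecedents.
def prove(goal, rules, facts, seen=frozenset()):
    """Return True if `goal` follows from the facts and rules."""
    if goal in facts:                      # step 3: the goal is a fact
        return True
    if goal in seen:                       # guard against circular rules
        return False
    for consequent, antecedents in rules:  # step 4: a rule concluding goal
        if consequent == goal and all(
                prove(p, rules, facts, seen | {goal}) for p in antecedents):
            return True
    return False                           # step 5: no fact, no usable rule

# Example 3.5 knowledge base: Rule 1: P ^ Q -> R; Rule 2: S ^ T -> Q.
rules = [("R", ["P", "Q"]), ("Q", ["S", "T"])]
facts = {"P", "S", "T"}
```

Calling prove("R", rules, facts) reproduces the trace described in the example: R reduces to P and Q, P is a fact, and Q in turn reduces to the facts S and T.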
Assume the following two rules and three facts are given in the knowledge base.
Rule 1: P ∧ Q → R
Rule 2: S ∧ T → Q
Facts: P, S, T
If R is to be proved, then the automated theorem proving procedure works as follows:
o As R is not a fact in the KB, it is necessary to use the first rule (P ∧ Q → R), i.e., step 4 above. To prove Rule 1, P and Q need to be proved. Since P is already a fact in the knowledge base, only Q needs to be proved, i.e., step 3 above.
o Because Q is not a fact in the knowledge base, the algorithm needs to prove S ∧ T → Q according to Rule 2, using step 4 again. As S and T are facts in the knowledge base, Q is proved via step 3.
o Since both P and Q are now proved, the algorithm stops, as the theorem is proved in this example pursuant to step 1.

3.7 References
1. T. Dean, J. Allen, Y. Aloimonos, Artificial Intelligence: Theory and Practice, Benjamin/Cummings Publishing Company, 1995.
2. S. J. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 1995.

CHAPTER 4 FUZZIFICATION TECHNIQUE
4.1 Introduction
Fuzzification is the process of changing a real scalar value into a fuzzy value. This is achieved with different types of fuzzifiers. There are generally three types of fuzzifiers used for the fuzzification process:
1. the singleton fuzzifier,
2. the Gaussian fuzzifier, and
3. the trapezoidal or triangular fuzzifier.

4.2 Trapezoidal / Triangular Fuzzifiers
For simplicity of discussion, only the triangular and trapezoidal fuzzifiers are presented here. Fuzzification of a real-valued variable is done with intuition, experience and analysis of the set of rules and conditions associated with the input data variables. There is no fixed set of procedures for fuzzification.

Example 4.1
Consider a class with 10 students of different heights in the range of 5 feet to 6 feet 2 inches.
Intuition is used to fuzzify this scalar quantity into the fuzzy or linguistic variables tall, short and medium height. The membership functions associated with this scalar quantity, as defined by intuition, are given in Equations 4.1, 4.2 and 4.3, where h is the height, and the subscript s denotes short, m denotes medium and t denotes tall. A graphical representation of the membership functions of height is shown in Figure 4.1 (membership functions for student height). Table 4.1 gives the heights of the 10 students with the membership value of each fuzzy variable, i.e., tall, short and medium, for each student.

Let's consider a specific student: Edward. From Equations 4.1, 4.2 and 4.3, or Table 4.1, the membership value of each fuzzy set for Edward is determined as µs(5.4) = 0, µm(5.4) = 0.5 and µt(5.4) = 0. It can be inferred from this result that Edward is medium by 50%, short by 0% and tall by 0%.

Table 4.1 Membership functions of the height
Student | Name   | Height (feet) | µshort | µmedium | µtall
1       | John   | 5.4           | 0      | 0.5     | 0
2       | Cathy  | 5.8           | 0      | 0       | 1
3       | Lisa   | 6.0           | 0      | 0       | 1
4       | Ajay   | 5.0           | 1      | 0       | 0
5       | Ram    | 5.7           | 0      | 0       | 0.5
6       | Edward | 5.4           | 0      | 0.5     | 0
7       | Peter  | 5.2           | 1      | 0       | 0
8       | Victor | 5.0           | 1      | 0       | 0
9       | Chris  | 6.2           | 0      | 0       | 1
10      | Sam    | 5.9           | 0      | 0       | 1

In general, the triangular membership function can be specified from the formula in Equation 4.4, where L and R are the left and right bounds, respectively, and C is the center of the symmetric triangle, as shown in Figure 4.2a. Likewise, the trapezoidal membership function may be expressed as in Equation 4.5, where L and U are the lower and upper bounds, respectively, C is the center, and W is the width of the top side of the symmetric trapezoid, as shown in Figure 4.2b (Figure 4.2 Common membership functions: a. triangular, b. trapezoidal).

Example 4.2
To demonstrate the implementation of these two functions, an Excel spreadsheet (Fuzzification.xls) has been created for the reader to download and experiment with.
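For readers without Excel, the two fuzzifiers can also be sketched directly. The formulas below are our reading of Equations 4.4 and 4.5 from the parameter descriptions (symmetric triangle with bounds L, R and center C; symmetric trapezoid with bounds L, U, center C and top width W), and the boundary values in the usage line are illustrative, not the ones from the figures:

```python
# Triangular and trapezoidal fuzzifier sketches (parameters as in the text).
def triangular(x, L, C, R):
    """Symmetric triangular fuzzifier: 1 at center C, falling to 0 at L and R."""
    half_width = (R - L) / 2.0
    return max(0.0, 1.0 - abs(x - C) / half_width)

def trapezoidal(x, L, C, U, W):
    """Symmetric trapezoidal fuzzifier: flat top of width W around center C,
    falling linearly to 0 at the lower and upper bounds L and U."""
    slope_len = (U - L - W) / 2.0
    return min(1.0, max(0.0, 1.0 - (abs(x - C) - W / 2.0) / slope_len))
```

For instance, a hypothetical medium-height set with L = 5.0, C = 5.5 and R = 6.0 gives triangular(5.4, 5.0, 5.5, 6.0) = 0.8.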
The Excel file embeds the formulae for computing membership values (µ) for both the triangular and trapezoidal fuzzifiers. The user may define the boundaries of the functions and enter hypothetical x values, from which Excel will calculate the corresponding membership value.

4.3 Remarks
The fuzzification of the input variables should be realistic. Experience and proper procedures should be followed while designing a large fuzzy system in order to obtain realistic and accurate output. Wrong fuzzification of the input variable(s) might cause instability and error in the system.

4.4 Reference
1. L. H. Tsoukalas, R. E. Uhrig, Fuzzy and Neural Approaches in Engineering, John Wiley & Sons, 1993.

CHAPTER 5 FUZZY RULES AND IMPLICATION
5.1 Introduction
Fuzzy systems are built to replace the human expert with a machine using the logic a human would use to perform the task. Suppose we ask someone how hot it is today. He may tell us that it is hot, moderately hot or cold; he cannot tell us the exact temperature. Unlike classical logic, which can only interpret crisp sets such as hot or cold, fuzzy logic has the capability to interpret natural language. Thus, fuzzy logic can make human-like interpretations and is a very useful tool in artificial intelligence, machine learning and automation. Fuzzy logic operates on the basis of rules, which are expressed in the form of if-then constructs, also known as Horn clauses. The concept of the linguistic variable was introduced to process natural language. The linguistic variable discussed later in Example 5.1 is temperature; a linguistic variable can take verbal values such as hot, moderately hot or cold. The statements "temperature is hot", "temperature is cold" and "temperature is moderate" are known as fuzzy propositions.

5.2 Fuzzy Proposition
A fuzzy proposition can be an atomic or compound sentence. For example, "Temperature is hot" is an atomic fuzzy proposition.
"Temperature is hot and humidity is low" is a compound fuzzy proposition. Compound fuzzy propositions are expressed with fuzzy connectives such as and, or and complement.

5.3 Syntax for If-Then Rules
The fuzzy rules are written as
If <fuzzy proposition> then <fuzzy proposition>
The fuzzy propositions can be atomic or compound.

5.4 Method of Implication
The if-then rules can be interpreted in classical logic by the implication operator, as was also discussed in Chapter 3. Suppose there is a statement such as "If a then b"; classical logic represents this by a → b. The truth table for this rule is given as

a | b | a → b
F | F | T
F | T | T
T | F | F
T | T | T

The implication operator can also be written as
a → b ≡ ¬a ∨ b (5.1)
The above equivalence can easily be shown with the above truth table. As discussed earlier, the if-then rules for fuzzy logic can be written as If <fuzzy proposition> then <fuzzy proposition>. The propositional variables a and b are replaced by fuzzy propositions, and the implication can be expressed using fuzzy union, fuzzy intersection and fuzzy complement. There are many fuzzy implications; only the two most important fuzzy implications are discussed here.

5.5 Mamdani Min Implication
Mamdani proposed a fuzzy implication rule for fuzzy control in 1977. It is a simplified version of the Zadeh implication operator (Zadeh, 1973). The Mamdani implication operator is given as
µA→B(x, y) = min[µA(x), µB(y)] (5.2)
The above rule is clarified in the example below.

Example 5.1
Let temperature be the fuzzy variable, and let one of the rules be "If the temperature is hot or the temperature is moderately hot, then the ice cream parlor is crowded." Here the propositions are "temperature is hot or temperature is moderately hot" and "ice cream parlor is crowded". The linguistic variables are "temperature" and "ice cream parlor". The linguistic values for temperature are hot, moderately hot and cold.
The membership functions for temperature in the universe of discourse, U, are given in Equations 5.3, 5.4 and 5.5. The ice cream parlor variable can take the linguistic values crowded and unfilled; its membership functions in the universe of discourse, V, are given in Equations 5.6 and 5.7, where the subscript c represents crowded and nc represents not crowded (unfilled). The plots of the membership functions of temperature and of the number of customers in the ice cream parlor are shown in Figures 5.1 and 5.2, respectively.

To apply the Mamdani implication rule to the above example, the following procedure is used:
1. The or connective is replaced with the max (union) operator.
2. The maximum of the two membership functions is evaluated for the antecedent part of the fuzzy rule.
3. The Mamdani implication operator (i.e., the min operator) is applied between the resulting antecedent membership function and the consequent membership function.

Suppose the temperature is 75 °F. The or connective is replaced with the union operator. Using the union operator of Equation 2.7 yields
µTemp(75 °F) = max[µHot(75 °F), µModeratelyHot(75 °F)] = max[0.167, 0.833] = 0.833
where µHot(75 °F) and µModeratelyHot(75 °F) are computed using Equations 5.3 and 5.4, respectively. This is shown graphically in Figure 5.3 (application of the union operator for the given rule).

The Mamdani implication operator of Equation 5.2 is now applied to the rule antecedent and the rule consequent, which in this example is a crowded ice cream parlor:
min[µTemp(75 °F), µc(Number of Customers)] = min[0.833, µc(Number of Customers)]
where µc(Number of Customers) is shown in Figure 5.2.
The dotted line in Figure 5.4 is the output after the Mamdani implication rule is applied; that is, at each point along the horizontal axis the minimum of the crowded membership function and the value of µTemp(75 °F) is taken (Figure 5.4: membership function of customers after the Mamdani implication rule).

5.6 Larsen Product Implication
The Larsen product implication is given by
µA→B(x, y) = µA(x) · µB(y) (5.8)
It uses the arithmetic product between the two membership functions in the universes of discourse U and V.

Example 5.2
Here the Larsen implication rule is applied to the data of Example 5.1. The overall antecedent of the fuzzy rule is the maximum of the two fuzzy propositions of the antecedent, as calculated in Example 5.1. The Larsen product implication rule can be expressed as
µTemp(75 °F) · µc(Number of Customers) = 0.833 · µc(Number of Customers)
where µc(Number of Customers) is shown in Figure 5.2. The dotted line in Figure 5.5 is the output after the Larsen implication rule is applied; that is, at each point along the horizontal axis the product of the temperature antecedent and the crowded ice cream parlor membership function is computed (Figure 5.5: Larsen implication rule applied to Example 5.2).

The Larsen implication is computationally more difficult than the Mamdani implication rule. In Examples 5.1 and 5.2, only one rule has been considered for the sake of simplicity. There may be many rules in one fuzzy system; in that case, either the Mamdani or the Larsen implication rule is applied to each rule. The resultant outputs from the implication rules are then aggregated and defuzzified to obtain the result. Aggregation and defuzzification are discussed in Chapter 6.

5.7 Remarks
The choice of fuzzy implication rule is very important while designing a fuzzy control system. Only the two most commonly used implication operators have been considered here for simplicity.
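The difference between the two operators of Examples 5.1 and 5.2 can be seen numerically. In the sketch below, the antecedent degree 0.833 comes from Example 5.1, but the sampled consequent membership values are illustrative, not read off Figure 5.2:

```python
# Compare Mamdani min (Eq. 5.2) and Larsen product (Eq. 5.8) for the
# antecedent degree computed in Example 5.1. The mu_c samples are
# illustrative values of the "crowded" membership function.
antecedent = 0.833
mu_c = [0.0, 0.25, 0.5, 0.75, 1.0]

mamdani = [min(antecedent, m) for m in mu_c]  # clips the consequent at 0.833
larsen = [antecedent * m for m in mu_c]       # rescales the consequent by 0.833
```

The comparison shows the characteristic shapes: Mamdani min flattens (clips) the top of the consequent set at the firing strength, while the Larsen product shrinks the whole set proportionally, preserving its shape.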
There are, however, many more implication operators that can be applied while designing a fuzzy control system.

CHAPTER 6 DEFUZZIFICATION TECHNIQUE
6.1 Introduction
Fuzzy logic is a rule-based system written in the form of Horn clauses (i.e., if-then rules). These rules are stored in the knowledge base of the system. The input to the fuzzy system is a scalar value that is fuzzified. The set of rules is applied to the fuzzified input. The output of each rule is fuzzy. These fuzzy outputs need to be converted into a scalar output quantity so that the nature of the action to be performed can be determined by the system. The process of converting the fuzzy output is called defuzzification. Before the output is defuzzified, all the fuzzy outputs of the system are aggregated with a union operator. The union is the maximum of the given membership functions and can be expressed as
µ(x) = max[µ1(x), µ2(x), ..., µn(x)] (6.1)
There are many defuzzification techniques, but primarily only three of them are in common use. These defuzzification techniques are discussed below in detail.

6.2 Maximum Defuzzification Technique
This method gives the output with the highest membership value. This defuzzification technique is very fast but is only accurate for a peaked output. The technique is given by the algebraic expression
µ(x*) ≥ µ(x) for all x ∈ X (6.2)
where x* is the defuzzified value. This is shown graphically in Figure 6.1 (max-membership defuzzification method).

6.3 Centroid Defuzzification Technique
This method is also known as center of gravity or center of area defuzzification. This technique was developed by Sugeno in 1985. It is the most commonly used technique and is very accurate. The centroid defuzzification technique can be expressed as
x* = ∫ µ(x) x dx / ∫ µ(x) dx (6.3)
where x* is the defuzzified output, µ(x) is the aggregated membership function and x is the output variable. The only disadvantage of this method is that it is computationally difficult for complex membership functions. This method is illustrated in Example 8.3.
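The maximum and centroid techniques just described can be sketched over a sampled aggregated output set (the sample values below are illustrative, and the integrals of Equation 6.3 become discrete sums):

```python
# Maximum (Eq. 6.2) and centroid (Eq. 6.3) defuzzification over a sampled
# aggregated membership function; the (x, mu) samples are illustrative.
def max_defuzzify(xs, mus):
    """Return the output value x* with the highest membership value."""
    return xs[mus.index(max(mus))]

def centroid_defuzzify(xs, mus):
    """Discrete center of gravity: sum(mu * x) / sum(mu)."""
    return sum(m * x for m, x in zip(mus, xs)) / sum(mus)

xs = [0, 1, 2, 3, 4]                 # sampled output variable
mus = [0.0, 0.5, 1.0, 0.5, 0.0]      # aggregated membership at each sample
```

For this symmetric peaked set, both techniques agree on x* = 2; for skewed or multi-peaked aggregates the two results diverge, which is why the centroid is usually preferred.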
6.4 Weighted Average Defuzzification Technique
In this method the output is obtained as the weighted average of the outputs of the set of rules stored in the knowledge base of the system. The weighted average defuzzification technique can be expressed as in Equation 6.4, where x* is the defuzzified output, mi is the membership of the output of each rule, and wi is the weight associated with each rule. This method is computationally faster and easier and gives fairly accurate results. This defuzzification technique is applied in the fuzzy application of signal validation in Example 7.3 and in the fuzzy application on power.

6.5 Reference
1. T. J. Ross, Fuzzy Logic with Engineering Applications, McGraw-Hill, 1995.

What Is A Neural Network?
"...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs." ANNs are processing devices (algorithms or actual hardware) that are loosely modeled after the neuronal structure of the mammalian cerebral cortex, but on much smaller scales. A large ANN might have hundreds or thousands of processor units, whereas a mammalian brain has billions of neurons, with a corresponding increase in the magnitude of their overall interaction and emergent behavior. Although ANN researchers are generally not concerned with whether their networks accurately resemble biological systems, some are. For example, researchers have accurately simulated the function of the retina and modeled the eye rather well.

The Basics of Neural Networks
Neural networks are typically organized in layers. Layers are made up of a number of interconnected 'nodes' which contain an 'activation function'. Patterns are presented to the network via the 'input layer', which communicates to one or more 'hidden layers' where the actual processing is done via a system of weighted 'connections'.
The hidden layers then link to an 'output layer' where the answer is output, as shown in the graphic below. Most ANNs contain some form of 'learning rule' which modifies the weights of the connections according to the input patterns that the network is presented with. In a sense, ANNs learn by example, as do their biological counterparts; a child learns to recognize dogs from examples of dogs.

Although there are many different kinds of learning rules used by neural networks, this demonstration is concerned with only one: the delta rule. The delta rule is often utilized by the most common class of ANNs, called 'backpropagational neural networks' (BPNNs). Backpropagation is an abbreviation for the backwards propagation of error. With the delta rule, as with other types of backpropagation, 'learning' is a supervised process that occurs with each cycle or 'epoch' (i.e. each time the network is presented with a new input pattern) through a forward activation flow of outputs and the backwards error propagation of weight adjustments. More simply, when a neural network is initially presented with a pattern, it makes a random 'guess' as to what it might be. It then sees how far its answer was from the actual one and makes an appropriate adjustment to its connection weights. More graphically, the process looks something like the figure below. Note also that within each hidden layer node is a sigmoidal activation function which polarizes network activity and helps it to stabilize.

Backpropagation performs a gradient descent within the solution's vector space towards a 'global minimum' along the steepest vector of the error surface. The global minimum is the theoretical solution with the lowest possible error. The error surface itself is a hyperparaboloid but is seldom 'smooth', as is depicted in the graphic below.
Indeed, in most problems, the solution space is quite irregular, with numerous 'pits' and 'hills' which may cause the network to settle down in a 'local minimum' which is not the best overall solution. Since the nature of the error space cannot be known a priori, neural network analysis often requires a large number of individual runs to determine the best solution. Most learning rules have built-in mathematical terms to assist in this process, which control the 'speed' (beta coefficient) and the 'momentum' of the learning. The speed of learning is actually the rate of convergence between the current solution and the global minimum. Momentum helps the network to overcome obstacles (local minima) in the error surface and settle down at or near the global minimum.

Once a neural network is 'trained' to a satisfactory level, it may be used as an analytical tool on other data. To do this, the user no longer specifies any training runs and instead allows the network to work in forward propagation mode only. New inputs are presented to the input layer, where they filter into and are processed by the middle layers as though training were taking place; however, at this point the output is retained and no backpropagation occurs. The output of a forward propagation run is the predicted model for the data, which can then be used for further analysis and interpretation.

It is also possible to over-train a neural network, which means that the network has been trained exactly to respond to only one type of input, much like rote memorization. If this should happen, then learning can no longer occur and the network is referred to as having been "grandmothered" in neural network jargon. In real-world applications this situation is not very useful, since one would need a separate grandmothered network for each new kind of input.

How Do Neural Networks Differ From Conventional Computing?
To better understand artificial neural computing, it is important first to know how a conventional 'serial' computer and its software process information. A serial computer has a central processor that can address an array of memory locations where data and instructions are stored. Computations are made by the processor reading an instruction, as well as any data the instruction requires, from memory addresses; the instruction is then executed and the results are saved in a specified memory location as required. In a serial system (and a standard parallel one as well) the computational steps are deterministic, sequential and logical, and the state of a given variable can be tracked from one operation to another.

In comparison, ANNs are not sequential or necessarily deterministic. There are no complex central processors; rather, there are many simple ones which generally do nothing more than take the weighted sum of their inputs from other processors. ANNs do not execute programmed instructions; they respond in parallel (either simulated or actual) to the pattern of inputs presented to them. There are also no separate memory addresses for storing data. Instead, information is contained in the overall activation 'state' of the network. 'Knowledge' is thus represented by the network itself, which is quite literally more than the sum of its individual components.

What Applications Should Neural Networks Be Used For?
Neural networks are universal approximators, and they work best if the system you are using them to model has a high tolerance to error. One would therefore not be advised to use a neural network to balance one's cheque book! However, they work very well for: capturing associations or discovering regularities within a set of patterns; problems where the volume, number of variables or diversity of the data is very great; problems where the relationships between variables are vaguely understood; or problems where the relationships are difficult to describe adequately with conventional approaches.
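The forward-activation / backward-error cycle described in the backpropagation discussion can be sketched with a single sigmoid unit trained by the delta rule. The task (logical OR), learning rate, epoch count and seed are all illustrative choices; a full BPNN adds hidden layers on the same principle:

```python
import math
import random

# A single sigmoid unit trained with the delta rule: forward pass, error,
# gradient-descent weight update. Task, rate, epochs and seed are illustrative.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=5000, rate=1.0, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(3)]  # two weights plus a bias
    for _ in range(epochs):
        for x1, x2, target in samples:
            out = sigmoid(w[0] * x1 + w[1] * x2 + w[2])  # forward pass
            delta = (target - out) * out * (1.0 - out)   # error gradient
            w[0] += rate * delta * x1                    # weight updates
            w[1] += rate * delta * x2
            w[2] += rate * delta                         # bias update
    return w

# Learn logical OR, which is linearly separable, so one unit suffices.
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
w = train(data)
```

After training, the unit's rounded outputs reproduce the OR truth table: the repeated guess-compare-adjust cycle has pushed each output to the correct side of 0.5.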
What Are Their Limitations? There are many advantages and limitations to neural network analysis, and to discuss this subject properly we would have to look at each individual type of network, which isn't necessary for this general discussion. In reference to backpropagational networks, however, there are some specific issues potential users should be aware of. Backpropagational neural networks (and many other types of networks) are in a sense the ultimate 'black boxes'. Apart from defining the general architecture of a network and perhaps initially seeding it with random numbers, the user has no other role than to feed it input, watch it train, and await the output. In fact, it has been said that with backpropagation, "you almost don't know what you're doing". Some freely available software packages (NevProp, bp, Mactivation) do allow the user to sample the network's 'progress' at regular time intervals, but the learning itself progresses on its own. The final product of this activity is a trained network that provides no equations or coefficients defining a relationship (as in regression) beyond its own internal mathematics. The network 'IS' the final equation of the relationship. Backpropagational networks also tend to be slower to train than other types of networks and sometimes require thousands of epochs. If run on a truly parallel computer system this is not really a problem, but if the BPNN is being simulated on a standard serial machine (i.e. a single SPARC, Mac or PC) training can take some time. This is because the machine's CPU must compute the function of each node and connection separately, which can be problematic in very large networks with a large amount of data. However, the speed of most current machines is such that this is typically not much of an issue. What Are Their Advantages Over Conventional Techniques?
Depending on the nature of the application and the strength of the internal data patterns, you can generally expect a network to train quite well. This applies to problems where the relationships may be quite dynamic or non-linear. ANNs provide an analytical alternative to conventional techniques, which are often limited by strict assumptions of normality, linearity, variable independence, etc. Because an ANN can capture many kinds of relationships, it allows the user to model, quickly and relatively easily, phenomena which otherwise may have been very difficult or impossible to explain.

What and why?

Neural Networks: a bottom-up attempt to model the functionality of the brain.

Two main areas of activity:
Biological
o Try to model biological neural systems
Computational
o Artificial neural networks are biologically inspired but not necessarily biologically plausible
o So may use other terms: Connectionism, Parallel Distributed Processing, Adaptive Systems Theory

ATTRACTIVE PROPERTIES OF NEURAL NETWORKS

Parallelism
Neural networks are inherently parallel, and naturally amenable to expression in a parallel notation and implementation on parallel hardware.

Capacity for Adaptation
In general, neural systems are capable of learning. Some networks have the capacity to self-organise, ensuring their stability as dynamic systems. A self-organising network can take account of a change in the problem that it is solving, or may learn to resolve the problem in a new manner.

Distributed Memory
In neural networks 'memory' corresponds to an activation map of the neurons. Memory is thus distributed over many units, giving resistance to noise. In distributed memories, such as neural networks, it is possible to start with noisy data and to recall the correct data.

Fault Tolerance
Distributed memory is also responsible for fault tolerance.
In most neural networks, if some PEs are destroyed, or their connections altered slightly, then the behaviour of the network as a whole is only slightly degraded. This characteristic of graceful degradation makes neural computing systems extremely well suited for applications where failure of control equipment means disaster.

Capacity for Generalisation
Designers of expert systems have difficulty in formulating rules which encapsulate an expert's knowledge in relation to some problem. A neural system may learn the rules simply from a set of examples. The generalisation capacity of a neural network is its capacity to give a satisfactory response for an input which is not part of the set of examples on which it was trained. The capacity for generalisation is an essential feature of a classification system. Certain aspects of generalisation behaviour are interesting because they are intuitively quite close to human generalisation.

Ease of Construction
Computer simulations of small applications can be implemented relatively quickly.

LIMITATIONS IN THE USE OF NEURAL NETWORKS

Neural systems are inherently parallel but are normally simulated on sequential machines.
o Processing time can rise quickly as the size of the problem grows - the scaling problem.
o However, a direct hardware approach would lose the flexibility offered by a software implementation.
o In consequence, neural networks have been used to address only small problems.
The performance of a network can be sensitive to the quality and type of preprocessing of the input data.
Neural networks cannot explain the results they obtain; their rules of operation are completely unknown.
Performance is measured by statistical methods, giving rise to distrust on the part of potential users.
Many of the design decisions required in developing an application are not well understood.

CHAPTER 2
FUZZY LOGIC AND CLASSICAL LOGIC

2.1 Introduction

Fuzzy logic is the comprehensive form of classical logic.
In this chapter classical logic and fuzzy logic are discussed and the distinction between them is analyzed. Fuzzy logic is a superset of classical logic, obtained with the introduction of a "degree of membership." The degree of membership allows an input to interpolate between crisp sets. The operators in both logics are similar; only their interpretation differs.

2.2 Classical Logic

Let X be the universe of discourse, and let the elements contained in X be denoted by x. Let us consider A and B to be sets which contain elements of the universe of discourse, X. The basic operators in classical set theory are union, intersection and complement.

2.3 Properties of Classical Sets

The important set operators and relations include the standard properties of classical sets: commutativity, associativity, distributivity, idempotency, and the laws of the excluded middle and contradiction.

2.4 Mapping of Classical Set to Fuzzy Set

Classical logic maps every input into a crisp set. Every element in the universe of discourse, X, either belongs to a set or does not belong to it. For example, whether an element of the universe of discourse, X, belongs to the set A or not can be represented by the function

χA(x) = 1 if x ∈ A; χA(x) = 0 if x ∉ A. (2.1)

The above function is also called the characteristic function. The output is 1 if the element, x, belongs to set A, and 0 if the element, x, does not belong to set A.

2.5 Fuzzy Sets

Unlike classical set theory, which classifies the elements into crisp sets, a fuzzy set can classify elements into a continuous set using the concept of degree of membership. The characteristic function, or membership function, not only gives 0 or 1 but can also give values between 0 and 1.

Example 2.1
Consider the outside ambient temperature. Classical set theory can only classify the temperature as hot or cold (i.e., either 1 or 0). It cannot interpret temperatures between 20 °F and 100 °F as anything other than wholly hot or wholly cold. In other words, the characteristic function of classical logic for the above example is given by

χhot(T) = 1 if T ≥ 50 °F; χhot(T) = 0 if T < 50 °F. (2.2)

The boundary 50 °F is taken because classical logic cannot interpret intermediate values.
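A short sketch of the crisp characteristic function of Eqs. (2.1) and (2.2) in Python (the ≥ convention at the 50 °F boundary is an assumption, since the original equation is not reproduced here):

```python
def crisp_hot(temperature_f):
    # characteristic function: a temperature either belongs to the
    # set "hot" (1) or it does not (0); no intermediate degrees exist
    return 1 if temperature_f >= 50 else 0

# classical logic cannot distinguish 51 F from 100 F:
print(crisp_hot(49), crisp_hot(51), crisp_hot(100))  # -> 0 1 1
```

The fuzzy membership function introduced next replaces this all-or-nothing output with a continuous degree between 0 and 1.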
On the other hand, fuzzy logic solves the above problem with a membership function rising linearly from 0 at 20 °F to 1 at 100 °F:

µhot(T) = (T − 20)/80 for 20 °F ≤ T ≤ 100 °F. (2.3)

The above membership function is shown in Table 2.1. A graph of the membership function for the fuzzy temperature variable is shown in Figure 2.1. The degree of coldness is taken as the complement of the degree of hotness.

Table 2.1 Membership function of temperature

Temperature (°F)   Degree of Hotness   Degree of Coldness
20                 0                   1
30                 0.13                0.87
40                 0.25                0.75
50                 0.375               0.625
60                 0.5                 0.5
70                 0.625               0.375
80                 0.75                0.25
90                 0.875               0.125
100                1                   0

Figure 2.1 Membership function for the degree of hotness and degree of coldness

The degree of hotness for 30 °F is 0.13 and the degree of coldness for 30 °F is 0.87. This means that 30 °F is hot to a degree of 13 percent and cold to a degree of 87 percent.

2.6 Fuzzy Set Representation

The common method of representing a fuzzy set is

A = {(x, µA(x)) | x ∈ X}, (2.4)

where x is an element of X and µA(x) is the membership function which defines the membership of the fuzzy set A in the universe of discourse, X. The term (x, µA(x)) is a singleton pair. For the temperature example described above the fuzzy set can be represented as hot = {(20, 0), (30, 0.125), (40, 0.25), (50, 0.375), (60, 0.5), (70, 0.625), (80, 0.75), (90, 0.875), (100, 1)}. In the above fuzzy set, the third element of the set hot denotes that the temperature 40 °F belongs to the set hot to a degree of 0.25. An alternative method of representing the singletons is

A = Σi µA(xi)/xi, (2.5)

where the summation denotes the union of the singletons rather than arithmetic addition. The above representation is for a discrete universe of discourse. The fuzzy set representation for a continuous membership function is given by

A = ∫X µA(x)/x. (2.6)

2.7 Fuzzy Operators

Some of the most important fuzzy logic operators are given below.

Union
The union is the maximum degree of membership of sets A and B:

µA∪B(x) = max(µA(x), µB(x)). (2.7)

Intersection
The intersection is the minimum degree of membership of sets A and B:
µA∩B(x) = min(µA(x), µB(x)). (2.8)

Complement
The complement of the membership of set A is

µĀ(x) = 1 − µA(x). (2.9)

Product of two fuzzy sets
The product of two fuzzy sets in the same universe of discourse is the new fuzzy set A·B, with a membership function that equals the product of the membership functions of A and B:

µA·B(x) = µA(x)·µB(x). (2.10)

Multiplying a fuzzy set by a crisp number
When a fuzzy set is multiplied by a crisp number a, its membership function is given by

µa·A(x) = a µA(x). (2.11)

Power of a fuzzy set
The membership function of A^α, where α is a positive number, is defined by

µA^α(x) = [µA(x)]^α. (2.12)

Concentration of a fuzzy set
The concentration of a fuzzy set over the universe of discourse X is given by

µCON(A)(x) = [µA(x)]². (2.13)

Concentrating a fuzzy set decreases its fuzziness; the membership function interpolates fewer inputs between the crisp values.

Dilation of a fuzzy set
The dilation of a fuzzy set over the universe of discourse X is given by

µDIL(A)(x) = [µA(x)]^0.5. (2.14)

Dilating a fuzzy set increases its fuzziness; the membership function interpolates more inputs between the crisp values.

Empty fuzzy set
If the fuzzy set is empty (Ø), then its membership function is zero everywhere:

µØ(x) = 0. (2.15)

Normal fuzzy set
A fuzzy set is called normal if there is at least one element x0 in the universe of discourse X where the membership function equals 1:

µA(x0) = 1. (2.16)

Equality of fuzzy sets
The fuzzy sets A and B are equal if their membership functions are equal for every x:

µA(x) = µB(x). (2.17)

Example 2.2
Let us consider two fuzzy sets A and B with membership functions (2.18). The membership functions µA(x) and µB(x) are plotted in Figure 2.2. The union, intersection, complement, concentration and dilation of these two membership functions are shown in Figures 2.3 through 2.7, respectively.

Figure 2.2 Membership functions µA(x) and µB(x) of Example 2.2.
Figure 2.3 Union of membership functions, µA∪B = max(µA, µB).
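The operator definitions (2.7) through (2.14) can be sketched pointwise on two discrete membership vectors (the sample values below are made up, since the membership functions of Example 2.2 are not reproduced here):

```python
# two illustrative membership vectors over the same discrete universe
mu_a = [0.0, 0.3, 0.7, 1.0]
mu_b = [1.0, 0.6, 0.2, 0.0]

union         = [max(a, b) for a, b in zip(mu_a, mu_b)]  # (2.7)
intersection  = [min(a, b) for a, b in zip(mu_a, mu_b)]  # (2.8)
complement_a  = [1.0 - a for a in mu_a]                  # (2.9)
product       = [a * b for a, b in zip(mu_a, mu_b)]      # (2.10)
concentration = [a ** 2 for a in mu_a]                   # (2.13)
dilation      = [a ** 0.5 for a in mu_a]                 # (2.14)

print(union)                                 # [1.0, 0.6, 0.7, 1.0]
print([round(v, 2) for v in concentration])  # [0.0, 0.09, 0.49, 1.0]
```

Note how concentration pushes intermediate memberships toward 0 (less fuzziness) while dilation pushes them toward 1 (more fuzziness), exactly as stated above.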
Figure 2.4 Intersection of membership functions, µA∩B = min(µA, µB).
Figure 2.5 Complement of membership function, µĀ = 1 − µA.
Figure 2.6 Concentration of membership function, µCON(A)(x) = [µA(x)]².
Figure 2.7 Dilation of membership function, µDIL(A)(x) = [µA(x)]^0.5.

2.8 Remark

Fuzzy logic is the comprehensive version of classical logic. An understanding of classical logic is very important for the understanding of fuzzy logic. The basic difference between classical logic and fuzzy logic is that classical logic gives an output of either 0 or 1, while fuzzy logic can give a continuous output. Fuzzy logic systems and applications are discussed in later chapters.

A Real-time Expert System Environment for on-line Decision Support Applications

RTXPS is a real-time expert system environment designed for on-line dynamic decision support, mission-critical command, control and communication tasks such as emergency management for technological and environmental hazards, including early warning for events such as floods, toxic or oil spills, tsunamis, land slides, etc.; complex control and assessment tasks, including coordination of first response, recovery, restoration and clean-up operations; and related teaching and training applications.

RTXPS can be configured to implement any checklist, questionnaire or operating-manual-based procedure or protocol. It offers context-sensitive support functions based on Artificial Intelligence technology that can handle the most demanding dynamic situations in distributed client-server environments, with several parallel action threads. RTXPS provides extensive support and assistance to the operator, and keeps complete real-time logs for quality control. Setting and querying of timers or the pending of ACTIONS provide additional features for real-time control.

RTXPS is based on a time-aware forward-chaining inference engine that processes context-sensitive Production Rules that manage the dynamic problem knowledge base and trigger ACTIONS. ACTIONS are communicated to the operator in hypertext format, and can automatically trigger a wide range of functions including data entry and display, an embedded backward-chaining expert system, and complex simulation, optimisation modeling and GIS applications. Support of extensive documentation, logging and reporting, and external communication functions such as the automatic compilation and sending of e-mail or fax messages as well as the automatic generation and update of web pages for public information access, are important features of the system.

RTXPS can also link to on-line monitoring and data acquisition systems that can provide real-time intelligence and feedback from the field; this can be used not only to update the problem context dynamically, but also for the re-calibration of dynamic forecasting models.

RTXPS uses a simple near-natural-language syntax for its Rules, ACTIONS and Descriptors, the variables that the Rules operate on. An intuitive SCRIPT language supports the efficient development of the Knowledge Base for a new application.

Application Examples

RTXPS is the core DSS component of RiskWare, a decision support system for technological risk management, designed for risk assessment, risk management, and risk training. RTXPS in an extended implementation is the basis of our CourseWare training system, developed in the A-TEAM advanced technical training system. The framework can also be used to guide users through other complex tasks such as the EIAxpert system for screening-level EIA. RTXPS has been implemented as the overall framework for SIGRIC, the Sistema di Gestione Rischio Chimico for the Provincial Authorities of Pisa in the Regione Toscana, Italy. RTXPS is also the core DSS component of HITERM, High-Performance Computing for Technological Risk Management, an Esprit HPCN project with case-study applications in Italy, Portugal, and Switzerland.
Figure captions:
Interface with HTML hypertext window and embedded GIS, action logs and real-time clock.
Various editors support the man-machine dialog and data acquisition for the inference engine.
RTXPS controls a range of dynamic simulation models that feed their results into the expert system.
Communication functions range from polling remote sensors and access to remote data bases to automatic fax messages.

© Copyright 1995-2005 by: ESS Environmental Software and Services GmbH AUSTRIA

A Real-time Expert System Environment for on-line Decision Support Applications
Technical Specifications

DESCRIPTORs

The facts (data) of RTXPS are stored in DESCRIPTORs. A value is assigned to a DESCRIPTOR either by direct editing or by starting the rule-based inference. The system then uses a set of alternative methods enumerated in the DESCRIPTOR definition to obtain or update the DESCRIPTOR value in the current context. The inference engine recursively compiles all necessary information for the input conditions of the appropriate Backward Chaining Rules, evaluates those Rules, and eventually updates the target DESCRIPTOR. The complete syntax of a DESCRIPTOR is:

DESCRIPTOR <descriptor_name>
A <alias_for_descriptor_name>
T <descriptor_type>
U <unit>
V <range> / <range> / <range> / ...
R <rule#> / <rule#> / ...
TB <table#> / <table#> / ...
F <function>
IF <interface function>
G <gis_function> <gis_overlay>
Q <question>
MODEL <model_name>
T <model_type>
I <input_descriptor> / <input_descriptor> /
O <output_descriptor> / <output_descriptor> /
ENDMODEL
ALTERNATIVE <alternative>
<alternative defs>
ENDALTERNATIVE
LAYOUT
X <window x-coordinate>
Y <window y-coordinate>
WIDTH <window width>
HEIGHT <window height>
BGCOLOR <window bgcolor>
BORDER_WIDTH <window borderwidth>
BORDER_COLOR <window bordercolor>
FORMAT <value selector format_string>
DELTA <value selector increment>
HYPER_INFO <hyperinfo path>
HYPER_X <hyperinfo x-coordinate>
HYPER_Y <hyperinfo y-coordinate>
HYPER_WIDTH <hyperinfo width>
HYPER_HEIGHT <hyperinfo height>
HYPER_TWIDTH <hyperinfo backgroundwin width>
HYPER_THEIGHT <hyperinfo backgroundwin height>
HYPER_FGCOLOR <hyperinfo foreground color>
HYPER_BGCOLOR <hyperinfo background color>
HYPER_KEYCOLOR <hyperinfo keyword color>
HYPER_HIKEYCOLOR <hyperinfo highlight color>
HYPER_SWBORDERC <hyperinfo BORDER="1" color>
ENDLAYOUT
ENDDESCRIPTOR

A simple example of a DESCRIPTOR from the reservoir expert system is retention_time:

DESCRIPTOR retention_time TS
U days
V very_small[ 0, 360] /
V small [ 360, 1080] /
V medium [1080, 1800] /
V large [1800, 3600] /
V very_large[3600, 7200] /
R 7777007 /
Q What is the average retention time, in days,
Q for the reservoir? Retention time is the theoretical
Q period the average volume of water spends in the reservoir,
Q estimated as the ratio of volume to throughflow.
ENDDESCRIPTOR

A typical use of this inference process is to assist the user in specifying scenario parameters: here the system collects circumstantial evidence to derive an informed guess where no hard data are available. Another use of the backward-chaining capabilities of the expert system is to provide a synthesis of large volumes of model-generated data. The chain of models used to simulate an accident scenario may easily generate data volumes on the order of gigabytes.
These should, however, be summarized in a few simple variables such as the number of people exposed, the level of exposure, the area contaminated, the estimated material damage, and a rough classification of the accident: these classifications are needed to trigger the appropriate responses. The flexibility to use, alternatively or conjunctively, both qualitative symbolic and quantitative numerical methods in one and the same application allows the system to be responsive to the information at hand and to the user's requirements and constraints. This combination of methods of analysis, and the integration of data bases, geographical information systems, and hypertext, makes it possible to exploit efficiently whatever information, data and expertise is available in a given problem situation.

An example of a DESCRIPTOR from the reservoir expert system with an external model (in this particular case the inflow_model) is mean_annual_inflow:

DESCRIPTOR mean_annual_inflow TS
U Mill.m3
V very_small[0,30] / small[30,150] / medium[150,3000] /
V large[3000,30000] / very_large[30000,300000] /
MODEL inflow_model
T local_wait
I hemisphere / east_west / longitude / latitude /
O mean_annual_inflow /
ENDMODEL
Q What is the long-term average mean annual inflow,
Q in million cubic meters, to the reservoir?
ENDDESCRIPTOR

A model of human problem solving recursively refines and redefines a problem as more information becomes available or certain alternatives are excluded. This responsiveness to the problem situation and the information available, and the ability to adjust as more information becomes available (that is, in a sense, to learn), is a characteristic of intelligent systems.
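As an illustration of the DESCRIPTOR mechanism described above, the V lines of the retention_time example define symbolic value ranges. A hypothetical Python sketch of how such ranges could map a numeric value to its symbolic class (the half-open interval convention and the lookup logic are assumptions, not the RTXPS implementation):

```python
# symbolic value ranges copied from the retention_time DESCRIPTOR
RANGES = [
    ("very_small", 0, 360),
    ("small", 360, 1080),
    ("medium", 1080, 1800),
    ("large", 1800, 3600),
    ("very_large", 3600, 7200),
]

def symbolic_value(days):
    # classify a numeric DESCRIPTOR value into its symbolic range
    for name, lo, hi in RANGES:
        if lo <= days < hi:
            return name
    return None  # value lies outside the defined universe

print(symbolic_value(1200))  # -> medium
```

This is the kind of qualitative-from-quantitative mapping that lets the rule base reason symbolically over values obtained from models or direct data entry.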