Document Sample

MTech(IT)                                   Subject Code : MT-22


Q.1     What are the issues in knowledge representation?

The fundamental goal of Knowledge Representation is to facilitate inferencing
(conclusions) from knowledge. The issues that arise while using KR techniques are
many. Some of these are explained below.

    Important Attributes : Any attribute of objects so basic that they occur in
     almost every problem domain ?

    Relationship among attributes: Any important relationship that exists among
     object attributes ?

    Choosing Granularity : At what level of detail should the knowledge be
     represented ?

    Set of objects : How sets of objects be represented ?

    Finding Right structure : Given a large amount of knowledge stored, how can
     relevant parts be accessed ?

Q.2     What is Expert System?

    An expert system, is an interactive computer-based decision tool that uses both
     facts and heuristics to solve difficult decision making problems, based on
     knowledge acquired from an expert.
    An expert system is a model and associated procedure that exhibits, within a
     specific domain, a degree of expertise in problem solving that is comparable to
     that of a human expert.
    An expert system compared with traditional computer : Inference engine +
     Knowledge = Expert system ( Algorithm + Data structures = Program in
     traditional computer )
    First expert system, called DENDRAL, was developed in the early 70's at
     Stanford University

Expert systems are computer applications which embody some non-algorithmic
expertise for solving certain types of problems. For example, expert systems are
used in diagnostic applications. They also play chess, make financial planning
decisions, configure computers, monitor real time systems, underwrite insurance
policies, and perform many services which previously required human expertise.

Expert System Components And Human Interfaces :

Expert systems have a number of major system components and interface with
individuals who interact with the system in various roles. These are illustrated

Q.3   What do you understand by forward Vs backward reasoning?

Whether you use forward or backwards reasoning to solve a problem depends on
the properties of your rule set and initial facts. Sometimes, if you have some
particular goal (to test some hypothesis), then backward chaining will be much
more efficient, as you avoid drawing conclusions from irrelevant facts. However,
sometimes backward chaining can be very wasteful - there may be many possible
ways of trying to prove something, and you may have to try almost all of them
before you find one that works. Forward chaining may be better if you have lots of
things you want to prove (or if you just want to find out in general what new facts
are true); when you have a small set of initial facts; and when there tend to be lots
of different rules which allow you to draw the same conclusion. Backward chaining
may be better if you are trying to prove a single fact, given a large set of initial
facts, and where, if you used forward chaining, lots of rules would be eligible to fire
in any cycle.

           Forward Reasoning                      Backward Reasoning
Forward: from the start states.           Backward: from the goal states.
Forward rules: to encode knowledge        Backward rules: to encode knowledge
about how to respond to certain input.    about how to achieve particular goals.
          Combining forward and backward reasoning
                 A1, …, Ak-1, Ak, Ak+1, …, An

                 achieved by             achieved by
               forward reasoning backward reasoning

Q.4    Describe procedural Vs declarative knowledge?
Declarative knowledge is defined as the factual information stored in memory and
known to be static in nature. Other names, e.g. descriptive knowledge,
propositional knowledge, etc. are also given. It is the part of knowledge which
describes how things are. Things/events/processes, their attributes, and the
relations between these things/events/processes and their attributes define the
domain of declarative knowledge.
Procedural knowledge is the knowledge of how to perform, or how to operate.
Names such as know-how are also given. It is said that one becomes more skilled
in problem solving when he relies more on procedural knowledge than declarative
          Procedural knowledge                        Declarative knowledge
  high efficiency                                higher level of abstraction
  low modifiability                              suitable for indipendent facts
                                                  good modifiability
  low cognitive adequacy (better for             good readablity
   knowledge engineers)                           good cognitive matching (better for
                                                   domain experts and end-users)
                                                  low computational efficiency

Q.5    Write a short note on Best first research?

Best-first search, rather than plunging as deep as possible into the tree (as in
depth-first search), or traversing each level of the tree in succession (as in breadth-
first search), uses a heuristic to decide at each stage which is the best place to
continue the search.

Best-first search in its most basic form consists of the following algorithm (adapted
from Pearl, 1984):

The first step is to define the OPEN list with a single node, the starting node. The
second step is to check whether or not OPEN is empty. If it is empty, then the
algorithm returns failure and exits. The third step is to remove the node with the
best score, n, from OPEN and place it in CLOSED. The fourth step “expands” the
node n, where expansion is the identification of successor nodes of n. The fifth step
then checks each of the successor nodes to see whether of not one of them is the
goal node. If any successor is the goal node, the algorithm returns success and the
solution, which consists of a path traced backwards from the goal to the start
node. Otherwise, the algorithm proceeds to the sixth step. For every successor
node, the algorithm applies the evaluation function, f, to it, then checks to see if
the node has been in either OPEN or CLOSED. If the node has not been in either, it
gets added to OPEN. Finally, the seventh step establishes a looping structure by
sending the algorithm back to the second step. This loop will only be broken if the
algorithm returns success in step five or failure in step two.

The algorithm is represented here in pseudo-code:

1. Define a list, OPEN, consisting solely of a single node, the start node, s.
2. IF the list is empty, return failure.
3. Remove from the list the node n with the best score (the node where f is the
    minimum), and move it to a list, CLOSED.
4. Expand node n.
5. IF any successor to n is the goal node, return success and the solution (by
    tracing the path from the goal node to s).
6. FOR each successor node:
    a) apply the evaluation function, f, to the node.
    b) IF the node has not been in either list, add it to OPEN.
7. GOTO 2.


Best-first search and its more advanced variants have been used in such
applications as games and web crawlers. In a web crawler, each web page is
treated as a node, and all the hyperlinks on the page are treated as unvisited
successor nodes. A crawler that uses best-first search generally uses an evaluation
function that assigns priority to links based on how closely the contents of their
parent page resemble the search query (Menczer, Pant, Ruiz, and Srinivasan,
2001). In games, best-first search may be used as a path-finding algorithm for
game characters. For example, it could be used by an enemy agent to find the
location of the player in the game world. Some games divide up the terrain into
“tiles” which can either be blocked or unblocked. In such cases, the search
algorithm treats each tile as a node, with the neighbouring unblocked tiles being
successor nodes, and the goal node being the destination tile (Koenig, 2004).

Q.6   Explain with example traveling salesman problem?

The "Traveling Salesman Problem" (TSP) is a common problem applied to artificial
intelligence. The TSP presents the computer with a number of cities, and the
computer must compute the optimal path between the cities. This applet uses a
genetic algorithm to produce a solution to the "Traveling Salesman Problem".

The traveling salesman problem (TSP) is a problem in discrete or combinatorial
optimization. It is a prominent illustration of a class of problems in computational
complexity theory, which are classified as NP-hard. In the traveling-salesman
problem, which is closely related to the Hamiltonian cycle problem, a salesman
must visit n cities. Modeling the problem as a complete graph with n vertices, we
can say that the salesman wishes to make a tour, or hamiltonian cycle, visiting
each city exactly once and to finishing at the city he starts from. There is an integer
cost c(i, j) to travel from city i to city j , and the salesman wishes to make the tour
whose total cost is minimum, where the total cost is the sum of the individual
costs along the edges of the tour.

Representing the cities by vertices and the roads between them by edges. We get a
graph. In this graph, with every edge there is associated a real number such a
graph is called a weighted graph being the weight of edge.

In our problem, if each of the cities has a road to every other city, we have a
complete weighted graph. This graph has numerators Hamiltonian circuits, and
we are to pick the one that has the smallest sum of the distance

The total number of different Hamiltonian circuits in a complete graph of n
vertices can be shown to be (n - 1)!/2. This follows from the fact that starting
from any vertex we have n - 1 edges to choose from the first vertex , n- 2 from the
second, n- 3 from the third, and so on, these being independent, result with (n-1)
choices This number is, however, divided y2, because each Hamiftonian circuit
has been counted twice

Theoretically the problem of the traveling salesman can always be solved by
enumeration all (n-1)!/2 Hamtltonian circuits, calculation the distance traveled
in each, and then picking the shortest one. However for a large value of n, the
labor involved is too great even for a digital computer.

The problem is to prescribe a manageable algorithm for finding the shortest route.
No efficient algorithm for problems of arbitrary size has yet been found, although
many attempts have been made. Since this problem has application in operations
research, some specific large-scale examples have been worked out. There are also
available several heuristic methods of solution that give a route very close to the
shortest one, but do not guarantee the shortest.

Q.7   Describe neural model?

Computational neurobiologists have constructed very elaborate computer models of
neurons in order to run detailed simulations of particular circuits in the brain. As
Computer Scientists, we are more interested in the general properties of neural
networks, independent of how they are actually "implemented" in the brain. This
means that we can use much simpler, abstract "neurons", which (hopefully)
capture the essence of neural computation even if they leave out much of the
details of how biological neurons work.

People have implemented model neurons in hardware as electronic circuits, often
integrated on VLSI chips. Remember though that computers run much faster than
brains - we can therefore run fairly large networks of simple model neurons as
software simulations in reasonable time. This has obvious advantages over having
to use special "neural" computer hardware.

A Simple Artificial Neuron
Our basic computational element (model neuron) is often called a node or unit. It
receives input from some other units, or perhaps from an external source. Each
input has an associated weight w, which can be modified so as to model synaptic
learning. The unit computes some function f of the weighted sum of its inputs:

Its output, in turn, can serve as input to other units

      The weighted sum               is called the net input to unit i, often written
      Note that wij refers to the weight from unit j to unit i (not the other way
      The function f is the unit's activation function. In the simplest case, f is the
       identity function, and the unit's output is just its net input. This is called a
       linear unit.

Q.8    Describe Pattern Recognition problem?

Methods for solving pattern recognition tasks generally assume a sequential model
for the pattern recognition process, consisting of pattern environment, sensors to
collect data from the environment, feature extraction from the data and
association/ storage/classification/clustering using the features. The simplest
solution to a pattern recognition problem is to use template matching, where the
data of the test pattern are matched point by point with the corresponding data in
the reference pattern. Obviously, this can work only for very simple and highly
restricted pattern recognition tasks. At the next level of complexity, one can
assume a deterministic model for the pattern generation process, and derive the
parameters of the model from given data in order to represent the pattern
information in the data. Matching test and reference patterns are done at the
parametric level. This works well when the model of the gene;ation process is
known with reasonable accuracy. One could also assume a stochastic model for
the pattern generation process, and derive the parameters of the model from a
large set of training patterns. Matching between test and reference patterns can be
performed by several statistical methods like likelihood ratio, variance weighted
distance, Bayesian classification etc. Other approaches for pattern recognition
tasks depend on extracting features from parameters or data. These features may
be specific for the task. A pattern is described in terms of features, and pattern
matching is done using descriptions of the features.

Another method based on descriptions is called syntactic or structural pattern
recognition in which a pattern in expressed in terms of primitives suitable for the
classes of pattern under study (Schalkoft 1992). Pattern matching is performed by
matching the descriptions of the patterns in terms of the primitives. More recently,
methods based on the knowledge of the sources generating the patterns are being
explored for pattern recognition tasks. These knowledge-based systems express I
knowledge in the form of rules for generating and perceiving patterns.

The main difficulty in each of the pattern recognition techniques alluded to above
is that of choosing an appropriate model for the pattern generating process and
estimating the parameters of the model in the case of a model-based approach, or
extraction of features from data parameters in the case of feature-based methods,
or selecting appropriate primitives in the case of syntactic pattern recognition, or
deriving rules in the case of a knowledge-based approach. It is all the more difficult
when the test patterns are noisy and distorted versions of the patterns used in the
training process. The ultimate goal is to impart to a machine the pattern
recognition capabilities comparable to those of human beings. This goal is difficult
to achieve using most of the conventional methods, because, as mentioned earlier,
these methods assume a sequential model for the pattern recognition process. On
the other hand, the human pattern recognition process is an integrated process
involving the use of biological neural processing even from the stage of sensing the
environment. Thus the neural processing takes place directly on the data for
feature extraction and pattern matching. Moreover, the large size (in terms of
number of neurons and interconnections) of the biological neural network and the
inherently different r mechanism of processing are attributed to our abilities of
pattern recognition in spite of variability and noise in the data. Moreover, we are
able to deal effortlessly with temporal patterns and also with the so-called stability-
plasticity dilemma as well.

It is for these reasons attempts are being made to explore new models of
computing, inspired by the structure and function of the biological neural network.
Such models for computing are based on artificial neural networks,

Q.9   Define simulated annealing?

A technique to find a good solution to an optimization problem by trying random
variations of the current solution. A worse variation is accepted as the new solution
with a probability that decreases as the computation proceeds. The slower the
cooling schedule, or rate of decrease, the more likely the algorithm is to find an
optimal or near-optimal solution.

Simulated annealing is a generalization of a Monte Carlo method for examining the
equations of state and frozen states of n-body systems [Metropolis et al. 1953]. The
concept is based on the manner in which liquids freeze or metals recrystalize in the
process of annealing. In an annealing process a melt, initially at high temperature
and disordered, is slowly cooled so that the system at any time is approximately in
thermodynamic equilibrium. As cooling proceeds, the system becomes more
ordered and approaches a "frozen" ground state at T=0. Hence the process can be
thought of as an adiabatic approach to the lowest energy state. If the initial
temperature of the system is too low or cooling is done insufficiently slowly the
system may become quenched forming defects or freezing out in metastable states
(ie. trapped in a local minimum energy state).

The original Metropolis scheme was that an initial state of a thermodynamic
system was chosen at energy E and temperature T, holding T constant the initial
configuration is perturbed and the change in energy dE is computed. If the change
in energy is negative the new configuration is accepted. If the change in energy is
positive it is accepted with a probability given by the Boltzmann factor exp -(dE/T).
This processes is then repeated sufficient times to give good sampling statistics for
the current temperature, and then the temperature is decremented and the entire
process repeated until a frozen state is achieved at T=0.

By analogy the generalization of this Monte Carlo approach to combinatorial
problems is straight forward [Kirkpatrick et al. 1983, Cerny 1985]. The current
state of the thermodynamic system is analogous to the current solution to the
combinatorial problem, the energy equation for the thermodynamic system is
analogous to at the objective function, and ground state is analogous to the global
minimum. The major difficulty (art) in implementation of the algorithm is that
there is no obvious analogy for the temperature T with respect to a free parameter
in the combinatorial problem. Furthermore, avoidance of entrainment in local
minima (quenching) is dependent on the "annealing schedule", the choice of initial
temperature, how many iterations are performed at each temperature, and how
much the temperature is decremented at each step as cooling proceeds.

Simulated annealing has been used in various combinatorial optimization
problems and has been particularly successful in circuit design problems
Q.10 What is Boltzmann machine?

A Boltzmann machine is the name given to a type of stochastic recurrent neural
network by Geoffrey Hinton and Terry Sejnowski. Boltzmann machines can be seen
as the stochastic, generative counterpart of Hopfield nets. They were one of the first
examples of a neural network capable of learning internal representations, and are
able to represent and (given sufficient time) solve difficult combinatoric problems.
However, due to a number of issues discussed below, Boltzmann machines with
unconstrained connectivity have not proven useful for practical problems in
machine learning or inference. They are still theoretically intriguing, however, due
to the locality and Hebbian nature of their training algorithm, as well as their
parallelism and the resemblance of their dynamics to simple physical processes. If
the connectivity is constrained, the learning can be made efficient enough to be
useful for practical problems.

They are named after the Boltzmann distribution in statistical mechanics, which is
used in their sampling function.