License Plate Recognition
(1) What is LPR?
(2) Application areas
(3) Top-Level description of the prototype system
(4) General methods
Extraction of plate region
I. Edge extraction
II. Hough transform
III. Histogram analysis
IV. Morphological operators
Character recognition
(a) Intersection method
(b) Pixel method
(c) Boundary-constraint-based OCR
(d) Grid-based OCR
(e) Region-matching-based OCR
(f) Neural networks
(5) Noise Removal
(6) What to look for?
(7) Case Study: Speed Detection
(1) What is LPR?
LPR (License Plate Recognition) is an image-processing technology used to identify
vehicles by their license plates. This technology is used in various security and traffic
applications, such as access-control systems: as a vehicle approaches the gate, the LPR
unit automatically "reads" the license plate registration number, compares it to a
predefined list, and opens the gate if there is a match.
(2) Application areas of LPR
1. This system can be used as a "doorman" at the entrance to restricted areas.
2. Traffic law enforcement.
3. Toll collection on highways.
4. Access control.
5. Traffic surveillance.
(3) Top-Level description of the prototype system:
First, a digital camera records images of a passing car.
A computer receives this recorded video and performs the further analysis.
The developed software analyzes the input video stream and extracts from it frames
suspected to contain an LP (License Plate).
Each such frame is then passed to an OCR module that extracts the plate number from it.
(4) General methods
In commercial systems, the following methodologies are used:
Extraction of Plate Region
For extracting the plate region, techniques such as edge extraction, the Hough transform,
histogram analysis and morphological operators have been applied.
An edge-based approach is normally simple and fast. However, it is very sensitive to
unwanted edges, such as those that appear on the front part of a car. Therefore, this
method cannot be used on its own.
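The edge-based idea can be sketched in a few lines: rows where strong vertical edges cluster are plate-region candidates, because plate characters produce dense vertical strokes. This is an illustrative pure-Python sketch; the Sobel-x kernel is standard, but the image values and the threshold of 100 are made-up examples.

```python
def sobel_vertical(img):
    """Return |horizontal gradient| via a 3x3 Sobel-x kernel (pure Python)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            out[y][x] = abs(gx)
    return out

def edge_density_rows(img, thresh=100):
    """Rows with many strong vertical edges are plate-region candidates."""
    edges = sobel_vertical(img)
    return [sum(1 for v in row if v > thresh) for row in edges]

# Illustrative frame: a flat background with a "stripy" band in rows 2-3.
img = [[100]*8, [100]*8,
       [0, 0, 255, 255, 0, 0, 255, 255],
       [0, 0, 255, 255, 0, 0, 255, 255],
       [100]*8, [100]*8]
density = edge_density_rows(img)
```

On a real frame one would then look for runs of consecutive high-density rows and confirm the candidate with another method, since edges alone are unreliable.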
The Hough transform for line detection works well on images with a large plate region,
where it can be assumed that the shape of the license plate is defined by straight lines.
However, it needs a large amount of memory and considerable computing time.
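A minimal Hough accumulator illustrates both the idea and the cost: every edge point votes for all (rho, theta) lines passing through it, so time and memory grow with the number of points and the angular resolution. The point set and resolution below are illustrative only.

```python
import math

def hough_lines(points, n_theta=180):
    """Each edge point votes for every (rho, theta) line through it.
    Peaks in the accumulator correspond to straight lines (e.g. plate borders)."""
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return acc

# Ten collinear points on the horizontal line y = 5 all vote for the same
# bin: rho = 5 at theta = 90 degrees (t = 90 when n_theta = 180).
acc = hough_lines([(x, 5) for x in range(10)])
```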
In histogram analysis we make use of thresholding to map each pixel to either a
foreground or a background value. The process of assigning one of two values to each
pixel is sometimes called binarization. A region is a connected portion of an image in
which the intensity values are considered uniform. Whenever we threshold, there is a
fundamental problem: how should we select the threshold value?
(Figure: the same image binarized with threshold values T = 124, 137, 159 and 213.)
One approach to selecting a threshold is:
1. Compute a histogram of the gray-scale image.
2. Analyze the histogram to identify significant concentrations of intensity values.
3. Select a threshold that separates the two concentrations.
A histogram is a representation of the statistical distribution of observed values. For an
image, a histogram h(i) of image intensity values indicates the number of pixels having
intensity value i.
(Figure: a typical bimodal intensity histogram.)
A histogram is often used to estimate the probability distribution of image intensities. In
some cases, it is useful to think of a histogram as the sum of several Gaussian
distributions. Histograms are often used to determine thresholds for images. A histogram
does not contain information about the position of image contents. We can think of a
histogram as a vector. Histograms are not limited to image intensities; we could compute
the histogram of any set of numbers.
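The peak-and-valley idea above can be sketched as follows. The minimum separation of 10 bins between peaks and the sample pixel values are arbitrary illustrative choices, not part of any standard method.

```python
def histogram(pixels, n_levels=256):
    """h[i] = number of pixels with intensity i."""
    h = [0] * n_levels
    for p in pixels:
        h[p] += 1
    return h

def bimodal_threshold(h):
    """Pick the emptiest bin between the two largest histogram peaks.
    Assumes a roughly bimodal histogram; raises StopIteration otherwise."""
    peaks = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    p1 = peaks[0]
    # Second peak: the largest bin reasonably far from the first.
    p2 = next(i for i in peaks if abs(i - p1) > 10)
    lo, hi = sorted((p1, p2))
    return min(range(lo, hi + 1), key=lambda i: h[i])

# Illustrative pixel data: a dark cluster near 40 and a bright one near 200.
pixels = [40]*100 + [41]*50 + [200]*80 + [199]*40
t = bimodal_threshold(histogram(pixels))
```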
The histogram-based approach does not work well on noisy images or on images with a
tilted plate.
One such operator, the morphological gradient, is a composition of three basic
operations: a dilation and an erosion of the input image by the structuring element,
followed by a subtraction of the two results. Morphological operators often take a binary
image and a structuring element
as input and combine them using a set operator (intersection, union, inclusion,
complement). They process objects in the input image based on characteristics of its
shape, which are encoded in the structuring element. Usually, the structuring element is
sized 3×3 and has its origin at the center pixel. It is shifted over the image and at each
pixel of the image its elements are compared with the set of the underlying pixels. If the
two sets of elements match the condition defined by the set operator (e.g. if the set of
pixels in the structuring element is a subset of the underlying image pixels), the pixel
underneath the origin of the structuring element is set to a pre-defined value (0 or 1 for
binary images). A morphological operator is therefore defined by its structuring element
and the applied set operator.
For the basic morphological operators the structuring element contains only foreground
pixels (i.e. ones) and `don't care's'. These operators, which are all a combination of
erosion and dilation, are often used to select or suppress features of a certain shape, e.g.
removing noise from images or selecting objects with a particular direction.
The more sophisticated operators take zeros as well as ones and `don't care's' in the
structuring element. The most general operator is the hit and miss, in fact, all the other
morphological operators can be deduced from it. Its variations are often used to simplify
the representation of objects in a (binary) image while preserving their structure, e.g.
producing a skeleton of an object using skeletonization and tidying up the result using
pruning.
One important application of the morphological gradient in binary images is to find their
boundaries. An example is shown below.
(Figure: input image and the negation of its morphological gradient, showing the object
boundaries.)
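The operators discussed above can be sketched on binary images in a few lines of pure Python. The 3x3 cross structuring element is a common default; the 5x5 test image is illustrative.

```python
CROSS = ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1))  # 3x3 cross, origin at center

def dilate(img, se=CROSS):
    """A pixel becomes 1 if any structuring-element position covers a 1."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= y+dy < h and 0 <= x+dx < w and img[y+dy][x+dx]
                     for dy, dx in se))
             for x in range(w)] for y in range(h)]

def erode(img, se=CROSS):
    """A pixel stays 1 only if every structuring-element position covers a 1."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y+dy < h and 0 <= x+dx < w and img[y+dy][x+dx]
                     for dy, dx in se))
             for x in range(w)] for y in range(h)]

def gradient(img, se=CROSS):
    """Morphological gradient: dilation minus erosion marks the boundary."""
    d, e = dilate(img, se), erode(img, se)
    return [[dv - ev for dv, ev in zip(dr, er)] for dr, er in zip(d, e)]

# Illustrative 5x5 image with a 3x3 block of foreground pixels.
img = [[0]*5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        img[y][x] = 1
```

Eroding the block with the cross leaves only its center pixel, and the gradient is 1 exactly on the block's boundary.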
Morphology is known to be robust to noise, but it is rarely used in real-time systems
because of its slow operation. Morphological operators can also be applied to gray-level
images, to reduce noise or to brighten the image. However, for many applications, other
methods, such as a more general spatial filter, produce better results.
The recognition process can be divided into
1) a segmentation stage
2) a character-recognition stage
Most conventional segmentation methods are rule-based, utilizing the specific placement
of the characters, labeling, histograms, and so on. In these methods, the binarization step
is more important than the others, since it must cope with unevenness of the color depth
caused by the lighting conditions and dirt. Nevertheless, if noise such as dust or a stain
lies on or near a character, the character may be broken up and segmented into regions
that are too large or too small. This missegmentation is a cause of lower recognition
rates.
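A common rule-based segmentation step is the vertical projection profile: columns containing no foreground pixels separate adjacent characters. This sketch assumes a clean binary plate image; as noted above, dust or stains that join or break characters would defeat it.

```python
def segment_columns(binary):
    """Split characters at columns whose foreground pixel count is zero.
    Returns (start, end) column ranges, one per character candidate."""
    h, w = len(binary), len(binary[0])
    proj = [sum(binary[y][x] for y in range(h)) for x in range(w)]
    segments, start = [], None
    for x, count in enumerate(proj):
        if count > 0 and start is None:
            start = x                      # character begins
        elif count == 0 and start is not None:
            segments.append((start, x - 1))  # character ends at a blank column
            start = None
    if start is not None:
        segments.append((start, w - 1))
    return segments

# Illustrative binary strip with two "characters" at columns 1-2 and 4-6.
binary = [[0, 1, 1, 0, 1, 1, 1, 0],
          [0, 1, 1, 0, 1, 1, 1, 0],
          [0, 1, 1, 0, 1, 1, 1, 0]]
```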
Approaches for Recognition
A) Syntactic approach: the character patterns are approximated by a string of
primitives, also called a template.
B) Structural approach: features are extracted and then represented using a graph data
structure; the matching is done using the topology of the features.
C) Neural networks approach: a feature vector is constructed and classified by a neural
network.
A) The correlation or template matching approach to character recognition is
straightforward and can be reliable, provided the target is "cooperative" and the application
remains invariant. As the name implies, once each character is isolated, the recognition
engine attempts to match it against a set of predefined standards. Any condition--
lighting, viewing angle, obscuration, plate size, font-- that causes a character to vary from
the standard is likely to confuse the engine and return a questionable result.
Some approaches are:
The intersection method of character recognition is accomplished by placing a number of
horizontal and vertical lines across the image of the character. Each line intersects the
character a number of times. The number of intersections for each line forms a pattern,
which is used to recognize the character. This method is very tolerant to the position of
the character within the grid formed by the lines. If the character is placed in different
locations within the grid, the pattern will be shifted, but will still be intact.
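A sketch of the intersection method: each scanline's count of crossings into the character forms a signature that survives translation within the grid. The 5x5 'H' bitmap and the choice of scanlines are illustrative.

```python
def intersections(bitmap, row):
    """Count the 0 -> 1 transitions along one horizontal scanline."""
    count, prev = 0, 0
    for v in bitmap[row]:
        if v == 1 and prev == 0:
            count += 1
        prev = v
    return count

def signature(bitmap, rows):
    """The pattern of intersection counts used to recognize the character."""
    return tuple(intersections(bitmap, r) for r in rows)

# Illustrative 5x5 bitmap of the letter H.
H = [[1, 0, 0, 0, 1],
     [1, 0, 0, 0, 1],
     [1, 1, 1, 1, 1],
     [1, 0, 0, 0, 1],
     [1, 0, 0, 0, 1]]
```

Scanning rows 0, 2 and 4 gives the signature (2, 1, 2): two strokes, then the crossbar, then two strokes again.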
The pixel method of recognition uses a library that contains an ideal version of each
character that can be recognized. To recognize a character, the image of the character is
compared pixel by pixel to each image in the library. For example, the image is compared
to the library version of the letter A, B, C, and so on. The comparison that produces the
highest percentage of similar pixels will be the software's guess. Comparing equally
spaced pixels instead of every pixel speeds up the process, at some loss of accuracy.
A variation of this method involves checking a subset of pixels instead of comparing
them all. Instead of comparing all of the pixels of the letter A, only the pixels in the
general shape of an A are checked. This cuts down on the number of comparisons
needed, but will generally be less accurate than comparing all of the pixels.
The pixel method has two disadvantages. The first is that there are many comparisons
performed, which will lead to long processing times. The second disadvantage is that for
the pixel method to work, both the image of the character and the library version must be
scaled and aligned very well to provide good results.
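The pixel method reduces to a percentage-of-matching-pixels score against each library template. The tiny templates below are illustrative stand-ins for real character bitmaps, which (as noted above) must be scaled and aligned first.

```python
def similarity(img, template):
    """Fraction of pixels that match between a character image and a template."""
    total = matches = 0
    for img_row, tpl_row in zip(img, template):
        for a, b in zip(img_row, tpl_row):
            total += 1
            matches += (a == b)
    return matches / total

def classify(img, library):
    """Return the library key whose template matches best."""
    return max(library, key=lambda k: similarity(img, library[k]))

# Illustrative 2x2 "library" (real templates would be full-size bitmaps).
library = {"O": [[1, 1], [1, 1]],
           "I": [[1, 0], [1, 0]]}
```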
Boundary-constraint-based OCR recognizes characters by examining the boundaries of a
character at specific points. First, two sets of three points are positioned above and below
the character. These three points form a line, with one point at the left edge of the
character, one at the right edge of the character, and one at the center. As shown in the
figure below, these two sets of points are then moved vertically towards the center of the
character until they hit parts of the character.
These six points are enough information to form a unique match with many characters in
the English language. Where ambiguities exist, or to improve the accuracy of the
recognition, the process can be repeated horizontally, as shown in the next figure.
Grid-Based OCR
Even greater accuracy can be achieved by adding more points to the boundary lines, as in
the next technique.
Depending on the computational power available and the accuracy desired, it is not
uncommon to use five or more boundary points on each side. While it is not needed for
English capital letters and numbers, some character sets also require an "inner boundary"
to be detected. This procedure is similar to that used to determine the outer boundaries,
but these constraints expand from the center instead of contracting from the outside.
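The boundary-point probing described above can be sketched as follows: for chosen probe columns, record the first row containing ink from the top and from the bottom. The tiny 'T' bitmap and the probe columns are illustrative.

```python
def top_boundary_points(bitmap, xs):
    """For each probe column, the first row (from the top) containing ink."""
    h = len(bitmap)
    return [next((y for y in range(h) if bitmap[y][x]), None) for x in xs]

def bottom_boundary_points(bitmap, xs):
    """For each probe column, the first row (from the bottom) containing ink."""
    h = len(bitmap)
    return [next((y for y in range(h - 1, -1, -1) if bitmap[y][x]), None)
            for x in xs]

# Illustrative 3x3 bitmap of the letter T; probe at left, center, right.
T = [[1, 1, 1],
     [0, 1, 0],
     [0, 1, 0]]
```

The six numbers (three top contacts, three bottom contacts) form the descriptor: for this T, the bottom probes hit row 0 at the edges but row 2 at the center stem, which already distinguishes it from, say, an L.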
B) Structural analysis uses a decision tree to assess the geometric features of each
character's contour. The technique can be somewhat tolerant of variations in size, tilt, and
perspective. As a simple example, consider the characters B, D, 6, and 9. Features that
might be used to distinguish them are the number of loops in each character (one or two)
and the vertical position of the loop (top, central, or bottom). Two loops point to a B, and
one loop leads to the next branch of the tree: a loop at the top indicates a 9; if the loop is
central, it's a D; and a loop at the bottom means a 6. Characters without loops (E, M, N,
hyphen, etc.) require additional, more time-consuming analysis.
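The B/D/6/9 example translates directly into a toy decision tree. Extracting the loop count and loop position from a real contour is the hard part and is not shown; the inputs here are assumed to be given.

```python
def classify_by_loops(n_loops, loop_position):
    """Toy decision tree from the B/D/6/9 example.
    loop_position is one of "top", "central", "bottom" (ignored for two loops)."""
    if n_loops == 2:
        return "B"                 # two loops -> B
    if loop_position == "top":
        return "9"                 # one loop at the top -> 9
    if loop_position == "central":
        return "D"                 # one central loop -> D
    return "6"                     # one loop at the bottom -> 6
```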
C) Neural networks are trained by example instead of being programmed in a
conventional sense. While learning to recognize a recurring pattern, the network
constructs statistical models that adapt to individual characters' distinctive features.
Therefore, neural networks tend to be resilient to noise, and performance usually is not
compromised under changing operational conditions. However, each modification (e.g. a
new font) presented to the neural network may require a significant investment in
retraining.
We now describe this popular method in a little more detail:
A Neural network consists of at least three layers of units: an input layer, at least one
intermediate hidden layer, and an output layer. The connection weights in a Neural
network are one way. Typically, units are connected in a feed-forward fashion with input
units fully connected to units in the hidden layer and hidden units fully connected to units
in the output layer. When a Neural network is cycled, an input pattern is propagated
forward to the output units through the intervening input-to-hidden and hidden-to-output
connections.
With Neural networks, learning occurs during a training phase in which each input
pattern in a training set is applied to the input units and then propagated forward. The
pattern of activation arriving at the output layer is then compared with the correct
(associated) output pattern to calculate an error signal. The error signal for each such
target output pattern is then backpropagated from the outputs to the inputs in order to
appropriately adjust the weights in each layer of the network. After a Neural network has
learned the correct classification for a set of inputs, it can be tested on a second set of
inputs to see how well it classifies untrained patterns. Thus, an important consideration in
applying Neural learning is how well the network generalizes.
(Figure: typical architecture of a backpropagation network.)
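The forward pass and backpropagation described above can be condensed into a toy 2-2-1 network. The layer sizes, learning rate, and the single-pattern training loop are illustrative choices, not anything prescribed by the text.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyNet:
    """Minimal 2-2-1 feed-forward network trained by backpropagation."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
        self.b1 = [0.0, 0.0]
        self.w2 = [rng.uniform(-1, 1) for _ in range(2)]
        self.b2 = 0.0

    def forward(self, x):
        # Propagate the input pattern through hidden and output layers.
        self.h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w1, self.b1)]
        self.o = sigmoid(sum(w * hi for w, hi in zip(self.w2, self.h)) + self.b2)
        return self.o

    def train(self, x, target, lr=0.5):
        o = self.forward(x)
        # Error signal at the output, then backpropagated to the hidden layer.
        delta_o = (o - target) * o * (1 - o)
        delta_h = [delta_o * w * h * (1 - h) for w, h in zip(self.w2, self.h)]
        # Adjust weights in each layer.
        self.w2 = [w - lr * delta_o * h for w, h in zip(self.w2, self.h)]
        self.b2 -= lr * delta_o
        for j in range(2):
            for i in range(2):
                self.w1[j][i] -= lr * delta_h[j] * x[i]
            self.b1[j] -= lr * delta_h[j]
        return (o - target) ** 2   # squared error for this pattern

# Repeatedly training on one pattern drives its error toward zero.
net = TinyNet()
errs = [net.train([1.0, 0.0], 1.0) for _ in range(200)]
```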
(5) Noise Removal
There exist many kinds of noise in a surveillance system. If the noise is introduced by
playback and digitization, it may be worthwhile to play the recording several times and
average the results. One danger of this method, however, is that the tape may be damaged
by repeated playing.
Averaging over multiple frames can also reduce the noise. This can be valuable for, say, a
night recording from a fixed camera of a scene that does not move. One problem arises
when moving people are seen in certain frames (figure 2). Since these moving people are
not present in most of the frames that are averaged, they will largely disappear from the
final averaged frame. The value of this method depends on which part of the image has to
be visualized.
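Frame averaging is just a pixel-wise mean. A sketch, assuming equally sized grayscale frames represented as nested lists:

```python
def average_frames(frames):
    """Pixel-wise mean of several equally sized grayscale frames.
    Static background survives; transient noise and passers-by fade out."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)]
            for y in range(h)]

# Illustrative 1x2 frames: a static pixel (100) and a noisy one (0/30/60).
frames = [[[100, 0]], [[100, 30]], [[100, 60]]]
```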
(6) What to Look For?
Evaluating a system's capabilities can require that one look beyond the recognition
engine's proficiency with a standard character set under nearly ideal test conditions. It is
prudent to determine all factors that can influence operations and to learn the effects of
those variables that cannot be held constant. They include:
Volume of vehicle flow
Ambient illumination (day, night, sun, shadow)
Vehicle type (passenger car, truck, tractor-trailer, etc.)
Plate mounting (rear only or front and rear)
Plate jurisdiction (and attendant fonts)
Plate tilt, rotation, skew
Presence of a trailer hitch, decorative frame, or other obscuring factors
(7) Case Study: Speed Detection
An off-line system comprises two parts: the image-capture hardware and the
image-processing system. A video camera is used to capture several series of video
frames at different locations. A database is built up from these real-life captured video
frames. The system comprises PC hardware to digitize the video information and
software to process the digitized images. The video is converted into a suitable format
(such as AVI) in the laboratory, and a sequence of frames is then extracted from the AVI
file. With the captured frames, image-processing techniques are used to extract the
moving objects from each video frame. Subsequently, the location of each moving object
is identified and the corresponding speed is estimated.
Foreground and background separation is performed first, followed by filtering out the
noise in the image with morphological and other image filters. Each moving object is
then located by clustering and connectivity analysis, and the distance it travels over a
succession of images is estimated. Each object is distinguished by its license plate using
LPR software.
A background of the road is used as a reference for the subsequent image processing. An
analysis is made of a typical video frame with multiple moving objects. The frame is
processed with the moving objects being extracted from the background. A particular
vehicle is identified with LPR. With consecutive video frames, a succession of
time-sequence frames with multiple extracted objects can be obtained. Knowing the
timing between consecutive frames, the speed of each object can be evaluated.
Suppose we know the distance between the camera and the LP at one instant and again at
another instant; then calculating the speed is child's play. The reference value can be
some benchmark on the road, given that all LPs have fixed dimensions, but such a
situation is quite rare. Taking three frames as sample values therefore bypasses all the
constraints above: no need for a fixed-size LP, no benchmark, no prior knowledge of any
distance. To cover the case of curvilinear motion, we suggest taking the change of the
height of the LP with respect to time (assuming flat ground) instead of its width. The
method just described uses only one camera to accomplish the job.
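However the distance is obtained (road benchmark, LP dimensions, or LP-height change), the final step is the same average-speed computation. A sketch with illustrative units (metres, seconds, km/h):

```python
def speed_kmh(dist_m, t0_s, t1_s):
    """Average speed between two timestamped position fixes.
    The distance would come from calibrated image geometry; the values
    below are examples only."""
    return (dist_m / (t1_s - t0_s)) * 3.6  # m/s -> km/h

# A vehicle covering 25 m between frames one second apart travels 90 km/h.
v = speed_kmh(25.0, 0.0, 1.0)
```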
A simpler technique is to use two cameras that are a fixed distance apart and detect the
LP of a vehicle as it passes each benchmark. Knowing the time elapsed, we can calculate
the speed. The best advantage of such image-based systems is that they are immune to
electronic jamming.
We have thus learnt what LPR is, the various techniques for implementing it, and the
applications of such systems in practical life. Such systems make our lives safer, more
secure and more convenient.