The task is to track vehicles appearing in a video stream accurately enough to provide a
trajectory of the vehicle as it moves through an interesting traffic structure, such as an
The approach described here tracks vehicles using feature-based motion segmentation.
Vehicles are identified by looking for regions of the video image which have a great deal
of movement. After an area of motion has been identified and clustered, that image
region may be tracked, as shown below in Figure 1.
Figure 1 shows a track as a green curve trailing behind the white lower left vehicle. The
image region which is being tracked is within the red ellipse. The yellow ellipse defines
a boundary within which no other track may be initialized. The white car in the lower
left and the dark truck in the upper right both have yellow dots indicating the center of a
moving region. The motion of the vehicle is used to provide an initial location which
may be tracked throughout the frame.
Kanade-Lucas-Tomasi (KLT) features may be used to describe general motions within
video images. The KLT algorithm finds several thousand features in each frame of
video. It then attempts to find a correspondence between the features in one frame with
the features in the next.
Figure 2 shows all the KLT features found for the video frame in Figure 1 as red dots
with a red line connecting the feature to its location in the previous frame. Most of the
KLT features do not move, and appear simply as dots. The white vehicle in the lower left
of Figure 2 has lines trailing behind it, showing where its features
were located in the previous frame.
The KLT algorithm (Shi and Tomasi, 1994) finds image features at locations where the
minimum eigenvalue of the 2x2 symmetric matrix G:
$$G = \int_W g \, g^T \, dA$$
is above some threshold. Here $g$ is the local image intensity gradient vector,
$g = [\partial I / \partial x, \ \partial I / \partial y]^T$, a 2x1 matrix, and W is a window about some image region around the
KLT feature. Image locations which satisfy the above criterion generally appear to be
corners, which can be tracked more easily than edges or uniform regions. The KLT
algorithm also uses the G matrix to solve for the displacement of each feature, so the
minimum eigenvalue criterion helps ensure that the displacement computed at a selected
feature is more reliable than one computed at a point which does not meet the threshold.
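The selection criterion above can be illustrated with a minimal Python sketch. This is not the OpenCV implementation: it assumes central-difference gradients, an unweighted square window, and a grayscale image stored as a list of rows. The helper names (`structure_matrix`, `min_eigenvalue`, `is_trackable`), the window half-width, the threshold value, and the synthetic images are all hypothetical.

```python
import math


def gradient(image, x, y):
    """Central-difference intensity gradient g = (dI/dx, dI/dy) at pixel (x, y)."""
    gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
    gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
    return gx, gy


def structure_matrix(image, cx, cy, half):
    """Sum g g^T over the square window W centred on (cx, cy).

    Returns the three distinct entries (gxx, gxy, gyy) of the symmetric matrix G.
    """
    gxx = gxy = gyy = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            gx, gy = gradient(image, x, y)
            gxx += gx * gx
            gxy += gx * gy
            gyy += gy * gy
    return gxx, gxy, gyy


def min_eigenvalue(gxx, gxy, gyy):
    """Smaller eigenvalue of [[gxx, gxy], [gxy, gyy]], in closed form."""
    mean = (gxx + gyy) / 2.0
    radius = math.hypot((gxx - gyy) / 2.0, gxy)
    return mean - radius


def is_trackable(image, cx, cy, half=2, threshold=1.0):
    """True when the minimum eigenvalue of G exceeds the (assumed) threshold."""
    return min_eigenvalue(*structure_matrix(image, cx, cy, half)) > threshold


# Synthetic 9x9 images, for illustration only.
corner = [[100.0 if (x >= 4 and y >= 4) else 0.0 for x in range(9)] for y in range(9)]
edge = [[100.0 if x >= 4 else 0.0 for x in range(9)] for y in range(9)]
flat = [[50.0] * 9 for _ in range(9)]

for name, img in (("corner", corner), ("edge", edge), ("flat", flat)):
    print(name, is_trackable(img, 4, 4))
```

On these synthetic images only the corner passes. The edge image has a gradient in one direction only, so the smaller eigenvalue of G is zero and the window is rejected.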
A solution for the displacement may be found by finding the displacement d which
minimizes the sum of squared intensity differences in the following cost function:
$$\int_W [I(x - d) - J(x)]^2 \, dA$$
where I and J are two image intensity maps which are adjacent in time. Simplifying
the above equation with a two-term Taylor series expansion for I:
$$I(x - d) \approx I(x) - g^T d$$
then differentiating with respect to d and setting the result equal to zero gives:
$$\int_W [I(x) - J(x) - g^T d] \, g \, dA = 0$$
$$\left( \int_W g \, g^T \, dA \right) d = \int_W (I(x) - J(x)) \, g \, dA$$
which can be expressed as the following matrix equation:
$$G d = e$$
where G is the same 2x2 matrix used to find the image features and e is the 2x1 vector
on the right-hand side of the previous equation. The criteria used to find
trackable features help ensure that Gd = e can be solved for d.
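As a worked illustration of solving for the displacement, here is a minimal Python sketch of a single linearized step. It is a sketch under stated assumptions rather than the OpenCV implementation: it takes J(x) = I(x - d), so that e sums (I - J) g over the window, uses central-difference gradients and an unweighted window, and performs no iteration or image pyramid. The function names and the synthetic `bowl` image are hypothetical.

```python
def gradient(image, x, y):
    """Central-difference intensity gradient g = (dI/dx, dI/dy) at pixel (x, y)."""
    gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
    gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
    return gx, gy


def klt_step(image_i, image_j, cx, cy, half):
    """Solve the 2x2 system G d = e over the window centred on (cx, cy)."""
    gxx = gxy = gyy = ex = ey = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            gx, gy = gradient(image_i, x, y)
            diff = image_i[y][x] - image_j[y][x]  # I(x) - J(x)
            gxx += gx * gx
            gxy += gx * gy
            gyy += gy * gy
            ex += diff * gx
            ey += diff * gy
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        raise ValueError("G is singular; this window is not trackable")
    # d = G^-1 e, using the closed-form inverse of a symmetric 2x2 matrix.
    dx = (gyy * ex - gxy * ey) / det
    dy = (gxx * ey - gxy * ex) / det
    return dx, dy


def bowl(shift_x, shift_y, size=11, centre=5):
    """Synthetic quadratic intensity surface, shifted by (shift_x, shift_y)."""
    return [[(x - centre - shift_x) ** 2 + (y - centre - shift_y) ** 2
             for x in range(size)] for y in range(size)]


image_i = bowl(0.0, 0.0)
image_j = bowl(0.3, -0.2)  # J(x) = I(x - d) with d = (0.3, -0.2)
print(klt_step(image_i, image_j, 5, 5, 3))
```

The quadratic test surface is chosen so that, over a window centred on its minimum, the linearization error cancels and the single step recovers d up to rounding. On real images the Taylor approximation only holds for small displacements, which is why practical trackers repeat the step.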
This algorithm uses the implementation of the KLT found in the OpenCV computer
vision library.
Classifying Feature Movement
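The movement of the KLT features is classified by fitting a single affine transform to the
frame-to-frame motion of all KLT features over the entire image. The residual of a feature
is the distance between its observed location in the next frame and the location predicted
by this transform. The KLT features whose residual exceeds some
threshold are assumed to be located on a moving object. All of the features judged to be
moving are shown in Figure 3.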
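The classification step can be sketched in Python as follows. The text does not say how the global affine transform is estimated, so this sketch assumes an ordinary least-squares fit over all feature correspondences, which is reasonable only when most features lie on the static background. The function names, the threshold, and the synthetic points are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 linear system m v = rhs by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point configuration")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(prev_pts, next_pts):
    """Least-squares affine fit; returns ((a, b, tx), (c, d, ty))."""
    m = [[0.0] * 3 for _ in range(3)]
    rhs_x = [0.0] * 3
    rhs_y = [0.0] * 3
    for (x, y), (u, v) in zip(prev_pts, next_pts):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                m[i][j] += row[i] * row[j]  # normal-equation matrix
            rhs_x[i] += row[i] * u
            rhs_y[i] += row[i] * v
    return tuple(solve3(m, rhs_x)), tuple(solve3(m, rhs_y))


def apply_affine(affine, pt):
    (a, b, tx), (c, d, ty) = affine
    x, y = pt
    return a * x + b * y + tx, c * x + d * y + ty


def moving_features(prev_pts, next_pts, threshold):
    """Indices of features whose residual against the global fit exceeds threshold."""
    affine = fit_affine(prev_pts, next_pts)
    moving = []
    for i, (p, q) in enumerate(zip(prev_pts, next_pts)):
        px, py = apply_affine(affine, p)
        if math.hypot(q[0] - px, q[1] - py) > threshold:
            moving.append(i)
    return moving


# Synthetic data: 100 static background features and 3 features on a moving vehicle.
background = [(10.0 * i, 10.0 * j) for i in range(10) for j in range(10)]
vehicle = [(42.0, 43.0), (45.0, 47.0), (48.0, 44.0)]
prev_pts = background + vehicle
next_pts = background + [(x + 6.0, y) for x, y in vehicle]
print(moving_features(prev_pts, next_pts, threshold=2.0))
```

Because the moving features also take part in the fit, they bias it slightly. A robust estimator would reduce that bias, at the cost of extra complexity.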
The moving KLT features are clustered by finding groups of features whose neighbouring
members are separated by no more than some threshold distance. This clustering is accomplished by
finding a Delaunay triangulation (Delaunay, 1934) of the moving points, and then repeatedly extracting
clusters by recursively traversing the Delaunay triangulation without traversing any edge
whose length exceeds the threshold distance.
Figure 4 shows a Delaunay triangulation of the moving features shown in Figure 3. The
clustering algorithm uses the Delaunay triangulation by selecting a random vertex, and
traversing all adjacent vertices where an adjacent vertex is connected by an edge whose
length is less than some threshold.
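The traversal described above can be sketched in Python as below. The sketch assumes the triangulation has already been computed elsewhere and is supplied as a list of vertex-index pairs; the edge list in the example is hand-written and hypothetical, not the output of a real triangulation routine. It also uses an explicit stack in place of recursion, which visits the same vertices without risking a deep call stack.

```python
import math
from collections import defaultdict


def cluster_by_short_edges(points, edges, max_edge_length):
    """Group vertices that are connected through edges no longer than max_edge_length.

    points: list of (x, y) vertices.
    edges: list of (i, j) index pairs, assumed to come from a triangulation.
    Returns a sorted list of clusters, each a sorted list of vertex indices.
    """
    neighbours = defaultdict(list)
    for i, j in edges:
        (x0, y0), (x1, y1) = points[i], points[j]
        if math.hypot(x1 - x0, y1 - y0) <= max_edge_length:
            neighbours[i].append(j)
            neighbours[j].append(i)

    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()  # arbitrary starting vertex
        cluster = [seed]
        stack = [seed]
        while stack:
            vertex = stack.pop()
            for other in neighbours[vertex]:
                if other in unvisited:
                    unvisited.remove(other)
                    cluster.append(other)
                    stack.append(other)
        clusters.append(sorted(cluster))
    return sorted(clusters)


# Two tight groups of moving features plus one outlying vertex.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0),
          (5.0, 20.0)]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (1, 3), (2, 3), (2, 6), (5, 6)]
print(cluster_by_short_edges(points, edges, max_edge_length=2.0))
```

Outlying vertices end up as single-member clusters, so a minimum cluster size could be applied afterwards to discard them.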
Figure 5 shows Figure 4’s vertices without any edges. The vertices have been clustered
into two primary regions: the green vertices in the lower left and the red vertices in the
upper right. There are a few outlying vertices near the upper right cluster.
Tracking an Object’s Movement
The location of moving objects in the video may be interpreted as the centroid of the
moving KLT feature clusters. However, finding the center of motion of a particular
image region is not a very good method of tracking an object. It merely provides
evidence that a vehicle probably exists near that location. The moving cluster centroid
tends not to stay fixed with respect to the vehicle which produced it.
A better method of updating the location of a moving object is to use the local KLT
features to find a transform which maps the current KLT features to features in a
subsequent frame. The resulting transform is then used to update the moving object’s
track location. Because the transform is not dependent on whether an object is moving,
the transform may be used to update motionless objects as well.
This particular algorithm uses a 2D affine transform for no reason other than convenience
and ease of use.
Random Sample Consensus Affine Transform Estimate
The local affine transform is found by Random Sample Consensus (RANSAC).
RANSAC repeatedly selects, at random, three KLT features within some threshold distance of a
current track location. These three points are used to find a hypothesis for the local affine
transform. RANSAC then selects the affine transform hypothesis which has the most
support among the local KLT features, that is, the hypothesis which predicts the motion of
the largest number of those features to within a tolerance.
If a reasonable affine transform with a small residual is found at a current track location,
the transform is used to update the position and velocity of the track. Figure 6 shows
four vehicles being tracked.
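The RANSAC estimate and track update can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the search radius, inlier tolerance, and iteration count are hypothetical parameters, velocity is assumed to be the per-frame change in track position, and the feature data are synthetic.

```python
import math
import random


def affine_from_three(src, dst):
    """Exact affine transform mapping three source points onto three destinations.

    Returns ((a, b, tx), (c, d, ty)), or None if the source points are collinear.
    """
    (x0, y0), (x1, y1), (x2, y2) = src
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if abs(det) < 1e-9:
        return None
    rows = []
    for k in (0, 1):  # k = 0 solves the x' row, k = 1 the y' row
        u0, u1, u2 = dst[0][k], dst[1][k], dst[2][k]
        a = ((u1 - u0) * (y2 - y0) - (u2 - u0) * (y1 - y0)) / det
        b = ((x1 - x0) * (u2 - u0) - (x2 - x0) * (u1 - u0)) / det
        rows.append((a, b, u0 - a * x0 - b * y0))
    return tuple(rows)


def apply_affine(affine, pt):
    (a, b, tx), (c, d, ty) = affine
    x, y = pt
    return a * x + b * y + tx, c * x + d * y + ty


def ransac_affine(prev_pts, next_pts, centre, radius,
                  inlier_tol=1.0, iterations=200, rng=None):
    """Return (best affine hypothesis, indices of the features supporting it)."""
    rng = rng or random.Random()
    local = [i for i, p in enumerate(prev_pts)
             if math.hypot(p[0] - centre[0], p[1] - centre[1]) <= radius]
    if len(local) < 3:
        return None, []
    best, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(local, 3)
        affine = affine_from_three([prev_pts[i] for i in sample],
                                   [next_pts[i] for i in sample])
        if affine is None:
            continue  # collinear sample, no unique affine transform
        inliers = []
        for i in local:
            px, py = apply_affine(affine, prev_pts[i])
            if math.hypot(next_pts[i][0] - px, next_pts[i][1] - py) <= inlier_tol:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best, best_inliers = affine, inliers
    return best, best_inliers


# Synthetic data: 16 features following one affine motion, 4 inconsistent
# features nearby, and 1 feature outside the search radius.
TRUE_AFFINE = ((1.0, 0.02, 3.0), (-0.02, 1.0, -1.5))
vehicle = [(44.0 + 4 * i, 44.0 + 4 * j) for i in range(4) for j in range(4)]
clutter = [(46.0, 47.0), (53.0, 49.0), (49.0, 55.0), (55.0, 52.0)]
clutter_motion = [(-7.0, 9.0), (12.0, 4.0), (8.0, -10.0), (-9.0, -6.0)]
far_away = [(200.0, 200.0)]
prev_pts = vehicle + clutter + far_away
next_pts = ([apply_affine(TRUE_AFFINE, p) for p in vehicle]
            + [(x + dx, y + dy) for (x, y), (dx, dy) in zip(clutter, clutter_motion)]
            + far_away)

track = (50.0, 50.0)
affine, inliers = ransac_affine(prev_pts, next_pts, track, radius=15.0,
                                rng=random.Random(0))
new_track = apply_affine(affine, track)
velocity = (new_track[0] - track[0], new_track[1] - track[1])  # pixels per frame
print(new_track, velocity, len(inliers))
```

Three correspondences are the minimum needed to determine a 2D affine transform, which is why each hypothesis is built from three features. A production tracker would likely also check that the winning hypothesis has enough support and a small residual before accepting the update.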
The video frame shown in Figure 8 demonstrates some of the difficulties encountered
when tracking in the presence of deep shadows. The tracked regions of the vehicles tend
to be centered on the bright side away from the shadows.
B. Delaunay. Sur la sphère vide. Izvestia Akademii Nauk SSSR, Otdelenie
Matematicheskikh i Estestvennykh Nauk, 7:793-800, 1934.
J. Shi and C. Tomasi. Good features to track. In Proc. IEEE International Conf.
Computer Vision and Pattern Recognition (CVPR), IEEE Press, 1994.
OpenCV: Open Source Computer Vision Library.