Vectorization Algorithm for Line Drawing and Gap filling of Maps

					                                      (IJCSIS) International Journal of Computer Science and Information Security,
                                      Vol. 8, No. 7, October 2010

  Vectorization Algorithm for Line Drawing and Gap filling of Maps

Ms. Neeti Daryal, Lecturer, Department of Computer Science, M L N College, Yamuna Nagar
Dr Vinod Kumar, Reader, Department of Mathematics, J.V. Jain College, Saharanpur

Vectorization, i.e. raster-to-vector conversion, is at the heart of graphics recognition
problems, as it deals with converting a scanned image to a vector form suitable for further
analysis. Many vectorization methods have been designed. This paper deals with a method of
raster-to-vector conversion proposed for capturing line drawing images. In the earliest
works on vectorization, only one kind of method was used at a time. The proposed algorithm
combines the features of the thinning method and the medial line extraction method so as to
produce the best line fitting. The process has several steps. The first step is
preprocessing, in which the lines are found in the original raster image. The second is an
algorithm for gap filling between adjacent lines to produce the vectorization of a scanned
map. Results and a review of the literature on the above-mentioned methods are also
included in this paper.
Key Words: Vectorization, Gap filling, Line drawing, Thinning algorithm, Medial line extraction

ISSN 1947-5500

1. INTRODUCTION

Graphics recognition is concerned with the analysis of graphics-intensive documents, such
as technical drawings, maps or schemas. Vectorization, i.e. raster-to-vector conversion, is
a central part of graphics recognition problems, as it deals with converting the scanned
map to a vector form suitable for further analysis. Line drawing management systems store
visual objects as graphic entities. Many techniques have already been proposed for the
extraction and recognition of graphic entities from scanned binary maps. In particular,
various raster-to-vector conversion methods have been developed which convert image lines
into vector lines automatically. In this paper, a new raster-to-vector conversion method is
proposed for capturing high-quality vectors in a line drawing.

Figure 1: Raster graphics (bitmap image) [1] and vector graphics (vector graphic) [2]

There are two kinds of computer graphics - raster (composed of pixels)
and vector (composed of paths) [1]. Raster images are more commonly called bitmap images.
Vector graphics are called object-oriented graphics, as shown in Figure 1 [2].

In general, a vector data structure produces a smaller file size than a raster image,
because a raster image needs space for all pixels while only point coordinates are stored
in a vector representation [3]. This is even truer when the graphics or images have large
homogeneous regions and the boundaries and shapes are the primary interest.

Vectorization techniques have been developed in various domains, and a number of methods
have been proposed and implemented. These methods are roughly divided into two classes:
thinning based methods and non-thinning based methods [4].

Thinning based methods are applied in most of the earlier vectorization schemes [4]. These
methods usually employ an iterative boundary erosion process to remove outer pixels until
only a one-pixel-wide skeleton remains, like "peeling an onion" [5]. A polygonal
approximation procedure is then applied to convert the skeleton to a vector, which may be
a line segment or a polyline. The thinning method tends to create noisy junctions at
corners, intersections, and branches, as shown in Figure 2 [6].

Figure 2: Defects of the thinning method

Among the non-thinning based methods, medial line extraction methods, surveyed in [7],
were also popular in the early days of vectorization. Methods of this class extract image
contours (the edges of the shape) before the medial axis between the two side edges is
found. The midpoint of two parallel lines is given by the midpoint of a perpendicular line
projected from one side to the other, and these midpoints are the coordinates which
represent vectors [5]. The medial line extraction method often misses pairs of contour
lines at branches, as shown in Figure 3 [6]; consequently, it fails to find the midpoint
of parallel lines [8].

Figure 3: Defects of the medial line extraction method

Other classes of non-thinning based methods that also preserve line width have been
developed recently [5]. These include run graph based methods, mesh pattern based methods,
and the Orthogonal Zig-Zag (OZZ) method. These methods are not covered in this paper; we
work only with the two methods described above. Both thinning based methods and medial
line extraction methods have disadvantages.
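The iterative boundary erosion used by thinning based methods can be illustrated with a
short sketch. The paper does not say which thinning algorithm it uses, so the code below
swaps in the well-known Zhang-Suen two-pass scheme purely as an example. The function name
zhang_suen_thin, the list-of-lists image layout (1 = line pixel, 0 = background) and the
requirement of a one-pixel background border are assumptions made for this illustration.

```python
def zhang_suen_thin(image):
    """Thin a binary image (rows of 0/1) towards a one-pixel-wide skeleton.

    Illustrative Zhang-Suen thinning; not necessarily the algorithm of the paper.
    Assumes the outermost rows and columns are background (0).
    """
    img = [row[:] for row in image]  # work on a copy, leave the input untouched
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: the 8 neighbours, clockwise, starting with the pixel above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # two sub-iterations peel opposite sides of the shape
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    count = sum(n)  # number of foreground neighbours
                    # 0 -> 1 transitions in the circular sequence P2, P3, ..., P9, P2
                    transitions = sum(
                        1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1
                    )
                    p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
                    if step == 0:
                        side_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        side_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= count <= 6 and transitions == 1 and side_ok:
                        to_clear.append((r, c))
            # Erase all marked boundary pixels at once, as one "onion layer".
            for r, c in to_clear:
                img[r][c] = 0
            if to_clear:
                changed = True
    return img


# Example: a 3-pixel-thick horizontal bar on a 7 x 12 background.
bar = [[1 if 2 <= r <= 4 and 2 <= c <= 9 else 0 for c in range(12)] for r in range(7)]
for row in zhang_suen_thin(bar):
    print("".join("#" if v else "." for v in row))
```

Marking pixels first and erasing them only after the scan keeps each pass independent of
the scan order, which is what lets the erosion proceed layer by layer. The skeleton would
still need the polygonal approximation step mentioned above to become a vector.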

These disadvantages lead to a failure in fitting a line properly. The thinning method is
able to maintain connectivity but loses shape information. Interestingly, the medial line
extraction method has the complementary features; that is, it maintains shape information
but tends to lose line connectivity. If the two methods are combined, good-quality
extracted lines can be obtained.

4. PROPOSED VECTORIZATION PROCESS

The following is an implementation of the line fitting concept. The method has been
carefully designed to offer practical performance with both acceptable processing speed
and good vector quality. Figure 4 shows a flowchart for the whole procedure [5].

Figure 4: Flowchart of line fitting method based on contours and skeletons

Basic vectorization requires the following tasks:

(1) Linking short line segments into longer integrated ones.

(2) Correcting the defects at junctions.

(3) Modifying vector attributes such as endpoints, intermediate points and line width.

Linking short line segments into longer ones may yield the correct line width and overcome
some junction problems. Other defects at junctions, such as corners and branches, are
subject to special processing [9]. The precise intersection points, i.e. the endpoints of
the vectors, are calculated.

The combined method has several steps. The first step is preprocessing, in which the lines
are found in the original raster image. The second is gap filling between adjacent lines.

4.1 PREPROCESSING

     •    A scanned line drawing is converted from binary raster image data to run length
          code.
     •    The run length code is processed into skeletons and tracked for contours.
     •    Each skeleton fragment is linked to the neighboring contour.
     •    The result is processed into skeleton and contour fragments respectively.

4.2 GAP FILLING

In a contour image the contour lines are split and the different contour levels are
written in the gap. This causes problems in automatic vectorization of images. Since the
text is erased and not taken into account while vectorizing, the final
output has gaps in between lines. Gaps are also produced due to noise. Thus gap filling
should be given prime importance after the image has been processed into skeleton and
contour fragments. A poor-quality line drawing often has gaps which prevent correct vector
extraction [10]. The following algorithm shows the steps for gap filling.

Step 1: Read the input and get the x and y coordinates of the line.

Step 2: Get the x and y coordinates of the endpoints.

Step 3: Find the distance between endpoints. After finding the endpoints we find the
distance between them using the Euclidean distance formula,
$D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
where D is the Euclidean distance and (x1, y1) and (x2, y2) are the endpoints.

Step 4: If distance < threshold, then set the threshold; otherwise stop.

Step 5: Set the threshold.

Step 6: Get the x and y coordinates of the endpoints and of five adjacent points on the
corresponding line; this gives the x and y coordinates of the endpoints whose distance is
less than the set threshold.

Step 7: If any of the distances are equal, go to Step 8 (slope function); otherwise go to
Step 9 (Least Square Parabola).

Step 8: Slope function.

Step 9: Least Square Parabola fitting.

Using these coordinates we perform least square parabola fitting to get the values of the
coefficients a, b and c. Using the values of a, b and c and the x coordinates of the two
lines we can get an approximate value of y. There are other cases where we can directly
extend the line and do not have to approximate the curve. The x and y coordinates are
chosen based on four cases. Let (x1, y1) and (x2, y2) be the endpoints of two lines whose
distance is less than the threshold value.

Figure 5: Example of Gapfilling

Consider Figure 6, in which the endpoints are highlighted in red. Here we can see that
x1 ≠ x2 and y1 ≠ y2 and x1 ≠ y1 and x2 ≠ y2. In this case, since x ≠ y, we cannot connect
the endpoints using a straight line, and so we use the Least Square Parabola to
interpolate the points in between the endpoints. Using the x and y coordinates of the two
lines we get the values of a, b and c by the steps explained above. After we get the
values of a, b and c, we increment the minimum value of x by 1 until it reaches the
maximum value of x, and substitute each value of x into the parabola equation to get the
corresponding y value.
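Steps 3 to 9 for the parabola case can be sketched as follows. This is a minimal
illustration under stated assumptions, not the authors' implementation: the function
names, the representation of a line as an ordered list of integer (x, y) pixels, and the
convention that the gap lies between the last point of line_a and the first point of
line_b are all hypothetical. The paper gives no numeric threshold, so it is a parameter
here, and the slope function of Step 8 is not covered. The round-half-up helper follows
the rounding rule the paper states for raster output.

```python
import math


def euclidean_distance(p, q):
    """Step 3: Euclidean distance D between endpoints p = (x1, y1) and q = (x2, y2)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def fit_parabola(points):
    """Step 9: least-squares fit of y = a + b*x + c*x**2, returning (a, b, c).

    Solves the 3 x 3 normal equations by Gauss-Jordan elimination.
    """
    s = [sum(x ** k for x, _ in points) for k in range(5)]      # sums of x^k
    t = [sum(y * x ** k for x, y in points) for k in range(3)]  # sums of y*x^k
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        if abs(m[pivot][i]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(3):
            if r != i:
                factor = m[r][i] / m[i][i]
                m[r] = [vr - factor * vi for vr, vi in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def round_half_up(value):
    """Round to the nearest integer, with fractions of 0.5 and above going up."""
    return math.floor(value + 0.5)


def fill_gap(line_a, line_b, threshold, n_adjacent=5):
    """Return the pixels that bridge the gap between line_a's end and line_b's start.

    An empty list means the endpoints are too far apart (Step 4) or already adjacent.
    """
    end_a, end_b = line_a[-1], line_b[0]
    if euclidean_distance(end_a, end_b) >= threshold:
        return []
    # Step 6: each endpoint plus (up to) five adjacent points on its own line.
    support = line_a[-(n_adjacent + 1):] + line_b[:n_adjacent + 1]
    a, b, c = fit_parabola(support)
    x_low, x_high = sorted((end_a[0], end_b[0]))
    # Walk x in steps of 1 between the endpoints and evaluate the fitted parabola.
    return [(x, round_half_up(a + b * x + c * x * x)) for x in range(x_low + 1, x_high)]


# Example: two pieces of the curve y = x^2 with the pixels at x = 6 and x = 7 missing.
left = [(x, x * x) for x in range(0, 6)]
right = [(x, x * x) for x in range(8, 14)]
print(fill_gap(left, right, threshold=50))
```

Fitting y as a function of x only works when the supporting points have at least three
distinct x values, which is why a near-vertical gap would need a different treatment such
as the direct line extension the paper mentions for the other cases.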

Figure 6: Case 4: Gap Filling

The parabola is given by

$f(x) = a + bx + cx^2$

where f(x) = y. The least squares method uses this equation to get the parabola graph.
After getting the value of y we approximate the number to a natural number. The condition
for approximation is that if the decimal part is greater than or equal to 0.5, the value
is rounded up to the next integer, and if it is less than 0.5, the value is rounded down.
For example, if the value of y is 4.75 it is approximated to 5, and if the value of y is
4.30 it is approximated to 4. An example of gap filling for this case is shown in
Figure 7.

Figure 7: Example of Gapfilling

Rounding or approximating the number is done only for raster images and not for vector
data, since there is no need to rasterize the curve. LSP is used only for case four,
because in the other cases we get the exact coordinates by just extending the line and we
do not have to interpolate over the x coordinates to get the corresponding y coordinates.

5. RESULTS

The results obtained using the methods discussed above are shown below. Figure 8 is the
scanned image and Figure 9 the corresponding gap filled image. Since this is an iterative
process, all the gaps that are within the threshold are filled.

Figure 8: Contour Image with Gap

Figure 9: Gap Filled Contour Image

6. CONCLUSION

In this paper, we have discussed line formation, which has been done through the
combination of line fragment and contour fragment algorithms for building a vectorization
method which leads to filling the gaps between the lines. More specifically, the gaps
between the lines have been filled by the Least Square
Parabola fitting algorithm. The result of this method has been applied to the correction
of a scanned map, shown in Figures 8 and 9.

7. REFERENCES

[1] J. Jimenez and J. L. Navalon, "Some Experiments in Image Vectorization," IBM J. Res.
Develop., 26, pp. 724-734, 1982; R. O. Duda and P. E. Hart, "Use of Hough transformation
to detect lines and curves in pictures," Commun. ACM, 15(1), pp. 11-15, 1972.

[2] R. W. Smith, "Computer Processing of Line Images: A Survey," Pattern Recognition,
20(1), pp. 7-15, 1987.

[3] R. Kasturi, S. Siva, and L. O'Gorman, "Techniques for Line Drawing Interpretation: An
Overview," Proc. IAPR Workshop on Machine Vision Applications, pp. 151-160, 1990.

[4] H. Tamura, "A Comparison of Line Thinning Algorithms from Digital Geometry
Viewpoint," Proc. 4th Int. Jt. Conf. on Pattern Recognition, Kyoto, Japan, pp. 715-719,
IEEE, 1978.

[5] F. Chang, Y.-C. Lu, and T. Pavlidis, "Feature Analysis Using Line Sweep Thinning
Algorithm," IEEE Transactions on PAMI, 21(2), pp. 145-158, Feb. 1999.

[6] H. Tamura, "A Comparison of Line Thinning Algorithms from Digital Geometry
Viewpoint," Proc. 4th Int. Jt. Conf. on Pattern Recognition, Kyoto, Japan, pp. 715-719,
1978.

[7] R. Kasturi, S. T. Bow, W. El-Masri, J. Shah, J. R. Gattiker, and U. B. Mokate, "A
System for Interpretation of Line Drawings," IEEE Trans. on PAMI, 12(10), pp. 978-992,
1990.

[8] G. Borgefors, "Distance Transforms in Digital Images," Computer Vision, Graphics and
Image Processing, 34, pp. 344-371, 1986.

[9] J. Canny, "A Computational Approach to Edge Detection," IEEE Transactions on PAMI,
8(6), pp. 679-698, 1986.

[10] R. W. Smith, "Computer Processing of Line Images: A Survey," Pattern Recognition,
20(1), pp. 7-15, 1987.
