An improved maze algorithm in contour extraction of CAPTCHAs

LIU Jia-lin (1), LIANG Kai-fa (1), CHEN Shuang-ping (1, *)

(1) Zhuhai College, Jinan University, Guangdong, China

Abstract: This paper proposes, for the first time, an improved maze algorithm that is combined with rotation invariance theory to extract the contours of CAPTCHAs whose characters are slanted and randomly positioned. First, the CAPTCHA image is treated as a maze in which white pixels represent potential routes. Second, the search proceeds through a predefined order of directions, with the "left-down" direction first, and every pixel that has been checked is recorded. Finally, the clockwise rotating projection of the CAPTCHA is taken within the region closed by the paths, which yields contour features that are insensitive to rotation, random position, slant and other factors. Experiments show that the algorithm performs better on slanted, randomly positioned CAPTCHAs: its fuller extraction of contour features reduces errors and improves the recognition rate.

Keywords: Maze Algorithm; Contour; CAPTCHAs

I. INTRODUCTION

CAPTCHA recognition is an application of the reverse Turing test. Most current CAPTCHAs have a complex background, random character positions and other features such as slant and distortion. Research on CAPTCHAs is also of great significance to Artificial Intelligence, Image Recognition, Pattern Matching and related fields [1].

In CAPTCHA recognition, extracting the features of the characters' contours plays a key role, not only in segmentation but also in matching and recognition. Numerous contour extraction methods have been reported in the literature, such as Horizontal Projection in [7]; the Refinement algorithm that [2] proposed on the basis of [7]; "Drop fall" in [3]; the Outline Curvature method in [4]; the Contour Tracing algorithm based on Edge Detection in [5]; the Surface Reconstruction algorithm in [6]; and Mimetic Water Erosion. The contour repair that these methods perform on top of edge detection [8] is effective to a degree, but on the whole they cannot guarantee a sufficient extraction of the CAPTCHAs' contour features, do not truly reflect the trend of the contour, and easily mistake points outside the boundary for contour points. After a thorough evaluation of the strengths of these methods, this paper proposes an improved maze algorithm to extract the contour features of CAPTCHAs that are slanted and randomly positioned.

II. DEFINITION OF CAPTCHAs CONTOUR

Like all other characters, the characters in CAPTCHAs have distinguishable contour features, and these features show a "trend" along the outer contour. The trend of a CAPTCHA contour helps to identify the individual character. Therefore, in order to segment the different characters, we need a program that tracks the trend of the outline.

[Figure 1: pixel diagram in which pairs of adjacent pixels are labelled 1, 2, 3 and 4]
Figure 1

As shown in Figure 1, the trend has four forms:

H: 1-1
U: 2-2
R: 3-3
L: 4-4

Figure 2

In Figure 2, eight contour trends are defined, and the adjacency relationships of the pixels in the figure constitute the following collection:

$U = \{r_h, r_u, r_l, r_r\}$

Here $r_h$ is the horizontal trend; $r_u$ is the vertical trend; $r_l$ is the up-right (down-left) trend; $r_r$ is the down-right (up-left) trend.
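The four trend classes defined in Section II can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names `classify_trend` and `trend_sequence`, the `(x, y)` tuples, and the image convention that y grows downward are not taken from the paper.

```python
def classify_trend(p, q):
    """Classify the step from pixel p to an 8-adjacent pixel q.

    Assumption of this sketch: pixels are (x, y) tuples and y grows
    downward, as is usual for image coordinates.
    Returns one of "rh", "ru", "rl", "rr" (the labels used in Section II).
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    if max(abs(dx), abs(dy)) != 1:
        raise ValueError("pixels must be distinct and 8-adjacent")
    if dy == 0:
        return "rh"  # horizontal trend
    if dx == 0:
        return "ru"  # vertical trend
    # Diagonal step. With y pointing down, up-right is dx > 0, dy < 0 and
    # down-left is dx < 0, dy > 0, so both have a negative product.
    return "rl" if dx * dy < 0 else "rr"


def trend_sequence(contour):
    """Map an ordered list of contour pixels to the list of trends between neighbours."""
    return [classify_trend(a, b) for a, b in zip(contour, contour[1:])]
```

For example, `trend_sequence([(0, 0), (1, 0), (2, -1), (2, -2)])` yields a horizontal step, an up-right step and a vertical step.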
III. THE INTRODUCTION OF THE MAZE ALGORITHM

The maze algorithm is used to solve the shortest path problem and is a common algorithm in ACM contests. After evaluating its efficiency, we developed an improved maze algorithm and applied it to the feature extraction of CAPTCHA contours. In the rest of this section we first discuss the maze algorithm, then show how to improve it and how to use it in the feature extraction of CAPTCHAs.

A. The analysis of Maze algorithm

Finding a route through a maze is a search process. At any time and any position in the maze, the computer examines the next position in each of eight directions (east, southeast, south, southwest, west, northwest, north, northeast). If the next position can be passed and has not been checked before (the checked positions must be recorded as the search goes), the search moves forward one step and repeats the same method from the new position. If all 8 directions (Figure 3) have been tried and no access has been found, the search goes back one step and tries the next direction from the previous position [11].

[Figure 3: the eight neighbours of the current pixel, numbered 1 2 3 / 4 5 / 6 7 8 row by row, drawn in a coordinate system with origin O and axes X and Y]
Figure 3

At every step of the search, the program checks whether it has arrived at the exit of the maze. If so, it stops the search and returns; if not, it continues to search. If it gets back to the beginning of the maze, the maze has no access route.

The maze algorithm is in essence a sequence of stack operations: accessible positions are continuously pushed onto the stack, so that the stack always holds the current route [13]. The algorithm is described below:

    Maze_path () {
        Push(entrance); mark entrance as done;
        While (s != empty) {                  // s is the stack holding the route
            cur = Top();
            If (cur == obj) { return s; }     // exit reached
            i = 0;
            While (i <= 7) {
                tem = cur + d[i];
                If (tem != black && tem != done) {
                    mark tem as done;         // avoid repetition
                    Push(tem);                // go forward one step
                    break;
                }
                i++;                          // search in the next direction
            }
            If (i > 7) { Pop(); }             // all 8 directions failed: go back one step
        }
        return null;                          // no access route
    }

B. Improved Maze algorithm

The maze algorithm solves the problem of finding the shortest path from entrance to exit in an unknown environment (such as a maze). To fit the practical situation of CAPTCHAs, we improved the algorithm as follows:

First, treat the CAPTCHA picture as a maze (one complete, separate maze is constructed between every two adjacent characters) and remove the noise by preprocessing the picture;

Secondly, look for the "entrance" of every maze in the picture: the entrance is the leftmost and topmost position in the maze; seal up the region to the left of the entrance and take the picture boundary as the exit;

Finally, find the next character's contour by using the contour to its left.

Based on these ideas, the improved maze algorithm captures the contour features of CAPTCHAs sufficiently. Its steps are as follows:

1) Define the preferred directions of search (the left-down direction has priority);
2) While segmenting, find the leftmost and topmost entrance (TopLeft_point) using an edge recognition strategy, and fill this position's horizontal projection line with black pixels;
3) Use the maze algorithm along the character's
outline to find an accessible route "s" that reaches the boundary of the image;
4) Mark the region to the left of the route "s" found in 3);
5) Repeat 2), 3) and 4) until the whole CAPTCHA picture has been segmented.

IV. FEATURE EXTRACTION BASED ON ROTATION INVARIANCE

After contour extraction with the improved maze algorithm proposed in Section III, the complete contour feature of every character is stored in memory separately, which makes the following rotation-invariant feature extraction steps possible.

A. The definition of Rotation Invariance

Rotation invariance means that "Shapes can be converted to time series. The distance from every point on the profile to the center is measured and treated as the Y-axis of a time series of length n" [12]. Converting the contour of an image into a time series represents every feature of the CAPTCHA contour well and preserves the overall features of the CAPTCHA better.

Based on this theory, and in order to handle the slant and rotation found in some CAPTCHAs, this paper uses a "circle" method to extract the features of the CAPTCHA contour, with "effective rotation" to reduce computation. We therefore need to obtain the center of the character and then compute the distance between the contour boundary and the center. In practice, the segmented character is first scaled by a certain proportion and placed inside a circle of fixed diameter so that it is inscribed in the circle. Rays are then cast from the center of the circle, with the same fixed angle between every two adjacent rays. Every ray reaches the contour, so along each ray, from the center toward the outer contour, we record the crossing point between the ray and the character outline that is nearest to the center. A total of n records are saved in sequence:

$K = \{k_1, k_2, k_3, \ldots, k_n\}$

Finally, the initial point of this sequence is changed to obtain n different cases. The n-moment matrix is as follows:

\[
\begin{pmatrix}
k_1 & k_2 & \cdots & k_n \\
k_2 & k_3 & \cdots & k_1 \\
k_3 & k_4 & \cdots & k_2 \\
\vdots & \vdots & \ddots & \vdots \\
k_n & k_1 & \cdots & k_{n-1}
\end{pmatrix}
\]

These matrices are matched against each other to find the best matching sequences. The time complexity is $O(n^2)$.

B. Rotation-invariant feature extraction

In order to avoid unnecessary computation, this paper defines an "effective rotation": in practice, no character tilts by more than 90 degrees. The scale of the matrix can therefore be reduced by half. The algorithm consists of the following steps:

1) Calculate the minimum circle in which the character is inscribed, using the segmentation results of Section III;
2) Calculate the circle's center;
3) Cast rays from the center to the contour at a fixed angular increment and record the nearest point on each ray;
4) Construct the n-moment matrix;
5) Repeat steps 1-4 until the feature repository is sufficient;
6) Match: compute the minimum distance between the different n-moment matrices to obtain the matching result.

V. EXPERIMENT

Experiment environment: Intel(R) Pentium(R) Dual CPU T2330 @1.60GHz, 0.99GB memory; Compiler: Visual Studio 2010; Programming language: C#; Data: 500 CAPTCHAs of each kind, collected from the Internet.

A. Contour Extraction Effect

The improved maze algorithm was applied to contour extraction on typical CAPTCHAs collected from the Internet. Figure 4 shows that the algorithm handles CAPTCHAs with random character positions and tilted characters well:

Figure 4

B. The statistical recognition rate of different combinations of algorithms

In order to compare the efficiency of our work with that of others, we used several combinations of different algorithms. Contour extraction algorithms used in the experiment: the Projection method and the Improved Maze
Algorithm advanced in this paper; the feature extraction algorithms are the Histogram method and the Rotation Invariant Algorithm used in this paper. The 4 CAPTCHA recognition methods are:

1: Projection + Histogram;
2: Projection + Rotation Invariant method;
3: Improved Maze Algorithm + Histogram;
4: Improved Maze Algorithm + Rotation Invariant algorithm, which is the best combination and is first proposed in this paper.

The experimental data consist of three types of CAPTCHAs that differ greatly from each other. The recognition rates of algorithm combinations 1 to 4 are shown below:

[Figure 5: chart titled "Recognition Rates of Different Algorithm Combination"; horizontal axis: Algorithm Combination, 1 to 4; vertical axis: recognition rate, 0 to 0.8; legend entries include 4399 and 51com]
Figure 5

The comparison in Figure 5 shows that the algorithm proposed in this paper achieves the highest recognition rate. Moreover, the more difficult the CAPTCHAs are, the more obvious the advantage of the Improved Maze Algorithm becomes. The difficulty can be read from the recognition rate of combination 1: the lower that rate, the more difficult the CAPTCHAs.

For the most difficult CAPTCHAs used in the experiment, more detailed data for the different algorithm combinations are listed in Table 1:

Table 1 Experiment Data of the 4 Algorithm Combinations

Alg    Ave_e (ms)    Ave_r (ms)    rate
1      0.80          0.27          0.43
2      1.60          89.73         0.50
3      1.62          0.48          0.47
4      1.99          169.42        0.82

In Table 1, Alg is the algorithm combination; Ave_e is the average time of extraction (or segmentation); Ave_r is the average time of recognition; rate is the average recognition rate.

Combination 4 brings no improvement in average extraction time over the other combinations, and its average recognition time is considerably longer (169.42 ms, against 0.27 ms for combination 1). This is expected, since the more complete the features are, the more computation is needed. More importantly, the Improved Maze algorithm greatly raises the recognition rate, to nearly double that of combination 1 (0.82 against 0.43), which achieves our original and most important goal.

VI. CONCLUSION

This paper is the first to apply an improved maze algorithm to the contour feature extraction of CAPTCHAs. By combining it with Rotation Invariance theory, we put forward a better method to recognize CAPTCHAs that are slanted, randomly positioned or otherwise difficult. Experimental results show that the algorithms extract contours more effectively and preserve the contour features of CAPTCHAs sufficiently. Moreover, the combined algorithms achieve a large improvement in recognition rate.

ACKNOWLEDGMENT

This work was supported by National College Innovative Experimental Project of Zhuhai College Jinan University.

* Corresponding author

REFERENCES

[1] ZHAO Yong-tao, LI Zhi-min, WANG Hong-jian, CHEN Zhi-yun, WANG Lin. Image Preprocessed Study of the Seal Imprint Verification [J]. Chinese Journal of Scientific Instrument, 2004, 25 (4): 401-405 (in Chinese)
[2] YU Ming, ZHANG Yan-yun, XUE Cui-hong. Image segmentation algorithm of single handwritten Chinese characters [J]. Computer Engineering and Applications, 2010, 46 (9): 180-182 (in Chinese)
[3] S. Khan. Character Segmentation Heuristics of Check Amount Verification. Master Thesis, Massachusetts Institute of Technology, 1998.
[4] WANG Gui-xin, LIU Jian-sheng, JU Yan, WANG Tong-qing. Handwritten Character Recognition & Features Extraction from the Curvatures of Contour Points Based on Wavelet Transform [J]. Journal of Huazhong University of Science and Technology (Nature Science), 2006, 27 (1): 83-85 (in Chinese)
[5] MA Jin, CHEN Li-chao, ZHANG Yong-mei. A Research on Contour Tracking and Edge Detection Based On Image Automatic Recognition [J]. Journal of North University of China, 2006, 27 (5): 431-434 (in Chinese)
[6] ZHANG Tai-fa, ZHANG Hong-yan, ZHANG Ya-jiang. An Algorithm of outside Boundary Contour Tracking for Surface Reconstruction [J]. Computer Simulation, 2008, 25 (12): 239-241 (in Chinese)
[7] ZHU Yan-li, FU Jun-hui, SUN Yin-jie. A Declining License Character Extraction Method [J]. Control & Automation, 2008, 24 (1): 229-230 (in Chinese)
[8] ZHANG Lin-rui, JIA Yu-lin, CHENG Ke. Method of obtaining close exterior outlines of simple images [J]. Infrared and Laser Engineering, 2006, 35 (3) (in Chinese)
[9] LI Fuying. Visual C++ graphics and video processing and database of articles [M]. Beijing: China Electric Power Press, 2003: 15-44 (in Chinese)
[10] XU Ming. CAPTCHAs identification and anti-identification. Nanjing University undergraduate thesis, July 2007: 25-33 (in Chinese)
[11] JING Tian. Application and Implementation of Stack in the "maze" Algorithms [J]. Journal of Huainan Teachers College, 2007, 9 (3): 89-91 (in Chinese)
[12] Dragomir Yankov, Eamonn Keogh, Li Wei, Xiaopeng Xi. Fast Best-Match Shape Searching in Rotation Invariant Metric Spaces. Proceedings of the 7th SIAM International Conference on Data Mining (2007), Minneapolis, USA, 2007. Society for Industrial Mathematics, 2007: 611-616
[13] YAN Wei-min. Data Structure [M]. Beijing: Tsinghua University Press, 2008: 44-65 (in Chinese)
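As an illustration of the stack-based search described in Section III.A, the following Python sketch may help. It is not the authors' C# implementation: the name `maze_path`, the grid encoding (a truthy cell is a white, passable pixel), the `(x, y)` tuples with y growing downward, and the `directions` parameter are all assumptions of this sketch. Because the search is depth-first, it returns an accessible route, which is not necessarily the shortest one.

```python
# Neighbour offsets (dx, dy) in the order listed in Section III.A:
# east, southeast, south, southwest, west, northwest, north, northeast.
# Assumption: y grows downward, so "south" is dy = +1.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def maze_path(grid, start, goal, directions=DIRECTIONS):
    """Return a route from start to goal as a list of (x, y), or None.

    grid[y][x] is truthy for a white (passable) pixel and falsy for black.
    Positions outside the grid are treated as impassable.
    """
    height, width = len(grid), len(grid[0])
    sx, sy = start
    if not grid[sy][sx]:
        return None
    visited = {start}          # pixels already checked (avoids repetition)
    stack = [start]            # the stack always holds the current route
    while stack:
        x, y = stack[-1]
        if (x, y) == goal:
            return stack
        for dx, dy in directions:
            nx, ny = x + dx, y + dy
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and grid[ny][nx] and (nx, ny) not in visited:
                visited.add((nx, ny))
                stack.append((nx, ny))   # go forward one step
                break
        else:
            stack.pop()                  # all 8 directions failed: go back
    return None                          # back at the start: no route exists
```

One possible reading of the left-down priority in Section III.B is to pass a reordered `directions` list that starts with the southwest offset `(-1, 1)`; the paper does not spell out the full order, so that choice is left to the caller.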

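The matching step of Section IV can be sketched in the same spirit. The code below builds the matrix of circular shifts of a distance sequence K and compares two sequences under every allowed shift. The function names, the use of Euclidean distance, and the `max_shift` parameter are assumptions of this sketch; the paper only speaks of a "minimum distance" and of halving the matrix because characters tilt by less than 90 degrees.

```python
import math


def shift_matrix(k):
    """Rows are the circular shifts of k: row i starts at element k[i]."""
    return [k[i:] + k[:i] for i in range(len(k))]


def rotation_invariant_distance(a, b, max_shift=None):
    """Smallest Euclidean distance between a and a circular shift of b.

    With max_shift=None every shift is tried (n rows, n elements each).
    With max_shift=m only shifts from -m to +m are tried; for n rays over a
    full turn, m = n // 4 corresponds to rotations of at most 90 degrees
    either way, which is this sketch's reading of "effective rotation".
    """
    n = len(a)
    if n == 0 or len(b) != n:
        raise ValueError("sequences must be non-empty and of equal length")
    shifts = range(n) if max_shift is None else range(-max_shift, max_shift + 1)
    best = math.inf
    for s in shifts:
        # Python's % always returns a non-negative index, so negative
        # shifts wrap around correctly.
        row = [b[(i + s) % n] for i in range(n)]
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, row)))
        best = min(best, dist)
    return best
```

For instance, `shift_matrix([1, 2, 3])` gives the rows `[1, 2, 3]`, `[2, 3, 1]` and `[3, 1, 2]`, matching the row pattern of the n-moment matrix, and two sequences that differ only by a rotation have distance 0 when all shifts are allowed.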