					CAPTCHA Processing

CPRE 583 Fall 2010 Project
        CAPTCHA Processing
          Responsibilities
• Brian Washburn – loading the image into RAM and
  preprocessing, plus the related portion of the
  writeup/presentation

• Nicholas Rundle – text detection, the related portion of
  the writeup/presentation, and assembly of the
  writeup/presentation

• Daniel Uhrman – text recognition and the related portion
  of the writeup/presentation
          CAPTCHA Processing
              Motivation
The ever-increasing volume of spam e-mail has led to the
development of CAPTCHAs, which try to distinguish between
humans and computers. Making this distinction is becoming
more difficult as computer systems improve, so new CAPTCHA
systems that are harder for a computer to break are
necessary to maintain security. This project aims to break
current CAPTCHA systems in order to show the weaknesses
inherent in them and to motivate improvements to the
current designs.
CAPTCHA Processing
     Design


[Figure: system overview. An input image goes into the FPGA, which outputs the recognized text.]
              Interface Design
There are two main interfaces into the system:
    1. Ethernet to/from the PowerPC 440 (PPC)
    2. Loads and stores between the PPC and the Auxiliary
       Processor Unit (APU)

[Figure: Terminal <-> Ethernet <-> PowerPC 440 <-> Load/Store <-> Auxiliary Processor Unit]
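         File Transfer Protocol
[Sequence diagram, Client and Server]
Client -> Server:   TCP connect
Server -> Client:   “220”
Client -> Server:   “AUTH”
Server -> Client:   “234”
Client -> Server:   “USER Captcha Group”
Server -> Client:   “230”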
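A minimal C sketch of the server side of this login exchange. The function names `ftp_greeting` and `ftp_login_reply` and the text after each reply code are hypothetical; only the numeric codes come from the slide. A standard FTP server would normally answer USER with 331 and wait for PASS, and a real AUTH would start a TLS negotiation; the sequence shown goes straight to 230, so this sketch does the same and ignores TLS.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical: greeting sent as soon as the TCP connection is accepted. */
const char *ftp_greeting(void)
{
    return "220 Service ready\r\n";
}

/* Hypothetical: reply to one client command in the login sequence above.
 * FTP command names are case-insensitive, hence strncasecmp. */
const char *ftp_login_reply(const char *cmd)
{
    assert(cmd != NULL);
    if (strncasecmp(cmd, "AUTH", 4) == 0)
        return "234 AUTH accepted\r\n";     /* TLS negotiation not modeled */
    if (strncasecmp(cmd, "USER", 4) == 0)
        return "230 User logged in\r\n";    /* no PASS step, as on the slide */
    return "502 Command not implemented\r\n";
}
```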
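                     Passive FTP
[Sequence diagram, Client, Server Control Port, and Server Data Port]
Client -> Server control port:   PASV
Server control port -> Client:   227 (IP, IP, IP, IP, Port, Port)
Client -> Server data port:      connect to that address
Server data port -> Client:      ACK
Client <-> Server data port:     DATA
Data connection terminated
Server control port -> Client:   “226 Success”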
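The 227 reply packs the data-port address into six comma-separated numbers: the four IP octets followed by the port's high and low bytes, so the port is p1 * 256 + p2 (RFC 959). A minimal C sketch of how a client could decode it follows; `parse_pasv_reply` is a hypothetical name, and the reply text in the example is the common form rather than something taken from the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse a 227 reply such as "227 Entering Passive Mode (192,168,1,10,19,137)".
 * Writes the dotted IP into ip (16 bytes) and the data port into *port.
 * Returns 0 on success, -1 if the reply is not a well-formed 227. */
int parse_pasv_reply(const char *reply, char ip[16], uint16_t *port)
{
    unsigned h1, h2, h3, h4, p1, p2;
    const char *p;

    assert(reply != NULL && ip != NULL && port != NULL);
    if (strncmp(reply, "227", 3) != 0)
        return -1;
    /* Skip the code and any text up to the first digit of h1. */
    for (p = reply + 3; *p && !isdigit((unsigned char)*p); p++)
        ;
    /* " ," lets optional spaces appear before each comma. */
    if (sscanf(p, "%u ,%u ,%u ,%u ,%u ,%u", &h1, &h2, &h3, &h4, &p1, &p2) != 6)
        return -1;
    if (h1 > 255 || h2 > 255 || h3 > 255 || h4 > 255 || p1 > 255 || p2 > 255)
        return -1;
    snprintf(ip, 16, "%u.%u.%u.%u", h1, h2, h3, h4);
    *port = (uint16_t)(p1 * 256u + p2);   /* high byte, then low byte */
    return 0;
}
```

For example, `(192,168,1,10,19,137)` decodes to 192.168.1.10, port 19 * 256 + 137 = 5001.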
Features of the Xilinx llwip4 library
                (Lightweight IP)

• Standard Berkeley model for sockets
  – lwip_socket() (SOCK_STREAM for TCP)
  – lwip_bind()
  – lwip_listen()
  – lwip_accept()
  – read()
  – lwip_write()
  – close()
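The call sequence above is the usual Berkeley one. Below is a minimal sketch using the POSIX names so it can run on a desktop; lwIP's `lwip_socket()`, `lwip_bind()`, `lwip_listen()`, `lwip_accept()` and `lwip_write()` are intended to mirror these BSD signatures. The loopback address, the "port 0" choice, and the helper names `open_tcp_listener`, `bound_port`, and `serve_one` are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Create a TCP socket, bind it to loopback:port (0 lets the OS choose),
 * and start listening. Returns the descriptor, or -1 on failure. */
int open_tcp_listener(uint16_t port, int backlog)
{
    struct sockaddr_in addr;
    int fd;

    assert(backlog > 0);
    fd = socket(AF_INET, SOCK_STREAM, 0);          /* SOCK_STREAM = TCP */
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* assumption: localhost only */
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Port actually bound (useful when open_tcp_listener was given 0). */
uint16_t bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}

/* Accept one client, write the whole reply line, then close the client. */
int serve_one(int listen_fd, const char *reply)
{
    size_t len = strlen(reply), sent = 0;
    int client = accept(listen_fd, NULL, NULL);
    if (client < 0)
        return -1;
    while (sent < len) {                           /* write() may send only part */
        ssize_t n = write(client, reply + sent, len - sent);
        if (n <= 0) {
            close(client);
            return -1;
        }
        sent += (size_t)n;
    }
    close(client);
    return 0;
}
```

On the board, the same sequence would presumably use the `lwip_`-prefixed calls inside a kernel thread rather than the POSIX ones.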
              xilkernel library
• Features an easy threading model
• Pthread-like mutexes
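A minimal sketch of pthread-style threading and mutual exclusion, written against standard POSIX threads. xilkernel exposes a POSIX-like subset; whether it supports every call used here (for example the static `PTHREAD_MUTEX_INITIALIZER`) is an assumption to check against its documentation. The shared counter and the names `worker` and `run_two_workers` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Shared state guarded by a mutex, e.g. a count of processed images. */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static long images_processed = 0;

/* Each worker increments the shared counter n times under the lock. */
static void *worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&count_lock);
        images_processed++;                 /* critical section */
        pthread_mutex_unlock(&count_lock);
    }
    return NULL;
}

/* Start two workers, wait for both, and return the final count. */
long run_two_workers(int per_thread)
{
    pthread_t a, b;

    assert(per_thread >= 0);
    images_processed = 0;
    if (pthread_create(&a, NULL, worker, &per_thread) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, &per_thread) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return images_processed;
}
```

Without the mutex, the two increments could interleave and lose updates; with it, the result is always exactly twice `per_thread`.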
[Figure: FTP server thread structure, showing an FTP Server Thread, a Control Port Listen Thread, a Data Port Listen Thread, and the Process Control Port and Process Data Port stages]
                Captcha Controller
• Our controller coordinates dataflow between all of the
  subsystems

[Figure: the Auxiliary Processor Unit, Segmenter, and Classifier, each connected to a BRAM]
         Future PPC Work
• The PowerPC can be used for preprocessing:
  – Noise reduction
  – Edge detection
  – Color correction
• It could also parse the headers of image files and pass
  the extracted data along to the hardware in a consistent
  form
            Segmenter Unit
• Searches the columns of the input image for the edges of
  letters and copies the columns between those edges into
  BRAM.

• For uniformity, each output letter is a fixed 32x32
  block; unused columns on the right are filled with white
  pixels.
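The column search can be sketched in C as below. Assumptions not stated on the slides: the input is a binary image exactly 32 rows tall stored row-major, 1 marks a black (letter) pixel and 0 a white one, a letter ends at the first all-white column, and letters wider than 32 columns are truncated. `segment_columns` is a hypothetical name; the real unit works on BRAM words rather than a C array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHAR_SIZE 32   /* output letters are 32x32 */
#define MAX_CHARS 8    /* the classifier accepts up to 8 characters */
#define INK       1    /* assumption: 1 = black (letter) pixel */
#define WHITE     0

/* True if any pixel in column col is ink. */
static int column_has_ink(const uint8_t *img, int width, int col)
{
    for (int r = 0; r < CHAR_SIZE; r++)
        if (img[r * width + col] == INK)
            return 1;
    return 0;
}

/* Copy each run of inked columns into its own 32x32 block, left-aligned.
 * Returns the number of letters written to out. */
int segment_columns(const uint8_t *img, int width,
                    uint8_t out[MAX_CHARS][CHAR_SIZE][CHAR_SIZE])
{
    int count = 0, col = 0;

    assert(img != NULL && out != NULL && width > 0);
    memset(out, WHITE, MAX_CHARS * sizeof out[0]);   /* right fill = white */
    while (col < width && count < MAX_CHARS) {
        /* Skip blank columns up to the letter's left edge. */
        while (col < width && !column_has_ink(img, width, col))
            col++;
        if (col == width)
            break;
        /* Copy columns until the next blank column (the right edge). */
        int dst = 0;
        while (col < width && column_has_ink(img, width, col)) {
            if (dst < CHAR_SIZE)                     /* truncate wide letters */
                for (int r = 0; r < CHAR_SIZE; r++)
                    out[count][r][dst] = img[r * width + col];
            dst++;
            col++;
        }
        count++;
    }
    return count;
}
```

Because this sketch treats an all-white column as the boundary, letters that touch would come out as a single segment.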
                Segmenter Unit


[Figure: segmenter memory layout, showing the input BRAM starting at address 0 and the output BRAM with address 0 and address 32 marked]
            Segmentation
• Histogram thresholding
• Edge detection
• Region-based
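Histogram thresholding picks a gray level that separates ink from background. One common instance is Otsu's method, sketched below in C; the slide does not say which thresholding rule, if any, the project used, and the function name is hypothetical. It chooses the threshold t that maximizes w_lo * w_hi * (mean_lo - mean_hi)^2, which is proportional to the between-class variance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Otsu's method on 8-bit gray levels: return the threshold t that best
 * splits pixels into a dark class (<= t) and a light class (> t).
 * Dark pixels would then be treated as ink. */
int otsu_threshold(const uint8_t *pixels, size_t n)
{
    size_t hist[256] = {0};
    size_t w_lo = 0;
    double sum_all = 0.0, sum_lo = 0.0, best_score = -1.0;
    int best_t = 0;

    assert(pixels != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        hist[pixels[i]]++;
    for (int v = 0; v < 256; v++)
        sum_all += (double)v * (double)hist[v];

    for (int t = 0; t < 256; t++) {
        w_lo += hist[t];                        /* pixels with value <= t */
        sum_lo += (double)t * (double)hist[t];
        if (w_lo == 0)
            continue;
        size_t w_hi = n - w_lo;                 /* pixels with value > t */
        if (w_hi == 0)
            break;
        double mean_lo = sum_lo / (double)w_lo;
        double mean_hi = (sum_all - sum_lo) / (double)w_hi;
        double diff = mean_lo - mean_hi;
        double score = (double)w_lo * (double)w_hi * diff * diff;
        if (score > best_score) {
            best_score = score;
            best_t = t;
        }
    }
    return best_t;
}
```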
               Classifier Unit
• Receives a signal from the Segmenter indicating that up
  to 8 characters were segmented successfully
• Reads the segmented characters from BRAM
• Compares each input character to 36 template characters
  (A-Z and 0-9)
• Outputs an array of up to 8 ASCII values
           Horizontal Projection
• The segmented characters and template characters
  are analyzed using HP (horizontal projection).
• The HP of an image is computed by summing the pixel
  values in each horizontal row.
• For our 32x32 binary images, each HP is an array of 32
  entries, and each entry is a sum between 0 and 32.
       Classifier Template BRAM
• The expected HP values are pre-calculated for each
  template character.
• These values are stored in a ROM built from a BRAM IP
  core and initialized from a .COE file.
• The input images from the segmenter are read from
  BRAM and compared to each of the template characters
  to find the best match.
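Precomputing the template HPs and writing them out as a .COE file could be done on a host PC. The sketch below writes the `memory_initialization_radix` / `memory_initialization_vector` format that Xilinx memory cores read; the one-value-per-word layout (template k in words k*32 to k*32+31) and the function name `write_template_coe` are assumptions, not the project's documented layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_TEMPLATES 36   /* A-Z and 0-9 */
#define HP_SIZE       32   /* HP entries per template */

/* Write count decimal values, one per memory word, as a .COE file.
 * Returns 0 on success, -1 if a write fails. */
int write_template_coe(FILE *f, const uint8_t *hp_values, size_t count)
{
    assert(f != NULL && hp_values != NULL && count > 0);
    if (fprintf(f, "memory_initialization_radix=10;\n") < 0)
        return -1;
    if (fprintf(f, "memory_initialization_vector=\n") < 0)
        return -1;
    for (size_t i = 0; i < count; i++) {
        /* Entries are comma separated; the last one ends with ';'. */
        if (fprintf(f, "%u%s\n", (unsigned)hp_values[i],
                    i + 1 < count ? "," : ";") < 0)
            return -1;
    }
    return 0;
}
```

For the full ROM, the call would be `write_template_coe(f, table, NUM_TEMPLATES * HP_SIZE)` with `table` holding all 36 precomputed HPs in template order.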
           Correlation Algorithm
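• The HP values are compared using the correlation
  coefficient from statistics:

$$ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left(N\sum X^2 - \left(\sum X\right)^2\right)\left(N\sum Y^2 - \left(\sum Y\right)^2\right)}} $$

 • Where: X and Y are the HP values for an input image
   and a given template, N is the length of the HP array,
   and each sum runs over the N array positions.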
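As a small hand-worked illustration of the correlation formula (made-up values with N = 3, not data from the project), take an input HP X = (1, 2, 3) and two hypothetical templates Y_1 = (1, 3, 2) and Y_2 = (1, 2, 3). All three arrays have $\sum = 6$ and a sum of squares of 14, while $\sum XY$ is 13 for Y_1 and 14 for Y_2:

$$ r_1 = \frac{3 \cdot 13 - 6 \cdot 6}{\sqrt{(3 \cdot 14 - 6^2)(3 \cdot 14 - 6^2)}} = \frac{3}{6} = 0.5 $$

$$ r_2 = \frac{3 \cdot 14 - 6 \cdot 6}{\sqrt{(3 \cdot 14 - 6^2)(3 \cdot 14 - 6^2)}} = \frac{6}{6} = 1 $$

Y_2 has the higher correlation, so it would be chosen as the best match.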
      Correlation Algorithm Cont’d
• Because of the following constraints, we used a modified
  form of the correlation equation:
   – No IP core for floating-point conversion in version 10.1 of the tools.
   – No IP core for an integer-based square root function.
   – Potential overflow from large summations and
     multiplications.
• The modified equation is implemented with 16 dedicated
  multipliers, 1 wider multiplier, and 1 dedicated divider.
      Potential Future Work
• Implement “learning” functionality in the classifier so
  that the template ROM becomes a RAM that can be updated
  based on the CAPTCHA techniques it observes.
• Use the CAPTCHA Detection Unit to recognize names on
  security badges or to identify license plates from speed
  cameras.
                Integration

• In its current form, the project works fully in ModelSim
  with various test inputs.

• In hardware, the project works up to the classifier. The
  classifier unit contains many multipliers and a pipelined
  divider, which are a potential source of timing problems.
  We are adding pipeline stages to address these timing
  issues.

				