   Ian Boggs
        What is CAPTCHA?
• Completely Automated Public Turing test
  to tell Computers and Humans Apart
• Really a reverse Turing test
• Something that is “easy” for humans to do
  but hard for computers
• All because of SPAM!
        What is CAPTCHA?
• Mostly focusing on visual CAPTCHA
      Text vs. Images (bitmaps)
• Images (bitmaps) are made of individual pixels:

                                Magnification of cat’s
     Text vs. Images (bitmaps)
• Each pixel specifies the
  color at that point in the image
• A grid of pixels forms an image
     Text vs. Images (bitmaps)
• Text (like in this presentation) is represented
  in the computer as a series of codes, e.g.
• “A” really means “code 65” to the computer
• Each character is a discrete unit of data.
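
A minimal Python sketch of the point above: every character in a string is stored as a numeric code. The sample word is only an illustration.

```python
# Each character is stored as a numeric code (its ASCII/Unicode code point).
text = "CAPTCHA"
codes = [ord(ch) for ch in text]

print(ord("A"))   # 65
print(codes)      # [67, 65, 80, 84, 67, 72, 65]

# Turning the codes back into characters recovers the original text.
print("".join(chr(code) for code in codes))  # CAPTCHA
```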
     Text vs. Images (bitmaps)
• Why can’t a computer read text in images?
  – To a computer, an image is just a grid of pixels
  – It doesn’t contain anything the computer understands as text
  – The computer can only tell the difference between two
    different pixels in an image.
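
As a rough sketch of what "just a grid of pixels" means, the hand-drawn 5x5 bitmap below (a hypothetical example, not taken from the slides) looks like a letter to a person, but the program only sees numbers and can only compare them.

```python
# A tiny black-and-white bitmap: 1 = dark pixel, 0 = light pixel.
# The glyph is a made-up example for illustration.
LETTER_T = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

def pixels_differ(image, p, q):
    """Return True if the pixels at (row, col) positions p and q differ."""
    return image[p[0]][p[1]] != image[q[0]][q[1]]

def render(image):
    """Draw the grid as text so a human can see the shape."""
    return "\n".join("".join("#" if px else "." for px in row) for row in image)

print(render(LETTER_T))
print(pixels_differ(LETTER_T, (0, 0), (1, 0)))  # True: dark vs. light
```

Nothing in the grid says "this is a T"; that meaning exists only for the human reader.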
     Text vs. Images (bitmaps)
• Optical Character Recognition
  – Look at an image of pixels
  – Find groups of pixels that contrast with the background
  – Compare group with known pattern of pixels
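
The three steps above can be sketched as a toy template matcher. This is only an illustration under simplifying assumptions (a clean image, characters separated by blank columns, three made-up 3x5 templates); real OCR software is far more elaborate.

```python
# Known patterns of pixels ("#" = dark, "." = background). Hypothetical glyphs.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

# Step 1: look at an image of pixels.
IMAGE = [
    "#...###.###",
    "#....#...#.",
    "#....#...#.",
    "#....#...#.",
    "###.###..#.",
]

def split_glyphs(rows):
    """Step 2: find groups of adjacent columns that contrast with the background."""
    height = len(rows)
    groups, current = [], []
    for col in range(len(rows[0])):
        column = "".join(row[col] for row in rows)
        if "#" in column:
            current.append(column)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    # Turn each group of columns back into a list of row strings.
    return [["".join(col[r] for col in group) for r in range(height)]
            for group in groups]

def score(glyph, template):
    """Count matching pixels; -1 if the sizes do not match at all."""
    if len(glyph) != len(template) or len(glyph[0]) != len(template[0]):
        return -1
    return sum(g == t for grow, trow in zip(glyph, template)
               for g, t in zip(grow, trow))

def recognize(rows):
    """Step 3: compare each group with the known patterns and keep the best."""
    return "".join(
        max(TEMPLATES, key=lambda name: score(glyph, TEMPLATES[name]))
        for glyph in split_glyphs(rows)
    )

print(recognize(IMAGE))  # LIT
```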

  Optical Character Recognition
• Easy to do on plain images:

    Image of Dr. Wu-chang Feng’s email address from his homepage

    Online OCR program had no problem converting into text
 Optical Character Recognition
• CAPTCHA breaks OCR algorithms
  – OCR can’t pick out patterns that look like text,
    or ends up misreading image noise as text.
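
A small sketch of why noise hurts, assuming a naive OCR step that separates characters by looking for blank columns (an assumption for illustration; the slides do not say which algorithm is used). One line of noise drawn across the image is enough to merge three characters into a single blob.

```python
def count_groups(rows):
    """Count runs of adjacent columns that contain at least one dark pixel."""
    groups, inside = 0, False
    for col in range(len(rows[0])):
        dark = any(row[col] == "#" for row in rows)
        if dark and not inside:
            groups += 1
        inside = dark
    return groups

# Clean image of three characters ("#" = dark, "." = background).
clean = [
    "#...###.###",
    "#....#...#.",
    "#....#...#.",
    "#....#...#.",
    "###.###..#.",
]

# The same image with one horizontal line of noise drawn through the middle.
noisy = [("#" * len(row)) if i == 2 else row for i, row in enumerate(clean)]

print(count_groups(clean))  # 3 separate characters found
print(count_groups(noisy))  # 1 blob: the characters can no longer be split
```

A person still reads the noisy image easily, which is exactly the gap a visual CAPTCHA relies on.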
    Problems with CAPTCHA
• From the user’s perspective:
  – Harder to get your internettin’ done
  – Possible to get locked out after misreading
    CAPTCHA enough times
  – Blind users at a serious disadvantage
     • But they have audio CAPTCHA!
  – Images are getting more complex because…
    Problems with CAPTCHA
• Visual CAPTCHA can be and has been broken
  – Must be easy enough for humans to read but
    too hard for computers to OCR.
  – Yahoo’s EZ-Gimpy:
    Problems with CAPTCHA
• Send the CAPTCHA image to a real human
  – People can decode hundreds of CAPTCHAs an hour
  – SPAM Bot sends CAPTCHA image to human
  – Human decodes and sends back
  – Strip tease CAPTCHA game
    Problems with CAPTCHA
• Reuse CAPTCHA image session id
  – Has to do with the HTTP protocol and cookies
    • Server sends CAPTCHA image with attached
      cookie containing Session ID to user.
    • Real person completes CAPTCHA and returns it but
      saves session ID cookie
    • Valid session ID = Completed CAPTCHA
    • Session ID is reused by spam bot and server thinks
      it’s a real person.
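
The session ID reuse described above can be sketched with a toy in-memory server. All names here (`CaptchaServer`, `issue_captcha`, `solve`, `submit_form`, the `single_use` flag) are hypothetical, and the single-use option is one common countermeasure added for contrast, not something the slides describe.

```python
import secrets

class CaptchaServer:
    """Toy model of a server that ties a CAPTCHA to a session ID cookie."""

    def __init__(self, single_use):
        self.single_use = single_use
        self.answers = {}    # session ID -> expected CAPTCHA text
        self.solved = set()  # session IDs whose CAPTCHA was completed

    def issue_captcha(self, answer):
        """Send a CAPTCHA and return the session ID that goes in the cookie."""
        session_id = secrets.token_hex(8)
        self.answers[session_id] = answer
        return session_id

    def solve(self, session_id, guess):
        """A real person completes the CAPTCHA for this session."""
        if self.answers.get(session_id) == guess:
            self.solved.add(session_id)
            return True
        return False

    def submit_form(self, session_id):
        """Accept a submission if the session has a completed CAPTCHA."""
        if session_id not in self.solved:
            return False
        if self.single_use:
            # Countermeasure (assumption): forget the session after one use.
            self.solved.discard(session_id)
            self.answers.pop(session_id, None)
        return True

# Vulnerable behaviour: valid session ID = completed CAPTCHA, forever.
server = CaptchaServer(single_use=False)
cookie = server.issue_captcha("xk7p")
server.solve(cookie, "xk7p")                           # done once by a human
print([server.submit_form(cookie) for _ in range(3)])  # [True, True, True]

# With single-use sessions the saved cookie only works once.
strict = CaptchaServer(single_use=True)
cookie = strict.issue_captcha("xk7p")
strict.solve(cookie, "xk7p")
print([strict.submit_form(cookie) for _ in range(3)])  # [True, False, False]
```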
      Alternative CAPTCHAs
• Have user do a simple math problem (What
  is two plus four?)
• Ask the user to select a “duck” from a series
  of images
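
A minimal sketch of the math-problem alternative, assuming single-digit addition spelled out in words as in the "two plus four" example. The function names are hypothetical.

```python
import random

WORDS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]

def make_question(rng):
    """Return a question string and the expected numeric answer."""
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    return f"What is {WORDS[a]} plus {WORDS[b]}?", a + b

def check_answer(expected, reply):
    """Accept the reply only if it is the expected number."""
    try:
        return int(reply.strip()) == expected
    except ValueError:
        return False

question, expected = make_question(random.Random())
print(question)
print(check_answer(expected, str(expected)))  # True
print(check_answer(expected, "duck"))         # False
```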
And finally
