25 – OCR and PDFs by bestt571


OCR (Optical Character Recognition) is technology that uses an electronic device (such as a scanner or digital camera) to examine the characters printed on paper. It detects patterns of dark and light to determine each character's shape, then uses a character recognition method to translate those shapes into computer text. In other words, it scans the text, then analyzes and processes the image file to obtain the text and layout information.
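The idea of "dark and bright patterns, then character recognition" can be illustrated with a toy sketch. This is not how any particular OCR product works: the 3x3 templates, the threshold value, and the function names below are all made up for illustration, and real OCR software is far more sophisticated.

```python
# Toy illustration only. THRESHOLD, TEMPLATES, binarize and recognize are
# hypothetical names and values chosen for this sketch.

THRESHOLD = 128  # grayscale values below this count as "dark" (ink)

# Hypothetical 3x3 character templates: 1 = dark pixel, 0 = bright pixel.
TEMPLATES = {
    "I": ((0, 1, 0),
          (0, 1, 0),
          (0, 1, 0)),
    "L": ((1, 0, 0),
          (1, 0, 0),
          (1, 1, 1)),
    "T": ((1, 1, 1),
          (0, 1, 0),
          (0, 1, 0)),
}


def binarize(gray_rows):
    """Turn rows of grayscale pixels (0-255) into a dark/bright pattern."""
    return tuple(
        tuple(1 if pixel < THRESHOLD else 0 for pixel in row)
        for row in gray_rows
    )


def recognize(gray_rows):
    """Return the template character that differs in the fewest pixels."""
    pattern = binarize(gray_rows)

    def mismatches(template):
        return sum(
            p != t
            for pattern_row, template_row in zip(pattern, template)
            for p, t in zip(pattern_row, template_row)
        )

    return min(TEMPLATES, key=lambda ch: mismatches(TEMPLATES[ch]))


# A tiny "scan": low numbers are dark ink, high numbers are bright paper.
scanned = [
    [30, 20, 40],
    [250, 35, 240],
    [245, 25, 200],
]
print(recognize(scanned))
```

Because the match is only "closest template", a smudged or faint scan can land on the wrong character, which is why recognised text should always be checked.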

Contents

Optical Character Recognition
• Background
• How To
• Converting to Text
PDFs
• Printing a PDF
• Creating a PDF
• Converting a PDF to Text
   o Save as Text in Adobe
   o Using OCR by Converting the PDF to a TIFF
• Caution
• Tips

Optical Character Recognition

Background

Optical Character Recognition (OCR) allows you to scan a document and then convert the scanned text into editable text that you can change or use in Microsoft Word.

How To

Converting a scanned image to text with OCR is easy. Simply:

1.   Click on Start > Course Specific Software > Desktop Publishing > MS Office Document Imaging. You should now see the Document Imaging window.
2.   Click on the Scan icon.
3.   Choose Color even if your text is black and white.
4.   Choose Prompt for Additional pages if you want to scan more than one page.
5.   Click Scanner and choose CanonScanLiDE 70. Click OK.
6.   Place your document(s) onto the scanner glass and click on the Scan icon to start a new scan.
7.   Click Continue to add new pages, and Done when you are finished.

Converting to Text:

1.   Click on the Send Text to Word icon.
2.   Choose All Pages if you have scanned more than one page.
3.   Click OK. Microsoft Word will open.
4.   Your document will now appear as text in Microsoft Word.
5.   Click View > Print Layout to make it look normal. Check for errors and save as a Word document.

Tips:

If you already have a scanned document file (in TIFF format), you can open it and use the Send Text to Word function as above.

After you have scanned the document, it is possible to save it as a multi-page TIFF (by choosing File > Save As) instead of converting it to text.

After sending to Word, you may need to close some error messages.

Please observe copyright laws, and avoid plagiarism.

PDFs

Printing a PDF

The best way to print a PDF is to use the print icon rather than going File > Print.

Creating a PDF

Should you need to convert a document into PDF format for some reason (for this example we are going to assume a Word document):

1. Open the document you wish to convert to a PDF.
2. Click on File > Print. Click on the down arrow beside the box with the printer name in it, and scroll up until you see Cutepdf Writer. Click OK.
3. A Save As screen will appear. Specify where you would like the PDF to save and a file name, then click Save.

The document will be saved as a PDF in the location you specified.

How to Convert a PDF to Text

Occasionally you want to convert a PDF to text so that you can take quotes out of it. Sometimes, though, the PDF is of a scanned image and you are unable to select the text to copy and paste it into Word. There are two ways around this problem:

Save as Text in Adobe

1.   Open the PDF by clicking on it. This will open the file in Adobe Reader.
2.   Click on File > Save as Text. Name the file, and select the location you would like it to save. Select Save.

     Caution:
     This will strip out all the formatting and save it as a text file. It also tends to put in extra characters, which are the hidden characters from the Word formatting.

More Information
For more information about burning to a CD:
• ask a CRR (yellow-jacketed) tutor-supervisor for assistance or
• go to www.google.co.nz and do a search with the keywords “burning CD Windows XP.”
SCS Tipsheets are available @ the web address: http://www.otago.ac.nz/sch

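The caution under Save as Text in Adobe mentions extra characters appearing in the saved text file. One possible way to tidy such a file afterwards is sketched below. It assumes the unwanted characters are invisible control or formatting characters, which may not match what your file actually contains, and the function name clean_saved_text is made up for this example.

```python
import unicodedata


def clean_saved_text(raw):
    """Drop invisible control/format characters, keeping newlines and tabs.

    Assumption: the "extra characters" are Unicode control (Cc) or
    format (Cf) characters. Visible stray characters are left alone.
    """
    kept = []
    for ch in raw:
        if ch in "\n\t":
            kept.append(ch)  # keep normal line breaks and tabs
        elif unicodedata.category(ch) in ("Cc", "Cf"):
            continue  # skip invisible control/format characters
        else:
            kept.append(ch)
    return "".join(kept)


# Made-up sample containing a form feed, a zero-width space and a NUL.
sample = "A quote\x0c from\u200b the PDF\x00.\n"
print(repr(clean_saved_text(sample)))
```

To use it on a real file, read the saved text into a string, pass it through clean_saved_text, and write the result to a new file so the original stays untouched.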
Use OCR by Converting the PDF to a TIFF File

(Only do this if you are unable to select the text to copy into Word.)

This involves several steps:

1. Right click on the PDF > Open with > Adobe Photoshop Elements 3.0.
2. This will open the PDF in Photoshop and ask you to Select a Page to Open. (Unfortunately you can only do one page at a time, so it may be worth deciding what pages you need beforehand.) Select a page and click OK. Change the Resolution box to 200 dpi. Click OK.
3. This will save the page into ...
4. Go File > Save As. Change the file name if you want to. Change the Format to TIFF > Save.

Caution

No OCR programme is perfect, and some text may be incorrectly recognised.

Optical Character Recognition (OCR) and PDFs, Tip sheet #25, Student Computing, April 2007
