Scanning and OCR with Acrobat 8

Turning a stack of paper into a searchable PDF is easy with Adobe Acrobat 8.0 and optical character recognition (OCR). Using Acrobat 8, we can create an image+text PDF, which maintains the original image of a paper document and adds an invisible layer of searchable text.

Acrobat 8 includes a number of OCR and scanning improvements that offer substantial benefits over previous versions.

The Acrobat 8 OCR engine is both faster and more accurate. Owners of Acrobat 5 through 7 can upgrade to the latest version at the Adobe Web Site or from any computer software reseller. If you only have Acrobat Standard, this is your opportunity to upgrade to Acrobat 8.0 Professional, which offers many legal-specific features such as redaction, Bates numbering and more.

Get the Update

If you have already installed Acrobat 8, be sure to download the latest update to Acrobat (currently version 8.1).

Version 8.1 can OCR PDF files that were previously Bates stamped, which was a problem in the past. You can check to see if you have the latest update by going to the Help menu in Acrobat and choosing Check for Updates.

Scanner Drivers

Acrobat can control any scanner which uses a standard TWAIN scanner driver. TWAIN is a standard software protocol that allows scanning devices and software programs to communicate with each other. The scanner manufacturer generally installs the TWAIN driver automatically along with the rest of the software for the device.

For more information about TWAIN, visit the TWAIN Working Group site.

Most scanners include a TWAIN driver, but some do not, most notably the Fujitsu ScanSnap. Although the ScanSnap includes a full version of Acrobat 8.0 Standard, it uses its own software to scan to PDF. I've scanned thousands of pages with my ScanSnap, but I prefer to scan directly into Acrobat. For that reason, I feel it is worth spending a bit more for a scanner that supports TWAIN. Other scanners in the Fujitsu product line, such as the fi-5110C, do include a TWAIN driver.

Scanning with Acrobat 8

To scan paper into Acrobat from a TWAIN device, go to the Acrobat toolbar and click the Create button once; then choose From Scanner . . .

Scan Settings

The Acrobat Scan window will open.

I've numbered some sections of this window to explain the various options:

  1. Input Section
    1. A list of devices that Acrobat can control. If you have more than one TWAIN device, you will see each of them listed here. Don't be surprised if you see your digital camera listed.
    2. Choose whether you wish to scan the front sides of documents or both sides.
    3. For most legal documents, choose Black and White and 300 dpi.
  2. Scanner Options
    This button allows you to control how you will interact with the scanner:

    There are two options here: Show Scanner's Native Interface or Hide Scanner's Native Interface.

    It's a good idea to show the native interface for your first scan. You may be able to greatly speed scanning by modifying the factory default settings in the scanner interface. For example, to get the most from a Canon DR-2580C scanner, I made several changes in the native driver first. See the Canon DR-2580C article on my blog.

    Once you have the options for your device set appropriately, choose Hide Scanner's Native Interface, which allows you to scan into PDF with far fewer clicks.
  3. Output
    Do you want to Scan to a new document or append to an existing one? If you already have a PDF open, Acrobat will assume you want to append to the document. Some attorneys like to create one large case file and continually add to it by scanning. If that sounds like your preferred workflow, choose the Append option.
  4. Document Optimization
    By adjusting the slider for Optimization, you can control the file size of the resulting PDF. The default settings are just fine for most legal documents, so no changes are necessary.
  5. Text Recognition
    Acrobat will scan your document, run OCR and make it accessible for the visually impaired. A side benefit of accessibility is that document structure is added that enhances your ability to copy text accurately and to save to editable formats like Microsoft Word.

    Clicking on the Options button presents some additional choices:
    1. Language
      The default is set to US English. If you have non-English documents, such as French or Spanish ones, make sure you choose the appropriate language to improve accuracy. It's worth noting here that Acrobat always uses English as a second language. If you choose to scan in Spanish, any words that are not recognized by the Spanish language OCR engine will be passed through the English OCR engine. This is helpful for attorneys who practice in areas where both Spanish and English are used.
    2. The PDF output style for the legal market should be set to "searchable image." This will maintain an exact copy of the document, but allow for searchable text in Acrobat.
    3. Downsampling reduces the pixel count of images such as signatures, stamps, or photos in the document, which helps decrease the size of your PDF.
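To get a feel for why resolution, color mode, and downsampling matter for file size, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes a US Letter page (8.5 x 11 inches), looks at uncompressed pixel data only, and uses 150 dpi as an arbitrary downsampling target. The function names are made up for this example and have nothing to do with Acrobat itself; the PDF that Acrobat saves will be much smaller because of compression.

```python
# Rough page-size arithmetic for a scanned US Letter page (assumed 8.5 x 11 in).
# Illustrative only: these helpers are hypothetical, not part of Acrobat.

def page_pixels(width_in, height_in, dpi):
    """Return (width_px, height_px) for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)

def raw_size_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed image size in bytes, before any PDF compression."""
    return width_px * height_px * bits_per_pixel // 8

w, h = page_pixels(8.5, 11, 300)
print(w, h)                       # 2550 3300
print(raw_size_bytes(w, h, 1))    # 1-bit black and white: 1051875 bytes
print(raw_size_bytes(w, h, 24))   # 24-bit color: 25245000 bytes

# Halving the resolution (300 dpi -> an assumed 150 dpi) halves each
# dimension, so the pixel count drops by a factor of four.
w2, h2 = page_pixels(8.5, 11, 150)
print((w * h) / (w2 * h2))        # 4.0
```

The takeaway is that a black and white page carries roughly 1/24 of the raw data of a full-color page at the same resolution, which is one reason Black and White at 300 dpi is a sensible default for text-heavy legal documents.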

Acrobat will remember the choices you make in the Scan Settings window, which is a great convenience. However, if you do a "one-off", for example a document in another language, make sure to change the settings back or you may experience poor recognition results.
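The English fallback described under Language can be pictured as a two-step lookup: try the chosen language first, then English. The Python below is a toy model of that order only. The word lists and the recognize function are invented for illustration, and a real OCR engine works on page images rather than on ready-made words, so please don't read this as a description of Acrobat's internals.

```python
# Toy model of "primary language first, English second".
# The word lists and function are hypothetical; this is not Acrobat's engine.

SPANISH_WORDS = {"contrato", "demanda", "juez"}
ENGLISH_WORDS = {"contract", "lawsuit", "judge"}

def recognize(word, primary, fallback):
    """Return 'primary', 'fallback', or None depending on which list knows the word."""
    w = word.lower()
    if w in primary:
        return "primary"
    if w in fallback:
        return "fallback"
    return None

for word in ["Contrato", "judge", "xyzzy"]:
    print(word, recognize(word, SPANISH_WORDS, ENGLISH_WORDS))
```

In this model, a mixed Spanish and English document still gets its English words recognized, while a document in a third language falls through both lists. That is the situation in which leaving the wrong language selected would hurt recognition.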

Scan!

Once scan settings are to your liking, click the Scan button. Acrobat will ask you to name the file if you are not appending to an existing PDF document. Make sure you save the file into the appropriate folder (e.g. a client or matter folder) on your hard drive.

Depending on your scanner, you may see some slightly different messages at this time. For example, my Canon scanner always alerts me when the document feeder is empty.

When scanning is complete, Acrobat will ask if you would like to scan more pages. This feature allows you to scan documents with lots of pages, perhaps more than the input bin on your scanner can support at one time.

When you click OK, Acrobat will begin the OCR process on the document. You'll see a progress bar.

Once the document is scanned in, Acrobat will take from four to ten seconds per page to perform OCR.
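If you want to plan around a large scanning job, the four to ten seconds per page figure turns into a simple estimate. The short Python sketch below does the arithmetic. The function name and the 150-page example are my own assumptions, and actual times will depend on your computer and the documents.

```python
# Estimate total OCR time from the 4 to 10 seconds-per-page range above.
# ocr_time_range is a hypothetical helper written for this example.

def ocr_time_range(pages, min_sec=4, max_sec=10):
    """Return (low, high) estimated OCR time in minutes for a page count."""
    return pages * min_sec / 60, pages * max_sec / 60

low, high = ocr_time_range(150)
print(f"{low:.0f} to {high:.0f} minutes")  # 10 to 25 minutes
```

So a 150-page deposition transcript would tie up Acrobat for somewhere between 10 and 25 minutes, which is worth knowing before you start a scan right ahead of a deadline.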

Save!

What can I do with the PDF?

You can perform many useful operations on an image+text PDF:

  1. Save back to an editable format such as Microsoft Word by clicking the Export button on the Acrobat toolbar.
  2. Search the PDF to find key facts, places, names, and so on. Go to Edit->Find.
  3. Apply your thinking about the document in the form of sticky notes and highlights. You can search the text of your own annotations, too!
    1. Right-click anywhere and choose Add Sticky Note
    2. Select some text and right-click to highlight text and add a note to it.
  4. Stamp your document with the date and time of your review.
    Go to View->Toolbars->Comment and Markup and use the Stamp Tool.

Check out the Acrobat for Legal Professionals Blog

My Acrobat for Legal Professionals Blog at http://blogs.adobe.com/acrolaw is a good resource if you want to know more about the legal applications for Acrobat.

Here are some recommended articles:

Full Text Search of PDF using Adobe Acrobat

Troubleshooting Acrobat OCR

Is that PDF Searchable?

Using Acrobat and the Canon DR-2580c Scanner

Batch OCR using Acrobat Professional

Understanding "Flavors" of PDF
