Rip and read: 6 OCR tools put to the test

For years, journalists have been handed wonky PDF files or had to scan mountains of paper documents, but until relatively recently there hasn’t been an easy way to convert those documents into digital text. Several tools that use optical character recognition (OCR) to turn PDF files and scanned pages into text have since popped up, but which one works best?

To see which OCR tools did the job and which ones fell flat, a one-page online document was printed, then scanned at 200 DPI on an HP DeskJet F4280 printer. The results are below, and you can view the original document here.


SimpleOCR

Downloadable software available for PC

Accuracy: The software gets most of the text right, but portions of the document, especially the italic text, come out as indecipherable characters.

View results of OCR with SimpleOCR


DocumentCloud

Private document storehouse and analysis tool for newsrooms and journalists

Accuracy: Pretty close with a few errors here and there.

View results of OCR with DocumentCloud


SayWhat Translator

OCR app available from iTunes for $9.99

Accuracy: Total fail. It couldn’t recognize a single word, and the results aren’t much better with larger type or shorter passages.



Google Docs

Free document creation, sharing, and storage system with an OCR feature

Accuracy: Close to perfect with a few odd characters throughout the text.

View results of OCR with Google Docs


OCR Online

Online OCR and conversion tool; several format and language options; free with restrictions

Accuracy: Near perfect with a few missing punctuation marks. Great results for a free tool.

View results of OCR with OCR Online


Adobe Acrobat X Pro

PDF management software with OCR capabilities; $499

Accuracy: Results are near perfect and comparable to OCR Online, which means that unless you already own the program or are willing to pay 500 bucks, OCR Online is the more attractive choice.

View results of OCR with Acrobat X Pro