For years, journalists have been handed wonky PDF files or had to scan mountains of paper documents, but until relatively recently there hasn’t been an easy way to translate those docs into digital text. Several tools for converting PDF files into text using optical character recognition (OCR) have since popped up, but which one works best?
To see which OCR tools did the job and which ones fell flat, a one-page online document was printed and scanned on an HP DeskJet F4280 printer at 200 DPI. The results are below and you can view the original document here.
Downloadable software available for PC
Accuracy: The software gets the majority of the text right, but portions of the document, especially the italic text, come out as indecipherable characters.
Private document storehouse and analysis tool for newsrooms and journalists
Accuracy: Pretty close with a few errors here and there.
OCR app available from iTunes for $9.99
Free document creation, sharing, and storage system with OCR feature
Accuracy: Close to perfect with a few odd characters throughout the text.
Online OCR and conversion tool; several format and language options; free with restrictions
Accuracy: Near perfect with a few missing punctuation marks. Great results for a free tool.
PDF management software with OCR capabilities; $499
Accuracy: Results are near perfect and comparable to OCR Online, which means that unless you already have the program or are willing to pay 500 bucks, OCR Online is the more attractive choice.
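The accuracy notes above are judged by eye. For anyone who wants to repeat the comparison with a number attached, here is a minimal sketch in Python that scores a tool's output against the original text. It is not the method used for this test: the function names `normalize` and `ocr_accuracy` are made up for illustration, and the score is simply the similarity ratio from the standard library's `difflib`, which is only one of several ways to measure OCR quality.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line-wrap differences are not counted as errors."""
    return " ".join(text.split())


def ocr_accuracy(reference: str, ocr_output: str) -> float:
    """Return a similarity score between 0.0 and 1.0 (hypothetical helper).

    1.0 means the OCR output matches the reference text exactly
    after whitespace normalization.
    """
    ref = normalize(reference)
    out = normalize(ocr_output)
    if not ref and not out:
        return 1.0  # two empty texts are treated as identical
    # autojunk=False keeps the comparison exact on long documents
    return SequenceMatcher(None, ref, out, autojunk=False).ratio()


if __name__ == "__main__":
    original = "The quick brown fox jumps over the lazy dog."
    scanned = "The quick brovvn fox jumps over the 1azy dog"
    print(f"Similarity: {ocr_accuracy(original, scanned):.2%}")
```

To use it, paste the original document's text and each tool's output into two strings (or read them from files) and compare the scores. A higher score means fewer character-level differences, though it won't tell you whether the errors fall in places that matter, such as names or figures.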