Orphan Works & Optical Character Recognition Software

By Jason Boog 

While researching an essay about New York City poets and the Great Depression last year, this GalleyCat editor read through hundreds of pages from 1930s novels, periodicals, and self-published materials that couldn’t leave the New York Public Library.

Optical Character Recognition (OCR) software can help authors and researchers digging through a stack of orphan works. These specialized tools convert scanned, photographed, or written text into digital text. We test-drove ABBYY FineReader Express for this article–the software voted “Best Text Recognition Tool” by Lifehacker readers.

The OCR company has been around for 20 years, and the program now recognizes 171 different languages. The screenshots below show the text capture process, as a page from a 75-year-old self-published poetry journal enters the digital age.


In a telephone interview, senior product marketing manager Wendy Wang shared tips for writers hoping to use this technology while doing library research. She explained: “We have developed digital camera OCR–you can even use your cellphone camera. You can capture a certain book page image and go back to your office to develop the image.”

She had these tips for taking better photos of text: “Make sure [you] have [a] 5-megapixel camera. Lighting is another issue, libraries are kind of dark. For a better OCR result [you need] good lighting. Focus is also important. When you shake your camera, it will decrease the image resolution.”
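For readers who photograph many pages in one library visit, those tips could be turned into a rough pre-flight check before running OCR. The Python sketch below is only an illustration, not part of ABBYY FineReader: the function names are hypothetical, the 5-megapixel figure comes from the interview, and the brightness cutoff is an assumed value chosen for the example. It does not test focus, and the grayscale pixel values would have to come from whatever imaging library you already use.

```python
# Hypothetical pre-flight check for a page photo before running OCR.
# Only the 5-megapixel figure comes from the interview; the brightness
# threshold is an illustrative assumption, not a vendor recommendation.

MIN_MEGAPIXELS = 5.0       # from the tip quoted above
MIN_MEAN_BRIGHTNESS = 90   # assumed cutoff on a 0-255 grayscale scale


def megapixels(width, height):
    """Return the image size in megapixels (millions of pixels)."""
    return width * height / 1_000_000


def mean_brightness(gray_pixels):
    """Average a flat sequence of 0-255 grayscale values."""
    pixels = list(gray_pixels)
    if not pixels:
        raise ValueError("no pixels given")
    return sum(pixels) / len(pixels)


def photo_warnings(width, height, gray_pixels):
    """Return a list of warnings; an empty list means no issue was found."""
    warnings = []
    if megapixels(width, height) < MIN_MEGAPIXELS:
        warnings.append("resolution below 5 megapixels")
    if mean_brightness(gray_pixels) < MIN_MEAN_BRIGHTNESS:
        warnings.append("image looks dark; try better lighting")
    return warnings
```

A photo that triggers either warning is probably worth retaking while the book is still on the reading-room table.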

Here is a photocopied page from the self-published Raven Poetry Circle Anthology, published in January 1934.

ravenpoetryanthologyoriginal.jpg

Here is a screenshot from inside the ABBYY FineReader Express program for the Mac–the text recognition software has recognized the portions outlined in green.

ravenabbyy23.jpg

Here is the Microsoft Word document version of the scanned text–a searchable and shareable digital copy of a 75-year-old orphaned text.

ravenfinal.jpg