Google’s Library of Babel, Redux


Last week we posted a piece asking whether the errors that pop up in Google’s book scans–“it seems doubtful that Zarathustra spake thus: ‘Full^is_£arth of superfluous ones….’”–make Google’s books a library of Babel. We asked whether readers thought this was a huge problem or something we could ignore. The post sparked a heated argument between two readers, someone posting as “wmartin46” and blogger Mike Cane.

After explaining his understanding of how Google’s OCR software works (and he points out that Google has the best text-recognition software in the business), “wmartin46” admits: “The error rate for the .epub/text versions of free books on the Google WEB-site is generally less than 5%. Unfortunately, this is way too high to result in reading a book without having to stop and figure out what a garbled word means. Students have underlined and disfigured many of the books coming from libraries. In those cases, the OCR code is not well designed, and garbage results.”

Then he says something that really pisses Mike Cane off: “I can’t thank Google enough for the material I have been able to obtain from their Google/Books project, and have nothing but praise for them at the moment.”

Here’s how Cane responds: “You are thanking the wrong frikkin people, pal. Thank everyone who funded the *libraries* that preserved the material Google is sucking up. If Google paid anything for that privilege, it’s only a *fraction* of the investment that was originally put in — and still continues.”

It gets a little ugly after that, but these comments nicely present two popular fronts in the war of Google Opinion: essentially “Google is Good” vs. “Google is Evil.” Maybe Judge Denny Chin should take a look at this.