Word Lens, a translation app, almost perfectly fits Arthur C. Clarke’s famous saying that “any sufficiently advanced technology is indistinguishable from magic.” Open it on your iPhone, hold it up to any sign in Spanish, and it automatically translates the text, replacing the foreign words in the viewfinder with English.
Beyond the magical demo, what’s impressive about the company is that founder Otavio Good and his partner built the optical character recognition (OCR) and translation technology from scratch. Word Lens doesn’t rely on Google Translate, because Good doesn’t want the app to depend on a steady Internet connection. For all he knows, a tourist could be in an isolated, rural part of Bolivia or Colombia. If you don’t understand a sign, it’s not like you can wait for a connection to show up. So everything has to be done locally on the phone.
Because it was designed specifically for this task, Good says, Word Lens’s OCR probably works better than any off-the-shelf technology for on-the-fly augmented reality translation.
Good first got the idea for Word Lens while traveling in Germany, when he wanted to understand what he was reading. He told a friend about it while they were attending the computer graphics conference SIGGRAPH, then spent the next few weeks building a “lame, little prototype.”
Not knowing whether the project would even work, he quit his job at Sega shortly afterward. (It helped that four years earlier he had built up a roughly 50-person company called Secret Level and sold it to Sega.) So Good had the funds to bootstrap the project for a while. About a year ago, he showed it to John DeWeese, another programmer working on a similar idea at Hacker Dojo down in Mountain View. DeWeese joined full-time shortly afterward.
A breakthrough came while they were testing with snapshots. Testing was painfully slow: they had to take a snapshot of a phrase in a foreign language, plug in a USB drive and copy the file over. As an experiment, they replaced that step by hooking up a video cable and filming phrases in Spanish, and voilà!
It became apparent that Word Lens should work in real time on live video. Humans were simply much better at self-correcting, tilting their phones to get a better view of the text, than software was at correcting distortion or compensating for poor lighting.
Good and DeWeese also had to build the translation technology. For starters, they created a dictionary and did basic word-for-word lookups. They also did some statistical translation work, using European Parliament transcripts in multiple languages to see which words were most often paired with each other across translations. The problem with that method, though, is that some terms don’t match their colloquial meanings. For example, the word “house” in that context doesn’t refer to someone’s home; it refers to a chamber of Parliament.
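To make the statistical idea concrete, here is a minimal Python sketch of scoring word pairs across aligned sentence pairs. It assumes a tiny hypothetical Spanish-English corpus and uses the Dice coefficient as the association score; the article doesn’t say which statistic Word Lens actually used, so treat this as an illustration of the general technique, not the app’s implementation.

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus: (Spanish sentence, English translation).
PAIRS = [
    ("la casa", "the house"),
    ("la casa grande", "the big house"),
    ("el perro", "the dog"),
    ("el perro grande", "the big dog"),
]


def dice_scores(pairs):
    """Score each (source, target) word pair by how often both words
    appear in the same aligned sentence pair (Dice coefficient)."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def best_translation(word, scores):
    """Return the target word most strongly associated with `word`, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


if __name__ == "__main__":
    scores = dice_scores(PAIRS)
    for word in ("casa", "perro", "grande"):
        print(word, "->", best_translation(word, scores))
```

On this toy corpus, “casa” pairs with “house” and “perro” with “dog.” Trained on Parliament transcripts instead, the same counts would reflect parliamentary usage, which is the pitfall Good describes with “house.”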
Because of its more basic approach to translation, Word Lens isn’t always perfect. It’s just meant to give the user a basic sense of a phrase’s meaning.
“We do have an unhappy category of users I call the ‘linguistics professor.’ They usually say the grammar is outrageous or that we got the masculine-feminine version of words wrong,” Good said. “Those people were expecting it to be something it was never intended to be. It’s meant to be a tool for tourists.”
The easiest part of Word Lens’s technology is actually replacing the words with translated text. Because the OCR has already identified the text and knows how much space it takes up, Word Lens can sample the surrounding color, paint over the original words and draw their translation in its place.
It always uses a plain, all-caps Arial font. Adding font matching, Good says, would end up being too distracting: if Word Lens were constantly adjusting the translation while also toggling between serif and sans-serif text, it would probably be too much.
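The replacement step is simple enough to sketch. The Python below assumes, as hypothetical simplifications, that the OCR stage has already produced a rectangular bounding box for a word and that a frame is a plain list of rows of RGB tuples. It averages the ring of pixels just outside the box, paints the box with that color, and picks black or white for the replacement text by luminance. The article only says Word Lens “samples the surrounding color,” so this is one plausible approach, and drawing the all-caps Arial text itself is left out.

```python
# Hypothetical layout: a frame is a list of rows; each pixel is an (r, g, b) tuple.
# A box is (x0, y0, x1, y1) with x0/y0 inclusive and x1/y1 exclusive.


def border_pixels(frame, box):
    """Collect the one-pixel ring just outside the box, clipped to the frame."""
    x0, y0, x1, y1 = box
    height, width = len(frame), len(frame[0])
    ring = []
    for y in range(y0 - 1, y1 + 1):
        for x in range(x0 - 1, x1 + 1):
            inside = x0 <= x < x1 and y0 <= y < y1
            if not inside and 0 <= x < width and 0 <= y < height:
                ring.append(frame[y][x])
    return ring


def average_color(pixels):
    """Integer mean of each RGB channel."""
    if not pixels:
        raise ValueError("no surrounding pixels to sample")
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) // n for i in range(3))


def erase_word(frame, box):
    """Paint the word's box with the average surrounding color (in place)."""
    fill = average_color(border_pixels(frame, box))
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            frame[y][x] = fill
    return fill


def text_color_for(background):
    """Black text on light backgrounds, white on dark (Rec. 601 luma weights)."""
    r, g, b = background
    luminance = 0.299 * r + 0.587 * g + 0.114 * b
    return (0, 0, 0) if luminance > 128 else (255, 255, 255)


if __name__ == "__main__":
    # A 5x5 red "sign" with a 3x3 block of black "text" in the middle.
    frame = [[(200, 50, 50)] * 5 for _ in range(5)]
    for y in range(1, 4):
        for x in range(1, 4):
            frame[y][x] = (0, 0, 0)
    fill = erase_word(frame, (1, 1, 4, 4))
    print("fill:", fill, "text color:", text_color_for(fill))
```

Sampling only a thin ring keeps the cost proportional to the box’s perimeter, which matters when this runs on every video frame; a real implementation would also have to cope with gradients and textured backgrounds that a single average color can’t capture.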
Once Good felt he had nailed the basics well enough for friends to travel abroad and find the app useful, he launched it to an explosively receptive audience. The initial YouTube video garnered more than 3 million plays. While Good wouldn’t reveal numbers, sales have more than covered what it cost him and DeWeese to build the app over the last two and a half years. He also recently raised the price from $4.99 to $9.99.
Good seems fairly intent on keeping control of the company, but that isn’t stopping investors from reaching out. Because he has sold a company before, he has had enough money to bootstrap QuestVisual (the parent company) for two years and to afford a Financial District office for it.
He concedes, though, that if anyone could threaten his business, it would probably be Google. The search giant has already released a version of Google Translate with an experimental “Conversation Mode” that lets you record yourself and then plays back a translated recording of your words. There’s also Google Goggles, which can initiate searches from photographs.
However, the upside of not relying on the cloud, as Google would, is that it may take several years before mobile broadband in foreign countries is readily accessible and fast enough for a web-based competitor to emerge. That leaves a healthy market of tourists for QuestVisual.
But the downside of handling everything locally is that Good doesn’t get statistics back on usage or on whether people seem to be happy with the translations. For now, he’s relying a lot on forums and anecdotal feedback.
Next up are versions of the app for other languages, probably starting with European ones. Other platforms may follow: the software isn’t written specifically for the iPhone, so it should be easier to port to platforms like Android. (Parts of it even required a bit of assembly language.)
“I think I can put together as good a programming team as anybody and I intend to make a really high quality product,” Good said.
Photo taken by Robert Scoble