Will Google ever be able to read Flash symbol text?

A big stumbling block for whether future crawler updates will be able to do more with Flash is Optical Character Recognition (OCR). Many sites use stylised text ‘symbols’ instead of the ‘dynamic’ graphical text. This cuts down on file size: no matter how many times a symbol is used, it is only loaded once.

With the most recent update for crawlers encountering swf files, Google has only now added the ability to read dynamic text, despite it being introduced in August 2000 with Flash 5. If it has taken them a full 8 years to develop a way to do this relatively simple task, fully automated OCR looks to be a long way off.

In April 2007 Google sponsored an open source character recognition project called OCRopus. Its goal was not to develop an AI that could recognise text in symbols, but rather one that could assist in cataloguing libraries and helping vision-impaired web users.

Building on their acquisition of HP’s 20-year-old software Tesseract (another OCR engine designed to help catalogue physical books), as well as their recent announcement of working with Adobe, Google would appear to be working actively to catalogue even more of the world’s knowledge, much of which is in picture format.

Whilst piecing this together from these facts alone is a big leap, other information suggests that they are interested in being able to read and recognise images on the web. In June 2007 Google filed a patent for recognising text in images, although a spokesperson for Google later stated that they

…file patent applications on a variety of ideas that our employees come up with. Some of those ideas later mature into real products or services; some don’t. Prospective product announcements should not necessarily be inferred from our patent applications.

With the advent of search engines like kooaba, which let users send images via multimedia messages from their phones and get seemingly accurate results, how long will it be before everything Google is investing in allows it to do the same?

Personally, I don’t see this technology arriving in any usable form for the next 3 to 5 years, and until that point I’d still recommend following all the usual practices for optimising Flash sites.

Have I made too bold a leap here? Let me know in the comments.
