By Tony Henning
Microsoft is developing technology that would make it possible to take a picture of an object with a camera-phone, and then use the image to search a web-based database for more information. The concept, called “Phone2Search,” is being investigated by the Web Search and Mining group within Microsoft Research Asia, according to a company blog. The technology would be an alternative to using the phone’s keypad to type out search queries. A user takes a photo of a real-world object and sends the photo, via e-mail or MMS, to a web-based server, which searches an image database for matches. The server then delivers database information to the user, such as detailed information about a product or tourist site, price comparisons, a restaurant’s menu, or hotel room rates and availability.
“This technology,” says Xing Xie, one of the principal researchers, “aims to solve the problem of mapping a physical-world object to a digital-world object. You see an object in the physical world, and you want to know the corresponding information in the digital world: for example, its price on the web, user comments, or web sites. There are many different solutions. You can use a bar code or radio frequency identification. But using a picture of the object is very convenient and very easy to deploy. As the old saying goes,” Xie says, “a picture is worth a thousand words.”
Xie and his colleagues investigated Content Based Image Retrieval (CBIR) and existing computer-vision techniques, but found both approaches wanting. In the second half of 2005, the research team rebuilt the system, with image matching based on some well-known computer-vision algorithms that extract features from images. That choice proved productive, resulting in an efficient, high-dimensional index that can search through a large image database and return results quickly — combing through a collection of 6,000 images and delivering matches in a mere three seconds using a common laptop. The searchable database still needs to be a predefined collection of images, but they can be harvested from the web. Manual annotation and organization are then employed to enhance performance.
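To make the idea of feature-based matching concrete, here is a minimal sketch in Python. It is not the researchers' system: the blog does not name the algorithms or the index structure, so this example assumes each image has already been reduced to a short numeric feature vector and simply compares vectors by cosine similarity with a brute-force scan. The labels, the feature values, and the function names `cosine_similarity` and `search` are all hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def search(index, query_features, top_k=3):
    """Rank every indexed image by similarity to the query (brute force)."""
    scored = [
        (cosine_similarity(query_features, features), label)
        for label, features in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:top_k]]


# Hypothetical predefined collection: label -> made-up feature vector.
index = {
    "restaurant_menu": [0.9, 0.1, 0.0, 0.2],
    "hotel_front": [0.1, 0.8, 0.3, 0.0],
    "landmark_tower": [0.0, 0.2, 0.9, 0.4],
}

# Made-up features standing in for a photo sent from a camera-phone.
query = [0.85, 0.15, 0.05, 0.25]

print(search(index, query, top_k=1))
```

A linear scan like this checks every image for every query, which is exactly the cost a purpose-built high-dimensional index is meant to avoid as the collection grows.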
Given that methodology, the stated goal strikes us as a bit ambitious. A collection of 6,000 images is a drop in the bucket compared with the volume of visual material already on the internet, which must be something like six orders of magnitude greater (how long would it take to manually annotate and then filter through 6 billion images?), not to mention the universe of potential real-world subjects. And there are already a number of companies commercializing exactly this functionality, albeit on a much more modest scale. We covered Neven Vision and ActiveSymbols in last week’s MIR, and we’ve written about Mobot on numerous occasions as well. Still, the goal is tantalizing, and having the financial and intellectual resources of Microsoft behind the project adds credibility and momentum to the concept.
In a paper entitled “Photo-to-Search: Using Camera Phones to Inquire of the Surrounding World,” to be delivered in Japan in May during the upcoming seventh International Conference on Mobile Data Management, Xie and co-authors Mingjing Li and Wei-Ying Ma, both of Microsoft Research Asia, and Menglei Jia and Xin Fan of the University of Science and Technology of China, underscore how important camera-phones could become in searching via mobile devices.
“The value of camera-phones on daily information acquisition has not been sufficiently recognized by the wireless industry and researchers,” the authors state. “With necessary technologies, they [could] become a powerful tool to acquire … information [about] the surrounding world on the go.”
We would, of course, echo that sentiment and then some — the value of camera-phones has not been recognized sufficiently or, some might say, at all, particularly here in North America. The wireless operators, who spend more on advertising than any other industry (the top seven wireless carriers spent nearly $5 billion in 2004, according to Advertising Age), continue to sell mobile phones and services as utilities, not as the incredibly sophisticated and personal lifestyle devices they are recognized as in most other markets. It’s all about buckets of minutes and network coverage and not the benefits and value of that camera you have with you all the time. The failure to deliver the message of Connected Imaging, to tell that story to consumers, is one of the prime motivations for 6Sight. Join us in Monterey, California on October 25 and help us tell that story. Check it out at http://www.6sight.com