Tuesday, October 13, 2009

When Computers can't, Humans can help

Luis von Ahn, an Assistant Professor of Computer Science at Carnegie Mellon and a Duke University graduate in Mathematics, was here to deliver the first of the Distinguished Computer Science Alumni Lectures.

Luis von Ahn is the inventor of CAPTCHAs, the squiggly random letters and numbers found at the end of online forms to verify that the user is a real person. He went on to develop RECAPTCHA and the concept of GWAP (Games With A Purpose). His work mostly revolves around human computation: harnessing the collective time and effort of individuals to solve problems that computers cannot yet tackle.

"Humans can read CAPTCHA, but computers are unable to do so. That is what we were interested in finding -- a test that humans can pass, but computers cannot."

CAPTCHAs ensure that it is a person, not a machine, typing the information at the other end. Most big websites, such as Gmail, Facebook, and Twitter, use them to prevent automated programs from submitting their forms. This stops spammers from writing programs that create millions of email accounts for sending junk email.

"Statistically speaking, around 200 million CAPTCHAs are typed every day. On average, it takes around 10 seconds for a human to type a CAPTCHA," Luis told us. "That meant a huge chunk of human hours were wasted typing my CAPTCHAs. Then, I started feeling bad."
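To put those two figures together, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted in the talk; the variable names are just for illustration.

```python
# Figures quoted in the talk; the names are illustrative only.
captchas_per_day = 200_000_000
seconds_per_captcha = 10

# Total human time spent typing CAPTCHAs in one day.
total_seconds = captchas_per_day * seconds_per_captcha
total_hours = total_seconds / 3600

print(f"{total_hours:,.0f} human hours per day")  # 555,556 human hours per day
```

That is more than half a million hours of human attention every single day.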

Consequently, Luis came up with the idea of RECAPTCHA. He looked for a task that computers are not good at on their own but can complete with the help of humans. In the case of RECAPTCHA, the problem at hand was digitizing books. The solution: split the gigantic task of digitizing books into 10-second chunks, and use the time people already spend typing CAPTCHAs to decipher a word that the book scanner's OCR could not read.

"The idea is simple. You start with an old book, scan it, and the computer would decipher the words using Optical Character Recognition. However, the computer is not perfect at doing this. For example, the OCR cannot recognize approximately 30% of the words in books published before 1900. So what we do is take that word which the computer cannot read, and display it with another CAPTCHA for which the computer already knows the answer." Humans who type in the known word and the unknown word to solve a CAPTCHA while commenting on a blog or opening a new email account are thus helping the machines decipher the unrecognized words.
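The known-word/unknown-word idea can be sketched in a few lines of Python. This is only a minimal illustration of the scheme as described in the talk, not the actual RECAPTCHA implementation: the function names, the in-memory `votes` store, and the `min_votes=3` agreement threshold are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: unknown word id -> tally of human guesses.
votes = defaultdict(Counter)

def check_captcha(control_answer, typed_control, unknown_id, typed_unknown):
    """Return True if the user passes; record their guess for the unknown word."""
    passed = typed_control.strip().lower() == control_answer.lower()
    if passed:
        # Only trust the guess for the unknown word if the known word was right.
        votes[unknown_id][typed_unknown.strip().lower()] += 1
    return passed

def accepted_reading(unknown_id, min_votes=3):
    """Return the most common guess once enough people agree, else None."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None

# Four users see the known word "upon" next to the same unreadable scan.
for guess in ["morning", "morning", "moming", "morning"]:
    check_captcha("upon", "upon", "scan-17", guess)

print(accepted_reading("scan-17"))  # morning
```

The user is only ever graded on the word the computer already knows; the guess for the other word is treated as a vote, and a reading is accepted once enough independent users agree on it.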

With the help of RECAPTCHA, around 50 million words are being digitized every day, which is equivalent to 4 million books a year.

Luis' second big project was about putting wasted human time to use. For example, in 2003, people spent 9 billion hours playing solitaire. Luis hoped to take time like this and convert it into something useful.

"I thought that computers are still bad at labeling images with captions or words. For example, if you search for something on Google Images, it doesn't always give the relevant images. This is because the search engine doesn't have an accurate description or label for the images it is searching."

So he invented the ESP game. A side effect of playing the game is that people are actually labeling the images on the web. Luis has also published a paper on harnessing the power of the ESP game.
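The post doesn't spell out the rules, but in the published ESP game two randomly paired players who cannot communicate see the same image and type guesses; a word that both of them type becomes a label for that image, and words already known for the image can be declared off-limits ("taboo"). A minimal sketch of that matching step follows. The function name is hypothetical, and the sketch ignores the timing of guesses, which the real game takes into account.

```python
def first_agreed_label(guesses_a, guesses_b, taboo=()):
    """Return the first word player A typed that player B also typed, or None."""
    seen_b = {g.lower() for g in guesses_b}
    banned = {t.lower() for t in taboo}
    for guess in guesses_a:
        word = guess.lower()
        # A label counts only if both players typed it and it is not taboo.
        if word in seen_b and word not in banned:
            return word
    return None

print(first_agreed_label(["dog", "park", "grass"], ["puppy", "grass", "park"]))  # park
```

Because neither player can see what the other types, the easiest way to agree is to type what is actually in the picture, which is why the agreed words make good labels.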

"5000 people playing the ESP game simultaneously can label all images in Google in 2 months!"

Luis' next big project, which is still in development, is related to yet another task which computers are not capable of performing perfectly -- language translation.

He has devised a technique by which people who know only one language can translate text from a language they do not know almost as well as an expert.

Luis is a recipient of the MacArthur Fellowship, Microsoft New Faculty Fellowship, and a Sloan Research Fellowship. Among other honors, he was named one of the 50 Best Minds in Science by Discover Magazine, one of the Brilliant 10 scientists by Popular Science and one of the top innovators in Arts and Science by the Smithsonian Magazine.
