Duke Research Blog: human computation

Wednesday, November 4, 2009

Social Tagging to Verify Identity

Because the anonymity of the web makes it easy for social network users to be dishonest, an algorithm being developed by a Duke graduate student estimates a given user's credibility by asking their friends whether they're legit.

Michael Sirivianos, a PhD student in Computer Science, presented FaceTrust in a talk on Tuesday.

It's a unique system he and his fellow researchers developed that uses a person's Online Social Network (OSN) and reputation among friends to verify how trustworthy they may be. He calls it "relaxed and attribute-based credentials."

In the talk, entitled "On the Internet, 'Am I really not a dog?'", Michael explained the concept behind the system and how it could be an essential step in assessing the credibility of users on social networks.

"An online world without identity credentials makes determining who to believe very difficult," Michael said.

Sites like Amazon, eBay, Craigslist, and dating sites might "simply ask you a question, like, if you are over 18 or not, and if you say yes, there is no other checking mechanism to determine if you are speaking the truth or not," Michael said.

The approach that Michael and his team took to this problem is inspired by the wisdom of crowds, that is, the power of user feedback. FaceTrust employs "crowd vetting": it uses feedback from friends to determine whether an online user's statement or assertion is credible.

"Online Social Network users tag/vote for their friends' verifiable identity assertions. These OSN providers issue credentials on the user's assertions," Michael explained.

"We do tagging using a concept called social tagging. Currently, we use Games With a Purpose to assess the credentials. Basically, a combination of fun and serious (useful) assertions are presented to the user's friend, and they vote for a yes or no," Michael added.

Because FaceTrust draws on the OSN and on feedback from friends, it can provide probabilistic assurances of a user's credibility rather than hard guarantees.
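One way to picture such a probabilistic assurance is as a weighted vote among friends. The sketch below is only an illustration, not the FaceTrust algorithm itself: the function name `assertion_credibility`, the per-friend trust weights, and the sample votes are all assumptions made up for this example.

```python
def assertion_credibility(votes, trust):
    """Return a score in [0, 1] for one assertion, or None if nobody weighed in.

    votes: dict mapping friend name -> True (assertion looks true) or False
    trust: dict mapping friend name -> weight in [0, 1] (hypothetical tagger trust)
    """
    # Total weight of every friend who tagged this assertion.
    total = sum(trust.get(friend, 0.0) for friend in votes)
    if total == 0:
        return None  # no tags from trusted friends, so nothing to report
    # Weight of the friends who voted that the assertion is true.
    agree = sum(trust.get(friend, 0.0) for friend, vote in votes.items() if vote)
    return agree / total


# Hypothetical friends voting on the assertion "I am over 18".
votes = {"alice": True, "bob": True, "carol": False}
trust = {"alice": 0.9, "bob": 0.6, "carol": 0.5}
score = assertion_credibility(votes, trust)
print(f"credibility of 'I am over 18': {score:.2f}")
```

A site receiving this score could then decide for itself how high a score it requires before believing the assertion.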

"Of course a number of concerns come up- there can be dishonest users, credentials can be forged, and maybe the users don't even tag at all. Also, we need to preserve the privacy of taggers and preserve anonymity of users that present credentials."

To ensure that only feedback from honest users carries weight in the algorithm, only a user's friends on that network can tag that user. Friending can be seen as a sign of trust with high probability.

The algorithm also assumes that most honest users have friends who will not tag their honest assertions about themselves as false.

"To further ensure the fact that the feedback we collect over a large range is reliable, we employ trust transitivity via a method called Trust Inference. We use history to determine similarity. It is observed that honest friends have a history of tagging attributes about a person similarly," Michael explained.

The complete paper, entitled "FaceTrust: Assessing the Credibility of Online Personas via Social Networks", can be found here.

Michael Sirivianos completed his B.S. in Electrical and Computer Engineering at the National Technical University of Athens, and an M.S. in Computer Science and Engineering at UCSD. His other projects include Free-riding in BitTorrent networks, Loud and Clear (L&C), and Dandelion.

Tuesday, October 13, 2009

When Computers can't, Humans can help

Luis von Ahn, an Assistant Professor of Computer Science at Carnegie Mellon and a Duke University graduate in Mathematics, was here to deliver the first of the Distinguished Computer Science Alumni Lectures.

Luis von Ahn is the inventor of CAPTCHAs, the squiggly random letters and numbers found at the end of online forms to verify that the user is a real person. He went on to develop RECAPTCHA and the concept of GWAP (Games With A Purpose). His work mostly revolves around human computation: harnessing the collective power and time of individuals to solve problems that computers cannot yet tackle.

"Humans can read CAPTCHA, but computers are unable to do so. That is what we were interested in finding -- a test that humans can pass, but computers cannot."

CAPTCHAs ensure that it is a person on the other end who is typing the information and not a machine. Most big websites, such as Gmail, Facebook, and Twitter, use them to prevent automated programs from entering information in their forms. They help ensure that spammers can't write programs to create millions of email accounts for sending junk email.

"Statistically speaking, around 200 million CAPTCHAs are typed everyday. On an average it takes around 10 seconds for a human to type a CAPTCHA," Luis told us. "That meant a huge chunk of human hours were wasted typing my CAPTCHAs. Then, I started feeling bad."

Consequently, Luis came up with the idea of RECAPTCHA. He looked for a task that computers are not good at doing on their own but can complete with the help of humans. In the case of RECAPTCHA, the problem at hand was digitizing books. The solution: split the gigantic task of digitizing books into 10-second chunks, and use the human time spent typing CAPTCHAs to decipher words that the scanner's OCR software could not read.

"The idea is simple. You start with an old book, scan it, and the computer would decipher the words using Optical Character Recognition. However, the computer is not perfect at doing this. For example, the OCR cannot recognize approximately 30% of the words in books published before 1900. So what we do is take that word which the computer cannot read, and display it with another CAPTCHA for which the computer already knows the answer." Humans who type in the known word and the unknown word to solve a CAPTCHA while commenting on a blog or opening a new email account are thus helping the machines decipher the unrecognized words.

With the help of RECAPTCHA, around 50 million words are digitized every day, which is equivalent to 4 million books a year.

Luis' second big project was about putting wasted human time to use. For example, in 2003, 9 billion hours were spent playing solitaire. Luis hoped to capture this kind of time and convert it into something useful.

"I thought that computers are still bad at labeling images with captions or words. For example, if you search for something on Google Images, it doesn't always give the relevant images. This is because the search engine doesn't have an accurate description or label it is searching for."

So he invented the ESP game. A side effect of playing it is that people actually label the images on the web. You can see the paper Luis published on harnessing the power of the ESP game here.

"5000 people playing the ESP game simultaneously can label all images in Google in 2 months!"

Luis' next big project, which is still in development, tackles yet another task that computers cannot perform perfectly: language translation.

He has devised a technique in which people who know only one language can translate text from a language they do not know almost as well as an expert.

Luis is a recipient of the MacArthur Fellowship, a Microsoft New Faculty Fellowship, and a Sloan Research Fellowship. Among other honors, he was named one of the 50 Best Minds in Science by Discover Magazine, one of the Brilliant 10 scientists by Popular Science, and one of the top innovators in Arts and Science by the Smithsonian Magazine.
