12/10/2011

Notes – Massive-scale online collaboration

There is a popular TED talk titled "Massive-scale online collaboration", given by Luis von Ahn.


Luis is a well-known computer scientist who focuses on so-called human computation technologies. He is famous for his earlier projects, CAPTCHA and reCAPTCHA. In fact, he helped coin the word CAPTCHA, which stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”, in the paper "CAPTCHA: Using Hard AI Problems for Security".

CAPTCHA is widely known, since we have all encountered it many times in our daily web life. reCAPTCHA is less well known, but we have probably faced it many times too, and this technology is solving some hard AI problems every day.

The motivation behind reCAPTCHA is that about 200M CAPTCHAs are typed per day, and each one takes a person around 10 seconds. That is a huge waste of time and intelligence, so Luis wanted to put this resource to useful work: solving AI problems that can be divided into small chunks of about 10 seconds each.
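
To get a feel for the scale, here is a quick back-of-the-envelope calculation using only the two figures above (200M inputs per day, about 10 seconds each). The variable names are my own, and the person-year conversion assumes a 365-day year of round-the-clock typing, so treat it as a rough illustration only.

```python
# Figures quoted in the talk (approximate).
INPUTS_PER_DAY = 200_000_000
SECONDS_PER_INPUT = 10

# Total human time spent on CAPTCHAs in a single day.
total_seconds = INPUTS_PER_DAY * SECONDS_PER_INPUT
total_hours = total_seconds / 3600

# Assumption: one "person-year" here means 24 hours x 365 days.
person_years = total_hours / (24 * 365)

print(f"{total_hours:,.0f} hours per day")          # roughly 555,556 hours
print(f"{person_years:.1f} person-years per day")   # roughly 63.4
```

So every single day, more than half a million hours of human attention go into typing CAPTCHAs.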

Fortunately, there is one such problem: book digitizing, that is, scanning real books and turning the scanned pictures into text. There are already many OCR (optical character recognition) technologies that do this automatically, but they are not good enough. It is said that for books more than 50 years old, OCR cannot handle over 30% of the words. So we can divide the OCR task into small pieces (usually one word per piece) and let people solve them while they are typing CAPTCHAs on the Internet. This is what reCAPTCHA does.

How does it work? Each time a CAPTCHA is requested, the system sends two pictures to the user. One picture contains a word that the system already knows; the other contains a word it does not know, taken from a book that needs to be OCRed. When the answer comes back, the system checks whether the text typed for the first picture matches the known word. If it does, the system has some confidence that the text typed for the second picture is also correct. To handle cases where the second answer is wrong, the system sends the same unknown picture to multiple people and uses the most popular answer as the final result.
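
The flow above can be sketched in a few lines of Python. This is only my own minimal illustration, not the real reCAPTCHA implementation: the class name `ReCaptchaSketch`, its methods, the `min_votes` threshold, and the case-insensitive comparison are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class ReCaptchaSketch:
    """Toy model: one known (control) word plus one unknown word per challenge."""

    def __init__(self, min_votes=3):
        # Hypothetical threshold: how many trusted answers we want
        # before accepting a transcription for an unknown word.
        self.min_votes = min_votes
        # unknown image id -> Counter of answers typed by trusted users
        self.votes = defaultdict(Counter)

    @staticmethod
    def _norm(text):
        # Assumption: compare answers ignoring case and outer whitespace.
        return text.strip().lower()

    def submit(self, control_answer, typed_control, unknown_id, typed_unknown):
        """Return True if the user passes the CAPTCHA."""
        if self._norm(typed_control) != self._norm(control_answer):
            # Known word is wrong: reject, and ignore the unknown word.
            return False
        # Known word is right: trust the second answer a little and record it.
        self.votes[unknown_id][self._norm(typed_unknown)] += 1
        return True

    def resolved(self, unknown_id):
        """Return the most popular answer, or None if there are too few votes."""
        counts = self.votes.get(unknown_id, Counter())
        if sum(counts.values()) < self.min_votes:
            return None
        return counts.most_common(1)[0][0]


sketch = ReCaptchaSketch(min_votes=3)
sketch.submit("morning", "morning", "img-1", "upon")
sketch.submit("morning", "Morning", "img-1", "upon")
sketch.submit("morning", "morning", "img-1", "upan")   # one user mistypes
print(sketch.resolved("img-1"))
```

The point of the design is that the known word acts as the actual test, while the unknown word rides along for free; a single wrong answer does not matter because the final text is decided by a vote.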

This works very well, and reCAPTCHA was acquired by Google in 2009. But Luis didn’t stop there; now he is introducing another great idea called Duolingo. The problem Duolingo wants to solve is translating the web into different languages, and the challenges are:
- Lack of bilinguals
- Lack of motivation

The way to solve this problem is learning by doing, for language learners.

When doing translation exercises, the learners are given real-world sentences that come from the web translation problem. This solution is pretty good: it can solve the web translation problem because there are so many language learners in the world, and it also has a positive feedback loop that addresses the motivation challenge:
- Learning with real content, so learners get good exercises to improve their skills
- A fair business model for language education, so learners can learn for free because they contribute something valuable while learning

Luis calls his project Duolingo, and I think it is very promising and super attractive.
