On image labelling
Labelling data is a laborious side task that arises in most computer vision projects. Since developers usually don't want to spend their time on such dull work, a number of workarounds exist. Let me enumerate some I've heard of:
- In academia, the task of labelling usually falls on the broad shoulders of [PhD] students. The funny part is that the students are not always enrolled in the relevant project. At Graphics & Media Lab, students who have not attended enough seminars by the time of the revision have to label some data sets for the lab's projects.
- One could also hire people to label one's data. Since developers and researchers are relatively highly paid, it is more economical to hire other folks (sometimes they are students as well). UPDATE: hr0nix mentioned in the comments that there is a service called Mechanical Turk that helps requesters find contractors.
- A wittier way is to use applied psychology. For example, Google turned the labelling process into a game. During the gameplay, you and a randomly chosen partner tag images. The sooner you both tag an image with the same tag, the more points you get. A brilliant idea! Believe it or not, when I first saw it, I got carried away and could not stop playing until my friends dragged me out for a pizza!
- The most revolutionary approach was introduced by Desney Tan. Here is a popular explanation of what he has done. The idea is to capture labels straight from one's brain using EEG/fMRI/whatnot. For now, they can only do 2- or 3-class labelling, but (I hope) it is only the beginning.
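To make the labelling game from the list above a bit more concrete, here is a minimal Python sketch of how one round could be scored: find the earliest tag both players typed, then award more points the sooner they agreed. The `Tag` class, the function names, the 60-second limit and the linear point formula are all assumptions made up for illustration; none of it is taken from the actual game.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    text: str       # the label a player typed
    seconds: float  # time elapsed since the image was shown


def first_agreement(tags_a, tags_b):
    """Return (label, time) for the earliest tag typed by both players, or None."""
    # Earliest time at which player B typed each (normalised) label.
    first_seen_b = {}
    for tag in tags_b:
        key = tag.text.strip().lower()
        first_seen_b[key] = min(tag.seconds, first_seen_b.get(key, tag.seconds))

    best = None
    for tag in tags_a:
        key = tag.text.strip().lower()
        if key in first_seen_b:
            # The players agree only once the slower of the two has typed the label.
            agreed_at = max(tag.seconds, first_seen_b[key])
            if best is None or agreed_at < best[1]:
                best = (key, agreed_at)
    return best


def score(agreed_at, max_points=100, time_limit=60.0):
    """Hypothetical linear scoring: full points at t=0, zero at the time limit."""
    if agreed_at >= time_limit:
        return 0
    return round(max_points * (1 - agreed_at / time_limit))


alice = [Tag("dog", 2.0), Tag("grass", 5.0), Tag("puppy", 9.0)]
bob = [Tag("animal", 3.0), Tag("Puppy", 4.0), Tag("dog", 12.0)]

match = first_agreement(alice, bob)
if match is not None:
    label, agreed_at = match
    print(label, score(agreed_at))
```

The agreed label is what ends up attached to the image; requiring two independent players to match is what keeps random or careless tags out of the data set.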
24 January 2010 at 01:11
How can you forget about Mechanical Turk?!
24 January 2010 at 21:11
Thanks, I've updated the post.
According to the article, most MTurk workers are US citizens, so in Russia it seems more profitable to find someone locally (who is ready to label your data for a bottle of vodka :). It would be a nice idea to expand MTurk to Bangladesh.