I returned from my Euro trip yesterday, and now I am ready to start a series of posts about the International Computer Vision Summer School (ICVSS 2010). Overall, I enjoyed the school: that week gave me (I hope) a lot of new knowledge and new friends.
The scientific programme of the school included lectures, tutorials, a student poster session and a reading group. Lectures occupied most of the official programme time and were given by a great team of professors, including Richard Szeliski, the renowned Tomaso Poggio and the enchanting Kristen Grauman. I am not going to describe all the talks, only the ones that are close to my interests; you can find the complete programme here. Unfortunately, no video was recorded, but I have access to all the slides and posters, so if you are interested in anything, I can send it to you (I believe that does not violate any copyrights).
Wednesday was the day of Recognition. Kristen Grauman gave a talk about visual search. She covered specific-object search using local descriptors and bags of words, object-category search with pyramid matching, and also discussed the state of the art in the challenging problem of web-scale image retrieval. Mark Everingham continued with a talk about category recognition and localization using machine learning techniques. Localization is achieved using bounding boxes, segments, or search for object parts (like finding eyes and a mouth in order to find a face). Of course, he could not avoid mentioning the importance of context. He also explained the PASCAL VOC evaluation protocol.
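To make the bag-of-words part of the talk concrete, here is a minimal sketch of the idea: local descriptors are quantized against a codebook of "visual words", each image becomes a word histogram, and images are compared by histogram similarity. This is my own toy illustration, not code from the lecture; the function names and the choice of histogram-intersection similarity are mine.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors against a visual codebook and return
    an L1-normalized bag-of-words histogram."""
    # Nearest visual word for every descriptor (Euclidean distance).
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def bow_similarity(h1, h2):
    """Histogram-intersection similarity between two normalized histograms."""
    return float(np.minimum(h1, h2).sum())
```

A retrieval system would then simply rank database images by `bow_similarity` to the query histogram; pyramid matching refines this by also comparing histograms over spatial subregions.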
The tutorials covered some applications and did not really impress me. The poster session was a great opportunity to meet people. Some posters were really decent, for example Michael Bleyer's poster on dense stereo estimation using soft segmentation, which won half of the school's best-presentation prize. The work was done with Microsoft Research Cambridge: they formulated a really complicated energy function based on surface-plane estimates and minimized it with Lempitsky's graph-cut-based fusion-move algorithm (2009). More details can be found in their recent CVPR paper.
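The fusion-move idea itself is simple: given two candidate labelings ("proposals"), build a binary problem where each node chooses its label from one proposal or the other, and solve that binary problem optimally, so the fused result is never worse than either input. Their actual setting (2D grids, plane proposals, graph cuts/QPBO) is far more involved; below is just a toy sketch of one fusion move on a 1D chain, where the binary choice problem can be solved exactly by dynamic programming. All names here are mine, not from the paper.

```python
import numpy as np

def fuse_chain(x0, x1, unary, pairwise):
    """One fusion move on a 1D chain MRF: each node i picks its label from
    proposal x0 or x1; the optimal binary choice vector is found exactly
    by dynamic programming over the chain."""
    n = len(x0)
    props = np.stack([x0, x1])  # props[b, i]: label of node i under choice b
    # cost[b]: best energy of the prefix ending with choice b at the current node
    cost = np.array([unary(0, props[0, 0]), unary(0, props[1, 0])])
    back = []
    for i in range(1, n):
        step = np.empty((2, 2))
        for a in range(2):       # choice at node i - 1
            for b in range(2):   # choice at node i
                step[a, b] = (cost[a] + unary(i, props[b, i])
                              + pairwise(props[a, i - 1], props[b, i]))
        back.append(step.argmin(axis=0))  # best predecessor for each choice
        cost = step.min(axis=0)
    # Backtrack the optimal sequence of binary choices.
    b = int(cost.argmin())
    choice = [b]
    for step in reversed(back):
        b = int(step[b])
        choice.append(b)
    choice.reverse()
    return np.array([props[c, i] for i, c in enumerate(choice)])
```

On general graphs the same binary subproblem is non-trivial and is what QPBO or graph cuts are used for; the chain case just makes the "take the best of both proposals" intuition easy to see.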
The problem with the poster session was that the lecturers did not attend it, although they could have given really great feedback. My presentation was on the last day of the session, after the not-really-popular reading group, and in the room upstairs (whose existence was not a well-known fact :), so many people preferred to spend that evening on the beach. The school audience was quite heterogeneous — there were a lot of people from medical imaging, video compression etc. — so I had to explain some basics (like MRFs) to some of the students interested in my poster. There were also some really smart guys (like those from the Cambridge group). There was a bit of useful feedback: someone recommended that I use QPBO for inference, and I should probably consider it.
Since the speakers did not attend the poster session, students had to talk to them informally. A roommate of mine, Ramin, who works on crowd analysis, caught Richard Szeliski and asked him about local descriptors for video that operate in 3D image-time space. Szeliski said it was a promising field and even recalled Ivan Laptev's name. During our tour of Ragusa Ibla I asked Mark Everingham (who is probably the closest of all the speakers to my topic) about 3D point cloud classification. He said it was not really his field, but that it should be fruitful to analyse clouds not only at the local level, and things like multi-layer CRFs could be useful. Over the last two years a few papers have appeared that exploit that simple, intuitive idea and combine shape detectors with CRFs, but it usually looks awkward. Well, maybe smartly designed multi-layer CRFs might really be useful. Funnily, when I told Everingham that I am from Moscow, he replied that it was great, that Moscow has a great maths school, and recalled Vladimir Kolmogorov. So our education does not seem to be so terrible after all. :D