## All the summer schools

This summer I attended two conferences (3DIMPVT, EMMCVPR) and two summer schools. I know my latency is somewhat annoying, but it's better to review them now than never. :) This post is about the summer schools, and the next one will be about the conferences.

PhD Summer School in Cambridge

Both schools were organized by Microsoft Research. The first one, the PhD Summer School, was held in Cambridge, UK. The lectures covered some general issues for computer science PhD students (like using cloud computing for research and career perspectives) as well as some recent technical results by Microsoft Research. On the computer vision side, there were several talks:
• Antonio Criminisi described their InnerEye system for retrieving similar body-part scans, which is useful for diagnosis based on the medical histories of similar cases. He also presented the basics of Random Forests as a teaser for his ICCV 2011 tutorial. The new thing was the use of peculiar weak classifiers (like 2nd-order separating surfaces). Antonio argued that they perform much better than standard trees in some cases.
• Andrew Fitzgibbon gave a brilliant lecture about pose estimation for Kinect (MSR Cambridge is really proud of that algorithm [Shotton et al., 2011], and it deserves a separate post).
• Olga Barinova talked about modern methods of image analysis and her work over the past two years (graphical models for non-maximum suppression in object detection, and urban scene parsing).
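To make the difference between weak classifiers concrete, here is a minimal NumPy sketch of three kinds of split tests a tree node might use: the conventional axis-aligned threshold, an oriented hyperplane, and a second-order (conic) surface. The function names are hypothetical and this is not code from InnerEye or the tutorial; it only illustrates what a "2nd-order separating surface" means for a single node.

```python
import numpy as np

def axis_aligned_test(X, feature, threshold):
    """Conventional split: compare one coordinate of each row of X with a threshold."""
    return X[:, feature] > threshold

def linear_test(X, w, threshold):
    """Oriented split: which side of the hyperplane w^T x = threshold each row lies on."""
    return X @ w > threshold

def conic_test(X, A, threshold):
    """Second-order split: [x, 1]^T A [x, 1] > threshold, with A a symmetric matrix."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous coordinates
    return np.sum((Xh @ A) * Xh, axis=1) > threshold

# Two points inside and two outside the unit circle
X = np.array([[0.0, 0.0], [0.5, 0.5], [2.0, 0.0], [0.0, -3.0]])
# A = diag(1, 1, -1) with threshold 0 encodes x^2 + y^2 - 1 > 0, i.e. "outside the circle"
print(conic_test(X, np.diag([1.0, 1.0, -1.0]), 0.0))
```

A single conic test separates the points inside the circle from those outside, which no single axis-aligned test can do; the price is more parameters to sample and evaluate at each node.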
The other great talks were about .NET Gadgeteer, a system for modelling and even deploying electronic gadgets (yes, hardware!), and F#, Microsoft's alternative to Scala, a language that combines the object-oriented and functional paradigms. Sir Tony Hoare also gave a lecture, so I had a chance to ask him how he ended up at Moscow State University in the 60s. It turns out he studied statistics, and Andrey Kolmogorov was one of the leaders of the field at that time, so that internship was a great opportunity for him. He said he had enjoyed his time in Moscow. :) There were also magnificent lectures by Simon Peyton-Jones about giving talks and writing papers. This advice is a must for everyone who does research; you can find the slides here. Slides for some of the lectures are available from the school page.

The school talks did not take up all the time. Every night was occupied by some social event (go-karting, punting, etc.) as well as unofficial after-parties in Cambridge pubs. It is definitely the most fun school/conference I've attended so far. Karting was especially great, with a quality track, pit stops, stats and prizes, so special thanks to Microsoft for including it in the program!

Microsoft Computer Vision School in Moscow

This year, the Microsoft Research summer school in Russia was devoted to computer vision and organized in cooperation with our lab. The school effectively started before its official opening, with a homework assignment we authored (I was one of four student volunteers). The task was to develop an image classification method capable of distinguishing two indoor and two outdoor classes. Submissions were ranked by their performance on a hidden test set. Artem Konev won the challenge with 95.5% accuracy and was awarded a prize consisting of an Xbox and a Kinect. Two years ago we used the same data for projects in the Introduction to Computer Vision course, where nobody reached even 90%. This reflects not just the skill of the participants, but also the progress of computer vision: all the top methods used PHOW descriptors and a linear SVM with an approximate explicit feature map for the χ² kernel [Vedaldi and Zisserman, 2010], which were unavailable at that time!

In fact, Andrew Zisserman was one of the speakers. Andrew is the most cited computer vision researcher and the only person whose Zisserman number is zero. :) His course was on Visual Search and Recognition, including instance-level and category-level recognition. Some relatively new ideas:
• when computing visual words, sometimes it is fruitful to use soft assignments to clusters, or more advanced methods like Locality-constrained linear coding [Wang et al., 2010];
• for instance-level recognition, query expansion can be used to overcome occlusions [Chum et al., 2007]: the idea is to use the best-matching images from the database as new queries;
• object detection is traditionally done with a sliding window; the problems here are varying aspect ratios, partial occlusions, multiple responses, and background clutter for substantially non-convex objects;
• for object detection, use bootstrapped sequential classification: at each stage, take the false positive detections from the previous stage as additional negative examples and retrain the classifier;
• multiple kernel learning [Gehler and Nowozin, 2009] is a hot tool for finding the optimal linear combination of SVM kernels: combining different features is fruitful, but learning the combination is not much better than just averaging the kernels (Lampert: “Never use MKL without comparison to simple baselines!”);
• movies are common datasets, since they contain a lot of repeated objects/people/environments, and privacy issues are easy to overcome. Movies like Groundhog Day and Run Lola Run are especially good since they contain repeated episodes. You can try to find the clocks in the Video Google demo.
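The difference between hard and soft assignment to visual words can be sketched as follows. This is a minimal NumPy illustration assuming Gaussian weights with a bandwidth `sigma` (a kernel-codebook style of soft assignment); it is not Locality-constrained linear coding, and the function name and toy codebook are hypothetical.

```python
import numpy as np

def bow_histogram(descriptors, codebook, sigma=None):
    """Normalised bag-of-visual-words histogram.
    sigma=None: hard assignment (each descriptor votes for its nearest visual word).
    Otherwise: soft assignment with weights proportional to exp(-d^2 / (2 sigma^2))."""
    # Squared Euclidean distances, shape (n_descriptors, n_words)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    if sigma is None:
        weights = np.zeros_like(d2)
        weights[np.arange(len(d2)), d2.argmin(axis=1)] = 1.0
    else:
        # Subtract the row minimum before exponentiating for numerical stability
        weights = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2 * sigma ** 2))
        weights /= weights.sum(axis=1, keepdims=True)  # each descriptor casts a total vote of 1
    hist = weights.sum(axis=0)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [1.0, 0.0]])
ambiguous = np.array([[0.5, 0.0]])                 # equidistant from both visual words
print(bow_histogram(ambiguous, codebook))           # hard: the whole vote goes to one word
print(bow_histogram(ambiguous, codebook, sigma=0.5))  # soft: the vote is split
```

For a descriptor lying between two cluster centres, hard assignment makes an arbitrary choice, while soft assignment spreads the vote, which makes the histogram less sensitive to quantization noise.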
Zisserman talked about the PASCAL challenge a lot. During a break he mentioned that he annotated some images himself since “it is fun”. One problem with the challenge is that we don't know whether the progress over the years really reflects the increased quality of methods, or just the growth of the training set (though this is easy to check).

Andrew Fitzgibbon gave two more great lectures, one about Kinect (with slightly different motivation than in Cambridge) and another about continuous optimization. He talked a lot about reconciling theory and practice:
• the life-cycle of a research project is: 1) chase the high-hanging fruit (theoretically-sound model), 2) try to make stuff really work, 3) look for the things that confuse/annoy you and fix them;
• for Kinect pose estimation, the good top-down method based on tracking did not work, so they ended up classifying body parts discriminatively; temporal smoothing is applied only at a late stage;
• “don't be obsessed with theoretical guarantees: they are either weak or trivial”;
• on the simplest optimization method: “How many people have invented [coordinate] alternation at some point in their life?”. Indeed, the method is guaranteed to converge, but it becomes very slow when the valleys are not axis-aligned;
• gradient descent is no panacea either: in some cases it also takes many small steps; the conjugate gradient method is better (and it still uses only first-order derivatives);
• when possible, use second derivatives to determine step size, but estimating them is hard in general;
• one almost never needs to take the matrix inverse; in MATLAB, to solve the system Hd = −g, use backslash: d = −H\g;
• the Friday evening method is to try MATLAB's fminsearch, which implements the derivative-free Nelder-Mead method.
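A small NumPy experiment illustrates two of the points above: on a quadratic whose valley is not axis-aligned, coordinate alternation needs hundreds of sweeps, whereas a single Newton step, computed by solving $Hd = -g$ rather than inverting $H$ (`np.linalg.solve` plays the role of MATLAB's backslash here), lands on the minimum. The test function $f(x) = \frac{1}{2}x^\top H x - b^\top x$ and the specific matrices are illustrative assumptions, not examples from the lecture.

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 x^T H x - b^T x, with gradient H x - b and Hessian H.

def newton_step(H, g):
    """Solve H d = -g for the step d without forming inv(H)."""
    return np.linalg.solve(H, -g)

def coordinate_alternation(H, b, x0, tol=1e-6, max_sweeps=100_000):
    """Minimise f by exactly minimising over one coordinate at a time.
    Returns the final point and the number of full sweeps performed."""
    x = np.array(x0, dtype=float)
    for sweep in range(1, max_sweeps + 1):
        for i in range(len(x)):
            # Set df/dx_i = (H x)_i - b_i to zero with the other coordinates fixed
            x[i] = (b[i] - H[i] @ x + H[i, i] * x[i]) / H[i, i]
        if np.linalg.norm(H @ x - b) < tol:  # stop when the gradient is small
            return x, sweep
    return x, max_sweeps

b = np.array([1.0, 0.0])
x0 = np.zeros(2)
H_axis = np.diag([1.0, 10.0])                  # axis-aligned valley
H_diag = np.array([[1.0, 0.99], [0.99, 1.0]])  # narrow valley along the diagonal

_, sweeps_axis = coordinate_alternation(H_axis, b, x0)
_, sweeps_diag = coordinate_alternation(H_diag, b, x0)
x_newton = x0 + newton_step(H_diag, H_diag @ x0 - b)
print(sweeps_axis, sweeps_diag)  # 1 vs. several hundred sweeps
```

For the axis-aligned valley one sweep is enough; for the diagonal one each sweep shrinks the gradient only by a factor of $0.99^2$, so the alternation crawls along the valley floor.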
Dr. Fitzgibbon asked the audience what the first rule of machine learning is. I could hardly resist replying “Never talk about machine learning”, but he expected a different answer: “Always try the nearest neighbour first!”
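In that spirit, a nearest-neighbour baseline takes only a few lines. A minimal NumPy sketch, assuming Euclidean distance on feature vectors; the function name and toy data are illustrative:

```python
import numpy as np

def nearest_neighbour_predict(train_X, train_y, test_X):
    """1-NN: give each test vector the label of its closest training vector (Euclidean)."""
    # Squared distances between every test and every training vector, shape (n_test, n_train)
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=-1)
    return train_y[d2.argmin(axis=1)]

train_X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.5, 4.5]])
train_y = np.array([0, 0, 1, 1])
test_X = np.array([[0.3, 0.3], [4.0, 6.0]])
print(nearest_neighbour_predict(train_X, train_y, test_X))
```

The brute-force distance matrix is fine for small data; for large training sets one would switch to an approximate nearest-neighbour index.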

Christoph Lampert gave lectures on kernel methods, and structured learning, and kernel methods for structured learning. Some notes on the kernel methods talk:
• (obvious) don't rely on the error on the training set, and (less obvious) don't even report it in your papers;
• for SVMs, a legitimate kernel must be an inner product in some feature space, i.e. every Gram matrix it produces must be positive semi-definite; this is often hard to prove directly, but there are workarounds: exponentiating a conditionally positive-definite function yields a kernel; sums, products and exponentials of kernels are kernels too, etc. (thus, importantly for multiple kernel learning, a non-negative linear combination of kernels is a kernel);
• since training (and running) non-linear SVMs is computationally expensive, explicit feature maps are popular now: the idea is to decompose the kernel back into a conventional dot product of transformed features; typically the exact transformation is an infinite series, so only the first few terms are taken;
• if the kernel can be expressed as a sum over vector components (e.g. the χ² kernel $\sum_d x_d x'_d / (x_d + x'_d)$), it is easy to decompose; the radial basis function (RBF) kernel $\exp(-\|x-x'\|^2 / (2\sigma^2))$ is the exponent of a sum, so it is hard to decompose this way (stricter conditions are given in the paper);
• when using the RBF kernel, there is another parameter σ to tune; the rule of thumb is to take σ² equal to the median distance between training vectors (thus, cross-validation becomes one-dimensional).
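The decomposition of an additive kernel can be sketched as follows. This is a minimal NumPy illustration of the idea behind the homogeneous kernel maps of [Vedaldi and Zisserman, 2010], not their implementation: for the χ² kernel $\sum_d x_d x'_d / (x_d + x'_d)$, each histogram bin is mapped to $2n+1$ numbers obtained by sampling the kernel's spectrum $\kappa(\lambda) = \tfrac{1}{2}\operatorname{sech}(\pi\lambda)$ with period $L$, and the dot product of two mapped vectors approximates the kernel. The values `n = 3` and `L = 0.5` are illustrative choices, and inputs are assumed to be non-negative histograms.

```python
import numpy as np

def chi2_kernel(x, y):
    """Additive chi^2 kernel sum_d x_d y_d / (x_d + y_d), with 0/0 taken as 0."""
    num, den = x * y, x + y
    return np.sum(np.divide(num, den, out=np.zeros_like(num), where=den > 0))

def chi2_feature_map(x, n=3, L=0.5):
    """Approximate explicit feature map for chi2_kernel (sketch; x must be non-negative).
    Uses kappa(lam) = 0.5 / cosh(pi * lam), the spectrum of x y / (x + y)."""
    def kappa(lam):
        return 0.5 / np.cosh(np.pi * lam)
    x = np.asarray(x, dtype=float)
    logx = np.log(np.where(x > 0, x, 1.0))  # avoid log(0); zero bins get zero features anyway
    feats = [np.sqrt(x * L * kappa(0.0))]
    for j in range(1, n + 1):
        scale = np.sqrt(2.0 * x * L * kappa(j * L))
        feats.append(scale * np.cos(j * L * logx))
        feats.append(scale * np.sin(j * L * logx))
    return np.concatenate(feats)  # length (2n + 1) * len(x)

x = np.array([0.1, 0.2, 0.3, 0.4])
y = np.array([0.4, 0.3, 0.2, 0.1])
exact = chi2_kernel(x, y)
approx = chi2_feature_map(x) @ chi2_feature_map(y)
print(exact, approx)
```

On these vectors the approximation error is on the order of one percent; more terms trade dimensionality for accuracy. Once the features are mapped, a linear SVM on them stands in for the non-linear χ² SVM.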
Christoph also told a motivating story about why one should always use cross-validation (so just forget the previous point :). Sebastian Nowozin was working on his [ICCV 2007] paper on action classification. He used the method by Dollár et al. [2005] as a baseline; that paper reported 80.6% accuracy on the KTH dataset. He outperformed the method by a couple of percentage points and then decided to reproduce Dollár's results. Imagine his surprise when simple cross-validation (with the same features and kernels) yielded 85.2%! So Sebastian had to improve his method to beat the baseline.

I feel I should stop writing about the talks now since the post is growing enormously long. Lampert's other lecture and Carsten Rother's course on CRFs were close to my topic, so they deserve separate posts (I have already reviewed the basics of structured learning and max-product optimization in this blog). Andreas Müller blogged about Ivan Laptev's recent action recognition talk at CVML, which was pretty similar to the one at our school. The slides are available for all MSCVS talks, and videos will be shared in September.

There were also several practical sessions, but I personally don't consider them that useful, because one can hardly grasp the essence of a method in 1.5 hours of changing code according to verbose instructions. Designing such tutorials is an art, and hardly anyone has really mastered it. :) Even if the task is well designed, one may fail to complete it for technical reasons: during Carsten Rother's tutorial, Tanya and I spent half an hour spotting a bug caused by confusing the input and index variable names (MATLAB is still dynamically typed). Ondrej Chum once mentioned how his tutorial was doomed since half of the students did not know how to work with sparse matrices. So, practical sessions are hard.

There was also a poster session, but unfortunately I cannot recall many outstanding works. Nataliya Shapovalova, who won the best poster award, presented quite an interesting work on action recognition, which I liked as well (and it is not last-name bias! :) My congratulations to Natasha!

The planned social events were not as intense as in Cambridge, but self-organization worked out. The most prominent example was our overnight walk around Moscow, which a substantial part of the school participants joined. It included catching the last subway train, drinking whiskey and gin, a game of guessing hallucinating names of each other, and moving a car off the tram rails to let the tram through in the morning. :) I also met some of the OpenCV developers from Nizhny Novgorod there.

MSCVS is a one-time event, unfortunately. There are at least three annual computer vision summer schools in Europe: ICVSS (the most mature one; I attended it last year), CVML (held in France by INRIA) and VSSS (which includes sports sessions besides the lectures, held in Zürich). If you are a PhD student in vision (especially at the beginning of your program), it is worth attending one of them each year to keep up with current trends in the vision community, especially if you don't go to the major conferences. Their topics (and even speakers!) usually overlap substantially, so pick one of them. ICVSS arguably has the most competitive participant selection, but its application deadline and acceptance notification are in March, so one can apply to the other schools if rejected.