tag:blogger.com,1999:blog-22936118408573697202024-03-18T14:10:35.744+03:00Computer Blindnessand Computer BlondnessAnonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.comBlogger46125tag:blogger.com,1999:blog-2293611840857369720.post-52870151806629999432016-12-16T15:00:00.000+03:002016-12-16T15:00:38.279+03:00NIPS 2016<div dir="ltr" style="text-align: left;" trbidi="on">
<style type="text/css">
p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 13.0px Arial; color: #232323; -webkit-text-stroke: #232323}
p.p2 {margin: 0.0px 0.0px 0.0px 0.0px; font: 13.0px Arial; color: #232323; -webkit-text-stroke: #232323; min-height: 15.0px}
span.s1 {font-kerning: none}
span.s2 {text-decoration: underline ; font-kerning: none; color: #1255cc; -webkit-text-stroke: 0px #1255cc}
</style>
<br />
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif;">I have just returned from NIPS (technically, I am still travelling, in the Canary Islands). Surprisingly, this was my first NIPS. I enjoyed presentations of great work, as well as conversations with machine-learning folks and friends. Here is an edited high-level write-up I prepared for my co-workers at BlippAR, hence the focus on computer vision and deep learning; the scope of the conference is, of course, much broader.</span></span></div>
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif; font-size: x-small;"><br /></span></span></div>
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif;">At the beginning of the conference, Yann LeCun gave a keynote that may be considered the programme speech for the whole event. Besides starting the layered-cake meme, he named the two <b>ideas he considers the brightest</b> of the last decade: adversarial training and external memory. My impression is that the most frequent buzzwords at the conference were:</span></span></div>
<div class="p1">
</div>
<ul style="text-align: left;">
<li><span style="font-family: "georgia" , "times new roman" , serif;">generative adversarial networks (GAN),</span></li>
<li><span style="font-family: "georgia" , "times new roman" , serif;">recurrent neural networks (RNN),</span></li>
<li><span style="font-family: "georgia" , "times new roman" , serif;">reinforcement learning (RL),</span></li>
<li><span style="font-family: "georgia" , "times new roman" , serif;">robotics (done with or without RL).</span></li>
</ul>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://cdn-images-1.medium.com/max/1600/1*KDvA9Fq3lm-eQOyGlcKAKg.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="255" src="https://cdn-images-1.medium.com/max/1600/1*KDvA9Fq3lm-eQOyGlcKAKg.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: "times" , "times new roman" , serif;">Great engineers are good at defining useful abstractions. LeCun’s layered-cake metaphor was heavily reused throughout the conference.</span></td></tr>
</tbody></table>
<div>
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="-webkit-text-stroke-width: initial;">Andrew Ng gave a keynote on the</span><b style="-webkit-text-stroke-width: initial;"> workflow of applied DL research</b><span style="-webkit-text-stroke-width: initial;">. Most of the ideas are simple, yet often neglected. Here are a few of them. The most important metrics to track are human error, training-set error, and validation error. The gap between human and training-set error is the <i>bias</i>; it can be reduced by a bigger model or longer training. The validation/training gap is due to the <i>variance</i>; once the bias problem is addressed, you can add more data or increase regularisation to reduce it.</span></span></div>
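Ng's recipe boils down to a short decision rule. A minimal sketch in Python (the function name and the comparison logic are mine, not from the talk):

```python
def diagnose(human_error, train_error, val_error):
    """Suggest the next step following Ng's bias/variance workflow.

    The human/training gap is the (avoidable) bias; the
    validation/training gap is the variance.
    """
    bias = train_error - human_error
    variance = val_error - train_error
    if bias >= variance:
        # Under-fitting: make the model more expressive or train longer.
        return "high bias: try a bigger model or longer training"
    # Over-fitting: the model does not generalise beyond the training set.
    return "high variance: try more data or stronger regularisation"

print(diagnose(human_error=0.01, train_error=0.10, val_error=0.11))
```

In the example call, the 0.09 bias dominates the 0.01 variance, so the rule tells you to address bias first.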
<br />
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif;">In practice, however, the test data often comes from a different distribution than the training data. In that case, you may want two validation sets: one drawn from the training distribution and one from the test distribution. If you do not have the latter, you will be solving the wrong problem. Domain-adaptation methods may also be helpful, but they are not quite practical at the moment.</span></span></div>
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif;">He also strongly advocates a centralised data-storage infrastructure that is easily accessible to all team members: it speeds up progress a lot.</span></span></div>
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif;">Currently, supervised learning dominates the landscape, while unsupervised and reinforcement learning have been around for some time but still have not taken off. Ng predicts that the next-biggest areas (after supervised learning) will be neither of those, but rather transfer learning and multi-task learning: practically important areas that academic researchers largely ignore.</span></span></div>
<div class="p2">
<span style="font-family: "georgia" , "times new roman" , serif;"><span class="s1"></span><br /></span></div>
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif;">There were seemingly few papers on <b>architectural design</b> (for supervised learning)<b>.</b> Daniel Sedra gave a great talk on <a href="https://arxiv.org/abs/1603.09382"><span class="s2">Neural networks of stochastic depth</span></a>. The idea is to drop entire layers during training in ResNet-like architectures, which allows growing them to 1000+ layers and gives a significant improvement in accuracy on CIFAR (a less significant one on ImageNet). Unfortunately, at test time all the layers are present, so inference time may be a problem. Another improvement is to add identity connections between each pair of layers, which also helps a bit.</span></span></div>
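The trick behind stochastic depth is a linearly decaying survival probability: the l-th of L residual blocks is kept with probability p_l = 1 - (l/L)(1 - p_L). A toy sketch of the forward pass (pure Python with scalar "features" and trivial residual functions; the real method operates on ResNet blocks):

```python
import random

def survival_prob(l, num_blocks, p_last=0.5):
    """Linear decay from nearly 1 at the first block down to p_last at the last."""
    return 1.0 - (l / num_blocks) * (1.0 - p_last)

def stochastic_depth_forward(x, blocks, p_last=0.5, training=True):
    """blocks: list of residual functions f, so a full layer computes x + f(x)."""
    num_blocks = len(blocks)
    for l, f in enumerate(blocks, start=1):
        p = survival_prob(l, num_blocks, p_last)
        if training and random.random() > p:
            continue  # drop the whole block: only the identity shortcut remains
        scale = 1.0 if training else p  # at test time, rescale by survival prob
        x = x + scale * f(x)
    return x

# Toy check: four identical residual functions on a scalar input.
blocks = [lambda v: 0.1 * v] * 4
print(stochastic_depth_forward(1.0, blocks, training=False))
```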
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></span></div>
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif;">Michael Figurnov, my academic sibling, presented work on <b>speeding up inference</b> using so-called <a href="https://arxiv.org/abs/1504.08362" target="_blank">perforated convolutions</a>. The idea is to skip some of the multiplications that do not affect the result of the operation much. In the <a href="https://arxiv.org/abs/1612.02297" target="_blank">follow-up work</a>, he and his co-authors suggest skipping whole parts of the computation graph once they are estimated to be unimportant. This can potentially lead to a significant speed-up, but for most practitioners the question is support in libraries such as cuDNN. Hopefully, we will be able to use it soon.</span></span></div>
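To illustrate the gist of perforation (this is only the core idea, not the authors' implementation): evaluate the expensive operation at a subset of positions and fill the skipped ones from the nearest computed neighbour.

```python
def perforated_map(f, xs, mask):
    """Evaluate the expensive position-wise function f only where mask is
    True; fill the holes with the nearest computed value (1-D for brevity)."""
    computed = [i for i, m in enumerate(mask) if m]
    out = [None] * len(xs)
    for i in computed:
        out[i] = f(xs[i])          # the only calls to the expensive f
    for i, m in enumerate(mask):
        if not m:
            nearest = min(computed, key=lambda j: abs(j - i))
            out[i] = out[nearest]  # cheap nearest-neighbour fill-in
    return out

# f is called 4 times instead of 8; half the positions are interpolated.
print(perforated_map(lambda v: v * v, list(range(8)), [True, False] * 4))
```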
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg48Ht588lcVZRmxBAjTa9rNkYe3gOkRwUiCSmje-o7udw9fcKQRsJO2sFkXsk6LK-KbLn10gIMMFFNccZnFzQqwdFcc7V7-H4KBjjXRbRtchKwpg72MuztJ7pJSosy8VHf3Y3XzbYw_z7b/s1600/Screen+Shot+2016-12-16+at+11.17.09.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="100" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg48Ht588lcVZRmxBAjTa9rNkYe3gOkRwUiCSmje-o7udw9fcKQRsJO2sFkXsk6LK-KbLn10gIMMFFNccZnFzQqwdFcc7V7-H4KBjjXRbRtchKwpg72MuztJ7pJSosy8VHf3Y3XzbYw_z7b/s320/Screen+Shot+2016-12-16+at+11.17.09.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The method suggests not wasting computation on the background, evaluating more ResNet layers where they can pay off more.</td></tr>
</tbody></table>
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif;">There was a demo I missed that demonstrated real-time </span><b style="font-family: Georgia, "Times New Roman", serif;">object detection on a mobile device</b><span style="font-family: "georgia" , "times new roman" , serif;"> (if you have a link, please share it with me!)</span><span style="font-family: "georgia" , "times new roman" , serif;">. It is based on </span><a href="http://pjreddie.com/darknet/yolo/" style="font-family: Georgia, "Times New Roman", serif;" target="_blank">YOLO</a><span style="font-family: "georgia" , "times new roman" , serif;">; the idea is to pre-compute and memoize some parts of the computational graph, then find the best approximation at inference time. There is some loss in accuracy, but real-time inference on the device looks impressive! </span></span></div>
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif;"><br /></span></span></div>
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif;">The most interesting improvement in <b>deep learning optimisation</b> seems to be <a href="https://arxiv.org/abs/1602.07868"><span class="s2">Weight normalisation</span></a>. The paper builds on the idea of batch normalisation: it tries to explain its positive effect (i.e. staying on the unit sphere) and uses this intuition to propose an alternative way to stabilise the training process.</span></span></div>
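The reparametrisation at the heart of the paper is one line: w = g · v / ‖v‖, which decouples the direction of a weight vector from its scale so the two can be optimised separately. A minimal sketch:

```python
import math

def weight_norm(v, g):
    """Reparametrise a weight vector as w = g * v / ||v||: the direction
    comes from v and the length is exactly g."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

w = weight_norm([3.0, 4.0], g=2.0)
print(w)  # points along v, with length g
```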
<div class="p2">
<span style="font-family: "georgia" , "times new roman" , serif;"><span class="s1"></span><br /></span></div>
<div class="p1">
<span class="s1"><span style="font-family: "georgia" , "times new roman" , serif;">As I said above, <b>GAN</b>s were a big (and controversial) topic here. They seem to be the most popular generative model for images, although variational autoencoders are still around. The training process is quite unstable, but the community keeps finding ways to stabilise it, most of them hacks born of a trial-and-error process. Soumith Chintala presented a list of 15 tricks in his talk <i>How to train a GAN?</i> He promised to publish the slides later, but here is <a href="https://github.com/soumith/ganhacks"><span class="s2">the write-up</span></a>. If you ever have to train GANs, make sure you read it first: there is no way you could guess this combination yourself (and papers typically do not mention all these hacks).</span></span></div>
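For a taste of what these hacks look like, here is one of them, soft (smoothed) labels for the discriminator, sketched as a plain-Python cross-entropy loss; the constants and the function shape are illustrative, not from the talk:

```python
import math

def d_loss(real_scores, fake_scores, real_label=0.9):
    """Discriminator cross-entropy with one-sided label smoothing: real
    targets are 0.9 instead of 1.0, which keeps the discriminator from
    becoming over-confident and starving the generator of gradients."""
    eps = 1e-7  # numerical guard for log(0)
    loss = 0.0
    for p in real_scores:  # want p close to real_label, not to 1.0
        loss -= real_label * math.log(p + eps) + (1 - real_label) * math.log(1 - p + eps)
    for p in fake_scores:  # fake targets stay at 0.0
        loss -= math.log(1 - p + eps)
    return loss / (len(real_scores) + len(fake_scores))

print(d_loss(real_scores=[0.8, 0.95], fake_scores=[0.1, 0.2]))
```

With smoothing, the loss on real examples is minimised at a score of 0.9, so pushing the score all the way to 1.0 is actually penalised.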
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://pbs.twimg.com/media/CyxUpy-XUAYdX0t.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="200" src="https://pbs.twimg.com/media/CyxUpy-XUAYdX0t.jpg" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Poster by <a href="https://twitter.com/soumithchintala/status/805111562503589889" target="_blank">Soumith</a>. TL;DR: use magic.</td></tr>
</tbody></table>
<div class="p1">
<span style="-webkit-text-stroke-width: initial; font-family: "georgia" , "times new roman" , serif;">In the </span><b style="-webkit-text-stroke-width: initial; font-family: Georgia, "Times New Roman", serif;">visual search </b><span style="-webkit-text-stroke-width: initial; font-family: "georgia" , "times new roman" , serif;">field, there is the </span><a href="https://arxiv.org/abs/1606.03558" style="-webkit-text-stroke-width: initial; font-family: Georgia, "Times New Roman", serif;"><span class="s2">Universal correspondence network</span></a><span style="-webkit-text-stroke-width: initial; font-family: "georgia" , "times new roman" , serif;"> paper. Instead of matching features, it suggests running a single pass through the network to find dense correspondences. The authors claim that, compared to extracting deep features for patches, the number of distance computations drops from quadratic to linear (although a spatial index can also mitigate the issue). The network appears to learn some notion of geometry/topology: one can train it to do rigid matching, as well as to look more at semantic similarity. The results they showed were on very similar images, so they did not impress me much, although that may have been done only to ease visualisation.</span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8ugmSbmWv9Y51do44RHmj25aBwZJNc1ODJvsO4tY9c_ERIjufXdLbyu23MwgVktVDc5KxnPXqvDTpDeOTpKTLzdrOMOQVQaba8x3PIu9qpSFteV1fPjbjmXHSafEtz6Zraup6CU7TOBZX/s1600/Screen+Shot+2016-12-16+at+11.31.14.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="232" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8ugmSbmWv9Y51do44RHmj25aBwZJNc1ODJvsO4tY9c_ERIjufXdLbyu23MwgVktVDc5KxnPXqvDTpDeOTpKTLzdrOMOQVQaba8x3PIu9qpSFteV1fPjbjmXHSafEtz6Zraup6CU7TOBZX/s640/Screen+Shot+2016-12-16+at+11.31.14.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Deep learning continues to disrupt traditional computer-vision pipelines.</td></tr>
</tbody></table>
<div class="p1">
<span style="-webkit-text-stroke-width: initial; font-family: "georgia" , "times new roman" , serif;">For </span><b style="-webkit-text-stroke-width: initial; font-family: Georgia, "Times New Roman", serif;">reinforcement learning</b><span style="-webkit-text-stroke-width: initial; font-family: "georgia" , "times new roman" , serif;">, there is no single paper I can pick as the brightest; the biggest impact seems to come from the release of </span><a href="https://universe.openai.com/" style="-webkit-text-stroke-width: initial; font-family: Georgia, "Times New Roman", serif;" target="_blank">OpenAI Universe</a><span style="-webkit-text-stroke-width: initial; font-family: "georgia" , "times new roman" , serif;"> and </span><a href="https://github.com/deepmind/lab" style="-webkit-text-stroke-width: initial; font-family: Georgia, "Times New Roman", serif;" target="_blank">DeepMind Lab</a><span style="-webkit-text-stroke-width: initial; font-family: "georgia" , "times new roman" , serif;">, both training environments for reinforcement-learning algorithms built on computer-generated imagery (i.e. video games). This seems a valid step towards building intelligent agents that operate in the real world, although there is scepticism that the final step will be difficult: the methods are data-hungry, and it is difficult to get a lot of data from anything other than a simulated environment, let alone obtain a frequent reward signal. This is one of the reasons researchers study </span><a href="https://arxiv.org/abs/1603.00448v3" style="-webkit-text-stroke-width: initial; font-family: Georgia, "Times New Roman", serif;" target="_blank">imitation learning</a><span style="-webkit-text-stroke-width: initial; font-family: "georgia" , "times new roman" , serif;">, where a robot learns the cost function from demonstrations rather than taking it for granted. With this approach, we may be able to teach robots to perform tasks like pouring water into a glass by having a person or another robot show how it is done.</span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://i.ytimg.com/vi/GgkmJDjeJtw/maxresdefault.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="225" src="https://i.ytimg.com/vi/GgkmJDjeJtw/maxresdefault.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">DeepMind collaborated with Blizzard to provide StarCraft API to facilitate training of RL agents. Now you have an excuse to play games at work.</td></tr>
</tbody></table>
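Both environments expose the by-now-standard observe-act-reward loop. A toy mock of that Gym-style interface (the environment below is a stand-in for illustration only, not the real Universe or Lab API):

```python
import random

class ToyEnv:
    """A stand-in environment with the Gym-style reset/step interface.
    Reward is 1 while the scalar state stays within bounds."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action is -1 or +1
        self.state += action
        reward = 1.0 if abs(self.state) <= 3 else 0.0
        done = abs(self.state) > 3
        return self.state, reward, done, {}

def run_episode(env, policy, max_steps=20):
    """The universal RL loop: observe, act, collect reward, repeat."""
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            break
    return total

random.seed(0)
print(run_episode(ToyEnv(), policy=lambda obs: random.choice([-1, 1])))
```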
<div class="p1">
<b style="-webkit-text-stroke-width: initial; font-family: Georgia, "Times New Roman", serif;">Bayesian methods</b><span style="-webkit-text-stroke-width: initial; font-family: "georgia" , "times new roman" , serif;"> seem to be quite popular and apparently can be successfully combined with neural networks (variational autoencoders and their variants are used a lot). The workshops on approximate variational inference and Bayesian deep learning attracted a large number of papers. The tutorial on variational inference (</span><a href="http://www.cs.columbia.edu/~blei/talks/2016_NIPS_VI_tutorial.pdf" style="-webkit-text-stroke-width: initial; font-family: Georgia, "Times New Roman", serif;" target="_blank">slides-pdf-1</a><span style="-webkit-text-stroke-width: initial; font-family: "georgia" , "times new roman" , serif;">, </span><a href="http://shakirm.com/papers/VITutorial.pdf" style="-webkit-text-stroke-width: initial; font-family: Georgia, "Times New Roman", serif;" target="_blank">slides-pdf-2</a><span style="-webkit-text-stroke-width: initial; font-family: "georgia" , "times new roman" , serif;">) reflected the significant progress of recent years: things like the reparametrisation trick and stochastic optimisation have made it possible to estimate quite sophisticated posterior distributions.</span></div>
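The reparametrisation trick mentioned in the tutorial is simple to state: instead of sampling z ~ N(mu, sigma²) directly, draw eps ~ N(0, 1) and set z = mu + sigma * eps, so the sample becomes a differentiable function of mu and sigma. A sketch (the sanity check at the end is mine):

```python
import random

def reparam_sample(mu, sigma, rng=random):
    """z = mu + sigma * eps with eps ~ N(0, 1): all randomness moves into
    eps, so gradients can flow through mu and sigma."""
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps

# Sanity check: the samples have the intended mean and spread.
random.seed(0)
samples = [reparam_sample(mu=2.0, sigma=0.5) for _ in range(10000)]
mean = sum(samples) / len(samples)
print(round(mean, 2))  # close to mu = 2.0
```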
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtj6Ldm-AZTCi8QU170tyxGY5h1722SZJEOJ2gDG5TmVgyGkPQtAPmHf4tyEToLjpN1XTgkSWLPpV8HsXKyffZ5LJFO8GwVEKJcAn98uUrneON5apb5NkIk66p0kJY4cpPzkOzFiW-2S_Q/s1600/Screen+Shot+2016-12-16+at+11.50.39.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="205" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtj6Ldm-AZTCi8QU170tyxGY5h1722SZJEOJ2gDG5TmVgyGkPQtAPmHf4tyEToLjpN1XTgkSWLPpV8HsXKyffZ5LJFO8GwVEKJcAn98uUrneON5apb5NkIk66p0kJY4cpPzkOzFiW-2S_Q/s320/Screen+Shot+2016-12-16+at+11.50.39.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Slide by David Blei. Bayesian models have long been used successfully to estimate uncertainty, in particular to glue models into complex pipelines while avoiding information loss. It is great that we can use them with modern models as well.</td></tr>
</tbody></table>
<div class="p1">
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="-webkit-text-stroke-width: initial;">Surely, an important part of the conference is </span>socialisation<span style="-webkit-text-stroke-width: initial;">. Thanks to everybody I had a chance to meet at the conference and have a productive (or just fun) conversation with. See you at future conferences!</span></span></div>
</div>
Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com8tag:blogger.com,1999:blog-2293611840857369720.post-537789717029977612016-08-24T11:00:00.000+03:002016-08-24T11:00:20.359+03:00Launching Jupyter notebook server on startup<div dir="ltr" style="text-align: left;" trbidi="on">
<i>In this post, I show how to automatically run the Jupyter notebook server on an OSX machine on startup, without using extra terminal windows.</i><br />
<h4 style="text-align: left;">
Background</h4>
<a href="http://jupyter.org/" target="_blank">Jupyter notebooks</a> provide a useful interactive environment (think of a better alternative to the console) for scripting languages like Python, Julia, and Lua/Torch, and have become a favourite tool of many data scientists for handling datasets that fit into a laptop’s memory. If you have not used them, try them next time you do data analysis or prototype an algorithm.<br />
<h4 style="text-align: left;">
The Pain</h4>
There is a slight technical difficulty, though. The way Jupyter works, you run a server process on your machine (or somewhere on the network) and then access it through a web client. I have noticed a popular usage pattern in the local-server case: a user runs the command that starts the server process in a terminal, then keeps the terminal window open while she is using the notebook. This method has several flaws:<br />
<br />
<ul style="text-align: left;">
<li>you need to run the process manually each time you need a notebook (after reboot);</li>
<li>you have to remember the command line arguments;</li>
<li>the process takes up a terminal window/tab, or, if you run it in the background, you can accidentally close the tab and lose the state stored in memory.</li>
</ul>
<h4 style="text-align: left;">
The Solution</h4>
<div>
If your perfectionist’s feelings are hurt by the nuisances above, my setup may remedy them. It uses the <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">launchd</span> framework, which starts, stops, and manages background processes. </div>
<div>
<br /></div>
<div>
First, create a property list file in your <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">~/Library/LaunchAgents </span>directory: </div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">~/Library/LaunchAgents/com.blippar.jupyter-ipython.plist</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: inherit;">and put the following text there. This is Apple’s property-list format: nominally XML, but with its own conventions (e.g. the order of child nodes matters, since each key must be immediately followed by its value). I have highlighted in red the paths and other variables you will have to change:</span></div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"></span></div>
<div>
<pre style="font-family: 'courier new', courier, monospace; font-size: x-small;">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd"&gt;
&lt;plist version="1.0"&gt;
&lt;dict&gt;
    &lt;key&gt;KeepAlive&lt;/key&gt;
    &lt;true/&gt;
    &lt;key&gt;Label&lt;/key&gt;
    &lt;string&gt;com.blippar.jupyter-ipython&lt;/string&gt;
    &lt;key&gt;ProgramArguments&lt;/key&gt;
    &lt;array&gt;
        &lt;string&gt;zsh&lt;/string&gt;
        &lt;string&gt;-c&lt;/string&gt;
        &lt;string&gt;source ~/.zshrc;workon <span style="color: red;">py3</span>;jupyter notebook --notebook-dir <span style="color: red;">/Users/roman/jupyter-nb</span> --no-browser&lt;/string&gt;
    &lt;/array&gt;
    &lt;key&gt;RunAtLoad&lt;/key&gt;
    &lt;true/&gt;
    &lt;key&gt;StandardErrorPath&lt;/key&gt;
    &lt;string&gt;<span style="color: red;">/Users/roman/Library/LaunchAgents/jupyter-notebook.stderr</span>&lt;/string&gt;
    &lt;key&gt;StandardOutPath&lt;/key&gt;
    &lt;string&gt;<span style="color: red;">/Users/roman/Library/LaunchAgents/jupyter-notebook.stdout</span>&lt;/string&gt;
&lt;/dict&gt;
&lt;/plist&gt;</pre>
</div>
</div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> </span></div>
<div>
The most interesting part is the <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">ProgramArguments </span>entry. Since it is impossible to specify several commands in a single argument, I had to run a separate shell instance and pass it the commands separated by semicolons. If you do not need to activate a virtual environment (<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">workon </span><span style="color: red; font-family: "courier new" , "courier" , monospace; font-size: x-small;">py3</span>), you might get away without calling the shell, although I recommend always using virtualenv for Python on a local machine. Also, I use <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">zsh</span>; replace it with your shell of choice.</div>
<div>
<br /></div>
<div>
To check if everything works, run</div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">launchctl load ~/Library/LaunchAgents/com.blippar.jupyter-ipython.plist</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">launchctl start</span><span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> com.blippar.jupyter-ipython</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
and open <a href="http://localhost:8888/tree">http://localhost:8888/tree</a> in your browser. Hopefully, you see the list of available notebooks.</div>
<div>
<br /></div>
<div>
Linux users, you should be able to replicate this with <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">systemd </span>(although, chances are you already know that =).</div>
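For reference, a roughly equivalent systemd user unit might look like the sketch below (the paths and the virtualenv name are illustrative and must match your setup); save it as ~/.config/systemd/user/jupyter.service and run systemctl --user enable --now jupyter:

```ini
[Unit]
Description=Jupyter notebook server

[Service]
# Adjust the shell, virtualenv, and notebook directory to your setup.
ExecStart=/bin/zsh -c 'source ~/.zshrc; workon py3; jupyter notebook --notebook-dir /home/roman/jupyter-nb --no-browser'
Restart=always

[Install]
WantedBy=default.target
```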
</div>
Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com6tag:blogger.com,1999:blog-2293611840857369720.post-63741246954088471852015-10-09T12:28:00.000+03:002015-10-09T12:34:56.813+03:00Asynchronous Fitter design pattern for training in IPython notebooks<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="border: 0px; margin: 0px 0px 1em; padding: 0px; text-align: left; word-wrap: break-word;">
[Cross-posted from <a href="http://codereview.stackexchange.com/questions/106943/asynchronous-model-fitting-that-allows-termination-in-python">CodeReview@SX</a>]<br />
<br />
<h3 style="text-align: left;">
The problem</h3>
When you work with Python interactively (e.g. in an IPython shell or notebook) and run a computationally intensive operation, such as fitting a machine-learning model implemented in native code, you cannot interrupt it, since the native code does not return execution control to the Python interpreter until the operation ends. The problem is not specific to machine learning, although it is typical to run a training process whose duration you cannot predict. If it takes longer than you expected, the only way to stop training is to kill the kernel, losing the pre-processed features and other variables stored in memory; in other words, you cannot interrupt only that particular operation to check a simpler model, which would presumably fit faster.<br />
<br />
<h3 style="text-align: left;">
The solution</h3>
I propose an <i>Asynchronous Fitter </i>design pattern that runs fitting in a separate process and communicates the results back when they are available. It allows stopping training gracefully by killing the spawned process and then training a simpler model. It also allows training several models simultaneously and working in the IPython notebook while the models train. Note that multithreading is probably not an option, since we cannot stop a thread that runs uncontrolled native code.<br />
<br />
Here is a draft implementation:</div>
<pre class="lang-py prettyprint prettyprinted" style="background-color: #eeeeee; border: 0px; color: #393318; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, sans-serif; font-size: 13px; margin-bottom: 1em; max-height: 600px; overflow: auto; padding: 5px; width: auto; word-wrap: normal;"><code style="border: 0px; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, sans-serif; margin: 0px; padding: 0px; white-space: inherit;"><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">from</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> multiprocessing </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">import</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="typ" style="border: 0px; color: #2b91af; margin: 0px; padding: 0px;">Process</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="typ" style="border: 0px; color: #2b91af; margin: 0px; padding: 0px;">Queue</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">import</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> time
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">class</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="typ" style="border: 0px; color: #2b91af; margin: 0px; padding: 0px;">AsyncFitter</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">object</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">):</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">def</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> __init__</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> model</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">):</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">queue </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="typ" style="border: 0px; color: #2b91af; margin: 0px; padding: 0px;">Queue</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">()</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">model </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> model
self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">proc </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">None</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">start_time </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">None</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">def</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> fit</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> x_train</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> y_train</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">):</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">terminate</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">()</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">proc </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="typ" style="border: 0px; color: #2b91af; margin: 0px; padding: 0px;">Process</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">target</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="typ" style="border: 0px; color: #2b91af; margin: 0px; padding: 0px;">AsyncFitter</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">async_fit_</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
args</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">model</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> x_train</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> y_train</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">queue</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">))</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">start_time </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> time</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">time</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">()</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">proc</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">start</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">()</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">def</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> try_get_result</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">):</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">if</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">queue</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">empty</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">():</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">return</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">None</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">return</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">queue</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">get</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">()</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">def</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> is_alive</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">):</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">return</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">proc </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">is</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">not</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">None</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">and</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">proc</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">is_alive</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">()</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">def</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> terminate</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">):</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">if</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">proc </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">is</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">not</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">None</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">and</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">proc</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">is_alive</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">():</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">proc</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">terminate</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">()</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">proc </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">None</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">def</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> time_elapsed</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">):</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">if</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">not</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">start_time</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">:</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">return</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="lit" style="border: 0px; color: maroon; margin: 0px; padding: 0px;">0</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">return</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> time</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">time</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">()</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">-</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> self</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">start_time
</span><span class="lit" style="border: 0px; color: maroon; margin: 0px; padding: 0px;">@staticmethod</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">def</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> async_fit_</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">model</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> x_train</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> y_train</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> queue</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">):</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
model</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">fit</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">x_train</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> y_train</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">)</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
queue</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">put</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">model</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">)</span></code></pre>
<h3 style="text-align: left;">
Usage</h3>
It is easy to modify code that uses scikit-learn to adopt this pattern. Here is an example:<br />
<pre class="lang-py prettyprint prettyprinted" style="background-color: #eeeeee; border: 0px; color: #393318; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, sans-serif; font-size: 13px; margin-bottom: 1em; max-height: 600px; overflow: auto; padding: 5px; width: auto; word-wrap: normal;"><code style="border: 0px; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, sans-serif; margin: 0px; padding: 0px; white-space: inherit;"><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">import</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> numpy </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">as</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> np
</span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">from</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> sklearn</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">svm </span><span class="kwd" style="border: 0px; color: darkblue; margin: 0px; padding: 0px;">import</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> SVC
model </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> SVC</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">C </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="lit" style="border: 0px; color: maroon; margin: 0px; padding: 0px;">1e3</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> kernel</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="str" style="border: 0px; color: maroon; margin: 0px; padding: 0px;">'linear'</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">)</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
fitter </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="typ" style="border: 0px; color: #2b91af; margin: 0px; padding: 0px;">AsyncFitter</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">model</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">)</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
x_train </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> np</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">random</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">rand</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="lit" style="border: 0px; color: maroon; margin: 0px; padding: 0px;">500</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="lit" style="border: 0px; color: maroon; margin: 0px; padding: 0px;">30</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">)</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
y_train </span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> np</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">random</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">randint</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="lit" style="border: 0px; color: maroon; margin: 0px; padding: 0px;">0</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> </span><span class="lit" style="border: 0px; color: maroon; margin: 0px; padding: 0px;">2</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> size</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">=(</span><span class="lit" style="border: 0px; color: maroon; margin: 0px; padding: 0px;">500</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,))</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">
fitter</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">.</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">fit</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">(</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;">x_train</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">,</span><span class="pln" style="border: 0px; color: black; margin: 0px; padding: 0px;"> y_train</span><span class="pun" style="border: 0px; color: black; margin: 0px; padding: 0px;">)</span></code></pre>
<br />
You can check whether training is still running by calling <span style="font-family: Courier New, Courier, monospace;">fitter.is_alive()</span> and see how much time has elapsed by calling <span style="font-family: Courier New, Courier, monospace;">fitter.time_elapsed()</span>. At any point you can <span style="font-family: Courier New, Courier, monospace;">terminate()</span> the process, or simply start fitting another model, which terminates the previous one. Finally, you can obtain the trained model via <span style="font-family: Courier New, Courier, monospace;">try_get_result()</span>, which returns <span style="font-family: Courier New, Courier, monospace;">None</span> while training is still in progress.<br />
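The same fit-in-a-child-process pattern can be exercised in isolation. Below is a self-contained sketch where a toy model stands in for an sklearn estimator (the <span style="font-family: Courier New, Courier, monospace;">SlowModel</span> class and the half-second "training" are illustrative assumptions, not part of the recipe above):

```python
import time
from multiprocessing import Process, Queue

# Toy stand-in for an sklearn estimator: anything with a fit() method works.
class SlowModel(object):
    def fit(self, x, y):
        time.sleep(0.5)  # pretend training takes a while
        self.fitted_ = True

def fit_in_background(model, x, y, queue):
    # Runs in the child process; the fitted model travels back via the queue.
    model.fit(x, y)
    queue.put(model)

queue = Queue()
proc = Process(target=fit_in_background, args=(SlowModel(), None, None, queue))
proc.start()

# Poll for the result, doing other work (here: just sleeping) in between.
model = None
while model is None:
    time.sleep(0.1)
    if not queue.empty():
        model = queue.get()
proc.join()
```

Note that the model comes back through the queue by pickling, so its fitted attributes survive the round trip.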
<div>
<br />
<h3 style="text-align: left;">
The issues</h3>
As far as I understand, the training set is pickled and copied to the child process, which may be a problem if it is large. Is there an easy way to avoid that? Note that training needs only read-only access to the training set.<br />
<br />
What happens if someone loses a reference to an AsyncFitter instance that wraps a running process? Is there a way to implement an asynchronous delayed resource cleanup?</div>
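One possible answer to the second question (again an assumption of mine, not something from the recipe above) is <span style="font-family: Courier New, Courier, monospace;">weakref.finalize</span>: register a callback that terminates the child when the wrapper object is garbage-collected. The callback must capture the <span style="font-family: Courier New, Courier, monospace;">Process</span> object, never the wrapper itself, or the wrapper would be kept alive forever:

```python
import time
import weakref
from multiprocessing import Process

class OwnedProcess(object):
    """Wrapper whose child process is cleaned up on garbage collection."""
    def __init__(self, target, args=()):
        self.proc = Process(target=target, args=args)
        self.proc.start()
        # finalize() holds only a weak reference to `self`, so it does not
        # keep the wrapper alive; it captures the Process object strongly.
        self._finalizer = weakref.finalize(self, OwnedProcess._reap, self.proc)

    @staticmethod
    def _reap(proc):
        if proc.is_alive():
            proc.terminate()
        proc.join()

def forever():
    time.sleep(3600)

owner = OwnedProcess(forever)
child = owner.proc
del owner  # under CPython, dropping the last reference runs the finalizer now
```

Under CPython the finalizer fires as soon as the last reference is dropped; on other interpreters it runs whenever the garbage collector gets to the wrapper, so the cleanup is delayed but still guaranteed before interpreter shutdown.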
</div>
<h3 style="text-align: left;">
X Bridges of Königsberg (2013-02-15)</h3>
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
The recent <a href="http://spikedmath.com/">Spiked Math</a> comic strip proposes some cheats to build an <a href="http://en.wikipedia.org/wiki/Eulerian_path">Eulerian path</a> over the famous <a href="http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg">bridges of Königsberg</a>, which was originally impossible. Here is the strip:</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://spikedmath.com/541.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="http://img.spikedmath.com/comics/541-solutions-to-the-seven-bridges-of-konigsberg.png" width="482" /></a></div>
<br />
<div style="text-align: justify;">
It turns out that in practice yet another solution worked (in fact, a mix of the engineer’s and the mythbuster’s solutions): ruin the city during a world war, then hand it over to the Russians, who build new bridges instead. I visited Kaliningrad (the city has been renamed, too) two years ago, but did not try to solve Euler’s problem in the field. Let’s find out whether it is possible now. Look at the modern map:</div>
<div class="separator" style="clear: both; text-align: center;">
<iframe frameborder="0" height="350" marginheight="0" marginwidth="0" scrolling="no" src="https://maps.google.com/maps/ms?msa=0&msid=205148145045254464247.0004d5c49f1ee9c7ac540&ie=UTF8&t=h&ll=54.704441,20.506582&spn=0.017357,0.036478&z=14&output=embed" width="425"></iframe><br /></div>
<div style="text-align: center;">
<small>View <a href="https://maps.google.com/maps/ms?msa=0&msid=205148145045254464247.0004d5c49f1ee9c7ac540&ie=UTF8&t=h&ll=54.704441,20.506582&spn=0.017357,0.036478&z=14&source=embed" style="color: blue; text-align: left;">Königsberg bridges</a> in a larger map</small></div>
<br />
<div style="text-align: justify;">
So, the modified Spiked Math scheme looks as follows (the new bridges are shown in light blue):</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0Zirbdxshyphenhyphendbet-pVKiihlNUgnqWORV9-h5kuRO-CUdd-XTmscKQArynAWP3j5VlsjCYpDGL__aNZcbFouzEYwS3kYDouo4TxHLLgjuQO643hxC1rE0JM4fPWoFO1623hTkPiMlSIrbn9/s1600/sm-koenig-new.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="146" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0Zirbdxshyphenhyphendbet-pVKiihlNUgnqWORV9-h5kuRO-CUdd-XTmscKQArynAWP3j5VlsjCYpDGL__aNZcbFouzEYwS3kYDouo4TxHLLgjuQO643hxC1rE0JM4fPWoFO1623hTkPiMlSIrbn9/s320/sm-koenig-new.png" width="320" /></a></div>
<br />
<div style="text-align: justify;">
Is there an Eulerian path nowadays, i.e. can you walk through the city and cross each bridge once and only once? Recall the theorem:</div>
<blockquote class="tr_bq" style="text-align: justify;">
<i>A connected graph allows an Eulerian path if and only if it has no more than two vertices of an odd degree.</i></blockquote>
<div style="text-align: justify;">
In the picture above, two pieces of land have an even number of incident bridges, while the Kneiphof island in the center has degree 3, and the left bank (at the bottom) has degree 5. This means that Eulerian paths now exist. Any such path must begin on the left bank and end on the island, or vice versa. It may be a good idea to finish your journey near the grave of Immanuel Kant. :) However, there is no Eulerian <i>circuit</i>, i.e. a closed Eulerian path.</div>
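The parity argument is easy to check mechanically. In the sketch below, the two odd degrees come from the analysis above, while the even degrees of the two remaining land masses are illustrative placeholders (the theorem only cares about parity):

```python
# Degrees of the four land masses in the modern bridge graph. The odd
# degrees (Kneiphof: 3, left bank: 5) come from the text above; the even
# values for the other two pieces are placeholders.
degree = {"Kneiphof": 3, "left bank": 5, "right bank": 4, "Lomse": 4}

odd = sorted(v for v, d in degree.items() if d % 2 == 1)

# Eulerian path: exactly 0 or 2 vertices of odd degree (the path must
# start and end at the odd ones). Eulerian circuit: no odd vertices.
has_path = len(odd) in (0, 2)
has_circuit = len(odd) == 0
```

With two odd vertices the path exists but the circuit does not, matching the conclusion above.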
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: x-small;">PS. 1. If you zoom out the map above, you’ll see one more piece of land and two more bridges to the east. However, it only adds a vertex of degree 2 and swaps the parity of the two banks, so the properties of the new graph are similar; you just need to start/finish your journey on the right bank instead of the left one (and walk 10 km more).</span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;">2. When I was preparing this article, I found </span><a href="http://people.engr.ncsu.edu/mfms/SevenBridges/" style="font-size: small;">a similar analysis</a><span style="font-size: x-small;">. It used an old map, so the Jubilee bridge and the 2nd Estacade bridge were not marked (they were built in 2005 and 2012, respectively), and it also ignored the Reichsbahnbrücke (a railroad bridge, but it seems pedestrian-accessible).</span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;">3. It would be more interesting to analyze the bridge graphs of other cities with more islands, like St. Petersburg (several hundred bridges) or Amsterdam (numerous bridges/canals, but with a regular structure).</span><br />
<span style="font-size: x-small;">4. Here is </span><a href="http://www.math.dartmouth.edu/~euler/docs/originals/E053.pdf" style="font-size: small;">Euler’s original paper</a><span style="font-size: x-small;"> (in Latin) with his drawings of the above schemes.</span></div>
<br /></div>
Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com34tag:blogger.com,1999:blog-2293611840857369720.post-85209948400965607472012-12-04T16:16:00.000+04:002012-12-04T16:16:01.066+04:00Machine learning in facebook<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
Last week <a href="http://www.facebook.com/max.gubin">Max Gubin</a> gave a talk on how Facebook exploits machine learning. The talk was more technical than one might have expected, so I will share some interesting facts in this post. I hope it doesn’t disclose any secret information. :)</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Max started with a screenshot of the Facebook interface, where each element that benefits from machine learning was highlighted. Just to name some, they use learning to predict the order of stories in the newsfeed, the groups/ads/chat contacts to show you, and even occasions when your account is supposedly hacked, so that you need to verify your identity. For most of the tasks the learning is probably trivial, but at least two of them involve complicated algorithms (see below).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Facebook engineers face a number of difficulties. A user expects a page to load almost instantly, and the network infrastructure already imposes some lag. To avoid adding further delay, prediction should be done in tens of microseconds. Moreover, half a billion daily active users send a lot of queries, so massively-parallel implementations would be too expensive. They have no choice other than sticking to linear models. For example, they train a linear fitness function to rank the stories for the newsfeed (using e.g. hinge loss or logistic loss). It should be trained to satisfy multiple, often contradictory criteria. For example, maximizing personal user experience (showing the most interesting stories) might hurt the experience of other users (if one has few friends, they are the only users who can read his/her posts) or degrade the system as a whole (certain types of news might not be really interesting to anyone, while still necessary to improve connectivity of the social network). Those criteria should be balanced in the learning objective, and the coefficients change over time. Even the personal user experience cannot be measured easily. The obvious thing to try is to ask users to label interesting stories (or use their <i>Likes</i>). However, such tests are always biased: Facebook tried to use this subjective labelling three times, and all three attempts were unsuccessful. Users just don’t tell you what they <i>really</i> like.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Another challenge is the quickly-changing environment. For example, interest in specific ads may be seasonal. In advertising, one of the strategies is to maximize the click-through rate (CTR). The model for personalized ads should be able to learn online to adapt to changes efficiently. They use probit regression, where online updates can be written in closed form, unlike logistic regression (note the linear model again!). It is based on Microsoft’s <a href="http://research.microsoft.com/apps/pubs/default.aspx?id=74419">TrueSkill™</a> method for learning the ranks of players to find good matches, and seems similar to what Bing uses for CTR maximization [<a href="http://www.icml2010.org/papers/901.pdf">Graepel et al., 2010</a>].</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Finally, Max mentioned the problem of evaluating new features. The common practice in the industry is <i><a href="http://en.wikipedia.org/wiki/Ab_testing">A/B testing</a></i>, where a group of users is randomly selected to test some feature, and the rest of the users are treated as the control group. Then they compare the indicators for the two groups (e.g. average time spent on the website, or clicks made on newsfeed stories) and apply statistical tests. As usual, the relevant samples are typically small. For example, if they want to test a feature for search in Chinese, they take <i>a small group</i> of 10 million users and hope that some of them will query in Chinese (recall that Facebook is unavailable in China). It is typically hard to prove a statistically significant improvement.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
It was partially a hiring event. If you are looking for an internship or a full-time job, you may contact their HR specialist for Eastern Europe, <a href="http://www.facebook.com/moreno.marina">Marina</a>. Facebook also keeps in touch with <a href="http://www.facebook.com/careers/university">universities</a>, e.g. by inviting professors to give talks in their offices or to develop joint courses. Professors may apply, but I don’t know a contact for that.</div>
<div style="text-align: justify;">
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com18tag:blogger.com,1999:blog-2293611840857369720.post-37415247892193820992012-04-16T02:42:00.002+04:002012-04-16T02:42:20.888+04:00PGM-class and MRF parameter learning<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
I’m taking Stanford CS 228 (a.k.a. <a href="https://class.coursera.org/pgm/class/index">pgm-class</a>) on Coursera. The class is great; I guess it provides close to the maximum one can achieve under the constraints of remoteness and scale. The thing I miss is theoretical problems, which were left out of the online version because they could not be graded automatically.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
There is an important thing about graphical models that I fully realized only recently (partly due to the class). This thing should be articulated clearly in every introductory course, but is often just mentioned, probably because lecturers consider it obvious. The thing is that <b>there is no probabilistic meaning of MRF potentials whatsoever</b>. The partition function is there not merely for convenience: in contrast to Bayesian networks, there is no general way to assign the potentials of an <b>undirected</b> graphical model so as to avoid normalization. The loops make it impossible. The implication is that one should not assign potentials by estimating frequencies of assignments to factors (possibly conditioned on features) like <a href="http://shapovalov.ro/papers/Shapovalov-et-al-PCV2010.pdf">I did earlier</a>. This is quite a bad heuristic because it is susceptible to overcounting. Let me give an example.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
For the third week's programming assignment we needed to implement a Markov network for handwriting OCR. The unary and pairwise potentials are somewhat obvious, but there was also a task to add ternary factors. The accuracy of the pairwise model was 26%. <a href="http://www.cs.elte.hu/~klao/">Mihaly Barasz</a> tried to add ternary factors with values proportional to <a href="http://en.wikipedia.org/wiki/Trigram">trigram</a> frequencies in English, which decreased performance to 21% (<a href="https://class.coursera.org/pgm/forum/thread?thread_id=876">link for those who have access</a>). After removing the pairwise factors, the performance rose to 38%. Why did the joint model fail? The reason is overcounting of evidence: different factor types enforce the same co-occurrences, thus creating a bias towards more frequent assignments, and this example shows the bias can be significant. Therefore, we should train models with cycles discriminatively.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
One more thought I’d like to share: graphical model design is similar to software engineering in that the crucial thing for both is eliminating insignificant dependencies at the architecture design stage.</div>
<div style="text-align: justify;">
<br /></div>
</div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com8tag:blogger.com,1999:blog-2293611840857369720.post-81028043002264527922012-01-05T20:05:00.000+04:002012-01-05T20:05:11.239+04:00When and why it is safe to cast floor/ceil result to integer<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
Happy new year, ladies and gents! I am back, and the following couple of posts are going to be programming-related.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The C/C++ standard library declares the <a href="http://cplusplus.com/reference/clibrary/cmath/floor/">floor function</a> as returning a floating-point number:</div>
<pre style="color: green; font-size: 12px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: -webkit-auto;"> double floor ( double x );
float floor ( float x ); // C++ only</pre>
<div style="text-align: justify;">
I used to be tortured by two questions:</div>
<div style="text-align: justify;">
<br /></div>
<ol style="text-align: left;">
<li style="text-align: justify;">Why would the floor function return anything other than integer?</li>
<li style="text-align: justify;">What is the most correct way to cast the output to integer?</li>
</ol>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
At last I figured out the answer to both questions. Note that the rest of the post applies to the <a href="http://cplusplus.com/reference/clibrary/cmath/ceil/">ceil</a> function as well.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
For the second point, I knew the common idiom was just type-casting the output:</div>
<pre style="color: green; font-size: 12px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: -webkit-auto;">double a;</pre>
<pre style="color: green; font-size: 12px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: -webkit-auto;">int intRes = int(floor(a)); // use static_cast if you don't like brevity</pre>
<div style="text-align: justify;">
If you’ve heard <a href="http://www.exploringbinary.com/the-four-stages-of-floating-point-competence/">anything</a> about the peculiarities of floating-point arithmetic, you might start worrying that floor could return not the exact integer value but $\lfloor a \rfloor - \epsilon$, so that type-casting would be incorrect (assume $a$ is positive). However, this is not the case. Consider the following statement.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
If $a$ is not an integer and is representable as an <a href="http://en.wikipedia.org/wiki/IEEE_754-2008">IEEE-754</a> floating-point number, then both $\lfloor a \rfloor$ and $\lceil a \rceil$ are representable within that domain. This means that for any float/double value, floor can return a number that is an integer and fits in the float/double format.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: x-small;">The proof is easy but requires understanding of the <a href="http://en.wikipedia.org/wiki/Floating_point">representation of floating-point numbers</a>. Suppose $a = c \times b^q, c \in [1/b, 1)$ has $k$ significant digits, i.e. the $k$-th digit after the point in $c$ is not zero, but all the further ones are. Since $a$ is representable, $k$ is less than the maximum width of the significand. Since $a$ is not an integer, both $\lfloor a \rfloor$ and $\lceil a \rceil$ have significands with fewer than $k$ significant digits. Rounding, however, might increase the order of magnitude, but only by one. So the rounded number’s significand fits in $k$ digits, Q.E.D.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
However, this does not mean one can always type-cast the output safely since, for example, not every integer stored in a double can be represented as a 4-byte int (and this is the answer to question #1). Double-precision numbers have 53-bit significands (52 explicitly stored bits plus an implicit leading one), so any integer up to $2^{53}$ can be stored exactly. For the int type, the limit is $2^{31}-1$. So if there is a possibility of overflow, check before the cast or use the int64 type.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
On the other hand, a lot of ints cannot be represented as floats (the same holds for int64 and double). Consider the following code:</div>
<pre style="color: green; font-size: 12px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: -webkit-auto;">std::cout << std::showbase << std::hex </pre>
<pre style="color: green; font-size: 12px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: -webkit-auto;"> << int(float(1 << 23)) << " " << int(float(1 << 24)) << std::endl</pre>
<pre style="color: green; font-size: 12px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: -webkit-auto;"> << int(float(1 << 23) + 1) << " " << int(float(1 << 24) + 1) << std::endl;
</pre>
The output is:<br />
<pre style="color: green; font-size: 12px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-align: -webkit-auto;">0x800000 0x1000000
0x800001 0x1000000
</pre>
<div style="text-align: justify;">
$2^{24}+1$ cannot be represented as a float exactly (it is represented as $2^{24}$), so one should not use the float version of floor for numbers greater than a few million.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
There are <a href="http://www.exploringbinary.com/why-powers-of-ten-up-to-10-to-the-22-are-exact-as-doubles/">classes of numbers</a> that are guaranteed to be represented exactly in floating point format. See also <a href="http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html">the comprehensive tutorial</a> on floating-point arithmetic by David Goldberg.</div>
</div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com3tag:blogger.com,1999:blog-2293611840857369720.post-71588740292627214352011-11-25T20:54:00.001+04:002011-11-25T21:51:32.699+04:00Kinect Apps Challenge<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<a href="http://graphics.cs.msu.ru/en">Graphics & Media Lab</a> and <a href="http://research.microsoft.com/en-us/labs/cambridge/default.aspx">Microsoft Research Cambridge</a> have announced a contest for applications that use the Kinect sensor. The authors of the five brightest apps will be funded to attend the 2012 Microsoft Research PhD summer school. I blogged about last year's event in <a href="http://computerblindness.blogspot.com/2011/08/all-summer-schools.html">my previous post</a>. Details of the contest are <a href="http://summerschool2011.graphicon.ru/en/contest">here</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I guess there's no need to explain what <a href="http://en.wikipedia.org/wiki/Kinect">Kinect</a> is. It is an extremely successful combined colour and depth sensor by Microsoft. It is being distributed for a killer price (≈$150) as an add-on for the Xbox, although it is hard to even imagine the full range of its possible applications. Controlling a computer by gestures alone is considered a prime example of a <a href="http://en.wikipedia.org/wiki/Natural_user_interface">natural user interface</a>. To name some applications beyond gaming, this kind of NUI helps surgeons keep their hands clean:</div>
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/f5Ep3oqicVU?feature=player_embedded' frameborder='0'></iframe></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Kinect helps blind people to navigate through buildings:</div>
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/l6QY-eb6NoQ?feature=player_embedded' frameborder='0'></iframe></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
For more ideas look at the winners of <a href="http://kinecthacks.net/winners-of-openni-2011-developer-challenge/">OpenNI challenge</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
OpenNI is an open-source alternative to the Kinect SDK. Its strong point is that it is integrated with <a href="http://pointclouds.org/">PCL</a>, but beware that you cannot use it for the contest. Only the <a href="http://kinectforwindows.org/">Kinect for Windows</a> SDK is allowed.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Kinect is probably the most successful <i>commercial</i> outcome from a Microsoft Research lab. MSR is unique because they do a lot of theoretical research there, and it is unclear whether it is profitable for Microsoft to fund it. But projects like Kinect dispel those doubts. I will write about how MSR is organized in comparison to other industrial labs in a later post.</div>
</div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com2tag:blogger.com,1999:blog-2293611840857369720.post-36301968909012898432011-08-16T03:35:00.005+04:002011-08-16T03:40:04.876+04:00All the summer schools<div style="text-align: justify;">This summer I attended two conferences (<a href="http://www.3dimpvt.org/">3DIMPVT</a>, <a href="http://emmcvpr11.csd.uwo.ca/">EMMCVPR</a>) and two summer schools. I know my latency is somewhat annoying, but it's better to review them now than never. :) This post is about the summer schools, and the following one is going to be about the conferences.</div><div style="text-align: justify;">
<br /></div><div style="text-align: justify;">
<br /></div><div style="text-align: justify;"><b><span class="Apple-style-span">PhD Summer School in Cambridge</span></b></div><div style="text-align: justify;">
<br /></div><div style="text-align: justify;">Both schools were organized by <a href="http://research.microsoft.com/en-us/">Microsoft Research</a>. The first one, the <a href="http://research.microsoft.com/en-us/events/2011summerschool/">PhD Summer School</a>, was in Cambridge, UK. The lectures covered some general issues for computer science PhD students (like using cloud computing for research, and career perspectives) as well as some recent technical results by Microsoft Research. From the computer vision side, there were several talks:</div><div style="text-align: justify;"><ul><li><a href="http://research.microsoft.com/en-us/people/antcrim/">Antonio Criminisi</a> described their <a href="http://research.microsoft.com/en-us/projects/MedicalImageAnalysis/">InnerEye</a> system for retrieval of similar body-part scans, which is useful for diagnosis based on the medical history of similar cases. He also featured the basics of Random Forests as an advertisement for his <a href="http://research.microsoft.com/en-us/people/antcrim/decisionforeststutorial.aspx">ICCV 2011 tutorial</a>. The new thing was using peculiar weak classifiers (like 2nd-order separation surfaces). 
Antonio argued they perform much better than trees in some cases.</li><li><a href="http://research.microsoft.com/en-us/um/people/awf/">Andrew Fitzgibbon</a> gave a brilliant lecture about pose estimation for <a href="http://en.wikipedia.org/wiki/Kinect">Kinect</a> (MSR Cambridge is really proud of that algorithm [<a href="http://research.microsoft.com/pubs/145347/BodyPartRecognition.pdf">Shotton, 2011</a>], this is the topic for another post).</li><li><a href="http://graphics.cs.msu.ru/people/staff/obarinova">Olga Barinova</a> talked about the modern methods of image analysis and her work for the past 2 years (graphical models for <a href="http://graphics.cs.msu.ru/en/science/research/machinelearning/hough">non-maxima suppression for object detection</a> and <a href="http://graphics.cs.msu.ru/en/science/research/3dreconstruction/geometricparsing">urban scene parsing</a>). </li></ul>The other great talks were about <a href="http://www.netmf.com/gadgeteer/">.NET Gadgeteer</a>, a system for modelling and even deployment of electronic gadgets (yes, hardware!), and <a href="http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/">F#</a>, Microsoft's alternative to <a href="http://en.wikipedia.org/wiki/Scala_(programming_language)">Scala</a>, a language that combines the object-oriented paradigm with the functional one. <a href="http://en.wikipedia.org/wiki/Tony_Hoare">Sir Tony Hoare</a> also gave a lecture, so I had a chance to ask him how he ended up at Moscow State University in the 60s. It turns out he studied statistics, and <a href="http://en.wikipedia.org/wiki/Andrey_Nikolayevich_Kolmogorov">Andrey Kolmogorov</a> was one of the leaders of the field at that time, so the internship was a great opportunity for him. He said he had liked his time in Moscow. :) There were also magnificent lectures by <a href="http://research.microsoft.com/en-us/people/simonpj/">Simon Peyton-Jones</a> about giving talks and writing papers. 
That advice is <i>a must</i> for everyone who does research; you can find the slides <a href="http://research.microsoft.com/en-us/um/people/simonpj/papers/giving-a-talk/giving-a-talk.htm">here</a>. Slides for some of the lectures are available from <a href="http://research.microsoft.com/en-us/events/2011summerschool/#Abstracts">the school page</a>.</div><div style="text-align: justify;">
<br /></div><div style="text-align: justify;">The school talks did not take all the time. Every night was occupied by some social event (go-karting, <a href="http://en.wikipedia.org/wiki/Punt_(boat)">punting</a> etc.) as well as unofficial after-parties in Cambridge pubs. It is definitely the most fun school/conference I've attended so far. The karting was especially great, with a quality track, pit stops, stats and prizes, so special thanks to Microsoft for including it in the program! </div><div style="text-align: justify;">
<br /></div><div style="text-align: justify;">
<br /></div><div style="text-align: justify;"><b><span class="Apple-style-span">Microsoft Computer Vision School in Moscow</span></b></div><div style="text-align: justify;"><b><span class="Apple-style-span">
<br /></span></b></div><div style="text-align: justify;"><span class="Apple-style-span">This year, the <a href="http://summerschool2011.graphicon.ru/en">Microsoft Research summer school</a> in Russia was devoted to computer vision and organized in cooperation with our lab. The school started before its official opening with a <a href="http://courses.graphicon.ru/school/mscvs2011/assign"><b>homework assignment</b></a> we authored (I was one of four student volunteers). The task was to develop an image classification method capable of distinguishing two indoor and two outdoor classes. The results were rated according to the performance on a hidden test set. Artem Konev won the challenge with 95.5% accuracy and was awarded a prize consisting of an Xbox and a Kinect. Two years ago we used those data for the projects on the <i>Introduction to Computer Vision</i> course, where nobody reached even 90%. It reflects not just the skill of the participants, but also the progress of computer vision: all the top methods used PHOW descriptors and a linear SVM with an approximate decomposed <i>χ<sup>2</sup></i> kernel [<a href="http://www.robots.ox.ac.uk/~vgg/publications/papers/vedaldi10.pdf">Vedaldi and Zisserman, 2010</a>], which were unavailable at that time!</span></div><div style="text-align: justify;"><span class="Apple-style-span">
<br /></span></div><div style="text-align: justify;">In fact, <b><a href="http://www.robots.ox.ac.uk/~az/">Andrew Zisserman</a> </b>was one of the speakers. Andrew is the most cited computer vision researcher and the only person whose <a href="http://computerblindness.blogspot.com/2010/10/papers-citations-co-authorship-and.html">Zisserman number</a> is zero. :) His course was on <a href="http://summerschool2011.graphicon.ru/en/courses/az">Visual Search and Recognition</a>, including instance-level and category-level recognition. The ideas that were relatively new:</div><div style="text-align: justify;"><div><ul><li>when computing visual words, sometimes it is fruitful to use soft assignments to clusters, or more advanced methods like <i>Locality-constrained linear coding</i> [<a href="http://www.robots.ox.ac.uk/~vgg/rg/papers/wang_etal_CVPR10.pdf">Wang et al., 2010</a>];</li><li>for instance-level recognition it is possible to use <i>query expansion</i> to overcome occlusions [<a href="http://marcade.robots.ox.ac.uk:8080/~vgg/publications/2007/Chum07b/chum07b.pdf">Chum et al., 2007</a>]: the idea is to use the best matched images from the base as new queries;</li><li>object detection is traditionally done with sliding window, the problems here are: various aspect ratio, partial occlusions, multiple responses and background clutter for substantially non-convex objects;</li><li>for object detection use bootstrapped sequential classification: on the next stage take the false negative detections from the previous stage as negative examples and retrain the classifier;</li><li><i>multiple kernel learning</i> [<a href="http://www.vision.ee.ethz.ch/publications/papers/proceedings/eth_biwi_00649.pdf">Gehler and Nowozin, 2009</a>] is a hot tool that is used to find the ideal linear combination of SVM kernels: combining different features is fruitful, but learning the combination is not much better than just averaging (Lampert: “Never use MKL without comparison to simple 
baselines!”); </li><li>movies are common datasets, since there are a lot of repeated objects/people/environments, and the privacy issues are easy to overcome. Movies like <i><a href="http://www.imdb.com/title/tt0107048/">Groundhog Day</a></i> and <i><a href="http://www.imdb.com/title/tt0130827/">Run Lola Run</a></i> are especially good since they contain repeated episodes. You can try to find <a href="http://arthur.robots.ox.ac.uk:8081/cgi-bin/shot_viewer/vg_single_image.py?movie+name=lola&frame+nb=25901">the clocks on the Video Google Demo</a>.</li></ul><div>Zisserman talked about the <a href="http://pascallin.ecs.soton.ac.uk/challenges/VOC/">PASCAL challenge</a> a lot. During a break he mentioned that he annotated some images himself since “it is fun”. One problem with the challenge is that we don't know whether the progress over the years really reflects the increased quality of methods, or is just due to the growth of the training set (though, it is easy to check).</div></div><div>
<br /></div><div><b><a href="http://research.microsoft.com/en-us/um/people/awf/">Andrew Fitzgibbon</a></b> gave two more great lectures, one about Kinect (with slightly different motivation than in Cambridge) and another about continuous optimization. He talked a lot about reconciling theory and practice:</div><div><ul><li>the life-cycle of a research project is: 1) chase the high-hanging fruit (theoretically-sound model), 2) try to make stuff really work, 3) look for the things that confuse/annoy you and fix them;</li><li>for Kinect pose estimation, the <i>good</i> top-down method based on tracking did not work, so they ended up classifying body parts discriminatively, temporal smoothing is used on the late stage;</li><li>“don't be obsessed with theoretical guarantees: they are either weak or trivial”;</li><li>on the simplest optimization method: “How many people have invented [coordinate] alternation at some point of their life?”. Indeed, the method is guaranteed to converge, but the problems arise when the valleys are not axis-aligned;</li><li>gradient descent is not a panacea: in some cases it does small steps too, <span class="Apple-style-span" style="line-height: 19px; "><span class="Apple-style-span"><a href="http://en.wikipedia.org/wiki/Conjugate_gradient_method">conjugate gradient method</a> is better (it uses 1st order derivatives only);</span></span></li><li><span class="Apple-style-span" style="line-height: 19px; "><span class="Apple-style-span">when possible, use second derivatives to determine step size, but estimating them is hard in general;</span></span></li><li><span class="Apple-style-span"><span class="Apple-style-span" style="line-height: 19px;">one almost never needs to take the matrix inverse; in MATLAB, to solve the system <i>Hd = −g,</i> use backslash: <i>d = −H\g</i>;</span></span></li><li><span class="Apple-style-span"><span class="Apple-style-span" style="line-height: 19px; "><span class="Apple-style-span">the Friday evening method is to 
try MATLAB </span><span class="Apple-style-span"><a href="http://www.mathworks.com/help/techdoc/ref/fminsearch.html"><span class="Apple-style-span">fminsearch</span></a> </span><span class="Apple-style-span">(implementing the derivative-free </span><span class="Apple-style-span" style="line-height: normal; "><span class="Apple-style-span"><a href="http://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method">Nelder-Mead method</a></span></span><span class="Apple-style-span">).</span></span></span></li></ul><div><span class="Apple-style-span"><span class="Apple-style-span" style="line-height: 19px;">Dr. Fitzgibbon asked the audience what the first rule of machine learning is. I could hardly help replying “Never talk about machine learning”, but he expected a different answer: “Always try the nearest neighbour first!” </span></span></div></div>
<br /></span></span></div><div><span class="Apple-style-span"><span class="Apple-style-span" style="line-height: 19px;"><b><a href="http://pub.ist.ac.at/~chl/">Christoph Lampert</a></b> gave lectures on kernel methods, and structured learning, and kernel methods for structured learning. Some notes on the kernel methods talk:</span></span></div><div><ul><li><span class="Apple-style-span"><span class="Apple-style-span" style="line-height: 19px;">(obvious) don't rely on the error on a train set, and (less obvious) don't even report about it in your papers;</span></span></li><li><span class="Apple-style-span"><span class="Apple-style-span" style="line-height: 19px;">for SVM kernels, in order to be legitimate, a kernel should be an inner product; it is often hard to prove it directly, but there are workarounds: a kernel can be drawn from a conditionally positive-definite matrix; sum, product and exponent of a kernel(s) is a kernel too etc. (thus, important for multiple-kernel learning, linear combination of kernels is a kernel);</span></span></li><li><span class="Apple-style-span"><span class="Apple-style-span" style="line-height: 19px;">since training (and running) non-linear SVMs is computationally hard, explicit feature maps are popular now: try to decompose the kernel back to conventional dot product of modified features; typically the features should be transformed to infinite sums, so take first few terms <span class="Apple-style-span" style="font-family: Georgia, serif; font-size: 16px; line-height: normal; ">[<a href="http://www.robots.ox.ac.uk/~vgg/publications/papers/vedaldi10.pdf">Vedaldi and Zisserman, 2010</a>]</span>;</span></span></li><li><span class="Apple-style-span"><span class="Apple-style-span" style="line-height: 19px;">if the kernel can be expressed as a sum over vector components (e.g. 
<span class="Apple-style-span" style="font-family: Georgia, serif; font-size: 16px; line-height: normal; "><i>χ<sup>2</sup></i></span> kernel $\sum_d x_d x'_d / (x_d + x'_d)$), it is easy to decompose; the <i>radial basis function </i>(RBF) kernel ($\exp (-\|x-x'\|^2 / 2\sigma^2)$) is the exponential of a sum, so it is hardly decomposable (stricter conditions are given in the paper);</span></span></li><li><span class="Apple-style-span"><span class="Apple-style-span" style="line-height: 19px;">when using the RBF kernel, you have another parameter <i>σ</i> to tune; the rule of thumb is to take <i>σ</i>² equal to the median distance between training vectors (thus, cross-validation becomes one-dimensional).</span></span></li></ul><div><span class="Apple-style-span" style="line-height: 19px;">Christoph also told a motivating story about why one should always use cross-validation (so just forget the previous point :). Sebastian Nowozin was working on his [<a href="http://www.nowozin.net/sebastian/papers/nowozin2007actionclassification.pdf">ICCV 2007</a>] paper on action classification. He used the method by <a href="http://zimmer.csufresno.edu/~yucao/csci226/Projects/Project_6/Behavior%20Recognition%20via%20Sparse%20Spatio-Temporal%20Features.pdf">Dollár et al. [2005]</a> as a baseline. The paper reported 80.6% accuracy on the KTH dataset. He outperformed the method by a couple of per cent and then decided to reproduce Dollár's results. Imagine his surprise when simple cross-validation (with the same features and kernels) yielded 85.2%! So, Sebastian had to improve his method to beat the baseline. </span></div><div><span class="Apple-style-span" style="line-height: 19px;">
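The σ rule of thumb can be sketched in a few lines of Python/NumPy (my own illustration of the heuristic exactly as stated above; note that some authors set σ² to the median <i>squared</i> distance instead):

```python
import numpy as np

def rbf_kernel_median(X):
    """Gaussian/RBF kernel matrix exp(-||x - x'||^2 / (2 sigma^2)),
    with sigma^2 set to the median pairwise distance between the
    training vectors, following the rule of thumb above."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # median over distinct pairs (upper triangle, diagonal excluded)
    sigma2 = np.median(np.sqrt(d2[np.triu_indices_from(d2, k=1)]))
    return np.exp(-d2 / (2.0 * sigma2))
```

With σ fixed this way, only the SVM regularization constant is left for cross-validation.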
<br /></span></div><div><span class="Apple-style-span" style="line-height: 19px;">I feel I should stop writing about the talks now since the post is growing enormously long. Another of Lampert's lectures and <a href="http://research.microsoft.com/en-us/um/people/carrot/">Carsten Rother</a>'s course on CRFs were close to my topic, so they deserve separate posts (I already reviewed the basics of <a href="http://computerblindness.blogspot.com/2011/04/on-structured-learning.html">structured learning</a> and <a href="http://computerblindness.blogspot.com/2010/05/max-product-optimization-methods-and.html">max-product optimization</a> in this blog). <a href="http://www.ais.uni-bonn.de/~amueller/">Andreas Müller</a> <a href="http://peekaboo-vision.blogspot.com/2011/08/cvml-ivan-laptev.html">blogged about</a> <a href="http://www.irisa.fr/vista/Equipe/People/Ivan.Laptev.html">Ivan Laptev</a>'s recent action recognition talk at CVML, which was pretty similar to ours. <a href="http://summerschool2011.graphicon.ru/en/schedule">The slides are available for all MSCVS talks</a>, and <b>videos will be shared in September</b>.</span></div></div><div><span class="Apple-style-span" style="line-height: 19px;">
<br /></span></div><div><span class="Apple-style-span" style="line-height: 19px;">There were also several <b>practical sessions</b>, but I personally consider them not that useful, because one hardly ever can <i>feel</i> the essence of a method in 1.5 hours changing the code according to some verbose instruction. It is more of an art to design such tutorials, and no one can really master it. :) Even if the task is well-designed, one may not succeed performing it due to technical reasons: during Carsten Rother's tutorial, <a href="https://plus.google.com/115888484810854823852/about?hl=en">Tanya</a> and me spent half an hour to spot the bug caused by confusing <span class="Apple-style-span" >input</span> and <span class="Apple-style-span">index</span> variable names (MATLAB is still dynamically typed). <a href="http://cmp.felk.cvut.cz/~chum/">Ondrej Chum</a> once mentioned how his tutorial was doomed since half of the students did not know how to work with sparse matrices.<span class="Apple-style-span"> So, practical sessions are hard.</span></span></div><div><span class="Apple-style-span" style="line-height: 19px;"><span class="Apple-style-span">
<br /></span></span></div><div><span class="Apple-style-span" style="line-height: 19px;"><span class="Apple-style-span">There was also a <b>poster session</b>, but I cannot recall many bright works, unfortunately. <a href="http://www.cvc.uab.es/personal2.asp?idioma=en&id=720&shapovalovanataliya">Nataliya Shapovalova</a>, who won the best poster award, presented quite an interesting work on action recognition, which I liked as well (and it is not just last-name bias! :) My congratulations to Natasha!</span></span></div><div><span class="Apple-style-span" style="line-height: 19px;"><span class="Apple-style-span">
<br /></span></span></div><div><span class="Apple-style-span" style="line-height: 19px;"><span class="Apple-style-span">The planned <b>social events</b> were not as extensive as in Cambridge, but self-organization worked out. The most prominent example was our overnight walk around Moscow, in which a substantial part of the school participants took part. It included catching the last subway train, drinking whiskey and gin, a game of <s>guessing</s> hallucinating each other's names, and moving a car off the tram rails to let a tram pass in the morning. :) I also met some of the <a href="http://opencv.willowgarage.com/">OpenCV</a> developers from Nizhny Novgorod there.</span></span></div><div>
<br /></div></div><div style="text-align: justify;">
<br /></div><div style="text-align: justify;">MSCVS is a one-time event, unfortunately. There are at least three <b>annual computer vision summer schools</b> in Europe: <a href="http://www.dmi.unict.it/icvss/">ICVSS</a> (the most mature one, I attended it <a href="http://computerblindness.blogspot.com/search/label/ICVSS">last year</a>), <a href="http://www.di.ens.fr/willow/events/cvml2011/">CVML</a> (held in France by INRIA) and <a href="http://www.vision.ee.ethz.ch/summerschool2011/">VSSS</a> (includes sport sessions besides the lectures, held in Zürich). If you are a PhD student in vision (especially at the beginning of your program), it is worth attending one of them each year to keep up with current trends in the vision community, especially if you don't go to the major conferences. The sets of topics (and even speakers!) usually have a large intersection, so pick just one of them. ICVSS has arguably the most competitive participant selection, but the application deadline and acceptance notification are in March, so one can apply to the other schools if rejected.</div>
It found <a href="http://images.google.com/search?tbs=sbi:AMhZZisjgaicgygO1CzML9zQMv0h6l5RMPu8mv-Ko-pBo67FdV1HyUQGg-Csuw_1oVKZrjvHMsVU3ZEncoye9kXrjIHC9-SQGmeBnrREih40b4MTShpwIjWFz6mduatSA2MFBnBdP7j8gkfC8foLcHl8v41s3STsCywI3rQJ9ZXlV_1uxr6H1j6Cnf3N-jHsIuvb6EyZgCAZpTAGvuRAYdDwe9noW4OLZpiTSaXoOeY-ikVaTVY9XV2LBYdn8karWEm2zhNcY5axPvdBmYoujTf6o0VugqkC6A3kG-BxJ3MEuVhL_1-nIzc66_1u3RLQlIYMLWeE06jtY8mlO4g03FXYhqQAFbvP0jv7BhGfsY-nSqNsFP9kYlC4Ea0iEgGOeOID5ofUtCy8TTu7SRkxOdY8KIjV4ZQuCSb799kNrwIiuhatpdT0j11FlvYZd0dS076OoMHvd3w2fAguLk3JvTCYfJvKOulwPso2t52mrqNepTOSJdVKF2jlpkevwPq0RSK4g29BOxfJSbvB5RFmWNVBgRX1ulRiuqepseIU-i6EXCbF3E4H-zdRSh8bGnjiaIy2KQQYuoNofz16L4sFqAlRjCaaf17lSMBngxR-nuZY4Bp9NKcS5jL3ORO00Y9YG8gbxfa9QKtRcTUOJY4IDuAuJ7qOZu8VpDOkLw7Bd2te1DNMpO_1HKJ3gkbmS1oA0AugmgRYOOAOBJ7jTrC1GEiOu0D7WpoX4KcpVAeRF-XLDklY0RTPRJoI-Zi6Hzp_1zvq_168ZLo_1wWBQ1otN3mdsxLOam8N5_1rZurENOia6ckito5rETreWGrpjjdPG-Z3orRbaYIUCYY5TO7BeJpoQ1PtCnY698uqrMkWrQfCCB2QCQB4u8-5IGn9QOpHjJ_1-Vn88ER1Mzt7Sm_1pM0kvsquCrSJPvY3gsGZSyz5dxdhvUG1XYiV3Tgz0AwcJ3MNs71DC0IVB4_1H0UFhK4KDqT77LONhL_1E0ItjZP13a7DsnSbDLpyD4wHC1NTi8Lg7aza728LpKxtB_1IKGcV3deKKWot1PgJvCjbGJ6JK0x8Oc6A11XSnB_1znUnYZtGKzH2YsdGLx9wINFVcbxrx7JMO3MkYTsd6gYxs0Yh78YSgOpiQ4fwo-ENjwkHzK4yw7X6TyddBenX-W5hNpokhX1BItO7xQ0FcJDjpZwDo1uWN8Gd4MEWk9FoDGTyZZ13oDiYGrImkhIra6hgvtVpQFyQQ3JaU8syEhKECkhaIIzmT97nHlHMVYtwn0FAUoGXu-gh6BeW6yPgH3Z9wUCsHu_1CcQa7rIVtqQG47-cpJfAjMZ6eE-UHx8Fn3slx0X45AY&hl=en&biw=1166&bih=878&gbv=2">few instances of Marion Lenbach </a>(one of them from this blog, which means the coverage is large!), and I finally remembered <a 
href="http://images.google.com/search?tbs=sbi:AMhZZitnFJPZ0Ir9xs08V-1aMnmiMLJFDC9WK1sZ2cI2OSlipQ1dhdWY1L4QfV9_1k3J4wvpLzDDJfDx5INwhM_1C34yw16744c9CRaWwsy4tUTjPCSKdGDK9Xb_1Q7zXcvSOG14Ljvc8KbPoNW8IumbhTDCrBOaaRi0tC5a-7OkgHuEXpijtTCgGGk2XJu_1e4QwLvxgGw31uQIjuDRc0btU_1zZDJPRDfNxYh--MstB4xcAm1NVY6H6MH8r-9n_1qpI3O3oOFt-e6ePDoFz3bvhUZz-1fQxueAj_1MlxA0EuWHUEPcIo4AWUW8XD_12bZZP1i-PgvJ3AwCQYqT6qqgvpDAMFUHsQ3qSqAvgQba67zl7Dh1fuL_1d1BOplyvHtvtIuHb_1ZKeXmKt5KBGnlcRYWLUdYbGGSkQiG9j07x2NFiJyt92dKKT8uzydiQh4L84LW9j0Cto6EvH5lfVcldTn4ZpYNYVHcSqeRY_1VpGY9qkSBRWp3FA7qxj67XyHL6kBY7Zd3V0875BbfWUFMjBo3kQyuZ3goZbW7NYdjk_1miVzGE7qDOqHHVBQN3Pxzso_1gkfdQDzfLtewh-KtKpS7AhQJplUnzEch_19Wb28yhkd68nC9YdySr-8irvsfnhbmK96vdmZ4vwdC9Hn3bW4JnV5Z9GNWLdnCfYfPB7GwdC7pl-MESjA_1e1C_1aJ2lYe3-HSg_1PKbe1bxEmvIoc54YtWjveNcuvYbzHujTndSvjEvrLpm5RPcIigvdbK7Ra7VuSx1b_14sZ8-5KsKxLISyc-GxWXm5DWKV1wDnCPhamAuUhk2c4x6_1Imcrk-s4nNejNCW8cF9D3-yJP2XFWV4d5ybNDPH4pJDQCBAKKZPAhgnpL2dLF2g3FT3YGNUgaAz8rE_15QG06wLY2mxkBZj51ujh7EM-ITR4au_13GcgjRq4D6LEO6KV8dSFvF4MXrlPUbDPHCdIHtGDRkX1BCTuzDq4VxBhZdMhubNet5J6c6JhtL56zNAFdL1qzo-stTqxBgGkgTU8mikeoEzky_19N-Y6bWH_1eb00-p3M5LXN7eJxBz4exLQ53B9RlzqnYcM3vXDnLvoxOR_1vQkBBZTuR2fZ1oS3Xx_1n-DfxO2EuMPGbuzNwPjq5H2u6_1w0SKVHjvdifvQjaFq2IaG3HYM7wYYbD8Iim9KzlBQMOmhpuSeYl9jr03fnD44OpHurTsN5zg_1BVppd4gEdvs2G8hYUJEWkpyeiMA6Jjyez_16_1WEbkWw9Dr_10s3xxAOQTds_1rn30tlH4iYNhv8w771sv6J_1qAOuxRrCfMGgIU9C9Y4ao0_1LZ7DzpLWBaKyHkyT4mfK9lu0&hl=en&biw=1166&bih=878&gbv=2">the movie</a> from the HOG paper: <a href="http://www.imdb.com/title/tt0134119/">The Talented Mr. Ripley</a>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, at a quick glance, Google has finally accomplished what the others could not do for ages. Why did they succeed? There are two possible reasons: large facilities that allow building and storing a large index efficiently, and a unique technology. The former is surely the case: it seems the engine indexed a large portion of the photos on the web. 
I cannot say anything about the technology: there is nothing about it among <a href="http://googleresearch.blogspot.com/2011/06/google-at-cvpr-2011.html">Google's CVPR papers</a>, so one needs to do black-box testing to see which transformations and modifications are allowed.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Google seems to be expanding into the areas of the multimedia web, even where the niche is already occupied. Recently <a href="http://googleresearch.blogspot.com/2011/06/instant-mix-for-music-beta-by-google.html">they announced</a> their alternative to <a href="http://grooveshark.com/">grooveshark</a>, the recommendation system for <a href="http://music.google.com">Music beta</a> (the service is unfortunately available on an invitation basis in the US only). The system is not based on <a href="http://en.wikipedia.org/wiki/Collaborative_filtering">collaborative filtering</a> (only): they (also) analyse the content. I planned to investigate this area too, but it is becoming mature and thus not so alluring. I am eager to see if the service will succeed. After all, <a href="http://www.google.com/buzz">Buzz</a> did not replace <a href="http://twitter.com/">Twitter</a>, and <a href="http://www.orkut.com/">Orkut</a> did not replace <a href="http://www.facebook.com/">Facebook</a>.</div>
I am going to give an introductory talk on structured learning for Dmitry Vetrov's <a href="http://machinelearning.ru/wiki/index.php?title=Smais">graphical models class</a> this Friday at 4:20 pm, so please feel free to drop by if you are in Moscow (the talk is to be in Russian).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Structured learning is basically a very general <a href="http://en.wikipedia.org/wiki/Supervised_learning">supervised learning</a> setting. Consider the <a href="http://en.wikipedia.org/wiki/Statistical_classification">classification</a> setting as a basic problem. One needs to find the parameters of a function that maps a feature vector to one of the pre-defined class labels $\mathbb{R}^m \to \{c_1, c_2, \dots, c_K\}$. The fundamental property is that the classes are unordered and <i>orthogonal</i>. The latter means that the probabilities of objects classified with different labels are uncorrelated (some negative correlation is normal, since confidently classifying an object as $c_i$ decreases the probability of the rest of the labels). </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Now consider <a href="http://en.wikipedia.org/wiki/Regression_analysis">regression</a>, which is another supervised learning problem where feature vectors map to real values: $\mathbb{R}^m \to \mathbb{R}$. It might be considered a classification problem with an infinite number of classes. There are two obvious flaws in this reduction. First, a training set is unlikely to have at least one example for each class. To overcome this, one can quantize the codomain and train a classifier over a finite set of class labels. However, it leaves us with the second flaw: the method does not take into account the correlation between possible outcomes. The bins are ordered: the training features that correspond to neighbouring bins should be handled differently from those that correspond to distant bins. 
That's why some global methods (like <a href="http://en.wikipedia.org/wiki/Linear_regression">linear regression</a>) are used.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">A similar situation arises in the case of a <i>structured </i>outcome. In this case, one usually has plenty of possible outcomes, but they are not independent. The techniques of structured learning are applicable when the outcomes have some underlying structure, and the elements of the structure have similar sets of features. Also, it should be possible to estimate how bad a prediction is (often in terms of incorrectly predicted elements); this measure is called the <i>structured loss</i>. The methods allow small deviations from the ideal training outcome, which are possible e.g. because of noise, but penalize the substantial ones (just like regression!). The prediction function (whose parameters are tuned) can thus be formalized as a mapping $\mathbb{R}^{m \times l} \to \{c_1, c_2, \dots, c_K\}^l$.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">One possible example is <a href="http://en.wikipedia.org/wiki/Hidden_Markov_model">hidden Markov model</a> learning, where the elements are emission potentials and transition potentials, and the loss can be defined as the number of wrong HMM outputs. In the formalization above, there are $l$ elements, each represented by $m$ features (in practice, $m$ can vary for different types of elements). Since the labels of transition elements are strictly defined by emission outputs, not every outcome is possible.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Another example of a structured prediction problem is <a href="http://en.wikipedia.org/wiki/Statistical_parsing">natural language parsing</a>. Given a sentence of a natural language, the corresponding parsing tree is to be constructed. 
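To make the chain-structured case concrete, here is a toy Python/NumPy sketch (my own illustration, not from any paper): an element-wise Hamming structured loss, and exact decoding of the highest-scoring label sequence under a scoring function with emission and transition terms, as in the HMM example above.

```python
import numpy as np

def hamming_loss(y_true, y_pred):
    """Structured loss: fraction of incorrectly predicted elements."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

def viterbi_decode(emission, transition):
    """Argmax over label sequences of the sum of emission and
    transition scores (Viterbi dynamic programming).
    emission: (l, K) per-position label scores; transition: (K, K)."""
    l, K = emission.shape
    score = emission[0].copy()          # best score ending in each label
    back = np.zeros((l, K), dtype=int)  # backpointers
    for t in range(1, l):
        total = score[:, None] + transition + emission[t][None, :]
        back[t] = np.argmax(total, axis=0)
        score = np.max(total, axis=0)
    path = [int(np.argmax(score))]
    for t in range(l - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

Structured learning then amounts to tuning the emission/transition parameters so that the decoded sequence has a low structured loss on the training set.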
Surprisingly, parsing trees can also be represented as high-dimensional vectors, with constraints applied. To summarize, structured prediction has an outcome that is <b>multivariate</b>, <b>correlated </b>and <b>constrained</b>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Ideally, the parameters of structured prediction should be tuned via likelihood maximization, but this turns out to be intractable due to the need to compute the partition function at each gradient optimization step. That's why the L<sup>2</sup>-regularized hinge loss is usually minimized. The prediction algorithm is represented as a scoring function over possible outcomes. The task of the learning algorithm is to find the parameters of the scoring function that make it return maximum values for the true outcomes on the training set, and low ones for instances that are far from optimal. In other words, the margin between good and bad possible outcomes should be maximized (in terms of the scoring function value).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">You can learn more about structured learning from <a href="http://www.seas.upenn.edu/~taskar/">Ben Taskar</a>'s <a href="http://nips.cc/Conferences/2007/Program/event.php?ID=573">NIPS 2007 tutorial</a>. See also our recent <a href="http://shapovalov.ro/papers/Shapovalov-Velizhev-3dimpvt2011.pdf">3DIMPVT paper</a> for an up-to-date survey of MRF/CRF training methods and their applications.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">PS. I've installed the <a href="http://www.mathjax.org/">MathJax </a>equation engine into the blog. 
Seems to work, huh?</div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com2tag:blogger.com,1999:blog-2293611840857369720.post-87805120944439280192011-03-31T00:36:00.005+04:002011-03-31T01:27:18.539+04:00[CFP] GraphiCon-2011 and MSCVSS<div style="text-align: justify;"><a href="http://graphics.cs.msu.ru/en">Our lab</a> traditionally organizes <a href="http://gc2011.graphicon.ru/en">GraphiCon</a>, the major (and maybe the only) conference in Russia specializing in computer graphics and computer vision. The conference is not very selective, but it has a decent community of professionals behind it. So, if you want to get quality feedback on your preliminary work, or always wanted to visit Russia, consider submitting a paper by May 16.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b>Special offer</b> for young readers of my blog: If this is the place where you learned about the conference and decided to submit a paper, I'll buy you a beer during the conference. :) Consider it another social micro-event.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Another piece of news is primarily for undergrads and PhD students from Russia. Our lab and Microsoft Research are organizing another event this summer: <a href="http://summerschool2011.graphicon.ru/en">Computer vision summer school</a>. There are such lecturers as <a href="http://pub.ist.ac.at/~chl/">Christoph Lampert</a> and <a href="http://www.robots.ox.ac.uk/~az/">Andrew Zisserman</a>, among others. Participation is free of charge, and accommodation is provided. The deadline is April 30. 
You should not miss it!</div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com2tag:blogger.com,1999:blog-2293611840857369720.post-67621180732568306262011-03-07T23:01:00.007+03:002011-04-25T23:26:48.384+04:00IEEE goes evil?<div style="text-align: justify;">In November, the IEEE released <a href="http://www.ieee.org/publications_standards/publications/rights/rights_policies.html">a new copyright agreement</a> that is to be signed by the authors of all papers published by the organization since January 2011. The main novelty is that the authors are not allowed to publish final versions of their papers on-line. Fortunately, we may still post the versions accepted for print (<i>with </i>post-review corrections), which is still great, indeed. You can find the FAQ concerning the new policy <a href="http://www.ieee.org/documents/authorversionfaq.pdf">here</a>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It seems the IEEE realizes that nowadays there is almost no need for their printed materials and digital library, as search engines manage to find papers posted by their authors. No doubt they do <i>not </i>want to restrict dissemination of scientific results, but this is a question of their survival as a major publishing organization. The worst part is that they use the drug-dealer strategy: first they organize great conferences and journals (the majority of top venues in my field are run by IEEE), and allow authors to re-publish anything on-line ("the first hit for free"), then cut it off. Yes, one may still post an accepted version, but who knows what their next step is?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The IEEE motivates the decision by preserving the value of their database: "<i>the IEEE is better able to track usage of articles for the benefit of authors and journals</i>". 
One can object that there are big and growing open-access databases like <a href="http://mendeley.com/">Mendeley</a>, where all the statistics are open and more refined (the personal info of users is known). Also, not all institutions have access to the library. For example, my university (the largest one in Russia) does not. And I cannot afford to pay $25 per paper. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What can we do about that? Sure, it is impossible to stop publishing in IEEE venues, although the community can find a way around, as was demonstrated by the founders of <a href="http://jmlr.csail.mit.edu/">JMLR</a>, now the major journal in machine learning, which was <a href="http://yaroslavvb.blogspot.com/2010/11/springer-temporarily-opens-machine.html">created to cancel the overhead of the commercial publishing model</a>. Matt Blaze encourages researchers <a href="http://www.crypto.com/blog/copywrongs/">not to serve as reviewers for IEEE</a>. As for me, I'm too young (as a researcher) to be a reviewer. <s>But it also has an effect on me: I am now preparing a final version for proceedings published by the IEEE, and I doubt there is any use in improving my paper (even given a quality review), assuming that the majority of researchers will use the accepted version and will never see the revised one.</s></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">UPD (March 14, 2011). I must have misunderstood the new policy at first: Blaze's post and the IEEE terminology somehow misled me. An <i>accepted</i> paper means one <b>accepted for print</b> by the IEEE, not the version submitted for review. So, you still <b>can use review results</b> to correct the paper and post it on-line, then submit it for publishing, where the formatting will probably be adjusted. So, <b>any meaningful part can be reflected in the on-line version</b>. 
I have corrected the post now, so it may seem weird.</div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com3tag:blogger.com,1999:blog-2293611840857369720.post-16017827064243806882011-02-28T23:02:00.000+03:002011-03-01T02:46:59.878+03:00The one about CERN<div style="text-align: justify;">About a month ago I visited <a href="http://en.wikipedia.org/wiki/CERN">CERN</a>, arguably the most famous research organization in Europe. It is the place where <a href="http://en.wikipedia.org/wiki/World_Wide_Web">the World Wide Web</a> was invented, and the home of <a href="http://en.wikipedia.org/wiki/Large_Hadron_Collider">the Large Hadron Collider</a>. I was very excited about the trip, <a href="http://www.youtube.com/watch?v=510oEykrQiE">much like Sheldon Cooper</a>. Here are some facts which were new to me:</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">... about CERN:</div><div><ul><li style="text-align: justify;">CERN was founded after WWII by a bunch of European countries. They were exhausted by the war, and the only way to catch up with the USA and USSR in fundamental science was to join forces.</li><li style="text-align: justify;">The original name was <i>Conseil Européen pour la Recherche Nucléaire</i> (European Council for Nuclear Research), which was abbreviated to CERN. Later the name was officially changed to <i>European laboratory for particle physics</i>, which is both more relevant and less frightening for the locals; however, the brand CERN is now used even in official documents.</li><li style="text-align: justify;">There are almost 3,000 full-time employees, but most of them are engineers and not scientists. 
There are a lot of visiting researchers though.</li><li style="text-align: justify;">There are 20 member states now (primarily EU states), and 6 observer states (such as Russia and the USA).</li><li style="text-align: justify;">CERN's annual budget is about € 1 billion; it is funded by the member states in proportion to their economic power, e.g. Germany gives 20% of the money.</li><li style="text-align: justify;">The budget money is spent on infrastructure and support; all the individual experiments are funded by research groups and their universities. </li><li style="text-align: justify;">Although the USA is not a member state, it leads in the number of researchers who work on CERN projects (more than a thousand); second is Germany, third is Russia (yes, we are still in good shape in particle physics). It turned out that everybody at CERN spoke Russian, even the janitor. :)</li></ul></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">... about the LHC:</div><div style="text-align: justify;"><ul><li>It is in fact a circular tunnel of 27 km in circumference lying 175 m beneath the ground.</li><li>The tunnel was used before the LHC: it was built in 1983 for <a href="http://en.wikipedia.org/wiki/Large_Electron%E2%80%93Positron_Collider">the Large Electron-Positron Collider</a>. In 2008 it was upgraded to be able to accelerate heavy particles like protons and so became the Large <i>Hadron </i>Collider (recall that protons and neutrons are almost two thousand times heavier than electrons).</li><li>The tunnel is about 4 meters in diameter; one can walk there or ride a bike. </li><li>The tunnel encapsulates two small pipes for the particles that intersect at four points (to make the tracks' lengths equal, like in speed skating arenas). There are more than a thousand dipole magnets along the pipe. 
They are not as big as I had imagined.</li><li>It is nearly a vacuum, at close to zero temperature, inside the tubes.</li><li>Proton beams are not generated inside the LHC. First, they are accelerated in a linear accelerator and almost reach the speed of light <i>c</i>. As the speed rises, it becomes harder and harder to increase, since it cannot exceed the speed of light. Then they are accelerated in a small circular accelerator, and only after that are they injected into the LHC, where over 40 minutes the beams are accelerated to a speed as close to <i>c</i> as possible.</li><li>Once accelerated, the beams are observed for about 10 hours. They undergo about 10,000 collisions per second, with about 20 pairs of protons colliding each time. Since the speed is large, the energy of these collisions is enormous.</li><li>The collisions take place within special locations called <i>detectors</i>. We visited the control centre of one of them, <a href="http://en.wikipedia.org/wiki/ATLAS_experiment">ATLAS</a>. A detector has multiple layers, each able to register a certain kind of particle, like photons.</li><li>Ten thousand collisions per second would yield a really big amount of data, so only a few of them are selected to be logged. I don't know how they select those collisions; machine learning might be used. :)</li><li>All the collected data are spread across servers all over the world. An authorized researcher may log in to the grid network and execute her script to analyse the data.</li></ul></div><div style="text-align: justify;">... about the Higgs boson:</div><div style="text-align: justify;"><ul><li>The <a href="http://en.wikipedia.org/wiki/Higgs_boson">Higgs boson</a> is a hypothetical particle, the existence of which would prove the standard model of particle physics. </li><li>The Higgs boson appears as a result of a collision of two protons with a large amount of energy. The protons in the LHC are accelerated enough to produce theoretically sufficient energy. 
</li><li>The Higgs boson is very heavy and thus unstable. In theory, it decays into either four muons or two photons. So, if two counter-directed photon beams are registered by the detector, this will be evidence of the boson. See the picture below for an example of likely detector output in case the boson shows up.</li><li>Scientists say that if the Higgs boson is not detected, all modern knowledge of particle physics will collapse. They will be obliged to develop a new theory from scratch.</li><li>There are no published results that report on Higgs boson detection so far...</li></ul></div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://upload.wikimedia.org/wikipedia/commons/thumb/1/1c/CMS_Higgs-event.jpg/650px-CMS_Higgs-event.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 450px; height: 417px;" src="http://upload.wikimedia.org/wikipedia/commons/thumb/1/1c/CMS_Higgs-event.jpg/650px-CMS_Higgs-event.jpg" border="0" alt="" /></a><br /><div style="text-align: justify;">Don't you want to be a theoretical physicist now? =)</div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com3tag:blogger.com,1999:blog-2293611840857369720.post-91686049943538590292011-01-25T09:04:00.000+03:002011-01-25T09:04:28.728+03:00Art + Multimedia<div style="text-align: justify;">In December we visited a lecture about the interaction between arts and sciences, namely between (primarily) visual arts and (again, primarily) multimedia studies (<a href="http://theoryandpractice.ru/seminars/11878-multimedia-mezhdistsiplinarnoe-vzaimodeystvie-sovremennogo-iskusstva-i-nauki-19-12">Russian page</a>). The lecture was given by <a href="http://www.ablogina.com/scroll.php?lg=en">Asya Ablogina</a>, who turned out to be a nice girl. Although she lacked any technical background, she was doing well. 
Surprisingly, there is a lot of artwork that exploits technology in a witty manner, which I was unaware of, so the lousy stuff exhibited in <a href="http://www.mmoma.ru/en/">the Moscow Museum of Modern Art</a> is not everything one can do in this field!</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The most interesting part was Asya's exposition of some masterpieces. Most of them were presented during the 2009<span class="Apple-style-span"> </span><span class="Apple-style-span" style="line-height: 18px; "><span class="Apple-style-span" style="font-size: medium; "><span class="Apple-style-span"><i>Science as Suspense</i> exhibit</span></span></span> in Moscow (<a href="http://www.winzavod.ru/scienceartfest/index.html">Russian</a>). <a href="http://www.fondation-langlois.org/html/e/page.php?NumPage=101">Nicolas Reeves</a> is a famous Canadian architect, also known for his work on modelling biological systems. He used the output of biologically-inspired computer algorithms (such as <a href="http://en.wikipedia.org/wiki/Genetic_algorithms">genetic algorithms</a>) to draw pictures. In Moscow he presented a project called <strike><i>Marching</i></strike><i> Floating Cubes</i>: massive but light cubes float in the air. Their movements are controlled by tiny fans, although any little gust of wind can affect them. Each cube is equipped with an on-board computer, which helps to avoid collisions. The implemented algorithms are simple but stochastic, hence the behaviour is unpredictable. They are said to move like living creatures. 
Here is the video:</div><div style="text-align: justify;"><br /><iframe title="YouTube video player" class="youtube-player" type="text/html" width="480" height="390" src="http://www.youtube.com/embed/qR7dPW73nCk" frameborder="0" allowfullscreen=""></iframe><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">A similar project was developed by <a href="http://www.zprod.org/PG/home.htm">Paul Granjon</a>. He also tries to endow robots with animal behaviour. In the video below, the robots are sexed, i.e. they are able to locate robots of the opposite sex and eventually end up in coitus. Another of Paul's robots (the <i>Smartbot</i>, <a href="http://www.winzavod.ru/lj/sienceart/DSCF9887.JPG">photo</a>) creeps around a confined space, grumbling all the while (just like <a href="http://en.wikipedia.org/wiki/Marvin_the_Paranoid_Android">Marvin</a>!). I also enjoy the way Paul speaks:</div><br /><iframe title="YouTube video player" class="youtube-player" type="text/html" width="480" height="390" src="http://www.youtube.com/embed/a-DTcw8EQBU" frameborder="0" allowfullscreen=""></iframe><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The real idol of the sci-art community is <a href="http://en.wikipedia.org/wiki/Stelarc">Stelarc</a> (see also <a href="http://web.stelarc.org/">his homepage</a>, although it takes balls to get through the welcome page :). His talent is recognized by both scientists and artists (suffice it to mention that he is an Honorary Professor of Arts and Robotics at <a href="http://cmu.edu/">Carnegie Mellon</a>). He is best known for his experiments on his own body. For example, during one performance he allowed his body to be controlled remotely over the Internet via muscle stimulation. Probably his favourite project is <a href="http://web.stelarc.org/projects/prosthetichead/index.html"><i>Prosthetic Head</i></a>, which is in fact a 3D model of his own head. 
The head is trained on Stelarc's behaviour and speech, so it is able to communicate with people using colloquial language as well as non-verbal cues. It would be interesting to try it in practice, but here is a non-interactive demo video:</div><div style="text-align: justify;"><br /><iframe title="YouTube video player" class="youtube-player" type="text/html" width="480" height="390" src="http://www.youtube.com/embed/Nym8hfNI9Gg" frameborder="0" allowfullscreen=""></iframe><br /></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Stelarc is probably the only person on Earth who has three ears. In 2007 surgeons <a href="http://web.stelarc.org/projects/earonarm/index.html">implanted an ear into his arm</a>! It is not functioning now (just a piece of skin), but he plans to install an audio receiver into the ear and broadcast everything he hears over the Internet. I definitely recommend looking through <a href="http://web.stelarc.org/projects.html">his projects</a>; there are some decent ones.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Asya also presented <a href="http://www.ablogina.com/scroll.php?lg=en">her own works</a>. She focuses on various photographic techniques such as overexposure. Here are photos from her project <span style="font-style:italic;">Canvas of the Road</span>, compiled into a video. There is a play on words in Russian: a single word means both canvas and road surface. In these photos, tyre tracks look like painter's brushstrokes:</div><br /><iframe title="YouTube video player" class="youtube-player" type="text/html" width="480" height="390" src="http://www.youtube.com/embed/-eTVsP_R-w4" frameborder="0" allowfullscreen=""></iframe><div><br /></div><div style="text-align: justify;">Another of Asya's projects is a short film series called <i>Habitat</i>. She showed only one episode, titled <i>Habitat: MSU Main Building</i>. 
The film was about a girl, a PhD student at <a href="http://msu.ru/">Moscow State University</a>, and described her place: a standard 8 sq.m. dorm room. The narrative was full of expressive epithets and metaphors about that awful room: the girl felt like she was in a cage, she also disliked the shared bathroom, etc. It's funny, because I live in a similar single PhD-student room, and I'm quite happy with it, since I had lived in a shared room for 5 years before. :)</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The second lecturer, Vladimir Vishnyakov, presented his project called <i>The Museum of Revived Photography</i>. He used 19th-century photographs to create image-based animation. The idea is technically simple: first segment the people from the background, then use <a href="http://en.wikipedia.org/wiki/Inpainting">inpainting</a> techniques to restore the background behind them, and finally animate the movements; still, Vladimir referred to the programmer he works with as a real genius. In fact, everything I post here ranges from easy to moderately complex (except for Stelarc's projects), so you could do something similar too. The main problem is coming up with the concept. Come on then! ;)</div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com1tag:blogger.com,1999:blog-2293611840857369720.post-26362480242752556032010-12-22T22:21:00.003+03:002010-12-22T23:00:29.049+03:00Comic strip: Nerd's confessionI continue sharing my <a href="http://computerblindness.blogspot.com/2010/08/comic-strip-10.html">Prague comic strips</a>, since I have not been posting for a while. In fact, I was busy writing a paper, so this one may seem somewhat related. <b>Disclaimer:</b> There's nothing personal in the strip. Kids under 18 are not allowed. 
:)<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://shapovalov.ro/images/comics/LaTeX.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 500px; " src="http://shapovalov.ro/images/comics/LaTeX_preview.png" border="0" title="In comparison to condoms, LaTeX is free, reusable, more than 98% effective, and not condemned by the church. Are you still in doubt?" /></a><div>The next post is going to be about multimedia technologies for art. Stay tuned!</div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com10tag:blogger.com,1999:blog-2293611840857369720.post-60748386247296243522010-11-26T10:30:00.004+03:002010-11-26T10:53:38.712+03:00Happy Thanksgiving!<p><a href="http://chaospet.com/2008/11/27/115-russells-turkey/">Here</a> is a comic strip by Ryan Lake that illustrates the concept of Russell's turkey. The idea is simply that induction does not always work the way you expect: there can be factors that had no influence on the training data yet change the learned concept dramatically. 
We should thus keep in mind that machine learning (which is based on induction) is not a panacea.</p><p><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiupnmK2Rvmw5nhi31STdYTeicOqv8ItUObspj0ixnTqER3y_SPdOngEaVkeY5ElSi62XJ9-IhpBINLR2cqabpwuwQr2DALp6E4MXA36xqMpP-oh7uzU-jJquGfSQoayOG9PXtTuW91Wcyg/s1600/russels-turkey.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 343px; height: 400px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiupnmK2Rvmw5nhi31STdYTeicOqv8ItUObspj0ixnTqER3y_SPdOngEaVkeY5ElSi62XJ9-IhpBINLR2cqabpwuwQr2DALp6E4MXA36xqMpP-oh7uzU-jJquGfSQoayOG9PXtTuW91Wcyg/s400/russels-turkey.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5543759780032129906" /></a><br /></p><p><a href="http://en.wikipedia.org/wiki/Bertrand_Russell">Bertrand Russell</a> is well known for his metaphors. Probably the most famous one is <a href="http://en.wikipedia.org/wiki/Russell's_teapot">Russell's teapot</a>, which aims to show how pointless it is to believe in god.</p><p>Happy Thanksgiving!</p><p>PS. 
I am reading <a href="http://en.wikipedia.org/wiki/Animal_Farm"><em>Animal Farm</em></a> by George Orwell now; it is another great animal metaphor.</p>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com2tag:blogger.com,1999:blog-2293611840857369720.post-17671383124648657212010-11-10T21:13:00.002+03:002010-11-10T21:31:41.876+03:00Nice paper acknowledgement<p>It turns out that Russia is not the only place where PhD students can be so underpaid that they cannot afford food:</p><p><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVmyZ8nMWI1UXA4dG4MmM3D5b0_WTBleIyFMT5zjZ02BZM3Oqm7U3Cf_cYbqpbbCKJ-irTX4UdORwr09_PExnAjDBgoA6bcXeAm1trD0jwr0aaCX0nX3E_pwE4ShmT-cKqtL2GDzMZTQcZ/s1600/delong.PNG"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 213px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVmyZ8nMWI1UXA4dG4MmM3D5b0_WTBleIyFMT5zjZ02BZM3Oqm7U3Cf_cYbqpbbCKJ-irTX4UdORwr09_PExnAjDBgoA6bcXeAm1trD0jwr0aaCX0nX3E_pwE4ShmT-cKqtL2GDzMZTQcZ/s400/delong.PNG" border="0" alt="" id="BLOGGER_PHOTO_ID_5537986287777967922" /></a></p><p>This is from <a href="http://www.csd.uwo.ca/~olga/">Olga Veksler</a>'s paper "<a href="http://www.csd.uwo.ca/faculty/olga/Papers/cvprFinal07.pdf">Graph cut based optimization for MRFs with truncated convex priors</a>" from CVPR 2007. </p><p><a href="http://www.csd.uwo.ca/~adelong3/">Andrew</a> seems to be a nice person. I should definitely ask Anton Osokin to introduce us on occasion. 
=)</p>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com3tag:blogger.com,1999:blog-2293611840857369720.post-32167762244352669942010-10-10T10:10:00.009+04:002011-03-14T01:01:35.327+03:00Papers, citations, co-authorship, and genealogy<div style="text-align: justify;">This summer I accidentally found out that there are two papers citing <a href="http://graphics.cs.msu.ru/sites/default/files/download/CMRT09_Barinova_et_al.pdf">our CMRT 2009 paper</a>. I was a bit excited about that, since they were the first actual citations of my work, so I even read those papers.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The first of them [<a href="http://www.itlab.unn.ru/archive/MSConf10/msconf-2010_book.pdf#page=60">Димашова, 2010</a>] was published in Russian by <a href="http://opencv.willowgarage.com/wiki/">OpenCV</a> developers from Nizhny Novgorod. They implemented a cascade-based face-detection algorithm that uses either Local Binary Patterns (LBP) or Haar features. The algorithm was released as part of OpenCV 2.0. They cite our paper as an example of using a Random Forest in the stages of the cascade. However, they implemented the classical variant, a cascade over AdaBoost.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The other citation [<a href="http://www.zemris.fer.hr/~ssegvic/pubs/sikiric10mipro.pdf">Sikiric et al., 2010</a>] is more relevant, since it comes from the road-mapping community. They address the problem of recovering a road-appearance mosaic from views orthogonal to the surface. They contrast their approach with ours by noting that theirs does not employ human interaction. 
In fact, we need human input only for recognizing road defects and lane markings; rectification is done in a previous stage, which is fully automatic.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The rest of the post is devoted to some interesting metrics and structures concerning citations, co-authorship, and the supervision of students.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><span class="Apple-style-span" style="font-size: large;">Citations</span></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The number of citations is a weak measure of paper quality. We can also go further and estimate the impact of a journal or a particular researcher based on citation counts. A generally accepted measure is the journal <a href="http://en.wikipedia.org/wiki/Impact_factor">impact factor</a>, which is simply the mean number of citations per paper published in the journal over some period of time. Individual researchers can be evaluated by the impact factor of the journals they publish in, though this is considered bad practice. A not-so-bad alternative is the <a href="http://en.wikipedia.org/wiki/H-index">h-index</a>. By definition, one's h-index equals <i>H</i> if each of one's top <i>H</i> papers has at least <i>H</i> citations<sup><a href="http://www.blogger.com/post-edit.g?blogID=2293611840857369720&postID=3216776224435266994#supremum">1</a></sup>. It has also been <a href="http://en.wikipedia.org/wiki/H-index#Criticism">criticised</a> widely.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So, citation-based scoring has a lot of flaws. But what can we use instead? Another interesting approach is taken by <a href="http://readermeter.org/">ReaderMeter</a>. 
It collects information about the number of people who have added a paper to their Mendeley collections and computes something like an h-index. Unfortunately, they recently excluded some papers from the database, so the statistics became less representative but more accurate. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-size: large;"><b>Co-authorship and Erdős number</b></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><a href="http://en.wikipedia.org/wiki/Paul_Erd%C5%91s">Paul Erdős</a> was a Hungarian mathematician who published about 1,400 research papers with 511 different co-authors. That is why he has a special role in bibliometrics. The <a href="http://en.wikipedia.org/wiki/Erd%C5%91s_number">Erdős number</a> is defined as the collaborative distance from a researcher to Erdős. More strictly, Wikipedia defines:</div><div style="text-align: justify;"><blockquote>Paul Erdős is the one person having an Erdős number of zero. For any author other than Erdős, if the lowest Erdős number of all of his coauthors is <i>k</i>, then the author's Erdős number is <i>k</i> + 1.</blockquote></div><div style="text-align: justify;">My Erdős number is at most 7 (via Olga Barinova, Pushmeet Kohli, Philip H.S. Torr, Bernhard Schölkopf, John Shawe-Taylor, David Godsil). To be honest, the Erdős number system is primarily used for mathematics papers. Even if we use the wider definition, it is not that beautiful, because there are not many gateways: e.g. the whole vision community will probably connect to Erdős via the machine-learning gateway, so researchers working on the boundary between fields will be closer. To illustrate this, David Marr probably did not have a finite Erdős number during his lifetime (now he definitely does). 
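Both numbers discussed in this post are easy to compute: the h-index is a sort-and-scan over citation counts, and an Erdős-style number is a breadth-first search over the co-authorship graph. A minimal Python sketch (all data below is made up for illustration):

```python
from collections import deque

def h_index(citations):
    """Largest H such that at least H papers have >= H citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
    return h

def collaboration_number(coauthors, source, target):
    """BFS distance in a co-authorship graph (an Erdős-style number).

    `coauthors` maps each author to a list of co-authors.
    Returns None if there is no finite number.
    """
    if source == target:
        return 0
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        person, dist = queue.popleft()
        for coauthor in coauthors.get(person, ()):
            if coauthor == target:
                return dist + 1
            if coauthor not in seen:
                seen.add(coauthor)
                queue.append((coauthor, dist + 1))
    return None

# Toy data, invented for the example:
print(h_index([10, 8, 5, 4, 3]))                  # -> 4
graph = {"Erdos": ["A"], "A": ["Erdos", "B"], "B": ["A"]}
print(collaboration_number(graph, "B", "Erdos"))  # -> 2
```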
So we can introduce, say, a Zisserman number for computer vision.<sup><a href="http://www.blogger.com/post-edit.g?blogID=2293611840857369720&postID=3216776224435266994#Zisserman">2</a></sup> <a href="http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/z/Zisserman:Andrew.html">According to DBLP</a>, Andrew Zisserman has 165 direct co-authors so far. My Zisserman number is 3, and David Marr's is 4 (via Tomaso Poggio, Lior Wolf, Yonatan Wexler).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The movie industry has its own metric, the <a href="http://en.wikipedia.org/wiki/Bacon_number#Bacon_numbers">Bacon number</a> (after <a href="http://en.wikipedia.org/wiki/Kevin_Bacon">Kevin Bacon</a>): two actors are at distance 1 if they have appeared in the same movie. Someone tried to combine the two into the <a href="http://en.wikipedia.org/wiki/Erd%C5%91s-Bacon_number">Erdős-Bacon number</a>, which for a given person is simply the sum of her Erdős and Bacon numbers. Of course, very few people have a finite Erdős-Bacon number, since one must both appear in a movie and publish a research paper (and even that is not sufficient). Often it is possessed by researchers who consulted for a film crew and were accidentally filmed. =) Erdős himself has this number equal to 3 or 4 (depending on the details of the definition), since he starred in <a href="http://en.wikipedia.org/wiki/N_is_a_Number">N is a Number (1993)</a>, and his Bacon number is thus 3 or 4. 
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The person with probably the lowest Erdős-Bacon number is <a href="http://en.wikipedia.org/wiki/Daniel_Kleitman">Daniel Kleitman</a>, an MIT mathematician, who appeared in one of my favourite movies, <a href="http://en.wikipedia.org/wiki/Good_Will_Hunting">Good Will Hunting (1997)</a>, along with Minnie Driver, who starred with Bacon in <a href="http://en.wikipedia.org/wiki/Sleepers_(film)">Sleepers (1996)</a>. Since Kleitman has 6 joint papers with Erdős, his Erdős-Bacon number is 3.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Surprisingly, Marvin Minsky's Erdős number (4) is greater than his Bacon number (2), which he obtained via <a href="http://www.imdb.com/title/tt0190708/">The Revenge of the Dead Indians (1993)</a> and <a href="http://en.wikipedia.org/wiki/Yoko_Ono">Yoko Ono</a>. Another strange example is <strike>the paedophile's dream</strike> <a href="http://en.wikipedia.org/wiki/Natalie_Portman">Natalie Portman</a>. She graduated from Harvard, saying she would rather "be smart than a movie star". That's my kind of girl! A neuroscience paper [<a href="http://www.nmr.mgh.harvard.edu/DOT/PDF/Baird_NeuroImage_16_1120_2002.pdf">Baird et al., 2002</a>] brought Natalie Hershlag (her real name) an Erdős number of 5, and then she appeared in <a href="http://www.imdb.com/title/tt0808399/">New York, I Love You (2009)</a> along with Kevin Bacon, so she reached the same Erdős-Bacon number as Minsky, i.e. 6.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">UPD (March 13, 2011). 
There is a totally relevant <a href="http://xkcd.com/599/">xkcd strip</a>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-size: large;"><b>Scientific genealogy</b></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Finally, let me tell you about what is known as scientific genealogy. Every grown-up researcher has a PhD advisor (usually one), while an advisor can have many students. Using the analogy with parents and children, we get a tree (or a forest) representing the historical structure of science. <a href="http://www.genealogy.ams.org/">The Mathematics Genealogy Project</a> aims to recover this structure.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I tried to trace my genealogy back. Unfortunately, I failed to find out who <a href="http://www.keldysh.ru/pages/cgraph/people/bayakovski.htm">Yury M. Bayakovsky</a>'s PhD advisor was. But if I consider <a href="http://graphics.cs.msu.ru/en/people/staff/obarinova">Olga Barinova</a> my advisor, I am an 11th-generation descendant of <a href="http://en.wikipedia.org/wiki/Gauss">Carl Friedrich Gauß</a>. Nice ancestry, huh?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><a href="http://www.visiongenealogy.summerhost.info/">A similar project exists for computer vision</a>. There are 290 people in the database, though there are duplicates (I've found five <a href="http://www.vision.ee.ethz.ch/~vferrari/">Vittorio Ferrari</a>s :). It seems strange that some trees are as deep as 5 generations (e.g. 
<a href="http://userweb.cs.utexas.edu/~grauman/">Kristen Grauman</a> and <a href="http://www.lsi.upc.edu/~aquattoni/">Adriana Quattoni</a> are the 5th generation after David Marr), though vision is a relatively young field.</div><br /><sup><a name="supremum">1</a></sup> Okay, take the supremum of the set if you want to stay formal.<br /><sup><a name="Zisserman">2</a></sup> It seems that Philip Torr <a href="http://cms.brookes.ac.uk/staff/PhilipTorr/">has already used</a> that number.Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com1tag:blogger.com,1999:blog-2293611840857369720.post-23723723262663817002010-09-16T16:39:00.002+04:002010-09-16T17:01:21.550+04:00LidarK 2.0 released<p align="justify">The second major release of GML LidarK is now available. It reflects our three years of experience in 3D data processing. The description from <a href="http://graphics.cs.msu.ru/en/science/research/3dpoint/lidark">the project page</a>:</p><p align="justify"><cite>The LidarK library provides an open-source framework for processing multidimensional point data such as 3D LIDAR scans. It allows building a spatial index for performing fast search queries of different kinds. Although it is intended to be used for LIDAR scans, it can be helpful for a wide range of problems that require spatial data processing.</cite></p><p align="justify">The API has been enriched with various features in this release and has become more consistent and logical. New ways to access the data (i.e. various iterators) are implemented. One can now find the k nearest neighbours of any point, not just of a point that belongs to the index. Since the data structure is a container, we've decided to parametrize it with a template parameter. This decision is controversial: one no longer needs to cast tuples, but the code became clumsier and less portable, unfortunately.</p><p align="justify">The C++/MATLAB code is licensed free of charge for academic use. 
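As a toy illustration of the kind of query such an index accelerates (this is not LidarK's actual API), here is a brute-force k-nearest-neighbour search in Python; a spatial index answers the same query without scanning every point:

```python
import heapq

def k_nearest(points, query, k):
    """Return the k points closest to `query` in Euclidean distance.

    Brute force over all n points, O(n log k); a spatial index
    (k-d tree, octree, ...) avoids touching most of the cloud.
    """
    def dist2(p):
        # Squared distance is enough for ranking; no sqrt needed.
        return sum((a - b) ** 2 for a, b in zip(p, query))
    return heapq.nsmallest(k, points, key=dist2)

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0), (0.5, 0.5, 0.0)]
print(k_nearest(cloud, (0.0, 0.0, 0.0), 2))  # -> [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0)]
```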
You can download it <a href="http://graphics.cs.msu.ru/en/science/research/3dpoint/lidark">here</a>.</p>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com0tag:blogger.com,1999:blog-2293611840857369720.post-39804658891315852862010-09-11T22:51:00.002+04:002010-10-10T22:06:43.078+04:00ECCV 2010 highlights<div style="text-align: justify;">Today is the last day of <a href="http://www.ics.forth.gr/eccv2010/intro.php">ECCV 2010</a>, so the best papers have already been announced. Although I was not there, a lot of the papers are available via <a href="http://cvpapers.com/eccv2010.html">cvpapers</a>, so I have eventually run into some of them.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><span class="Apple-style-span" style="font-size:large;">Best papers</span></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The full list of the conference awards is available <a href="http://www.ics.forth.gr/eccv2010/awards.php">here</a>. The best paper is "<a href="http://research.microsoft.com/en-us/um/people/pkohli/papers/lrkt_eccv2010.pdf">Graph Cut based Inference with Co-occurrence Statistics</a>" by <a href="http://sots.brookes.ac.uk/lubor/">Lubor Ladicky</a>, Chris Russell, <a href="http://research.microsoft.com/en-us/um/people/pkohli/">Pushmeet Kohli</a>, and <a href="http://cms.brookes.ac.uk/staff/PhilipTorr/">Philip Torr</a>. The second best is "<a href="http://www.cs.cmu.edu/~abhinavg/blocksworld/blocksworld.pdf">Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics</a>" by <a href="http://www.cs.cmu.edu/~abhinavg/Home.html">Abhinav Gupta</a>, <a href="http://www.cs.cmu.edu/~efros/">Alyosha Efros</a>, and <a href="http://www.cs.cmu.edu/~hebert/">Martial Hebert</a>. Two thoughts before reviewing them. 
First, the papers come from two institutions that are now (arguably) considered the leading ones in the vision community: <a href="http://research.microsoft.com/en-us/">Microsoft Research</a> and <a href="http://www.ri.cmu.edu/">the Carnegie Mellon Robotics Institute</a>. Second, both papers are about semantic segmentation (although the latter couples it with implicit geometry reconstruction); Vidit Jain has already <a href="http://vimsu99.blogspot.com/2010/06/acceptance-bias-at-computer-vision.html">noted the acceptance bias</a> in favour of recognition papers.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Now to the papers. Ladicky et al. address the problem of global terms in energy minimization for semantic segmentation. Specifically, their global term deals only with the <i>occurrence</i> of object classes and is invariant to how many connected components (i.e. objects) or individual pixels represent each class. Therefore, one cow in an image contributes the same to the global term as two cows, or even a single stray cow pixel. The global term penalizes a large number of different categories in a single image (<a href="http://en.wikipedia.org/wiki/Minimum_description_length">the MDL prior</a>), which is helpful when we are given a large set of possible class labels, and also penalizes the co-occurrence of classes that are unlikely to appear together, like cows and sheep. Statistics collected from the training set determine whether the co-occurrence of a certain pair of classes should be encouraged or penalized. 
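Schematically, such an energy combines the usual unary and pairwise terms with a cost that depends only on the <i>set</i> of labels present in the image. Here is a made-up illustration (not the authors' actual potentials), on a 1D chain of pixels for brevity:

```python
def energy(labels, unary, smoothness, cooccurrence_cost):
    """Unary + Potts pairwise terms on a 1D chain, plus a global term
    that depends only on WHICH classes appear -- not on how many
    pixels or connected components represent each of them."""
    e = sum(unary[i][l] for i, l in enumerate(labels))
    e += sum(smoothness for a, b in zip(labels, labels[1:]) if a != b)
    e += cooccurrence_cost(frozenset(labels))
    return e

# Invented potentials: 3 pixels, 2 classes; MDL-style cost of 2.0 per class.
unary = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
mdl = lambda classes: 2.0 * len(classes)
print(energy([0, 0, 1], unary, 0.5, mdl))  # -> 4.5
print(energy([0, 0, 0], unary, 0.5, mdl))  # -> 3.0 (one class: cheaper globally)
```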
Although the idea of incorporating co-occurrence into the energy function is not new [<a href="http://www.csail.mit.edu/~murphyk/Papers/AImemo03.pdf">Torralba et al, 2003</a>; <a href="http://vision.cs.ucla.edu/papers/rabinovichVGWB07.pdf">Rabinovich et al., 2007</a>], the authors claim that their method is the first one that simultaneously satisfies four conditions: <i>global energy minimization</i> (an implicit global term rather than a multi-stage heuristic process), <i>invariance to the structure of classes</i> (see above), <i>efficiency </i>(the model does not become an order of magnitude larger), and <i>parsimony </i>(the MDL prior, see above).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">How do the authors minimize the energy? They restrict the global term to a function of the set of classes represented in the image that is monotonic w.r.t. set inclusion (more classes, more penalty). Then they introduce auxiliary nodes for the αβ-swap or α-expansion procedure so that the optimized energy remains submodular. It is very similar to how they applied graph-cut-based techniques for minimizing energies with higher-order cliques [<a href="http://www.springerlink.com/index/U44182U53Q41078Q.pdf">Kohli, Ladicky and Torr, 2009</a>]. So when you face non-local terms in an energy, you can try something similar. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What are the shortcomings of the method? It would be great to penalize <i>objects </i>in addition to classes. First, local interactions are taken into account, as well as global ones, but what about the medium level? The shape of an object, its size, colour consistency, etc. are great cues. Second, at the global level only inter-class co-occurrences play a role, but what about intra-class ones? It is impossible to have two suns in a photo, but quite likely to meet several pedestrians walking along a street. 
This is actually done by <a href="http://www.cse.wustl.edu/~mgeorg/readPapers/byVenue/iccv2009/desai2009_iccv_discriminativeModelsMulticlassObjectLayout.pdf">Desai et al. [2009]</a> for object detection.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The second paper, by Gupta et al., harks back to the romantic period of computer vision, when scenes composed of perfect geometric shapes were reconstructed successfully. They address the problem of 3D reconstruction from a single image, as in <a href="http://www.cs.uiuc.edu/homes/dhoiem/projects/popup/index.html">auto pop-up</a> [<a href="http://www.cs.uiuc.edu/homes/dhoiem/publications/popup.pdf">Hoiem, Efros and Hebert, 2005</a>]. They compare the results of auto pop-up to <a href="http://en.wikipedia.org/wiki/Potemkin_village">Potemkin villages</a>: "there is nothing behind the pretty façade" (I believe this comparison is the contribution of <a href="http://www.cs.cmu.edu/~efros/">the second author</a>). Instead of surfaces, they fit boxes to the image, which allows them to impose a wider range of constraints on the 3D structure, including: </div><div style="text-align: justify;"><ul><li><i>static equilibrium</i>: it seems that the only property they check here is that the centroid projects into the support region;</li><li><i>sufficient support force</i>: they estimate density (light for vegetation, medium for humans, heavy for buildings) and note, for example, that a building is unlikely to rest on a tree;</li><li><i>volume constraint</i>: boxes cannot intersect;</li><li><i>depth ordering</i>: reprojecting the result onto the image plane should correspond to what we see in the image.</li></ul></div><div style="text-align: justify;">This is a great paper that exploits Newtonian mechanics as well as human intuition; however, there are still some heuristics (like the density of a human) that could probably be generalized away. 
It seems that this approach has big potential, so this might become the seminal paper of a new direction. Combining recognition with geometry reconstruction is quite trendy now, and this method is conceptually simple but effective. There are many examples of how the algorithm works on <a href="http://www.cs.cmu.edu/~abhinavg/blocksworld/">the project page</a>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-size:large;"><b>Funny papers</b></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">A couple of ECCV papers have fancy titles. The first one is "<a href="http://grail.cs.washington.edu/malkovich/eccv10paper.pdf">Being John Malkovich</a>" by <a href="http://www.cs.washington.edu/homes/kemelmi/">Ira Kemelmacher-Shlizerman</a>, <a href="http://www.cs.washington.edu/homes/aditya/">Aditya Sankar</a>, <a href="http://www.adobe.com/technology/people/seattle/shechtman.html">Eli Shechtman</a>, and <a href="http://www.cs.washington.edu/homes/seitz/">Steve Seitz</a> from <a href="http://grail.cs.washington.edu/">the University of Washington GRAIL</a>. If you've seen <a href="http://www.imdb.com/title/tt0120601/">the movie</a>, you can guess what the article is about. Given a video of someone pulling faces, the algorithm transforms it into a video of <a href="http://en.wikipedia.org/wiki/John_Malkovich">John Malkovich</a> making similar faces. "Ever wanted to be someone else? Now you can." In contrast to the movie, in the paper the target need not be John Malkovich: it could be George Bush, Cameron Diaz, George Clooney, or indeed any person for whom you can find a sufficiently large video or photo database! 
You can see a video of the real-time puppetry on <a href="http://grail.cs.washington.edu/malkovich/">the project page</a>, although there are obvious lags and the result is still far from perfect.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Another fancy title is "<a href="http://cs.unc.edu/~jmf/publications/Frahm_et_al_ReconstructionFromPhotoCollection.pdf">Building Rome on a Cloudless Day</a>". There are 11 (eleven) authors contributing to the paper, including <a href="http://www.cs.unc.edu/~marc/">Marc Pollefeys</a>. This summer I spent one cloudless day in Rome, and, to be honest, it was not that pleasant. So why is the paper called that? It refers to another paper, "<a href="http://grail.cs.washington.edu/rome/rome_paper.pdf">Building Rome in a Day</a>" from ICCV 2009, by the guys from Washington again, which itself refers to the proverb "<a href="http://en.wiktionary.org/wiki/Rome_wasn't_built_in_a_day">Rome was not built in a day</a>." In that paper the authors build dense 3D models of some Roman sights using a set of Flickr photos tagged "Rome" or "Roma". Returning to the ECCV 2010 monument of collective intelligence, its authors did the same, but without cloud computing; that's why the day is now cloudless. S.P.Q.R.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I cannot avoid mentioning the following papers here, although they are not from ECCV. Probably the most popular CVPR 2010 paper is "<a href="http://www.cs.cmu.edu/~rahuls/pub/cvpr2010-food-rahuls.pdf">Food Recognition Using Statistics of Pairwise Local Features</a>" by <a href="http://www.cs.washington.edu/homes/yang/">Shulin Yang</a>, <a href="http://www.cs.cmu.edu/~meichen/">Mei Chen</a>, <a href="http://www.ri.cmu.edu/person.html?person_id=241">Dean Pomerleau</a>, and <a href="http://www.cs.cmu.edu/~rahuls/">Rahul Sukthankar</a>. 
The first page of the paper contains a motivational picture with a hamburger, and it looks pretty funny. They insist that the stuff from McDonald's is very different from that from Burger King, and that it is really important to recognize them to keep track of the calories. Well, the authors don't look overweight, so the method should work.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The last paper in this section is "<a href="http://vision.ucsd.edu/sites/default/files/gestalt.pdf">Paper Gestalt</a>" by the imaginary Carven von Bearnensquash, published in the Secret Proceedings of Computer Vision and Pattern Recognition (CVPR), 2010. The authors (presumably from UCSD) make fun of the way we usually write computer vision papers, assuming that certain features might convince a reviewer to accept or reject a paper: mathematical formulas that create an illusion of author qualification (even if they are irrelevant), ROC curves, etc. It also derides attempts to apply black-box machine-learning techniques without appropriate analysis of the possible features. Now I am trying to subscribe to the Journal of Machine Learning Gossip.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-size: large;"><b>Colleagues</b></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">There was only one paper from our lab at the conference: "<a href="http://research.microsoft.com/en-us/um/people/pkohli/papers/bltk_eccv2010.pdf">Geometric Image Parsing in Man-Made Environments</a>" by <a href="http://graphics.cs.msu.ru/en/people/staff/obarinova">Olga Barinova</a> and Elena Tretiak (in co-authorship with <a href="http://www.robots.ox.ac.uk/~vilem/">Victor Lempitsky</a> and Pushmeet Kohli). 
A scheme similar to the image parsing framework [<a href="http://www.loni.ucla.edu/~ztu/publication/IJCV_parsing.pdf">Tu et al., 2005</a>] is used, i.e. top-down analysis is performed. They detect parallel lines (like the edges of buildings and windows), their vanishing points, and the zenith jointly, using a witty graphical model. The approach is claimed to be robust to clutter in the edge map.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Indeed, this paper would not have been possible without me. =) It was me who convinced Lena to join the lab two years ago (actually, it was more like convincing her not to apply for <a href="http://lvk.cs.msu.su/">the other lab</a>). So, the lab will remember me at least as a decent talent scout...</div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com1tag:blogger.com,1999:blog-2293611840857369720.post-85407492752120972672010-08-27T22:49:00.006+04:002010-08-28T00:14:43.853+04:00Comic strip: 10%<div>Last summer, during my internship at <a href="http://cmp.felk.cvut.cz/">CMP</a> in Prague, I drew some <a href="http://xkcd.com/">xkcd</a>-style comic strips, and I want to share some of them here. I considered starting a separate blog for them, but there are too few of them, and they do suck (at least in comparison to Randall's work). However, most of them are programming/research related, so they are appropriate here.</div><div><br /></div><div>I planned to use them when I had nothing to post about. This is not the case now (I am planning a couple more posts in a week or so), but this one is topical. It was originally drawn on September 1, 2009, when <a href="http://gmailblog.blogspot.com/2009/09/todays-gmail-problems.html">GMail was not operating</a> for a few hours. 
In recent news: <a href="http://googleblog.blogspot.com/2010/08/update-on-google-wave.html">Google shuts Google Wave down</a>, admitting the failure that some sceptics predicted (see the image title text).</div><div><br /></div><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://shapovalov.ro/images/comics/GMail.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 500px; " title="Those employees who were working on Google Wave as their main project used to waste up to 80% of their working time" src="http://shapovalov.ro/images/comics/GMail_preview.png" border="0" /></a>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com2tag:blogger.com,1999:blog-2293611840857369720.post-70792922728143697042010-08-19T01:49:00.008+04:002010-08-21T15:18:14.674+04:00Theorem 8.10. P ≠ NP.<div style="text-align: justify;">Nearly two weeks ago the Indian-American mathematician <a href="http://www.hpl.hp.com/personal/Vinay_Deolalikar/">Vinay Deolalikar</a>, an employee of HP Labs, sent a letter to a group of mathematicians asking them to proof-read his attempt to prove <a href="http://en.wikipedia.org/wiki/P_versus_NP_problem">P ≠ NP</a>. The pre-print was eventually distributed over the Internet, so a lot of people became aware of it. Later, two major flaws were found in the proof, so it is now considered wrong. Nevertheless, Vinay has removed the paper from his page and commented that he has fixed all the issues and is going to submit the paper for journal review. You can download the original paper from my web-site [<a href="http://shapovalov.ro/botva/pnp12pt.pdf">Deolalikar, 2010</a>]. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">We'll see whether he is bluffing or not. In any case, it is very pleasant to me that the main role in the proof is played by... the graphical probabilistic model framework. 
Yeah, those graphical models we use in computer vision! It was surprising indeed, since problems in computational complexity are usually discrete and well-posed. So, this post is devoted to understanding why a graphical model emerges in the proof.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">First, I am going to review some key points of the proof. The review is far from exhaustive, just some rough sketches that are critical for understanding. Then I report on the flaws found in the proof, and I end with a methodological note on the whole P vs. NP problem. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-size:large;"><b>k-SAT and d1RSB Phase</b></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In order to prove P ≠ NP, it is enough to show the absence of a polynomial algorithm for some problem from <a href="http://en.wikipedia.org/wiki/NP_(complexity)">the class NP</a>, e.g. for some <a href="http://en.wikipedia.org/wiki/NP-complete">NP-complete</a> problem (the "hardest" problems in NP). <a href="http://en.wikipedia.org/wiki/Boolean_satisfiability_problem">The boolean satisfiability problem</a> (SAT) asks whether there is at least one assignment of the boolean variables that makes a formula, represented in conjunctive normal form (CNF), TRUE. The k-satisfiability problem (k-SAT) is a particular case of SAT where all the clauses in the CNF have order <i>k</i>. For example, (<i>x</i>˅<i>y</i>)&(¬<i>x</i>˅<i>y</i>)&(<i>x</i>˅¬<i>y</i>)&(¬<i>x</i>˅¬<i>y</i>) is an instance of 2-SAT which has the answer 'NO', since every assignment of (<i>x, y</i>) makes the formula false. (<i>x</i>˅<i>y</i>˅<i>z</i>)&(¬<i>x</i>˅<i>u</i>˅<i>v</i>)&(¬<i>x</i>˅<i>u</i>˅¬<i>z</i>) is a 3-SAT instance over 5 variables (x, y, z, u, v), which is satisfiable, e.g. 
on the input (1,0,0,1,0). 2-SAT is in P, but k-SAT is NP-complete whenever <i>k</i> > 2. Deolalikar's idea is to show that k-SAT (with <i>k</i> > 8) is outside of P, which (if true) proves that there is a separation between P and NP.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The space of possible k-SAT solutions is analysed. Let <i>m</i> be the number of clauses in the CNF and <i>n</i> the number of variables. Consider the ratio <i>α</i> = <i>m</i>/<i>n</i>. In the border case <i>m</i> = 1 (<i>α</i> is small), the CNF is always satisfiable, and there are usually a lot of inputs that satisfy it. Consider an ensemble of random k-SAT instances (the following reasoning is statistical). As <i>α</i> grows, the probability of the CNF being satisfiable decreases. When a certain threshold <i>α<sub>d</sub></i> is reached, the set of "true" solutions breaks into clusters. This stage is known in statistical mechanics as the <i>dynamical one step replica symmetry breaking</i> (d1RSB) phase. Moving further makes the problem completely unsatisfiable. It turns out that d1RSB-stage subproblems are the most challenging of all k-SAT problems. The effect can be observed only if <i>k</i> > 8; that is why such problems are used in the proof. 
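The two toy CNF formulas earlier can be verified by brute force. Here is a minimal sketch (my own illustration, not from the paper): clauses are lists of signed 1-based literals, and the enumeration is of course exponential in the number of variables, which is exactly what a polynomial algorithm would have to avoid.

```python
from itertools import product

def satisfiable(clauses, n):
    """Brute-force SAT check. Clauses are lists of signed 1-based
    literals, e.g. [1, -2] encodes the clause (x1 or not x2)."""
    for bits in product([False, True], repeat=n):
        # a clause is satisfied if some literal matches its variable's value
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

# (x or y) & (not x or y) & (x or not y) & (not x or not y): unsatisfiable
print(satisfiable([[1, 2], [-1, 2], [1, -2], [-1, -2]], 2))   # False

# (x or y or z) & (not x or u or v) & (not x or u or not z):
# satisfied, e.g., by (x, y, z, u, v) = (1, 0, 0, 1, 0)
print(satisfiable([[1, 2, 3], [-1, 4, 5], [-1, 4, -3]], 5))   # True
```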
[<a href="http://shapovalov.ro/botva/pnp12pt.pdf">Deolalikar, 2010</a>, Chapter 6]</div><div style="text-align: justify;"><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2zkYAsZVgHZfaRucRwAA8ziIHpCaXA18l1mxwhYIKA_Isd6Wgtvp70P5TFkEd3TAOB4pAxtGATw1gd_uTdKFlQY6jMi5uzuJna6QQK6vaowjpDnS8xHdQDvxuER6tz19VZbiKtS0-oZdp/s1600/d1RSB.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 144px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2zkYAsZVgHZfaRucRwAA8ziIHpCaXA18l1mxwhYIKA_Isd6Wgtvp70P5TFkEd3TAOB4pAxtGATw1gd_uTdKFlQY6jMi5uzuJna6QQK6vaowjpDnS8xHdQDvxuER6tz19VZbiKtS0-oZdp/s400/d1RSB.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5506898457490973010" /></a><br /></div><div style="text-align: justify;"><b><span class="Apple-style-span" style="font-size:large;">FO(PFP) and FO(LFP)</span></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In finite model theory there are recursive extensions of first-order logic. The predicate P<sub><i>i</i></sub>(<i>x</i>) is evaluated as some second-order function <i>φ</i>(P<sub><i>i-1</i></sub><i>, x</i>), where x is a <i>k</i>-element vector and P<sub><i>0</i></sub>(<i>x</i>) ≡ false. In the general case, either there is a fixed point, or P<sub><i>i</i></sub> is looping. For example, if <i>φ</i>(P<i>, x</i>) ≡ ¬P(<i>x</i>), then P<sub><i>0</i></sub>(<i>x</i>) ≡ false, P<sub><i>1</i></sub>(<i>x</i>) ≡ true, P<sub><i>2</i></sub>(<i>x</i>) ≡ false, etc. Here <i>x</i> is irrelevant, but that is not always the case. Consider the following definition: <i>φ</i>(P<i>, x</i>) ≡ max(<i>x</i>) ˅ P(<i>x</i>) // recall that <i>x</i> is a vector; in the boolean case 'max' is the same as 'or'. 
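Iterating such an operator to its fixed point can be sketched in a few lines (my own toy illustration; the brute-force enumeration over all 2^k boolean vectors is only for demonstration):

```python
from itertools import product

def lfp(phi, k):
    """Iterate P_{i+1}(x) = phi(P_i, x) from P_0(x) = false over all
    boolean k-vectors until the predicate stops changing."""
    inputs = list(product([False, True], repeat=k))
    P = {x: False for x in inputs}                 # P_0(x) = false
    while True:
        P_next = {x: phi(P, x) for x in inputs}
        if P_next == P:                            # fixed point reached
            return P
        P = P_next

# phi(P, x) = max(x) or P(x): 'max' acts as 'or' on a boolean vector
fixed = lfp(lambda P, x: max(x) or P[x], 2)
print(fixed[(False, False)])   # False: P_i(0) stays false forever
print(fixed[(True, False)])    # True after the first iteration
```

For a monotone φ such as this one the iterates only grow, so the loop terminates; for the looping example φ(P, x) ≡ ¬P(x) it would never converge, which is exactly the distinction between partial and least fixed points.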
If <i>x</i> contains at least one non-zero element, P<sub><i>0</i></sub>(<i>x</i>) = false, P<sub><i>1</i></sub>(<i>x</i>) = true, P<sub><i>2</i></sub>(<i>x</i>) = true, etc. Otherwise, P<sub><i>i</i></sub>(<b>0</b>) = false for all <i>i</i>. In the case of looping, let us redefine the fixed point to be constantly false. <a href="http://en.wikipedia.org/wiki/FO_(complexity)#Partial_fixed_point_is_PSPACE"><i>FO(PFP)</i></a> is the class of problems of checking whether the iteration loops for some <i>x</i> or reaches a fixed point. These are actually very difficult problems: FO(PFP) is equal to the whole of <a href="http://en.wikipedia.org/wiki/PSPACE">PSPACE</a>. Suppose now that <i>φ</i> is monotonic in P. This means that P(<i>x</i>) only appears in the formula with an even number of negations before it (or none, as in the second example). As a consequence, once P<sub><i>i</i></sub>(<i>x</i>) is true, P<sub><i>j</i></sub>(<i>x</i>) will be true for any <i>j</i> > <i>i</i>. This restriction reduces the class to <a href="http://en.wikipedia.org/wiki/FO_(complexity)#Least_Fixed_Point_is_PTIME"><i>FO(LFP)</i></a>, which is proven to be equal to the class <a href="http://en.wikipedia.org/wiki/P_(complexity)">P</a>. So, the global problem now is to show that k-SAT is outside of the class FO(LFP).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><b><span class="Apple-style-span" style="font-size:large;">ENSP: the graphical model for proving P ≠ NP</span></b></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">So how does a graphical model emerge here? A graphical model is a way to describe a multivariate probability distribution, i.e. the dependencies of the covariates in the distribution. Recall now <a href="http://en.wikipedia.org/wiki/NP_(complexity)#Verifier-based_definition">the definition of NP</a>. If we somehow guess the certificate, we are done (i.e. we have a polynomial algorithm). 
If the space of certificates (possible solutions, in Deolalikar's terms) is simple, we can probably get a polynomial algorithm. What does simple mean? It means that the distribution is parametrizable with a small number of parameters (cf. <a href="http://en.wikipedia.org/wiki/Intrinsic_dimension">intrinsic dimensionality</a>), which allows us to traverse the solution space efficiently. This is twofold. First, the distribution is simple if it has limited support, i.e. all the probability measure is distributed among a limited number of points of the probability space. Second, it is simple if the covariates are as conditionally independent as possible. In the ideal case, if all the groups of covariates are independent (recall that <a href="http://www.morris.umn.edu/~sungurea/introstat/stat2611/independence.html">pairwise and mutual independence do not subsume each other!</a>), we need only <i>n</i> parameters to describe the distribution (<i>n</i> is the number of covariates), while in the general case this number is exponential. See [<a href="http://shapovalov.ro/botva/pnp12pt.pdf">Deolalikar, 2010</a>, p. 6] for examples.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">How does one measure the degree of conditional independence? Right, by building a graphical model. It factorizes over cliques according to the <a href="http://computerblindness.blogspot.com/2010/01/on-mrf-factorization.html">Hammersley-Clifford theorem</a>. The larger the cliques, the stronger the dependency. When the largest cliques are k-interactions, the distribution can be parametrized with <i>n</i>2<sup><i>k</i></sup> independent parameters. Finally, Vinay shows that FO(LFP) can deal only with distributions parametrizable with 2<sup>poly(log <i>n</i>)</sup> parameters, which is not the case for d1RSB k-SAT (its solution space is too complicated). 
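To get a feel for the gap between these parameter counts, here is a back-of-the-envelope sketch (my own illustration; the function names are made up):

```python
def general_params(n):
    """Free parameters of an arbitrary distribution over n binary
    covariates: one probability per joint state, minus normalization."""
    return 2 ** n - 1

def clique_params(n, k):
    """Rough parameter count when the largest cliques are k-interactions:
    about n potential tables, each over 2^k joint states."""
    return n * 2 ** k

# n = 20 binary covariates: over a million free parameters in general,
# but only 160 when the largest cliques contain k = 3 variables.
print(general_params(20), clique_params(20, 3))
```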
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In order to show this strictly, Deolalikar introduces a tailored graphical model describing the FO(LFP) iterative process, the <i>Element-Neighborhood-Stage Product</i> model (ENSP):</div><div style="text-align: justify;"><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLyRk_YaHfh27f8iSPHXhtBdyTBJ9ATQMykl9rpLJPKQLVtZ2HstpN9WDPLt6r5YqPOUcYgO27Qh2LxxOYXdx26K0vyTYUh2wQg4IycKGk8nI4rm1P25YXjSxYGKOmrFOLbEY8lS8l-r0E/s1600/ENSP.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 300px; height: 400px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLyRk_YaHfh27f8iSPHXhtBdyTBJ9ATQMykl9rpLJPKQLVtZ2HstpN9WDPLt6r5YqPOUcYgO27Qh2LxxOYXdx26K0vyTYUh2wQg4IycKGk8nI4rm1P25YXjSxYGKOmrFOLbEY8lS8l-r0E/s400/ENSP.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5507040396001248802" /></a>There are two types of sites: those corresponding to the variables at the stages of the LFP computation (small circles), and those corresponding to the elements of a witty neighbourhood system (some closure over <a href="http://en.wikipedia.org/wiki/Gaifman_graph">the Gaifman graph</a>; big blue circles). When a variable is assigned 0 or 1, it is painted red or green, respectively. The last column corresponds to the fixed point, where all the variables are assigned. Thus, this model factorizes easily into small cliques, and it cannot describe a hypothetical LFP process for such a k-SAT instance. 
See [<a href="http://shapovalov.ro/botva/pnp12pt.pdf">Deolalikar, 2010</a>, Chapter 8] for the details.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-size:large;"><b>So what's the problem?</b></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><a href="http://www.cs.umass.edu/~immerman/">Neil Immerman</a>, an expert in finite model theory, noticed two major flaws. The first one is that Vinay actually models only monadic LFP, which is not equal to P, as assumed. He thus proved that there is a gap between NP and some subclass of P, which does not necessarily equal P itself. The second major flaw deals with modelling k-SAT as a logical structure. I do not fully understand this step, but some order-invariance assumptions are wrong. Here is a discussion in <a href="http://rjlipton.wordpress.com/2010/08/15/the-p%E2%89%A0np-proof-is-one-week-old/">Lipton's blog</a>. According to it, the flaws are really fatal.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-size:large;"><b>The third way</b></span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It seems that it is really difficult to prove P ≠ NP. Most scientists think that P = NP is improbable, and that is why the hypothesis P ≠ NP is generally adopted now. But there is one more option: neither P = NP nor P ≠ NP is provable. Like every mathematical theory, computational complexity is a set of axioms and statements, which usually can be proved correct or incorrect. But sometimes there are statements, formulated in terms of the theory, which cannot. 
Moreover, according to <a href="http://en.wikipedia.org/wiki/Goedel's_theorem#First_incompleteness_theorem">Gödel's first incompleteness theorem</a>, in any mathematical theory based on our natural understanding of natural numbers, there is either a statement that can be proved both true and false, or an unprovable one. I know of no actual reason why this could not be the case for P ≠ NP (although I have a feeling there is one, since this possibility is not usually taken into account). </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Suppose it is really so. This would mean that the whole theory should be reconsidered. Some axioms or definitions would change to make this fact provable. But maybe something else would become unprovable. Who knows...</div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com3tag:blogger.com,1999:blog-2293611840857369720.post-89065014239183361642010-08-15T22:38:00.002+04:002010-08-16T09:35:32.513+04:00Image Parsing: Unifying Segmentation, Detection, and Recognition<div style="text-align: justify;"><a href="http://computerblindness.blogspot.com/2010/06/object-detection-vs-semantic.html">Recently I have posted</a> about the connection between object category detection and semantic image segmentation. While delving into this problem more deeply, I found the paper named in the title of this post.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The ICCV 2003 paper by Tu et al. was among the three <a href="http://en.wikipedia.org/wiki/Marr_Prize#9th_ICCV.2C_2003.2C_Nice.2C_France">Marr prize</a> winners. And it is really a prominent piece of work! 
In <a href="http://resources.metapress.com/pdf-preview.axd?code=t5r052375jr10776&size=largest">the introduction</a> to <a href="http://www.springerlink.com/content/0920-5691/63/2/">the special Marr prize IJCV issue</a>, <a href="http://ljk.imag.fr/membres/Bill.Triggs/////////">Bill Triggs</a> featured the paper as one that "would have been particularly pleasing to <a href="http://en.wikipedia.org/wiki/David_Marr_(neuroscientist)">David Marr</a>", because of "its bold attack on the central problem of perceptual organization and whole scene understanding". <a href="http://www.loni.ucla.edu/~ztu/publication/IJCV_parsing.pdf">Here</a> is the journal version of the paper.</div><div style="text-align: justify;"><br /></div><div><div style="text-align: justify;">The authors combined discriminative and generative models, which resulted in a unified framework for image parsing. For each image, a corresponding parsing tree is found. The root of the tree represents the whole scene. On the next level, nodes represent semantic fragments, such as human faces, text, or textured regions. The leaves of the tree correspond to the pixels of the image. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The approach differs drastically from the vanilla CRF framework in that the structure of the parsing tree is dynamic, while the CRF structure remains constant and only the interaction models between pairs of sites may change. The goal is to obtain the tree that maximizes the posterior probability given the image. A directed search in the space of valid trees is performed by means of <a href="http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo">Markov chain Monte-Carlo</a> (MCMC). Possible tree changes, such as splitting and merging regions and varying the border between regions, are defined. 
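The MCMC search over trees can be sketched with a generic Metropolis-Hastings step. Everything below is my own schematic stand-in: a toy integer state plays the role of a parsing tree, and a hand-made symmetric proposal stands in for the paper's transition kernels.

```python
import math
import random

def mh_step(state, propose, log_posterior, rng):
    """One Metropolis-Hastings step. `propose(state)` returns a candidate
    and the log proposal ratio log q(state | cand) - log q(cand | state)."""
    cand, log_q_ratio = propose(state)
    log_alpha = log_posterior(cand) - log_posterior(state) + log_q_ratio
    if math.log(rng.random()) < min(0.0, log_alpha):
        return cand   # accept the move (think: a split or merge of regions)
    return state      # reject: keep the current tree

# Toy stand-in for a parsing tree: an integer "number of regions", with a
# posterior peaked at 3 regions; the proposal is a symmetric split/merge
# (+1 or -1), so its log proposal ratio is 0.
rng = random.Random(0)
state = 10
for _ in range(1000):
    state = mh_step(state,
                    lambda s: (s + rng.choice([-1, 1]), 0.0),
                    lambda s: -abs(s - 3),
                    rng)
print(state)  # typically close to the posterior mode 3
```

The real algorithm replaces the toy proposal with learnable, data-driven kernels, but the accept/reject skeleton is the same.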
Such changes are described in terms of learnable Markov chain transition kernels.</div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">What they actually did is effectively combine top-down and bottom-up approaches. In a nutshell, the bottom-up (discriminative) method generates hypotheses of pixel labels and object positions using the local neighbourhood, and the top-down generative model is then built making use of those hypotheses. The latter guarantees the consistency of the output and the optimality of its posterior probability.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Okay, what does this mean for the issue of detection and segmentation convergence? Because the shapes of objects are determined implicitly during the recognition, and stuff regions are detected by their colour and texture, the problem of semantic image segmentation is actually solved. Moreover, <i>multi-scale semantic segmentation</i> is performed during the parsing. Therefore, image parsing seems to be the most general formulation of the image recognition problem.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The nodes of a parsing tree that belong to the same depth level are not connected (naturally, because it is a tree). This means that spatial interactions cannot be modelled directly. A totally different approach was introduced in another Marr prize winning paper, by <a href="http://www.ics.uci.edu/~fowlkes/papers/drf-iccv09.pdf">Desai et al [2009]</a>. They also use bottom-up tests to generate hypotheses on the object locations and then use a CRF over those locations to take spatial context into account. They model different kinds of inter-class and intra-class interactions. Thus, they ensure that the objects in the image (described by their bounding boxes) are arranged correctly, e.g. 
a cup is likely to be on a table (spatial arrangement), while a train cannot be close to a ship (mutual exclusion).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It seems that the two papers exploit two different forms of information (aggregation vs. spatial interactions), which are mutually redundant to a certain extent but do not completely exhaust each other. I suspect that the group of researchers who successfully combines those approaches will receive some 201x Marr prize.</div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com1tag:blogger.com,1999:blog-2293611840857369720.post-67542306779915328762010-07-28T00:41:00.010+04:002010-07-29T01:51:43.193+04:00ICVSS 2010: some trivia<div style="text-align: justify;">If you think that summer schools are only about lectures and posters, you are certainly wrong. Well, I also did not expect that I would get drunk every night (it was not in Russia after all!). On the first night I convinced people to go to the beach for night swimming, but the next day I realized that it was not a really good idea, because lectures started at 9. Eventually, we did not go to bed early for the rest of the week.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I would like to thank the "Marsa Sicla" people (Fig. 1), you were great company!</div><div style="text-align: center;"><br /></div><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8PsiyXinujvBWpU0YifRxfnsEXVzCykxSB3Y8lstOr66XK7xQUbjaqTGfJ0L4R882Ef2u5mysLKCGVsQgvVoqKrG7tn5t0wiJbLkZCVtbws_ooRdWHduWBMedFt0sFqqNHdmZkGUKY08_/s400/MSpeople.jpg" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 235px;" border="0" alt="" id="BLOGGER_PHOTO_ID_5499051300018626354" /><div style="text-align: center;"><span class="Apple-style-span" style="font-size:small;">Fig. 1. The Marsa Sicla company. 
Left to right: </span><a href="http://flavors.me/robotblackdebbie"><span class="Apple-style-span" style="font-size:small;">Richard</span></a><span class="Apple-style-span" style="font-size:small;">, </span><a href="http://www.cs.ucf.edu/~ramin/"><span class="Apple-style-span" style="font-size:small;">Ramin</span></a><span class="Apple-style-span" style="font-size:small;">,<br /></span><a href="http://www.lmt.ei.tum.de/team/alt/"><span class="Apple-style-span" style="font-size:small;">Nicolas</span></a><span class="Apple-style-span" style="font-size:small;">, </span><a href="http://www.facebook.com/selinachenxi?ref=ts#!/selinachenxi"><span class="Apple-style-span" style="font-size:small;">Xi</span></a><span class="Apple-style-span" style="font-size:small;">, </span><a href="http://www.stanford.edu/~vijayc/"><span class="Apple-style-span" style="font-size:small;">Vijay</span></a><span class="Apple-style-span" style="font-size:small;">, </span><a href="http://www.limsi.fr/Annuaire/infos?id=cchastag"><span class="Apple-style-span" style="font-size:small;">Clément</span></a><span class="Apple-style-span" style="font-size:small;">, </span><a href="http://www.linkedin.com/pub/aishwarya-natesh/13/845/103"><span class="Apple-style-span" style="font-size:small;">Aish</span></a><span class="Apple-style-span" style="font-size:small;">, </span><a href="http://shapovalov.ro/"><span class="Apple-style-span" style="font-size:small;">me</span></a><span class="Apple-style-span" style="font-size:small;">. </span><a href="http://www.ktl.elf.stuba.sk/portal/?a=17&s=222"><span class="Apple-style-span" style="font-size:small;">Sandra</span></a><span class="Apple-style-span" style="font-size:small;"> is behind the camera.</span></div><div style="text-align: center;"><br /></div><div style="text-align: justify;">Actually, there was a reason why we had such a close-knit company. In Fig. 2 you can see a map of the region where the school took place. 
The school was hosted by the Baia Samuele hotel village (inside the blue frame on the map). The cheapest accommodation at Baia Samuele cost €100 per night, so some pragmatic people scared up different accommodation. It was in a neighbouring hotel village, Marsa Sicla. </div><div style="text-align: justify;"><br /></div><div style="text-align: center;"><iframe width="550" height="700" frameborder="0" scrolling="no" marginheight="0" marginwidth="0" src="http://maps.google.com/maps/ms?ie=UTF8&hl=en&t=k&msa=0&msid=114318388347498250868.00048c655613735765d52&ll=36.725023,14.753973&spn=0.012039,0.011802&z=16&output=embed"></iframe><br /><small>Fig. 2. ICVSS location map. View <a href="http://maps.google.com/maps/ms?ie=UTF8&hl=en&t=k&msa=0&msid=114318388347498250868.00048c655613735765d52&ll=36.725023,14.753973&spn=0.012039,0.011802&z=16&source=embed" style="color:#0000FF;text-align:left">ICVSS_2010_MSvsBS</a> in a larger map</small></div><div style="text-align: justify;"></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">On a map it looked pretty close, but in practice it turned out that there were a lot of fences (blue line)! Officially, we had to walk along the red route (~2.5 km, partly along a highway) through the official entrance. Moreover, we had to leave the village for lunch (since there was a buffet), otherwise we would have been charged €30 per meal. To enforce this, they wanted us to leave our documents at the control post (bottom blue tick in Fig. 2). Thus, we were supposed to walk 2.5 km four times a day. But we found a better solution.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"></div><div style="text-align: justify;">Although the fence was equipped with barbwire, we managed to find two shortcuts (green ticks). One of them led through an ever-closed gate (Fig. 3), which was suitable for hopping over. So, we usually used the green path. It made the way half as long. 
Moreover, we did not have to leave Baia Samuele for lunch, since we weren't officially there. Sometimes we even had a free lunch, which finally proved that <a href="http://en.wikipedia.org/wiki/No_free_lunch_theorem">the no free lunch theorem</a> is wrong (joke by Ramin).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFB-awuQB3r1Tepp2RscwBon9xJBbXQnbeaOF9eLdWqp2ghUGivoANBKUev-qzrQ076hyDbafajILAv903rUIC8fLchbO96BsCjcPRgRkhvIzhFpZEQj1bPvKy5UoBLmUTqJQZ1s_dRRgX/s1600/IMG_0271.jpg"><img style="text-align: justify;display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; cursor: pointer; width: 400px; height: 300px; " src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFB-awuQB3r1Tepp2RscwBon9xJBbXQnbeaOF9eLdWqp2ghUGivoANBKUev-qzrQ076hyDbafajILAv903rUIC8fLchbO96BsCjcPRgRkhvIzhFpZEQj1bPvKy5UoBLmUTqJQZ1s_dRRgX/s400/IMG_0271.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5499063929594125762" /></a></div><div style="text-align: center;"><span class="Apple-style-span" style="font-size:small;">Fig. 3. Hopping over the fence. </span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-size:medium;"><br /></span></div><div style="text-align: justify;">Well, the lunch was really great. But it was not really a question of food: the main reason not to leave Baia Samuele was, of course, socialization, which was most active during lunch (and also the Internet access at the conference centre :). If we had known about this situation before booking the apartments, we would probably have followed the official accommodation recommendations. However, we had great company at Marsa Sicla too. But because of the "unofficial night programme" I had to (almost) sleep during lectures. 
Most of the guys were in the last year of their PhD, so they treated the school as a vacation. But I really wanted to learn a lot of stuff. It was really difficult to take in any information without enough sleep. Surprisingly, I somehow managed to pass the final exam, but I feel I need to look through the lecture slides again, reading up on some papers they refer to; otherwise the scientific part of the school will be useless for me. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"></div><div style="text-align: justify;">Thus, I finish posting about ICVSS. To conclude, I recommend that everyone visit such summer schools, because they are the best way to enter the community. </div>Anonymoushttp://www.blogger.com/profile/01422787855469629259noreply@blogger.com2