
## X Bridges of Königsberg

A recent Spiked Math comic strip proposes some cheats for building an Eulerian path over the famous bridges of Königsberg, which was originally impossible. Here is the strip:

It turns out that in practice another solution worked out (in fact, a mix of the engineer's and the mythbuster's solutions). You can ruin the city during a world war and then hand it over to the Russians, who will build new bridges instead. I visited Kaliningrad (the city has been renamed as well) two years ago, but did not try to solve Euler's problem in the field. Let's find out whether it is possible now. Look at the modern map:


So, the modified Spiked Math’s scheme looks as follows (the new bridges are shown in light blue):

Is there an Eulerian path nowadays, i.e. can you walk through the city and cross each bridge once and only once? Recall the theorem:
A connected graph allows an Eulerian path if and only if it has no more than two vertices of an odd degree.
In the picture above, two pieces of land have an even number of incident bridges, while the Kneiphof island in the center has degree 3 and the left bank (at the bottom) has degree 5. This means that Eulerian paths now exist: any such path must begin on the left bank and end on the island, or vice versa. It may be a good idea to finish your journey near the grave of Immanuel Kant. :) However, there is no Eulerian circuit, i.e. no closed Eulerian path.
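The degree check above is easy to automate. Here is a minimal sketch; the bridge list is hypothetical, chosen only to reproduce the degrees quoted in the text (island 3, left bank 5, the other two land masses even), not the actual bridge inventory:

```python
from collections import Counter

def odd_degree_vertices(bridges):
    """Return the set of odd-degree vertices of an undirected multigraph.
    In a connected graph, an Eulerian path exists iff this set has size
    0 or 2, and any such path must start and end at the odd vertices."""
    degree = Counter()
    for u, v in bridges:
        degree[u] += 1
        degree[v] += 1
    return {v for v, d in degree.items() if d % 2 == 1}

# Hypothetical bridge list matching the degrees described in the text.
bridges = [
    ("north", "island"), ("left", "island"), ("island", "east"),
    ("north", "east"), ("north", "left"), ("north", "left"),
    ("east", "left"), ("east", "left"),
]
print(odd_degree_vertices(bridges))  # the two odd vertices: island and left bank
```

Any Eulerian path must start at one of the returned vertices and end at the other (assuming the graph is connected, which the function does not check).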

PS. 1. If you zoom out of the map above, you'll see one more piece of land and two more bridges to the east. However, it only adds a vertex of degree 2 and swaps the parity of the banks, so the properties of the new graph are similar; you just need to start/finish your journey on the right bank rather than the left one (and walk 10 km more).
2. When I was preparing this article, I found a similar analysis. It used an old map, so the Jubilee and the Second Estacade bridges were not marked (they were built in 2005 and 2012, respectively), and it also ignored the Reichsbahnbrücke (a railroad bridge, but it seems pedestrian-accessible).
3. It would be more interesting to analyze the bridge graphs of other cities with more islands, like St. Petersburg (several hundred bridges) or Amsterdam (numerous bridges and canals, but with more regular structure).
4. Here is Euler's original paper (in Latin) with his drawings of the schemes above.

About a month ago I visited CERN, arguably the most famous research organization in Europe. It is the place where the World Wide Web was invented, and the home of the Large Hadron Collider. I was very excited about the trip, much like Sheldon Cooper. Here are some facts that were new to me:

• CERN was founded after WWII by a group of European countries. They were exhausted by the war, and the only way to catch up with the USA and the USSR in fundamental science was to join forces.
• The original name was Conseil Européen pour la Recherche Nucléaire (European Council for Nuclear Research), which was abbreviated to CERN. Later the name was officially changed to the European Laboratory for Particle Physics, which is both more accurate and less frightening for the locals; however, the brand CERN is now used even in official documents.
• There are almost 3,000 full-time employees, but most of them are engineers rather than scientists. There are many visiting researchers, though.
• There are now 20 member states (primarily EU states) and 6 observer states (such as Russia and the USA).
• CERN's annual budget is about €1 billion; it is funded by the member states in proportion to their economic power, e.g. Germany contributes 20% of the money.
• The budget money is spent on infrastructure and support; the individual experiments are funded by research groups and their universities.
• Although the USA is not a member state, it leads in the number of researchers working on CERN projects (more than a thousand); Germany is second and Russia third (yes, we are still in good shape in particle physics). It seemed that everybody at CERN spoke Russian, even the janitor. :)

• It is in fact a circular tunnel, 27 km in circumference, lying 175 m underground.
• The tunnel predates the LHC: it was built in 1983 for the Large Electron-Positron Collider. In 2008 it was upgraded to accelerate heavy particles like protons and became the Large Hadron Collider (recall that protons and neutrons are roughly 1,800 times heavier than electrons).
• The tunnel is about 4 meters in diameter; one can walk or even ride a bike through it.
• The tunnel encapsulates two small pipes for the particles, which intersect at four points (to make the track lengths equal, as in speed-skating arenas). There are more than a thousand dipole electromagnets along the pipes. They are not as big as I had imagined.
• Inside the tubes it is nearly a vacuum, at a temperature close to absolute zero.
• Proton beams are not generated inside the LHC. First they are accelerated in a linear accelerator and almost reach the speed of light c. As the speed rises, it becomes harder and harder to increase it further, since it cannot exceed the speed of light. Then the protons are accelerated in a smaller circular accelerator, and only after that are they injected into the LHC, where over about 40 minutes the beams are accelerated to a speed as close to c as possible.
• Once accelerated, the beams are observed for about 10 hours. They undergo about 10,000 collisions per second, with about 20 pairs of protons colliding each time. Since the speed is so high, the energy of these collisions is enormous.
• The collisions take place at special locations called detectors. We visited the control centre of one of them, ATLAS. A detector has multiple layers, each able to register a certain kind of particle, such as photons.
• Ten thousand collisions per second would yield a really big amount of data, so only a few of them are selected to be logged. I don't know how they select those collisions; machine learning might be used. :)
• All the collected data are distributed across servers all over the world. An authorized researcher may log in to the grid network and execute her script to analyse the data.
• The Higgs boson is a hypothetical particle whose existence would confirm the Standard Model of particle physics.
• The Higgs boson appears as a result of a collision of two protons carrying a large amount of energy. The protons in the LHC are accelerated enough to produce theoretically sufficient energy.
• The Higgs boson is very heavy and thus unstable. In theory, it decays into either four muons or two photons. So if two counter-directed light beams are registered by the detector, this will be evidence of the boson. See the picture below for an example of the likely detector output in case the boson shows up.
• Scientists say that if the Higgs boson is not detected, all modern knowledge of particle physics will collapse, and they will be obliged to develop a new theory from scratch.
• There are no published results reporting a Higgs boson detection so far...
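As an aside, the point above about speeds approaching c can be made quantitative with the Lorentz factor. A small sketch; the 7 TeV figure is the LHC design beam energy per proton (an assumption, not a number from the tour), and 938 MeV is the standard proton rest energy:

```python
PROTON_REST_ENERGY_GEV = 0.938272  # proton rest mass energy, in GeV

def lorentz_gamma(total_energy_gev):
    """gamma = E / (m c^2) for a particle with the given total energy."""
    return total_energy_gev / PROTON_REST_ENERGY_GEV

def beta(gamma):
    """Speed as a fraction of c: v/c = sqrt(1 - 1/gamma^2)."""
    return (1.0 - 1.0 / gamma**2) ** 0.5

g = lorentz_gamma(7000.0)  # 7 TeV design beam energy
print(round(g))            # ~7460: each proton carries ~7460 times its rest energy
print(f"{beta(g):.9f}")    # 0.999999991: pouring in more energy barely moves the speed
```

This is why acceleration gets "harder and harder": the energy grows without bound while the speed saturates just below c.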

Don't you want to be a theoretical physicist now? =)

## Papers, citations, co-authorship, and genealogy

This summer I accidentally found out that there are two papers citing our CMRT 2009 paper. I was a bit excited about that, since they were the first actual citations of my work, so I even read those papers.

The first of them [Димашова, 2010] was published in Russian by OpenCV developers from Nizhny Novgorod. They implemented a cascade-based face detection algorithm that uses either Local Binary Patterns (LBP) or Haar features. The algorithm was released with OpenCV 2.0. They cite our paper as an example of using Random Forests in the stages of the cascade; however, they implemented the classical variant with a cascade over AdaBoost.

The other citing paper [Sikiric et al., 2010] is more relevant, since it comes from the road-mapping community. They address the problem of recovering a road-appearance mosaic from orthogonal views of the surface. They contrast their approach with ours in that theirs does not employ human interaction. In fact, we need human input only for recognizing road defects and lane markings; rectification is done in a previous stage, which is fully automatic.

The rest of the post is devoted to some interesting metrics and structures concerning citations, co-authorship and supervising students.

Citations

The number of citations is a weak measure of paper quality. We can also go further and estimate the impact of a journal or an individual researcher from citation counts. A generally accepted measure is the journal impact factor, which is simply the mean number of citations per paper published in the journal over some period of time. Individual researchers can be evaluated by the impact factor of the journals they publish in, though this is considered bad practice. Another not-so-bad practice is the h-index. By definition, one's h-index equals H if each of one's top H papers has at least H citations1. It has also been widely criticised.
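The h-index definition translates directly into code. A minimal sketch with made-up citation counts:

```python
def h_index(citations):
    """Largest H such that at least H papers have >= H citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank  # the paper at this rank still has enough citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations each
print(h_index([100]))             # 1: one huge hit still gives h = 1
```

The second example shows one of the usual criticisms: a single highly cited paper barely moves the index.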

So citation-based scoring has a lot of flaws, but what can we use instead? An interesting alternative is introduced by ReaderMeter. They collect information about the number of people who have added a paper to their Mendeley collections and compute something like an h-index. Unfortunately, they recently excluded some papers from their database, so the statistics became less representative but more accurate.

Co-authorship and Erdős number

Paul Erdős was a Hungarian mathematician who published about 1,400 research papers with 511 different co-authors. That is why he has a special role in bibliometrics. The Erdős number is defined as the collaborative distance from a researcher to Erdős. More strictly, Wikipedia defines:
Paul Erdős is the one person having an Erdős number of zero. For any author other than Erdős, if the lowest Erdős number of all of his coauthors is k, then the author's Erdős number is k + 1.
My Erdős number is at most 7 (via Olga Barinova, Pushmeet Kohli, Philip H.S. Torr, Bernhard Schölkopf, John Shawe-Taylor, David Godsil). To be honest, the Erdős number system is primarily used for math papers. Even if we use the wider definition, it is not that beautiful, because there are not too many gateways; e.g. the whole vision community will probably connect to Erdős via the machine-learning gateway, so researchers who work on the boundary will be closer. To illustrate this: David Marr probably had no finite Erdős number during his lifetime (now he definitely has one). So we can introduce, say, a Zisserman number for computer vision.2 According to DBLP, Andrew Zisserman has 165 direct co-authors so far. My Zisserman number is 3; David Marr's is 4 (via Tomaso Poggio, Lior Wolf, Yonatan Wexler).
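Computing such a collaboration distance is just a breadth-first search over the co-authorship graph. A sketch on a toy graph built from the chain above (treating the chain, unrealistically, as the whole graph):

```python
from collections import deque

def collaboration_distance(coauthors, source, target):
    """Shortest path length in an undirected co-authorship graph,
    or None if the two researchers are not connected."""
    if source == target:
        return 0
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        person, dist = queue.popleft()
        for coauthor in coauthors.get(person, ()):
            if coauthor not in seen:
                if coauthor == target:
                    return dist + 1
                seen.add(coauthor)
                queue.append((coauthor, dist + 1))
    return None

# Toy adjacency sets built from the chain quoted in the text.
chain = ["me", "Barinova", "Kohli", "Torr", "Schoelkopf",
         "Shawe-Taylor", "Godsil", "Erdos"]
graph = {}
for a, b in zip(chain, chain[1:]):
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(collaboration_distance(graph, "me", "Erdos"))  # 7
```

On the real DBLP co-authorship graph the same search would also confirm whether 7 is tight or just an upper bound.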

The movie industry has its own metric, the Bacon number (after Kevin Bacon). Two actors are at distance 1 if they have appeared in the same movie. Someone combined the two into the Erdős-Bacon number: for a given person, it is simply the sum of her Erdős and Bacon numbers. Of course, very few people have a finite Erdős-Bacon number, since one should both appear in a movie and publish a research paper (and even that is not sufficient). Often it is possessed by researchers who consulted a film crew and were accidentally filmed. =) Erdős himself has an Erdős-Bacon number of 3 or 4 (depending on the details of the definition), since he starred in N is a Number (1993), and his Bacon number is thus 3 or 4.

The person with probably the lowest Erdős-Bacon number is Daniel Kleitman, an MIT mathematician who appeared in one of my favourite movies, Good Will Hunting (1997), along with Minnie Driver, who worked with Bacon in Sleepers (1996). Since Kleitman has six joint papers with Erdős, his Erdős-Bacon number is 3.

Surprisingly, Marvin Minsky's Erdős number (4) is greater than his Bacon number (2), which he obtained via The Revenge of the Dead Indians (1993) and Yoko Ono. Another curious example is Natalie Portman. She graduated from Harvard, saying she would rather "be smart than a movie star". That's my kind of girl! A neuroscience paper [Baird et al., 2002] gave Natalie Hershlag (her real name) an Erdős number of 5, and then she appeared in New York, I Love You (2009) along with Kevin Bacon, so she reached the same Erdős-Bacon number as Minsky, i.e. 6.

UPD (March 13, 2011). There is a totally relevant xkcd strip.

Scientific genealogy

Finally, let me tell you about what is known as scientific genealogy. Every grown-up researcher has a PhD advisor (usually one), while an advisor can have many students. Using the analogy with parents and children, we get a tree (or a forest) representing the historical structure of science. The Mathematics Genealogy Project aims to recover this structure.

I tried to trace my genealogy back. Unfortunately, I failed to find out who Yury M. Bayakovsky's PhD advisor was. But if I consider Olga Barinova my advisor, I am an 11th-generation descendant of Carl Friedrich Gauß. Nice ancestry, huh?

A similar project exists for computer vision. There are 290 people in the database, though there are duplicates (I found five Vittorio Ferraris :). It seems strange that some trees are as deep as 5 (e.g. Kristen Grauman and Adriana Quattoni are the 5th generation after David Marr), though vision is a relatively young field.

1 Okay, take the supremum of the set if you want to stay formal
2 It seems that Philip Torr has already used that number.

## ECCV 2010 highlights

Today is the last day of ECCV 2010, so the best papers have already been announced. Although I was not there, a lot of papers are available via cvpapers, so I eventually ran into some of them.

Best papers

The full list of the conference awards is available here. The best paper is "Graph Cut based Inference with Co-occurrence Statistics" by Lubor Ladicky, Chris Russell, Pushmeet Kohli, and Philip Torr. The runner-up is "Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics" by Abhinav Gupta, Alyosha Efros, and Martial Hebert. Two thoughts before reviewing them. First, the papers come from the two institutions that are (arguably) considered the leading ones in the vision community: Microsoft Research and the Carnegie Mellon Robotics Institute. Second, both papers are about semantic segmentation (although the latter couples it with implicit geometry reconstruction); Vidit Jain has already noted the acceptance bias in favour of recognition papers.

Okay, the papers now. Ladicky et al. address the problem of global terms in energy minimization for semantic segmentation. Specifically, their global term depends only on the occurrences of object classes and is invariant to how many connected components (i.e. objects) or individual pixels represent a class. Therefore one cow in an image contributes to the global term exactly as much as two cows, or as a single stray cow pixel. The global term penalizes having many different categories in a single image (an MDL prior), which is helpful when we are given a large set of possible class labels, and also penalizes the co-occurrence of classes that are unlikely to appear together, like cows and sheep. The statistics are collected from the training set and determine whether the co-occurrence of a certain pair of classes should be encouraged or penalized. Although the idea of incorporating co-occurrence into the energy function is not new [Torralba et al., 2003; Rabinovich et al., 2007], the authors claim that their method is the first to satisfy four conditions simultaneously: global energy minimization (a true global term rather than a multi-stage heuristic process), invariance to the structure of classes (see above), efficiency (the model should not become orders of magnitude larger), and parsimony (the MDL prior, see above).
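The invariance described above is easy to illustrate: the global term looks only at the set of labels present, never at pixel or object counts. A toy sketch with made-up penalty values (the actual paper learns the costs from training-set statistics and minimizes the full energy with graph cuts; this only evaluates the global term):

```python
from itertools import combinations

def global_label_cost(labeling, mdl_penalty, cooccurrence_cost):
    """Global term C(L): depends only on the SET of classes present.
    An MDL penalty per distinct class, plus a cost for each unordered
    pair of classes that co-occur in the image."""
    present = set(labeling)
    cost = mdl_penalty * len(present)
    for a, b in combinations(sorted(present), 2):
        cost += cooccurrence_cost.get((a, b), 0.0)
    return cost

costs = {("cow", "sheep"): 5.0}  # cows and sheep rarely appear together

one_cow   = ["grass"] * 99 + ["cow"]                        # one stray cow pixel
two_cows  = ["grass"] * 60 + ["cow"] * 40                   # many cow pixels
cow_sheep = ["grass"] * 60 + ["cow"] * 20 + ["sheep"] * 20  # unlikely pair

print(global_label_cost(one_cow,   1.0, costs))  # 2.0
print(global_label_cost(two_cows,  1.0, costs))  # 2.0, same: count-invariant
print(global_label_cost(cow_sheep, 1.0, costs))  # 8.0 = 3 * 1.0 + 5.0
```

The first two labelings get identical costs despite wildly different pixel counts, which is exactly the structure-invariance the authors claim.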

How do the authors minimize the energy? They restrict the global term to a function of the set of classes present in the image that is monotonic with respect to set inclusion (more classes, more penalty). Then they introduce fake nodes into the αβ-swap or α-expansion procedure so that the optimized energy remains submodular. It is very similar to how they applied graph-cut techniques to minimizing energies with higher-order cliques [Kohli, Ladicky and Torr, 2009]. So when you face non-local terms in an energy, you can try something similar.

What are the shortcomings of the method? It would be great to penalize objects in addition to classes. First, local interactions are taken into account, as well as global ones, but what about the intermediate level? Object shape, size, colour consistency, etc. are great cues. Second, at the global level only inter-class co-occurrences play a role, but what about intra-class ones? It is impossible to have two suns in a photo, but it is likely to meet several pedestrians walking along a street. This has actually been done by Desai et al. [2009] for object detection.

The second paper, by Gupta et al., revisits the romantic period of computer vision, when scenes composed of perfect geometrical shapes were successfully reconstructed. The authors address the problem of 3D reconstruction from a single image, as in automatic photo pop-up [Hoiem, Efros and Hebert, 2005]. They compare the result of auto pop-up with a Potemkin village: "there is nothing behind the pretty façade" (I believe this comparison is the contribution of the second author). Instead of surfaces, they fit boxes to the image, which allows them to impose a wider range of constraints on the 3D structure, including:
• static equilibrium: it seems that the only property they check here is that the centroid projects into the supporting region;
• sufficient support force: they estimate density (light for vegetation, medium for humans, heavy for buildings) and note that a building is unlikely to be built on a tree;
• volume constraints: boxes cannot intersect;
• depth ordering: back-projecting the result onto the image plane should correspond to what we see in the image.
This is a great paper that exploits Newtonian mechanics as well as human intuition; however, there are still some heuristics (like the density of a human) which could probably be generalized away. The approach seems to have big potential, so it might become the seminal paper of a new direction. Composing recognition with geometry reconstruction is quite trendy now, and this method is ideologically simple but effective. There are a lot of examples of how the algorithm works on the project page.

Funny papers

There are a couple of ECCV papers with fancy titles. The first one is "Being John Malkovich" by Ira Kemelmacher-Shlizerman, Aditya Sankar, Eli Shechtman, and Steve Seitz from the University of Washington GRAIL. If you've seen the movie, you can guess what the article is about. Given a video of someone pulling faces, the algorithm transforms it into a video of John Malkovich making similar faces. "Ever wanted to be someone else? Now you can." In contrast to the movie, in the paper it need not be John Malkovich playing himself: it could be George Bush, Cameron Diaz, George Clooney, or indeed any person for whom you can find a sufficient video or photo database! You can see a video of real-time puppetry on the project page, although there are obvious lags and the result is still far from perfect.

Another fancy title is "Building Rome on a Cloudless Day". There are 11 (eleven!) authors on the paper, including Marc Pollefeys. This summer I spent a cloudless day in Rome and, to be honest, it was not that pleasant. So why is the paper called this way? It refers to another paper, "Building Rome in a Day" from ICCV 2009, by the guys from Washington again, which itself refers to the proverb "Rome was not built in a day." In that paper the authors built a dense 3D model of some Roman sights from a set of Flickr photos tagged "Rome" or "Roma". Returning to the monument of collective intelligence from ECCV 2010: they did the same, but without cloud computing, which is why the day is now cloudless. S.P.Q.R.

I cannot avoid mentioning the following papers here, although they are not from ECCV. Probably the most popular CVPR 2010 paper is "Food Recognition Using Statistics of Pairwise Local Features" by Shulin Yang, Mei Chen, Dean Pomerleau, and Rahul Sukthankar. The first page of the paper contains a motivational picture with a hamburger, and it looks pretty funny. They insist that the food from McDonald's is very different from that from Burger King, and that it is really important to distinguish them to keep track of calories. Well, the authors don't look overweight, so the method should work.

The last paper in this section is "Paper Gestalt" by the imaginary Carven von Bearnensquash, published in the Secret Proceedings of Computer Vision and Pattern Recognition (CVPR), 2010. The authors (presumably from UCSD) make fun of the way we usually write computer vision papers, assuming that certain features might convince a reviewer to accept or reject a paper: mathematical formulas that create an illusion of author qualification (even if they are irrelevant), ROC curves, etc. It also derides attempts to apply black-box machine-learning techniques without appropriate analysis of the possible features. Now I am trying to subscribe to the Journal of Machine Learning Gossip.

Colleagues

There was only one paper from our lab at the conference: "Geometric Image Parsing in Man-Made Environments" by Olga Barinova and Elena Tretiak (co-authored with Victor Lempitsky and Pushmeet Kohli). It uses a scheme similar to the image parsing framework [Tu et al., 2005], i.e. top-down analysis. They jointly detect parallel lines (such as the edges of buildings and windows), their vanishing points, and the zenith, using a witty graphical model. The approach is claimed to be robust to clutter in the edge map.

Indeed, this paper might not have been possible without me. =) It was me who convinced Lena to join the lab two years ago (actually, it was more like convincing her not to apply to another lab). So the lab will remember me at least as a decent talent scout...

## Comic strip: 10%

Last summer, during my internship at CMP in Prague, I drew some xkcd-style comic strips, and I want to share some of them here. I considered starting a separate blog for them, but there are too few, and they do suck (at least in comparison with Randall's work). However, most of them are programming- or research-related, so they are appropriate here.

I planned to use them whenever I had nothing to post about. That is not the case now (I am planning a couple more posts in a week or so), but this one is topical. It was originally drawn on September 1, 2009, when GMail was down for a few hours. From the recent news: Google is shutting Google Wave down, admitting the failure predicted by some sceptics (see the image title text).

## Image Parsing: Unifying Segmentation, Detection, and Recognition

Recently I posted about the connection between object category detection and semantic image segmentation. While delving into this problem more deeply, I found the paper named in the title of this post.

The ICCV 2003 paper by Tu et al. was among the three Marr Prize winners, and it is really a prominent piece of work! In the introduction to the special Marr Prize issue of IJCV, Bill Triggs singled out the paper as one that "would have been particularly pleasing to David Marr", because of "its bold attack on the central problem of perceptual organization and whole scene understanding". Here is the journal version of the paper.

The authors combined discriminative and generative models into a unified framework for image parsing. For each image, a corresponding parsing tree is found. The root of the tree represents the whole scene; on the next level, nodes represent semantic fragments such as human faces, text, or textured regions; the leaves of the tree correspond to the pixels of the image.

The approach differs drastically from the vanilla CRF framework in that the structure of the parsing tree is dynamic, while a CRF's structure remains fixed and only the interaction models between pairs of sites may change. The goal is to find the tree that maximizes the posterior probability given the image. A directed search in the space of valid trees is performed by means of Markov chain Monte Carlo (MCMC). Possible tree moves, such as splitting and merging regions or varying the border between regions, are defined and described in terms of learnable Markov chain transition kernels.

What they actually did was effectively combine top-down and bottom-up approaches. In a nutshell, the bottom-up (discriminative) method generates hypotheses about pixel labels and object positions from local neighbourhoods, and the top-down generative model is then built using those hypotheses. The latter guarantees the consistency of the output and the optimality of its posterior probability.

Okay, what does this mean for the convergence of detection and segmentation? Because the shapes of objects are determined implicitly during recognition, and stuff regions are detected by their colour and texture, the problem of semantic image segmentation is solved along the way. Moreover, multi-scale semantic segmentation is performed during parsing. Therefore, image parsing seems to be the most general formulation of the image recognition problem.

The nodes of a parsing tree that belong to the same depth level are not connected (right, because it is a tree). This means that spatial interactions cannot be modelled directly. A totally different approach was introduced in another Marr Prize-winning paper, by Desai et al. [2009]. They also use bottom-up tests to generate hypotheses about object locations, and then run a CRF over those locations to take spatial context into account. They model different kinds of inter-class and intra-class interactions, thus ensuring that the objects in the image (described by their bounding boxes) are arranged plausibly: e.g. a cup is likely to be on a table (spatial arrangement), while a train cannot be close to a ship (mutual exclusion).

It seems that the two papers exploit two different kinds of information (aggregation vs. spatial interactions), which are mutually redundant to a certain extent but do not completely subsume each other. The group of researchers who successfully combines these approaches will probably receive some 201x Marr Prize.

## ICVSS 2010: Have you cited Lagrange?

Yeah, I have. :) Before the beginning of the summer school we received a homework assignment from Prof. Stefano Soatto. We had to read three papers and discover the roots of the ideas exploited by the authors; such connections are not necessarily expressed by references. As a result, one would obtain a tree (or maybe a lattice) rooted at the given paper. Since the task was not formulated precisely, most of us chose just one paper and made a tree for it. Later it turned out that all three problems (optical flow estimation, multi-view stereo, and volume registration) have the same roots, because they lead to similar optimization problems, and we were supposed to establish that connection.

I (like most of the students) had limited time to prepare the work. All modern research in optical flow is based on two seminal papers from 1981, by MIT's Horn & Schunck and CMU's Lucas & Kanade. During the last 30 years a lot of work has been done; it was summarized by Sun et al. (2010), who discovered that only a small fraction of the ideas give significant improvement, and combined them into a quite simple method that took its place at the top of the Middlebury table. Since I had no time to read many optical flow papers, I instead traced the mathematical tools (like PDEs and the calculus of variations) used in the Horn-Schunck paper. Just as a joke, I added references to the works of Newton, Lagrange, and d'Alembert to my report. Surprisingly, the joke was not really understood.

There were only 21 submissions; one of them was 120 pages long (the author did not show up at the seminar =), and tree depths varied from 1 to 20. I was not the only one who cited "prehistoric" papers; someone traced the ideas back to Aristotle through Occam. Questions about the horizon arose: it is really ridiculous to look for the roots of computer vision in Aristotle's works. Since prizes had been promised for rare references (the prize policy was left unclear too), some argument took place. Soatto gave no additional explanation of the task, but let the audience decide which references were legitimate in controversial cases. Eventually, I was among the 5 people whose references were considered controversial, so I had to defend them. Well, I suppose I looked pretty pathetic talking about Newton's finite differences, which were used to approximate derivatives in the Horn-Schunck paper. Surprisingly, almost half of the audience voted for me. =) Also, my reference to Thomas Reid's essay was tentatively approved.

Finally, there were no bonuses for papers that could not be found with Google, and the whole \$1000 prize went to the Cambridge group. In closing, Soatto remarked (among other things) that nobody had read any German or Russian papers. After the seminar I told him how I had tried to dig into the library (described here in Russian); he answered something like, "funny, it is hard to find them even for you!"

One evening during dinner we talked about Russia, and Vijay told a lot of interesting stories (like the one about the guy who developed ChatRoulette now working in Silicon Valley). He mentioned Stephen Boyd, who is known for his jokes about Russia. I said that I had watched the recordings of his amazing course on Convex Optimization. It turned out that Vijay (who is now a PhD student at Stanford) took that course, and he promised to tell Boyd that he has a fan in Russia. (Or maybe in Soviet Russia, fans have Boyd. =)

continued here

## On Fundamental Artificial Intelligence

Today I attended Mikhail Burtsev's lecture on the evolutionary approach to AI, hosted by the Moscow Polytechnic Museum's lecture hall. First of all, the organization was great and the quality of the presentation was high; unsurprisingly, the hall was almost completely full. I recommend attending their lectures.

Mikhail talked a lot about the history and foundations of AI, which was not new to me. He did not bore the audience with psychology-related material, but featured the two approaches to AI that have been developed in parallel since the 1950s: artificial neural networks and symbolic AI. ANNs attempt to model the functioning of the human brain by means of artificial neurons and axons. The problem here is that even if we build a human-scale artificial brain (which was actually done at the Allen Institute for Brain Science in 2005), it will be isolated from the world and could not be trained like a real human brain. Symbolic AI uses formal logic to solve problems; it suffers its own issues, such as the inability to handle uncertainty. So vanilla AI is now in a period of stagnation (dating back to the late 1980s, when Rodney Brooks started to criticize it).

Mikhail's point was that most researchers attempt to build a model of human intelligence straight away. He suggested looking at the evolutionary process instead: we need to start by understanding how the nervous system of an amoeba operates, and then track its evolution to find out how a human mind works. Evolutionary algorithms might be useful there.

I cannot agree with that point. The last 50 years witnessed the failure of fundamental AI research. The common problem was that researchers tried to model cognitive mechanisms instead of tackling actual problems, while those problems were being solved by other techniques. Consider the autonomous robot control problem (e.g. a robotic vacuum cleaner). It can be formulated in terms of reinforcement learning and solved by methods unconnected to vanilla AI (dynamic programming-based, I suppose). Mikhail, again, does not start from a problem, so I don't really believe in the success of the method. I am also sceptical about computational methods inspired by biology: they can be a good starting point, but they tend to be supplanted by mathematically founded ones.

Another issue raised in the discussion after the lecture was AI safety, probably the most popular philosophical question connected to AI. As far as I understood, there are two main concerns: an AI could become self-replicating (or self-improving) uncontrollably, and it could become self-motivated. A self-replicating AI is not so dangerous unless it is also self-motivated. If it is motivated by a human, it is bad in the same way a nuclear weapon is, which is scary only if an evil party possesses it (so, nothing brand new here). If an AI were motivated to gain world domination, it could be harmful. But where could such self-motivation come from? In my opinion, the only option is the biological drive to propagate (imagine some joint AI system with a bio-motivator), but such a disaster has already been overcome once. Do you remember the rabbits in Australia? I hope an uncontrolled AI won't be a bigger problem.

## ICVSS 2010

This summer I'm going to take part in the International Computer Vision Summer School. I have already been accepted and registered. I'm really looking forward to this event and hope it will not be too hot in Sicily in July. =)

I am now looking for a roommate, because accommodation there is quite expensive. If you are a participant too and have the same problem, please feel free to contact me!

One of the activities of the school is a reading group led by Prof. Stefano Soatto. As homework, we need to investigate the reference trees for three given subjects. Moreover, there will be prizes for students who discover papers that cannot be found via standard search engines. I suppose I have an advantage here: Russian science was isolated from, well, science due to that iron curtain thing (and it is still quite isolated, by inertia). So, I just need to dig in a library and find something appropriate in old Soviet journals.

## The disruption of scholarly publishing [in Russia]

You know, I am relatively new to big science. The first two-column paper I read was Antonin Guttman's paper about R-Trees.1 That was about 2.5 years ago, so I have never really used printed journals or conference proceedings, and I have been wondering why publishers spend money on printing journals when researchers usually post their papers on their home pages. Recently, I read an article about the disruption of scholarly publishing by Michael Clarke. The article is quite long-winded, so I summarize its ideas in the following paragraphs.

When Tim Berners-Lee came up with the idea of the WWW in 1991, he thought of it as a kind of scientific medium that would replace conferences and journals. Nowadays, the Internet is used for social networking, illegal distribution of music, and pornography, but we still have journals and conferences offline. Why?

The author indicates 5 main points:
• Dissemination - journals distribute papers all over the world
• Registration of discovery - to know who was the first, Popov or Marconi
• Validation - to ensure that the results are correct; currently provided by peer review
• Filtration - you are more likely to read a paper from a journal with a higher impact factor
• Designation - if you have publications at top conferences, you might be a cool scientist
The author argues that only the last point is critical for the modern system of journals and conferences. The others could be handled better by on-line services; we already have plenty of examples.

But in Russia, we don't rank scientists according to their citation index!2 (because we don't have one :) ) So the designation problem is not solved here, and Russia is thus already ready for the new publishing system. Why do we still have journals? I don't know; probably the sluggishness of minds...

2 I am being a bit cunning here: some journals do serve designation purposes. I mean the journals approved by VAK: one should have at least one publication in such a journal to obtain the Candidate of Science degree (the Russian PhD equivalent). But the review process is usually lousy; there was also the Rooter case with one of them.

## Hello, world!

Well, I've just started my technical/research blog.

#### Why?

I already have a personal blog, which (I hope) is read by a number of my friends. Recently I noticed that I had begun to post a lot of technical stuff my friends were not interested in. Also, that blog is in Russian, which limits its audience. So, instead of changing that blog's policy, I decided to create a new one. It was also Boris Yangel's amazing blog that inspired me to start mine.

#### Computer blindness?

Well, yes. Folks who know me might already divine that the blog is primarily about computer vision, and they are certainly right. I chose such a fancy name because computer vision is somewhat disappointing. Let me explain.

Do you remember the story of how Marvin Minsky at MIT asked his student to teach a computer to understand the scene captured by a camera during the 1966 summer break? Unsurprisingly, the student failed. Moreover, the general task is far from solved even now. In those days computers were slow, and AI scientists thought that performing logical inference was far more complicated than mere scene analysis. Now we have Prolog and a pile of different verification systems (thanks to the university curriculum -- we used some of them), but a computer still cannot recognize even simple object categories across different classes of images (e.g. relatively robust face detection was achieved only in the early 2000s by Viola and Jones).

Actually, Minsky was not the only one misled. If you are a vision researcher and you tell people about your work, they are likely to reply: "Is that all you can do? Man, it's simple!" Sure, it is simple for you to figure out that it is a cow on the meadow, not a horse, but try explaining that to a computer! This is an actual problem in our lab: when vision folks defend a Master's or PhD thesis in front of the committee (which consists of people doing research in computer hardware, system programming, etc.), the committee is usually not impressed by the results, because they think the task is not that difficult.

Okay, you've got the point. Computers are blind, and we should cope with that.

Next. Do you know the difference between the computer vision of the 1980s and modern computer vision? In the '80s, researchers solved particular problems, not general ones. They could implement quite a decent vision system that performed its task, but it could not be transferred to another domain. Nowadays, such an approach is no longer considered scientific, though it is still fine for engineers. Today, computer vision science is about general tasks. This became possible thanks to the extensive use of machine learning methods (many of which were developed in the '90s). Vision is nothing without learning now. I hope you've got this point too: Vision + Learning = Forever Together. That's why my blog cannot ignore machine learning issues.

What else? A programming language is a tool, but it can be interesting per se. I am about to finish a five-year university programme in Computer Science, so I have been taught various language concepts and programming paradigms, and I find them interesting at times. That is another possible topic.

#### Finally,

Subscribe, read, and comment. Everybody is welcome! Let it roll!