## Papers, citations, co-authorship, and genealogy

This summer I accidentally found out that there are two papers citing our CMRT 2009 paper. I was excited a bit about that since they were the first actual citations of me, so I even read those papers.

The first of them [Димашова, 2010] was published in Russian by OpenCV developers from Nizhny Novgorod. They implemented cascade-based face detection algorithm that used either Local Binary Patterns (LBP) or Haar features. The algorithm was released within OpenCV 2.0. They cite our paper as an example of using Random Forest on the stages of the cascade. However, they implemented the classical variation with the cascade over AdaBoost.

Another citation [Sikiric et al., 2010] is more relevant since it came from the road mapping community. They address the problem of recovering a road appearance mosaic from the orthogonal views to the surface. They contrast their approach with ours in the way that theirs do not employ human interaction. In fact, we need human input for recognizing road defects and lane marking, rectification is done in the previous stage, which is fully automatic.

The rest of the post is devoted to some interesting metrics and structures concerning citations, co-authorship and supervising students.

**Citations**

The number of citations is a weak measure of paper quality. We can also go further and estimate the impact of a journal or a particular researcher based on the number of citations. A generally accepted measure is the journal impact factor, which is simply the mean number of citations by paper published in the journal for some period in time. Individual researchers could be evaluated by the impact factor of the journals they published in, though it is considered as a bad practice. Another not-so-bad practise is h-index. By definition, one's h-index equals

*H*if there are at least*H*citations of his top*H*papers^{1}. It has also been criticised widely.So, citation-based scoring has a lot of flaws. But what can we use instead? Another interesting approach is introduced by ReaderMeter. They collect the information about the number of people who have added some paper to their Mendeley collections and compute something like h-index. Unfortunately, they recently excluded some papers from their database, so the statistics became less representative but more accurate.

**Co-authorship and Erdős number**

Paul Erdős was a Hungarian mathematician who published about 14 hundred research papers with 511 different co-authors. That's why he has a special role in bibliometrics. Erdős number is defined as a collaborative distance from a researcher to Erdős. More strictly, Wikipedia defines:

Paul Erdős is the one person having an Erdős number of zero. For any author other than Erdős, if the lowest Erdős number of all of his coauthors isk, then the author's Erdős number isk+ 1.

My Erdős number is at most 7 (via Olga Barinova, Pushmeet Kohli, Philip H.S. Torr, Bernhard Schölkopf, John Shawe Taylor, David Godsil). To be honest, Erdős number system is primarily used for math papers. Even if we use the wider definition, it is not that beautiful, because there are not too many gates, e.g. all the vision community will probably connect to Erdős via the machine learning gate. Thus, the researchers who work on the edge will be closer. To illustrate this fact, David Marr probably had not any finite Erdős number during his lifetime (now he definitely has). So, we can introduce, say, Zisserman number for computer vision.

^{2}According to DBLP, Andrew Zisserman has 165 direct co-authors so far. Now, my Zisserman number is 3, David Marr's is 4 (via Tomaso Poggio, Lior Wolf, Yonatan Wexler).The movie industry have their own metrics, which is the Bacon number (after Kevin Bacon). The distance is established 1 if two actors have appeared in a single movie. Someone tried to combine those two to the Erdős-Bacon number. For some person, it is just a sum of her Erdős and Bacon numbers. Of course, very few people have a finite Erdős-Bacon number, since one should both appear in a movie and publish a research paper (and it is still not sufficient). Often they are possessed by researchers who consulted the filming crew and accidentally were filmed. =) Erdős himself has this number 3 or 4 (depending on the details of definition), since he starred in N is a Number (1993) and his Bacon number is thus 3 0r 4.

The person with the probably lowest Erdős-Bacon number is Daniel Kleitman, an MIT mathematician, who has appeared in one of my favourite movies Good Will Hunting (1997) along with Minnie Driver, who collaborated with Bacon in Sleepers (1996). Since Kleitman has 6 joint papers with Erdős, his Erdős-Bacon number is 3.

Surprisingly, Marvin Minsky has his Erdős number (4) greater than his Bacon number (2), which he obtained via The Revenge of the Dead Indians (1993) and Yoko Ono. Another strange example is ~~the paedophile's dream~~ Natalie Portman. She has graduated from Harvard, saying she would rather "be smart than a movie star". That's my kind of a girl! A neuroscience paper [Baird et al., 2002] brought Natalie Hershlag (her real name) Erdős number of 5, and then she appeared in New York, I Love You (2009) along with Kevin Bacon, so she reached the same Erdős-Bacon number as Minsky, i.e. 6.

UPD (Mart 13, 2011). There is a totally relevant xkcd strip.

**Scientific genealogy**

Finally, I tell about what is known as scientific genealogy. Every grown-up researcher has a PhD advisor, usually one, while an advisor can have a lot of students. Let's just use the analogy with parents and children. We get a tree (or a forest) representing the historical structure of science. The mathematics genealogy project aims to recover this structure.

I tried to track my genealogy back. Unfortunately, I failed to find who was Yury M. Bayakovsky's PhD advisor. But if consider Olga Barinova my advisor, I am the 11th generation descendant of Carl Friedrich Gauß. Nice ancestry, huh?

The similar project exists for computer vision. There are 290 people in the base, though there are duplicates (I've found five Vitorio Ferrari's :). It seems strange that some trees have depth as big as 5 (e.g. Kristen Graumann and Adriana Quattoni are the 5th generation after David Marr), though vision is a relatively young field.

^{1}Okay, take the supremum of the set if you want to stay formal

^{2}It seems that Philip Torr has already used that number.

Yaroslavsays:2 November 2010 at 00:58

Perhaps related, University of Minnesota are now thinking about including "new forms of scholarship" (ie, blogs, websites, software) when considering academic worth of candidates

http://agtb.wordpress.com/2010/11/01/new-forms-of-scholarship/