Testing machine learning algorithms

Tests are the only practical way to estimate the quality of a machine learning algorithm. To show that your algorithm is usable, you need to design good tests. To design a test, you collect data and split it into a train set and a test set.


Machine learning theorists say that the train and test sets should come from a single probability distribution (unless they are talking about transfer learning or some kinds of on-line learning). But in practice it is a bit more complicated. We are currently working on the problem of laser scan point classification, and it is not a trivial task to design tests! We have scans from different domains (aerial scans, scans from moving vehicles, stationary terrestrial scans), and for each domain we would like to have a kind of universal benchmark. Since a wide range of algorithms are supposed to be evaluated against it, the benchmark must not encourage overfitting.

So, how can we split the data? To satisfy the single-distribution requirement, we could add the odd points from the cloud to the train set and the even points to the test set. This is a bad idea. Suppose your classifier uses the 3D coordinates of a point as features. For each point in the test set there is a nearly identical point in the train set, so even a primitive learner achieves nearly 100% precision. Such a benchmark is not challenging enough.
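A quick synthetic experiment illustrates the point. The data below is entirely made up (a densely sampled strip of points with striped labels, standing in for a real scan); the odd/even split lets a trivial 1-nearest-neighbour "learner" memorise its way to near-perfect accuracy:

```python
import numpy as np

# Hypothetical dense scan strip: 4000 points along a 100 m line, with labels
# forming 1-metre stripes in x. Purely synthetic, for illustration only.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, 4000))
points = np.column_stack([
    t * 100.0,                       # x: position along the strip
    rng.normal(0.0, 0.01, 4000),     # y: small sensor jitter
    rng.normal(0.0, 0.01, 4000),     # z: small sensor jitter
])
labels = points[:, 0].astype(int) % 2  # label stripes of width 1 m

# Odd/even split: every other point goes to the test set.
X_train, y_train = points[0::2], labels[0::2]
X_test, y_test = points[1::2], labels[1::2]

# A primitive 1-nearest-neighbour learner on raw 3D coordinates.
dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
pred = y_train[dists.argmin(axis=1)]
accuracy = (pred == y_test).mean()
print(f"1-NN accuracy on the odd/even split: {accuracy:.3f}")
```

Because every test point has an almost identical neighbour in the train set, the accuracy comes out close to 1.0 regardless of how contrived the label pattern is.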

Well, let's split every scan into a few pieces then. If we compose the test set from different subscans, does that solve the problem? Not at all. Suppose we have a number of aerial scans, acquired from different heights, with different scanners, in different weather. If we add pieces of a single scan to both the test set and the train set, we again get a non-challenging test. The rule is: pieces of a single scan must not appear in both the test set and the train set, if we want to train the classifier once for the whole domain. Do the test set and the train set come from a single distribution then? No! But here we need to neglect the theory in favour of practice.
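One way to enforce the rule is to treat each scan as an atomic unit and split at the scan level, not the point level. A minimal sketch (the scan names are hypothetical):

```python
import random

# Hypothetical inventory of labelled scans. Each scan is atomic: all of its
# points go to exactly one side of the split.
scan_ids = ["aerial_01", "aerial_02", "aerial_03", "vehicle_01",
            "vehicle_02", "terrestrial_01", "terrestrial_02", "terrestrial_03"]

random.seed(42)
shuffled = scan_ids[:]
random.shuffle(shuffled)

# Hold out roughly a quarter of the *scans* (not points) for testing.
n_test = max(1, len(shuffled) // 4)
test_scans = set(shuffled[:n_test])
train_scans = set(shuffled[n_test:])

print("train:", sorted(train_scans))
print("test: ", sorted(test_scans))
```

No piece of any scan can now straddle the boundary, at the cost of the two sets no longer coming from the same distribution.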

One could say that it is reasonable to use cross-validation here. Well, it makes sense. According to Wikipedia, there are three types of cross-validation:
  • Repeated random sub-sampling validation
  • K-fold cross-validation
  • Leave-one-out cross-validation
According to the rule, only k-fold cross-validation can be used, and each fold should contain points from its own scans only. But labelling scans is very laborious: it takes more than 20 hours to label a typical million-point scan, so we cannot have many scans labelled.
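The scan-level constraint turns plain k-fold into what is essentially grouped k-fold: whole scans, not points, are assigned to folds. A minimal sketch with hypothetical scan names:

```python
# K-fold cross-validation obeying "the rule": folds are built from whole
# scans, so pieces of one scan never straddle the train/test boundary.
def scan_level_folds(scan_ids, k):
    """Assign whole scans round-robin to k folds."""
    folds = [[] for _ in range(k)]
    for i, scan in enumerate(sorted(scan_ids)):
        folds[i % k].append(scan)
    return folds

# Hypothetical labelled scans (in practice, painfully few of them).
scans = ["aerial_01", "aerial_02", "vehicle_01", "vehicle_02",
         "terrestrial_01", "terrestrial_02"]

for fold_idx, test_fold in enumerate(scan_level_folds(scans, k=3)):
    train_fold = [s for s in scans if s not in test_fold]
    print(f"fold {fold_idx}: test={test_fold}, train={train_fold}")
```

With only a handful of labelled scans, k is forced to be small, which is exactly the problem: each fold's estimate rests on very few independent scans.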

This is not the only problem with testing point classification. Since a single point tells us nothing by itself, we should consider some neighbourhood of it and approximate that neighbourhood with a surface. So every point in both sets should come with its neighbourhood. This problem, too, is solved if you put the whole scan into one set.
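To see why neighbourhoods matter, here is one common way (not necessarily the one used in our work) to turn a point's neighbourhood into a feature: fit a plane to the k nearest points and take its normal. The scan below is random synthetic data; the point is that the computation only makes sense if the point's true neighbours are in the same set:

```python
import numpy as np

rng = np.random.default_rng(1)
scan = rng.uniform(0.0, 10.0, size=(500, 3))  # synthetic stand-in for a scan

def local_normal(scan, idx, k=10):
    """Estimate a surface normal at point `idx` from its k nearest neighbours."""
    d = np.linalg.norm(scan - scan[idx], axis=1)
    nbrs = scan[np.argsort(d)[:k]]
    centred = nbrs - nbrs.mean(axis=0)
    # The smallest principal component of the neighbourhood approximates
    # the normal of the best-fitting plane.
    _, _, vt = np.linalg.svd(centred)
    return vt[-1]

n = local_normal(scan, 0)
print("estimated normal:", n)
```

If half of a point's neighbours were stripped out by a point-level split, features like this one would be computed on a distorted neighbourhood, so the scan has to stay whole.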


5 Responses to "Testing machine learning algorithms"

  1. hr0nix says:
    12 January 2010 at 12:29

    Can you state your problem more precisely? Do you need to classify points (what classes do you have, btw?) from the same type of scan (aerial, from a vehicle, etc.) using one single classifier trained on points from some distinct scans (which can be performed with different scanners)?

    Btw, as far as I know, one of the best ways to do CV is to combine k-fold CV with random subsampling in a 5-times 2-fold cross-validation process (http://web.engr.oregonstate.edu/~tgd/publications/nc-stats.ps.gz)

  2. Roman V. Shapovalov says:
    13 January 2010 at 02:52

    Actually, I cannot. =) There are several possible statements; the most interesting one is, as you put it, to train a single classifier on different scans retrieved in different places/conditions. But I'm still not sure it is possible to do that effectively. Probably, we should have some general model that is then specialized for the particular test in some way (transfer learning could be useful here).

    Classes are usually: ground, tree (forest), building; sometimes: car, wire, pole, low vegetation, fence, etc. It also depends on the type of scan.

    As for random subsampling cross-validation, I'm afraid it is not an option. First, there is a high probability that a point in the test set has a neighbouring point in the train set (which violates "the rule"). Second, we should treat groups of points (or even whole scans) as atomic, since we need to approximate the scan with a surface (either implicitly or explicitly); this is good for MRF/CRF too.

  3. Jones Morris says:
    7 December 2016 at 20:18

    Its a great pleasure reading your post.Its full of information I am looking for and I love to post a comment that "The content of your post is awesome" Great work. Testing Machine

  4. HariS Rajput says:
    22 February 2017 at 23:17

    Yes i am totally agreed with article and i want say that this article is very nice and very informative article.I will make sure to be reading your blog more. You made a good point but I can't help but wonder, what about the other side? !!!!!!THANKS!!!!!!
    Electronic Crockmeter

  5. Testing Indonesia says:
    4 April 2017 at 05:46

    Thank you, this article is very useful for me

    What is a Universal Testing Machine? [translated from Indonesian]
