Recently I read an intriguing research paper. It asks a simple question: should we always trust a classifier? I'm going to explain it, but first I need to cover a bit about how a classifier works.

How Does a Classifier Work?

Simply put, a classifier is a model that assigns inputs to categories. It can be a neural network, a logistic regression model, or any other learning algorithm.

Now, let's take the example of a dog vs. wolf classifier. You give your classifier a picture of a dog, and it tells you that it's a dog. You give it a picture of a wolf, and it tells you that it's a wolf. The classifier has to be trained on a lot of pictures of dogs and wolves before it learns to classify. If it classifies with good accuracy, we say the model is good.
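The train-then-predict workflow described above can be sketched with scikit-learn's logistic regression. The feature vectors here are made up purely for illustration; a real classifier would work on pixels or learned image features.

```python
# A minimal sketch of training and using a classifier.
# The two "features" per picture are invented for demonstration.
from sklearn.linear_model import LogisticRegression

# Each row is one "picture" reduced to two made-up features.
# Labels: 0 = dog, 1 = wolf.
X_train = [[0.2, 0.1], [0.3, 0.2], [0.8, 0.9], [0.9, 0.7]]
y_train = [0, 0, 1, 1]

clf = LogisticRegression().fit(X_train, y_train)  # "training"

# A new picture whose features resemble the wolf examples.
print(clf.predict([[0.85, 0.8]]))  # -> [1], i.e. "wolf"
```

The model learns a decision boundary from the labeled examples and applies it to new inputs; how good that boundary is depends entirely on the training data, which is exactly what the experiment below exploits.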


The Experiment

To evaluate the impact of the data used to train a classifier, a group of scientists from the University of Washington performed an experiment. They trained a classifier (logistic regression) to distinguish between wolves and dogs. However, during training, they did something really interesting.

All the pictures of wolves they used to train the model had snow in the background, but the pictures of dogs did not have any snow.

Training is done. Now comes testing.

What will happen if we give the classifier a picture of a dog on snow? What will be its prediction?

Its prediction is — it’s a wolf!

Well, maybe it was just a bad call. So, give the network another picture of a dog with snow in the background.

Again, its prediction is — it’s a wolf!

What Just Happened?

The problem is that while we knew we were trying to create a classifier that distinguishes dogs from wolves, the network didn't. During training, it learned that all the wolf images contain snow while the dog images don't. So it picked snow as the distinguishing feature of a wolf, regardless of the animal's color, position, pose, and so on.

That means it differentiates between wolves and dogs based on the presence or absence of snow.

Ultimately, even though we wanted to build a dog vs. wolf classifier, we ended up building a snow vs. no-snow classifier.
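This failure mode is easy to reproduce on toy data. In the sketch below, a "snow" feature is perfectly correlated with the wolf label at training time (all values, feature names, and numbers are invented for demonstration; the real experiment used images, not hand-made feature vectors), and the model then misclassifies a dog photographed on snow:

```python
# Toy reproduction of the spurious-correlation failure described above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200

# Feature 0: "snow in background" (1 = snow).
# Features 1-2: noisy animal traits that only weakly separate the classes.
wolves = np.column_stack([
    np.ones(n),                  # every wolf photo has snow
    rng.normal(0.6, 1.0, n),     # weak animal signal
    rng.normal(0.5, 1.0, n),
])
dogs = np.column_stack([
    np.zeros(n),                 # no dog photo has snow
    rng.normal(0.4, 1.0, n),
    rng.normal(0.5, 1.0, n),
])

X = np.vstack([wolves, dogs])
y = np.array([1] * n + [0] * n)  # 1 = wolf, 0 = dog

clf = LogisticRegression().fit(X, y)

# A dog photographed on snow: snow = 1, dog-like animal traits.
dog_on_snow = np.array([[1.0, 0.4, 0.5]])
print(clf.predict(dog_on_snow))  # -> [1]: the snow feature dominates
```

Because snow separates the training classes perfectly while the animal traits barely do, the model puts almost all of its weight on the snow feature, which is exactly the shortcut the experiment exposed.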

This raises an obvious question: why should I trust you? Why should you trust a model that doesn't even know which features to pick?

How Can We Avoid Such Situations?

For a long time it was almost impossible to tell exactly which features a classifier was using to decide that a dog is a dog or a wolf is a wolf. In that sense, the classifier acts as a kind of black box.

In 2016, a group of scientists from MIT introduced something called the class activation map, or CAM for short. It indicates, via a heatmap, the regions of the image the classifier relied on for its prediction.

For example, in the image below, a classifier is given a chest X-ray to detect whether the sample shows pneumonia. The network classifies the sample as pneumonia-positive and highlights the region responsible for that classification.
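At its core, a CAM is a weighted sum of the final convolutional feature maps, with one weight per map taken from the classification layer for the predicted class. A minimal NumPy sketch (shapes and values are made up; in a real network these arrays come from the model itself):

```python
# A minimal sketch of the Class Activation Mapping (CAM) computation.
import numpy as np

rng = np.random.default_rng(1)

# 8 feature maps of size 7x7, standing in for the last conv layer's output.
feature_maps = rng.random((8, 7, 7))

# One weight per feature map for the predicted class (e.g. "pneumonia"),
# taken from the fully connected layer after global average pooling.
class_weights = rng.random(8)

# CAM: weighted sum of the feature maps.
cam = np.tensordot(class_weights, feature_maps, axes=1)  # shape (7, 7)

# Normalize to [0, 1] so it can be upsampled and rendered as a heatmap.
cam = (cam - cam.min()) / (cam.max() - cam.min())
print(cam.shape)  # (7, 7)
```

The resulting low-resolution map is then upsampled to the input image size and overlaid as a heatmap, which is what produces visualizations like the pneumonia example above.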

Have a nice day. :)