
Art of overfitting

How do you overfit a model? It's quite easy, as you'll see. In this notebook we'll look at several "overfitting techniques" that are often used in practice.

This notebook on Github

Setup

Simulating pure noise

Let's simulate some data. Notice that the targets and features are independent: there is no signal to learn, so no model can beat chance-level accuracy on unseen data.
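The notebook's own simulation code isn't reproduced here, so the following is only a stdlib-only sketch under stated assumptions: Gaussian features, fair coin-flip labels drawn without looking at the features, and hypothetical sizes (200 samples, 20 features). The helper name `simulate_noise` is hypothetical as well.

```python
import random


def simulate_noise(n_samples, n_features, seed):
    """Hypothetical helper: Gaussian features plus labels independent of them."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    # Labels are fair coin flips drawn without looking at X: there is no signal.
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


# Assumed sizes; separate seeds give independent train and test sets.
X_train, y_train = simulate_noise(200, 20, seed=1)
X_test, y_test = simulate_noise(200, 20, seed=2)

print(len(X_train), len(X_train[0]))  # 200 20
print(f"share of positive labels: {sum(y_train) / len(y_train):.2f}")
```

Because `y` is generated from the same random stream but never from `X`, any pattern a model finds between them is a fluke of this particular sample.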

Model definitions

If we model it fair and square with a linear model, we get approximately 50% accuracy on both the train and the test data.
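As a stdlib-only sketch of this fair baseline, the block below trains plain logistic regression with per-sample SGD on pure noise. The learning rate, epoch count, and data sizes (300 train and 400 test samples, 5 features) are assumptions rather than the notebook's settings; a library model such as scikit-learn's `LogisticRegression` would serve the same purpose.

```python
import math
import random


def simulate_noise(n_samples, n_features, seed):
    """Gaussian features and coin-flip labels, independent of each other."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def fit_logistic(X, y, epochs=20, lr=0.05, seed=0):
    """Logistic regression trained with per-sample SGD on the log-loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            z = max(-30.0, min(30.0, z))  # keep exp() away from overflow
            g = 1.0 / (1.0 + math.exp(-z)) - y[i]  # d(log-loss)/dz
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def accuracy(model, X, y):
    """Fraction of points where the sign of the linear score matches the label."""
    w, b = model
    preds = [1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


X_train, y_train = simulate_noise(300, 5, seed=1)
X_test, y_test = simulate_noise(400, 5, seed=2)

model = fit_logistic(X_train, y_train)
train_acc = accuracy(model, X_train, y_train)
test_acc = accuracy(model, X_test, y_test)
print(f"linear model: train {train_acc:.3f}, test {test_acc:.3f}")
```

With only a handful of weights relative to the number of samples, the linear model has little room to memorize, so its train score stays close to its test score.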

We can easily overfit a more flexible model to the train set; however, we'll still get approximately 50% on the test data.
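One way to see this, sketched below under assumptions, is a 1-nearest-neighbor classifier standing in for the "more flexible model" (the notebook's actual choice isn't shown here). Each training point is its own nearest neighbor, so the model memorizes the train set perfectly, yet its test predictions carry no information about the independent test labels. The data sizes are hypothetical.

```python
import random


def simulate_noise(n_samples, n_features, seed):
    """Gaussian features and coin-flip labels, independent of each other."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def predict_1nn(X_train, y_train, X):
    """Label each query point with the label of its closest training point."""
    preds = []
    for x in X:
        # Squared Euclidean distance to every training point.
        dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X_train]
        preds.append(y_train[dists.index(min(dists))])
    return preds


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


X_train, y_train = simulate_noise(300, 5, seed=1)
X_test, y_test = simulate_noise(400, 5, seed=2)

train_acc = accuracy(y_train, predict_1nn(X_train, y_train, X_train))
test_acc = accuracy(y_test, predict_1nn(X_train, y_train, X_test))
print(f"1-NN: train {train_acc:.3f}, test {test_acc:.3f}")  # train is 1.000
```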

Experiments

Fair & Square

Story: You design your experiment in advance and do an honest job: you check the results on test data only once and report them, even if they suck.
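A minimal sketch of this discipline, assuming a hypothetical `OneShotTestSet` wrapper: the test data sits behind an object that allows exactly one evaluation, so any second look raises an error. The model is a nearest-centroid classifier (a simple linear rule) with nothing to tune; it is a stand-in, not the notebook's model, and the data sizes are assumptions.

```python
import random


def simulate_noise(n_samples, n_features, seed):
    """Gaussian features and coin-flip labels, independent of each other."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def fit_nearest_centroid(X, y):
    """Per-class feature means; predicting the closer mean is a linear rule."""
    centroids = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, X):
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in X]


class OneShotTestSet:
    """Hypothetical wrapper that lets you score on the test data exactly once."""

    def __init__(self, X, y):
        self._X, self._y = X, y
        self._used = False

    def evaluate(self, predict_fn):
        if self._used:
            raise RuntimeError("test set already used: report the first result")
        self._used = True
        preds = predict_fn(self._X)
        return sum(p == t for p, t in zip(preds, self._y)) / len(self._y)


X_train, y_train = simulate_noise(300, 5, seed=1)
test_set = OneShotTestSet(*simulate_noise(400, 5, seed=2))

model = fit_nearest_centroid(X_train, y_train)
test_acc = test_set.evaluate(lambda X: predict(model, X))
print(f"reported test accuracy: {test_acc:.3f}")

# Any second look at the test data is refused.
try:
    test_set.evaluate(lambda X: predict(model, X))
    second_look_blocked = False
except RuntimeError:
    second_look_blocked = True
print(second_look_blocked)  # True
```

Nothing stops a determined researcher from copying the data out, of course; the point is only to make the "check once, report once" rule explicit in code.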

Results:

Hyperparameter Overtuning

(tuning on test data)

Story: You've finished your experiment and checked the results on test data. The results are underwhelming... but then you have a thought: "But what if I tweaked that smoothing parameter just a little? I think that'll do the trick!" You repeat this multiple times with different sets of parameters and then choose the model that scores best on the test data.
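The sketch below mimics this with k-nearest neighbors, treating the number of neighbors `k` as a hypothetical "smoothing parameter" (larger `k` gives a smoother decision rule). Every odd `k` from 1 to 99 is scored on the test set and the best one is kept; scoring that same `k` on a fresh sample from the same distribution shows what the test score was really worth. The data sizes and the `k` grid are assumptions.

```python
import random


def simulate_noise(n_samples, n_features, seed):
    """Gaussian features and coin-flip labels, independent of each other."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def sorted_neighbor_labels(X_train, y_train, X):
    """For each query point, training labels ordered from nearest to farthest."""
    out = []
    for x in X:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X_train]
        order = sorted(range(len(X_train)), key=dists.__getitem__)
        out.append([y_train[j] for j in order])
    return out


def knn_accuracy(neighbor_labels, y, k):
    """Majority vote over the k nearest labels; odd k means no ties."""
    correct = 0
    for labels, target in zip(neighbor_labels, y):
        vote = 1 if 2 * sum(labels[:k]) > k else 0
        correct += vote == target
    return correct / len(y)


X_train, y_train = simulate_noise(300, 5, seed=1)
X_test, y_test = simulate_noise(400, 5, seed=2)
X_fresh, y_fresh = simulate_noise(400, 5, seed=3)

# Neighbor orderings don't depend on k, so compute them once.
test_nb = sorted_neighbor_labels(X_train, y_train, X_test)
fresh_nb = sorted_neighbor_labels(X_train, y_train, X_fresh)

ks = list(range(1, 100, 2))
test_scores = {k: knn_accuracy(test_nb, y_test, k) for k in ks}
best_k = max(test_scores, key=test_scores.get)  # tuned on the test set!
fresh_acc = knn_accuracy(fresh_nb, y_fresh, best_k)

print(f"best k on test: {best_k}, 'test' accuracy {test_scores[best_k]:.3f}")
print(f"same k on fresh data: {fresh_acc:.3f}")
```

The selected test score is the maximum of 50 noisy estimates, so it is biased upward; the fresh-data score has no such selection and stays near chance.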

Results:

Refitting

(refitting a model again and again)

Story: You've finished your experiment and checked the results on test data. The results are underwhelming... but then you think "What if I was just unlucky and SGD happened to find a weird local minimum? Let's try again!" You refit your model multiple times until you find a fit that works well on the test data.
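A sketch under assumptions: logistic regression trained by SGD with a random initialization and a random sample order, refit with 30 different seeds, keeping the fit with the best test accuracy. Logistic regression is convex, so here the spread between fits comes from SGD's randomness (short training, different starting points) rather than genuine local minima, but picking the winner on the test set inflates the score in the same way. The seed count, learning rate, and data sizes are hypothetical.

```python
import math
import random


def simulate_noise(n_samples, n_features, seed):
    """Gaussian features and coin-flip labels, independent of each other."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def fit_logistic_sgd(X, y, seed, epochs=5, lr=0.1):
    """Logistic regression via SGD; the seed sets the init and the sample order."""
    rng = random.Random(seed)
    w, b = [rng.gauss(0.0, 1.0) for _ in X[0]], 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            z = max(-30.0, min(30.0, z))  # keep exp() away from overflow
            g = 1.0 / (1.0 + math.exp(-z)) - y[i]  # d(log-loss)/dz
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def accuracy(model, X, y):
    w, b = model
    preds = [1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


X_train, y_train = simulate_noise(300, 5, seed=1)
X_test, y_test = simulate_noise(400, 5, seed=2)
X_fresh, y_fresh = simulate_noise(400, 5, seed=3)

# "Maybe I was just unlucky": refit with 30 seeds, keep the best on test.
fits = [fit_logistic_sgd(X_train, y_train, seed=s) for s in range(30)]
test_scores = [accuracy(m, X_test, y_test) for m in fits]
best = max(range(len(fits)), key=test_scores.__getitem__)
fresh_acc = accuracy(fits[best], X_fresh, y_fresh)

print(f"best of {len(fits)} refits on test: {test_scores[best]:.3f}")
print(f"same fit on fresh data: {fresh_acc:.3f}")
```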

Results:

Post-selecting features

(selecting features based on the test set)

Story: You've finished your experiment and checked the results on test data. The results are underwhelming... but then your boss comes and says: "Hey, I think you should definitely try including these cool features. And you can remove these ones; they don't have any fundamental meaning. I think that'll do the trick!" You repeat this multiple times with different sets of features and then choose the set of features with the best test performance.
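The sketch below takes this to its conclusion: with 8 pure-noise features (an assumed number), it scores every one of the 255 non-empty feature subsets on the test set using a nearest-centroid classifier, keeps the best subset, and then scores that subset on fresh data. The classifier and all sizes are stand-ins, not the notebook's setup.

```python
import itertools
import random


def simulate_noise(n_samples, n_features, seed):
    """Gaussian features and coin-flip labels, independent of each other."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def fit_centroids(X, y, feats):
    """Per-class means of the selected feature columns."""
    centroids = {}
    for c in (0, 1):
        rows = [[x[f] for f in feats] for x, t in zip(X, y) if t == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def accuracy(centroids, X, y, feats):
    """Nearest-centroid accuracy using only the selected features."""
    correct = 0
    for x, t in zip(X, y):
        xs = [x[f] for f in feats]
        dists = {c: sum((a - m) ** 2 for a, m in zip(xs, mu))
                 for c, mu in centroids.items()}
        correct += min(dists, key=dists.get) == t
    return correct / len(y)


N_FEATURES = 8
X_train, y_train = simulate_noise(300, N_FEATURES, seed=1)
X_test, y_test = simulate_noise(400, N_FEATURES, seed=2)
X_fresh, y_fresh = simulate_noise(400, N_FEATURES, seed=3)

# Every non-empty subset of feature indices: 2**8 - 1 = 255 candidates.
subsets = [s for r in range(1, N_FEATURES + 1)
           for s in itertools.combinations(range(N_FEATURES), r)]
test_scores = {s: accuracy(fit_centroids(X_train, y_train, s), X_test, y_test, s)
               for s in subsets}
best = max(test_scores, key=test_scores.get)  # selected on the test set!
fresh_acc = accuracy(fit_centroids(X_train, y_train, best), X_fresh, y_fresh, best)

print(f"best subset on test: {best}, 'test' accuracy {test_scores[best]:.3f}")
print(f"same subset on fresh data: {fresh_acc:.3f}")
```

Each subset is a separate "experiment", so this is the same selection effect as hyperparameter overtuning, only with a much larger search space.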

Results: