
Art of overfitting

How do you overfit a model? It's quite easy, as you'll see. In this notebook we'll look at several "overfitting techniques" that are often used in practice.

This notebook on Github

Setup

Simulating noise data

Let's simulate some data. Notice that the targets and the features are independent, so there is no real signal to learn.
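
A minimal sketch of what such noise data could look like (the sample size of 400 matches the conclusion below; the number of features and the 50/50 split are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features = 400, 20            # 400 samples, as in the conclusion; 20 features is an assumption
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, 2, size=n_samples)     # binary targets drawn independently of X

# Hold out half of the data for testing.
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]
```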

Model definitions

If we model it fair and square with a linear model, we get approx. 50% accuracy on both train and test data.

We can easily overfit a more flexible model on the train set; however, we'll still get approx. 50% accuracy on the test data.
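
A sketch of the two kinds of models, assuming a logistic regression as the linear model and a random forest as the flexible one (the notebook's actual choices may differ); it reuses the train/test split from the setup sketch above:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

linear_model = LogisticRegression().fit(X_train, y_train)
flexible_model = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)

for name, model in [("linear", linear_model), ("flexible", flexible_model)]:
    print(f"{name}: train {model.score(X_train, y_train):.2f}, "
          f"test {model.score(X_test, y_test):.2f}")
# Expected: both ~0.5 on the test set; the flexible model close to 1.0 on the train set.
```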

Experiments

Fair & Square

Story: You design your experiment in advance and do an honest job: you check the results on the test data only once, and report them. Even if they suck.
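
As a sketch (reusing the split from the setup sketch), the honest protocol is just one fit and exactly one look at the test set:

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))  # reported as-is, even if it's ~0.5
```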

Results:

Hyperparameter Overtuning

(tuning on test data)

Story: You've finished your experiment and checked the results on test data. The results are underwhelming... but then you have a thought: "But what if I tweaked that smoothing parameter just a little? I think that'll do the trick!" You repeat this multiple times with different sets of parameters and then choose the best model.
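
A sketch of this failure mode, using the regularization strength C as a hypothetical stand-in for the "smoothing parameter" and selecting it directly on the test set:

```python
from sklearn.linear_model import LogisticRegression

best_test_accuracy, best_C = -1.0, None
for C in [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]:
    model = LogisticRegression(C=C).fit(X_train, y_train)
    test_accuracy = model.score(X_test, y_test)   # peeking at the test set on every try
    if test_accuracy > best_test_accuracy:
        best_test_accuracy, best_C = test_accuracy, C

print(f"best C={best_C}, 'test' accuracy {best_test_accuracy:.2f}")  # optimistically biased
```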

Results:

Refitting

(refitting a model again and again)

Story: You've finished your experiment and checked the results on test data. The results are underwhelming... but then you think "What if I was just unlucky and SGD happened to find a weird local minimum? Let's try again!" You refit your model multiple times until you find a fit that works well on the test data.
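
A sketch of this, using an SGD-based classifier with different random seeds as an illustrative stand-in and keeping the luckiest fit:

```python
from sklearn.linear_model import SGDClassifier

best_test_accuracy = -1.0
for seed in range(50):
    model = SGDClassifier(random_state=seed).fit(X_train, y_train)
    test_accuracy = model.score(X_test, y_test)   # selecting the refit on the test set
    best_test_accuracy = max(best_test_accuracy, test_accuracy)

print(f"best-of-50 'test' accuracy: {best_test_accuracy:.2f}")  # inflated by selection
```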

Results:

Post-selecting features

(selecting features based on the test set)

Story: You've finished your experiment and checked the results on test data. The results are underwhelming... but then your boss comes and says "Hey, I think you should definitely try to include these cool features. And you can remove these ones, they don't have any fundamental meaning. I think that'll do the trick!" You repeat this multiple times with different sets of features, and then choose the set of features with the best performance.
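
A sketch of this, trying random feature subsets (the subset size and the number of trials are arbitrary assumptions) and keeping whichever scores best on the test set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
best_test_accuracy, best_subset = -1.0, None
for _ in range(50):
    subset = rng.choice(X_train.shape[1], size=10, replace=False)  # a random "cool" feature set
    model = LogisticRegression().fit(X_train[:, subset], y_train)
    test_accuracy = model.score(X_test[:, subset], y_test)         # peeking at the test set again
    if test_accuracy > best_test_accuracy:
        best_test_accuracy, best_subset = test_accuracy, subset

print(f"best feature subset 'test' accuracy: {best_test_accuracy:.2f}")
```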

Results:

Post-selecting targets

Story: You've finished your experiment and checked the results on test data. The results are underwhelming... but then your colleague Jess says "Hey, why don't you check your model on all the other markets? Maybe it'll work on at least one of them. I think that'll do the trick!" You test the model on all available markets, and then choose the market with the best performance.
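
A sketch of this, where each "market" is an independent noise target (an assumption consistent with the simulated data) and only the best-scoring market is reported:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
model = LogisticRegression().fit(X_train, y_train)

best_test_accuracy = -1.0
for market in range(20):                                   # 20 markets is an arbitrary assumption
    y_market = rng.integers(0, 2, size=len(y_test))        # each market's target is pure noise
    accuracy = (model.predict(X_test) == y_market).mean()  # same model, different target
    best_test_accuracy = max(best_test_accuracy, accuracy)

print(f"best market 'test' accuracy: {best_test_accuracy:.2f}")  # best of 20 looks good by chance
```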

Results:

Conclusion

Size (of the sample) matters!

Question: Previous results were for 400 data samples. What would happen if you had less (or more) data?

Answer: Basically, the more data you have, the harder it is to end up with over-optimistic results, even if your experiment is badly designed. The opposite is true as well.

Just a fun fact: you can notice that the test error converges at the usual $n^{-\frac{1}{2}}$ rate, i.e. the achievable over-optimism shrinks like $1/\sqrt{n}$.
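
A tiny sketch of this scaling: for pure-noise data, the best of k random predictors beats 50% by roughly a constant times $1/\sqrt{n}$, so the excess accuracy shrinks as the sample grows (the sample sizes and k=50 are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
for n in [100, 400, 1600, 6400]:
    y = rng.integers(0, 2, size=n)
    # 50 random predictors, keep the luckiest one (mimics the tricks above)
    best = max((rng.integers(0, 2, size=n) == y).mean() for _ in range(50))
    print(f"n={n:5d}  best-of-50 accuracy: {best:.3f}  (excess over 0.5: {best - 0.5:.3f})")
```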

Some tools to spot overfitting