What is the Bias-Variance tradeoff?

To understand the tradeoff, we first need to recall what bias and variance are:

Bias

Bias is the error that comes from the simplifying assumptions a model makes. It determines how well the model can learn complex decision functions, that is, make predictions on "tricky" data.

  • Low bias - the model is sufficiently complex and doesn't rely on oversimplifying assumptions. Some models that have low bias: Neural Networks, Random Forests, Support Vector Machines

  • High bias - the model makes strong assumptions that need to hold for it to predict well. High-bias models don't learn enough from the training data and have high error on both the training and test data (underfitting). Some models with high bias: Linear Regression and Logistic Regression

Variance

Variance is the error caused by the model's sensitivity to small fluctuations in the training data.

  • Low variance - your model isn't strongly affected by small changes in the training data. Some models with low variance: Linear Regression and Logistic Regression
  • High variance - your model learned the training data a bit too well and can't generalize to unseen data (overfitting), as the sketch below demonstrates. Some models with high variance: Neural Networks, Random Forests, Support Vector Machines
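
To see both failure modes on the same data, here is a minimal sketch, assuming scikit-learn and a small synthetic sine dataset (both my choices, not from the article). A degree-1 polynomial is too rigid and underfits (high bias), while a degree-15 polynomial chases the noise and overfits (high variance):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: a nonlinear target plus noise
rng = np.random.RandomState(42)
X = np.sort(rng.uniform(-3, 3, size=(60, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for degree, label in [(1, "high bias, underfits"), (15, "high variance, overfits")]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d} ({label}): train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

Typically, the degree-1 model shows similarly high errors on both splits, while the degree-15 model shows a near-zero training error and a much larger test error.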

The tradeoff

The tradeoff is that by improving one of the components, you worsen the other. Picking Linear Regression for its low variance results in high bias. Still, you can use these concepts to build a practical solution that works.
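
To make the tension concrete, here is the textbook decomposition for squared loss (a standard result, not derived in the article): the expected prediction error at a point $x$ splits into three parts,

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Shrinking the bias term usually requires a more flexible model, which inflates the variance term (and vice versa); the noise term is a floor that neither choice can remove.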

How to tackle the problem in practice

  • Choose an algorithm that is flexible enough to model the problem. This minimizes the bias.
  • Apply regularization to minimize the variance, as in the sketch below.
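
Here is a minimal sketch of that two-step recipe, again assuming scikit-learn and the same kind of synthetic sine data (my choices, not the article's). A high-degree polynomial basis keeps the bias low, and `RidgeCV` picks an L2 penalty strength by cross-validation to keep the variance in check:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data: a nonlinear target plus noise
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(60, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)

model = make_pipeline(
    PolynomialFeatures(degree=15),            # flexible model -> low bias
    StandardScaler(),                         # scale features so one penalty fits all
    RidgeCV(alphas=np.logspace(-4, 2, 25)),   # cross-validated L2 penalty -> lower variance
)
model.fit(X, y)
print("chosen regularization strength:", model[-1].alpha_)
```

The regularized high-degree model usually generalizes far better than the unregularized one from the earlier sketch: the tradeoff is managed rather than eliminated.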
