Assume you have training data $(x_1,y_1), \ldots, (x_n,y_n)$ and a relationship $y_i=f(x_i)+\epsilon_i$, where the $\epsilon_i$ are random noise terms. Assume you approximate $f$ with $\hat{f}$ using the training data. Then for a new point $x_0$ (independent of the training data) we get the bias-variance tradeoff formula:
$E\big((f(x_0)+\epsilon_0-\hat{f}(x_0))^2\big)=\mathrm{Var}(\epsilon_0)+\big(f(x_0)-E(\hat{f}(x_0))\big)^2+E\big((\hat{f}(x_0)-E(\hat{f}(x_0)))^2\big)$.
The second term is the squared bias, and the third is the variance in the bias-variance tradeoff formula.
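To check my understanding of the three terms, here is a small Monte Carlo sketch that estimates each of them at a single point $x_0$ and compares their sum to a direct estimate of the expected squared error. The true function, the noise level, the training-set size, and the polynomial learner are all assumptions I made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- assumed toy setup: true function, noise level, and a simple learner ---
def f(x):                       # assumed true regression function
    return np.sin(2 * np.pi * x)

sigma = 0.3                     # noise standard deviation, Var(eps) = sigma**2
n, degree = 30, 3               # training-set size and polynomial degree (assumed)
x0 = 0.25                       # the fixed test point

# Repeatedly draw a fresh training set, fit f_hat, and record f_hat(x0).
preds = []
for _ in range(2000):
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)
    coefs = np.polyfit(x, y, degree)        # least-squares polynomial fit
    preds.append(np.polyval(coefs, x0))
preds = np.array(preds)

noise_var = sigma ** 2                       # Var(eps_0)
sq_bias = (f(x0) - preds.mean()) ** 2        # (f(x0) - E[f_hat(x0)])^2
variance = preds.var()                       # E[(f_hat(x0) - E[f_hat(x0)])^2]

# Left-hand side estimated directly: squared error against a fresh noisy y_0.
y0 = f(x0) + rng.normal(0, sigma, preds.shape)
lhs = np.mean((y0 - preds) ** 2)

print(f"noise + bias^2 + variance = {noise_var + sq_bias + variance:.4f}")
print(f"direct estimate of error  = {lhs:.4f}")
```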
I have seen a source equate underfitting with high bias and overfitting with high variance, and I am wondering whether these things are actually the same or merely related. Let us say that underfitting means the model is not able to capture the training data very well, and overfitting means we use such a complex model that we fit the training data very well but fail on data outside the training set. Are these then the same thing?
To clarify, I have four questions:
- If we overfit, will we have high variance? I think this is true.
- If we have high variance, will we have overfitting?
- If we have underfitting, will we have high bias?
- If we have high bias, will we have underfitting? I think this is also true.
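To make the question concrete, here is a small sketch of the kind of experiment I have in mind, reusing the toy setup above (true function, noise level, sample size, and the choice of polynomial degrees are all my own assumptions). It estimates the squared bias and variance at $x_0$ for models of increasing complexity, where I would expect the low-degree fit to underfit and the high-degree fit to overfit:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                          # assumed true function (same toy setup as above)
    return np.sin(2 * np.pi * x)

sigma, n, x0 = 0.3, 30, 0.25

def bias_var_at_x0(degree, reps=2000):
    """Estimate squared bias and variance of a degree-`degree` polynomial fit at x0."""
    preds = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0, 1, n)
        y = f(x) + rng.normal(0, sigma, n)
        preds[r] = np.polyval(np.polyfit(x, y, degree), x0)
    return (f(x0) - preds.mean()) ** 2, preds.var()

# Degree 1 should underfit, degree 12 should overfit on n = 30 points.
for degree in (1, 3, 12):
    b2, v = bias_var_at_x0(degree)
    print(f"degree {degree:2d}: squared bias = {b2:.4f}, variance = {v:.4f}")
```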