Suppose that we have a general loss function that depends on some parameters $w$ (e.g. neural network weights): $$L_w =\frac{1}{N} \sum_i \ell(\hat{y}_i, y_i)$$
Is it beneficial to standardize the target in addition to features?
That is, should we prefer to optimize $L_w'$: $$L_w' =\frac{1}{N} \sum_i \ell\left(\hat{y}_i, \frac{y_i-\bar{y}}{\sigma} \right)$$
over $L_w$?
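To make the question concrete, here is a minimal NumPy sketch (data and variable names are illustrative, not from the post) of the $L_w'$ formulation for ordinary least squares. With a bias term, standardizing the target is an exact affine reparametrization: predictions mapped back via $\hat{y}\,\sigma + \bar{y}$ coincide with the fit on the raw target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regression data: 3 features, linear signal plus noise.
X = rng.normal(size=(100, 3))
y = 50.0 + X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=100)

# Standardize the target, as in the L_w' formulation.
y_mean, y_std = y.mean(), y.std()
y_scaled = (y - y_mean) / y_std

# Fit ordinary least squares on both targets (closed form, bias column included).
Xb = np.hstack([X, np.ones((100, 1))])
w_raw, *_ = np.linalg.lstsq(Xb, y, rcond=None)
w_scaled, *_ = np.linalg.lstsq(Xb, y_scaled, rcond=None)

# Undo the standardization at prediction time.
pred_raw = Xb @ w_raw
pred_back = (Xb @ w_scaled) * y_std + y_mean
print(np.allclose(pred_raw, pred_back))  # prints True
```

For a convex problem like this the minimizer is unchanged, so any benefit of target standardization would have to come from the *optimization dynamics* (e.g. gradient magnitudes, learning-rate sensitivity) rather than from the solution itself.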
Related questions
In the accepted answer to this question, it is stated that:
> Normalizing the output will not affect shape of $f$, so it's generally not necessary.
where $\hat{y} = f_w(x)$. However, during training we optimize the loss function, not the shape of $f$ directly, so the shape of $f$ seems beside the point.