Suppose that we have a general loss function that depends on some parameters $w$ (e.g. neural network weights): $$L_w =\frac{1}{N} \sum_i \ell(\hat{y}_i, y_i)$$
Is it beneficial to standardize the target in addition to features?
That is, should we prefer to optimize $L_w'$: $$L_w' =\frac{1}{N} \sum_i \ell\left(\hat{y}_i, \frac{y_i-\bar{y}}{\sigma} \right)$$
over $L_w$?
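To make the question concrete, here is a minimal NumPy sketch (data and variable names are illustrative, not from the post) of the $L_w'$ formulation for ordinary least squares. With a bias term, standardizing the target is an exact affine reparametrization: predictions mapped back via $\hat{y}\,\sigma + \bar{y}$ coincide with the fit on the raw target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regression data: 3 features, linear signal plus noise.
X = rng.normal(size=(100, 3))
y = 50.0 + X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=100)

# Standardize the target, as in the L_w' formulation.
y_mean, y_std = y.mean(), y.std()
y_scaled = (y - y_mean) / y_std

# Fit ordinary least squares on both targets (closed form, bias column included).
Xb = np.hstack([X, np.ones((100, 1))])
w_raw, *_ = np.linalg.lstsq(Xb, y, rcond=None)
w_scaled, *_ = np.linalg.lstsq(Xb, y_scaled, rcond=None)

# Undo the standardization at prediction time.
pred_raw = Xb @ w_raw
pred_back = (Xb @ w_scaled) * y_std + y_mean
print(np.allclose(pred_raw, pred_back))  # prints True
```

For a convex problem like this the minimizer is unchanged, so any benefit of target standardization would have to come from the *optimization dynamics* (e.g. gradient magnitudes, learning-rate sensitivity) rather than from the solution itself.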
Related questions
In the accepted answer to this question, it is stated that:
> Normalizing the output will not affect shape of $f$, so it's generally not necessary.
where $\hat{y} = f_w(x)$. However, during training we optimize the loss function, not the shape of $f$ directly, so the shape of $f$ seems beside the point.