Normalising data for simple linear regression

Question

Consider a simple linear regression problem where:

X = [1,2,3,4,5,100,200]
Y=  [2,4,6,8,10,200,400]

Clearly, the relationship is of the form $y=2x$; While trying to solve this using gradient descent based method using MSE loss, it never converges and gives a $W$ (slope of the line) that is too different from the actual value of $2$.

At the same time, my solution works perfectly when the $X$ are small evenly spaced values like $X = [1,2,3,4,5,6]$. But the solution does not work for large values of $X$ like $X = [100,200,300,400]$ or unevenly spaced $X$ like $X = [1,2,3,4,100,200]$

import numpy as np
X = np.array ([1,2,3,4,5,100,200])
Y= X*2
W = np.array ([0.0]) # initialize the weight to be 0

def forward (X, W):

  return W*X

def backward (Y_predicted, X, Y):
  dW = np.matmul (X.T, Y_predicted - Y ).mean()
  return dW

lr = 0.01
n_epochs = 15
for epoch in range (n_epochs):
  prediction = forward (X,W)
  dW = backward (prediction, X, Y)
  W = W - lr*dW
  print (W)

Oxbowerce · Accepted Answer · 2022-12-13 10:21:10Z

1

Gradient descent works perfectly fine when your features have large values, but you might have to use a smaller learning rate. In this case, simply decreasing the learning rate to 0.00001 allows the model to converge to the correct result of W=2. Generally input features are normalized (e.g. to a range of 0-1) to help the model converge more easily.

answered Dec 13, 2022 at 10:21

Oxbowerce

8,7372 gold badges10 silver badges26 bronze badges

$\begingroup$ Yes a learning rate of 0.00001 helps, but it took nearly 700 epochs to converge. And if the X is changed to say X = np.array ([100,200,300,400,500,600]), even the learning rate of 0.00001 wont help; the weight W becomes nan within 60 epochs. I believe the trick is to standardise data at all times. $\endgroup$

butwhy
– butwhy

2022-12-14 16:36:57 +00:00
Commented Dec 14, 2022 at 16:36
$\begingroup$ I only needed less than 15 epochs to get the model to converge to a weight of close to 2. It would indeed be easiest to standardize your data, or use something like gradient clipping to prevent the gradients to explode. Clipping the gradients allows me to get the model to converge using a learning rate of 0.01 for both input arrays. $\endgroup$

Oxbowerce
– Oxbowerce

2022-12-14 17:04:33 +00:00
Commented Dec 14, 2022 at 17:04

Add a comment |

Stack Exchange Network

Normalising data for simple linear regression

1 Answer 1

Your Answer

Hot Network Questions

Normalising data for simple linear regression

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions