When do "Ada" optimizers (e.g. Adagrad, Adam, etc...) "adapt" their parameters? Is it at the end of each mini-batch or epoch?

1 Answer

They update their parameters after each mini-batch. (I use this term to avoid confusion with “batch gradient descent”; most neural-network libraries say “batch size” when they mean “mini-batch size”.)

A helpful way to remember this: the optimizer has no notion of an ‘epoch’. For instance, with stochastic gradient descent you could sample a mini-batch at random from the dataset at every time step (rather than using the common shuffle-and-iterate strategy) and it would still work. Defining the training curriculum is your job, not the optimizer's.

In that setting there is no clearly defined ‘epoch’ at all; everything is expressed in terms of which mini-batch step the optimizer is currently processing.
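
To make the per-step nature concrete, here is a minimal sketch in plain NumPy (the toy linear-regression problem and all hyperparameter values are invented purely for illustration): the Adam-style moment estimates and the parameters are both updated once per mini-batch, and mini-batches are drawn at random, so the word “epoch” never appears in the loop.

```python
# Minimal sketch: Adam-style updates happen once per mini-batch step.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression data (assumed for illustration only).
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)                       # parameters
m = np.zeros(5)                       # Adam first-moment estimate
v = np.zeros(5)                       # Adam second-moment estimate
lr, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-8

batch_size = 32
for t in range(1, 2001):              # t counts mini-batch steps, not epochs
    # Sample a mini-batch at random: no shuffle-and-iterate, so there is
    # no well-defined epoch; the optimizer only ever sees one step at a time.
    idx = rng.integers(0, len(X), size=batch_size)
    Xb, yb = X[idx], y[idx]

    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size   # gradient of the MSE loss

    # Optimizer state and parameters both adapt here, once per mini-batch.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
```

If you prefer the usual shuffle-and-iterate scheme, only the sampling line changes; the update itself is identical and still runs once per mini-batch.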

  • Yes, I meant mini-batch; I'm editing the question now. That makes sense: documentation sometimes says "after each gradient update", and that happens after processing a mini-batch. – Commented Jun 3, 2021 at 17:43
