When fitting neural networks, I often run stochastic gradient descent multiple times from different random initializations and keep the run with the lowest training loss. I'm trying to look up research literature on this practice, but I'm not sure what it's called. Any terms, keywords, or references would be appreciated.
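To make the practice concrete, here is a minimal sketch of what I mean (a toy PyTorch example; the model, data, and hyperparameters are just placeholders):

```python
import torch
import torch.nn as nn

def train_once(seed, X, y, epochs=200, lr=0.1):
    torch.manual_seed(seed)  # fresh random initialization for each run
    model = nn.Sequential(nn.Linear(X.shape[1], 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return model, loss.item()

# toy data
X = torch.randn(256, 8)
y = X.sum(dim=1, keepdim=True)

# run SGD several times and keep the run with the lowest final training loss
runs = [train_once(seed, X, y) for seed in range(5)]
best_model, best_loss = min(runs, key=lambda r: r[1])
print(f"best training loss across runs: {best_loss:.4f}")
```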
The closest thing I have found is "Stochastic Gradient Descent with Restarts", but I don't believe it's quite the same idea.