Gradient Descent is an iterative optimization method. It tries to minimize a loss function $L(\theta)$ by repeatedly nudging the parameters $\theta$ against the gradient of the loss. The update rule is:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)$$

where $\eta$ is the learning rate.
Concretely, the algorithm proceeds in the following steps:

1. Initialize the parameters $\theta$, e.g. randomly.
2. Compute the gradient of the loss with respect to the parameters, $\nabla_\theta L(\theta)$.
3. Update the parameters: $\theta \leftarrow \theta - \eta \, \nabla_\theta L(\theta)$.
4. Repeat steps 2 and 3 until the loss stops improving or an iteration budget runs out.
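To make this concrete, here is a minimal sketch in Python/NumPy that minimizes a toy quadratic loss. The loss, the `TARGET` point, and the hyperparameter values are illustrative choices, not from any particular library:

```python
import numpy as np

TARGET = np.array([2.0, -3.0])

def loss(theta):
    # Toy quadratic loss with its minimum at TARGET
    return np.sum((theta - TARGET) ** 2)

def grad(theta):
    # Analytic gradient of the quadratic loss above
    return 2.0 * (theta - TARGET)

def gradient_descent(theta, learning_rate=0.1, n_steps=100):
    for _ in range(n_steps):
        theta = theta - learning_rate * grad(theta)  # step against the gradient
    return theta

theta = gradient_descent(np.zeros(2))  # step 1: initialize, here at the origin
print(theta, loss(theta))              # theta approaches [2, -3], loss -> 0
```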
The choice of learning rate $\eta$ is crucial when using optimization algorithms: too small and convergence is needlessly slow; too large and the updates overshoot the minimum and can diverge. Let's look at an example:
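One way to see this is to rerun the quadratic example above with a few different learning rates (the values are chosen purely for illustration):

```python
for lr in (0.01, 0.1, 1.1):
    theta = gradient_descent(np.zeros(2), learning_rate=lr, n_steps=50)
    print(f"lr={lr}: loss={loss(theta):.3g}")
# lr=0.01 -> loss ~ 1.7   (too small: still far from the minimum after 50 steps)
# lr=0.1  -> loss ~ 0     (converges to the minimum)
# lr=1.1  -> loss ~ 1e9   (too large: the updates overshoot and diverge)
```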
Gradient Descent computes the gradient over the whole dataset at every step. In practice, you can't hold a large dataset in memory (unless you are someone with too much compute), and even when you can, a full pass per update is expensive.
Stochastic Gradient Descent (SGD) solves this problem by choosing a small random sample of the data (a minibatch) at each step and applying the Gradient Descent update to it.
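A minimal sketch of the minibatch loop, assuming a `grad_fn` that computes the gradient on a batch of examples (the `grad_fn` signature, batch size, and epoch count are illustrative assumptions):

```python
import numpy as np

def sgd(theta, X, y, grad_fn, learning_rate=0.01, batch_size=32, n_epochs=10):
    n = len(X)
    for _ in range(n_epochs):
        perm = np.random.permutation(n)             # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            batch = perm[start:start + batch_size]  # indices of one minibatch
            g = grad_fn(theta, X[batch], y[batch])  # gradient on the minibatch only
            theta = theta - learning_rate * g       # the usual Gradient Descent update
    return theta
```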
Unfortunately, more tricks are usually needed to get good convergence. A wide array of optimizers is available that adapt the learning rate during training and help overcome saddle points; momentum, RMSProp, and Adam are common examples.
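One classic trick is momentum: it accumulates an exponentially decaying average of past gradients, which damps oscillations and helps push through flat regions and saddle points. A minimal sketch, assuming a `grad_fn(theta)` that returns the gradient (full-batch or minibatch); the decay factor 0.9 is a common but arbitrary choice:

```python
import numpy as np

def sgd_momentum(theta, grad_fn, learning_rate=0.01, momentum=0.9, n_steps=100):
    velocity = np.zeros_like(theta)
    for _ in range(n_steps):
        # Blend the previous velocity with the current (scaled) gradient
        velocity = momentum * velocity - learning_rate * grad_fn(theta)
        theta = theta + velocity  # move along the accumulated velocity
    return theta
```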