Gradient Descent is an iterative method that works in the following steps:

1. Start at some (random) point
2. Calculate the gradient of the target function with respect to the parameters $\theta$
3. Move down the slope (using the gradient) with a step size of $\lambda$
4. Continue until the algorithm reaches an optimum
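
Each step applies the update $\theta_{t+1} = \theta_t - \lambda \nabla_\theta f(\theta_t)$. Here is a minimal sketch in NumPy, assuming a toy objective $f(\theta) = (\theta - 3)^2$ (the function and hyperparameters are made up for illustration):

```python
import numpy as np

def f(theta):
    # Toy objective with a single minimum at theta = 3
    return (theta - 3.0) ** 2

def grad_f(theta):
    # Analytic gradient of f
    return 2.0 * (theta - 3.0)

theta = np.random.randn()    # 1. start at a random point
lr = 0.1                     # step size (learning rate) lambda

for step in range(100):
    g = grad_f(theta)        # 2. calculate the gradient
    theta = theta - lr * g   # 3. move down the slope
    if abs(g) < 1e-6:        # 4. stop once we are (numerically) at an optimum
        break

print(f"theta = {theta:.4f}, f(theta) = {f(theta):.6f}")
```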

Learning rate

The choice of learning rate is crucial when using optimization algorithms. Let's look at the possible scenarios (illustrated in the sketch after this list):

- A low learning rate slows down the training process; you might never come close to an optimal value.
- A good learning rate allows for fast training and finds optimal values.
- A high learning rate speeds up training, but it overshoots the optimal values.
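
To see these three behaviors concretely, you can run the same toy objective from above with different step sizes (again just a sketch; the values are illustrative):

```python
import numpy as np

def grad_f(theta):
    # Gradient of the toy objective (theta - 3)^2
    return 2.0 * (theta - 3.0)

for lr in (0.01, 0.1, 1.1):   # low, good, and too high learning rates
    theta = 0.0
    for _ in range(50):
        theta -= lr * grad_f(theta)
    print(f"lr={lr}: theta after 50 steps = {theta:.4f}")
```

The low rate creeps toward the minimum (still far from 3 after 50 steps), the good rate converges, and the high rate bounces ever farther away from it.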

Stochastic Gradient Descent

Gradient Descent computes the gradient over the whole dataset at every step. In practice, you can't hold a large dataset in memory (unless you are someone with too much compute).

Stochastic Gradient Descent (SGD) solves this problem by choosing a small random sample of the data (a minibatch) and applying the Gradient Descent update to it.
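
Here is a minimal minibatch SGD sketch for linear regression in NumPy (the synthetic dataset, batch size, and learning rate are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic dataset: y = 2x + 1 plus noise (made up for illustration)
X = rng.normal(size=(1000,))
y = 2.0 * X + 1.0 + 0.1 * rng.normal(size=1000)

w, b = 0.0, 0.0    # parameters theta
lr = 0.1           # learning rate lambda
batch_size = 32

for epoch in range(20):
    idx = rng.permutation(len(X))              # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]  # the minibatch
        xb, yb = X[batch], y[batch]
        err = w * xb + b - yb
        # Gradient Descent update on the minibatch (mean squared error)
        w -= lr * 2.0 * np.mean(err * xb)
        b -= lr * 2.0 * np.mean(err)

print(f"w = {w:.3f}, b = {b:.3f}")  # should approach 2 and 1
```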

Unfortunately, more tricks are needed to get good convergence. A wide array of optimizers is available that adapt the learning rate during training and help overcome saddle points.
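
As one example, in PyTorch (an assumption; no particular framework is required here) swapping plain SGD for an adaptive optimizer such as Adam is a one-line change. The model and data below are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)   # placeholder model
loss_fn = nn.MSELoss()

# Plain SGD with a fixed learning rate:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Adam adapts the step size per parameter during training:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

x = torch.randn(32, 10)    # dummy minibatch
y = torch.randn(32, 1)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

Adam maintains per-parameter running averages of the gradients and their squares, which effectively rescales the learning rate as training progresses and helps move through flat regions and saddle points.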