## What is Gradient Descent? What modifications exist?

Gradient Descent is an iterative optimization method that minimizes a loss function $f(\theta)$ by repeatedly updating the parameters. The update rule is:

$\theta^{\prime} = \theta - \lambda \cdot \nabla_{\theta} f(\theta)$
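As a quick worked example (the loss and the numbers here are illustrative, not from the original): for $f(\theta) = \theta^2$ we have $\nabla_{\theta} f(\theta) = 2\theta$, so starting at $\theta = 4$ with $\lambda = 0.1$, one update gives $\theta^{\prime} = 4 - 0.1 \cdot 2 \cdot 4 = 3.2$, a step toward the minimum at $\theta = 0$.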

The algorithm works in the following steps:

• Start at some (random) point
• Calculate the gradient of the loss function w.r.t. the parameters $\theta$
• Move down the slope (in the direction of the negative gradient) with a step size of $\lambda$
• Repeat until the algorithm converges to a (local) optimum (see the sketch after this list)
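A minimal sketch of these steps in Python, assuming a one-dimensional loss with a hand-coded gradient; the quadratic $f(\theta) = \theta^2$, the tolerance, and the step count are illustrative choices, not part of the original:

```python
import random

def gradient_descent(grad_f, theta0, lr=0.1, n_steps=100, tol=1e-8):
    """Minimize f by repeatedly stepping against its gradient."""
    theta = theta0
    for _ in range(n_steps):
        grad = grad_f(theta)       # gradient of f at the current theta
        if abs(grad) < tol:        # stop once the slope is (nearly) flat
            break
        theta = theta - lr * grad  # move down the slope by lr * gradient
    return theta

# Example: f(theta) = theta**2, so grad_f(theta) = 2 * theta
theta_opt = gradient_descent(lambda t: 2 * t, theta0=random.uniform(-5, 5))
print(theta_opt)  # close to the minimum at theta = 0
```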

The choice of the learning rate $\lambda$ is crucial when using optimization algorithms. Let's compare three cases:

• A low learning rate slows down training: you may not get close to an optimal value within the available iterations.
• A good learning rate allows for fast training and finds optimal values.
• A high learning rate allows for fast training, but it overshoots the optimal values and can even diverge (as the sketch below illustrates).
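To make the three cases concrete, here is a small illustrative experiment on the same quadratic loss $f(\theta) = \theta^2$; the starting point, step count, and learning rates are assumed values chosen to show the effect:

```python
def run(lr, theta=4.0, n_steps=5):
    """Apply n_steps gradient descent updates on f(theta) = theta**2."""
    history = [theta]
    for _ in range(n_steps):
        theta = theta - lr * 2 * theta  # gradient of theta**2 is 2 * theta
        history.append(theta)
    return [round(t, 3) for t in history]

print(run(lr=0.01))  # low:  [4.0, 3.92, ..., 3.616] -- slow progress
print(run(lr=0.3))   # good: [4.0, 1.6, 0.64, ..., 0.041] -- fast convergence
print(run(lr=1.1))   # high: [4.0, -4.8, 5.76, ...] -- overshoots and diverges
```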