# Gradient Descent

![Source: O'Reilly Media](/files/-LvHmoeDavPzU-wiLw_a)

Gradient descent is an iterative optimization algorithm used in machine learning to minimize a [loss function](/wiki/accuracy-and-loss.md#loss).

The loss function measures how well the model performs with its current set of parameters (weights and biases), and gradient descent is used to find the set of parameters that minimizes it. This is achieved by computing the gradient of the loss (its partial derivatives with respect to each parameter) at the current point and then iteratively moving through the search space in the direction of the negative gradient.
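The step described above is commonly written as the update rule below. The symbols are notation assumed here for illustration, not taken from this page: $\theta_t$ is the parameter vector at step $t$, $L$ is the loss function, and $\eta$ is the step size.

$$
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)
$$

The minus sign is what moves the parameters downhill: the gradient points in the direction of steepest increase of the loss, so subtracting it reduces the loss for a sufficiently small $\eta$.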

With each iteration, the parameters of the model (weights) are updated in proportion to the derivative of the error, and the loss decreases until it reaches the optimal point, which is a **minimum** of the loss function. The two key aspects of gradient descent are a) the direction to move and b) the size of the step (the learning rate, discussed below).
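As a minimal sketch of the loop described above, the snippet below runs gradient descent on a one-parameter toy loss. The function names and the loss `L(w) = (w - 3)^2` are hypothetical, chosen only so the minimum is known in advance.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        # Direction: the negative gradient. Step size: the learning rate.
        w = w - learning_rate * grad(w)
    return w


# Hypothetical toy loss L(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
def loss_grad(w):
    return 2.0 * (w - 3.0)


w_min = gradient_descent(loss_grad, start=0.0)
print(w_min)  # very close to 3.0, the minimizer of the toy loss
```

Real models have many parameters, so `w` becomes a vector and `grad` returns one partial derivative per parameter, but the loop has the same shape.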

![Gradient Descent in action](/files/-Lw6lSr3sxuY0gzJBEB9)

Gradient descent is used when the model parameters cannot be calculated analytically (e.g., with linear algebra) and must instead be searched for using an optimization algorithm.

There are several variants of gradient descent, including **batch**, **stochastic**, and **mini-batch**.
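The three variants are usually distinguished by how many training examples are used to compute each gradient: the whole dataset (batch), a single example (stochastic), or a small subset (mini-batch). The sketch below illustrates that difference under assumed conditions: a hypothetical one-weight model `y = w * x`, mean squared error, and noiseless toy data generated from `y = 2x`.

```python
import random


def grad_mse(w, batch):
    """Gradient of mean squared error for the model y = w * x over a batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train(data, batch_size, learning_rate=0.1, epochs=200, seed=0):
    """Run gradient descent, updating w once per batch of `batch_size` examples."""
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in a new order each epoch
        for i in range(0, len(data), batch_size):
            w -= learning_rate * grad_mse(w, data[i:i + batch_size])
    return w


# Toy data drawn from y = 2x (an assumption made for this example).
data = [(x / 10.0, 2.0 * x / 10.0) for x in range(1, 11)]

w_batch = train(data, batch_size=len(data))  # batch: one update per epoch
w_sgd = train(data, batch_size=1)            # stochastic: one update per example
w_mini = train(data, batch_size=4)           # mini-batch: one update per 4 examples
```

On this noiseless data all three settings recover a weight near 2. On real data, smaller batches give noisier but cheaper updates, which is the usual trade-off between the variants.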

There are also several optimization algorithms that build on gradient descent, including momentum, Adagrad, Nesterov accelerated gradient, RMSprop, and Adam. This [blog post](https://blog.paperspace.com/intro-to-optimization-momentum-rmsprop-adam/) covers the differences between these algorithms.
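To give a feel for how these optimizers modify the basic update, here is a minimal sketch of classical momentum, the simplest of the group. The function name, the coefficient `beta=0.9`, and the toy loss are assumptions made for illustration; the other algorithms listed above use different update rules.

```python
def momentum_descent(grad, start, learning_rate=0.1, beta=0.9, steps=300):
    """Gradient descent with classical momentum on a single parameter."""
    x = start
    velocity = 0.0
    for _ in range(steps):
        # Keep a decaying sum of past steps, then add the new gradient step.
        velocity = beta * velocity - learning_rate * grad(x)
        x = x + velocity
    return x


# Hypothetical toy loss L(x) = (x - 3)^2, with derivative 2 * (x - 3).
x_min = momentum_descent(lambda x: 2.0 * (x - 3.0), start=0.0)
print(x_min)  # close to 3.0
```

Because the velocity accumulates past gradients, the iterate can overshoot and oscillate around the minimum before settling, which is expected behavior for this method.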

Gradient descent has a hyperparameter called the **learning rate**, which sets the size of the steps taken as the network navigates the curve in search of the valley. If the learning rate is too high, the network may overshoot the minimum. If it is too low, training will take too long and may not reach the minimum within the available iterations, or it may get stuck in a local minimum.
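The effect of the learning rate can be seen on a toy problem. The sketch below minimizes the hypothetical function `f(x) = x^2` (gradient `2x`, minimum at 0) with three learning rates picked purely for illustration.

```python
def run(learning_rate, steps=50, start=1.0):
    """Gradient descent on f(x) = x^2 (gradient 2x); returns the final x."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2.0 * x
    return x


too_high = run(1.1)     # each step multiplies x by -1.2, so |x| grows: it overshoots and diverges
reasonable = run(0.1)   # each step multiplies x by 0.8, so x shrinks toward 0
too_low = run(0.0001)   # each step multiplies x by 0.9998, so x barely moves in 50 steps

print(too_high, reasonable, too_low)
```

Which values count as too high or too low depends on the loss function, so in practice the learning rate is tuned per problem.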

![Source: Rohith Gandhi / Towards Data Science](/files/-Lw6lH-WAYaxFcXq03Fq)

For an in-depth explanation of gradient descent, see this [blog post](https://blog.paperspace.com/intro-to-optimization-in-deep-learning-gradient-descent/).


