Gradient descent has a parameter called the learning rate, which represents the size of the steps taken as the network navigates the curve in search of the valley. If the learning rate is too high, the network may overshoot the minimum. If it is too low, training will take too long and may never reach the minimum, or the network may get stuck in a local minimum.
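To make the effect of this parameter concrete, here is a minimal sketch of gradient descent on a simple one-dimensional quadratic loss. The function, starting point, and learning-rate values are illustrative assumptions, not taken from the text above; they simply show the three regimes described: converging, overshooting, and barely moving.

```python
# Gradient descent on the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3.
# The learning rate scales how far we step along the negative gradient.

def gradient_descent(learning_rate, steps=20, w=0.0):
    for _ in range(steps):
        grad = 2 * (w - 3.0)           # derivative of (w - 3)^2
        w = w - learning_rate * grad   # step downhill, scaled by the learning rate
    return w

print(gradient_descent(0.1))    # reasonable rate: ends near the minimum at w = 3
print(gradient_descent(1.1))    # too high: each step overshoots, and w diverges
print(gradient_descent(0.001))  # too low: after 20 steps, w has barely moved from 0
```

Running this with the values shown, the moderate learning rate lands close to the minimum, the large one oscillates away from it, and the tiny one makes almost no progress, which is the trade-off the paragraph above describes.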