Gradient Descent

Category: Regression

Author: Francois de Ryckel
Published: April 26, 2024
Modified: April 30, 2024

Gradient Descent is an optimization technique used in many Machine Learning algorithms to find the minimum of a function. It requires the function to be differentiable, and convexity guarantees that the minimum we find is the global one. At its core, it is like searching for the lowest valley in a hilly landscape: we take small steps in the steepest downhill direction until we reach the lowest point.

Gradient Descent is used to find the parameters that minimize a cost function: think of the coefficients of a linear regression, for instance.
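
To make that connection concrete, here is a minimal sketch (not taken from the rest of this post) that fits the slope and intercept of a simple linear regression by running gradient descent on the mean squared error. The synthetic data, the learning rate and the number of iterations are all made up for the illustration.

import numpy as np

# made-up data roughly following y = 2x + 1
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 0.5, size = x.size)

# slope and intercept of the regression line, both starting at 0
a, b = 0.0, 0.0
learning_rate = 0.01

for i in range(5000):
  y_hat = a * x + b
  error = y_hat - y
  # gradients of the mean squared error with respect to a and b
  grad_a = 2 * np.mean(error * x)
  grad_b = 2 * np.mean(error)
  a = a - learning_rate * grad_a
  b = b - learning_rate * grad_b

print(a, b)   # lands close to the true slope (2) and intercept (1)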

Basic Gradient Descent

One of the main disadvantages of gradient descent is that it can get stuck in a local minimum or at a saddle point and never reach the global minimum.
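
A quick illustration of that pitfall (the double-well function below is only for this example and is not used elsewhere in the post): the same descent, started from two different points, ends up in two different valleys.

# f(x) = x^4 - 3x^2 + x has two valleys: the global minimum near x = -1.30
# and a shallower local minimum near x = 1.13
def f(x): 
  return x**4 - 3*x**2 + x

def f_prime(x): 
  return 4*x**3 - 6*x + 1

def descend(start, learning_rate = 0.01, n_iter = 1000): 
  x = start
  for i in range(n_iter): 
    x = x - learning_rate * f_prime(x)
  return x

print(descend(start = -2))   # converges near -1.30, the global minimum
print(descend(start = 2))    # converges near 1.13, a local minimum only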

To perform gradient descent, we need:

  • a function
  • its derivative
  • a starting point (where do we start going down)
  • the size of a step (aka learning rate)
  • the number of iterations (aka how many steps do we intend to take)
  • optionally, a threshold that stops the iterative process early once the steps become negligible (a small sketch of this option follows the list).
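
The stopping threshold is the only ingredient that the code later in this post does not implement, so here is a minimal sketch of how it could be wired into the descent loop; the function name and the tol parameter are mine, not part of the original code.

def gradient_descent_with_tol(deriv, start, learning_rate, n_iter, tol = 1e-6): 
  x = start
  for i in range(n_iter): 
    step = learning_rate * deriv(x)
    x = x - step
    # stop early once the update becomes smaller than the tolerance
    if abs(step) < tol: 
      print(f"Converged after {i + 1} iterations")
      break
  return x

Everything that follows keeps the simpler version that always runs for the full number of iterations.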

Let’s take a quadratic function to start.

\[f(x) = 3.91 x^2 + 5x - 59\]

import numpy as np
import matplotlib.pyplot as plt

# the original function 
def cost_function(x): 
  return(3.91*x**2 + 5*x - 59)

# the derivative of our function: f'(x) = 7.82x + 5 
def gradient(x): 
  return(7.82*x + 5)

# checking our function
cost_function(4.5)
42.67750000000001
# quick visual check 
x = np.linspace(-10, 10, 1000)
y = cost_function(x)

plt.clf()
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('cost_function(x)')
plt.title('Plot of the cost_function(x) vs x')
plt.show()

library(dplyr)
library(purrr)
library(ggplot2)

# the original function 
cost_function <- function(x){
  return(3.91*x^2 + 5*x - 59)
}

# the derivative of our function: f'(x) = 7.82x + 5
gradient <- function(x) {
  return(7.82*x + 5)
}

# checking our function
cost_function(4.5)
[1] 42.6775
# quick visual check 
df <- tibble(x = seq(from = -10, to = 10, length.out = 1000)) |> 
  mutate(y = map_dbl(x, cost_function))

ggplot(df, aes(x, y)) + 
  geom_line(color = 'blue') + 
  #xlab('x') + ylab('cost_function(x)') + 
  labs(x = 'x',  y = 'cost_function(x)', 
       title = 'Plot of the cost_function(x) vs x') 

We can now put everything together and define our gradient descent function.

\[x_{n+1} = x_n - \text{Learning Rate} \cdot f'(x_n)\]
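
For instance, with the starting point (15) and learning rate (0.01) used in the call below, the very first update is

\[x_1 = 15 - 0.01 \cdot (7.82 \cdot 15 + 5) = 15 - 1.223 = 13.777\]

and the loop simply repeats that update until the steps become negligible.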

def gradient_descent(f, deriv, start, learning_rate, n_iter): 
  x = start
  for i in range(n_iter): 
    grad = deriv(x)
    
    # we now update x
    x = x - learning_rate * grad 
    
  print(f"Minimum value of x: {x:.2f}")
  print(f"Minimum value for our Cost function: {f(x):.3f}")

gradient_descent(cost_function, gradient, 15, 0.01, 10000)
Minimum value of x: -0.64
Minimum value for our Cost function: -60.598
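
As a quick sanity check (not part of the original post), the minimum of a quadratic a*x^2 + b*x + c sits at x = -b / (2a), which we can compare with what the descent just found.

a, b, c = 3.91, 5, -59
x_min = -b / (2 * a)
print(x_min)                  # -0.639..., the same point the descent found
print(cost_function(x_min))   # -60.598..., the true minimum value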

We can modify our code slightly to store the intermediate values for later visualization or analysis.

def gradient_descent(f, deriv, start, learning_rate, n_iter): 
  x = start
  # initialize a list to store the cost values 
  cost_values = []
  
  for i in range(n_iter): 
    cost = f(x)
    grad = deriv(x)
    # update of x
    x = x - learning_rate * grad
    # append the value of cost to the list
    cost_values.append(cost)
    # print the progress every 10 iterations during the first 200
    if i % 10 == 0  and i < 200: 
      print(f"Iteration {i}: x = {x:.4f}, cost = {cost:.4f}")
  
gradient_descent(cost_function, gradient, 
                 start = np.random.randn(), learning_rate = 0.01, 
                 n_iter = 1000)
Iteration 0: x = 1.2439, cost = -45.1770
Iteration 10: x = 0.4142, cost = -55.6721
Iteration 20: x = -0.1426, cost = -59.4457
Iteration 30: x = -0.5163, cost = -60.5056
Iteration 40: x = -0.7671, cost = -60.5538
Iteration 50: x = -0.9354, cost = -60.2875
Iteration 60: x = -1.0483, cost = -59.9742
Iteration 70: x = -1.1241, cost = -59.7034
Iteration 80: x = -1.1750, cost = -59.4944
Iteration 90: x = -1.2091, cost = -59.3419
Iteration 100: x = -1.2320, cost = -59.2340
Iteration 110: x = -1.2474, cost = -59.1590
Iteration 120: x = -1.2577, cost = -59.1077
Iteration 130: x = -1.2646, cost = -59.0727
Iteration 140: x = -1.2693, cost = -59.0489
Iteration 150: x = -1.2724, cost = -59.0329
Iteration 160: x = -1.2745, cost = -59.0221
Iteration 170: x = -1.2759, cost = -59.0149
Iteration 180: x = -1.2768, cost = -59.0100
Iteration 190: x = -1.2775, cost = -59.0067
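
One simple use of the stored cost_values is to plot the cost against the iteration number to see how quickly the descent flattens out. The sketch below assumes a variant of the function above that returns the list instead of printing the progress; the name gradient_descent_history is mine.

def gradient_descent_history(f, deriv, start, learning_rate, n_iter): 
  # same loop as above, but the cost history is returned instead of printed
  x = start
  cost_values = []
  for i in range(n_iter): 
    cost_values.append(f(x))
    x = x - learning_rate * deriv(x)
  return cost_values

cost_values = gradient_descent_history(cost_function, gradient, 
                                       start = 15, learning_rate = 0.01, 
                                       n_iter = 200)

plt.clf()
plt.plot(cost_values)
plt.xlabel('iteration')
plt.ylabel('cost_function(x)')
plt.title('Cost value at each iteration')
plt.show()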

We can also visualize how the descent moves along the function itself. For that, we make our function return the list of successive x values (note that the cost function is also redefined below with a coefficient of 0.91 instead of 3.91).

# the original function 
def cost_function(x): 
  return(0.91*x**2 + 5*x - 59)

# the derivative of our function: f'(x) = 1.82x + 5 
def gradient(x): 
  return(1.82*x + 5)

def gradient_descent(f, deriv, start, learning_rate, n_iter): 
  x = start
  cost_values = []
  x_list = [start]
  
  for i in range(n_iter): 
    cost = f(x)
    grad = deriv(x)
    x = x - learning_rate * grad
    cost_values.append(cost)
    x_list.append(x)
    print(f"Iteration {i}: x = {x:.4f}, cost = {cost:.4f}")
      
  return(x_list)

x_list = gradient_descent(cost_function, gradient, 
                          start = 7, learning_rate = 0.3, 
                          n_iter = 5)
Iteration 0: x = 1.6780, cost = 20.5900
Iteration 1: x = -0.7382, cost = -48.0477
Iteration 2: x = -1.8351, cost = -62.1951
Iteration 3: x = -2.3332, cost = -65.1111
Iteration 4: x = -2.5593, cost = -65.7121
x = np.linspace(-10, 10, 50)
y = cost_function(x)

plt.clf()
plt.plot(x, y)
for i in range(len(x_list) - 1):
    x1 = x_list[i]
    y1 = cost_function(x1)
    x2 = x_list[i + 1]
    y2 = cost_function(x2)
    plt.plot([x1, x2], [y1, y2], 'ro--')
    plt.text(x1 + 0.5, y1 - 2, round(y1, 2))
# Label the final cost value
x_final = x_list[-1]
y_final = cost_function(x_final)
plt.text(x_final, y_final - 5, round(y_final, 2))

plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Gradient Descent for f(x)')
plt.show()

Stochastic Gradient Descent