import numpy as np
import matplotlib.pyplot as plt
# the original function
def cost_function(x):
    return(3.91*x**2 + 5*x - 59)

# the derivative of our function: d/dx (3.91x**2 + 5x - 59) = 7.82x + 5
def gradient(x):
    return(7.82*x + 5)
cost_function(4.5)
# quick visual check
x = np.linspace(-10, 10, 1000)
y = cost_function(x)

plt.clf()
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('cost_function(x)')
plt.title('Plot of the cost_function(x) vs x')
plt.show()
Gradient Descent is an optimization technique used in many Machine Learning algorithms to find the minimum of a function. It requires a convex and differentiable function to guarantee that the minimum we find is the global one. At its core, it’s like searching for the lowest valley in a hilly landscape: we take small steps in the steepest downhill direction until we reach the lowest point.
Gradient Descent is used to find the parameters that minimize a cost function. Think of the coefficients in linear regression, for instance.
Basic Gradient Descent
One of the main disadvantages of gradient descent is getting stuck in a local minimum or at a saddle point and never reaching the global minimum.
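To see this pitfall concretely, here is a small sketch. The function `w(x) = x**4 - 3*x**2 + x` and both starting points are my own illustrative choices: the same descent loop lands in a different valley depending on where it starts.

```python
def w(x):
    return x**4 - 3*x**2 + x      # non-convex: two valleys

def w_prime(x):
    return 4*x**3 - 6*x + 1       # its derivative

def descend(start, lr=0.01, n_iter=2000):
    x = start
    for _ in range(n_iter):
        x = x - lr * w_prime(x)
    return x

# two different starting points end up in two different minima
left = descend(start=-2.0)    # left valley, near x = -1.30 (the global minimum here)
right = descend(start=2.0)    # right valley, near x = 1.13 (only a local minimum)
```

Only the starting point differs between the two runs, yet the second one never discovers the deeper valley on the left.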
To perform a gradient descent, we need:
- a function
- its derivative
- a starting point (where do we start going down)
- the size of steps (aka learning rate)
- the number of iterations
- optionally, we can set a threshold for when we stop the iterative process of going down the hill.
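That optional threshold can be wired in as a simple early-stopping test: stop as soon as the proposed step becomes smaller than a tolerance. A minimal sketch, where `f_prime`, `gradient_descent_tol` and `tol` are illustrative names of my own:

```python
def gradient_descent_tol(f_prime, start, learning_rate, n_iter, tol=1e-6):
    x = start
    for i in range(n_iter):
        step = learning_rate * f_prime(x)
        if abs(step) < tol:   # the step is tiny: we are numerically at the bottom
            print(f"Converged after {i} iterations")
            break
        x = x - step
    return x

# example with f(x) = x**2, whose derivative is 2x; the true minimum is at x = 0
x_min = gradient_descent_tol(lambda x: 2*x, start=10, learning_rate=0.1, n_iter=10_000)
```

With the threshold in place, the loop exits after a few dozen iterations instead of running all 10,000.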
\[f(x) = 3.91x^2 + 5x - 59\]
We can now put everything together and define our gradient descent function.
\[x_{n+1} = x_n - \text{Gradient} \cdot \text{Learning Rate}\]
def gradient_descent(f, deriv, start, learning_rate, n_iter):
    x = start
    for i in range(n_iter):
        grad = deriv(x)
        # we now update x
        x = x - learning_rate * grad
    print(f"Minimum value of x: {x:.2f}")
    print(f"Minimum value for our Cost function: {f(x):.3f}")

gradient_descent(cost_function, gradient, 15, 0.01, 10000)
Minimum value of x: -0.64
Minimum value for our Cost function: -60.598
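For a quadratic we can sanity-check this result by hand: the minimum of ax² + bx + c sits at x = -b/(2a), with value c - b²/(4a). A quick check for our function:

```python
a, b, c = 3.91, 5, -59

x_min = -b / (2 * a)                   # analytic minimizer
f_min = a * x_min**2 + b * x_min + c   # value at the minimum

print(f"{x_min:.2f}")   # -0.64
print(f"{f_min:.3f}")   # -60.598
```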
We can change our code slightly to store the iterations for visualization or analysis.
def gradient_descent(f, deriv, start, learning_rate, n_iter):
    x = start
    # initialize a list to store values
    cost_values = []

    for i in range(n_iter):
        cost = f(x)
        grad = deriv(x)
        # update of x
        x = x - learning_rate * grad
        # append the value of cost to the list
        cost_values.append(cost)
        # print the progress
        if i % 10 == 0 and i < 200:
            print(f"Iteration {i}: x = {x:.4f}, cost = {cost:.4f}")

gradient_descent(cost_function, gradient, start = np.random.randn(),
                 learning_rate = 0.01, n_iter = 1000)
Iteration 0: x = -0.1305, cost = -59.3916
Iteration 10: x = -0.5082, cost = -60.4952
Iteration 20: x = -0.7616, cost = -60.5584
Iteration 30: x = -0.9317, cost = -60.2958
Iteration 40: x = -1.0459, cost = -59.9822
Iteration 50: x = -1.1225, cost = -59.7098
Iteration 60: x = -1.1739, cost = -59.4992
Iteration 70: x = -1.2084, cost = -59.3453
Iteration 80: x = -1.2315, cost = -59.2364
Iteration 90: x = -1.2471, cost = -59.1607
Iteration 100: x = -1.2575, cost = -59.1088
Iteration 110: x = -1.2645, cost = -59.0734
Iteration 120: x = -1.2692, cost = -59.0495
Iteration 130: x = -1.2723, cost = -59.0333
Iteration 140: x = -1.2745, cost = -59.0224
Iteration 150: x = -1.2759, cost = -59.0150
Iteration 160: x = -1.2768, cost = -59.0101
Iteration 170: x = -1.2775, cost = -59.0068
Iteration 180: x = -1.2779, cost = -59.0046
Iteration 190: x = -1.2782, cost = -59.0031
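The learning rate decides whether this loop converges at all. For our quadratic, each update multiplies the distance to the minimum by (1 - learning_rate × 7.82), so any rate above 2/7.82 ≈ 0.256 makes the iterates oscillate with growing amplitude. A small sketch, with rates of my own choosing:

```python
def run(learning_rate, start=15, n_iter=50):
    x = start
    for _ in range(n_iter):
        x = x - learning_rate * (7.82 * x + 5)   # derivative of 3.91x**2 + 5x - 59
    return x

small = run(0.01)   # still creeping toward the minimum at x = -0.64
good = run(0.1)     # converges quickly
big = run(0.3)      # diverges: |1 - 0.3 * 7.82| > 1
```

After 50 steps the 0.1 run has settled on the minimum, the 0.01 run is still on its way there, and the 0.3 run has blown up to values in the millions.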
We can visualize how the process unfolds. For that, we’ll have our function return the list of x values visited along the way.
# the original function
def cost_function(x):
    return(0.91*x**2 + 5*x - 59)

# the derivative of our function: d/dx (0.91x**2 + 5x - 59) = 1.82x + 5
def gradient(x):
    return(1.82*x + 5)

def gradient_descent(f, deriv, start, learning_rate, n_iter):
    x = start
    cost_values = []
    x_list = [start]

    for i in range(n_iter):
        cost = f(x)
        grad = deriv(x)
        x = x - learning_rate * grad

        cost_values.append(cost)
        x_list.append(x)
        print(f"Iteration {i}: x = {x:.4f}, cost = {cost:.4f}")
    return(x_list)

x_list = gradient_descent(cost_function, gradient, start = 7,
                          learning_rate = 0.3, n_iter = 5)
x = np.linspace(-10, 10, 50)
y = cost_function(x)

plt.clf()
plt.plot(x, y)

for i in range(len(x_list) - 1):
    x1 = x_list[i]
    y1 = cost_function(x1)
    x2 = x_list[i + 1]
    y2 = cost_function(x2)
    plt.plot([x1, x2], [y1, y2], 'ro--')
    plt.text(x1 + 0.5, y1 - 2, round(y1, 2))

# Label the final cost value
x_final = x_list[-1]
y_final = cost_function(x_final)
plt.text(x_final, y_final - 5, round(y_final, 2))

plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Gradient Descent for f(x)')
plt.show()
Iteration 0: x = 1.6780, cost = 20.5900
Iteration 1: x = -0.7382, cost = -48.0477
Iteration 2: x = -1.8351, cost = -62.1951
Iteration 3: x = -2.3332, cost = -65.1111
Iteration 4: x = -2.5593, cost = -65.7121
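Five steps stop well short of the bottom. Running the same update until it settles, we can check the result against the analytic vertex -b/(2a) = -5/1.82:

```python
a, b = 0.91, 5

x = 7.0
for _ in range(200):
    x = x - 0.3 * (2*a*x + b)   # derivative of a*x**2 + b*x - 59 is 2*a*x + b

print(round(x, 4))              # -2.7473
print(round(-b / (2*a), 4))     # -2.7473
```

Both agree: gradient descent converges to the same minimizer the closed-form formula gives.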