Gradient descent is one of the fundamental algorithms in machine learning; implementing it by hand is a must if you want to master machine learning and deep learning. In this post, we write simple code to implement linear regression using gradient descent.
Gradient Descent
An intuitive way to explain gradient descent is to imagine a mountain and a snowboarder's path down it. The snowboarder's goal is to reach the bottom, which is exactly what gradient descent strives to achieve.
But some terrains have more than one low point: they can have multiple local minima but only a single global minimum.
Suppose you were told to get to the lowest point possible. Between every two hills is a low point, but not necessarily the lowest one. Once your gradient descent has started, it has no power to take you back up over the next hill; to descend the second hill, you have to start a completely new descent.
In this post we will implement linear regression using gradient descent. The main idea is that in each iteration we calculate the error, and based on it we find new values of the y-intercept (b) and slope (m) that fit our data better than before.
To learn more about gradient descent:

- Gradient Descent (notebook code)
- Gradient Descent For Machine Learning
- Quora: Gradient Descent
- An Introduction to Gradient Descent and Linear Regression
Implementation of Gradient Descent
Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function f that minimize a cost function. So let's implement the idea we discussed above. We will write gradient descent by hand, without any external machine learning libraries, to understand the intuition behind it, since gradient descent is considered a base algorithm for machine learning.
Import Libraries
We only use numpy for the math functions in our code and matplotlib for plotting the data.
from numpy import genfromtxt, array, float64
import matplotlib.pyplot as plt
Load the dataset
The data is in the CSV file data.csv. After reading it from the file, we save it in a variable called points so we can use it afterwards.
points = genfromtxt("input/data.csv", delimiter=",")
print(points[0:10, :])
print('Shape: {0}'.format(points.shape))
Initialize hyperparameters
Next, we initialize all the hyperparameters with starting values so we can run the algorithm.
b = 0
m = 0
learning_rate = 0.0001  # alpha
num_iterations = 1000  # number of iterations
Error function
We compute the mean squared error by calculating the difference between the actual output y and the predicted output ŷ.
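Written out, the mean squared error over the N data points is:

```latex
E(b, m) = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - (m x_i + b) \bigr)^2
```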
def ComputerError(b, m, points):
    totalError = 0
    for i in range(len(points)):
        x = points[i, 0]
        y = points[i, 1]
        totalError += (y - (m * x + b)) ** 2
    return totalError / float(len(points))
Gradient Descent
This method updates the y-intercept and slope by computing the partial derivatives of the error with respect to the y-intercept and the slope. For this we use the equations below.
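The partial derivatives of the mean squared error with respect to b and m are:

```latex
\frac{\partial E}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \bigl( y_i - (m x_i + b) \bigr), \qquad
\frac{\partial E}{\partial m} = -\frac{2}{N} \sum_{i=1}^{N} x_i \bigl( y_i - (m x_i + b) \bigr)
```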
def GetGradientDescent(b_current, m_current, points, learning_rate):
    b_gradient = 0
    m_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        b_gradient += -(2/N) * (y - ((m_current * x) + b_current))
        m_gradient += -(2/N) * x * (y - ((m_current * x) + b_current))
    new_b = b_current - (learning_rate * b_gradient)
    new_m = m_current - (learning_rate * m_gradient)
    return [new_b, new_m]
After writing a method to compute the new slope and y-intercept, we run the gradient descent algorithm to fit the line.
def RunAlgorithm(starting_b, starting_m, learning_rate, num_iterations, points):
    b = starting_b
    m = starting_m
    for i in range(num_iterations):
        b, m = GetGradientDescent(b, m, array(points), learning_rate)
        error = ComputerError(b, m, points)
        if error < float64(2.0):  # early stop once the fit is good enough
            print(error)
            return [b, m]
        if i % 100 == 0:
            print("After {0} iterations b = {1}, m = {2}, error = {3}".format(i, b, m, error))
    return [b, m]
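To see the whole pipeline run end to end, here is a self-contained sketch of the same batch update written in vectorized form. Since data.csv may not be at hand, it generates synthetic points scattered around the line y = 1.5x + 4; the data, seed, iteration count, and learning rate here are illustrative assumptions, not values from the post.

```python
import numpy as np

# Synthetic stand-in for data.csv (an assumption for this sketch):
# 100 points scattered around the line y = 1.5x + 4.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 1.5 * x + 4 + rng.normal(0, 0.5, 100)
points = np.column_stack([x, y])

# Same update rule as GetGradientDescent, vectorized over all points.
b, m = 0.0, 0.0
learning_rate = 0.01
for _ in range(20000):
    residual = points[:, 1] - (m * points[:, 0] + b)
    b_gradient = -(2 / len(points)) * residual.sum()
    m_gradient = -(2 / len(points)) * (points[:, 0] * residual).sum()
    b -= learning_rate * b_gradient
    m -= learning_rate * m_gradient

print("b = {:.2f}, m = {:.2f}".format(b, m))
```

With these settings the recovered b and m should land close to the intercept and slope used to generate the data.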
Results
Below is a snapshot of gradient descent running for 1000 iterations on our example problem. We start at m = 0 and b = 0. In each iteration, m and b are updated to values that yield a slightly lower error than before. The plot displays the current location of the gradient descent search (red dot) and the path taken to get there (blue line). Eventually we end up with a pretty accurate fit.
Tips for Gradient Descent
This section lists some tips and tricks for getting the most out of the gradient descent algorithm for machine learning.
- Plot Cost versus Time: Collect and plot the cost values calculated by the algorithm on each iteration. The expectation for a well-performing gradient descent run is a decrease in cost on each iteration. If the cost does not decrease, try reducing your learning rate.
- Learning Rate: The learning rate is a small real value such as 0.1, 0.001, or 0.0001. Try different values for your problem and see which works best.
- Rescale Inputs: The algorithm will reach the minimum cost faster if the shape of the cost function is not skewed and distorted. You can achieve this by rescaling all of the input variables (X) to the same range, such as [0, 1] or [-1, 1].
- Few Passes: Stochastic gradient descent often does not need more than 1 to 10 passes through the training dataset to converge on good, or good enough, coefficients.
- Plot Mean Cost: The updates for each training instance can result in a noisy plot of cost over time when using stochastic gradient descent. Taking the average over the last 10, 100, or 1000 updates gives a better idea of the learning trend.
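The last two tips can be sketched together as follows: a minimal stochastic gradient descent loop that updates on one (x, y) pair at a time and logs the mean cost over a window of recent updates. The function name `sgd_fit` and its parameters are illustrative choices for this sketch, not part of the post's code above.

```python
import random

def sgd_fit(points, learning_rate=0.05, epochs=10, log_every=100):
    """Stochastic gradient descent sketch: one (x, y) pair per update."""
    data = list(points)            # copy so the caller's list is not shuffled
    b, m = 0.0, 0.0
    costs, mean_costs = [], []
    step = 0
    for _ in range(epochs):
        random.shuffle(data)       # one epoch = one shuffled pass over the data
        for x, y in data:
            residual = y - (m * x + b)
            # Per-sample gradient step (cost = residual**2, hence the factor of 2)
            b += learning_rate * 2 * residual
            m += learning_rate * 2 * residual * x
            costs.append(residual ** 2)
            step += 1
            if step % log_every == 0:
                # Averaging the last `log_every` costs smooths the noisy SGD curve
                mean_costs.append(sum(costs[-log_every:]) / log_every)
    return b, m, mean_costs
```

Plotting `mean_costs` instead of the raw per-update costs gives the smoother trend the last tip describes.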
Code for this can be found here