Linear Regression in Machine Learning Explained

What is Linear Regression?

Linear regression is a statistical method for summarizing and studying the relationships between quantitative variables. Essentially, it models how the output changes as one or more inputs change. The output is usually a continuous variable, such as time, price or height. Linear regression is very common in machine learning, and this article points out where the statistics and machine learning notations differ.

Why is Linear Regression Important?

With linear regression we estimate the relationship between an output (dependent variable) and one or more inputs (independent variables) in order to make predictions. For example, a doctor might be interested in predicting whether a cancer will recur in a patient and how long it will take until the disease recurs, given the characteristics of the tumor. The second question, the time to recurrence, has a continuous output and is the kind of quantity linear regression predicts.


Let’s have a look at the jargon related to linear regression:

This figure represents an example of the linear regression equation.
  1. Predictors or Independent Variables or Inputs
  2. Response or Dependent Variable or Output
  3. Residual – The difference between the observed (real) value and the predicted value of the dependent variable.
  4. Coefficients or Weights – These are calculated to determine the line that best “fits” the data. In statistics they are usually denoted as β; in ML they are usually denoted as W or θ.
  5. Bias or Intercept – It helps offset the effects of relevant predictors that are missing from the model and helps make the mean of the residuals 0. The intercept or bias acts as the default value of the function, i.e. its value when all independent variables are zero.

Let’s Start With A Simple Model

The term “linear” refers to a line, in this case a straight line. Fitting a straight line assumes that the output changes at a constant rate as the input changes. The slope of the line gives that rate: the change in the output per unit change in the input.

To help us understand better how Linear Regression works, let’s try modeling the relationship between a single predictor and a response i.e. Simple Linear Regression. We’ll look at the relationship between tumor radius (predictor) and time to recurrence (response).

Linear regression equation showing the relationship between the tumor radius and the cancer time to recurrence. The bigger the tumor radius, the faster cancer recurrence.

Y = W0 + W1・X1 + ε

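The above equation looks very similar to the equation of a line (y = c + m・x), where W0 is equivalent to the y-intercept (c) and W1 is equivalent to the slope of the line (m) modeling the relationship. The term ε is the error, the part of Y that the line does not explain. The machine learning model, which outputs predictions and therefore drops ε, would be: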

hw(x) = W0 + W1・x1

The function hw(x) predicts the time it takes for a cancer to return based on the tumor radius x1, which is scaled by the weight W1.
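The hypothesis function can be written in a few lines of Python. This is a minimal sketch: the function name `predict_recurrence_time`, the units, and the weight values are hypothetical and chosen only for illustration; they are not fitted to any real data.

```python
def predict_recurrence_time(radius, w0, w1):
    """Hypothesis h_w(x) = W0 + W1 * x1 for a single predictor (tumor radius)."""
    return w0 + w1 * radius


# Hypothetical weights for illustration only, not fitted to real data.
w0, w1 = 60.0, -2.0

# A negative W1 encodes "the bigger the radius, the sooner the recurrence".
predicted = predict_recurrence_time(15.0, w0, w1)
print(predicted)
```

When the radius is zero the prediction equals W0, which is why the bias is described as the default value of the function.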

Optimizing the Best Fit Line

The goal of applying Linear Regression in Machine Learning is to select the line of best fit, i.e. a line that is as close as possible to the data points. The line of best fit is found by minimizing a cost function such as the Mean Squared Error (MSE).
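To make "line of best fit" concrete, here is a small sketch for the single-predictor case. It uses the closed-form least-squares formulas for the slope and intercept, which is a swapped-in shortcut for one predictor rather than the gradient descent route this article follows. The data are synthetic toy numbers, not clinical measurements, and the helper names `fit_line` and `mse` are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = w0 + w1 * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    w1 = s_xy / s_xx
    w0 = mean_y - w1 * mean_x
    return w0, w1


def mse(xs, ys, w0, w1):
    """Mean squared error of the line y = w0 + w1 * x on the data."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic toy data: tumor radius and time to recurrence (made-up values).
radius = [10.0, 12.0, 14.0, 16.0, 18.0]
months = [41.0, 35.0, 33.0, 27.0, 24.0]

w0, w1 = fit_line(radius, months)
best = mse(radius, months, w0, w1)
nudged = mse(radius, months, w0 + 1.0, w1)  # same slope, shifted intercept

print(w0, w1)
print(best, nudged)
```

Shifting the intercept away from the fitted value increases the error, which is what "minimizing the cost function" means in practice.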

Mean Squared Error Explained

Figure shows the mean squared error (MSE) graph.

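The mean squared error (MSE) is calculated using the following equation:

MSE = (1/N)・Σ (Yi − hw(xi))², summing over i = 1 to N

where N is the number of data points (training samples), Yi is the observed value and hw(xi) is the predicted value for the i-th data point. Thus, the MSE can be thought of as the mean of the squared residuals.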
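As a quick sketch of that calculation, the function below computes the MSE from a list of observed values and a list of predictions. The name `mean_squared_error` and the sample values are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    """Mean of the squared residuals over N data points."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Residual = observed value minus predicted value.
    residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
    return sum(r ** 2 for r in residuals) / len(residuals)


# Made-up values for illustration only.
actual = [40.0, 36.0, 30.0]
predicted = [38.0, 36.0, 33.0]

# Residuals are 2, 0 and -3, so the squared residuals are 4, 0 and 9.
mse_value = mean_squared_error(actual, predicted)
print(mse_value)
```

Squaring keeps positive and negative residuals from cancelling each other out and penalizes large misses more heavily than small ones.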

Cost/Error Function for Machine Learning

In Machine Learning, we use a Cost Function that is very similar to the MSE and is denoted as:

J(W) = (1/(2m))・Σ (hw(xi) − Yi)², summing over i = 1 to m

Essentially, we want to find the optimal W that minimizes the error function J(W). The number of examples in the training set is “m”. We subtract Yi, the actual value, from hw(xi), the predicted value based on our model, and square this number. We add all of these numbers together, average them over the m examples, and include the factor 1/2 just to simplify the derivations that will follow with gradient descent.
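To see why the 1/2 is convenient, it helps to differentiate the cost once. The derivation below is a sketch that assumes the commonly used form of the cost with a 1/(2m) factor and the single-predictor model hw(x) = W0 + W1・x1; the 2 produced by differentiating the square cancels the 1/2.

```latex
\begin{align*}
J(W) &= \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_W(x_i) - Y_i \bigr)^2,
  \qquad h_W(x_i) = W_0 + W_1 x_i \\
\frac{\partial J}{\partial W_0} &= \frac{1}{m} \sum_{i=1}^{m} \bigl( h_W(x_i) - Y_i \bigr) \\
\frac{\partial J}{\partial W_1} &= \frac{1}{m} \sum_{i=1}^{m} \bigl( h_W(x_i) - Y_i \bigr)\, x_i
\end{align*}
```

The constant factor does not change which W minimizes J(W); it only tidies up the derivatives.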

Video snapshot explaining the Least Mean Squares (LMS) function taken from the “Learn AI With an AI” course. Module 1, Lecture 4.

This minimization of the cost function occurs through an algorithm known as gradient descent. The algorithm can be understood by imagining that you are at the top of a hill and need to reach the bottom. Sounds easy! But what if you are blindfolded? You will most likely start probing around you to find a downhill slope and move in that direction to reach the bottom of the hill. Essentially this is what gradient descent does. It tries to reach the bottom of the hill, i.e. the minimum error, by repeatedly calculating the derivatives of the cost function and taking a small step in the direction in which the cost decreases.
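The blindfolded walk can be sketched in a few lines of Python for the single-predictor model. This is a minimal illustration, not a production implementation: it assumes a cost of the form J(W) = (1/(2m))・Σ (hw(xi) − Yi)², the `learning_rate` and `n_steps` values are hypothetical choices that happen to work for this toy data, and the data themselves are synthetic.

```python
def gradient_descent(xs, ys, learning_rate=0.05, n_steps=5000):
    """Fit h_w(x) = w0 + w1 * x by batch gradient descent on J(W)."""
    m = len(xs)
    w0, w1 = 0.0, 0.0  # start somewhere on the "hill"
    for _ in range(n_steps):
        # Prediction error for every training example.
        errors = [(w0 + w1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to w0 and w1.
        grad_w0 = sum(errors) / m
        grad_w1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Step downhill: move against the gradient.
        w0 -= learning_rate * grad_w0
        w1 -= learning_rate * grad_w1
    return w0, w1


# Synthetic toy data lying exactly on the line y = 11 - 2x (made-up values).
radius = [1.0, 2.0, 3.0, 4.0, 5.0]
months = [9.0, 7.0, 5.0, 3.0, 1.0]

w0, w1 = gradient_descent(radius, months)
print(w0, w1)
```

If the learning rate is too large the steps overshoot and the error grows instead of shrinking; if it is too small the walk downhill takes many more steps. On inputs with a larger range, such as unscaled measurements, a smaller learning rate or feature scaling is usually needed.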

Gradient Descent, Regularization and more

For a deeper understanding of how gradient descent operates, and for clear examples of linear regression in machine learning with technical explanations, enroll in the “Learn AI with an AI Course” with Audrey Durand.

Machine Learning in Healthcare Series

In this series of articles we explore the use of machine learning in the healthcare industry. Important concepts and algorithms are covered with real applications, including open datasets, the open code (available on GitHub) used to run the analyses, and the final results and insights.
