# Linear Regression in Machine Learning Explained

## What is Linear Regression?

Linear regression is a statistical method that allows us to summarize and study the relationships between quantitative variables. Essentially, linear regression helps us model how changes in one or more inputs affect the output. The output is usually a continuous variable, such as time, price, or height. Linear regression is very common in machine learning. This article explains the difference between the statistics and the machine learning notations.

## Why is Linear Regression Important?

With linear regression we estimate the relationship between an output (dependent variable) and one or more inputs (independent variables) in order to help us make predictions. For example, a doctor might be interested in predicting whether a cancer will recur in a patient, and how long it will take until the disease recurs, given the characteristics of the tumor.

## Definition

Let’s have a look at the jargon related to linear regression:

1. Predictors or Independent Variables or Inputs
2. Response or Dependent Variable or Output
3. Residual – The difference between the observed (real) value and the predicted value of the dependent variable.
4. Coefficients or Weights – These are calculated to determine the line that best “fits” the data. In stats, usually denoted as β. In ML these are usually denoted as W or θ.
5. Bias or Intercept – It helps offset the effects of missing relevant predictors for the response and helps make the mean of the residuals 0. The intercept or bias acts as the default value for the function, i.e. the predicted output when all independent variables are zero.

The term “linear” refers to a line, in this case a straight line. Intuitively, we know that a line indicates there is a relationship between inputs and outputs. The slope of the line provides a rate for the change on the output given the input.

To help us understand better how Linear Regression works, let’s try modeling the relationship between a single predictor and a response i.e. Simple Linear Regression. We’ll look at the relationship between tumor radius (predictor) and time to recurrence (response).

### Y = W0 + W1・x1 + ε

The above equation looks very similar to an equation of a line (y = c + m・x) where

W0 is equivalent to the y-intercept (c) and W1 is equivalent to the slope of the line (m) modeling the relationship. The machine learning model would be:

hw(x) = W0 + W1・x1

The function hw(x) will predict the time it takes for a cancer to return based on the tumor radius x1 with weight W1.
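As a sketch, the model hw(x) = W0 + W1・x1 is just a line evaluated at an input. The weight values below are made up for illustration, not fitted to any real tumor data:

```python
def predict(x1, w0, w1):
    """Predict the response for a single input x1, given bias w0 and slope w1."""
    return w0 + w1 * x1

# Hypothetical weights: intercept 5.0, slope 2.0
w0, w1 = 5.0, 2.0
print(predict(3.0, w0, w1))  # 5.0 + 2.0 * 3.0 = 11.0
```

Learning the model means choosing w0 and w1 from data, which is what the rest of the article covers.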

## Optimizing the Best Fit Line

The goal of applying Linear Regression in Machine Learning is to select the line of best fit, i.e. a line that is as close as possible to the data points. The line of best fit is found by minimizing a cost function such as the Mean Squared Error (MSE) function.

## Mean Squared Error Explained

The mean squared error (MSE) is calculated using the following equation:

### MSE = (1/N)・Σ (Yi − Ŷi)², for i = 1 … N

where N is the number of data points (training samples), Yi is the observed (real) value and Ŷi is the predicted value. Thus, MSE can be thought of as the mean of the sum of all the squared residuals.
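The MSE computation can be sketched in plain Python (the observed and predicted values below are made up):

```python
def mse(y_true, y_pred):
    """Mean of the squared residuals between observed and predicted values."""
    n = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n

y_true = [3.0, 5.0, 7.0]  # observed values
y_pred = [2.5, 5.0, 8.0]  # predicted values
print(mse(y_true, y_pred))  # (0.25 + 0.0 + 1.0) / 3 ≈ 0.4167
```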

## Cost/Error Function for Machine Learning

In Machine Learning, we use a Cost Function that is very similar to the MSE and is denoted as:

### J(W) = (1/2m)・Σ (hw(xi) − Yi)², for i = 1 … m

Essentially, we want to find the optimal W that minimizes the error function J(W). The number of examples in the training set is “m”. For each training example we subtract Yi, the actual value, from hw(xi), the predicted value based on our model, and square this difference. We add all of these terms together, and the extra factor of 1/2 is there just to simplify the derivations that follow with gradient descent.

This minimization of the cost function occurs through an algorithm known as gradient descent. The gradient descent algorithm can be understood by imagining you are at the top of a hill and need to reach the bottom. Sounds easy! But what if you are blindfolded? You would most likely probe around you to find a downhill slope and move in that direction. Essentially this is what gradient descent does: it tries to reach the bottom of the hill, i.e. the minimum error, by computing the derivatives of the cost function and moving in the direction that decreases it.

## Gradient Descent, Regularization and more

For a deeper understanding of how gradient descent operates, along with clear, technically explained examples of linear regression in machine learning, enroll in the “Learn AI with an AI Course” with Audrey Durand.

## Machine Learning in Healthcare Series

In this series of articles we explore the use of machine learning in the healthcare industry. Important concepts and algorithms are covered with real applications, including open datasets, open code (available on GitHub) used to run the analyses, and the final results and insights.

## Subscribe to the Korbit Newsletter

Stay up to date with news, blog posts, code examples and new courses. The newsletter goes out every month.