**What is Linear Regression?**

Linear regression is a statistical method that allows us to summarize and study the relationships between quantitative variables. Essentially, linear regression helps us model how changes in one or more inputs affect the output. The output is usually a continuous variable, such as time, price, or height. Linear regression is very common in machine learning, and this article also explains the differences between the statistics and machine learning notations.

**Why is Linear Regression Important?**

With linear regression we estimate the relationship between an output (dependent variable) and one or more inputs (independent variables) in order to help us make predictions. For example, a doctor might be interested in predicting whether a cancer will recur in a patient, and **how long it will take until the disease recurs**, given the characteristics for the type of tumor.

**Definition**

Let’s have a look at the jargon related to linear regression:

- **Predictors** or Independent Variables or **Inputs**
- **Response** or Dependent Variable or **Output**
- **Residual** – The difference between the observed (real) value and the predicted value of the dependent variable.
- Coefficients or **Weights** – These are calculated to determine the line that best “fits” the data. In stats, usually denoted as β. In ML, these are usually denoted as W or θ.
- Bias or **Intercept** – It helps offset the effects of missing relevant predictors for the response and helps make the mean of the residuals 0. The intercept or bias acts as the default value for the function, i.e. when all independent variables are zero.
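To make these terms concrete, here is a minimal sketch with made-up numbers, showing a single predictor, a prediction produced by a hypothetical weight and intercept, and the resulting residual:

```python
# Hypothetical example: one predictor (tumor radius, cm) and one response
# (time to recurrence, months). All numbers are made up for illustration.

intercept = 10.0   # bias / intercept (W0): the default prediction when x = 0
weight = 2.5       # coefficient / weight (W1) for the predictor

x = 4.0            # predictor (independent variable): tumor radius
y_observed = 21.0  # response (dependent variable): observed recurrence time

y_predicted = intercept + weight * x  # model's prediction: 10 + 2.5 * 4 = 20.0
residual = y_observed - y_predicted   # observed minus predicted = 1.0

print(y_predicted, residual)  # 20.0 1.0
```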

**Let’s Start With A Simple Model**

The term “linear” refers to a **line**, in this case a straight line. Intuitively, we know that a line indicates there is a relationship between inputs and outputs. The slope of the line provides the rate of change of the output with respect to the input.

To help us understand better how Linear Regression works, let’s try modeling the relationship between a single predictor and a response i.e. Simple Linear Regression. We’ll look at the relationship between **tumor radius (predictor)** and **time to recurrence (response)**.

**Y = W**_{0}** + W**_{1}**・X**_{1}** + ε**

where:

- **W**_{0} is the intercept
- **W**_{1} is the weight (coefficient) of the predictor
- **X**_{1} is the predictor (tumor radius)
- **ε** is the error term

The above equation looks very similar to an equation of a line (y = c + m・x) where

W_{0} is equivalent to the y-intercept (c) and W_{1} is equivalent to the slope of the line (m) modeling the relationship. The machine learning model would be:

**h**_{w}**(x) = W**_{0}** + W**_{1}**・x**_{1}

The function **h**_{w}**(x)** will predict the time it takes for a cancer to return based on the tumor radius **X**_{1} with weight **W**_{1}.

**Optimizing the Best Fit Line**

The goal of applying Linear Regression in Machine Learning is to select the line of best fit, i.e. a line that is as close as possible to the data points. The line of best fit is found by minimizing a cost function such as the Mean Squared Error (MSE).
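As a sketch of what “best fit” means, the simple one-predictor case even has a closed-form least-squares solution. The data below is made up for illustration:

```python
import numpy as np

# Made-up data: tumor radius (cm) vs. time to recurrence (months).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([24.0, 20.5, 17.0, 14.5, 10.0])

# Closed-form least-squares solution for simple linear regression:
# W1 = cov(x, y) / var(x), W0 = mean(y) - W1 * mean(x).
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()

print(w0, w1)  # the intercept and slope of the best-fit line
```

These two formulas give exactly the line that minimizes the MSE for one predictor; with many predictors, iterative methods like gradient descent (covered below) are typically used instead.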

**Mean Squared Error** Explained

The mean squared error (MSE) is calculated using the following equation:

**MSE = (1/N)・Σ (Y**_{i}** − Ŷ**_{i}**)²**

where N is the number of data points (training samples), **Y**_{i} is the observed value and **Ŷ**_{i} is the predicted value. Thus, MSE can be thought of as the mean of all the squared residuals.
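As a quick sketch (with made-up observed and predicted values), MSE is simply the average of the squared residuals:

```python
import numpy as np

# Hypothetical observed values and model predictions.
y_actual = np.array([24.0, 20.5, 17.0, 14.5, 10.0])
y_predicted = np.array([24.0, 21.0, 17.6, 14.2, 10.8])

residuals = y_actual - y_predicted  # observed minus predicted
mse = np.mean(residuals ** 2)       # mean of the squared residuals
print(mse)
```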

**Cost/Error Function** for Machine Learning

In Machine Learning, we use a Cost Function that is very similar to the MSE and is denoted as:

**J(W) = (1/2m)・Σ (h**_{w}**(x**_{i}**) − Y**_{i}**)²**

Essentially, **we want to find the optimal W** to minimize the error function J(W). The number of examples in the training set is **“m”**. We subtract **Y**_{i}, the actual value, from **h**_{w}**(x**_{i}**)**, the predicted value based on our model, and square this number. We add all of these numbers together and take the 1/2 simply to simplify the derivatives that will follow with gradient descent.

The cost function is minimized using an algorithm known as gradient descent. Gradient descent can be understood by imagining you are at the top of a hill and need to reach the bottom. Sounds easy! But what if you are blindfolded? You would most likely probe around you to find a downhill slope and move in that direction to reach the bottom of the hill. Essentially, this is what gradient descent does. It **tries to reach the bottom of the hill, i.e. the minimum error**, by calculating the derivatives of the cost function and moving in the direction that reduces the error.
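The hill-descending procedure above can be sketched for the simple one-predictor model; the data, learning rate, and iteration count are made up for illustration:

```python
import numpy as np

# Made-up training data: tumor radius (cm) vs. time to recurrence (months).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([24.0, 20.5, 17.0, 14.5, 10.0])
m = len(x)

w0, w1 = 0.0, 0.0    # start from arbitrary weights
learning_rate = 0.05

for _ in range(20000):
    predictions = w0 + w1 * x
    error = predictions - y            # h_w(x_i) - Y_i for every sample
    # Partial derivatives of J(W) = (1/2m) * sum(error^2);
    # the 1/2 cancels the 2 from differentiating the square.
    grad_w0 = np.sum(error) / m
    grad_w1 = np.sum(error * x) / m
    # Step downhill, i.e. opposite to the gradient.
    w0 -= learning_rate * grad_w0
    w1 -= learning_rate * grad_w1

print(round(w0, 3), round(w1, 3))
```

With enough iterations the weights converge to the same line the closed-form least-squares solution would give for this data.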

**Gradient Descent, Regularization and more**

For a deeper understanding of how gradient descent operates, along with clear, technically explained examples of linear regression in machine learning, enroll in the “Learn AI with an AI Course” with Audrey Durand.

**Machine Learning in Healthcare Series**

In this series of articles we explore the use of machine learning in the healthcare industry. We cover important concepts and algorithms alongside real applications, including open datasets, open code (available on GitHub) used to run the analyses, and the final results and insights.

## Series Articles

- Applications of Machine Learning in Healthcare
- (Coming Soon) Predicting Cancer Recurrence Outcome With Machine Learning (Including Code)
- (Coming Soon) Predicting Cancer Recurrence Time With Machine Learning (Including Code)
- (Coming Soon) How to Run Code on Google Colab
- (Coming Soon) Classification in Machine Learning Explained