Top 5 Assumptions for Linear Regression

Dhiraj K
2 min read · Apr 1, 2019

Linear regression does not work well on every dataset. For a linear regression model to give reliable results, the data should satisfy at least the following five assumptions:

  1. Linear Relationship: The relationship between the independent variables and the dependent variable should be linear. This can be checked with a scatter plot of each independent variable against the dependent variable (a sketch follows this list).
  2. Multivariate Normality: All the variables together should follow a multivariate normal distribution. In particular, each variable on its own should be univariate normal (the familiar bell-shaped curve), and any subset of the variables should also be multivariate normal. Univariate normality can be checked by plotting a histogram (sketched below).
  3. No Multicollinearity: There should be little or no multicollinearity in the data. Multicollinearity occurs when the independent variables are highly correlated with one another. It can be detected with a correlation matrix (sketched below).
  4. No Autocorrelation: There should be little or no autocorrelation in the data. Autocorrelation means that successive values of a single variable (typically the residuals) are related to each other; in other words, the value at observation t+1 depends on the value at observation t. It can be checked with a scatter plot of each value against the previous one (sketched below).
  5. Homoscedasticity: The residuals should be homoscedastic. Homoscedasticity means “same variance”: the residuals should have roughly constant spread across the regression line. This can also be checked with a scatter plot of residuals against fitted values (sketched below).
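
For the linearity check in point 1, here is a minimal Python sketch using NumPy and Matplotlib. The synthetic data and the variable names `x` and `y` are hypothetical, used only to illustrate what the scatter plot would look like on your own data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: one predictor x and a response y with a roughly linear trend.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3.0 * x + 5.0 + rng.normal(0, 2.0, 200)

# Scatter plot of y against x: a roughly straight-line cloud of points
# suggests the linearity assumption is reasonable.
plt.scatter(x, y, alpha=0.5)
plt.xlabel("x (independent variable)")
plt.ylabel("y (dependent variable)")
plt.title("Linearity check: y vs. x")
plt.show()
```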
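For the normality check in point 2, a histogram (and, optionally, a Q-Q plot) of each variable is a quick first look. Again, the data below is synthetic and purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical variable to check for univariate normality.
rng = np.random.default_rng(1)
x = rng.normal(loc=50, scale=10, size=500)

# Histogram: a roughly bell-shaped curve supports univariate normality.
plt.hist(x, bins=30, edgecolor="black")
plt.title("Histogram of x")
plt.show()

# Q-Q plot: points lying close to the diagonal line also support normality.
stats.probplot(x, dist="norm", plot=plt)
plt.show()
```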
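For the multicollinearity check in point 3, a correlation matrix of the independent variables highlights pairs that move together. The predictors `x1`, `x2`, `x3` below are hypothetical; `x2` is deliberately built from `x1` so the matrix shows what a problematic correlation looks like.

```python
import numpy as np
import pandas as pd

# Hypothetical predictors: x2 is constructed from x1, so the two are highly correlated.
rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=300)   # nearly collinear with x1
x3 = rng.normal(size=300)                          # independent of the others

df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Pairwise correlation matrix: off-diagonal values close to +/-1
# flag potential multicollinearity (here x1 vs. x2).
print(df.corr())
```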
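For the autocorrelation check in point 4, a lag plot (each value against the previous one) makes serial dependence visible. The residual series below is simulated with deliberate autocorrelation so the pattern is easy to see.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical residuals with built-in autocorrelation: each value depends on the previous one.
rng = np.random.default_rng(3)
resid = np.zeros(300)
for t in range(1, 300):
    resid[t] = 0.8 * resid[t - 1] + rng.normal()

# Lag plot: residual at t+1 against residual at t.
# A visible trend indicates autocorrelation; a shapeless cloud suggests none.
plt.scatter(resid[:-1], resid[1:], alpha=0.5)
plt.xlabel("residual at t")
plt.ylabel("residual at t+1")
plt.title("Autocorrelation check (lag plot)")
plt.show()
```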
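Finally, for the homoscedasticity check in point 5, plot the residuals against the fitted values. The sketch below fabricates residuals whose spread grows with the fitted value, so it shows what a violation (heteroscedasticity) looks like; on well-behaved data you would expect a constant band around zero instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical fit: residuals whose spread grows with the fitted value (heteroscedastic on purpose).
rng = np.random.default_rng(4)
fitted = np.linspace(1, 10, 300)
resid = rng.normal(scale=0.3 * fitted)   # variance increases with the fitted value

# Residuals vs. fitted values: a constant band around zero indicates homoscedasticity;
# a funnel shape (as here) indicates heteroscedasticity.
plt.scatter(fitted, resid, alpha=0.5)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.title("Homoscedasticity check: residuals vs. fitted values")
plt.show()
```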

Note: In R, we can use the gvlma package to check the above assumptions of linear regression automatically.

Additionally, you may like to watch how to implement linear regression from scratch in Python, without using sklearn. A minimal sketch of the idea follows.
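
The snippet below is only an illustrative sketch of ordinary least squares fitted from scratch with NumPy via the normal equation; the synthetic data and variable names are hypothetical and not taken from the video mentioned above.

```python
import numpy as np

# Hypothetical synthetic data: y = 4 + 2.5*x plus noise.
rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 200)
y = 4.0 + 2.5 * x + rng.normal(0, 1.0, 200)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares via the normal equation: beta = (X^T X)^{-1} X^T y.
beta = np.linalg.solve(X.T @ X, X.T @ y)
intercept, slope = beta
print(f"intercept ~ {intercept:.2f}, slope ~ {slope:.2f}")
```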


Dhiraj K

Data Scientist & Machine Learning Evangelist. I like to mess with data. dhiraj10099@gmail.com