Linear regression cannot be applied to every data set. For a linear regression model to give reliable results, the data should satisfy at least the following five assumptions:
- Linear Relationship: The relationship between the independent and dependent variables should be linear. This can be tested using scatter plots.
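- Multivariate Normality: All the variables together should be multivariate normal (in the standard statement of ordinary least squares, it is the errors, estimated by the residuals, that are assumed to be normally distributed). If the variables are multivariate normal, then each variable on its own is univariate normal, meaning it follows a bell-shaped curve, and any subset of the variables is also multivariate normal. The reverse does not hold: each variable being normal on its own does not guarantee joint normality. A histogram of each variable, or of the residuals, is a quick visual check.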
- No Multicollinearity: There is little or no multicollinearity in the data. Multicollinearity happens when the independent variables are highly correlated with each other. Multicollinearity can be checked with a correlation matrix of the independent variables.
- No Autocorrelation: There is little or no autocorrelation in the data. Autocorrelation means that the values within a single column (in regression, usually the residuals) are related to each other; in other words, f(x+1) depends on the value of f(x). Autocorrelation can be checked with a scatter plot of each value against the previous one, or with the Durbin-Watson test.
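- Homoscedasticity: The residuals should be homoscedastic. Homoscedasticity means “same variance”; in other words, the residuals have the same variance at every point along the regression line. Homoscedasticity can also be checked with a scatter plot, typically of the residuals against the fitted values.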
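For the linear-relationship assumption, a scatter plot is the usual check, but the same idea can be seen numerically. The sketch below is a minimal illustration in plain Python, assuming simple regression with one independent variable; `fit_line` and `residuals` are hypothetical helper names, and the data is invented. A line fitted to truly linear data leaves zero residuals, while a line fitted to y = x² data leaves residuals that are positive at both ends and negative in the middle, the curved pattern a scatter plot would reveal.

```python
# Minimal sketch: fit y = a + b*x by ordinary least squares and inspect residuals.
# fit_line and residuals are hypothetical helper names, not from any library.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys):
    """Observed minus fitted values for the least-squares line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [-3, -2, -1, 0, 1, 2, 3]
linear_y = [2 * x + 1 for x in xs]  # truly linear (invented data)
curved_y = [x * x for x in xs]      # y = x^2, not linear (invented data)

print(residuals(xs, linear_y))  # all zero: the line fits exactly
print(residuals(xs, curved_y))  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```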
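The normality assumption is usually checked with a histogram. As a rough, text-only sketch (an assumption of this example, not the article's method), the code below prints a crude histogram and the sample skewness, which is close to 0 for a symmetric, bell-shaped sample. The samples are simulated stand-ins for residuals, and `skewness` and `histogram_counts` are hypothetical helper names. Like a histogram, this only checks one variable at a time, so it cannot confirm joint normality.

```python
import random
import statistics


def skewness(values):
    """Sample skewness (third standardized moment); about 0 for a bell-shaped sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def histogram_counts(values, bins=10):
    """Counts per equal-width bin between min(values) and max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # the maximum lands in the last bin
        counts[index] += 1
    return counts


rng = random.Random(0)  # fixed seed so the simulation is repeatable
normal_sample = [rng.gauss(0, 1) for _ in range(2000)]        # bell-shaped
skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]  # long right tail

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    print(f"{name}: skewness = {skewness(sample):.2f}")
    for count in histogram_counts(sample):
        print("#" * (count // 20))  # one '#' per 20 observations
```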
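The multicollinearity check with a correlation matrix can be sketched in a few lines of plain Python. In this illustration `pearson_r`, `correlation_matrix` and `collinear_pairs` are hypothetical names, the predictor columns are invented (`x2` is built to be roughly twice `x1`), and the 0.8 cut-off for "highly correlated" is an assumed rule of thumb, not a fixed standard.

```python
import math
from itertools import combinations


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Dict of dicts: matrix[i][j] is the correlation between columns i and j."""
    return {i: {j: pearson_r(columns[i], columns[j]) for j in columns} for i in columns}


def collinear_pairs(columns, threshold=0.8):
    """Pairs of independent variables whose |r| exceeds the assumed threshold."""
    matrix = correlation_matrix(columns)
    return [(i, j, matrix[i][j]) for i, j in combinations(columns, 2)
            if abs(matrix[i][j]) > threshold]


predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1],  # roughly 2 * x1
    "x3": [5, 3, 8, 1, 7, 2, 6, 4],                     # unrelated to x1
}

for i, j, r in collinear_pairs(predictors):
    print(f"{i} and {j} are highly correlated (r = {r:.3f})")
```

Only the `x1`/`x2` pair is flagged here; in a real model one of the two would typically be dropped or the pair combined.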
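For the autocorrelation assumption, one common numeric check (added here as a swapped-in technique, alongside the scatter plot) is the Durbin-Watson statistic on the residuals, taken in their natural order such as time. A value near 2 suggests no lag-1 autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The helper names and the residual lists below are hypothetical; `lag_pairs` produces the (previous, next) pairs one would draw as the scatter plot.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def lag_pairs(residuals):
    """(e_t, e_{t+1}) pairs; plotting these shows lag-1 dependence."""
    return list(zip(residuals[:-1], residuals[1:]))


trending = [1, 1, 1, -1, -1, -1]     # neighbours share a sign (invented)
alternating = [1, -1, 1, -1, 1, -1]  # neighbours flip sign (invented)

print(round(durbin_watson(trending), 3))     # 0.667: positive autocorrelation
print(round(durbin_watson(alternating), 3))  # 3.333: negative autocorrelation
```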
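Homoscedasticity is normally judged from a scatter plot of residuals against fitted values. A simple numeric stand-in, assumed here for illustration rather than taken from any formal test, is to sort the residuals by fitted value and compare their variance in the upper half with the lower half: a ratio near 1 is consistent with constant variance, while a ratio far from 1 means the spread changes along the regression line. `spread_ratio` is a hypothetical name and the residuals are invented.

```python
import statistics


def spread_ratio(fitted, residuals):
    """Residual variance at high fitted values divided by the variance at
    low fitted values (the middle point is skipped when n is odd)."""
    ordered = [r for _, r in sorted(zip(fitted, residuals))]
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[-half:]
    return statistics.variance(high) / statistics.variance(low)


fitted = [1, 2, 3, 4, 5, 6, 7, 8]
constant_spread = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
fanning_out = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]  # spread grows with the fit

print(round(spread_ratio(fitted, constant_spread), 2))  # 1.0
print(round(spread_ratio(fitted, fanning_out), 2))      # 5.97
```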