Beginners in the Machine Learning field often confuse classification with regression analysis. Regression is applied when a real or continuous value needs to be predicted, such as a salary or a house price. In these problem statements the target value is continuous and cannot be sorted into discrete categories such as “yes” or “no”. In such cases, we need to apply regression techniques. In this blog, I will cover the basics of different regression techniques and their Python implementations.
What is Regression?
Regression is a statistical approach for modeling the possible relationships among variables. Regression analysis describes how the target variable changes in response to changes in the predictors. To investigate or analyze the relationship between the dependent and independent variables, regression methods are applied. Regression covers a variety of data-analysis techniques used in quantitative research for analyzing multiple variables. The prime applications of regression analysis are forecasting, time series modeling, and identifying cause-effect relationships.
Types of regression techniques:
There are different types of regression techniques, but in this blog we are going to consider the following ones:
- Linear regression
- Multiple Variable Linear Regression
- Polynomial Regression
- Non-linear Regression
Let’s look at each of them in detail with their respective Python code. We are going to apply these techniques to predicting CO2 emissions from vehicles. Dataset and notebook links can be found here.
Linear Regression :
It is the simplest form of regression. The dependent variable is continuous in nature, and the relationship between the dependent variable and the independent variable is assumed to be linear.
Coefficient and Intercept in the simple linear regression, are the parameters of the fit line. Given that it is a simple linear regression, with only 2 parameters, and knowing that the parameters are the intercept and slope of the line, sklearn can estimate them directly from our data. Notice that all of the data must be available to traverse and calculate the parameters.
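As a sketch of that fit: the blog's notebook works on the vehicle dataset, but since that file isn't bundled here, the example below generates synthetic engine-size/emissions data with known parameters, so we can see sklearn recover the intercept and slope. The column meanings (engine size, CO2 emissions) mirror the blog's example; the numeric values are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the vehicle dataset: engine size vs. CO2 emissions.
# True relationship: co2 = 125 + 39 * engine_size, plus noise.
rng = np.random.default_rng(42)
engine_size = rng.uniform(1.0, 8.0, size=(100, 1))
co2 = 125 + 39 * engine_size.ravel() + rng.normal(0, 5, size=100)

# sklearn estimates the two parameters (intercept and slope) directly
# from all of the data, via ordinary least squares.
model = LinearRegression()
model.fit(engine_size, co2)

print("Intercept:", model.intercept_)
print("Coefficient:", model.coef_[0])
```

Because the data was generated from a known line, the fitted intercept and coefficient land close to 125 and 39; on the real dataset you would see the dataset's own values instead.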
Multiple Regression :
The relation between multiple independent or predictor variables and one dependent or criterion variable is generally explained by multiple regression. A dependent variable is modeled along with the constant term as a function of many independent variables with corresponding coefficients. Two or more predictor variables are required for multiple regression, and this is why it is called multiple regression.
In our example, there are multiple variables that predict the CO2 emissions. When more than one independent variable is present, the process is called multiple linear regression: for example, predicting CO2 emissions using the FUELCONSUMPTION_COMB, EngineSize, and Cylinders columns of the car data. The good thing here is that multiple linear regression is a direct extension of the simple linear regression model.
Coefficients: [[11.34282345 7.4584514 9.46781705]]
As mentioned before, the coefficients and intercept are the parameters of the fit. Given that this is a multiple linear regression with three predictors, the parameters are the intercept and the coefficients of the fitted hyperplane, and sklearn estimates all of them from the data in the same way as before.
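The same pattern extends to several predictors: `fit` just receives a matrix with one column per variable. Again, as a self-contained sketch, the example below uses synthetic stand-ins for the three predictor columns with known true coefficients, rather than the blog's actual dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-ins for EngineSize, Cylinders, FUELCONSUMPTION_COMB.
X = np.column_stack([
    rng.uniform(1.0, 8.0, n),          # engine size (litres)
    rng.integers(3, 13, n).astype(float),  # cylinder count
    rng.uniform(4.0, 25.0, n),         # combined fuel consumption
])
# True hyperplane: y = 60 + 11*x1 + 7*x2 + 9*x3, plus noise.
y = 60 + 11 * X[:, 0] + 7 * X[:, 1] + 9 * X[:, 2] + rng.normal(0, 5, n)

model = LinearRegression().fit(X, y)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
```

The three fitted coefficients come back close to 11, 7, and 9; the coefficient array printed in the blog (`[[11.34 7.46 9.47]]`) has exactly this shape, one entry per predictor.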
Polynomial Regression :
Polynomial regression is a type of linear regression that models the relationship between the independent variable x and the dependent variable y as a polynomial of the nth degree. In other words, the conditional mean of y given x, denoted E(y|x), is modeled as an nth-degree polynomial in x. We try fitting polynomials of different degrees to determine the curve that best predicts the CO2 emissions.
Polynomial models are typically used to explain or identify non-linear phenomena, such as the growth rate of tissues, the progression of disease epidemics, and the distribution of carbon isotopes in lake sediments.
Coefficients: [[ 0. 50.42171112 -1.61613827]]
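The trick that makes this still "linear" regression is feature expansion: x is transformed into the columns [1, x, x²], and an ordinary linear model is fit on those columns. The sketch below does this with sklearn's `PolynomialFeatures` on synthetic data generated from a known quadratic (the true curve here is invented for illustration, not taken from the blog's dataset).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 8.0, size=(150, 1))
# True quadratic: y = 100 + 50*x - 1.6*x^2, plus noise.
y = 100 + 50 * x.ravel() - 1.6 * x.ravel() ** 2 + rng.normal(0, 3, 150)

# Expand x into the columns [1, x, x^2] ...
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)

# ... then fit an ordinary linear model on the expanded features.
model = LinearRegression().fit(x_poly, y)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
```

The first coefficient belongs to the constant column that `PolynomialFeatures` adds; since the model also fits its own intercept, sklearn assigns that column a coefficient of 0, which is why outputs like the one above start with `0.` followed by the linear and quadratic terms.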
Non-linear Regression :
Non-linear regression is a method for modeling a non-linear relationship between the dependent and independent variables. It is used when the data shows a curved pattern that no straight line can capture; linear regression does not yield accurate results in that case, because it pre-assumes the data to be linear.
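One common way to fit such a model is `scipy.optimize.curve_fit`, which estimates the parameters of any curve you supply. The sketch below fits a logistic (sigmoid) curve, a typical choice for S-shaped data, to synthetic points generated from known parameters; the sigmoid form and the parameter names `beta1`, `beta2` are illustrative assumptions, not something fixed by the blog.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, beta1, beta2):
    # Logistic curve: beta1 controls steepness, beta2 the midpoint.
    return 1.0 / (1.0 + np.exp(-beta1 * (x - beta2)))

rng = np.random.default_rng(7)
x = np.linspace(0.0, 10.0, 120)
# Data generated from a sigmoid with beta1=1.5, beta2=5.0, plus noise.
y = sigmoid(x, 1.5, 5.0) + rng.normal(0, 0.02, x.size)

# curve_fit searches for the parameters that minimize squared error,
# starting from the initial guess p0.
params, _ = curve_fit(sigmoid, x, y, p0=[1.0, 4.0])
print("Estimated beta1, beta2:", params)
```

Unlike the linear cases above, there is no closed-form solution here: the fit is found iteratively, so a reasonable initial guess (`p0`) matters.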
To sum up, there are different regression techniques that can be applied in machine learning and data science. Picking the right regression technique depends entirely on the data and the requirements of the problem. Hopefully, this blog gives you a useful overview of the different regression methods.