Multiple Linear Regression


Simple Linear Regression Vs Multiple Linear Regression

Perfect Linear Regression

       

Real Life

       

Linear Regression

Linear regression gives reliable results only if its assumptions are met.

Assumptions:

Error

          

Case Study:

The dataset contains 9568 data points collected from a Combined Cycle Power Plant while the plant was set to work at full load. The features are hourly averages of four ambient variables:

Temperature (AT),
Ambient Pressure (AP),
Relative Humidity (RH) and
Exhaust Vacuum (V),

which are used to predict the net hourly electrical energy output (PE) of the plant.

A gas turbine generator generates electricity while the waste heat from the gas turbine is used to make steam to generate additional electricity via a steam turbine.

Read the dataset and check the summary
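A minimal sketch of this step. The file name `ccpp.csv` is an assumption (the source does not say where the data lives); the column names AT, V, AP, RH and PE follow the code used later in these notes. If the file is not found, the sketch falls back on a small simulated stand-in, which is not the real plant data.

```r
# Hypothetical file name; the source does not name the data file.
data_path <- "ccpp.csv"

if (file.exists(data_path)) {
  edata <- read.csv(data_path)
} else {
  # Simulated stand-in with the same column names (NOT the real plant data).
  set.seed(1)
  n  <- 200
  AT <- runif(n, 2, 37)
  edata <- data.frame(
    AT = AT,
    V  = 25 + 1.2 * AT + rnorm(n, sd = 5),
    AP = rnorm(n, mean = 1013, sd = 6),
    RH = runif(n, 25, 100)
  )
  edata$PE <- 480 - 2 * edata$AT - 0.1 * edata$RH + rnorm(n, sd = 4)
}

str(edata)             # column types and the first few values
summary(edata)         # min, quartiles, mean and max per column
colSums(is.na(edata))  # number of missing values per column
```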

      

Check boxplot for outliers

> boxplot(edata$AT, main = "Temperature", horizontal = TRUE)
> boxplot(edata$V, main = "Exhaust Vacuum", horizontal = TRUE)
> boxplot(edata$AP, main = "Ambient Pressure", horizontal = TRUE)
> boxplot(edata$RH, main = "Relative Humidity", horizontal = TRUE)
> boxplot(edata$PE, main = "Energy Production", horizontal = TRUE)

      

Split the dataset and check the correlation matrix
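One way this step might look. The 70/30 split ratio, the seed and the names `train` and `test` are assumptions (the source only shows `new_train` later on), and the data frame here is a simulated stand-in for `edata`, not the real plant data.

```r
# Simulated stand-in for edata (NOT the real plant data).
set.seed(42)
n  <- 500
AT <- runif(n, 2, 37)
edata <- data.frame(
  AT = AT,
  V  = 25 + 1.2 * AT + rnorm(n, sd = 5),
  AP = rnorm(n, mean = 1013, sd = 6),
  RH = runif(n, 25, 100)
)
edata$PE <- 480 - 2 * edata$AT - 0.1 * edata$RH + rnorm(n, sd = 4)

# Random 70/30 split (the ratio is an assumption).
train_idx <- sample(seq_len(nrow(edata)), size = floor(0.7 * nrow(edata)))
train <- edata[train_idx, ]
test  <- edata[-train_idx, ]

# Pairwise Pearson correlations, computed on the training rows only.
cor_mat <- cor(train)
round(cor_mat, 2)
```

Look at two things in the matrix: how strongly each predictor correlates with PE, and how strongly the predictors correlate with each other (a warning sign for multicollinearity).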

      

Check linearity of dependent variable with independent variables.

      

Fit the model, check the summary, and check the diagnostic plots against the assumptions
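A sketch of the fit, assuming the first model uses all four predictors (suggested by the `vif(model_fit)` output further down) and that the training data frame is called `train`. The data below is simulated, so the coefficients it prints are not the ones from the case study.

```r
# Simulated stand-in for the training set (NOT the real plant data).
set.seed(42)
n  <- 500
AT <- runif(n, 2, 37)
train <- data.frame(
  AT = AT,
  V  = 25 + 1.2 * AT + rnorm(n, sd = 5),
  AP = rnorm(n, mean = 1013, sd = 6),
  RH = runif(n, 25, 100)
)
train$PE <- 480 - 2 * train$AT - 0.1 * train$RH + rnorm(n, sd = 4)

# Multiple linear regression of PE on all four predictors.
model_fit <- lm(PE ~ AT + V + AP + RH, data = train)
summary(model_fit)  # coefficients, p-values, R-squared

# The four standard diagnostic plots on one page:
# Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage.
par(mfrow = c(2, 2))
plot(model_fit)
par(mfrow = c(1, 1))
```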

    

Check multicollinearity, drop the variable behind the inflated VIF, remove outliers, and refit the model on the new dataset “new_train”

    

Check Multicollinearity

> vif(model_fit)

      AT        V       AP       RH
5.911819 3.882701 1.467918 1.693694

Because AT and V are highly correlated with each other, the variance inflation factor (VIF) for AT is high (5.91, above the common rule-of-thumb threshold of 5).

Hence we have to drop either AT or V.

We keep AT because it has a high correlation with the dependent variable, PE.
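To see where a VIF comes from, it can be computed by hand in base R: regress each predictor on the other predictors and take 1 / (1 - R²). This is a sketch on simulated data (not the real plant data), so the numbers differ from the output above; the helper name `vif_by_hand` is made up for illustration.

```r
# Simulated predictors; V is built to be strongly correlated with AT.
set.seed(42)
n  <- 500
AT <- runif(n, 2, 37)
train <- data.frame(
  AT = AT,
  V  = 25 + 1.2 * AT + rnorm(n, sd = 5),
  AP = rnorm(n, mean = 1013, sd = 6),
  RH = runif(n, 25, 100)
)

predictors <- c("AT", "V", "AP", "RH")

# VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from
# regressing predictor j on all the other predictors.
vif_by_hand <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  fit <- lm(reformulate(others, response = p), data = train)
  1 / (1 - summary(fit)$r.squared)
})

round(vif_by_hand, 2)
```

Predictors that are well explained by the others (here AT and V) get a large VIF; nearly independent ones (AP, RH) stay close to 1.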

Remove the outliers, prepare a new dataset, and refit the model on it.
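The source does not say which rule was used to remove outliers. One plausible sketch is to drop every row that a boxplot would flag (points beyond 1.5 × IQR from the hinges) in any modelled column. The helper `is_outlier` and the choice of columns are assumptions, and the data is simulated.

```r
# Simulated stand-in for the training set (NOT the real plant data).
set.seed(42)
n <- 500
train <- data.frame(
  AT = runif(n, 2, 37),
  AP = rnorm(n, mean = 1013, sd = 6),
  RH = 100 - rexp(n, rate = 0.07)  # left-skewed, so some low outliers
)
train$PE <- 480 - 2 * train$AT - 0.1 * train$RH + rnorm(n, sd = 4)

# TRUE for values that boxplot() would draw as individual points.
is_outlier <- function(x) x %in% boxplot.stats(x)$out

# Flag a row if it is an outlier in any of the modelled columns.
cols <- c("AT", "AP", "RH", "PE")
drop <- Reduce(`|`, lapply(train[cols], is_outlier))

new_train <- train[!drop, ]
nrow(train) - nrow(new_train)  # number of rows removed
```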

    

> Model_new_fit <- lm(PE ~ AT + AP + RH, data = new_train)

Residual plots for checking the assumptions

      

Normality plot for residuals

      

Scale-Location plot

      

Cook's distance
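Cook's distance can also be pulled out as numbers instead of being read off the plot. The sketch below uses simulated data and the 4/n cutoff, a common rule of thumb that the source does not mention.

```r
# Simulated stand-in for new_train (NOT the real plant data).
set.seed(42)
n <- 300
new_train <- data.frame(
  AT = runif(n, 2, 37),
  AP = rnorm(n, mean = 1013, sd = 6),
  RH = runif(n, 25, 100)
)
new_train$PE <- 480 - 2 * new_train$AT - 0.1 * new_train$RH + rnorm(n, sd = 4)

Model_new_fit <- lm(PE ~ AT + AP + RH, data = new_train)

# One Cook's distance per training row.
cd <- cooks.distance(Model_new_fit)

# Rows above the 4/n rule-of-thumb cutoff (an assumption, not from the source).
influential <- which(cd > 4 / length(cd))
length(influential)

# The Cook's distance panel of plot.lm, with the cutoff drawn in.
plot(Model_new_fit, which = 4)
abline(h = 4 / length(cd), lty = 2)
```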

      

PE = 481.76 - 2.37*AT + 0.03*AP - 0.203*RH
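The fitted equation can be wrapped in a small function to get a prediction for new conditions. The function name `predict_pe` and the input values below are made up for illustration; only the coefficients come from the equation above.

```r
# Prediction from the fitted equation above.
predict_pe <- function(AT, AP, RH) {
  481.76 - 2.37 * AT + 0.03 * AP - 0.203 * RH
}

# Hypothetical conditions: AT = 20, AP = 1013, RH = 70.
# 481.76 - 47.40 + 30.39 - 14.21 = 450.54
predict_pe(AT = 20, AP = 1013, RH = 70)
```

Reading the coefficients: holding the other variables fixed, each one-unit increase in AT lowers the predicted PE by 2.37, each one-unit increase in AP raises it by 0.03, and each one-unit increase in RH lowers it by 0.203.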