## Multiple Linear Regression

Simple Linear Regression vs Multiple Linear Regression

- **Simple linear regression:** one dependent variable and one independent variable.
- **Multiple linear regression:** one dependent variable and many independent variables.

**Example:** Predict the land rate.

- **In simple linear regression:** Land Rate is predicted from Land Area alone.
- **In multiple linear regression:** Land Rate is predicted from Land Area + Locality + Connectivity to Basic Amenities.
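
The difference between the two is easiest to see in the model formula. The sketch below uses simulated land data; the column names (`area`, `locality`, `connectivity`, `rate`) and the coefficients used to generate the data are invented purely for illustration.

```r
# Simulated stand-in data (hypothetical columns and coefficients, not real land rates)
set.seed(1)
n <- 100
land <- data.frame(
  area         = runif(n, 500, 5000),  # land area
  locality     = runif(n, 1, 10),      # hypothetical locality score
  connectivity = runif(n, 1, 10)       # hypothetical connectivity score
)
land$rate <- 50 + 0.02 * land$area + 8 * land$locality +
  5 * land$connectivity + rnorm(n, sd = 5)

# Simple linear regression: one independent variable
simple_fit <- lm(rate ~ area, data = land)

# Multiple linear regression: several independent variables
multiple_fit <- lm(rate ~ area + locality + connectivity, data = land)

print(coef(simple_fit))    # intercept + 1 slope
print(coef(multiple_fit))  # intercept + 3 slopes
```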

Perfect Linear Regression

Real Life

## Linear Regression

A linear regression model can be trusted only if its assumptions are met.

**Assumptions:**
- The dependent variable should be linearly related to the independent variables.
- Homoscedasticity: the errors should have constant variance.
- The errors should be normally distributed.
- No autocorrelation among the errors.

Error

Case Study:

The dataset contains 9568 data points collected from a Combined Cycle Power Plant while the plant was set to work at full load. The features are hourly averages of four ambient variables:

- Temperature (T), stored as the column AT
- Ambient Pressure (AP)
- Relative Humidity (RH)
- Exhaust Vacuum (V)

They are used to predict the net hourly electrical energy output (EP) of the plant, stored as the column PE.

A gas turbine generator generates electricity, while the waste heat from the gas turbine is used to make steam that generates additional electricity via a steam turbine.

Read the dataset and check its summary
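
A minimal sketch of this step, assuming the data has been exported to a CSV file. The helper name `load_plant_data` and the file name in the usage comment are hypothetical; only `edata` matches the name used in the rest of this post.

```r
# Hypothetical helper: read a CSV export of the data and print a quick overview
load_plant_data <- function(path) {
  edata <- read.csv(path)
  str(edata)                    # dimensions and column types
  print(summary(edata))         # min, quartiles, mean and max per column
  print(colSums(is.na(edata)))  # number of missing values per column
  edata
}

# Usage (the file name is an assumption; point it at your own copy):
# edata <- load_plant_data("power_plant.csv")
```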

Check boxplots for outliers

> boxplot(edata$AT, main = "Temperature", horizontal = TRUE)
> boxplot(edata$V, main = "Exhaust Vacuum", horizontal = TRUE)
> boxplot(edata$AP, main = "Ambient Pressure", horizontal = TRUE)
> boxplot(edata$RH, main = "Relative Humidity", horizontal = TRUE)
> boxplot(edata$PE, main = "Energy Production", horizontal = TRUE)

Split the dataset and check the correlation matrix
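
One way to do the split and the correlation check is sketched below. The 70/30 ratio is an assumption, since the split ratio is not stated here, and `edata` is replaced by a small simulated stand-in so that the snippet runs on its own.

```r
# Simulated stand-in for the plant data (not the real measurements)
set.seed(42)
n  <- 300
AT <- runif(n, 2, 37)
edata <- data.frame(AT = AT,
                    V  = 25 + 1.2 * AT + rnorm(n, sd = 5),
                    AP = rnorm(n, 1013, 6),
                    RH = runif(n, 25, 100))
edata$PE <- 480 - 2 * edata$AT + 0.03 * edata$AP - 0.2 * edata$RH + rnorm(n, sd = 4)

# Random 70/30 train/test split (the ratio is an assumption)
train_idx <- sample(seq_len(nrow(edata)), size = floor(0.7 * nrow(edata)))
train <- edata[train_idx, ]
test  <- edata[-train_idx, ]

# Pairwise Pearson correlations between all columns of the training set
cor_mat <- cor(train)
print(round(cor_mat, 2))
```

In the matrix, the PE row shows how strongly each predictor is related to the response, while large off-diagonal values between predictors hint at multicollinearity.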

Check linearity of dependent variable with independent variables.

Fit the model, check summary and check plots for assumptions
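
A sketch of the fitting step with base R. `model_fit` follows the name used in this post; the training data here is a simulated stand-in, so the printed coefficients will not match the real results.

```r
# Simulated stand-in for the training set (not the real measurements)
set.seed(42)
n  <- 300
AT <- runif(n, 2, 37)
train <- data.frame(AT = AT,
                    V  = 25 + 1.2 * AT + rnorm(n, sd = 5),
                    AP = rnorm(n, 1013, 6),
                    RH = runif(n, 25, 100))
train$PE <- 480 - 2 * train$AT + 0.03 * train$AP - 0.2 * train$RH + rnorm(n, sd = 4)

# Fit the multiple linear regression with all four predictors
model_fit <- lm(PE ~ AT + V + AP + RH, data = train)
print(summary(model_fit))  # coefficients, p-values, R-squared

# Four standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
par(mfrow = c(2, 2))
plot(model_fit)
par(mfrow = c(1, 1))
```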

Check multicollinearity, drop the variable responsible for the high VIF, remove outliers, and refit the model on the new dataset “new_train”

Check Multicollinearity

> vif(model_fit)

| AT | V | AP | RH |
|---|---|---|---|
| 5.911819 | 3.882701 | 1.467918 | 1.693694 |

Because AT and V are highly correlated with each other, the variance inflation factor (VIF) is highest for AT (5.91), followed by V (3.88).

Hence we have to drop either AT or V.

We will keep AT, as it has the higher correlation with the dependent variable.
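
To see where these numbers come from: the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below computes it with base R only. `vif_manual` is a hypothetical helper written for illustration, not the `vif()` function used above, and the data is a simulated stand-in in which V is deliberately built from AT.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), with R_j^2 taken from
# regressing predictor j on the remaining predictors
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

# Simulated stand-in predictors (not the real measurements); V depends on AT
set.seed(42)
n  <- 300
AT <- runif(n, 2, 37)
train <- data.frame(AT = AT,
                    V  = 25 + 1.2 * AT + rnorm(n, sd = 5),
                    AP = rnorm(n, 1013, 6),
                    RH = runif(n, 25, 100))

vifs <- vif_manual(train, c("AT", "V", "AP", "RH"))
print(round(vifs, 2))
```

A VIF close to 1 means the predictor is nearly uncorrelated with the others. The cutoff above which a variable is dropped is a judgment call (5 and 10 are commonly quoted).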

Remove the outliers, prepare a new dataset, and refit the model on it.
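
The rule used to remove outliers is not spelled out here. One common choice, consistent with the boxplots, is to drop rows whose values fall beyond the boxplot whiskers (1.5 × IQR from the hinges). The sketch below assumes that rule; `remove_outliers` is a hypothetical helper and the small data frame is made up, with one deliberately extreme AP value.

```r
# Hypothetical helper: drop rows that a boxplot would flag as outliers
# in any of the given columns (assumes the 1.5 * IQR whisker rule)
remove_outliers <- function(data, cols) {
  keep <- rep(TRUE, nrow(data))
  for (col in cols) {
    out_vals <- boxplot.stats(data[[col]])$out  # values beyond the whiskers
    keep <- keep & !(data[[col]] %in% out_vals)
  }
  data[keep, , drop = FALSE]
}

# Made-up stand-in training data; the last AP value is an obvious outlier
train <- data.frame(AT = c(10, 12, 14, 16, 18, 20, 22),
                    AP = c(1010, 1011, 1012, 1013, 1014, 1015, 1100),
                    RH = c(60, 62, 64, 66, 68, 70, 72),
                    PE = c(470, 466, 462, 458, 454, 450, 446))

new_train <- remove_outliers(train, c("AT", "AP", "RH", "PE"))
print(nrow(new_train))
```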

> Model_new_fit <- lm(PE ~ AT + AP + RH, data = new_train)

Residual plots for assumptions

Normality plot for residuals

Scale-Location plot

Cook's Distance
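
The plots listed above can be produced with base R roughly as follows. This is a sketch on simulated stand-in data, and the 4/n cutoff for Cook's distance is a common rule of thumb rather than something stated in this post.

```r
# Simulated stand-in for new_train (not the real measurements)
set.seed(7)
n <- 200
new_train <- data.frame(AT = runif(n, 2, 37),
                        AP = rnorm(n, 1013, 6),
                        RH = runif(n, 25, 100))
new_train$PE <- 480 - 2 * new_train$AT + 0.03 * new_train$AP -
  0.2 * new_train$RH + rnorm(n, sd = 4)

Model_new_fit <- lm(PE ~ AT + AP + RH, data = new_train)
res <- residuals(Model_new_fit)

# Normality of residuals: points should follow the reference line
qqnorm(res)
qqline(res)

# Scale-location plot: a flat trend suggests constant variance
plot(Model_new_fit, which = 3)

# Cook's distance per observation
plot(Model_new_fit, which = 4)

# Flag potentially influential points (4/n is a rule of thumb, an assumption here)
cd <- cooks.distance(Model_new_fit)
influential <- which(cd > 4 / length(cd))
print(length(influential))
```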

- The final equation of the model is:

PE = 481.76 - 2.37 × AT + 0.03 × AP - 0.203 × RH
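
As a quick worked example, the equation can be wrapped in a small function. `predict_pe` is a hypothetical name, and the input values below are made up to show the arithmetic.

```r
# Final model equation as a function (coefficients taken from the equation above)
predict_pe <- function(AT, AP, RH) {
  481.76 - 2.37 * AT + 0.03 * AP - 0.203 * RH
}

# Hypothetical inputs: AT = 20, AP = 1010, RH = 70
# 481.76 - 47.4 + 30.3 - 14.21 = 450.45
pe <- predict_pe(AT = 20, AP = 1010, RH = 70)
print(pe)
```

The negative coefficients on AT and RH mean that, with the other variables held fixed, predicted output falls as temperature or humidity rises.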