Perfect Linear Regression
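A linear regression model can be trusted if and only if its assumptions are met. The assumptions checked below are linearity, absence of multicollinearity, and well-behaved, normally distributed residuals.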
The dataset contains 9,568 data points collected from a Combined Cycle Power Plant while the plant was running at full load. The features are hourly averages of the ambient variables
Ambient Temperature (AT),
Ambient Pressure (AP),
Relative Humidity (RH) and
Exhaust Vacuum (V),
which are used to predict the net hourly electrical energy output (PE) of the plant.
Read the dataset and check the summary
Check boxplot for outliers
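A boxplot usually flags points that fall more than 1.5 times the interquartile range beyond the quartiles (Tukey's rule). The sketch below applies that rule numerically, using only the Python standard library. The function name `iqr_outliers` and the toy readings are hypothetical and are not taken from the power plant data. The exact quartile convention may also differ slightly from the one a plotting tool uses.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside the boxplot whiskers (Tukey's rule)."""
    # Quartiles with the "inclusive" convention; other conventions shift them slightly
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical toy readings, only to show the rule in action
sample = [10, 12, 11, 13, 12, 11, 14, 12, 13, 40]
outliers = iqr_outliers(sample)
print(outliers)
```

The same function can be run once per column to see which variables contain candidates for removal.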
Split the dataset and check the correlation matrix
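A minimal sketch of this step, using only the Python standard library. The 70/30 split fraction, the seed, and the helper names `train_test_split`, `pearson` and `correlation_matrix` are assumptions made for illustration. The toy columns are not values from the dataset.

```python
import math
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """columns maps a name to a list of values; returns a nested dict of r."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy columns standing in for two predictors and the response
toy = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5], "y": [5, 4, 3, 2, 1]}
train, test = train_test_split(list(zip(*toy.values())))
corr = correlation_matrix(toy)
print(len(train), len(test), round(corr["x1"]["y"], 3))
```

Large off-diagonal entries between two predictors are the first hint of multicollinearity.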
Check linearity of the dependent variable with the independent variables.
Fit the model, check summary and check plots for assumptions
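To make the fitting step concrete, here is a minimal sketch of ordinary least squares solved through the normal equations in plain Python. It is not the routine used to produce the results in this post. In practice a statistics library would do the fitting and also print the summary table. The function name `fit_ols` and the toy data are hypothetical.

```python
def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bp] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented system [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical noise-free toy data generated from y = 2 + 3*x1 - x2
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [2 + 3 * x1 - x2 for x1, x2 in X]
coefs = fit_ols(X, y)
print([round(c, 6) for c in coefs])
```

Because the toy response is an exact linear function of the inputs, the fit should recover the generating coefficients up to rounding error.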
Check multicollinearity, remove the variable with the inflated VIF, remove outliers and refit the model on the new dataset “new_train”
Since there is a high correlation between AT and V, the variance inflation factor (VIF) was very high for AT.
Hence we have to drop either AT or V.
We will keep AT, as it has a high correlation with the dependent variable.
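The link between correlation and VIF can be sketched as follows. The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. In the special case of only two predictors, R² is simply the squared correlation between them. The function names and the correlation values below are hypothetical illustrations, not figures from this analysis.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor, given the R^2 obtained
    when regressing it on the remaining predictors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)

def vif_two_predictors(r):
    """Shortcut for exactly two predictors with pairwise correlation r."""
    return vif_from_r_squared(r * r)

# Hypothetical correlations, only to show how quickly VIF grows
for r in (0.0, 0.5, 0.9, 0.99):
    print(r, round(vif_two_predictors(r), 2))
```

An uncorrelated predictor has a VIF of 1, and the value grows without bound as the correlation approaches 1.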
Remove the outliers, prepare the new dataset and rerun the model on it.
Residual plots for checking the assumptions
Normality plot for residuals
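A normality (Q-Q) plot pairs the sorted, standardized residuals with the quantiles a standard normal distribution would produce. The sketch below computes those pairs with the Python standard library. The plotting positions (i + 0.5) / n are one common choice among several, and the function name `qq_points` and the toy residuals are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def qq_points(residuals):
    """Return (theoretical quantile, standardized residual) pairs for a Q-Q plot."""
    n = len(residuals)
    mu, sigma = mean(residuals), pstdev(residuals)
    standardized = sorted((r - mu) / sigma for r in residuals)
    # Quantiles of the standard normal at the chosen plotting positions
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, standardized))

# Hypothetical toy residuals, symmetric around zero
points = qq_points([-2.0, -1.0, 0.0, 1.0, 2.0])
for t, s in points:
    print(round(t, 3), round(s, 3))
```

If the residuals are close to normal, the pairs fall near a straight line. Strong curvature at the ends points to heavy tails or skew.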
PE = 481.76 - 2.37*AT + 0.03*AP - 0.203*RH
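The fitted equation can be wrapped in a small function to score new observations. The coefficients are the ones reported above. The function name `predict_pe` and the input values are hypothetical and only illustrate the arithmetic.

```python
def predict_pe(at, ap, rh):
    """Predicted net hourly energy output from the fitted equation above."""
    return 481.76 - 2.37 * at + 0.03 * ap - 0.203 * rh

# Hypothetical ambient conditions, not a row from the dataset
example = predict_pe(at=20.0, ap=1010.0, rh=70.0)
print(round(example, 2))  # about 450.45
```

The negative coefficient on AT means the predicted output falls by about 2.37 units for every one-unit rise in ambient temperature, with the other variables held fixed.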