## Logistic Regression in R

### What is Logistic Regression?

• Logistic regression is a classification technique.
• It can be thought of as the counterpart of linear regression for a categorical outcome variable; formally, it is a generalized linear model.
• The dependent variable is binary (1 / 0, Yes / No, True / False), and the model predicts it from a set of independent variables.

[Figures: "Not a good fit"; "Logistic curve"]

log(p / (1 - p)) is the link function, where p is the probability that the outcome is 1. This log-odds (logit) transformation of the probability allows us to model a non-linear association in a linear way.
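To make the link function concrete, here is a minimal sketch of the logit and its inverse written by hand. The helper names `logit` and `inv_logit` are illustrative and not part of the original material; base R offers the same pair as `qlogis()` and `plogis()`.

```r
# Logit (log-odds) link and its inverse, written out by hand.
# `logit` and `inv_logit` are illustrative helper names.
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(x) 1 / (1 + exp(-x))

p <- c(0.1, 0.5, 0.9)
log_odds <- logit(p)

print(log_odds)             # negative below p = 0.5, zero at 0.5, positive above
print(inv_logit(log_odds))  # maps the log-odds back to the probabilities in p

# Base R's built-in equivalents give the same values.
print(all.equal(log_odds, qlogis(p)))
print(all.equal(inv_logit(log_odds), plogis(log_odds)))
```

The model is linear on the log-odds scale, and the inverse logit squeezes any linear predictor back into the (0, 1) range of a probability.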

### Goodness of fit in logistic regression

• AIC (Akaike Information Criterion): plays a role analogous to adjusted R-squared in linear regression. It penalizes the model for the number of coefficients, so among models fitted to the same data we prefer the one with the lowest AIC.
• Null Deviance and Residual Deviance: null deviance measures how well the response is predicted by a model with nothing but an intercept. Residual deviance measures how well the response is predicted once the independent variables are added. For both, lower is better; a large drop from null to residual deviance indicates that the predictors add explanatory value.
• Confusion Matrix: a tabulated representation of actual vs. predicted values. It shows the accuracy of the model and, when computed on held-out data, helps detect overfitting.

### Structure of the dataset and data types

### Change the categorical variables to factors

### To check if the dataset is balanced
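The code for inspecting the structure and converting columns to factors is not shown here. A minimal sketch of those two steps follows, using a made-up data frame named `loan_toy` (hypothetical; only the column names are borrowed from the plotting code later in this article, and the values are invented).

```r
# Hypothetical toy data standing in for the loan data set (values are made up).
loan_toy <- data.frame(
  income   = c(28, 55, 120, 40, 33, 76),
  debtinc  = c(9.3, 17.3, 5.5, 2.9, 17.3, 10.2),
  creddebt = c(1.2, 4.0, 0.9, 0.3, 2.7, 1.8),
  othdebt  = c(2.1, 5.0, 3.8, 0.8, 3.0, 4.4),
  default  = c(0, 1, 0, 0, 1, 1)
)

# Inspect column types: `default` is numeric at this point.
str(loan_toy)

# Store the binary outcome as a factor so R treats it as categorical.
loan_toy$default <- factor(loan_toy$default, levels = c(0, 1))

# Confirm the change.
print(sapply(loan_toy, class))
```

With the outcome stored as a factor, `glm(..., family = binomial)` models the probability of the second level (here `1`).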

##### It gives the frequency of 0’s and 1’s
> table(loan_data$default)

0     1
517  883
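The raw counts are easier to judge as proportions. A small sketch, re-entering the two counts printed above by hand (the vector name `counts` is illustrative):

```r
# Class counts copied from the table() output above.
counts <- c("0" = 517, "1" = 883)

# Share of each class in the data.
shares <- counts / sum(counts)
print(round(shares, 3))

# Ratio of the larger class to the smaller one.
print(max(counts) / min(counts))
```

With roughly 37% zeros and 63% ones, the two classes are not evenly represented, which is presumably why the later code works with a data frame named `loan_data_balanced`.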

### Summary of the loan_data

### Boxplot to check for outliers

> boxplot(loan_data_balanced$income, horizontal = TRUE, main = "Income")
> boxplot(loan_data_balanced$debtinc, horizontal = TRUE, main = "Debtinc")
> boxplot(loan_data_balanced$othdebt, horizontal = TRUE, main = "Otherdebt")

### Capping the outliers

After capping, check the boxplots once again.
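The capping code itself is not shown. One common approach, given here only as a sketch and not necessarily the method used for this data, is to clamp values that fall outside the 1.5 × IQR fences. The helper `cap_outliers` and the sample `income` vector are hypothetical.

```r
# Hypothetical helper: clamp values outside the 1.5 * IQR fences
# (Q1 - k * IQR, Q3 + k * IQR) to those limits.
cap_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]          # interquartile range
  lower  <- q[1] - k * spread
  upper  <- q[2] + k * spread
  pmin(pmax(x, lower), upper)    # values inside the fences are unchanged
}

# Made-up income values with one extreme point.
income <- c(20, 25, 30, 35, 40, 45, 50, 400)
capped <- cap_outliers(income)

print(range(income))
print(range(capped))
```

Applied to a real column this would look like `loan_data_balanced$income <- cap_outliers(loan_data_balanced$income)`. Capping keeps every row, unlike dropping outliers, at the cost of distorting the tail of the distribution.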

> boxplot(loan_data_balanced$income, horizontal = TRUE, main = "Income")
> boxplot(loan_data_balanced$debtinc, horizontal = TRUE, main = "Debtinc")
> boxplot(loan_data_balanced$othdebt, horizontal = TRUE, main = "Otherdebt")

### Check the distribution with histograms

> hist(loan_data_balanced$debtinc, main = "Debt", col = "red")
> hist(loan_data_balanced$creddebt, main = "Credit Debt", col = "brown")
> hist(loan_data_balanced$othdebt, main = "Other Debt", col = "blue")
> hist(loan_data_balanced$income, main = "Income", col = "yellow")

### Split the dataset into training and testing

### Accuracy

Accuracy = (True Positive + True Negative) / Total

Sensitivity = True Positive / (True Positive + False Negative)

Specificity = True Negative / (True Negative + False Positive)

### Plotting accuracy curve

### Area Under Curve

[Figure: Area Under Curve plot]
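The steps from the train/test split through the area under the curve can be sketched end to end in base R. Everything below runs on simulated data: the data frame `sim`, its coefficients, the 70/30 split and the 0.5 cutoff are assumptions made for illustration, not the values used for the loan data. The AUC is computed with the rank (Mann-Whitney) formula rather than a dedicated package.

```r
set.seed(42)  # reproducible simulation

# Simulated stand-in for the loan data (all values are made up).
n <- 1000
sim <- data.frame(
  income  = rlnorm(n, meanlog = 3.5, sdlog = 0.5),
  debtinc = runif(n, 0, 30)
)
sim$default <- rbinom(n, 1, plogis(-3 + 0.2 * sim$debtinc - 0.01 * sim$income))

# 70/30 train/test split (the ratio is an assumption).
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit the logistic regression on the training rows only.
fit <- glm(default ~ income + debtinc, data = train, family = binomial)

# Goodness-of-fit numbers: AIC, null deviance, residual deviance.
print(c(AIC = AIC(fit), null = fit$null.deviance, residual = fit$deviance))

# Predicted probabilities on the held-out rows, cut at 0.5.
prob   <- predict(fit, newdata = test, type = "response")
pred   <- factor(as.integer(prob > 0.5), levels = c(0, 1))
actual <- factor(test$default, levels = c(0, 1))
cm <- table(Actual = actual, Predicted = pred)
print(cm)

TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["0", "1"]; FN <- cm["1", "0"]

accuracy    <- (TP + TN) / sum(cm)
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))

# Accuracy as a function of the cutoff.
thresholds <- seq(0.05, 0.95, by = 0.05)
acc_curve <- sapply(thresholds, function(t) mean((prob > t) == (test$default == 1)))
if (interactive()) {
  plot(thresholds, acc_curve, type = "l",
       xlab = "Cutoff", ylab = "Accuracy", main = "Accuracy curve")
}

# AUC via ranks: the probability that a randomly chosen positive
# receives a higher score than a randomly chosen negative.
r  <- rank(prob)
n1 <- sum(test$default == 1)
n0 <- sum(test$default == 0)
auc <- (sum(r[test$default == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
print(auc)
```

Computing every metric on `test` rather than `train` is what lets the confusion matrix reveal overfitting: a model that scores much better on the training rows than on the held-out rows has memorized noise.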