## Logistic Regression in R

### What is Logistic Regression?

• Logistic regression is a classification technique.
• You can think of logistic regression as an extension of linear regression to a categorical outcome: it is a generalized linear model that relates a linear combination of the independent variables to the outcome through a link function.
• It predicts a binary outcome (1 / 0, Yes / No, True / False) from a set of independent variables.
*Figure: a straight regression line is not a good fit for a binary outcome.*

*Figure: the logistic curve.*

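The link function is the logit, $\log\left(\frac{p}{1-p}\right)$, where $p$ is the probability that the outcome equals 1:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$

Applying this log-odds transformation to $p$ (rather than to the raw 0/1 outcome) lets us model the non-linear, S-shaped relationship between the predictors and $p$ as a linear one on the log-odds scale.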
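A minimal sketch of the link function in R. `loan_data` is not available here, so the built-in `infert` dataset (package `datasets`) is used as a stand-in, and the model formula is illustrative only. `predict()` with `type = "link"` returns the log-odds and `type = "response"` returns $p$, so applying the logit to one should reproduce the other.

```r
# Stand-in data: infert$case is a 0/1 outcome; spontaneous and induced are counts.
fit <- glm(case ~ spontaneous + induced, data = infert,
           family = binomial(link = "logit"))

eta <- predict(fit, type = "link")      # linear predictor: b0 + b1*x1 + b2*x2
p   <- predict(fit, type = "response")  # fitted probabilities

# The logit of p gives back the linear predictor ...
all.equal(log(p / (1 - p)), eta)        # TRUE
# ... and the inverse logit (the logistic curve) maps eta back to p.
all.equal(1 / (1 + exp(-eta)), p)       # TRUE
```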

### Goodness of fit in logistic regression

• AIC (Akaike Information Criterion): plays a role similar to adjusted R-squared in linear regression. It penalizes the model for the number of coefficients, so when comparing models fitted to the same data we prefer the one with the lowest AIC.
• Null deviance and residual deviance: the null deviance measures how well the response is predicted by a model with nothing but an intercept; the residual deviance measures how well it is predicted once the independent variables are added. A lower residual deviance means a better fit, and a large drop from the null deviance shows that the predictors add explanatory power.
• Confusion matrix: a table of actual vs. predicted classes. It gives the accuracy of the model, and computing it on held-out test data (rather than the training data) helps detect overfitting.
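All three measures can be read off a fitted `glm` object. The sketch below again uses the built-in `infert` data as a stand-in, compares an intercept-only model with one that adds predictors, and assumes a 0.5 cutoff for the confusion matrix.

```r
# Stand-in data: the built-in infert dataset (case is the 0/1 outcome).
fit0 <- glm(case ~ 1, data = infert, family = binomial)                      # intercept only
fit1 <- glm(case ~ spontaneous + induced, data = infert, family = binomial)  # with predictors

# AIC: when comparing models fitted to the same data, lower is better
AIC(fit0, fit1)

# Null deviance (intercept-only model) vs residual deviance (with predictors)
c(null = fit1$null.deviance, residual = fit1$deviance)

# Confusion matrix of actual vs predicted classes, using an assumed 0.5 cutoff
pred_class <- ifelse(fitted(fit1) > 0.5, 1, 0)
table(Actual = infert$case, Predicted = pred_class)
```

Because this confusion matrix is computed on the same data the model was fitted to, it is optimistic; the same table built on a held-out test set is what reveals overfitting.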

### Checking whether the dataset is balanced

```r
# Frequency of 0s and 1s in the outcome variable
table(loan_data$default)
#>   0   1
#> 517 883
```

The classes are unequal: about 37% of the rows are 0 and 63% are 1.
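`prop.table()` turns the same counts into proportions. To keep the snippet self-contained, it rebuilds only the `default` column from the counts shown above. The section does not show how `loan_data_balanced` was created; random undersampling of the majority class, sketched here, is one common option and is an assumption, not necessarily the method used.

```r
# Stand-in for loan_data: only the default column, rebuilt from the counts above
loan_data <- data.frame(default = rep(c(0, 1), times = c(517, 883)))

# Class proportions: roughly 0.369 for 0 and 0.631 for 1
prop.table(table(loan_data$default))

# Assumed balancing step: randomly undersample the majority class (1s)
set.seed(42)
idx0 <- which(loan_data$default == 0)
idx1 <- which(loan_data$default == 1)
idx1_down <- sample(idx1, length(idx0))   # keep as many 1s as there are 0s
loan_data_balanced <- loan_data[c(idx0, idx1_down), , drop = FALSE]
table(loan_data_balanced$default)        # 517 of each class
```

Undersampling discards data from the larger class; oversampling the smaller class is the usual alternative when rows are scarce.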

### Boxplots to check for outliers

```r
boxplot(loan_data_balanced$income,  horizontal = TRUE, main = "Income")
boxplot(loan_data_balanced$debtinc, horizontal = TRUE, main = "Debtinc")
boxplot(loan_data_balanced$othdebt, horizontal = TRUE, main = "Otherdebt")
```
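`boxplot()` draws as outliers the points lying more than 1.5 times the interquartile range beyond the box, and `boxplot.stats()` returns those points without plotting anything. The vector below is a hypothetical stand-in for `loan_data_balanced$income`.

```r
# Hypothetical income values standing in for loan_data_balanced$income
income <- c(20, 25, 28, 31, 33, 35, 38, 40, 42, 45, 250)

bs <- boxplot.stats(income)   # coef = 1.5 by default
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker: 20 29.5 35 41 45
bs$out     # points drawn as outliers: 250
```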

### Capping the outliers
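The capping code itself is not shown in this section. One common approach, sketched below as an assumption, caps each value at the 1.5 × IQR fences, so anything beyond a fence is pulled in to the fence value (a form of winsorizing). `cap_outliers()` is a hypothetical helper; it computes quartiles with `quantile()`, which can differ slightly from the hinges `boxplot()` draws.

```r
# Hypothetical helper: cap values beyond Q1 - coef*IQR and Q3 + coef*IQR
cap_outliers <- function(x, coef = 1.5) {
  q     <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- coef * (q[2] - q[1])
  pmin(pmax(x, q[1] - fence), q[2] + fence)
}

income <- c(20, 25, 28, 31, 33, 35, 38, 40, 42, 45, 250)
cap_outliers(income)   # 250 is capped to 41 + 1.5 * 11.5 = 58.25

# Applied to the real data (not run here):
# loan_data_balanced$income <- cap_outliers(loan_data_balanced$income)
```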

After capping, check the boxplots again:

```r
boxplot(loan_data_balanced$income,  horizontal = TRUE, main = "Income")
boxplot(loan_data_balanced$othdebt, horizontal = TRUE, main = "Otherdebt")
```

Check the distributions with histograms:

```r
hist(loan_data_balanced$debtinc,  main = "Debt",        col = "red")
hist(loan_data_balanced$creddebt, main = "Credit Debt", col = "brown")
hist(loan_data_balanced$othdebt,  main = "Other Debt",  col = "blue")
hist(loan_data_balanced$income,   main = "Income",      col = "yellow")
```

Split the dataset into training and testing sets.
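A minimal base-R sketch of a random 70/30 split followed by fitting the model on the training part only. The built-in `infert` data stands in for `loan_data_balanced`; the 70/30 ratio, the seed, and the formula are assumptions for illustration.

```r
set.seed(123)                                   # reproducible split
n         <- nrow(infert)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train     <- infert[train_idx, ]                # 70% for fitting
test      <- infert[-train_idx, ]               # 30% held out

model <- glm(case ~ spontaneous + induced, data = train, family = binomial)
summary(model)                                  # coefficients, deviances, AIC

# Predicted probabilities for the unseen test rows
test_prob <- predict(model, newdata = test, type = "response")
```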

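Accuracy, sensitivity and specificity are computed from the confusion matrix on the test set, where TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Sensitivity is the share of actual 1s that the model predicts as 1; specificity is the share of actual 0s that it predicts as 0.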
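These formulas can be computed directly from a confusion matrix. This sketch repeats the stand-in split on `infert`, uses an assumed 0.5 cutoff, and fixes the factor levels so the table stays 2 × 2 even if one class is never predicted.

```r
set.seed(123)
idx   <- sample(seq_len(nrow(infert)), size = floor(0.7 * nrow(infert)))
train <- infert[idx, ]
test  <- infert[-idx, ]
model <- glm(case ~ spontaneous + induced, data = train, family = binomial)

prob <- predict(model, newdata = test, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)   # assumed 0.5 cutoff

# Fixed levels keep the table 2 x 2 even if a class is never predicted
cm <- table(Actual    = factor(test$case, levels = c(0, 1)),
            Predicted = factor(pred,      levels = c(0, 1)))
TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["0", "1"]; FN <- cm["1", "0"]

accuracy    <- (TP + TN) / sum(cm)
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```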

Plotting the accuracy curve
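The original plot is not reproduced here. This sketch assumes the accuracy curve means test-set accuracy plotted against the probability cutoff used to turn predictions into 0/1 classes. It uses base R only, with `infert` as stand-in data.

```r
set.seed(123)
idx   <- sample(seq_len(nrow(infert)), size = floor(0.7 * nrow(infert)))
train <- infert[idx, ]
test  <- infert[-idx, ]
model <- glm(case ~ spontaneous + induced, data = train, family = binomial)
prob  <- predict(model, newdata = test, type = "response")

# Accuracy on the test set at each candidate cutoff
cutoffs <- seq(0.05, 0.95, by = 0.05)
acc <- sapply(cutoffs, function(k) mean(ifelse(prob > k, 1, 0) == test$case))

plot(cutoffs, acc, type = "b", xlab = "Cutoff", ylab = "Accuracy",
     main = "Accuracy vs. cutoff")
cutoffs[which.max(acc)]   # cutoff with the highest test accuracy
```

Picking the cutoff that maximizes accuracy on the test set makes that accuracy optimistic; choosing it on a separate validation split avoids this.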

Area Under the ROC Curve (AUC)

*Figure: Area Under Curve plot*
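A base-R sketch of the ROC curve and its area, again on the `infert` stand-in data. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) over all cutoffs. The AUC is computed with the rank (Mann-Whitney) formula: the probability that a randomly chosen 1 receives a higher predicted probability than a randomly chosen 0, with ties counting one half. No ROC package is assumed.

```r
set.seed(123)
idx   <- sample(seq_len(nrow(infert)), size = floor(0.7 * nrow(infert)))
train <- infert[idx, ]
test  <- infert[-idx, ]
model <- glm(case ~ spontaneous + induced, data = train, family = binomial)
prob  <- predict(model, newdata = test, type = "response")
y     <- test$case

# ROC points: classify as 1 when prob >= k, for every distinct cutoff k
thr <- sort(unique(c(-Inf, prob, Inf)), decreasing = TRUE)
tpr <- sapply(thr, function(k) mean(prob[y == 1] >= k))   # sensitivity
fpr <- sapply(thr, function(k) mean(prob[y == 0] >= k))   # 1 - specificity
plot(fpr, tpr, type = "l", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC curve")
abline(0, 1, lty = 2)   # reference line for random guessing (AUC = 0.5)

# AUC via ranks: ties get average ranks, i.e. half credit
n1  <- sum(y == 1)
n0  <- sum(y == 0)
auc <- (sum(rank(prob)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
auc
```

Joining the ROC points with straight lines and taking the trapezoidal area under them gives the same value as the rank formula, which is a quick consistency check.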