*Logistic Regression in R*

### What is Logistic Regression?

- Logistic regression is a classification technique.
- You can think of it as the counterpart of linear regression for a categorical outcome: it is a generalized linear model in which the log-odds of the outcome, rather than the outcome itself, are modeled as a linear function of the predictors.
- The dependent variable is binary (1 / 0, Yes / No, True / False) and is modeled given a set of independent variables.
- It is used to predict a binary outcome by estimating the probability that the outcome is 1 and comparing that probability to a cutoff.

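*Figure: Not a good fit*

*Figure: Logistic curve*

**Link function:** log(p / (1 - p)), the logit, is the link function, where p is the probability that the outcome is 1. Applying this log-odds transformation to the probability (not to the observed 0/1 outcome) lets us model the non-linear, S-shaped relationship between the predictors and p as a linear function of the predictors.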
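To make the link function concrete, here is a minimal sketch of the logit and its inverse in base R. The helper names `logit` and `inv_logit` are illustrative and not part of the original walkthrough; `qlogis()` and `plogis()` are the built-in equivalents.

```r
# Logit link: maps a probability in (0, 1) onto the whole real line.
logit <- function(p) log(p / (1 - p))

# Inverse link (logistic function): maps a linear predictor back to a probability.
inv_logit <- function(eta) 1 / (1 + exp(-eta))

p <- c(0.1, 0.5, 0.9)
eta <- logit(p)

print(eta)             # log-odds; a probability of 0.5 maps to 0
print(inv_logit(eta))  # recovers the original probabilities

# Base R provides the same pair as qlogis() and plogis().
stopifnot(isTRUE(all.equal(eta, qlogis(p))))
stopifnot(isTRUE(all.equal(inv_logit(eta), plogis(eta))))
```

Note the parentheses around `1 - p`: without them the expression would compute `p / 1 - p`, which is always 0.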

### Goodness of fit in logistic regression

- **AIC (Akaike Information Criterion):** Plays a role similar to adjusted R-squared in linear regression. It penalizes the model for the number of coefficients, so among candidate models fitted to the same data we prefer the one with the lowest AIC.
- **Null deviance and residual deviance:** Null deviance measures how well the response is predicted by a model with nothing but an intercept. Residual deviance measures how well the response is predicted once the independent variables are added. Lower deviance means a better fit, so a useful model has a residual deviance clearly below its null deviance.
- **Confusion matrix:** A tabulation of actual versus predicted values. It gives the accuracy of the model, and computing it on held-out test data helps detect overfitting.
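As a rough illustration of where these numbers come from, the sketch below fits a logistic regression with `glm()` on simulated data and reads off the deviances and the AIC. The data frame `toy` and the coefficients used to simulate it are assumptions made for this example only; they are not the loan data used in this post.

```r
set.seed(42)
n <- 200

# Simulated predictors and outcome (hypothetical data-generating process).
income  <- rnorm(n, mean = 50, sd = 15)
debtinc <- runif(n, min = 0, max = 30)
p       <- plogis(-2 + 0.15 * debtinc - 0.01 * income)
default <- rbinom(n, size = 1, prob = p)
toy     <- data.frame(default, income, debtinc)

# family = binomial selects logistic regression (logit link by default).
fit <- glm(default ~ income + debtinc, data = toy, family = binomial)

print(fit$null.deviance)  # intercept-only model
print(fit$deviance)       # residual deviance with both predictors
print(AIC(fit))           # residual deviance plus 2 per estimated coefficient here
```

For a 0/1 outcome like this one, the AIC equals the residual deviance plus twice the number of coefficients, which is why adding a useless predictor can lower the deviance slightly and still raise the AIC.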

### Structure of the dataset and data types
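A call to `str()` is the usual way to inspect this. The sketch below runs on a small synthetic stand-in for `loan_data`: the values are invented for illustration, and only the column names (`income`, `debtinc`, `creddebt`, `othdebt`, `default`) are taken from the commands used later in this post.

```r
# Synthetic stand-in for loan_data (values are invented for illustration).
set.seed(1)
loan_data <- data.frame(
  income   = round(rlnorm(8, meanlog = 3.5, sdlog = 0.5)),
  debtinc  = round(runif(8, min = 0, max = 30), 1),
  creddebt = round(rexp(8, rate = 0.5), 2),
  othdebt  = round(rexp(8, rate = 0.3), 2),
  default  = rbinom(8, size = 1, prob = 0.5)
)

# Compact overview: number of rows and columns, plus the type of each column.
str(loan_data)

# Just the class of each column.
print(sapply(loan_data, class))
```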

### Convert the categorical variables to factors
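A minimal sketch of the conversion, assuming `default` is stored as a numeric 0/1 column. The data frame here is a synthetic stand-in, and any other categorical columns in the real data would be converted the same way.

```r
# Synthetic stand-in for loan_data (values are invented for illustration).
loan_data <- data.frame(
  income  = c(31, 55, 120, 28, 64, 40),
  default = c(0, 1, 0, 0, 1, 1)
)

# default is stored as a number but is really a category, so convert it.
# Listing the levels explicitly keeps 0 as the reference level.
loan_data$default <- factor(loan_data$default, levels = c(0, 1))

str(loan_data)
print(levels(loan_data$default))
```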

### Check whether the dataset is balanced

##### It gives the frequency of 0's and 1's

    > table(loan_data$default)

      0   1
    517 883
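Raw counts are easier to judge as proportions. The sketch below rebuilds a 0/1 vector from the counts reported above (517 zeros and 883 ones) so that it runs on its own; on the real data you would pass `loan_data$default` to `table()` directly.

```r
# Rebuild a 0/1 vector with the reported counts: 517 zeros and 883 ones.
default <- rep(c(0, 1), times = c(517, 883))

counts <- table(default)
props  <- prop.table(counts)  # share of each class

print(counts)
print(round(props, 3))
```

Roughly 63% of the rows are 1's, so the classes are noticeably unbalanced.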

### Summary of the loan_data
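`summary()` gives the five-number summary and mean for numeric columns and the level counts for factors. A short sketch on a synthetic stand-in for `loan_data` (the values are invented for illustration):

```r
# Synthetic stand-in for loan_data (values are invented for illustration).
loan_data <- data.frame(
  income  = c(31, 55, 120, 28, 64, 40),
  debtinc = c(9.3, 17.3, 5.5, 2.9, 17.3, 10.2),
  default = factor(c(0, 1, 0, 0, 1, 1), levels = c(0, 1))
)

# Numeric columns: min, quartiles, mean, max. Factor columns: count per level.
print(summary(loan_data))
```

A maximum far above the third quartile, as with `income` here, is an early hint that outliers are worth checking.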

### Boxplot to check for Outliers

    > boxplot(loan_data_balanced$income, horizontal = TRUE, main = "Income")
    > boxplot(loan_data_balanced$debtinc, horizontal = TRUE, main = "Debtinc")
    > boxplot(loan_data_balanced$othdebt, horizontal = TRUE, main = "Otherdebt")
### Capping the outliers
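The capping rule used for this dataset is not shown, so the sketch below assumes one common choice: values beyond the 1.5 x IQR fences are pulled back to the fence. The helper `cap_outliers` and the sample `income` vector are hypothetical.

```r
# Cap values at the 1.5 * IQR fences (an assumed rule, not necessarily
# the one used for the original analysis).
cap_outliers <- function(x) {
  q     <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  lower <- q[1] - fence
  upper <- q[2] + fence
  pmin(pmax(x, lower), upper)  # clamp each value into [lower, upper]
}

# Hypothetical incomes with one extreme value.
income <- c(20, 25, 28, 30, 32, 35, 40, 45, 250)
capped <- cap_outliers(income)

print(rbind(original = income, capped = capped))
```

On the real data this would be applied column by column, for example `loan_data_balanced$income <- cap_outliers(loan_data_balanced$income)`. Percentile-based caps (such as the 1st and 99th percentiles) are another common choice.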

After capping, check the boxplots once again.

    > boxplot(loan_data_balanced$income, horizontal = TRUE, main = "Income")
    > boxplot(loan_data_balanced$debtinc, horizontal = TRUE, main = "Debtinc")
    > boxplot(loan_data_balanced$othdebt, horizontal = TRUE, main = "Otherdebt")
### Check the distribution with histograms

    > hist(loan_data_balanced$debtinc, main = "Debt", col = "red")
    > hist(loan_data_balanced$creddebt, main = "Credit Debt", col = "brown")
    > hist(loan_data_balanced$othdebt, main = "Other Debt", col = "blue")
    > hist(loan_data_balanced$income, main = "Income", col = "yellow")
### Split the dataset into training and testing sets
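A minimal sketch of a random split, assuming a 70/30 ratio (the ratio used originally is not stated). The data frame is a synthetic stand-in for `loan_data_balanced`, and the names `train` and `test` are illustrative.

```r
set.seed(123)  # makes the random split reproducible
n <- 100

# Synthetic stand-in for loan_data_balanced (values are invented).
loan_data_balanced <- data.frame(
  income  = round(rlnorm(n, meanlog = 3.5, sdlog = 0.5)),
  debtinc = round(runif(n, min = 0, max = 30), 1),
  default = rbinom(n, size = 1, prob = 0.5)
)

# Draw 70% of the row indices for training; the rest form the test set.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- loan_data_balanced[train_idx, ]
test  <- loan_data_balanced[-train_idx, ]

print(c(train = nrow(train), test = nrow(test)))
```

The model is fitted on `train` only, and the accuracy measures that follow are computed on `test`.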

### Accuracy

Accuracy = (True Positive + True Negative) / Total

Sensitivity = True Positive / (True Positive + False Negative)

Specificity = True Negative / (True Negative + False Positive)
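These three measures can be read straight off a confusion matrix. The sketch below uses two short hypothetical vectors, `actual` and `predicted`, in place of real test-set labels and model predictions.

```r
# Hypothetical labels and predictions for ten test cases.
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)
predicted <- c(1, 1, 0, 1, 0, 1, 0, 0, 1, 0)

# Rows are actual classes, columns are predicted classes.
cm <- table(Actual = actual, Predicted = predicted)
print(cm)

TP <- cm["1", "1"]  # actual 1, predicted 1
TN <- cm["0", "0"]  # actual 0, predicted 0
FP <- cm["0", "1"]  # actual 0, predicted 1
FN <- cm["1", "0"]  # actual 1, predicted 0

accuracy    <- (TP + TN) / sum(cm)
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)

print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

With a fitted model, `predicted` would typically come from thresholding the predicted probabilities, for example `as.integer(prob >= 0.5)`.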

### Plotting the accuracy curve
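An accuracy curve shows how accuracy changes as the classification cutoff moves. The original plotting code is not shown, so this is a base-R sketch on simulated labels and probabilities; `actual`, `prob`, and the cutoff grid are assumptions made for the example.

```r
set.seed(7)

# Simulated labels, with predicted probabilities that tend to be higher for 1's.
actual <- rbinom(200, size = 1, prob = 0.5)
prob   <- plogis(rnorm(200, mean = ifelse(actual == 1, 1, -1)))

# Accuracy at each cutoff: share of cases where the thresholded prediction
# agrees with the actual label.
cutoffs <- seq(0.05, 0.95, by = 0.05)
acc <- sapply(cutoffs, function(k) mean((prob >= k) == (actual == 1)))

best <- cutoffs[which.max(acc)]
print(data.frame(cutoff = cutoffs, accuracy = round(acc, 3)))
print(best)
```

Calling `plot(cutoffs, acc, type = "l", xlab = "Cutoff", ylab = "Accuracy")` afterwards draws the curve itself.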

### Area Under the Curve
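The area under the ROC curve (AUC) equals the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. The sketch below computes it in base R from ranks; the helper name `auc_rank` and the four-case example are illustrative, and packages such as ROCR or pROC offer the same calculation.

```r
# AUC via the rank-sum (Mann-Whitney) identity.
auc_rank <- function(actual, prob) {
  r  <- rank(prob)          # average ranks, so ties are handled
  n1 <- sum(actual == 1)    # number of positives
  n0 <- sum(actual == 0)    # number of negatives
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical labels and predicted probabilities.
actual <- c(0, 0, 1, 1)
prob   <- c(0.1, 0.4, 0.35, 0.8)

print(auc_rank(actual, prob))
```

An AUC of 0.5 corresponds to random guessing and 1 to perfect separation of the two classes.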

### Area Under the Curve plot
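The original plot is not available, so here is a base-R sketch of how such a plot can be produced. The helper `roc_points` is hypothetical, assumes there are no tied probabilities, and is run on a tiny made-up example.

```r
# ROC coordinates obtained by lowering the cutoff one case at a time
# (assumes no tied probabilities).
roc_points <- function(actual, prob) {
  ord <- order(prob, decreasing = TRUE)
  hit <- actual[ord] == 1
  data.frame(
    fpr = c(0, cumsum(!hit) / sum(!hit)),  # false positive rate
    tpr = c(0, cumsum(hit) / sum(hit))     # true positive rate (sensitivity)
  )
}

# Hypothetical labels and predicted probabilities.
actual <- c(0, 0, 1, 1)
prob   <- c(0.1, 0.4, 0.35, 0.8)
roc    <- roc_points(actual, prob)

# Trapezoidal area under the ROC points.
auc <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
print(auc)

plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = sprintf("ROC curve (AUC = %.2f)", auc))
abline(a = 0, b = 1, lty = 2)  # diagonal = random guessing
```

The further the curve bows above the dashed diagonal, the larger the area under it.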