Logistic Regression in R

What is Logistic Regression?

Not a good fit


Logistic curve


Link Function:

log(p/(1-p)) is the link function, also called the logit or log-odds. Applying this logarithmic transformation to the outcome probability lets us model a non-linear association in a linear way.
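As a minimal sketch of the link function described above, the logit and its inverse can be written directly in R. The helper names `logit` and `inv_logit` are illustrative, not part of the original material; base R provides the same pair as `qlogis()` and `plogis()`.

```r
# Logit link: maps a probability in (0, 1) onto the whole real line.
logit <- function(p) log(p / (1 - p))

# Inverse logit (logistic function): maps a linear predictor back to a probability.
inv_logit <- function(eta) 1 / (1 + exp(-eta))

p   <- c(0.1, 0.5, 0.9)
eta <- logit(p)

print(eta)             # log-odds; a probability of 0.5 maps to 0
print(inv_logit(eta))  # recovers the original probabilities

# The hand-written helper agrees with base R's qlogis().
print(all.equal(eta, qlogis(p)))
```

The model is linear in `eta`, and the S-shaped logistic curve appears only when `eta` is mapped back to a probability.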

Goodness of fit in logistic regression


Structure of the dataset and data types


Change the factor variables to factors
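A minimal sketch of this step, using a small made-up data frame in place of the real `loan_data`. Only the `default` column name comes from this tutorial; the toy values are assumptions for illustration.

```r
# Toy stand-in for loan_data (values are invented for illustration).
loan_data <- data.frame(
  income  = c(25, 40, 31, 120, 55, 18),
  default = c(0, 1, 0, 1, 1, 0)
)

# Convert the 0/1 outcome to a factor so R treats it as categorical.
loan_data$default <- as.factor(loan_data$default)

str(loan_data)                    # default is now listed as a factor with 2 levels
print(levels(loan_data$default))
```

Any other categorical column in the dataset can be converted the same way.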


To check if the dataset is balanced

It gives the frequency of 0’s and 1’s
> table(loan_data$default)

0     1
517  883
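Proportions can make the balance easier to judge than raw counts. The sketch below rebuilds the table from the counts printed above; on the real data, `prop.table(table(loan_data$default))` should give the same result.

```r
# Counts copied from the table() output above.
counts <- c("0" = 517, "1" = 883)

# Share of each class in the data.
props <- counts / sum(counts)
print(round(props, 3))
```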

Summary of the loan_data


Boxplot to check for Outliers

> boxplot(loan_data_balanced$income,horizontal = T,main="Income")


> boxplot(loan_data_balanced$debtinc,horizontal = T,main="Debtinc")


> boxplot(loan_data_balanced$othdebt,horizontal = T,main="Otherdebt")


Capping the outliers
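The capping code itself is not shown here, so the following is only one possible approach: values beyond the boxplot fences (1.5 times the IQR below the first quartile or above the third) are pulled back to the fence. The function name `cap_outliers` and the toy income values are assumptions.

```r
# Cap values outside the 1.5 * IQR fences (an assumed rule, not necessarily
# the one used for the original plots).
cap_outliers <- function(x) {
  q     <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  iqr   <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  pmin(pmax(x, lower), upper)  # clamp each value into [lower, upper]
}

# Toy income values with one obvious outlier.
income <- c(20, 25, 30, 35, 40, 45, 50, 400)
capped <- cap_outliers(income)

print(capped)  # the extreme value is replaced by the upper fence
```

On the real data this would be applied column by column, for example `loan_data_balanced$income <- cap_outliers(loan_data_balanced$income)`.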


After capping, check the boxplots once again.

> boxplot(loan_data_balanced$income,horizontal = T,main="Income")


> boxplot(loan_data_balanced$debtinc,horizontal = T,main="Debtinc")


> boxplot(loan_data_balanced$othdebt,horizontal = T,main="Otherdebt")


Check the distribution with histogram

> hist(loan_data_balanced$debtinc,main="Debt",col="Red")


> hist(loan_data_balanced$creddebt,main="Credit Debt",col="brown")


> hist(loan_data_balanced$othdebt,main = "Other Debt",col="blue")


> hist(loan_data_balanced$income,main = "Income",col="yellow")


Split the dataset into training and testing
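A minimal sketch of a train/test split followed by a logistic regression fit. The data are simulated so the example runs on its own; the 70/30 ratio, the seed, the simulated coefficients and the choice of predictors are all assumptions, and only the column names follow this tutorial.

```r
set.seed(123)  # arbitrary seed, used only for reproducibility

# Simulated stand-in for loan_data_balanced.
n <- 200
loan_data_balanced <- data.frame(
  income  = round(runif(n, 15, 150)),
  debtinc = round(runif(n, 1, 30), 1)
)
# Simulated outcome: a higher debt-to-income ratio raises the default probability.
loan_data_balanced$default <- rbinom(
  n, 1, plogis(-2 + 0.15 * loan_data_balanced$debtinc)
)

# Randomly assign 70% of the rows to training and the rest to testing.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- loan_data_balanced[train_idx, ]
test  <- loan_data_balanced[-train_idx, ]

# Fit the logistic regression on the training rows only.
model <- glm(default ~ income + debtinc, data = train, family = binomial)

# Predicted default probabilities for the held-out rows.
pred_prob <- predict(model, newdata = test, type = "response")
print(head(pred_prob))
```

Keeping the test rows out of the fit is what makes the later accuracy figures an honest estimate of performance on unseen data.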






Accuracy of your model = (True Positive + True Negative) / Total

Sensitivity = True Positive / (True Positive + False Negative)

Specificity = True Negative / (True Negative + False Positive)
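The three measures can be computed from the four confusion-matrix counts. The sketch below uses small made-up vectors of actual and predicted classes; with a fitted model, the predicted classes would typically come from applying a cutoff (for example 0.5, an assumption) to the predicted probabilities.

```r
# Made-up actual and predicted classes, for illustration only.
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0, 1, 0)

# Confusion-matrix counts.
tp <- sum(predicted == 1 & actual == 1)  # true positives
tn <- sum(predicted == 0 & actual == 0)  # true negatives
fp <- sum(predicted == 1 & actual == 0)  # false positives
fn <- sum(predicted == 0 & actual == 1)  # false negatives

accuracy    <- (tp + tn) / (tp + tn + fp + fn)
sensitivity <- tp / (tp + fn)  # share of actual 1's correctly identified
specificity <- tn / (tn + fp)  # share of actual 0's correctly identified

print(table(Predicted = predicted, Actual = actual))
print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```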


Plotting accuracy curve




Area Under Curve


Area Under Curve Plot
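The original plot is not reproduced here. As a rough base-R sketch, the ROC curve can be traced by sweeping the cutoff over the predicted scores, and the area under it approximated with the trapezoid rule. The scores below are made up; packages such as ROCR or pROC are commonly used for this, but this sketch avoids any extra dependency.

```r
# Made-up actual classes and predicted scores, for illustration only.
actual <- c(0, 0, 1, 0, 1, 1, 0, 1)
score  <- c(0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.20, 0.90)

# Sweep the cutoff from high to low; Inf gives the (0, 0) starting point.
thr <- c(Inf, sort(unique(score), decreasing = TRUE))
tpr <- sapply(thr, function(t) sum(score >= t & actual == 1) / sum(actual == 1))
fpr <- sapply(thr, function(t) sum(score >= t & actual == 0) / sum(actual == 0))

# Area under the ROC curve by the trapezoid rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)

# ROC curve with the diagonal of a random classifier for reference.
plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curve")
abline(0, 1, lty = 2)
```

An AUC of 0.5 corresponds to random guessing and 1 to perfect separation of the two classes.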