Logistic Regression in R


What is Logistic Regression?

Not a good fit

          

Logistic curve

          

Link Function:
       

log(p/(1-p)) is the link function, where p is the probability of the outcome. Taking the log of the odds lets us model a non-linear relationship between the predictors and the probability as a linear function of the predictors.
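
As a minimal sketch of the link function, the snippet below defines the logit and its inverse by hand and compares them with base R's `qlogis()` and `plogis()`. The helper names `logit` and `inv_logit` are illustrative, not part of the original material.

```r
# Logit link: maps a probability in (0, 1) to the whole real line
logit <- function(p) log(p / (1 - p))

# Inverse link (logistic function): maps a linear predictor back to a probability
inv_logit <- function(eta) 1 / (1 + exp(-eta))

p <- c(0.1, 0.5, 0.9)
eta <- logit(p)
print(eta)              # log-odds; a probability of 0.5 maps to 0
print(inv_logit(eta))   # recovers the original probabilities

# Base R provides the same pair as qlogis() and plogis()
print(all.equal(eta, qlogis(p)))
```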

Goodness of fit in logistic regression

     

Structure of the dataset and data types

     

Change the factor variables to factors
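
One way this conversion might look is sketched below. The data frame is a small made-up stand-in for loan_data, and the `ed` column is an assumed categorical variable; only `default` and `income` are names taken from this tutorial.

```r
# Hypothetical stand-in for loan_data; the real columns may differ
loan_data <- data.frame(
  income  = c(31, 55, 120, 28, 64),
  ed      = c(1, 2, 3, 1, 2),      # assumed categorical education level
  default = c(0, 1, 0, 0, 1)
)

str(loan_data)   # before: every column is numeric

# Convert the categorical columns to factors
factor_cols <- c("ed", "default")
loan_data[factor_cols] <- lapply(loan_data[factor_cols], factor)

str(loan_data)   # after: ed and default are factors
```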

     

Check whether the dataset is balanced

table() gives the frequency of 0s and 1s:
> table(loan_data$default)

0     1
517  883
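
The counts can also be read as proportions, which makes the imbalance easier to judge. The sketch below rebuilds a vector with the same counts as the table above (517 zeros and 883 ones) rather than using the real loan_data.

```r
# Rebuild a response vector with the counts shown above
default <- factor(rep(c(0, 1), times = c(517, 883)))

counts <- table(default)
props  <- prop.table(counts)   # share of each class

print(counts)
print(round(props, 3))
```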

Summary of the loan_data

     

Boxplot to check for Outliers

> boxplot(loan_data_balanced$income,horizontal = T,main="Income")

      

> boxplot(loan_data_balanced$debtinc,horizontal = T,main="Debtinc")

      

> boxplot(loan_data_balanced$othdebt,horizontal = T,main="Otherdebt")

      

Capping the outliers
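
The capping rule used here is not shown, so the sketch below assumes one common choice: values beyond the boxplot whiskers (1.5 times the IQR outside the quartiles) are pulled back to the whisker limits. The function name `cap_outliers` and the sample values are made up for illustration; a percentile-based cap would be an equally plausible alternative.

```r
# Cap values at the 1.5 * IQR fences (an assumed rule, not necessarily the original one)
cap_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  lower <- q[[1]] - 1.5 * iqr
  upper <- q[[2]] + 1.5 * iqr
  pmin(pmax(x, lower), upper)   # clamp each value into [lower, upper]
}

income <- c(20, 25, 30, 35, 40, 45, 400)   # made-up values with one extreme point
income_capped <- cap_outliers(income)
print(income_capped)
```

Applied to the real data, the same function could be assigned back column by column, for example to `loan_data_balanced$income`.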

      

After capping, check the boxplots once again.

> boxplot(loan_data_balanced$income,horizontal = T,main="Income")

      

> boxplot(loan_data_balanced$debtinc,horizontal = T,main="Debtinc")

      

> boxplot(loan_data_balanced$othdebt,horizontal = T,main="Otherdebt")

      

Check the distribution with histogram

> hist(loan_data_balanced$debtinc,main="Debt",col="Red")

      

> hist(loan_data_balanced$creddebt,main="Credit Debt",col="brown")

      

> hist(loan_data_balanced$othdebt,main = "Other Debt",col="blue")

      

> hist(loan_data_balanced$income,main = "Income",col="yellow")

      

Split the dataset into training and testing
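
A minimal sketch of the split and model fit follows. It runs on simulated data, not the real loan_data_balanced, and the 70/30 ratio, the seed, and the choice of predictors are assumptions made for illustration.

```r
set.seed(123)   # make the random split reproducible

# Simulated stand-in for loan_data_balanced (hypothetical values)
n <- 200
loan_data_balanced <- data.frame(
  income  = runif(n, 20, 150),
  debtinc = runif(n, 1, 30)
)
loan_data_balanced$default <- factor(
  rbinom(n, 1, plogis(-2 + 0.15 * loan_data_balanced$debtinc))
)

# 70/30 split by sampling row indices
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- loan_data_balanced[train_idx, ]
test  <- loan_data_balanced[-train_idx, ]

# Fit the logistic regression on the training rows only
model <- glm(default ~ income + debtinc, data = train, family = binomial)

# Predicted probabilities of default for the held-out rows
test$prob <- predict(model, newdata = test, type = "response")
print(dim(train))
print(dim(test))
```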

        

      

        

Accuracy

      

Accuracy = (True Positive + True Negative) / Total

Sensitivity = True Positive / (True Positive + False Negative)

Specificity = True Negative / (True Negative + False Positive)
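
The three measures can be worked through on a small example. The label vectors below are invented for illustration; in practice `predicted` would come from applying a cutoff to the model's predicted probabilities.

```r
# Made-up true labels and predicted labels (1 = default, 0 = no default)
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)
predicted <- c(1, 1, 0, 1, 0, 1, 0, 0, 1, 0)

# Cells of the confusion matrix
tp <- sum(predicted == 1 & actual == 1)
tn <- sum(predicted == 0 & actual == 0)
fp <- sum(predicted == 1 & actual == 0)
fn <- sum(predicted == 0 & actual == 1)

accuracy    <- (tp + tn) / (tp + tn + fp + fn)
sensitivity <- tp / (tp + fn)   # share of actual 1s that were caught
specificity <- tn / (tn + fp)   # share of actual 0s that were caught

print(table(Predicted = predicted, Actual = actual))
print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```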

      

Plotting accuracy curve
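
One reading of this step is a plot of accuracy against the classification cutoff, which helps in choosing a threshold. The sketch below does that in base R with invented probabilities and labels; the original plot may have been produced with a dedicated package instead.

```r
# Hypothetical predicted probabilities and true labels
prob   <- c(0.05, 0.22, 0.35, 0.41, 0.55, 0.62, 0.75, 0.93)
actual <- c(0,    0,    1,    0,    1,    0,    1,    1)

# Accuracy at each candidate cutoff: predict 1 when prob >= cutoff
cutoffs  <- seq(0.1, 0.9, by = 0.1)
accuracy <- sapply(cutoffs, function(cut) mean((prob >= cut) == (actual == 1)))

plot(cutoffs, accuracy, type = "b",
     xlab = "Cutoff", ylab = "Accuracy", main = "Accuracy vs. cutoff")

# First cutoff that reaches the highest accuracy
best_cutoff <- cutoffs[which.max(accuracy)]
print(best_cutoff)
```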

      

      

      

Area Under Curve
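
The area under the ROC curve equals the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. The sketch below computes it from ranks in base R; the function name `auc_rank` and the data are illustrative, and a package-based calculation would be an alternative route to the same number.

```r
# AUC from ranks (equivalent to the Mann-Whitney U statistic)
auc_rank <- function(prob, actual) {
  r <- rank(prob)                 # average ranks handle tied probabilities
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical predicted probabilities and true labels
prob   <- c(0.05, 0.22, 0.35, 0.41, 0.55, 0.62, 0.75, 0.93)
actual <- c(0,    0,    1,    0,    1,    0,    1,    1)

auc <- auc_rank(prob, actual)
print(auc)
```

A value of 0.5 corresponds to random guessing and 1 to perfect separation of the two classes.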

      

Area Under Curve Plot