## Survival Analysis

### Introduction

• Survival analysis is a branch of statistics that studies the amount of time it takes before a particular event, such as death, occurs.
• It is also known as duration analysis, transition analysis, failure time analysis and time-to-event analysis.

### Survival Models

Survival models describe the time until an event happens.

Examples:

1. Time to death after a heart attack.
2. Time to get a job after graduation.
3. Time to default on payment after taking a loan.
4. Time to adopt a new technology after learning about it.

### Why can’t we use linear regression?

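• Time is not a normally distributed outcome: time to an event is non-negative and usually skewed, so the normality assumption of linear regression does not hold.
• Censoring: for some subjects only partial information is available (the event had not yet happened when observation stopped), and linear regression has no way to use such incomplete observations.
• The analysis may or may not be multivariate: sometimes there are no independent variables at all and the goal is only to describe the time-to-event distribution. Hence we use survival analysis.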
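
The censoring point can be illustrated with a small sketch. The data below are hypothetical, and the two "naive" summaries are only the simplest ways one might mishandle censored times; both end up below the true mean time to event.

```python
# Hypothetical data: true times to event (in months) for six subjects.
true_times = [4, 9, 15, 22, 30, 41]
study_length = 24  # follow-up stops here; later events are never seen

# What the analyst actually records: the time is capped at the end of
# follow-up, and a flag says whether the event was observed (1) or not (0).
observed = [(min(t, study_length), 1 if t <= study_length else 0)
            for t in true_times]

true_mean = sum(true_times) / len(true_times)

# Naive approach 1: treat every recorded time as if it were an event time.
naive_all = sum(t for t, _ in observed) / len(observed)

# Naive approach 2: drop the censored subjects and average the rest.
event_only = [t for t, e in observed if e == 1]
naive_events = sum(event_only) / len(event_only)

print("recorded (time, event):", observed)
print("true mean:", true_mean)
print("naive mean, censored treated as events:", naive_all)
print("naive mean, censored dropped:", naive_events)
```

A censored time is only a lower bound on the true time, which is why both shortcuts understate the typical time to event here.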

### Requirements

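• Definition of event: a clear, unambiguous definition of the event of interest.
• Time scale: the unit in which time is measured, e.g. months or years.
• Time origin: the point from which time to the event is measured.
• Time to event T is a non-negative random variable: it can take any value from zero upwards, with no upper limit.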

### Event and Time

• Event: Death, disease occurrence, disease recurrence, recovery, or other experience of interest
• Time: The time from the beginning of an observation period (such as surgery or beginning treatment) to
(i) an event
(ii) end of the study
(iii) loss of contact or withdrawal from the study.
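
As a minimal sketch, the three stopping rules above can be turned into the (time, event indicator) pair that survival methods work with. The dates, the `STUDY_END` constant and the `time_and_status` helper are all hypothetical names chosen for illustration.

```python
from datetime import date

STUDY_END = date(2022, 12, 31)  # hypothetical end-of-study date

# Hypothetical records: (start of observation, event date or None,
#                        date of loss to follow-up / withdrawal or None)
records = [
    (date(2020, 1, 10), date(2021, 3, 5), None),   # event observed
    (date(2020, 6, 1), None, None),                # event-free at study end
    (date(2021, 2, 15), None, date(2021, 11, 1)),  # lost to follow-up
]

def time_and_status(start, event_date, lost_date, study_end=STUDY_END):
    """Return (days of follow-up, event indicator) for one subject."""
    # Follow-up stops at whichever comes first: event, loss, or study end.
    candidates = [(study_end, 0)]
    if lost_date is not None:
        candidates.append((lost_date, 0))
    if event_date is not None:
        candidates.append((event_date, 1))
    # On equal dates, prefer the event (status 1) over censoring.
    stop, status = min(candidates, key=lambda c: (c[0], -c[1]))
    return (stop - start).days, status

results = [time_and_status(*r) for r in records]
for days, status in results:
    print(days, "days,", "event" if status == 1 else "censored")
```

Only the first subject contributes an observed event; the other two contribute follow-up time with an indicator of 0.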

### Anomalies in Survival Analysis

• Although a time-to-event study is simple in theory, in practice there are a number of obstacles if the event being studied is relatively rare or takes a long time to occur.

• For example, a study of death rates can be very difficult to carry out if most subjects outlive the term of the study, or drop out of the study while it is in progress.

### Censoring and Survival function

• Censoring / Censored observation: If a subject does not have an event during the observation time, they are described as censored. The subject is censored in the sense that nothing is observed or known about that subject after the time of censoring. A censored subject may or may not have an event after the end of observation time.

• Truncation: the process generating the data is such that it is only possible to observe outcomes above (or below) the truncation limit. This can occur, for instance, if measurements are taken using a detector that is only activated when the signals it detects are above a certain limit. There may be many weak incoming signals, but we can never tell using this detector.

• Survival Function S(t): The probability that a subject survives longer than time t.

### Causes of censored data:

Censoring may arise in the following ways:

1. a patient has not (yet) experienced the relevant outcome, such as relapse or death, by the time of the close of the study;
2. a patient is lost to follow-up during the study period;
3. a patient experiences a different event that makes further follow-up impossible.
4. Right censoring occurs when a subject leaves the study before an event occurs, or the study ends before the event has occurred. For example, consider patients in a clinical trial that studies the effect of treatments on stroke occurrence. The study ends after 5 years. Patients who have had no strokes by the end of the study are censored. If a patient leaves the study at time te, then the event can only be said to occur somewhere in the interval (te, ∞).
5. Left censoring occurs when the event of interest has already happened before enrolment. This is very rarely encountered.

### Definitions

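• Density function: f(t) describes how likely the event is to happen at time t. For discrete time, f(t) = P(T = t); for continuous time, it is the probability per unit time that the event occurs in a small interval around t. It is an unconditional quantity.
• Survival function: S(t) = P(T > t). The probability that the event has not yet happened by time t.
For example, death happens after time t.
Another way of putting it: the subject survives beyond t = 1 year, 2 years and so on.
• Hazard function: h(t) is the instantaneous event rate at time t.
It is a conditional quantity: it is conditional on the event not having happened before t.
• Example: Given that a customer has kept paying their loan up to time t, what is the probability that he or she defaults at t?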
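
For a continuous time to event T, the three functions are linked by the standard relations below. The symbols F(t) (cumulative distribution function) and λ (a constant rate) are introduced here for illustration and do not appear elsewhere in these notes.

$$
F(t) = P(T \le t), \qquad S(t) = 1 - F(t), \qquad f(t) = \frac{d}{dt}F(t) = -\frac{d}{dt}S(t)
$$

$$
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}, \qquad S(t) = \exp\left(-\int_0^t h(u)\,du\right)
$$

As a worked case, a constant hazard $h(t) = \lambda$ gives $S(t) = e^{-\lambda t}$ and $f(t) = \lambda e^{-\lambda t}$, so that $f(t)/S(t) = \lambda$ as required.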

### Survival And Hazard function

• The survival probability (which is also called the survivor function) S(t) is the probability that an individual survives from the time origin (e.g. diagnosis of cancer) to a specified future time t. It is fundamental to a survival analysis because survival probabilities for different values of t provide crucial summary information from time to event data. These values describe directly the survival experience of a study cohort.
• The hazard is usually denoted by h(t) or λ(t) and is the probability that an individual who is under observation at a time t has an event at that time.

### Kaplan-Meier estimate of the survival function
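
As a minimal sketch of the Kaplan-Meier (product-limit) idea: at each distinct event time, the running survival estimate is multiplied by the fraction of subjects still at risk who do not have the event, and censored subjects simply leave the risk set. The function name `kaplan_meier` and the data are hypothetical, and no confidence intervals are computed.

```python
def kaplan_meier(times, events):
    """Return [(t, S_hat(t))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same follow-up time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve

# Hypothetical follow-up times with event indicators (0 = censored).
times = [2, 3, 3, 5, 6, 8, 8, 10]
events = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

The estimate only changes at times where an event is observed; censored times affect later steps through the shrinking risk set.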

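• The Kaplan-Meier method estimates the survival function S(t), the probability of surviving from the beginning of follow-up to each event time in the data. At each event time, the estimate is multiplied by the fraction of subjects at risk who do not experience the event; censored subjects count as at risk until their censoring time.

### Log-Rank Test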

• The log-rank test compares the survival distributions of two samples. It is a non-parametric test and is appropriate when the data are right-skewed and censored. It is widely used in clinical trials to establish the efficacy of a new treatment in comparison with a control treatment when the measurement is the time to an event.
• The test is sometimes called the Mantel–Cox test.
• The log-rank test can also be viewed as a time-stratified Cochran–Mantel–Haenszel test.
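
A minimal sketch of the two-sample log-rank computation follows: at each event time, the observed number of events in one group is compared with the number expected if both groups shared the same hazard. The function name `logrank_test` and the data are hypothetical, the p-value uses the chi-square approximation with 1 degree of freedom, and the sketch assumes both groups contribute to at least one risk set (otherwise the variance is zero).

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi-square statistic, p-value)."""
    # Pool the data, tagging each subject with a group label (0 or 1).
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
              [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers at risk just before t, and events exactly at t.
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)
        n_b = sum(1 for u, _, g in pooled if u >= t and g == 1)
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-A event count at t.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 d.f.
    return chi2, p_value

# Hypothetical data: (time, event indicator) for treatment and control.
treated_t, treated_e = [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]
control_t, control_e = [1, 2, 3, 5, 8, 12], [1, 1, 1, 1, 1, 0]

chi2, p_value = logrank_test(treated_t, treated_e, control_t, control_e)
print("chi-square:", round(chi2, 3), "p-value:", round(p_value, 4))
```

Treating each event time as its own 2×2 table (group by event/no event) is what makes the time-stratified view of the test natural.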

### Cox proportional hazards (PH) regression analysis

• Kaplan-Meier curves and log-rank tests are useful when the predictor variable is categorical (e.g., drug vs. placebo), or takes a small number of values (e.g., drug doses 0, 20, 50, and 100 mg/day) that can be treated as categorical.
• For quantitative predictor variables, an alternative method is Cox proportional hazards regression analysis.
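
To make the Cox model concrete, here is a minimal sketch for a single covariate x, assuming the usual form h(t | x) = h0(t) · exp(β·x) and estimating β by maximising the partial likelihood with Newton-Raphson. The function name `cox_fit` and the data are hypothetical; tied event times each use the full risk set (a Breslow-style simplification), and there are no safeguards for non-convergence or for data where the estimate is not finite.

```python
import math

def cox_fit(times, events, x, iterations=25):
    """Estimate beta in h(t | x) = h0(t) * exp(beta * x) for one covariate."""
    beta = 0.0
    for _ in range(iterations):
        score = 0.0        # first derivative of the log partial likelihood
        information = 0.0  # minus the second derivative
        for t_i, e_i, x_i in zip(times, events, x):
            if e_i != 1:
                continue  # censored subjects contribute only via risk sets
            # Risk set: everyone still under observation at time t_i.
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            s0 = sum(math.exp(beta * x_j) for x_j in risk)
            s1 = sum(x_j * math.exp(beta * x_j) for x_j in risk)
            s2 = sum(x_j ** 2 * math.exp(beta * x_j) for x_j in risk)
            score += x_i - s1 / s0
            information += s2 / s0 - (s1 / s0) ** 2
        step = score / information
        beta += step
        if abs(step) < 1e-10:
            break
    return beta

# Hypothetical data: x = 1 for treatment, x = 0 for control.
times = [6, 7, 10, 15, 19, 25, 1, 2, 3, 5, 8, 12]
events = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0]
x = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

beta = cox_fit(times, events, x)
hazard_ratio = math.exp(beta)
print("beta:", round(beta, 3), "hazard ratio:", round(hazard_ratio, 3))
```

A negative β (hazard ratio below 1) means that larger values of x are associated with a lower event rate. The same code accepts a quantitative covariate such as age in place of the 0/1 indicator.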

### By using R

• (a) Consider the “larynx” dataset in the “KMsurv” package and fit a Cox model with “stage” as the covariate.
(i) Determine if “age” should be included in the model as a linear term. If age cannot be entered into the model as a linear term, what can you say about the functional form for “age”? Give proper explanations for your answers.
(ii) Repeat (i) for the covariate “diagyr”.
• (b) Fit a Cox model with the covariate “stage” and a linear term for “age”. Comment on the adequacy of the fit by diagnosing appropriate residuals. Justify the choice of residuals.
• (c) Comment on the proportional hazards assumption with respect to the covariate “stage”, with proper explanations. Use the log cumulative hazard plots.
• (d) Fit a Cox model with all the covariates available in the data frame (you can assume the functional form for each covariate to be linear). Use deviance residuals to detect potential outliers.
• (e) Once again, fit a Cox model with all the covariates included in the model. Using “dfbeta” residuals, find the observations that are influential on the estimate of the regression coefficient corresponding to “age” only. Explain clearly why you consider these observations to be influential.