Time series analysis is the analysis of observations recorded over time. Time-bound observations are common in any business activity. Businesses see their sales jump at certain times, turn sluggish in certain seasons, and sometimes remain normal. These fluctuations are known as seasonal variations, and they play a vital role in time series analysis. For example, the festive season draws huge sales, and most business firms bank on this opportunity to stay ahead of their competitors. It is therefore important not only to estimate the projected demand for the coming season but also to prepare production in time to fully capitalize on the market. Time series analysis is an easy tool to forecast future demand or project profit, and it can be used in various other business domains to project the parameters of interest.

Start coding in Python
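**Key steps to start with time series analysis in Python**
1) Loading the data.
2) Converting the date column into datetime information.
3) Checking the stationarity of the time series. A time series is stationary when its statistical properties do not change over time, which requires: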
a) Constant mean: The mean of the series should not be a function of time. A varying mean indicates time dependence, so the stationarity assumption fails. Hence it is important to check the rolling mean across time.

b) Constant variance: The variance should also not be a function of time. If the variance changes over time (heteroscedasticity), the series is no longer homogeneous, which is undesirable in time series data. To proceed with analyzing the series, we have to maintain this homogeneity.

c) Covariance that does not depend on time: Another important property is the autocovariance. The covariance between two observations should depend only on their relative position, that is, how far apart in time they are (the lag), and not on when they occur.
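To make criteria (a) and (b) concrete, here is a minimal sketch on synthetic data (an assumption for illustration; it is not the AirPassengers series). White noise satisfies both criteria while a random walk does not, and the rolling statistics show the difference: the random walk's rolling mean drifts much further. The helper name `rolling_drift` and the window of 50 are hypothetical choices.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only
rng = np.random.default_rng(0)
n = 500

# Stationary: white noise has a constant mean (0) and constant variance (1)
white_noise = pd.Series(rng.normal(0, 1, n))

# Non-stationary: a random walk (running sum of noise); its level wanders
# and its variance grows with time
random_walk = pd.Series(rng.normal(0, 1, n).cumsum())


def rolling_drift(s, window=50):
    """Hypothetical helper: how far the rolling mean and rolling std move
    (max minus min) across the sample."""
    roll_mean = s.rolling(window).mean()
    roll_std = s.rolling(window).std()
    return roll_mean.max() - roll_mean.min(), roll_std.max() - roll_std.min()


wn_mean_drift, wn_std_drift = rolling_drift(white_noise)
rw_mean_drift, rw_std_drift = rolling_drift(random_walk)
print(f"white noise: rolling-mean drift {wn_mean_drift:.2f}, rolling-std drift {wn_std_drift:.2f}")
print(f"random walk: rolling-mean drift {rw_mean_drift:.2f}, rolling-std drift {rw_std_drift:.2f}")
```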


There are two main reasons behind the non-stationarity of a time series:

a) Trend: If a trend is visible, it shows up as a mean that varies over time.

b) Seasonality: Values are high during particular seasons, for example high sales during the festive season.

If these two components are removed from the series, we can achieve stationarity.
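As a sketch of what "removing" trend and seasonality can mean, the snippet below applies a simple classical decomposition to a hypothetical monthly series built from a linear trend, a 12-month cycle and noise (all made up for illustration). A centred 12-month moving average estimates the trend, and the per-month average of the detrended values estimates the seasonality. This is one possible approach, shown only to illustrate the idea.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series (not the AirPassengers data):
# linear trend + 12-month seasonal cycle + random noise
rng = np.random.default_rng(42)
t = np.arange(120)
idx = pd.date_range("2000-01-01", periods=120, freq="MS")
y = pd.Series(100 + 2.0 * t + 15 * np.sin(2 * np.pi * t / 12)
              + rng.normal(0, 3, 120), index=idx)

# 1) Trend: a centred 12-month moving average spans one full seasonal cycle,
#    so the cycle averages out and mainly the trend is left
trend_est = y.rolling(12, center=True).mean()
detrended = y - trend_est

# 2) Seasonality: average the detrended values of each calendar month
seasonal_est = detrended.groupby(detrended.index.month).transform("mean")

# 3) What remains should have a stable mean over time
residual = (detrended - seasonal_est).dropna()

half = len(residual) // 2
print("raw series: first-half mean %.1f, second-half mean %.1f"
      % (y.iloc[:60].mean(), y.iloc[60:].mean()))
print("residual:   first-half mean %.2f, second-half mean %.2f"
      % (residual.iloc[:half].mean(), residual.iloc[half:].mean()))
```

The raw series' mean shifts by roughly the size of the trend between the two halves, while the residual's mean stays close to zero in both.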

import pandas as pd

import numpy as np

time_data = pd.read_csv("/content/AirPassengers.csv")

The Month column is read in as plain strings (for example, 1949-01) rather than as dates, so we have to convert it into a proper datetime format.

time_data["Month"]

pd.datetime has been removed from recent pandas versions, so we parse the strings with pd.to_datetime and use the parsed column as the index:

time_data["Month"] = pd.to_datetime(time_data["Month"], format="%Y-%m")

df_time = time_data.set_index("Month")
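To see what the date parsing changes, here is a small self-contained sketch that uses three rows in the same layout as AirPassengers.csv, embedded via `io.StringIO` instead of reading the real file (an assumption for illustration). Before parsing, `Month` holds plain strings; afterwards the frame has a `DatetimeIndex`, which supports date-based slicing.

```python
import io

import pandas as pd

# Three rows in the AirPassengers.csv layout; StringIO stands in for the file
csv_text = """Month,#Passengers
1949-01,112
1949-02,118
1949-03,132
"""

raw = pd.read_csv(io.StringIO(csv_text))
print(raw["Month"].dtype)      # plain strings, not dates

parsed = raw.copy()
parsed["Month"] = pd.to_datetime(parsed["Month"], format="%Y-%m")
parsed = parsed.set_index("Month")

print(parsed.index)                      # a DatetimeIndex
print(parsed.loc["1949-02":"1949-03"])   # date-based slicing now works
```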

# Check the plot

import matplotlib.pyplot as plt

plt.plot(df_time)

The plot clearly shows that the number of passengers increases with time, and a repeating pattern appears within each year. Hence the series is a function of time. Both trend and seasonality are evident from the graph, so we already have an indication that the series is non-stationary.

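Let's check some diagnostics: the 12-month rolling mean and rolling standard deviation.

roll_mean = df_time.rolling(window=12).mean()

roll_std = df_time.rolling(window=12).std()

plt.plot(df_time, color='blue', label='Original')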

plt.plot(roll_mean, color='red', label='Rolling Mean')

plt.plot(roll_std, color='black', label = 'Rolling Std')

plt.legend(loc='best')

plt.title('Rolling Mean & Standard Deviation')

plt.show(block=False)

from statsmodels.tsa.stattools import adfuller

df_ad = adfuller(df_time["#Passengers"],autolag='AIC')

df_result = pd.Series(df_ad[0:4], index=['Test_Statistic','p_value','Lags_Used','Number of Observations Used'])

print(df_result)

Since the p-value (0.9918) is far above the 0.05 significance level, we cannot reject the test's null hypothesis of a unit root, so the series is non-stationary.
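For intuition about what `adfuller` tests, here is a minimal sketch of the plain (non-augmented) Dickey-Fuller statistic written with NumPy only. It regresses the first difference on the lagged level plus a constant and returns the t-statistic of the lagged-level coefficient; the null hypothesis is a unit root (non-stationarity). `adfuller` additionally includes lagged differences (chosen by AIC when `autolag='AIC'`) and converts the statistic into a MacKinnon p-value, so its numbers will differ from this sketch. The helper name `dickey_fuller_stat` and the synthetic data are assumptions for illustration; -2.86 is the usual asymptotic 5% critical value for the constant-only case.

```python
import numpy as np


def dickey_fuller_stat(y):
    """Hypothetical helper: plain (non-augmented) Dickey-Fuller t-statistic.

    Fits  dy_t = alpha + gamma * y_{t-1} + e_t  by least squares and returns
    the t-statistic of gamma. The null hypothesis is gamma = 0 (a unit root,
    i.e. non-stationary); strongly negative values count against it.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                  # dy_t = y_t - y_{t-1}
    y_lag = y[:-1]                   # y_{t-1}
    X = np.column_stack([np.ones_like(y_lag), y_lag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)                      # residual variance
    se_gamma = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se_gamma


# Asymptotic 5% critical value of the Dickey-Fuller test with a constant
CRIT_5PCT = -2.86

# Synthetic data for illustration only
rng = np.random.default_rng(1)
noise = rng.normal(size=500)
stat_wn = dickey_fuller_stat(noise)            # white noise: stationary
stat_rw = dickey_fuller_stat(noise.cumsum())   # random walk: has a unit root

print(f"white noise: {stat_wn:.2f}, reject unit root: {stat_wn < CRIT_5PCT}")
print(f"random walk: {stat_rw:.2f}, reject unit root: {stat_rw < CRIT_5PCT}")
```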


So we have to remove the two components, trend and seasonality. Since we see a strong positive trend, we first apply a log transformation, which shrinks higher values more than lower ones.

log_trans=np.log(df_time)

plt.plot(log_trans)

Although the log transformation has trimmed the high values, the positive trend is still visible.
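A quick sketch of why the log helps: for a hypothetical series whose growth and seasonal swings are multiplicative (proportional to the level), the log turns them into additive components, so the seasonal spread no longer grows over time. The series and the helper `yearly_std` are illustrative assumptions, not the AirPassengers data.

```python
import numpy as np
import pandas as pd

# Hypothetical multiplicative series: exponential growth whose seasonal
# swings grow in proportion to the level
t = np.arange(120)
y = pd.Series(np.exp(0.02 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12)))
log_y = np.log(y)


def yearly_std(s):
    """Hypothetical helper: standard deviation within each 12-month block."""
    return s.groupby(np.arange(len(s)) // 12).std()


raw_ratio = yearly_std(y).iloc[-1] / yearly_std(y).iloc[0]
log_ratio = yearly_std(log_y).iloc[-1] / yearly_std(log_y).iloc[0]
print(f"last-year / first-year spread, raw: {raw_ratio:.2f}, log: {log_ratio:.2f}")
```

Here the within-year spread of the raw series grows by a factor of exp(0.02 × 108) ≈ 8.7 from the first to the last year, while after the log it is the same in every year. The upward trend, however, remains after the log, which is why it still has to be removed.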

From the plot above, we know that the mean is steadily increasing. We have to remove this trend by averaging. There are two methods:

a) Moving average

b) Exponential smoothing

First, we take a 12-month moving average and plot it against the series. Then we subtract the moving average from the series.

roll_mean=log_trans.rolling(12).mean()

plt.plot(log_trans)

plt.plot(roll_mean, color='yellow')
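Why a 12-month window: for monthly data, every 12-month window covers exactly one seasonal cycle, so the seasonal ups and downs cancel inside the average and the rolling mean tracks the trend. The sketch below checks this on a hypothetical noise-free series with a linear trend and a 12-month sine cycle (both assumptions for illustration). Because `rolling(12)` is a trailing window, it lags a linear trend by 5.5 months, so subtracting it leaves the seasonal pattern plus a constant offset; a constant offset still gives a stable mean.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend (slope 2) plus a 12-month cycle
t = np.arange(48)
season = 10 * np.sin(2 * np.pi * t / 12)
y = pd.Series(2.0 * t + season)

# Trailing 12-month mean, the same kind of window as log_trans.rolling(12).mean()
roll_mean = y.rolling(12).mean()

# Each window spans one full cycle, so the seasonal terms sum to ~0 and the
# rolling mean equals the trend at the window centre, 2 * (t - 5.5)
print(np.allclose(roll_mean.dropna().to_numpy(), 2.0 * (t[11:] - 5.5)))

# Subtracting the moving average leaves the cycle plus a constant (2 * 5.5 = 11)
detrended = (y - roll_mean).dropna()
print(np.allclose(detrended.to_numpy(), season[11:] + 11.0))
```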

Let’s remove the moving average from the series:

minus_log_trans=log_trans-roll_mean

minus_log_trans.dropna(inplace=True)

plt.plot(minus_log_trans)

Now the plot shows no trend, just random fluctuations. Let's run the Dickey-Fuller test:

from statsmodels.tsa.stattools import adfuller

df_ad = adfuller(minus_log_trans["#Passengers"],autolag='AIC')

df_result = pd.Series(df_ad[0:4], index=['Test_Statistic','p_value','Lags_Used','Number of Observations Used'])

print(df_result)

The p-value is now below the 0.05 significance level, so we can reject the null hypothesis of a unit root and treat the series as stationary.

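Now let's try the second method, exponential smoothing, which computes a weighted moving average that gives more weight to recent observations:

exp_mean = log_trans.ewm(span=12).mean()

exp_minus_log_trans = log_trans - exp_mean

exp_minus_log_trans.dropna(inplace=True)

plt.plot(exp_minus_log_trans)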

from statsmodels.tsa.stattools import adfuller

df_ad = adfuller(exp_minus_log_trans["#Passengers"],autolag='AIC')

df_result = pd.Series(df_ad[0:4], index=['Test_Statistic','p_value','Lags_Used','Number of Observations Used'])

print(df_result)

With exponential smoothing, the stationarity result appears even stronger, with a lower p-value than with the simple moving average.
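To clarify what `ewm(span=12)` computes, here is a sketch that reproduces pandas' default (`adjust=True`) exponentially weighted mean by hand. With alpha = 2 / (span + 1), the value at time t is a weighted average of all past observations with weights 1, (1 - alpha), (1 - alpha)^2, and so on, so recent months count more. The helper `ewm_mean_manual` and the random-walk input are hypothetical. Because each point is averaged over everything from the first observation onwards, this mean has no leading NaN values, unlike `rolling(12)`.

```python
import numpy as np
import pandas as pd


def ewm_mean_manual(x, span):
    """Hypothetical helper: exponentially weighted mean, adjust=True weighting.

    alpha = 2 / (span + 1); the value at time t is a weighted average of
    x_t, x_{t-1}, ..., x_0 with weights 1, (1 - alpha), (1 - alpha)**2, ...
    """
    x = np.asarray(x, dtype=float)
    alpha = 2.0 / (span + 1.0)
    out = np.empty_like(x)
    for t in range(len(x)):
        w = (1 - alpha) ** np.arange(t + 1)       # weights for lags 0, 1, ..., t
        out[t] = np.dot(w, x[t::-1]) / w.sum()    # x[t::-1] is x_t, x_{t-1}, ..., x_0
    return out


# Synthetic input for illustration only
rng = np.random.default_rng(3)
x = pd.Series(rng.normal(size=60).cumsum())

manual = ewm_mean_manual(x, span=12)
print(np.allclose(manual, x.ewm(span=12).mean().to_numpy()))
```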

AR stands for Auto-Regressive.

MA stands for Moving Average.

I stands for Integrated.

ARIMA(p,d,q)

where p is the number of autoregressive (AR) terms, d is the number of differences, and q is the number of moving average (MA) terms.
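The d in ARIMA counts how many times the series is differenced. As a small sketch on made-up series (not the passenger data): one difference, y_t - y_{t-1}, turns a linear trend into a constant, and a second difference does the same for a quadratic trend. In pandas this is `Series.diff()` (or `DataFrame.diff()`, for example `log_trans.diff()`).

```python
import numpy as np
import pandas as pd

t = np.arange(36)

# Made-up series for illustration
linear = pd.Series(5.0 + 2.0 * t)              # linear trend
quadratic = pd.Series((t ** 2).astype(float))  # quadratic trend

# d = 1: y_t - y_{t-1} turns a linear trend into a constant (the slope)
d1 = linear.diff().dropna()

# d = 2: differencing twice turns a quadratic trend into a constant
d2 = quadratic.diff().diff().dropna()

print(d1.unique(), d2.unique())
```

Each round of differencing drops one observation at the start, which is why `dropna()` is needed.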

To identify the values of p and q, we will look at the ACF (autocorrelation function) and PACF (partial autocorrelation function) plots. Since we have already made the series stationary, we will not apply differencing, i.e. d = 0. Differencing is another method of removing trend and making the series stationary.
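A rough sketch of how the ACF and PACF guide the choice of p and q, using NumPy only on a simulated AR(1) series (phi = 0.7, an assumption for illustration). As a rule of thumb, an AR(p) process has a PACF that cuts off after lag p while its ACF tails off, and an MA(q) process has an ACF that cuts off after lag q. The helpers `acf` and `pacf_ols` are hypothetical; `pacf_ols` uses the regression definition of the PACF. In practice, statsmodels provides `plot_acf` and `plot_pacf` in `statsmodels.graphics.tsaplots` for the plots themselves.

```python
import numpy as np


def acf(x, nlags):
    """Hypothetical helper: sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:len(x) - k]) / denom
                     for k in range(nlags + 1)])


def pacf_ols(x, nlags):
    """Hypothetical helper: partial autocorrelation for lags 1..nlags.

    The PACF at lag k is taken as the coefficient on x_{t-k} when x_t is
    regressed on a constant and x_{t-1}, ..., x_{t-k}.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = []
    for k in range(1, nlags + 1):
        # column j-1 holds x_{t-j} for t = k, ..., n-1
        lags = np.column_stack([x[k - j:n - j] for j in range(1, k + 1)])
        X = np.column_stack([np.ones(n - k), lags])
        beta, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
        out.append(beta[-1])   # coefficient on x_{t-k}
    return np.array(out)


# Simulated AR(1): x_t = 0.7 * x_{t-1} + e_t (assumed for illustration)
rng = np.random.default_rng(7)
n, phi = 2000, 0.7
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

print("ACF  lags 0-4:", np.round(acf(x, 4), 2))       # theory: 0.7**k
print("PACF lags 1-4:", np.round(pacf_ols(x, 4), 2))  # theory: 0.7, then 0
```

For this series the ACF decays roughly like 0.7^k while the PACF is large only at lag 1, which points to p = 1 and q = 0.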

Start coding in Python