## What autocorrelation is and why it is useful in time series analysis

In time series analysis we often draw on information from the past to produce forecasts about the future. For this process to be successful, we must diagnose our time series thoroughly and explore all its ‘nooks and crannies.’

One such diagnostic method is **autocorrelation**. It helps us detect certain features in our series so we can choose the most suitable forecasting model for our data.

In this short post I want to go over: what autocorrelation is, why it is useful, and how to apply it to a simple dataset in Python.

Autocorrelation is just the **correlation** of the data with itself. So, instead of measuring the correlation between two different random variables, we are measuring the correlation of a random variable with itself. Hence the name **auto**-correlation.

Correlation measures how strongly two variables are related to each other. A value of 1 means they are perfectly positively correlated, -1 means they are perfectly negatively correlated, and 0 means there is no correlation.

For time series, the autocorrelation is the correlation of the series with itself at two different points in time, separated by some offset known as the *lag*. In other words, we are measuring the time series against a lagged version of itself.

Mathematically, the autocorrelation at lag *k* is calculated as:

$$r_k = \frac{\sum_{t=k+1}^{N} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{N} (y_t - \bar{y})^2}$$

Where *N* is the length of the time series, *y_t* is the value of the series at time *t*, *ȳ* is the mean of the series, and *k* is the specified lag. So, when calculating *r_1* we are computing the correlation between *y_t* and *y_{t-1}*. The autocorrelation of *y_t* with itself (lag 0) would be 1, as the two series are identical.
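To make the formula concrete, here is a minimal sketch of the calculation in NumPy. The `autocorr` helper and the random-walk series are illustrative additions, not part of the original article:

```python
import numpy as np

def autocorr(y, k):
    """Sample autocorrelation r_k of series y at lag k."""
    y = np.asarray(y, dtype=float)
    y_bar = y.mean()
    # Numerator: covariance of the series with its lag-k copy
    num = np.sum((y[k:] - y_bar) * (y[: -k or None] - y_bar))
    # Denominator: total variance of the series
    den = np.sum((y - y_bar) ** 2)
    return num / den

rng = np.random.default_rng(0)
y = rng.normal(size=200).cumsum()  # a random-walk series as an example

print(autocorr(y, 0))  # 1.0 by definition (series vs. itself)
print(autocorr(y, 1))  # close to 1 for a random walk
```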

As stated above, we use autocorrelation to measure the correlation of a time series with a lagged version of itself. This computation allows us to gain some interesting insight into the characteristics of our series:

- **Seasonality**: Let's say we find that the correlation at certain lag multiples is in general higher than at others. This means we have some seasonal component in our data. For example, if we have daily data and we find that every multiple of the lag-**7** term is higher than its neighbours, we probably have some weekly seasonality.
- **Trend**: If the correlation for recent lags is high and slowly decreases as the lags increase, then there is some trend in our data. Therefore, we would need to carry out some differencing to render the time series **stationary**.

To learn more about seasonality, trend and stationarity, check out my previous articles on those topics:

Let’s now go through an example in Python to make this theory more concrete!

For this walkthrough we will use the classic airline passenger volumes dataset:

Data sourced from Kaggle with a CC0 licence.

There is a clear upwards trend and yearly seasonality (data points indexed by month).

We can use the **plot_acf** function from the statsmodels package to plot the autocorrelation of our time series at various lags. This type of plot is known as a *correlogram*:

We observe the following:

- There is a clear *cyclical* pattern in the lags at every multiple of **12**. As our data is indexed by month, we therefore have *yearly seasonality* in our data.
- The strength of the correlation is generally and *slowly decreasing* as the lags increase. This points to a *trend* in our data, and it will need to be differenced to make it stationary when modelling.

The blue region signifies which lags are **statistically significant**. Therefore, when building a forecasting model for this data, the forecast for the next month should probably only consider the previous **~15** values, as those lags are statistically significant.
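For reference, a common rule-of-thumb significance band for a white-noise series is roughly ±1.96/√N (the shaded region drawn by `plot_acf` uses Bartlett's formula by default, which is similar in spirit). For a series of 144 monthly observations:

```python
import numpy as np

N = 144  # length of the airline series
bound = 1.96 / np.sqrt(N)  # approximate 95% significance band for the ACF
print(round(bound, 3))  # lags with |r_k| above this are significant
```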

The value at lag 0 is a perfect correlation of 1, because we are correlating the time series with an exact copy of itself.

In this post we have described what autocorrelation is and how we can use it to detect seasonality and trends in our time series. However, it has other uses too. For example, we can use an autocorrelation plot of the **residuals** from a forecasting model to determine whether the residuals are indeed independent. If the autocorrelations of the residuals are *not* mostly zero, then the fitted model has not accounted for all the information in the data and can probably be improved.

The full code script used in this article can be found at my GitHub here:

Autocorrelation For Time Series Analysis. Republished from https://towardsdatascience.com/autocorrelation-for-time-series-analysis-86e68e631f77
