Breaking down a time series into its basic building blocks
Understanding your time series is fundamental when trying to gain insight and find the best model to make future predictions. Most time series can be split into distinct components, which lets you diagnose them in a structured way and gives you a powerful analysis tool.
In this post, I want to discuss what these different components are, how to obtain them, and how we can perform time series decomposition using Python.
Time series are a combination of (mainly) three components: Trend, Seasonality and Residuals. Let’s break each of them down.
Trend: This is the overall movement of the series. This could be a consistent increase over time, a decrease over time, or a combination of both.
Seasonality: Any regular seasonal pattern in the series. For example, ice cream sales are regularly higher in summer than in winter. To learn more about seasonality, see my last post:
Residuals: This is what is left over once the trend and seasonality have been accounted for. It can be thought of as the noise in the series.
Sometimes a separate cycle component is also included, but it is often grouped into the trend component.
How these components are combined depends on the nature of your series. For an additive model we have:
Y = T + S + R
And for a multiplicative series:
Y = T × S × R
Where Y is the series, T is the trend, S is the seasonality and R is the residual component.
The additive model is most appropriate when the size of the series variations stays on a consistent absolute scale. The multiplicative model, on the other hand, applies when the fluctuations are relative, that is, proportional to the level of the series.
For example, if ice cream sales are higher in the summer by 1000 each year, then the model is additive. If sales increase by a constant 20% each summer, but the absolute number of sales changes, then the model is multiplicative. Later we will consider an example that should make this theory more concrete.
It is possible to convert a multiplicative model to an additive one by taking a log transform or applying the Box-Cox transformation. For the log transform:
log(Y) = log(T) + log(S) + log(R)
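To make this concrete, here is a minimal sketch of the ice cream example using made-up numbers (the sales levels and the 20% uplift are assumptions for illustration only). On the raw scale the summer uplift grows with the level of the series, while on the log scale it is the same every year, which is exactly what an additive model expects.

```python
import math

# Hypothetical baseline sales for three consecutive years.
trend = [10_000, 12_000, 14_000]

# Multiplicative seasonality: summer lifts sales by 20% every year.
summer = [t * 1.20 for t in trend]

# On the raw scale the uplift depends on the level of the series.
raw_uplift = [s - t for s, t in zip(summer, trend)]

# On the log scale the uplift is constant: log(1.2) each year.
log_uplift = [math.log(s) - math.log(t) for s, t in zip(summer, trend)]

print([round(u) for u in raw_uplift])
print([round(u, 4) for u in log_uplift])
```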
To learn more about the Box-Cox transform, you can read my previous article about it:
There are numerous algorithms and methods for decomposing a time series into the three components. I want to go through the classical approach, as it is used frequently and is quite intuitive.
- Calculate the trend component, T, using a moving (rolling) average.
- Detrend the series: Y - T for an additive model and Y/T for a multiplicative model.
- Calculate the seasonal component, S, by taking the average of the detrended series for each season.
- The residual component, R, is calculated as: R = Y - T - S for an additive model and R = Y/(T × S) for a multiplicative model.
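The four steps above can be sketched in plain Python as follows. This is a simplified illustration rather than a reference implementation: the function name classical_decompose is my own, the centered moving average with half weights on the end points for even periods is one common choice, and the final normalisation of the seasonal component is a convention that is not part of the steps listed above.

```python
from statistics import mean


def classical_decompose(y, period, model="additive"):
    """Classical decomposition sketch; returns (trend, seasonal, resid) lists.

    Positions where the centered moving average is undefined hold None.
    """
    n = len(y)
    half = period // 2

    # Step 1: trend via a centered moving average.
    trend = [None] * n
    for i in range(half, n - half):
        window = y[i - half:i + half + 1]
        if period % 2:  # odd period: plain centered window
            trend[i] = mean(window)
        else:  # even period: half weight on the two end points
            trend[i] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period

    # Step 2: detrend the series.
    if model == "additive":
        detrended = [None if t is None else v - t for v, t in zip(y, trend)]
    else:
        detrended = [None if t is None else v / t for v, t in zip(y, trend)]

    # Step 3: average the detrended values for each season.
    season_means = []
    for s in range(period):
        values = [d for d in detrended[s::period] if d is not None]
        season_means.append(mean(values))

    # Extra convention (not in the steps above): centre the seasonal component.
    center = mean(season_means)
    if model == "additive":
        season_means = [m - center for m in season_means]
    else:
        season_means = [m / center for m in season_means]
    seasonal = [season_means[i % period] for i in range(n)]

    # Step 4: residual is whatever trend and seasonality do not explain.
    resid = []
    for v, t, s in zip(y, trend, seasonal):
        if t is None:
            resid.append(None)
        elif model == "additive":
            resid.append(v - t - s)
        else:
            resid.append(v / (t * s))
    return trend, seasonal, resid


# Toy additive series: linear trend plus a repeating pattern of period 4.
pattern = [3, -1, -4, 2]
y = [10 + 2 * i + pattern[i % 4] for i in range(16)]
trend, seasonal, resid = classical_decompose(y, period=4, model="additive")
print(seasonal[:4])
```

Because the toy series has no noise, the recovered seasonal component matches the pattern that was put in and the residuals are zero wherever the trend is defined.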
There are also several other decomposition methods available, such as STL, X11 and SEATS. These are more advanced methods that build on the classical approach and address its shortcomings.
Once again, let’s revisit the classic data set for US airline passenger traffic between 1948 and 1961:
Data originated from Kaggle with a CC0 license.
From this graph, we observe an increasing trend and annual seasonality. Note that the size of the fluctuations increases with time, hence we have a multiplicative model.
We can decompose the time series using the statsmodels function seasonal_decompose, specifying that we have a “multiplicative” model when calling the function:
From the graph above, we can see that the function has indeed successfully captured the three components.
We can transform our series into an additive model by stabilizing the variance with the Box-Cox transformation, using SciPy's boxcox function:
Again, the function seems to have captured the three components well. Interestingly, we see that the residuals have higher volatility in the earlier and later years. This might be something to consider when building a forecasting model for this series.
In this post, we showed how a time series can be broken down into three main components: trend, seasonality, and residuals. The combination of these three components creates your observed time series, and depending on its nature, it can be additive or multiplicative. There are several techniques for doing the decomposition, such as STL, SEATS and X11, but I prefer the classical approach as it is very intuitive. Being able to decompose your time series helps build your understanding of your data, making it easier to make future predictions.
The full code used in this post can be found on my GitHub here: