The post covers:

- Creating time series data with pandas.
- Decomposing time series data.
- Forecasting with ARMA/ARIMA models.

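    import random

    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels.tsa.seasonal import seasonal_decompose
    # statsmodels 0.13+ dropped the old statsmodels.tsa.arima_model classes (ARIMA, ARMA);
    # an ARMA(p, q) model is now fitted as ARIMA with order (p, 0, q)
    from statsmodels.tsa.arima.model import ARIMA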

**Creating time series data with pandas**

For testing purposes, I'll create time series data with the following function.

    def CreateTSData(N):
        # Upward trend (i/20) plus two uniform noise terms
        values = [i / 20 + random.uniform(-12, 8) + random.uniform(-1, 1)
                  for i in range(N)]
        return pd.DataFrame({"value": values})

    N = 400    # total number of rows
    days = 10  # days to forecast
    df = CreateTSData(N)
    # pd.date_range replaces the removed DatetimeIndex(start=..., periods=...) constructor
    df.index = pd.date_range(start="2000-01-01", periods=N, freq="D")
    df.head()

The first rows look like this (the values differ on every run because the data is random):

                   value
    2000-01-01 -0.802450
    2000-01-02 -0.147009
    2000-01-03 -1.862958
    2000-01-04  5.919821
    2000-01-05  2.061787

**Decomposing time series data**

Time series decomposition splits a data series into components: a trend, a seasonal pattern, and irregular noise.

- The trend component reflects the overall direction of the data. It is estimated as a moving average of the values over time.
- The seasonal component captures variations that repeat at regular intervals in the series (e.g., weekly, monthly).
- The irregular (noise) component is the residual that remains after removing the two components above.

    decomp = seasonal_decompose(df["value"])
    decomp.plot()
    plt.show()

**Forecasting with ARMA/ARIMA models**

Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA) are commonly used models for forecasting time series data. An ARMA model takes an order (p, q) and an ARIMA model takes an order (p, d, q), where p, d, and q are non-negative integers:

*p* - the number of lag observations in the model, also known as the autoregressive (AR) order.

*d* - the number of times the raw observations are differenced, also known as the degree of differencing.

*q* - the size of the moving average window, also known as the order of the moving average.
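The role of *d* is easiest to see by differencing a series by hand. The sketch below is an illustration under assumed data, not the model's internals: it builds a hypothetical series with the same slope (1/20 per step) as the test data and applies one round of differencing with pandas `diff()`, which corresponds to d = 1.

```python
import numpy as np
import pandas as pd

# Hypothetical series: linear trend of 0.05 per step plus a bounded wiggle
t = np.arange(100)
series = pd.Series(0.05 * t + np.sin(t))

# d = 1: replace each value with its change from the previous step
diff1 = series.diff().dropna()

# The trend no longer grows; it contributes a constant 0.05 to every difference
print("first and last raw values:", series.iloc[0], series.iloc[-1])
print("mean of differenced series:", diff1.mean())
```

After differencing, the upward drift is gone and the values fluctuate around the slope, which is why a trending series is often modeled with d = 1 rather than d = 0.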

**ARIMA model**

The model is created with the ARIMA class and fitted with fit(). You can inspect the fitted model with summary().

    arima = ARIMA(df, order=(10, 0, 0))
    arima = arima.fit()
    arima.summary()

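Next, we forecast the next 10 days and plot the result together with the original data.

    plt.plot(df)
    # Positions 1 .. N-1 are in-sample; N .. N+days-1 are the 10 forecast days
    plt.plot(arima.predict(start=1, end=N + days - 1), color="red")
    plt.show()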

**ARMA model**

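An ARMA(p, q) model is an ARIMA model with d = 0, so this time we fit an ARMA(2, 1) model by passing the order (2, 0, 1) to ARIMA. statsmodels 0.13 and later no longer provide a working separate ARMA class.

    arma = ARIMA(df, order=(2, 0, 1))  # ARMA(2, 1)
    arma = arma.fit()
    arma.summary()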

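Next, we forecast the next 10 days and plot the result together with the original data.

    plt.plot(df)
    # Positions 1 .. N-1 are in-sample; N .. N+days-1 are the 10 forecast days
    plt.plot(arma.predict(start=1, end=N + days - 1), color="red")
    plt.show()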

In this post, we have briefly learned how to decompose and forecast time series data in Python. I hope you have found it useful.

The full source code is listed below.

    import random

    import pandas as pd
    import matplotlib.pyplot as plt
    from statsmodels.tsa.seasonal import seasonal_decompose
    # statsmodels 0.13+ dropped the old statsmodels.tsa.arima_model classes (ARIMA, ARMA)
    from statsmodels.tsa.arima.model import ARIMA


    def CreateTSData(N):
        # Upward trend (i/20) plus two uniform noise terms
        values = [i / 20 + random.uniform(-12, 8) + random.uniform(-1, 1)
                  for i in range(N)]
        return pd.DataFrame({"value": values})


    N = 400    # total number of rows
    days = 10  # days to forecast

    df = CreateTSData(N)
    df.index = pd.date_range(start="2000-01-01", periods=N, freq="D")
    print(df.head())

    # Decompose into trend, seasonal, and residual components
    decomp = seasonal_decompose(df["value"])
    decomp.plot()
    plt.show()

    # ARIMA model
    arima = ARIMA(df, order=(10, 0, 0))
    arima = arima.fit()
    print(arima.summary())

    plt.plot(df)
    # Positions N .. N+days-1 are the 10 forecast days
    plt.plot(arima.predict(start=1, end=N + days - 1), color="red")
    plt.show()

    # ARMA(2, 1) model, expressed as ARIMA with d = 0
    arma = ARIMA(df, order=(2, 0, 1))
    arma = arma.fit()
    print(arma.summary())

    plt.plot(df)
    plt.plot(arma.predict(start=1, end=N + days - 1), color="red")
    plt.show()
