Forecasting time series data in Python

   Time series data is a sequence of observations indexed in time order. In this post, we learn how to decompose and forecast time series data in Python.
  The post covers:
  1. Creating time series data with pandas
  2. Decomposing time series data
  3. Forecasting with ARMA/ARIMA models
First, we import the required libraries.

import random
import pandas as pd
import matplotlib.pyplot as plt

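# Note: these two classes were removed in statsmodels 0.13. On newer versions, use
# statsmodels.tsa.arima.model.ARIMA instead (ARMA(p, q) is ARIMA with order=(p, 0, q)).
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.arima_model import ARMA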
from statsmodels.tsa.seasonal import seasonal_decompose


Creating time series data with pandas

For testing purposes, I'll create synthetic time series data with the following function. Each value is a slow upward trend (i/20) plus uniform random noise.

def CreateTSData(N):
    # Linear trend (i/20) plus two uniform noise terms
    values = [i/20 + random.uniform(-12, 8) + random.uniform(-1, 1)
              for i in range(N)]
    # Build the frame in one step instead of appending row by row
    return pd.DataFrame({'value': values})

N = 400  # total number of rows
days = 10  # days to forecast
df = CreateTSData(N)
df.index = pd.date_range(start='2000-01-01', periods=N, freq='D')
df.head() 
                    value
2000-01-01 -0.802450
2000-01-02 -0.147009
2000-01-03 -1.862958
2000-01-04  5.919821
2000-01-05  2.061787
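
As an aside, the same kind of series can be generated without a Python loop and with a fixed seed, so the numbers repeat between runs. This is only a minimal sketch, not the function used in the rest of the post; the helper name create_ts_data and the seed value are made up for illustration.

```python
import numpy as np
import pandas as pd

def create_ts_data(n, seed=42):
    # Hypothetical helper: same trend-plus-noise recipe as CreateTSData,
    # but vectorized and seeded for reproducibility.
    rng = np.random.default_rng(seed)
    values = np.arange(n) / 20 + rng.uniform(-12, 8, n) + rng.uniform(-1, 1, n)
    index = pd.date_range(start="2000-01-01", periods=n, freq="D")
    return pd.DataFrame({"value": values}, index=index)

sample = create_ts_data(400)
print(sample.head())
```

Calling create_ts_data twice with the same seed returns identical frames, which makes it easier to compare model fits across runs.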


Decomposing time series data

   Time series decomposition is a method of splitting a series into trend, seasonal, and irregular (noise) components.
  • The trend component reflects the overall direction of the data. It is a smoothed mean level over time.
  • The seasonal component is the variation that recurs at regular intervals in the series (e.g., weekly or monthly).
  • The irregular (noise) component is the residual that remains after removing the trend and seasonal components.
   We can decompose time series data with the seasonal_decompose function and plot the result. By default it fits an additive model, and the seasonal period is taken from the frequency of the index.

decomp = seasonal_decompose(df["value"])
decomp.plot()
plt.show()
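
To make the three components more concrete, here is a rough hand-written version of an additive decomposition using only pandas and NumPy. It is a simplified sketch that assumes an odd seasonal period, not the statsmodels implementation; the function name additive_decompose and the toy series are invented for this example.

```python
import numpy as np
import pandas as pd

def additive_decompose(series, period):
    # Simplified sketch for an odd period; not the statsmodels implementation.
    # Trend: centered moving average spanning one full seasonal cycle.
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonal: average detrended value at each position in the cycle,
    # shifted so the seasonal effects sum to zero.
    position = np.arange(len(series)) % period
    cycle = detrended.groupby(position).mean()
    cycle = cycle - cycle.mean()
    seasonal = pd.Series(cycle.to_numpy()[position], index=series.index)
    # Residual: whatever the trend and seasonal parts do not explain.
    resid = series - trend - seasonal
    return trend, seasonal, resid

# Toy series: linear trend plus a repeating weekly pattern, no noise.
pattern = np.array([3.0, -1.0, -2.0, 0.0, 1.0, -1.0, 0.0])
index = pd.date_range("2000-01-01", periods=70, freq="D")
toy = pd.Series(0.1 * np.arange(70) + np.tile(pattern, 10), index=index)

trend, seasonal, resid = additive_decompose(toy, period=7)
print(seasonal.head(7).round(3).tolist())
print(resid.abs().max())
```

Because the toy series has no noise, the recovered seasonal values match the weekly pattern and the residual is essentially zero. The first and last three trend values are missing, since the centered window does not fit at the edges.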


Forecasting with ARMA/ARIMA models

   Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA) are commonly used models for forecasting time series data. The ARMA model takes an order of (p, q) and the ARIMA model takes an order of (p, d, q), where p, d, and q are non-negative integers:
    p - the number of lag observations in the model, also known as the order of the autoregressive (AR) part.
    d - the number of times the raw observations are differenced, also known as the degree of differencing.
    q - the size of the moving average window, also known as the order of the moving average (MA) part.
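
The d parameter is easiest to see on a small example. The sketch below applies differencing by hand in plain Python; the function name difference is just an illustrative helper, not part of statsmodels.

```python
def difference(values, d=1):
    """Apply first-order differencing d times."""
    result = list(values)
    for _ in range(d):
        # Replace each value with its change from the previous value.
        result = [b - a for a, b in zip(result, result[1:])]
    return result

series = [1, 4, 9, 16, 25]      # values with a quadratic trend
print(difference(series, d=1))  # [3, 5, 7, 9]
print(difference(series, d=2))  # [2, 2, 2]
```

Each round of differencing shortens the series by one value and removes one degree of polynomial trend, which is why a model with d > 0 can handle data whose level drifts over time.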

ARIMA model

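The model can be created with the ARIMA function, and the summary method reports the fitted coefficients and diagnostics. Here the order (10,0,0) means ten autoregressive lags with no differencing and no moving average terms.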

arima = ARIMA(df, order = (10,0,0))
arima = arima.fit()
arima.summary()


Next, we forecast the next 10 days and plot the result against the original data.

plt.plot(df)
# The last observation has index N-1, so ending at N+days-1 gives exactly `days` forecasts
plt.plot(arima.predict(1, N + days - 1), color="red")
plt.show()
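
For intuition about how a purely autoregressive model extends past the last observation, the following sketch rolls an AR(p) equation forward by hand. The coefficients and starting values are made up, and the intercept form used here is a simplification rather than the exact parameterization statsmodels reports.

```python
def ar_forecast(history, coeffs, const, steps):
    """Forecast with y_t = const + coeffs[0]*y_(t-1) + coeffs[1]*y_(t-2) + ..."""
    values = list(history)
    p = len(coeffs)
    forecasts = []
    for _ in range(steps):
        lags = values[-1:-p - 1:-1]  # the p most recent values, newest first
        y = const + sum(c * v for c, v in zip(coeffs, lags))
        forecasts.append(y)
        values.append(y)  # each forecast feeds into the next step
    return forecasts

# Hypothetical AR(2) model with made-up coefficients
print(ar_forecast([1.0, 2.0], coeffs=[0.5, 0.25], const=1.0, steps=2))
# [2.25, 2.625]
```

Since every forecast is built from earlier forecasts, multi-step predictions from a stationary model tend to settle toward a constant level as the horizon grows.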


ARMA model

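This time we use the ARMA function and fit the model. An ARMA(p, q) model is the same as an ARIMA(p, 0, q) model, so the order (2,1) here means two autoregressive lags and one moving average term.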

arma = ARMA(df, order = (2,1))
arma = arma.fit()
arma.summary()

Next, we forecast the next 10 days and plot the result against the original data.

plt.plot(df)
# The last observation has index N-1, so ending at N+days-1 gives exactly `days` forecasts
plt.plot(arma.predict(1, N + days - 1), color="red")
plt.show()




   In this post, we have briefly learned how to decompose and forecast time series data in Python. I hope you have found it useful.
   The full source code is listed below.

import random
import pandas as pd
import matplotlib.pyplot as plt

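# Note: these two classes were removed in statsmodels 0.13. On newer versions, use
# statsmodels.tsa.arima.model.ARIMA instead (ARMA(p, q) is ARIMA with order=(p, 0, q)).
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.arima_model import ARMA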
from statsmodels.tsa.seasonal import seasonal_decompose

def CreateTSData(N):
    # Linear trend (i/20) plus two uniform noise terms
    values = [i/20 + random.uniform(-12, 8) + random.uniform(-1, 1)
              for i in range(N)]
    # Build the frame in one step instead of appending row by row
    return pd.DataFrame({'value': values})

N = 400    # total number of rows
days = 10  # days to forecast
df = CreateTSData(N)
df.index = pd.date_range(start='2000-01-01', periods=N, freq='D')
df.head()

decomp = seasonal_decompose(df["value"])
decomp.plot()
plt.show()

arima = ARIMA(df, order = (10,0,0))
arima = arima.fit()
arima.summary()

plt.plot(df)
# The last observation has index N-1, so ending at N+days-1 gives exactly `days` forecasts
plt.plot(arima.predict(1, N + days - 1), color="red")
plt.show()

arma = ARMA(df, order = (2,1))
arma = arma.fit()
arma.summary()

plt.plot(df)
# The last observation has index N-1, so ending at N+days-1 gives exactly `days` forecasts
plt.plot(arma.predict(1, N + days - 1), color="red")
plt.show()
 
