Time series data prediction with Keras LSTM model in Python


   A Long Short-Term Memory (LSTM) network is a type of recurrent neural network designed for sequence data. It processes a sequence one element at a time and carries an internal state that summarizes the elements it has seen so far. Based on that state, it predicts the next item in the sequence. In this post, we'll learn how to fit and predict time series data with a Keras LSTM model in Python.
   The post covers:
  1. Preparing test data
  2. Shaping input data
  3. Defining Keras LSTM model
  4. Predicting test data and plotting the result
  First, we'll load the required libraries for this tutorial.

import random
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense,LSTM,Dropout


Preparing test data

Next, I'll generate time series data for this tutorial.

random.seed(123)

def CreateTSData(N):
    # linear trend + sine wave + uniform noise in [-1, 1]
    values = [i/100 + math.sin(2*i) + random.uniform(-1, 1) for i in range(N)]
    return pd.DataFrame({'value': values})

N = 240    # The number of elements
df = CreateTSData(N)
# Recent pandas versions no longer accept start/periods in the DatetimeIndex
# constructor; pd.date_range builds the same daily index.
df.index = pd.date_range(start='2000-01-01', periods=N, freq='D')
>>> df.head()
               value
2000-01-01 -0.895273
2000-01-02  0.093671
2000-01-03 -0.922319
2000-01-04 -1.034015
2000-01-05  1.831756

plt.plot(df)
plt.show()


Next, we'll split the 'df' dataset into training and test parts.

Tp = 200     # training part limit 
values=df.values
train,test = values[0:Tp,:], values[Tp:N,:]


Shaping input data

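   An LSTM is trained on fixed-length windows of the series: each input sample contains a number of consecutive elements, and the target is the element that follows them. Here, we call the window length 'step'. This is an important part of preparing data for an LSTM, so let's look at an example.
Suppose x holds the following sequence: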
x = [1,2,3,4,5,6,7,8,9,10]
for step=1, x and y contain:
x  y
1  2
2  3
3  4
4  5
..
9  10
for step=3, x and y contain:
x         y
1,2,3   4
2,3,4   5
3,4,5   6
4,5,6   7
...
7,8,9   10
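
To make the table above concrete, here is a minimal sketch that builds the same windows with NumPy. The helper name make_windows is only for illustration and is not part of the tutorial code that follows.

```python
import numpy as np

def make_windows(data, step):
    """Slide a window of length `step` over a 1-D sequence (illustrative helper)."""
    data = np.asarray(data)
    X, y = [], []
    for i in range(len(data) - step):
        X.append(data[i:i + step])   # `step` consecutive inputs
        y.append(data[i + step])     # the element right after the window
    return np.array(X), np.array(y)

x = np.arange(1, 11)          # 1, 2, ..., 10
X3, y3 = make_windows(x, 3)
print(X3.shape, y3.shape)     # (7, 3) (7,)
print(X3[0], y3[0])           # [1 2 3] 4
print(X3[-1], y3[-1])         # [7 8 9] 10
```

Ten elements with step=3 give seven (x, y) pairs, matching the rows in the table.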

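As you may have noticed, windowing shortens the data: a sequence of 10 elements yields only 9 samples for step=1 and 7 samples for step=3, that is, 'step' fewer than the original length. To keep the number of samples equal to the number of rows, we'll pad the training and test data by repeating the last value 'step' times.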

step=3
test = np.append(test, np.repeat(test[-1,], step))
train = np.append(train, np.repeat(train[-1,], step))
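
It may help to see what this padding does to the array shape. The sketch below uses a toy array as a stand-in for the (40, 1) test slice; the values are made up for illustration.

```python
import numpy as np

step = 3
# Toy stand-in for the (40, 1) test slice; the values are illustrative only.
test = np.arange(40, dtype=float).reshape(40, 1)

# np.append without an axis flattens both arguments before joining them.
padded = np.append(test, np.repeat(test[-1,], step))
print(test.shape, padded.shape)   # (40, 1) (43,)
print(padded[-4:])                # [39. 39. 39. 39.]
```

Note that the result is one-dimensional, and that the last 'step' values are copies of the final observation rather than real data, so the targets built from them are placeholders.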

Next, we'll convert the train and test data into input and target matrices using the step value, as shown in the example above.

# convert into dataset matrix
def convertToMatrix(data, step):
    X, Y = [], []
    for i in range(len(data) - step):
        d = i + step
        X.append(data[i:d])   # window of `step` consecutive values
        Y.append(data[d])     # the value that follows the window
    return np.array(X), np.array(Y)

trainX, trainY = convertToMatrix(train, step)
testX, testY = convertToMatrix(test, step)
>>> testX.shape
(40, 3) 

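Finally, we'll reshape trainX and testX into the three-dimensional form a Keras LSTM layer expects: (samples, timesteps, features). Here each sample is a single timestep with 'step' features. You can see the resulting shape of testX below.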

trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
>>> testX.shape
(40, 1, 3) 
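
The reshape can be tried on a small array to see how the axes line up. The sketch below uses toy data and also shows an alternative layout, (samples, step, 1), which treats each window value as its own timestep. That alternative is not what this post uses, and it would need input_shape=(step, 1) in the LSTM layer.

```python
import numpy as np

step = 3
# 4 toy samples of 3 values each; stand-in for the windowed matrix.
windows = np.arange(12, dtype=float).reshape(4, step)

# Layout used in this post: 1 timestep, `step` features per timestep.
as_features = windows.reshape(windows.shape[0], 1, windows.shape[1])
# Alternative layout (an assumption, not the post's choice):
# `step` timesteps, 1 feature per timestep.
as_timesteps = windows.reshape(windows.shape[0], windows.shape[1], 1)

print(as_features.shape)    # (4, 1, 3)
print(as_timesteps.shape)   # (4, 3, 1)
```

Both arrays hold the same numbers; only the meaning the LSTM assigns to each axis changes.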


Defining Keras LSTM model

Next, we create the Keras Sequential model: an LSTM layer with 32 units, a dropout layer, and a dense output layer with a single unit.

model = Sequential()
model.add(LSTM(units=32, input_shape=(1,step), activation="relu"))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 32)                4608      
_________________________________________________________________
dropout_1 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33        
=================================================================
Total params: 4,641
Trainable params: 4,641
Non-trainable params: 0
_________________________________________________________________
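
The parameter counts in the summary can be checked by hand. A standard LSTM layer has four weight sets (three gates plus the cell candidate), each with input weights, recurrent weights, and a bias. The helper names below are only for illustration.

```python
def lstm_params(units, input_dim):
    # 4 weight sets, each: input weights + recurrent weights + bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(inputs, outputs):
    # one weight per input-output pair, plus one bias per output
    return inputs * outputs + outputs

lstm = lstm_params(units=32, input_dim=3)    # input_dim equals `step`
dense = dense_params(inputs=32, outputs=1)
print(lstm, dense, lstm + dense)             # 4608 33 4641
```

The dropout layer has no weights, so the total is the sum of the LSTM and dense counts.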
 

Predicting test data and plotting the result

Next, we'll fit the model on the training data and predict both the training and test inputs.

model.fit(trainX,trainY, epochs=100, batch_size=32, verbose=2)
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
predicted = np.concatenate((trainPredict,testPredict),axis=0)
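
Besides plotting, a single error number can be handy for comparing runs. Below is a minimal sketch of a root-mean-square error helper; the function name rmse and the toy arrays are assumptions for illustration and are not part of the original tutorial.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Flatten both inputs: model.predict returns shape (n, 1) while the
    # targets have shape (n,), and subtracting them unflattened would
    # broadcast to an (n, n) matrix.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy stand-ins for testY and testPredict
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([[1.0], [2.0], [3.0], [6.0]])
print(rmse(y_true, y_pred))   # 1.0
```

With the variables from this post, a call such as rmse(testY[:-step], testPredict[:-step]) would leave out the last 'step' targets, which are padding copies rather than real observations.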


Finally, we check the result in a plot. The vertical line marks the split point between the training and test parts.

index = df.index.values
plt.plot(index,df)
plt.plot(index,predicted)
plt.axvline(df.index[Tp], c="r")
plt.show()


   In this post, we've learned how to fit and predict time series data with a Keras LSTM model. The full source code is listed below.
   Thank you for reading!

import random
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout

random.seed(123)
# create dataset
def CreateTSData(N):
    # linear trend + sine wave + uniform noise in [-1, 1]
    values = [i/100 + math.sin(2*i) + random.uniform(-1, 1) for i in range(N)]
    return pd.DataFrame({'value': values})

# convert into dataset matrix
def convertToMatrix(data, step):
    X, Y = [], []
    for i in range(len(data) - step):
        d = i + step
        X.append(data[i:d])   # window of `step` consecutive values
        Y.append(data[d])     # the value that follows the window
    return np.array(X), np.array(Y)

step=3
N = 240    # total number of rows
Tp = 200     # training part 
df = CreateTSData(N)
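# Recent pandas versions no longer accept start/periods in the DatetimeIndex
# constructor; pd.date_range builds the same daily index.
df.index = pd.date_range(start='2000-01-01', periods=N, freq='D')
print(df.head())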

values=df.values
train,test = values[0:Tp,:], values[Tp:N,:]

# add step elements into train and test
test = np.append(test,np.repeat(test[-1,],step))
train = np.append(train,np.repeat(train[-1,],step))
 
trainX,trainY =convertToMatrix(train,step)
testX,testY =convertToMatrix(test,step)
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# Keras LSTM model 
model = Sequential()
model.add(LSTM(units=32, input_shape=(1,step), activation="relu"))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.summary()

model.fit(trainX,trainY, epochs=100, batch_size=32, verbose=2)
trainPredict = model.predict(trainX)
testPredict= model.predict(testX)
predicted=np.concatenate((trainPredict,testPredict),axis=0)

index = df.index.values
plt.plot(index,df)
plt.plot(index,predicted)
plt.axvline(df.index[Tp], c="r")
plt.show()
