DataTechNotes: Time series data prediction with Keras LSTM model in Python

A Long Short-Term Memory (LSTM) network is a type of recurrent neural network designed for sequence data. It processes a sequence one element at a time and maintains an internal state that summarizes the elements it has seen so far. Based on what it has learned, it predicts the next item in the sequence.
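
To make the idea of "state" concrete, here is a minimal NumPy sketch of a single LSTM time step. It assumes the standard LSTM formulation with sigmoid gates and tanh activations; the names lstm_step, W, U and b are illustrative and the weights are random, so this only shows how the state is carried along, not a trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step.

    W: (4*units, input_dim), U: (4*units, units), b: (4*units,)
    h is the hidden state, c is the cell state.
    """
    units = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0 * units:1 * units])   # input gate
    f = sigmoid(z[1 * units:2 * units])   # forget gate
    g = np.tanh(z[2 * units:3 * units])   # candidate cell values
    o = sigmoid(z[3 * units:4 * units])   # output gate
    c_new = f * c + i * g                 # keep part of the old state, add new information
    h_new = o * np.tanh(c_new)            # expose part of the cell state
    return h_new, c_new

# Random weights for illustration only (an assumption, not trained values)
rng = np.random.default_rng(0)
input_dim, units = 1, 4
W = rng.normal(size=(4 * units, input_dim))
U = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)

# Iterate over a short sequence, carrying the state from one element to the next
h = np.zeros(units)
c = np.zeros(units)
for value in [0.1, 0.5, -0.3]:
    h, c = lstm_step(np.array([value]), h, c, W, U, b)

print(h.shape)  # (4,)
```

The hidden state h after the last element is what a following layer would use to produce the prediction.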

In this tutorial, we'll briefly learn how to fit a Keras LSTM model to time series data and use it for prediction in Python. The tutorial covers:

Preparing test data
Shaping input data
Defining Keras LSTM model
Predicting test data and plotting the result

We'll start by loading the required libraries for this tutorial.

import random
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense,LSTM,Dropout

Preparing test data

Next, we'll generate time series data for this tutorial.

random.seed(123)

def CreateTSData(N):
    # linear trend + sine wave + uniform noise in [-1, 1]
    values = [i/100 + math.sin(2*i) + random.uniform(-1, 1) for i in range(N)]
    # building the frame in one go keeps the column as float64
    return pd.DataFrame({'value': values})

N = 240    # The number of elements
df = CreateTSData(N)
# pd.DatetimeIndex(start=..., periods=...) is no longer supported in pandas 1.0+; use date_range
df.index = pd.date_range(start='2000-01-01', periods=N, freq='D')
df.head()
               value
2000-01-01 -0.895273
2000-01-02  0.093671
2000-01-03 -0.922319
2000-01-04 -1.034015
2000-01-05  1.831756

plt.plot(df)
plt.show()

Next, we'll split the 'df' dataset into training and test parts.

Tp = 200     # training part limit 
values=df.values
train,test = values[0:Tp,:], values[Tp:N,:]

Shaping input data

An LSTM takes a window of consecutive elements as its input sequence. Here, we call the window length 'step'. This is an important part of preparing data for an LSTM, so let's look at an example.
Suppose x holds the following sequence data.

x = [1,2,3,4,5,6,7,8,9,10]
for step=1, x and y contain:
x y
1 2
2 3
3 4
4 5
..
9 10

for step=3, x and y contain:
x       y
1,2,3   4
2,3,4   5
3,4,5   6
4,5,6   7
...
7,8,9   10
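
The two tables above can be reproduced with a few lines of plain Python. This is only an illustrative sketch; the helper name make_windows is hypothetical and is not part of the tutorial's code.

```python
def make_windows(seq, step):
    """Pair each run of `step` consecutive items with the item that follows it."""
    pairs = []
    for i in range(len(seq) - step):
        window = seq[i:i + step]   # input: `step` consecutive elements
        target = seq[i + step]     # output: the element right after the window
        pairs.append((window, target))
    return pairs

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pairs_step1 = make_windows(x, 1)
pairs_step3 = make_windows(x, 3)

for window, target in pairs_step3:
    print(window, target)
```

With step=1 the sequence of 10 elements gives 9 pairs, and with step=3 it gives 7 pairs.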

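As you may have noticed, windowing yields fewer x/y pairs than there are elements in the original sequence: a sequence of length n produces only n - step pairs. We'll compensate by appending the last value 'step' times to the training and test data, so that each part yields as many pairs as it originally had elements.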

step = 3
test = np.append(test, np.repeat(test[-1,], step))
train = np.append(train, np.repeat(train[-1,], step))
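
To see what this padding does to the array, here is a small sketch on a stand-in array (the name part and its values are made up for illustration). Note that np.append flattens its inputs, so the padded result is one-dimensional.

```python
import numpy as np

step = 3
# Stand-in for `train` or `test`: a column of 10 values, shape (10, 1)
part = np.arange(10, dtype=float).reshape(-1, 1)

# Repeat the last value `step` times and append it; the result is flattened to 1-D
padded = np.append(part, np.repeat(part[-1,], step))

print(part.shape)     # (10, 1)
print(padded.shape)   # (13,)
print(padded[-4:])    # [9. 9. 9. 9.]
```

After padding, len(padded) - step equals the original length, so windowing produces one pair per original element. Keep in mind that the last 'step' targets are copies of the final value rather than real observations.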

Next, we'll convert the train and test data into matrices of windows of length 'step', as shown in the example above.

# convert into dataset matrix
def convertToMatrix(data, step):
    X, Y = [], []
    for i in range(len(data)-step):
        d = i+step
        X.append(data[i:d,])   # window of 'step' consecutive values
        Y.append(data[d,])     # the value that follows the window
    return np.array(X), np.array(Y)

trainX,trainY =convertToMatrix(train, step)
testX,testY =convertToMatrix(test, step)

testX.shape
(40, 3)
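
As an aside, the same windows can be built without an explicit loop. The sketch below assumes NumPy 1.20 or later, which provides sliding_window_view; the function names convert_to_matrix_loop and convert_to_matrix_view are illustrative, and the first one simply mirrors convertToMatrix so the two results can be compared.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def convert_to_matrix_loop(data, step):
    # Same logic as convertToMatrix in the tutorial
    X, Y = [], []
    for i in range(len(data) - step):
        X.append(data[i:i + step])
        Y.append(data[i + step])
    return np.array(X), np.array(Y)

def convert_to_matrix_view(data, step):
    # All windows of length `step`: shape (len(data) - step + 1, step)
    windows = sliding_window_view(data, step)
    X = windows[:-1]   # drop the last window, since no value follows it
    Y = data[step:]    # the value that follows each remaining window
    return X, Y

data = np.arange(1.0, 11.0)   # toy 1-D series: 1.0 .. 10.0
X_loop, Y_loop = convert_to_matrix_loop(data, 3)
X_view, Y_view = convert_to_matrix_view(data, 3)

print(X_view.shape, Y_view.shape)  # (7, 3) (7,)
```

Both versions return the same arrays; the view-based one avoids the Python loop, which mainly matters for long series.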

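Finally, we'll reshape trainX and testX into the three-dimensional shape the Keras LSTM layer expects: (samples, time steps, features). Here each sample is treated as one time step with 'step' features. You can see the shape of testX below.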

trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

testX.shape
(40, 1, 3)
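
The reshape can be checked on a small stand-in array. The sketch below also shows an alternative layout, 'step' time steps with one feature each; that variant is an assumption for comparison only and is not what this tutorial uses.

```python
import numpy as np

step = 3
# Stand-in for trainX before reshaping: 4 windows of 3 values each
X = np.arange(12, dtype=float).reshape(4, step)

# Layout used in this tutorial: 1 time step with `step` features
X_one_step = np.reshape(X, (X.shape[0], 1, X.shape[1]))

# Alternative layout (not used here): `step` time steps with 1 feature each
X_many_steps = np.reshape(X, (X.shape[0], X.shape[1], 1))

print(X_one_step.shape)    # (4, 1, 3)
print(X_many_steps.shape)  # (4, 3, 1)
```

If you tried the alternative layout, the LSTM layer's input_shape would have to be (step, 1) instead of (1, step).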

Defining Keras LSTM model

Next, we create the Keras Sequential model.

model = Sequential()
model.add(LSTM(units=32, input_shape=(1,step), activation="relu"))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 32)                4608      
_________________________________________________________________
dropout_1 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33        
=================================================================
Total params: 4,641
Trainable params: 4,641
Non-trainable params: 0
_________________________________________________________________
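
The parameter counts in the summary can be verified by hand. An LSTM layer has four sets of weights (input gate, forget gate, cell candidate, output gate), each with input weights, recurrent weights and a bias. The helper names below are illustrative.

```python
def lstm_params(input_dim, units):
    # 4 weight sets, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # one weight per input-output pair, plus one bias per output
    return input_dim * units + units

lstm = lstm_params(input_dim=3, units=32)   # 3 features per time step, 32 units
dense = dense_params(input_dim=32, units=1)

print(lstm, dense, lstm + dense)  # 4608 33 4641
```

The Dropout layer has no weights, so it contributes nothing to the total.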

Predicting test data and plotting the result

Next, we'll fit the model on the training data and predict both the training and test data.

model.fit(trainX,trainY, epochs=100, batch_size=32, verbose=2)
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
predicted = np.concatenate((trainPredict,testPredict),axis=0)
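
Besides plotting, you may want a numeric measure of the fit. The tutorial does not compute one, so the following is only a sketch of a root mean squared error helper (the name rmse is illustrative), shown on toy values. Flattening both arrays matters because model.predict returns a column of shape (n, 1) while the targets are one-dimensional; subtracting them directly would broadcast to an (n, n) matrix.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Flatten both inputs so a (n, 1) prediction lines up with (n,) targets
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values for illustration; y_pred has the column shape of a model.predict result
y_true = [1.0, 2.0, 3.0]
y_pred = np.array([[1.0], [2.0], [5.0]])
score = rmse(y_true, y_pred)
print(score)

# With the tutorial's variables, one possible call would be:
# rmse(testY[:-step], testPredict[:-step])
# which leaves out the last `step` targets, since they come from padding.
```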

Finally, we check the result in a plot. The vertical line marks the split point between the training and test parts.

index = df.index.values
plt.plot(index,df)
plt.plot(index,predicted)
plt.axvline(df.index[Tp], c="r")
plt.show()

In this post, we've learned how to fit and predict time series data with a Keras LSTM model. The full source code is listed below.

Source code listing

import random
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout

random.seed(123)
# create dataset
def CreateTSData(N):
    # linear trend + sine wave + uniform noise in [-1, 1]
    values = [i/100 + math.sin(2*i) + random.uniform(-1, 1) for i in range(N)]
    # building the frame in one go keeps the column as float64
    return pd.DataFrame({'value': values})

# convert into dataset matrix
def convertToMatrix(data, step):
    X, Y = [], []
    for i in range(len(data)-step):
        d = i+step
        X.append(data[i:d,])   # window of 'step' consecutive values
        Y.append(data[d,])     # the value that follows the window
    return np.array(X), np.array(Y)

step = 3
N = 240    # total number of rows
Tp = 200     # training part 
df = CreateTSData(N)
# pd.DatetimeIndex(start=..., periods=...) is no longer supported in pandas 1.0+; use date_range
df.index = pd.date_range(start='2000-01-01', periods=N, freq='D')
df.head()

values = df.values
train, test = values[0:Tp,:], values[Tp:N,:]

# add step elements into train and test
test = np.append(test,np.repeat(test[-1,],step))
train = np.append(train,np.repeat(train[-1,],step))
 
trainX,trainY =convertToMatrix(train,step)
testX,testY =convertToMatrix(test,step)
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# Keras LSTM model 
model = Sequential()
model.add(LSTM(units=32, input_shape=(1,step), activation="relu"))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.summary()

model.fit(trainX,trainY, epochs=100, batch_size=32, verbose=2)
trainPredict = model.predict(trainX)
testPredict= model.predict(testX)
predicted=np.concatenate((trainPredict,testPredict),axis=0)

index = df.index.values
plt.plot(index,df)
plt.plot(index,predicted)
plt.axvline(df.index[Tp], c="r")
plt.show()
