DataTechNotes: Regression Example with Keras LSTM Networks in R

The LSTM (Long Short-Term Memory) network is a type of Recurrent Neural Network (RNN). An RNN processes sequential data: it iterates over the elements of a sequence and maintains a state that summarizes the elements seen so far. Based on that state, it predicts the next item in the sequence.

An LSTM network adds memory units (cells) that carry information across time steps. Each memory unit contains gates that control the flow of that information, and the weights learned during training determine how much of it passes through. The forget gate discards state that is no longer useful, the input gate controls how the state is updated with new input, and the output gate controls what the unit emits. In this post, we'll learn how to fit and predict regression data with a Keras LSTM model in R.
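To make the gate description more concrete, here is a minimal sketch of a single LSTM step in base R. It follows the standard textbook formulation (sigmoid gates, tanh candidate) and is not how Keras implements the layer internally. The function `lstm_step`, the weight lists `W`, `U`, `b`, and the toy sizes are all hypothetical names chosen for illustration. Note that the model built later in this post passes activation="relu", which replaces the tanh used here.

```r
# Illustrative single-step LSTM cell in base R (a sketch, not the keras implementation).
sigmoid <- function(z) 1 / (1 + exp(-z))

# x_t: input vector, h_prev: previous hidden state, c_prev: previous cell state.
# W, U, b are hypothetical named lists with one entry per gate: f, i, o, g.
lstm_step <- function(x_t, h_prev, c_prev, W, U, b) {
  gate <- function(k) as.vector(W[[k]] %*% x_t + U[[k]] %*% h_prev + b[[k]])
  f <- sigmoid(gate("f"))   # forget gate: how much of the old cell state to keep
  i <- sigmoid(gate("i"))   # input gate: how much of the candidate to write
  o <- sigmoid(gate("o"))   # output gate: how much of the cell state to expose
  g <- tanh(gate("g"))      # candidate values computed from the current input
  c_new <- f * c_prev + i * g
  h_new <- o * tanh(c_new)
  list(h = h_new, c = c_new)
}

# Random toy weights: 3 units, 1 input feature (assumed sizes for the example).
set.seed(1)
units <- 3
n_in <- 1
gates <- c("f", "i", "o", "g")
W <- setNames(lapply(gates, function(k) matrix(rnorm(units * n_in), units, n_in)), gates)
U <- setNames(lapply(gates, function(k) matrix(rnorm(units * units), units, units)), gates)
b <- setNames(lapply(gates, function(k) rep(0, units)), gates)

# Run a two-element window through the cell, carrying the state forward.
state <- list(h = rep(0, units), c = rep(0, units))
for (x_t in c(3.7, 7.3)) {
  state <- lstm_step(x_t, state$h, state$c, W, U, b)
}
print(state$h)
```

The hidden state returned after the last element of the window is what a layer like layer_lstm passes on to the next layer when it is not asked to return the full sequence.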

This tutorial covers:

Generating sample dataset
Reshaping input data
Building Keras LSTM model
Predicting and plotting the result

We'll start by loading the 'keras' library for R.

library(keras)

Generating sample dataset

We need regression data, so we'll create a simple vector to serve as the target dataset for this tutorial.

N = 400
set.seed(123)
n = seq_len(N)
# linear trend + sine wave + uniform integer noise + Gaussian noise
a = n/10+4*sin(n/10)+sample(-1:6,N,replace=TRUE)+rnorm(N)

head(a,20)
 [1]  3.698144  7.307090  3.216936  8.500867  8.003362  1.382323  5.488268
 [8]  9.074807  8.684215  6.311856 10.784075  7.171844 10.386709  7.825735
[15]  3.497473 13.273991  5.225496  3.972325  5.448927 10.352474

Reshaping input data

Next, we'll create the 'x' and 'y' training sequences using a sliding window whose size is the 'step' value. Each window of 'step' consecutive elements becomes one x row, and the element that immediately follows the window becomes its y value. The window then shifts forward by one element and the process repeats.

step = 2   # window size

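So that each of the N elements can start a window, we'll pad the end of the 'a' vector with 'step' copies of its last element. As a result, the last 'step' targets are copies of that final value rather than new observations.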

a = c(a, replicate(step, tail(a, 1)))

Now we'll create the x (input) and y (output) data.

x = NULL
y = NULL
for(i in 1:N)
{
  s = i-1+step
  x = rbind(x,a[i:s])
  y = rbind(y,a[s+1])
}

cbind(head(x), head(y))
         [,1]     [,2]     [,3]
[1,] 3.698144 7.307090 3.216936
[2,] 7.307090 3.216936 8.500867
[3,] 3.216936 8.500867 8.003362
[4,] 8.500867 8.003362 1.382323
[5,] 8.003362 1.382323 5.488268
[6,] 1.382323 5.488268 9.074807
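The loop above grows x and y one row at a time. As an alternative sketch, base R's embed() can build the same windows in a single call. The names `w`, `x_vec`, and `y_vec` are introduced here only for illustration and are not part of the original code.

```r
# Rebuild the tutorial's padded series.
N <- 400
step <- 2
set.seed(123)
n <- seq_len(N)
a <- n / 10 + 4 * sin(n / 10) + sample(-1:6, N, replace = TRUE) + rnorm(N)
a <- c(a, rep(tail(a, 1), step))

# embed() returns each window in reverse time order, so flip the columns.
w <- embed(a, step + 1)[, (step + 1):1, drop = FALSE]
x_vec <- w[, 1:step, drop = FALSE]     # inputs: first `step` values of each window
y_vec <- w[, step + 1, drop = FALSE]   # target: the value right after the window

# Loop version from the tutorial, for comparison.
x <- NULL
y <- NULL
for (i in 1:N) {
  s <- i - 1 + step
  x <- rbind(x, a[i:s])
  y <- rbind(y, a[s + 1])
}

# Both approaches should produce the same matrices.
print(all.equal(x, x_vec))
print(all.equal(y, y_vec))
```

For N = 400 the difference in speed is negligible. The vectorized form mainly avoids repeated rbind() calls, which copy the growing matrix on every iteration.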

The LSTM layer expects a three-dimensional input array of shape (samples, timesteps, features), so we'll reshape x accordingly.

X = array(x, dim=c(N, step,1))
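A quick way to convince yourself that the reshape keeps every value in place is to try it on a tiny matrix. The sketch below uses a made-up 4 x 2 stand-in for x, not the tutorial's data.

```r
# Small stand-in for the tutorial's x matrix: 4 samples, window size 2.
x <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5))
N <- nrow(x)
step <- ncol(x)

# Shape expected by an LSTM layer: (samples, timesteps, features).
X <- array(x, dim = c(N, step, 1))
print(dim(X))

# array() fills in column-major order, the same order the matrix is stored in,
# so sample i at timestep j is still x[i, j].
print(all(X[, , 1] == x))
print(X[3, 2, 1] == x[3, 2])
```

With more than one feature per time step, the third dimension would be larger than 1 and the values would need to be arranged so that X[i, j, k] holds feature k of sample i at timestep j.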

Building Keras LSTM model

Next, we'll create a Keras sequential model, add an LSTM layer followed by dense layers, and compile it with the MSE loss and a mean absolute error metric.

model = keras_model_sequential() %>%   
   layer_lstm(units=128, input_shape=c(step, 1), activation="relu") %>%  
   layer_dense(units=64, activation = "relu") %>%  
   layer_dense(units=32) %>%  
   layer_dense(units=1, activation = "linear")
 
model %>% compile(loss = 'mse',
                  optimizer = 'adam',
                  metrics = list("mean_absolute_error")
                   )
 
model %>% summary()
____________________________________________________________________________
Layer (type)                      Output Shape                  Param #     
============================================================================
lstm_16 (LSTM)                    (None, 128)                   66560       
____________________________________________________________________________
dense_36 (Dense)                  (None, 64)                    8256        
____________________________________________________________________________
dense_37 (Dense)                  (None, 32)                    2080        
____________________________________________________________________________
dense_38 (Dense)                  (None, 1)                     33          
============================================================================
Total params: 76,929
Trainable params: 76,929
Non-trainable params: 0
_______________________________________________________________________
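The parameter counts in the summary can be reproduced by hand. The sketch below uses the usual formulas: an LSTM layer has four weight sets (three gates plus the candidate), each with input weights, recurrent weights, and a bias, while a dense layer has one weight per input-output pair plus a bias per output. The helper names `lstm_params` and `dense_params` are made up for this example.

```r
# Parameter-count formulas (helper names are illustrative, not part of keras).
lstm_params  <- function(units, n_features) 4 * (units * (units + n_features) + units)
dense_params <- function(n_in, n_out) n_in * n_out + n_out

counts <- c(
  lstm    = lstm_params(128, 1),     # 128 units, 1 feature per time step
  dense64 = dense_params(128, 64),
  dense32 = dense_params(64, 32),
  dense1  = dense_params(32, 1)
)
print(counts)
print(sum(counts))
```

The four values come out to 66560, 8256, 2080, and 33, which add up to the 76,929 total reported by summary(). Note that the window size does not appear in the formulas: the LSTM reuses the same weights at every time step.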

Predicting and plotting the result

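Next, we'll train the model with the X and y data, predict on the same X data, and check the errors. Note that these errors are measured on the training data, so they don't show how well the model generalizes.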

model %>% fit(X,y, epochs=50, batch_size=32, shuffle = FALSE)

y_pred = model %>% predict(X)
 
scores = model %>% evaluate(X, y, verbose = 0)
print(scores)
$loss
[1] 11.84502

$mean_absolute_error
[1] 2.810479
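The two scores are the mean squared error (the loss) and the mean absolute error. To put them in context, it can help to compare against a naive baseline. The sketch below defines both metrics in base R and applies them to a persistence forecast, which simply predicts that the next value equals the last value in the window. The names `mse`, `mae`, `last_x`, and `baseline` are introduced for illustration, and the baseline is an added comparison, not part of the original tutorial. The exact numbers depend on the R version's random number generator, so none are quoted here.

```r
mse <- function(actual, predicted) mean((actual - predicted)^2)
mae <- function(actual, predicted) mean(abs(actual - predicted))

# Rebuild the tutorial's padded series.
N <- 400
step <- 2
set.seed(123)
n <- seq_len(N)
a <- n / 10 + 4 * sin(n / 10) + sample(-1:6, N, replace = TRUE) + rnorm(N)
a <- c(a, rep(tail(a, 1), step))

y <- a[(step + 1):(N + step)]       # targets, same values as the tutorial's y
last_x <- a[step:(N + step - 1)]    # last element of each input window

# Persistence baseline: the next value is assumed equal to the last observed one.
baseline <- c(mse = mse(y, last_x), mae = mae(y, last_x))
print(baseline)
```

The same two functions can be applied to the model output, for example mse(y, y_pred), to check the values returned by evaluate().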

Finally, we'll plot the results.

x_axes = seq(1:length(y_pred))
plot(x_axes, y, type="l", col="red", lwd=2)
lines(x_axes, y_pred, col="blue",lwd=2)
legend("topleft", legend=c("y-original", "y-predicted"),
        col=c("red", "blue"), lty=1,cex=0.8)

You may change the step size and observe the prediction results.

In this tutorial, we've briefly learned how to use Keras LSTM to predict regression data in R. The full source code is listed below.

library(keras)
 
N = 400
step = 2
set.seed(123)
n = seq_len(N)
a = n/10+4*sin(n/10)+sample(-1:6,N,replace=TRUE)+rnorm(N)
a = c(a,replicate(step,tail(a,1)))

x = NULL
y = NULL

for(i in 1:N)
{
  s = i-1+step
  x = rbind(x,a[i:s])
  y = rbind(y,a[s+1])
}

X = array(x, dim=c(N,step,1))
 
model = keras_model_sequential() %>% 
  layer_lstm(units=128, input_shape=c(step, 1), activation="relu") %>%  
  layer_dense(units=64, activation = "relu") %>% 
  layer_dense(units=32) %>% 
  layer_dense(units=1, activation = "linear")

model %>% compile(loss = 'mse',
                  optimizer = 'adam',
                  metrics = list("mean_absolute_error")
                   )
 
model %>% summary()

model %>% fit(X,y, epochs=50, batch_size=32, shuffle = FALSE, verbose=0)
y_pred  =  model %>% predict(X)
 
scores  =  model %>% evaluate(X, y, verbose = 0)
print(scores)

x_axes = seq(1:length(y_pred))
plot(x_axes, y, type="l", col="red", lwd=2)
lines(x_axes, y_pred, col="blue",lwd=2)
legend("topleft", legend=c("y-original", "y-predicted"),
        col=c("red", "blue"), lty=1,cex=0.8)

10 comments:

Anonymous, June 24, 2019 at 1:10 PM
This model only looks good because it probably overfits the data. You did not include any test/validation data to see if the model generalizes out of the training sample. Additionally, with only 400 data points but almost 80,000 learnable parameters, the memory capacity of the net is likely too large for this task. This means that the net was probably able to memorize the test data's specific input-output mappings, and will thus lack predictive power.
Anonymous, July 9, 2019 at 2:18 PM
Hello, excellent post. I'm working on a project using this algorithm and I have one question: if I have more predictors, should I fit the model with ###fit(x1+x2,y,....) and make the predictions with ###predict(x1+x2)? Or am I wrong?
Thanks for your help. Great post.
Anonymous, July 10, 2019 at 7:44 AM
How do I tune this model?
Unknown, November 5, 2019 at 8:45 PM
Hi, I tried this method on time series data with monthly values from the last 4 years. The predicted values are vague and I'm not sure what I did wrong. I also tried changing the step size, but that didn't work either. Can you please help me out with it?
