Regression Example with Keras LSTM Networks in R


   The LSTM (Long Short-Term Memory) network is a type of Recurrent Neural Network (RNN). An RNN processes sequential data: it iterates over the elements of a sequence and maintains a state that summarizes the elements seen so far. Based on that state, it predicts the next item in the sequence.
   An LSTM network adds memory cells to the RNN so that information can be retained across many time steps. Each cell uses gates to control the flow of information, and the weights learned during training determine how much each gate lets through. The forget gate discards the parts of the cell state that are no longer useful, the input gate controls how new information updates the state, and the output gate controls which part of the state is emitted as output. In this post, we'll learn how to fit and predict regression data with a Keras LSTM model in R.
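To make the role of the gates concrete, here is a minimal base-R sketch of one time step of a single LSTM unit with a scalar input. It follows the standard textbook formulation with tanh activations (the model built later in this post passes activation="relu" instead). The function name lstm_step and all weight values are made up for illustration; Keras learns its own weights during training.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# One time step of a single LSTM unit (illustrative, not the Keras internals).
# w, u, b are lists of input weights, recurrent weights, and biases with
# entries f (forget gate), i (input gate), g (candidate), o (output gate).
lstm_step <- function(x_t, h_prev, c_prev, w, u, b) {
  f <- sigmoid(w$f * x_t + u$f * h_prev + b$f)  # how much old state to keep
  i <- sigmoid(w$i * x_t + u$i * h_prev + b$i)  # how much new input to admit
  g <- tanh(w$g * x_t + u$g * h_prev + b$g)     # candidate state update
  o <- sigmoid(w$o * x_t + u$o * h_prev + b$o)  # how much state to output
  c_new <- f * c_prev + i * g                   # updated cell state
  h_new <- o * tanh(c_new)                      # updated hidden state (output)
  list(h = h_new, c = c_new)
}

# Hypothetical weights, chosen only so the example runs.
w <- list(f = 0.5, i = 0.5, g = 0.5, o = 0.5)
u <- list(f = 0.1, i = 0.1, g = 0.1, o = 0.1)
b <- list(f = 0, i = 0, g = 0, o = 0)

# Iterate over a short sequence, carrying the state from step to step.
state <- list(h = 0, c = 0)
for (x_t in c(1, 2, 3)) {
  state <- lstm_step(x_t, state$h, state$c, w, u, b)
}
print(state)
```

The loop shows the recurrent part: the hidden state h and cell state c produced at one step are fed back in at the next.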
   This tutorial covers:
  •     Generating sample dataset
  •     Reshaping input data
  •     Building Keras LSTM model
  •     Predicting and plotting the result
   We'll start by loading the 'keras' library for R.

library(keras)

Generating sample dataset

   We'll create a sample dataset for this tutorial. Here, we'll create the vector 'a' as regression data: a linear trend plus a sine wave, with random integer noise and Gaussian noise added.

N = 400
set.seed(123)
n = 1:N
a = n/10+4*sin(n/10)+sample(-1:6,N,replace=TRUE)+rnorm(N)
 
head(a,20)
 [1]  3.698144  7.307090  3.216936  8.500867  8.003362  1.382323  5.488268
 [8]  9.074807  8.684215  6.311856 10.784075  7.171844 10.386709  7.825735
[15]  3.497473 13.273991  5.225496  3.972325  5.448927 10.352474 

Reshaping input data

   Next, we'll create the 'x' and 'y' training sequences using a sliding window whose size is given by 'step'. Each window of 'step' consecutive elements forms one x sample, and the element that immediately follows the window is its y value. The window then shifts forward by one element and the process repeats.

step = 2   # window size

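So that every element of the original vector can start a window, we'll pad the end of 'a' with 'step' copies of its last element. Note that the targets of the last 'step' windows are therefore copies of the last observed value rather than new observations.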

a = c(a, replicate(step, tail(a, 1)))

Now we create the input data 'x' and the output data 'y'.

x = NULL
y = NULL
for(i in 1:N)
{
  s = i-1+step
  x = rbind(x,a[i:s])
  y = rbind(y,a[s+1])
}
 
cbind(head(x), head(y))
         [,1]     [,2]     [,3]
[1,] 3.698144 7.307090 3.216936
[2,] 7.307090 3.216936 8.500867
[3,] 3.216936 8.500867 8.003362
[4,] 8.500867 8.003362 1.382323
[5,] 8.003362 1.382323 5.488268
[6,] 1.382323 5.488268 9.074807
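The same windows can also be built without a loop. The following is a minimal sketch using base R's embed(); the helper name make_windows and the toy vector are made up for illustration and are not part of the original tutorial.

```r
# Build sliding windows of length 'step' (x) and the element that follows
# each window (y). Unlike the loop above, this does not pad the series.
make_windows <- function(series, step) {
  stopifnot(length(series) > step)
  m <- embed(series, step + 1)            # each row: newest value first
  m <- m[, (step + 1):1, drop = FALSE]    # reverse columns: oldest value first
  list(x = m[, 1:step, drop = FALSE],
       y = m[, step + 1, drop = FALSE])
}

toy <- c(10, 20, 30, 40, 50)
w <- make_windows(toy, step = 2)
cbind(w$x, w$y)
#      [,1] [,2] [,3]
# [1,]   10   20   30
# [2,]   20   30   40
# [3,]   30   40   50
```

Because nothing is padded, a series of length n yields n - step windows, and every target is a real observation.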

The LSTM layer expects a 3-dimensional input array with dimensions (samples, timesteps, features), so we'll reshape 'x' accordingly.

X = array(x, dim=c(N, step,1))
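As a quick sanity check of what this reshape does, here is a small sketch on a toy matrix (the names x_toy and X_toy are made up for illustration). Since there is only one feature, the reshape just adds a trailing dimension of size 1 and keeps each row as one sample.

```r
# 3 samples, 2 timesteps each
x_toy <- rbind(c(1, 2),
               c(2, 3),
               c(3, 4))

# Add a trailing "features" dimension of size 1
X_toy <- array(x_toy, dim = c(nrow(x_toy), ncol(x_toy), 1))

dim(X_toy)        # 3 2 1  (samples, timesteps, features)
X_toy[2, , 1]     # 2 3    (timesteps of the second sample)
```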

Building Keras LSTM model

Next, we'll create a Keras sequential model, add an LSTM layer followed by dense layers, and compile it with the MSE loss, the Adam optimizer, and mean absolute error as a metric.

model = keras_model_sequential() %>%   
   layer_lstm(units=128, input_shape=c(step, 1), activation="relu") %>%  
   layer_dense(units=64, activation = "relu") %>%  
   layer_dense(units=32) %>%  
   layer_dense(units=1, activation = "linear")
 
model %>% compile(loss = 'mse',
                  optimizer = 'adam',
                  metrics = list("mean_absolute_error")
                   )
 
model %>% summary()
____________________________________________________________________________
Layer (type)                      Output Shape                  Param #     
============================================================================
lstm_16 (LSTM)                    (None, 128)                   66560       
____________________________________________________________________________
dense_36 (Dense)                  (None, 64)                    8256        
____________________________________________________________________________
dense_37 (Dense)                  (None, 32)                    2080        
____________________________________________________________________________
dense_38 (Dense)                  (None, 1)                     33          
============================================================================
Total params: 76,929
Trainable params: 76,929
Non-trainable params: 0
____________________________________________________________________________
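The parameter counts in the summary can be reproduced by hand. An LSTM layer has four weight sets (three gates plus the candidate state), each with input weights, recurrent weights, and a bias, while a dense layer has one weight per input plus a bias for each unit. The helper names below are illustrative and not part of the keras package.

```r
# Parameters of an LSTM layer: 4 * (units * (input_dim + units) + units)
lstm_params <- function(units, input_dim) {
  4 * (units * (input_dim + units) + units)
}

# Parameters of a dense layer: units * (input_dim + 1)
dense_params <- function(units, input_dim) {
  units * (input_dim + 1)
}

counts <- c(lstm    = lstm_params(128, 1),
            dense_1 = dense_params(64, 128),
            dense_2 = dense_params(32, 64),
            dense_3 = dense_params(1, 32))
print(counts)
#    lstm dense_1 dense_2 dense_3
#   66560    8256    2080      33
print(sum(counts))
# [1] 76929
```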

Predicting and plotting the result

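Next, we'll train the model on the X and y data, predict on the same X data, and check the errors. Because the model is evaluated on the data it was trained on, these scores measure the fit to the training data, not performance on unseen data.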

model %>% fit(X,y, epochs=50, batch_size=32, shuffle = FALSE)
y_pred = model %>% predict(X)
 
scores = model %>% evaluate(X, y, verbose = 0)
print(scores)
$loss
[1] 11.84502

$mean_absolute_error
[1] 2.810479
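The two scores can be reproduced by hand, which makes clear what they measure. Here is a small sketch with made-up numbers (the vectors below are not from the tutorial's data, and the helper names mse and mae are illustrative):

```r
# Mean squared error: the 'mse' loss
mse <- function(actual, predicted) mean((actual - predicted)^2)

# Mean absolute error: the 'mean_absolute_error' metric
mae <- function(actual, predicted) mean(abs(actual - predicted))

actual    <- c(3, 5, 7, 9)
predicted <- c(2, 5, 9, 9)

mse(actual, predicted)   # 1.25
mae(actual, predicted)   # 0.75
```

Since the loss is a squared error, its square root (about 3.44 for the loss of 11.84502 reported above) is on the same scale as y.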

Finally, we'll plot the results.

x_axes = seq_along(y_pred)
plot(x_axes, y, type="l", col="red", lwd=2)
lines(x_axes, y_pred, col="blue",lwd=2)
legend("topleft", legend=c("y-original", "y-predicted"),
        col=c("red", "blue"), lty=1,cex=0.8) 


   You can change the step size and compare the prediction results.
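When comparing results, it helps to score the model on data it has not seen during training. The following is a minimal sketch of a chronological train/test split in base R. The helper name split_chronological, the 80/20 ratio, and the toy arrays are assumptions for illustration and are not part of the original tutorial.

```r
# Split a (samples, timesteps, features) array and its targets in time order:
# the first part is used for training, the rest is held out for testing.
split_chronological <- function(X, y, train_frac = 0.8) {
  n <- dim(X)[1]
  n_train <- floor(n * train_frac)
  stopifnot(n_train >= 1, n_train < n)
  train_idx <- seq_len(n_train)
  test_idx  <- seq(n_train + 1, n)
  list(X_train = X[train_idx, , , drop = FALSE],
       y_train = y[train_idx, , drop = FALSE],
       X_test  = X[test_idx, , , drop = FALSE],
       y_test  = y[test_idx, , drop = FALSE])
}

# Toy data: 10 samples, 2 timesteps, 1 feature
X_toy <- array(1:20, dim = c(10, 2, 1))
y_toy <- matrix(21:30, ncol = 1)

parts <- split_chronological(X_toy, y_toy, train_frac = 0.8)
dim(parts$X_train)   # 8 2 1
dim(parts$X_test)    # 2 2 1

# With the tutorial's X and y, the parts could then be used like this:
# model %>% fit(parts$X_train, parts$y_train, epochs=50, batch_size=32, shuffle = FALSE)
# model %>% evaluate(parts$X_test, parts$y_test, verbose = 0)
```

The split keeps the time order instead of sampling rows at random, so the held-out windows come after the training windows in the series.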
   In this tutorial, we've briefly learned how to use a Keras LSTM model to fit and predict regression data in R. Thank you for reading!
The full source code is listed below.

library(keras)
 
N = 400
step = 2
set.seed(123)
n = 1:N
a = n/10+4*sin(n/10)+sample(-1:6,N,replace=TRUE)+rnorm(N)
a = c(a,replicate(step,tail(a,1)))

x = NULL
y = NULL

for(i in 1:N)
{
  s = i-1+step
  x = rbind(x,a[i:s])
  y = rbind(y,a[s+1])
}

X = array(x, dim=c(N,step,1))
 
model = keras_model_sequential() %>% 
  layer_lstm(units=128, input_shape=c(step, 1), activation="relu") %>%  
  layer_dense(units=64, activation = "relu") %>% 
  layer_dense(units=32) %>% 
  layer_dense(units=1, activation = "linear")

model %>% compile(loss = 'mse',
                  optimizer = 'adam',
                  metrics = list("mean_absolute_error")
                   )
 
model %>% summary()

model %>% fit(X,y, epochs=50, batch_size=32, shuffle = FALSE, verbose=0)
y_pred  =  model %>% predict(X)
 
scores  =  model %>% evaluate(X, y, verbose = 0)
print(scores)

x_axes = seq_along(y_pred)
plot(x_axes, y, type="l", col="red", lwd=2)
lines(x_axes, y_pred, col="blue",lwd=2)
legend("topleft", legend=c("y-original", "y-predicted"),
        col=c("red", "blue"), lty=1,cex=0.8)

9 comments:
  1. This model only looks good because it probably overfits the data. You did not include any test/validation data to see if the model generalizes out of the training sample. Additionally, with only 400 data points but almost 80,000 learnable parameters, the memory capacity of the net is likely too large for this task. This means that the net was probably able to memorize the training data's specific input-output mappings, and will thus lack predictive power.

    Replies
    1. Good point! But here I did not intend to build a perfect predictive model. The purpose of this post is to show a simple, workable example with random data for beginners. Readers should consider every aspect of model building when they work with real problems.

  2. Hello, excellent post. I'm working on a project using this algorithm and I have one question: if I have more predictors, should I fit the model with ###fit(x1+x2,y,....) and predict with ###predict(x1+x2)? Or am I wrong?
    Thanks for your help. Great post.

    Replies
    1. You are welcome! You need to create a combined X array (containing all features x1, x2, ...) for both training and prediction. It goes like this:
      x1, x2, y
      2, 3, 3
      3, 4, 4
      2, 4, => 4
      3, 5, => 5
      4, 6, => 6

      Here, each window contains 3 elements of both the x1 and x2 series.
      2, 3,
      3, 4,
      2, 4, =>4

      3, 4,
      2, 4,
      3, 5, => 5

      2, 4,
      3, 5,
      4, 6, => 6
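The windows listed above can be assembled into a 3-dimensional array as follows. This is a minimal base-R sketch using the numbers from the reply; the variable names (features, y_all, n_windows) are made up for illustration.

```r
# The two predictor series and the target from the example above
x1    <- c(2, 3, 2, 3, 4)
x2    <- c(3, 4, 4, 5, 6)
y_all <- c(3, 4, 4, 5, 6)

features  <- cbind(x1, x2)
step      <- 3                              # window size
n_windows <- nrow(features) - step + 1      # 3 windows

# X has dimensions (samples, timesteps, features)
X <- array(0, dim = c(n_windows, step, ncol(features)))
y <- matrix(0, nrow = n_windows, ncol = 1)

for (i in seq_len(n_windows)) {
  rows <- i:(i + step - 1)
  X[i, , ] <- features[rows, ]    # one window of both series
  y[i, 1]  <- y_all[i + step - 1] # target in the window's last row
}

dim(X)       # 3 3 2
X[1, , ]     # first window: x1 = 2, 3, 2 and x2 = 3, 4, 4
y[, 1]       # 4 5 6
```

With more than one feature, the input_shape of the LSTM layer would become c(step, ncol(features)).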

    2. Thanks, I made an X array with all the predictors and it works. I got an MSE of 15.9 (nice) with the default parameters, then I tuned the epochs parameter in the fit and got a better prediction. I've been tuning epochs and batch_size, but I don't know very well how I should change the sequential Keras model (dense layers and units). I have 37 observations and 19 predictors. Can you give me advice on this tuning? Thanks for your time and post. My model's predictions are great; in fact I could stop now with my results, but I want to improve and learn more about this model.

    3. Good! Your data is too small to evaluate your model and improve its performance. To check for improvement in your model:
      1) Use bigger data,
      2) Change the units number,
      3) Add dense layer,
      4) Add dropout layer, layer_dropout()
      5) Change optimizer (rmsprop etc.)

    4. Hi, I followed your advice and my model has improved, thanks. But (another doubt) the number of samples needed to reach a new period varies: at the beginning of the sample the period changes every 4 samples, and at the end it changes every 5. How should I attack this problem? With 2 models? How can they be ensembled?

      Thanks for your time, really.

  3. How do I tune this model?

  4. Hi, I tried this method on time series data with monthly values from the last 4 years. The predicted values are vague and I'm not sure what I did wrong. I also tried changing the step size, but that did not work either. Can you please help me out with it?
