
### Regression Example with Keras LSTM Networks in R

The LSTM (Long Short-Term Memory) network is a type of Recurrent Neural Network (RNN). An RNN processes sequential data: it iterates over the elements of a sequence and maintains a state that summarizes the elements seen so far. Based on that state, it predicts the next item in the sequence.

An LSTM network uses memory units (cells) to retain information across time steps. Each memory unit contains gates that control the flow of information, and the weights learned during training determine how important each piece of information is. The forget gate discards state that is no longer useful, the input gate controls how the state is updated, and the output gate controls what is passed on as output. In this post, we'll learn how to fit and predict regression data with a Keras LSTM model in R.
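
The gate mechanics described above can be sketched in a few lines of base R. This is only an illustration of the standard textbook LSTM update, not Keras's internal implementation; the function name `lstm_step`, the weight names `W`, `U`, `b`, and the ordering of the gates inside the stacked weights are assumptions made for this example.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# One LSTM time step (hypothetical helper, for illustration only).
# W: (4h x d) input weights, U: (4h x h) recurrent weights, b: length-4h bias.
# Rows are assumed to be stacked as: input gate, forget gate, candidate, output gate.
lstm_step <- function(x_t, h_prev, c_prev, W, U, b) {
  h <- length(h_prev)
  z <- W %*% x_t + U %*% h_prev + b
  i_gate <- sigmoid(z[1:h])                    # how much new information to write
  f_gate <- sigmoid(z[(h + 1):(2 * h)])        # how much old state to keep
  g_cand <- tanh(z[(2 * h + 1):(3 * h)])       # candidate values for the state
  o_gate <- sigmoid(z[(3 * h + 1):(4 * h)])    # how much state to expose
  c_t <- f_gate * c_prev + i_gate * g_cand     # updated cell state
  h_t <- o_gate * tanh(c_t)                    # output / hidden state
  list(h = h_t, c = c_t)
}

# Usage: run a short sequence through a cell with 2 hidden units
set.seed(1)
h_units <- 2
W <- matrix(rnorm(4 * h_units, sd = 0.1), 4 * h_units, 1)
U <- matrix(rnorm(4 * h_units * h_units, sd = 0.1), 4 * h_units, h_units)
b <- rep(0, 4 * h_units)

state <- list(h = rep(0, h_units), c = rep(0, h_units))
for (x_t in c(0.5, -0.2, 0.1)) {
  state <- lstm_step(x_t, state$h, state$c, W, U, b)
}
state$h
```

With all weights set to zero, every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved at each step, which is a quick way to sanity-check the update.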

This tutorial covers:
•     Generating sample dataset
•     Reshaping input data
•     Building Keras LSTM model
•     Predicting and plotting the result

`library(keras)`

Generating sample dataset

We need regression data, so we'll create a simple vector as the target dataset for this tutorial.

```
N = 400
set.seed(123)
n = seq(1:N)
# linear trend + sine wave + uniform integer noise + Gaussian noise
a = n/10+4*sin(n/10)+sample(-1:6,N,replace=T)+rnorm(N)
```
```
head(a,20)
  3.698144  7.307090  3.216936  8.500867  8.003362  1.382323  5.488268
  9.074807  8.684215  6.311856 10.784075  7.171844 10.386709  7.825735
  3.497473 13.273991  5.225496  3.972325  5.448927 10.352474
```

Reshaping input data

Next, we'll create the 'x' and 'y' training sequences using a sliding window whose size is given by 'step'. Each window of 'step' consecutive elements forms one x row, and the element that immediately follows the window is its y value. The window then shifts forward by one element and the process repeats.

` step = 2   # step is a window size`

So that every element of the vector can start a window, we'll pad the end of 'a' by repeating its last element 'step' times.

`a = c(a, replicate(step, tail(a, 1)))`

Creating x - input, and y - output data.

```
x = NULL
y = NULL
for(i in 1:N)
{
  s = i-1+step
  x = rbind(x,a[i:s])
  y = rbind(y,a[s+1])
}
```
```
cbind(head(x), head(y))
         [,1]     [,2]     [,3]
[1,] 3.698144 7.307090 3.216936
[2,] 7.307090 3.216936 8.500867
[3,] 3.216936 8.500867 8.003362
[4,] 8.500867 8.003362 1.382323
[5,] 8.003362 1.382323 5.488268
[6,] 1.382323 5.488268 9.074807
```
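
The window method is easier to follow on a tiny vector. The sketch below uses made-up values (`a_toy` is hypothetical) and, unlike the loop above, does not pad the end of the series, so it produces only the complete window/target pairs.

```r
# Toy series so the windows are easy to read (values are illustrative only)
a_toy <- c(10, 20, 30, 40, 50)
step  <- 2
n_win <- length(a_toy) - step   # number of complete (window, target) pairs

# Each row of x_toy is one window; y_toy holds the element that follows it
x_toy <- t(sapply(seq_len(n_win), function(i) a_toy[i:(i + step - 1)]))
y_toy <- a_toy[(step + 1):length(a_toy)]

cbind(x_toy, y_toy)
```

Here the windows are (10, 20), (20, 30), and (30, 40), with targets 30, 40, and 50.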

The LSTM layer expects a three-dimensional input array of shape (samples, time steps, features), so we'll reshape x accordingly.

`X = array(x, dim=c(N, step,1))`
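
As a quick check of what this reshape does, here is a small sketch with a hypothetical 3 x 2 matrix (`x_small` is not part of the tutorial data). Because R stores both matrices and arrays in column-major order, adding a trailing dimension of size 1 leaves every value in place.

```r
x_small <- matrix(1:6, nrow = 3, ncol = 2)    # 3 samples, 2 time steps
X_small <- array(x_small, dim = c(3, 2, 1))   # samples x time steps x features

dim(X_small)                      # 3 2 1
all(X_small[, , 1] == x_small)    # TRUE: the single feature slice equals the original matrix
```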

Building Keras LSTM model

Next, we'll create a Keras sequential model, add an LSTM layer followed by dense layers, and compile it with a loss function and a metric.

```
model = keras_model_sequential() %>%
  layer_lstm(units=128, input_shape=c(step, 1), activation="relu") %>%
  layer_dense(units=64, activation = "relu") %>%
  layer_dense(units=32) %>%
  layer_dense(units=1, activation = "linear")

model %>% compile(loss = 'mse',
                  optimizer = 'adam',
                  metrics = list("mean_absolute_error")
)

model %>% summary()
____________________________________________________________________________
Layer (type)                      Output Shape                  Param #
============================================================================
lstm_16 (LSTM)                    (None, 128)                   66560
____________________________________________________________________________
dense_36 (Dense)                  (None, 64)                    8256
____________________________________________________________________________
dense_37 (Dense)                  (None, 32)                    2080
____________________________________________________________________________
dense_38 (Dense)                  (None, 1)                     33
============================================================================
Total params: 76,929
Trainable params: 76,929
Non-trainable params: 0
____________________________________________________________________________
```
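
The parameter counts in the summary can be reproduced by hand. The sketch below uses the usual formulas for an LSTM layer (four gate blocks, each with input weights, recurrent weights, and a bias) and for a dense layer (weights plus bias); the helper names `lstm_params` and `dense_params` are made up for this example.

```r
# Hypothetical helpers: parameter counts for LSTM and dense layers
lstm_params  <- function(units, input_dim) 4 * (units * (input_dim + units) + units)
dense_params <- function(units, input_dim) units * input_dim + units

counts <- c(
  lstm   = lstm_params(128, 1),     # 66560
  dense1 = dense_params(64, 128),   # 8256
  dense2 = dense_params(32, 64),    # 2080
  dense3 = dense_params(1, 32)      # 33
)
counts
sum(counts)                         # 76929
```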

Predicting and plotting the result

Next, we'll train the model on X and y, predict on the same X, and check the errors. Note that these are training errors, since the model is evaluated on the data it was fitted to.

```
model %>% fit(X,y, epochs=50, batch_size=32, shuffle = FALSE)
y_pred = model %>% predict(X)

scores = model %>% evaluate(X, y, verbose = 0)
print(scores)
$loss
 11.84502

$mean_absolute_error
 2.810479
```
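
For reference, the two reported numbers are the mean squared error (the loss) and the mean absolute error (the metric). A minimal sketch of both computations in base R, using made-up vectors `y_true` and `y_hat` rather than the model output:

```r
# Illustrative values only, not taken from the model above
y_true <- c(3, 5, 7, 9)
y_hat  <- c(2, 5, 9, 8)

err <- y_true - y_hat    # 1, 0, -2, 1
mse <- mean(err^2)       # (1 + 0 + 4 + 1) / 4 = 1.5
mae <- mean(abs(err))    # (1 + 0 + 2 + 1) / 4 = 1

c(mse = mse, mae = mae)
```

The same two lines applied to `y` and `y_pred` should give values close to those returned by `evaluate()`.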

Finally, we'll plot the results.

```
x_axes = seq(1:length(y_pred))
plot(x_axes, y, type="l", col="red", lwd=2)
lines(x_axes, y_pred, col="blue",lwd=2)
legend("topleft", legend=c("y-original", "y-predicted"),
       col=c("red", "blue"), lty=1,cex=0.8)
```

You may change the step size and observe how the prediction results change.

In this tutorial, we've briefly learned how to fit and predict regression data with a Keras LSTM model in R. The full source code is listed below.

```
library(keras)

N = 400
step = 2
set.seed(123)
n = seq(1:N)
a = n/10+4*sin(n/10)+sample(-1:6,N,replace=T)+rnorm(N)
a = c(a,replicate(step,tail(a,1)))

x = NULL
y = NULL
for(i in 1:N)
{
  s = i-1+step
  x = rbind(x,a[i:s])
  y = rbind(y,a[s+1])
}

X = array(x, dim=c(N,step,1))

model = keras_model_sequential() %>%
  layer_lstm(units=128, input_shape=c(step, 1), activation="relu") %>%
  layer_dense(units=64, activation = "relu") %>%
  layer_dense(units=32) %>%
  layer_dense(units=1, activation = "linear")

model %>% compile(loss = 'mse',
                  optimizer = 'adam',
                  metrics = list("mean_absolute_error")
)

model %>% summary()

model %>% fit(X,y, epochs=50, batch_size=32, shuffle = FALSE, verbose=0)
y_pred = model %>% predict(X)

scores = model %>% evaluate(X, y, verbose = 0)
print(scores)

x_axes = seq(1:length(y_pred))
plot(x_axes, y, type="l", col="red", lwd=2)
lines(x_axes, y_pred, col="blue",lwd=2)
legend("topleft", legend=c("y-original", "y-predicted"),
       col=c("red", "blue"), lty=1,cex=0.8)
```

1. This model only looks good because it probably overfits the data. You did not include any test/validation data to see whether the model generalizes beyond the training sample. Additionally, with only 400 data points but almost 80,000 learnable parameters, the capacity of the net is likely too large for this task. This means that the net was probably able to memorize the training data's specific input-output mappings, and will thus lack predictive power.

1. Good point! But here I did not intend to build a perfect predictive model. The purpose of this post is to show a simple, workable example with random data for beginners. Readers should consider every aspect of model building when they work on real problems.

2. Hi, you cannot use all of the data to train the net and then predict that same data; of course it does that almost perfectly. Your model should be tested on another data set, where it performs very badly.

2. Hello, excellent post. I'm working on a project using this algorithm and I have one question: if I have more predictors, should I call the model fit as ###fit(x1+x2,y,....) and the predictions as ###predict(x1+x2)? Or am I wrong?
Thanks for your help. Great post.

1. You are welcome! You need to create a combined X array (containing all features x1, x2, ...) for both training and prediction. It goes like this:
x1, x2, y
2, 3, 3
3, 4, 4
2, 4, => 4
3, 5, => 5
4, 6, => 6

Here, each window contains 3 elements of both x1 and x2 series.
2, 3,
3, 4,
2, 4, =>4

3, 4,
2, 4,
3, 5, => 5

2, 4,
3, 5,
4, 6, => 6
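
A minimal sketch of that layout in base R, using the toy values from the table above. The names `feats`, `win`, `X_multi`, and `y_multi` are chosen for this example, and the resulting array would be passed to a model whose `input_shape` is `c(win, 2)`.

```r
x1 <- c(2, 3, 2, 3, 4)
x2 <- c(3, 4, 4, 5, 6)
y_all <- c(3, 4, 4, 5, 6)

win   <- 3                         # rows per window
feats <- cbind(x1, x2)             # one column per feature
n_win <- nrow(feats) - win + 1     # 3 complete windows

# samples x time steps x features
X_multi <- array(0, dim = c(n_win, win, ncol(feats)))
for (i in seq_len(n_win)) {
  X_multi[i, , ] <- feats[i:(i + win - 1), ]
}

# the target is the y value on the last row of each window
y_multi <- y_all[win:length(y_all)]

dim(X_multi)    # 3 3 2
X_multi[1, , ]  # first window: rows (2,3), (3,4), (2,4)
y_multi         # 4 5 6
```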

2. Thanks, I made an X array with all the predictors and it works. I got an mse = 15.9 (nice) with the default parameters, then I tuned the epochs parameter in the fit and got a better prediction. I've been tuning epochs and batch_size, but I don't know how I should change the sequential Keras model (dense layers and units). I have 37 observations and 19 predictors. Can you give me advice on this tuning? Thanks for your time and post. My model's predictions are great; in fact I could stop now with my results, but I want to improve and learn more about this model.

3. Good! Your data set is too small to evaluate your model and improve its performance. To look for improvements in your model:
1) Use bigger data,
2) Change the units number,
5) Change optimizer (rmsprop etc.)

4. Hi, I followed your advice and my model has improved, thanks. But (another doubt) the number of samples needed to reach a new period is not constant: at the beginning of the sample the period changes every 4 samples, and at the end it changes every 5. How should I attack this problem? With 2 models? How can they be ensembled?
