The LSTM (Long Short-Term Memory) network is a type of recurrent neural network (RNN). An RNN processes sequential data: it iterates over the elements of a sequence, accumulating state information about the part it has observed so far, and uses that state to predict the next item in the sequence.
An LSTM network adds memory units to retain information across time steps. Each memory unit contains gates that control the flow of information, weighted by parameters learned during training. The forget gate discards state that is no longer useful, the input gate decides which new information updates the state, and the output gate controls what is emitted. In this post, we'll learn how to fit and predict regression data with a Keras LSTM model in R.
 Generating sample dataset
 Reshaping input data
 Building Keras LSTM model
 Predicting and plotting the result
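Before moving to Keras, the gate mechanics described above can be sketched in a few lines of base R. This is only an illustration with a single scalar unit and made-up weights (the names wf, uf, and so on are assumptions for the sketch, not Keras parameters):

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# One LSTM cell step with scalar state; weights are illustrative, not trained.
lstm_step <- function(x, h_prev, c_prev, w) {
  f <- sigmoid(w$wf * x + w$uf * h_prev + w$bf)  # forget gate
  i <- sigmoid(w$wi * x + w$ui * h_prev + w$bi)  # input gate
  g <- tanh(w$wg * x + w$ug * h_prev + w$bg)     # candidate state
  o <- sigmoid(w$wo * x + w$uo * h_prev + w$bo)  # output gate
  c_new <- f * c_prev + i * g                    # discard old, admit new
  h_new <- o * tanh(c_new)                       # expose a filtered view
  list(h = h_new, c = c_new)
}

w <- list(wf = 0.5, uf = 0.1, bf = 0, wi = 0.5, ui = 0.1, bi = 0,
          wg = 1.0, ug = 0.2, bg = 0, wo = 0.5, uo = 0.1, bo = 0)
s1 <- lstm_step(x = 1, h_prev = 0, c_prev = 0, w)
```

With zero initial state, the gates simply squash weighted inputs into (0, 1) and blend the candidate into the cell state; Keras does the same with vector-valued states and trained weights.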
library(keras)
Generating sample dataset
We need regression data for this tutorial, so we'll create a simple vector as the target dataset.
N = 400
set.seed(123)
n = 1:N
a = n/10 + 4*sin(n/10) + sample(1:6, N, replace = TRUE) + rnorm(N)
head(a, 20)
 [1]  3.698144  7.307090  3.216936  8.500867  8.003362  1.382323  5.488268
 [8]  9.074807  8.684215  6.311856 10.784075  7.171844 10.386709  7.825735
[15]  3.497473 13.273991  5.225496  3.972325  5.448927 10.352474
Reshaping input data
Next, we'll create the 'x' and 'y' training sequences. Here, we apply a sliding-window method with a window size of 'step': the target (y value) is the element that comes right after a window of 'step' elements (the x values); the window then shifts by one element, the next y value is collected, and so on.
step = 2 # step is a window size
To cover all elements of the vector, we'll pad the end of the 'a' vector with 'step' copies of its last element.
a = c(a, replicate(step, tail(a, 1)))
Creating the x (input) and y (output) data.
x = NULL
y = NULL
for (i in 1:N) {
  s = i - 1 + step
  x = rbind(x, a[i:s])
  y = rbind(y, a[s + 1])
}
cbind(head(x), head(y))
[,1] [,2] [,3]
[1,] 3.698144 7.307090 3.216936
[2,] 7.307090 3.216936 8.500867
[3,] 3.216936 8.500867 8.003362
[4,] 8.500867 8.003362 1.382323
[5,] 8.003362 1.382323 5.488268
[6,] 1.382323 5.488268 9.074807
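As a side note, the same windows can be built without the repeated rbind() calls (which grow the matrix on every iteration) using base R's embed(), which generates all sliding windows at once. embed() returns rows in reverse time order, so the columns are flipped back below; a toy vector stands in for the padded 'a':

```r
a_toy <- c(3, 7, 3, 8, 8, 1)   # stand-in for the padded 'a' vector
step <- 2
m <- embed(a_toy, step + 1)    # row i holds: a_toy[i+2], a_toy[i+1], a_toy[i]
x_win <- m[, (step + 1):2]     # the windows, restored to time order
y_win <- m[, 1]                # the element that follows each window
```

For real-sized data this vectorized form is noticeably faster than growing x and y inside a loop.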
Input data should be a 3D array of shape (samples, time steps, features), so we'll reshape it.
X = array(x, dim=c(N, step,1))
Building Keras LSTM model
Next, we'll create a Keras sequential model, add an LSTM layer, and compile it with the defined metrics.
model = keras_model_sequential() %>%
  layer_lstm(units = 128, input_shape = c(step, 1), activation = "relu") %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 32) %>%
  layer_dense(units = 1, activation = "linear")
model %>% compile(loss = "mse",
                  optimizer = "adam",
                  metrics = list("mean_absolute_error"))
model %>% summary()
____________________________________________________________________________
Layer (type) Output Shape Param #
============================================================================
lstm_16 (LSTM) (None, 128) 66560
____________________________________________________________________________
dense_36 (Dense) (None, 64) 8256
____________________________________________________________________________
dense_37 (Dense) (None, 32) 2080
____________________________________________________________________________
dense_38 (Dense) (None, 1) 33
============================================================================
Total params: 76,929
Trainable params: 76,929
Non-trainable params: 0
_______________________________________________________________________
Predicting and plotting the result
Next, we'll train the model with the X and y input data, predict on X, and check the errors.
model %>% fit(X,y, epochs=50, batch_size=32, shuffle = FALSE)
y_pred = model %>% predict(X)
scores = model %>% evaluate(X, y, verbose = 0)
print(scores)
$loss
[1] 11.84502
$mean_absolute_error
[1] 2.810479
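The two numbers reported by evaluate() are just the mean squared error (the compiled loss) and the mean absolute error, which you can also compute directly from the predictions. A quick check with toy vectors standing in for y and y_pred:

```r
y_true <- c(3.0, 8.5, 8.0)   # stand-in for y
y_hat  <- c(3.5, 8.0, 9.0)   # stand-in for y_pred
mse <- mean((y_true - y_hat)^2)   # matches the 'loss' entry
mae <- mean(abs(y_true - y_hat))  # matches 'mean_absolute_error'
```

Computing the metrics yourself this way is a handy sanity check that you are comparing predictions and targets of the same shape.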
Finally, we'll plot the results.
x_axes = seq_along(y_pred)
plot(x_axes, y, type = "l", col = "red", lwd = 2)
lines(x_axes, y_pred, col = "blue", lwd = 2)
legend("topleft", legend = c("y-original", "y-predicted"),
       col = c("red", "blue"), lty = 1, cex = 0.8)
You may change the step size and observe the prediction results.
In this tutorial, we've briefly learned how to use Keras LSTM to predict regression data in R. The full source code is listed below.
library(keras)
N = 400
step = 2
set.seed(123)
n = 1:N
a = n/10 + 4*sin(n/10) + sample(1:6, N, replace = TRUE) + rnorm(N)
a = c(a,replicate(step,tail(a,1)))
x = NULL
y = NULL
for (i in 1:N) {
  s = i - 1 + step
  x = rbind(x, a[i:s])
  y = rbind(y, a[s + 1])
}
X = array(x, dim=c(N,step,1))
model = keras_model_sequential() %>%
  layer_lstm(units = 128, input_shape = c(step, 1), activation = "relu") %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 32) %>%
  layer_dense(units = 1, activation = "linear")
model %>% compile(loss = "mse",
                  optimizer = "adam",
                  metrics = list("mean_absolute_error"))
model %>% summary()
model %>% fit(X,y, epochs=50, batch_size=32, shuffle = FALSE, verbose=0)
y_pred = model %>% predict(X)
scores = model %>% evaluate(X, y, verbose = 0)
print(scores)
x_axes = seq_along(y_pred)
plot(x_axes, y, type="l", col="red", lwd=2)
lines(x_axes, y_pred, col="blue",lwd=2)
legend("topleft", legend = c("y-original", "y-predicted"),
       col = c("red", "blue"), lty = 1, cex = 0.8)

This model only looks good because it probably overfits the data. You did not include any test/validation data to see whether the model generalizes outside the training sample. Additionally, with only 400 data points but almost 80,000 learnable parameters, the memory capacity of the net is likely too large for this task. This means the net was probably able to memorize the training data's specific input-output mappings, and will thus lack predictive power.
Good point! But here I did not intend to build a perfect predictive model. The purpose of this post is to show a simple, workable example with random data for beginners. Readers should consider every aspect of model building when they work with real problems.
Hello, excellent post. I'm on a project using this algorithm and I have one question: if I have more predictors, should the model fit be fit(x1+x2, y, ...) and the prediction predict(x1+x2)? Or am I wrong?
Thanks for your help. Great post.
You are welcome! You need to create a combined X array (containing all features x1, x2, ...) for your training and prediction. It goes like this:
x1, x2, y
2, 3, 3
3, 4, 4
2, 4, => 4
3, 5, => 5
4, 6, => 6
Here, each window contains 3 elements of both x1 and x2 series.
2, 3,
3, 4,
2, 4, =>4
3, 4,
2, 4,
3, 5, => 5
2, 4,
3, 5,
4, 6, => 6
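The reshaping described in this reply can be sketched in R as follows. The toy series match the small table above, and the resulting array has the (samples, time steps, features) shape that layer_lstm expects:

```r
x1 <- c(2, 3, 2, 3, 4)             # first predictor series
x2 <- c(3, 4, 4, 5, 6)             # second predictor series
step <- 3                          # each window holds 3 rows of (x1, x2)
n_samples <- length(x1) - step + 1 # 3 windows in total
X <- array(0, dim = c(n_samples, step, 2))
for (i in 1:n_samples) {
  X[i, , 1] <- x1[i:(i + step - 1)]  # feature 1 over the window
  X[i, , 2] <- x2[i:(i + step - 1)]  # feature 2 over the window
}
y_target <- c(4, 5, 6)             # the target that follows each window
```

With more predictors you simply grow the last dimension of the array, one slice per feature, and set input_shape = c(step, n_features) in the LSTM layer.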
Thanks, I made an X array with all the predictors and it works. I got an MSE of 15.9 (nice) with the default parameters, then I tuned the epochs parameter in fit() and got a better prediction. I've been tuning epochs and batch_size, but I don't know how I should change the sequential Keras model (the dense layers and units). I have 37 observations and 19 predictors. Can you give me advice on this tuning? Thanks for your time and the post. My model's predictions are great; in fact, I could stop now with my results, but I want to improve and learn more about this model.
Good! Your data is too small to evaluate your model and improve its performance. To check for improvement in your model:
1) Use bigger data,
2) Change the units number,
3) Add dense layer,
4) Add dropout layer, layer_dropout()
5) Change optimizer (rmsprop etc.)
Hi, I followed your advice and my model has improved, thanks. But (another doubt) I have steps (the number of samples before a new period starts) of different lengths: at the beginning of the sample the period changes every 4 samples, and toward the end it changes every 5. How should I attack this problem? Two models? How can they be ensembled?
Thanks for your time, really.
How do I tune this model?
Hi, I tried this method on time series data with the last 4 years of monthly values. The predicted values are vague and I'm not sure what I did wrong. I also tried changing the step size, but that did not work out either. Can you please help me out with it?