Sequence Prediction with LSTM model in PyTorch

     Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) architecture designed to overcome the limitations of traditional RNNs in capturing long-range dependencies in sequential data. 

    In this tutorial, we'll briefly learn about LSTM and how to implement an LSTM model with sequential data in PyTorch covering the following topics:
  1. Introduction to LSTM
  2. Data preparation
  3. Model definition and training
  4. Prediction
  5. Conclusion

Let's get started.


Introduction to LSTM

    LSTM networks were developed to overcome the limitations of traditional RNNs, such as the vanishing gradient problem and difficulty in capturing long-term dependencies. LSTMs introduce gating mechanisms and a separate cell state, enabling better control over information flow and retention over long sequences. This design allows LSTMs to effectively capture complex temporal dependencies in sequential data, leading to significant improvements in tasks such as natural language processing and time-series analysis. 

    LSTM networks consist of memory cells with gates that regulate the flow of information. The forget gate controls what information to discard from the previous cell state, while the input gate determines how much of the new candidate information to add. The new cell state is the sum of these two parts: the retained portion of the old state and the gated candidate values. The output gate then controls the hidden state that is emitted, based on the updated cell state.
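    The gate description above can be sketched as a single hand-written time step. This is a minimal illustration, not how the rest of the tutorial builds its model: it assumes the parameter layout of PyTorch's nn.LSTMCell (gate blocks stacked in the order input, forget, candidate, output), and the function name manual_lstm_step and the sizes used are made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 4, 3  # illustrative sizes (assumption)

cell = nn.LSTMCell(input_size, hidden_size)

x = torch.randn(2, input_size)        # batch of 2 inputs
h_prev = torch.randn(2, hidden_size)  # previous hidden state
c_prev = torch.randn(2, hidden_size)  # previous cell state


def manual_lstm_step(x, h_prev, c_prev, cell):
    """One LSTM step written out gate by gate (hypothetical helper)."""
    # All four gate pre-activations are computed in one matrix product.
    gates = (x @ cell.weight_ih.T + cell.bias_ih
             + h_prev @ cell.weight_hh.T + cell.bias_hh)
    i, f, g, o = gates.chunk(4, dim=1)
    i = torch.sigmoid(i)   # input gate: how much new information to add
    f = torch.sigmoid(f)   # forget gate: how much old cell state to keep
    g = torch.tanh(g)      # candidate values for the cell state
    o = torch.sigmoid(o)   # output gate: how much cell state to expose
    c_new = f * c_prev + i * g
    h_new = o * torch.tanh(c_new)
    return h_new, c_new


with torch.no_grad():
    h_manual, c_manual = manual_lstm_step(x, h_prev, c_prev, cell)
    h_ref, c_ref = cell(x, (h_prev, c_prev))

print(torch.allclose(h_manual, h_ref, atol=1e-5))
print(torch.allclose(c_manual, c_ref, atol=1e-5))
```

    If the assumed layout holds, both comparisons should agree with the built-in cell, which is a convenient way to check one's understanding of the gates.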

    Despite their powerful architecture, LSTMs have limitations. They can be computationally expensive and memory-intensive, especially for long sequences. Additionally, they may struggle with capturing subtle temporal patterns or distinguishing between short and long-term dependencies. Tuning hyperparameters like sequence length and batch size can also be challenging.

Data preparation

    Let's implement sequence data prediction with an LSTM model. We start by loading the necessary libraries.

import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 

    In this tutorial, we use simple synthetic sequence data. The code below generates the data and visualizes it on a graph. We use the first 720 samples as training data and the remaining 80 samples as test data to forecast.

# Define parameters
step_size = 4
N = 800
forecast_start = 720

# Generate data
t = np.arange(0, N)
x = np.sin(0.02 * t) + 2 * np.random.rand(N)
df = pd.DataFrame(x)

# Plot data
plt.plot(df)
plt.axvline(df.index[forecast_start], c="r", label="forecast start point")
plt.legend()
plt.show()

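    Next, we convert the data into input sequences of a given length and their labels. Each input is a window of step consecutive values, and its label is the value that immediately follows the window. The function below creates these pairs.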

# Convert data into sequence and label with given length
def create_labels(data, step):
    X = np.array([data[i:i+step] for i in range(len(data) - step)])
    y = np.array(data[step:])
    return X, y
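    To see what the windowing produces, here is a small sketch on a toy series. The toy array is an assumption made for illustration; it is shaped as a single column, like df.values in this tutorial.

```python
import numpy as np


def create_labels(data, step):
    # Each window of `step` values is paired with the value that follows it.
    X = np.array([data[i:i + step] for i in range(len(data) - step)])
    y = np.array(data[step:])
    return X, y


# Toy column vector 0, 1, ..., 5 (illustrative data, not the tutorial's series)
toy = np.arange(6, dtype=float).reshape(-1, 1)
X, y = create_labels(toy, step=2)

print(X.shape, y.shape)      # (4, 2, 1) (4, 1)
print(X[0].ravel(), y[0])    # [0. 1.] [2.]
print(X[-1].ravel(), y[-1])  # [3. 4.] [5.]
```

    A series of n points yields n - step windows, so the 720 training points with step_size = 4 give 716 training pairs.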

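    We split the data into train and test parts using the forecast_start variable, then generate the sequences and their labels for each part. The np.reshape() function reshapes the inputs to (samples, 1, step_size), so each sample is fed to the LSTM as a sequence of length 1 with step_size features. The train and test sets are then converted to PyTorch tensors, and a DataLoader object is created from the training tensors.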

# Prepare data for training and testing
values = df.values
train, test = values[:forecast_start, :], values[forecast_start:N, :]

# generate sequence data
trainX, trainY = create_labels(train, step_size)
testX, testY = create_labels(test, step_size)

# Reshape data for LSTM input
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# Convert data to PyTorch tensors
trainX_tens = torch.tensor(trainX, dtype=torch.float32)
trainY_tens = torch.tensor(trainY, dtype=torch.float32)
testX_tens = torch.tensor(testX, dtype=torch.float32)
testY_tens = torch.tensor(testY, dtype=torch.float32)

# Create DataLoader for training
train_dataset = torch.utils.data.TensorDataset(trainX_tens, trainY_tens)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64)
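    The shapes that flow out of the reshape and the DataLoader can be checked with a short sketch. The random arrays below are stand-ins (an assumption) with the same shapes that create_labels() yields for 720 training points and step_size = 4; the variable names are illustrative.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in arrays shaped like the output of create_labels() (assumed shapes)
n_windows, step_size = 716, 4
X = np.random.rand(n_windows, step_size, 1)
y = np.random.rand(n_windows, 1)

# Same reshape as in the tutorial: (samples, 1, step_size)
X = np.reshape(X, (X.shape[0], 1, X.shape[1]))

dataset = TensorDataset(torch.tensor(X, dtype=torch.float32),
                        torch.tensor(y, dtype=torch.float32))
loader = DataLoader(dataset, batch_size=64)

# Collect the shape of every mini-batch the loader yields
batch_shapes = [(tuple(bx.shape), tuple(by.shape)) for bx, by in loader]

print(len(batch_shapes))   # 12 mini-batches per epoch
print(batch_shapes[0])     # ((64, 1, 4), (64, 1))
print(batch_shapes[-1])    # ((12, 1, 4), (12, 1)), the last batch is smaller
```

    Because 716 is not a multiple of 64, the final mini-batch holds only the 12 remaining samples.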


Model definition and training

    We define an LSTM model by subclassing PyTorch's nn.Module class. In the __init__ method, we set up the layers from the given input, hidden, and output sizes. nn.LSTM() constructs the LSTM layer with the specified input and hidden sizes, where batch_first=True indicates that input and output tensors have the shape (batch_size, sequence_length, input_size). Additionally, we define a fully connected linear layer using nn.Linear(), which maps the hidden state output of the LSTM to the desired output size.

    In the forward method, we implement the forward pass through the lstm layer, generating an output tensor 'out'. Then, we apply the fully connected layer to the last time step's output of the LSTM (out[:, -1, :]), producing the final output of the model.

# Define LSTM model
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)
        out = self.fc(out[:, -1, :])  # Take the last time step's output
        return out
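    A quick way to confirm the tensor shapes described above is to pass a dummy batch through the model. This is only a sketch: the batch of random numbers and the sizes (64 samples, sequence length 1, 4 features, 128 hidden units) are assumptions chosen to mirror the values used in this tutorial.

```python
import torch
import torch.nn as nn


class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])  # last time step only


model = LSTMModel(input_size=4, hidden_size=128, output_size=1)
dummy = torch.randn(64, 1, 4)  # (batch_size, sequence_length, input_size)

with torch.no_grad():
    seq_out, (h_n, c_n) = model.lstm(dummy)  # raw LSTM outputs
    pred = model(dummy)                      # full forward pass

print(tuple(seq_out.shape))  # (64, 1, 128): one hidden vector per time step
print(tuple(h_n.shape))      # (1, 64, 128): final hidden state per layer
print(tuple(pred.shape))     # (64, 1): one prediction per sample
```

    With a sequence length of 1, out[:, -1, :] simply selects the only time step, so the linear layer sees a (batch_size, hidden_size) tensor.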

    We define the hyperparameters for our model and initialize it using the LSTMModel class above. We use MSELoss() as the loss function and the Adam optimizer.

# Hyperparameters
input_size = step_size
hidden_size = 128
output_size = 1
epochs = 100
learning_rate = 0.0001

# Instantiate LSTM model
model = LSTMModel(input_size=input_size, hidden_size=hidden_size, output_size=output_size)

# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    Next, we train the model by iterating over the epochs and print the loss every 10 epochs.

# Train the model
for epoch in range(epochs):
    for batch_X, batch_Y in train_loader:
        optimizer.zero_grad()  # Clears the gradients of all optimized parameters.
        output = model(batch_X)

        # Computes the loss between the model predictions and the ground
        # truth labels for the current mini-batch.
        loss = criterion(output, batch_Y)

        # Computes gradients of the loss with respect to model parameters.
        loss.backward()

        # Updates model parameters based on the computed gradients using
        # the specified optimization algorithm.
        optimizer.step()

    # Print the loss of the last mini-batch every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}')

    Running the code trains the model and prints the loss as follows.

Epoch [10/100], Loss: 4.1006
Epoch [20/100], Loss: 3.0744
Epoch [30/100], Loss: 1.9591
Epoch [40/100], Loss: 1.0960
Epoch [50/100], Loss: 0.6668
Epoch [60/100], Loss: 0.5284
Epoch [70/100], Loss: 0.4938
Epoch [80/100], Loss: 0.4853
Epoch [90/100], Loss: 0.4830
Epoch [100/100], Loss: 0.4821 
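
    The loss reported by the training loop is the value for the last mini-batch of each epoch, which can be noisy. One possible alternative is to average the loss over all samples in the epoch. The sketch below shows the idea under stated assumptions: the data is a synthetic stand-in, and the names TinyLSTM and history, the small hidden size, and the learning rate are chosen for the example only.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Synthetic stand-in data (an assumption, not the tutorial's series)
X = torch.randn(200, 1, 4)  # (samples, sequence_length, input_size)
y = X.sum(dim=2)            # (samples, 1): target is the sum of the 4 inputs
loader = DataLoader(TensorDataset(X, y), batch_size=64)


class TinyLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(4, 16, batch_first=True)
        self.fc = nn.Linear(16, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])


model = TinyLSTM()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

history = []  # mean loss per epoch
for epoch in range(5):
    running_loss, n_seen = 0.0, 0
    for batch_X, batch_Y in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_X), batch_Y)
        loss.backward()
        optimizer.step()
        # Weight each batch by its size so the smaller last batch counts fairly
        running_loss += loss.item() * batch_X.size(0)
        n_seen += batch_X.size(0)
    history.append(running_loss / n_seen)
    print(f"Epoch {epoch + 1}: mean loss {history[-1]:.4f}")
```

    Weighting by batch size matters here because the last mini-batch is usually smaller than the others.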


Prediction

    After training, we can predict the test data with the trained model and visualize the result in a graph.

# Evaluation
with torch.no_grad():
    testPredict = model(testX_tens)

# Plot results
index = range(len(testY))
plt.plot(index, testY, label="Ground truth")
plt.plot(index, testPredict.numpy(), label="Predicted")
plt.legend()
plt.show()
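
    Besides the plot, a single error number can help when comparing runs. The following is a minimal sketch of a root mean squared error (RMSE) calculation; the helper name rmse and the small example arrays are assumptions made for illustration, not results from this tutorial.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two arrays of equal size (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(-1)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative values shaped like (samples, 1), as testY and the predictions are
y_true = np.array([[1.0], [2.0], [3.0]])
y_pred = np.array([[1.0], [2.0], [5.0]])

score = rmse(y_true, y_pred)
print(f"RMSE: {score:.4f}")
```

    With the tutorial's variables, this could be called as rmse(testY, testPredict.numpy()).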


Conclusion

    In this tutorial, we learned about LSTM networks and how to implement an LSTM model to predict sequential data in PyTorch. We covered an overview of LSTMs, data preparation, model definition, training, and prediction on test data. I hope this tutorial helps you understand LSTMs and their application to sequential data.
