Sequence Prediction with GRU Model in PyTorch

     Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) designed to capture long-term dependencies in sequential data efficiently. It is an extension of traditional RNNs and shares similarities with LSTM (Long Short-Term Memory) networks.

    In this tutorial, we'll briefly introduce the GRU model and show how to implement sequence prediction with a GRU in PyTorch. The tutorial covers the following topics:
  1. Introduction to GRU
  2. Data preparation
  3. Model definition and training
  4. Prediction
  5. Conclusion

Let's get started.


Introduction to GRU

    The key idea behind GRU is to address the vanishing gradient problem and improve the ability of RNNs to retain information over long sequences. GRU achieves this by introducing gating mechanisms that regulate the flow of information within the network.

    A typical GRU unit consists of the following components:

  • Update Gate: determines how much of the past information to keep and how much new information to let through. It is calculated using the input at the current time step and the previous hidden state.
  • Reset Gate: controls which parts of the past hidden state should be ignored when forming the candidate activation. It is calculated in the same way as the update gate, but with its own weights.
  • Candidate Activation: computes a new candidate hidden state from the current input and the previous hidden state, with the previous hidden state scaled by the reset gate.
  • Hidden State: interpolates between the previous hidden state and the candidate activation, weighted by the update gate, to produce the current hidden state.

    The update and reset gates allow the model to selectively update or ignore information from previous time steps, addressing the vanishing gradient problem and facilitating the capture of long-range dependencies.
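
    The four components above can be sketched as a single GRU time step. This is a minimal illustration, not the tutorial's model: the function name gru_step and the tensor sizes are assumptions made for the example, and the gate layout (reset, update, candidate) follows the one documented for PyTorch's nn.GRUCell, whose weights are reused so the result can be compared against the built-in cell.

```python
import torch
import torch.nn as nn


def gru_step(x, h_prev, cell):
    """Compute one GRU step by hand using the weights of an nn.GRUCell.

    gru_step is a hypothetical helper for illustration only.
    """
    # PyTorch stacks the weights in the order: reset, update, candidate.
    W_ir, W_iz, W_in = cell.weight_ih.chunk(3, dim=0)
    W_hr, W_hz, W_hn = cell.weight_hh.chunk(3, dim=0)
    b_ir, b_iz, b_in = cell.bias_ih.chunk(3, dim=0)
    b_hr, b_hz, b_hn = cell.bias_hh.chunk(3, dim=0)

    # Reset gate: which parts of the previous hidden state to ignore.
    r = torch.sigmoid(x @ W_ir.T + b_ir + h_prev @ W_hr.T + b_hr)
    # Update gate: how much of the previous hidden state to keep.
    z = torch.sigmoid(x @ W_iz.T + b_iz + h_prev @ W_hz.T + b_hz)
    # Candidate activation: the previous state contribution is scaled by r.
    n = torch.tanh(x @ W_in.T + b_in + r * (h_prev @ W_hn.T + b_hn))
    # New hidden state: interpolation between candidate and previous state.
    return (1 - z) * n + z * h_prev


torch.manual_seed(0)
cell = nn.GRUCell(input_size=4, hidden_size=8)  # sizes chosen arbitrarily
x = torch.randn(2, 4)        # batch of 2 inputs
h_prev = torch.randn(2, 8)   # batch of 2 previous hidden states

with torch.no_grad():
    manual = gru_step(x, h_prev, cell)
    reference = cell(x, h_prev)

print(torch.allclose(manual, reference, atol=1e-5))
```

    When the update gate z is close to 1, the previous hidden state passes through almost unchanged, which is how information can be carried across many time steps.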

Data preparation

    Let's implement sequence prediction with a GRU model in PyTorch. We start by loading the libraries needed for this tutorial.

import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 

     We use simple synthetic sequential data in this tutorial. The code below generates the sequence data and visualizes it on a graph. We use the first 720 samples as training data and the remaining 130 samples as test data to forecast.

# Define parameters
step_size = 4
N = 850
forecast_start = 720

# Generate data
t = np.arange(0, N)
x = np.sin(0.03 * t) + 1.2 * np.random.rand(N)+t/300
df = pd.DataFrame(x)

# Plot data
plt.plot(df.index, df[0], label="data")
plt.axvline(df.index[forecast_start], c="r", label="forecast start point")
plt.legend()
plt.show()

Next, we convert the data into input sequences and labels with the given window length. The function below builds each input from step consecutive values and uses the value that follows the window as its label.

# Convert data into sequence and label with given length
def create_labels(data, step):
    X = np.array([data[i:i+step] for i in range(len(data) - step)])
    y = np.array(data[step:])
    return X, y
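
To see what the function returns, here is a small sketch on a toy array. The names toy, X_toy, and y_toy are made up for this illustration, and the function is repeated so the snippet runs on its own.

```python
import numpy as np


def create_labels(data, step):
    # Each row of X holds `step` consecutive values; y holds the next value.
    X = np.array([data[i:i + step] for i in range(len(data) - step)])
    y = np.array(data[step:])
    return X, y


toy = np.arange(10)                   # 0, 1, ..., 9
X_toy, y_toy = create_labels(toy, 4)

print(X_toy.shape, y_toy.shape)       # (6, 4) (6,)
print(X_toy[0], y_toy[0])             # [0 1 2 3] 4
print(X_toy[-1], y_toy[-1])           # [5 6 7 8] 9
```

A series of length n yields n - step windows, so the first step values are only ever used as inputs and never as labels.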

    We split the data into train and test parts using the forecast_start variable, then generate the sequence data and its labels. The np.reshape() function reshapes the inputs to (samples, 1, step_size) for the GRU, so each sample is a sequence of length 1 with step_size features. The train and test sets are converted to PyTorch tensors, and a DataLoader object is created from the training tensors.

# Prepare data for training and testing
values = df.values
train, test = values[:forecast_start, :], values[forecast_start:N, :]

# generate sequence data
trainX, trainY = create_labels(train, step_size)
testX, testY = create_labels(test, step_size)

# Reshape data for GRU input: (samples, 1, step_size)
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# Convert data to PyTorch tensors
trainX_tens = torch.tensor(trainX, dtype=torch.float32)
trainY_tens = torch.tensor(trainY, dtype=torch.float32)
testX_tens = torch.tensor(testX, dtype=torch.float32)
testY_tens = torch.tensor(testY, dtype=torch.float32)

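# Create DataLoader for training
from torch.utils.data import TensorDataset, DataLoader

train_dataset = TensorDataset(trainX_tens, trainY_tens)
train_loader = DataLoader(train_dataset, batch_size=64)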
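
    The shapes produced by the steps above can be checked with a short sketch. It uses a stand-in random series of 100 values instead of the tutorial data, and the names series, loader, and batch_shapes are assumptions made for this example.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

step = 4
rng = np.random.default_rng(0)
series = rng.random((100, 1))          # stand-in for df.values, shape (100, 1)

# Windowing as in create_labels: X is (96, 4, 1), y is (96, 1).
X = np.array([series[i:i + step] for i in range(len(series) - step)])
y = series[step:]

# Treat each window as one time step with `step` features: (96, 1, 4).
X = np.reshape(X, (X.shape[0], 1, X.shape[1]))

X_tens = torch.tensor(X, dtype=torch.float32)
y_tens = torch.tensor(y, dtype=torch.float32)

loader = DataLoader(TensorDataset(X_tens, y_tens), batch_size=64)
batch_shapes = [(tuple(bx.shape), tuple(by.shape)) for bx, by in loader]
print(batch_shapes)
# [((64, 1, 4), (64, 1)), ((32, 1, 4), (32, 1))]
```

    The last batch is smaller because 96 samples do not divide evenly by the batch size of 64.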


Model definition and training

    We define a GRU model by subclassing PyTorch's nn.Module class. In the __init__ method, we set the input, hidden, and output sizes of the model. The nn.GRU() constructor creates the GRU layer with the specified input and hidden sizes, where batch_first=True indicates that input and output tensors have the shape (batch_size, sequence_length, input_size). Additionally, we define a fully connected linear layer with nn.Linear(), which maps the hidden state output of the GRU to the desired output size.

    In the forward method, we pass the input through the GRU layer, producing an output tensor 'out'. Then, we apply the fully connected layer to the GRU output at the last time step (out[:, -1, :]), producing the final output of the model.

# Define GRU model
class GRUModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(GRUModel, self).__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.gru(x)
        out = self.fc(out[:, -1, :])  # Take the last time step's output
        return out
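
    A quick way to confirm the shapes described above is to pass a dummy batch through the model. This is only a sanity-check sketch: the names demo_model and dummy, the batch size of 5, and the hidden size of 16 are arbitrary choices for the example, and the class is repeated so the snippet runs on its own.

```python
import torch
import torch.nn as nn


class GRUModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.gru(x)
        return self.fc(out[:, -1, :])  # last time step only


torch.manual_seed(0)
demo_model = GRUModel(input_size=4, hidden_size=16, output_size=1)
dummy = torch.randn(5, 1, 4)   # (batch_size, sequence_length, input_size)

with torch.no_grad():
    pred = demo_model(dummy)
    out, h_n = demo_model.gru(dummy)

print(pred.shape)              # torch.Size([5, 1])
print(out.shape, h_n.shape)    # torch.Size([5, 1, 16]) torch.Size([1, 5, 16])

# For a single-layer, unidirectional GRU the last time step of `out`
# is the same tensor as the final hidden state.
same = torch.allclose(out[:, -1, :], h_n[-1])
print(same)                    # True
```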

    We define the hyperparameters and instantiate the model using the GRUModel class above. We use MSELoss() as the loss function and the Adam optimizer.

# Hyperparameters
input_size = step_size
hidden_size = 128
output_size = 1
epochs = 100
learning_rate = 0.0001

# Instantiate GRU model
model = GRUModel(input_size=input_size, hidden_size=hidden_size, output_size=output_size)

# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate) 

    Next, we train the model by iterating over the epochs and print the loss every 10 epochs.

# Train the model
for epoch in range(epochs):
    for batch_X, batch_Y in train_loader:
        optimizer.zero_grad()  # Clears the gradients of all optimized parameters.
        output = model(batch_X)

        # Computes the loss between the model predictions and the ground
        # truth labels for the current mini-batch.
        loss = criterion(output, batch_Y)

        # Computes gradients of the loss with respect to model parameters.
        loss.backward()

        # Updates model parameters based on the computed gradients using
        # the specified optimization algorithm.
        optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}')

    Running the training loop prints output like the following. The exact values vary between runs because the data contains random noise.

Epoch [10/100], Loss: 7.4051
Epoch [20/100], Loss: 4.0839
Epoch [30/100], Loss: 1.6807
Epoch [40/100], Loss: 0.5536
Epoch [50/100], Loss: 0.2236
Epoch [60/100], Loss: 0.1506
Epoch [70/100], Loss: 0.1338
Epoch [80/100], Loss: 0.1286
Epoch [90/100], Loss: 0.1256
Epoch [100/100], Loss: 0.1231 
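
    The loss printed above is the loss of the last mini-batch in each reported epoch. If a smoother training curve is preferred, one option is to average the loss over all samples in the epoch. The sketch below shows this under stated assumptions: train_one_epoch is a hypothetical helper, and a small linear model on toy data stands in for the GRU and the tutorial data.

```python
import math

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader


def train_one_epoch(model, loader, criterion, optimizer):
    """Run one pass over the loader and return the mean loss per sample."""
    model.train()
    total, count = 0.0, 0
    for batch_X, batch_Y in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_X), batch_Y)
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size so a smaller last batch
        # does not distort the average.
        total += loss.item() * batch_X.size(0)
        count += batch_X.size(0)
    return total / count


# Toy stand-ins (not the tutorial data): the target is the sum of the inputs.
torch.manual_seed(0)
demo_X = torch.randn(100, 4)
demo_Y = demo_X.sum(dim=1, keepdim=True)
demo_loader = DataLoader(TensorDataset(demo_X, demo_Y), batch_size=32)

demo_model = nn.Linear(4, 1)
demo_criterion = nn.MSELoss()
demo_optimizer = torch.optim.Adam(demo_model.parameters(), lr=0.01)

history = [
    train_one_epoch(demo_model, demo_loader, demo_criterion, demo_optimizer)
    for _ in range(5)
]
print(history)
```

    With the tutorial's objects, the same helper could be called as train_one_epoch(model, train_loader, criterion, optimizer) inside the epoch loop.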


Prediction

    After training, we can predict the test data with the trained model and visualize the result on a graph.

# Evaluation
model.eval()
with torch.no_grad():
    testPredict = model(testX_tens)

# Plot results
index = range(len(testY))
plt.plot(index, testY, label="Ground truth")
plt.plot(index, testPredict.numpy(), label="Predicted")
plt.legend()
plt.show()
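
    Besides the plot, a single error number can help when comparing runs. The original tutorial does not compute one, so the sketch below is an optional addition: rmse is a hypothetical helper, and the sample values are made up for the example.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two arrays of the same length."""
    # Flatten both inputs so (n, 1) and (n,) arrays can be compared safely.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy check: only the last value is off, by 2, so the result is sqrt(4/3).
score = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(round(score, 4))   # 1.1547
```

    With the tutorial's variables, this could be called as rmse(testY, testPredict.numpy()).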


Conclusion

    GRU simplifies the architecture of LSTM networks by combining the forget and input gates into a single update gate. As a result, it has fewer parameters than an LSTM with the same hidden size, which makes it computationally cheaper while still capturing temporal dependencies.
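
    The size difference can be checked by counting parameters. This is a small sketch, assuming the same input size (4) and hidden size (128) used earlier in the tutorial; count_params is a hypothetical helper.

```python
import torch.nn as nn


def count_params(module):
    # Total number of learnable values across all parameter tensors.
    return sum(p.numel() for p in module.parameters())


gru = nn.GRU(input_size=4, hidden_size=128, batch_first=True)
lstm = nn.LSTM(input_size=4, hidden_size=128, batch_first=True)

gru_params = count_params(gru)
lstm_params = count_params(lstm)
print(gru_params, lstm_params)   # 51456 68608
```

    PyTorch's GRU layer stores three sets of gate weights and its LSTM layer stores four, so the GRU layer has three quarters as many parameters for the same sizes.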

    In this tutorial, we learned about GRU networks and how to predict sequence data with a GRU model in PyTorch. We covered an overview of GRU, data preparation, model definition, training, and prediction on test data. I hope this tutorial helps you understand GRU and its application to sequential data.
