Introduction to Recurrent Neural Networks (RNNs) with PyTorch

    A Recurrent Neural Network (RNN) is a type of neural network architecture designed for sequence modeling and processing tasks. Unlike feedforward neural networks, which process each input independently, RNNs have recurrent connections that let them incorporate information about previous inputs into their current computations.

    In this tutorial, we'll briefly learn about RNNs and how to implement a simple RNN model for sequential data in PyTorch, covering the following topics:

  1. Introduction to RNNs
  2. Data preparation
  3. Model definition and training
  4. Prediction
  5. Conclusion

Let's get started.


Introduction to RNNs

    RNNs are a specialized type of neural network designed for sequential data. The key feature of RNNs is their ability to maintain a state or memory of previous inputs while processing a sequence of data points.

    RNNs use recurrent connections that allow information to persist across time steps. This enables them to capture temporal dependencies in sequential data such as time series or natural language. An RNN processes an input sequence one step at a time, updating its hidden state at each step to encode information about previous inputs, which is crucial for tasks where the order of the data matters.
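
The step-by-step state update described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the standard tanh update (which is also the default nonlinearity of nn.RNN); the names W_xh, W_hh and b_h and the sizes are illustrative choices, not part of the tutorial's model.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size, seq_len = 3, 4, 5

# Randomly initialised parameters (illustrative values only)
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

x_seq = rng.normal(size=(seq_len, input_size))  # one input sequence
h = np.zeros(hidden_size)                       # initial hidden state

hidden_states = []
for x_t in x_seq:
    # The new state depends on the current input and the previous state
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    hidden_states.append(h)

hidden_states = np.stack(hidden_states)
print(hidden_states.shape)  # (5, 4): one hidden vector per time step
```

The same weights are reused at every time step; only the hidden state h changes as the sequence is consumed.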

    Recurrent Neural Networks (RNNs) face several challenges:

  1. Vanishing Gradient Problem: RNNs suffer from vanishing gradients during backpropagation, which makes it difficult for the model to learn long-range dependencies in sequences.
  2. Exploding Gradient Problem: On the other hand, RNNs may also suffer from exploding gradients, where gradients grow exponentially during training.
  3. Memory and Computational Intensity: RNNs can be memory and computation intensive, particularly when processing long sequences, slowing down training and inference.
  4. Difficulty in Capturing Global Context: Due to their incremental processing of sequential data, RNNs may struggle to capture global context or dependencies across distant parts of the sequence.
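
To get a feel for the first two problems, consider a deliberately simplified picture: during backpropagation through time the gradient is multiplied by a recurrent factor once per time step. The sketch below treats that factor as a single scalar, which is an assumption made for illustration (in a real RNN it is a Jacobian matrix); gradient_scale is a hypothetical helper name.

```python
def gradient_scale(factor, steps):
    """Size of a gradient after being multiplied by `factor` once per time step."""
    return abs(factor) ** steps

# A factor below 1 shrinks the gradient, a factor above 1 blows it up
for factor in (0.5, 1.0, 1.5):
    print(factor, gradient_scale(factor, steps=50))
```

Over 50 steps, a factor of 0.5 drives the gradient towards zero while a factor of 1.5 makes it enormous, which is why long sequences are hard for a plain RNN.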

    LSTM and GRU architectures have become popular alternatives to traditional RNNs. Their gating mechanisms mitigate the vanishing gradient problem and make it easier to capture long-range dependencies in sequential data. They do not remove the cost of processing a sequence step by step, and exploding gradients are usually handled separately, for example with gradient clipping.
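
In PyTorch, nn.LSTM and nn.GRU take the same basic constructor arguments as nn.RNN, so they can be tried as near drop-in replacements. The following sketch only compares output shapes on random input; the sizes are arbitrary illustrative values.

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size = 8, 5, 3, 16
x = torch.randn(batch_size, seq_len, input_size)

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
gru = nn.GRU(input_size, hidden_size, batch_first=True)
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

rnn_out, rnn_h = rnn(x)
gru_out, gru_h = gru(x)
lstm_out, (lstm_h, lstm_c) = lstm(x)  # LSTM also returns a cell state

# All three produce (batch_size, seq_len, hidden_size) outputs
print(rnn_out.shape, gru_out.shape, lstm_out.shape)
```

The only difference a caller has to handle is that the LSTM returns a tuple of hidden state and cell state instead of a single hidden state.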

Data preparation

We start by loading the necessary libraries.

import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

    In this tutorial, we use simple synthetic sequential data: a sine wave with added uniform noise. The code below generates the data and plots it. We use the first 800 samples as training data and the remaining 200 samples as test data to forecast.

# Define parameters
step_size = 3
N = 1000
forecast_start = 800

# Generate data
t = np.arange(0, N)
x = np.sin(0.02*t) + 2*np.random.rand(N)
df = pd.DataFrame(x)

# Plot data
plt.plot(df)
plt.axvline(df.index[forecast_start], c="r", label="forecast start point")
plt.legend()
plt.show()

    Next, we convert the data into input sequences of a given length and their labels. The function below slides a window of step values over the data and uses the value that follows each window as its label.

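# Convert data into sequence and label with given length
def create_labels(data, step):
    X, y = [], []
    for i in range(len(data)-step):
        d = i + step
        X.append(data[i:d, 0])  # window of `step` consecutive values
        y.append(data[d])       # the value right after the window
    return np.array(X), np.array(y)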
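
To see what a windowing function of this kind produces, here is a small self-contained example on six values. The helper make_windows is a hypothetical stand-in written for this illustration; it assumes the data is a single-column 2-D array, as df.values is here.

```python
import numpy as np

def make_windows(data, step):
    """Hypothetical helper: pair each run of `step` values with the value that follows it."""
    X, y = [], []
    for i in range(len(data) - step):
        X.append(data[i:i + step, 0])
        y.append(data[i + step, 0])
    return np.array(X), np.array(y)

toy = np.arange(6, dtype=float).reshape(-1, 1)  # column vector 0..5
X, y = make_windows(toy, step=3)
print(X)  # three windows: 0-2, 1-3, 2-4
print(y)  # [3. 4. 5.]
```

A series of length n yields n - step windows, so the 800 training samples with step_size = 3 give 797 training sequences.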

    We split the data into train and test parts using the forecast_start variable, then generate the sequence data and its labels. The np.reshape() function reshapes the inputs to (samples, 1, step_size), so each sample is fed to the RNN as a single time step with step_size features. The train and test sets are then converted to PyTorch tensors, and a DataLoader object is created from the training tensors.

# Prepare data for training and testing
values = df.values
train, test = values[:forecast_start,:], values[forecast_start:N,:]

# generate sequence data
trainX, trainY = create_labels(train, step_size)
testX, testY = create_labels(test, step_size)

# Reshape data for RNN input
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# Convert data to PyTorch tensors
trainX_tens = torch.tensor(trainX, dtype=torch.float32)
trainY_tens = torch.tensor(trainY, dtype=torch.float32)
testX_tens = torch.tensor(testX, dtype=torch.float32)
testY_tens = torch.tensor(testY, dtype=torch.float32)

# Create DataLoader for training
train_dataset = torch.utils.data.TensorDataset(trainX_tens, trainY_tens)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32)
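
It can help to check the batch shapes a DataLoader yields before training. The sketch below uses random stand-in tensors rather than the tutorial data; the input shape (797, 1, 3) follows from the reshape above, while the label shape (797, 1) is an assumption chosen to match a model with a single output.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in tensors (random values, not the tutorial's series)
n_samples, step_size = 797, 3
X = torch.randn(n_samples, 1, step_size)
y = torch.randn(n_samples, 1)

loader = DataLoader(TensorDataset(X, y), batch_size=32)

batch_X, batch_y = next(iter(loader))
print(batch_X.shape, batch_y.shape)  # first mini-batch
print(len(loader))                   # number of mini-batches per epoch
```

With 797 samples and a batch size of 32, the loader yields 24 full batches and one final batch of 29 samples.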


Model definition and training

    We define a simple Recurrent Neural Network (RNN) model by subclassing PyTorch's nn.Module class. The __init__ method takes the input, hidden, and output sizes of the model. nn.RNN constructs the RNN layer with the specified input and hidden sizes, where batch_first=True indicates that input and output tensors have the shape (batch_size, sequence_length, input_size). Additionally, we define a fully connected layer with nn.Linear, which maps the hidden state output of the RNN to the desired output size.

    In the forward method, we implement the forward pass through the RNN layer, generating an output tensor 'out'. Then, we apply the fully connected layer to the last time step's output of the RNN (out[:, -1, :]), producing the final output of the model.

# Define RNN model
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)
        out = self.fc(out[:, -1, :])  # Take the last time step's output
        return out
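
The tensor shapes involved in the forward pass can be checked directly on a random batch. This sketch rebuilds the two layers on their own; the sizes (3 inputs, 128 hidden units, 1 output, batch of 32) mirror the values used in this tutorial, and the input is random.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=3, hidden_size=128, batch_first=True)
fc = nn.Linear(128, 1)

x = torch.randn(32, 1, 3)        # (batch_size, sequence_length, input_size)
out, h_n = rnn(x)
print(out.shape)                 # torch.Size([32, 1, 128])
print(h_n.shape)                 # torch.Size([1, 32, 128])

last_step = out[:, -1, :]        # hidden state at the final time step
print(fc(last_step).shape)       # torch.Size([32, 1])
```

For a single-layer, unidirectional RNN, out[:, -1, :] holds the same values as the final hidden state h_n[0], so either could be passed to the linear layer.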

    We define the hyperparameters and instantiate the model using the SimpleRNN class. We use MSELoss() as the loss function and the Adam optimizer.

# Hyperparameters
input_size = step_size
hidden_size = 128
output_size = 1
epochs = 100
learning_rate = 0.0001
# Instantiate RNN model
model = SimpleRNN(input_size=input_size, hidden_size=hidden_size, output_size=output_size)

# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

Next, we train the model by iterating over the number of epochs and print the loss every 10 epochs.

# Train the model
for epoch in range(epochs):
    for batch_X, batch_Y in train_loader:
        optimizer.zero_grad()  # Clears the gradients of all optimized parameters.
        output = model(batch_X)
        # Computes the loss between the model predictions and the ground
        # truth labels for the current mini-batch.
        loss = criterion(output, batch_Y)
        # Computes gradients of the loss with respect to model parameters.
        loss.backward()
        # Updates model parameters based on the computed gradients using
        # the specified optimization algorithm.
        optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

Epoch [10/100], Loss: 0.4343
Epoch [20/100], Loss: 0.3793
Epoch [30/100], Loss: 0.3902
Epoch [40/100], Loss: 0.3918
Epoch [50/100], Loss: 0.3930
Epoch [60/100], Loss: 0.3941
Epoch [70/100], Loss: 0.3951
Epoch [80/100], Loss: 0.3959
Epoch [90/100], Loss: 0.3966
Epoch [100/100], Loss: 0.3971
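
When a loss value is printed once per epoch from the loss variable of the batch loop, it reflects only the last mini-batch of that epoch and can be noisy. One common alternative is to report the mean loss over all samples seen in the epoch. The sketch below shows the bookkeeping on a toy linear regression problem; the data, the nn.Linear model and the hyperparameters are stand-ins chosen for illustration, not the tutorial's setup.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Toy regression data (random stand-in, not the tutorial's series)
X = torch.randn(64, 3)
y = X.sum(dim=1, keepdim=True)
loader = DataLoader(TensorDataset(X, y), batch_size=16)

model = nn.Linear(3, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(5):
    running_loss, n_seen = 0.0, 0
    for batch_X, batch_y in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_X), batch_y)
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size so the epoch mean is exact
        running_loss += loss.item() * batch_X.size(0)
        n_seen += batch_X.size(0)
    epoch_losses.append(running_loss / n_seen)

print(epoch_losses)
```

Weighting by batch size matters when the last batch is smaller than the others, as it is whenever the dataset size is not a multiple of the batch size.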


Prediction

    We predict the test data using the trained model and plot the predictions against the ground truth.

# Evaluation
model.eval()
with torch.no_grad():
    testPredict = model(testX_tens)

# Plot results
index = range(len(testY))
plt.plot(index, testY, label="Ground truth")
plt.plot(index, testPredict.numpy(), label="Predicted")
plt.legend()
plt.show()
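
Alongside the plot, a single error number makes it easier to compare runs. Below is a minimal sketch of a root mean squared error (RMSE) computation in NumPy; rmse is a hypothetical helper and the three values are made up for illustration. Both arrays are flattened first so that a (n, 1) prediction array and a (n,) label array do not broadcast into an (n, n) matrix.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two arrays of the same length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

truth = np.array([1.0, 2.0, 3.0])
pred = np.array([[1.0], [2.0], [5.0]])  # column vector, as a model with one output returns
print(rmse(truth, pred))
```

In this tutorial's setting, such a helper could be called with the test labels and the NumPy version of the predictions.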


Conclusion

    In this tutorial, we learned about RNNs and how to implement a simple RNN model for sequential data in PyTorch. We covered an overview of RNNs, data preparation, the definition of the RNN model architecture, model training, and prediction on test data. I hope this tutorial helps you understand RNNs and their application to sequential data.
