Grid search is a technique for optimizing hyperparameters during model training. In this tutorial, I will explain how to use grid search to fine-tune the hyperparameters of neural network models in PyTorch. The tutorial covers:
- Introduction to Grid Search
- Implementation and performance check
- Conclusion
Let's get started.
Introduction to Grid Search
Grid search is a hyperparameter optimization technique used to find the best combination of hyperparameters for a neural network model. It involves systematically searching through a predefined set of hyperparameters and evaluating the model's performance for each combination.
To perform a grid search, we define the hyperparameters to tune and specify the candidate values for each. Common hyperparameters include the learning rate, batch size, number of epochs, and choice of activation function. Below, we define candidate values for three hyperparameters: learning rate, momentum, and batch size; the search will tell us which combination performs best.
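As a minimal sketch, the search space can be expressed as a plain dictionary mapping each hyperparameter name to its candidate values. The values below are taken from the combinations that appear in the output later in this post:

```python
# Candidate values for each hyperparameter; these match the
# combinations tested in the output shown later in this post.
param_grid = {
    "batch_size": [16, 64],
    "lr": [0.001, 0.01, 0.1],
    "momentum": [0.8, 0.9, 0.95],
}
```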
Next, we use the ParameterGrid class from scikit-learn to create a grid with all possible combinations of the specified hyperparameters.
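For instance, passing the dictionary above to ParameterGrid yields an iterable over every combination:

```python
from sklearn.model_selection import ParameterGrid

grid = ParameterGrid(param_grid)
print(len(grid))    # 2 * 3 * 3 = 18 combinations
for params in grid:
    print(params)   # e.g. {'batch_size': 16, 'lr': 0.001, 'momentum': 0.8}
```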
For each combination of hyperparameters, we train a model and evaluate its performance on a validation set, recording the performance metric for each combination, as sketched below.
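In outline, the search loop looks like the following sketch. Here train_one_model() and validate() are hypothetical placeholders for your own training and evaluation code; the full example in the next section defines concrete versions:

```python
best_params, best_loss = None, float("inf")
for params in ParameterGrid(param_grid):
    model = train_one_model(params)      # hypothetical helper: trains a fresh model
    val_loss = validate(model, params)   # hypothetical helper: returns validation loss
    if val_loss < best_loss:             # keep the best combination seen so far
        best_loss, best_params = val_loss, params
```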
Because grid search evaluates every combination exhaustively, it is guaranteed to find the best configuration within the predefined grid. It is also easy to understand and implement, making it a good starting point for hyperparameter optimization.
However, grid search has some disadvantages. The number of combinations grows exponentially with the number of hyperparameters and candidate values, which drives up computational cost: with just two batch sizes, three learning rates, and three momentum values, the grid above already contains 2 × 3 × 3 = 18 training runs. Much of that compute may also be spent evaluating combinations that are far from optimal.
Grid search is best suited to smaller models and datasets, or as a preliminary step. For larger models or more extensive hyperparameter spaces, it is often more practical to use more sophisticated optimization techniques to save computational resources and time.
Implementation and performance check
In the code below, we use grid search to optimize a neural network model in PyTorch.
For this example, we implement a simple neural network called SimpleNN(), generate a synthetic dataset, and split it into training and testing sets. The get_data_loaders() function creates data loaders for both the training and validation data with the specified batch size. The train() function trains the model on the training data, while the validate() function evaluates the model on the validation data.
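Here is a minimal, self-contained sketch of how SimpleNN(), get_data_loaders(), train(), and validate() could fit together. The architecture, dataset shape (a 10-feature synthetic binary-classification task with random labels), loss function (BCEWithLogitsLoss), and epoch count are assumptions chosen for illustration, not necessarily the original run's exact code:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import ParameterGrid

class SimpleNN(nn.Module):
    """A small feed-forward network; the hidden size is an arbitrary choice."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, x):
        return self.net(x)

# Synthetic dataset: 1000 samples, 10 features, random binary labels (assumed shapes).
torch.manual_seed(0)
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000, 1)).float()
X_train, X_val, y_train, y_val = X[:800], X[800:], y[:800], y[800:]

def get_data_loaders(batch_size):
    """Create training and validation loaders with the given batch size."""
    train_loader = DataLoader(TensorDataset(X_train, y_train),
                              batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(TensorDataset(X_val, y_val), batch_size=batch_size)
    return train_loader, val_loader

def train(model, loader, criterion, optimizer, epochs=5):
    """Train the model, printing the average training loss per epoch."""
    model.train()
    for epoch in range(epochs):
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
            total += loss.item() * xb.size(0)
        print(f"Epoch {epoch + 1}, Loss: {total / len(loader.dataset)}")

def validate(model, loader, criterion):
    """Return the average validation loss."""
    model.eval()
    total = 0.0
    with torch.no_grad():
        for xb, yb in loader:
            total += criterion(model(xb), yb).item() * xb.size(0)
    return total / len(loader.dataset)

# Grid search driver: train and validate one fresh model per combination.
param_grid = {"batch_size": [16, 64], "lr": [0.001, 0.01, 0.1],
              "momentum": [0.8, 0.9, 0.95]}
best_params, best_loss = None, float("inf")
for params in ParameterGrid(param_grid):
    print(f"Testing parameters: {params}")
    train_loader, val_loader = get_data_loaders(params["batch_size"])
    model = SimpleNN()
    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.SGD(model.parameters(), lr=params["lr"],
                          momentum=params["momentum"])
    train(model, train_loader, criterion, optimizer)
    val_loss = validate(model, val_loader, criterion)
    if val_loss < best_loss:
        best_loss, best_params = val_loss, params

print(f"Best parameters found by Grid Search: {best_params}")
print(f"Best validation loss: {best_loss}")
```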
Let's run the code and evaluate the model's performance.
A portion of the output is shown below; the log is truncated, so not all 18 combinations appear:
Epoch 2, Loss: 0.69358927693218
Epoch 3, Loss: 0.6935953156277538
Epoch 4, Loss: 0.6935954415053129
Epoch 5, Loss: 0.6935954458266497
Testing parameters: {'batch_size': 16, 'lr': 0.001, 'momentum': 0.9}
Epoch 1, Loss: 0.7035660340636969
Epoch 2, Loss: 0.69402473654598
Epoch 3, Loss: 0.6940247337147594
Epoch 4, Loss: 0.6940247337147594
Epoch 5, Loss: 0.6940247334539891
Testing parameters: {'batch_size': 16, 'lr': 0.001, 'momentum': 0.95}
Epoch 1, Loss: 0.6975623657181859
Epoch 2, Loss: 0.6949406218156219
Epoch 3, Loss: 0.6949406227841973
Epoch 4, Loss: 0.6949406222626567
Epoch 5, Loss: 0.6949406220763922
Testing parameters: {'batch_size': 16, 'lr': 0.01, 'momentum': 0.8}
Epoch 1, Loss: 0.6981436071917415
Epoch 2, Loss: 0.6970651404559612
Epoch 3, Loss: 0.6970651407912374
Epoch 4, Loss: 0.6970651409775018
Epoch 5, Loss: 0.6970651403814554
Testing parameters: {'batch_size': 16, 'lr': 0.01, 'momentum': 0.9}
Epoch 1, Loss: 0.7021853063255549
Epoch 2, Loss: 0.7013016481697559
Epoch 3, Loss: 0.7013016485422849
Epoch 4, Loss: 0.7013016484305262
Epoch 5, Loss: 0.7013016491383314
Testing parameters: {'batch_size': 16, 'lr': 0.01, 'momentum': 0.95}
Epoch 1, Loss: 0.7114614603668451
Epoch 2, Loss: 0.710773399695754
Epoch 3, Loss: 0.7107733990624547
Epoch 4, Loss: 0.7107733986526727
Epoch 5, Loss: 0.7107733987271786
Testing parameters: {'batch_size': 16, 'lr': 0.1, 'momentum': 0.8}
Epoch 1, Loss: 0.7011606568098068
Epoch 2, Loss: 0.6982702931761742
Epoch 3, Loss: 0.6982703319191933
Epoch 4, Loss: 0.69827033162117
Epoch 5, Loss: 0.6982703322172165
Testing parameters: {'batch_size': 64, 'lr': 0.1, 'momentum': 0.8}
Epoch 1, Loss: 0.704802389293909
Epoch 2, Loss: 0.7043688933551312
Epoch 3, Loss: 0.7043688948452472
Epoch 4, Loss: 0.7043688946962356
Epoch 5, Loss: 0.7043688967823982
Testing parameters: {'batch_size': 64, 'lr': 0.1, 'momentum': 0.9}
Epoch 1, Loss: 0.7200291982293129
Epoch 2, Loss: 0.7179572413861751
Epoch 3, Loss: 0.717957241088152
Epoch 4, Loss: 0.7179572395980358
Epoch 5, Loss: 0.7179572404921055
Testing parameters: {'batch_size': 64, 'lr': 0.1, 'momentum': 0.95}
Epoch 1, Loss: 0.7495152302086353
Epoch 2, Loss: 0.7483982764184475
Epoch 3, Loss: 0.7483981350064277
Epoch 4, Loss: 0.7483981330692768
Epoch 5, Loss: 0.7483981341123581
Best parameters found by Grid Search: {'batch_size': 16, 'lr': 0.001, 'momentum': 0.8}
Best validation loss: 0.6928423246741295
The result shows that the best hyperparameters for our model are 'batch_size': 16, 'lr': 0.001, and 'momentum': 0.8, with a best validation loss of 0.6928423246741295. Note that every configuration ends up near ln(2) ≈ 0.693, the chance-level binary cross-entropy loss, which suggests the synthetic labels carry little learnable signal; on a real dataset, the differences between hyperparameter combinations would typically be more pronounced.