Understanding Batch Normalization with Keras in Python


   Batch Normalization is a technique that normalizes the activations between the layers of a neural network to improve training speed and, through a mild regularization effect, the accuracy of the model. It is intended to reduce the internal covariate shift in neural networks. Internal covariate shift means that when the first layer changes its parameters based on back-propagation feedback, the second layer must adjust its parameters to the changed output of the first layer, the third layer to the second, and so on. This repeated readjustment destabilizes the learning process of all subsequent layers and slows training, especially in networks with a large number of layers. Batch Normalization is used to overcome this issue. 
   Batch Normalization works well when training on image data, and it is widely used in training Generative Adversarial Network (GAN) models. 
   In this tutorial, we'll learn how to apply batch normalization in deep learning networks with Keras. The tutorial covers:
  1. Normalization
  2. Preparing the data
  3. Building the model
  4. Comparing the training results
  5. Source code listing
We'll start by loading the required packages.

from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D 
from keras.layers import Dense, Flatten, Dropout
from keras.layers import BatchNormalization
from keras.datasets import mnist
from keras.optimizers import RMSprop
import matplotlib.pyplot as plt


Normalization

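Normalization is a method of rescaling input data to a common scale. A common variant, standardization, rescales each feature to zero mean and unit standard deviation. The example below uses the normalize() function of scikit-learn, which by default rescales each sample (row) to unit L2 norm, so every value ends up between -1 and 1. It shows how to normalize the data and what the values look like after normalization.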

import sklearn.preprocessing as prep

data =[[10, 321, -22, 3210, 23, -321]]
norm = prep.normalize(data)
print(norm) 
[[ 0.00308441  0.09900951 -0.0067857   0.99009512  0.00709414 -0.09900951]] 

Here, the data values are scaled into a range between -1 and 1. Bringing values to a common scale improves model training speed, and Batch Normalization applies a similar idea inside the network: it standardizes the output of a layer using the mean and variance computed over each mini-batch, which is where the name Batch Normalization comes from.
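
To make the idea more concrete, here is a minimal NumPy sketch of what a batch normalization layer computes during training. It is only an illustration, not the Keras implementation: the function name batch_norm_forward, the sample batch, and the gamma and beta values are made up for this example, and eps=1e-3 is chosen to match the default epsilon of the Keras layer.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Standardize each feature (column) of x over the batch (rows),
    then apply the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, roughly unit variance
    return gamma * x_hat + beta

# Hypothetical mini-batch: 4 samples, 3 features on very different scales.
batch = np.array([[1.0, 200.0, -3.0],
                  [2.0, 220.0, -1.0],
                  [3.0, 180.0,  0.0],
                  [4.0, 240.0,  4.0]])
gamma = np.ones(3)    # scale, learned during training (initialized to 1)
beta = np.zeros(3)    # shift, learned during training (initialized to 0)

out = batch_norm_forward(batch, gamma, beta)
print(out.mean(axis=0))   # close to 0 for every feature
print(out.std(axis=0))    # close to 1 for every feature
```

With gamma set to 1 and beta set to 0, the layer simply standardizes each feature. During training the network can learn other values for gamma and beta if a different scale or offset works better.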



Preparing the data

For this tutorial, we'll use the 'mnist' dataset. We'll start by loading the dataset and checking the shape of the training set.

(trainX, trainY), (testX, testY) = mnist.load_data()
print(trainX.shape)
(60000, 28, 28) 

To make the training process lighter, we'll use only a part of the dataset.

trainX = trainX[1:8001,]
trainY = trainY[1:8001,]
testX = testX[1:201,]
testY = testY[1:201,]

Next, we'll reshape the image data to add a channel dimension and convert the labels into a categorical (one-hot) type.

trainX = trainX.reshape((trainX.shape[0], 28,28,1))
testX = testX.reshape((testX.shape[0], 28,28,1))
trainY = to_categorical(trainY)
testY = to_categorical(testY)
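
As a rough illustration of these two steps, the sketch below reproduces them with plain NumPy. The one_hot helper is a hypothetical stand-in for to_categorical, and the labels and the all-zero images are placeholder values rather than real MNIST data.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([5, 0, 4])          # hypothetical digit labels
encoded = one_hot(labels, 10)
print(encoded.shape)                  # (3, 10)
print(encoded[0])                     # 1.0 at index 5, zeros elsewhere

# Reshaping a stack of 28x28 images adds a trailing channel axis,
# which is the input layout that Conv2D expects for grayscale images.
images = np.zeros((3, 28, 28), dtype="uint8")   # placeholder pixels
images = images.reshape((images.shape[0], 28, 28, 1))
print(images.shape)                   # (3, 28, 28, 1)
```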


Building the model

In Keras, we can easily implement Batch Normalization by adding the BatchNormalization() layer to the model.

model= Sequential()
model.add(Conv2D(32, (3,3), activation="relu", input_shape=(28,28,1))) 
model.add(BatchNormalization())
...

We'll write a function that builds and trains the model either with or without Batch Normalization. When the bn parameter is set to True, the function adds BatchNormalization layers to the model. After the training, it prints the test accuracy and returns the training history of the model. Then we'll compare the training histories of both variants.

def build_model(trainX, trainY, testX, testY, bn=False):
    # Build, train, and evaluate the CNN; bn=True adds BatchNormalization layers.
    model = Sequential()
    model.add(Conv2D(32, (3,3), activation="relu", input_shape=(28,28,1)))
    model.add(Conv2D(64, (3,3), activation="relu"))
    if bn:
        model.add(BatchNormalization())
    model.add(MaxPooling2D((2,2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation="relu"))
    model.add(Dropout(0.2))
    if bn:
        model.add(BatchNormalization())
    model.add(Dense(10, activation="softmax"))
    model.compile(loss="categorical_crossentropy", optimizer=RMSprop(),
                  metrics=["accuracy"])
    print(model.summary())
    # The test set doubles as validation data so we can plot its accuracy per epoch.
    history = model.fit(trainX, trainY, epochs=30, batch_size=16,
                        validation_data=(testX, testY), verbose=0)
    _, acc = model.evaluate(testX, testY, verbose=0)
    if bn:
        print("Accuracy with BN: ", acc)
    else:
        print("Accuracy without BN: ", acc)
    return history


Comparing the training results

Next, we'll train the model on the 'mnist' data with the above function. First, we call the function without Batch Normalization. It takes a few minutes to train the model on the CPU. After the training is finished, we can check the model summary and its accuracy.

model_hist = build_model(trainX, trainY, testX, testY)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_9 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
dropout_9 (Dropout)          (None, 12, 12, 64)        0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dense_9 (Dense)              (None, 128)               1179776   
_________________________________________________________________
dropout_10 (Dropout)         (None, 128)               0         
_________________________________________________________________
dense_10 (Dense)             (None, 10)                1290      
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
None
Accuracy without BN:  0.91 
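
The parameter counts in the summary can be checked by hand. The short sketch below does this with plain Python arithmetic. The helper names conv2d_params and dense_params are made up for this example; each count is the number of weights plus one bias per filter or unit.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # Each unit has one weight per input plus one bias.
    return (inputs + 1) * units

conv1 = conv2d_params(3, 3, 1, 32)     # 320
conv2 = conv2d_params(3, 3, 32, 64)    # 18496
flat = 12 * 12 * 64                    # 9216 values after pooling and flattening
dense1 = dense_params(flat, 128)       # 1179776
dense2 = dense_params(128, 10)         # 1290

total = conv1 + conv2 + dense1 + dense2
print(total)                           # 1199882
```

Pooling, dropout, and flatten layers have no weights, which is why they show 0 parameters in the summary.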


Next, we'll call the function with Batch Normalization enabled.

model_hist_bn = build_model(trainX, trainY, testX, testY, bn=True)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_13 (Conv2D)           (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 24, 24, 64)        18496     
_________________________________________________________________
batch_normalization_9 (Batch (None, 24, 24, 64)        256       
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
dropout_13 (Dropout)         (None, 12, 12, 64)        0         
_________________________________________________________________
flatten_7 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dense_13 (Dense)             (None, 128)               1179776   
_________________________________________________________________
dropout_14 (Dropout)         (None, 128)               0         
_________________________________________________________________
batch_normalization_10 (Batc (None, 128)               512       
_________________________________________________________________
dense_14 (Dense)             (None, 10)                1290      
=================================================================
Total params: 1,200,650
Trainable params: 1,200,266
Non-trainable params: 384
_________________________________________________________________
None
Accuracy with BN:  0.975 
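
The summary now shows 384 non-trainable parameters. A BatchNormalization layer keeps four values per channel: the learnable scale (gamma) and shift (beta), plus a moving mean and a moving variance that are updated during training but not learned by back-propagation. The sketch below checks the numbers in the summary under that assumption. The helper name batchnorm_params is made up for this example.

```python
def batchnorm_params(channels):
    trainable = 2 * channels       # gamma and beta
    non_trainable = 2 * channels   # moving mean and moving variance
    return trainable, non_trainable

bn_conv = batchnorm_params(64)     # after the second Conv2D layer (64 channels)
bn_dense = batchnorm_params(128)   # after the Dense layer (128 units)

print(sum(bn_conv), sum(bn_dense))            # 256 512
non_trainable = bn_conv[1] + bn_dense[1]
print(non_trainable)                          # 384

# 1199882 is the parameter total of the model without Batch Normalization.
total = 1199882 + sum(bn_conv) + sum(bn_dense)
print(total, total - non_trainable)           # 1200650 1200266
```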


Finally, we'll visualize both training results in a plot.

# Older Keras versions store the metric as 'acc', newer ones as 'accuracy'.
acc_key = 'acc' if 'acc' in model_hist.history else 'accuracy'
f = plt.figure()
f.add_subplot(1,2,1)
plt.title("Train without Batch Normalization")
plt.plot(model_hist.history[acc_key], label='train')
plt.plot(model_hist.history['val_' + acc_key], label="test")
plt.legend()
f.add_subplot(1,2,2)
plt.title("Train with Batch Normalization")
plt.plot(model_hist_bn.history[acc_key], label='train')
plt.plot(model_hist_bn.history['val_' + acc_key], label="test")
plt.legend()
plt.show()
 



   In this tutorial, we've briefly covered Batch Normalization and how to apply it to neural networks in Keras.


Source code listing

from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D 
from keras.layers import Dense, Flatten, Dropout
from keras.layers import BatchNormalization
from keras.datasets import mnist
from keras.optimizers import RMSprop
import matplotlib.pyplot as plt


(trainX, trainY), (testX, testY) = mnist.load_data()
print(trainX.shape)

trainX = trainX[1:8001,]
trainY = trainY[1:8001,]
testX = testX[1:201,]
testY = testY[1:201,]

trainX = trainX.reshape((trainX.shape[0], 28,28,1))
testX = testX.reshape((testX.shape[0], 28,28,1))
trainY = to_categorical(trainY)
testY = to_categorical(testY)

def build_model(trainX, trainY, testX, testY, bn=False):
    # Build, train, and evaluate the CNN; bn=True adds BatchNormalization layers.
    model = Sequential()
    model.add(Conv2D(32, (3,3), activation="relu", input_shape=(28,28,1)))
    model.add(Conv2D(64, (3,3), activation="relu"))
    if bn:
        model.add(BatchNormalization())
    model.add(MaxPooling2D((2,2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation="relu"))
    model.add(Dropout(0.2))
    if bn:
        model.add(BatchNormalization())
    model.add(Dense(10, activation="softmax"))
    model.compile(loss="categorical_crossentropy", optimizer=RMSprop(),
                  metrics=["accuracy"])
    print(model.summary())
    # The test set doubles as validation data so we can plot its accuracy per epoch.
    history = model.fit(trainX, trainY, epochs=30, batch_size=16,
                        validation_data=(testX, testY), verbose=0)
    _, acc = model.evaluate(testX, testY, verbose=0)
    if bn:
        print("Accuracy with BN: ", acc)
    else:
        print("Accuracy without BN: ", acc)
    return history

model_hist = build_model(trainX, trainY, testX, testY)
model_hist_bn = build_model(trainX, trainY, testX, testY,bn=True)


# Older Keras versions store the metric as 'acc', newer ones as 'accuracy'.
acc_key = 'acc' if 'acc' in model_hist.history else 'accuracy'
f = plt.figure()
f.add_subplot(1,2,1)
plt.title("Train without Batch Normalization")
plt.plot(model_hist.history[acc_key], label='train')
plt.plot(model_hist.history['val_' + acc_key], label="test")
plt.legend()
f.add_subplot(1,2,2)
plt.title("Train with Batch Normalization")
plt.plot(model_hist_bn.history[acc_key], label='train')
plt.plot(model_hist_bn.history['val_' + acc_key], label="test")
plt.legend()
plt.show()


Further Reading:

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
