Understanding Activation Functions with Python

   The activation function is one of the core building blocks of a neural network. Based on the input data, which comes from one or more outputs of the neurons in the previous layer, the activation function decides whether or not to activate the neuron. The decision is made after summing the weighted inputs and adding the bias value. This process introduces nonlinearity between the input and output values in a network.
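
As a rough illustration, a single neuron's output can be sketched as follows (the input values, weights, and the choice of sigmoid here are purely hypothetical):

import numpy as np

x = np.array([0.5, -1.2, 3.0])   # outputs coming from the previous layer
w = np.array([0.8, 0.1, -0.4])   # weights on those connections
b = 0.2                          # bias value

z = np.dot(w, x) + b             # weighted sum of the inputs plus the bias
y = 1 / (1 + np.exp(-z))         # activation function (sigmoid) applied to z
print(y)
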
   In this tutorial, we'll learn about some of the most commonly used activation functions in neural networks (sigmoid, tanh, ReLU, and Leaky ReLU) and how to use them with Keras in Python. The tutorial covers:
  1. Sigmoid function
  2. Tanh function
  3. ReLU (Rectified Linear Unit) function
  4. Leaky ReLU function
We'll start by loading the following libraries.

import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Activation, Dense, LeakyReLU 

To check the behavior of each activation function, we'll use a generated sequence of x values.

x = np.arange(-5, 5, 0.1)
print(x[1:10])
[-4.9 -4.8 -4.7 -4.6 -4.5 -4.4 -4.3 -4.2 -4.1]


Sigmoid function

The sigmoid function transforms an input value into an output in the range from 0 to 1. It is also called the logistic function, and its curve is S-shaped. It is typically used for the final decision in the binary classification layer of a network.
Let's define the function in Python.


def sigmoid(x):
  return 1 / (1 + np.exp(-x))
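
As a quick check, the function maps 0 to exactly 0.5 and pushes large positive and negative inputs toward 1 and 0, respectively:

print(sigmoid(0))     # 0.5
print(sigmoid(10))    # close to 1
print(sigmoid(-10))   # close to 0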

Next, we'll draw the function in a plot.

y = [sigmoid(i) for i in x]
 
plt.axvline(x=0, color="red", linewidth=.5)
plt.axhline(y=0.5, color="red", linewidth=.5)
plt.plot(x, y)
plt.show()


Sigmoid can be implemented in a Keras model as shown below. There are two ways to add the activation to the model; you can use either of them.

model = Sequential()
...
# option 1: pass the activation directly to the layer
model.add(Dense(1, activation="sigmoid"))

# option 2: add a separate Activation layer after the Dense layer
model.add(Activation('sigmoid'))
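
For context, a minimal sketch of a binary classifier ending in a sigmoid output might look like the following; the hidden layer size, optimizer, and loss choices here are illustrative assumptions, not part of the original example.

model = Sequential()
model.add(Dense(8, activation="relu"))      # hidden layer (illustrative size)
model.add(Dense(1, activation="sigmoid"))   # single output squashed into the 0..1 range
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])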


Tanh function

The tanh (hyperbolic tangent) function scales input values to the range from -1 to 1, and its output is centered around 0. It is similar to the sigmoid, and its curve is also S-shaped. We'll define the function in Python.

def tanh(x):
 return np.tanh(x)
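
Tanh is closely related to the sigmoid: tanh(x) = 2*sigmoid(2x) - 1. Assuming the sigmoid function defined above, we can verify this numerically:

print(np.allclose(tanh(x), 2 * sigmoid(2 * x) - 1))   # True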

And draw the function in a plot.

y = [tanh(i) for i in x]
 
plt.axvline(x=0, color="red", linewidth=.5)
plt.axhline(y=0, color="red", linewidth=.5)
plt.plot(x, y)
plt.show()

Tanh can be implemented in a Keras model as shown below.

model = Sequential()
...
model.add(Dense(10, activation="tanh")) 
 
model.add(Activation('tanh'))


ReLU function

ReLU stands for Rectified Linear Unit, and it is one of the most commonly used activation functions in neural networks today. The function outputs 0 for negative inputs and passes positive inputs through unchanged. ReLU is cheaper and faster to compute than the sigmoid and tanh functions.
We'll define the function in Python.

def relu(x):
 return np.maximum(0, x)
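
A quick check on a few sample values shows negative inputs being zeroed out while positive inputs pass through unchanged:

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))   # [0.  0.  0.  0.5 2. ]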

And draw the function in a plot.

y = [relu(i) for i in x]
plt.axvline(x=0, color="red", linewidth=.5)
plt.axhline(y=0, color="red", linewidth=.5)
plt.plot(x, y)
plt.show()

ReLU can be implemented in a Keras model as shown below.

model = Sequential()
... 
model.add(Dense(10, activation="relu")) 
 
model.add(Activation('relu')) 


Leaky ReLU function

The ReLU function converts all negative inputs to 0, which can cause neurons to "die" and stop learning. Leaky ReLU prevents this by applying a small slope to negative inputs, so small negative values are allowed through, which can improve the training of the network. We'll define the function in Python. Here, alpha is the factor used to multiply negative input values.

def leakyrelu(x, alpha=0.01):
  # keep positive values as they are; scale negative values by alpha
  return np.where(x > 0, x, x * alpha)
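
A quick check shows the effect of alpha on negative inputs:

print(leakyrelu(5.0, alpha=0.1))    # positive input passes through: 5.0
print(leakyrelu(-5.0, alpha=0.1))   # negative input is scaled by alpha: -0.5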

And draw the function in a plot.

y = [leakyrelu(i, alpha=0.1) for i in x]
plt.axvline(x=0, color="red", linewidth=.5)
plt.axhline(y=0, color="red", linewidth=.5)
plt.plot(x, y)
plt.show()

The Leaky ReLU can be implemented in a Keras model as shown below.

model = Sequential()
... 
model.add(LeakyReLU(0.2))
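
Since LeakyReLU is added as its own layer, a typical pattern (sketched here as an assumption, not taken from the original example) is to place it after a Dense layer that has no activation of its own. Depending on the Keras version, the slope argument may be named alpha or negative_slope, so it is passed positionally here.

model = Sequential()
model.add(Dense(10))         # linear layer with no built-in activation
model.add(LeakyReLU(0.2))    # leaky ReLU with a slope of 0.2 for negative values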


Let's run a small simulation with the functions above. We'll generate sample inputs, weights, and bias values and check the output of each activation function.

input = np.random.choice(range(-100, 100), 5)
weight = np.random.randn(5)/10
bias = np.random.randn(5)
 
actf = [sigmoid, tanh, relu, leakyrelu]
for f in actf:
 print(f.__name__, " function:")
 for i,w,b in zip(input,weight,bias):
  print("%.1f" % i, "  =>  ", f((i*w) + b))
 
sigmoid  function:
-11.0   =>   0.8182085255905602
31.0   =>   0.1681421926415923
44.0   =>   0.8427583522921182
-58.0   =>   0.60535247393778
-34.0   =>   0.6713054653476185
tanh  function:
-11.0   =>   0.905914553687992
31.0   =>   -0.9214954932658087
44.0   =>   0.9327182035841355
-58.0   =>   0.40349605295020774
-34.0   =>   0.6132385416839997
relu  function:
-11.0   =>   1.504256940169569
31.0   =>   0.0
44.0   =>   1.6788964853893629
-58.0   =>   0.4278178624878862
-34.0   =>   0.7140954189784394
leakyrelu  function:
-11.0   =>   1.504256940169569
31.0   =>   -0.015988515154050215
44.0   =>   1.6788964853893629
-58.0   =>   0.4278178624878862
-34.0   =>   0.7140954189784394 

From the results above, we can see how each function maps the same weighted inputs into its own output range.

When there are multiple inputs, the weighted inputs are summed (a dot product of the inputs and weights) and then the bias is added before the activation is applied.

for f in actf:
 print(f.__name__, f(np.dot(input, weight)+bias[1]))
 
sigmoid 0.564087924191638
tanh 0.2522081568246547
relu 0.2577695712819361
leakyrelu 0.2577695712819361 

In this tutorial, we've briefly covered common activation functions and their implementation with Keras.


Source code listing

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def tanh(x):
 return np.tanh(x)

def relu(x):
 return np.maximum(0, x)

def leakyrelu(x, alpha=0.01):
  # keep positive values as they are; scale negative values by alpha
  return np.where(x > 0, x, x * alpha)

x = np.arange(-5, 5, 0.1)
print(x[1:10])
 
y = [sigmoid(i) for i in x]
plt.axvline(x=0, color="red", linewidth=.5)
plt.axhline(y=0.5, color="red", linewidth=.5)
plt.plot(x, y)
plt.show()

y= [tanh(i) for i in x]
plt.axvline(x=0, color="red", linewidth=.5)
plt.axhline(y=0, color="red", linewidth=.5)
plt.plot(x, y)
plt.show()

y= [relu(i) for i in x]
plt.axvline(x=0, color="red", linewidth=.5)
plt.axhline(y=0, color="red", linewidth=.5)
plt.plot(x, y)
plt.show()

y= [leakyrelu(i, alpha=0.1) for i in x]
plt.axvline(x=0, color="red", linewidth=.5)
plt.axhline(y=0, color="red", linewidth=.5)
plt.plot(x, y)
plt.show()


weight = np.random.randn(5)/10
bias = np.random.randn(5)
input = np.random.choice(range(-100, 100), 5)
 
actf = [sigmoid, tanh, relu, leakyrelu]
for f in actf:
 print(f.__name__, " function:")
 for i,w,b in zip(input,weight,bias):
  print("%.1f" % i, "  =>  ", f((i*w) + b))  

for f in actf:
 print(f.__name__, f(np.dot(input, weight)+bias[1]))


