One Hot Encoding Example in Python

   One hot encoding is a common technique for preparing class labels in classification problems, particularly with neural network models. To train such a model, each label is represented as a vector of 0s and 1s in which a single 1 marks the class; this representation is called one hot encoding.
  In this post, we'll learn how to create a one hot encoded array in Python. The post covers:
  1. One hot encoding with sklearn
  2. One hot encoding with Keras
  3. Iris dataset one hot encoding example
  4. Source code listing
We'll start by loading the required libraries.

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from keras.utils import to_categorical
from sklearn import datasets


One hot encoding with sklearn

   To represent labels as a one hot encoded map, we first need to create an integer vector that assigns a unique integer value to each label class, such as 'cat':0, 'dog':1, 'mouse':2, and so on. Let's see an example.

labels=['dog','cat','cat','mouse','dog','dog']
label_encoder=LabelEncoder()
label_ids=label_encoder.fit_transform(labels)
 
print(labels)
['dog', 'cat', 'cat', 'mouse', 'dog', 'dog'] 
print(label_ids)
[1 0 0 2 1 1] 
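
To check which integer was assigned to which label, the fitted encoder can be inspected. The sketch below assumes the same labels list as above; the names mapping and restored are introduced here for illustration only.

```python
from sklearn.preprocessing import LabelEncoder

labels = ['dog', 'cat', 'cat', 'mouse', 'dog', 'dog']
label_encoder = LabelEncoder()
label_ids = label_encoder.fit_transform(labels)

# classes_ holds the unique labels in sorted order; the index of each
# entry is the integer id assigned to that label.
mapping = {str(name): idx for idx, name in enumerate(label_encoder.classes_)}
print(mapping)

# inverse_transform converts integer ids back to the original labels.
restored = [str(name) for name in label_encoder.inverse_transform(label_ids)]
print(restored)
```

Because the classes are sorted, 'cat' receives 0 even though 'dog' appears first in the list.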


Then we can create a one hot encoded matrix that identifies each label with the value 1. The columns of the matrix correspond to the unique label names in alphabetical order, {cat, dog, mouse}. A label is marked by setting a 1 in the column of its class and 0 everywhere else.

     { (0, 0, 1),
      (0, 1, 0),
      (1, 0, 0) }

Here, (0, 0, 1) represents 'mouse', (0, 1, 0) represents 'dog', and (1, 0, 0) represents 'cat'. We can create the matrix map as shown below.

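# scikit-learn 1.2 renamed the sparse parameter to sparse_output;
# on older releases use OneHotEncoder(sparse=False) instead.
onehot_encoder=OneHotEncoder(sparse_output=False)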
reshaped=label_ids.reshape(len(label_ids), 1)
onehot=onehot_encoder.fit_transform(reshaped)

print(onehot)
[[0. 1. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 1. 0.]] 
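
OneHotEncoder can also be fitted on the string labels directly, which skips the LabelEncoder step. This is a minimal sketch rather than the approach used above. It calls toarray() on the default sparse result so that it does not depend on the dense-output parameter, whose name differs between scikit-learn versions. The names labels_2d and decoded are illustrative.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

labels = ['dog', 'cat', 'cat', 'mouse', 'dog', 'dog']

# OneHotEncoder expects a 2-D input: one row per sample, one column per feature.
labels_2d = np.array(labels).reshape(-1, 1)

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense.
onehot = encoder.fit_transform(labels_2d).toarray()
print(encoder.categories_[0])   # column order of the one hot matrix
print(onehot)

# inverse_transform maps each one hot row back to its label.
decoded = [str(name) for name in encoder.inverse_transform(onehot).ravel()]
print(decoded)
```

The columns follow the sorted category order stored in categories_, so the matrix should match the one produced from label_ids.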


One hot encoding with Keras

   We can also create a one hot encoded map with the to_categorical() function of Keras. Here, we'll use the label_ids vector created earlier.

print(label_ids)
[1 0 0 2 1 1]
 
to_cat=to_categorical(label_ids)
print(to_cat)
[[0. 1. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 1. 0.]] 
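
The same matrix can be reproduced with plain NumPy, which helps to show what the conversion does: row i of an identity matrix is the one hot vector for class i. This is a sketch, not the Keras implementation, and it assumes the class ids run from 0 to the largest id, so num_classes is taken as the largest id plus one.

```python
import numpy as np

label_ids = np.array([1, 0, 0, 2, 1, 1])

# Assumption: class ids are 0..max, so the number of classes is max id + 1.
num_classes = int(label_ids.max()) + 1

# Indexing the identity matrix with the ids selects one row per label.
onehot = np.eye(num_classes)[label_ids]
print(onehot)

# argmax along each row recovers the original integer ids.
recovered = onehot.argmax(axis=1)
print(recovered)
```

If some classes may be missing from the data, the number of classes should be set explicitly rather than inferred; to_categorical() accepts a num_classes argument for the same purpose.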


Iris dataset one hot encoding example

   Next, we'll create a one hot encoded map for the category values of the Iris dataset. As you may know, the Iris data contains three species: setosa, versicolor, and virginica. They are encoded as 0, 1, and 2 in the dataset, so we can reshape the target vector and transform it with OneHotEncoder().

iris= datasets.load_iris()
X = iris.data
Y = iris.target

# sparse_output replaces the older sparse parameter (scikit-learn >= 1.2)
onehot_encoder=OneHotEncoder(sparse_output=False)
reshaped=Y.reshape(len(Y), 1)
y_onehot=onehot_encoder.fit_transform(reshaped)
 
print(Y.shape)
(150,)
print(y_onehot.shape)
(150, 3) 

print(Y[0:10])
[0 0 0 0 0 0 0 0 0 0]
 
print(y_onehot[0:10])
[[1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]] 
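
A one hot matrix can be mapped back to species names by taking the position of the 1 in each row. The sketch below builds the matrix with NumPy instead of OneHotEncoder to stay independent of the scikit-learn version; class_ids and species are illustrative names.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
Y = iris.target

# Build the one hot matrix: one column per species, one row per sample.
y_onehot = np.eye(len(iris.target_names))[Y]

# The column index of the 1 in each row is the class id.
class_ids = y_onehot.argmax(axis=1)
species = iris.target_names[class_ids]

print(species[:3])
print(np.unique(species, return_counts=True))
```

The same argmax step is commonly applied to the output of a classifier to turn per-class scores into a predicted label.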



   In this post, we've briefly learned how to create a one hot encoded map for labels in classification data. The full source code is listed below.


Source code listing

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from keras.utils import to_categorical
from sklearn import datasets

labels=['dog','cat','cat','mouse','dog','dog']
label_encoder=LabelEncoder()
label_ids=label_encoder.fit_transform(labels)
print(labels)
print(label_ids)

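# scikit-learn 1.2 renamed the sparse parameter to sparse_output;
# on older releases use OneHotEncoder(sparse=False) instead.
onehot_encoder=OneHotEncoder(sparse_output=False)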
reshaped=label_ids.reshape(len(label_ids), 1)
onehot=onehot_encoder.fit_transform(reshaped)
print(onehot)

to_cat=to_categorical(label_ids)
print(to_cat)

iris= datasets.load_iris()
X = iris.data
Y = iris.target

onehot_encoder=OneHotEncoder(sparse_output=False)
reshaped=Y.reshape(len(Y), 1)
y_onehot=onehot_encoder.fit_transform(reshaped)
print(Y.shape)
print(y_onehot.shape)

print(Y[0:10])
print(y_onehot[0:10])

