How to Build Variational Autoencoders and Generate Images in R

    In this tutorial, we'll learn how to build a variational autoencoder (VAE) and generate images with it in R. A classical autoencoder simply learns how to encode its input and decode it back, compressing the data through a fixed latent layer in between. With this method, the latent space itself is not learned as a distribution, so we cannot sample meaningful new points from it to generate new data.


    The variational autoencoder, on the other hand, takes a statistical approach: the encoder learns the mean and the log variance of a latent distribution, and the latent vector is sampled from that distribution. Because the mean and variance are updated during training, the latent space becomes well structured, and this helps to improve the generator model. The tutorial covers,

  1. Preparing the data
  2. Defining the encoder
  3. Defining the VAE model
  4. Defining the generator
  5. Training the model
  6. Generating images
  7. Source code listing

    In my previous posts, we learned how to create classical autoencoders with simple dense and convolutional layers in R; you can check them at the links below.

How to Build Simple Autoencoder with Keras in R

Convolutional Autoencoder Example with Keras in R

Let's get started by loading the keras package for R.


library(keras)
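
If the keras package is not installed yet, you can set it up as follows (a one-time step; install_keras() also creates the Python environment that Keras runs on).


install.packages("keras")
library(keras)
install_keras()
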



Preparing the data


    We'll use the MNIST handwritten digit dataset to train the VAE. After loading it, we'll scale the pixel values into the range of [0, 1]. VAE training requires only the input data, so we focus on the x part of the dataset.


c(c(xtrain, ytrain), c(xtest, ytest)) %<-% dataset_mnist()
print(dim(xtrain))

input_size = dim(xtrain)[2]*dim(xtrain)[3]
x_train = xtrain/255
x_test = xtest/255
latent_size = 10

[1] 60000 28 28


Here, input_size comes from the image dimensions (28 * 28 = 784 pixels) and latent_size sets the length of the latent vector; both will be used in the model later. Next, we'll reshape the two-dimensional images into flat vectors of length input_size so they can be fed into the dense layers.


x_train <- array_reshape(x_train, c(nrow(x_train), input_size))
x_test <- array_reshape(x_test, c(nrow(x_test), input_size))
print(dim(x_train))

[1] 60000 784

 

Defining the encoder


    We'll start by defining the encoder. A dense layer maps the input to a hidden representation, from which we extract two vectors: the mean and the log variance of the latent distribution. A latent vector z will later be sampled from these two parameters.


enc_input <- layer_input(shape = c(input_size))
layer_one <- layer_dense(enc_input, units = 256, activation = "relu")
z_mean <- layer_dense(layer_one, latent_size)
z_log_var <- layer_dense(layer_one, latent_size)

encoder <- keras_model(enc_input, z_mean)
summary(encoder)


Model: "model_1" _________________________________________________________________________________ Layer (type) Output Shape Param # ================================================================================= input_2 (InputLayer) [(None, 784)] 0 _________________________________________________________________________________ dense_3 (Dense) (None, 256) 200960 _________________________________________________________________________________ dense_4 (Dense) (None, 10) 2570 ================================================================================= Total params: 203,530 Trainable params: 203,530 Non-trainable params: 0 _________________________________________________________________________________


The sampling function implements the reparameterization trick: it draws standard normal noise epsilon and combines it with the learned mean and log variance to produce a sampled latent vector z = z_mean + exp(z_log_var / 2) * epsilon. The decoder uses this sampled vector to generate images.


sampling <- function(arg){
  z_mean <- arg[, 1:(latent_size)]
  z_log_var <- arg[, (latent_size + 1):(2 * latent_size)]
  # draw standard normal noise with the same shape as z_mean
  epsilon <- k_random_normal(shape = k_shape(z_mean), mean = 0)
  z_mean + k_exp(z_log_var/2)*epsilon
}

z <- layer_concatenate(list(z_mean, z_log_var)) %>%
  layer_lambda(sampling)
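
To see what the reparameterization trick does numerically, here is a minimal plain-R sketch with made-up values (not part of the model): z is standard normal noise shifted by the mean and scaled by the standard deviation, which keeps the sampling step differentiable with respect to z_mean and z_log_var.


# toy illustration of z = mean + exp(log_var / 2) * epsilon
z_mean_ex <- c(0.5, -1.0)     # hypothetical learned means
z_log_var_ex <- c(0.1, 0.4)   # hypothetical learned log variances
epsilon_ex <- rnorm(2)        # standard normal noise
print(z_mean_ex + exp(z_log_var_ex / 2) * epsilon_ex)
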



Defining the VAE model


    Next, we'll define the decoder layers that map the sampled z vector back to an image. The full VAE model connects the encoder input to the decoder's final output.


# the decoder layers are defined separately so they can be reused by the generator
decoder_layer <- layer_dense(units = 256, activation = "relu")
decoder_mean <- layer_dense(units = input_size, activation = "sigmoid")
h_decoded <- decoder_layer(z)
x_decoded_mean <- decoder_mean(h_decoded)

vae <- keras_model(enc_input, x_decoded_mean)
summary(vae)

Model: "model_2" _________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================= input_2 (InputLayer) [(None, 784)] 0 _________________________________________________________________________________ dense_3 (Dense) (None, 256) 200960 input_2[0][0] _________________________________________________________________________________ dense_4 (Dense) (None, 10) 2570 dense_3[0][0] _________________________________________________________________________________ dense_5 (Dense) (None, 10) 2570 dense_3[0][0] _________________________________________________________________________________ concatenate (Concatenate) (None, 20) 0 dense_4[0][0] dense_5[0][0] _________________________________________________________________________________ lambda (Lambda) (None, 10) 0 concatenate[0][0] _________________________________________________________________________________ dense_6 (Dense) (None, 256) 2816 lambda[0][0] _________________________________________________________________________________ dense_7 (Dense) (None, 784) 201488 dense_6[0][0] ================================================================================= Total params: 410,404 Trainable params: 410,404 Non-trainable params: 0 _________________________________________________________________________________


Next, we'll define the loss function. The VAE loss is the sum of two terms: the reconstruction loss (binary cross-entropy between the input and the decoded output, scaled by the number of pixels) and the KL divergence that pushes the learned latent distribution toward a standard normal.


vae_loss <- function(input, x_decoded_mean){
  # reconstruction loss, scaled by the number of pixels
  xent_loss = (input_size/1.0)*loss_binary_crossentropy(input, x_decoded_mean)
  # KL divergence between the learned latent distribution and N(0, 1)
  kl_loss = -0.5*k_mean(1 + z_log_var - k_square(z_mean) - k_exp(z_log_var), axis = -1)
  xent_loss + kl_loss
}
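
As a sanity check on the KL term, its closed form for a single latent unit is -0.5 * (1 + log_var - mu^2 - exp(log_var)), which is zero exactly when mu = 0 and log_var = 0, i.e. when the latent distribution already matches the standard normal prior. A quick plain-R check with toy values:


mu <- 0; log_var <- 0
print(-0.5 * (1 + log_var - mu^2 - exp(log_var)))   # 0: no penalty
mu <- 0.3; log_var <- 0.2
print(-0.5 * (1 + log_var - mu^2 - exp(log_var)))   # ~0.056: positive penalty
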


Finally, we'll compile the model with the RMSprop optimizer and the custom loss function. Note that vae_loss can reference z_mean and z_log_var because they are captured from the enclosing environment.


vae %>% compile(optimizer = "rmsprop", loss = vae_loss)



Defining the generator


    The generator model reuses the trained decoder layers so that images can be generated directly from latent vectors.


dec_input <- layer_input(shape = latent_size)
h_decoded_2 <- decoder_layer(dec_input)        # reuse the trained decoder layers
x_decoded_mean_2 <- decoder_mean(h_decoded_2)
generator <- keras_model(dec_input, x_decoded_mean_2)
summary(generator)

Model: "model_3" _________________________________________________________________________________ Layer (type) Output Shape Param # ================================================================================= input_3 (InputLayer) [(None, 10)] 0 _________________________________________________________________________________ dense_6 (Dense) (None, 256) 2816 _________________________________________________________________________________ dense_7 (Dense) (None, 784) 201488 ================================================================================= Total params: 204,304 Trainable params: 204,304 Non-trainable params: 0 _________________________________________________________________________________



Training the model


    Finally, we'll train the VAE model on the training data.


vae %>% fit(
  x_train, x_train, 
  shuffle = TRUE, 
  epochs = 20, 
  batch_size = 64, 
  validation_data = list(x_test, x_test)
)


If you get an error like the one below at this stage, you may need to run the following command to fix it.


Error in py_call_impl(callable, dots$args, dots$keywords) : 
  RuntimeError: in user code:

    C:\Users\abcd\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\keras\engine\training.py:571 train_function  *
        outputs = self.distribute_strategy.run(
.....


tensorflow::tf$compat$v1$disable_eager_execution()
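
Note that this command changes the execution mode globally, so you may need to restart the R session and run it before building the model. Once training succeeds, you can optionally save the trained weights for later reuse (a minimal sketch; the file name is arbitrary).


save_model_weights_hdf5(vae, "vae_weights.h5")
# later: rebuild the same architecture, then restore the weights
# load_model_weights_hdf5(vae, "vae_weights.h5")
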



Generating images


    Now, we can encode test images and reconstruct them with the models above. Here, I use the first 10 images of the x_test data.


n = 10
test = x_test[1:n, ]
x_test_encoded <- predict(encoder, test)

decoded_imgs = generator %>% predict(x_test_encoded)
pred_images = array_reshape(decoded_imgs, dim = c(dim(decoded_imgs)[1], 28, 28))
orig_images = array_reshape(test, dim = c(dim(test)[1], 28, 28))


We'll check the generated images visually. The left column shows the original images and the right column shows the generated ones.


op = par(mfrow = c(n, 2), mar = c(1, 0, 0, 0))
for (i in 1:n)
{
  plot(as.raster(orig_images[i,,]))
  plot(as.raster(pred_images[i,,]))
}
par(op)
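
Besides reconstructing test images, the generator can produce entirely new digits by decoding random latent vectors drawn from the standard normal prior, which is the distribution the KL term pushes the encoder toward. A minimal sketch:


random_latent <- matrix(rnorm(n * latent_size), nrow = n, ncol = latent_size)
new_imgs <- generator %>% predict(random_latent)
new_imgs <- array_reshape(new_imgs, dim = c(n, 28, 28))
plot(as.raster(new_imgs[1,,]))   # view the first generated digit
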




    In this tutorial, we've briefly learned how to build a VAE model and generate images with it in R. The full source code is listed below.



Source code listing


library(keras)

# if fit() fails with a runtime error, disable eager execution first:
# tensorflow::tf$compat$v1$disable_eager_execution()

c(c(xtrain, ytrain), c(xtest, ytest)) %<-% dataset_mnist()
print(dim(xtrain))

input_size = dim(xtrain)[2]*dim(xtrain)[3]
x_train = xtrain/255
x_test = xtest/255
latent_size = 10

x_train <- array_reshape(x_train, c(nrow(x_train), input_size))
x_test <- array_reshape(x_test, c(nrow(x_test), input_size))
print(dim(x_train))

enc_input <- layer_input(shape = c(input_size))
layer_one <- layer_dense(enc_input, units = 256, activation = "relu")
z_mean <- layer_dense(layer_one, latent_size)
z_log_var <- layer_dense(layer_one, latent_size)

encoder <- keras_model(enc_input, z_mean)
summary(encoder)

sampling <- function(arg){
  z_mean <- arg[, 1:(latent_size)]
  z_log_var <- arg[, (latent_size + 1):(2 * latent_size)]
  epsilon <- k_random_normal(shape = k_shape(z_mean), mean = 0)
  z_mean + k_exp(z_log_var/2)*epsilon
}

z <- layer_concatenate(list(z_mean, z_log_var)) %>%
  layer_lambda(sampling)

decoder_layer <- layer_dense(units = 256, activation = "relu")
decoder_mean <- layer_dense(units = input_size, activation = "sigmoid")
h_decoded <- decoder_layer(z)
x_decoded_mean <- decoder_mean(h_decoded)

vae <- keras_model(enc_input, x_decoded_mean)
summary(vae)

vae_loss <- function(input, x_decoded_mean){
  xent_loss = (input_size/1.0)*loss_binary_crossentropy(input, x_decoded_mean)
  kl_loss = -0.5*k_mean(1 + z_log_var - k_square(z_mean) - k_exp(z_log_var), axis = -1)
  xent_loss + kl_loss
}

vae %>% compile(optimizer = "rmsprop", loss = vae_loss)

dec_input <- layer_input(shape = latent_size)
h_decoded_2 <- decoder_layer(dec_input)
x_decoded_mean_2 <- decoder_mean(h_decoded_2)
generator <- keras_model(dec_input, x_decoded_mean_2)
summary(generator)

vae %>% fit(
  x_train, x_train,
  shuffle = TRUE,
  epochs = 20,
  batch_size = 64,
  validation_data = list(x_test, x_test)
)

n = 10
test = x_test[1:n, ]
x_test_encoded <- predict(encoder, test)
decoded_imgs = generator %>% predict(x_test_encoded)
pred_images = array_reshape(decoded_imgs, dim = c(dim(decoded_imgs)[1], 28, 28))
orig_images = array_reshape(test, dim = c(dim(test)[1], 28, 28))

op = par(mfrow = c(n, 2), mar = c(1, 0, 0, 0))
for (i in 1:n)
{
  plot(as.raster(orig_images[i,,]))
  plot(as.raster(pred_images[i,,]))
}
par(op)



References:

  1. R interface to Keras
  2. How to Build Simple Autoencoder with Keras in R
