How to Build Variational Autoencoders and Generate Images in R

    In this tutorial, we'll learn how to build a variational autoencoder (VAE) and generate images with it in R. A classical autoencoder simply learns how to encode its input and decode it back, compressing the data through a fixed latent layer in between. With this method, the latent space itself is not learned as a distribution, so we cannot sample meaningful new points from it to generate new data.


    The variational autoencoder, on the other hand, takes a statistical approach: the encoder learns the mean and the log variance of a latent distribution, and the latent vector is sampled from that distribution. Because the mean and variance are updated during training, the latent space becomes well structured, and this helps to improve the generator model. The tutorial covers,

  1. Preparing the data
  2. Defining the encoder
  3. Defining the VAE model
  4. Defining the generator
  5. Training the model
  6. Generating images
  7. Source code listing

    In my previous posts, we learned how to create classical autoencoders with simple dense and convolutional layers in R; you can check them at the links below.

How to Build Simple Autoencoder with Keras in R

Convolutional Autoencoder Example with Keras in R

Let's get started by loading the keras package for R.


library(keras)
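
If the keras package is not installed yet, you can set it up as follows (a one-time step; install_keras() also creates the Python environment that Keras runs on).


install.packages("keras")
library(keras)
install_keras()
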



Preparing the data


    We'll use the MNIST handwritten digit dataset to train the VAE. After loading it, we'll scale the pixel values into the range of [0, 1]. VAE training requires only the input data, so we focus on the x part of the dataset.


c(c(xtrain, ytrain), c(xtest, ytest)) %<-% dataset_mnist()
print(dim(xtrain))

input_size = dim(xtrain)[2]*dim(xtrain)[3]
x_train = xtrain/255
x_test = xtest/255
latent_size = 10

[1] 60000 28 28


Here, input_size comes from the image dimensions (28 * 28 = 784 pixels) and latent_size sets the length of the latent vector; both will be used in the model later. Next, we'll reshape the two-dimensional images into flat vectors of length input_size so they can be fed into the dense layers.


x_train <- array_reshape(x_train, c(nrow(x_train), input_size))
x_test <- array_reshape(x_test, c(nrow(x_test), input_size))
print(dim(x_train))

[1] 60000 784

 

Defining the encoder


    We'll start by defining the encoder. A dense layer maps the input to a hidden representation, from which we extract two vectors: the mean and the log variance of the latent distribution. A latent vector z will later be sampled from these two parameters.


enc_input <- layer_input(shape = c(input_size))
layer_one <- layer_dense(enc_input, units = 256, activation = "relu")
z_mean <- layer_dense(layer_one, latent_size)
z_log_var <- layer_dense(layer_one, latent_size)

encoder <- keras_model(enc_input, z_mean)
summary(encoder)


Model: "model_1" _________________________________________________________________________________ Layer (type) Output Shape Param # ================================================================================= input_2 (InputLayer) [(None, 784)] 0 _________________________________________________________________________________ dense_3 (Dense) (None, 256) 200960 _________________________________________________________________________________ dense_4 (Dense) (None, 10) 2570 ================================================================================= Total params: 203,530 Trainable params: 203,530 Non-trainable params: 0 _________________________________________________________________________________


The sampling function implements the reparameterization trick: it draws standard normal noise epsilon and combines it with the learned mean and log variance to produce a sampled latent vector z = z_mean + exp(z_log_var / 2) * epsilon. The decoder uses this sampled vector to generate images.


sampling <- function(arg){
  z_mean <- arg[, 1:(latent_size)]
  z_log_var <- arg[, (latent_size + 1):(2 * latent_size)]
  # draw standard normal noise with the same shape as z_mean
  epsilon <- k_random_normal(shape = k_shape(z_mean), mean = 0)
  z_mean + k_exp(z_log_var/2)*epsilon
}

z <- layer_concatenate(list(z_mean, z_log_var)) %>%
  layer_lambda(sampling)
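
To see what the reparameterization trick does numerically, here is a minimal plain-R sketch with made-up values (not part of the model): z is standard normal noise shifted by the mean and scaled by the standard deviation, which keeps the sampling step differentiable with respect to z_mean and z_log_var.


# toy illustration of z = mean + exp(log_var / 2) * epsilon
z_mean_ex <- c(0.5, -1.0)     # hypothetical learned means
z_log_var_ex <- c(0.1, 0.4)   # hypothetical learned log variances
epsilon_ex <- rnorm(2)        # standard normal noise
print(z_mean_ex + exp(z_log_var_ex / 2) * epsilon_ex)
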



Defining the VAE model


    Next, we'll define the decoder layers that map the sampled z vector back to an image. The full VAE model connects the encoder input to the decoder's final output.


# the decoder layers are defined separately so they can be reused by the generator
decoder_layer <- layer_dense(units = 256, activation = "relu")
decoder_mean <- layer_dense(units = input_size, activation = "sigmoid")
h_decoded <- decoder_layer(z)
x_decoded_mean <- decoder_mean(h_decoded)

vae <- keras_model(enc_input, x_decoded_mean)
summary(vae)

Model: "model_2" _________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================= input_2 (InputLayer) [(None, 784)] 0 _________________________________________________________________________________ dense_3 (Dense) (None, 256) 200960 input_2[0][0] _________________________________________________________________________________ dense_4 (Dense) (None, 10) 2570 dense_3[0][0] _________________________________________________________________________________ dense_5 (Dense) (None, 10) 2570 dense_3[0][0] _________________________________________________________________________________ concatenate (Concatenate) (None, 20) 0 dense_4[0][0] dense_5[0][0] _________________________________________________________________________________ lambda (Lambda) (None, 10) 0 concatenate[0][0] _________________________________________________________________________________ dense_6 (Dense) (None, 256) 2816 lambda[0][0] _________________________________________________________________________________ dense_7 (Dense) (None, 784) 201488 dense_6[0][0] ================================================================================= Total params: 410,404 Trainable params: 410,404 Non-trainable params: 0 _________________________________________________________________________________


Next, we'll define the loss function. The VAE loss is the sum of two terms: the reconstruction loss (binary cross-entropy between the input and the decoded output, scaled by the number of pixels) and the KL divergence that pushes the learned latent distribution toward a standard normal.


vae_loss <- function(input, x_decoded_mean){
  # reconstruction loss, scaled by the number of pixels
  xent_loss = (input_size/1.0)*loss_binary_crossentropy(input, x_decoded_mean)
  # KL divergence between the learned latent distribution and N(0, 1)
  kl_loss = -0.5*k_mean(1 + z_log_var - k_square(z_mean) - k_exp(z_log_var), axis = -1)
  xent_loss + kl_loss
}
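
As a sanity check on the KL term, its closed form for a single latent unit is -0.5 * (1 + log_var - mu^2 - exp(log_var)), which is zero exactly when mu = 0 and log_var = 0, i.e. when the latent distribution already matches the standard normal prior. A quick plain-R check with toy values:


mu <- 0; log_var <- 0
print(-0.5 * (1 + log_var - mu^2 - exp(log_var)))   # 0: no penalty
mu <- 0.3; log_var <- 0.2
print(-0.5 * (1 + log_var - mu^2 - exp(log_var)))   # ~0.056: positive penalty
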


Finally, we'll compile the model with the RMSprop optimizer and the custom loss function. Note that vae_loss can reference z_mean and z_log_var because they are captured from the enclosing environment.


vae %>% compile(optimizer = "rmsprop", loss = vae_loss)



Defining the generator


    The generator model reuses the trained decoder layers so that images can be generated directly from latent vectors.


dec_input <- layer_input(shape = latent_size)
h_decoded_2 <- decoder_layer(dec_input)        # reuse the trained decoder layers
x_decoded_mean_2 <- decoder_mean(h_decoded_2)
generator <- keras_model(dec_input, x_decoded_mean_2)
summary(generator)

Model: "model_3" _________________________________________________________________________________ Layer (type) Output Shape Param # ================================================================================= input_3 (InputLayer) [(None, 10)] 0 _________________________________________________________________________________ dense_6 (Dense) (None, 256) 2816 _________________________________________________________________________________ dense_7 (Dense) (None, 784) 201488 ================================================================================= Total params: 204,304 Trainable params: 204,304 Non-trainable params: 0 _________________________________________________________________________________



Training the model


    Finally, we'll train the VAE model on the training data.


vae %>% fit(
  x_train, x_train, 
  shuffle = TRUE, 
  epochs = 20, 
  batch_size = 64, 
  validation_data = list(x_test, x_test)
)


If you get an error like the one below at this stage, you may need to run the following command to fix it.


Error in py_call_impl(callable, dots$args, dots$keywords) : 
  RuntimeError: in user code:

    C:\Users\abcd\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\keras\engine\training.py:571 train_function  *
        outputs = self.distribute_strategy.run(
.....


tensorflow::tf$compat$v1$disable_eager_execution()
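
Note that this command changes the execution mode globally, so you may need to restart the R session and run it before building the model. Once training succeeds, you can optionally save the trained weights for later reuse (a minimal sketch; the file name is arbitrary).


save_model_weights_hdf5(vae, "vae_weights.h5")
# later: rebuild the same architecture, then restore the weights
# load_model_weights_hdf5(vae, "vae_weights.h5")
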



Generating images


    Now, we can encode test images and reconstruct them with the models above. Here, I use the first 10 images of the x_test data.


n = 10
test = x_test[1:n, ]
x_test_encoded <- predict(encoder, test)

decoded_imgs = generator %>% predict(x_test_encoded)
pred_images = array_reshape(decoded_imgs, dim = c(dim(decoded_imgs)[1], 28, 28))
orig_images = array_reshape(test, dim = c(dim(test)[1], 28, 28))


We'll check the generated images visually. The left column shows the original images and the right column shows the generated ones.


op = par(mfrow = c(n, 2), mar = c(1, 0, 0, 0))
for (i in 1:n)
{
  plot(as.raster(orig_images[i,,]))
  plot(as.raster(pred_images[i,,]))
}
par(op)
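
Besides reconstructing test images, the generator can produce entirely new digits by decoding random latent vectors drawn from the standard normal prior, which is the distribution the KL term pushes the encoder toward. A minimal sketch:


random_latent <- matrix(rnorm(n * latent_size), nrow = n, ncol = latent_size)
new_imgs <- generator %>% predict(random_latent)
new_imgs <- array_reshape(new_imgs, dim = c(n, 28, 28))
plot(as.raster(new_imgs[1,,]))   # view the first generated digit
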




    In this tutorial, we've briefly learned how to build a VAE model and generate images with it in R. The full source code is listed below.



Source code listing


library(keras)

# if fit() fails with a runtime error, disable eager execution first:
# tensorflow::tf$compat$v1$disable_eager_execution()

c(c(xtrain, ytrain), c(xtest, ytest)) %<-% dataset_mnist()
print(dim(xtrain))

input_size = dim(xtrain)[2]*dim(xtrain)[3]
x_train = xtrain/255
x_test = xtest/255
latent_size = 10

x_train <- array_reshape(x_train, c(nrow(x_train), input_size))
x_test <- array_reshape(x_test, c(nrow(x_test), input_size))
print(dim(x_train))

enc_input <- layer_input(shape = c(input_size))
layer_one <- layer_dense(enc_input, units = 256, activation = "relu")
z_mean <- layer_dense(layer_one, latent_size)
z_log_var <- layer_dense(layer_one, latent_size)

encoder <- keras_model(enc_input, z_mean)
summary(encoder)

sampling <- function(arg){
  z_mean <- arg[, 1:(latent_size)]
  z_log_var <- arg[, (latent_size + 1):(2 * latent_size)]
  epsilon <- k_random_normal(shape = k_shape(z_mean), mean = 0)
  z_mean + k_exp(z_log_var/2)*epsilon
}

z <- layer_concatenate(list(z_mean, z_log_var)) %>%
  layer_lambda(sampling)

decoder_layer <- layer_dense(units = 256, activation = "relu")
decoder_mean <- layer_dense(units = input_size, activation = "sigmoid")
h_decoded <- decoder_layer(z)
x_decoded_mean <- decoder_mean(h_decoded)

vae <- keras_model(enc_input, x_decoded_mean)
summary(vae)

vae_loss <- function(input, x_decoded_mean){
  xent_loss = (input_size/1.0)*loss_binary_crossentropy(input, x_decoded_mean)
  kl_loss = -0.5*k_mean(1 + z_log_var - k_square(z_mean) - k_exp(z_log_var), axis = -1)
  xent_loss + kl_loss
}

vae %>% compile(optimizer = "rmsprop", loss = vae_loss)

dec_input <- layer_input(shape = latent_size)
h_decoded_2 <- decoder_layer(dec_input)
x_decoded_mean_2 <- decoder_mean(h_decoded_2)
generator <- keras_model(dec_input, x_decoded_mean_2)
summary(generator)

vae %>% fit(
  x_train, x_train,
  shuffle = TRUE,
  epochs = 20,
  batch_size = 64,
  validation_data = list(x_test, x_test)
)

n = 10
test = x_test[1:n, ]
x_test_encoded <- predict(encoder, test)
decoded_imgs = generator %>% predict(x_test_encoded)
pred_images = array_reshape(decoded_imgs, dim = c(dim(decoded_imgs)[1], 28, 28))
orig_images = array_reshape(test, dim = c(dim(test)[1], 28, 28))

op = par(mfrow = c(n, 2), mar = c(1, 0, 0, 0))
for (i in 1:n)
{
  plot(as.raster(orig_images[i,,]))
  plot(as.raster(pred_images[i,,]))
}
par(op)



References:

  1. R interface to Keras
  2. How to Build Simple Autoencoder with Keras in R
