Factor Analysis is a technique used to express data with a reduced number of variables. Reducing the number of variables is a helpful way to simplify a large dataset without losing its generality.
The Scikit-learn API provides the FactorAnalysis model, which performs a maximum likelihood estimate of the loading matrix using an SVD-based approach. In this tutorial, we'll briefly learn how to use the FactorAnalysis model to reduce the data dimension and visualize the output in Python. The tutorial covers:
- MNIST dataset projection with factor analysis
- Image data factor analysis and visualization
- Source code listing
We'll start by loading the required libraries and functions.
from sklearn.decomposition import FactorAnalysis
from keras.datasets import mnist
from numpy import reshape
import seaborn as sns
import pandas as pd
from numpy import where
import matplotlib.pyplot as plt
MNIST dataset projection with factor analysis
We load the MNIST handwritten digit dataset provided by the Keras library. We'll check the dimensions of the x part of the data and reshape it into two-dimensional data.
(x_train, y_train), (_ , _) = mnist.load_data()
print(x_train.shape)
(60000, 28, 28)
x_mnist = reshape(x_train, [x_train.shape[0], x_train.shape[1]*x_train.shape[2]])
print(x_mnist.shape)
(60000, 784)
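The reshape step above flattens each 28x28 image into a single row of 784 pixel values. The same idea can be checked on a small toy array (the sizes here are illustrative, not the MNIST data):

```python
import numpy as np
from numpy import reshape

# Toy stand-in for x_train: 4 "images" of 3x3 pixels
x = np.arange(4 * 3 * 3).reshape(4, 3, 3)

# Flatten each image into one row, keeping the sample axis
flat = reshape(x, [x.shape[0], x.shape[1] * x.shape[2]])
print(flat.shape)  # (4, 9): each image becomes a single row of pixel values
```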
Next, we'll define the model by using the FactorAnalysis class; the n_components parameter defines the number of target dimensions.
fa = FactorAnalysis(n_components=2, random_state=123)
z = fa.fit_transform(x_mnist)
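fit_transform returns an array of shape (n_samples, n_components), so each sample is mapped to a 2D point. A quick sketch on synthetic data (the array X and its sizes are illustrative, not the MNIST data):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic stand-in for x_mnist: 200 samples with 20 features
rng = np.random.default_rng(123)
X = rng.normal(size=(200, 20))

# Project the 20-dimensional samples onto 2 latent factors
fa = FactorAnalysis(n_components=2, random_state=123)
z = fa.fit_transform(X)
print(z.shape)  # (200, 2): one 2D point per input sample
```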
To visualize the transformed data, we'll collect the output components in a dataframe and plot it with the 'seaborn' library's scatterplot(). We set the color palette size to 10 because there are 10 categories in the label data.
df = pd.DataFrame()
df["y"] = y_train
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]
sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
palette=sns.color_palette("hls", 10),
data=df).set(title="MNIST data projection with Factor Analysis")
The plot shows a two-dimensional visualization of the MNIST data. The colors identify the target digits and the location of their feature data in 2D space.
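The dataframe built for the scatter plot simply pairs each 2D point with its label. The construction can be sketched with toy data (the values below are illustrative):

```python
import numpy as np
import pandas as pd

# Toy projection: five 2D points with integer class labels,
# mirroring the dataframe built for the scatter plot
z = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]])
y = np.array([0, 1, 0, 2, 1])

df = pd.DataFrame()
df["y"] = y
df["comp-1"] = z[:, 0]
df["comp-2"] = z[:, 1]
print(df.shape)          # (5, 3)
print(list(df.columns))  # ['y', 'comp-1', 'comp-2']
```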
Image data factor analysis and visualization
Next, we'll apply the factor analysis method to image data. Here, we use the x and y data of digit '3'. We can extract and reshape the data as below.
digit3_y = where(y_train==3)
digit3_x = x_train[digit3_y]
x_mnist = reshape(digit3_x, [digit3_x.shape[0], digit3_x.shape[1]*digit3_x.shape[2]])
print(x_mnist.shape)
(6131, 784)
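numpy's where() returns a tuple of index arrays, which can be used directly to select the matching samples. A small example with toy labels (the values are illustrative):

```python
import numpy as np
from numpy import where

# Toy labels: find the indices where the label equals 3
y = np.array([1, 3, 2, 3, 0, 3])
idx = where(y == 3)
print(idx[0])  # the positions of the matching labels

# Indexing with the tuple returned by where() selects those rows
x = np.arange(12).reshape(6, 2)
print(x[idx].shape)  # (3, 2): only the rows whose label is 3
```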
Here, we have 784 features and 6131 sample images. We'll fit the FactorAnalysis model on the x_mnist data and visualize the output images.
fa = FactorAnalysis(n_components=10, random_state=123)
z = fa.fit(x_mnist)
print(z.components_.shape)
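Note that fit() returns the fitted estimator itself, so z above is the model, and z.components_ holds the loading matrix with shape (n_components, n_features). Each 784-element row can be reshaped back into a 28x28 image for plotting. A quick check on synthetic data (sizes are illustrative):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic stand-in for x_mnist: 50 samples of 784 "pixels"
rng = np.random.default_rng(123)
X = rng.normal(size=(50, 784))

fa = FactorAnalysis(n_components=10, random_state=123).fit(X)
print(fa.components_.shape)                     # (10, 784): one loading vector per component
print(fa.components_[0].reshape(28, 28).shape)  # (28, 28): ready for imshow
```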
plt.subplots_adjust(wspace=0, hspace=0)
plt.tight_layout()
plt.gray()
for i in range(0, 9):
    plt.subplot(3, 3, i + 1)
    plt.tick_params(labelbottom=False)
    plt.tick_params(labelleft=False)
    plt.imshow(z.components_[i].reshape(28, 28))
plt.show()
The plot shows nine of the extracted component images.
In this tutorial, we've briefly learned how to use Scikit-learn's FactorAnalysis model to reduce the dimensions of data in Python. The full source code is listed below.
Source code listing
from sklearn.decomposition import FactorAnalysis
from keras.datasets import mnist
from numpy import reshape
import seaborn as sns
import pandas as pd
from numpy import where
import matplotlib.pyplot as plt
(x_train, y_train), (_ , _) = mnist.load_data()
print(x_train.shape)
x_mnist = reshape(x_train, [x_train.shape[0], x_train.shape[1]*x_train.shape[2]])
print(x_mnist.shape)
fa = FactorAnalysis(n_components=2, random_state=123)
z = fa.fit_transform(x_mnist)
df = pd.DataFrame()
df["y"] = y_train
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]
sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
palette=sns.color_palette("hls", 10),
data=df).set(title="MNIST data projection with Factor Analysis")
digit3_y = where(y_train==3)
digit3_x = x_train[digit3_y]
x_mnist = reshape(digit3_x, [digit3_x.shape[0], digit3_x.shape[1]*digit3_x.shape[2]])
print(x_mnist.shape)
fa = FactorAnalysis(n_components=10, random_state=123)
z = fa.fit(x_mnist)
print(z.components_.shape)
plt.subplots_adjust(wspace=0, hspace=0)
plt.tight_layout()
plt.gray()
for i in range(0, 9):
    plt.subplot(3, 3, i + 1)
    plt.tick_params(labelbottom=False)
    plt.tick_params(labelleft=False)
    plt.imshow(z.components_[i].reshape(28, 28))
plt.show()