SparsePCA Projection Example in Python

     Sparse Principal Component Analysis (Sparse PCA) is a variant of PCA. It extracts sparse components, that is, components in which many loadings are exactly zero, so each component depends on only a subset of the original features.

    The Scikit-learn API provides the SparsePCA class to apply the Sparse PCA method in Python. In this tutorial, we'll briefly learn how to project data with SparsePCA and visualize the projected data in Python. The tutorial covers:

  1. Iris dataset SparsePCA projection and visualizing
  2. MNIST dataset SparsePCA projection and visualizing
  3. Source code listing

We'll start by loading the required libraries and functions.

from sklearn.decomposition import SparsePCA 
from keras.datasets import mnist
from sklearn.datasets import load_iris
from numpy import reshape
import seaborn as sns
import pandas as pd 
   

Iris dataset SparsePCA projection and visualizing

    After loading the Iris dataset, we'll extract its feature data and its labels.
 
iris = load_iris()
x = iris.data
y = iris.target 
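
Before projecting the data, it can help to confirm what was loaded. The snippet below is a minimal sketch that prints the array shape, the feature names, and the distinct label values; it uses only attributes of the object returned by load_iris().

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
x, y = iris.data, iris.target

print(x.shape)             # (150, 4): 150 samples, 4 features
print(iris.feature_names)  # names of the four measured features
print(iris.target_names)   # names of the flower species
print(np.unique(y))        # distinct integer labels used in y
```

The number of distinct labels is the value used later for the size of the color palette.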
 
Then, we'll define the model by using the SparsePCA class. Here, the n_components parameter defines the number of target dimensions. The fit_transform() method fits the model and returns the projected data.
 
spca = SparsePCA(n_components=2, random_state=123)
z = spca.fit_transform(x)  
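
To see what "sparse" means in practice, one option is to compare the loadings stored in components_ for ordinary PCA and for SparsePCA. The sketch below is illustrative only: alpha=1 is the SparsePCA default sparsity penalty, written out here to make it visible, and the number of zero loadings you get depends on the data and on alpha.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, SparsePCA

x = load_iris().data

# Fit both models with two components each
pca = PCA(n_components=2).fit(x)
spca = SparsePCA(n_components=2, alpha=1, random_state=123).fit(x)

# Count loadings that are exactly zero in each model
pca_zeros = int(np.sum(pca.components_ == 0))
spca_zeros = int(np.sum(spca.components_ == 0))

print("PCA loadings:\n", pca.components_)
print("SparsePCA loadings:\n", spca.components_)
print("zero loadings: PCA =", pca_zeros, " SparsePCA =", spca_zeros)
```

Raising alpha generally pushes more loadings to zero, at the cost of components that capture less of the variance.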
 
 
Next, we'll visualize the result in a plot. We'll collect the output components in a DataFrame, then use the seaborn library's scatterplot() function to plot them. We set the color palette size to 3 because the label data contains 3 categories.

df = pd.DataFrame()
df["y"] = y
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]

sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("hls", 3),
                data=df).set(title="Iris data SparsePCA projection")
 

 
MNIST dataset SparsePCA projection and visualizing

    Next, we'll apply the same method to a larger dataset. The MNIST handwritten digit dataset works well for this purpose, and we can load it through the Keras API. We use only the training part of the dataset because it is enough to try out the SparsePCA class.
 
(x_train, y_train), (_ , _) = mnist.load_data()
print(x_train.shape) 
 
(60000, 28, 28)
 
The MNIST data is a three-dimensional array (samples, height, width), so we'll reshape it into a two-dimensional one.

x_mnist = reshape(x_train, [x_train.shape[0], x_train.shape[1]*x_train.shape[2]])
print(x_mnist.shape)
 
(60000, 784) 
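
The same flattening can also be written by passing -1 for the second dimension and letting NumPy infer it. The sketch below uses a small zero-filled stand-in array (an assumption, so the example runs without downloading MNIST) and checks that both forms give the same result.

```python
import numpy as np

# Stand-in with the same layout as the MNIST images: (samples, height, width)
images = np.zeros((5, 28, 28), dtype=np.uint8)

# Explicit product of the last two dimensions, as in the tutorial
flat_explicit = np.reshape(images, [images.shape[0], images.shape[1] * images.shape[2]])

# Let NumPy infer the second dimension with -1
flat_inferred = images.reshape(images.shape[0], -1)

print(flat_explicit.shape, flat_inferred.shape)  # (5, 784) (5, 784)
```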
 
Here, we have 784 features and 60000 samples. Now, we'll project the data into two dimensions with the Sparse PCA method and visualize it in a plot.
 
spca = SparsePCA(n_components=2, random_state=123)
z = spca.fit_transform(x_mnist)
df = pd.DataFrame()
df["y"] = y_train
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]

sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("hls", 10),
                data=df).set(title="MNIST data SparsePCA projection")
 



    The plot shows a two-dimensional visualization of the MNIST data. Each color marks a target digit, and each point's position shows where that sample's features land in the 2D space.
 
    In this tutorial, we've briefly learned how to project data with the Sparse PCA method and visualize the projected data in Python. The full source code is listed below.
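
Fitting SparsePCA on all 60000 images can take a while. One possible way to speed up experimentation, sketched below, is to fit on a random subset and use scikit-learn's MiniBatchSparsePCA class. This is not part of the tutorial's method: the example uses scikit-learn's bundled 8x8 digits dataset (load_digits) as a small stand-in for MNIST, and the subset size of 500 and batch_size of 50 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import MiniBatchSparsePCA

# Small stand-in for MNIST: 1797 images of 8x8 pixels, already flattened to 64 features
digits = load_digits()
x_small, y_small = digits.data, digits.target

# Draw a random subset of rows without replacement (subset size is an assumption)
rng = np.random.default_rng(123)
idx = rng.choice(x_small.shape[0], size=500, replace=False)
x_sub, y_sub = x_small[idx], y_small[idx]

# Mini-batch variant of Sparse PCA, fitted on the subset only
mb_spca = MiniBatchSparsePCA(n_components=2, batch_size=50, random_state=123)
z_sub = mb_spca.fit_transform(x_sub)

print(z_sub.shape)
```

The resulting z_sub and y_sub can be placed in a DataFrame and plotted in the same way as the full projection.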
 
 
 Source code listing

from sklearn.decomposition import SparsePCA
from keras.datasets import mnist
from sklearn.datasets import load_iris
from numpy import reshape
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Iris: load the feature data and the labels
iris = load_iris()
x = iris.data
y = iris.target

# Project the four Iris features onto two sparse components
spca = SparsePCA(n_components=2, random_state=123)
z = spca.fit_transform(x)

# Collect the labels and components for plotting
df = pd.DataFrame()
df["y"] = y
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]

sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("hls", 3),
                data=df).set(title="Iris data SparsePCA projection")
plt.show()  # show the Iris plot so the next plot starts on fresh axes

# MNIST: load only the training images and labels
(x_train, y_train), (_ , _) = mnist.load_data()
print(x_train.shape)

# Flatten each 28x28 image into a 784-element row
x_mnist = reshape(x_train, [x_train.shape[0], x_train.shape[1]*x_train.shape[2]])
print(x_mnist.shape)

# Project the 784 pixel features onto two sparse components
spca = SparsePCA(n_components=2, random_state=123)
z = spca.fit_transform(x_mnist)

df = pd.DataFrame()
df["y"] = y_train
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]

sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("hls", 10),
                data=df).set(title="MNIST data SparsePCA projection")
plt.show()
  

