DataTechNotes: SparsePCA Projection Example in Python

Sparse Principal Component Analysis (Sparse PCA) is an extension of PCA that imposes a sparsity structure on the components. Various estimation methods achieve this sparsity, based on either sparse loadings or sparse weights.

The scikit-learn API provides the SparsePCA class for applying the Sparse PCA method in Python. In this tutorial, we'll briefly learn how to project data by using SparsePCA and visualize the projected data in a graph. The tutorial covers:

Iris dataset SparsePCA projection and visualizing
MNIST dataset SparsePCA projection and visualizing
Source code listing

We'll start by loading the required libraries and functions.

from sklearn.decomposition import SparsePCA 
from keras.datasets import mnist
from sklearn.datasets import load_iris
from numpy import reshape
import seaborn as sns
import pandas as pd

Iris dataset SparsePCA projection and visualizing

After loading the Iris dataset, we'll extract the data and label parts of the dataset.

iris = load_iris()
x = iris.data
y = iris.target

We'll define the model by using the SparsePCA class. Here, the n_components parameter defines the number of target dimensions.

spca = SparsePCA(n_components=2, random_state=123)
z = spca.fit_transform(x)
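Since sparsity is the point of the method, it can help to look at the fitted loadings. The snippet below is a minimal sketch, not part of the original tutorial: it refits the model on Iris with a few values of the alpha parameter (the sparsity penalty of SparsePCA, default 1) and counts how many entries of components_ are exactly zero. The alpha values are illustrative assumptions, not tuned settings.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import SparsePCA

x = load_iris().data

# The alpha values below are illustrative choices, not recommendations.
for alpha in (0.1, 1.0, 5.0):
    model = SparsePCA(n_components=2, alpha=alpha, random_state=123)
    model.fit(x)

    # components_ holds the loadings, shape (n_components, n_features).
    loadings = model.components_
    n_zero = int((loadings == 0).sum())

    print(f"alpha={alpha}: {n_zero} of {loadings.size} loadings are exactly zero")
    print(loadings.round(3))
```

Comparing the printed matrices shows which original features each component actually uses at each penalty level.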

To visualize the result in a graph, we'll collect the output component data in a pandas DataFrame, then use the seaborn library's scatterplot() function. We'll request 3 colors in the scatter plot's color palette, one for each category in the label data.

df = pd.DataFrame()
df["y"] = y
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]

sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("hls", 3),
                data=df).set(title="Iris data SparsePCA projection")
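As an optional variation, the legend can show species names instead of the integer labels 0, 1, and 2. The sketch below assumes the same Iris data and model settings as above. The column name "species" is an illustrative choice that does not appear in the tutorial's own code.

```python
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.decomposition import SparsePCA

iris = load_iris()
z = SparsePCA(n_components=2, random_state=123).fit_transform(iris.data)

# Build the frame in one step from the projected components.
df = pd.DataFrame(z, columns=["comp-1", "comp-2"])

# Map integer labels to species names ("species" is a hypothetical column name).
df["species"] = iris.target_names[iris.target]

ax = sns.scatterplot(data=df, x="comp-1", y="comp-2", hue="species")
ax.set(title="Iris data SparsePCA projection")

# When running as a plain script rather than in a notebook, the figure
# is only displayed after calling matplotlib.pyplot.show().
```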

MNIST dataset SparsePCA projection and visualizing

We'll apply the same method to a larger dataset. The MNIST handwritten digit dataset works well for this purpose, and we can load it through the Keras API. We'll extract only the training part of the dataset, which is enough for this example.

(x_train, y_train), (_ , _) = mnist.load_data()
print(x_train.shape)

(60000, 28, 28)

The MNIST training data is a three-dimensional array, so we'll reshape it into a two-dimensional one.

x_mnist = reshape(x_train, [x_train.shape[0], x_train.shape[1]*x_train.shape[2]])
print(x_mnist.shape)

(60000, 784)
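The flattening step can be checked on a small array without downloading MNIST. This sketch uses a stand-in array of five blank 28x28 "images" (an assumption made only for illustration) and passes -1 so that NumPy infers the feature count itself.

```python
import numpy as np

# Stand-in for x_train: 5 blank "images" of 28x28 pixels.
images = np.zeros((5, 28, 28), dtype=np.uint8)

# -1 lets NumPy infer the flattened feature count (28 * 28 = 784).
flat = images.reshape(len(images), -1)
print(flat.shape)  # (5, 784)
```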

Here, we have 784 features and 60000 samples. Now, we can project the data with SparsePCA and visualize it in a graph.

spca = SparsePCA(n_components=2, random_state=123)
z = spca.fit_transform(x_mnist)
df = pd.DataFrame()
df["y"] = y_train
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]

sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("hls", 10),
                data=df).set(title="MNIST data SparsePCA projection")

The plot shows a two-dimensional visualization of the MNIST data. The colors identify the target digits, and each point marks where a sample's feature data lands in 2D space.
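Fitting SparsePCA on all 60000 MNIST rows may take a long time on modest hardware. One possible alternative, sketched here and not part of the original tutorial, is scikit-learn's MiniBatchSparsePCA, which fits on small batches of samples. To keep the example self-contained, it uses scikit-learn's bundled 8x8 digits dataset (load_digits) as a stand-in for MNIST. The batch_size value is an illustrative assumption.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import MiniBatchSparsePCA

# Stand-in data: 1797 digit images of 8x8 pixels, already flattened to 64 features.
digits = load_digits()
x_small, y_small = digits.data, digits.target

# batch_size=50 is an illustrative choice, not a tuned setting.
mb_spca = MiniBatchSparsePCA(n_components=2, batch_size=50, random_state=123)
z_small = mb_spca.fit_transform(x_small)

print(z_small.shape)  # (1797, 2)
```

The resulting z_small array can be placed in a DataFrame and plotted in the same way as the full SparsePCA output.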

In this tutorial, we've briefly learned how to project data with the Sparse PCA method and visualize the projected data in Python. The full source code is listed below.

Source code listing

from sklearn.decomposition import SparsePCA 
from keras.datasets import mnist
from sklearn.datasets import load_iris
from numpy import reshape
import seaborn as sns
import pandas as pd

iris = load_iris()
x = iris.data
y = iris.target

spca = SparsePCA(n_components=2, random_state=123)
z = spca.fit_transform(x)
df = pd.DataFrame()
df["y"] = y
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]

sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("hls", 3),
                data=df).set(title="Iris data SparsePCA projection")

(x_train, y_train), (_ , _) = mnist.load_data()
print(x_train.shape) 

x_mnist = reshape(x_train, [x_train.shape[0], x_train.shape[1]*x_train.shape[2]])
print(x_mnist.shape)

spca = SparsePCA(n_components=2, random_state=123)
z = spca.fit_transform(x_mnist)

df = pd.DataFrame()
df["y"] = y_train
df["comp-1"] = z[:,0]
df["comp-2"] = z[:,1]

sns.scatterplot(x="comp-1", y="comp-2", hue=df.y.tolist(),
                palette=sns.color_palette("hls", 10),
                data=df).set(title="MNIST data SparsePCA projection")

References:

Scikit-learn SparsePCA
