
PCA With TensorFlow

PCA (Principal Component Analysis) is probably the oldest trick in the book.

PCA is well studied and there are numerous ways to arrive at the same solution. We will talk about two of them here, eigendecomposition and Singular Value Decomposition (SVD), and then we will implement the SVD approach in TensorFlow.

From now on, X will be our data matrix, of shape (n, p), where n is the number of examples and p is the number of dimensions.

So given X, both methods will, each in its own way, decompose X in a manner that lets us later multiply the decomposed pieces back together to represent the maximum amount of information in fewer dimensions. I know, I know, it sounds horrible, but I will spare you most of the math and keep only the parts that contribute to understanding the method's pros and cons.

So eigendecomposition and SVD are both ways to decompose matrices; let's see how they help us in PCA and how they are connected.

Take a glance at the flow chart below and I will explain right after.

[Figure 1: PCA workflow]

So why should you care about this? Well, there is something very fundamental about the two procedures that tells us a lot about PCA.

As you can see, both methods are pure linear algebra, which basically tells us that using PCA is looking at the real data from a different angle. This is unique to PCA, since the other methods start with a random representation of the lower dimensional data and try to get it to behave like the high dimensional data.

Another notable point is that all the operations are linear, and with SVD they are extremely fast.

Also, given the same data, PCA will always give the same answer (which is not true of the other two methods).
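
Before moving on, here is a small NumPy sketch (my own illustration, not part of the class we will build; it uses centered data, which is the textbook setup, whereas the class below runs SVD on the raw matrix) showing that the eigendecomposition route and the SVD route from Figure 1 land on the same projection, up to the sign of each component:

import numpy as np

# Random stand-in data, just to compare the two routes
rng = np.random.RandomState(0)
X = rng.randn(150, 4)
Xc = X - X.mean(axis=0)                  # centering is assumed for this comparison

# Route 1: eigendecomposition of the covariance matrix
cov = Xc.T @ Xc / (Xc.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]        # sort components by eigenvalue, descending
proj_eig = Xc @ eigvecs[:, order[:2]]

# Route 2: SVD of the data matrix itself
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
proj_svd = U[:, :2] * S[:2]              # equivalent to Xc @ Vt[:2].T

# The two projections agree up to the sign of each column
print(np.allclose(np.abs(proj_eig), np.abs(proj_svd)))   # True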

Notice how in SVD we choose the r (where r is the number of dimensions we want to reduce to) leftmost values of Σ to lower the dimensionality?

Well, there is something special about Σ.

Σ is a diagonal matrix with p (the number of dimensions) diagonal values (called singular values), and their magnitude indicates how significant they are for preserving the information.

So we can choose to reduce to the number of dimensions that preserves approximately a given percentage of the data, and I will demonstrate that in the code (e.g. it gives us the ability to reduce dimensionality with a constraint of losing at most 15% of the information).
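
To make that concrete, here is a tiny NumPy sketch with made-up singular values; it uses the same cumulative-sum trick that the reduce method below uses:

import numpy as np

# Hypothetical singular values, not the iris ones: each value's share of the
# total tells us how much information its dimension preserves.
singular_values = np.array([20.0, 10.0, 5.0, 1.0])
normalized = singular_values / singular_values.sum()   # [0.556, 0.278, 0.139, 0.028]
cumulative = np.cumsum(normalized)                     # [0.556, 0.833, 0.972, 1.0]

# To lose at most 15% of the information, keep enough dimensions to reach 0.85
keep_info = 0.85
r = int(np.searchsorted(cumulative, keep_info) + 1)    # r = 3 here
print(r)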

As you will see, coding this in TensorFlow is pretty simple. What we are going to code is a class with a fit method and a reduce method to which we will supply the desired number of dimensions.

CODE (PCA)
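
All the snippets below are methods of a class I will call TF_PCA, written against the TensorFlow 1.x API. We will only walk through fit and reduce, so here is a minimal sketch of what the constructor might look like, holding the attributes those methods rely on (the exact signature here is just an assumption):

import numpy as np
import tensorflow as tf   # written against the TensorFlow 1.x API

class TF_PCA:
    # Minimal constructor sketch; only fit and reduce are shown in full below.
    def __init__(self, data, target=None, dtype=tf.float32):
        self.data = data          # the (n, p) data matrix
        self.target = target      # optional labels, used only for plotting
        self.dtype = dtype
        self.graph = None
        self.X = None
        self.u = None
        self.singular_values = None
        self.sigma = None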

Let's see what the fit method looks like, given that self.data contains the data and self.dtype = tf.float32.

def fit(self):
    self.graph = tf.Graph()
    with self.graph.as_default():
        self.X = tf.placeholder(self.dtype, shape=self.data.shape)
        # Perform SVD
        singular_values, u, _ = tf.svd(self.X)
        # Create sigma matrix
        sigma = tf.diag(singular_values)
    with tf.Session(graph=self.graph) as session:
        self.u, self.singular_values, self.sigma = session.run(
            [u, singular_values, sigma],
            feed_dict={self.X: self.data})

So the goal of fit is to create our Σ and U for later use.

We'll start with the line tf.svd, which gives us the singular values (the diagonal values of what was denoted as Σ in Figure 1) and the matrices U and V.

Then tf.diag is TensorFlow's way of converting a 1D vector to a diagonal matrix, which in our case results in Σ.

At the end of the fit call we will have the singular values, Σ and U.
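
If you want to poke at these two ops on their own, here is a tiny standalone sketch (arbitrary numbers, TensorFlow 1.x) of tf.svd and tf.diag on a small constant matrix:

import tensorflow as tf   # TensorFlow 1.x

# Illustration only: the values are arbitrary, the point is the shapes involved
x = tf.constant([[3.0, 2.0],
                 [2.0, 3.0],
                 [1.0, 0.0]])
s, u, v = tf.svd(x)        # s: (2,) singular values, u: (3, 2), v: (2, 2)
sigma = tf.diag(s)         # (2, 2) diagonal matrix built from the singular values

with tf.Session() as session:
    s_val, sigma_val = session.run([s, sigma])
    print(s_val)       # singular values, largest first
    print(sigma_val)   # the same values placed on a diagonal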

Now let's implement reduce.

def reduce(self, n_dimensions=None, keep_info=None):
    if keep_info:
        # Normalize singular values
        normalized_singular_values = self.singular_values / sum(self.singular_values)
        # Create the aggregated ladder of kept information per dimension
        ladder = np.cumsum(normalized_singular_values)
        # Get the first index which is above the given information threshold
        index = next(idx for idx, value in enumerate(ladder) if value >= keep_info) + 1
        n_dimensions = index
    with self.graph.as_default():
        # Cut out the relevant part from sigma
        sigma = tf.slice(self.sigma, [0, 0], [self.data.shape[1], n_dimensions])
        # PCA
        pca = tf.matmul(self.u, sigma)
    with tf.Session(graph=self.graph) as session:
        return session.run(pca, feed_dict={self.X: self.data})

So as you can see, reduce gets either keep_info or n_dimensions (I didn't implement the input check that exactly one of them must be supplied).
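
That check would be a one-liner; a possible sketch (not part of the class as written above) at the top of reduce could look like this:

# Hypothetical guard at the top of reduce(): exactly one of the two
# arguments must be supplied.
if (n_dimensions is None) == (keep_info is None):
    raise ValueError('Supply exactly one of n_dimensions or keep_info')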

If we supply n_dimensions, it will simply reduce to that number of dimensions, but if we supply keep_info, which should be a float between 0 and 1, we will preserve that much information from the original data (0.9 means we preserve 90% of the data).

In the first 'if', we normalize the singular values and check how many of them are needed, basically figuring out n_dimensions from keep_info.

In the graph, we just slice the Σ (sigma) matrix for as much data as we need and perform the matrix multiplication.
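
To see why the shapes line up, here is the same slice-and-multiply step spelled out with plain NumPy shapes for the iris case (n=150, p=4, r=2); this is just a shape check, not a replacement for the TensorFlow code:

import numpy as np

# Shape check only: the arrays stand in for U and sigma computed in fit
n, p, r = 150, 4, 2
u = np.zeros((n, p))            # U from the SVD of the (n, p) data matrix
sigma = np.zeros((p, p))        # full diagonal matrix of singular values

sigma_sliced = sigma[:, :r]     # keep the r leftmost columns -> (p, r)
pca = u @ sigma_sliced          # (n, p) @ (p, r) -> (n, r) reduced data
print(pca.shape)                # (150, 2)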

So let's try it out on the iris dataset, which is a (150, 4) dataset of 3 species of iris flowers.

from sklearn import datasets
import matplotlib.pyplot as plt
import seaborn as sns

# Load the iris data: 150 samples, 4 features, 3 species labels
iris_dataset = datasets.load_iris()

tf_pca = TF_PCA(iris_dataset.data, iris_dataset.target)
tf_pca.fit()
pca = tf_pca.reduce(keep_info=0.9)  # Results in 2 dimensions

# Color each point by its species label
color_mapping = {0: sns.xkcd_rgb['bright purple'], 1: sns.xkcd_rgb['lime'], 2: sns.xkcd_rgb['ochre']}
colors = list(map(lambda x: color_mapping[x], tf_pca.target))
plt.scatter(pca[:, 0], pca[:, 1], c=colors)
plt.show()

[Figure 2: Iris dataset, 2-dimensional PCA plot]

Not so bad huh?