# SVD for PCA

Supplementary material to amoeba's answer to the Cross Validated question "Relationship between SVD and PCA. How to use SVD to perform PCA?".

import numpy as np
from numpy import linalg as la
np.random.seed(42)


Let the data matrix $$\mathbf X$$ be of $$n \times p$$ size, where $$n$$ is the number of samples and $$p$$ is the number of variables (see Tidy data, Xarray).

n, p = 5, 3
X = np.random.randint(0, 8, [n, p]).astype(float)
X

array([[6., 3., 4.],
       [6., 2., 7.],
       [4., 4., 6.],
       [1., 2., 6.],
       [2., 2., 7.]])


Let us center it, i.e. subtract the column means so that they are equal to zero.

X -= np.mean(X, axis=0)
X

array([[ 2.2,  0.4, -2. ],
       [ 2.2, -0.6,  1. ],
       [ 0.2,  1.4,  0. ],
       [-2.8, -0.6,  0. ],
       [-1.8, -0.6,  1. ]])
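

The column means are now zero:

assert np.allclose(X.mean(axis=0), 0)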


Then the $$p \times p$$ covariance matrix $$\mathbf C$$ is given by $$\mathbf C = \mathbf X^\top \mathbf X/(n-1)$$:

C = np.cov(X, rowvar=False)
C

array([[ 5.2 ,  0.65, -1.  ],
       [ 0.65,  0.8 , -0.5 ],
       [-1.  , -0.5 ,  1.5 ]])
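

As a sanity check, np.cov (which normalizes by $$n-1$$ by default) agrees with this formula:

assert np.allclose(C, X.T.dot(X) / (n - 1))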


It is a symmetric matrix and so it can be diagonalized (see Eigendecomposition - Decomposition for special matrices):

$\mathbf C = \mathbf V \mathbf L \mathbf V^\top$

where $$\mathbf L$$ is a diagonal matrix with the eigenvalues $$\lambda_i$$ on the diagonal. Let us sort them in decreasing order:

# la.eigh would be more idiomatic for a symmetric matrix (it guarantees real,
# ascending-sorted output); la.eig works here, but its results need sorting.
l, principal_axes = la.eig(C)
idx = l.argsort()[::-1]  # indices that sort the eigenvalues in decreasing order
l, principal_axes = l[idx], principal_axes[:, idx]
l

array([5.57232637, 1.39174803, 0.5359256 ])


and $$\mathbf V$$ is a matrix of eigenvectors (each column is an eigenvector):

principal_axes

array([[ 0.95454539,  0.29627488, -0.03262345],
       [ 0.15658321, -0.40530879,  0.90067002],
       [-0.25362333,  0.8648387 ,  0.4332773 ]])
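

We can verify the diagonalization $$\mathbf C = \mathbf V \mathbf L \mathbf V^\top$$:

assert np.allclose(C, principal_axes.dot(np.diag(l)).dot(principal_axes.T))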


The eigenvectors are called principal axes or principal directions of the data.

Projections of the data on the principal axes are called principal components, also known as PC scores; these can be seen as new, transformed, variables. The $$j$$-th principal component is given by $$j$$-th column of $$\mathbf {XV}$$. The coordinates of the $$i$$-th data point in the new PC space are given by the $$i$$-th row of $$\mathbf{XV}$$.

principal_components = X.dot(principal_axes)
principal_components

array([[ 2.66987981, -1.23999617, -0.5780582 ],
       [ 1.7524266 ,  1.75982872, -0.1788963 ],
       [ 0.41012557, -0.50817732,  1.25441334],
       [-2.76667702, -0.58638441, -0.44905635],
       [-2.06575496,  0.57472918, -0.04840249]])
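

The PCs are uncorrelated: their covariance matrix is diagonal, with the eigenvalues on the diagonal:

assert np.allclose(np.cov(principal_components, rowvar=False), np.diag(l))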


If we now perform the singular value decomposition of $$\mathbf X$$, we obtain:

$\mathbf X = \mathbf U \mathbf S \mathbf V^\top$

where $$\mathbf U$$ is a unitary matrix and $$\mathbf S$$ is the diagonal matrix of singular values $$s_i$$.

# Economy-size (thin) SVD; see the last bullet below for why.
U, s, Vt = la.svd(X, full_matrices=False)
V = Vt.T
S = np.diag(s)  # la.svd returns the singular values as a 1-D array
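
A quick check that these factors reconstruct $$\mathbf X$$ exactly:

assert np.allclose(X, U.dot(S).dot(Vt))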


From here one can easily see that:

$\mathbf C = \mathbf V \mathbf S \mathbf U^\top \mathbf U \mathbf S \mathbf V^\top /(n-1) = \mathbf V \frac{\mathbf S^2}{n-1}\mathbf V^\top$

meaning that the right singular vectors $$\mathbf V$$ are the principal directions and that the singular values are related to the eigenvalues of the covariance matrix via $$\lambda_i = s_i^2/(n-1)$$. The principal components are given by $$\mathbf X \mathbf V = \mathbf U \mathbf S \mathbf V^\top \mathbf V = \mathbf U \mathbf S$$.

def flip_signs(A, B):
    """
    Utility function for resolving the sign ambiguity in SVD:
    returns A and B with the signs of B flipped elementwise to match A.
    http://stats.stackexchange.com/q/34396/115202
    """
    signs = np.sign(A) * np.sign(B)
    return A, B * signs


To summarize:

• If $$\mathbf X = \mathbf U \mathbf S \mathbf V^\top$$, then columns of $$\mathbf V$$ are principal directions/axes.

assert np.allclose(*flip_signs(V, principal_axes))

• Columns of $$\mathbf {US}$$ are principal components (“scores”).

assert np.allclose(*flip_signs(U.dot(S), principal_components))

• Singular values are related to the eigenvalues of the covariance matrix via $$\lambda_i = s_i^2/(n-1)$$. Eigenvalues $$\lambda_i$$ show the variances of the respective PCs.

assert np.allclose((s ** 2) / (n - 1), l)
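
Equivalently, the ddof=1 sample variances of the PC score columns recover the eigenvalues:

assert np.allclose(principal_components.var(axis=0, ddof=1), l)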

• Standardized scores are given by columns of $$\sqrt{n-1}\mathbf U$$ and loadings are given by columns of $$\mathbf V \mathbf S/\sqrt{n-1}$$. See e.g. here and here for why “loadings” should not be confused with principal directions.
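
A quick check of these identities (the loadings reproduce both $$\mathbf X$$ and $$\mathbf C$$):

standardized_scores = np.sqrt(n - 1) * U
loadings = V.dot(S) / np.sqrt(n - 1)
assert np.allclose(standardized_scores.dot(loadings.T), X)
assert np.allclose(loadings.dot(loadings.T), C)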

• The above is correct only if $$\mathbf X$$ is centered. Only then is the covariance matrix equal to $$\mathbf X^\top \mathbf X/(n-1)$$.

• The above is correct only for $$\mathbf X$$ having samples in rows and variables in columns. If variables are in rows and samples in columns, then $$\mathbf U$$ and $$\mathbf V$$ exchange interpretations.

• If one wants to perform PCA on a correlation matrix (instead of a covariance matrix), then columns of $$\mathbf X$$ should not only be centered, but standardized as well, i.e. divided by their standard deviations.
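
A minimal sketch of this variant, assuming the ddof=1 sample standard deviation to match the $$n-1$$ convention used above:

X_std = X / X.std(axis=0, ddof=1)  # X is already centered
C_corr = X_std.T.dot(X_std) / (n - 1)
assert np.allclose(C_corr, np.corrcoef(X, rowvar=False))
U_r, s_r, _ = la.svd(X_std, full_matrices=False)  # SVD for correlation-based PCA
assert np.allclose((s_r ** 2) / (n - 1), la.eigvalsh(C_corr)[::-1])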

• To reduce the dimensionality of the data from $$p$$ to $$k < p$$, select the first $$k$$ columns of $$\mathbf U$$ and the $$k\times k$$ upper-left part of $$\mathbf S$$. Their product $$\mathbf U_k \mathbf S_k$$ is the required $$n \times k$$ matrix containing the first $$k$$ PCs.

k = 2
PC_k = principal_components[:, :k]
US_k = U[:, :k].dot(S[:k, :k])
assert np.allclose(*flip_signs(PC_k, US_k))

• Further multiplying the first $$k$$ PCs by the corresponding principal axes $$\mathbf V_k^\top$$ yields the matrix $$\mathbf X_k = \mathbf U_k \mathbf S_k \mathbf V_k^\top$$, which has the original $$n \times p$$ size but is of lower rank (rank $$k$$). This matrix $$\mathbf X_k$$ provides a reconstruction of the original data from the first $$k$$ PCs, and it has the lowest possible reconstruction error; see the answer here.

Xk = US_k.dot(Vt[:k, :])
Xk

array([[ 2.18114175,  0.92063969, -1.7495405 ],
       [ 2.19416378, -0.43887346,  1.07751171],
       [ 0.24092329,  0.27018751, -0.54350883],
       [-2.81464977, -0.19554841,  0.19456592],
       [-1.80157906, -0.55640533,  1.0209717 ]])

[la.matrix_rank(X), la.matrix_rank(Xk)]

[3, 2]
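
The squared Frobenius norm of the reconstruction error equals the sum of the squared discarded singular values, in line with the optimality claim above:

err = la.norm(X - Xk, 'fro')
assert np.allclose(err ** 2, np.sum(s[k:] ** 2))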

• Strictly speaking, $$\mathbf U$$ is of $$n\times n$$ size and $$\mathbf V$$ is of $$p \times p$$ size. However, if $$n>p$$ then the last $$n-p$$ columns of $$\mathbf U$$ are arbitrary (and the corresponding rows of $$\mathbf S$$ are all zero); one should therefore use an economy-size (thin) SVD that returns $$\mathbf U$$ of $$n\times p$$ size, dropping the useless columns. For large $$n\gg p$$ the matrix $$\mathbf U$$ would otherwise be unnecessarily huge. The same applies in the opposite situation of $$n\ll p$$.

assert U.shape == (n, p)
assert S.shape == (p, p)
assert V.shape == (p, p)
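
For comparison, the full SVD returns a square $$\mathbf U$$; the singular values are the same, and its first $$p$$ columns match the thin $$\mathbf U$$ up to sign:

U_full, s_full, Vt_full = la.svd(X, full_matrices=True)
assert U_full.shape == (n, n)
assert np.allclose(s_full, s)
assert np.allclose(*flip_signs(U_full[:, :p], U))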