
I think I understand how one could view PCA as a way to find the basis vectors such that projecting the data onto the subspace they span maximizes the variance of the resulting dataset.

What I don't understand is why we view it this way. Why don't we simply say: PCA is a method for preserving as much of the length and direction of the original vectors as possible? Here is what I mean:

Suppose we have a data matrix $$\newcommand{\bm}[1]{\boldsymbol{#1}} \bm{X} := \begin{bmatrix}\bm{x}_1 \\ \bm{x}_2 \\ \vdots \\ \bm{x}_N\end{bmatrix} \in \mathbb{R}^{N \times D},$$

and we wish to find the vector $\bm{w} \in \mathbb{R}^{D \times 1}$ that spans the "major axis" running through the dataset. What we want, then, is to maximize the lengths of the projections of the datapoints $\bm{x}_i$ onto the subspace spanned by $\bm{w}$; squaring to avoid the absolute value, we maximize $$\sum_{i = 1}^N (\bm{x}_i\bm{w})^2.$$ Since we don't want $\bm{w}$ to grow arbitrarily long, we restrict $\|\bm{w}\| = 1$, so the problem becomes $$\operatorname*{argmax}_{\substack{\bm{w} \in \mathbb{R}^{D \times 1} \\ \|\bm{w}\| = 1}} \sum_{i = 1}^N (\bm{x}_i \bm{w})^2.$$ From there, we see that $$\sum_{i = 1}^N (\bm{x}_i\bm{w})^2 = \sum_{i = 1}^N \bm{w}^\top\bm{x}_i^\top\bm{x}_i\bm{w} = \bm{w}^\top\biggl(\sum_{i = 1}^N \bm{x}_i^\top \bm{x}_i\biggr)\bm{w} = \bm{w}^\top \bm{X}^\top \bm{X} \bm{w}.$$
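To convince myself of the algebra, here is a minimal NumPy sketch (random data, just for illustration) checking that the sum of squared projection lengths equals the quadratic form $\bm{w}^\top\bm{X}^\top\bm{X}\bm{w}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 5
X = rng.normal(size=(N, D))         # rows are the datapoints x_i

w = rng.normal(size=D)
w /= np.linalg.norm(w)              # enforce ||w|| = 1

sum_sq_proj = np.sum((X @ w) ** 2)  # sum_i (x_i w)^2
quad_form = w @ X.T @ X @ w         # w^T X^T X w
print(np.isclose(sum_sq_proj, quad_form))  # True
```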

At the end, it's known that the unit vector $\bm{w}$ that maximizes the above quadratic form is a unit eigenvector of $\bm{X}^\top\bm{X}$ corresponding to its largest eigenvalue. The remaining principal vectors can be found in a similar manner, maximizing the same form subject to orthogonality to the vectors already found.
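And a quick sanity check of the eigenvector claim, again just a sketch: the unit eigenvector for the largest eigenvalue of $\bm{X}^\top\bm{X}$ should beat every random unit vector on the quadratic form:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 200, 4
X = rng.normal(size=(N, D))
C = X.T @ X

eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
w_star = eigvecs[:, -1]               # unit eigenvector of the largest eigenvalue

# No random unit vector should attain a larger value of the quadratic form.
W = rng.normal(size=(10000, D))
W /= np.linalg.norm(W, axis=1, keepdims=True)
vals = np.einsum('ij,jk,ik->i', W, C, W)  # w^T C w for each row w

print(np.isclose(w_star @ C @ w_star, eigvals[-1]))  # True
print(vals.max() <= w_star @ C @ w_star + 1e-9)      # True
```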

But notice: I never had to bring up variance! My question is mostly about why we do bring up variance when discussing PCA.
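For what it's worth, if the columns of $\bm{X}$ are mean-centered, the two objectives coincide numerically: the sample variance of the projections $\bm{X}\bm{w}$ is exactly the squared-projection objective divided by $N - 1$. A minimal NumPy sketch (random data, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 150, 3
X = rng.normal(size=(N, D))
Xc = X - X.mean(axis=0)  # center each column

w = rng.normal(size=D)
w /= np.linalg.norm(w)

proj = Xc @ w            # projections have mean zero after centering
print(np.isclose(np.var(proj, ddof=1), np.sum(proj**2) / (N - 1)))  # True
```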

  • Because variance is a statistical concept with statistical meaning. Commented Apr 22 at 19:12
  • I suppose what I'm asking is: Is it wrong to think of PCA as I outlined above, as the process of finding basis vectors that maximize the lengths of the projections? Commented Apr 23 at 3:50
  • That's not at all wrong: "low rank approximation" might give you some good hits in any search. Commented Apr 23 at 13:31
