I think I understand how one can view PCA as a way of finding the basis vectors such that projecting onto the subspace they span maximizes the variance of the resulting dataset.
What I don't understand is why we view it this way. Why don't we simply say: PCA is a method for preserving as much of the length and direction of the original vectors as possible? This is what I mean:
Suppose we have a data matrix $$\newcommand{\bm}[1]{\boldsymbol{#1}} \bm{X} := \begin{bmatrix}\bm{x}_1 \\ \bm{x}_2 \\ \vdots \\ \bm{x}_N\end{bmatrix} \in \mathbb{R}^{N \times D},$$ whose rows $\bm{x}_i$ are the datapoints,
and we wish to find the vector $\bm{w} \in \mathbb{R}^{D \times 1}$ that spans the "major axis" running through the dataset. Then, what we want is to maximize the total length of the projections of the datapoints $\bm{x}_i$ onto the subspace spanned by $\bm{w}$. That is, we want to maximize $$\sum_{i = 1}^N |\bm{x}_i\bm{w}|.$$ Since we don't want $\bm{w}$ to grow arbitrarily long, we restrict $\|\bm{w}\| = 1$. We finally see that $$\operatorname*{argmax}_{\substack{\bm{w} \in \mathbb{R}^{D \times 1} \\ \|\bm{w}\| = 1}} \sum_{i = 1}^N |\bm{x}_i \bm{w}| = \operatorname*{argmax}_{\substack{\bm{w} \in \mathbb{R}^{D \times 1} \\ \|\bm{w}\| = 1}} \sum_{i = 1}^N (\bm{x}_i \bm{w})^2.$$ From there, we see that $$\sum_{i = 1}^N (\bm{x}_i\bm{w})^2 = \sum_{i = 1}^N \bm{w}^\top\bm{x}_i^\top\bm{x}_i\bm{w} = \bm{w}^\top\biggl(\sum_{i = 1}^N \bm{x}_i^\top \bm{x}_i\biggr)\bm{w} = \bm{w}^\top \bm{X}^\top \bm{X} \bm{w}.$$
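(Just as a sanity check, here is a minimal NumPy sketch of that last identity, using a made-up random $\bm{X}$ and $\bm{w}$ that are purely illustrative.)

```python
import numpy as np

# Sanity check of the identity sum_i (x_i w)^2 = w^T X^T X w,
# using a made-up random data matrix X and a random unit vector w.
rng = np.random.default_rng(0)
N, D = 100, 5
X = rng.normal(size=(N, D))
w = rng.normal(size=D)
w /= np.linalg.norm(w)                      # enforce ||w|| = 1

sum_of_squares = np.sum((X @ w) ** 2)       # sum_i (x_i w)^2
quadratic_form = w @ X.T @ X @ w            # w^T X^T X w
print(np.isclose(sum_of_squares, quadratic_form))   # True
```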
In the end, it is known that the vector $\bm{w}$ that maximizes the above quadratic form is a unit eigenvector of $\bm{X}^\top\bm{X}$ corresponding to the largest eigenvalue of $\bm{X}^\top\bm{X}$. The rest of the principal vectors can be found in a similar manner.
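(And another small sketch of that last step, again with a made-up random $\bm{X}$: the unit eigenvector belonging to the largest eigenvalue of $\bm{X}^\top\bm{X}$ attains the largest value of the quadratic form among unit vectors.)

```python
import numpy as np

# Minimal sketch (made-up random X, just for illustration): the unit
# eigenvector of X^T X with the largest eigenvalue attains the largest
# value of w^T X^T X w over unit vectors w.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
A = X.T @ X                                  # the matrix in the quadratic form

eigvals, eigvecs = np.linalg.eigh(A)         # eigh: eigenvalues in ascending order
w_star = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
best = w_star @ A @ w_star                   # equals the largest eigenvalue

# Compare against many random unit vectors: none should exceed `best`.
W = rng.normal(size=(5, 10_000))
W /= np.linalg.norm(W, axis=0)               # normalize each column
others = np.sum(W * (A @ W), axis=0)         # w^T A w for each column w
print(best >= others.max())                  # True
```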
But see? I never had to bring up variance! My question is mostly about why we do bring up variance when discussing PCA.