I came across this while reading a UFLDL tutorial article in which an autoencoder is used to learn features of the input data. In that example, the number of hidden-layer neurons is equal to the number of inputs. The thing is, the input data is pre-processed with a whitening transformation, which is built on PCA: it rotates the data onto the principal components and rescales them to unit variance (see this). The purpose of this pre-processing is presumably to make the subsequent training/optimization easier (see this). Personally, I think PCA itself can be used as a feature learning algorithm by treating the identified principal components as features. What is done in the article then looks like the following (a sketch of this pipeline comes right after the list):
- Learn features using PCA
- Pre-process the data based on the features learned by PCA (whitening)
- Learn features again by training an autoencoder on the pre-processed data
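To make the setup concrete, here is a minimal sketch of that pipeline in Python/NumPy. Everything specific in it is my own assumption rather than taken from the tutorial: the toy Gaussian data, the whitening regularizer `eps`, the `tanh` activation, and plain gradient descent on squared reconstruction error (the tutorial's sparse autoencoder additionally uses a sparsity penalty, which I omit here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples, d features (stand-in for the tutorial's image patches)
n, d = 1000, 64
X = rng.normal(size=(n, d))

# --- Step 1: "learn features" with PCA ---
Xc = X - X.mean(axis=0)              # center the data
cov = Xc.T @ Xc / n                  # covariance matrix
eigvals, U = np.linalg.eigh(cov)     # columns of U = principal directions

# --- Step 2: whitening pre-processing based on the PCA features ---
eps = 1e-5                           # small regularizer (assumed value)
X_white = Xc @ U / np.sqrt(eigvals + eps)  # decorrelated, unit-variance data

# --- Step 3: train an autoencoder on the whitened data ---
h = d                                # hidden size equals input size, as in the article
W1 = rng.normal(scale=0.01, size=(d, h))   # encoder weights = autoencoder "features"
W2 = rng.normal(scale=0.01, size=(h, d))   # decoder weights
lr = 0.01
for _ in range(200):
    Z = np.tanh(X_white @ W1)        # hidden activations
    X_hat = Z @ W2                   # reconstruction
    err = X_hat - X_white
    gW2 = Z.T @ err / n              # gradient of mean squared error w.r.t. W2
    gZ = err @ W2.T * (1 - Z**2)     # backprop through tanh
    gW1 = X_white.T @ gZ / n
    W1 -= lr * gW1
    W2 -= lr * gW2

print("PCA feature set:         columns of U,  shape", U.shape)
print("Autoencoder feature set: columns of W1, shape", W1.shape)
```

The point of the sketch is just the data flow: `U` plays the role of the PCA feature set, `W1` that of the autoencoder feature set, and both have `d` columns because the hidden layer is as wide as the input.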
There are thus two feature sets, one learned by PCA and one by the autoencoder. Furthermore, the autoencoder is trained on data that was pre-processed using the features identified by PCA. Since the number of hidden-layer neurons equals the number of inputs, both feature sets have the same cardinality. My questions are:
- What are the characteristics of the two feature sets learned by the autoencoder and by PCA? A trivial fact is that the features learned by PCA (the principal directions) are mutually orthogonal, which is not necessarily true for the autoencoder's features (see the numerical check sketched after this list).
- What potential interaction effects should I be aware of when following the article's recipe? I already know that the whitening pre-processing is supposed to make the subsequent training easier.
- Or is my understanding totally wrong from the very beginning?
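For the first question, the orthogonality claim is easy to check numerically. Continuing from the sketch above (so `np`, `d`, `U`, and `W1` are the hypothetical names defined there):

```python
# PCA directions are orthonormal by construction (eigh returns orthonormal
# eigenvectors), so U.T @ U should be the identity; nothing constrains the
# autoencoder's encoder columns the same way.
I = np.eye(d)
print("PCA:         ||U.T @ U  - I|| =", np.linalg.norm(U.T @ U - I))    # ~0
print("Autoencoder: ||W1.T @ W1 - I|| =", np.linalg.norm(W1.T @ W1 - I)) # generally not small
```

The first norm is essentially zero by construction; the second has no reason to be small unless orthogonality is explicitly enforced during training.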