I came across this while reading a UFLDL tutorial article in which an autoencoder is used to learn features of the input data. In that example, the number of hidden-layer neurons is equal to the number of inputs. The thing is, the input data is pre-processed with a whitening transformation, which is built on PCA: it rotates the data onto the principal components and rescales them to unit variance (see this). The purpose of this pre-processing is presumably to make the subsequent training/optimization easier (see this). Personally, I think PCA itself can be used as a feature learning algorithm by treating the identified principal components as features. What is done in the article then looks like the following (a sketch of this pipeline comes right after the list):
- Learn features using PCA
- Pre-process the data based on the features learned by PCA (whitening)
- Learn features again by training an autoencoder on the pre-processed data
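To make the setup concrete, here is a minimal sketch of that pipeline in Python/NumPy. Everything specific in it is my own assumption rather than taken from the tutorial: the toy Gaussian data, the whitening regularizer `eps`, the `tanh` activation, and plain gradient descent on squared reconstruction error (the tutorial's sparse autoencoder additionally uses a sparsity penalty, which I omit here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples, d features (stand-in for the tutorial's image patches)
n, d = 1000, 64
X = rng.normal(size=(n, d))

# --- Step 1: "learn features" with PCA ---
Xc = X - X.mean(axis=0)              # center the data
cov = Xc.T @ Xc / n                  # covariance matrix
eigvals, U = np.linalg.eigh(cov)     # columns of U = principal directions

# --- Step 2: whitening pre-processing based on the PCA features ---
eps = 1e-5                           # small regularizer (assumed value)
X_white = Xc @ U / np.sqrt(eigvals + eps)  # decorrelated, unit-variance data

# --- Step 3: train an autoencoder on the whitened data ---
h = d                                # hidden size equals input size, as in the article
W1 = rng.normal(scale=0.01, size=(d, h))   # encoder weights = autoencoder "features"
W2 = rng.normal(scale=0.01, size=(h, d))   # decoder weights
lr = 0.01
for _ in range(200):
    Z = np.tanh(X_white @ W1)        # hidden activations
    X_hat = Z @ W2                   # reconstruction
    err = X_hat - X_white
    gW2 = Z.T @ err / n              # gradient of mean squared error w.r.t. W2
    gZ = err @ W2.T * (1 - Z**2)     # backprop through tanh
    gW1 = X_white.T @ gZ / n
    W1 -= lr * gW1
    W2 -= lr * gW2

print("PCA feature set:         columns of U,  shape", U.shape)
print("Autoencoder feature set: columns of W1, shape", W1.shape)
```

The point of the sketch is just the data flow: `U` plays the role of the PCA feature set, `W1` that of the autoencoder feature set, and both have `d` columns because the hidden layer is as wide as the input.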
There are thus two feature sets, one learned by PCA and one by the autoencoder. Furthermore, the autoencoder is trained on data that was pre-processed using the features identified by PCA. Since the number of hidden-layer neurons equals the number of inputs, both feature sets have the same cardinality. My questions are:
- What are the characteristics of the two feature sets learned by the autoencoder and by PCA? A trivial fact is that the features learned by PCA (the principal directions) are mutually orthogonal, which is not necessarily true for the autoencoder's features (see the numerical check sketched after this list).
- What potential interaction effects should I be aware of when following the article's recipe? I already know that the whitening pre-processing is supposed to make the subsequent training easier.
- Or is my understanding totally wrong from the very beginning?
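For the first question, the orthogonality claim is easy to check numerically. Continuing from the sketch above (so `np`, `d`, `U`, and `W1` are the hypothetical names defined there):

```python
# PCA directions are orthonormal by construction (eigh returns orthonormal
# eigenvectors), so U.T @ U should be the identity; nothing constrains the
# autoencoder's encoder columns the same way.
I = np.eye(d)
print("PCA:         ||U.T @ U  - I|| =", np.linalg.norm(U.T @ U - I))    # ~0
print("Autoencoder: ||W1.T @ W1 - I|| =", np.linalg.norm(W1.T @ W1 - I)) # generally not small
```

The first norm is essentially zero by construction; the second has no reason to be small unless orthogonality is explicitly enforced during training.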