There's a set of methods called "robust" principal component analysis (here, "robust" means resistant to influence from outliers). One example is Hubert et al., "ROBPCA: A new approach to robust principal component analysis," Technometrics (2005): https://doi.org/10.1198/004017004000000563. In that paper, a subset of the observations (say, 75%) is used to estimate the principal components, on the assumption that those observations are non-outliers. The paper then proposes diagnostics intended to flag candidate outliers.
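To make that subset idea concrete, here is a toy illustration in Python/NumPy. To be clear, this is a crude trimmed PCA of my own devising, not the actual ROBPCA algorithm (which combines projection-pursuit ideas with MCD-type covariance estimation); all names and numbers here are mine. It just shows the mechanism: fitting PCA on only the most central 75% of points resists contamination that drags conventional PCA off the bulk's main axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bulk of the data: correlated 2-D Gaussian whose first population PC
# points along (1, 1) / sqrt(2).
n = 200
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Contaminate 10% of the observations with a cluster off the main axis.
n_out = 20
X[:n_out] = rng.multivariate_normal([6.0, -6.0], 0.1 * np.eye(2), size=n_out)

def pca_first_component(Z):
    """First eigenvector of the sample covariance matrix of Z."""
    vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    return vecs[:, np.argmax(vals)]

# Conventional PCA on all observations.
v_all = pca_first_component(X)

# Crude trimmed variant in the spirit of ROBPCA's h-subset idea
# (NOT the real algorithm): keep the h = 75% of points closest to the
# coordinatewise median and fit PCA on that subset only.
h = int(0.75 * len(X))
d = np.linalg.norm(X - np.median(X, axis=0), axis=1)
X_clean = X[np.argsort(d)[:h]]
v_robust = pca_first_component(X_clean)

print("conventional PC1:", np.round(v_all, 3))    # pulled toward the
                                                  # contamination, ~ +/-(0.71, -0.71)
print("trimmed PC1:     ", np.round(v_robust, 3)) # near the bulk axis,
                                                  # ~ +/-(0.71, 0.71)
```

On this simulated data the conventional first PC rotates toward the (1, −1) contamination direction, while the trimmed fit stays near the bulk's (1, 1)/√2 axis. (Eigenvector signs are arbitrary, hence the ±.)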
I can see the value in a PCA method that helps identify outliers, provided one then investigates the outliers it finds. If some of them are deemed inappropriate to include with the rest of the data (perhaps because they represent contaminated measurements, or come from a population too different from the rest to justify lumping together), they can be removed. But suppose instead that some observations flagged as "outliers" are judged to be legitimately in-sample, and should not be modified or excluded. Then I'm nervous about using the resulting PCs as a substitute for conventional (non-robust) PCs. There is theory for what conventional PCs mean and what they estimate in the population. I don't know what the analogue is for robust PCs: what they estimate, and whether what they estimate is desirable or meaningful.
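To pin down what I mean by "theory" here (this is my gloss, the standard textbook result, not a claim from the paper): under the usual conditions (i.i.d. sampling, finite fourth moments, distinct eigenvalues), the conventional sample PC directions are consistent, up to sign, for the eigenvectors of the population covariance matrix,

$$\Sigma = \operatorname{Cov}(X) = \sum_{j} \lambda_j v_j v_j^\top, \qquad \hat{v}_j \xrightarrow{\;p\;} \pm v_j,$$

so I know exactly which population quantity a conventional PC is pointing at. My question is what the corresponding population target is for a robust PC.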
So suppose that the observations identified by robust PCA as "outliers" are kept in the sample as-is. What is robust PCA then estimating in the population, and why should I care about it? Why should I continue to use robust PCA?