I have a dataset with 5 groups: 3 consist of patients with different cancer types, one consists of patients with a benign tumour, and one is a healthy control group. Each protein is either not detected (intensity = 0) or has some nonzero intensity. These are not absolute quantifications, so the differences between the values are rather abstract (not meaningful as absolute amounts). The sample size is small: fewer than 90 subjects overall.
What the researchers did was perform PCA on the whole set of predictors (200+ proteins): there are no distinctive patterns between the groups. However, when they use a subset of proteins (~50, selected through a literature review as having shown some association with cancer in previous studies), some mild separation is visible between the cancer groups and the other groups.
First of all, the dataset is 'zero-inflated': many proteins are simply not detected (intensity = 0) in most samples.
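To put numbers on the zero inflation, one could tabulate, for each protein, the fraction of samples in each group in which it was detected. Below is a minimal standard-library Python sketch under the assumption that an intensity of exactly 0 means "not detected". The function names `detection_rates` and `mostly_missing`, the 0.8 cutoff, and the toy matrix are all hypothetical illustrations, not data or code from the study.

```python
from collections import defaultdict


def detection_rates(intensities, groups):
    """Fraction of samples with nonzero intensity, per group and per protein.

    intensities: one list per sample, one value per protein (0 = not detected).
    groups: one group label per sample, in the same order as `intensities`.
    """
    if len(intensities) != len(groups):
        raise ValueError("need exactly one group label per sample")
    n_proteins = len(intensities[0])
    detected = defaultdict(lambda: [0] * n_proteins)  # detection counts per group
    n_samples = defaultdict(int)                       # sample counts per group
    for row, group in zip(intensities, groups):
        n_samples[group] += 1
        for j, value in enumerate(row):
            if value > 0:  # assumption: 0 means "not detected"
                detected[group][j] += 1
    return {g: [count / n_samples[g] for count in detected[g]] for g in n_samples}


def mostly_missing(rates, threshold=0.8):
    """Indices of proteins whose non-detection fraction exceeds `threshold` in every group."""
    n_proteins = len(next(iter(rates.values())))
    return [j for j in range(n_proteins)
            if all(1 - r[j] > threshold for r in rates.values())]


# Toy data (made up): 4 samples x 3 proteins.
X = [[0, 5.2, 0.0],
     [0, 3.1, 1.0],
     [0, 0.0, 2.2],
     [0, 4.0, 0.0]]
labels = ["cancer", "cancer", "healthy", "healthy"]

rates = detection_rates(X, labels)
print(rates)                  # {'cancer': [0.0, 1.0, 0.5], 'healthy': [0.0, 0.5, 0.5]}
print(mostly_missing(rates))  # [0]
```

A table like this makes it visible which proteins are almost never detected in any group and therefore contribute little beyond zeros to the PCA.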
Second, as there is no clear separation, how can this situation be interpreted?
Third, what other ways are there to perform statistical inference on these data (other than, perhaps, a Fisher test on the table of nonzero vs. zero counts between cancer and non-cancer samples)?
Thanks in advance