For example:
Let's say I have dataset A:
Measured body temperature of a person during the day.
I have measurements from 3 people over the span of a year.
If I cluster it, I expect the clusters to tell me something that helps me give advice to each person, e.g. "Drink more water at such-and-such hour of the day".
So I use k-means to cluster the data from person #1 in dataset A. (100 000 points - no ground truth)
It gives me 5 clusters.
A new dataset (B) becomes available from a newer data-collection system, and this time I get measurements from 9 people.
I cluster data from person #1 in dataset B in the same way. (200 000 points - no ground truth)
It gives me 3 clusters.
I want to see if the performance suffered, improved or stayed the same.
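A minimal sketch of this setup, assuming synthetic 1-D temperatures in place of the real data and a plain Lloyd's k-means. The helper names (`synthetic_temps`, `kmeans_1d`, `mean_silhouette`), the sample sizes and the temperature modes are all hypothetical. The mean silhouette is used here only as one candidate yardstick: it is an average per point and bounded in [-1, 1], so it does not directly depend on the number of clusters or the amount of data. Whether it is the right measure for this problem is exactly what is being asked.

```python
import random


def synthetic_temps(modes, n, seed):
    """Hypothetical stand-in for real data: n readings around the given modes."""
    rng = random.Random(seed)
    return [rng.gauss(rng.choice(modes), 0.05) for _ in range(n)]


def kmeans_1d(values, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on 1-D data; returns one cluster label per value."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # k initial centers picked from the data

    def assign():
        # Label each value with the index of its nearest center.
        return [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = sum(members) / len(members)
    return assign()


def mean_silhouette(values, labels):
    """Mean silhouette over all points, a value in [-1, 1]. Costs O(n^2)."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")

    total = 0.0
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other points in the same cluster
        a = sum(abs(v - w) for w in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster
        b = min(
            sum(abs(v - w) for w in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(values)


# Person #1 in dataset A (5 clusters) and in dataset B (3 clusters),
# with different amounts of data. All values are synthetic.
temps_a = synthetic_temps([36.0, 36.5, 37.0, 37.5, 38.0], 300, seed=1)
temps_b = synthetic_temps([36.2, 37.0, 37.8], 600, seed=2)

score_a = mean_silhouette(temps_a, kmeans_1d(temps_a, 5))
score_b = mean_silhouette(temps_b, kmeans_1d(temps_b, 3))
print(f"A: k=5, n={len(temps_a)}, mean silhouette={score_a:.3f}")
print(f"B: k=3, n={len(temps_b)}, mean silhouette={score_b:.3f}")
```

Because the silhouette computation is quadratic in the number of points, it would have to be run on a random subsample for datasets of 100 000 to 200 000 points. Note also that a higher score only says the clusters are more compact and better separated, not that they are more useful for giving advice.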
Question(s):
How would I go about:
- Comparing their performance (considering the difference in the number of clusters, the difference in the amount of data, and perhaps differences in data quality that are not visible to the naked eye when skimming through the data, etc.). Can it be a 1-to-1 comparison at all?
- Validating it (because both could be bad/wrong), or at least, how do I choose a sensible yardstick?
EDIT: Actually, the yardstick could be high temperature and low temperature, or anything in between, to give some kind of "better/worse" direction. But still, how do I compare the two when the number of clusters differs and the amount of data differs?