What can I do to assess a classifier's accuracy when one class is rare?
Setup 1: I have 1000 boxes, of which 500 contain gold. I built an automated device to find the gold.
The recommended approach would be to open N randomly chosen boxes and compare their contents with the device's predictions. Stratified sampling would be a (better) alternative: pick N/2 empty boxes and N/2 gold boxes and calculate the accuracy on empty and gold boxes separately (i.e., specificity and sensitivity), since this gives balanced accuracy estimates, which are more interesting to me.
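A small simulation can make the two sampling schemes of Setup 1 concrete. This is only a sketch under stated assumptions: the device is a hypothetical simulated classifier with a chosen sensitivity and specificity, the function names are made up for illustration, and, as in the question, gold and empty boxes are assumed to be identifiable when drawing the stratified sample.

```python
import random

def make_boxes(n_boxes, n_gold, sensitivity, specificity, seed=0):
    """Simulate ground truth (True = gold) and a HYPOTHETICAL device's predictions."""
    rng = random.Random(seed)
    truth = [True] * n_gold + [False] * (n_boxes - n_gold)
    preds = []
    for is_gold in truth:
        if is_gold:
            # Device says "gold" for a gold box with probability `sensitivity`.
            preds.append(rng.random() < sensitivity)
        else:
            # Device correctly says "empty" with probability `specificity`.
            preds.append(not (rng.random() < specificity))
    return truth, preds

def random_sample_accuracy(truth, preds, n, seed=2):
    """Simple random sampling: open n boxes, report overall accuracy."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(truth)), n)
    return sum(truth[i] == preds[i] for i in idx) / n

def stratified_estimates(truth, preds, n_per_class, seed=1):
    """Stratified sampling: n_per_class gold and n_per_class empty boxes.

    Returns (sensitivity, specificity, balanced accuracy) estimates.
    """
    rng = random.Random(seed)
    gold_idx = [i for i, t in enumerate(truth) if t]
    empty_idx = [i for i, t in enumerate(truth) if not t]
    gold_sample = rng.sample(gold_idx, n_per_class)
    empty_sample = rng.sample(empty_idx, n_per_class)
    sens = sum(preds[i] for i in gold_sample) / n_per_class
    spec = sum(not preds[i] for i in empty_sample) / n_per_class
    return sens, spec, (sens + spec) / 2

# Setup 1 with an assumed device (sensitivity 0.9, specificity 0.8).
truth, preds = make_boxes(1000, 500, sensitivity=0.9, specificity=0.8)
print(random_sample_accuracy(truth, preds, n=100))
print(stratified_estimates(truth, preds, n_per_class=50))
```

Balanced accuracy here is simply the mean of the two per-class accuracies, so each class contributes equally regardless of how many boxes of that class exist.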
Setup 2: I have 1000 boxes, of which only 5 contain gold. For robust estimates I would need to open hundreds of boxes. Wouldn't it be much easier to let the device point at what it thinks are the 5 gold boxes and just check whether it was correct?
However, the last approach would introduce bias, correct? What else could I do?
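One way to see what the "check only the boxes the device points at" approach measures is to simulate it. This is a sketch under assumptions: the device is hypothetical, with an assumed hit rate on gold boxes and an assumed false-alarm rate on empty boxes, and the helper names are illustrative only. Opening only the flagged boxes gives the fraction of flags that are really gold (precision); gold boxes the device missed are never opened, so that check alone cannot reveal them.

```python
import random

def simulate(n_boxes, n_gold, sensitivity, false_positive_rate, seed=0):
    """Ground truth (True = gold) plus predictions from a HYPOTHETICAL device."""
    rng = random.Random(seed)
    truth = [i < n_gold for i in range(n_boxes)]
    # Flag a gold box with prob. `sensitivity`, an empty box with prob. `false_positive_rate`.
    preds = [rng.random() < (sensitivity if t else false_positive_rate) for t in truth]
    return truth, preds

def check_flagged_only(truth, preds):
    """Open only flagged boxes; return (number flagged, fraction of them that are gold)."""
    flagged = [i for i, p in enumerate(preds) if p]
    if not flagged:
        return 0, None
    hits = sum(truth[i] for i in flagged)
    return len(flagged), hits / len(flagged)

def true_sensitivity(truth, preds):
    """Needs the label of every gold box, including ones the device never flagged."""
    gold = [i for i, t in enumerate(truth) if t]
    return sum(preds[i] for i in gold) / len(gold)

# Setup 2 with an assumed device.
truth, preds = simulate(1000, 5, sensitivity=0.6, false_positive_rate=0.002)
print(check_flagged_only(truth, preds), true_sensitivity(truth, preds))
```

For instance, with four boxes where two contain gold and the device flags only one of the gold boxes and nothing else, the flagged-only check reports 1.0 while the true sensitivity is 0.5.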