Context
Suppose one has a public dataset `plant_labels` with:
- Input: plant pictures
- Labels: plant names
And a larger public model `object` for object recognition with:
- Input: object pictures
- Labels: object names
which has been trained on various public datasets, except for `plant_labels`. (A minimal sketch of this setup follows.)
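
For concreteness, this is roughly what the setup could look like in code. The architecture (a torchvision ResNet-18 standing in for `object`) and the on-disk layout of `plant_labels` (one folder per plant name) are my assumptions, not part of the question:

```python
import torchvision
from torchvision import transforms
from torchvision.datasets import ImageFolder

# `object`: a public model pretrained on public datasets
# (ResNet-18 on ImageNet is just a stand-in)
object_model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# `plant_labels`: plant pictures labelled with plant names, assumed here
# to be stored as plant_labels/<plant name>/<image>.jpg
plant_labels = ImageFolder(
    "plant_labels/",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),  # match the input size the backbone expects
        transforms.ToTensor(),
    ]),
)
```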
Scenarios
Suppose two different scenarios:
1. One applies homomorphic encryption to the `plant_labels` dataset and trains a model `HE_plant_I` on it (see the sketch after this list). Next, one applies transfer learning to improve the larger model `object` with it.
2. One first trains a model `plant_II` without homomorphic encryption, then applies transfer learning towards a `HE_plant_II` model with homomorphic encryption, and then applies transfer learning again from the `HE_plant_II` model towards the `object` model to improve it.
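
Both scenarios hinge on the same step: training on homomorphically encrypted data. Below is a minimal sketch of that step, assuming TenSEAL's CKKS scheme and an encrypted logistic regression as a toy stand-in for `HE_plant_I` (far simpler than an image classifier); the pattern of decrypting gradients to update plaintext weights follows TenSEAL's training tutorial, and all names and hyperparameters here are illustrative assumptions:

```python
import tenseal as ts

# CKKS context; the secret key stays with the data owner
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[40, 21, 21, 21, 21, 21, 21, 40])
ctx.global_scale = 2 ** 21
ctx.generate_galois_keys()  # rotation keys, needed for the encrypted dot product

class EncryptedLR:
    """Toy `HE_plant_I`: logistic regression trained on encrypted features."""
    def __init__(self, n_features):
        self.weight = [0.0] * n_features  # parameters stay in plaintext here
        self.bias = 0.0

    def forward(self, enc_x):
        # degree-3 polynomial approximation of sigmoid (accurate roughly on [-5, 5])
        return (enc_x.dot(self.weight) + self.bias).polyval([0.5, 0.197, 0, -0.004])

    def step(self, enc_x, enc_y, lr=0.1):
        err = self.forward(enc_x) - enc_y  # encrypted prediction error
        grad_w = enc_x * err               # encrypted gradient w.r.t. the weights
        # the data owner decrypts the gradients and updates the plaintext weights
        self.weight = [w - lr * g for w, g in zip(self.weight, grad_w.decrypt())]
        self.bias -= lr * err.decrypt()[0]

model = EncryptedLR(n_features=4)
enc_x = ts.ckks_vector(ctx, [0.5, -1.2, 0.3, 0.8])  # one encrypted feature vector
enc_y = ts.ckks_vector(ctx, [1.0])                  # its encrypted label
model.step(enc_x, enc_y)
```

A real image model would additionally need an HE-friendly architecture (polynomial activations, shallow multiplicative depth), which is why encrypted training in practice is usually limited to a small head on top of precomputed features.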
Assumptions
I assume that scenario 1 and/or 2 is possible; if this assumption is invalid, feel free to clarify that.
Question
Assuming scenario 1 and/or 2 is possible, can it be done in such a way that others* cannot determine, or prove, on which dataset you have trained `HE_plant_I` (or `HE_plant_II`)?
*Others who also have access to all the datasets on which `object` is trained, as well as to the `plant_labels` dataset (but not, e.g., to the random seed you used for training).
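
To make the footnote concrete, here is a toy illustration (PyTorch; the model and seeds are arbitrary choices of mine) of why withholding the seed matters: two runs that differ only in seed already yield different weights, so others cannot simply re-run your training bit-for-bit from the public datasets and compare:

```python
import torch

def init_model(seed):
    torch.manual_seed(seed)        # the seed that others do not have
    return torch.nn.Linear(4, 2)   # toy stand-in for a trainable model head

w_a = init_model(seed=0).weight
w_b = init_model(seed=1).weight
print(torch.allclose(w_a, w_b))    # False: weights differ before any data is seen
```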