I have an open-source dataset that I would like to use in a deep learning course, but the downside of using open-source datasets is that open-source solutions already exist. I want to manipulate the data so that students cannot reuse previously trained architectures, while still allowing them to undo the manipulation once the assignment is complete. This will let them compare their results to existing results on the same data.
Is it possible to manipulate the data such that the student will require a different neural network architecture than the one that is optimal for the original dataset? In addition, is it possible to make it so that the student can undo the manipulation and perform similarly well on the original dataset with their unique architecture?
One suggestion is to use an ill-conditioned matrix to manipulate the data while preserving the information, but it is not clear to me how you could use the inverse to make the neural network architecture work on the original dataset (without retraining).
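For concreteness, here is a minimal sketch of what that kind of manipulation could look like on flattened numeric or image data. The matrix, its conditioning, and the dataset shape are all illustrative assumptions, not a worked-out scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for a flattened open-source dataset:
# 1000 samples, 64 features (e.g. 8x8 images flattened).
X = rng.normal(size=(1000, 64))

# Fixed invertible mixing matrix known only to the instructor.
# Its condition number controls how "scrambled" the data looks
# and how much numerical error the inversion introduces.
W = rng.normal(size=(64, 64))
print("condition number:", np.linalg.cond(W))

# Transformed dataset handed to the students.
X_scrambled = X @ W

# Undoing the manipulation on the *data* is straightforward:
X_recovered = X_scrambled @ np.linalg.inv(W)
print("max recovery error:", np.max(np.abs(X_recovered - X)))
```

This only shows undoing the transform on the data itself; the part I am unsure about is how (or whether) the same inverse could be absorbed into the student's trained network so that it runs on the original dataset without retraining.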
To be clear, it is known that, for the original open-source dataset:
1. An architecture that models the data well exists and is publicly available,
2. The trained model from (1) is publicly available.
Therefore, I would like to transform the open-source dataset such that:
1. The student needs to figure out a new architecture that is not publicly known,
2. As a consequence of (1), the student must train their own model,
3. Without re-training the model, the student is able to apply a transformation (deduced from the original transformation) so that their model works on the original open-source dataset.
The question is: what transform could be applied to the original data, and how do you undo that transform in the trained model? The datasets involved vary and include recordings, images, and numeric data, for both classification and regression.