I am working on a project that uses training data selection techniques, i.e., sampling the training set in some smart way rather than uniformly at random. The goal is to compare different data selection techniques by their downstream-task accuracy, which requires sampling many training sets.
Suppose I have a large dataset (much larger than 12000 examples) to sample from, and I want splits of train : validation : test = 10000 : 1000 : 1000. After I randomly sample the test set, I have two choices for constructing the training and validation sets:
- Option 1: First smartly sample 11000 examples, then randomly hold out 1000 of them as the validation set, leaving 10000 for training.
- Option 2: Independently and randomly sample a validation set of 1000 first, then smartly sample a training set of 10000 from the remainder.
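To make the two procedures concrete, here is a minimal sketch in NumPy. The `smart_sample` function is a hypothetical placeholder for whatever data selection technique is being compared (here it just samples uniformly, so the code runs); the dataset is represented by integer indices.

```python
import numpy as np

rng = np.random.default_rng(0)

def smart_sample(pool, k, rng):
    # Placeholder for a data selection technique; a real one would
    # score examples and pick the top k. Here: uniform random choice.
    return rng.choice(pool, size=k, replace=False)

N = 50_000  # assumed full-dataset size, much larger than 12000
all_idx = np.arange(N)

# Shared first step: randomly sample the test set.
test = rng.choice(all_idx, size=1000, replace=False)
pool = np.setdiff1d(all_idx, test)

# Option 1: smartly sample 11000, then hold out a random 1000 for validation.
train_val = smart_sample(pool, 11_000, rng)
val1 = rng.choice(train_val, size=1000, replace=False)
train1 = np.setdiff1d(train_val, val1)

# Option 2: fix a random validation set of 1000 first, then smartly
# sample 10000 training examples from what remains.
val2 = rng.choice(pool, size=1000, replace=False)
train2 = smart_sample(np.setdiff1d(pool, val2), 10_000, rng)

print(len(train1), len(val1), len(train2), len(val2))  # 10000 1000 10000 1000
```

Under Option 1, `val1` is drawn from the smartly selected pool, so it inherits the selected distribution but changes with every resampled training set; under Option 2, `val2` can be drawn once and reused across all runs.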
Though the two options may not look very different at first glance, they have two practical implications:
- Option 1 makes the validation distribution match the training distribution. However, since I need to sample multiple training sets, the validation sets will all differ across runs.
- Option 2 keeps the validation set fixed across all runs. However, its distribution differs from that of the (smartly sampled) training set.
Which option should I take, and why?