Assume I have a fancy procedure $w: X \to \mathbb{R}$ to come up with weights for examples $x \in X$. Think of it as similar to the weights used in e.g. some boosting procedures.
Now, I want to build a classification Random Forest ensemble using bootstrapping, i.e. each tree is constructed on a randomly sampled subset of the data. The tree construction procedure should take the weights into account.
There are two possible approaches:
- Perform a weighted bootstrap sampling, i.e. examples are not selected with uniform probability but according to their weights $w$
- Perform uniform bootstrapping and use the weights in the tree induction procedure when counting the number of points of each class (e.g. as described here)
In both cases, we draw $n$ samples with replacement.
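To make the two variants concrete, here is a minimal sketch in Python/scikit-learn; the dataset and the weights are placeholders for my $X$ and $w$:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data
n = len(X)
rng = np.random.default_rng(0)
w = rng.random(n)              # placeholder for the fancy weighting procedure w
p = w / w.sum()                # normalize weights to sampling probabilities

# Approach 1: weighted bootstrap, plain tree induction
idx1 = rng.choice(n, size=n, replace=True, p=p)
tree1 = DecisionTreeClassifier(random_state=0).fit(X[idx1], y[idx1])

# Approach 2: uniform bootstrap, weights enter the induction as sample weights
idx2 = rng.choice(n, size=n, replace=True)
tree2 = DecisionTreeClassifier(random_state=0).fit(X[idx2], y[idx2], sample_weight=w[idx2])
```

(A full Random Forest would also subsample features per split; I'm leaving that out since it's orthogonal to the question.)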
Is there a difference between the two approaches?
My motivation / context is as follows: I want to construct a Random Forest tree-by-tree, where the construction of the next tree is biased towards complementing the ensemble built so far (similar to boosting). One possible weighting function could be the fraction of trees that misclassify the point. In the reference linked here, they indeed do both: weighted bootstrapping and weighted tree construction (as clarified here).
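For illustration, a sketch of that weighting function; the helper name and the smoothing constant `eps` are my own additions (without smoothing, examples the ensemble already gets right would end up with sampling probability zero):

```python
import numpy as np

def misclassification_weights(trees, X, y, eps=1e-3):
    """Fraction of the trees built so far that misclassify each example."""
    if not trees:
        return np.ones(len(X))                       # no trees yet: uniform weights
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return np.mean(votes != y, axis=0) + eps         # smoothed misclassification rate
```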
I've done experiments with only weighted bootstrapping and that seems to "work" as well, so I'm curious.
Some thoughts of mine: since we are sampling with replacement, each example may appear more than once or not at all in the bootstrap sample. If we perform weighted bootstrapping, examples are over- or underrepresented according to their weight, i.e. high-weight examples effectively appear multiple times in the tree induction procedure. That is vaguely similar to the example appearing once but being counted with a higher weight in the induction procedure itself.
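In fact, the similarity is exact at the level of the class counts the induction procedure works with: an example drawn $k$ times contributes to the weighted class counts exactly as if it appeared once with weight $k$ (the analogy is only "vague" because stopping criteria that count raw rows, such as a minimum number of samples per leaf, treat the two cases differently). A quick check:

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data
n = len(X)
rng = np.random.default_rng(0)

idx = rng.choice(n, size=n, replace=True)    # uniform bootstrap sample
counts = np.bincount(idx, minlength=n)       # multiplicity of each example

# class counts at the root node are identical either way
for c in np.unique(y):
    dup = np.sum(y[idx] == c)     # duplicated rows, each with unit weight
    wgt = np.sum(counts[y == c])  # unique rows, integer sample weights
    assert dup == wgt
```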
We know that for uniform bootstrapping, the expected fraction of unique data points in an individual bootstrap sample is, in the limit, $$ 1-\lim _{N \rightarrow \infty}\left(1-\frac{1}{N}\right)^N=1-e^{-1} \approx 0.632. $$
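A quick simulation confirms this:

```python
import numpy as np

N = 100_000
idx = np.random.default_rng(0).integers(0, N, size=N)  # one uniform bootstrap sample
print(len(np.unique(idx)) / N)                         # ~0.632
```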
If we were to do weighted bootstrapping, the per-draw probability of example $i$ would be $p_i = \frac{w_i}{\sum_j w_j}$ instead of $\frac{1}{N}$, and the expected fraction of unique points would become $\frac{1}{N} \sum_i \left(1-\left(1-p_i\right)^N\right)$. Since $1-(1-p)^N$ is concave in $p$, Jensen's inequality says this fraction is maximized by uniform weights; weighted bootstrapping therefore yields at most as many unique points in expectation, and strictly fewer whenever the weights are non-uniform.
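A numerical check with deliberately skewed weights (the exponential distribution here is just my placeholder for a non-uniform $w$):

```python
import numpy as np

N = 100_000
rng = np.random.default_rng(0)
w = rng.exponential(size=N)          # skewed placeholder weights
p = w / w.sum()

# expected fraction of unique points under weighted bootstrapping
print(np.sum(1 - (1 - p) ** N) / N)  # ~0.50 here, clearly below 0.632
```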
So, if we used uniform bootstrapping with later weighting, we would cover more of the original data with each bootstrap sample. Low-weight examples would still appear but would simply be weighted lower during induction.
On the other hand, with weighted bootstrapping, low-weight examples are less likely to appear in the bootstrap sample at all.
So my guess would be that the two approaches can be quite different. Does that make sense?