Assume I have a fancy procedure $w: X \to \mathbb{R}$ to come up with weights for examples $x \in X$. Think of it as similar to the weights used in e.g. some boosting procedures.
Now, I want to build a classification Random Forest ensemble using bootstrapping, i.e. each tree is constructed on a randomly sampled subset of the data. The tree construction procedure should take the weights into account.
There are two possible approaches:
- Perform a weighted bootstrap sampling, i.e. examples are not selected with uniform probability but according to their weights $w$
- Perform uniform bootstrapping and use the weights in the tree induction procedure when counting the number of points of each class (e.g. as described here)
In both cases, we draw $n$ samples with replacement.
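To make the two variants concrete, here is a minimal sketch in Python/scikit-learn; the dataset and the weights are placeholders for my $X$ and $w$:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data
n = len(X)
rng = np.random.default_rng(0)
w = rng.random(n)              # placeholder for the fancy weighting procedure w
p = w / w.sum()                # normalize weights to sampling probabilities

# Approach 1: weighted bootstrap, plain tree induction
idx1 = rng.choice(n, size=n, replace=True, p=p)
tree1 = DecisionTreeClassifier(random_state=0).fit(X[idx1], y[idx1])

# Approach 2: uniform bootstrap, weights enter the induction as sample weights
idx2 = rng.choice(n, size=n, replace=True)
tree2 = DecisionTreeClassifier(random_state=0).fit(X[idx2], y[idx2], sample_weight=w[idx2])
```

(A full Random Forest would also subsample features per split; I'm leaving that out since it's orthogonal to the question.)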
Is there a difference between the two approaches?
My motivation / context is as follows: I want to construct a Random Forest tree-by-tree, where the construction of the next tree is biased towards complementing the ensemble built so far (similar to boosting). One possible weighting function could be the fraction of trees that misclassify the point. In the reference linked here, they indeed do both: weighted bootstrapping and weighted tree construction (as clarified here).
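For illustration, a sketch of that weighting function; the helper name and the smoothing constant `eps` are my own additions (without smoothing, examples the ensemble already gets right would end up with sampling probability zero):

```python
import numpy as np

def misclassification_weights(trees, X, y, eps=1e-3):
    """Fraction of the trees built so far that misclassify each example."""
    if not trees:
        return np.ones(len(X))                       # no trees yet: uniform weights
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return np.mean(votes != y, axis=0) + eps         # smoothed misclassification rate
```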
I've done experiments with only weighted bootstrapping and that seems to "work" as well, so I'm curious.
Some thoughts of mine: since we are sampling with replacement, each example may appear more than once or not at all in the bootstrap sample. If we perform weighted bootstrapping, examples are over- or underrepresented according to their weight, i.e. high-weight examples effectively appear multiple times in the tree induction procedure. That is vaguely similar to the example appearing once but being counted with a higher weight in the induction procedure itself.
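In fact, the similarity is exact at the level of the class counts the induction procedure works with: an example drawn $k$ times contributes to the weighted class counts exactly as if it appeared once with weight $k$ (the analogy is only "vague" because stopping criteria that count raw rows, such as a minimum number of samples per leaf, treat the two cases differently). A quick check:

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data
n = len(X)
rng = np.random.default_rng(0)

idx = rng.choice(n, size=n, replace=True)    # uniform bootstrap sample
counts = np.bincount(idx, minlength=n)       # multiplicity of each example

# class counts at the root node are identical either way
for c in np.unique(y):
    dup = np.sum(y[idx] == c)     # duplicated rows, each with unit weight
    wgt = np.sum(counts[y == c])  # unique rows, integer sample weights
    assert dup == wgt
```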
We know that for uniform bootstrapping, the expected fraction of unique data points in an individual bootstrap sample is, in the limit, $$ 1-\lim _{N \rightarrow \infty}\left(1-\frac{1}{N}\right)^N=1-e^{-1} \approx 0.632. $$
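A quick simulation confirms this:

```python
import numpy as np

N = 100_000
idx = np.random.default_rng(0).integers(0, N, size=N)  # one uniform bootstrap sample
print(len(np.unique(idx)) / N)                         # ~0.632
```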
If we were to do weighted bootstrapping, the per-draw probability of example $i$ would be $p_i = \frac{w_i}{\sum_j w_j}$ instead of $\frac{1}{N}$, and the expected fraction of unique points would become $\frac{1}{N} \sum_i \left(1-\left(1-p_i\right)^N\right)$. Since $1-(1-p)^N$ is concave in $p$, Jensen's inequality says this fraction is maximized by uniform weights; weighted bootstrapping therefore yields at most as many unique points in expectation, and strictly fewer whenever the weights are non-uniform.
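A numerical check with deliberately skewed weights (the exponential distribution here is just my placeholder for a non-uniform $w$):

```python
import numpy as np

N = 100_000
rng = np.random.default_rng(0)
w = rng.exponential(size=N)          # skewed placeholder weights
p = w / w.sum()

# expected fraction of unique points under weighted bootstrapping
print(np.sum(1 - (1 - p) ** N) / N)  # ~0.50 here, clearly below 0.632
```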
So, if we used uniform bootstrapping with later weighting, we would cover more of the original data with each bootstrap sample. Low-weight examples would still appear but would simply be weighted lower during induction.
On the other hand, with weighted bootstrapping, low-weight examples are less likely to appear in the bootstrap sample at all.
So my guess would be that the two approaches can be quite different. Does that make sense?