Given three training sets: $(X_1, y_1)$, $(X_2, y_2)$, and $(X_3, y_3)$.
These three datasets are separated according to a label that is manually tagged during preprocessing.
Based on these datasets, three classifiers can be trained:
$P(\text{class of } x = 1) = f_1(x)$
$P(\text{class of } x = 1) = f_2(x)$
$P(\text{class of } x = 1) = f_3(x)$
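(For concreteness, here is a minimal sketch of this setup, assuming scikit-learn and binary labels; the three $(X_i, y_i)$ arrays below are random placeholders, not my real data.)

```python
# Minimal sketch of the setup: one classifier per manually tagged subset.
# The datasets here are random placeholders (assumption, not real data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
datasets = [
    (rng.normal(size=(100, 5)), rng.integers(0, 2, size=100))
    for _ in range(3)
]

# f_i(x) = P(class of x = 1) estimated from dataset i.
classifiers = [LogisticRegression().fit(X, y) for X, y in datasets]

def f(i, x):
    """Probability that x has class 1 according to classifier i."""
    return classifiers[i].predict_proba(x.reshape(1, -1))[0, 1]
```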
When a new data point comes in, the manually tagged label is missing, so we don't know which classifier we should use; however, we know the distribution of the manually tagged labels:
$P(x \text{ belongs to tag } 1)$, $P(x \text{ belongs to tag } 2)$, and $P(x \text{ belongs to tag } 3)$.
Does it make sense to say that the optimal model is
$$P(\text{class of } x = 1) = P(x \text{ belongs to tag } 1) \cdot f_1(x) + P(x \text{ belongs to tag } 2) \cdot f_2(x) + P(x \text{ belongs to tag } 3) \cdot f_3(x)?$$
If yes, are there any references related to this approach?
If no, how should I embed this information into the model?
Note: Due to technical constraints, the sizes of the three training sets do not match $P(x \text{ belongs to tag } 1)$, $P(x \text{ belongs to tag } 2)$, and $P(x \text{ belongs to tag } 3)$.
For instance, the three datasets may have the same size, but
$P(x \text{ belongs to tag } 1) = 0.5$
$P(x \text{ belongs to tag } 2) = 0.3$
$P(x \text{ belongs to tag } 3) = 0.2$
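(Continuing the sketch above, the proposed combination with these example tag probabilities would look like this; `x_new` is a placeholder input.)

```python
# Proposed weighted mixture, using the example tag probabilities above.
tag_probs = [0.5, 0.3, 0.2]  # P(x belongs to tag i)

def mixture_prob(x):
    """P(class of x = 1) as the tag-probability-weighted average of f_i(x)."""
    return sum(p * f(i, x) for i, p in enumerate(tag_probs))

x_new = rng.normal(size=5)   # placeholder input (assumption)
print(mixture_prob(x_new))   # i.e. 0.5*f_1(x) + 0.3*f_2(x) + 0.2*f_3(x)
```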
Update: Maybe I can formulate the question in a more mathematically precise way:
Given $k$ datasets $D_1, \dots, D_k$, each dataset consisting of a collection of features $X_i$ and the corresponding labels $y_i$.
The separation of the datasets is based on a preprocessing of the data.
Suppose that $k$ classifiers are trained, one on each dataset; the probability that an input $x$ belongs to class $c$ according to classifier $i$ is written as
$P(x \text{ belongs to } c \mid D_i)$
Due to technical limitations, for a new input $x$ we do not know which classifier we should use, but we know the probabilities
$P(D_i)$
Does it make sense to say that the overall probability can be written as
$$P(x \text{ belongs to } c \mid D) = \sum_i P(x \text{ belongs to } c \mid D_i)\, P(D_i)?$$
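If the events $D_i$ partition the sample space (i.e., every input carries exactly one tag), this looks to me like the law of total probability:
$$P(x \text{ belongs to } c) = \sum_i P(x \text{ belongs to } c \mid D_i)\, P(D_i),$$
where $P(D_i)$ would have to mean the probability that the new input comes from population $i$, not the relative sizes of the training sets (as noted above). Is this interpretation correct?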