Here is my setup. I have two sets of DNA sequences: one bound by a protein (a transcription factor) and one not bound by it. Suppose I have two candidate motifs, i.e. DNA patterns the protein might preferentially bind. The prevalence ratio of motif 1 is the proportion of bound sequences containing motif 1 divided by the proportion of unbound sequences containing motif 1, and likewise for motif 2. How do I test whether the prevalence ratio of motif 1 is greater than that of motif 2?
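To make the definition concrete, here is a minimal Python sketch that estimates a prevalence ratio from two lists of sequences. It assumes the motif is given as a regular expression and scans only the forward strand (real motif scanning would usually also check the reverse complement). The sequences and the motif `TGACG` are made up, and `fraction_containing` and `prevalence_ratio` are hypothetical names.

```python
import re


def fraction_containing(seqs, motif):
    """Fraction of sequences with at least one match to `motif` (a regex).

    Only the forward strand is scanned; reverse complements are ignored.
    """
    pattern = re.compile(motif)
    hits = sum(1 for s in seqs if pattern.search(s))
    return hits / len(seqs)


def prevalence_ratio(bound, unbound, motif):
    """P(motif | bound) / P(motif | unbound), estimated by sample proportions."""
    return fraction_containing(bound, motif) / fraction_containing(unbound, motif)


# Toy sequences, made up purely for illustration.
bound = ["ACGTGACGT", "TTGACGTAA", "GGGCCCAAA", "TGACGTCAT"]
unbound = ["AAAAAAAAA", "TGACGTTTT", "CCCCGGGGG", "ATATATATA"]

print(prevalence_ratio(bound, unbound, "TGACG"))  # 3 of 4 bound vs 1 of 4 unbound
```

If the motif never occurs in the unbound set, the estimate is undefined (division by zero), so in practice that case needs separate handling.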
More formally, I want to test whether $$\frac{P(M_1|B)}{P(M_1|\bar B)} > \frac{P(M_2|B)}{P(M_2|\bar B)}.$$ This is the comparison I care about because, by Bayes' rule, $\frac{P(B|M)}{P(\bar B|M)} = \frac{P(M|B)P(B)}{P(M|\bar B)P(\bar B)}$, i.e. observing the motif multiplies the prior odds of being bound, $P(B)/P(\bar B)$, by the prevalence ratio. So the question is whether motif 1 increases the odds of being bound more than motif 2 does.
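The Bayes identity can be checked numerically. This minimal Python sketch computes $P(B|M)$ directly via the law of total probability and compares the resulting posterior odds with prior odds times prevalence ratio. The probabilities are hypothetical, chosen only to illustrate the identity, and `posterior_odds` is a hypothetical helper name.

```python
def posterior_odds(p_b, p_m_given_b, p_m_given_not_b):
    """Odds P(B|M) / P(not B|M), computed directly from Bayes' rule."""
    # Law of total probability: P(M) = P(M|B)P(B) + P(M|not B)P(not B).
    p_m = p_m_given_b * p_b + p_m_given_not_b * (1 - p_b)
    p_b_given_m = p_m_given_b * p_b / p_m
    return p_b_given_m / (1 - p_b_given_m)


# Hypothetical probabilities, for illustration only.
p_b, p_m_b, p_m_nb = 0.2, 0.6, 0.15

prior_odds = p_b / (1 - p_b)          # P(B) / P(not B)
prev_ratio = p_m_b / p_m_nb           # P(M|B) / P(M|not B)

print(posterior_odds(p_b, p_m_b, p_m_nb), prior_odds * prev_ratio)
```

Here the prior odds are 0.25 and the prevalence ratio is 4, so both printed values agree at 1 (up to floating-point rounding).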
More generally, we observe:
- $n$ points with the condition (set 1)
- $m$ points without the condition (set 2)
- $k_1$ points in set 1 with marker 1
- $k_2$ points in set 1 with marker 2
- $\ell_1$ points in set 2 with marker 1
- $\ell_2$ points in set 2 with marker 2
Using these counts, we want to test whether $$ \frac{p_1}{q_1} > \frac{p_2}{q_2},$$ where $p_1$ and $p_2$ are the probabilities that a datapoint with the condition has marker 1 and marker 2, respectively, and $q_1$ and $q_2$ are the corresponding probabilities for a datapoint without the condition. The natural estimates are $\hat p_j = k_j/n$ and $\hat q_j = \ell_j/m$.
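One possible approach, sketched here under explicit assumptions and not the only option, is a one-sided Wald test on the difference of log prevalence ratios, $\log(\hat p_1/\hat q_1) - \log(\hat p_2/\hat q_2)$. It uses the delta-method variance $\operatorname{Var}(\log \hat p_j) \approx 1/k_j - 1/n$ (and likewise $1/\ell_j - 1/m$ for $\hat q_j$). Its key simplifying assumption is that the two log prevalence ratios are independent; that fails when the same datapoints can carry both markers, and the direction of the resulting error depends on how the markers co-occur. The function name `compare_prevalence_ratios` and the counts in the example are hypothetical.

```python
import math


def compare_prevalence_ratios(n, m, k1, k2, l1, l2):
    """One-sided Wald test of H1: p1/q1 > p2/q2 from the six counts.

    ASSUMPTION: the two log prevalence ratios are treated as independent,
    which ignores that a datapoint may carry both markers.
    Returns (z statistic, one-sided p-value).
    """
    if 0 in (k1, k2, l1, l2):
        raise ValueError("zero count: log prevalence ratio is undefined")
    # Point estimates of log(p_j / q_j) with p_j = k_j / n and q_j = l_j / m.
    log_pr1 = math.log(k1 / n) - math.log(l1 / m)
    log_pr2 = math.log(k2 / n) - math.log(l2 / m)
    # Delta method: Var(log p_hat) ~ (1 - p) / (n p) = 1/k - 1/n.
    # The two sets are sampled independently, so their variances add.
    var1 = 1 / k1 - 1 / n + 1 / l1 - 1 / m
    var2 = 1 / k2 - 1 / n + 1 / l2 - 1 / m
    z = (log_pr1 - log_pr2) / math.sqrt(var1 + var2)
    # One-sided p-value P(Z > z) for a standard normal Z.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts: n = 200 with the condition, m = 300 without.
z, p = compare_prevalence_ratios(n=200, m=300, k1=80, k2=40, l1=30, l2=30)
print(z, p)
```

With these counts the estimated prevalence ratios are 4 and 2, so $z = \log 2 / \sqrt{0.0875} \approx 2.34$ and the one-sided p-value is about 0.01. Working on the log scale makes the ratio comparison a difference, which is where the normal approximation is usually more reasonable.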
Another example: the condition is lung cancer, marker 1 is smoking, and marker 2 is some genetic condition. We want to know whether smoking increases the odds of lung cancer more than the genetic condition does. To find out, we take a random sample of individuals with lung cancer and a random sample of individuals without it, and compare the prevalence ratios of smoking and of the genetic condition between these two samples.
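Because cases and controls are sampled separately, one alternative that copes with individuals carrying both markers is a stratified percentile bootstrap: resample individuals with replacement within each group, recompute $\log(\hat p_1/\hat q_1) - \log(\hat p_2/\hat q_2)$, and read off a 95% interval. This is a technique swapped in for illustration, not something given in the setup; it assumes that for each individual you have a pair of 0/1 indicators (has marker 1, has marker 2) rather than only the aggregate counts. The data below are simulated, and `log_pr` and `bootstrap_interval` are hypothetical names.

```python
import math
import random


def log_pr(case_flags, control_flags):
    """log(P(marker | case) / P(marker | control)) from 0/1 indicator lists."""
    p = sum(case_flags) / len(case_flags)
    q = sum(control_flags) / len(control_flags)
    return math.log(p) - math.log(q)  # raises ValueError if p or q is 0


def bootstrap_interval(cases, controls, n_boot=2000, seed=0):
    """Percentile 95% interval for log PR1 - log PR2.

    `cases` and `controls` are lists of (has_marker1, has_marker2) pairs.
    Resampling whole individuals keeps the dependence between the markers;
    resampling each group separately mirrors how the groups were collected.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        cs = rng.choices(cases, k=len(cases))
        ct = rng.choices(controls, k=len(controls))
        try:
            d = (log_pr([a for a, _ in cs], [a for a, _ in ct])
                 - log_pr([b for _, b in cs], [b for _, b in ct]))
        except ValueError:
            continue  # a resample with a zero proportion; skip it
        diffs.append(d)
    diffs.sort()
    lo = diffs[int(0.025 * len(diffs))]
    hi = diffs[int(0.975 * len(diffs)) - 1]
    return lo, hi


# Simulated data, for illustration only.
sim = random.Random(1)
cases = [(sim.random() < 0.4, sim.random() < 0.2) for _ in range(200)]
controls = [(sim.random() < 0.1, sim.random() < 0.1) for _ in range(300)]

lo, hi = bootstrap_interval(cases, controls)
print(lo, hi)
```

If the lower end of the interval is above 0, that is evidence (at roughly the one-sided 2.5% level) that marker 1 has the larger prevalence ratio. Skipping resamples in which some proportion is zero is a crude fix that can bias the interval when events are rare.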