
Questions tagged [weighted-sampling]

If you have survey data with weights, please use "survey-sampling" instead. If you need to draw Monte Carlo samples from a distribution that is intractable/inconvenient, and have to use a sampler from a simpler distribution that you would then correct with weights, please use "importance-sampling", "monte-carlo" and/or "simulation" instead.

3 votes
1 answer
143 views

I am analyzing proteomic data from a biobank/large cohort study, which includes both a randomly selected subset of participants and two non-random subsets. Since these non-random selections could ...
AEP's user avatar
  • 425
1 vote
0 answers
68 views

I've recently been looking into SIR, and I'm not very familiar with probability. Here it is mentioned that if the variance of the sum approaches 0, then the output PDF approaches the target PDF. I wonder what is ...
Wenjian Zhou's user avatar
1 vote
0 answers
49 views

I am currently analyzing the social factors associated with the prevalence of infectious diseases in European countries using GEE models. My data is based on patients covered by a specific health ...
Tal Michael's user avatar
0 votes
0 answers
77 views

I am looking at e-cig prevalence within a city. I used surveys to collect data from residents, and I have a question about weighting data. I have made the assumption, due to over- and underrepresentation ...
Aidan's user avatar
  • 1
3 votes
2 answers
392 views

Background: The 2023 article "Remarks on some misconceptions about unequal probability sampling without replacement" by Tillé suggests the sample function ...
LBogaardt's user avatar
  • 712
1 vote
0 answers
39 views

Say I'm sampling from a large population of size $N$ without replacement, and denote by $\pi_i$ the probability that unit $i$ is included in the sample, and $\pi_{ij}$ the probability that both $i$ ...
crf's user avatar
  • 319
0 votes
0 answers
89 views

I have a highly imbalanced dataset and I'd like to train a simple ANN classifier on it. My model currently is a simple 2-layer feed-forward neural network with ReLU activation in between. After a few ...
Green 绿色's user avatar
1 vote
2 answers
428 views

If my understanding is correct, the key difference is that: In stratified SRS you intentionally draw $N_h$ samples from each of your $k$ strata ($h = 1...k$, $\sum_{1}^{k}{N_h} = N$) and are ...
k13's user avatar
  • 57
2 votes
1 answer
215 views

I need a bound on a covariance quantity that has come up in a sampling problem. $\widehat{Y}$ and $\widehat{T}$ are Horvitz-Thompson estimators of population totals, $Y=\sum_{i=1}^N y_i$ and $T=\sum_{...
Eaman's user avatar
  • 41
1 vote
0 answers
49 views

Consider an observational study with a single outcome $Y$, a single covariate $X$, and a treatment assignment variable $W$. Under the unconfounded treatment assignment assumption, $E_{sp}[Y(1)]=E[\frac{Y_i^{obs}W_i}...
user45765's user avatar
  • 1,465
1 vote
1 answer
309 views

I am estimating the population mean of the 2023 value of cars from a stratified sample. The value of the cars is right skewed on visual inspection, and some basic diagnostics indicate normality ...
burnt_pianos's user avatar
2 votes
0 answers
146 views

In a previous question I've asked How to estimate the (approximate) variance of the weighted mean?, specifically, how to prove the following formula: $$ \widehat{\sigma_{\bar{y}_w}^2} = \frac{1}{(\sum{...
Tal Galili's user avatar
  • 22.1k
1 vote
0 answers
67 views

Consider you are scoring tweets for tone based on some sentiment analysis implementation. Each tweet has hypothetically a 90% chance of being correctly scored, while 10% get it wrong for whatever ...
geotheory's user avatar
  • 647
5 votes
1 answer
331 views

Suppose $p(x) = \lambda e^{-\lambda x}$. However, our probability of observing a given sample of $x$ (denoted $z$) is further proportional to $x$ itself, i.e., $p(z\mid x) = \lambda e^{-\lambda x}$. ...
jessexknight's user avatar
1 vote
1 answer
168 views

Setup: The setup is weighted sampling without replacement. By which I mean: You have a set of $n$ items, indexed by integers 1 through $n$, and the items have associated weights $\{w_1,\ldots,w_n\}$ ...
postylem's user avatar
  • 163
1 vote
0 answers
399 views

If I have designed a study where participants from 3 disease groups of fixed size were being sampled and suppose the three groups A, B and C are of sizes $n_A=50$, $n_B=50$ and $n_C=100$. Group A is a ...
s.stats's user avatar
  • 485
6 votes
1 answer
2k views

I want to assess the normality of a dataset (which is log-normally distributed data transformed back to normal) using a Q-Q plot. I stumbled on the fact that there are many ways to build such a plot, ...
Aubergine's user avatar
  • 183
1 vote
0 answers
184 views

In a project of mine I want to use a propensity-score-weighted gamlss model. However, the gamlss user guide states "In general using weights that are not frequencies is not recommended unless the ...
AStieb's user avatar
  • 11
7 votes
2 answers
3k views

Background: weighted mean In the context of survey statistics it so happens that a sample of respondents from a survey are fit some weights to adjust their answers to the general population. These ...
Tal Galili's user avatar
  • 22.1k
1 vote
1 answer
249 views

Consider a set $X := \{x_1, \ldots, x_n\}$ with corresponding weights $p_1, \ldots, p_n$. Suppose we would like to draw $m < n$ distinct (i.e. unique) elements in a way that the probability of ...
Nikolaj Theodor Thams's user avatar
1 vote
0 answers
34 views

Here's an example of what I mean. Consider a hypothetical game played by members of a population of unknown size. Group A is 13% of the whole population and scores 500 points in a game. Group B ...
JessicaR's user avatar
1 vote
0 answers
64 views

Imagine we have N items and a weight w for each item. We first draw a random integer s (from a uniform distribution), $s \sim int(U(1, N))$, and then we sample s items according to the weights w (without replacement). ...
Dirk N's user avatar
  • 325
1 vote
0 answers
54 views

I have some weighted simulations of two variables. I would like to have an idea of how they are correlated. An option is to use a bivariate density estimate which allows the weights. Another option is ...
Stéphane Laurent's user avatar
2 votes
0 answers
139 views

Inverse propensity weighting involves a machine learning model that takes features and outputs the predicted probability that this person is in the sample. Let $w_i$ be the inverse of the output for ...
Andrew NC's user avatar
  • 339
1 vote
2 answers
2k views

According to what I learned in machine learning, the loss function is derived from maximum likelihood estimation on the training data. Taking logistic regression as an example: we have a training data set $\...
ConnellyM's user avatar
1 vote
2 answers
163 views

I have a table with N rows and n unique elements. Let j denote the row index and i denote the element. In the table below $N=9, n=3$. Let $w_i$ denote the count of element i. For example, $w_1=4, w_2=...
elexhobby's user avatar
  • 865
1 vote
2 answers
266 views

I have a table containing elements in $[1,c]$. The elements may be repeated in the table. I want to sample $m$ unique elements from this table. I can reduce this problem to weighted sampling without ...
elexhobby's user avatar
  • 865
1 vote
0 answers
367 views

I'm currently trying to do semantic segmentation with a deep learning model on images. The dataset is highly imbalanced and I would like to try weighted sampling. I'm using PyTorch and a dataloader ...
Sabse's user avatar
  • 31
1 vote
0 answers
424 views

I am a beginner with basic knowledge of statistics - just learning. I have a doubt regarding weighting survey sample distribution to population distribution. I have to create a weighting variable that ...
user275379's user avatar
4 votes
2 answers
6k views

It's my understanding that sample weights are used to ensure that each observation used to train a machine learning model is given a weight corresponding to its perceived importance/value to the ...
pmse234's user avatar
  • 135
2 votes
1 answer
255 views

In terms of importance sampling for numerical Monte Carlo integration we can proceed as follows: \begin{align} \int_{\Omega} p(\mathbf{x}) d\mathbf{x} &= \int_{\Omega} p(\mathbf{x}) \frac{q(\...
tisPrimeTime's user avatar
0 votes
1 answer
90 views

I want to weight my sample to include non-respondents in my estimations. We have multiple factors which should be taken into account, so it's not only about weighting by gender, for example. We have ...
urban-a's user avatar
4 votes
1 answer
3k views

I have doubts about the correct use of sample weights in the NHANES survey, which uses a complex, multistage probability sampling design (1). I'm aware of the importance of the use of the sample ...
Borexino's user avatar
  • 362
8 votes
1 answer
1k views

Imagine we have a webserver, which serves a total of N static URLs. There are users visiting the URLs every day. At the end of each day, we have data like this: ...
Dimitris Andreou's user avatar
2 votes
1 answer
1k views

I am looking at a dataset from Pew Research Center. Inside the dataset, different survey waves have their own weight variable with sampling weights. I thought in general it is supposed to range from 0 ...
Kang Inkyu's user avatar
3 votes
1 answer
395 views

How to use derivatives of a function to better estimate its variance over the domain? I have a scalar smooth function $f(x)$ and a multivariate random variable $x$ with known distribution (e.g. ...
MInner's user avatar
  • 293
0 votes
0 answers
54 views

I have the following statistic for which I need to figure out a sample size: $$S= \frac{1}{n}\sum_{i=1}^n \left(c_i+\sum_{j=1}^{100} b_{ij}X_i\right)$$ where $c_i$ and $b_{ij}$ are constants and $...
Lee's user avatar
  • 1
7 votes
1 answer
5k views

I'm working on a regression problem with imbalanced data, and I would like to know if I'm weighting the errors correctly. I'll try to illustrate the concept with a simple example. Imagine I'm ...
Mario's user avatar
  • 71
0 votes
1 answer
56 views

Suppose there is a school, and I want to know what proportion of students like the color red better than green, or vice versa (suppose there is no "other" option, just a binary variable). The school ...
Mike Kayser's user avatar
0 votes
1 answer
64 views

Cross-posted from Data Science due to lack of response. This is probably a thing I am just not searching for correctly, but essentially my idea is this: given some machine learning classification $C$ ...
dashnick's user avatar
  • 171
1 vote
1 answer
2k views

I understand that one may use a particle filter to solve the filtering problem (estimating the hidden state of a system which can be described as a Hidden Markov Model). If I have a system where I ...
SomeRandomPhysicist's user avatar
3 votes
1 answer
833 views

I have been exploring the use of the svyglm function in R's survey package to analyse surveys with both equal and unequal sampling probabilites. For an unequal ...
Andy Davey's user avatar
3 votes
1 answer
492 views

I have only summary statistics for each state in the United States. I have the mean and median prices for each state and that’s it. How can I estimate an “overall” median price for the nation? I ...
Didi's user avatar
  • 31
2 votes
1 answer
183 views

I have a sample with two metrics and one ratio per attribute. I am trying to rank the attributes based on the ratio and variable amounts and from my research I have found that most people find ...
cphill's user avatar
  • 159
1 vote
0 answers
504 views

My question is what is the effect of assigning weights to observations when training a Classifier such as a Logistic Regression model. The glm function documentation in R for example states: Non-...
rf7's user avatar
  • 947
0 votes
1 answer
2k views

Let's say that you have a list of numbers and a weight for each number e.g. X = [(1, 2342), (2, 55), (3...] In the above example, 2342 and 55 are weights. Is weighted random sampling N items from ...
jameszhao00's user avatar
2 votes
0 answers
453 views

Suppose I have $i$ coins, all of which are weighted to have a different probability $p$ of flipping heads. This results in $i$ Bernoulli distributions with different $p_i$. Cumulatively, this results ...
Flow Nuwen's user avatar
0 votes
1 answer
101 views

I am working on fitness case importance for symbolic regression and found a paper, "Step-wise Adaptation of Weights for Symbolic Regression with Genetic Programming", which talks about ...
Quamber Ali's user avatar
1 vote
0 answers
135 views

I'm trying to understand the SVM algorithm but am not able to understand what the weight vector and bias are. Could anyone explain them in layman's terms?
Shivam Chaurasia's user avatar
6 votes
3 answers
8k views

$X$ is a random variable with unknown distribution. A number of experiments are conducted to estimate $X$. Each experiment has a different reliability measure in estimating $X$. These $n$ experiments ...
Gerry's user avatar
  • 255