Questions tagged [weighted-sampling]
If you have survey data with weights, please use "survey-sampling" instead. If you need to draw Monte Carlo samples from a distribution that is intractable/inconvenient, and have to use a sampler from a simpler distribution that you would then correct with weights, please use "importance-sampling", "monte-carlo" and/or "simulation" instead.
140 questions
3
votes
1
answer
143
views
How to Account for Sampling Bias Using Inverse-Probability Weighting (IPW): Best Practices for Covariate Selection and Handling Missing Data
I am analyzing proteomic data from a biobank/large cohort study, which includes both a randomly selected subset of participants and two non-random subsets. Since these non-random selections could ...
1
vote
0
answers
68
views
Sampling importance resampling: what is the formula for the variance?
I've recently been looking into SIR, and I'm not very familiar with probability. Here it is mentioned that if the variance of the sum approaches 0, then the output PDF approaches the target PDF. I wonder what is ...
1
vote
0
answers
49
views
Clustering and samples weighting (using GEE)
I am currently analyzing the social factors associated with the prevalence of infectious diseases in European countries using GEE models. My data is based on patients covered by a specific health ...
0
votes
0
answers
77
views
Weighting Data Issue
I am looking at e-cig prevalence within a city. I used surveys to collect data from residents, and I have a query about weighting data.
I have made the assumption, due to over and underrepresentation ...
3
votes
2
answers
392
views
Is R's weighted sample without replacement function misleading?
Background
The 2023 article "Remarks on some misconceptions about unequal probability sampling without replacement" by Tillé suggests the sample function ...
1
vote
0
answers
39
views
Is there any statistical advantage to using a deterministic sample size in unequal probability sampling with the Horvitz-Thompson estimator?
Say I'm sampling from a large population of size $N$ without replacement, and denote by $\pi_i$ the probability that unit $i$ is included in the sample, and $\pi_{ij}$ the probability that both $i$ ...
0
votes
0
answers
89
views
Assign weights to examples in a highly imbalanced dataset
I have a highly imbalanced dataset and I'd like to train a simple ANN classifier on it. My model currently is a simple 2-layer feed-forward neural network with ReLU activation in between. After a few ...
1
vote
2
answers
428
views
Stratified SRS vs. probability-proportional-to-size (PPS) sampling - what's the difference?
If my understanding is correct, the key difference is that:
In stratified SRS you intentionally draw $N_h$ samples from each of your $k$ strata ($h = 1...k$, $\sum_{1}^{k}{N_h} = N$) and are ...
2
votes
1
answer
215
views
Upper bound for covariance of Horvitz-Thompson Estimators
I need a bound on a covariance quantity that has come up in a sampling problem. $\widehat{Y}$ and $\widehat{T}$ are Horvitz-Thompson estimators of population totals, $Y=\sum_{i=1}^N y_i$ and $T=\sum_{...
1
vote
0
answers
49
views
Is there a relationship between propensity score based causal inference and sampling weights?
Consider an observational study with a single outcome $Y$, a single covariate $X$ and a treatment assignment variable $W$. Under the unconfounded treatment assignment assumption, $E_{sp}[Y(1)]=E[\frac{Y_i^{obs}W_i}...
1
vote
1
answer
309
views
Non-parametric bootstrap for 95%CI calculation in stratified sample in R
I am estimating the population mean of the 2023 value of cars from a stratified sample. The value of the cars is right skewed on visual inspection, and some basic diagnostics indicate normality ...
2
votes
0
answers
146
views
What is the (Ratio estimator for the) covariance of two weighted means? [closed]
In a previous question I've asked How to estimate the (approximate) variance of the weighted mean?, specifically, how to prove the following formula:
$$
\widehat{\sigma_{\bar{y}_w}^2} = \frac{1}{(\sum{...
1
vote
0
answers
67
views
Amplification effect of retweets on uncertainty
Consider you are scoring tweets for tone based on some sentiment analysis implementation. Each tweet has hypothetically a 90% chance of being correctly scored, while 10% get it wrong for whatever ...
5
votes
1
answer
331
views
Density of sampled exponential data, with sampling weights proportional to x itself
Suppose $p(x) = \lambda e^{-\lambda x}$. However, our probability of observing a given sample of $x$ (denoted $z$) is further proportional to $x$ itself, i.e., $p(z\mid x) = \lambda e^{-\lambda x}$. ...
1
vote
1
answer
168
views
Probability of drawing one element before another in weighted sampling without replacement
Setup:
The setup is weighted sampling without replacement. By which I mean:
You have a set of $n$ items, indexed by integers 1 through $n$, and the items have associated weights $\{w_1,\ldots,w_n\}$ ...
1
vote
0
answers
399
views
Logistic regression for case-control studies
If I have designed a study where participants from 3 disease groups of fixed size were being sampled and suppose the three groups A, B and C are of sizes n_A=50, n_B=50 and n_C=100. Group A is a ...
6
votes
1
answer
2k
views
Best way to construct a QQ-plot
I want to assess the normality of a dataset (which is log-normally distributed data transformed back to normal) using a Q-Q plot.
I stumbled on the fact that there are many ways to build such a plot, ...
1
vote
0
answers
184
views
Propensity Score Weighting in GAMLSS
In a project of mine I want to use a propensity score weighted gamlss model. However, the gamlss user guide states "In general using weights that are not frequencies is not recommended unless the ...
7
votes
2
answers
3k
views
How to estimate the (approximate) variance of the weighted mean?
Background: weighted mean
In the context of survey statistics it so happens that a sample of respondents from a survey are fit some weights to adjust their answers to the general population. These ...
1
vote
1
answer
249
views
An alternative sampling without replacement
Consider a set $X := \{x_1, \ldots, x_n\}$ with corresponding weights $p_1, \ldots, p_n$. Suppose we would like to draw $m < n$ distinct (i.e. unique) elements in a way that the probability of ...
1
vote
0
answers
34
views
Comparing rates of different populations given as percentages instead of raw numbers
Here's an example of what I mean.
Consider a hypothetical game played by members of a population of unknown size.
Group A is 13% of the whole population and scores 500 points in a game.
Group B ...
1
vote
0
answers
64
views
Sampling with a variable number of picks
Imagine we have N items and some weights w for each item. We first draw a random integer s (from a uniform distribution), $s \sim int(U(1, N))$, and then we sample s items according to weights w (no replacement).
...
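A minimal sketch of the two-stage procedure described in this excerpt, under stated assumptions: $s$ is taken to be uniform on the integers $1,\ldots,N$ inclusive, and "sample according to weights, no replacement" is read as successive draws with probability proportional to the remaining weights. Both readings are assumptions, since the excerpt is truncated; the function name `sample_variable_size` is hypothetical.

```python
import random


def sample_variable_size(weights, rng=None):
    """Draw s uniformly from {1, ..., N}, then pick s distinct indices.

    Assumption: each pick is made with probability proportional to the
    weights of the items not yet picked (successive sampling).
    """
    rng = rng or random.Random()
    n = len(weights)
    s = rng.randint(1, n)            # inclusive on both ends (assumption)

    remaining = list(range(n))       # indices still available
    remaining_w = list(weights)      # their weights, kept in the same order
    picked = []
    for _ in range(s):
        # Position within the remaining items, chosen proportionally to weight.
        pos = rng.choices(range(len(remaining)), weights=remaining_w, k=1)[0]
        picked.append(remaining.pop(pos))
        remaining_w.pop(pos)
    return picked


if __name__ == "__main__":
    print(sample_variable_size([1.0, 2.0, 3.0, 4.0], random.Random(0)))
```

Note that under successive sampling the resulting inclusion probabilities are generally not proportional to the weights, which may matter for the kind of analysis the question asks about.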
1
vote
0
answers
54
views
"Weighted empirical distribution", terminology question
I have some weighted simulations of two variables. I would like to have an idea of how they are correlated. An option is to use a bivariate density estimate which allows the weights. Another option is ...
2
votes
0
answers
139
views
How should one compute confidence intervals for means computed with inverse propensity weights (IPW)?
Inverse propensity weighting involves a machine learning model that takes features and outputs the predicted probability that this person is in the sample. Let $w_i$ be the inverse of the output for ...
1
vote
2
answers
2k
views
Questions about objective function and loss function in weighted logistic regression
According to what I learned in machine learning, the loss function is derived from the maximum likelihood estimation on the training data. Taking logistic regression as an example:
we have a training data set $\...
1
vote
2
answers
163
views
How can I prove that two algorithms for weighted sampling without replacement are equivalent?
I have a table with N rows and n unique elements. Let j denote the row index and i denote the element. In the table below $N=9, n=3$. Let $w_i$ denote the count of element i. For example, $w_1=4, w_2=...
1
vote
2
answers
266
views
Sample unique elements from an array containing repeated values
I have a table containing elements in $[1,c]$. The elements may be repeated in the table. I want to sample $m$ unique elements from this table.
I can reduce this problem to weighted sampling without ...
1
vote
0
answers
367
views
Neural Networks: How to set the weights for weighted sampling for semantic segmentation?
I'm currently trying to do semantic segmentation with a deep learning model on images. The dataset is highly imbalanced and I would like to try weighted sampling. I'm using PyTorch and a dataloader ...
1
vote
0
answers
424
views
Help with weighting sample according to population
I am a beginner with basic knowledge of statistics - just learning. I have a doubt regarding weighting survey sample distribution to population distribution.
I have to create a weighting variable that ...
4
votes
2
answers
6k
views
Is there any reason to factor in sample weights when applying a scoring function to a test set?
It's my understanding that sample weights are used to ensure that each observation used to train a machine learning model is given a weight corresponding to its perceived importance/value to the ...
2
votes
1
answer
255
views
Nested Uniform Distributions in Monte Carlo Integration
In terms of importance sampling for numerical Monte Carlo integration we can proceed as follows:
\begin{align}
\int_{\Omega} p(\mathbf{x}) d\mathbf{x} &= \int_{\Omega} p(\mathbf{x}) \frac{q(\...
0
votes
1
answer
90
views
Calibrate Sample: What kind of data do I need to address non-respondents? [closed]
I want to weight my sample to include non-respondents in my estimations.
We have multiple factors which should be taken into account.
So it's not only about weighting by gender, for example.
We have ...
4
votes
1
answer
3k
views
Correct use of the sample weights in a complex survey design for association analysis (Logit OR)
I have doubts about the correct use of sample weights in the NHANES survey, which uses a complex, multistage probability sampling design (1).
I'm aware of the importance of the use of the sample ...
8
votes
1
answer
1k
views
How to compute confidence intervals from *weighted* samples?
Imagine we have a webserver, which serves a total of N static URLs.
There are users visiting the URLs every day. At the end of each day, we have data like this:
...
2
votes
1
answer
1k
views
Why do the sampling weights that I have range from 1, not 0?
I am looking at a dataset from Pew Research Center. Inside the dataset, different survey waves have their own weight variable with sampling weights. I thought in general it is supposed to range from 0 ...
3
votes
1
answer
395
views
How to use derivatives of a function to better estimate its variance over the domain?
How to use derivatives of a function to better estimate its variance over the domain?
I have a scalar smooth function $f(x)$ and a multivariate random variable $x$ with known distribution (e.g. ...
0
votes
0
answers
54
views
Computing the Sample Size for the sum of Bernoulli RVs with different probabilities times a constant
I have the following statistic for which I need to figure out a sample size:
$$S= \frac{1}{n}\sum_{i=1}^n \left(c_i+\sum_{j=1}^{100} b_{ij}X_i\right)$$
where $c_i$ and $b_{ij}$ are constants and $...
7
votes
1
answer
5k
views
Which is the right way to handle imbalanced data in a regression problem?
I'm working on a regression problem with imbalanced data, and I would like to know if I'm weighting the errors correctly. I'll try to illustrate the concept with a simple example.
Imagine I'm ...
0
votes
1
answer
56
views
Uncertainty-minimizing stratified sampling strategy
Suppose there is a school, and I want to know what proportion of students like the color red better than green, or vice versa (suppose there is no "other" option, just a binary variable). The school ...
0
votes
1
answer
64
views
Resampling to get equal predictive power per observation
Cross posted from data science due to lack of response
This is probably a thing I am just not searching for correctly, but essentially my idea is this: given some machine learning classification $C$ ...
1
vote
1
answer
2k
views
How to calculate importance weights for update step of an SIR (Sequential Importance Resampling) Particle filter?
I understand that one may use a particle filter to solve the filtering problem (estimating the hidden state of a system which can be described as a Hidden Markov Model).
If I have a system where I ...
3
votes
1
answer
833
views
Equivalence of svyglm and glm for simple random surveys
I have been exploring the use of the svyglm function in R's survey package to analyse surveys with both equal and unequal sampling probabilities.
For an unequal ...
3
votes
1
answer
492
views
Finding median without raw data?
I have only summary statistics for each state in the United States. I have the mean and median prices for each state and that’s it.
How can I estimate an “overall” median price for the nation? I ...
2
votes
1
answer
183
views
Application of Bayesian Averaging for Ranking
I have a sample with two metrics and one ratio per attribute. I am trying to rank the attributes based on the ratio and variable amounts and from my research I have found that most people find ...
1
vote
0
answers
504
views
What is the Effect of Weighting observations when training a Classifier and how it can be combined with Subsampling?
My question is what is the effect of assigning weights to observations when training a Classifier such as a Logistic Regression model.
The glm function documentation in R for example states:
Non-...
0
votes
1
answer
2k
views
Equivalent to weighted random sample? [closed]
Let's say that you have a list of numbers and a weight for each number e.g.
X = [(1, 2342), (2, 55), (3...]
In the above example, 2342 and 55 are weights.
Is weighted random sampling N items from ...
2
votes
0
answers
453
views
Hypothesis testing on Weighted Poisson Binomial Distribution
Suppose I have $i$ coins, all of which are weighted to have a different probability $p$ of flipping heads. This results in $i$ Bernoulli distributions with different $p_i$. Cumulatively, this results ...
0
votes
1
answer
101
views
Can someone point me towards research works relevant to Importance or Weighting Datapoints like SAW (Stepwise adaptation of weights) technique?
I am working on Fitness case importance for Symbolic Regression and found a paper "Step-wise Adaptation of Weights for Symbolic Regression with Genetic Programming" which talks about ...
1
vote
0
answers
135
views
What are the weight vector and bias in SVM? [duplicate]
I'm trying to understand the SVM algorithm but I am not able to understand what the weight vector and bias are. Could anyone explain them in layman's terms?
6
votes
3
answers
8k
views
SE of weighted mean
$X$ is a random variable with unknown distribution. A number of experiments are conducted to estimate $X$. Each experiment has a different reliability measure in estimating $X$. These $n$ experiments ...