Newest 'weighted-sampling' Questions

3 votes

1 answer

143 views

How to Account for Sampling Bias Using Inverse-Probability Weighting (IPW): Best Practices for Covariate Selection and Handling Missing Data

I am analyzing proteomic data from a biobank/large cohort study, which includes both a randomly selected subset of participants and two non-random subsets. Since these non-random selections could ...

AEP

425

asked Feb 6 at 14:21

1 vote

0 answers

68 views

Sampling importance resampling: what is the formula for the variance?

I've recently been looking into SIR, and I'm not very familiar with probability. It is mentioned here that if the variance of the sum approaches 0, then the output PDF approaches the target PDF. I wonder what is ...

Wenjian Zhou

11

asked Sep 12, 2024 at 6:57

1 vote

0 answers

49 views

Clustering and sample weighting (using GEE)

I am currently analyzing the social factors associated with the prevalence of infectious diseases in European countries using GEE models. My data is based on patients covered by a specific health ...

Tal Michael

11

asked Aug 10, 2024 at 8:50

0 votes

0 answers

77 views

Weighting Data Issue

I am looking at e-cig prevalence within a city. I used surveys to collect data from residents, and I have a query about weighting data. I have made the assumption, due to over- and underrepresentation ...

Aidan

1

asked Apr 13, 2024 at 2:27

3 votes

2 answers

392 views

Is R's weighted sample without replacement function misleading?

Background The 2023 article "Remarks on some misconceptions about unequal probability sampling without replacement" by Tillé suggests the sample function ...

LBogaardt

712

asked Feb 13, 2024 at 18:59

1 vote

0 answers

39 views

Is there any statistical advantage to using a deterministic sample size in unequal probability sampling with the Horvitz-Thompson estimator?

Say I'm sampling from a large population of size $N$ without replacement, and denote by $\pi_i$ the probability that unit $i$ is included in the sample, and $\pi_{ij}$ the probability that both $i$ ...

crf

319

asked Oct 20, 2023 at 21:53
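
As a small illustration of the estimator named in the question above, the following sketch computes the Horvitz-Thompson total $\sum_{i \in s} y_i / \pi_i$ and checks its unbiasedness by exact enumeration. It assumes Poisson sampling (each unit is included independently with probability $\pi_i$, so the sample size is random), which is only one of the designs the question compares. The values in `y` and `pi` and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def horvitz_thompson_total(sample, y, pi):
    """Estimate the population total as the sum of y_i / pi_i over sampled units."""
    return sum(Fraction(y[i]) / pi[i] for i in sample)


def expected_ht_under_poisson(y, pi):
    """Exact expectation of the HT estimator under Poisson sampling.

    Assumption: unit i enters the sample independently with probability
    pi[i], so every subset of the population is a possible sample.
    """
    n = len(y)
    expectation = Fraction(0)
    # Enumerate every inclusion pattern (0 = excluded, 1 = included).
    for included in product([0, 1], repeat=n):
        prob = Fraction(1)
        for i, flag in enumerate(included):
            prob *= pi[i] if flag else 1 - pi[i]
        sample = [i for i, flag in enumerate(included) if flag]
        expectation += prob * horvitz_thompson_total(sample, y, pi)
    return expectation


# Hypothetical population values and inclusion probabilities.
y = [3, 5, 10, 2]
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4), Fraction(1, 3)]

print(expected_ht_under_poisson(y, pi), sum(y))
```

The enumeration only confirms unbiasedness for this toy population; it says nothing about how the variance compares between fixed-size and random-size designs, which is what the question asks.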

0 votes

0 answers

89 views

Assign weights to examples in a highly imbalanced dataset

I have a highly imbalanced dataset and I'd like to train a simple ANN classifier on it. My model currently is a simple 2-layer feed-forward neural network with ReLU activation in between. After a few ...

Green 绿色

201

asked Oct 14, 2023 at 9:46

1 vote

2 answers

428 views

Stratified SRS vs. probability-proportional-to-size (PPS) sampling - what's the difference?

If my understanding is correct, the key difference is that: In stratified SRS you intentionally draw $N_h$ samples from each of your $k$ strata ($h = 1, \ldots, k$, $\sum_{h=1}^{k} N_h = N$) and are ...

k13

57

asked Oct 6, 2023 at 14:51

2 votes

1 answer

215 views

Upper bound for covariance of Horvitz-Thompson Estimators

I need to bound a covariance quantity that has come up in a sampling problem. $\widehat{Y}$ and $\widehat{T}$ are Horvitz-Thompson estimators of population totals, $Y=\sum_{i=1}^N y_i$ and $T=\sum_{...

Eaman

41

asked Aug 1, 2023 at 20:25

1 vote

0 answers

49 views

Is there a relationship between propensity score based causal inference and sampling weights?

Consider an observational study with a single outcome $Y$, a single covariate $X$ and a treatment assignment variable $W$. Under the unconfounded treatment assignment assumption, $E_{sp}[Y(1)]=E[\frac{Y_i^{obs}W_i}...

user45765

1,465

asked Jul 28, 2023 at 14:03

1 vote

1 answer

309 views

Non-parametric bootstrap for 95%CI calculation in stratified sample in R

I am estimating the population mean of the 2023 value of cars from a stratified sample. The value of the cars is right skewed on visual inspection, and some basic diagnostics indicate normality ...

burnt_pianos

11

asked Apr 7, 2023 at 2:01

2 votes

0 answers

146 views

What is the (Ratio estimator for the) covariance of two weighted means? [closed]

In a previous question I've asked How to estimate the (approximate) variance of the weighted mean?, specifically, how to prove the following formula: $$ \widehat{\sigma_{\bar{y}_w}^2} = \frac{1}{(\sum{...

Tal Galili

22.1k

asked Mar 26, 2023 at 18:09

1 vote

0 answers

67 views

Amplification effect of retweets on uncertainty

Consider you are scoring tweets for tone based on some sentiment analysis implementation. Each tweet has hypothetically a 90% chance of being correctly scored, while 10% get it wrong for whatever ...

geotheory

647

asked Jan 15, 2023 at 3:47

5 votes

1 answer

331 views

Density of sampled exponential data, with sampling weights proportional to x itself

Suppose $p(x) = \lambda e^{-\lambda x}$. However, our probability of observing a given sample of $x$ (denoted $z$) is further proportional to $x$ itself, i.e., $p(z\mid x) \propto x$. ...

jessexknight

301

asked Dec 18, 2022 at 18:11

1 vote

1 answer

168 views

Probability of drawing one element before another in weighted sampling without replacement

Setup: The setup is weighted sampling without replacement. By which I mean: You have a set of $n$ items, indexed by integers 1 through $n$, and the items have associated weights $\{w_1,\ldots,w_n\}$ ...

postylem

163

asked Jul 30, 2022 at 3:08
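
To make the setup in the question above concrete, here is a minimal sketch that computes, by exact enumeration, the probability that one item is drawn before another. It assumes the successive-draw scheme (each draw picks one of the remaining items with probability proportional to its weight) and that drawing continues until every item has been drawn; the truncated excerpt does not confirm either detail. Items are indexed from 0 here rather than from 1, and the weights and function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def order_probability(order, weights):
    """Probability of drawing the items in exactly this order.

    Each draw selects among the remaining items with probability
    proportional to weight (integer weights assumed for exact arithmetic).
    """
    remaining = sum(weights)
    prob = Fraction(1)
    for item in order:
        prob *= Fraction(weights[item], remaining)
        remaining -= weights[item]
    return prob


def prob_before(i, j, weights):
    """Sum the probabilities of all complete orderings in which i precedes j."""
    total = Fraction(0)
    for order in permutations(range(len(weights))):
        if order.index(i) < order.index(j):
            total += order_probability(order, weights)
    return total


weights = [5, 1, 3, 2]  # hypothetical integer weights
print(prob_before(0, 2, weights))                   # exact enumeration
print(Fraction(weights[0], weights[0] + weights[2]))  # w_i / (w_i + w_j)
```

Under these assumptions the two printed values agree, since in this scheme the chance that item $i$ precedes item $j$ depends only on their two weights, $w_i / (w_i + w_j)$. Enumeration costs $n!$ orderings, so it is only useful as a check on small examples.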

1 vote

0 answers

399 views

Logistic regression for case-control studies

Suppose I have designed a study in which participants were sampled from 3 disease groups of fixed size, and the three groups A, B and C are of sizes $n_A=50$, $n_B=50$ and $n_C=100$. Group A is a ...

s.stats

485

asked Jul 7, 2021 at 20:21

6 votes

1 answer

2k views

Best way to construct a QQ-plot

I want to assess the normality of a dataset (which is log-normally distributed data transformed back to normal) using a Q-Q plot. I stumbled on the fact that there are many ways to build such a plot, ...

Aubergine

183

asked Jun 2, 2021 at 1:00

1 vote

0 answers

184 views

Propensity Score Weighting in GAMLSS

In a project of mine I want to use a propensity-score-weighted GAMLSS model. However, the gamlss user guide states "In general using weights that are not frequencies is not recommended unless the ...

AStieb

11

asked May 25, 2021 at 8:13

7 votes

2 answers

3k views

How to estimate the (approximate) variance of the weighted mean?

Background: weighted mean In the context of survey statistics it often happens that the respondents in a survey sample are assigned weights to adjust their answers to the general population. These ...

Tal Galili

22.1k

asked May 24, 2021 at 17:47

1 vote

1 answer

249 views

An alternative sampling without replacement

Consider a set $X := \{x_1, \ldots, x_n\}$ with corresponding weights $p_1, \ldots, p_n$. Suppose we would like to draw $m < n$ distinct (i.e. unique) elements in a way that the probability of ...

Nikolaj Theodor Thams

108

asked May 14, 2021 at 18:16

1 vote

0 answers

34 views

Comparing rates of different populations given as percentages instead of raw numbers

Here's an example of what I mean. Consider a hypothetical game played by members of a population of unknown size. Group A is 13% of the whole population and scores 500 points in a game. Group B ...

JessicaR

11

asked Mar 28, 2021 at 15:21

1 vote

0 answers

64 views

Sampling with a variable number of picks

Imagine we have N items and some weights w for each item. We first draw a random integer s (from a uniform distribution), $s \sim int(U(1, N))$, and then we sample s items according to the weights w (no replacement). ...

Dirk N

325

asked Feb 26, 2021 at 9:21
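
The two-stage procedure described in the question above can be sketched as follows. This is one reading of the excerpt, not the asker's code: it assumes $s$ is a uniform integer on $1, \ldots, N$ inclusive, and that "according to weights w (no replacement)" means successive draws proportional to the weights of the items still remaining. The function name and example weights are hypothetical.

```python
import random


def sample_variable_size(weights, rng):
    """Draw s uniformly from 1..N, then pick s distinct items by weight.

    Assumption: items are drawn one at a time, each time with probability
    proportional to the weights of the items not yet picked.
    """
    n = len(weights)
    s = rng.randint(1, n)  # randint includes both endpoints
    remaining = list(range(n))
    picked = []
    for _ in range(s):
        current_weights = [weights[i] for i in remaining]
        choice = rng.choices(remaining, weights=current_weights, k=1)[0]
        picked.append(choice)
        remaining.remove(choice)
    return picked


rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [0.5, 0.2, 0.2, 0.1]  # hypothetical weights
picks = sample_variable_size(weights, rng)
print(picks)
```

Note that under successive draws the probability that an item ends up in the sample is generally not proportional to its weight, so a different scheme would be needed if exact proportional inclusion probabilities are the goal.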

1 vote

0 answers

54 views

"Weighted empirical distribution", terminology question

I have some weighted simulations of two variables. I would like to have an idea of how they are correlated. An option is to use a bivariate density estimate which allows the weights. Another option is ...

Stéphane Laurent

20.7k

asked Nov 5, 2020 at 21:33

2 votes

0 answers

139 views

How should one compute confidence intervals for means computed with inverse propensity weights (IPW)?

Inverse propensity weighting involves a machine learning model that takes features and outputs the predicted probability that this person is in the sample. Let $w_i$ be the inverse of the output for ...

Andrew NC

339

asked Oct 28, 2020 at 7:41

1 vote

2 answers

2k views

Questions about objective function and loss function in weighted logistic regression

According to what I learned in machine learning, the loss function is derived from maximum likelihood estimation on the training data. Taking logistic regression as an example: we have a training data set $\...

ConnellyM

13

asked Sep 27, 2020 at 15:29

1 vote

2 answers

163 views

How can I prove that two algorithms for weighted sampling without replacement are equivalent?

I have a table with N rows and n unique elements. Let j denote the row index and i denote the element. In the table below $N=9, n=3$. Let $w_i$ denote the count of element i. For example, $w_1=4, w_2=...

elexhobby

865

asked May 22, 2020 at 7:39

1 vote

2 answers

266 views

Sample unique elements from an array containing repeated values

I have a table containing elements in $[1,c]$. The elements may be repeated in the table. I want to sample $m$ unique elements from this table. I can reduce this problem to weighted sampling without ...

elexhobby

865

asked May 16, 2020 at 3:18

1 vote

0 answers

367 views

Neural Networks: How to set the weights for weighted sampling for semantic segmentation?

I'm currently trying to do semantic segmentation with a deep learning model on images. The dataset is highly imbalanced and I would like to try weighted sampling. I'm using PyTorch and a dataloader ...

Sabse

31

asked May 14, 2020 at 7:41

1 vote

0 answers

424 views

Help with weighting sample according to population

I am a beginner with basic knowledge of statistics - just learning. I have a doubt regarding weighting survey sample distribution to population distribution. I have to create a weighting variable that ...

user275379

11

asked Mar 2, 2020 at 12:43

4 votes

2 answers

6k views

Is there any reason to factor in sample weights when applying a scoring function to a test set?

It's my understanding that sample weights are used to ensure that each observation used to train a machine learning model is given a weight corresponding to its perceived importance/value to the ...

pmse234

135

asked Feb 20, 2020 at 22:41

2 votes

1 answer

255 views

Nested Uniform Distributions in Monte Carlo Integration

In terms of importance sampling for numerical Monte Carlo integration we can proceed as follows: \begin{align} \int_{\Omega} p(\mathbf{x}) d\mathbf{x} &= \int_{\Omega} p(\mathbf{x}) \frac{q(\...

tisPrimeTime

585

asked Jan 27, 2020 at 12:48

0 votes

1 answer

90 views

Calibrate Sample: What kind of data do I need to address non-respondents? [closed]

I want to weight my sample to include non-respondents in my estimations. We have multiple factors which should be taken into account, so it's not only about weighting by gender, for example. We have ...

urban-a

3

asked Jun 11, 2019 at 14:05

4 votes

1 answer

3k views

Correct use of the sample weights in a complex survey design for association analysis (Logit OR)

I have doubts about the correct use of sample weights in the NHANES survey, which uses a complex, multistage probability sampling design (1). I'm aware of the importance of the use of the sample ...

Borexino

362

asked May 29, 2019 at 12:47

8 votes

1 answer

1k views

How to compute confidence intervals from weighted samples?

Imagine we have a webserver, which serves a total of N static URLs. There are users visiting the URLs every day. At the end of each day, we have data like this: ...

Dimitris Andreou

181

asked May 23, 2019 at 12:53

2 votes

1 answer

1k views

Why do the sampling weights I have range from 1, not 0?

I am looking at a dataset from Pew Research Center. Inside the dataset, different survey waves have their own weight variable with sampling weights. I thought in general it is supposed to range from 0 ...

Kang Inkyu

479

asked Apr 25, 2019 at 17:23

3 votes

1 answer

395 views

How to use derivatives of a function to better estimate its variance over the domain?

How to use derivatives of a function to better estimate its variance over the domain? I have a scalar smooth function $f(x)$ and a multivariate random variable $x$ with known distribution (e.g. ...

MInner

293

asked Feb 14, 2019 at 18:57

0 votes

0 answers

54 views

Computing the Sample Size for the sum of Bernoulli RVs with different probabilities times a constant

I have the following statistic for which I need to figure out a sample size: $$S= \frac{1}{n}\sum_{i=1}^n \left(c_i+\sum_{j=1}^{100} b_{ij}X_i\right)$$ where $c_i$ and $b_{ij}$ are constants and $...

Lee

1

asked Feb 1, 2019 at 17:21

7 votes

1 answer

5k views

Which is the right way to handle imbalanced data in a regression problem?

I'm working on a regression problem with imbalanced data, and I would like to know if I'm weighting the errors correctly. I'll try to illustrate the concept with a simple example. Imagine I'm ...

Mario

71

asked Jan 16, 2019 at 10:12

0 votes

1 answer

56 views

Uncertainty-minimizing stratified sampling strategy

Suppose there is a school, and I want to know what proportion of students like the color red better than green, or vice versa (suppose there is no "other" option, just a binary variable). The school ...

Mike Kayser

3

asked Dec 13, 2018 at 16:01

0 votes

1 answer

64 views

Resampling to get equal predictive power per observation

Cross-posted from Data Science due to lack of response. This is probably a thing I am just not searching for correctly, but essentially my idea is this: given some machine learning classification $C$ ...

dashnick

171

asked Oct 24, 2018 at 2:32

1 vote

1 answer

2k views

How to calculate importance weights for update step of an SIR (Sequential Importance Resampling) Particle filter?

I understand that one may use a particle filter to solve the filtering problem (estimating the hidden state of a system which can be described as a Hidden Markov Model). If I have a system where I ...

SomeRandomPhysicist

235

asked Apr 12, 2018 at 8:56

3 votes

1 answer

833 views

Equivalence of svyglm and glm for simple random surveys

I have been exploring the use of the svyglm function in R's survey package to analyse surveys with both equal and unequal sampling probabilities. For an unequal ...

Andy Davey

31

asked Mar 16, 2018 at 16:05

3 votes

1 answer

492 views

Finding median without raw data?

I have only summary statistics for each state in the United States. I have the mean and median prices for each state and that’s it. How can I estimate an “overall” median price for the nation? I ...

Didi

31

asked Oct 16, 2017 at 12:26

2 votes

1 answer

183 views

Application of Bayesian Averaging for Ranking

I have a sample with two metrics and one ratio per attribute. I am trying to rank the attributes based on the ratio and variable amounts and from my research I have found that most people find ...

cphill

159

asked Sep 6, 2017 at 14:09

1 vote

0 answers

504 views

What is the effect of weighting observations when training a classifier, and how can it be combined with subsampling?

My question is what is the effect of assigning weights to observations when training a Classifier such as a Logistic Regression model. The glm function documentation in R for example states: Non-...

rf7

947

asked Aug 2, 2017 at 7:10

0 votes

1 answer

2k views

Equivalent to weighted random sample? [closed]

Let's say that you have a list of numbers and a weight for each number e.g. X = [(1, 2342), (2, 55), (3...] In the above example, 2342 and 55 are weights. Is weighted random sampling N items from ...

jameszhao00

13

asked Mar 20, 2017 at 4:59

2 votes

0 answers

453 views

Hypothesis testing on Weighted Poisson Binomial Distribution

Suppose I have $i$ coins, all of which are weighted to have a different probability $p$ of flipping heads. This results in $i$ Bernoulli distributions with different $p_i$. Cumulatively, this results ...

Flow Nuwen

121

asked Mar 17, 2017 at 21:28

0 votes

1 answer

101 views

Can someone point me towards research works relevant to importance or weighting of datapoints, like the SAW (Stepwise Adaptation of Weights) technique?

I am working on fitness case importance for Symbolic Regression and found a paper, "Step-wise Adaptation of Weights for Symbolic Regression with Genetic Programming", which talks about ...

Quamber Ali

3

asked Mar 11, 2017 at 11:15

1 vote

0 answers

135 views

What are the weight vector and bias in SVM? [duplicate]

I'm trying to understand the SVM algorithm, but I am not able to understand what the weight vector and bias are. Could anyone explain them in layman's terms?

Shivam Chaurasia

111

asked Dec 21, 2016 at 10:39

6 votes

3 answers

8k views

SE of weighted mean

$X$ is a random variable with unknown distribution. A number of experiments are conducted to estimate $X$. Each experiment has a different reliability measure in estimating $X$. These $n$ experiments ...

Gerry

255

asked Dec 18, 2016 at 1:21
