Questions tagged [cluster-sample]

Cluster sampling is a sampling design in which the observation units have to be grouped together for logistical reasons (e.g., students clustered in schools or households clustered in a geographic area). Typically, cluster samples are multistage samples, so geographic areas are selected in the first stage and households in the subsequent stage.

3 votes
1 answer
131 views

Formulas for the "design effect" of cluster sampling are often of the form $\text{DE} = 1 + (\text{Avg Cluster Size} - 1)\cdot\text{ICC}.$ I see the result of this calculation (or another ...
Vasco Brazao's user avatar
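
As a rough illustration of the formula quoted in this question, the sketch below computes the design effect from an average cluster size and an ICC. The function name and the example inputs (clusters of 20 units, ICC of 0.05) are assumptions chosen for illustration, not values from the question.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Design effect: DE = 1 + (average cluster size - 1) * ICC."""
    if avg_cluster_size < 1:
        raise ValueError("average cluster size must be at least 1")
    return 1.0 + (avg_cluster_size - 1.0) * icc


# Hypothetical inputs: 20 units per cluster on average, ICC = 0.05.
de = design_effect(20, 0.05)
print(f"design effect: {de:.2f}")
```

Under this formula an ICC of 0, or clusters of size 1, gives a design effect of 1, so the clustered sample is treated like a simple random sample of the same size.
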
0 votes
0 answers
53 views

I had to learn about statistical models to approach a genetics project that I inherited: we obtained genotypes for hundreds of biallelic SNPs (possible values for each SNP: 0 = non-carrier, 1 = ...
txen's user avatar
  • 1
0 votes
0 answers
50 views

I am trying to calculate the sample size needed for an experiment with two independent study groups with a dichotomous/binary outcome. The outcome consists of making a selection. Group 1 has three ...
TreatmentGroups's user avatar
1 vote
0 answers
63 views

I want to compute a confidence interval for the F1-score of a machine learning model for the classification of blood cells on test data. The data is clustered, as I have multiple cells for every ...
saveturn's user avatar
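
One common way to respect clustering when building such an interval is a cluster (block) bootstrap, which resamples whole clusters instead of individual observations. The sketch below is not taken from the question: it assumes binary labels, a percentile interval, and a hypothetical `clusters` mapping from a cluster ID to a list of (true, predicted) pairs.

```python
import random


def f1_score(pairs):
    """F1 for binary labels from (true, predicted) pairs; 0.0 if undefined."""
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cluster_bootstrap_ci(clusters, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for F1, resampling whole clusters with replacement."""
    rng = random.Random(seed)
    ids = list(clusters)
    stats = []
    for _ in range(n_boot):
        # Draw as many clusters as were observed, with replacement.
        sampled = [rng.choice(ids) for _ in ids]
        pairs = [pair for cid in sampled for pair in clusters[cid]]
        stats.append(f1_score(pairs))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Hypothetical toy data: three clusters of (true, predicted) labels.
toy = {
    "a": [(1, 1), (1, 0), (0, 0)],
    "b": [(1, 1), (0, 1), (0, 0), (1, 1)],
    "c": [(0, 0), (1, 1), (1, 0)],
}
low, high = cluster_bootstrap_ci(toy, n_boot=500)
print(f"F1 = {f1_score([p for v in toy.values() for p in v]):.2f}, "
      f"95% CI ~ ({low:.2f}, {high:.2f})")
```

Resampling at the cluster level keeps the within-cluster dependence intact in every replicate. With only a handful of clusters the percentile interval can be unreliable, so this should be read as a starting point only.
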
2 votes
0 answers
79 views

I seek a peer check on my approach in calculating the sample size for pair-matched or stratified cluster randomized trials assuming 80% power. Some background on the study: I am working with ...
Tavaro Evanis's user avatar
3 votes
1 answer
125 views

I am working on patients' data. I want to do multilevel logistic regression. The cluster is hospital, exposure variable is treatment (A, B, C), and independent variables include sex, age and others. I ...
W Ramadi's user avatar
0 votes
0 answers
60 views

I am trying to interpret an article involving data from a cluster randomised trial, where the confidence intervals for effect sizes are said to have been adjusted 'using the standard errors of the ...
Roger Gomm's user avatar
2 votes
0 answers
72 views

I'm analyzing data from the usual care period only of a stepped wedge cluster-randomized trial. The goal is to describe the usual care period as though it was a cohort study because much higher ...
telegraph's user avatar
1 vote
0 answers
148 views

I am interested in stratified sampling for the purposes of cluster validation. The purpose is to perform cluster analysis in a subset of the data and check to see if the precise distribution of variables ...
pvelayudhan's user avatar
3 votes
1 answer
79 views

I've recently come across this problem in my textbook: To gather information about the validity of a new standardized test for high school juniors across the United States, a random sample of 20 high ...
wyatt400's user avatar
  • 105
0 votes
0 answers
120 views

What formula or software package can be used for sample size calculations for correlated eye data? An observational study is being conducted in which participants that have two normal eyes and ...
s.stats's user avatar
  • 485
0 votes
0 answers
32 views

Can I apply a Poisson distribution if I have different sample sizes for each cluster? My experiment is about diversity in International Baccalaureate vs non-IB classes, and I used single-stage cluster ...
Ben's user avatar
  • 21
1 vote
1 answer
130 views

Is it possible to do multiclass classification using GPBoost? For example, when we have 3 or more classes (e.g. species A / species B / species C) from a clustered data set (e.g. several measurements over ...
Oscar's user avatar
  • 11
0 votes
0 answers
96 views

I want to fit a linear regression model, but I know that if I treat each instance as unique there will be many pseudo replicates. I want to explore the relationship between 2 binary variables and one ...
taz's user avatar
  • 1
1 vote
0 answers
120 views

I need to design a survey such that the margin of error for a binary, categorical variable is bounded at a certain level. The survey needs to be a one or two-stage cluster sample where clusters are of ...
Josh's user avatar
  • 2,477
2 votes
1 answer
69 views

I'm reading Stata's Survey Data Reference Manual. It states that: Cluster sampling typically results in larger sample-to-sample variability than sampling individuals directly. Do you have an ...
robertspierre's user avatar
0 votes
0 answers
135 views

I am doing research on the effect of Covid on fundraising success. I already implemented logistic regression models with Covid as an interaction variable, but my supervisor wants me to use DiD. I asked ...
Zeinab Elashidy's user avatar
1 vote
1 answer
242 views

I understand that you would consider a multilevel or hierarchical linear mixed effects model when your data are nested in multiple levels and grouped. However, I assume that the observation will ...
drexel star's user avatar
0 votes
0 answers
303 views

When using a regressor ("generated regressor") that is generated in a first-stage equation and used in a second-stage equation, then standard errors will be understated (here is a readable ...
minimouse's user avatar
0 votes
0 answers
121 views

I have determined malaria prevalence in 8 villages with household-level clustering. As an initial test (basic descriptive statistics), what test (as the independence assumption of K-W is violated) is ...
Trypanosoma's user avatar
1 vote
0 answers
166 views

What value of ICC should I use when calculating the effective sample size of clustered data? A previous publication conducting a similar study reported ICCs of 0.04 for the full mixed model (random ...
Trypanosoma's user avatar
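
To show how much the choice of ICC matters here, the sketch below computes the effective sample size as the total sample size divided by the usual design effect, 1 + (average cluster size - 1) × ICC, for a few candidate ICC values. Only the ICC of 0.04 comes from the question; the total sample size of 800, the average cluster size of 5, and the other ICC values are hypothetical.

```python
def effective_sample_size(n_total: float, avg_cluster_size: float, icc: float) -> float:
    """n_eff = n / DE, with DE = 1 + (average cluster size - 1) * ICC."""
    design_effect = 1.0 + (avg_cluster_size - 1.0) * icc
    return n_total / design_effect


# Hypothetical study: 800 individuals in clusters of about 5.
n_total = 800
avg_cluster_size = 5

# 0.04 is the ICC quoted in the question; the others are for comparison.
for icc in (0.01, 0.04, 0.10):
    n_eff = effective_sample_size(n_total, avg_cluster_size, icc)
    print(f"ICC = {icc:.2f} -> effective sample size ~ {n_eff:.0f}")
```

Running the calculation over a plausible range of ICCs, instead of a single value, makes it clear how sensitive the effective sample size is to that choice.
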
1 vote
0 answers
351 views

I am performing a risk factor analysis of the individual and household-level factors associated with Plasmodium infection in individuals located in 8 villages in Cambodia. I want to perform an initial ...
Trypanosoma's user avatar
1 vote
0 answers
76 views

Good morning, I am performing a risk factor analysis of the individual and household-level factors associated with Plasmodium infection in individuals located in 8 villages in Cambodia. I have ...
Trypanosoma's user avatar
1 vote
1 answer
274 views

Consider the area under wheat for a sample of 44 clusters selected from 11 different villages. Four clusters were selected from each of the 11 villages and each cluster consists of 8 survey numbers(...
simran's user avatar
  • 408
0 votes
1 answer
110 views

I have the gender composition for thousands of boards (there is no sampling involved; the data set contains all boards). Boards consist of different numbers of male and female directors. So, to ...
Nima Darbari's user avatar
1 vote
0 answers
123 views

In my setting, I have post-intervention observational data in which individuals are nested into villages. The treatment consists of an information campaign that targeted villages in a non-random ...
edo's user avatar
  • 23
1 vote
0 answers
75 views

This file, published on the European Social Survey website, https://www.europeansocialsurvey.org/docs/methodology/ESS_weighting_data_1.pdf, states that: • when analysing data for one ...
An116's user avatar
  • 367
1 vote
1 answer
207 views

I have longitudinal data, with a different number of follow-ups for each individual. I treated the measurements for each individual as a curve, which I have already smoothed, then calculated the area under ...
user358238's user avatar
0 votes
0 answers
66 views

I am working with some data for my PhD, where I am interested in the ability of the institutions of different villages in four districts to enhance people's living conditions. Here, through ...
Mrinal saikia's user avatar
0 votes
1 answer
63 views

Is there a reasonable situation where the clusters are anonymous? What I mean is that one can ensure the subjects sampled are from the same cluster, but she does not know exactly which cluster they ...
Ypbor's user avatar
  • 181
6 votes
1 answer
119 views

Let's say that I want to measure the effect of a treatment on the performance of a firm. However my sample is very small. Let's say 10 firms. It is not possible to observe more firms. All these firms ...
Tom's user avatar
  • 538
1 vote
1 answer
103 views

I would like to create a logistic model using the OCA STAT Act data in R; however, since this data is a compilation of court appearances since 2020, some defendants may have appeared before court more ...
Elijah Appelson's user avatar
1 vote
1 answer
85 views

Can Machine Learning Models Recover "Experimental, Design and Hierarchical Structures" Within the Data? At times, real world data can contain "embedded structures" - these ...
stats_noob's user avatar
0 votes
1 answer
2k views

I'm working with research that has cross-sectional data. I have collected information about publicly-listed banks in many countries. For example, for each bank I collected the following information: ...
Asas24's user avatar
  • 1
1 vote
1 answer
442 views

I am trying to choose between GEE and hierarchical linear regression for analysis of experimental vignette (2x2 factorial (0/1) design) data. Each respondent (N=160) filled in 2 vignettes, thus the ...
GabrieleC's user avatar
2 votes
2 answers
462 views

I always had the following question: Can data having "naturally occurring hierarchical structure" be transformed to better make use of this hierarchical structure at different levels? To ...
stats_noob's user avatar
0 votes
0 answers
572 views

I am running a difference in difference to examine the effect of a merger on petrol prices. I am looking to see whether the prices of company A have increased due to a merger with company B. Local ...
Consequence's user avatar
1 vote
2 answers
811 views

Let's say that I have experimental data where the level of treatment is at a higher level of aggregation than the level of observation. For example, imagine some subset of schools adopted a new ...
StatStudent19's user avatar
3 votes
1 answer
400 views

Abadie (2017) has a paper about when we should cluster, and this paper has been summarized by McKenzie here. I used the paper of Dasgupta (2019) to link to McKenzie's summary. So, in ...
Phil Nguyen's user avatar
1 vote
0 answers
48 views

Yesterday, following a suggestion from @Dimitriy V. Masterov here, I saw in the given link that one of the reasons we can avoid clustering is: You want to say something about the association between ...
Phil Nguyen's user avatar
0 votes
0 answers
68 views

I have a dataset and divided the sample into 6 groups based on 4 binary criteria (e.g. "1" for has a Chief Digital Officer and "0" otherwise). Now I want to conduct a regression of ...
kra's user avatar
  • 1
1 vote
0 answers
188 views

Suppose that we have $n$ mutually independent (not i.i.d.) random variables $X_1,\dots,X_n$. We assume that these random variables can be divided into $k$ distinct groups ($k<n$), where in each ...
Amir's user avatar
  • 31
0 votes
2 answers
97 views

I am going through a course on survey weights, and it says that even though a dataset may be sampled using 3 clusters (like counties, city blocks, and households), you only need to specify the first level ...
Kevin's user avatar
  • 1
1 vote
0 answers
43 views

I'm curious if it's still valid to use analysis techniques for clustered data when only one of the two comparison groups consists of multiple clusters. For example, the control group consists of 10 ...
Janice's user avatar
  • 11
0 votes
0 answers
279 views

I have k-means clustered data and the k-means cluster centroids. I want to calculate the density of each cluster centroid and remove the cluster with the highest centroid density. I did my ...
Serkan Gün's user avatar
2 votes
2 answers
690 views

Suppose I have a large population and I want to test if installing a new special light bulb can reduce energy consumption. Since I have a large population, I'll have people that usually consume high ...
Numbermind's user avatar
1 vote
1 answer
649 views

I am working on an empirical paper using repeated cross-sectional data, and a reviewer has asked that we cluster our standard errors at the same level as our geographic fixed effects. Given the ...
Letti234's user avatar
1 vote
1 answer
1k views

I have a time series dataset of users with different profiles. I want to use an LSTM to predict 1 day ahead for each user. My approach to the problem is to first cluster users with the same behaviour. And ...
D.small's user avatar
  • 11
3 votes
1 answer
2k views

I am having trouble determining which method to use to analyze my data. Here is the info: -575 observations nested within 292 groups -some groups only have one observation, the max number is 23 in a ...
Brad's user avatar
  • 31
9 votes
5 answers
1k views

I am fuzzy on the distinctions between sampling strata and sampling clusters. Both seem to aim at designs that create useful estimates of between/within group (strata, cluster) variation, and ...
Alexis's user avatar
  • 31.5k