Newest 'cluster-sample' Questions

3 votes

1 answer

131 views

Which ICC (conditional or unconditional) to use for calculating Design Effect and effect sizes?

Formulas for the "design effect" of cluster sampling are often of the form $\text{DE} = 1 + (\text{Avg Cluster Size} - 1)\cdot\text{ICC}.$ I see the result of this calculation (or another ...

Vasco Brazao

101

asked Aug 19 at 10:26
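The design-effect formula quoted in the question above can be made concrete with a short sketch. It is a minimal illustration under stated assumptions: the cluster count, cluster size, and ICC below are hypothetical (0.04 simply echoes a value mentioned in a later question). It also assumes the common follow-on use of DE, dividing the total sample size by DE to get an "effective" sample size. It does not settle which ICC (conditional or unconditional) belongs in the formula.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Design effect for cluster sampling: DE = 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n_total: int, avg_cluster_size: float, icc: float) -> float:
    """Approximate number of independent observations the clustered sample is worth: n / DE."""
    return n_total / design_effect(avg_cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical design: 40 clusters of 25 people each, ICC assumed to be 0.04
    n_clusters, m, icc = 40, 25, 0.04
    de = design_effect(m, icc)
    n_eff = effective_sample_size(n_clusters * m, m, icc)
    print(f"DE = {de:.2f}")
    print(f"effective n = {n_eff:.1f} out of {n_clusters * m}")
```

With equal cluster sizes, DE grows linearly in both the cluster size and the ICC. Even a small ICC can therefore noticeably shrink the effective sample size when clusters are large.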

0 votes

0 answers

53 views

Cluster number and size in modelling a categorical variable by GEE

I had to learn about statistical models to approach a genetics project that I inherited: we obtained genotypes for hundreds of biallelic SNPs (possible values for each SNP: 0 = non-carrier, 1 = ...

txen

1

asked Feb 12 at 22:45

0 votes

0 answers

50 views

Sample Size Calculation for an Experiment with Two Independent Groups and a Binary Outcome, with Unequal Numbers of Participants but Equal Numbers of Outcomes

I am trying to calculate the sample size needed for an experiment with two independent study groups with a dichotomous/binary outcome. The outcome consists of making a selection. Group 1 has three ...

TreatmentGroups

1

asked Oct 24, 2024 at 0:47

1 vote

0 answers

63 views

BCa Bootstrap Confidence Intervals for Clustered Data

I want to compute a confidence interval for the F1-score of a machine learning model for the classification of blood cells on test data. The data is clustered, as I have multiple cells for every ...

saveturn

21

asked Oct 11, 2024 at 8:09

2 votes

0 answers

79 views

Sample Size: Cluster Randomized Trials [closed]

I seek a peer check on my approach in calculating the sample size for pair-matched or stratified cluster randomized trials assuming 80% power. Some background on the study: I am working with ...

Tavaro Evanis

143

asked Jul 15, 2024 at 16:42

3 votes

1 answer

125 views

Lack of within-cluster variability

I am working on patients' data. I want to do multilevel logistic regression. The cluster is hospital, exposure variable is treatment (A, B, C), and independent variables include sex, age and others. I ...

W Ramadi

31

asked Jun 28, 2024 at 5:25

0 votes

0 answers

60 views

Interpreting differences between confidence intervals with and without adjustment for clustering. Should those from adjustment be wider?

I am trying to interpret an article involving data from a cluster randomised trial, where the confidence intervals for effect sizes are said to have been adjusted 'using the standard errors of the ...

Roger Gomm

1

asked Jun 22, 2024 at 13:29

2 votes

0 answers

72 views

Is this multicollinearity, and how can I specify my model better?

I'm analyzing data from the usual care period only of a stepped wedge cluster-randomized trial. The goal is to describe the usual care period as though it was a cohort study because much higher ...

telegraph

78

asked May 4, 2024 at 0:45

1 vote

0 answers

148 views

Stratified sampling across several variables individually

I am interested in stratified sampling for the purpose of cluster validation. The purpose is to perform cluster analysis on a subset of the data and check whether the precise distribution of variables ...

pvelayudhan

71

asked May 2, 2024 at 15:06

3 votes

1 answer

79 views

Cluster sample or stratified random sample?

I've recently come across this problem in my textbook: To gather information about the validity of a new standardized test for high school juniors across the United States, a random sample of 20 high ...

wyatt400

105

asked Apr 27, 2024 at 22:05

0 votes

0 answers

120 views

Sample size calculation for correlated eye data

What formula or software package can be used for sample size calculations for correlated eye data? An observational study is being conducted in which participants who have two normal eyes and ...

s.stats

485

asked Mar 21, 2024 at 18:20

0 votes

0 answers

32 views

Poisson Distribution from Samples of Varying Sizes [duplicate]

Can I apply a Poisson distribution if I have different sample sizes for each cluster? My experiment is about diversity in International Baccalaureate vs non-IB classes, and I used single-stage cluster ...

Ben

21

asked Dec 10, 2023 at 22:25

1 vote

1 answer

130 views

Clustered data and multiclass classification with GPBoost

Is it possible to do multiclass classification using GPBoost? For example, when we have 3 or more classes (e.g. species A/species B/species C) from a clustered data set (e.g. several measurements over ...

Oscar

11

asked Nov 20, 2023 at 12:29

0 votes

0 answers

96 views

I want to fit a linear regression model, but I know that if I treat each instance as unique there will be many pseudoreplicates

I want to fit a linear regression model, but I know that if I treat each instance as unique there will be many pseudoreplicates. I want to explore the relationship between 2 binary variables and one ...

taz

1

asked Oct 25, 2023 at 22:22

1 vote

0 answers

120 views

Sample size for desired margin of error with clusters of unequal size

I need to design a survey such that the margin of error for a binary, categorical variable is bounded at a certain level. The survey needs to be a one or two-stage cluster sample where clusters are of ...

Josh

2,477

asked Sep 27, 2023 at 12:26

2 votes

1 answer

69 views

Cluster sampling results in larger sample-to-sample variability

I'm reading Stata's Survey Data Reference Manual. It states that: Cluster sampling typically results in larger sample-to-sample variability than sampling individuals directly. Do you have an ...

robertspierre

3,403

asked Aug 24, 2023 at 9:53
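The manual's statement quoted in the question above can be checked with a small simulation. This is a sketch under assumed conditions, not the manual's own demonstration. The population is hypothetical: 200 clusters of 20 people, each person's value equal to a shared cluster effect plus individual noise. Under that model the ICC is 0.5² / (0.5² + 1²) = 0.2. The sketch compares the variance of the sample mean under simple random sampling of 200 individuals with the variance when 10 whole clusters are sampled instead, so the total sample size is the same.

```python
import random
import statistics


def build_population(n_clusters, cluster_size, sd_between, sd_within, rng):
    """Hypothetical population: members of a cluster share a random effect, so they are correlated."""
    population = []
    for _ in range(n_clusters):
        cluster_effect = rng.gauss(0.0, sd_between)
        population.append([cluster_effect + rng.gauss(0.0, sd_within) for _ in range(cluster_size)])
    return population


def simulate(n_reps=2000, n_clusters=200, cluster_size=20, clusters_sampled=10,
             sd_between=0.5, sd_within=1.0, seed=1):
    """Return (variance of SRS means, variance of cluster-sample means) over repeated samples."""
    rng = random.Random(seed)
    population = build_population(n_clusters, cluster_size, sd_between, sd_within, rng)
    individuals = [y for cluster in population for y in cluster]
    n = clusters_sampled * cluster_size  # same total sample size under both designs

    srs_means, cluster_means = [], []
    for _ in range(n_reps):
        # Simple random sample of n individuals
        srs_means.append(statistics.fmean(rng.sample(individuals, n)))
        # One-stage cluster sample: take every member of the chosen clusters
        chosen = rng.sample(population, clusters_sampled)
        cluster_means.append(statistics.fmean(y for cluster in chosen for y in cluster))
    return statistics.variance(srs_means), statistics.variance(cluster_means)


if __name__ == "__main__":
    v_srs, v_cluster = simulate()
    print(f"variance of mean, SRS:     {v_srs:.5f}")
    print(f"variance of mean, cluster: {v_cluster:.5f}")
    print(f"ratio: {v_cluster / v_srs:.1f}")
```

Under these assumptions the ratio should land roughly near the design effect 1 + (20 − 1) × 0.2 = 4.8. Because people in the same cluster resemble each other, each sampled cluster adds less new information than the same number of independently drawn people.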

0 votes

0 answers

135 views

How to perform two-way fixed effect Difference in Difference test?

I am doing research on the effect of Covid on fundraising success. I already implemented logistic regression models with Covid as an interaction variable, but my supervisor wants me to use DiD. I asked ...

Zeinab Elashidy

19

asked Jul 18, 2023 at 15:11

1 vote

1 answer

242 views

Multilevel model for nested data where observations can belong to multiple groups

I understand that you would consider a multilevel or hierarchical linear mixed-effects model when your data are nested at multiple levels and grouped. However, I assume that the observation will ...

drexel star

11

asked May 17, 2023 at 5:38

0 votes

0 answers

303 views

Pairs (Cluster) Bootstrap R

When using a regressor ("generated regressor") that is generated in a first-stage equation and used in a second-stage equation, then standard errors will be understated (here is a readable ...

minimouse

1

asked May 12, 2023 at 8:30

0 votes

0 answers

121 views

Alternatives to the Kruskal-Wallis test?

I have determined malaria prevalence in 8 villages with household-level clustering. As an initial test (basic descriptive statistics), what test (as the independence assumption of K-W is violated) is ...

Trypanosoma

171

asked Jan 30, 2023 at 15:33

1 vote

0 answers

166 views

What value of ICC to use when calculating effective sample size?

What value of ICC should I use when calculating the effective sample size of clustered data? A previous publication conducting a similar study reported ICCs of 0.04 for the full mixed model (random ...

Trypanosoma

171

asked Jan 30, 2023 at 13:41

1 vote

0 answers

351 views

Adjusted Chi-square test or standard Chi-square test for clustered data?

I am performing a risk factor analysis of the individual and household-level factors associated with Plasmodium infection in individuals located in 8 villages in Cambodia. I want to perform an initial ...

Trypanosoma

171

asked Jan 30, 2023 at 0:11

1 vote

0 answers

76 views

Clustered data and the Friedman test

Good morning, I am performing a risk factor analysis of the individual and household-level factors associated with Plasmodium infection in individuals located in 8 villages in Cambodia. I have ...

Trypanosoma

171

asked Jan 29, 2023 at 4:24

1 vote

1 answer

274 views

What should the degrees of freedom be in the ANOVA table?

Consider the area under wheat for a sample of 44 clusters selected from 11 different villages. Four clusters were selected from each of the 11 villages and each cluster consists of 8 survey numbers(...

simran

408

asked Jan 10, 2023 at 4:58

0 votes

1 answer

110 views

Comparing median of groups within many clusters

I have the gender composition for thousands of boards (there is no sampling involved; the data set contains all boards). Boards consist of different numbers of male and female directors. So, to ...

Nima Darbari

1

asked Dec 6, 2022 at 14:45

1 vote

0 answers

123 views

What are appropriate ways to use matching with clustered data (with hierarchical structure)?

In my setting, I have post-intervention observational data in which individuals are nested into villages. The treatment consists of an information campaign that targeted villages in a non-random ...

edo

23

asked Nov 23, 2022 at 13:35

1 vote

0 answers

75 views

Multiple questions about setting up survey weights with a combination of different types of weights - ESS

Checking out this file published by the European Social Survey website https://www.europeansocialsurvey.org/docs/methodology/ESS_weighting_data_1.pdf, it states that: • when analysing data for one ...

An116

367

asked Aug 30, 2022 at 11:05

1 vote

1 answer

207 views

Dealing with zero inflation in the regression model

I have longitudinal data, with a different number of follow-ups for each individual. I have treated the measurements for each individual as a curve, which I have already smoothed, then calculated the area under ...

user358238

139

asked Aug 9, 2022 at 20:38

0 votes

0 answers

66 views

Will it take the form of a cluster? Is regression analysis possible here?

I am working with some data for my PhD, where I am interested in the ability of the institutions of different villages in four districts to enhance people's living conditions. Here, through ...

Mrinal saikia

11

asked Jul 12, 2022 at 5:22

0 votes

1 answer

63 views

How to justify anonymous clusters?

Is there a reasonable situation where the clusters are anonymous? What I mean is that one can ensure the subjects sampled are from the same cluster, but she does not know exactly which cluster they ...

Ypbor

181

asked Jun 16, 2022 at 4:27

6 votes

1 answer

119 views

What happens statistically if you create more observations by measuring more aspects of the same observational unit?

Let's say that I want to measure the effect of a treatment on the performance of a firm. However my sample is very small. Let's say 10 firms. It is not possible to observe more firms. All these firms ...

Tom

538

asked Apr 20, 2022 at 10:22

1 vote

1 answer

103 views

Logistic regression on correlated data (without clusters)

I would like to create logistic model using the OCA STAT Act data in R, however, since this data is a compilation of court appearances since 2020, some defendants may have appeared before court more ...

Elijah Appelson

13

asked Apr 11, 2022 at 14:25

1 vote

1 answer

85 views

Can Machine Learning Models Recover "Experimental, Design and Hierarchical Structures" Within the Data?

Can Machine Learning Models Recover "Experimental, Design and Hierarchical Structures" Within the Data? At times, real world data can contain "embedded structures" - these ...

stats_noob

1

asked Nov 24, 2021 at 7:19

0 votes

1 answer

2k views

Fixed effects in cross-sectional data

I'm working with research that has cross-sectional data. I have collected information about publicly-listed banks in many countries. For example, for each bank I collected the following information: ...

Asas24

1

asked Nov 11, 2021 at 19:25

1 vote

1 answer

442 views

GEE vs Hierarchical linear regression

I am trying to choose between GEE and hierarchical linear regression for analysis of experimental vignette (2x2 factorial (0/1) design) data. Each respondent (N=160) filled in 2 vignettes, thus the ...

GabrieleC

11

asked Nov 5, 2021 at 9:22

2 votes

2 answers

462 views

Data with Hierarchical Structure and Multicollinearity (E.g. ZIP Postal Codes)

I always had the following question: Can data having "naturally occurring hierarchical structure" be transformed to better make use of this hierarchical structure at different levels? To ...

stats_noob

1

asked Oct 24, 2021 at 17:49

0 votes

0 answers

572 views

Clustering standard errors for difference in difference

I am running a difference in difference to examine the effect of a merger on petrol prices. I am looking to see whether the prices of company A have increased due to a merger with company B. Local ...

Consequence

1

asked Jul 11, 2021 at 5:37

1 vote

2 answers

811 views

Is it always better to analyze at the most granular level possible? What is the best unit of analysis in the context of hierarchically clustered data?

Let's say that I have experimental data where the level of treatment is at a higher level of aggregation than the level of observation. For example, imagine some subset of schools adopted a new ...

StatStudent19

371

asked Jul 5, 2021 at 23:20

3 votes

1 answer

400 views

Asking about clustering condition following Abadie, Wooldridge 2017

Abadie et al. (2017) have a paper about when we should cluster, and this paper has been summarized by McKenzie here. I used the paper of Dasgupta (2019) to link to McKenzie's summary. So, in ...

Phil Nguyen

629

asked Jun 23, 2021 at 21:32

1 vote

0 answers

48 views

What do "random sample" and "particular population" mean in clustering?

Yesterday, following a suggestion by @Dimitriy V. Masterov here, I saw in the given link that one of the reasons we can avoid clustering is: You want to say something about the association between ...

Phil Nguyen

629

asked Jun 23, 2021 at 20:48

0 votes

0 answers

68 views

Cluster regression

I have a dataset and divided the sample into 6 groups based on 4 binary criteria (e.g. "1" for has a Chief Digital Officer and "0" otherwise). Now I want to conduct a regression of ...

kra

1

asked May 25, 2021 at 17:18

1 vote

0 answers

188 views

How to cluster a finite number of random variables based on their distributions?

Suppose that we have $n$ mutually independent (not i.i.d.) random variables $X_1,\dots,X_n$. We assume that these random variables can be divided into $k$ distinct groups ($k<n$), where in each ...

Amir

31

asked May 3, 2021 at 19:59

0 votes

2 answers

97 views

Why do you only need to identify the first cluster level in svydesign(), even if you have multi level clustering?

I am going through a course on survey weights, and it says that even though a dataset may be sampled using 3 levels of clusters (like Counties, City Blocks, and households), you only need to specify the first level ...

Kevin

1

asked Apr 17, 2021 at 18:43

1 vote

0 answers

43 views

Statistical analysis of clustered data- but clusters in one comparison group only

I'm curious if it's still valid to use analysis techniques for clustered data when only one of the two comparison groups consists of multiple clusters. For example, the control group consists of 10 ...

Janice

11

asked Feb 5, 2021 at 19:54

0 votes

0 answers

279 views

How can I calculate the density of every centroid in Python?

I have k-means clustered data and the cluster centroids from k-means. I want to calculate the density of each cluster centroid and remove the cluster with the highest centroid density. I did my ...

Serkan Gün

101

asked Dec 27, 2020 at 1:04

2 votes

2 answers

690 views

Determine sample size for each cluster

Suppose I have a large population and I want to test if installing a new special light bulb can reduce energy consumption. Since I have a large population, I'll have people that usually consume high ...

Numbermind

217

asked Dec 2, 2020 at 14:59

1 vote

1 answer

649 views

If fixed effects and robust standard errors both necessary, do they have to be at the same level, and why?

I am working on an empirical paper using repeated cross-sectional data, and a reviewer has asked that we cluster our standard errors at the same level as our geographic fixed effects. Given the ...

Letti234

13

asked Sep 21, 2020 at 14:34

1 vote

1 answer

1k views

How to use an LSTM for clustered data?

I have a time series dataset of users with different profiles. I want to use an LSTM to predict 1 day ahead for each user. My approach to the problem is to first cluster users with the same behaviour. And ...

D.small

11

asked Sep 2, 2020 at 7:22

3 votes

1 answer

2k views

Single Observation with Some Groups. Multilevel model or other analysis?

I am having trouble determining which method to use to analyze my data. Here is the info: 575 observations nested within 292 groups; some groups have only one observation, and the maximum number is 23 in a ...

Brad

31

asked Aug 11, 2020 at 19:32

9 votes

5 answers

1k views

Are the differences between sampling clusters and sampling strata, conceptual, methodological, neither or both?

I am fuzzy on the distinctions between sampling strata and sampling clusters. Both seem to aim at designs aiming at creating useful estimates of between/within group (strata, cluster) variation, and ...

Alexis

31.5k

asked Jul 13, 2020 at 19:50
