Questions tagged [cluster-sample]
Cluster sampling is a sampling design in which the observation units have to be grouped together for logistical reasons (e.g., students clustered in schools or households clustered in a geographic area). Typically, cluster samples are multistage samples, so geographic areas are selected in the first stage and households in the subsequent stage.
148 questions
3
votes
1
answer
131
views
Which ICC (conditional or unconditional) to use for calculating Design Effect and effect sizes?
Formulas for the "design effect" of cluster sampling are often of the form $\text{DE} = 1 + (\text{Avg Cluster Size} - 1)\cdot\text{ICC}.$ I see the result of this calculation (or another ...
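As a rough illustration of the formula quoted in this excerpt, here is a minimal Python sketch. The function names `design_effect` and `effective_sample_size` are hypothetical, chosen only for this example, and the numbers (1,000 observations, an average cluster size of 20, ICC = 0.05) are made-up inputs. The sketch does not address which ICC (conditional or unconditional) is the appropriate one to plug in.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Design effect: DE = 1 + (average cluster size - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n_total: float, avg_cluster_size: float, icc: float) -> float:
    """Nominal sample size divided by the design effect."""
    return n_total / design_effect(avg_cluster_size, icc)


# Illustrative inputs only (assumed, not taken from any question on this page).
de = design_effect(avg_cluster_size=20, icc=0.05)
n_eff = effective_sample_size(n_total=1000, avg_cluster_size=20, icc=0.05)
print(round(de, 2))     # about 1.95
print(round(n_eff, 1))  # about 512.8
```

With an ICC of 0 the design effect is exactly 1, so the clustered sample would carry as much information as a simple random sample of the same size.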
0
votes
0
answers
53
views
Cluster number and size in modelling a categorical variable by GEE
I had to learn about statistical models to approach a genetics project that I inherited: we obtained genotypes for hundreds of biallelic SNPs (possible values for each SNP: 0 = non-carrier, 1 = ...
0
votes
0
answers
50
views
Sample Size Calculation for Experiment Two Independent Groups with Binary Outcome With Unequal Number of Participants but Equal Number of Outcomes
I am trying to calculate the sample size needed for an experiment with two independent study groups with a dichotomous/binary outcome. The outcome consists of making a selection. Group 1 has three ...
1
vote
0
answers
63
views
BCa Bootstrap Confidence Intervals for Clustered Data
I want to compute a confidence interval for the F1-score of a machine learning model for the classification of blood cells on test data. The data is clustered, as I have multiple cells for every ...
2
votes
0
answers
79
views
Sample Size: Cluster Randomized Trials [closed]
I seek a peer check on my approach in calculating the sample size for pair-matched or stratified cluster randomized trials assuming 80% power.
Some background on the study: I am working with ...
3
votes
1
answer
125
views
Lack of within-cluster variability
I am working on patients' data. I want to do multilevel logistic regression. The cluster is hospital, exposure variable is treatment (A, B, C), and independent variables include sex, age and others. I ...
0
votes
0
answers
60
views
Interpreting differences between confidence intervals with and without adjustment for clustering. Should those from adjustment be wider?
I am trying to interpret an article involving data from a cluster randomised trial, where the confidence intervals for effect sizes are said to have been adjusted 'using the standard errors of the ...
2
votes
0
answers
72
views
Is this multicollinearity, and how can I specify my model better?
I'm analyzing data from the usual care period only of a stepped wedge cluster-randomized trial. The goal is to describe the usual care period as though it was a cohort study because much higher ...
1
vote
0
answers
148
views
Stratified sampling across several variables individually
I am interested in stratified sampling for the purposes of cluster validation. The purpose is to perform cluster analysis in a subset of the data and check to see if the precise distribution of variables ...
3
votes
1
answer
79
views
Cluster sample or stratified random sample?
I've recently come across this problem in my textbook:
To gather information about the validity of a new standardized test
for high school juniors across the United States, a random sample of
20 high ...
0
votes
0
answers
120
views
Sample size calculation for correlated eye data
What formula or software package can be used for sample size calculations for correlated eye data? An observational study is being conducted in which participants that have two normal eyes and ...
0
votes
0
answers
32
views
Poisson Distribution from Samples of Varying Sizes [duplicate]
Can I apply a Poisson distribution if I have different sample sizes for each cluster? My experiment is about diversity in International Baccalaureate vs non-IB classes, and I used single-stage cluster ...
1
vote
1
answer
130
views
Clustered data and multiclass classification with GPBoost
Is it possible to do multiclass classification using GPBoost?
For example, when we have 3 or more classes (e.g. species A / species B / species C) from a clustered data set (e.g. several measurements over ...
0
votes
0
answers
96
views
I want to fit a linear regression model, but I know that if I treat each instance as unique there will be many pseudo replicates
I want to fit a linear regression model, but I know that if I treat each instance as unique there will be many pseudo replicates.
I want to explore the relationship between 2 binary variables and one ...
1
vote
0
answers
120
views
Sample size for desired margin of error with clusters of unequal size
I need to design a survey such that the margin of error for a binary, categorical variable is bounded at a certain level. The survey needs to be a one or two-stage cluster sample where clusters are of ...
2
votes
1
answer
69
views
Cluster sampling results in larger sample-to-sample variability
I'm reading Stata's Survey Data Reference Manual.
It states that:
Cluster sampling typically results in larger sample-to-sample variability than sampling individuals directly.
Do you have an ...
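The quoted claim can be illustrated with a small simulation. The following is a minimal sketch under strong assumptions, not a general demonstration: a synthetic population of 50 clusters with 20 units each, where units in the same cluster are very similar (between-cluster SD of 1, within-cluster SD of 0.1). The function name `simulate` and all parameter values are hypothetical choices for this example. It compares the variance of the sample mean when 5 whole clusters are drawn with the variance when the same number of units (100) is drawn individually.

```python
import random
import statistics


def simulate(seed=0, n_clusters=50, cluster_size=20, clusters_sampled=5, reps=500):
    """Return (variance of the mean under cluster sampling,
               variance of the mean under simple random sampling)."""
    rng = random.Random(seed)

    # Synthetic population: each cluster has its own mean, plus small unit-level noise.
    population = []
    for _ in range(n_clusters):
        cluster_mean = rng.gauss(0.0, 1.0)
        population.append(
            [cluster_mean + rng.gauss(0.0, 0.1) for _ in range(cluster_size)]
        )
    all_units = [y for cluster in population for y in cluster]
    n_units = clusters_sampled * cluster_size  # same sample size in both designs

    cluster_means, srs_means = [], []
    for _ in range(reps):
        # Design 1: draw whole clusters and keep every unit in them.
        chosen = rng.sample(population, clusters_sampled)
        cluster_means.append(statistics.fmean([y for c in chosen for y in c]))
        # Design 2: draw individual units directly, ignoring clusters.
        srs_means.append(statistics.fmean(rng.sample(all_units, n_units)))

    return statistics.variance(cluster_means), statistics.variance(srs_means)


var_cluster, var_srs = simulate()
print(var_cluster > var_srs)
```

Because units within a cluster are nearly identical in this setup, sampling a whole cluster adds little new information beyond its first unit, which is why the cluster-sample mean varies more from sample to sample.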
0
votes
0
answers
135
views
How to perform two-way fixed effect Difference in Difference test?
I am doing research on the effect of Covid on fundraising success. I already implemented logistic regression models with Covid as an interaction variable, but my supervisor wants me to use DiD.
I asked ...
1
vote
1
answer
242
views
Multilevel model for nested data where observations could be in multiple groups
I understand that you would consider a multilevel or hierarchical linear mixed-effects model when your data are nested at multiple levels and grouped. However, I assume that the observation will ...
0
votes
0
answers
303
views
Pairs (Cluster) Bootstrap R
When using a regressor ("generated regressor") that is generated in a first-stage equation and used in a second-stage equation, then standard errors will be understated (here is a readable ...
0
votes
0
answers
121
views
Alternatives to the Kruskal-Wallis test?
I have determined malaria prevalence in 8 villages with household-level clustering.
As an initial test (basic descriptive statistics), what test (as the independence assumption of K-W is violated) is ...
1
vote
0
answers
166
views
What value of ICC to use when calculating effective sample size?
What value of ICC should I use when calculating the effective sample size of clustered data?
A previous publication conducting a similar study reported ICCs of 0.04 for the full mixed model (random ...
1
vote
0
answers
351
views
Adjusted Chi-square test or standard Chi-square test for clustered data?
I am performing a risk factor analysis of the individual and household-level factors associated with Plasmodium infection in individuals located in 8 villages in Cambodia.
I want to perform an initial ...
1
vote
0
answers
76
views
Clustered data and the Friedman test
Good morning, I am performing a risk factor analysis of the individual and household-level factors associated with Plasmodium infection in individuals located in 8 villages in Cambodia.
I have ...
1
vote
1
answer
274
views
What should be the degrees of freedom in the ANOVA table?
Consider the area under wheat for a sample of 44 clusters
selected from 11 different villages. Four clusters were selected from each of the
11 villages and each cluster consists of 8 survey numbers(...
0
votes
1
answer
110
views
Comparing median of groups within many clusters
I have the gender composition for thousands of boards (there is no sampling involved; the data set contains all boards). Boards consist of different numbers of male and female directors. So, to ...
1
vote
0
answers
123
views
What are appropriate ways to use matching with clustered data (with hierarchical structure)?
In my setting, I have post-intervention observational data in which individuals are nested into villages. The treatment consists of an information campaign that targeted villages in a non-random ...
1
vote
0
answers
75
views
Multiple questions about setting up survey weights when combining different types of weights - ESS
Checking out this file published on the European Social Survey website
https://www.europeansocialsurvey.org/docs/methodology/ESS_weighting_data_1.pdf
it states that:
• when analysing data for one ...
1
vote
1
answer
207
views
Dealing with zero inflation in the regression model
I have longitudinal data, with a different number of follow-ups for each individual. I have treated the measurements for each individual as a curve, which I have already smoothed, and then calculated the area under ...
0
votes
0
answers
66
views
Will it take the form of a cluster? Is regression analysis possible here?
I am working with some data for my PhD, where I am interested in seeing the ability of the institutions of different villages in four districts to enhance people's living conditions.
Here, through ...
0
votes
1
answer
63
views
How to justify anonymous clusters?
Is there a reasonable situation where the clusters are anonymous?
What I mean is that one can ensure the subjects sampled are from the same cluster, but she does not know exactly which cluster they ...
6
votes
1
answer
119
views
What happens statistically, if you create more observations by measuring more aspects of the same observational unit
Let's say that I want to measure the effect of a treatment on the performance of a firm. However my sample is very small. Let's say 10 firms. It is not possible to observe more firms. All these firms ...
1
vote
1
answer
103
views
Logistic regression on correlated data (without clusters)
I would like to create a logistic model using the OCA STAT Act data in R. However, since this data is a compilation of court appearances since 2020, some defendants may have appeared before court more ...
1
vote
1
answer
85
views
Can Machine Learning Models Recover "Experimental, Design and Hierarchical Structures" Within the Data?
Can Machine Learning Models Recover "Experimental, Design and Hierarchical Structures" Within the Data?
At times, real world data can contain "embedded structures" - these ...
0
votes
1
answer
2k
views
Fixed effects in a cross-sectional data
I'm working with research that has cross-sectional data. I have collected information about publicly-listed banks in many countries. For example, for each bank I collected the following information:
...
1
vote
1
answer
442
views
GEE vs Hierarchical linear regression
I am trying to choose between GEE and hierarchical linear regression for analysis of experimental vignette (2x2 factorial (0/1) design) data. Each respondent (N=160) filled in 2 vignettes, thus the ...
2
votes
2
answers
462
views
Data with Hierarchical Structure and Multicollinearity (E.g. ZIP Postal Codes)
I always had the following question: Can data having "naturally occurring hierarchical structure" be transformed to better make use of this hierarchical structure at different levels?
To ...
0
votes
0
answers
572
views
Clustering standard errors for difference in difference
I am running a difference in difference to examine the effect of a merger on petrol prices. I am looking to see whether the prices of company A have increased due to a merger with company B.
Local ...
1
vote
2
answers
811
views
Is it always better to analyze at the most granular level possible? What is the best unit of analysis in the context of hierarchically clustered data?
Let's say that I have experimental data where the level of treatment is at a higher level of aggregation than the level of observation. For example, imagine some subset of schools adopted a new ...
3
votes
1
answer
400
views
Asking about clustering condition following Abadie, Wooldridge 2017
Abadie et al. (2017) have a paper about when we should cluster, and this paper has been summarized by McKenzie here.
I used the paper of Dasgupta (2019) to link to the summarized work of McKenzie. So, in ...
1
vote
0
answers
48
views
What do "random sample" and "particular population" mean in clustering?
Yesterday, following a suggestion from @Dimitriy V. Masterov here, I saw in the given link that one of the reasons we can avoid clustering is
You want to say something about the association between ...
0
votes
0
answers
68
views
Cluster regression
I have a dataset and divided the sample into 6 groups based on 4 binary criteria (e.g. "1" for has a Chief Digital Officer and "0" otherwise).
Now I want to conduct a regression of ...
1
vote
0
answers
188
views
How to cluster a finite number of random variables based on their distributions?
Suppose that we have $n$ mutually independent (not i.i.d.) random variables $X_1,\dots,X_n$. We assume that these random variables can be divided into $k$ distinct groups ($k<n$), where in each ...
0
votes
2
answers
97
views
Why do you only need to identify the first cluster level in svydesign(), even if you have multi level clustering?
I am going through a course on survey weights, and it says that even though a dataset may be sampled using 3 clusters (like counties, city blocks, and households), you only need to specify the first level ...
1
vote
0
answers
43
views
Statistical analysis of clustered data- but clusters in one comparison group only
I'm curious if it's still valid to use analysis techniques for clustered data when only one of the two comparison groups consists of multiple clusters.
For example, the control group consists of 10 ...
0
votes
0
answers
279
views
How can I calculate the density of every centroid in Python?
I have k-means clustered data, and the cluster centroids from k-means. I want to calculate the density of each cluster centroid and remove the cluster with the highest centroid density. I did my ...
2
votes
2
answers
690
views
Determine sample size for each cluster
Suppose I have a large population and I want to test if installing a new special light bulb can reduce energy consumption. Since I have a large population, I'll have people that usually consume high ...
1
vote
1
answer
649
views
If fixed effects and robust standard errors both necessary, do they have to be at the same level, and why?
I am working on an empirical paper using repeated cross-sectional data, and a reviewer has asked that we cluster our standard errors at the same level as our geographic fixed effects. Given the ...
1
vote
1
answer
1k
views
How to use LSTM for clustered data?
I have a time series dataset of users with different profiles. I want to use an LSTM to predict 1 day ahead for each user. My approach to the problem is to first cluster users with the same behaviour. And ...
3
votes
1
answer
2k
views
Single Observation with Some Groups. Multilevel model or other analysis?
I am having trouble determining which method to use to analyze my data. Here is the info:
-575 observations nested within 292 groups
-some groups only have one observation, the max number is 23 in a ...
9
votes
5
answers
1k
views
Are the differences between sampling clusters and sampling strata conceptual, methodological, neither, or both?
I am fuzzy on the distinctions between sampling strata and sampling clusters. Both seem to aim at designs that create useful estimates of between/within group (strata, cluster) variation, and ...