This question is related to an earlier post.
I have been trying to simulate a binary variable with a given intraclass correlation (ICC) in a multilevel context.
In R, the ICCbin package offers the `rcbin()` function, which does exactly that.
I have tried to understand the function and simplified it somewhat while keeping its core behavior. However, I'm having trouble understanding what exactly is going on.
Here is my simplified function:
```r
# Simplified function to generate clustered binary data
simplified_icc_function <- function(prop = 0.5, n_cluster, n_students, rho) {
  cluster <- rep(1:n_cluster, each = n_students)  # cluster id for every student
  ri <- sqrt(rho)                                  # probability of copying the cluster value
  zi <- rbinom(n_cluster, size = 1, prob = prop)   # one shared draw per cluster
  yij <- rbinom(n_cluster * n_students, size = 1, prob = prop)  # independent individual draws
  uij <- rbinom(n_cluster * n_students, size = 1, prob = ri)    # 1 = take the cluster draw
  x <- (1 - uij) * yij + uij * rep(zi, each = n_students)       # final outcome
  cbcdata <- data.frame(cid = as.factor(cluster), y = x)
  return(cbcdata)
}
```
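I have deleted the `prvar` argument, which controls how much the event probability can vary from cluster to cluster around `prop`. I also deleted the `csvar` argument, the percentage of variation in cluster sizes (`csize`). With both set to 0 (their defaults) and `csize >= 2`, every cluster has exactly `csize` members and event probability `prop`, which is what my version hard-codes. I also renamed `noc` and `csize` to `n_cluster` and `n_students`.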
Here is the original function from the package:
```r
# Function to generate clustered binary data with specified parameters (taken from ICCbin package)
original_icc_function <- function(prop = 0.5, prvar = 0, noc, csize, csvar = 0, rho) {
  cluster <- c()
  x <- c()
  for (i in 1:noc) {
    min_csize <- ifelse((csize - round(csize * csvar)) >= 2,
                        csize - round(csize * csvar), 2)
    csizen <- abs(round(csize + (csize * csvar) * rnorm(1)))
    while (csizen < min_csize) {
      csizen <- abs(round(csize + (csize * csvar) * rnorm(1)))
    }
    min_prop <- ifelse((prop - prop * prvar) >= 0, prop - prop * prvar, 0)
    max_prop <- ifelse((prop + prop * prvar) <= 1, prop + prop * prvar, 1)
    propn <- abs(prop + (prop * prvar) * rnorm(1))
    while (propn < min_prop | propn > max_prop) {
      propn <- abs(prop + (prop * prvar) * rnorm(1))
    }
    ri <- sqrt(rho)
    zi <- rbinom(n = 1, size = 1, prob = propn)
    for (j in 1:csizen) {
      yij <- rbinom(n = 1, size = 1, prob = propn)
      uij <- rbinom(n = 1, size = 1, prob = ri)
      xij <- (1 - uij) * yij + uij * zi
      cluster <- c(cluster, i)
      x <- c(x, xij)
    }
  }
  cbcdata <- data.frame(cid = as.factor(cluster), y = x)
  return(cbcdata)
}
```
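To see what the deleted `prvar` and `csvar` arguments do without the nested loops, here is a vectorised sketch that mirrors the two `while` loops of the original. The helper names `draw_cluster_sizes()`, `draw_cluster_props()` and `simulate_clustered_binary()` are hypothetical (they are not part of ICCbin); they follow the same rejection rules as the loop above but are not meant to reproduce `rcbin()` draw for draw.

```r
# Hypothetical vectorised helpers (not part of ICCbin) mirroring the
# per-cluster draws in the original loop.

# Cluster sizes: abs(round(csize + csize * csvar * N(0, 1))), redrawn while
# below max(csize - round(csize * csvar), 2). Like the original, this never
# terminates if csize < 2 and csvar = 0.
draw_cluster_sizes <- function(n_cluster, csize, csvar = 0) {
  min_csize <- max(csize - round(csize * csvar), 2)
  sizes <- abs(round(csize + (csize * csvar) * rnorm(n_cluster)))
  bad <- sizes < min_csize
  while (any(bad)) {
    sizes[bad] <- abs(round(csize + (csize * csvar) * rnorm(sum(bad))))
    bad <- sizes < min_csize
  }
  sizes
}

# Cluster event probabilities: abs(prop + prop * prvar * N(0, 1)), redrawn
# while outside [prop * (1 - prvar), prop * (1 + prvar)] clipped to [0, 1].
draw_cluster_props <- function(n_cluster, prop = 0.5, prvar = 0) {
  min_prop <- max(prop - prop * prvar, 0)
  max_prop <- min(prop + prop * prvar, 1)
  props <- abs(prop + (prop * prvar) * rnorm(n_cluster))
  bad <- props < min_prop | props > max_prop
  while (any(bad)) {
    props[bad] <- abs(prop + (prop * prvar) * rnorm(sum(bad)))
    bad <- props < min_prop | props > max_prop
  }
  props
}

# Same mixture as simplified_icc_function(), with the cluster-level
# variation in size and probability put back in.
simulate_clustered_binary <- function(prop = 0.5, prvar = 0, n_cluster, csize,
                                      csvar = 0, rho) {
  sizes <- draw_cluster_sizes(n_cluster, csize, csvar)
  props <- draw_cluster_props(n_cluster, prop, prvar)
  cluster <- rep(seq_len(n_cluster), times = sizes)  # cluster id per student
  n <- length(cluster)
  zi <- rbinom(n_cluster, size = 1, prob = props)    # one shared draw per cluster
  yij <- rbinom(n, size = 1, prob = props[cluster])  # individual draws
  uij <- rbinom(n, size = 1, prob = sqrt(rho))       # 1 = copy the cluster draw
  x <- (1 - uij) * yij + uij * zi[cluster]
  data.frame(cid = as.factor(cluster), y = x)
}
```

Drawing all clusters at once and redrawing only the rejected entries keeps the same acceptance rule as the scalar `while` loops, while avoiding growing `cluster` and `x` one element at a time.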
Here I apply my function and use `iccbin()` to check the ICC:

```r
install.packages("ICCbin")
library(ICCbin)
data <- simplified_icc_function(prop = 0.5, n_cluster = 50, n_students = 20, rho = 0.2)
icc_result <- iccbin(cid = data$cid, y = data$y)
print(icc_result)
```
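As a cross-check that does not depend on `iccbin()`, the ICC can also be estimated by hand with the standard one-way ANOVA estimator `(MSB - MSW) / (MSB + (m - 1) * MSW)`, where `m` is the common cluster size. This is a sketch that assumes equal cluster sizes (as `simplified_icc_function()` produces); `anova_icc()` is a hypothetical helper name, and its value should be comparable to an ANOVA-type estimate, not necessarily identical to every method `iccbin()` reports.

```r
# One-way ANOVA estimator of the ICC (sketch; assumes equal cluster sizes).
anova_icc <- function(cid, y) {
  cid <- as.factor(cid)
  k <- nlevels(cid)                  # number of clusters
  n <- length(y)                     # total number of observations
  m <- n / k                         # common cluster size
  cluster_means <- as.numeric(tapply(y, cid, mean))
  grand_mean <- mean(y)
  msb <- m * sum((cluster_means - grand_mean)^2) / (k - 1)      # between-cluster mean square
  msw <- sum((y - cluster_means[as.integer(cid)])^2) / (n - k)  # within-cluster mean square
  (msb - msw) / (msb + (m - 1) * msw)
}
```

For example, `anova_icc(data$cid, data$y)` on the simulated data should land near the requested `rho`, up to sampling noise.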
I can follow the code line by line, and I can roughly see that it combines a between-group (cluster) component with an individual component. However, I'm having trouble understanding why this produces the requested ICC, and I'd appreciate it if someone could point me to some literature.
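A sketch of the moment calculation for the simplified version, assuming (as the code draws them) that $z_i$, $y_{ij}$ and $u_{ij}$ are mutually independent with $z_i, y_{ij} \sim \mathrm{Bernoulli}(p)$, $u_{ij} \sim \mathrm{Bernoulli}(r)$ and $r = \sqrt{\rho}$. For two students $j \neq k$ in the same cluster, only the case $u_{ij} = u_{ik} = 1$ (probability $r^2$) makes them share the value $z_i$; in the other three cases the product is of two independent draws with mean $p^2$:

$$
\begin{aligned}
x_{ij} &= (1 - u_{ij})\,y_{ij} + u_{ij}\,z_i \\
E[x_{ij}] &= (1 - r)\,p + r\,p = p, \qquad \operatorname{Var}(x_{ij}) = p(1 - p) \\
E[x_{ij}\,x_{ik}] &= r^2\,E[z_i^2] + (1 - r^2)\,p^2 = r^2 p + (1 - r^2)\,p^2 \\
\operatorname{Cov}(x_{ij}, x_{ik}) &= r^2 p + (1 - r^2)\,p^2 - p^2 = r^2\,p(1 - p) \\
\operatorname{Corr}(x_{ij}, x_{ik}) &= r^2 = \rho
\end{aligned}
$$

Students in different clusters use independent $z_i$, so their covariance is 0.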