$\begingroup$

This question is related to an earlier post.

I have been trying to generate a binary variable with a specified intraclass correlation (ICC) in a multilevel context. In R, the ICCbin package offers the rcbin() function, which does this.

I have tried to understand the function and simplified it so that it retains its behavior. However, I am having trouble understanding what exactly is going on.

Here is my simplified function:


# Simplified function to generate clustered binary data
simplified_icc_function <- function (prop = 0.5, n_cluster, n_students, rho) 
{
  cluster <- rep(1:n_cluster, each = n_students)  # Create clusters
  
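  ri <- sqrt(rho)  # Probability that a student copies the cluster-level draw
  zi <- rbinom(n_cluster, size = 1, prob = prop)  # One shared draw per cluster
  yij <- rbinom(n_cluster * n_students, size = 1, prob = prop)  # One individual draw per student
  uij <- rbinom(n_cluster * n_students, size = 1, prob = ri)  # 1 = use zi, 0 = use yij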
  
  x <- (1 - uij) * yij + uij * rep(zi, each = n_students)  # Final outcome
  
  cbcdata <- data.frame(cid = as.factor(cluster), y = x)
  return(cbcdata)
}

I have deleted the prvar argument, which controls how much the event probability can vary from cluster to cluster around prop. I also deleted the csvar argument, which gives the percent of variation in cluster sizes (csize), so every cluster has the same size.

Here is the original function from the package:

# Function to generate clustered binary data with specified parameters (taken from ICCbin package)
original_icc_function <- function (prop = 0.5, prvar = 0, noc, csize, csvar = 0, rho) 
{
  cluster <- c()
  x <- c()
  for (i in 1:noc) {
    min_csize <- ifelse((csize - round(csize * csvar)) >= 
                          2, csize - round(csize * csvar), 2)
    csizen <- abs(round(csize + (csize * csvar) * rnorm(1)))
    while (csizen < min_csize) {
      csizen <- abs(round(csize + (csize * csvar) * rnorm(1)))
    }
    min_prop <- ifelse((prop - prop * prvar) >= 0, prop - 
                         prop * prvar, 0)
    max_prop <- ifelse((prop + prop * prvar) <= 1, prop + 
                         prop * prvar, 1)
    propn <- abs(prop + (prop * prvar) * rnorm(1))
    while (propn < min_prop | propn > max_prop) {
      propn <- abs(prop + (prop * prvar) * rnorm(1))
    }
    ri <- sqrt(rho)
    zi <- rbinom(n = 1, size = 1, prob = propn)
    for (j in 1:csizen) {
      yij <- rbinom(n = 1, size = 1, prob = propn)
      uij <- rbinom(n = 1, size = 1, prob = ri)
      xij <- (1 - uij) * yij + uij * zi
      cluster <- c(cluster, i)
      x <- c(x, xij)
    }
  }
  cbcdata <- data.frame(cid = as.factor(cluster), y = x)
  return(cbcdata)
}

Here I apply my function and use the iccbin function to check the ICC.

install.packages("ICCbin")  
library(ICCbin)

data <- simplified_icc_function(prop = 0.5, n_cluster = 50, n_students = 20, rho = 0.2)


icc_result <- iccbin(cid = data$cid, y = data$y)
print(icc_result)
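
To rule out that the agreement with rho is a small-sample fluke, here is a sketch of a check that uses only base R and many more clusters. The helper names gen_clustered_binary and anova_icc are my own and are not part of ICCbin. The one-way ANOVA estimator is just one of several ICC estimators and is assumed here for illustration.

```r
# Self-contained check in base R (ICCbin is not required).
set.seed(1)  # arbitrary seed, only for reproducibility

# Same construction as simplified_icc_function, repeated so this block runs on its own
gen_clustered_binary <- function(prop, n_cluster, n_students, rho) {
  zi  <- rbinom(n_cluster, size = 1, prob = prop)                    # one shared draw per cluster
  yij <- rbinom(n_cluster * n_students, size = 1, prob = prop)       # individual draws
  uij <- rbinom(n_cluster * n_students, size = 1, prob = sqrt(rho))  # 1 = copy the cluster draw
  data.frame(
    cid = factor(rep(seq_len(n_cluster), each = n_students)),
    y   = (1 - uij) * yij + uij * rep(zi, each = n_students)
  )
}

# One-way ANOVA estimator of the ICC, assuming equally sized clusters
anova_icc <- function(y, cid) {
  n_cl <- nlevels(cid)
  k    <- length(y) / n_cl                               # common cluster size
  cl_mean <- ave(y, cid)                                 # cluster mean, repeated per observation
  msb <- sum((cl_mean - mean(y))^2) / (n_cl - 1)         # between-cluster mean square
  msw <- sum((y - cl_mean)^2) / (n_cl * (k - 1))         # within-cluster mean square
  (msb - msw) / (msb + (k - 1) * msw)
}

sim <- gen_clustered_binary(prop = 0.5, n_cluster = 2000, n_students = 20, rho = 0.2)
est <- anova_icc(sim$y, sim$cid)
print(est)  # should land close to the requested rho = 0.2
```

If the estimate stays close to rho across different seeds and values of prop, that would suggest the construction itself, and not the particular estimator, produces the requested ICC.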


I can follow the code line by line, and I can roughly see that it combines a between-cluster component (zi) with an individual-level component (yij). However, I do not understand why this yields the requested ICC. Could someone explain the reasoning or point me to some literature?
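
To make the question concrete, here is my tentative reading of the simplified construction, which I would like to have confirmed or corrected. The notation is my own: $p$ stands for prop, $r = \sqrt{\rho}$, and I assume that $z_i$, $y_{ij}$ and $u_{ij}$ are mutually independent, as they are drawn in the code.

$$
\begin{aligned}
x_{ij} &= (1 - u_{ij})\, y_{ij} + u_{ij}\, z_i, \\
\operatorname{E}[x_{ij}] &= (1 - r)\,p + r\,p = p, \qquad \operatorname{Var}(x_{ij}) = p(1 - p), \\
\operatorname{E}[x_{ij} x_{ik}] &= (1 - r)^2 p^2 + 2r(1 - r)\,p^2 + r^2 p = p^2 + r^2 p(1 - p) \quad (j \neq k), \\
\operatorname{Cov}(x_{ij}, x_{ik}) &= r^2 p(1 - p) = \rho\, p(1 - p), \\
\operatorname{Corr}(x_{ij}, x_{ik}) &= \rho .
\end{aligned}
$$

If this is right, two students in the same cluster only share information when both of them copy $z_i$, which happens with probability $r^2 = \rho$. I am not sure whether this argument still holds once prvar is nonzero, since the original function then draws a different propn for each cluster.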

$\endgroup$
  • $\begingroup$ I think the OP is asking why the simplified function generates binary data with a given correlation, or: what is the math behind the function. That is a good question for CV, which I would also be interested in. So @Linus, could you reformulate your question in the "mathematical" direction? Thanks in advance. $\endgroup$ Commented Nov 12 at 11:31
