Clustered data

The function genCluster generates multilevel or clustered data based on a previously generated data set that is one “level” up from the clustered data. For example, given a data set of schools (considered here to be level 2), classrooms (level 1) can be generated within each school. Students (now level 1) can then be generated within classrooms (now level 2).

In this example, we generate school-, class-, and student-level data. There are eight schools, four of which are randomized to receive an intervention. The number of classes per school varies, as does the number of students per class. (It is straightforward to generate fully balanced data by using constant values.) The outcome of interest is a test score, which is influenced by gender and the intervention. In addition, test scores vary by school and by classroom, so the simulation includes a random effect at each of these levels.

We start by defining the school level data:

gen.school <- defData(varname = "s0", dist = "normal", formula = 0, variance = 3, 
    id = "idSchool")
gen.school <- defData(gen.school, varname = "nClasses", dist = "noZeroPoisson", 
    formula = 3)

dtSchool <- genData(8, gen.school)
dtSchool <- trtAssign(dtSchool, nTrt = 2)

dtSchool
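The remark above about fully balanced data can be sketched as follows. This is a minimal illustration, not part of the original example: it assumes the "nonrandom" distribution is used to fix the number of classes at a constant, and the object names (gen.school.bal, dtSchoolBal) and the seed are hypothetical.

```r
library(simstudy)
library(data.table)

# Same school-level random effect as in the example, but with a constant
# number of classes per school (assumption: dist = "nonrandom" for the constant)
gen.school.bal <- defData(varname = "s0", dist = "normal", formula = 0, variance = 3,
    id = "idSchool")
gen.school.bal <- defData(gen.school.bal, varname = "nClasses", dist = "nonrandom",
    formula = 3)

set.seed(282)  # arbitrary seed, used only for reproducibility
dtSchoolBal <- genData(8, gen.school.bal)
dtSchoolBal <- trtAssign(dtSchoolBal, nTrt = 2)

# Count schools by arm and by number of classes
dtSchoolBal[, .N, keyby = .(trtGrp, nClasses)]
```

With a constant nClasses, every school contributes the same number of classrooms when genCluster is called; the same approach could be applied to nStudents at the classroom level.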

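The classroom-level data are generated with a call to genCluster, which creates one row per classroom and carries the school-level columns down to each row. The classroom-level variables are then added by a call to addColumns: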

gen.class <- defDataAdd(varname = "c0", dist = "normal", formula = 0, variance = 2)
gen.class <- defDataAdd(gen.class, varname = "nStudents", dist = "noZeroPoisson", 
    formula = 20)

dtClass <- genCluster(dtSchool, "idSchool", numIndsVar = "nClasses", level1ID = "idClass")
dtClass <- addColumns(gen.class, dtClass)

head(dtClass, 10)
##     idSchool trtGrp        s0 nClasses idClass         c0 nStudents
##  1:        1      1  3.451188        2       1  1.2163467        15
##  2:        1      1  3.451188        2       2  0.2017886        32
##  3:        2      0  2.439826        5       3 -0.5598482        21
##  4:        2      0  2.439826        5       4  2.4921737        22
##  5:        2      0  2.439826        5       5 -0.5496087        16
##  6:        2      0  2.439826        5       6  0.5155666        15
##  7:        2      0  2.439826        5       7 -0.3371761        17
##  8:        3      1 -1.621878        4       8 -2.1386839        18
##  9:        3      1 -1.621878        4       9  0.6723469        25
## 10:        3      1 -1.621878        4      10 -2.1875160        25
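One way to confirm that genCluster produced the requested structure is to compare the number of rows generated per school with nClasses. The sketch below rebuilds a small school-level data set so that it runs on its own; the object names (defS, dtS, dtC, check) and the seed are hypothetical and are not part of the example above.

```r
library(simstudy)
library(data.table)

set.seed(2718)  # arbitrary seed, used only for reproducibility

# Minimal school-level definition: only the cluster-size variable is needed here
defS <- defData(varname = "nClasses", dist = "noZeroPoisson", formula = 3,
    id = "idSchool")
dtS <- genData(8, defS)

# One row per classroom, nested within schools
dtC <- genCluster(dtS, cLevelVar = "idSchool", numIndsVar = "nClasses",
    level1ID = "idClass")

# Observed number of classrooms per school next to the requested number
check <- dtC[, .(nObserved = .N, nRequested = nClasses[1]), keyby = idSchool]
check

# The total number of rows should equal the sum of nClasses
nrow(dtC) == sum(dtS$nClasses)
```

The same check applies one level down, comparing rows per idClass with nStudents.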

Finally, the student level data are added using the same process:

gen.student <- defDataAdd(varname = "Male", dist = "binary", 
    formula = 0.5)
gen.student <- defDataAdd(gen.student, varname = "age", dist = "uniform", 
    formula = "9.5; 10.5")
gen.student <- defDataAdd(gen.student, varname = "test", dist = "normal", 
    formula = "50 - 5*Male + s0 + c0 + 8 * trtGrp", variance = 2)
dtStudent <- genCluster(dtClass, cLevelVar = "idClass", numIndsVar = "nStudents", 
    level1ID = "idChild")

dtStudent <- addColumns(gen.student, dtStudent)
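As a rough sanity check on the generated data, the mean test score can be tabulated by intervention group and gender. The sketch below is a condensed, self-contained rerun of the three steps above (the age variable is omitted for brevity); the object names and the seed are hypothetical. With only eight schools, the school-level random effects can shift the group means noticeably, so the observed differences will only approximate the values in the formula.

```r
library(simstudy)
library(data.table)

set.seed(9731)  # arbitrary seed, used only for reproducibility

# School level: random effect, number of classes, treatment assignment
defS <- defData(varname = "s0", dist = "normal", formula = 0, variance = 3,
    id = "idSchool")
defS <- defData(defS, varname = "nClasses", dist = "noZeroPoisson", formula = 3)
dtS <- trtAssign(genData(8, defS), nTrt = 2)

# Class level: random effect and number of students
defC <- defDataAdd(varname = "c0", dist = "normal", formula = 0, variance = 2)
defC <- defDataAdd(defC, varname = "nStudents", dist = "noZeroPoisson", formula = 20)
dtC <- genCluster(dtS, cLevelVar = "idSchool", numIndsVar = "nClasses",
    level1ID = "idClass")
dtC <- addColumns(defC, dtC)

# Student level: gender and test score
defK <- defDataAdd(varname = "Male", dist = "binary", formula = 0.5)
defK <- defDataAdd(defK, varname = "test", dist = "normal",
    formula = "50 - 5*Male + s0 + c0 + 8 * trtGrp", variance = 2)
dtK <- genCluster(dtC, cLevelVar = "idClass", numIndsVar = "nStudents",
    level1ID = "idChild")
dtK <- addColumns(defK, dtK)

# Number of students and mean test score by arm and gender
dtK[, .(n = .N, meanTest = round(mean(test), 1)), keyby = .(trtGrp, Male)]
```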

This is what the clustered data look like. Each classroom is represented by a box, and each school is represented by a color. The intervention group is highlighted by dark outlines: