Example 4: Construct validation of two ability tests

Accuracy of the correlation in a two-dimensional 2PL model

Author

Ulrich Schroeders & Timo Gnambs

Published

December 26, 2024


Objective

The example describes the examination of the relationship between two achievement tests: A 20-item math test and a 20-item reading test in secondary school. Both tests are scored dichotomously (true/false) with a correlation of \(\rho = .70\) between the skills. The parameter of interest is the standard error of the estimated correlation between math and reading. The test design features to be varied are the sample size and if there is a speededness effect present. In this example, we will use a speededness effect that will induce some missingness in the last third of the items.

I. Determining the data generation for the complete dataset


  1. Number and distribution of factors (unidimensional vs. multidimensional)
  2. Number of items and item parameters (discriminations, difficulties)
  3. Item type (dichotomous, polytomous)

True item parameters are generated for the 30 dichotomous items on the two dimensions (math and reading).

# Clear work space and load necessary packages
rm(list = ls())
library(tidyverse)   # General data handling and manipulation
library(mirt)        # Simulate data from a multivariate normal distribution
library(mice)        # Induce missingness at random to the data

# Set the seed for random number generation to ensure reproducibility
set.seed(2024)

# Number of items
n_items <- 30

# Discrimination parameters for both dimensionas (a_i1, a_i2) are randomly 
  # drawn from a log-normal as it reflects the natural variation and 
  # positive skew typically observed in empirical data.
a_i1 <- c(rlnorm(n_items/2,.2,.3), rep(NA, n_items/2))
a_i2 <- c(rep(NA, n_items/2), rlnorm(n_items/2,.2,.3))
a_i <- cbind(a_i1, a_i2) %>% as.matrix()

# Difficulty parameters (b_i) are randomly drawn from a normal distribution
b_i <- rnorm(n_items, 0, 1)

The conditional probability that a person \(p\) with ability parameters on both dimensions, \(\theta_{p1}\) and \(\theta_{p2}\), answers the item \(i\) correctly is given by the following formula:

\[ P(X_{pi} = 1 | a_{i1}, a_{i2}, b_{i}, \theta_{p1}, \theta_{p2}) = \frac{1}{1 + \exp (-(a_{i1} \theta_{p1} + a_{i2} \theta_{p2} - b_i))} \]

where \(a_{i1}\) and \(a_{i2}\) denote the discrimination parameters for the two dimensions, and \(b_i\) the difficulty parameter of item \(i\).

The function generate_data_mv uses mirt::simdata to simulate dichotomous data given the item discriminations (\(a\)), the item difficulties (\(b\)), and the number of observations (\(n\)). The responses are generated according to a multivariate normal distribution with a predefined covariance (or correlation) matrix (\(\Sigma\)) and a mean vector (\(\mu\)). Note that in the mirt package, the transformation \(d = -a*b\) is used to match the slope-intercept parameterization of the logistic model.

# Simulate dichotomous item responses based on the following parameters
  # - 'a' denotes the item discriminations
  # - 'b' denotes and the item difficulties
  # - 'n' denotes the number of observations
# The remaining arguments are preset
  # - 'sigma' denotes the covariance matrix 
  # - 'mu' denotes the mean vector
  # - 'itemtype' specifies the type of item (i.e., dichotomous)

generate_data_mv <- function(a, b, n) {
  resp <- 
    simdata(a = a, d = -a*b, N = n, 
            mu = c(0, 0), 
            sigma = matrix(c(1, 0.7, 0.7, 1), nrow = 2),
            itemtype = "dich") %>% 
    as.data.frame()
  
  return(resp)
}

II. Defining the test design and the process of missing values


  1. Pattern of missingness (e.g., type of missingness, linking design)
  2. Amount of missing data

Each achievement test consists of 15 items. There is a speededness effect for the last 5 items for each of the separately timed tests. Thus, later items in both tests are less likely to be answered (response rate item 11: ~82%, items 12-13: ~73%, and items 14-15: ~46%, which results from the response pattern and the missing rate). In the present case, the probability of missing data depends on the observed data only, which is equivalent to Missing At Random (MAR).

The function data_MAR_design uses the complete simulated data and deletes values using mice::ampute. For more information on the multivariate amputation procedure see Schouten et al. (2016) and the accompanying vignette.

# Induce missingness at random (MAR) to the complete simulated data set
  # - 'resp' denotes the complete data set
  # - 'prop' denotes the severity of missingness 
  #   (has to be seen in combination with missingness pattern)
# The remaining arguments are preset
  # - 'patterns' denotes a matrix defining a specific missing data patterns
  # - 'mech' denotes the mechanism of missing data
  # - 'type' denotes the type of missingness, 'RIGHT' means that later 
  #    items more likely to be missing

data_MAR_design <- function(resp, miss_rates = 0) {
  
  if(miss_rates != 0) {
    # create some pattern of missingness for the first test (only the last 5 items)
    # adp <- ampute.default.patterns(5)
    adp <- cbind(c(0,1,1,1,1),
                 c(1,0,1,1,1),
                 c(1,1,0,1,1),
                 c(1,1,1,0,1),
                 c(1,1,1,1,0))
    pattern <- rbind(
    cbind(matrix(1, nrow =5, ncol = 10), adp),
        c(1,1,1,1,1,1,1,1,1,1,1,1,1,0,0),
        c(1,1,1,1,1,1,1,1,1,1,1,1,0,0,0),
        c(1,1,1,1,1,1,1,1,1,1,1,0,1,0,0),
        c(1,1,1,1,1,1,1,1,1,1,0,1,1,0,0),
        c(1,1,1,1,1,1,1,1,1,1,1,0,0,0,0))
  
    # Resample rows of patterns of missingness for the second test
    pattern_res1 <- pattern[sample(10, 10, replace = FALSE), ]
    pattern_res2 <- pattern[sample(10, 10, replace = FALSE), ]

    # Append the resampled pattern
    pattern_miss <- unique(rbind(cbind(pattern, pattern_res1),
                                 cbind(pattern, pattern_res2)))

    # Apply ampute to create missing data under MAR
    dat_amp <- ampute(data = resp,
                      prop = miss_rates,          # severity of missingness
                      patterns = pattern_miss,    # custom patterns of missingness
                      mech = "MAR",               # missing mechanism
                      type = "RIGHT")             # missingness type
    resp <- dat_amp$amp
    # md.pattern(resp)
  }

  return(resp)
}

III. Selecting the IRT model and the parameter of interest


  1. Underlying IRT model (e.g., 1PL, 2PL)
  2. IRT modeling software and estimation method
  3. Parameters to extract

The estimate_irt function estimates the multidimensional two-parametric item response model (Birnbaum model) with mirt using the simulated data and returns the standard error of the correlation between the latent traits.

# Estimate item response model
  # - 'resp' denotes the data set

estimate_irt <- function(resp) {
  model_2dim <- paste('math = 1-', n_items/2, "\n",
                      'read = ', n_items/2+1, '-', n_items, "\n",
                      'COV = math * read ')

# Fit the two-dimensional model using mirt
  mod <- tryCatch(
    mirt(data = resp,              # Item responses
         model = model_2dim,
         itemtype = '2PL',         # 2PL model
         SE = TRUE),
    error = function(e) NULL
  )
  

  # Calculate standard errors for the correlation if model is successfully estimated
  if (!is.null(mod)) {
    est <- coef(mod, printSE = TRUE)$GroupPars["SE", "COV_21"]    # Extract the standard error of the correlation
  } else {
    est <- NA
  }
  
  return(est)
}

IV. Setting up the Monte Carlo simulation


  1. Number of iterations
  2. Sample sizes to evaluate

The Monte Carlo simulation runs n_iterations times, including the previous steps of (i) determining the data generation for the complete dataset, (ii) defining the test design and the process of missing values, (iii) selecting the IRT model and the parameter of interest. The simulation is run for different sample sizes between 100 and 500 (in increments of 50), with and without the speededness effect.

Using an estimated standard deviation for the standard error of the correlation (\(\sigma = 0.013\)) derived from 500 iterations, a specified level of accuracy (\(\delta = .001\)), and a significance level (\(\alpha = .05\)), the number of iterations needed amounts to approximately 649.

# Number of iterations
n_iterations <- 649

# Sample sizes (from 100 to 500, in steps of 50) and 
  # the presence of a speededness factor (yes/no)
grid <- expand.grid(n_persons = seq(100, 500, 50), 
                    miss_rates = c(.90, 0))

# Create data frame for results (res)
res <- data.frame()

# Check if result file already exists
if (file.exists("example_4_res.rds")) {
  res <- readRDS("example_4_res.rds") 
} else {
  # Loop over different sample sizes and missing rate combinations
  for (g in 1:nrow(grid)) {
  
    # Nested loop, running 'n_iterations' times
    for (iter in 1:n_iterations) {
      # Generate a random sample with 'n_persons' from a multivariate distribution
      dat <- 
        generate_data_mv(a_i, b_i, grid[g, "n_persons"]) %>% 
        data_MAR_design(miss_rates = grid[g, "miss_rates"])
    
      res <- bind_rows(res, 
                       data.frame(iteration = iter,
                                  n_persons = grid[g, "n_persons"],
                                  percent_miss = grid[g, "miss_rates"],
                                  se_est = estimate_irt(dat)))
    }
  }
  
  # Save the results
  saveRDS(res, file = "example_4_res.rds")
}

Results

The figure shows the average standard error for the correlation between the latent traits as a function of sample size on the one hand, and the presence or absence of a speededness effect on the other.

# Preparation and aggregation of results
res_plot <- res %>%
  group_by(n_persons, percent_miss) %>%
  summarise(
    m_SE = mean(se_est^2, na.rm = TRUE)^.5,          # Calculate the average SE
    .groups = 'drop')


# Plot standard error of correlation
ggplot(data=res_plot, aes(x=n_persons, y = m_SE, color = factor(percent_miss))) +
  geom_line(linewidth = 0.8, alpha = 0.8) +
  geom_point(size = 1.5, alpha = 0.8) +
  geom_abline(intercept = .05, slope = 0, col="red", lty = "twodash") +
  labs(
    x = "Sample size",
    y = "Average SE(Correlation)",
    color = "Speededness\neffect",
    linetype = "Speededness\neffect"
  ) +
  scale_color_manual(values = c("0.9" = "aquamarine3", 
                                "0" = "darkolivegreen2"),
                     labels = c("0.9" = "yes",
                                "0" = "no")) +
  
  ylim(0, 0.12) +
  xlim(100, 500) +
  theme_bw()  +
  theme(
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12),
    legend.title = element_text(size = 14),
    legend.text = element_text(size = 12),
    legend.position = "inside", 
    legend.position.inside =  c(.85, .80)
  )

# Documentation for transparency and reproducibility
print(sessionInfo(), locale=FALSE)
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] mice_3.17.0     mirt_1.42.6     lattice_0.22-6  lubridate_1.9.3
 [5] forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2    
 [9] readr_2.1.5     tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1  
[13] tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1     farver_2.1.2         R.utils_2.12.3      
 [4] fastmap_1.2.0        rpart_4.1.23         digest_0.6.37       
 [7] timechange_0.3.0     lifecycle_1.0.4      Deriv_4.1.6         
[10] dcurver_0.9.2        cluster_2.1.6        survival_3.6-4      
[13] magrittr_2.0.3       compiler_4.4.1       rlang_1.1.4         
[16] tools_4.4.1          utf8_1.2.4           yaml_2.3.10         
[19] knitr_1.49           labeling_0.4.3       htmlwidgets_1.6.4   
[22] curl_5.2.2           withr_3.0.2          R.oo_1.27.0         
[25] nnet_7.3-19          grid_4.4.1           fansi_1.0.6         
[28] jomo_2.7-6           colorspace_2.1-1     future_1.34.0       
[31] progressr_0.15.0     GPArotation_2024.3-1 iterators_1.0.14    
[34] globals_0.16.3       scales_1.3.0         MASS_7.3-60.2       
[37] cli_3.6.3            rmarkdown_2.29       vegan_2.6-8         
[40] generics_0.1.3       rstudioapi_0.16.0    future.apply_1.11.3 
[43] SimDesign_2.17.1     tzdb_0.4.0           sessioninfo_1.2.2   
[46] minqa_1.2.8          pbapply_1.7-2        audio_0.1-11        
[49] splines_4.4.1        parallel_4.4.1       beepr_2.0           
[52] vctrs_0.6.5          boot_1.3-30          glmnet_4.1-8        
[55] Matrix_1.7-0         jsonlite_1.8.9       hms_1.1.3           
[58] mitml_0.4-5          listenv_0.9.1        testthat_3.2.1.1    
[61] foreach_1.5.2        snow_0.4-4           glue_1.8.0          
[64] parallelly_1.38.0    pan_1.9              nloptr_2.1.1        
[67] codetools_0.2-20     shape_1.4.6.1        stringi_1.8.4       
[70] gtable_0.3.6         lme4_1.1-35.5        munsell_0.5.1       
[73] pillar_1.9.0         htmltools_0.5.8.1    brio_1.1.5          
[76] R6_2.5.1             evaluate_1.0.1       R.methodsS3_1.8.2   
[79] backports_1.5.0      RPushbullet_0.3.4    broom_1.0.6         
[82] Rcpp_1.0.13-1        gridExtra_2.3        nlme_3.1-164        
[85] permute_0.9-7        mgcv_1.9-1           xfun_0.49           
[88] pkgconfig_2.0.3