Example 1: Piloting Testing with a Linked Test Design

Accuracy of item difficulties in Rasch model

Author

Ulrich Schroeders & Timo Gnambs

Published

December 26, 2024


Objective

The example demonstrates how to determine the sample size required to estimate the item difficulties of a one-parametric item response model with a given precision. In the study, two test versions, A and B, are administered, each containing 18 items. Twelve items are unique to each test version, while six items are common to both test versions. The parameter of interest is the Mean Squared Error (MSE) of the item difficulty parameters in a one-parametric item response model.

I. Determining the data generation for the complete dataset


  1. Number and distribution of factors (unidimensional vs. multidimensional)
  2. Number of items and item parameters (discriminations, difficulties)
  3. Item type (dichotomous, polytomous)

The true item parameters for the 30 dichotomous items are generated.

# Clear work space and load necessary packages
rm(list = ls())
library(tidyverse)   # General data handling and manipulation
library(mirt)        # Item response theory modeling

# Set the seed for random number generation to ensure reproducibility
set.seed(2024)

# Number of items
n_items <- 30

# Discrimination parameters are randomly drawn from a normal distribution
#   with a mean of 1 and a standard deviation of 0.1 to result in parameters
#   that closely conform to the 1PL while still exhibiting some misfit, that
#   is the discriminations vary slightly around 1.
a_i <- rnorm(n_items, 1, 0.1)

# Difficulty parameters are equally spaced between -2 and 2 to cover the
#   expected difficulty range of the latent proficiency distribution.
b_i <- seq(-2, 2, length = n_items)

The generate_dich_data function uses mirt::simdata to simulate dichotomous data given the item discriminations (\(a\)), item difficulties (\(b\)), and sample size (\(n\)).

Note that the mirt package uses the slope-intercept parameterization. Therefore, the item difficulty parameters (\(b\)) must be transformed into the item intercepts \(d_i = -a_i*b_i\). The transformation between the traditional IRT parameters (item discrimination and item difficulty) and the slope-intercept parameters can also be done with mirt::traditional2mirt.

# Simulate dichotomous item responses for n respondents to all items
  # - 'a' denotes the item discriminations
  # - 'b' denotes the item difficulties
  # - 'n' denotes the sample size

generate_dich_data <- function(a, b, n) {
  resp <-
    mirt::simdata(a = a, d = -a*b, N = n, itemtype = "dich") %>% 
    as.data.frame()
  
  return(resp)
}

II. Defining the test design and the process of missing values


  1. Pattern of missingness (e.g., type of missingness, linking design)
  2. Amount of missing data

The data_link_design function uses the complete simulated data to delete observations. Since there are two test versions in the study, each respondent is randomly assigned to test version A or test version B. Test version A contains the odd-numbered items and test version B contains the even-numbered items. Items 13 through 18 are present on both versions as linking items. Depending on the assigned test version, responses to items not included in the test version will have missing values.

# Induce missingness to the complete simulated data set
  # - 'resp' denotes the complete data set

data_link_design <- function(resp) {
  n <- nrow(resp)
  n_items <- ncol(resp)
  
  # Generate an indicator of the administered test version for each respondent
  #   assuming that about half of the sample receives each test version
  resp$version <- sample(c(1, 2), n, replace = TRUE)
  
  # Item responses not included in a test version are set to missing 
  #   depending on the generated indicator of the administered test version.
  resp[resp$version == 1, setdiff(seq(1, n_items), c(seq(1, n_items, 2), 13:18))] <- NA
  resp[resp$version == 2, setdiff(seq(1, n_items), c(seq(2, n_items, 2), 13:18))] <- NA
  resp <- subset(resp, select = -c(version))
  
  return(resp)
}

III. Selecting the IRT model and the parameter of interest


  1. Underlying IRT model (e.g., 1PL, 2PL)
  2. IRT modeling software and estimation method
  3. Parameters to extract

The estimate_irt function estimates the one-parametric IRT model (Rasch model) with mirt using the simulated data and returns a data.frame with the estimated item difficulties.

# Estimate item response model
  # - 'resp' denotes the data set

estimate_irt <- function(resp) {
  
  # Estimate a 1PL model using try-catch to handle errors
  mod <- tryCatch(
    mirt(data = resp,              # Item responses
         itemtype = 'Rasch',       # 1PL model
         verbose = FALSE), 
    error = function(e) NULL
  )
  
  # Extract item difficulties if model is estimated, else return NA
  if (!is.null(mod)) {
    est <- coef(mod, IRTpars = TRUE, simplify = TRUE)$items[, "b"]
  } else {
    est <- rep(NA, ncol(resp))
  }
  
  return(est)
}

IV. Setting up the Monte Carlo Simulation


  1. Number of iterations
  2. Sample sizes to evaluate

The Monte Carlo simulation runs n_iterations times, including the previous steps of (i) determining the data generation for the complete dataset, (ii) defining the test design and the process of missing values, (iii) selecting the IRT model and the parameter of interest.

Based on the estimated standard deviation of the MSE (\(\sigma = 0.523\)), a specified level of accuracy (\(\delta = .05\)), and a significance level (\(\alpha = .05\)), the required number of iterations was calculated as 438. Therefore, the simulation is run for different sample sizes between 100 and 600 (in increments of 50).

# Number of iterations
n_iterations <- 438

# Create data frame for results (res)
res <- data.frame()

# Check if result file already exists
if (file.exists("example_1_res.rds")) {
  res <- readRDS("example_1_res.rds") 
} else {
# Loop over different sample sizes (from 100 to 600, in steps of 100)
  for (n_persons in seq(100, 600, 50)) {  
  
    # Nested loop, running 'n_iterations' times
    for (iter in 1:n_iterations) {
      dat <- generate_dich_data(a_i, b_i, n_persons) %>% 
            data_link_design()
      res <- bind_rows(res,
                      data.frame(item = paste0("Item_", 1:length(b_i)),
                                 iteration = iter,
                                 n_persons = n_persons,
                                 b_est = estimate_irt(dat),
                                 b_true = b_i))
    }
  }

  # Save the results
  saveRDS(res, file = "example_1_res.rds")
}

Results

The figure shows the MSE and its 95% confidence interval (± 1.96 MCSE [Monte Carlo Standard Error]) for two items by sample size. For simplicity, only the results for items 1 and 15 are shown.

# Preparation and aggregation of results
res_plot <- res %>%
  as_tibble() %>% 
  mutate(b_diff2 = (b_est - b_true)^2) %>% 
  group_by(item, n_persons) %>%
  summarise(
    MSE = mean(b_diff2, na.rm = TRUE),                 # Calculate the MSE for item difficulties
    MCSE = sd(b_diff2, na.rm = TRUE) / sqrt(n() - 1),  # Calculate the MCSE
    .groups = 'drop') %>%
  filter(item %in% c("Item_1", "Item_15"))         # Filter data for the 1st and 15th items

# Plot mean squared error by sample size for item 1 and 15
ggplot(data=res_plot, aes(x=n_persons, color=item)) +
  
  geom_line(aes(y = MSE), linewidth = 1, linetype = "solid") +
  geom_point(aes(y = MSE), size = 2) +
  
   # CI boundaries
  geom_line(aes(y = MSE - 1.96 * MCSE), linewidth = 0.8, alpha = 0.7, linetype = "dashed") +
  geom_point(aes(y = MSE - 1.96 * MCSE), size = 1.5, alpha = 0.7) +
  geom_line(aes(y = MSE + 1.96 * MCSE), linewidth = 0.8, alpha = 0.7, linetype = "dashed") +
  geom_point(aes(y = MSE + 1.96 * MCSE), size = 1.5, alpha = 0.7) +
  
  geom_abline(intercept = .05, slope = 0, col="red", lty = "twodash") +
  labs(
    x = "Sample size",
    y = "Mean Squared Error (MSE)",
    color = "Item",
    linetype = "Item"
  ) +
  scale_color_manual(
    values = c("Item_1" = "cornflowerblue", 
               "Item_15" = "goldenrod1"),
    labels = c("Item_1" = "Item 1 (b = -2.0)", 
               "Item_15" = "Item 15 (b = -0.07)")
  ) +
  ylim(0, 0.4) +
  xlim(100, 600) +
  theme_bw()  +
  theme(
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12),
    legend.title = element_text(size = 14),
    legend.text = element_text(size = 12),
    legend.position = "inside",
    legend.position.inside = c(.85, .85)
  )

# Documentation for transparency and reproducibility
print(sessionInfo(), locale=FALSE)
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] mirt_1.42.6     lattice_0.22-6  lubridate_1.9.3 forcats_1.0.0  
 [5] stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2     readr_2.1.5    
 [9] tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.6         xfun_0.49            htmlwidgets_1.6.4   
 [4] tzdb_0.4.0           vctrs_0.6.5          tools_4.4.1         
 [7] generics_0.1.3       curl_5.2.2           parallel_4.4.1      
[10] fansi_1.0.6          cluster_2.1.6        pkgconfig_2.0.3     
[13] R.oo_1.27.0          RPushbullet_0.3.4    Matrix_1.7-0        
[16] lifecycle_1.0.4      farver_2.1.2         compiler_4.4.1      
[19] GPArotation_2024.3-1 brio_1.1.5           munsell_0.5.1       
[22] codetools_0.2-20     permute_0.9-7        dcurver_0.9.2       
[25] snow_0.4-4           htmltools_0.5.8.1    yaml_2.3.10         
[28] pillar_1.9.0         MASS_7.3-60.2        R.utils_2.12.3      
[31] vegan_2.6-8          sessioninfo_1.2.2    parallelly_1.38.0   
[34] nlme_3.1-164         Deriv_4.1.6          tidyselect_1.2.1    
[37] digest_0.6.37        future_1.34.0        stringi_1.8.4       
[40] listenv_0.9.1        labeling_0.4.3       splines_4.4.1       
[43] fastmap_1.2.0        grid_4.4.1           colorspace_2.1-1    
[46] cli_3.6.3            beepr_2.0            magrittr_2.0.3      
[49] utf8_1.2.4           future.apply_1.11.3  SimDesign_2.17.1    
[52] withr_3.0.2          scales_1.3.0         timechange_0.3.0    
[55] rmarkdown_2.29       globals_0.16.3       audio_0.1-11        
[58] gridExtra_2.3        progressr_0.15.0     R.methodsS3_1.8.2   
[61] hms_1.1.3            pbapply_1.7-2        evaluate_1.0.1      
[64] knitr_1.49           testthat_3.2.1.1     mgcv_1.9-1          
[67] rlang_1.1.4          Rcpp_1.0.13-1        glue_1.8.0          
[70] rstudioapi_0.16.0    jsonlite_1.8.9       R6_2.5.1