The item and test parameters derived from the IRT analysis are expressed as a function of θ, representing the continuum of the latent trait (i.e. paranoia), where values denote standard deviations from the average level (θ = 0). As such, higher values of θ represent more severe paranoia. The ability of each item to discriminate between different levels of paranoia is denoted by the discrimination parameter (a), with higher values indicating that small shifts in severity lead to larger increases in the probability that an item will be endorsed. Discrimination parameters above 1 are considered highly discriminative, whilst those below 0.5 are considered unacceptable (Baker and Kim, 2017). The difficulty parameters (b) describe the level of severity that the item measures, with the four difficulty parameters for each item denoting the value of θ at which there is a 50% probability of responding at or above the boundary between adjacent response options. Higher difficulty parameters indicate that the item responses typically measure more severe levels of paranoia.
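To make the roles of a and b concrete, the following minimal sketch computes response-option probabilities for a single item. It assumes a Samejima-type graded response model with logistic boundary curves, which the text does not name explicitly; the function names and the item parameters are hypothetical and are not values from the study.

```python
import math

def boundary_probability(theta, a, b):
    """Probability of responding at or above a boundary located at b.

    Assumes a two-parameter logistic curve; equals 0.5 when theta == b.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def category_probabilities(theta, a, thresholds):
    """Probabilities of each of the len(thresholds) + 1 ordered response options."""
    # Cumulative probabilities: 1 for "lowest option or above", 0 beyond the highest.
    cumulative = [1.0]
    cumulative += [boundary_probability(theta, a, b) for b in thresholds]
    cumulative += [0.0]
    # Each option's probability is the difference between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Hypothetical item (illustrative values only): a = 2.0 and four increasing
# difficulty parameters, giving five response options.
a = 2.0
thresholds = [-0.5, 0.5, 1.5, 2.5]
probs = category_probabilities(0.5, a, thresholds)
print([round(p, 3) for p in probs])
```

Under these assumptions, a larger a makes each boundary curve steeper around its b, so the same small shift in θ changes the endorsement probability more.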
The reliability of the GPTS was evaluated using the test information (TI) function, which represents the precision of the measure at different points along the θ spectrum. To aid interpretation, the TI at specific values of θ was converted to an equivalent α reliability on a 0–1 scale with the formula 1 − 1/TI(θ), where 1/√TI(θ) is the standard error of measurement at θ (O'Connor, 2018). To evaluate measurement invariance, we conducted differential item functioning (DIF) analysis for age and gender, with a β change above 10% and a pseudo-R² above 0.13 taken as the criteria indicating significant item variance (Crane et al., 2007; Choi et al., 2011). The presence of DIF reflects a measurement bias where demographic factors influence the way participants respond to the items (Holland and Wainer, 2012).
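The conversion from test information to a reliability-type index can be sketched as follows. This assumes the conventional relationship in which the standard error of measurement is 1/√TI(θ) and reliability is one minus the squared standard error, with the latent trait variance fixed at 1; the function names and the example TI values are hypothetical and are not results from the study.

```python
import math

def standard_error(test_information):
    """Standard error of measurement at a given theta: 1 / sqrt(TI)."""
    return 1.0 / math.sqrt(test_information)

def reliability_from_information(test_information):
    """Reliability-type index: 1 - SE^2 = 1 - 1/TI.

    Assumes the latent trait has unit variance. The result falls below 0
    when TI < 1, so the 0-1 reading only applies for TI >= 1.
    """
    return 1.0 - 1.0 / test_information

# Illustrative TI values only (not taken from the study).
for ti in (2.0, 5.0, 10.0, 20.0):
    print(ti, round(standard_error(ti), 3), round(reliability_from_information(ti), 3))
```

Under this assumption, a TI of 10 corresponds to a reliability of about 0.90 and a TI of 5 to about 0.80, so reliability rises as the test becomes more informative at that level of θ.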