A number of helpful introductory and detailed descriptions of IRT techniques are available (e.g. Reise and Waller, 2009; Embretson and Reise, 2013; van der Linden and Hambleton, 2013). IRT analyses were conducted using all available data for each subscale of the GPTS (Part A = 1218, Part B = 10 545). Unidimensional IRT analyses were used to examine the item and test properties of the individual factors of the GPTS, and were conducted only if the assumption of unidimensionality was met. The EFA and Mokken analysis were used to evaluate whether items conformed to a single scale, with Loevinger's H above 0.3 indicating unidimensionality (Stochl et al., 2012). A two-parameter graded response model (GRM) was fitted to the items (Samejima, 1969). Person fit statistics were calculated to detect outliers whose pattern of responses across the items was atypical and therefore likely guided by other response mechanisms (e.g. random responding). Participants with atypical response patterns, defined by extreme person fit statistic scores (z < −3 or > 3), were excluded (Felt et al., 2017).
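The exclusion rule for atypical responders can be illustrated with a minimal sketch. It assumes the standardized person fit statistics have already been computed by the IRT software; the function name and the example z values are hypothetical and are not data from the study.

```python
def exclude_atypical(z_scores, cutoff=3.0):
    """Return indices of participants whose person fit z lies within +/- cutoff.

    Participants with z < -cutoff or z > cutoff are treated as atypical
    responders and dropped, mirroring the rule described in the text.
    """
    return [i for i, z in enumerate(z_scores) if -cutoff <= z <= cutoff]


# Hypothetical person fit values for five participants (illustration only)
z_example = [0.4, -3.2, 1.1, 3.5, -2.9]
kept = exclude_atypical(z_example)
print(kept)
```

In this sketch the second and fourth participants fall outside the ±3 band and would be removed before the IRT model is interpreted.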
The item and test parameters derived from the IRT analysis are expressed as a function of θ, which represents the continuum of the latent trait (i.e. paranoia); values of θ denote standard deviations from the average level (θ = 0), so higher values represent more severe paranoia. The ability of each item to discriminate between different levels of paranoia is denoted by the discrimination parameter (a), with higher values indicating that small shifts in severity lead to larger changes in the probability that an item will be endorsed. Discrimination parameters above 1 are highly discriminative, whilst those below 0.5 are considered unacceptable (Baker and Kim, 2017). The difficulty parameters (b) describe the level of severity that the item measures. Each item has four difficulty parameters, one for each boundary between adjacent response options, and each denotes the level of θ at which there is a 50% probability of responding at or above that boundary. Higher difficulty parameters indicate that the item responses typically measure more severe levels of paranoia.
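To make the roles of a and b concrete, the following is a minimal sketch of the category probabilities under Samejima's graded response model for a single item with four thresholds (five response options). The function name and the parameter values are hypothetical and are not estimates from the GPTS; the thresholds are assumed to be in increasing order.

```python
import math


def grm_category_probs(theta, a, b):
    """Return the probability of each response category for one item.

    theta: latent trait level; a: discrimination; b: ordered thresholds.
    With four thresholds there are five response categories.
    """
    # Boundary curves: probability of responding at or above each threshold
    boundary = [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b]
    # Pad with 1 (at or above the lowest category) and 0 (above the highest)
    cumulative = [1.0] + boundary + [0.0]
    # Category probability is the difference between adjacent boundary curves
    return [cumulative[k] - cumulative[k + 1] for k in range(len(b) + 1)]


# Hypothetical parameters for illustration only
a_example = 2.0
b_example = [-0.5, 0.5, 1.5, 2.5]
probs = grm_category_probs(0.5, a_example, b_example)
print([round(p, 3) for p in probs])
```

Here θ = 0.5 coincides with the second threshold, so the probability of responding in the two lowest categories combined is 50%, which is the boundary interpretation of b described above.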
The reliability of the GPTS was evaluated using the test information (TI) function, which represents the precision of the measure at different points along the θ spectrum. To aid interpretation, the TI at specific values of θ was converted to an equivalent α reliability on a 0–1 scale with the formula 1 − 1/TI(θ), where 1/√TI(θ) is the standard error of measurement at θ (O'Connor, 2018). To evaluate measurement invariance, we conducted differential item functioning (DIF) analysis for age and gender, with the criteria of a β change above 10% and a pseudo R² above 0.13 indicating significant item variance (Crane et al., 2007; Choi et al., 2011). The presence of DIF reflects a measurement bias where demographic factors influence the way participants respond to the items (Holland and Wainer, 2012).
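The relationship between test information, standard error and reliability can be sketched as follows. This assumes the conventional conversion for a latent trait scaled to unit variance, in which the standard error is 1/√TI(θ) and the reliability is one minus the squared standard error; the function names and TI values are hypothetical.

```python
import math


def standard_error(test_info):
    """Standard error of measurement at a given level of test information."""
    return 1.0 / math.sqrt(test_info)


def info_to_reliability(test_info):
    """Reliability equivalent of test information, assuming var(theta) = 1.

    reliability = 1 - SE^2 = 1 - 1 / TI
    """
    return 1.0 - 1.0 / test_info


# Hypothetical TI values at three points on theta (illustration only)
for ti in (4.0, 5.0, 10.0):
    print(ti, round(standard_error(ti), 3), round(info_to_reliability(ti), 3))
```

Under this conversion, a TI of 10 corresponds to a reliability of 0.90 and a TI of 5 to a reliability of 0.80.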