Inference is as in (2.2), by the EM algorithm. The νi are fixed at 1 for all classes except the heterozygous class, which is fixed at 1.3. The values of μi, Σi for the unknown class are fixed with identical values to (2.2).
All four classes have initial class probabilities set to 0.25, and for the three genotype classes initial covariance matrices are set to (2c/N)×I2 with c the cost (Arthur and Vassilvitskii, 2007 ) of a k means ++ clustering on the data, and N the number of intensity points. The transformation of intensities has accounted for shifts, and so location parameters of the two homozygous classes can be initialized to the extremes of y(1), and the heterozygous class will then fall somewhere in between, thus the μi are initialized to
where the min/max are taken over a filtered version of the intensity data, with the lowest 1% of untransformed intensity values in the x(1) direction and lowest one percent in the x(2) direction removed. is the mean of the yj over the second axis, and k is a shift parameter for the location of the heterozygous class, that takes one of three values, 0.45, 0.5 or 0.55, resulting in three sets of initial values dependent on the value of k. For each set of starting values, the EM algorithm is run until genotype calls are concordant for two consecutive iterations, and the optimal parameters are chosen to be the final values with the highest likelihood.
Genotype calls are made using genotype posterior probabilities [using the πi inferred from this step unlike (2.3)] with a 0.7 call threshold. By default, SNPs that fail the HWE test subsequent to this step have all genotypes called unknown.
In our experiments, we have found the occurrence of the rescue step, and the subsequent chances of a successful rescue, to vary with the quality of the dataset. On a number of Immunochip datasets, rescue steps tended to occur on between 3 and 10% of SNPs, with 30–50% being successful.