For the polygenic risk scores (PRS), we clumped the summary statistics using standard Ricopili parameters (see Supplementary Note for details). To avoid potential strand conflicts, we excluded all ambiguous markers for summary statistics that were not generated by Ricopili using the same imputation reference. PRS were generated at the default p-value thresholds (5e-8, 1e-6, 1e-4, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5 and 1) as a weighted sum of the allele dosages in the ASD GWAS sample: for each threshold, we summed over the markers meeting the p-value threshold in the training set, weighting each marker by its additive-scale effect measure (log(OR) or β) as estimated in the training set. Scores were normalized prior to analysis.
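The scoring step described above can be illustrated with a minimal Python sketch. It is not the Ricopili implementation: the function names, the toy data, the use of `p <= threshold` as the inclusion rule, and normalization to mean 0 and unit standard deviation are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

# Default p-value thresholds listed in the text.
P_THRESHOLDS = [5e-8, 1e-6, 1e-4, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1]


def polygenic_scores(dosages, weights, p_values, threshold):
    """Weighted sum of allele dosages over markers passing the threshold.

    dosages:  one list per individual, one dosage (0..2) per clumped marker
    weights:  per-marker additive effect from the training set (log(OR) or beta)
    p_values: per-marker p-values from the training set
    Assumption: a marker is included when p <= threshold.
    """
    keep = [j for j, p in enumerate(p_values) if p <= threshold]
    return [sum(weights[j] * row[j] for j in keep) for row in dosages]


def normalize(scores):
    """Center to mean 0 and scale to unit standard deviation (assumed scheme)."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sd for s in scores]


# Hypothetical toy data: three markers, three individuals.
weights = [math.log(1.2), math.log(0.9), 0.05]
p_values = [1e-9, 0.03, 0.4]
dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1.5]]

# One normalized score vector per threshold.
prs_by_threshold = {
    t: normalize(polygenic_scores(dosages, weights, p_values, t))
    for t in P_THRESHOLDS
}
```

In this toy example only the first marker passes the 5e-8 threshold, whereas all three contribute at threshold 1.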
We evaluated the predictive power using Nagelkerke’s R² and plots of odds ratios and confidence intervals over score deciles. Both R² and odds ratios were estimated in regression analyses including the relevant PCs and indicator variables for genotyping waves.
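For reference, Nagelkerke’s R² can be computed from the log-likelihoods of a null and a fitted logistic model, as in the sketch below. The helper names are hypothetical, and the sketch does not show how the PCs and wave indicators enter the comparison; which covariates belong in the null model is left as an assumption of the caller.

```python
import math


def bernoulli_loglik(y, p):
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )


def nagelkerke_r2(ll_null, ll_model, n):
    """Cox-Snell R2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# Hypothetical example: four individuals, two cases and two controls.
y = [1, 1, 0, 0]
ll_null = bernoulli_loglik(y, [0.5, 0.5, 0.5, 0.5])   # null model, no predictor
ll_model = bernoulli_loglik(y, [0.8, 0.7, 0.3, 0.2])  # model with a score
r2 = nagelkerke_r2(ll_null, ll_model, len(y))
```

The rescaling is what distinguishes Nagelkerke’s measure from the Cox-Snell R², whose maximum is below 1 for binary outcomes.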
Lacking a large ASD sample outside of iPSYCH and PGC, we trained a set of PRS for ASD internally in the following way. We divided the sample into five subsamples of roughly equal size, respecting the division into batches. We then ran five GWAS, leaving out each subsample in turn from the training set, and meta-analyzed each of these with the PGC results. This produced a set of PRS for each of the five subsamples, trained on its complement. Prior to analyses, each score was normalized within the subsample for which it was defined. We evaluated the predictive power in each subsample and in the whole sample combined.
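The leave-one-group-out design can be sketched as follows. The text does not state how batches were allocated to the five subsamples, so the greedy balancing rule here is purely an assumption, as are the function names, batch labels and batch sizes.

```python
def split_batches(batch_sizes, n_groups=5):
    """Assign whole batches to groups of roughly equal total size.

    Assumed rule: take batches from largest to smallest and place each in
    the group that is currently smallest. Batches are never split.
    """
    groups = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    ordered = sorted(batch_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for batch, size in ordered:
        k = totals.index(min(totals))
        groups[k].append(batch)
        totals[k] += size
    return groups, totals


def leave_one_group_out(groups):
    """Yield (training batches, held-out batches) for each group in turn."""
    for k, held_out in enumerate(groups):
        training = [b for i, g in enumerate(groups) if i != k for b in g]
        yield training, held_out


# Hypothetical batch sizes; the real wave sizes are not given in the text.
batch_sizes = {
    "wave01": 900, "wave02": 850, "wave03": 800, "wave04": 780,
    "wave05": 760, "wave06": 700, "wave07": 650, "wave08": 600,
    "wave09": 550, "wave10": 500,
}
groups, totals = split_batches(batch_sizes)
folds = list(leave_one_group_out(groups))
```

Each fold's training batches would feed one GWAS, whose results are then meta-analyzed with the external summary statistics before scoring the held-out group.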
To exploit the genetic overlap with other phenotypes to improve prediction, we created a series of new PRS by combining, in a weighted sum, the internally trained ASD score with the PRS of other highly correlated phenotypes. See the Supplementary Information for details.
To analyze ASD subtypes in relation to PRS, we defined a hierarchical set of phenotypes as follows. The first hierarchical subtype was childhood autism; hierarchical atypical autism comprised everybody with an atypical autism diagnosis and no childhood autism diagnosis; and hierarchical Asperger’s comprised everybody with an Asperger’s diagnosis and neither a childhood autism nor an atypical autism diagnosis. Finally, we lumped other pervasive developmental disorders and pervasive developmental disorder, unspecified into pervasive disorders developmental mixed; the hierarchical version of this category consists of everybody with such a diagnosis and none of the preceding ones (Supplementary Table 13). We examined the distribution of PRS across the distinct ASD subtypes for a number of phenotypes showing high rG with ASD (as well as a few with low rG as negative controls) by multivariate linear regression of the scores on the subtypes, adjusting for relevant PCs and wave indicator variables. See Supplementary Note for details.
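The hierarchical assignment described above amounts to giving each individual the first matching diagnosis in a fixed priority order. A minimal sketch, in which the diagnosis labels and the function name are hypothetical stand-ins for the registry codes:

```python
# Priority order from the text: each subtype excludes all preceding ones.
HIERARCHY = ["childhood_autism", "atypical_autism", "asperger", "pdd_mixed"]

# The two remaining diagnoses are lumped into one mixed category.
LUMPED = {"other_pdd": "pdd_mixed", "pdd_unspecified": "pdd_mixed"}


def hierarchical_subtype(diagnoses):
    """Return the highest-priority subtype among a person's diagnoses.

    diagnoses: iterable of diagnosis labels (hypothetical label names).
    Returns None when no ASD diagnosis is present.
    """
    labels = {LUMPED.get(d, d) for d in diagnoses}
    for subtype in HIERARCHY:
        if subtype in labels:
            return subtype
    return None


# Hypothetical individuals with multiple recorded diagnoses.
examples = {
    "id1": {"asperger", "atypical_autism"},
    "id2": {"other_pdd"},
    "id3": {"asperger", "pdd_unspecified"},
    "id4": set(),
}
assigned = {pid: hierarchical_subtype(dx) for pid, dx in examples.items()}
```

Because every individual receives at most one label, the resulting subtypes are mutually exclusive and can enter a regression as indicator variables.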