Genomic DNA was extracted from peripheral blood samples of the participants. DNA libraries were constructed using the SureSelect Human All Exon V6 kit (Agilent Technologies) and sequenced on NovaSeq or HiSeq sequencers (Illumina). Sequence reads were mapped to the human reference genome (GRCh37/hg19 with decoy sequences [hs37d5]), and variant call format (VCF) files were generated using DRAGEN (version 3.9.5; Illumina).
Exome data of the patients and control individuals were analyzed with the optimal sequence kernel association test (SKAT‐O) in the SNP and Variation Suite (version 8.4.1; Golden Helix). We focused on protein‐altering variants (missense and nonsense variants, indels, and splice‐site substitutions) with an allele frequency of <5% in the ToMMo database (version 8.3KJPN; https://www.megabank.tohoku.ac.jp/).25 All missense variants underwent in silico functional assessment, and only those predicted to be damaging by at least three of the six programs in dbNSFP (version 3.0, http://database.liulab.science/dbNSFP/) were selected.
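The variant selection criteria above can be expressed as a simple filter. The sketch below is illustrative only: the consequence labels, the `Variant` record, and the `passes_filter` helper are hypothetical, and real annotation pipelines use their own vocabularies (e.g. Sequence Ontology terms) and field names.

```python
from dataclasses import dataclass

# Hypothetical consequence labels treated as protein-altering
# (missense, nonsense, indels, splice-site substitutions).
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift_indel", "inframe_indel", "splice_site"}


@dataclass
class Variant:
    gene: str
    consequence: str         # one of the hypothetical labels above
    tommo_af: float          # allele frequency in the ToMMo 8.3KJPN panel (0.0 if absent)
    damaging_calls: int = 0  # how many of the six dbNSFP predictors call it damaging


def passes_filter(v: Variant, max_af: float = 0.05, min_damaging: int = 3) -> bool:
    """Return True if a variant meets the rare, protein-altering criteria."""
    if v.consequence not in PROTEIN_ALTERING:
        return False
    # Allele frequency must be strictly below 5%.
    if v.tommo_af >= max_af:
        return False
    # Only missense variants require the in silico consensus (>= 3 of 6 programs).
    if v.consequence == "missense" and v.damaging_calls < min_damaging:
        return False
    return True
```

Usage: `kept = [v for v in variants if passes_filter(v)]`. Treating variants absent from ToMMo as frequency 0.0 is an assumption of this sketch.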
SKAT‐O was performed using the very‐small‐sample algorithm with rho (ρ) set to 1. We searched for genes in which rare variants were more frequent in the patient group than in the control group. We also examined whether rare variants in known PCOS‐related genes were enriched in the patient group. Bonferroni‐corrected p‐values of <0.05 were considered statistically significant.
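In SKAT‐O the test statistic is a weighted mix, Q_ρ = (1 − ρ)·Q_SKAT + ρ·Q_burden, so setting ρ = 1 keeps only the burden component. The sketch below illustrates that mix and a Bonferroni correction across genes. It is a simplified illustration under stated assumptions: no covariates, user-supplied variant weights, and no p-value calculation (it does not reproduce the very‐small‐sample algorithm or the Golden Helix implementation). All function names are hypothetical.

```python
def score_components(genotypes, phenotypes):
    """Per-variant scores S_j = sum_i G_ij * (y_i - mean(y)).

    genotypes: one list of minor-allele counts per individual.
    phenotypes: 1 = patient, 0 = control. No covariates (simplifying assumption).
    """
    mean_y = sum(phenotypes) / len(phenotypes)
    n_variants = len(genotypes[0])
    return [
        sum(g[j] * (y - mean_y) for g, y in zip(genotypes, phenotypes))
        for j in range(n_variants)
    ]


def skat_o_statistic(genotypes, phenotypes, weights, rho):
    """Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden; rho = 1 gives the burden statistic."""
    scores = score_components(genotypes, phenotypes)
    q_skat = sum((w * s) ** 2 for w, s in zip(weights, scores))
    q_burden = sum(w * s for w, s in zip(weights, scores)) ** 2
    return (1 - rho) * q_skat + rho * q_burden


def bonferroni(pvalues):
    """Multiply each gene's p-value by the number of genes tested, capped at 1."""
    m = len(pvalues)
    return {gene: min(1.0, p * m) for gene, p in pvalues.items()}
```

Usage: with `rho=1` the heterogeneity-robust SKAT term drops out, so the test rewards genes in which rare variants push in the same direction (here, enrichment in patients). A gene would be called significant if its entry in `bonferroni(...)` is below 0.05; the p-values themselves must come from the association software.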
The effects of the identified variants on protein function and structure were assessed using Combined Annotation Dependent Depletion (CADD; https://cadd.gs.washington.edu/snv) and PyMOL (version 2.5, https://pymol.org/2/), respectively. Variants with CADD scores of ≥20 were considered probably damaging.26 Protein structure IDs were obtained from the Protein Data Bank (https://www.rcsb.org/). The effects of the variants on splice‐site recognition were analyzed with Human Splicing Finder (https://hsf.genomnis.com/home), Alternative Splice Site Predictor (http://wangcomputing.com/assp/), and NNSPLICE (http://fruitfly.org/seq_tools/splice.html), and their effects on protein stability were predicted using I‐Mutant Suite (http://gpcr2.biocomp.unibo.it/cgi/predictors/I‐Mutant3.0/I‐Mutant3.0.cgi). In addition, the hydrophobicity of wild‐type and variant GSTO2 proteins was assessed using ProtScale (https://web.expasy.org/protscale/) with the Kyte and Doolittle hydropathy scale.27 The formation of intrinsically disordered regions, which lack a fixed three‐dimensional structure,28 was predicted using the Predictor of Naturally Disordered Regions (PONDR; http://www.pondr.com/) with the VL‐XT predictor.29
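Two of the numeric criteria above can be made concrete. CADD reports PHRED‐scaled scores, where a score s places a variant in the top 10^(−s/10) fraction of possible substitutions, so the ≥20 cutoff corresponds to the top 1%. The Kyte and Doolittle profile is a sliding‐window mean of per‐residue hydropathy values. The sketch below assumes an unweighted 9‐residue window and uses a toy sequence; the window size, the helper names, and the example sequence are hypothetical and do not reproduce the ProtScale settings or the GSTO2 sequence used in the study.

```python
# Kyte and Doolittle (1982) hydropathy values per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kd_profile(seq, window=9):
    """Mean hydropathy of every full window (unweighted; window size is an assumption).

    Non-standard residue codes raise KeyError.
    """
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def substitute(seq, pos, new_aa):
    """Apply a missense change at 1-based position `pos` (hypothetical helper)."""
    return seq[:pos - 1] + new_aa + seq[pos:]


def cadd_phred_to_fraction(phred):
    """Fraction of possible substitutions ranked at least this deleterious."""
    return 10 ** (-phred / 10)
```

Usage: compare `kd_profile(wt)` with `kd_profile(substitute(wt, pos, aa))` window by window; a positive difference around `pos` indicates a locally more hydrophobic variant protein.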