A brief overview of the methods is provided here; detailed methods are provided in (42). Repeats in the T2T-CHM13 assembly were annotated by parsing and combining output from RepeatMasker [provided in (40)] with custom-built pipelines for annotating αSat and HSat2,3 (42). Regions identified as “SAR” by RepeatMasker were annotated as HSat1A, and regions identified as “HSATI” by RepeatMasker were annotated as HSat1B. αSat HOR-haps were identified by (i) generating multiple alignments of all HOR units (or subregions of HOR units) from an array, (ii) deriving a consensus sequence, (iii) recoding each individual sequence as a binary vector based on its matches to the consensus, and (iv) clustering these binary vectors by k-means clustering. Phylogenetic analyses of αSat sequences were performed with MEGA5. Dotplots colored by percent identity were produced with StainedGlass (88).
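Steps (ii) to (iv) of the HOR-hap procedure can be illustrated with a minimal sketch. It assumes that step (i), the multiple alignment, has already been done, so the input is a list of equal-length aligned strings (gaps as "-"). The majority-rule consensus, the deterministic farthest-first initialization, and the function names (`consensus`, `to_binary`, `kmeans`, `hor_haps`) are hypothetical choices for illustration, not the published pipeline, which is described in (42).

```python
from collections import Counter


def consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences (assumption)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def to_binary(aligned, cons):
    """Recode each sequence as 1 (matches consensus) or 0 (differs) per column."""
    return [[int(a == b) for a, b in zip(seq, cons)] for seq in aligned]


def _sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means; farthest-first initialization is a hypothetical choice."""
    centroids = [list(map(float, vectors[0]))]
    while len(centroids) < k:
        # Next centroid: the vector farthest from all centroids chosen so far.
        far = max(vectors, key=lambda v: min(_sq_dist(v, c) for c in centroids))
        centroids.append(list(map(float, far)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each non-empty centroid to its members' mean.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def hor_haps(aligned, k):
    """Hypothetical wrapper: consensus -> binary recoding -> k-means labels."""
    cons = consensus(aligned)
    return cons, kmeans(to_binary(aligned, cons), k)
```

For example, `hor_haps(["ACGTACGT"] * 3 + ["ACGTTTTT"] * 2, 2)` derives the consensus `"ACGTACGT"` and places the three consensus-like units and the two divergent units in separate clusters. Recoding to match/mismatch vectors means that units sharing the same set of departures from the consensus cluster together, which is the signal HOR-haps are meant to capture.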
To analyze short-read NChIP-seq and CUT&RUN data, we developed two parallel methods: (i) marker-assisted mapping to the T2T-CHM13 reference and (ii) reference-free, region-specific marker enrichment. For marker-assisted mapping, reads were aligned to the reference and then filtered to retain only alignments that overlap precomputed nucleotide oligomers of length k (k-mers) occurring at only one position in the reference. For reference-free enrichment analysis, we first identified a set of k-mers enriched in CENP-A–targeted sequencing reads (relative to reads from input or immunoglobulin G controls). These enriched k-mers were then compared with precomputed reference k-mers that occur exclusively within a single window of a given size (“region-specific markers”). Windows with multiple matches to enriched k-mers were reported as enriched for CENP-A. We performed a similar analysis using HOR-hap–specific markers on chrX to reveal the broad enrichment of CENP-A on each HOR-hap across multiple individuals (fig. S21).
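The marker-assisted filtering step can be sketched on a toy scale as follows. Assumptions: the reference is a single string, only the forward strand is counted (a real implementation would likely also account for reverse complements), alignments are hypothetical half-open `(start, end)` intervals, and "overlap" means sharing at least one base with a unique k-mer. The names `unique_kmer_positions` and `filter_alignments` are illustrative only.

```python
from bisect import bisect_left
from collections import defaultdict


def unique_kmer_positions(reference, k):
    """Sorted start positions of k-mers that occur exactly once (forward strand only)."""
    positions = defaultdict(list)
    for i in range(len(reference) - k + 1):
        positions[reference[i:i + k]].append(i)
    return sorted(p[0] for p in positions.values() if len(p) == 1)


def overlaps_marker(start, end, markers, k):
    """True if [start, end) shares a base with some unique k-mer [p, p + k)."""
    # Overlap requires start - k < p < end; find the first p >= start - k + 1.
    i = bisect_left(markers, start - k + 1)
    return i < len(markers) and markers[i] < end


def filter_alignments(alignments, markers, k):
    """Keep only alignments that overlap at least one unique k-mer."""
    return [a for a in alignments if overlaps_marker(a[0], a[1], markers, k)]
```

With `reference = "AAAAAAGCTAAAAAA"` and `k = 3`, only the k-mers spanning the `GCT` island are unique, so an alignment confined to the poly-A stretch at `(0, 3)` is discarded while `(0, 5)`, which reaches the island, is kept. Binary search over sorted marker positions keeps each lookup logarithmic in the number of markers.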
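The reference-free enrichment analysis can likewise be sketched under explicit assumptions. The enrichment test here (a normalized count ratio with a pseudocount and a `min_ratio` cutoff), the use of non-overlapping windows with each k-mer assigned by its start position, and `min_hits=2` as the reading of "multiple matches" are all hypothetical choices; the actual criteria are given in (42).

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    counts = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            counts[r[i:i + k]] += 1
    return counts


def enriched_kmers(target_reads, control_reads, k, min_ratio=2.0, pseudocount=1.0):
    """k-mers whose normalized frequency in target reads exceeds control (assumed test)."""
    t, c = count_kmers(target_reads, k), count_kmers(control_reads, k)
    t_total, c_total = sum(t.values()) or 1, sum(c.values()) or 1
    return {
        kmer for kmer, n in t.items()
        if ((n + pseudocount) / t_total) / ((c[kmer] + pseudocount) / c_total) >= min_ratio
    }


def region_specific_markers(reference, k, window):
    """Map each k-mer confined to a single non-overlapping window to that window index."""
    windows = defaultdict(set)
    for i in range(len(reference) - k + 1):
        windows[reference[i:i + k]].add(i // window)
    return {kmer: next(iter(w)) for kmer, w in windows.items() if len(w) == 1}


def enriched_windows(enriched, markers, min_hits=2):
    """Windows matched by at least min_hits distinct enriched k-mers."""
    hits = Counter(markers[kmer] for kmer in enriched if kmer in markers)
    return sorted(w for w, n in hits.items() if n >= min_hits)
```

As a usage example, with `reference = "AAAAAGCTGCAAAAA"`, `k = 3`, and 5-bp windows, target reads `["GCTGC", "GCTGC", "AAAAA"]` against control reads `["AAAAA", "AAAAA"]` yield the enriched k-mers `GCT`, `CTG`, and `TGC` (with `min_ratio=1.5`), all of which are markers for window 1, so window 1 is reported. Because no read is ever aligned, this approach does not depend on mapping quality in repetitive arrays, which is why it complements the marker-assisted mapping.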
Altemose N., Logsdon G.A., Bzikadze A.V., Sidhwani P., Langley S.A., Caldas G.V., Hoyt S.J., Uralsky L., Ryabov F.D., Shew C.J., Sauria M.E., Borchers M., Gershman A., Mikheenko A., Shepelev V.A., Dvorkina T., Kunyavskaya O., Vollger M.R., Rhie A., McCartney A.M., Asri M., Lorig-Roach R., Shafin K., Aganezov S., Olson D., de Lima L.G., Potapova T., Hartley G.A., Haukness M., Kerpedjiev P., Gusev F., Tigyi K., Brooks S., Young A., Nurk S., Koren S., Salama S.R., Paten B., Rogaev E.I., Streets A., Karpen G.H., Dernburg A.F., Sullivan B.A., Straight A.F., Wheeler T.J., Gerton J.L., Eichler E.E., Phillippy A.M., Timp W., Dennis M.Y., O'Neill R.J., Zook J.M., Schatz M.C., Pevzner P.A., Diekhans M., Langley C.H., Alexandrov I.A., & Miga K.H. (2022). Complete genomic and epigenetic maps of human centromeres. Science (New York, N.Y.), 376(6588), eabl4178.