The procedure begins with the isolation of genomic DNA from cultured cells using a standard proteinase K digestion method. Before DNA extraction, however, cells must be cultured for a limited period so that the nuclease is expressed and cleaves the bait break-site, and so that broken bait ends can translocate to prey DSBs. Prey DSBs can be generated by endogenous mechanisms (e.g. activation-induced cytidine deaminase (AID) or recombination activating gene 1/2 (RAG) cleavage sites, transcriptional start sites, etc.) or by ectopic mechanisms (e.g. nuclease-generated DSBs). The numerous approaches available to generate cells with bait DSBs (e.g. transfection, viral transduction, nucleofection) are not covered in this procedure but are described elsewhere for commonly used cell lines6,7,9,10.
Genomic DNA is sheared by sonication, and the bait-prey junctions are then amplified by LAM-PCR16 using directional primers lying on one or the other side of the bait break-site (or sites). LAM-PCR with a single 5’ biotinylated primer amplifies across the bait sequence into the unknown prey sequence (Fig. 1b). Junction-containing ssDNAs are enriched via binding to streptavidin-coated magnetic beads (Fig. 1b). After washing, bead-bound ssDNAs are unidirectionally ligated to a bridge adapter18. Adapter-ligated, bead-bound ssDNA fragments are then subjected to nested PCR to incorporate a barcode sequence necessary for de-multiplexing (Fig. 1b). Following an optional blocking digest to suppress the potentially large number of uncut and/or perfectly rejoined or minimally modified bait sequences (Figs. 1b and 2a,b), a final PCR step fully reconstructs the Illumina MiSeq adapter sequences at the ends of the amplified bait-prey junction sequence (Figs. 1b and 2c). Samples are then separated on an agarose gel, and the resulting population of 0.5-1 kb fragments is collected and quantified prior to MiSeq paired-end sequencing; a typical 2 × 250 bp HTGTS library samples ~1×10^6 sequence reads.
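The barcode added during nested PCR is what later allows pooled reads to be assigned back to their libraries. The following Python sketch illustrates the general idea of prefix-based de-multiplexing; it is not the code used by the published pipeline, and the function name `demultiplex`, the `max_mismatches` parameter, and the example barcodes are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes, max_mismatches=0):
    """Bin reads by the barcode found at their 5' end (illustrative only).

    reads    : iterable of (read_id, sequence) tuples
    barcodes : dict mapping library name -> barcode sequence (assumed values)
    Returns a dict mapping library name (or "unassigned") to a list of
    (read_id, sequence) tuples with the barcode removed from assigned reads.
    """
    bins = defaultdict(list)
    for read_id, seq in reads:
        hits = []
        for library, barcode in barcodes.items():
            prefix = seq[:len(barcode)]
            if len(prefix) < len(barcode):
                continue  # read is shorter than the barcode
            mismatches = sum(a != b for a, b in zip(prefix, barcode))
            if mismatches <= max_mismatches:
                hits.append(library)
        if len(hits) == 1:
            library = hits[0]
            # Trim the barcode so downstream steps see only the insert.
            bins[library].append((read_id, seq[len(barcodes[library]):]))
        else:
            # No match, or an ambiguous match to several barcodes.
            bins["unassigned"].append((read_id, seq))
    return dict(bins)


example_barcodes = {"libA": "ACGT", "libB": "TTAG"}  # hypothetical barcodes
example_reads = [
    ("r1", "ACGTGGGG"),
    ("r2", "TTAGCCCC"),
    ("r3", "GGGGGGGG"),
]
binned = demultiplex(example_reads, example_barcodes)
```

Reads that match no barcode, or more than one, are kept in a separate bin rather than discarded, so the fraction of unassignable reads can be inspected.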
We generated a custom bioinformatic pipeline that can be used to characterize the bait-prey junctions from the library of sequence reads and should be sufficient for most LAM-HTGTS applications using long paired-end sequence reads. The pipeline is available at http://robinmeyers.github.io/transloc_pipeline/ and consists of both third-party stand-alone tools (e.g. aligners) and custom programs built in Perl and R, enabling the processing of sequence reads directly off the sequencer into fully annotated translocation junctions in as few as two commands (Fig. 3). Briefly, library pre-processing consists of deconvoluting the barcoded libraries and trimming Illumina primers. The main processing pipeline is made up of three major steps: 1) local read alignment, 2) junction detection, and 3) results filtering. We use Bowtie 2 to perform read alignments19. The junction detection algorithm is based on the Optimal Query Coverage (OQC) algorithm from the YAHA read aligner and breakpoint detector20. The OQC algorithm aims to optimally infer the full paired-end query sequence from one or more alignments to a reference sequence. The optimal set is determined using a best-path search algorithm, which enables the detection not only of simple bait-prey junction reads but also of un-joined bait sequences and of reads harboring multiple consecutive junctions. The algorithm allows for overlapping alignments, which is required for micro-homology analyses, and it extends naturally to paired-end reads. The final characterization is an ordered set of alignments termed the Optimal Coverage Set (OCS). The library of resulting OCSs is subjected to a number of filters; the combination of filters and filter parameters used will depend largely on the application. A description of the filters currently employed can also be found at http://robinmeyers.github.io/transloc_pipeline .
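To make the best-path idea concrete, the following Python sketch chains local alignments along a single query read and returns the highest-scoring ordered set. It is a simplified illustration, not the OQC implementation in YAHA or in the pipeline: the `Aln` record, the `junction_penalty` and `max_overlap` parameters, and the scoring scheme (sum of alignment scores, minus a fixed penalty per junction and one point per overlapping base) are all assumptions, and paired-end handling is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aln:
    q_start: int  # 0-based start on the query (inclusive)
    q_end: int    # end on the query (exclusive)
    ref: str      # reference sequence name
    score: int    # local alignment score


def optimal_coverage_set(alns, junction_penalty=10, max_overlap=10):
    """Return the best-scoring ordered chain of alignments along the query.

    Dynamic programming over alignments sorted by query start. Consecutive
    alignments may overlap by up to max_overlap bases (hypothetical limit),
    which is how a micro-homology at a junction would appear.
    """
    alns = sorted(alns, key=lambda a: (a.q_start, a.q_end))
    n = len(alns)
    if n == 0:
        return []
    best = [a.score for a in alns]  # best chain score ending at each alignment
    prev = [None] * n               # back-pointer to the previous chain member
    for j in range(n):
        for i in range(j):
            a, b = alns[i], alns[j]
            # b must start after a starts and extend the covered query.
            if b.q_start <= a.q_start or b.q_end <= a.q_end:
                continue
            overlap = max(0, a.q_end - b.q_start)
            if overlap > max_overlap:
                continue
            cand = best[i] + b.score - junction_penalty - overlap
            if cand > best[j]:
                best[j] = cand
                prev[j] = i
    # Trace back from the best chain end to recover the ordered set.
    end = max(range(n), key=lambda k: best[k])
    path = []
    while end is not None:
        path.append(alns[end])
        end = prev[end]
    return path[::-1]


# Hypothetical read: bait on chr15 joined to prey on chr8 with a 3-bp overlap.
junction_read = [
    Aln(0, 60, "chr15", 60),
    Aln(57, 150, "chr8", 93),
    Aln(0, 30, "chr2", 25),
]
ocs = optimal_coverage_set(junction_read)

# Hypothetical un-joined bait read: one alignment covers the whole query.
unjoined_read = [Aln(0, 150, "chr15", 150), Aln(100, 150, "chr8", 50)]
ocs_unjoined = optimal_coverage_set(unjoined_read)
```

In this sketch a read with two chained alignments corresponds to a simple bait-prey junction, a single full-length alignment corresponds to an un-joined bait sequence, and chains of three or more alignments correspond to reads with multiple consecutive junctions.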