Copy Number and Allelic Content Analysis

Workflow: the workflow of Control-FREEC consists of three steps: (i) calculation and segmentation of copy number profiles; (ii) calculation and segmentation of smoothed BAF profiles; (iii) prediction of final genotype status, i.e. copy number and allelic content for each segment (for example, A, AB, AAB, etc.).

(i) Calculation of copy number profiles is mainly done as described in our previous publication (Boeva et al., 2010). The most important features of the procedure are: (a) possibility to use GC-content and mappability profiles to normalize read count if a control sample is unavailable; (b) proper characterization of overdiploid genomes; (c) correction for possible contamination by normal cells when constructing the copy number profile of a tumor genome. The new tool Control-FREEC can also be used on non-mammalian genomes and includes many new user control settings, such as (a) defining the program's behavior in low mappability regions (http://bioinfo.curie.fr/projects/freec/tutorial.html); (b) choosing the minimal number of consecutive windows required to call a CNA.

(ii) We characterize the allelic content via the BAF introduced previously for SNP arrays (Popova et al., 2009 (link)). We limit the list of genomic positions that we consider to evaluate allelic content to known SNPs only (Sherry et al., 2001 (link)). By the B allele, we mean the alternative variant in SNP database (dbSNP). SNPs that are homozygous in the genome being considered give no information about allelic content (in SNP arrays they are denoted as non-informative); therefore putatively homozygous positions are discarded. A position is discarded if the probability of having variation due to sequencing errors under the condition of actual homozygosity is greater than a specified threshold (Supplementary Materials).

We calculate the total coverage and B-allele coverage for each known putatively heterozygous SNP position. For each window i, we calculate the median of the BAF values: Med_j = median(abs(x_ij−0.5)), where {x_ij} are BAF values of the remaining SNP positions. We segment {Med_j} using the same lasso-based algorithm as used for copy numbers (Harchaoui and Lévy-Leduc, 2008 ).

(iii) We predict genotype status for each genomic segment independently, by choosing the allelic content that corresponds to the maximal log-likelihood, given the copy number detected previously.

First, we combine breakpoints issued from both copy number and median BAF segmentations to get genomic segments with presumably one status. Second, copy number status of each segment is detected as described previously (Boeva et al., 2010). If the CNA is present in most of the cells, there is no ambiguity in determining exact copy number of the region (see Supplementary Materials for more details on the strategy in the case of presence of subclones or normal contamination). Third, given the copy number of the region, we fit Gaussian mixture models (GMMs) with fixed means to the observed BAF values and select the model that provides the highest log-likelihood. For example, for a region with a copy number of two, we fit a two component model (mixture of ‘AA’ and ‘BB’ alleles) and a three component model (‘AA’, ‘AB’ and ‘BB’, with a condition on the minimal weight of ‘AB’). The component means in the GMM depend on the level of contamination by normal DNA (Supplementary Materials).
Input and output: the input consists of a SAM pileup (http://samtools.sourceforge.net/pileup.shtml) and a dbSNP file. The control dataset is optional if a reference genome is provided. The output contains a list of CNAs and LOH regions as well as read count, copy number, BAF and genotype information for each window. If a control (matched normal) dataset is available, each event is annotated as somatic or germline.

Partial Protocol Preview
This section provides a glimpse into the protocol.
The remaining content is hidden due to licensing restrictions, but the full text is available at the following link: Access Free Full Text.

Boeva V., Popova T., Bleakley K., Chiche P., Cappo J., Schleiermacher G., Janoueix-Lerosey I., Delattre O, & Barillot E. (2011). Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics, 28(3), 423-425.

Publication 2011

Alleles Cells Genome Genome profile Genotype Germline Heterozygous Homozygosity Mammalian Sequencing variation Somatic Tumor

Corresponding Organization :

Other organizations : Institut Curie, Institut national de recherche en informatique et en automatique, Inserm, École Nationale Supérieure des Mines de Paris

Top 5 similar protocols

Protocol cited in 327 other protocols

Variable analysis

independent variables

GC-content and mappability profiles to normalize read count if a control sample is unavailable
Defining the program's behavior in low mappability regions
Choosing the minimal number of consecutive windows required to call a CNA

dependent variables

Copy number profiles
Smoothed BAF profiles
Predicted final genotype status (copy number and allelic content for each segment)

control variables

Proper characterization of overdiploid genomes
Correction for possible contamination by normal cells when constructing the copy number profile of a tumor genome

positive controls

Reference genome (if provided)

negative controls

Control (matched normal) dataset (if available)

Annotations

Based on most similar protocols

Etiam vel ipsum. Morbi facilisis vestibulum nisl. Praesent cursus laoreet felis. Integer adipiscing pretium orci. Nulla facilisi. Quisque posuere bibendum purus. Nulla quam mauris, cursus eget, convallis ac, molestie non, enim. Aliquam congue. Quisque sagittis nonummy sapien. Proin molestie sem vitae urna. Maecenas lorem.

As authors may omit details in methods from publication, our AI will look for missing critical information across the 5 most similar protocols.

About PubCompare

Our mission is to provide scientists with the largest repository of trustworthy protocols and intelligent analytical tools, thereby offering them extensive information to design robust protocols aimed at minimizing the risk of failures.

We believe that the most crucial aspect is to grant scientists access to a wide range of reliable sources and new useful tools that surpass human capabilities.

However, we trust in allowing scientists to determine how to construct their own protocols based on this information, as they are the experts in their field.

Ready to get started?

Revolutionizing how scientists
search and build protocols!