Consensus Sequence
A consensus sequence is a useful tool for identifying conserved regions among related sequences and can provide insights into the functional and structural properties of a biomolecule.
Consensus sequences are commonly used in bioinformatics and molecular biology research to help characterize protein families, predict secondary structures, and design primers and probes for genetic analysis.
By summarizing the key features of a sequence family, the consensus sequence offers a concise representation that can inform experimental design and data interpretation.
Most cited protocols related to «Consensus Sequence»
The quality of error-corrected reads was evaluated by aligning them to the reference genome using GraphMap (Sović et al. 2016b) with the settings “-a anchorgotoh” and counting the match, mismatch, insertion, and deletion operations in the resulting alignments.
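The counting step described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the alignments carry extended CIGAR strings in which "=" marks a match and "X" a mismatch (with a plain "M" operator, matches and mismatches cannot be separated without further information). The function names and the example CIGAR strings are hypothetical.

```python
import re
from collections import Counter

# One CIGAR element: a run length followed by an operator character.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_alignment_ops(cigar: str) -> Counter:
    """Sum the run lengths of each operator in one CIGAR string."""
    counts = Counter()
    for length, op in _CIGAR_OP.findall(cigar):
        counts[op] += int(length)
    return counts


def summarize(cigars):
    """Aggregate match/mismatch/insertion/deletion totals over many alignments."""
    total = Counter()
    for cigar in cigars:
        total += count_alignment_ops(cigar)
    return {
        "match": total["="],      # assumes extended CIGAR ('=' and 'X')
        "mismatch": total["X"],
        "insertion": total["I"],
        "deletion": total["D"],
    }


# Hypothetical CIGAR strings, for illustration only.
example = summarize(["5=1X3=2I4=", "10=1D2="])
print(example)
```

Dividing each total by the sum of all four gives per-operation rates, which is one common way to compare read sets of different sizes.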
Over the last two years, we have focused on two broad areas: completing the first pass manual annotation across the entire mouse reference genome and a dedicated effort to improve the annotation of protein-coding genes in human and mouse.
We have completed the annotation of novel protein-coding genes, lncRNAs and pseudogenes, plus QC and updating previous annotation where necessary for mouse chromosomes 9, 10, 11, 12, 13, 14, 15, 16 and 17. These updates bring the fraction of the mouse genome with completed first pass manual annotation to approximately 97%. In addition, we have continued to work with the NCBI and Mouse Genome Informatics project at the Jackson Laboratory to resolve annotation differences for protein-coding, pseudogene and lncRNA loci. For protein-coding genes this is under the umbrella of the Consensus Coding Sequence (CCDS) project (16).
We have also manually investigated unannotated regions of high protein-coding potential identified by whole genome analysis using PhyloCSF (17) (a tool described in more detail below). In human, this led to the addition of 144 novel protein-coding genes and 271 pseudogenes (of which 42 were unitary pseudogenes). In mouse, we annotated orthologous loci for all but 11 of the 144 human protein-coding genes. We have also revisited the annotation of all olfactory receptor loci in both human and mouse, using RNAseq data to define 5′ and 3′ UTR sequences for ∼1400 loci. In human we have also targeted a ‘deep dive’ manual reannotation of genes on clinical panels for paediatric neurological disorders to identify missing functional alternative splicing. Incorporating second and third generation transcriptomic data, we reannotated ∼190 genes and added more than 3600 alternatively spliced transcripts, including ∼1400 entirely novel exons and an additional ∼30kb of CDS. We have also completed an effort to capture all recently described unannotated microexons (18) into GENCODE, and further added an additional 146 novel microexons mined from public SLRseq data (19).
As part of the CCDS collaboration with RefSeq, we have checked a large subset of human loci where there was disagreement over gene biotype. Similarly, we have checked all UniProt manually annotated and reviewed (i.e. Swiss-Prot) accessions that lack an equivalent in GENCODE. As a result, we added 32 novel protein-coding loci to GENCODE and rejected more than 200 putative coding loci. Finally, we are manually reviewing genes previously annotated as protein-coding, but with weak or no support based on a method incorporating UniProt, APPRIS, PhyloCSF, Ensembl comparative genomics, RNA-seq, mass spectrometry and variation data (20,21). Of the 821 loci investigated to date, 54 have had their coding status removed while a further 110 potentially dubious cases remain under review.
The approach taken is reflected in the kinds of updates captured in the annotation. For example, the targeted reannotation in human leads to the annotation of few novel protein-coding loci but many novel transcripts at updated protein-coding and lncRNA loci. Conversely, in mouse the emphasis on clone-by-clone annotation identifies many more novel loci and transcripts across a broader range of biotypes (Figure
The free parameters in the discrete PSMC-HMM model are the scaled mutation rate, recombination rate and piecewise constant population sizes. The time interval each size parameter spans was manually chosen. The expectation-maximization iteration started from a constant-sized population history. The expectation step was done analytically; Powell’s direction set method was used for the maximization step. Parameter values stabilized by the 20th iteration, and these were taken as the final estimate. All parameters are scaled to a constant that is further determined under the assumption of a neutral mutation rate of 2.5×10⁻⁸.
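The final scaling step can be illustrated with a short sketch. The excerpt does not spell out the conversion, so the relations used here are assumptions based on the usual PSMC convention: the scaling constant is N0 = theta0 / (4 * mu * s), where s is the bin size in base pairs, relative sizes are multiplied by N0, and scaled times are multiplied by 2 * N0 to give generations. The function name, the bin size of 100 and all input values are hypothetical.

```python
def scale_psmc(theta0, lambdas, times, mu=2.5e-8, bin_size=100):
    """Convert scaled PSMC-style estimates to absolute units.

    Assumed conventions (not stated in the excerpt):
      theta0   : scaled mutation rate per bin, theta0 = 4 * N0 * mu * bin_size
      lambdas  : population sizes relative to N0
      times    : interval boundaries in units of 2 * N0 generations
      mu       : neutral mutation rate per site per generation
    """
    n0 = theta0 / (4.0 * mu) / bin_size
    sizes = [n0 * lam for lam in lambdas]          # effective population sizes
    generations = [2.0 * n0 * t for t in times]    # time in generations
    return n0, sizes, generations


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
n0, sizes, generations = scale_psmc(
    theta0=0.1, lambdas=[1.0, 2.0, 0.5], times=[0.0, 0.5, 1.0]
)
print(n0, sizes, generations)
```

Multiplying the times in generations by an assumed generation length would give years; that extra parameter is left out here because the excerpt does not mention it.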
Most recent protocols related to «Consensus Sequence»
Example 6
Ceres cDNA 12723147 encodes an Arabidopsis putative aldo/keto reductase. Ectopic expression of Ceres cDNA 12723147 under the control of the CaMV35S promoter induces the following phenotypes:
- Germination on high concentrations of polyethylene glycol (PEG), mannitol and abscisic acid (ABA).
- Continued growth on high concentrations of PEG, mannitol and ABA.
Generation and Phenotypic Evaluation of T1 Lines Containing 35S::cDNA 12723147.
Wild-type Arabidopsis Wassilewskija (WS) plants were transformed with a Ti plasmid containing cDNA 12723147 in the sense orientation relative to the CaMV35S constitutive promoter. The Ti plasmid vector used for this construct, CRS338, contains PAT and confers herbicide resistance to transformed plants. Ten independently transformed events were selected and evaluated for their qualitative phenotype in the T1 generation. No positive or negative phenotypes were observed in the T1 plants.
Screens of Superpools on High PEG, Mannitol, and ABA as Surrogate Screens for Drought Tolerance.
Seeds from 13 superpools (1,200 T2 seeds from each superpool) from the CaMV35S or 32449 over-expression lines were tested on 3 drought surrogate screens (high concentrations of PEG, mannitol, and ABA) as described above. T3 seeds were collected from the resistant plants and analyzed for resistance on all three surrogate drought screens.
Once cDNA 12723147 was identified in resistant plants from each of the three surrogate drought screens, the five individual T2 events containing this cDNA (SR01013) were screened on high PEG, mannitol, and ABA to identify events with the resistance phenotype.
Superpools (SP) are referred to as SP1, SP2 and so on. The letter following the hyphen refers to the screen (P=PEG, M=mannitol, and A=ABA) and the number following the letter refers to a number assigned to each plant obtained from that screen on that superpool. For example, SP1-M18 is the 18th plant isolated from a mannitol screen of Superpool 1.
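The naming convention above is regular enough to parse mechanically. The sketch below is only an illustration of that convention; the function and class names are hypothetical, and it assumes labels always follow the pattern SP<number>-<letter><number> with no extra characters.

```python
import re
from typing import NamedTuple

# Screen letters as defined in the text: P=PEG, M=mannitol, A=ABA.
SCREENS = {"P": "PEG", "M": "mannitol", "A": "ABA"}

_LABEL = re.compile(r"^SP(\d+)-([PMA])(\d+)$")


class SuperpoolPlant(NamedTuple):
    superpool: int   # superpool number (SP1, SP2, ...)
    screen: str      # screen the plant was isolated from
    plant: int       # plant number within that screen and superpool


def parse_label(label: str) -> SuperpoolPlant:
    """Split a label such as 'SP1-M18' into its three components."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a superpool plant label: {label!r}")
    return SuperpoolPlant(
        int(match.group(1)), SCREENS[match.group(2)], int(match.group(3))
    )


print(parse_label("SP1-M18"))
```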
Qualitative and Quantitative Analysis of 2 Independent Events Representing 35S::cDNA 12659859 (SR01010) on PEG, Mannitol and ABA
To identify two independent events of 35S::cDNA 12659859 showing PEG, mannitol, and ABA resistance, 36 seedlings from each of two events, SR01013-01 and -02, were screened as previously described. BastaR segregation was assessed to verify that the lines contained a single insert segregating in a 3:1 (R:S) ratio as calculated by a chi-square test (Table 6-1). Both lines (01 and 02) segregated for a single insert in the T2 generation (Table 1).
Lines SR01013-01 and -02 were chosen as the two events because they had a strong and consistent resistance to PEG, mannitol and ABA. The controls were sown the same day and in the same plate as the individual lines. The PEG (Tables 6-2 and 6-3), mannitol (Tables 6-4 and 6-5) and ABA (Tables 6-6 and 6-7) segregation ratios observed for SR01013-01 and -02 are consistent with the presence of a single insert as demonstrated by chi-square, similar to what we observed for BastaR resistance (Table 6-1).
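A chi-square goodness-of-fit test against a 3:1 (resistant:sensitive) ratio can be sketched as follows. The excerpt does not give the counts or the exact procedure used, so the seedling counts below are hypothetical and the 5% significance threshold with one degree of freedom (critical value 3.841) is an assumption.

```python
# Critical value of the chi-square distribution at alpha = 0.05, 1 degree of freedom.
CRITICAL_0_05_DF1 = 3.841


def chi_square_3_to_1(resistant: int, sensitive: int) -> float:
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = resistant + sensitive
    if total <= 0:
        raise ValueError("need at least one seedling")
    expected_r = 0.75 * total
    expected_s = 0.25 * total
    return ((resistant - expected_r) ** 2 / expected_r
            + (sensitive - expected_s) ** 2 / expected_s)


def consistent_with_single_insert(resistant: int, sensitive: int) -> bool:
    """True when a 3:1 ratio is not rejected at the assumed 5% level."""
    return chi_square_3_to_1(resistant, sensitive) < CRITICAL_0_05_DF1


# Hypothetical outcomes for 36 seedlings (not the values from the tables).
print(chi_square_3_to_1(30, 6), consistent_with_single_insert(30, 6))
print(chi_square_3_to_1(36, 0), consistent_with_single_insert(36, 0))
```

With 36 seedlings the expected counts are 27 resistant and 9 sensitive, so a 30:6 split stays well below the threshold while a 36:0 split does not.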
The progeny from one resistant T2 plant from each of these two events were tested in the same manner as the T2. Resistance to PEG, mannitol and ABA was also observed in the T3 generation. Taken together, the segregation of resistant seedlings containing cDNA 12723147 from two events on all three drought surrogate screens and the inheritance of this resistance in a subsequent generation provide strong evidence that cDNA 12723147, when over-expressed, can provide tolerance to drought.
More about "Consensus Sequence"
Also known as a 'representative sequence' or 'majority sequence,' a consensus sequence summarizes the most commonly occurring nucleotides or amino acids at each position in a multiple sequence alignment.
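As a minimal sketch of that definition, the function below takes the most frequent symbol in each column of an alignment. It assumes the sequences are already aligned to equal length, counts gap characters like any other symbol, and breaks ties alphabetically; the tie rule and the example sequences are illustrative choices, not a standard.

```python
from collections import Counter


def majority_consensus(aligned):
    """Return the most common symbol at each column of an alignment."""
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned):          # iterate over alignment columns
        counts = Counter(column)
        top = max(counts.values())
        # Ties are broken alphabetically (an arbitrary, illustrative rule).
        winners = sorted(sym for sym, n in counts.items() if n == top)
        consensus.append(winners[0])
    return "".join(consensus)


# Hypothetical aligned nucleotide sequences.
alignment = ["ACGTA", "ACGTT", "ACCTA", "TCGTA"]
print(majority_consensus(alignment))
```

Real tools often go further, for example by emitting ambiguity codes or requiring a minimum frequency before calling a position.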
This concise representation can provide valuable insights into the functional and structural properties of a biomolecule, informing experimental design and data interpretation.
Consensus sequences are widely utilized in molecular biology and genetic analysis workflows.
They are commonly employed to help characterize protein families, predict secondary structures, and design primers and probes for applications like PCR amplification and DNA sequencing.
Reads from popular sequencing platforms like the MiSeq and HiSeq 2500 are often combined into consensus sequences to reduce the effect of errors in individual reads.
Bioinformatics tools like the CLC Genomics Workbench can be used to generate and analyze consensus sequences, while molecular biology kits such as the BigDye Terminator v3.1 Cycle Sequencing Kit, QIAquick PCR Purification Kit, and QIAamp Viral RNA Mini Kit facilitate the experimental steps needed to produce high-quality sequencing data.
The Dual-Luciferase Reporter Assay System and Lipofectamine 2000 transfection reagent may also be leveraged in consensus sequence-based research, such as for functional validation of predicted structural motifs.
By summarizing the key features of a sequence family, the consensus sequence offers a concise and informative representation that can guide researchers towards more effective experimental design and data interpretation.
Whether you're working with genes, transcripts, or proteins, consensus sequences can be a powerful tool in your bioinformatics and molecular biology workflows.