The largest database of trusted experimental protocols

20 protocols using bioinformatics toolbox

1

Predicting TF Binding Site Types

We trained a random forest (RF) classifier to predict whether a TF binding site is a direct or an indirect site from the proximal binding of other TFs in the co-binding region. We used the TreeBagger implementation of RF in MATLAB (MATLAB and Bioinformatics Toolbox Release 2015b, The MathWorks, Inc., Natick, Massachusetts, United States). More specifically, from the region-TF matrix (159,204 × 167), we took the rows that contained either direct or indirect binding sites of the TF, used the columns corresponding to the direct or indirect binding of the TF as the prediction target, and used the remaining columns (binding of the other TFs) as the features. For each sequence-specific co-binding TF, the dTF and iTF columns were combined, ignoring the motif information. The prediction is therefore harder because it is based only on the identity, not the motif information, of the co-binding TFs. For each sequence-specific TF, we trained five RFs, each on a distinct random subset (80%) of the data, and tested each on the remaining 20%. The prediction accuracy values of the five classifiers were then averaged for each TF. The correlation between direct and indirect binding of a TF (shown in Fig. 5c) was computed using the corresponding TF vectors in the module-TF matrix.
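The evaluation scheme described above (five random 80/20 splits, accuracies averaged) can be sketched in Python roughly as follows. This is only an illustration of the splitting and averaging logic, not the authors' MATLAB code: the function names are hypothetical, the toy data are invented, and `majority_baseline` is a stand-in for a real classifier such as a random forest.

```python
import random


def repeated_holdout_accuracy(rows, labels, fit_predict,
                              n_repeats=5, train_frac=0.8, seed=0):
    """Average test accuracy over n_repeats random train/test splits."""
    rng = random.Random(seed)
    n = len(rows)
    n_train = int(round(train_frac * n))
    accuracies = []
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random split for each repeat
        train_idx, test_idx = order[:n_train], order[n_train:]
        predictions = fit_predict(
            [rows[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [rows[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def majority_baseline(train_rows, train_labels, test_rows):
    """Stand-in classifier (assumption): always predicts the most common label."""
    most_common = max(set(train_labels), key=train_labels.count)
    return [most_common] * len(test_rows)


# Toy data (invented): 10 regions x 3 co-binding TFs; 1 = direct, 0 = indirect.
rows = [[i % 2, (i // 2) % 2, 1] for i in range(10)]
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
mean_accuracy = repeated_holdout_accuracy(rows, labels, majority_baseline)
```

Any function with the same `fit_predict(train_rows, train_labels, test_rows)` shape could be passed in place of the baseline.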
2

Diurnal Gene Expression Analysis of Three Plants

The diurnal expression data with 4-h intervals for Arabidopsis thaliana were obtained from Mockler et al. (ref. 75) and adjusted to a 2-h interval time series by interpolation using the SRS1 cubic spline function (http://www.srs1software.com/). The diurnal expression data with 2-h intervals for K. fedtschenkoi were generated in this study. The diurnal expression data with 2-h intervals for Ananas comosus were obtained from Ming et al. (ref. 4). The gene expression data were normalized by Z-score transformation. Hierarchical clustering of gene expression was performed for the genes in each ortholog group using the Bioinformatics Toolbox in Matlab (Mathworks, Inc.) based on Spearman correlation (Supplementary Method 14).
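As a rough illustration of the normalization and similarity measure mentioned above, the following Python sketch computes a Z-score transformation and a Spearman-correlation distance (one minus the correlation of the ranks). The function names are hypothetical, the use of the sample standard deviation (n − 1) is an assumption, and the clustering step itself is not shown.

```python
import math


def zscore(values):
    """Z-score using the sample standard deviation (n - 1); an assumption."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; inputs must not be constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_distance(x, y):
    """Distance between two expression profiles: 1 - Spearman correlation."""
    return 1.0 - pearson(ranks(x), ranks(y))
```

Two profiles that rise and fall at the same time points have a distance near 0, whereas exactly opposite profiles have a distance near 2.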
3

Quantitative Phosphoproteomics Analysis

To identify phosphopeptides with significant changes in abundance relative to the 0 s LightR-Src condition, we used a paired Student’s t-test (p < 0.05). Additionally, to ensure that phosphopeptide changes were not simply background, we filtered for peptides with average abundances (n = 3) that were >1.4 relative to the 0 s LightR-Src condition. The average abundance for each phosphopeptide was log2-transformed and visualized using MATLAB (version R2019b, Bioinformatics Toolbox version 4.13, MathWorks). Data were plotted using the ‘clustergram’ function with hierarchical clustering based on Euclidean distance.
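The two filters above can be illustrated with a small Python sketch. It assumes three paired replicates (so the two-sided 5% critical value of Student's t with 2 degrees of freedom, about 4.303, stands in for a p-value computation) and interprets ">1.4" as the ratio of mean abundances; both are assumptions, and all names and numbers are hypothetical.

```python
import math

# Two-sided 5% critical value of Student's t with 2 degrees of freedom (n = 3).
T_CRIT_DF2 = 4.303


def paired_t_statistic(treated, baseline):
    """t statistic of the paired differences treated - baseline."""
    diffs = [t - b for t, b in zip(treated, baseline)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    if sd == 0:
        return math.inf if mean != 0 else 0.0
    return mean / (sd / math.sqrt(n))


def mean_ratio(treated, baseline):
    """Average treated abundance relative to the average baseline abundance."""
    return (sum(treated) / len(treated)) / (sum(baseline) / len(baseline))


def is_changed(treated, baseline, min_ratio=1.4):
    """Keep a peptide only if it passes both the t-test and the ratio filter."""
    significant = abs(paired_t_statistic(treated, baseline)) > T_CRIT_DF2
    return significant and mean_ratio(treated, baseline) > min_ratio


# Invented abundances for one phosphopeptide, three paired replicates.
baseline_0s = [1.0, 1.0, 1.0]
stimulated = [2.0, 2.2, 2.1]
log2_ratio = math.log2(mean_ratio(stimulated, baseline_0s))
```

With other replicate numbers the critical value changes, so a proper t-distribution routine would be needed in place of the constant.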
4

Downstream Cytokine/Chemokine/Growth Factor Analysis

Downstream analysis of the data exported from xPONENT was performed using MATLAB 2020a with the Bioinformatics Toolbox (MathWorks). Sample technical replicates were averaged, and the lowest dilution was generally used for all cytokines/chemokines/growth factors (C/C/GF). When the measured value exceeded the top of the standard curve, the lower 50-fold dilution value was used instead. No 50-fold dilution values exceeded the top standard value. When the measured value fell below the lowest standard, the concentration was linearly interpolated between the blank and the lowest standard MFI values, provided that the value was 4 standard deviations above background wells [27, 144, 145]. If all samples contained values below the lowest standard, the cytokine was excluded from further analysis. For heatmap visualization and principal component analysis, C/C/GFs were z-scored across samples. Hierarchical clustering was done using Euclidean distance. For fold change samples, non-stimulated C/C/GFs with very low expression (>50% of samples below the lowest standard or any sample measuring 0 concentration) were removed from the analysis. For principal component plots, error ellipses were drawn using the error ellipse function (AJ Johnson (2020). error_ellipse, MATLAB Central File Exchange).
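The handling of values below the lowest standard might look like the following Python sketch. It assumes that the blank corresponds to a concentration of 0 and that "4 standard deviations above background" is measured from the mean blank MFI; the function name, parameters, and example numbers are hypothetical.

```python
def concentration_below_lowest_standard(mfi, blank_mfi, lowest_std_mfi,
                                        lowest_std_conc, background_sd,
                                        min_sd_above=4.0):
    """Linearly interpolate between the blank (taken as 0) and the lowest standard.

    Returns None when the signal is less than min_sd_above standard
    deviations above the blank, i.e. not distinguishable from background.
    """
    if mfi < blank_mfi + min_sd_above * background_sd:
        return None
    fraction = (mfi - blank_mfi) / (lowest_std_mfi - blank_mfi)
    return fraction * lowest_std_conc


# Invented example: blank MFI 20 (SD 5), lowest standard MFI 100 at 3.2 pg/mL.
estimate = concentration_below_lowest_standard(60, 20, 100, 3.2, 5)
too_low = concentration_below_lowest_standard(30, 20, 100, 3.2, 5)
```

Here a reading of 60 lies halfway between the blank and the lowest standard and yields 1.6, while a reading of 30 is within 4 standard deviations of the blank and is rejected.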
5

Identification and Analysis of Positively Selected Orthologs

The orthologous gene pairs between two species were identified by combining the Best Reciprocal Hits (BRH) and OrthoMCL strategies. The coding sequences were aligned using PAL2NAL [58], guided by the protein sequence alignment generated by MAFFT (linsi; version 7.045b) [59], and gaps in the alignment were removed. The gapless coding sequence alignments were used for Ka/Ks ratio calculation using the Bioinformatics Toolbox in Matlab (Mathworks, Inc.) with a 50-codon sliding window. For identifying positively selected sites, coding sequences from Arabidopsis, maize, rice and Agave were aligned by TranslatorX [60] using the standalone script. The HyPhy package was used to identify positively selected sites as described [61], and the tests of the FUBAR and REL models as implemented in the Datamonkey webserver were used with default settings [62]. Since we used a sliding window to study the protein regions under positive selection, we tested Ka/Ks-positive regions against the null hypothesis that Ka/Ks equals one using a one-sided t-test, as described by Schmid and Yang (2008) [63].
6

Phylogenetic Tree Construction from MSAs

The trees were constructed by using subsamples of 4,000 sequences from the original MSAs. The Jukes–Cantor pairwise distance was calculated for the subsamples; this is defined as a maximum-likelihood estimate of the number of substitutions based on the Hamming distance between two sequences. The phylogenetic-tree construction was done by using the neighbor-joining method, assuming equal variance and independence of evolutionary distance estimates, as in refs. 81 and 82. No ancestral sequences were reconstructed. Both the Jukes–Cantor distance and the neighbor-joining implementations are provided in the Bioinformatics Toolbox in MATLAB (MathWorks, Inc.).
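For illustration, the Jukes–Cantor correction of a Hamming distance can be written in a few lines of Python. This is a sketch of the textbook formula, not the toolbox implementation: for an alphabet of k symbols and a fraction p of differing positions, d = −((k − 1)/k) · ln(1 − p · k/(k − 1)). The function names, the default alphabet size of 4 (nucleotides; 20 would apply to amino acids), and the lack of gap handling are assumptions.

```python
import math


def hamming_fraction(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor_distance(a, b, alphabet_size=4):
    """Jukes-Cantor estimate of substitutions per site from the Hamming fraction."""
    p = hamming_fraction(a, b)
    saturation = (alphabet_size - 1) / alphabet_size
    if p >= saturation:
        return math.inf  # sequences are too diverged for the correction
    return -saturation * math.log(1 - p / saturation)


# One mismatch in four positions: p = 0.25, corrected distance about 0.304.
example_distance = jukes_cantor_distance("ACGT", "ACGA")
```

A matrix of such pairwise distances is the input that a neighbor-joining routine would then use to build the tree.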
7

Identifying High-Affinity Egr-1 Binding Sites

Analysis of high-affinity sites for Egr-1 in the human genome was conducted using MATLAB software together with the Bioinformatics Toolbox (MathWorks, Inc.; Natick, MA). First, all possible 9-bp sequences were generated. For each of them, the difference in the binding free energy, ΔΔG, for Egr-1 was predicted from the ΔΔG data for single substitutions, and the number of base-pair matches with the Egr-1 recognition sequence was counted. High-affinity sequences for Egr-1 were identified as those that exhibit ΔΔG < 1.3 kcal/mol and ≥ 6 base-pair matches with the Egr-1 recognition sequence. The total number of each of these high-affinity Egr-1-binding sequences in the human genome was counted using the GenBank GRCh38.p7 assembly.
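The enumeration and filtering steps above can be sketched in Python as follows. The consensus sequence `GCGTGGGCG` and the uniform 0.5 kcal/mol penalty per mismatch are placeholders assumed for this illustration; the actual analysis used measured ΔΔG values for single substitutions, which are not reproduced here. The sketch also assumes that single-substitution ΔΔG values add up, and it does not cover the genome-wide counting step.

```python
from itertools import product

CONSENSUS = "GCGTGGGCG"  # assumed Egr-1 recognition sequence (placeholder)


def make_placeholder_ddg_table(penalty=0.5):
    """Hypothetical per-position ddG table: 0 for the consensus base, else penalty."""
    return [{base: (0.0 if base == ref else penalty) for base in "ACGT"}
            for ref in CONSENSUS]


def high_affinity_sites(ddg_table, max_ddg=1.3, min_matches=6):
    """Enumerate all 9-bp sequences and keep those passing both thresholds."""
    sites = []
    for bases in product("ACGT", repeat=len(CONSENSUS)):
        # Additive model: sum the single-substitution contributions.
        ddg = sum(ddg_table[i][b] for i, b in enumerate(bases))
        matches = sum(b == ref for b, ref in zip(bases, CONSENSUS))
        if ddg < max_ddg and matches >= min_matches:
            sites.append("".join(bases))
    return sites


sites = high_affinity_sites(make_placeholder_ddg_table())
```

With the placeholder table, only sequences with at most two mismatches pass the ΔΔG threshold; a measured table would give position- and base-specific results.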
8

Automated Metabolite Analysis in MATLAB

First, the raw data obtained through the Bruker LC/Q-TOF were converted to NetCDF (or mzXML) data files with the Compass DataAnalysis software 5.2 (Bruker, Germany) and imported into the MATLAB R2020a computing and visualization environment. The import could be performed either using the MSroi app in MATLAB, as described by Pérez-Cova et al. [21], or directly with the mass spectrometry functions of the Bioinformatics Toolbox (The Mathworks, Inc., 2020b).
The data from the amino acids’ standard were imported using the MSroi GUI app as a single chromatographic run (single sample option) while the data from the 7 replicate fish embryo samples were imported as multiple chromatographic runs (using the multi-sample option) arranged as a column-wise augmented data matrix (see Fig. 1).
9

Next-Generation Sequencing of PCR Product

The PCR product was subjected to Next-Generation Sequencing by Eurofins Genomics. Sequences were generated with a MiSeq system using a 2 × 250 paired-end module. A single run was conducted, which yielded 20.95 million reads and 5.24 gigabase pairs of data. The percentage of reads with a Q score above 30 was 76.19%, and the mean Q score was 30.59. The paired-end reads were combined using the FLASH program [24]. The result was a set of 8,573,790 individual nucleotide sequences (about 82% yield). Sequences were translated and converted to FASTA format using Matlab with the Bioinformatics Toolbox (Mathworks). Searches within this database were carried out using BLAST (National Library of Medicine).
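A minimal Python sketch of the translation and FASTA conversion step is shown below. It assumes the standard genetic code, translation in the first reading frame only, and `X` for codons containing ambiguous bases; the function names and the example read are hypothetical and do not reflect the original Matlab script.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate reading frame 1; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


fasta_text = to_fasta([("read1", translate("ATGGCCTAA"))])
```

The resulting text can be written to a file and indexed as a protein database for BLAST searches.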
10

Comparative Analysis of Diurnal Gene Expression in Arabidopsis and Agave

The Arabidopsis-Agave orthologous gene pairs were identified by combining the OrthoMCL strategy and reciprocal best hits (RBH) based on BLASTp with an E-value cutoff of 1e-5. The diurnal expression data for Arabidopsis thaliana were obtained from Mockler et al. (2007) [9]. Both Arabidopsis and Agave plants were grown under a photoperiod of 12 h light:12 h dark. The Arabidopsis expression data were collected at 0, 4, 8, 12, 16, 20, and 24 h, whereas the Agave data were collected at 0, 3, 6, 9, 12, 15, 18, and 21 h after the start of the light period [14]. The cubic interpolation algorithm implemented in Matlab (Mathworks, Inc.) was used to estimate the gene expression levels at additional time points, so that both time-course data sets consisted of the same time points: 0, 3, 4, 6, 8, 9, 12, 15, 16, 18, 20, and 21 h after the start of the light period. The gene expression data were normalized by Z-score transformation. Hierarchical clustering of gene expression was performed using the Bioinformatics Toolbox in Matlab (Mathworks, Inc.).
