DNA Motifs

DNA motifs are short, recurring patterns within DNA sequences that play crucial roles in gene regulation, chromatin organization, and other biological processes.
These sequence elements, often just a few base pairs in length, can serve as binding sites for transcription factors, insulators, or other regulatory proteins, influencing gene expression and chromatin structure.
Analyzing and identifying DNA motifs is a key step in understanding the complex mechanisms underlying gene regulation and cellular function.
Researchers can leverage AI-powered platforms like PubCompare.ai to streamline DNA motif analysis, locate relevant protocols, and enhance the reproducibility and accuracy of their scientific studies.
With a user-friendly interface and the power of artificial intelligence, PubCompare.ai empowers researchers to experience the future of DNA motif analysis today.

Most cited protocols related to «DNA Motifs»

Protocol full text hidden due to copyright restrictions

Publication 2018
Cell Lines Cloning Vectors Diploid Cell DNA Methylation DNA Motifs Genes Genome, Human Glioma Histocompatibility Testing Hydatidiform Mole Induced Pluripotent Stem Cells Methylation Microarray Analysis Multiple Birth Offspring Pluripotent Stem Cells RNA, Messenger RNA-Seq S-pentachlorobuta-1,3-dien-yl-cysteine Stem Cells Transcription, Genetic
Our goals were to produce a resource that (i) contains a comprehensive collection of relevant motifs for each factor; (ii) avoids repetitive, weakly enriched motifs that do not contribute to the in vivo specificity of the factor or its partners; and (iii) excludes variants of the same motif, particularly among the discovered motifs. With this in mind, we conducted motif discovery separately on each data set using five motif discovery tools and manually placed all its data sets into ‘factor groups’ on the basis of known motifs and homology (Figure 2). Known motifs from the literature and the top 10 most enriched discovered motifs (excluding duplicates) were collected for each factor group (see Supplementary Methods) and named as TF_known# for known motifs and TF_disc# for discovered motifs, where TF denotes the factor group (e.g. FOXA, CTCF, etc.). Known motifs were ordered arbitrarily, whereas the discovered motifs were ordered in descending order of the enrichment value that was used for their selection.

Outline of motif discovery pipeline. Input regions for each data set are randomly partitioned into two groups. The top 250 regions of one of the partitions are scanned for motifs using five de novo motif discovery tools. These motifs are evaluated using the peaks from the other partition and pooled across data sets for a factor group to produce the final list of discovered motifs for each factor group.
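The partitioning step in this outline can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each region is a `(region_id, score)` pair and that "top" means highest peak score; the function name `split_regions` and the fixed seed are hypothetical.

```python
import random

def split_regions(regions, top_n=250, seed=0):
    """Randomly split regions into two halves.

    regions: list of (region_id, score) tuples (assumed input format).
    Returns (discovery, evaluation): the top_n highest-scoring regions of
    the first half, and the whole second half kept aside for evaluation.
    """
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(regions)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    part_a, part_b = shuffled[:half], shuffled[half:]
    # Rank the first partition by score (assumption: higher is stronger).
    discovery = sorted(part_a, key=lambda r: r[1], reverse=True)[:top_n]
    return discovery, part_b

# Example with synthetic regions.
regions = [(f"region_{i}", float(i)) for i in range(1000)]
discovery, evaluation = split_regions(regions)
```

Keeping the evaluation half disjoint from the discovery half means a motif's enrichment is measured on regions that the discovery tools never saw.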

The 427 ENCODE experiments analyzed correspond to 123 TFs, which we place into 84 factor groups (Figure 3a). We failed to discover an enriched motif for only 12 of the 84 factor groups, of which 9 lack DNA binding domains (BRF, CTBP2, HDAC8, KAT2A, NELFE, SUPT20H, SUZ12, WRNIP1 and XRCC4) as identified by UniProt (27), and 6 have all their data sets flagged as unreliable based on various quality metrics [BRF, KAT2A, NELFE, NR4A, SUPT20H and ZZZ3; see (A. Kundaje, L.Y. Jung, P.V. Kharchenko, B. Wold, A. Sidow, S. Batzoglou and P.J. Park, in preparation)]. Of these factor groups, only NR4A has a previously identified known motif.

(a) Summary of input data used. The outside ring indicates the experimental data sets (one tick for each of 427), which are separated into 123 transcription factors (second ring). The TFs are further grouped into 84 factor groups (third ring). We are able to find a matching discovered motif for 41 of the 56 factor groups with a known motif; 29 of these 41 factor groups have additional discovered motifs that may be associated with cofactors. For all but 1 of the 15 factor groups where the known motif is not recovered, we still find enriched discovered motifs. We also discovered enriched motifs for 17 of the 28 factor groups without a known motif. (b) Recovery of known motifs by each of the discovery tools. Performance of discovery is given as the number of factor groups for which the known motif was recovered. A motif is considered a match if it matches any of the known motifs for a factor group (see Supplementary Methods for details on how matches are computed). The number of additional factors that have a match is shown with each additional motif (only three motifs are taken from each individual method, whereas we have up to 10 for the pipeline). The number of factor groups with no motif match is shown in parentheses. When multiple data sets exist for a factor group, the fraction that matches is used as its contribution when computing the performance of the individual tools.

We exclude from the discussion below motifs that we consider unlikely to be relevant to our analysis, while maintaining them as part of the overall resource where they may be useful. These include 46 discovered motifs that are either low-complexity (e.g. dinucleotide repeats) or consistently have weak enrichment (<2) and do not match known motifs (Supplementary Table S1). These are likely a consequence of slight biases in the discovery pipeline, or are due to real, but relatively weak, specificity for the factor. We also exclude an additional 36 motifs that have a weak similarity to the known motif for the factor but for which a better matching and enriched motif is also found (Supplementary Table S2). These are most frequently seen for longer motifs that can be broken up into recognizable, but globally dissimilar, patterns that are not captured by our automatic exclusion criteria (see Supplementary Methods). Together, these represent 28% of the 293 discovered motifs.
Publication 2013
A-factor (Streptomyces) CTGF protein, human Debility Dinucleotide Repeats DNA Motifs factor A Factor IX Factor XII negative elongation factor E, human Ticks Transcription Factor XRCC4 protein, human
The motif affinity function, f(Sg, M, tm) (Eqn. 2), is used in MEA to assign a motif affinity score, Xg, to a DNA sequence, Sg. The score represents the affinity for the sequence of a DNA-binding molecule with binding motif M. The most commonly used motif affinity functions either count the number of "matches" to a motif in the DNA sequence or compute some function that represents the total binding of the TF or microRNA to the sequence. We study both of these types of affinity function and, in both cases, we represent the motif, M, by a log likelihood ratio PWM [11]. All motif PWMs were generated using a uniform background model in the denominator of the likelihood ratio.
When counting matches, we use FIMO [7], which scores each position in a sequence, Sg, (on both strands) using the PWM, M, and computes the p-value of each score. (The p-value is based on a zero-order Markov model of the input sequences.) The value of the affinity function, f(Sg, M, tm), is the number of positions in the sequence with a p-value less than or equal to tm, the motif score threshold. We refer to this motif affinity function as "MC" (for "match-count").
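A match-count affinity function of this kind can be sketched in a few lines. This is an illustration only and does not call FIMO: it assumes a uniform background for the p-value (FIMO as described above uses a zero-order model of the input sequences) and obtains exact p-values by scoring every possible word, which is practical only for short motifs. The names `consensus_freqs`, `llr_pwm` and `match_count` are hypothetical.

```python
import itertools
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def consensus_freqs(consensus, p=0.85):
    """Toy motif: the consensus base has frequency p, the others share 1 - p."""
    rest = (1 - p) / 3
    return [{b: (p if b == c else rest) for b in BASES} for c in consensus]

def llr_pwm(freqs, background=0.25):
    """Log2 likelihood-ratio PWM against a uniform background."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in freqs]

def score(pwm, word):
    return sum(col[b] for col, b in zip(pwm, word))

def match_count(seq, pwm, t_m):
    """Number of positions (both strands) whose score p-value is <= t_m."""
    w = len(pwm)
    # Null distribution: scores of all 4**w words, each equally likely
    # under the assumed uniform background.
    null = [score(pwm, "".join(k)) for k in itertools.product(BASES, repeat=w)]

    def p_value(s):
        # Fraction of words scoring at least s (small tolerance for floats).
        return sum(1 for x in null if x >= s - 1e-12) / len(null)

    reverse = seq.translate(COMPLEMENT)[::-1]
    count = 0
    for strand in (seq, reverse):
        for i in range(len(strand) - w + 1):
            if p_value(score(pwm, strand[i:i + w])) <= t_m:
                count += 1
    return count

pwm = llr_pwm(consensus_freqs("ACGT"))
```

Because ACGT is its own reverse complement, a single occurrence in the sequence is counted once per strand.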
For our other motif affinity function, which estimates the total binding of the TF or microRNA represented by the motif, we use the AMA algorithm [12] to compute the average motif affinity (AMA) score of the sequence, Sg, to the motif, M [13]. The AMA score is equal to the average likelihood ratio (not the log likelihood ratio) of the sequence (on both strands). We use a minor variant of the AMA score, which we call RMA (for relative motif affinity), when computing the linear regression association function (see below). To compute RMA, we divide the AMA score by the maximum possible AMA score of a single position in any sequence. This ensures that the range of the binding affinity function is [0, 1]. No motif match threshold (tm) is required when using AMA as the motif affinity function.
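The AMA and RMA scores described above can be sketched as follows. This is a minimal reading of the text, not the AMA program itself: it assumes a uniform background of 0.25 per base and a motif given as per-position base frequencies; the function names and the example motif are hypothetical.

```python
BACKGROUND = 0.25  # uniform background, as stated for the PWMs above
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def likelihood_ratio(freqs, word):
    """Product over positions of motif frequency / background frequency."""
    ratio = 1.0
    for column, base in zip(freqs, word):
        ratio *= column[base] / BACKGROUND
    return ratio

def ama(seq, freqs):
    """Average likelihood ratio over all positions on both strands."""
    w = len(freqs)
    if len(seq) < w:
        raise ValueError("sequence is shorter than the motif")
    reverse = seq.translate(COMPLEMENT)[::-1]
    ratios = [likelihood_ratio(freqs, strand[i:i + w])
              for strand in (seq, reverse)
              for i in range(len(strand) - w + 1)]
    return sum(ratios) / len(ratios)

def rma(seq, freqs):
    """AMA divided by the largest likelihood ratio any single position can reach."""
    max_ratio = 1.0
    for column in freqs:
        max_ratio *= max(column.values()) / BACKGROUND
    return ama(seq, freqs) / max_ratio

# Hypothetical 4-bp motif with consensus ACGT.
EXAMPLE_MOTIF = [
    {b: (0.85 if b == c else 0.05) for b in "ACGT"} for c in "ACGT"
]
```

Dividing by the best single-position ratio bounds the score at 1, which is reached only when every position on both strands is a perfect match.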
Publication 2010
DNA, A-Form DNA Motifs MicroRNAs
RcisTarget is a new R/Bioconductor implementation of the motif enrichment framework of i-cisTarget and iRegulon. RcisTarget identifies enriched transcription factor binding motifs and candidate transcription factors for a gene list. In brief, RcisTarget is based on two steps. First, it selects DNA motifs that are significantly over-represented in the surroundings of the transcription start site (TSS) of the genes in the gene-set. This is achieved by applying a recovery-based method on a database that contains genome-wide cross-species rankings for each motif. The motifs that are annotated to the corresponding TF and obtain a Normalized Enrichment Score (NES) > 3.0 are retained. Next, for each motif and gene-set, RcisTarget predicts candidate target genes (i.e. genes in the gene-set that are ranked above the leading edge). This method is based on the approach described by Aerts et al. (32), which is also implemented in i-cisTarget (web interface) (33) and iRegulon (Cytoscape plug-in) (34). Therefore, when using the same parameters and databases, RcisTarget provides the same results as i-cisTarget or iRegulon, benchmarked against other TFBS-enrichment tools in Janky et al. (34). More details about the method and its implementation in R are given in the package documentation.
To build the final regulons, we merge the predicted target genes of each TF-module that show enrichment of any motif of the given TF. To detect repression, it is theoretically possible to follow the same approach with the negative-correlated TF modules. However, in the datasets we analyzed, these modules were less numerous and showed very low motif enrichment, suggesting that these are lower quality modules. For this reason, we finally decided to exclude the detection of direct repression from the workflow, and continue only with the positive-correlated targets. The databases used for the analyses presented in this paper are the "18k motif collection" from iRegulon (gene-based motif rankings) for human and mouse. For each species, we used two gene-motif rankings (10 kb around the TSS or 500 bp upstream of the TSS), which determine the search space around the transcription start site.
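The recovery-based scoring can be illustrated with a simplified sketch. This is not the RcisTarget implementation: it assumes each motif comes with a gene ranking (1 = best), computes the area under the recovery curve over the top `max_rank` positions, and standardizes each motif's AUC against the AUCs of all motifs to obtain an NES. The function names, the toy rankings and the use of the sample standard deviation are assumptions.

```python
import statistics

def recovery_auc(ranks, gene_set, max_rank):
    """Normalized area under the recovery curve for one motif.

    ranks: dict mapping gene -> rank (1 = best) for this motif.
    A gene recovered at rank r raises the curve by 1 for positions r..max_rank.
    """
    hits = [ranks[g] for g in gene_set if g in ranks and ranks[g] <= max_rank]
    area = sum(max_rank - r + 1 for r in hits)
    return area / (max_rank * len(gene_set))

def normalized_enrichment_scores(rankings, gene_set, max_rank):
    """NES per motif: (AUC - mean AUC) / standard deviation of AUCs."""
    aucs = {m: recovery_auc(r, gene_set, max_rank) for m, r in rankings.items()}
    values = list(aucs.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # raises or divides by zero if AUCs are degenerate
    return {m: (a - mean) / sd for m, a in aucs.items()}

# Toy data: motif_A ranks the gene set first, the others rank it last.
genes = [f"g{i}" for i in range(10)]
rankings = {
    "motif_A": {g: i + 1 for i, g in enumerate(genes)},
    "motif_B": {g: i + 1 for i, g in enumerate(reversed(genes))},
    "motif_C": {g: i + 1 for i, g in enumerate(reversed(genes))},
}
gene_set = {"g0", "g1"}
nes = normalized_enrichment_scores(rankings, gene_set, max_rank=5)
enriched = [m for m, s in nes.items() if s > 3.0]  # threshold from the text
```

With only three toy motifs no NES can exceed 3.0, so `enriched` is empty here; the threshold becomes meaningful against a large motif collection.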
Publication 2017
DNA Motifs Gene Modules Genes Genome Homo sapiens Mus Regulon Repression, Psychology Transcription, Genetic Transcription Factor Transcription Initiation Site
The FIET [10] is an analytical computation of the Pearson χ² P value. In particular, this calculation is important when marginal frequencies are small, which is often the case in position frequency matrices. The marginal P value of the contingency table for DNA motifs (Table 3) follows the multiple hypergeometric distribution [24]:
$$P = \frac{\dbinom{N_X}{N_{XA},\, N_{XC},\, N_{XG},\, N_{XT}} \dbinom{N_Y}{N_{YA},\, N_{YC},\, N_{YG},\, N_{YT}}}{\dbinom{N}{N_A,\, N_C,\, N_G,\, N_T}}$$
The formula for protein motifs is similar. The two-sided P value for the table is the sum of probabilities of all tables that are at least as extreme. This P value is computed using the algorithm described by Mehta and Patel [25]. As with the χ² test, this P value is used as an additive score.
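As a worked illustration, the probability of a single 2 × 4 table can be computed directly. The sketch below assumes the formula above is a ratio of multinomial coefficients (two rows X and Y, four columns A, C, G, T); it covers only the single-table probability, not the Mehta and Patel network algorithm used for the two-sided P value. The function names are hypothetical.

```python
from math import factorial

def multinomial(counts):
    """Multinomial coefficient n! / (c1! * c2! * ...), with n = sum(counts)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)  # stays an exact integer at every step
    return result

def table_probability(x_counts, y_counts):
    """Probability of one 2 x 4 table under the multiple hypergeometric law.

    x_counts, y_counts: counts of A, C, G, T in the two motif columns
    being compared (rows X and Y of the contingency table).
    """
    column_totals = [x + y for x, y in zip(x_counts, y_counts)]
    return (multinomial(x_counts) * multinomial(y_counts)
            / multinomial(column_totals))

# Rows X = (2 A, 1 C) and Y = (1 A, 2 C): 3 * 3 / 20 = 0.45.
p = table_probability([2, 1, 0, 0], [1, 2, 0, 0])
```

When only two columns are non-zero, this reduces to the ordinary hypergeometric probability of a 2 × 2 table, which is a convenient sanity check.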
Publication 2007
DNA Motifs

Most recent protocols related to «DNA Motifs»

We obtained six cancer cell stemness scores from previous studies (25). Two were calculated from mRNA expression (RNA expression-based stemness scores [RNAss] and epigenetically regulated RNA expression-based stemness scores [EREG-EXPss]) and four from DNA methylation signatures (DNA methylation-based stemness scores [DNAss], epigenetically regulated DNA methylation-based stemness scores [EREG-METHss], differentially methylated probes-based stemness scores [DMPss], and enhancer elements/DNA methylation-based stemness scores [ENHss]). We then integrated the stemness scores with the gene expression data of the samples for correlation analysis.
The role of ITGA8 in regulating cancer cell stemness was evaluated with Gene Ontology (GO) (26) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (27) enrichment analysis of the genes that were highly correlated with ITGA8 by Pearson's correlation analysis (r > 0.7, p < 0.05), using the "clusterProfiler" package in R.
Publication 2023
Cells DNA Methylation DNA Motifs Enhancer Elements, Genetic EREG protein, human Gene Expression Genes Malignant Neoplasms Neoplasms RNA, Messenger Transcription, Genetic
The DNA methylation data and corresponding clinical data of TGCT patients were obtained from the Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/) database by using the R TCGAbiolinks package (14). All DNA methylation data were generated from the Illumina Infinium Human Methylation 450 platform, and the levels of DNA methylation were expressed as β values, calculated as M/(M + U + 100). M and U represent the signal from methylated beads and unmethylated beads at the target CpG sites, respectively. The methylomic data from patients with complete clinicopathological information were selected. The most recent clinicopathological and follow-up information was obtained from the TCGA database on 6 January 2023. Clinical information and methylation data of a total of 128 TGCT samples were downloaded and analyzed in this study, and the samples were randomly classified into a training cohort (89 samples) and a validation cohort (39 samples) at a ratio of 7:3. A prognostic DNA methylation signature was identified based on the training cohort data, and its predictive ability was evaluated on the validation cohort data. Progression-free survival was specified as the primary clinical endpoint, referring to the time period between the date of diagnosis and the date when a new event associated with the cancer (such as progression, local recurrence, distant metastases or death) occurred.
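The β value formula and the 7:3 split can be written out as a short sketch. It is illustrative only: the source does not say how the split was seeded or rounded, so the truncation used here (which happens to give 89 and 39 for 128 samples) and the function names are assumptions.

```python
import random

def beta_value(m, u, offset=100):
    """Methylation level M / (M + U + 100) from methylated/unmethylated signal."""
    return m / (m + u + offset)

def split_cohort(sample_ids, train_fraction=0.7, seed=0):
    """Randomly split samples into training and validation cohorts."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    ids = list(sample_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # 128 * 0.7 = 89.6, truncated to 89
    return ids[:n_train], ids[n_train:]

train, validation = split_cohort(range(128))
```

The constant 100 in the denominator keeps β stable when both signals are low, so a probe with little total intensity is pulled toward 0 instead of producing an erratic ratio.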
Publication 2023
Diagnosis Disease Progression DNA Methylation DNA Motifs Genome Homo sapiens Malignant Neoplasms Methylation Neoplasm Metastasis Patients Recurrence Testicular Germ Cell Tumor
We searched for the short tandem repeat (STR) signature of the employed cell product within isolated baboon DNA. Three SUR-sampled pieces of lung, liver, and spleen tissue per animal were obtained prior to isolation of genomic DNA using column-based techniques (Qiagen) and pooling of equal amounts of DNA per animal. The same procedure was performed for frozen arterial blood pellets obtained during, and 24 or 72 hours after, cell therapy and for the administered cell product. We then ran 100 ng genomic DNA per animal and sample using a highly sensitive, 17 loci-extended forensic STR analysis kit (AmpFLSTR NGM SElect) on an Applied Biosystems 3500 forensic genetic analyzer. Data were analyzed using the GeneMapper ID-X software package (all from Thermo Fisher). The STR pattern of the employed cell product, as well as baboon-specific STRs, were identified and separated from the signatures of other human DNA (laboratory, medical and veterinary staff) contaminating the samples.
Publication 2023
Animals Arteries BLOOD Cell Therapy DNA Motifs Freezing Genome Homo sapiens isolation Liver Lung Papio Pellets, Drug Short Tandem Repeat Spleen Tandem Repeat Sequences Tissues Veterinarian

Protocol full text hidden due to copyright restrictions

Publication 2023
Binding Sites Biopharmaceuticals Cells DNA Insertion Elements DNA Motifs Gene Expression HTLV-I Infections Human T-lymphotropic virus 1 Nucleic Acids Physical Examination Pokeweed Mitogens Radionuclide Imaging Sequence Insertion Transcription, Genetic Transcription Factor Virus
Motif analysis was mainly based on the chromVAR (38) R package. In brief, we ran the AddMotifs function to add the DNA sequence motif information required for motif analyses. We then calculated a per-cell motif activity score by running chromVAR and identified differential activity scores between cell types. Motif activity scores were normalized as z-scores, and the differential activity scores between cell types were reported as "avg_diff." TF footprinting information was gathered with the Footprint function and plotted with the PlotFootprint function.
Publication 2023
Cells DNA Motifs DNA Sequence

Top products related to «DNA Motifs»

Sourced in United States
The ForenSeq™ DNA Signature Prep Kit is a laboratory product designed for sample preparation prior to DNA sequencing. It enables the simultaneous amplification of multiple genetic markers for forensic DNA profiling.
Sourced in United States
The MiSeq FGx is a benchtop sequencing system designed for forensic and human identification applications. It utilizes Illumina's proprietary sequencing-by-synthesis technology to generate high-quality sequencing data. The system is capable of analyzing a variety of sample types and is suitable for use in accredited forensic laboratories.
Sourced in United States, China, Germany
The LightShift Chemiluminescent EMSA Kit is a laboratory tool designed to detect and analyze protein-DNA interactions. It uses chemiluminescent detection to visualize and quantify the binding of proteins to specific DNA sequences.
Sourced in United States, China, Germany, United Kingdom, Canada, Switzerland, Sweden, Japan, Australia, France, India, Hong Kong, Spain, Cameroon, Austria, Denmark, Italy, Singapore, Brazil, Finland, Norway, Netherlands, Belgium, Israel
The HiSeq 2500 is a high-throughput DNA sequencing system designed for a wide range of applications, including whole-genome sequencing, targeted sequencing, and transcriptome analysis. The system utilizes Illumina's proprietary sequencing-by-synthesis technology to generate high-quality sequencing data with speed and accuracy.
The MiSeq FGx Reagent Kit is a sequencing reagent designed for use with Illumina's MiSeq FGx forensic genomics system. It provides the necessary reagents and consumables required to perform DNA sequencing on the MiSeq FGx platform.
Sourced in United States, Germany, China, Canada, Italy, United Kingdom, Australia, Netherlands
The EZ DNA Methylation-Gold Kit is a product offered by Zymo Research for bisulfite conversion of DNA samples. It is designed to convert unmethylated cytosine residues to uracil, while leaving methylated cytosines unchanged, enabling the detection and analysis of DNA methylation patterns.
Sourced in United States, Lithuania, United Kingdom, Germany, India
The GeneJET Genomic DNA Purification Kit is a lab equipment product designed for the rapid and efficient extraction of high-quality genomic DNA from a variety of sample types. The kit uses a simple and reliable spin-column-based method to isolate DNA, which can then be used in downstream applications such as PCR, sequencing, and other molecular biology procedures.
The PyroMark Q24 2.0.6 Software is a software package designed for the analysis and interpretation of pyrosequencing data generated by the PyroMark Q24 system. It provides tools for sequence analysis, quality control, and data management.
Sourced in United States
LentiCas9-Blast is a lentiviral vector that expresses the Cas9 endonuclease from Streptococcus pyogenes and a blasticidin resistance marker. It is designed for the delivery and expression of Cas9 in target cells.
