Although GLMs are a natural choice for count data and have been successfully applied to address a broad range of questions in RNA-seq 32 (link),33 (link), a simpler alternative is to consider a linear model (LM) for a suitable transformation of the read counts (e.g., a logarithmic transformation). Such an LM-based version of RUVg reduces to RUV-2 (refs. 19 (link),20 ). Additionally, using a linear model allows the use of approaches such as RUV-4 and RUV-inv (ref. 20 ).
Supplementary Figures 19 and 20 show that LM-based RUVg on log counts does not perform as well as our proposed GLM-based RUVg. In particular, although LM-based RUVg seems effective at removing the unwanted variation (cf. the uniform distribution of p-values in Supplementary Fig. 19 ), it does not yield enough power to detect any DE genes, whether using a standard t-test or an empirical Bayes moderated t-test (limma 34 ).
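For readers who want to see what the LM-based alternative looks like in practice, below is a minimal sketch of RUV-2-style factor removal on log counts, assuming a genes-by-samples count matrix and a set of negative-control genes; the function and variable names are illustrative and this is not the RUVSeq implementation.

```python
import numpy as np

def ruv2_log_counts(counts, control_idx, k=2, pseudocount=1.0):
    """RUV-2-style removal of unwanted variation on log-transformed counts.

    counts      : (genes x samples) array of read counts
    control_idx : indices of negative-control genes assumed unaffected by
                  the factor of interest
    k           : number of unwanted factors to estimate
    Returns the log counts with the estimated unwanted factors regressed out,
    together with the estimated factors W.
    """
    logc = np.log(counts + pseudocount)
    # Centre the control genes across samples; their variation is assumed to
    # be driven by unwanted factors only
    ctrl = logc[control_idx] - logc[control_idx].mean(axis=1, keepdims=True)
    # SVD of the control-gene matrix: the leading right singular vectors span
    # the sample-level unwanted variation
    _, _, vt = np.linalg.svd(ctrl, full_matrices=False)
    W = vt[:k].T                                    # (samples x k) unwanted factors
    # Regress each gene's centred log counts on W and subtract the fit
    centred = logc - logc.mean(axis=1, keepdims=True)
    beta, *_ = np.linalg.lstsq(W, centred.T, rcond=None)
    cleaned = logc - (W @ beta).T
    return cleaned, W
```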
Natural Selection
Natural Selection is a fundamental process in evolutionary biology, in which organisms with favorable traits are more likely to survive and reproduce, passing on their genetic information to future generations.
This process drives the adaptation of species to their environment, leading to the emergence of new and diverse life forms over time.
Researchers utilize various methods, including computational modeling and experiments, to study the mechanisms and dynamics of natural selection, providing insights into the evolutionary history and future of living organisms.
PubCompare.ai's cutting-edge AI-powered tools can assist in this research by optimizing experimental protocols, comparing findings from literature, preprints, and patents, and identifying the most effective approaches to advancing the field of natural selection.
Explore the frontiers of this essential biological principle with the innovative technology of PubCompare.ai.
Most cited protocols related to «Natural Selection»
Genes
Natural Selection
RNA-Seq
To characterize the data, we propose the following Bayesian hierarchical model, based on the beta-binomial distribution. Notation for our model is as follows: at the i-th CpG site, j-th group and k-th replicate, let $X_{ijk}$ be the number of reads that show methylation, $N_{ijk}$ the total number of reads that cover this position and $p_{ijk}$ the underlying ‘true’ methylation proportion. Since the process of sequencing involves the random sampling of two kinds of reads (methylated or unmethylated), $X_{ijk}$ will follow a binomial distribution: $X_{ijk} \mid p_{ijk} \sim \mathrm{Binomial}(N_{ijk}, p_{ijk})$.
Since the true methylation proportions among replicates can be anywhere between 0 and 1, we assume that the proportions for each CpG site within each group of replicates follow a beta distribution. The beta distribution has long been a natural choice to model binomial proportions as it is a conjugate distribution of the binomial distribution and is the most flexible distribution with a support interval of [0,1].
Here the beta distribution is parameterized by its mean (denoted by $\mu$) and dispersion (denoted by $\phi$). Compared with the traditional parameterization of the Beta($\alpha, \beta$) distribution, the parameters have the following relationship: $\mu = \alpha/(\alpha+\beta)$ and $\phi = 1/(\alpha+\beta+1)$.
In this hierarchical model, the biological variation among replicates is captured by the beta distribution and the variation due to the random sampling of DNA segments during sequencing is captured by the binomial distribution. The dispersion parameter captures the variation of a CpG site’s methylation proportion relative to the group mean. We allow each CpG site within a single condition (e.g. within cases, or controls) to have its own dispersion. This is a flexible assumption because it allows either different or common dispersions for both conditions; however, our software also includes an option to assume a common dispersion for cases and controls.
To combine information across all CpG sites, based on the observed distribution of dispersion from a publicly available RRBS dataset on mouse embryogenesis (21 (link)), we assumed the following prior on $\phi$: $\log(\phi) \sim N(m_0, r_0^2)$,
where $m_0$ and $r_0^2$ are mean and variance parameters that can be estimated from the data. For each CpG site in this dataset, we applied a method of moments (MOM) estimator to estimate the dispersion parameters. As shown in Figure 1 , the genome-wide distribution of the logarithm of the dispersion parameter estimates is approximately Gaussian with mean = –3.39 and SD = 1.08, suggesting that the dispersion parameters can be well described by a log-normal distribution. However, simulations using dispersions from different distributions also show that our proposed method is robust to violations of this log-normal assumption (Supplementary Figure S1 ).
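As an illustration of the hierarchical model and a method-of-moments dispersion estimate, the sketch below simulates one CpG site under the beta-binomial model using the mean/dispersion parameterization given above; the symbols and the particular MOM formula are our own reconstruction, not necessarily the authors' exact estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate one CpG site in one group: true proportions p ~ Beta(mu, phi),
# methylated counts X ~ Binomial(N, p)
mu, phi = 0.7, np.exp(-3.39)          # group mean and dispersion
alpha = mu * (1 / phi - 1)            # back-transform to the traditional
beta = (1 - mu) * (1 / phi - 1)       # Beta(alpha, beta) parameterization
N = rng.integers(10, 60, size=8)      # total reads per replicate
p = rng.beta(alpha, beta, size=8)     # replicate-level methylation proportions
X = rng.binomial(N, p)                # methylated read counts

# Method-of-moments estimates of mu and phi from (X, N); for the beta-binomial,
# Var(X/N) ~= mu*(1-mu)*(phi + (1-phi)*mean(1/N)), so phi can be solved for.
phat = X / N
mu_hat = phat.mean()
v = phat.var(ddof=1)
m = np.mean(1 / N)
phi_hat = (v / (mu_hat * (1 - mu_hat)) - m) / (1 - m)   # may need truncation at 0
print(mu_hat, phi_hat)
```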
Figure 1. Histogram of the logarithm of the estimated CpG-specific dispersion parameters.
Biopharmaceuticals
Chromosomes
DNA Replication
Embryonic Development
Genome
Methylation
Mice, Laboratory
Natural Selection
All simulations were performed
in the isothermal–isobaric ensemble, NPT,
at a pressure of 1 atm. The pressure was held constant by using the
Parrinello–Rahman barostat77 with
a coupling constant of 10.0 ps and an isothermal compressibility of 4.5 × 10⁻⁵ bar⁻¹. For
the bulk liquids an isotropic pressure coupling was used and for the
bilayer simulations a semi-isotropic pressure coupling scheme was
used. The temperature was kept constant by the Nosé–Hoover
thermostat78 ,79 (link) with a coupling constant of 0.5 ps. The
lipid bilayer and water were coupled separately to the thermostat.
Long-range electrostatic interactions were treated by a particle-mesh
Ewald scheme80 ,81 with a real-space cutoff at 1.4
nm with a Fourier spacing of 0.10 nm and a fourth-order interpolation
to the Ewald mesh. Single-atom charge groups were used. van der Waals
interactions were truncated at 1.5 nm and treated with a switch function
from 1.4 nm. Long-range corrections for the potential and pressure
were added.51 The inclusion of long-range
corrections should eliminate the LJ cutoff dependency in the simulations.
Because lipid bilayers are inhomogeneous systems, the method introduced by Lagüe et al.82 to add long-range corrections could be applied instead. Periodic
boundary conditions were imposed in every dimension. A time step of
2 fs was used with a Leap-Frog integrator. The LINCS algorithm83 was used to freeze all covalent bonds in the
lipid, and the analytical SETTLE84 method
was used to hold the bonds and angle in water constant. The TIP3P
water model85 was the water model of choice.
TIP3P is the default water model in major FFs such as AMBER and CHARMM, and since one of the aims of the work presented here was to create a lipid FF compatible with AMBER, it was a natural choice. Further, earlier
work of Högberg et al.31 (link) has shown
that there is flexibility in the choice of water model for AA simulations
of lipid bilayers. Atomic coordinates were saved every 1 ps and the
neighbor list was updated every 10th step.
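To make the run settings above concrete, the snippet below collects them into a GROMACS-style .mdp parameter file; the option names follow common GROMACS 4.5-era usage, and the temperature group names and reference temperature are placeholders, so this should be read as a sketch rather than the authors' actual input file.

```python
# Sketch: write the simulation settings described above into an .mdp file.
# Option names follow GROMACS 4.5 conventions; values marked "e.g." are assumptions.
mdp = {
    "integrator": "md",              # leap-frog integrator
    "dt": 0.002,                     # 2 fs time step
    "tcoupl": "nose-hoover",         # Nose-Hoover thermostat
    "tc-grps": "LIPIDS SOL",         # lipids and water coupled separately (placeholder names)
    "tau_t": "0.5 0.5",
    "ref_t": "323 323",              # e.g. DPPC above its phase transition
    "pcoupl": "parrinello-rahman",   # Parrinello-Rahman barostat at 1 atm
    "pcoupltype": "semiisotropic",   # bilayers; 'isotropic' for the bulk liquids
    "tau_p": 10.0,
    "compressibility": "4.5e-5 4.5e-5",
    "ref_p": "1.0 1.0",
    "coulombtype": "PME",            # particle-mesh Ewald electrostatics
    "rcoulomb": 1.4,
    "fourierspacing": 0.10,
    "pme_order": 4,
    "vdwtype": "switch",             # switch function from 1.4 to 1.5 nm
    "rvdw_switch": 1.4,
    "rvdw": 1.5,
    "DispCorr": "EnerPres",          # long-range corrections to energy and pressure
    "constraint_algorithm": "lincs", # LINCS on lipid bonds; SETTLE handles water
    "constraints": "all-bonds",
    "nstxout": 500,                  # coordinates every 1 ps with a 2 fs step
    "nstlist": 10,                   # neighbour list updated every 10th step
}

with open("bilayer.mdp", "w") as fh:
    for key, value in mdp.items():
        fh.write(f"{key:22s} = {value}\n")
```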
Bulk liquids were
simulated with a simulation box consisting of 128 molecules for the
larger alkanes and 256 for the smaller alkanes (hexane and heptane)
at a temperature of 298.15 K. The lipid bilayer systems were prepared
using the CHARMM-GUI86 (link),87 (link) with 128 lipids in total, 64
in each leaflet. In order to achieve proper hydration, 30 TIP3P water
molecules were added per lipid. Three different lipid types were simulated,
DLPC (12:0/12:0), DMPC (14:0/14:0), and DPPC (16:0/16:0). These systems were investigated at a range of temperatures; see Table 1 for an overview of all simulations performed. All
lipid bilayer systems were equilibrated for 40 ns before production
runs were initiated which lasted for 300–500 ns. All MD simulations
were performed with the Gromacs88 software
package (versions 4.5.3 and 4.5.4). All analyses were performed with the analysis tools that come with the MDynaMix software package.89 System snapshots were rendered and analyzed with VMD.90 Neutron scattering form factors were computed with the SIMtoEXP software.91 (link)

The calculations of free energies of solvation in
water and cyclohexane
were performed by using thermodynamic integration over 35 λ
values in the range between 0 and 1. A soft core potential (SCP) was
used to avoid singularities when the solute is almost decoupled from
the solvent. The α-parameters used for the SCP and the simulation
workflow were set following the methodology described by Sapay and
Tieleman.92 (link) The amino acid analogues were
solvated with 512 and 1536 molecules of cyclohexane and water, respectively.
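The free-energy part of the protocol amounts to integrating the ensemble average of ∂H/∂λ over the λ schedule; below is a minimal numerical sketch, assuming the per-window averages have already been extracted from the simulations (the file name is hypothetical).

```python
import numpy as np

# Lambda values (here simply evenly spaced over 35 windows) and the
# corresponding ensemble averages of dH/dlambda from the simulations
lambdas = np.linspace(0.0, 1.0, 35)
dHdl = np.loadtxt("dhdl_averages.dat")   # hypothetical file, one value per window

# Thermodynamic integration: Delta G = integral_0^1 <dH/dlambda> dlambda,
# evaluated here with the trapezoidal rule
delta_G = np.trapz(dHdl, lambdas)
print(f"Solvation free energy: {delta_G:.2f} kJ/mol")
```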
Alkanes
Amber
Amino Acids
ARID1A protein, human
Cyclohexane
Dietary Fiber
Dimyristoylphosphatidylcholine
Electrostatics
Freezing
Heptane
Lipid Bilayers
Lipids
Maritally Unattached
n-hexane
Natural Selection
Pressure
Rana
Solvents
A list of candidate genes for a particular disease can be gleaned from published association studies, gene expression studies, disease pathways and the specific interests of an investigator. Such lists may be very large, so we first filter the list against GWAS results as shown in Figure 1 . We use SNPs that have genotype data in dbSNP as our source of SNPs in and near a gene (for a user-specified flanking region around the gene). We keep a gene if it has at least one small P-value SNP (less than or equal to a user-specified threshold, T1) in the GWAS. We also keep genes that were not adequately represented by SNPs in the GWAS panel. The percent of common SNPs (within a gene and flanking region) in high LD (pairwise r2 ≥ a user-specified threshold) with any GWAS SNP (including GWAS SNPs outside the gene and flanking region) is calculated and genes with coverage less than a user-specified cutoff A% are retained. Genes that do not have SNPs with small P-value but do have sufficient coverage by GWAS SNPs are excluded from further analysis.
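The gene-level filter described above can be summarized as the following sketch, with illustrative column names and thresholds standing in for the user-specified values T1 and A%; this is not the GenePipe code itself.

```python
import pandas as pd

def filter_genes(snps, T1=0.05, A=80.0):
    """Keep genes with at least one small-P SNP, or with poor GWAS coverage.

    snps : DataFrame with one row per (gene, SNP) pair and columns
           'gene', 'p_value' (GWAS P-value, NaN if the SNP was not typed) and
           'ld_coverage' (percent of common SNPs in the gene region in high LD
           with any GWAS SNP).
    """
    by_gene = snps.groupby("gene").agg(
        min_p=("p_value", "min"),
        coverage=("ld_coverage", "first"),
    )
    has_signal = by_gene["min_p"] <= T1        # at least one small-P SNP in the GWAS
    poorly_covered = by_gene["coverage"] < A   # not adequately represented by GWAS SNPs
    keep = by_gene[has_signal | poorly_covered].index
    # Genes with no small-P SNP but adequate coverage are dropped
    return snps[snps["gene"].isin(keep)]
```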
For the candidate genes that pass the above screen we extract SNPs from dbSNP and process this list as shown in Figure 1 . If a SNP was examined in the GWAS and had a P-value less than the user-specified threshold T1 it is retained. If a SNP was not in the GWAS but was in high LD with a GWAS SNP that had a P-value larger than T1 it is eliminated because we reason that it was adequately evaluated by the GWAS and found to have no association with disease. We then score all retained SNPs for functional significance and apply different minor allele frequency (MAF) filters depending on the functional category of the SNP. These user-specified MAF filters are provided because functionally important SNPs often have lower MAF due to natural selection (6 (link)) and we wish to provide extra flexibility to retain functional SNPs below the MAF filter being applied to SNPs without such function. The details of the functional predictions used in this and other pipelines are provided in a separate section below.
In the final processing step we select LD tag SNPs. Because there are certain advantages to having functional and small P-value SNPs directly assessed by the genotyping panel (instead of being indirectly assessed via LD), we provide for the assignment of user-specified weights to different categories of functional SNPs and small P-value SNPs. If weights are assigned the null value of 1, then tag SNPs are selected simply by rank order, so that SNPs that are in high LD with the largest number of SNPs are selected first and SNPs that tag only themselves (singleton tags) are selected last. If a functional SNP has a weight applied, then the weight acts as a multiplier of the actual number of SNPs tagged, so that it is more likely to be selected early. For example, a functional SNP with a weight of two that is in LD with four SNPs (including itself) would have a weighted tag value of 2 × 4 = 8. Investigators may modify a variety of values (e.g. P-value threshold T1, LD threshold, or weights) to adjust selected SNP counts to fit their genotyping panel size and budget. We provide two options for additional SNP reduction that we think are useful: (i) Each SNP must be in LD with a user-specified minimum number of common SNPs (after multiplication by the user-assigned weights). For example, this option can be used to eliminate singleton SNPs. (ii) A user can also specify the maximum number of SNPs that are allowed for any one gene using a method which is similar to selecting the best N SNPs to optimize power (7 (link)). To ensure that each gene has some coverage, we also provide a user-specified minimum number of best SNPs (in terms of number of SNPs captured at a specific LD threshold) that must be selected for each gene even if they do not meet the previous criterion for tag SNPs.
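The weighted, rank-ordered tag-SNP selection can be sketched as a greedy loop over a precomputed LD map; the example below is a simplification (per-gene caps and the minimum-best-SNP rule are omitted), with made-up SNP identifiers.

```python
def select_tag_snps(ld_map, weights, min_tagged=1):
    """Greedy selection of tag SNPs by weighted number of SNPs tagged.

    ld_map     : dict mapping each candidate SNP to the set of SNPs it tags
                 at the chosen r2 threshold (including itself)
    weights    : dict of user-assigned weights (1 for ordinary SNPs, >1 for
                 functional or small-P SNPs so they are picked earlier)
    min_tagged : minimum weighted tag value required to keep selecting
    """
    untagged = set().union(*ld_map.values())
    selected = []
    while untagged:
        # Weighted tag value = weight x number of still-untagged SNPs captured
        scores = {s: weights.get(s, 1) * len(ld_map[s] & untagged) for s in ld_map}
        best, best_score = max(scores.items(), key=lambda kv: kv[1])
        if best_score < min_tagged:
            break
        selected.append(best)
        untagged -= ld_map[best]
    return selected

# Example: a weight of 2 on a functional SNP tagging 4 SNPs gives 2 x 4 = 8
ld_map = {"rs1": {"rs1", "rs2", "rs3", "rs4"}, "rs5": {"rs5"}, "rs2": {"rs2", "rs1"}}
print(select_tag_snps(ld_map, weights={"rs1": 2}))
```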
GenePipe: decision tree to prioritize SNPs for candidate genes based on GWAS results, SNP functional prediction characteristics and pair-wise LD. The six-sided boxes represent decision points and rectangles represent action steps or end points.
Gene Expression
Genes
Genes, vif
Genome-Wide Association Study
Natural Selection
Single Nucleotide Polymorphism
Matching on the propensity score was not dealt with in depth by any of the three papers. Zanutto simply stated that “it is less clear in this case [matching] how to incorporate the survey weights from a complex survey design” (page 69),5 while Ridgeway et al. did not consider matching on the propensity score. When using propensity score matching, DuGoff et al. suggested fitting a survey-weighted regression model in the propensity score matched sample. In their simulations, the continuous outcome variable was regressed on an indicator variable denoting treatment status and on the single baseline covariate, resulting in a conditional effect estimate within the matched sample. While this approach may be suitable when outcomes are continuous, such an approach is likely to be problematic when outcomes are binary or time-to-event in nature. The reason for this is that propensity score methods result in marginal estimates of effect, rather than conditional estimates of effect.8 When outcomes are continuous, a linear treatment effect is collapsible: the conditional and marginal estimates coincide. When the outcome is binary, regression adjustment in the propensity score matched sample will typically result in an estimate of the odds ratio. The odds ratio (like the hazard ratio) is not collapsible; thus the marginal and conditional estimates will not coincide.9 Prior research has demonstrated that propensity score matching results in biased estimation of both conditional and marginal odds ratios.10 (link),11 (link) Thus, the method proposed by DuGoff for use with propensity score matching may not perform well when outcomes are binary.
Prior to presenting alternate estimators, we briefly introduce the potential outcomes framework.12 Let Y(1) and Y(0) denote the potential outcomes observed under the active treatment (Z = 1) and the control treatment (Z = 0), respectively. The effect of treatment is defined as Y(1) − Y(0). The average treatment effect (ATE) is defined as E[Y(1) − Y(0)]. The average treatment effect in the treated (ATT) is defined as E[Y(1) − Y(0) | Z = 1]. Imai et al. distinguish between two different estimands: the sample average treatment effect (SATE) and the population average treatment effect (PATE).13 The former is the effect of treatment in the analytic sample, while the latter refers to the effect of treatment in the population from which the sample was drawn. The PATE is defined as the mean of Yi(1) − Yi(0) over all N subjects in the population, while the SATE is defined as the mean of Yi(1) − Yi(0) over the n subjects in the sample, where N and n denote the number of subjects in the population and in the sample, respectively. We would argue that the population estimand is usually of greater interest than the sample estimand, as researchers typically want to make inferences about the larger population from which the sample was drawn. Typically, one uses a sample estimate to make inferences about a population parameter. In doing so, one must take appropriate analytic steps to ascertain that the estimate pertains to the target population. For this reason, all of the methods that we consider for estimating the effect of treatment in a matched sample will employ the survey weights.
There are a large number of possible algorithms for matching treated and control subjects on the propensity score.14 (link) Popular approaches include nearest neighbour matching (NNM) and NNM within specified calipers of the propensity score.15 ,16 (link) NNM selects a treated subject (typically at random, although one can sequentially select the treated subjects from highest to lowest propensity score) and then selects the control subject whose propensity score is closest to that of the treated subject. The most frequent approach is to use matching without replacement, in which each control is selected for matching to at most one treated subject. NNM within specified calipers of the propensity score is a refinement of NNM, in which a match is considered permissible only if the difference between the treated and control subjects’ propensity scores is below a pre-specified maximal difference (the caliper width). Optimal choice of calipers was studied elsewhere.17 (link) An alternative to these approaches is optimal matching, in which matched pairs are formed so as to minimize the average within-pair difference in the propensity score.18 When using propensity score matching, one is estimating the ATT. For each treated subject, the missing potential outcome under the control intervention is imputed by the observed outcome for the control subject to whom the treated subject was matched. To obtain the above estimate of the ATT, rather than fitting an outcomes regression model in the matched sample, one can simply obtain a marginal estimate of the outcome in treated subjects and a marginal estimate of the outcome in control subjects. These are estimated as the mean outcome in treated and control subjects, respectively. The ATT can then be estimated as the difference in these two quantities.19 (link) As the research interest usually focuses on the population average treatment effect in the treated (PATT), rather than its sample analogue (SATT), the mean potential outcome under the active treatment can be estimated as the survey-weighted mean of the matched treated subjects’ outcomes, Σi wiYi / Σi wi, where Yi denotes the observed outcome for the ith treated subject in the matched sample, wi denotes the sampling weight associated with this subject, and the sums run over the Nmatch matched pairs in the propensity score matched sample. Similarly, the mean potential outcome under the control condition can be estimated as the corresponding weighted mean of the matched control subjects’ outcomes. The PATT for both continuous and binary outcomes can then be estimated as the difference between these two weighted means. Failure to include the sampling weights in estimating the ATT would result in an estimate of the SATT, rather than the PATT.
An unaddressed question is which weights should be used for the matched control subjects. As noted above, Zanutto suggested that “it is less clear in this case [matching] how to incorporate the survey weights from a complex survey design” (page 69).5 In propensity score matching, one is attempting to create a control group that resembles the treated group. However, when using weighted survey data, there are two possible populations to which one can standardize the matched control subjects: (i) the population of control subjects that resemble the treated subjects; (ii) the population of treated subjects. The natural choice of weight to use for each control subject would be to use each control subject’s original sampling weight. In using these weights, one is weighting the control subjects to reflect the population of control subjects that resemble the population of treated subjects. An alternative choice would be to weight the matched control subjects using the population of treated subjects as the reference population. To do so, one would have each matched control subject inherit the weight of the treated subject to whom they were matched. Treated and control subjects with the same propensity score have observed baseline covariates that come from the same multivariable distribution.1 This suggests that if control subjects inherit the weight of the treated subject to whom they were matched, then the distribution of baseline covariates in the weighted sample will be similar between treated and control subjects, using the population of treated subjects as the reference population. In this paper, we use the term ‘natural weight’ when each matched control subject retains its own survey sampling weight, and the term ‘inherited weight’ when each matched control subject inherits the weight of the treated subject to whom it was matched.
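A compact sketch of the weighted PATT estimator for a 1:1 matched sample, with the 'natural' and 'inherited' weighting schemes described above; array names are illustrative.

```python
import numpy as np

def patt_from_matches(y_treated, y_control, w_treated, w_control,
                      weight_scheme="natural"):
    """Estimate the PATT from a 1:1 propensity-score matched sample.

    y_treated, y_control : outcomes for the i-th matched pair
    w_treated, w_control : survey sampling weights for the same subjects
    weight_scheme        : 'natural'   -> each control keeps its own weight
                           'inherited' -> each control inherits the weight of
                                          the treated subject it was matched to
    """
    w_t = np.asarray(w_treated, dtype=float)
    w_c = w_t if weight_scheme == "inherited" else np.asarray(w_control, dtype=float)
    mean_treated = np.sum(w_t * y_treated) / np.sum(w_t)   # weighted mean, treated
    mean_control = np.sum(w_c * y_control) / np.sum(w_c)   # weighted mean, controls
    return mean_treated - mean_control
```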
Natural Selection
Specimen Handling
Target Population
Training Programs
Most recent protocols related to «Natural Selection»
To understand the molecular evolution at the amino acid level and the intensity of natural selection acting on metabolism in a specific clade, we used a tree based on a codon alignment, produced by the maximum-likelihood method implemented in the software EasyCodeML109 (link). We retrieved coding sequences (CDS) of TPS-b genes from A. thaliana, E. grandis, P. cattleyanum, V. vinifera and P. trichocarpa in Phytozome v11 (http://phytozome.jgi.doe.gov/ ; last accessed November 2020), for use in the positive selection analysis. The dataset included 76 sequences and 389 amino acids from five species. We performed statistical analysis using the CodeML program in PAML version 4.9 software using the site, branch, and branch-site models110 (link), implemented in EasyCodeML109 (link).
Parameter estimates (ω) and likelihood scores111 (link) were calculated for the three pairs of models. These were M0 (one-ratio, assuming a constant ω ratio for all coding sites) vs. M3 (discrete, allowed for three discrete classes of ω within the gene), M1a (nearly neutral, allowed for two classes of ω sites: negative sites with ω0 < 1 estimated from our data and neutral sites with ω1 = 1) vs. M2a (positive selection, added a third class with ω2 possibly > 1 estimated from our data), and M7 (beta, a null model in which ω was assumed to be beta-distributed among sites) vs. M8 (beta and ω, an alternative selection model that allowed an extra category of positively selected sites)112 (link).
A series of branch models and branch site models were tested: the one-ratio model for all lineages and the two-ratio model, where the original enzyme functional evolution occurred. The branch-site model assumes that the branches in the phylogeny are divided into the foreground (the one of interest for which positive selection is expected) and background (those not expected to exhibit positive selection).
Likelihood ratio tests (LRT) were conducted to compare each pair of nested models and assess the statistical significance of the data. Twice the log-likelihood difference between each pair of models (2ΔL) follows a chi-square distribution with the number of degrees of freedom equal to the difference in the number of free parameters, yielding a p-value113 (link). A significantly higher likelihood of the alternative model compared to the null model suggests positive selection. Positive sites with high posterior probabilities (> 0.95) were obtained using empirical Bayes analysis. If ω > 1, then there is positive selection on some branches or sites, but the positive selection sites may occur in very short episodes or on only a few sites during the evolution of duplicated genes; ω < 1 suggests purifying selection (selective constraints), and ω = 1 indicates neutral evolution. Finally, naive empirical Bayes (NEB) approaches were used to calculate the posterior probabilities that a site comes from the site class with ω > 1112 (link). The selected sites and images of protein topology were predicted using Protter114 (link).
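The LRT step reduces to comparing twice the log-likelihood difference against a chi-square distribution; below is a minimal sketch with illustrative log-likelihood values (not results from this study).

```python
from scipy.stats import chi2

# Log-likelihoods of the null (e.g. M7) and alternative (e.g. M8) models from
# codeml, and the difference in their numbers of free parameters
lnL_null, lnL_alt = -10234.6, -10227.1   # illustrative values only
df = 2                                    # M8 has two extra free parameters relative to M7

LRT = 2 * (lnL_alt - lnL_null)            # 2 * delta(lnL)
p_value = chi2.sf(LRT, df)                # upper-tail chi-square probability
print(f"2dL = {LRT:.2f}, p = {p_value:.4g}")
```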
Amino Acids
Biological Evolution
Codon
Enzymes
Evolution, Molecular
Evolution, Neutral
Exons
Genes
Metabolism
Natural Selection
Proteins
Trees
Sequence analysis included 80 nucleotide sequences of PfGARP from Thai isolates, one clinical isolate from Guinea (isolate MDCU32) and 18 publicly available complete gene sequences whose isolate names, countries of origin and GenBank accession numbers are as follows: 3D7 (Netherlands from West Africa, AL844501), CD01 (Congo, LR129686), Dd2 (Indochina, LR131290), FC27 (Papua New Guinea, J03998), FCC1/HN (Hainan in China, AF251290), GA01 (Gambia, LR131386), GB4 (Ghana, LR131402), KH1 (Cambodia, LR131418), KH2 (Cambodia, LR131306), HB3 (Honduras, LR131338), IGH-CR14 (India, GG6656811), IT (Brazil, LR131322), KE01 (Kenya, LR131354), ML01 (Mali, LR131481), SD01 (Sudan, LR131466), SN01 (Senegal, LR131434), TG01 (Togo, LR131450), and UGT5.1 (Vietnam, KE124372). Of these, the 3D7, FC27 and FCC1/HN sequences were determined by the Sanger dideoxy chain-termination method, whereas the remaining isolates were sequences assembled from next-generation sequencing platforms (Supplemental Table S1 ). Sequence alignment was performed by using the CLUSTAL_X program, taking into account appropriate codon matching in the coding region by manual adjustment to maintain the reading frame. The sequence from the FC27 strain was used as a reference6 (link). Searching for nucleotide repeats was performed by using the Tandem Repeats Finder version 4.0 program with the default option. Nucleotide diversity (π), the rate of synonymous substitutions per synonymous site (dS) and the rate of nonsynonymous substitutions per nonsynonymous site (dN) were determined from the average values of sequence differences in all pairwise comparisons of taxa, and the standard error was computed from 1000 bootstrap pseudoreplicates implemented in the MEGA 6.0 program41 (link). Haplotype diversity and its sampling variance were computed, taking into account the presence of gaps in the aligned sequences, using the DnaSP version 5.10 program42 (link). Natural selection on codon substitution was determined by using the fast unconstrained Bayesian approximation (FUBAR) method in the Datamonkey Web-Server43 (link),44 (link). A neighbor-joining phylogenetic tree based on nucleotide sequences was constructed using the maximum composite likelihood method, whereas a maximum likelihood tree was built using the Tamura-Nei model with a rate variation model that allowed some sites to be evolutionarily invariable. The Arlequin 3.5.2.2 software was used to determine genetic differentiation between populations, the fixation index (FST), using an analysis of molecular variance (AMOVA) approach akin to Weir and Cockerham's method but taking into account the number of mutations between haplotypes45 (link). One hundred permutations were used to determine the significance levels of the fixation indices. Prediction of linear B cell epitopes in PfGARP was performed by using sequence similarity to known experimentally verified epitopes from the Immune Epitope DataBase (IEDB), implemented in the BepiBlast Web Server11 (link). Furthermore, linear B cell epitopes were also predicted based on protein language models implemented in BepiPred-3.012 (link). Potential HLA class II-binding peptides were analyzed by using the IEDB recommended 2.22 algorithm with the default option of 12–18 amino acid residues. Predicted HLA class II-binding peptides were retained based on a percentile rank < 10 and an IC50 threshold for HLA binding affinity ≤ 1000 nM14 (link).
The analysis mainly concerned the common HLA class II haplotypes among Thai populations with allele frequency > 0.113 (link).
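As an illustration of the average pairwise-difference calculation that underlies nucleotide diversity (π), the toy sketch below computes π for a small alignment; it mirrors the quantity reported by MEGA/DnaSP rather than reimplementing those programs (gap handling here is simply pairwise deletion).

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average proportion of nucleotide differences over all pairwise comparisons."""
    n_pairs, total = 0, 0.0
    for a, b in combinations(seqs, 2):
        sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if sites:
            total += sum(x != y for x, y in sites) / len(sites)
            n_pairs += 1
    return total / n_pairs

seqs = ["ATGCTACGA", "ATGTTACGA", "ATGCTACAA"]   # toy alignment
print(nucleotide_diversity(seqs))                # pi for the toy example
```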
Amino Acids
Codon
Epitopes
Epitopes, B-Lymphocyte
Genes
Genetic Drift
Haplotypes
Hereditary Nonpolyposis Colorectal Cancer Type 1
Mutation
Natural Selection
Nucleotides
Peptides
Population Group
Proteins
Reading Frames
Sequence Alignment
Sequence Analysis
Strains
Tandem Repeat Sequences
Thai
Trees
We used a Bayesian latent hierarchical compositional manova with a multinomial observation model to determine how final proportional cover was affected by treatments. A manova is the obvious way to examine patterns in multiple species, and a compositional approach is needed because we have relative abundance data, for which the standard vector addition and scalar multiplication operations used in manova are not appropriate. Pawlowsky-Glahn, Egozcue & Tolosana-Delgado (2015) is a good introduction to compositional data analysis. A multinomial observation model is the obvious choice for data derived from point counts. We analyzed the pre-treatment data from the final photographic sampling date, and included only A. aurita growing directly on panels, bare panel and other taxa contributing at least 20 points to the point count data for at least one panel: Botrylloides spp., Bugula spp. and Molgula tubifera. Together, these five taxa accounted for 90–100 points out of 100 on every panel in the pre-treatment point count data from the final week, and no other taxon contributed more than seven points on any panel. Compositional data analysis is subcompositionally coherent (Egozcue & Pawlowsky-Glahn, 2011 , Section 2.3.2), which means that results for the subcomposition we studied do not depend on excluded taxa. We therefore analyzed final subcompositions of the form , where parts one to five represent A. aurita on panel, bare panel, Botrylloides spp., Bugula spp. and M. tubifera, respectively. We represented these final subcompositions in isometric logratio (ilr) coordinates (Egozcue et al., 2003 (link)) using the contrast matrix described in the supporting information, Section S1 .
Let be the vector of point count data for the single panel from depth , treatment , block , and let be the total number of points counted in this observation (between 90 and 100). We modelled these data using a Bayesian latent hierarchical compositional manova with a multivariate observation model:
Here, is the vector of expected relative abundances for the panel from depth , treatment , block . The isometric log transformation of is a vector in , formed from the sum of an overall mean vector , the effect of depth , the effect of treatment , the effect of the interaction between depth and treatment , the effect of block and the effect of the panel from depth , treatment , block . The block and panel effects are modelled hierarchically, drawn from 4-dimensional multivariate normal distributions with mean vector and covariance matrices and Σ respectively (independent of each other and of the explanatory variables). Note that can be written in the simplex as
where the primes indicate transformations of the corresponding parameters in 4, and denotes the perturbation operator (Aitchison, 1986 , p. 42). We coded treatment effects as described in the supporting information,Section S2 . Similar models have been used for effects of vegetation disturbance and predator manipulation on terrestrial arthropod communities (Billheimer, Guttorp & Fagan, 2001 (link)), effects of depth on community composition at our study site (Chong & Spencer, 2018 (link)), and effects of cyclones and bleaching on coral reef composition (Vercelloni et al., 2020 (link)).
We fitted the model using Bayesian estimation in cmdstan 2.23.0 (Carpenter et al., 2017 (link)), which implements a dynamic Hamiltonian Monte Carlo algorithm (Hoffman & Gelman, 2014 ). Details of priors are given in the supporting information,Section S3 . Details of fitting, checking and calibration are given in the supporting information, Section S4 .
We compared the ability to predict new observations between the full model and simpler models (without the interaction between depth and treatment, without depth, or without treatment) using leave-one-cluster-out cross-validation. The natural choice for “new observations” is a new block of panels, because a replication of the experiment would involve a new set of blocks, rather than new panels within existing blocks or new observations on existing panels. We therefore evaluated models based on marginal rather than conditional likelihoods with respect to block and panel effects (Merkle, Furr & Rabe-Hesketh, 2019 (link)). Details are in the supporting information,Section S5 .
Our primary interest is in responses of A. aurita, bare panel and potential competitors as a whole, rather than variation within the subcomposition of potential competitors. Visualizing is not easy, so we decomposed treatment effects into two orthogonal components, each of which can be represented in a ternary plot: effects on A. aurita, bare panel and potential competitors as a whole, and effects on the subcomposition of potential competitors (supporting information,Section S6 ).
We assessed the effects of potential competitors on A. aurita using differences in between potential competitor removal (O) and control (C) treatments. Similarly, we assessed the effects of A. aurita on potential competitors using differences in between A. aurita removal (A) and control (C) treatments, as described in the supporting information,Section S7 .
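For readers unfamiliar with ilr coordinates, the sketch below maps a 5-part composition to four ilr coordinates using one standard orthonormal (Helmert-type) basis; the specific contrast matrix used in the study is given in its supporting information, so the basis here is only an illustration.

```python
import numpy as np

def ilr(x):
    """Isometric log-ratio coordinates of a composition (one Helmert-type basis)."""
    x = np.asarray(x, dtype=float)
    x = x / x.sum()                      # close the composition
    logx = np.log(x)
    z = np.empty(len(x) - 1)
    for i in range(1, len(x)):
        # balance of the geometric mean of the first i parts against part i+1
        z[i - 1] = np.sqrt(i / (i + 1)) * (logx[:i].mean() - logx[i])
    return z

# Five-part composition: A. aurita on panel, bare panel, Botrylloides spp.,
# Bugula spp., M. tubifera (proportional cover from a point count)
comp = np.array([0.35, 0.20, 0.15, 0.20, 0.10])
print(ilr(comp))                         # 4 ilr coordinates
```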
Arthropods
Cardiac Arrest
Cloning Vectors
Cyclonic Storms
DNA Replication
Natural Selection
Specimen Handling
The model of Pichancourt et al. (2014) (link) is primarily based on the life-history theory. According to this theory, biophysical constraints on the allocation of energy between reproduction, growth and self-maintenance are viewed as the primary explanation for why species do not possess arbitrary combinations of life-history traits (LHTs) between organs and life-stages, throughout the life cycle of the organism. These ultimately drive the population growth rate of species to adapt to their environmental conditions. To reflect this principle, the model is structured according to a multiple-tier approach to LHTs (Fig. 2 ).
The first tier of LHTs represents the specific vital rates of stage and size throughout the life cycle of a tree species (i.e., seed survival, germination, tree growth, survival and fertility). These traits are constrained by allometric traits that are assumed to be optimally defined by natural selection (the second tier of LHTs, as outlined by the scaling theory of ecology). Finally, the second tier of LHTs is itself constrained by the third tier—the metabolic LHTs—based on the different physiological processes, e.g., photosynthetic carbon assimilation, respiration, Vmax, Jmax, biomass turnover, water absorption, carbon biomass production (as outlined, e.g., by the metabolic theory of ecology).
For plant species, the theory also predicts that ~50% of the variability of most of the tree LHTs on Earth can ultimately be reduced to three (van Bodegom, Douma & Verheijen, 2014 (link)): specific leaf area (SLA) (m².kg⁻¹), specific wood density (SWD) (kg.m⁻³) and seed size (SS) (kg). Under this realistic assumption, species with similar values of these three LHTs share other similar 1–2- and 3-tier LHT and life-cycle strategies. Based on this organization, mathematical models can be developed to create a wide range of unique tree species life cycle strategies (see summary of models in Pichancourt et al. (2014) (link) and in van Bodegom, Douma & Verheijen (2014) (link)). In this article, computational capabilities limited our exploration to eight species, representing all the combinations between a range of extreme values of LHTs found in the literature (see Pichancourt et al., 2014 (link)): SLA (2.5–20 m².kg⁻¹); SWD (400–1,000 kg.m⁻³); and SS (10⁻⁷–10⁻³ kg per seed).
Carbon
Cell Respiration
Fertility
Germination
Life History Traits
Natural Selection
Photosynthesis
Physiological Processes
Plant Leaves
Plants
Reproduction
Trees
The energy of a state can also be extracted by tracking the global phase that it acquires during time propagation. This is given by the phase of the autocorrelation function, which is a discrete time series of the inner product between a propagated state and the initial state. Suppose that we time propagate an eigenstate Ψn of the Hamiltonian with energy En. The corresponding autocorrelation signal is a(t) = ⟨Ψ(t = 0)∣Ψ(t)⟩ = exp(−iEnt), where the absolute value ∣⟨Ψ(t = 0)∣Ψ(t)⟩∣ should be close to unity, and in this work, we use it as one measure of a simulation’s veracity.
On a quantum computer, this otherwise-unobservable global phase is efficiently extracted using phase estimation. Phase estimation through the use of ancilla qubits (67 ) is one of the fundamental techniques used in diverse applications of quantum computing, and its utility in the context of SO-QFT and first-quantized simulation is well recognized [see, e.g., (5 (link))].
In the IPE approach, even a single ancilla (8 , 68 ) is sufficient to learn this phase; a resource cost saving that will be welcome in the early fault-tolerant regime. The method is summarized on the left of Fig. 10 . We conditionally apply N SO-QFT steps UN(δt) controlled by an ancillary qubit in the ∣+⟩ state. At the Nth step, we measure the ancillary qubit in the ∣+⟩ basis, at which point the state is discarded. We see that for an eigenstate, the global phase information is encoded in the relative phase between the ∣0⟩ and ∣1⟩ state of the ancillary qubit, which ends up in the state (∣0⟩ + exp(−iEnNδt)∣1⟩)/√2.
The probability of finding the ancillary qubit in state ∣+⟩ fluctuates with this phase. We use this to extract a periodic time signal a(t) whose frequency is proportional to the energy of the simulated wave function: the measured probability is P+(t) = (1 + Re a(t))/2 = (1 + cos(Ent))/2.
Because the number of qubits that we can classically emulate is limited, using the single-ancilla IPE for our demonstrations here is a natural choice. We report the exact evolution of a(t) plotted at regular time points; this is straightforward since we use classically emulated quantum processors. On a real device, because the single-ancilla projection probability is statistical in nature, the time propagation and measurement will have to be repeated multiple times.
If more ancilla qubits are available, then this naturally extends to the standard Fourier phase estimation; for completeness, we include this in the Supplementary Materials, where we also note the use of classical Fourier analysis to extract features if the hardware is limited to a single ancilla.
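Below is a classical-emulation sketch of the energy extraction: propagate an eigenstate of a toy Hamiltonian, form the autocorrelation signal a(t), and read the energy off the slope of its unwrapped phase; a real single-ancilla run would instead sample P(+) = (1 + Re a(t))/2 repeatedly.

```python
import numpy as np

# Toy Hamiltonian, an eigenstate, and exact time propagation (a stand-in for
# the SO-QFT propagation; everything here is classical emulation)
H = np.array([[0.0, 0.3], [0.3, 1.0]])
E, V = np.linalg.eigh(H)
psi0 = V[:, 0]                              # eigenstate with energy E[0]

dt, n_steps = 0.05, 400
ts = dt * np.arange(n_steps)

def propagate(psi, t):
    """exp(-i H t) |psi> via the eigendecomposition of H."""
    return V @ (np.exp(-1j * E * t) * (V.conj().T @ psi))

# Autocorrelation a(t) = <psi(0)|psi(t)>; |a(t)| stays ~1 for an eigenstate
a = np.array([np.vdot(psi0, propagate(psi0, t)) for t in ts])

# Energy from the global phase: a(t) = exp(-i E t), so -d(phase)/dt = E
phase = np.unwrap(np.angle(a))
E_est = -np.polyfit(ts, phase, 1)[0]
print(E_est, E[0])                          # estimated vs exact energy
```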
Biological Evolution
Medical Devices
Natural Selection
Top products related to «Natural Selection»
Sourced in United States, Austria, Canada, Belgium, United Kingdom, Germany, China, Japan, Poland, Israel, Switzerland, New Zealand, Australia, Spain, Sweden
Prism 8 is a data analysis and graphing software developed by GraphPad. It is designed for researchers to visualize, analyze, and present scientific data.
Sourced in United States, Australia
SeqMan is a DNA sequence assembly software tool developed by DNASTAR. It provides a comprehensive set of features for assembling, aligning, and analyzing DNA sequence data from a variety of sources, including Sanger, Next-Generation, and long-read sequencing technologies.
Sourced in United States, Germany, United Kingdom, Israel, Canada, Austria, Belgium, Poland, Lao People's Democratic Republic, Japan, China, France, Brazil, New Zealand, Switzerland, Sweden, Australia
GraphPad Prism 5 is a data analysis and graphing software. It provides tools for data organization, statistical analysis, and visual representation of results.
Sourced in United States
EditSeq is a DNA sequence analysis software tool that allows users to view, edit, and manipulate DNA sequences. It provides a range of tools for sequence analysis, including sequence alignment, translation, and primer design.
The JXA-8600 superprobe is an electron probe microanalyzer (EPMA) designed for advanced materials analysis. It features a high-performance electron optical system and a variety of advanced detectors for comprehensive characterization of samples. The JXA-8600 superprobe provides quantitative elemental analysis capabilities with high spatial resolution and accuracy.
Sourced in United States, United Kingdom, Canada, China, Germany, Japan, Belgium, Israel, Lao People's Democratic Republic, Italy, France, Austria, Sweden, Switzerland, Ireland, Finland
Prism 6 is a data analysis and graphing software developed by GraphPad. It provides tools for curve fitting, statistical analysis, and data visualization.
Sourced in United States, Austria, Japan, Belgium, New Zealand, United Kingdom, France
R is a free, open-source software environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others.
Sourced in Germany, United States, France, United Kingdom, Switzerland
Tween 60 is a non-ionic surfactant used in various laboratory applications. It is a polyoxyethylene sorbitan monostearate compound.
Sourced in China, United States
Illumina sequencing technology is a DNA sequencing platform that utilizes a method called sequencing by synthesis. The technology enables the determination of the precise order of nucleotides within a DNA molecule, providing a comprehensive view of genetic information.
Sourced in United States
The SPME (Solid Phase Microextraction) device is a sample preparation tool used for the extraction and concentration of analytes from various matrices. It functions by utilizing a fiber coated with a selective sorbent material that can adsorb target compounds from the sample. The SPME device enables efficient sample preparation for subsequent analysis by techniques such as gas chromatography or liquid chromatography.
More about "Natural Selection"
evolutionary biology, adaptation, species, survival, reproduction, genetic information, computational modeling, experimentation, Prism 8, SeqMan, GraphPad Prism 5, EditSeq, JXA-8600 superprobe, Prism 6, R software, Tween 60, Illumina sequencing technology, SPME device, PubCompare.ai