Statistics and Machine Learning Toolbox

Manufactured by MathWorks

Sourced in United States

The Statistics and Machine Learning Toolbox provides a comprehensive set of tools for statistical analysis, machine learning, and data mining. It includes functions for data preprocessing, feature selection, model training, and evaluation. The toolbox supports a wide range of statistical and machine learning techniques, including regression, classification, clustering, and dimensionality reduction.

52 protocols using the Statistics and Machine Learning Toolbox

Statistical Analysis of Direction Recognition Index

To determine which changes in the value of the direction recognition index between particular measurement situations should be considered significant, we performed a statistical analysis of the obtained data. For this purpose, the Wilcoxon rank-sum test (equivalent to the Mann-Whitney U test) was used. The calculations were performed using MATLAB R2017b (version 9.3) with the Statistics and Machine Learning Toolbox (MathWorks Inc., Natick, MA, USA).
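A minimal sketch of the two-sided rank-sum test is given below, written in Python for illustration rather than the MATLAB used by the authors (the toolbox's own function for this test is `ranksum`). It assumes two independent samples and uses the normal approximation with a tie correction; for very small samples an exact p-value would be preferable. The function name `rank_sum_test` is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test for two independent
    samples, using the normal approximation with a correction for ties."""
    # Pool both samples, remembering which group (0 = x, 1 = y) each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the mean of their 1-based ranks
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    u1 = r1 - n1 * (n1 + 1) / 2  # Mann-Whitney U for x
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))
    return u1, z, p
```

For example, `rank_sum_test(index_a, index_b)` would compare index values from two measurement situations (the variable names are hypothetical) and return U, the z score, and the two-sided p-value.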

Mlynski R, & Kozlowski E. (2019). Localization of Vehicle Back-Up Alarms by Users of Level-Dependent Hearing Protectors under Industrial Noise Conditions Generated at a Forge. International Journal of Environmental Research and Public Health, 16(3), 394.

Ensemble of Decision Trees for Stimulus Labeling

We trained an ensemble of 100 decision trees using the fitcensemble function from the Statistics and Machine Learning Toolbox from MathWorks. Decision tree models are simple, highly interpretable, and can be displayed graphically (James et al., 2013). Consequently, the decision process of the classifier can be easily interrogated. However, decision tree models have two key disadvantages: (1) mediocre prediction performance (Caruana and Niculescu-Mizil, 2006) and (2) high variance due to overfitting (James et al., 2013). Both disadvantages can be mitigated by aggregating an ensemble of decision trees. Empirical comparison of classification models shows that ensembles of decision trees outperform other classification algorithms across a variety of problem sets (Caruana and Niculescu-Mizil, 2006).
We used a bootstrap aggregation (bagging) method to construct the ensembles. Each tree in the ensemble is trained on a bootstrapped replica of the data, that is, a random selection of the data drawn with replacement. The prediction of the ensemble model is determined by a majority vote over the individual tree predictions. We trained the ensemble to learn the stimulus labels (TNF, Pam3CSK4, CpG, LPS, and poly(I:C)) from either the entire set of predictors (all 918 metrics, Table S6A) or a subset of predictors termed "signaling codons" (Table S6B).
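The bagging-and-voting scheme described above can be sketched in plain Python. This is an illustration under simplifying assumptions, not the authors' `fitcensemble` configuration: each learner here is a one-split decision stump rather than a full decision tree, and the helper names (`fit_stump`, `fit_bagged_ensemble`, `predict`) and toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split tree: pick the feature/threshold with the fewest
    misclassifications when each side predicts its majority label."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue  # a split must leave data on both sides
            l_lab, l_hits = Counter(left).most_common(1)[0]
            r_lab, r_hits = Counter(right).most_common(1)[0]
            errors = len(y) - l_hits - r_hits
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority label
        lab = Counter(y).most_common(1)[0][0]
        return lambda row: lab
    _, f, thr, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= thr else r_lab


def fit_bagged_ensemble(X, y, n_trees=100, seed=0):
    """Train each learner on a bootstrap replica (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict(models, row):
    """Majority vote over the individual learners' predictions."""
    return Counter(m(row) for m in models).most_common(1)[0][0]
```

With real data, each row of `X` would hold the predictor values for one cell and `y` the stimulus label; separating five stimulus classes would require deeper trees than these stumps.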

Adelaja A., Taylor B., Sheu K.M., Liu Y., Luecke S, & Hoffmann A. (2021). Six distinct NFκB signaling codons convey discrete information to distinguish stimuli and enable appropriate macrophage responses. Immunity, 54(5), 916-930.e7.

Cognitive Test Reliability and Validity

Within the manuscript, convergent validity and test-retest reliability of the CGN_ICA test are assessed with Pearson's correlation. P-values for Pearson's correlation are based on Student's t distribution. Calculations were performed using MathWorks' Statistics and Machine Learning Toolbox (https://www.mathworks.com/help/stats/index.html).
To measure the dependence of the cognitive tests on level of education, we used explained variance, defined as the square of Pearson's correlation between participants' cognitive scores and their level of education (i.e., number of years of education). Here, statistical significance was obtained by a permutation test (10,000 permutations of participants). To formally assess statistical independence, we used a non-parametric independence test proposed by Gretton and Gyorfi⁶², based on 10,000 bootstrap resamples of participants.
Finally, we used a single-factor (one-way) analysis of variance (ANOVA) to compare average CGN_ICA scores of participants who had taken the CGN_ICA test every other day for two weeks. The goal was to determine whether the mean CGN_ICA scores differed significantly on any of the test days.
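The permutation test for explained variance can be sketched as follows: shuffling education years across participants breaks any real association, so the fraction of shuffles whose r² is at least the observed r² estimates the p-value. This is a hypothetical plain-Python illustration (function names, the seed, and the add-one correction are assumptions), not the authors' code.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def explained_variance_perm_test(scores, years, n_perm=10_000, seed=0):
    """Return (observed r^2, permutation p-value) for scores vs. education years."""
    rng = random.Random(seed)
    observed = pearson_r(scores, years) ** 2
    shuffled = list(years)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute participants' education values
        if pearson_r(scores, shuffled) ** 2 >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (exceed + 1) / (n_perm + 1)
```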
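The F statistic behind a single-factor ANOVA can be sketched as below, treating each test day as one group of scores. The function name and toy numbers are hypothetical; the p-value would then come from the upper tail of the F distribution with `(df_between, df_within)` degrees of freedom (for example via `scipy.stats.f.sf`), which is omitted here to keep the sketch dependency-free.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA.
    `groups` is a list of lists, one list of scores per level (e.g. per day)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```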

Khaligh-Razavi S.M., Habibi S., Sadeghi M., Marefat H., Khanbagi M., Nabavi S.M., Sadeghi E, & Kalafatis C. (2019). Integrated Cognitive Assessment: Speed and Accuracy of Visual Processing as a Reliable Proxy to Cognitive Performance. Scientific Reports, 9, 1102.

Quantifying Thermal Ablation Outcomes

All results were statistically analyzed using MATLAB 2016a and its Statistics and Machine Learning Toolbox (MathWorks, Natick, MA, USA). Linear regression was used to assess the relationship between predicted IRE areas and ablated areas obtained by H&E staining. The slope and coefficient of determination (R²) were calculated for different probabilities of cell death. Bland-Altman analysis [52] was applied to assess the agreement between predicted IRE areas and ablated areas, with limits of agreement defined as the mean difference between predicted and ablated areas ±1.96 standard deviations of the differences (95% limits of agreement). A p value of less than 0.05 was considered statistically significant. The number of animals used for experiments was determined using Pearson correlation guidelines [53].
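The Bland-Altman limits of agreement reduce to a mean and a standard deviation of paired differences. A minimal sketch follows; the function name is hypothetical and the inputs are assumed to be paired area measurements in the same units.

```python
import statistics


def bland_altman(predicted, measured):
    """Return (bias, lower limit, upper limit) of agreement between two methods."""
    diffs = [p - m for p, m in zip(predicted, measured)]
    bias = statistics.mean(diffs)  # mean difference (systematic bias)
    sd = statistics.stdev(diffs)   # sample SD of the differences (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

About 95% of differences are expected to fall between the two limits if the differences are roughly normally distributed.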

Kranjc M., Kranjc S., Bajd F., Serša G., Serša I, & Miklavčič D. (2017). Predicting irreversible electroporation-induced tissue damage by means of magnetic resonance electrical impedance tomography. Scientific Reports, 7, 10323.

Machine Learning Algorithms for Segmented Audio Classification

Three machine learning algorithms (support vector machine, k-nearest neighbors, and decision tree) were trained and tested to classify 30-s segments into one of two class labels (Good, Bad). These algorithms were chosen for their robustness and generalization power in high-dimensional classification problems [16], [26]–[29]. All algorithms were implemented in MATLAB 2017a using the Statistics and Machine Learning Toolbox (MathWorks Inc., USA).
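Of the three classifiers, k-nearest neighbors is the simplest to show in a few lines. The sketch assumes each 30-s segment has already been reduced to a numeric feature vector; the features, `k = 3`, Euclidean distance, and the function name are illustrative assumptions, not values reported in the protocol.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label a segment by majority vote among its k nearest training segments."""
    # Euclidean distance from the query feature vector to every training vector.
    dists = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```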

Pereira T., Gadhoumi K., Ma M., Liu X., Xiao R., Colorado R.A., Keenan K.J., Meisel K, & Hu X. (2019). A Supervised Approach to Robust Photoplethysmography Quality Assessment. IEEE Journal of Biomedical and Health Informatics, 24(3), 649-657.

Analyzing Visual Cortex BOLD Responses

To analyze the overall slope within V1, we conducted ROI analyses using MATLAB 2019a with the Statistics and Machine Learning Toolbox (MathWorks Inc.). Vertices were included in the ROI if they satisfied two conditions: (1) the offset contrast was significant, and (2) they lay within V1 as defined by vcAtlas for each participant. We averaged the effect size of the slope across the ROI for each participant (S2 File). The averaged effect sizes of the slope and the offset within the ROI were tested for a difference from zero across participants using a one-sample t-test. The significance level was set at P < 0.05.
To investigate individual differences in the V1 BOLD response to visual stimulus frequency, we computed Pearson's product-moment correlation coefficients between the effect size of the slope or the offset within the ROI and the individual characteristics of age, MMSE, and UFOV performance, and tested them for significance. The significance level was set at P < 0.05 for the correlation test.
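Both tests in this section reduce to t statistics. The sketch below computes the one-sample t statistic against zero and the t statistic for a Pearson correlation, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom; converting either to a p-value requires the Student's t distribution, which is not included. Function names are hypothetical, and this is a plain-Python illustration rather than the MATLAB code used in the study.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(values) - mu0) / se, n - 1


def pearson_t(x, y):
    """Pearson r, its t statistic, and degrees of freedom (n - 2)."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r, r * math.sqrt(n - 2) / math.sqrt(1 - r * r), n - 2
```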

Uchiyama Y., Sakai H., Ando T., Tachibana A, & Sadato N. (2021). BOLD signal response in primary visual cortex to flickering checkerboard increases with stimulus temporal frequency in older adults. PLoS ONE, 16(11), e0259243.

Noise Exposure Analysis of Shooter Exercises

Statistical analysis of the measurement data was carried out to assess whether noise exposure differed significantly between the two variants of the number of shooters simultaneously participating in the exercises, and to assess the effect of distance on the values of the noise parameters. Data were analyzed using MATLAB R2017b (version 9.3) with the Statistics and Machine Learning Toolbox (MathWorks Inc., Natick, MA, USA). The analysis used the Wilcoxon rank-sum test, which is equivalent to the Mann-Whitney U test.

Mlynski R, & Kozlowski E. (2019). Selection of Level-Dependent Hearing Protectors for Use in An Indoor Shooting Range. International Journal of Environmental Research and Public Health, 16(13), 2266.

Protein Sequence Network Analysis

The distributions of the number of direct neighbours for each sequence (degree, n) and of the number of sequences forming connected networks (cluster size, s) were derived from protein sequence networks at different thresholds of pairwise sequence identity. The number of nodes N(n) having a degree of n was fitted by a power law N(n) ~ n^(−γ), and the scaling exponent γ was derived from a log-log plot [64]. The number of clusters N(s) with size s was fitted by a power law N(s) ~ s^(−τ), and the Fisher exponent τ was likewise derived from a log-log plot [65]. Logarithmic histograms of the cluster sizes s were obtained for successive intervals (2 ≤ s ≤ 10, 11 ≤ s ≤ 100, 101 ≤ s ≤ 1000, and 1001 ≤ s ≤ 10,000). The slopes τ_h of these histograms were determined for sequence networks at different thresholds of sequence identity. The Fisher exponent τ was derived by fitting τ_h against model distributions as described previously [65].
The distributions of degrees and cluster sizes were analysed by linear fitting using the fitlm function from the Statistics and Machine Learning Toolbox (version 11.7) in MATLAB (version R2020a, The MathWorks, Natick, MA, USA).
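The exponent estimate amounts to a straight-line fit in log-log space: if N(s) ~ s^(−τ), then log N(s) = c − τ log s, so τ is the negative of the fitted slope. A plain-Python sketch of that ordinary least-squares fit (the kind of linear model `fitlm` fits in the protocol) follows; the function name is hypothetical, and points with zero counts are skipped because their logarithm is undefined.

```python
import math


def fit_power_law_exponent(sizes, counts):
    """Fit log10 N(s) = c - tau * log10 s by least squares; return (tau, R^2)."""
    pts = [(math.log10(s), math.log10(c)) for s, c in zip(sizes, counts) if c > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Coefficient of determination of the log-log fit.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pts)
    ss_tot = sum((y - my) ** 2 for _, y in pts)
    return -slope, 1 - ss_res / ss_tot
```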

Orlando M., Buchholz P.C., Lotti M, & Pleiss J. (2021). The GH19 Engineering Database: Sequence diversity, substrate scope, and evolution in glycoside hydrolase family 19. PLoS ONE, 16(10), e0256817.

Mass Spectrometry Data Analysis Protocol

For mass spectral data acquisition and mass spectra comparison, the MSD ChemStation Software (Agilent Technologies) was used. Data export and Savitzky-Golay smoothing of the IMS data were performed with the LAV Software version 2.2.1 (Gesellschaft für Analytische Sensorsysteme mbH). Further pre-processing of the IMS and MS data, model building, validation, and calculation of figures of merit were implemented as custom MATLAB routines and carried out in MATLAB (The MathWorks Inc., Natick, MA, USA) using the Statistics and Machine Learning Toolbox (MathWorks).
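Savitzky-Golay smoothing replaces each point with a local polynomial fit, which for a fixed window is a fixed convolution. The sketch below uses the standard 5-point quadratic coefficients (−3, 12, 17, 12, −3)/35; the window length, polynomial order, and the choice to leave the two edge points on each side unsmoothed are assumptions, since the LAV software settings are not stated.

```python
def savgol_5pt_quadratic(signal):
    """5-point, 2nd-order Savitzky-Golay smoothing; edge points are left unchanged."""
    coeffs = (-3, 12, 17, 12, -3)  # convolution weights, normalised by 35
    out = list(signal)
    for i in range(2, len(signal) - 2):
        # Weighted sum over the window signal[i-2 .. i+2].
        out[i] = sum(c * signal[i + j - 2] for j, c in enumerate(coeffs)) / 35
    return out
```

Because the weights reproduce polynomials up to second order exactly, a smooth quadratic baseline passes through unchanged while a single-point spike is spread out and attenuated.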

Brendel R., Schwolow S., Rohn S, & Weller P. (2020). Gas-phase volatilomic approaches for quality control of brewing hops based on simultaneous GC-MS-IMS and machine learning. Analytical and Bioanalytical Chemistry, 412(26), 7085-7097.

Optimal Model Selection Decision Boundary

To generate a decision boundary for the optimal choice of either AIC_GMM or BIC_RSS based on the properties of the time series, we used MATLAB’s fitcsvm function from the Statistics and Machine Learning Toolbox (The MathWorks). For each data series from each tested model, we computed a preference score for AIC_GMM vs. BIC_RSS based on their individual F1 scores (Eq. 17).

$$\text{Preference for AIC}_{\text{GMM}} = \frac{F1_{\text{GMM}}}{F1_{\text{GMM}} + F1_{\text{RSS}}} \tag{17}$$

For each model, these scores were binned in two dimensions according to the effective SNR and log₁₀(number of samples) of their corresponding series. Each bin was labeled as either “AIC_GMM optimal” or “BIC_RSS optimal” according to its average preference score (AIC_GMM optimal for preference scores ≥0.5). These bin labels were used to train a support vector machine (SVM) classifier to determine a linear decision boundary in the two-dimensional space of effective SNR and number of samples.
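The preference score of Eq. 17 and the per-bin labeling can be sketched as follows; the subsequent linear SVM fit (`fitcsvm` in the protocol) is not reproduced. The bin edges, tuple layout, and function names are hypothetical, and values outside the supplied edges simply receive out-of-range bin indices.

```python
import bisect
import math
from collections import defaultdict


def preference_for_aic_gmm(f1_gmm, f1_rss):
    """Eq. 17: F1_GMM / (F1_GMM + F1_RSS)."""
    return f1_gmm / (f1_gmm + f1_rss)


def bin_index(value, edges):
    """Index i such that edges[i] <= value < edges[i + 1]."""
    return bisect.bisect_right(edges, value) - 1


def label_bins(series, snr_edges, log_n_edges):
    """series: iterable of (effective_snr, n_samples, f1_gmm, f1_rss).
    Label each 2-D bin by the mean preference score of the series it contains."""
    scores = defaultdict(list)
    for snr, n_samples, f1_gmm, f1_rss in series:
        key = (bin_index(snr, snr_edges), bin_index(math.log10(n_samples), log_n_edges))
        scores[key].append(preference_for_aic_gmm(f1_gmm, f1_rss))
    return {
        key: "AIC_GMM optimal" if sum(v) / len(v) >= 0.5 else "BIC_RSS optimal"
        for key, v in scores.items()
    }
```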

Bandyopadhyay A, & Goldschen-Ohm M.P. (2021). Unsupervised selection of optimal single-molecule time series idealization criterion. Biophysical Journal, 120(20), 4472-4483.
