Model validation was performed by using the fine-tuned StarDist-3D algorithm to predict seed labels for all 3D µCT sub-volumes from the ‘validation’ dataset. The predictions were then compared with the number and shape of seeds manually counted and labelled during annotation. Accuracy of seed detection and segmentation was quantified for various levels of the threshold τ, defined as the minimum IoU between the predicted label and the ground-truth label required for a seed to count as correctly predicted. The value of τ ranged between 0, where even a very slight overlap between predicted seeds and actual seeds counted as correctly predicted, and 1, where only predicted seed labels with pixel-perfect overlap with ground-truth labels counted as correctly predicted (Weigert et al., 2020).
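As an illustration of how τ acts on a single seed, the following minimal sketch computes the IoU between a predicted and a ground-truth label and applies the threshold. It assumes, for simplicity, that each label is represented as a set of (z, y, x) voxel coordinates; the function names `iou` and `is_match` and the toy cubes are hypothetical and are not part of StarDist.

```python
def iou(pred_voxels, true_voxels):
    """Intersection over union of two labels given as sets of (z, y, x) voxels."""
    pred, true = set(pred_voxels), set(true_voxels)
    union = len(pred | true)
    return len(pred & true) / union if union else 0.0


def is_match(pred_voxels, true_voxels, tau):
    """Count a prediction as correct if it overlaps the true label with IoU >= tau."""
    overlap = iou(pred_voxels, true_voxels)
    # Require a non-zero overlap so that tau = 0 still needs at least one shared voxel.
    return overlap > 0 and overlap >= tau


# Toy example: two 2 x 2 x 2 cubes offset by one voxel along x.
true_seed = {(z, y, x) for z in range(2) for y in range(2) for x in range(2)}
pred_seed = {(z, y, x + 1) for z in range(2) for y in range(2) for x in range(2)}

overlap = iou(pred_seed, true_seed)  # 4 shared voxels out of 12 in the union
```

In this toy case the same prediction is accepted at τ = 0.3 but rejected at τ = 0.5, which is why the scores are reported across a range of τ values.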
Object detection accuracy was measured using the number of true positive results (TP), or the number of manually counted and labelled seeds that were correctly detected; the number of false negative results (FN), or the number of manually counted and labelled seeds that were missed; the number of false positive results (FP), or the number of objects other than seeds that were detected; and recall, precision and F1-score. Recall is the fraction of relevant objects that were successfully detected and was defined as:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Precision is the fraction of all detected objects that were relevant and was defined as:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
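F1-score is the harmonic mean of precision and recall, with precision and recall given equal weight. F1-score was defined as:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}$$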
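The detection metrics can be computed directly from the three counts. The sketch below assumes the standard definitions recall = TP / (TP + FN), precision = TP / (TP + FP) and F1 = 2TP / (2TP + FP + FN); the function name `detection_scores` and the example counts are hypothetical and are not results from this study.

```python
def detection_scores(tp, fp, fn):
    """Recall, precision and F1-score from true positive, false positive and false negative counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Equivalent to the harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical counts for illustration only.
scores = detection_scores(tp=90, fp=10, fn=30)
```

With these illustrative counts, 90 of 120 true seeds are found (recall 0.75) and 90 of 100 detections are seeds (precision 0.90), so the F1-score lies between the two, closer to the lower value.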
The accuracy of seed segmentation, or the accuracy of seed size and shape prediction, for the validation dataset was determined based on three scores: the mean matched score, defined as the mean IoU between the predicted and actual shape of true positive results; the mean true score, defined as the sum of the IoU between the predicted and actual shape of true positive results, normalised by the total number of ground-truth labelled seeds; and panoptic quality, as defined in Eq. 1 of Kirillov et al. (2019).
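The three segmentation scores differ only in their denominator, which the following sketch makes explicit. It assumes that the IoU of every true positive pair is already available, reads the mean true score as the summed IoU divided by the number of ground-truth seeds, and takes panoptic quality as the summed IoU divided by TP + FP/2 + FN/2, following Eq. 1 of Kirillov et al. (2019). The function name `segmentation_scores` and the example values are hypothetical.

```python
def segmentation_scores(matched_ious, n_true, n_pred):
    """Mean matched score, mean true score and panoptic quality.

    matched_ious: IoU of each true positive (matched) prediction
    n_true:       total number of ground-truth seeds
    n_pred:       total number of predicted seeds
    """
    tp = len(matched_ious)
    fn = n_true - tp  # ground-truth seeds without a match
    fp = n_pred - tp  # predictions without a match
    iou_sum = sum(matched_ious)

    mean_matched = iou_sum / tp if tp else 0.0
    mean_true = iou_sum / n_true if n_true else 0.0
    pq_denominator = tp + 0.5 * fp + 0.5 * fn
    panoptic_quality = iou_sum / pq_denominator if pq_denominator else 0.0
    return {
        "mean_matched_score": mean_matched,
        "mean_true_score": mean_true,
        "panoptic_quality": panoptic_quality,
    }


# Hypothetical example: 3 matched seeds, 4 ground-truth seeds, 5 predictions.
seg = segmentation_scores([0.9, 0.8, 0.7], n_true=4, n_pred=5)
```

Because the mean matched score ignores missed seeds and spurious detections, it is never lower than the mean true score, while panoptic quality penalises both false negatives and false positives.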
StarDist-3D models allow for specification of two values, the τ-threshold and the nms-threshold, to optimize model output (Schmidt et al., 2018; Weigert et al., 2020). The τ-threshold refers to the minimum intersection-over-union between pairs of predicted and ground-truthed seeds required for detections to be classified as true positives. It can be set at 0.1 interval levels between 0.1 and 1, with 0.1 indicating a 10% overlap between the pixels within the predicted shape of a seed and the ground-truthed label, and 1 representing a 100% overlap (Schmidt et al., 2018; Weigert et al., 2020). The nms-threshold refers to the level of non-maximum suppression applied to the results of object detection and instance segmentation to prune the number of predicted star-convex polyhedra, ideally retaining a single predicted shape for each true object, in this case each seed, within an image. The nms-threshold can be set at 0.1 interval levels between 0 and 1, with higher levels indicating more aggressive pruning of predicted shapes, which leads to fewer detections in the final model output. A higher nms-threshold is therefore valuable in cases where the number of false positives expected in unfiltered model predictions is high. Both the τ-threshold and the nms-threshold for the fine-tuned StarDist-3D algorithm were set to optimal values based on the ‘validation’ dataset using the ‘optimize_thresholds’ function of StarDist (Schmidt et al., 2018).