A StarDist-3D model with a U-Net backbone (Çiçek et al., 2016) was trained to detect and segment individual B. napus seeds in 3D µCT sub-volumes from the labelled ‘training’ dataset using the pipeline described by Weigert et al. (2020). Model training was performed using a Google Colab runtime with 25.46 GB of memory and a single GPU (Bisong, 2019). The StarDist-3D model was configured to use 96 Fibonacci rays for shape reconstruction and to take into account the mean empirical anisotropy of all labelled seeds in the dataset along each axis, calculated using the method described by Weigert et al. (2020) (X-axis anisotropy = 1.103448275862069, Y-axis anisotropy = 1.032258064516129, Z-axis anisotropy = 1.0). The training patch size, i.e. the size of the tiled portion of each 3D sub-volume in the ‘training’ dataset within view of the neural network at any one time, was set to Z = 24, X = 96 and Y = 96, and the training batch size was set to 2. Training ran for 400 epochs with 100 steps per epoch and took 1.36 hours to complete (123 ms/step).
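As an illustration of what "96 Fibonacci rays" means, the following minimal sketch distributes ray directions over the unit sphere using a common golden-spiral (Fibonacci lattice) construction. The function name `fibonacci_rays` is hypothetical, the sketch ignores anisotropy, and it is not claimed to reproduce StarDist's internal implementation.

```python
import math


def fibonacci_rays(n=96):
    """Return n unit vectors (z, y, x) spread over the sphere on a golden-spiral lattice.

    Illustrative only: a generic Fibonacci-lattice construction, assuming n >= 2.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 2.39996 rad
    rays = []
    for i in range(n):
        # z runs evenly from -1 (south pole) to +1 (north pole)
        z = -1.0 + 2.0 * i / (n - 1)
        # radius of the circle of latitude at height z
        rho = math.sqrt(max(0.0, 1.0 - z * z))
        # each successive ray is rotated by the golden angle
        phi = golden_angle * i
        rays.append((z, rho * math.sin(phi), rho * math.cos(phi)))
    return rays


rays = fibonacci_rays(96)
```

Each seed is then described by one distance per ray, measured from a point inside the seed to its boundary, so more rays allow a finer approximation of the seed surface.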
Model validation was then performed by using the fine-tuned StarDist-3D algorithm to predict seed labels for all 3D µCT sub-volumes from the ‘validation’ dataset. The predictions were compared to the number and shape of seeds manually counted and labelled during annotation. Accuracy of seed detection and segmentation was then quantified at various levels of the threshold τ, the minimum intersection-over-union (IoU) between the predicted label and the ground-truth label required for a seed to count as correctly predicted. The value of τ ranged between 0, where even a very slight overlap between predicted seeds and actual seeds counted as correctly predicted, and 1, where only predicted seed labels with pixel-perfect overlap with ground-truth labels counted as correctly predicted (Weigert et al., 2020).
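The role of τ can be sketched as follows, assuming each seed label is represented as a set of voxel coordinates. The helper names `iou` and `is_true_positive` are hypothetical, and treating τ = 0 as "any non-zero overlap" is an assumption based on the description above rather than on the StarDist source code.

```python
def iou(pred_voxels, true_voxels):
    """Intersection-over-union of two labels given as collections of (z, y, x) voxels."""
    pred, true = set(pred_voxels), set(true_voxels)
    union = pred | true
    if not union:
        return 0.0
    return len(pred & true) / len(union)


def is_true_positive(pred_voxels, true_voxels, tau):
    """A prediction counts as correct if it overlaps the seed at all and its IoU is at least tau."""
    score = iou(pred_voxels, true_voxels)
    return score > 0.0 and score >= tau


# Toy example: two 3-voxel labels sharing 2 voxels (union of 4 voxels)
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
ground_truth = {(0, 0, 0), (0, 0, 1), (1, 0, 0)}
score = iou(predicted, ground_truth)
```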
Object detection accuracy was measured using the number of true positive results (TP), i.e. the number of manually counted and labelled seeds that were correctly detected; the number of false negative results (FN), i.e. the number of manually counted and labelled seeds that were missed; the number of false positive results (FP), i.e. the number of detected objects that were not seeds; and recall, precision and F1-score. Recall is the fraction of relevant objects that were successfully detected and was defined as:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Precision is the fraction of all detected objects that were relevant and was defined as:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
F1-score is the harmonic mean of precision and recall, with precision and recall given equal weight. F1-score was defined as:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
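The three detection metrics follow directly from the TP, FP and FN counts. The sketch below uses a hypothetical helper, `detection_scores`, and invented example counts. Returning 0 when a denominator is zero is a convention chosen here for illustration.

```python
def detection_scores(tp, fp, fn):
    """Return (precision, recall, f1) computed from detection counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # harmonic mean of precision and recall
    if precision + recall:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts: 8 seeds found, 2 spurious detections, 2 seeds missed
precision, recall, f1 = detection_scores(tp=8, fp=2, fn=2)
```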
The accuracy of seed segmentation, i.e. the accuracy of seed size and shape prediction, for the ‘validation’ dataset was determined using three measures: the mean matched score, defined as the mean IoU between the predicted and actual shape of true positive results; the mean true score, defined as the IoU between the predicted and actual shape of true positive results, summed and normalised by the total number of ground-truth labelled seeds; and panoptic quality, as defined in Eq. 1 of Kirillov et al. (2019).
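Given the IoU of each true positive together with the FP and FN counts, the three segmentation measures can be sketched as below. The function name `segmentation_scores` and the example values are hypothetical. Panoptic quality follows Eq. 1 of Kirillov et al. (2019), i.e. the summed IoU of true positives divided by TP + FP/2 + FN/2.

```python
def segmentation_scores(matched_ious, fp, fn):
    """Return (mean_matched_score, mean_true_score, panoptic_quality).

    matched_ious: IoU of each true positive (one value per correctly detected seed).
    """
    tp = len(matched_ious)
    total_iou = sum(matched_ious)
    n_true = tp + fn  # every ground-truth seed is either detected or missed
    mean_matched = total_iou / tp if tp else 0.0
    mean_true = total_iou / n_true if n_true else 0.0
    pq_denominator = tp + 0.5 * fp + 0.5 * fn
    panoptic_quality = total_iou / pq_denominator if pq_denominator else 0.0
    return mean_matched, mean_true, panoptic_quality


# Hypothetical example: three matched seeds, one spurious detection, one missed seed
scores = segmentation_scores([0.9, 0.8, 0.7], fp=1, fn=1)
```

Because the mean true score divides by all ground-truth seeds, it penalises missed seeds, whereas the mean matched score reflects only the shape quality of the seeds that were found.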
StarDist-3D models allow two values, the τ-threshold and the nms-threshold, to be specified to optimize model output (Schmidt et al., 2018; Weigert et al., 2020). The τ-threshold is the minimum intersection-over-union between pairs of predicted and ground-truth seeds required for detections to be classified as true positives. It can be set in increments of 0.1 between 0.1 and 1, with 0.1 indicating a 10% overlap between the pixels within the predicted shape of a seed and those of the ground-truth label, and 1 representing a 100% overlap (Schmidt et al., 2018; Weigert et al., 2020). The nms-threshold is the level of non-maximum suppression applied to the results of object detection and instance segmentation to prune the predicted star-convex polyhedra so that, ideally, a single predicted shape is retained for each true object (here, each seed) within an image. The nms-threshold can be set in increments of 0.1 between 0 and 1, with higher levels indicating more aggressive pruning of predicted shapes and therefore fewer detections in the final model output. A higher nms-threshold is thus valuable where the number of false positives expected in unfiltered model predictions is high. Both the τ-threshold and the nms-threshold for the fine-tuned StarDist-3D algorithm were set to optimal values based on the ‘validation’ dataset using the ‘optimize_thresholds’ function of StarDist (Schmidt et al., 2018).
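The idea of choosing a threshold on the ‘validation’ dataset can be illustrated with a simple search that keeps the candidate value giving the highest F1-score. This is a generic sketch under stated assumptions, not the implementation of StarDist's ‘optimize_thresholds’ function. The names `f1_from_counts` and `best_threshold`, and all counts shown, are hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """F1-score written directly in terms of counts (equivalent to the harmonic-mean form)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(counts_by_threshold):
    """Return the candidate threshold whose (tp, fp, fn) counts give the highest F1-score."""
    return max(counts_by_threshold,
               key=lambda t: f1_from_counts(*counts_by_threshold[t]))


# Hypothetical validation results: threshold -> (TP, FP, FN)
validation_counts = {
    0.3: (8, 4, 2),
    0.5: (9, 1, 1),
    0.7: (6, 0, 4),
}
chosen = best_threshold(validation_counts)
```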