For mask generation, it is necessary to determine the cut-off values for positive IF signals and remove false-positive signals due to artifacts, registration errors, or non-specific signals from blood cells.
Inconsistencies between the DAPI nuclear channel of the IF image and the hematoxylin component of the H&E-stained image, which indicate artifacts or registration errors, were detected by calculating Pearson's correlation coefficient between the two signal intensities. Patches with correlation coefficients below 0.5 were excluded from further analysis. False-positive signals derived from RBC autofluorescence were removed by masking the regions predicted as positive by the RBC segmentation neural network trained on the anti-CD235a antibody-stained dataset. Based on visual inspection, an IF signal intensity >50 (epithelium, smooth muscle, and RBCs) or >25 (all other targets) was regarded as a positive signal in the initial mask generation step.
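The patch-level quality filter and the initial thresholding can be illustrated with a minimal sketch. It assumes that the hematoxylin component has already been extracted from the H&E image (the extraction method is not specified here) and is oriented so that higher values mean stronger staining. The function names, the handling of constant patches, and the `high_threshold_class` flag are illustrative assumptions, not the authors' code.

```python
import numpy as np

def patch_correlation(dapi, hematoxylin):
    """Pearson's r between two same-shaped intensity images."""
    a = np.asarray(dapi, dtype=np.float64).ravel()
    b = np.asarray(hematoxylin, dtype=np.float64).ravel()
    if a.std() == 0 or b.std() == 0:
        # Assumption: a constant patch carries no signal, so treat it as uncorrelated.
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def keep_patch(dapi, hematoxylin, min_r=0.5):
    """Keep a patch only if the correlation is not below the 0.5 cut-off."""
    return patch_correlation(dapi, hematoxylin) >= min_r

def initial_mask(if_signal, high_threshold_class):
    """Binary positive-signal mask: >50 for epithelium, smooth muscle, and RBCs; >25 otherwise."""
    threshold = 50 if high_threshold_class else 25
    return np.asarray(if_signal) > threshold
```

In this sketch a patch is dropped before any mask is generated, so the thresholds are only applied to patches whose two nuclear signals agree.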
For the epithelium and smooth muscle, the positive signal area was used as a segmentation mask without modification. For RBCs, the area that was both positive in the IF image and red in the H&E-stained image (R > 100, G < 130, and R > B) was used as a segmentation mask. For leukocytes, myeloid cells, lymphocytes, plasma cells, and endothelial cells, positive signals from the target cells were transferred to the nuclei based on the IF staining pattern to obtain more consistent results and improve the interpretability of the segmentation model. Cellpose version 0.6.5 (ref. 19) was applied to the DAPI nuclear channel of the IF images to detect the nuclei, using the following parameters: diameter = 30, channels = [3,0], batch_size = 64, and cellprob_threshold = 0.1. A nucleus was included in the mask if more than 40% of its area contained positive signal. Finally, one iteration of morphological erosion with a 3 × 3 kernel was applied to each nuclear region to prevent adjacent cells from merging, which could cause the cell count to be underestimated.
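The RBC colour rule and the transfer of positive signal to nuclei can be sketched as follows. The sketch assumes an 8-bit RGB H&E patch, a boolean IF-positive mask, and an integer label image of nuclei (0 = background) such as the one a Cellpose run would produce; Cellpose itself is not called here. The function names are hypothetical, and reading "over 40%" as the fraction of positive pixels within each nucleus is an interpretation.

```python
import numpy as np
from scipy import ndimage

def rbc_mask(if_positive, rgb):
    """IF-positive pixels that are also red in H&E (R > 100, G < 130, R > B)."""
    r = rgb[..., 0].astype(np.int32)
    g = rgb[..., 1].astype(np.int32)
    b = rgb[..., 2].astype(np.int32)
    red = (r > 100) & (g < 130) & (r > b)
    return np.asarray(if_positive, dtype=bool) & red

def nuclear_mask(labels, if_positive, min_fraction=0.4):
    """Mask nuclei whose positive-pixel fraction exceeds min_fraction.

    Each selected nucleus is eroded on its own (one iteration, 3 x 3 kernel),
    so that touching nuclei end up separated by a background gap.
    """
    if_positive = np.asarray(if_positive, dtype=bool)
    out = np.zeros(labels.shape, dtype=bool)
    kernel = np.ones((3, 3), dtype=bool)
    for lab in np.unique(labels):
        if lab == 0:  # background
            continue
        nucleus = labels == lab
        if if_positive[nucleus].mean() > min_fraction:
            out |= ndimage.binary_erosion(nucleus, structure=kernel, iterations=1)
    return out
```

Eroding each labelled region separately, rather than eroding the merged mask once, is what guarantees a gap between two nuclei that share a border.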
For deep neural network training during the mask generation process, all patches were divided into training, validation, and test sets so that all patches from the same TMA spot belonged to the same set. The TMA spots in each TMA were detected as clusters by applying the DBSCAN clustering algorithm (ref. 40) implemented in scikit-learn to the patches, using the x and y coordinates as the input features, with the maximum distance set to 3,000 pixels and min_samples set to 5. The validation and test sets contained patches from two TMA spots in each TMA slide, and the remaining patches were placed in the training set. For deep neural network training after mask generation, the sets were reassigned at the patient level: training and validation patches from patients represented in the test set were moved to the test set, and training patches from patients represented in the validation set were moved to the validation set, so that patches from the same patient did not span more than one set.
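The spot detection and the two splitting steps can be sketched with scikit-learn's `DBSCAN`, whose `eps` parameter corresponds to the maximum distance. The sketch assumes two spots each for validation and test per slide (the text can also be read as two spots in total), picks those spots at random, and labels DBSCAN noise points as "unassigned"; all three choices, as well as the function names, are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def detect_spots(xy, eps=3000, min_samples=5):
    """Cluster patch (x, y) coordinates of one TMA slide into spots; -1 marks noise."""
    coords = np.asarray(xy, dtype=float)
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(coords)

def split_by_spot(spot_ids, n_val=2, n_test=2, seed=0):
    """Assign whole spots to train/val/test so that no spot spans two sets."""
    spot_ids = np.asarray(spot_ids)
    rng = np.random.default_rng(seed)
    spots = rng.permutation(np.unique(spot_ids[spot_ids >= 0]))
    split = np.full(len(spot_ids), "train", dtype=object)
    split[np.isin(spot_ids, spots[:n_val])] = "val"
    split[np.isin(spot_ids, spots[n_val:n_val + n_test])] = "test"
    split[spot_ids < 0] = "unassigned"  # assumption: noise patches are set aside
    return split

def reassign_by_patient(split, patient_ids):
    """Move patches so that each patient appears in exactly one set."""
    split = np.array(split, dtype=object)
    patient_ids = np.asarray(patient_ids)
    test_patients = np.unique(patient_ids[split == "test"])
    split[np.isin(patient_ids, test_patients)] = "test"
    val_patients = np.unique(patient_ids[split == "val"])
    split[np.isin(patient_ids, val_patients) & (split == "train")] = "val"
    return split
```

Handling test patients first matters: once all of their patches are in the test set, the validation step can only pull patches out of the training set, which matches the order described above.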