To train our Mask R-CNN models, we followed the training strategies of Abdulla (2017). The networks were trained for at least 200 epochs (base models) or 500 epochs (optimized models) using stochastic gradient descent with a learning rate of 0.001, a momentum of 0.9, a batch size of one image, and a weight decay of 0.001 (Table S3). The number of anchors for the region proposal network (RPN) was set to 512, and the detection threshold was set to 90%. Models were initialized with COCO pre-trained weights (Lin et al., 2015). The best models were selected based on the lowest loss values on the training and validation datasets. To train U-Net, we used a learning rate of 0.00001 and a batch size of 4, and trained for 500 epochs.
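For reference, the hyperparameters reported above can be collected in a small configuration sketch. This is illustrative only: the class names, field names, and the `select_best_epoch` helper are hypothetical and are not taken from the Abdulla (2017) implementation or from our training scripts. The selection rule shown (lowest validation loss, with ties broken by training loss) is one possible reading of the selection criterion and is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskRCNNTrainConfig:
    """Reported Mask R-CNN settings (field names are hypothetical)."""
    min_epochs: int = 200                   # "at least 200 epochs" for base models
    learning_rate: float = 1e-3             # SGD learning rate
    momentum: float = 0.9                   # SGD momentum
    weight_decay: float = 1e-3              # weight decay
    batch_size: int = 1                     # one image per batch
    rpn_anchors: int = 512                  # number of anchors for the RPN
    detection_min_confidence: float = 0.9   # 90% detection threshold
    init_weights: str = "coco"              # COCO pre-trained weights


@dataclass(frozen=True)
class UNetTrainConfig:
    """Reported U-Net settings (field names are hypothetical)."""
    epochs: int = 500
    learning_rate: float = 1e-5
    batch_size: int = 4


BASE = MaskRCNNTrainConfig()
OPTIMIZED = MaskRCNNTrainConfig(min_epochs=500)  # optimized models: 500 epochs
UNET = UNetTrainConfig()


def select_best_epoch(train_loss, val_loss):
    """Return the 1-based epoch with the lowest validation loss.

    Assumption: ties in validation loss are broken by the lower training loss.
    """
    if not val_loss or len(train_loss) != len(val_loss):
        raise ValueError("loss histories must be non-empty and of equal length")
    best = min(range(len(val_loss)), key=lambda i: (val_loss[i], train_loss[i]))
    return best + 1
```

Given per-epoch loss histories, `select_best_epoch(train_loss, val_loss)` returns the epoch whose checkpoint would be kept under this assumed rule.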