The selection of particles that give rise to 2D class average images with recognizable protein features is often used to discard suboptimal particles from cryo-EM data sets. In previous releases of RELION, suitable 2D classes were selected interactively. RELION-4.0 contains a new program called relion_class_ranker that automates 2D class selection. This program predicts a score for each class by combining the output of a convolutional neural network, which acts on the 2D class average images, with a vector of 18 features computed for each class (Figure 1A,B).
The convolutional neural network takes as input individual 2D class average images, cropped to the area defined by the circular mask used in the 2D classification and rescaled to 64 × 64 pixels. The feature vector is calculated for each class from RELION’s metadata of the 2D classification job, including the estimated accuracies of the rotational and translational alignments, the estimated resolution (1/d, in 1/Å) and a so-called weighted resolution, which is calculated as d²/ln N, where d is the estimated resolution in Å and N is the number of particles assigned to the class. It also contains features that are calculated from the 2D class average images themselves, in particular the first to fourth moments of the density values inside an automatically determined mask for the protein region, inside the solvent region, and inside a ring around the outer diameter of the mask that has been applied to the 2D class average images. The output of the convolutional neural network is combined with the feature vector and passed through two fully connected layers, with non-linear (ReLU) activation between the layers, to predict a single floating-point score for each 2D class.
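To make the weighted resolution and the per-region image statistics more concrete, here is a minimal standard-library Python sketch. It assumes that the "first to fourth moments" are the mean, standard deviation, skewness and (non-excess) kurtosis; RELION may use a different moment definition. The helper names `weighted_resolution`, `masked_moments` and `ring_mask` and any ring radii are hypothetical, and the sketch does not reproduce how RELION automatically determines the protein mask.

```python
import math
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]
Mask = Sequence[Sequence[bool]]


def weighted_resolution(d_angstrom: float, n_particles: int) -> float:
    """Weighted resolution d^2 / ln(N) for a class with resolution d (in Å) and N particles."""
    if n_particles < 2:
        raise ValueError("ln(N) must be positive, so N must be at least 2")
    return d_angstrom ** 2 / math.log(n_particles)


def ring_mask(size: int, r_inner: float, r_outer: float) -> List[List[bool]]:
    """Hypothetical helper: True for pixels whose distance from the image centre is in [r_inner, r_outer)."""
    c = (size - 1) / 2.0
    return [
        [r_inner <= math.hypot(x - c, y - c) < r_outer for x in range(size)]
        for y in range(size)
    ]


def masked_moments(image: Image, mask: Mask) -> Tuple[float, float, float, float]:
    """Mean, standard deviation, skewness and non-excess kurtosis of the pixels selected by mask."""
    values = [v for row_v, row_m in zip(image, mask) for v, m in zip(row_v, row_m) if m]
    n = len(values)
    if n == 0:
        raise ValueError("mask selects no pixels")
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        # Constant region: standardized higher moments are undefined, so report zeros.
        return mean, 0.0, 0.0, 0.0
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    return mean, std, skew, kurt
```

In this sketch the same `masked_moments` call would be applied three times per class, once each with a protein, solvent and ring mask, giving twelve of the image-derived features.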
The network in the relion_class_ranker program was trained on 18 051 2D class average images from 233 RELION 2D classification jobs that were performed at the MRC-LMB over a period of approximately 4 years. Each job was assigned a job score ranging from zero to one, and within each job the class average images were manually divided into four categories according to their quality. For each 2D class, a target class score ranging from zero to one was calculated from the combination of its job score, its assigned category, and its estimated resolution relative to the best resolution in its 2D classification job. The target scores were intended to represent a ranking over all classes in the training set, with a score of one representing the best classes from the best 2D classification jobs and a score of zero representing the worst classes. The network was implemented in PyTorch [32] and optimized with the Adam optimizer [27] for 200 epochs, using the mean-squared error between predicted and assigned scores as the loss. All 18 051 class average images, together with their metadata from the 2D classification jobs and their assigned class scores, are publicly available through the EMPIAR database (entry ID 10812). The code used to optimize and execute the neural network is available from the RELION GitHub pages.
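The architecture and training setup described above can be sketched in PyTorch roughly as follows. This is a minimal illustration under stated assumptions, not the relion_class_ranker code: the convolutional layer sizes, the hidden width, the learning rate and the class name `ClassRankerSketch` are all hypothetical, and the toy run trains on random tensors as a single full batch rather than on the EMPIAR entry 10812 data.

```python
import torch
from torch import nn


class ClassRankerSketch(nn.Module):
    """Hypothetical two-branch scorer: CNN on a 64x64 class average plus an 18-element feature vector."""

    def __init__(self, n_features: int = 18, hidden: int = 64):
        super().__init__()
        # Image branch (layer sizes are assumptions): 1x64x64 -> 32-dim embedding.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 8x32x32
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 16x16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                                                 # -> 32x1x1
            nn.Flatten(),                                                            # -> 32
        )
        # Two fully connected layers with a ReLU between them, ending in one score.
        self.head = nn.Sequential(
            nn.Linear(32 + n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, images: torch.Tensor, features: torch.Tensor) -> torch.Tensor:
        # images: (N, 1, 64, 64); features: (N, n_features); returns scores of shape (N,).
        combined = torch.cat([self.cnn(images), features], dim=1)
        return self.head(combined).squeeze(1)


def train(model, images, features, targets, epochs=200, lr=1e-3):
    """Full-batch training with Adam on the mean-squared error; returns the loss per epoch."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(images, features), targets)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


if __name__ == "__main__":
    torch.manual_seed(0)
    images = torch.randn(16, 1, 64, 64)  # random stand-ins for cropped, rescaled class averages
    features = torch.randn(16, 18)       # random stand-ins for the 18-element feature vectors
    targets = torch.rand(16)             # random target scores in [0, 1)
    losses = train(ClassRankerSketch(), images, features, targets)
    print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

The output layer is left linear because the text only specifies a single floating-point score; adding a sigmoid or clamping to keep predictions within [0, 1] would be a separate design choice.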