Information on lymph node status, stage and tumour size was available from original histopathology reports for all studies. Expert breast cancer pathologists reviewed FFPE sections stained with haematoxylin and eosin (H&E) from tumours with available material and scored histological tumour type, grade, tumour cellularity and lymphocytic infiltration.
Immunohistochemistry-based (IHC) scoring of ER status was used, where available, to classify tumours as ER-positive (ER+) or ER-negative (ER−). To confirm this classification for samples with gene expression data, we fitted a two-component Gaussian mixture model to the expression levels of ESR1 using the mixtools package59 in R, and computed the probability of each sample belonging to each of the two component distributions. The component yielding the higher probability was taken as the expression-derived ER status of the sample. Where the IHC and expression-derived calls differed, we used the expression-derived classification only if the probability of belonging to the opposite distribution was at least five times higher than the probability of belonging to the distribution indicated by IHC; this scheme was chosen to give more weight to the IHC classification, which is currently the clinical gold standard. We performed a similar analysis with ERBB2 expression levels to corroborate the IHC-based HER2 calls. For patients without expression data (n = 416), we used the IHC scores to assign ER and HER2 status. Similarly, gene expression-based classification was used for samples without IHC data.
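The reconciliation rule described above can be illustrated with a minimal sketch. It is not the code used in the study, which relied on mixtools in R. Here a small expectation-maximisation routine in plain Python stands in for the mixture fit, the expression values are simulated, and all function names are hypothetical. The sketch also assumes that the "probabilities" being compared are posterior component probabilities, which the text does not state explicitly.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns (weights, means, sds), ordered so that component 1 has the
    higher mean (interpreted here as the receptor-positive component).
    """
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise the means from the lower and upper halves of the data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    grand = sum(xs) / len(xs)
    sd0 = math.sqrt(sum((x - grand) ** 2 for x in xs) / len(xs)) or 1.0
    sds = [sd0, sd0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each value.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
            weights[k] = nk / len(values)
    if means[0] > means[1]:  # keep component 1 as the high-expression one
        weights.reverse()
        means.reverse()
        sds.reverse()
    return weights, means, sds


def posterior_positive(x, weights, means, sds):
    """Posterior probability that x belongs to the high-expression component."""
    p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
    total = sum(p)
    return p[1] / total if total > 0 else 0.5


def reconcile_status(ihc_positive, prob_positive, ratio=5.0):
    """Combine an IHC call (True/False/None) with an expression-based probability.

    The expression call overrides IHC only when the opposite component is at
    least `ratio` times more probable than the component indicated by IHC.
    """
    if prob_positive is None:  # no expression data: fall back to IHC
        return ihc_positive
    expr_positive = prob_positive > 0.5
    if ihc_positive is None or expr_positive == ihc_positive:
        return expr_positive
    p_ihc = prob_positive if ihc_positive else 1.0 - prob_positive
    p_opposite = 1.0 - p_ihc
    return expr_positive if p_opposite >= ratio * p_ihc else ihc_positive


# Simulated ESR1-like expression values (illustrative only, not study data).
random.seed(0)
values = [random.gauss(6.0, 0.5) for _ in range(100)]
values += [random.gauss(11.0, 0.8) for _ in range(300)]
weights, means, sds = fit_two_gaussians(values)

# An IHC-positive sample with low expression is reclassified only if the
# evidence for the negative component is strong enough.
prob = posterior_positive(6.2, weights, means, sds)
final_call = reconcile_status(True, prob)
```

With two components and posterior probabilities that sum to one, the five-fold threshold is equivalent to requiring a posterior of at least 5/6 for the component that contradicts IHC, which is how the rule gives IHC the benefit of the doubt in borderline cases.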