Background: Tumor mutation burden (TMB) is the number of non-synonymous mutations present in a cancer exome. In colorectal cancer (CRC), high TMB is associated with microsatellite instability (MSI), POLE mutations, and response to immunotherapy. TMB prediction from whole-slide images (WSIs) could aid workflows that determine MSI and POLE status. Deep learning has previously been used to predict MSI status from WSIs. This approach assumed the morphologies of all regions within the tumor are equally associated with MSI. Here, we predict TMB using a weakly supervised deep learning framework that relaxes this assumption and automatically learns relevant regions within the tumor that are most associated with TMB, potentially uncovering morphological associations.
Methods: Weakly supervised learning methods facilitate classification of samples that contain many individual instances, only some of which are related to the sample label. Here, a given WSI has a single TMB-high or TMB-low label and contains individual regions that may or may not be associated with TMB status. We implemented a ResNet18-based, attention-based multiple-instance learning (MIL) convolutional neural network to simultaneously learn which tiles are important for predicting the slide-level TMB and which tile features are associated with TMB-high and TMB-low status. We assessed performance through 8-fold cross-validation within a Tempus dataset, using a 75%-12.5%-12.5% split of ~940 WSIs for training, validation, and testing.
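The attention-based pooling step described in the Methods can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes the commonly used tanh attention form, in which each tile score is w · tanh(V h) and the scores are normalized with a softmax. The abstract does not specify the attention function, so the function name `attention_pool`, the parameters `V` and `w`, and the toy tile features (stand-ins for ResNet18 tile embeddings) are all hypothetical.

```python
import math


def attention_pool(tile_features, V, w):
    """Pool K tile feature vectors into one slide-level vector.

    tile_features: list of K vectors of length D (stand-ins for CNN tile embeddings)
    V: L x D projection matrix (hypothetical learned parameter)
    w: vector of length L (hypothetical learned parameter)
    Returns (attention weights, slide-level embedding).
    """
    # Unnormalized attention score per tile: w . tanh(V h)
    scores = []
    for h in tile_features:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * zj for wj, zj in zip(w, hidden)))

    # Softmax over tiles, so the attention weights sum to 1 within a slide
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]

    # Slide embedding is the attention-weighted sum of the tile features
    dim = len(tile_features[0])
    slide = [sum(a * h[d] for a, h in zip(attention, tile_features))
             for d in range(dim)]
    return attention, slide


# Toy example: three tiles with 2-dimensional features (assumed values)
tiles = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
V = [[2.0, -2.0]]
w = [1.0]
attention, slide_embedding = attention_pool(tiles, V, w)
```

In a full model, the slide embedding would feed a classifier that outputs the TMB-high probability, and the attention weights would indicate which tiles contributed most to the slide-level prediction.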
Results: Across the held-out test sets of the cross-validation, we observed an area under the receiver operating characteristic curve of 0.854 (95% CI 0.776-0.932), an average precision of 0.723 (95% CI 0.580-0.865), and an accuracy of 0.889 (95% CI 0.833-0.945). Morphologies predicted as irrelevant for TMB include adipose tissue and WSI artifacts. Visualizations of model weights show the morphologies determined to be most associated with TMB-high and TMB-low status, such as high tumor/lymphocyte content and vasculature/red blood cells, respectively.
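For readers unfamiliar with the headline metric, the area under the receiver operating characteristic curve equals the probability that a randomly chosen TMB-high slide receives a higher predicted score than a randomly chosen TMB-low slide, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name `roc_auc` and the labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: list of 0/1 slide labels (1 = TMB-high, 0 = TMB-low)
    scores: list of predicted TMB-high scores, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example with made-up values: 3 of the 4 positive/negative pairs are ordered correctly
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The pairwise form is quadratic in the number of slides, which is acceptable for test folds of this size; rank-based formulas give the same value more efficiently on larger sets.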
Conclusions: Attention-MIL shows high performance for the prediction of TMB in CRC from H&E images and potentially reveals the morphologies of CRC that are most associated with TMB. Future directions include further investigation of morphological associations, generalizing this model beyond Tempus-acquired data, and re-training on the entire Tempus dataset.