Background: Oncogenic viruses, such as human papillomavirus (HPV) and Epstein-Barr virus (EBV), are responsible for 10 to 12% of all cancers. Knowing a tumor's viral status is essential because it can change a patient's treatment options. Numerous clinical trials investigating radiation or chemotherapy dose reduction for HPV-positive head and neck cancers have shown promising results. Additionally, virus-associated tumors are more likely to show higher levels of inflammation and immune infiltration, which makes them good candidates for immunotherapy. We developed a novel viral detection algorithm that predicts viral status from the whole-transcriptome profile of a patient's tumor. As RNA sequencing of clinical samples becomes increasingly common, using gene expression data to determine viral status and to characterize the tumor immune microenvironment may inform treatment decisions for a larger number of patients.
Methods: To develop RNA-based classifiers of viral status from tumor transcriptomes, we used the TCGA cervical, head and neck, and gastric cancer cohorts, for which viral status is known. To identify predictive genes, we performed 10-fold cross-validation on the TCGA cohorts, training an L1-regularized logistic regression model on each split. Because L1 regularization yields sparse coefficients, only a small subset of genes had non-zero coefficients in each split. Only genes with non-zero coefficients in more than 80% of the splits were included in the final model. Using this gene subset, we retrained L1-regularized logistic regression classifiers on the TCGA data and validated the models on Tempus cohorts.
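The gene-selection procedure (fit a sparse model on each cross-validation split, keep genes that are non-zero in more than 80% of splits, then refit on that subset) can be sketched as below. This is a minimal, pure-Python illustration under stated assumptions, not the authors' implementation: the helper names (`fit_l1_logistic`, `select_stable_genes`, `refit_on_genes`, `make_toy_data`) are hypothetical, the L1 model is fitted with a hand-written proximal gradient (ISTA) solver because the abstract does not name a solver, the penalty `lam=0.12`, step size and iteration count are arbitrary choices for the toy data, and the data are synthetic (two informative "genes" among six), not TCGA expression values.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l1_logistic(X, y, lam=0.12, lr=0.5, n_iter=300):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).
    Assumption: stand-in solver; the intercept b is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= lr * grad_b / n
        for j in range(p):
            z = w[j] - lr * grad_w[j] / n
            # Soft-thresholding (the L1 proximal step) sets weak coefficients exactly to zero.
            w[j] = math.copysign(max(abs(z) - lr * lam, 0.0), z)
    return w, b

def select_stable_genes(X, y, k=10, threshold=0.8, lam=0.12, seed=0):
    """Keep genes whose coefficient is non-zero in more than `threshold` of the k splits."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    counts = [0] * len(X[0])
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w, _ = fit_l1_logistic([X[i] for i in train], [y[i] for i in train], lam=lam)
        for j, wj in enumerate(w):
            if wj != 0.0:
                counts[j] += 1
    return [j for j, c in enumerate(counts) if c / k > threshold]

def refit_on_genes(X, y, genes, lam=0.12):
    """Retrain the L1 model using only the selected gene columns."""
    X_sub = [[row[j] for j in genes] for row in X]
    return fit_l1_logistic(X_sub, y, lam=lam)

def make_toy_data(n=160, p=6, seed=42):
    """Hypothetical toy data: only genes 0 and 1 drive the binary viral label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        X.append(row)
        y.append(1 if row[0] + row[1] + 0.5 * rng.gauss(0.0, 1.0) > 0 else 0)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    genes = select_stable_genes(X, y)
    weights, intercept = refit_on_genes(X, y, genes)
    print("selected genes:", genes)
    print("refit weights:", weights)
```

On this toy data the informative genes 0 and 1 are the ones expected to pass the 80% filter. With real expression matrices, genes would normally be standardized first and the penalty tuned, and a library solver (for example scikit-learn's `LogisticRegression` with `penalty="l1"`) would usually replace the hand-written loop.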
Results: In the TCGA cohorts, our models predicted the HPV-positive subtype with 99% specificity and 99% sensitivity, and the EBV-positive subtype with 99% specificity and 100% sensitivity. In the Tempus validation cohorts, we observed 96% specificity and 88% sensitivity for HPV, and 97% specificity and 100% sensitivity for EBV. Further analysis of the whole-transcriptome data showed a significant increase in the interferon gamma signature, the cytolytic index, and expression of the immunotherapy target IDO1 in EBV-positive patients in both cohorts. We observed the same pattern in the HPV cohorts, although the increase was not significant in the Tempus cohorts, which we attribute to the small sample size.
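For reference, sensitivity is the fraction of virus-positive tumors that the model calls positive, and specificity is the fraction of virus-negative tumors it calls negative. The short Python sketch below computes both from binary labels; the function name `sensitivity_specificity` and the labels are hypothetical illustrations, not the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity), treating label 1 as virus-positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    # Sensitivity: share of true positives detected; specificity: share of true negatives cleared.
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels (not study data): 8 positives, 50 negatives.
y_true = [1] * 8 + [0] * 50
y_pred = [1] * 7 + [0] + [0] * 48 + [1] * 2
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.2f}")  # sensitivity=0.875, specificity=0.96
```

With only a handful of positive cases, a single missed tumor moves sensitivity by more than 10 percentage points, which is worth keeping in mind when comparing estimates from small validation cohorts.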
Conclusions: Our models accurately predict viral infection in tumors from RNA expression data. We confirm that viral infection is generally associated with upregulated immune responses. Viral detection based on whole-transcriptome data could become a useful clinical tool alongside existing methods, providing insight into both viral status and the tumor microenvironment from a single test.