Introduction: Automated methods to “label” the presence of disease such as atrial fibrillation (AF) within electronic health record (EHR) data are important for secondary uses such as population health management. Labels generated from billing code patterns (e.g. “at least 2 relevant ICD codes used within 1 year”) can be accurate within one health system, but may fail to generalize across systems due to variations in coding practices.
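To make the billing-code rule concrete, the following is a minimal sketch (not the authors' implementation) of a code-based labeling rule of the form "at least 2 relevant ICD codes used within 1 year." The specific ICD-10 codes, function name, and data layout are illustrative assumptions.

```python
# Hypothetical sketch of a "2 relevant ICD codes within 1 year" labeling rule.
# AF_ICD_CODES and the (date, icd_code) event format are assumptions for illustration.
from datetime import timedelta

AF_ICD_CODES = {"I48.0", "I48.1", "I48.2", "I48.91"}  # example ICD-10 codes for AF

def code_based_af_label(coded_events):
    """coded_events: list of (date, icd_code) tuples for one patient."""
    af_dates = sorted(date for date, code in coded_events if code in AF_ICD_CODES)
    # Positive if any two qualifying codes fall within a 1-year window;
    # consecutive sorted dates are sufficient to check.
    for first, second in zip(af_dates, af_dates[1:]):
        if second - first <= timedelta(days=365):
            return True
    return False
```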
Hypothesis: We hypothesized that a natural language processing (NLP) model could be trained to detect the presence of atrial fibrillation using unstructured clinical notes by learning at scale from labels generated from a validated structured EHR and billing code definition. If successful, this could facilitate scaling disease label methods across large amounts of clinical data without suffering from inaccuracies due to variations in coding practices.
Methods: We collected clinical notes from a regional health system into a training set of roughly 29 million code-labeled episodes and a separate hold-out set of roughly 1.8 million code-labeled episodes. We trained our NLP model on the training set and evaluated its performance against the code-based labels on the hold-out set. We also performed targeted blinded chart reviews. This work was conducted under IRB approval.
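The abstract does not specify the model architecture; the sketch below shows the general training setup it describes (fitting a text classifier on notes using code-derived labels), with TF-IDF plus logistic regression used purely as a stand-in.

```python
# Hedged sketch of the training step: fit a note classifier on code-derived labels.
# The model choice below is an assumption, not the authors' architecture.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_note_classifier(train_notes, train_code_labels):
    """train_notes: list of note texts; train_code_labels: 0/1 code-based labels."""
    model = make_pipeline(
        TfidfVectorizer(max_features=50_000, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(train_notes, train_code_labels)
    return model
```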
Results: The model achieved an AUPRC of 0.91 with 87% recall and 89% precision, and it learned to distinguish challenging confounders despite not being explicitly trained for those tasks (Fig). Blinded review of selected episodes showed that when the NLP model labeled an episode as positive for atrial fibrillation while the code-based label (incorrectly) marked it negative, the NLP model was correct 90% of the time.
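For clarity on how the reported metrics relate to the hold-out set, this is a minimal sketch of evaluating a trained classifier's scores against the code-based labels; the threshold and function names are assumptions.

```python
# Hedged sketch of hold-out evaluation: AUPRC, precision, and recall
# computed against the code-based labels.
from sklearn.metrics import average_precision_score, precision_score, recall_score

def evaluate(model, holdout_notes, holdout_code_labels, threshold=0.5):
    scores = model.predict_proba(holdout_notes)[:, 1]  # probability of AF
    preds = scores >= threshold                        # binarize at an assumed threshold
    return {
        "auprc": average_precision_score(holdout_code_labels, scores),
        "precision": precision_score(holdout_code_labels, preds),
        "recall": recall_score(holdout_code_labels, preds),
    }
```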
Conclusions: NLP models can learn to automatically label the presence or absence of atrial fibrillation within clinical notes and may lead to greater accuracy and generalizability relative to code-based labeling methods.