Rapid Fire Abstracts
Michelle Fang, BSc
Medical Student
Cleveland Clinic Lerner College of Medicine
Eileen Galvani, MD
Resident physician
Mount Sinai Hospital
Xiaotan Sun, MSc
Research Data Scientist I
Cleveland Clinic
Sharmeen Sorathia, MD
Resident physician
Cleveland Clinic
Kevin Dorocak, BSc
Student
Case Western Reserve University School of Medicine
Christopher Nguyen, PhD, FSCMR, FACC
Director, Cardiovascular Innovation Research Center
Cleveland Clinic
Deborah Kwon, MD, FSCMR
Director of Cardiac MRI
Cleveland Clinic
David Chen, PhD
Director of Artificial Intelligence
Cleveland Clinic
Background:
Cardiac magnetic resonance imaging (CMR) reports contain a wealth of information on a patient’s cardiovascular status. Automatically extracting data from these free-text reports could provide clinical decision support and help populate research and operational databases. Although many studies have examined information extraction from free-text radiology reports, few have focused on CMR. Key limitations of those that have include small sample size,1 a limited number of cardiovascular conditions,1,2 and an inability to capture diagnostic uncertainty, disease severity, disease subtype, and anatomical location of the condition.1–3 We sought to develop a clinical data extraction model based on a pre-trained language model, which we call CMR-BERT, that covers a broad range of CMR-specific findings with detailed attributes.
Methods:
We built a multi-task model, based on a pre-trained language model, to identify 34 common CMR findings and cardiovascular conditions and their associated attributes, including certainty, severity, location, and subtype of the condition (listed in Table 1). The model was trained on 1,778 MRI reports and tested on 397 additional reports. Model performance was evaluated using F1 score and area under the receiver operating characteristic curve (AUROC). A failure analysis of model misclassifications on select sentences was also performed.
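The two evaluation metrics named above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, the finding key `mitral_regurgitation`, and the toy labels and scores are all assumptions made for illustration. F1 is computed from thresholded predictions, and AUROC is computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
from typing import Dict, List, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = finding present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positives and negatives (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(
    labels: Dict[str, List[int]],
    scores: Dict[str, List[float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the abstract
) -> Dict[str, Dict[str, float]]:
    """Compute F1 and AUROC separately for each finding."""
    results = {}
    for finding, y_true in labels.items():
        y_pred = [1 if s >= threshold else 0 for s in scores[finding]]
        results[finding] = {
            "f1": f1_score(y_true, y_pred),
            "auroc": auroc(y_true, scores[finding]),
        }
    return results


# Toy data (invented for illustration; not from the study).
toy_labels = {"mitral_regurgitation": [1, 1, 0, 0, 1, 0]}
toy_scores = {"mitral_regurgitation": [0.9, 0.8, 0.3, 0.6, 0.4, 0.1]}
toy_results = evaluate(toy_labels, toy_scores)
print(toy_results)
```

In a multi-task setting such as the one described, the same per-label computation would presumably be repeated for each finding and each attribute, which is why the sketch keys results by label name.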
Results:
CMR-BERT showed robust performance in identifying select valvular pathologies (Table 1), including mitral, aortic, and tricuspid valve regurgitation (F1 = 0.95–0.98); a number of cardiac chamber conditions, including atrial dilation, ventricular dilation, ventricular dysfunction, and left ventricular hypertrophy (F1 = 0.87–1.00); and aortic dilation (F1 = 0.98). The model was further able to detect specific attributes of these cardiovascular conditions, including certain levels of severity and the location of the pathology (Table 1). Failure analysis showed that some apparent misclassifications could be attributed to human error in the manual tagging process; in some of these cases the model's output was correct and the manual tag was not (Table 2).
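One way a failure analysis of this kind could surface possible tagging errors is to list sentences where a confident model prediction disagrees with the manual tag, so that a clinician can re-review them. The sketch below is a hypothetical illustration, not the procedure used in the study: the `SentencePrediction` record, the `flag_for_review` helper, both thresholds, and the example sentences are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentencePrediction:
    sentence: str        # report sentence (toy examples below)
    finding: str         # label being predicted
    manual_label: int    # 1 = annotator tagged the finding as present
    model_prob: float    # model's predicted probability that it is present


def flag_for_review(
    preds: List[SentencePrediction],
    threshold: float = 0.5,        # assumed decision cut-off
    min_confidence: float = 0.9,   # assumed confidence needed to flag
) -> List[SentencePrediction]:
    """Return confident model/annotator disagreements, most confident first."""
    flagged = []
    for p in preds:
        model_label = 1 if p.model_prob >= threshold else 0
        # Confidence in whichever class the model chose.
        confidence = p.model_prob if model_label == 1 else 1.0 - p.model_prob
        if model_label != p.manual_label and confidence >= min_confidence:
            flagged.append(p)
    return sorted(flagged, key=lambda p: abs(p.model_prob - 0.5), reverse=True)


# Invented toy sentences; not taken from any real report.
toy_preds = [
    SentencePrediction("Mild mitral regurgitation.", "mitral_regurgitation", 0, 0.97),
    SentencePrediction("No aortic regurgitation.", "aortic_regurgitation", 0, 0.02),
    SentencePrediction("Borderline left atrial dilation.", "atrial_dilation", 1, 0.45),
]
for p in flag_for_review(toy_preds):
    print(f"{p.finding}: manual={p.manual_label}, model_prob={p.model_prob:.2f} | {p.sentence}")
```

Flagged sentences are only candidates for re-review; a disagreement can reflect a model error just as easily as an annotation error.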
Conclusion:
CMR-BERT can automate the extraction of data from CMR reports, with the potential to support clinical operations and research workflows.