Approximately 1 in 7 women will develop breast cancer (BC) in their lifetime. Despite recent advances in the early detection and treatment of BC, and extended histopathological and molecular characterisation of primary tumours, a large percentage of patients still experience recurrence, which may be local, regional or distant (the last also called metastasis). Future recurrence events remain difficult to predict.
In this study, we will explore the potential of Natural Language Processing (NLP) methods to predict recurrence events from electronic health records (EHRs), specifically the free-text reports that accompany imaging, earlier than other methods can. This approach aims ultimately to improve surveillance by identifying at-risk patients and tailoring their care for better prognosis.
Data and methods
We propose to use EHRs for BC patients from the Barts Health NHS Trust site of the Breast Cancer Now Tissue Bank (BCNTB). We will apply NLP techniques to the imaging reports to extract features associated with recurrence (e.g. timing and site). We will then train an AI model to predict recurrence events from these features together with additional information extracted from the EHRs, such as demographics, lifestyle, comorbidities and medications. The final aim is to link these data with genomic data (where available) to drive phenotype-informed genetic analysis, identifying the risk and prognostic factors associated with BC and predicting patient outcomes and clinical response to treatment.
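To make the feature-extraction step concrete, the following is a minimal sketch of how recurrence site and timing might be pulled from a single imaging report. It is a simple rule-based stand-in, not the NLP method the project will use: the keyword lists, the negation check, the function name `extract_features` and the example report text are all hypothetical and invented for illustration, and none has been clinically validated.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical, non-validated vocabularies used only for illustration.
SITE_TERMS = {
    "local": ["chest wall", "ipsilateral breast", "surgical bed"],
    "regional": ["axillary", "supraclavicular", "internal mammary"],
    "distant": ["liver", "lung", "bone", "brain"],
}
RECURRENCE_CUE = re.compile(r"\b(?:recurren\w*|metasta\w*)", re.IGNORECASE)
NEGATION_CUE = re.compile(r"\b(?:no|not|without|negative for)\b", re.IGNORECASE)


@dataclass
class RecurrenceFeatures:
    mentioned: bool                      # any non-negated recurrence cue found
    sites: List[str]                     # subset of "local", "regional", "distant"
    days_since_diagnosis: Optional[int]  # timing feature; None if nothing found


def extract_features(report_text: str, report_date: date,
                     diagnosis_date: date) -> RecurrenceFeatures:
    """Extract recurrence site and timing features from one report (sketch)."""
    sites = set()
    mentioned = False
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report_text):
        cue = RECURRENCE_CUE.search(sentence)  # only the first cue is checked
        if cue is None:
            continue
        # Crude negation handling: skip if a negation word precedes the cue.
        if NEGATION_CUE.search(sentence[:cue.start()]):
            continue
        mentioned = True
        lowered = sentence.lower()
        for category, terms in SITE_TERMS.items():
            if any(term in lowered for term in terms):
                sites.add(category)
    days = (report_date - diagnosis_date).days if mentioned else None
    return RecurrenceFeatures(mentioned, sorted(sites), days)


# Invented report text, not real patient data.
example = extract_features(
    "No evidence of local recurrence. New metastatic deposits in the liver and bone.",
    report_date=date(2019, 4, 11),
    diagnosis_date=date(2019, 1, 1),
)
print(example)
```

Features of this kind, one row per report, could then be joined with the other EHR-derived variables before model training. In practice, keyword matching would likely be replaced by a trained clinical NLP model that handles negation, uncertainty and report-specific phrasing more robustly.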
We aim to help detect BC recurrence sooner than current methods allow, enabling stratified care that improves patient outcomes. We expect the work to benefit directly both patients already diagnosed with BC and those at risk of recurrence. This project will establish a scalable method that could contribute to the NHS Long Term Plan for early detection of cancer and to the Getting It Right First Time (GIRFT) programme.