Authors: Carol Dezateux, Georgios Gkoutos.

Lung cancer is the commonest cause of cancer death in the UK, accounting for 21% of all cancer deaths. Late diagnosis is the major reason why the overall prognosis for lung cancer patients is so poor: 72-76% of people with a known stage are diagnosed at stage III or IV, i.e. late stage, and only 4% of those diagnosed through emergency presentation are alive 3 years after diagnosis. The NHS Long Term Plan has therefore identified several initiatives to tackle late diagnosis, including the introduction of lung health checks.

Need and opportunity
While a number of initiatives are in place to identify ‘actionable moments’ to improve the timeliness of diagnosis, these are impeded by the inability to use unstructured data held in imaging investigations, by incomplete data on recurrence and site of metastases in national routine cancer data sources, and by inconsistent reporting of morphological diagnosis, stage or presence of molecular markers, all of which have major implications for prognosis and treatment.

Data and analysis 
We will work with the CRUK-funded early diagnosis of lung cancer collaboration (PI Mick Peake) to annotate and apply NLP techniques to text in lung cancer CT and CXR requests and reports, and to morphological diagnoses of lung cancer biopsies, initially using data from the Discovery Programme. We will engage with PPI groups associated with the UCLH Cancer Collaborative, NCRI and NIHR from the outset. NLP tools will be tested and scaled using data from Wales (Rolles, Wardle) and NIHR HIC. We will benefit from the data and analytic resources available from NHS Scotland, and from Scottish expertise and experience in NLP analysis of radiology and histopathology reports, as well as from the scale of images available through the SAIL Databank, including, in Cwm Taf Health Board, 100k plain films, 12k CT and 15k MRI investigations annually.

Benefit to community
An NLP pipeline for radiology and pathology reports that can be implemented at scale in systems across the UK, including Wales, Local Health and Care Record Exemplar sites, Birmingham (QEHB and associated hospitals), and Edinburgh. Open-source tools made available as part of the broader NLP platform. Support for the NHS Long Term Plan objective to deliver lung health checks and diagnose 75% of cancers at stage 1 or 2 by 2028. Linkage of NLP-derived stage phenotypes with EHRs, including NCRAS, to inform efforts to deliver earlier stage at diagnosis. Knowledge and expertise generalisable to other cancers.

HDR UK National Text Analytics Project is funded by Health Data Research UK (HDR UK).

For specific queries, please contact one of our Co-PIs: Professor Richard Dobson or Dr Angus Roberts.
