Job Opportunities

Product Development Senior Developer – NLP Cogstack – Band 8a

We are looking for a Senior Developer to develop Cogstack and associated technologies to meet the needs of this project and beyond.

It would be great if you could share a link to GitHub in your supporting statement to showcase any prior project contributions.

The successful candidate will be joining a friendly multi-disciplinary team, using AI and other cutting-edge technologies to improve care delivery and planning.

Closing Date: 3rd October 2021 at midnight

Interview Date: w/c 4th October

Learn more about this job…

IT Project Manager: Cogstack/MedCat Transformation Programme

We are looking for a Project Manager to drive forward this exciting new project, working collegiately with a wide range of staff across the consortia. An organised, analytical mind is needed, as well as enthusiasm for the project aims and a proactive approach to problem-solving.

The successful candidate will be joining a friendly multi-disciplinary team, using AI and other cutting-edge technologies to improve care delivery and planning.

Closing Date: 3rd October 2021 at midnight

Interview Date: w/c 4th October

Learn more about this job…

The experts in our text analytics community have provided a curated list of applications and datasets for healthcare text analytics, which can be found in HDR UK Text’s GitHub “resources” repository.

HDR UK Text Resources

A curated list of applications, datasets and models for healthcare text analytics developed and shared by the Health Data Research (HDR) UK Text community.

If you’d like to contribute a resource, please message us at info@hdruk-text.org.

More health data focused applications, datasets and other resources are available by searching on the HDR UK Gateway.

 


Applications

  • CALIBER drugdose: medication dosage instructions in electronic health records are often in the form of text rather than numbers. This program is designed to convert the text into numbers for the dose, frequency, units, duration etc.
  • CogStack: a locally deployable, distributed, microservice architecture intended to make information retrieval/extraction easier from EHRs.
  • CRIS / SLaM: library of applications available for use within South London and Maudsley (SLaM) on the Clinical Record Interaction Search (CRIS) platform. Access to CRIS must be applied for in order to use the applications.
  • EdIE-viz: provides an interface for stroke-related clinical concept recognition and negation detection in brain radiology reports.
  • EdIE-R: a rule-based information extraction tool developed for brain imaging reports.
  • EdIE-BiLSTM: a neural network system for named entity recognition and negation detection with a character-aware BiLSTM sentence encoder for brain imaging reports.
  • EdIE-BERT: a neural network system for named entity recognition and negation detection with a pretrained BERT encoder (BlueBERT) for brain imaging reports.
  • EndoMineR: a rule-based information extraction system for free-text and semi-structured endoscopy reports and their associated pathology specimens.
  • Free text matching algorithm: this computer program is designed to extract diagnoses, dates, durations, laboratory results and selected examination findings (heart rate and blood pressure) from unstructured free text. The program was created based on text in general practice records from the Vision system, and information is encoded using Read Clinical Terms.
  • HELIN: a web API demo for performing named entity recognition and linking (NER-L) on biomedical text.
  • Komenti: uses background knowledge that researchers have already discovered about biology and medicine. It combines and uses this knowledge in new ways, with the aim to learn even more from sources that are usually difficult for computers to understand. For example, it can extract information about a patient and the illnesses they are suffering from letters written by their doctor.
  • Med7: a dedicated named entity recognition system that identifies 7 categories: dosage, drug, duration, form, frequency, route and strength.
  • MedCAT: a medical concept annotation system that can be used to extract, structure and organize Health Records. It is based on unsupervised learning with the option of online/supervised learning via the MedCATtrainer interface.
  • SemEHR: a text mining and semantic search system designed for Surfacing Semantic Data from Clinical Notes in Electronic Health Records for Tailored Care, Trial Recruitment and Clinical Research.
  • SIPHS: a collection of software and datasets to support linguistic analysis of online health communities.

 

Datasets & models

  • BioReddit embeddings: a set of word embeddings (GloVe, ELMo, Flair) trained on medical subreddits. The embeddings are trained on ~800,000 Reddit posts from over 60 medical-themed communities.
  • Biomedical ambiguities: abbreviations and gene names: corpora containing examples of two ambiguities from the biomedical domain (abbreviations and gene names).
  • Cardiovascular research abstracts: corpus containing examples of potentially contradictory claims from Medline abstracts describing cardiovascular research intended as a useful resource for researchers working on similar problems.
  • COMETA: an entity linking dataset of layman medical terminology collected by analysing four years of content in 68 health-themed subreddits.
  • PheneBank: 24 million MEDLINE abstracts as well as 3.8M open-access PMC full articles annotated with 9 classes of entity (Phenotype, Disease, Anatomy, Cell, Cell_line, GPR, Gene_variant, Molecule, and Pathway), mapped to five major ontologies: SNOMED, HPO, MeSH, PRO, and FMA.
  • SapBERT: despite the widespread success of self-supervised learning via masked language models, learning representations directly from text to accurately capture complex and fine-grained semantic relationships in the biomedical domain remains a challenge. SapBERT is a pre-training scheme based on BERT. It self-aligns the representation space of biomedical entities with a metric learning objective function leveraging UMLS, a collection of biomedical ontologies with >4M concepts.

These resources are also available on the HDR UK Gateway, along with many more health-focused applications, datasets and other resources.

HDR UK National Text Analytics Project is funded by Health Data Research UK (HDR UK).

Contact Us

General Queries: info@hdruk-text.org

For specific queries, please contact our Co-PIs, Professor Richard Dobson or Dr Angus Roberts.
