Optical Character Recognition Combined With Natural Language Processing to Improve Extraction of Quality Parameters From Colonoscopy Reports
Douglas K. Rex, MD, MASGE, reviewing Laique SN, et al. Gastrointest Endosc 2020 Sep 3.
Measurement and improvement of colonoscopy quality measures, such as the adenoma detection rate (ADR), are critical to optimizing colorectal cancer prevention by colonoscopy. Measuring ADR still frequently requires manual data extraction, which is time-consuming and expensive. One method of automating quality measurement is natural language processing (NLP), which has already proven effective but requires machine-readable clinical text and does not work with printed or scanned documents. Optical character recognition (OCR) converts scanned paper documents into editable and searchable text data. In this study, a hybrid OCR/NLP technology accurately extracted data on polyp and adenoma detection, bowel preparation quality, and success of cecal intubation. It also had 100% accuracy (compared with manual annotation) in evaluating quality endpoints from 30 colonoscopy procedures performed at a different institution that used different endoscopy report-writing software.
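To make the idea of automated quality measurement concrete, the following is a minimal sketch of the text-processing step only, written under stated assumptions. It is not the method used by Laique et al. The report snippets are invented placeholders standing in for OCR output, the keyword rules are far simpler than a real NLP system, and the function names `extract_quality` and `adenoma_detection_rate` are hypothetical. ADR is computed here as the proportion of procedures in which at least one adenoma was detected.

```python
import re

# Invented snippets standing in for text produced by an OCR step (assumption:
# real reports are longer, noisier, and far more varied than these).
REPORTS = [
    "Bowel preparation: good. The cecum was reached. Pathology: tubular adenoma.",
    "Bowel preparation: poor. Cecum not reached. No polyps seen.",
    "Bowel preparation: excellent. The cecum was reached. Pathology: hyperplastic polyp.",
    "Bowel preparation: fair. The cecum was reached. Pathology: tubulovillous adenoma.",
]


def extract_quality(text):
    """Pull three quality parameters from one report using simple keyword rules."""
    lower = text.lower()
    prep = re.search(r"bowel preparation:\s*(\w+)", lower)
    has_adenoma = re.search(r"\badenoma\b", lower) is not None
    return {
        "prep": prep.group(1) if prep else None,
        "cecal_intubation": "cecum was reached" in lower,
        # Crude negation check; a real system would need proper negation handling.
        "adenoma": has_adenoma and "no adenoma" not in lower,
    }


def adenoma_detection_rate(reports):
    """Fraction of reports in which at least one adenoma was detected."""
    parsed = [extract_quality(r) for r in reports]
    if not parsed:
        return 0.0
    return sum(p["adenoma"] for p in parsed) / len(parsed)


if __name__ == "__main__":
    for report in REPORTS:
        print(extract_quality(report))
    print("ADR:", adenoma_detection_rate(REPORTS))
```

In a sketch like this, the OCR step would sit in front of `extract_quality` and supply the text. Validation against manual annotation, as the study did, would then compare each extracted field with a human reviewer's reading of the same report.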
Note to readers: At the time we reviewed this paper, its publisher noted that it was not in final form and that subsequent changes might be made.
CITATION(S)
Laique SN, Hayat U, Sarvepalli S, et al. Application of optical character recognition with natural language processing for large-scale quality metric data extraction in colonoscopy reports. Gastrointest Endosc 2020 Sep 3. (Epub ahead of print) (https://doi.org/10.1016/j.gie.2020.08.038)