Clinical text mining of the performance status and progression-free survival to facilitate data collection in cancer research: an exploratory study.

Abstract

MATERIALS AND METHODS

Unstructured Dutch text data were derived from different EMR fields, mainly containing information recorded during outpatient visits. A rule-based TM approach using regular expressions, implemented in the R programming language, was used to extract PS and PFS. For PS, quantitative evaluation metrics, such as the weighted F1-score, were used to determine the accuracy of the TM-extracted data against the manually curated data. For PFS, the median PFS was compared between the manually curated and TM-curated data using the Kaplan-Meier method. In addition, the C-index was determined.
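As a rough illustration of the approach described above, the following Python sketch pairs a regular-expression extractor for the performance status with a support-weighted F1-score. It is not the authors' R implementation: the pattern, the keywords it looks for, the function names `extract_ps` and `weighted_f1`, and the example notes are all hypothetical, and a real rule set for Dutch clinical notes would need many more rules (negations, ranges such as "0-1", dates).

```python
import re
from collections import Counter

# Hypothetical rule: a keyword (WHO, ECOG, PS, performance status/score)
# followed within a few non-digit characters by a single digit 0-4.
PS_PATTERN = re.compile(
    r"\b(?:WHO|ECOG|PS|performance[\s-]?(?:status|score))\b"
    r"[^\d\n]{0,15}?([0-4])\b",
    re.IGNORECASE,
)


def extract_ps(note):
    """Return the first performance status (0-4) found in a note, or None."""
    match = PS_PATTERN.search(note)
    return int(match.group(1)) if match else None


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's count in y_true."""
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = n - tp
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 because n >= 1.
        total += n * (2 * tp / (2 * tp + fp + fn))
    return total / len(y_true)


# Toy notes (invented for illustration, not taken from the study data).
notes = ["Poliklinisch gezien. WHO PS 1.", "ECOG performance status: 2", "Geen klachten."]
extracted = [extract_ps(n) for n in notes]
```

In this sketch a note without a recognizable pattern yields `None`, which mirrors the fact that a rule-based extractor will not return a value for every patient; how such cases enter the evaluation is a design choice that the abstract does not specify.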

CONCLUSIONS

The developed TM approach extracts PS and PFS from the EMR with very good performance. Therefore, this approach increases the efficiency of reliable data collection from EMRs, facilitating the use of real-world data (RWD) in clinical research.

RESULTS

Using the TM approach, a PS was obtained for 196 of the 328 patients (60%). In 189 of these 196 patients (96%), the TM-curated PS matched the manually curated PS. The weighted F1-score was 96.5%. The median PFS was 7.42 months for the manually curated data (n = 328) and 8.00 months for the TM-curated data (n = 301). The C-index was 0.916.
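The median PFS values reported above are Kaplan-Meier estimates. For readers unfamiliar with the estimator, the following Python sketch shows one way to compute a Kaplan-Meier median from follow-up times and event indicators. It is an illustration under stated assumptions, not the authors' R code: the function name `km_median` and the toy data are hypothetical, and the convention used here (the first event time at which the estimated survival drops to 0.5 or below) differs from that of some statistical packages, which report the midpoint of the interval when the curve sits at exactly 0.5.

```python
from fractions import Fraction


def km_median(times, events):
    """Kaplan-Meier median: first event time with estimated survival <= 0.5.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if progression or death was observed, 0 if censored
    Returns None if the survival estimate never reaches 0.5.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = Fraction(1)  # exact arithmetic avoids float ties at 0.5
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= Fraction(at_risk - n_events, at_risk)
            if survival <= Fraction(1, 2):
                return t
        at_risk -= n_leaving
    return None


# Toy data (invented): five patients, two of them censored.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 1, 0, 1]
toy_median = km_median(toy_times, toy_events)
```

Applying the same function to the manually curated and the TM-curated event times would give the two medians that are being compared; censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve.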

BACKGROUND

Modern electronic medical records (EMRs) contain a large amount of valuable data. These data can be unlocked for research by manual data collection, but this is highly labor intensive. Therefore, we explored whether automated text mining (TM) could be used to extract the performance status (PS) and progression-free survival (PFS) in a cohort of 328 non-small-cell lung cancer patients.

More about this publication

ESMO real world data and digital oncology
  • Volume 5
  • Pages 100059
  • Publication date 01-09-2024
