We retrospectively analysed 1646 patients (11473 segmented lesions) with contrast-enhanced CT and EGFR mutation status from next-generation sequencing at the Netherlands Cancer Institute, alongside an external NSCLC radiogenomics cohort (n = 158). All visible lesions were segmented, and the exact biopsy site was matched to its segmentation. Radiomic features were extracted, and machine learning models were trained under three lesion selection strategies: all lesions, non-biopsied lesions only, and biopsy-confirmed lesions only. To disentangle label quality from sample size, we created size-matched variants (one lesion per patient) of the all-lesion and non-biopsied strategies.
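The three lesion selection strategies and their size-matched variants can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record keys (`patient_id`, `lesion_id`, `biopsied`, `egfr_mutant`), the function name `select_lesions`, and in particular the use of seeded random sampling to pick one lesion per patient, since the text does not state how that lesion was chosen.

```python
import random
from collections import defaultdict


def select_lesions(lesions, strategy, one_per_patient=False, seed=0):
    """Return the training lesions for one selection strategy.

    `lesions` is a list of dicts with hypothetical keys:
    patient_id, lesion_id, biopsied (bool), egfr_mutant (0/1, patient-level label).
    """
    if strategy == "all":
        pool = list(lesions)
    elif strategy == "non_biopsied":
        pool = [les for les in lesions if not les["biopsied"]]
    elif strategy == "biopsy_confirmed":
        pool = [les for les in lesions if les["biopsied"]]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    if not one_per_patient:
        return pool

    # Size-matched variant: keep one lesion per patient.
    # Random choice is an assumption; the source does not specify the rule.
    rng = random.Random(seed)
    by_patient = defaultdict(list)
    for les in pool:
        by_patient[les["patient_id"]].append(les)
    return [rng.choice(by_patient[pid]) for pid in sorted(by_patient)]


# Toy data: two patients, one biopsied lesion each (illustrative only).
lesions = [
    {"patient_id": "P1", "lesion_id": 1, "biopsied": True, "egfr_mutant": 1},
    {"patient_id": "P1", "lesion_id": 2, "biopsied": False, "egfr_mutant": 1},
    {"patient_id": "P1", "lesion_id": 3, "biopsied": False, "egfr_mutant": 1},
    {"patient_id": "P2", "lesion_id": 4, "biopsied": True, "egfr_mutant": 0},
    {"patient_id": "P2", "lesion_id": 5, "biopsied": False, "egfr_mutant": 0},
]

all_matched = select_lesions(lesions, "all", one_per_patient=True)
biopsy_only = select_lesions(lesions, "biopsy_confirmed")
```

In this toy example the size-matched all-lesion set and the biopsy-confirmed set both contain one lesion per patient, so any difference between models trained on them would reflect which lesion carries the label rather than how many samples were used.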
Radiogenomic models trained on biopsy-confirmed lesions outperform conventional all-lesion strategies in external validation, despite using an order of magnitude fewer samples. Prioritising lesion-level label fidelity can mitigate heterogeneity-driven noise, enhancing robustness and clinical translation of imaging-based genomic prediction.
All models achieved significant discrimination of EGFR status on internal validation (AUC = 0.62-0.68). However, the performance of the all-lesion and non-biopsied models declined on external validation (AUC = 0.55-0.63), whereas the biopsy-anchored model remained stable (AUC = 0.62) despite being trained on only one tenth as many samples. When training sets were size-matched, the biopsy-anchored model significantly outperformed the all-lesion model on external validation (p = 0.037).
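For readers less familiar with the metric, the AUC reported here is the probability that a randomly chosen EGFR-mutant case receives a higher predicted score than a randomly chosen wild-type case. The sketch below computes it by direct pairwise comparison. It illustrates the definition only: the function name `roc_auc` and the toy labels and scores are hypothetical, and the text does not state which software or which statistical test produced the reported AUCs and p-value.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = EGFR-mutant, 0 = wild-type (illustrative values only).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)  # 7 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking, which is why a drop from 0.62-0.68 internally towards 0.55 externally represents a substantial loss of discriminative ability.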
Question: Does assigning biopsy-derived molecular labels to all lesions introduce heterogeneity-driven label noise that reduces the generalisability of radiogenomic models?
Findings: Models trained exclusively on biopsy-confirmed lesions demonstrated superior external generalisability compared with all-lesion approaches, despite being trained on substantially fewer samples.
Clinical relevance: Biopsy-anchored radiogenomics improves the reliability of non-invasive mutation prediction by accounting for tumour heterogeneity, potentially supporting clinical decision-making when tissue sampling is limited or molecular results are discordant across lesions.
Radiogenomics aims to non-invasively predict tumour genotypes from imaging, but most studies assume molecular homogeneity by assigning a single biopsy-derived label to all lesions within a patient. This approach risks substantial label noise given well-documented interlesional heterogeneity. We investigated whether anchoring training to biopsy-confirmed lesions improves radiogenomic model performance and generalisability.