Sources of variation in multicenter rectal MRI data and their effect on radiomics feature reproducibility.

Niels W Schurink
Simon R van Kranen
Sander Roberti
Joost J M van Griethuysen
Nino Bogveradze
Francesca Castagnoli
Najim El Khababi
Frans C H Bakers
Shira H de Bie
Gerlof P T Bosma
Vincent C Cappendijk
Remy W F Geenen
Peter A Neijenhuis
Gerald M Peterson
Cornelis J Veeken
Roy F A Vliegen
Regina G H Beets-Tan
Doenja M J Lambregts

Abstract

KEY POINTS

• Features derived from T2W-MRI, and in particular from ADC maps, differ significantly between centers in multicenter data analysis.
• Variations in ADC are mainly (> 60%) caused by hardware and image acquisition differences and only marginally (< 1%) by patient- or tumor-intrinsic variations.
• Features derived using different image segmentations (expert/non-expert) were reproducible, provided that whole-volume segmentations were used. When different feature extraction software packages with similar settings were used, higher-order features were less reproducible.

METHODS

T2W and DWI/ADC MRIs from 649 rectal cancer patients were retrospectively acquired in 9 centers. Fifty-two imaging features (14 first-order/6 shape/32 higher-order) were extracted from each scan with two different software packages (PyRadiomics/CapTk), using both whole-volume (expert/non-expert) and single-slice segmentations. The influence of hardware, acquisition, and patient-intrinsic factors (age/gender/cTN-stage) on ADC was assessed using linear regression. Feature reproducibility between segmentation methods and between software packages was assessed using the intraclass correlation coefficient (ICC).
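The abstract does not state which ICC form was used. As a minimal sketch, assuming the two-way random-effects, absolute-agreement, single-measurement form ICC(2,1) of Shrout and Fleiss (1979), the reproducibility of one feature between segmentation methods or software packages could be computed from a patients × methods matrix as below. The function name `icc_2_1` is hypothetical, and the example matrix is the classic 6-subject × 4-rater illustration from Shrout and Fleiss, not data from this study.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` is an (n_subjects, k_methods) array, e.g. one radiomics feature
    computed for n patients with k segmentation methods or software packages.
    (Assumed ICC form; the study's exact choice is not given in the abstract.)
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand_mean = y.mean()
    # Sums of squares of the two-way ANOVA: subjects (rows), methods (columns)
    ss_rows = k * np.sum((y.mean(axis=1) - grand_mean) ** 2)
    ss_cols = n * np.sum((y.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((y - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols
    # Corresponding mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Absolute agreement: systematic method offsets (ms_cols) lower the ICC
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Shrout & Fleiss (1979) textbook example: 6 subjects rated by 4 raters
example = np.array([
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
])
print(round(icc_2_1(example), 2))  # 0.29
```

Applying such a function per feature and taking the median within each feature class (first-order, shape, GLCM, and so on) yields summaries of the same kind as the median ICCs reported in the results, although the study's exact procedure may differ.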

RESULTS

Image features differed significantly (p < 0.001) between centers, with more substantial variation in ADC than in T2W-MRI. In total, 64.3% of the variation in mean ADC was explained by differences in hardware and acquisition, compared to 0.4% by patient-intrinsic factors. Feature reproducibility between expert and non-expert segmentations was good to excellent (median ICC 0.89-0.90). Reproducibility between single-slice and whole-volume segmentations was substantially poorer (median ICC 0.40-0.58). Between software packages, reproducibility was good to excellent (median ICC 0.99) for most features (first-order/shape/GLCM/GLRLM) but poor for higher-order (GLSZM/NGTDM) features (median ICC 0.00-0.41).
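The abstract reports the percentage of variation in mean ADC explained by groups of factors but not the exact model specification. A minimal sketch, assuming these percentages correspond to the coefficient of determination (R²) of ordinary least-squares models with dummy-coded categorical predictors, is shown below. The helper names, the single `scanner` variable standing in for all hardware and acquisition settings, and all data are hypothetical and synthetic.

```python
import numpy as np


def one_hot(labels):
    """Dummy-code a categorical variable (first sorted level is the reference)."""
    levels = sorted(set(labels))
    return np.array([[float(lab == lev) for lev in levels[1:]] for lab in labels])


def r_squared(y, *predictor_blocks):
    """R^2 of an ordinary least-squares fit of y on an intercept plus predictors."""
    y = np.asarray(y, dtype=float)
    # column_stack turns 1-D predictors into columns and appends 2-D blocks as-is
    X = np.column_stack([np.ones(len(y)), *predictor_blocks])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)


# Synthetic, made-up data: 4 hypothetical scanner/protocol groups with
# different ADC offsets, plus patient factors that do not affect ADC at all.
rng = np.random.default_rng(0)
n = 200
scanner = rng.choice(["A", "B", "C", "D"], size=n)
offset = {"A": 0.0, "B": 150.0, "C": -100.0, "D": 250.0}  # in 10^-6 mm^2/s
age = rng.uniform(40, 85, size=n)
sex = rng.choice(["F", "M"], size=n)
mean_adc = 1000.0 + np.array([offset[s] for s in scanner]) + rng.normal(0, 50, size=n)

r2_acquisition = r_squared(mean_adc, one_hot(scanner))
r2_patient = r_squared(mean_adc, age, one_hot(sex))
print(f"acquisition: {r2_acquisition:.1%}, patient: {r2_patient:.1%}")
```

Fitting each factor group in a separate model, as here, is only one way to attribute explained variance; when hardware, acquisition, and patient factors are correlated, the separate R² values need not add up, and the study may have used a different decomposition.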

OBJECTIVES

To investigate sources of variation in a multicenter rectal cancer MRI dataset focusing on hardware and image acquisition, segmentation methodology, and radiomics feature extraction software.

CONCLUSIONS

Significant variations are present in multicenter MRI data, particularly related to differences in hardware and acquisition, which will likely negatively influence subsequent analysis if not corrected for. Segmentation variations had a minor impact when whole-volume segmentations were used. Between software packages, higher-order features were less reproducible, and caution is warranted when implementing these in prediction models.
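The abstract does not describe how center-related variation should be corrected. Purely as an illustration, and not as a method used or evaluated in this study, one very simple correction is to z-score each feature within each center, as in the sketch below. The function name `standardize_per_center` is hypothetical. Note that this also removes genuine between-center differences in patient population, so it is a crude baseline rather than a recommended harmonization approach.

```python
import numpy as np


def standardize_per_center(values, centers):
    """Z-score one feature separately within each center (illustrative only).

    Centers with a single case or zero spread are mapped to 0.
    """
    values = np.asarray(values, dtype=float)
    centers = np.asarray(centers)
    out = np.empty_like(values)
    for c in np.unique(centers):
        mask = centers == c
        v = values[mask]
        # Sample standard deviation within this center
        sd = v.std(ddof=1) if v.size > 1 else 0.0
        out[mask] = (v - v.mean()) / sd if sd > 0 else 0.0
    return out


# Hypothetical mean ADC values from two centers with a systematic offset
adc = [1000, 1100, 1200, 1500, 1600, 1700]
site = ["A", "A", "A", "B", "B", "B"]
print(standardize_per_center(adc, site).tolist())
# [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```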

More about this publication

European radiology

Volume 32
Issue 3
Pages 1506-1516
Publication date 01-03-2022

Full text links

Publisher website (DOI) 10.1007/s00330-021-08251-8
Europe PubMed Central 34655313
Pubmed 34655313