We used 1500 MRI scans from the PI-CAI challenge training subset. Positive scans had 220 human and 205 AI-generated annotations. The mtU-Net (the proposed teacher-student semi-supervised approach) was compared to a supervised nnU-Net (trained using only the 220 human annotations) and a semi-supervised nnU-Net (trained on human and AI-generated annotations). In addition, the 205 AI-annotated scans were manually annotated, and a fully supervised model was trained. External validation was performed on a newly annotated dataset from the PROMIS study (n = 574, 403 lesions) and the Prostate158 dataset (n = 158, 126 lesions). Patient-level performance was evaluated using the area under the curve (AUC), and lesion-level detection (overlap > 0.10) was evaluated using average precision (AP); both are reported with 95% confidence intervals (in brackets). The DeLong test was used to compare AUCs against the supervised and fully supervised models.
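As an illustration of the patient-level evaluation described above, the following minimal sketch computes an AUC and a 95% confidence interval. The text does not state how the intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names `auc_score` and `bootstrap_ci` as well as the toy labels and scores are hypothetical. The DeLong test itself is not implemented in this sketch.

```python
import random


def auc_score(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method,
    not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample has only one class
        stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical patient-level labels (1 = cancer) and model scores.
    y_true = [0, 0, 1, 1, 0, 1, 0, 1]
    y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90]
    auc = auc_score(y_true, y_score)
    low, high = bootstrap_ci(y_true, y_score)
    print(f"AUC = {auc:.2f} [{low:.2f}-{high:.2f}]")
```

The pairwise AUC loop is quadratic in the number of patients, which is acceptable for cohorts of a few hundred to a few thousand scans; a rank-based formulation would be preferable for larger datasets.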
In prostate MRI tumor detection, fully supervised learning performed best. However, in external validation, the semi-supervised methods approached the performance of the fully supervised model, making them a valuable approach when expert annotations are limited.
The fully supervised nnU-Net showed the highest performance on the internal PI-CAI test set (AUC = 0.89 [0.87-0.91], AP = 0.65 [0.60-0.70]) and on the external validation datasets PROMIS (AUC = 0.68 [0.64-0.72], AP = 0.24 [0.20-0.29]) and Prostate158 (AUC = 0.87 [0.82-0.92], AP = 0.64 [0.56-0.72]), significantly outperforming the supervised baseline (p < 0.05). The proposed semi-supervised mtU-Net demonstrated close external validation performance on PROMIS (AUC = 0.66 [0.62-0.71], AP = 0.20 [0.16-0.25]) and Prostate158 (AUC = 0.86 [0.81-0.92], AP = 0.58 [0.49-0.67]), significantly outperforming the supervised baseline on both datasets (p = 0.047 and p = 0.014, respectively) and showing no significant difference from the fully supervised model (p = 0.199 and p = 0.702, respectively).
To evaluate the diagnostic performance of semi-supervised learning models for aggressive prostate cancer detection on MRI compared to fully supervised models trained with additional expert annotations.
Question: The need for extensive expert voxel-level annotations delays the development of AI-based prostate cancer diagnostic tools and their implementation in clinical practice.
Findings: The combination of pseudo-labeling with consistency regularization achieved performance comparable to that of fully supervised methods, demonstrating that data diversity matches the impact of expert annotation volume.
Clinical relevance: Semi-supervised learning reduces dependence on expert annotations while maintaining detection accuracy, enabling the development of scalable, automated diagnostic tools for prostate cancer amid growing clinical workflow demands.