Abstract
PET-CT lesion segmentation remains challenging due to heterogeneous lesion appearance, small and dispersed lesions, physiological FDG uptake, and limited annotations. Existing self-supervised methods are mostly designed for unimodal imaging and therefore fail to fully exploit the complementary anatomical and metabolic information in PET-CT. Meanwhile, conventional multi-cancer segmentation strategies often treat all cancer types as a single unified task, which weakens cancer-specific features, while existing prompt-based methods still offer limited task adaptation and limited sensitivity to small lesions. To address these limitations, a unified two-stage framework for multi-cancer PET-CT segmentation is presented. First, a modality-guided probabilistic masked autoencoder is introduced to enhance cross-modal PET-CT representation learning through modality-specific masking. Second, a dual-prompt downstream segmentation network is designed to model both cancer-specific characteristics and cross-cancer shared knowledge, with prompt-aware heads further improving task adaptation and small-lesion delineation. Experiments on a multi-cancer PET-CT dataset show consistent improvements over the best-performing non-prompt and prompt-based baselines, with average Dice gains of 2.51% and 2.18%, respectively. The framework is further applied to an unannotated breast cancer cohort for survival analysis, demonstrating promising generalizability and improved risk stratification. The code is available at https://github.com/XinglongLiang08/DpDNet.
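For reference, the Dice score used to report the gains above measures the overlap between a predicted lesion mask and a reference mask. The following is a minimal sketch of the standard definition, assuming binary masks flattened to sequences of 0/1 voxels. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are illustrative assumptions and are not taken from the DpDNet repository.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for flattened binary masks (1 = lesion voxel)."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels marked as lesion in both the prediction and the reference.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total lesion voxels in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted voxels, 3 reference voxels, 2 of them overlapping,
# so Dice = 2 * 2 / (3 + 3).
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
print(dice_score(pred, target))
```

Because the denominator counts only lesion voxels, a single misclassified voxel changes the score far more for a small lesion than for a large one, which is one reason small-lesion delineation is hard to improve under this metric.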