This review aims to identify factors causing heterogeneity in breast diffusion-weighted MRI (DWI) and their impact on its value for identifying breast cancer patients with pathological complete response (pCR) to neoadjuvant systemic therapy (NST). PubMed was searched until April 2020 for studies analyzing DWI for identifying breast cancer patients with pCR to NST. Technical and clinical study aspects were extracted and assessed for variability. Twenty studies representing 1455 patients/lesions were included. The studies differed with respect to study population, treatment type, DWI acquisition technique, post-processing (e.g., mono-exponential, intravoxel incoherent motion, or stretched-exponential modeling), and timing of follow-up examinations. Various b-value combinations were used for the acquisition and generation of apparent diffusion coefficient (ADC) maps. Approaches for drawing regions of interest on longitudinal MRIs were highly variable. Biological variability due to differing molecular subtypes was usually not taken into account. Moreover, definitions of pCR varied. The areas under the curve reported by the individual studies ranged from 0.50 to 0.92. However, the ranges of mean/median ADC values measured before, during, and/or after NST overlapped between the pCR and non-pCR groups across studies. The technical, clinical, and epidemiological heterogeneity may explain the observed variability in the ability of DWI to predict pCR accurately. This makes it undesirable to implement DWI for pCR prediction and evaluation based on a single absolute ADC threshold for all breast cancer types. Multidisciplinary consensus and appropriate clinical study design, taking biological and therapeutic variation into account, are required to obtain standardized, reliable, and reproducible DWI measurements for pCR/non-pCR identification.