Integrating clinical data with AI to optimise biopsy decisions after prostate MRI
AW Rix; NS Moreira Da Silva; J Budd; M Yeung; F Giganti; L Davies; PR Burn; RG Hindley; N Vasdev; AJ Bradley; G Maskell; A Andreou; S Liyanage; R Persad; J Aning; T Barrett; MDB Hinton; AR Padhani; E Sala; A Shah
Prostate MRI-directed biopsies have high false-positive rates. We evaluated the accuracy of multi-modal decision support models (combining AI findings, clinical data, and PI-RADS scores) for selecting men for biopsy after a prostate MRI.
MR images, clinical history, histopathology and PI-RADS data were obtained from five NHS centres in the retrospective PAIR-1 study, spanning multiple vendors, scanner models and field strengths. After exclusions for AI contraindications, including prior treatment and image quality issues, 352 patients were partitioned for model training and 235 patients for held-out testing. Ground truth was biopsy-verified Gleason grade group (GG) ≥2 cancer. Patients received standard-of-care biopsy according to local practice; MRI-negative (PI-RADS 1-2) patients who did not receive a biopsy were assumed negative for this analysis. Multi-stage AI software (Lucida Medical, Pi v2.2), which segments the prostate gland and transition zone (TZ) on MRI and calculates their volumes, and segments and scores lesions and patients for the likelihood of GG≥2 disease, was developed separately using the training data together with the PROSTATEx dataset [A]. Machine learning models combining the AI score, clinical data and the original radiologists’ PI-RADS scores were developed using the training data. Patient-level sensitivity, specificity and AUC were evaluated on the held-out test data using ROC analysis, with 95% confidence intervals obtained by bootstrapping. For evaluation of sensitivity and specificity, thresholds equivalent to PI-RADS 3 were pre-determined on the training data; sensitivity and specificity are additionally reported at the nearest threshold offering at least 90% sensitivity. Performance was compared with the AI score and PI-RADS assessment alone.
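As a minimal sketch of the evaluation described above (patient-level sensitivity and specificity at a threshold fixed in advance on the training data, with 95% confidence intervals from bootstrapping), the Python function below is illustrative rather than the study code: the function name, number of resamples and input format are assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_with_bootstrap(y_true, scores, threshold, n_boot=2000, seed=0):
    """Patient-level sensitivity, specificity and AUC with bootstrapped 95% CIs.

    y_true: 1 = biopsy-verified GG>=2, 0 = benign/GG1 (or assumed negative).
    scores: per-patient model output (e.g. AI score, combined risk, PI-RADS).
    threshold: operating point pre-determined on the training data.
    """
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    rng = np.random.default_rng(seed)

    def metrics(y, s):
        pred = s >= threshold
        sens = (pred & (y == 1)).sum() / (y == 1).sum()
        spec = (~pred & (y == 0)).sum() / (y == 0).sum()
        return sens, spec, roc_auc_score(y, s)

    point = metrics(y_true, scores)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if len(np.unique(y_true[idx])) < 2:   # skip resamples with one class only
            continue
        boots.append(metrics(y_true[idx], scores[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)
    return {name: (pt, (l, h))
            for name, pt, l, h in zip(("sensitivity", "specificity", "auc"),
                                      point, lo, hi)}
```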
GG≥2 prevalence in the held-out test set was 34%. PI-RADS scores alone detected GG≥2 with sensitivity 1.00 (95% CI 1.00-1.00), specificity 0.67 (0.61-0.75) and AUC 0.94 (0.91-0.97). The AI detected GG≥2 with sensitivity 0.97 (0.93-1.00), specificity 0.55 (0.47-0.62) and AUC 0.88 (0.84-0.92) using bpMRI (non-contrast) data; findings with mpMRI were similar. The model combining the AI score with PSA density calculated from the transition zone volume (TZ-PSAD) had sensitivity 0.95 (0.90-0.99, KS p<0.001), specificity 0.70 (0.63-0.77, KS p<0.001) and AUC 0.90 (0.85-0.93, DeLong p=0.25). The model combining AI and TZ-PSAD with PI-RADS scores had sensitivity 0.99 (0.96-1.00, KS p<0.001), specificity 0.83 (0.77-0.89, KS p<0.001) and AUC 0.96 (0.93-0.98, DeLong p=0.003). TZ-PSAD offered a modest additional benefit over whole-prostate PSAD; other variables offered <5% specificity improvements or non-significant benefits. Limitations: in this retrospective study, most MRI-negative cases did not receive a biopsy, which implies near-100% apparent sensitivity for the clinical PI-RADS assessment. Comparison of specificity was therefore considered the most appropriate way to evaluate the potential added value of data integration in this analysis.
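As a concrete illustration of the data integration reported above, the sketch below computes TZ-PSAD (serum PSA divided by the AI-derived transition zone volume) and combines it with the AI score and PI-RADS category into a single per-patient risk score. The abstract does not specify the model class; logistic regression, the function names and the feature ordering here are assumptions for illustration only, and the resulting score would be evaluated at a threshold fixed on the training data, as in the previous sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def tz_psad(psa_ng_ml, tz_volume_ml):
    """PSA density referenced to the AI-segmented transition zone (TZ) volume."""
    return np.asarray(psa_ng_ml, dtype=float) / np.asarray(tz_volume_ml, dtype=float)

def fit_combined_model(ai_score, psa, tz_volume, pirads, y_gg2):
    """Fit an illustrative combination of AI score, TZ-PSAD and PI-RADS.

    All inputs are per-patient arrays from the training partition only;
    y_gg2 is 1 for biopsy-verified GG>=2 disease, 0 otherwise.
    """
    X = np.column_stack([ai_score, tz_psad(psa, tz_volume), pirads])
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X, y_gg2)
    return model

def combined_risk(model, ai_score, psa, tz_volume, pirads):
    """Per-patient combined risk of GG>=2 disease on held-out data."""
    X = np.column_stack([ai_score, tz_psad(psa, tz_volume), pirads])
    return model.predict_proba(X)[:, 1]
```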
[Figure/table captions: per-patient sensitivity vs false-positive rate for identifying patients with any GG≥2 cancer; performance at the pre-set PI-RADS 3-equivalent threshold; performance at the threshold closest to 90% sensitivity.]
Combining PI-RADS, PSAD and prostate MRI AI decision support substantially improved specificity compared with AI or PI-RADS assessment alone, with a small reduction in sensitivity. Combining PSAD with the AI score alone, without PI-RADS, also offered improved specificity at similar sensitivity.