Learned invertible primal-dual schemes with additional memory optimizations can be trained to reconstruct CBCT volumes directly from projection data with clinically relevant geometry and resolution. Such methods can offer better reconstruction quality and generalization than classical deep learning baselines.
Two LIRE models, one for the small and one for the large field-of-view (FoV) setting, were trained and validated on a set of 260 + 22 thorax CT scans and tested on a set of 142 thorax CT scans plus an out-of-distribution dataset of 79 head and neck CT scans. In both settings, our method surpasses the classical methods and the deep learning baselines on both test sets. On the thorax CT set, our method achieves a peak signal-to-noise ratio (PSNR) of 33.84 ± 2.28 for the small FoV setting and 35.14 ± 2.69 for the large FoV setting; the U-Net baseline achieves a PSNR of 33.08 ± 1.75 and 34.29 ± 2.71, respectively. On the head and neck CT set, our method achieves a PSNR of 39.35 ± 1.75 for the small FoV setting and 41.21 ± 1.41 for the large FoV setting; the U-Net baseline achieves a PSNR of 33.08 ± 1.75 and 34.29 ± 2.71, respectively. Additionally, we demonstrate that LIRE can be fine-tuned to reconstruct high-resolution CBCT data with the same geometry but 1 mm voxel spacing and higher detector panel resolution, where it also outperforms the U-Net baseline.
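For reference, PSNR between a ground-truth volume and a reconstruction can be computed as in the minimal NumPy sketch below. The function name `psnr` and the choice of `data_range` (here defaulting to the dynamic range of the reference volume) are assumptions made for illustration; the normalization used for the figures reported above is not specified here and may differ.

```python
import numpy as np


def psnr(reference, reconstruction, data_range=None):
    """Peak signal-to-noise ratio in decibels (illustrative helper)."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    if data_range is None:
        # Assumption: use the dynamic range of the reference volume as the peak value.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical volumes
    return float(10.0 * np.log10(data_range ** 2 / mse))


# Toy example: a two-voxel "volume" with a constant offset of 0.1.
reference = np.array([0.0, 1.0])
reconstruction = reference + 0.1
value = psnr(reference, reconstruction)
```

With a unit data range and a mean squared error of about 0.01, the toy example evaluates to roughly 20 dB.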
In this work, we aim to address these limitations and propose LIRE: a learned invertible primal-dual iterative scheme for CBCT reconstruction.
LIRE is a learned invertible primal-dual iterative scheme for CBCT reconstruction, in which we employ a U-Net architecture in each primal block and a residual convolutional neural network (CNN) architecture in each dual block. The memory requirements of the network are substantially reduced, while its expressive power is preserved, through a combination of invertible residual primal-dual blocks and patch-wise computations inside each block during both the forward and the backward pass. These techniques enable us to train on data with isotropic 2 mm voxel spacing and clinically relevant projection count and detector panel resolution on current hardware with 24 GB of video random access memory (VRAM).
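The memory saving from invertibility can be illustrated with a generic additive coupling block, sketched below in NumPy. This is not the LIRE architecture: the channel split, the toy function `residual_fn` (a hypothetical stand-in for a learned sub-network) and all shapes are assumptions chosen for illustration. The point is that the block input can be recomputed exactly from its output, so intermediate activations need not be stored for the backward pass. Patch-wise computation is not covered by this sketch.

```python
import numpy as np


def make_residual_fn(channels, seed=0):
    """Hypothetical stand-in for a learned sub-network (e.g. a small CNN)."""
    rng = np.random.default_rng(seed)
    weight = rng.normal(scale=0.1, size=(channels, channels))

    def residual_fn(x):
        # x has shape (channels, n_voxels); mix channels, then apply a nonlinearity.
        return np.tanh(weight @ x)

    return residual_fn


def coupling_forward(x, residual_fn):
    """Additive coupling: y1 = x1, y2 = x2 + f(x1)."""
    x1, x2 = np.split(x, 2, axis=0)
    y2 = x2 + residual_fn(x1)
    return np.concatenate([x1, y2], axis=0)


def coupling_inverse(y, residual_fn):
    """Recover the input from the output: x1 = y1, x2 = y2 - f(y1)."""
    y1, y2 = np.split(y, 2, axis=0)
    x2 = y2 - residual_fn(y1)
    return np.concatenate([y1, x2], axis=0)


rng = np.random.default_rng(1)
x = rng.normal(size=(8, 1000))         # 8 channels, 1000 voxels (toy sizes)
f = make_residual_fn(channels=4)       # acts on one half of the channels

y = coupling_forward(x, f)
x_rec = coupling_inverse(y, f)         # input recomputed from the output alone
max_error = float(np.max(np.abs(x - x_rec)))
```

Because `residual_fn` itself is never inverted, it can be an arbitrary network; during training, a framework would recompute each block's input from its output on the fly instead of caching it, trading extra computation for memory.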
Cone beam computed tomography (CBCT) plays an important role in many medical fields. Unfortunately, the potential of this imaging modality is hampered by lower image quality compared to conventional CT, and producing accurate reconstructions remains challenging. Much recent research has been directed towards reconstruction methods relying on deep learning, which have shown great promise for various imaging modalities. However, the practical application of deep learning to CBCT reconstruction is complicated by several issues, such as the exceedingly high memory costs of deep learning methods when working with fully 3D data. Additionally, deep learning methods proposed in the literature are often trained and evaluated only on data from a specific region of interest, which raises concerns about a possible lack of generalization to other regions.