DFT-FE 1.0: A massively parallel hybrid CPU-GPU density functional theory code using finite-element discretization
We present DFT-FE 1.0, building on DFT-FE 0.6 (Motamarri et al., 2020), to conduct fast and accurate large-scale density functional theory (DFT) calculations (reaching ~100,000 electrons) on both many-core CPU and hybrid CPU-GPU computing architectures. This work involves improvements in the real-space formulation, via an improved treatment of the electrostatic interactions that substantially enhances the computational efficiency, as well as in high-performance computing aspects, including the GPU acceleration of all the key compute kernels in DFT-FE. We demonstrate the accuracy by comparing the ground-state energies, ionic forces and cell stresses on a wide range of benchmark systems against those obtained from widely used DFT codes. Further, we demonstrate the numerical efficiency of our GPU acceleration, which yields ~20× speed-up on hybrid CPU-GPU nodes of the Summit supercomputer. Notably, owing to the parallel scaling of the GPU implementation, we obtain wall-times of 80–140 seconds for full ground-state calculations, with stringent accuracy, on benchmark systems containing ~6,000–15,000 electrons using 64–224 nodes of the Summit supercomputer.