TTDFT: A GPU accelerated Tucker tensor DFT code for large-scale Kohn-Sham DFT calculations
We present the Tucker tensor DFT (TTDFT) code, which uses a tensor-structured algorithm with graphics processing unit (GPU) acceleration to perform ground-state DFT calculations on large-scale systems. The Tucker tensor DFT algorithm uses a localized Tucker tensor basis computed from an additive separable approximation to the Kohn-Sham Hamiltonian. The discrete Kohn-Sham problem is solved using the Chebyshev filtered subspace iteration method, which relies on matrix-matrix multiplications of a sparse symmetric Hamiltonian matrix and a dense wavefunction matrix, expressed in the localized Tucker tensor basis. These matrix-matrix multiplications, which constitute the most computationally intensive step of the solution procedure, are GPU accelerated, providing a ∼8-fold GPU-CPU speedup for these operations on the largest systems studied. The computational performance of the TTDFT code is demonstrated through benchmark studies on aluminum nanoparticles and silicon quantum dots with system sizes ranging up to ∼7,000 atoms.
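To make the role of the matrix-matrix multiplications concrete, the following is a minimal sketch of Chebyshev filtered subspace iteration in Python with NumPy. It is not the TTDFT implementation: a small dense symmetric matrix stands in for the sparse Hamiltonian in the Tucker tensor basis, everything runs on the CPU, and the names `chebyshev_filter` and `chefsi`, the filter degree, and the choice of filter bounds are assumptions made for illustration only. The products `H @ Y` inside the three-term recurrence correspond to the Hamiltonian-times-wavefunction-block operations that the abstract identifies as the dominant cost.

```python
import numpy as np


def chebyshev_filter(H, X, degree, a, b):
    """Apply the degree-`degree` Chebyshev polynomial of H to the block X.

    The interval [a, b] (the unwanted part of the spectrum) is mapped to
    [-1, 1], so components with eigenvalues below a are amplified.
    This is the plain (unscaled) three-term recurrence.
    """
    e = (b - a) / 2.0  # half-width of the damped interval
    c = (b + a) / 2.0  # center of the damped interval
    Y_prev = X
    Y = (H @ X - c * X) / e  # matrix-matrix product: Hamiltonian times block
    for _ in range(2, degree + 1):
        Y_next = 2.0 * (H @ Y - c * Y) / e - Y_prev
        Y_prev, Y = Y, Y_next
    return Y


def chefsi(H, n_states, degree=10, n_iter=15, seed=0):
    """Approximate the lowest `n_states` eigenpairs of a symmetric matrix H."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    # Random orthonormal starting block.
    X, _ = np.linalg.qr(rng.standard_normal((n, n_states)))
    # The matrix 1-norm is an upper bound on the largest eigenvalue magnitude.
    b = np.linalg.norm(H, 1)
    for _ in range(n_iter):
        # Rayleigh-Ritz step: diagonalize H projected onto span(X).
        theta, V = np.linalg.eigh(X.T @ H @ X)
        X = X @ V
        # Assumed bound choice: damp everything above the largest Ritz value.
        a = theta[-1]
        X = chebyshev_filter(H, X, degree, a, b)
        # Re-orthonormalize the filtered block.
        X, _ = np.linalg.qr(X)
    theta, V = np.linalg.eigh(X.T @ H @ X)
    return theta, X @ V


# Hypothetical test matrix: well-separated diagonal plus a small
# symmetric perturbation (not a physical Hamiltonian).
rng = np.random.default_rng(1)
n = 40
A = rng.standard_normal((n, n))
H = np.diag(np.arange(1.0, n + 1)) + 0.01 * (A + A.T)

# Compute a few extra states as a buffer; the lowest ones converge fastest.
theta, X = chefsi(H, n_states=6)
print(theta[:4])
```

In this sketch the states nearest the top of the computed block converge most slowly, which is why two extra buffer states are requested beyond the four of interest. A production code would typically use a scaled recurrence and a sparse matrix format for the Hamiltonian.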