DUGKS-GPU: An efficient parallel GPU code for 3D turbulent flow simulations using Discrete Unified Gas Kinetic Scheme

Published: 7 May 2024 | Version 1 | DOI: 10.17632/yykv5s9g2n.1


This paper presents a parallel implementation of the Discrete Unified Gas Kinetic Scheme (DUGKS) on GPUs using the CUDA Fortran and CUDA C++ programming languages. First, we extensively revised our original CPU-based code, reducing its memory usage threefold. The new implementation is also paired with a novel approach that computes the cell-face flux using trilinear interpolation. We show analytically that this interpolation-based flux calculation is more accurate than the approach used in the original DUGKS. Initial simulation results suggest that trilinear interpolation can reduce numerical errors on a coarse mesh. For example, for the decaying Taylor-Green vortex flow at a 128³ mesh resolution, the relative numerical error in the energy dissipation rate at t*, measured against a spectral simulation as the benchmark, is approximately 30% lower than that of the original implementation. The improved GPU DUGKS method is applied to laminar and turbulent flows in both periodic and wall-bounded configurations. The performance of the GPU implementation is also compared with that of the previous CPU implementation: a maximum speedup of 7.64x was achieved on a desktop-level GPU relative to a 32-core CPU. A strong scaling test on an eight-GPU node demonstrated that the code makes efficient use of multiple GPUs.
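The interpolation-based cell-face flux mentioned above can be illustrated with a small sketch. This is not the authors' CUDA Fortran/C++ code; it is a minimal pure-Python illustration under stated assumptions: a uniform periodic grid with spacing dx and cell-centred values, and, for a single discrete velocity xi, the face value at the half time step obtained by tracing back along the characteristic from the face centre x_b to x_b - xi*dt/2 and trilinearly interpolating from the eight surrounding cell centres. The helper names `trilinear_interp` and `face_value` are hypothetical.

```python
import math


def trilinear_interp(field, point, dx):
    """Trilinearly interpolate a cell-centred scalar field at `point`.

    Assumptions (illustrative only): uniform spacing `dx` in x, y and z,
    cell i centred at (i + 0.5) * dx, periodic boundaries, and `field`
    stored as a nested list indexed field[i][j][k].
    """
    n = (len(field), len(field[0]), len(field[0][0]))
    base, weight = [], []
    for d in range(3):
        # Convert the physical coordinate to cell-centre index coordinates.
        s = point[d] / dx - 0.5
        i0 = math.floor(s)
        base.append(i0)
        weight.append(s - i0)  # fractional distance from the lower centre
    value = 0.0
    # Sum contributions of the 8 surrounding cell centres.
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                w = ((weight[0] if a else 1.0 - weight[0])
                     * (weight[1] if b else 1.0 - weight[1])
                     * (weight[2] if c else 1.0 - weight[2]))
                # Python's % wraps negative indices, giving periodicity.
                i = (base[0] + a) % n[0]
                j = (base[1] + b) % n[1]
                k = (base[2] + c) % n[2]
                value += w * field[i][j][k]
    return value


def face_value(field, face_centre, xi, dt, dx):
    """Hypothetical helper: sample the distribution for one discrete
    velocity `xi` at the characteristic foot point x_b - xi * dt / 2."""
    foot = [face_centre[d] - 0.5 * dt * xi[d] for d in range(3)]
    return trilinear_interp(field, foot, dx)


if __name__ == "__main__":
    n, dx = 4, 1.0
    # A linear test field f = x + 2y + 3z sampled at cell centres.
    field = [[[(i + 0.5) + 2 * (j + 0.5) + 3 * (k + 0.5)
               for k in range(n)] for j in range(n)] for i in range(n)]
    # x-face between cells 1 and 2, traced back with xi = (1, 0, 0).
    print(face_value(field, (2.0, 1.5, 1.5), (1.0, 0.0, 0.0), 0.4, dx))
```

Trilinear interpolation reproduces any linear field exactly, so the linear test field above is a convenient sanity check: the foot point is (1.8, 1.5, 1.5), where f = 1.8 + 3.0 + 4.5 = 9.3. In a GPU code, each (face, discrete velocity) pair would be an independent unit of work, which is what makes this step well suited to parallelization.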



Computational Physics, Turbulent Flow