PittPack: An open-source Poisson’s equation solver for extreme-scale computing with accelerators
We present a parallel implementation of a direct solver for Poisson’s equation on extreme-scale supercomputers with accelerators. We introduce a chunked-pencil decomposition as the domain-decomposition strategy for distributing work among processing elements, which improves scalability at high accelerator counts. Chunked-pencil decomposition enables overlapping MPI communication with data transfer between the central processing units (CPUs) and the graphics processing units (GPUs). It also enables contiguous message transfer among the nodes and improves data locality by keeping neighboring elements in adjacent memory locations, while permitting the use of shared memory for certain segments of the algorithm when possible. We study two different communication patterns within the chunked-pencil decomposition. The first pattern fully overlaps communication with data transfer and aims to speed up the overall turnaround time. The second pattern concentrates on low memory usage and is more network friendly than the first pattern for computations at extreme scale. In our parallel implementation, we interleave OpenACC with MPI to support computations on the GPU or the CPU. The numerical solution and its formal second order of accuracy are verified using the method of manufactured solutions for various combinations of boundary conditions. Additionally, we used PittPack within an incompressible flow solver to further validate its accuracy and to demonstrate its versatility as a software package. We performed weak scaling analysis with up to 1.1 trillion Cartesian mesh points distributed over 16384 GPUs on a petascale leadership-class supercomputer.
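
The verification step mentioned above can be illustrated with a minimal sketch of the method of manufactured solutions. It is a one-dimensional toy problem, not PittPack's discretization, decomposition, or API. The assumptions are ours: the manufactured solution u(x) = sin(pi x) on [0, 1] with homogeneous Dirichlet boundaries, a node-centered three-point stencil, and a serial Thomas tridiagonal solve standing in for the direct solver. The function names `max_error` and `observed_order` are hypothetical. Halving the mesh spacing should reduce the error by roughly a factor of four, which corresponds to an observed order close to two.

```python
import math


def max_error(n):
    """Solve u'' = f on [0, 1] with u(0) = u(1) = 0 using n interior points,
    and return the max-norm error against the manufactured solution
    u(x) = sin(pi x), for which f(x) = -pi^2 sin(pi x)."""
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]

    # Three-point stencil: u[i-1] - 2 u[i] + u[i+1] = h^2 f(x_i)
    a, b, c = 1.0, -2.0, 1.0
    d = [h * h * (-math.pi ** 2) * math.sin(math.pi * xi) for xi in x]

    # Thomas algorithm, forward elimination
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c / b
    dp[0] = d[0] / b
    for i in range(1, n):
        denom = b - a * cp[i - 1]
        cp[i] = c / denom
        dp[i] = (d[i] - a * dp[i - 1]) / denom

    # Back substitution
    u = [0.0] * n
    u[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]

    return max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))


def observed_order(n_coarse):
    """Estimate the convergence order from two meshes whose spacing differs
    by a factor of two: n_coarse and 2 * n_coarse + 1 interior points."""
    e_coarse = max_error(n_coarse)
    e_fine = max_error(2 * n_coarse + 1)
    return math.log2(e_coarse / e_fine)


if __name__ == "__main__":
    for n in (31, 63, 127):
        print(n, max_error(n), observed_order(n))
```

The same procedure carries over to three dimensions and to other boundary-condition combinations by choosing a manufactured solution that satisfies the desired boundary conditions and deriving the source term analytically.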