Parallel time integration using Batched BLAS (Basic Linear Algebra Subprograms) routines
We present an approach for integrating the time evolution of quantum systems. We leverage the computational power of graphics processing units (GPUs) to perform the integration of all time steps in parallel. The performance gain is most pronounced for small to medium-sized quantum systems. The algorithm can largely be implemented using the recently specified batched versions of the BLAS routines and can therefore be ported easily to a variety of platforms. Our implementation, PARAllelized Matrix Exponentiation for Numerical Time evolution (PARAMENT), runs on CUDA-enabled GPUs.
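To make the idea of integrating all time steps in parallel concrete, the following is a minimal sketch and not the PARAMENT implementation. It assumes a piecewise-constant Hamiltonian of the form H_k = H0 + u_k H1 (an assumption for illustration), uses NumPy's broadcasting matrix product on the CPU as a stand-in for batched BLAS matrix multiplication on a GPU, and approximates the exponentials with a plain scaled Taylor series. The function names and the parameters `order` and `squarings` are hypothetical.

```python
import numpy as np

def batched_expm_taylor(A, order=12, squarings=4):
    """Approximate exp(A[k]) for every k at once; A has shape (N, d, d).

    Scaling-and-squaring with a truncated Taylor series. Every matrix
    product below acts on the whole batch, which is the role a batched
    GEMM would play on a GPU.
    """
    A = A / (2 ** squarings)
    d = A.shape[-1]
    result = np.broadcast_to(np.eye(d, dtype=complex), A.shape).copy()
    term = result.copy()
    for n in range(1, order + 1):
        term = term @ A / n          # batched product: A^n / n!
        result = result + term
    for _ in range(squarings):
        result = result @ result     # undo the scaling
    return result

def ordered_product(U):
    """Return U[N-1] @ ... @ U[1] @ U[0] by pairwise (tree) reduction."""
    while U.shape[0] > 1:
        if U.shape[0] % 2:           # pad odd batches with the identity
            U = np.concatenate([U, np.eye(U.shape[-1], dtype=U.dtype)[None]])
        U = U[1::2] @ U[0::2]        # later step multiplies from the left
    return U[0]

def propagate(H0, H1, u, dt):
    """Total propagator for H_k = H0 + u[k] * H1, held constant over each dt."""
    H = H0[None] + u[:, None, None] * H1[None]   # shape (N, d, d)
    U_steps = batched_expm_taylor(-1j * dt * H)  # all steps at once
    return ordered_product(U_steps)

# Example: a driven two-level system (hypothetical parameters)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
u = np.sin(np.linspace(0.0, 3.0, 101))
U = propagate(sz, sx, u, dt=0.01)
```

In this sketch the step propagators are independent of one another, so they can be computed as one batch, and the ordered product needs only about log2(N) rounds of batched multiplications rather than N sequential ones.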