Execution time of double-precision and high-precision GER implementations on Intel Core i5-7500 and NVIDIA Turing RTX 2080
This dataset contains the execution time for general rank-one update kernels (GER, BLAS Level 2) implemented using existing double-precision linear algebra software as well as multiple-precision libraries for CPU and GPU. Each raw file provided contains the results of three test runs in milliseconds. For each test run, the GER kernel was repeated ten times, the total execution time of ten iterations was measured, and then the average was calculated. The complete source code for the tests can be found at https://github.com/kisupov/mpres-blas.

Common experiment settings:
• Dense, random, 5000-by-5000 general matrix;
• Measurements are in milliseconds;
• Arithmetic precision from 106 to 424 bits.

Experimental environment:
• Intel Core i5-7500 processor;
• 32 GB of DDR4 system memory;
• NVIDIA Turing RTX 2080 GPU (2944 CUDA Cores, Compute Capability 7.5, 8 GB of GDDR6 memory);
• Ubuntu 20.04.5 LTS;
• NVIDIA Driver V455.32.00;
• CUDA Toolkit V11.1.

The following GER implementations are evaluated:
• OpenBLAS (OpenMP, 53 bits) – double-precision implementation for CPU using OpenBLAS (https://github.com/xianyi/OpenBLAS);
• Custom double on CPU (OpenMP, 53 bits) – custom double-precision parallel (OpenMP) implementation;
• MPFR (OpenMP) – multiple-precision parallel implementation using the GNU MPFR Library for CPU (https://www.mpfr.org/);
• cuBLAS (53 bits) – double-precision implementation for CUDA using the NVIDIA Basic Linear Algebra Subroutines library (https://docs.nvidia.com/cuda/cublas/index.html);
• Custom double on GPU (53 bits) – custom double-precision CUDA implementation;
• MPRES-BLAS – multiple-precision CUDA implementation using MPRES-BLAS library (https://github.com/kisupov/mpres-blas);
• CAMPARY – multiple-precision CUDA implementation using CAMPARY library (https://homepages.laas.fr/mmjoldes/campary/).
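
For readers unfamiliar with the kernel, the following is a minimal Python sketch of the GER operation (A := alpha * x * y^T + A) and of the measurement scheme described above: three test runs, each timing ten back-to-back kernel calls. It is not the code used to produce the dataset (see the repository above for that). The function names `ger` and `average_time_ms`, the small matrix size, and the reading of "average" as the total time divided by the number of repetitions are assumptions made for illustration.

```python
import random
import time


def ger(alpha, x, y, a):
    """In-place rank-one update A := alpha * x * y^T + A.

    `a` is a list of rows (m rows of n floats), `x` has m entries,
    `y` has n entries. Plain double-precision reference, no parallelism.
    """
    for i, xi in enumerate(x):
        row = a[i]
        scaled = alpha * xi
        for j, yj in enumerate(y):
            row[j] += scaled * yj
    return a


def average_time_ms(kernel, repeats=10):
    """Time `repeats` consecutive calls; return the mean per call in ms.

    Assumption: the average is the total elapsed time divided by `repeats`.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        kernel()
    total_seconds = time.perf_counter() - start
    return total_seconds * 1000.0 / repeats


# Tiny correctness check on a 2-by-3 matrix.
small = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
ger(2.0, [1.0, 2.0], [3.0, 4.0, 5.0], small)
print(small)  # [[6.0, 8.0, 10.0], [12.0, 16.0, 20.0]]

# Three test runs on a dense random matrix (200-by-200 here to keep the
# pure-Python demo fast; the dataset itself uses 5000-by-5000).
random.seed(0)
n = 200
x = [random.random() for _ in range(n)]
y = [random.random() for _ in range(n)]
a = [[random.random() for _ in range(n)] for _ in range(n)]
runs = [average_time_ms(lambda: ger(1.0, x, y, a)) for _ in range(3)]
print(len(runs))  # 3 averaged timings, one per test run
```

The three values in `runs` correspond to the three per-run results stored in each raw file; absolute numbers from this sketch are not comparable to the dataset, since it is an unoptimized interpreter-level loop.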