Execution time of high-precision BLAS Level 1 operations on Intel Core i5-4590 and NVIDIA Turing RTX 2060

Published: 11-03-2020 | Version 2 | DOI: 10.17632/yrdh6r3sgx.2
Konstantin Isupov,
Vladimir Knyazkov,
Alexander Kuvaev


This dataset contains the execution times of four BLAS Level 1 operations (ASUM, DOT, SCAL, and AXPY) implemented using extended- and multiple-precision software for central processing units (CPUs) and CUDA-compatible graphics processing units (GPUs). The experiments were conducted on an Intel Core i5-4590 processor and an NVIDIA Turing RTX 2060 graphics card. Each raw file contains the results of three test runs. In each test run, the BLAS function was called ten times, and the total execution time of the ten iterations was measured. The complete source code for the tests is available at https://github.com/kisupov/mpres-blas.

Main parameters of the experiments:
• Operation size: 1000000;
• Number of repeats: 10;
• Input data sets were composed of random floating-point numbers in the range from −1 to 1;
• Measurements are in milliseconds.

Experimental environment:
• Intel Core i5-4590 (3.30 GHz, 4 Cores/4 Threads);
• 16 GB DDR3 system memory;
• NVIDIA Turing RTX 2060 GPU (1920 CUDA Cores, Compute Capability 7.5, 6 GB GDDR6 memory);
• Ubuntu 19.10 (development branch);
• GCC compiler version 7.4.0;
• CUDA Toolkit 10.1.105;
• nvcc flags: -O3 -DNDEBUG -use_fast_math -std=c++14 -Xcompiler=-O3,-fopenmp,-ffast-math.

The following software packages, which provide extended- or multiple-precision computations, were evaluated:
• For CPU:
-- MPFR (https://www.mpfr.org);
-- ARPREC (https://www.davidhbailey.com/dhbsoftware);
-- MPDECIMAL (https://www.bytereef.org/mpdecimal);
-- MPACK (http://mplapack.sourceforge.net);
-- XBLAS (https://www.netlib.org/xblas).
• For GPU:
-- GARPREC (https://code.google.com/archive/p/gpuprec/downloads);
-- CAMPARY (http://homepages.laas.fr/mmjoldes/campary);
-- CUMP (https://github.com/skystar0227/CUMP);
-- MPRES-BLAS (https://github.com/kisupov/mpres-blas).
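As a rough illustration of the measurement protocol described above (three test runs, each timing ten consecutive calls of one operation and reporting the total in milliseconds), the following Python sketch may help. It is not taken from the mpres-blas test code. Python's standard decimal module stands in for a multiple-precision library, and the function names (asum, dot, scal, axpy, time_repeats_ms, run_benchmark), the 50-digit precision, and the small default vector length are assumptions chosen only for illustration.

```python
import random
import time
from decimal import Decimal, getcontext


def asum(x):
    """ASUM: sum of the absolute values of the elements of x."""
    return sum((abs(v) for v in x), Decimal(0))


def dot(x, y):
    """DOT: inner product of x and y."""
    return sum((a * b for a, b in zip(x, y)), Decimal(0))


def scal(alpha, x):
    """SCAL: x <- alpha * x, updated in place."""
    for i in range(len(x)):
        x[i] = alpha * x[i]


def axpy(alpha, x, y):
    """AXPY: y <- alpha * x + y, updated in place."""
    for i in range(len(x)):
        y[i] = alpha * x[i] + y[i]


def random_vector(n, rng):
    """n random values in the range from -1 to 1, converted to Decimal."""
    return [Decimal(rng.uniform(-1.0, 1.0)) for _ in range(n)]


def time_repeats_ms(op, repeats=10):
    """Total wall-clock time, in milliseconds, of `repeats` calls of op."""
    start = time.perf_counter()
    for _ in range(repeats):
        op()
    return (time.perf_counter() - start) * 1000.0


def run_benchmark(n=1000, precision=50, runs=3, repeats=10, seed=0):
    """Return {operation name: [total ms for each test run]}.

    Assumption: n defaults to 1000 so the sketch finishes quickly;
    the dataset itself uses an operation size of 1000000.
    """
    getcontext().prec = precision  # working precision in decimal digits
    rng = random.Random(seed)
    x = random_vector(n, rng)
    y = random_vector(n, rng)
    alpha = Decimal(rng.uniform(-1.0, 1.0))
    # SCAL and AXPY overwrite a vector, so each call works on a fresh copy;
    # the cost of that copy is included in the measured time.
    ops = {
        "ASUM": lambda: asum(x),
        "DOT": lambda: dot(x, y),
        "SCAL": lambda: scal(alpha, list(x)),
        "AXPY": lambda: axpy(alpha, x, list(y)),
    }
    return {
        name: [time_repeats_ms(op, repeats) for _ in range(runs)]
        for name, op in ops.items()
    }


if __name__ == "__main__":
    for name, totals in run_benchmark().items():
        print(name, ["%.3f ms" % t for t in totals])
```

Calling run_benchmark(n=1000000) would match the operation size used in the dataset, at the cost of a much longer run in pure Python. Timings produced by this sketch are not comparable to the values in the raw files, which come from compiled CPU and CUDA libraries.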