Execution time of double-precision and high-precision SYRk implementations on Intel Core i5-7500 and NVIDIA Turing RTX 2080

Name: Execution time of double-precision and high-precision SYRk implementations on Intel Core i5-7500 and NVIDIA Turing RTX 2080
Creator: Konstantin Isupov
Published: 2022-12-19T18:19:45.480Z
Keywords: High Performance Computing, Graphics Processor, Multiple Precision Arithmetic, Computer Arithmetic, Basic Linear Algebra

doi:10.17632/9hksh8srdh.1

Published: 19 December 2022 | Version 1 | DOI: 10.17632/9hksh8srdh.1

Contributor: Konstantin Isupov

Description

This dataset contains execution times for symmetric rank-k update kernels (SYRk, BLAS Level 3) implemented using existing double-precision linear algebra software as well as multiple-precision libraries for CPU and GPU. The operation is C = α * op(A) * op(A^T) + β * C, where α and β are scalars, C is a symmetric matrix, A is a general matrix, and op(A) is either A or A^T. Here op(A) is N-by-K. Each raw file provided contains the results of three test runs in milliseconds. The complete source code for the tests can be found at https://github.com/kisupov/mpres-blas.

Common experiment settings:
• Dense, random, 1000-by-1000 matrices A and C;
• Only the upper triangular part of matrix C was used;
• Random scalars α and β;
• Measurements are in milliseconds;
• Arithmetic precision from 106 to 424 bits.

Test cases considered:
• Non-transposed: op(A) = A, op(A^T) = A^T;
• Transposed: op(A) = A^T, op(A^T) = A.

Experimental environment:
• Intel Core i5-7500 processor;
• 32GB of DDR4 system memory;
• NVIDIA Turing RTX 2080 GPU (2944 CUDA Cores, Compute Capability 7.5, 8GB of GDDR6 memory);
• Ubuntu 20.04.5 LTS;
• NVIDIA Driver V455.32.00;
• CUDA Toolkit V11.1.
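The SYRk operation described above can be sketched in a few lines of plain Python. This is only an illustrative reference, not the code used to produce the dataset: the function name `syrk_upper`, its argument order, and the list-of-lists matrix layout are assumptions made for this example. It updates only the upper triangle of C, as in the experiments, and covers both the non-transposed and transposed test cases.

```python
def syrk_upper(alpha, a, beta, c, trans=False):
    """Return a copy of C whose upper triangle holds
    alpha * op(A) * op(A)^T + beta * C (hypothetical helper)."""
    # op(A) is N-by-K: A itself when trans is False, A^T when trans is True.
    op_a = [list(col) for col in zip(*a)] if trans else a
    n = len(op_a)
    k = len(op_a[0]) if n else 0
    if len(c) != n or any(len(row) != n for row in c):
        raise ValueError("C must be N-by-N, where op(A) is N-by-K")
    out = [row[:] for row in c]
    for i in range(n):
        for j in range(i, n):  # upper triangle only; lower part is left as-is
            dot = sum(op_a[i][p] * op_a[j][p] for p in range(k))
            out[i][j] = alpha * dot + beta * c[i][j]
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    c = [[1.0, 1.0], [1.0, 1.0]]
    print(syrk_upper(2.0, a, 3.0, c))              # non-transposed case
    print(syrk_upper(2.0, a, 3.0, c, trans=True))  # transposed case
```

Because C is symmetric, the strictly lower triangle carries no extra information, which is why a kernel can skip it and roughly halve the arithmetic compared with a general matrix product.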
The following SYRk implementations are evaluated:
• OpenBLAS (OpenMP, 53 bits) – double-precision implementation for CPU using OpenBLAS (https://github.com/xianyi/OpenBLAS);
• Custom double on CPU (OpenMP, 53 bits) – custom double-precision parallel (OpenMP) implementation;
• MPFR (OpenMP) – multiple-precision parallel implementation using the GNU MPFR Library for CPU (https://www.mpfr.org/);
• cuBLAS (53 bits) – double-precision implementation for CUDA using the NVIDIA Basic Linear Algebra Subroutines library (https://docs.nvidia.com/cuda/cublas/index.html);
• Custom double on GPU (53 bits) – custom double-precision CUDA implementation;
• MPRES-BLAS – multiple-precision CUDA implementation using MPRES-BLAS library (https://github.com/kisupov/mpres-blas);
• CAMPARY – multiple-precision CUDA implementation using CAMPARY library (https://homepages.laas.fr/mmjoldes/campary/).
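To put the quoted precisions in perspective (53 bits for the double-precision implementations, 106 to 424 bits for the multiple-precision ones), the sketch below converts a binary precision to an approximate number of decimal digits and shows a cancellation that double precision loses. The function names are hypothetical, and Python's `decimal` module is used only as a stand-in: it works in decimal digits rather than bits, so it approximates the 106-bit setting and is not the arithmetic used by MPFR, MPRES-BLAS, or CAMPARY.

```python
import math
from decimal import Decimal, localcontext


def decimal_digits(bits):
    """Approximate decimal digits carried by a significand of `bits` bits."""
    return math.floor(bits * math.log10(2))


def cancellation_in_double():
    """(1e16 + 1) - 1e16 in IEEE 754 double precision (53 bits)."""
    return 1e16 + 1.0 - 1e16


def cancellation_in_decimal(digits):
    """The same expression with `digits` decimal digits of working precision."""
    with localcontext() as ctx:
        ctx.prec = digits
        big = Decimal(10) ** 16
        return big + 1 - big


if __name__ == "__main__":
    for bits in (53, 106, 424):
        print(bits, "bits ~", decimal_digits(bits), "decimal digits")
    print("double:", cancellation_in_double())
    print("higher precision:", cancellation_in_decimal(decimal_digits(106)))
```

The double-precision result drops the added 1 because 1e16 exceeds 2^53, while the higher-precision context keeps it; accumulated over the dot products inside SYRk, this kind of loss is what the multiple-precision implementations trade execution time to avoid.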
