Massive-scale simulations of 2D Ising and Blume-Capel models on rack-scale multi-GPU systems
Description
We present high-performance implementations of the two-dimensional Ising and Blume-Capel models for large-scale, multi-GPU simulations. Our approach targets the NVIDIA GB200 NVL72 system, which connects up to 72 GPUs via high-bandwidth NVLink and enables direct GPU-to-GPU memory access across multiple nodes. Using Fabric Memory and an optimized Monte Carlo kernel for the Ising model, our implementation supports simulations of systems with linear sizes up to L = 2^23, corresponding to 2^46, or approximately 70 trillion, spins. It reaches a peak processing rate of nearly 1.15 × 10^5 lattice updates per nanosecond, setting a new performance benchmark for Ising model simulations. Additionally, we introduce a custom protocol for computing correlation functions that balances computational cost against statistical accuracy, so that large-scale simulations can be run without prohibitive runtime costs. Benchmark results show near-perfect strong and weak scaling up to 64 GPUs, demonstrating the effectiveness of our approach for large-scale statistical physics simulations.
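The quoted figures can be cross-checked with a short back-of-envelope calculation. The Python sketch below is illustrative only and is not part of the described implementation. All function and constant names are hypothetical. The storage estimate assumes one bit per Ising spin, which the description does not state. The sweep time assumes the peak rate is sustained and that one sweep updates every spin once.

```python
# Back-of-envelope check of the lattice size and throughput quoted above.
# All names here are hypothetical; nothing is taken from the actual code base.

LINEAR_SIZE = 2 ** 23          # L, largest linear size quoted in the description
PEAK_UPDATES_PER_NS = 1.15e5   # quoted peak rate, lattice updates per nanosecond
BITS_PER_SPIN = 1              # ASSUMPTION: one bit per Ising spin (not stated in the source)


def total_spins(linear_size: int) -> int:
    """Number of sites on an L x L square lattice."""
    return linear_size * linear_size


def sweep_time_seconds(n_spins: int, updates_per_ns: float) -> float:
    """Time for one full sweep (one update per spin) at a sustained rate."""
    return n_spins / updates_per_ns * 1e-9


def lattice_bytes(n_spins: int, bits_per_spin: int) -> int:
    """Raw storage for the spin configuration alone, in bytes."""
    return n_spins * bits_per_spin // 8


n_spins = total_spins(LINEAR_SIZE)
sweep_s = sweep_time_seconds(n_spins, PEAK_UPDATES_PER_NS)
storage_tib = lattice_bytes(n_spins, BITS_PER_SPIN) / 2 ** 40

print(f"spins per lattice : {n_spins:.3e}")
print(f"time per sweep    : {sweep_s:.3f} s at peak rate")
print(f"spin storage      : {storage_tib:.1f} TiB at {BITS_PER_SPIN} bit/spin")
```

Under these assumptions the lattice holds 2^46 ≈ 7.04 × 10^13 spins, one full sweep at the peak rate takes roughly 0.6 s, and the spin configuration alone occupies 8 TiB. The Blume-Capel model has three states per site, so it would need at least two bits per spin and a correspondingly larger footprint.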