- kalypsso: A performance portable platform for compressible hydrodynamics simulations using adaptive mesh refinement
  We introduce kalypsso (a Kokkos Applicative LaYer for Parallel and Scalable Solvers on Octrees): a new octree-based block-structured adaptive mesh refinement (AMR) framework using the C++ kokkos library for designing performance portable applications in computational fluid dynamics (CFD). Mesh management in distributed memory is implemented with the help of the p4est library, which provides an MPI-parallel CPU implementation of the forest-of-octrees AMR algorithms. All heavyweight application data are allocated on a computing device, either a CPU or a GPU, and managed directly by kalypsso. One of the key design choices of the kalypsso architecture is to use a lightweight hash-table-based (or dictionary) data structure for exchanging mesh geometry information between p4est, running on the host CPU, and the computational kernels executed on the accelerator device. Several finite volume methods for compressible monofluid and bifluid hydrodynamics, as well as magnetohydrodynamics, are implemented using the kokkos programming model for exploiting shared memory parallelism on most existing CPU and GPU-based architectures. Node-level performance metrics for a second-order MUSCL-Hancock finite volume solver are measured to evaluate the impact of the size of the grid of cells attached to octree leaves. A single Nvidia GH200 GPU can perform about 1.4 billion cell updates per second. Performance portability across CPU and GPU clusters is demonstrated; a node-to-node weak scaling efficiency of ∼80% is obtained on a cluster of 512 Nvidia GH200 GPUs. Using comparable hardware resources and considering the Euler equation solver in kalypsso with AMR activated, a consistent ×5 CPU-to-GPU time-to-solution speed-up is obtained.
- Crystal Dislocation Generator (CryDisGen): A versatile toolkit to create general dislocation structures in crystals
  The construction of atomic models featuring realistic initial dislocation structures poses a critical challenge for molecular dynamics (MD) simulations. Most existing methods are limited to generating dislocations with simple geometrical morphologies, restricting the generalization of atomic simulations. In this study, we present CryDisGen, a versatile and robust software toolkit designed to create atomic models with arbitrary dislocation configurations. Based on the displacement field of a dislocation derived from the classical Burgers model, CryDisGen can effectively handle dislocations of complex morphologies. The versatility and effectiveness of CryDisGen have been demonstrated through the successful construction of a variety of representative dislocation structures in face-centered cubic (FCC) and body-centered cubic (BCC) crystals. CryDisGen provides a powerful and flexible framework for generating dislocations in crystalline materials, facilitating atomic modelling with realistic dislocation structures in MD simulations.
- κALDo 2.0: Scalable thermal transport from first principles and machine learning potentials
  We introduce κALDo 2.0, an open-source Python package for computing vibrational, elastic, and thermal transport properties of crystalline and disordered solids from first principles and machine-learned interatomic potentials. Building on the anharmonic lattice dynamics (ALD) framework, κALDo 2.0 provides efficient CPU and GPU-accelerated implementations of the Boltzmann transport equation (BTE) for crystals and the quasi-harmonic Green-Kubo (QHGK) method. The QHGK formalism extends thermal transport predictions beyond translationally-invariant crystals to materials lacking long-range order, including glasses, alloys, and complex nanostructures. κALDo 2.0 introduces native integration with modern machine-learned potentials (MLPs), enabling thermal transport workflows that combine the accuracy of first-principles methods with the scalability of classical force fields. It also features comprehensive support for temperature-dependent effective potentials (TDEP) workflows, flexible storage backends for large-scale calculations, and advanced quantification of anharmonicity. The software seamlessly interfaces with electronic structure codes (Quantum ESPRESSO, VASP), molecular dynamics packages (LAMMPS), and state-of-the-art MLPs (ACE, NEP, MACE, MatterSim, Orb), enabling thermal transport studies from 0 K to finite temperatures. κALDo 2.0 implements multiple BTE solution strategies (relaxation time approximation, self-consistent iteration, full matrix inversion, and eigendecomposition) and supports essential physical corrections, including isotopic scattering and non-analytical terms for polar materials. A modular Python architecture with lazy evaluation and multiple storage formats (formatted text, NumPy, HDF5) enables simulations of systems containing more than 10,000 atoms.
  This paper describes the theoretical framework, implementation details, software architecture, and validation examples demonstrating κALDo 2.0’s capabilities for studying complex materials, including halide perovskites with strong anharmonicity and polar oxides requiring long-range electrostatic corrections.
- MF-toolkit: A high-performance Python library for multifractal analysis with automated crossover detection, source identification and application to gravitational waves data
  Multifractal Detrended Fluctuation Analysis (MFDFA) is a powerful and widely used technique for characterizing the scaling properties and long-range correlations of complex time series. However, its application often involves significant practical challenges, such as the subjective identification of scaling regions (crossovers) and the disambiguation of the physical origins of multifractality. We introduce MF-toolkit, a high-performance, parallelized Python library designed to address these challenges. It integrates three key innovations: (1) fully automatic crossover detection algorithms (CDV-A and SPIC), which remove operator bias and enhance reproducibility; (2) a built-in implementation of the Iterative Amplitude Adjusted Fourier Transform (IAAFT) for generating surrogate data, enabling the robust identification of the source of multifractality; and (3) a comprehensive suite for generating synthetic time series for rigorous validation. We demonstrate the rigor and utility of MF-toolkit through its application to characterize the multifractal properties of non-stationary noise in gravitational wave (LIGO) data. The MF-toolkit library offers a robust, efficient, and user-friendly tool for advanced time series analysis, facilitating more rigorous and reproducible research across physics and other data-intensive fields.
- Computation of the dynamics of rotating 2D/3D Gross-Pitaevskii equations based on the HPC pseudo-spectral solver BEC2HPC
  The aim of this paper is to present the extension of the HPC pseudo-spectral solver BEC2HPC to compute the dynamics of 2D/3D rotating Bose-Einstein condensates modeled by the Gross-Pitaevskii equations with a rotation term. Numerical examples are provided to show the efficiency of the solver for large-scale simulations.
- A GPU-accelerated matrix-free FAS multigrid solver for Navier-Stokes equations with memory-efficient implementations
  We develop a matrix-free Full Approximation Storage (FAS) multigrid solver based on staggered finite differences and implemented in MATLAB with GPU acceleration. To improve single-GPU efficiency, intermediate variables are reused and an X-shape Multi-Color Gauss–Seidel (X-MCGS) smoother is introduced. In the present MATLAB GPU setting, this parity-based multicolor organization enables regular vectorized updates and avoids the masking-style implementation that is less convenient for GPU array execution. Restriction and prolongation operators are also implemented in MATLAB GPU arrays. Algebraic and asymptotic convergence tests verify the solver’s robustness and accuracy, while benchmark studies on large-scale problems show effective multigrid execution on large grids. To overcome GPU memory limitations, we further design memory-efficient implementations of first- and second-order projection schemes for the Navier–Stokes equations using a dynamic reuse strategy, which reduces GPU-resident variables from 12 (first-order scheme) and 15 (second-order scheme) to only 8, lowering memory footprint and improving performance by 20–30%. This enables 512^3 Navier–Stokes computations on a single RTX 4090, where classical implementations exceed device memory. The applicability of the solver is further demonstrated through large-scale simulations. Grain growth simulations on a 512^2 grid accommodate up to q = 1189 orientations in 2D and q = 123 in 3D, with fitted growth exponents reproducing the expected scaling laws. Moreover, the memory-efficient Navier–Stokes implementations, coupled with the Cahn–Hilliard equations, enable air–water two-bubble coalescence simulations on a 256 × 256 × 1024 grid using a single RTX 4090 GPU, yielding results in close agreement with experimental observations.
- TNL-SPH: Open-source modular SPH solver for modern computing platforms based on GPU accelerators
  TNL-SPH is a novel open-source implementation of Smoothed Particle Hydrodynamics (SPH), integrated as a submodule of the Template Numerical Library (TNL), designed for modern distributed computing platforms with GPU accelerators. Focusing on hydrodynamic problems using weakly compressible formulations, TNL-SPH offers a modular, high-performance framework for implementing various SPH schemes and particle methods. Developed in C++17, it leverages TNL’s templated vectors and expression templates to provide compact, algebraic representations of numerical schemes, simplifying the development of complex physical models. Its multilayered design separates parallelism, performance, and numerical methods, enabling seamless execution across diverse hardware, including multi-GPU and distributed systems. TNL-SPH outperforms existing SPH codes for free-surface flows and supports scalable, high-performance simulations. This paper presents the design, implementation, and performance of TNL-SPH, alongside its applications in hydraulic problems, demonstrating its versatility and efficiency for scientific and engineering computations.
- GollumFit: An IceCube open-source framework for binned-likelihood neutrino telescope analyses
  We present GollumFit, a framework designed for performing binned-likelihood analyses on neutrino telescope data. GollumFit incorporates model parameters common to any neutrino telescope and also model parameters specific to the IceCube Neutrino Observatory. We provide a high-level overview of its key features and how the code is organized. We then discuss the performance of the fitting in a typical analysis scenario, highlighting the ability to fit over tens of nuisance parameters. We present some examples showing how to use the package for likelihood minimization tasks. This framework uniquely incorporates the particular model parameters necessary for neutrino telescopes, and solves an associated likelihood problem in a time-efficient manner.
- H-NESSi: The hierarchical non-equilibrium systems simulation package
  We present H-NESSi (The Hierarchical Non-Equilibrium Systems Simulation package), an open-source software package for solving the Kadanoff-Baym equations (KBE) of nonequilibrium Green’s function (NEGF) theory using hierarchical low-rank compression techniques. The simulation of correlated quantum systems out of equilibrium is severely limited by the cubic scaling in propagation time and quadratic memory growth associated with conventional two-time formulations. H-NESSi overcomes these limitations by combining high-order time-stepping schemes with hierarchical off-diagonal low-rank (HODLR) representations of the retarded and lesser Green’s functions, enabling controllable accuracy at substantially reduced computational cost and memory usage. Imaginary-time quantities are efficiently represented using the discrete Lehmann representation (DLR), enabling compact and accurate treatment of thermal initial states. The implementation supports multiorbital systems, adaptive singular value truncation, and both shared-memory (OpenMP) and distributed-memory (MPI) parallelization strategies suitable for large-scale lattice calculations. The workflow closely mirrors established NEGF frameworks while introducing compression transparently into the propagation procedure. Benchmark applications to driven superconductors within dynamical mean-field theory and to the two-dimensional Hubbard model demonstrate favorable scaling compared to conventional implementations, with asymptotic time complexity significantly below the cubic scaling of uncompressed approaches. H-NESSi thus enables long-time and large-system nonequilibrium simulations of correlated quantum materials, which were previously computationally prohibitive.
- ChemNetworks: New capabilities for high-throughput, real-time chemical graph construction and analysis
  A major revision of the ChemNetworks software (originally published in the Journal of Computational Chemistry, 2014, 35, 495–505) is presented. While the original ChemNetworks provided foundational graph construction capabilities for chemical systems, it was limited to simple distance and 3-body angular edge criteria and was not designed for high-performance computing environments or for real-time operation alongside running simulations. This release addresses these limitations through three core contributions. First, a recursive Z-matrix-based search algorithm is introduced that enables chemically intuitive, arbitrarily descriptive three-dimensional structure searches, supporting geometric, energetic, and logical criteria. Second, the DataSpaces data staging framework is incorporated as an optional I/O engine, enabling in-memory data exchange between ChemNetworks and running simulations that eliminates persistent storage bottlenecks and supports real-time graph construction and analysis. Third, a modular analysis framework leveraging the igraph library is introduced, providing a straightforward plugin architecture for community-contributed workflows. Benchmark results demonstrate linear scaling with system size and efficient MPI parallelization across up to 64 cores, with total computational complexity of O(N^R T/P), where N is the number of atoms, R is the Z-matrix depth, T is the number of timesteps, and P is the number of MPI processes.
