New MaX GPU strategy speeds up plane-waves DFT for metals with dense k-point sampling, improving simulations for small-unit-cell systems.
Researchers from MaX CoE, Xuejun Gong and Andrea Dal Corso, have developed an alternative GPU acceleration method for plane-wave pseudopotential electronic-structure codes. The work targets a specific but important challenge: metallic systems with small unit cells that require a very large number of k-points to sample the Brillouin zone accurately, as is common in phonon and thermodynamic studies.
The novelty lies in processing the wavefunctions of many k-points in parallel directly on the GPU, with each GPU thread handling one wavefunction or part of one. This is achieved through CUDA Fortran GLOBAL and DEVICE routines, adapted FFT and LAPACK functions, and a workflow organized to make full use of the available GPU memory. For the systems it targets, the result is a significant performance improvement over both CPU-only runs and existing GPU-accelerated approaches.
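To make the "one thread per wavefunction" idea concrete, here is a minimal CPU-side sketch under stated assumptions. It is a hypothetical toy, not thermo_pw code: the function name `apply_kinetic_many_k` is invented for illustration, only the kinetic term 0.5|k+G|^2 (Hartree atomic units) is applied, the pseudopotential terms and FFTs are omitted, and a single G-vector set is shared by all k-points for simplicity. The point it shows is the flat index space spanning every (k-point, band) pair, which a GPU launch would distribute over threads.

```python
# Hypothetical toy sketch, not thermo_pw code: emulates, on the CPU, a GPU
# launch in which each "thread" owns one (k-point, band) wavefunction.
# Only the kinetic term 0.5*|k+G|^2 (Hartree atomic units) is applied; the
# local and nonlocal pseudopotential terms and the FFTs are omitted.

def apply_kinetic_many_k(kpts, gvecs, psi):
    """Return T|psi> for every k-point and band.

    kpts:  list of k vectors (kx, ky, kz)
    gvecs: list of G vectors shared by all k-points (a simplification;
           real plane-wave codes use a k-dependent G-vector set)
    psi:   psi[ik][ib] is a list of coefficients, one per G vector
    """
    nks = len(kpts)
    nbnd = len(psi[0])
    hpsi = [[None] * nbnd for _ in range(nks)]
    # Flat index space: one index per wavefunction across ALL k-points.
    # This loop runs serially here; on a GPU each iteration would be a thread.
    for tid in range(nks * nbnd):
        ik, ib = divmod(tid, nbnd)  # map thread id -> (k-point, band)
        k = kpts[ik]
        out = []
        for ig, g in enumerate(gvecs):
            kg2 = sum((k[d] + g[d]) ** 2 for d in range(3))  # |k+G|^2
            out.append(0.5 * kg2 * psi[ik][ib][ig])
        hpsi[ik][ib] = out
    return hpsi


if __name__ == "__main__":
    kpts = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
    gvecs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    psi = [[[1.0, 1.0] for _ in range(2)] for _ in kpts]  # 2 bands per k
    print(apply_kinetic_many_k(kpts, gvecs, psi)[1][1])  # [0.125, 1.125]
```

Because the index space grows with the number of k-points rather than with the size of any single k-point problem, a dense Brillouin-zone mesh supplies enough independent work to keep many threads busy even when each wavefunction is short.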
This advancement benefits materials science, condensed matter physics, and high-throughput computational materials design, particularly for simulations where standard GPU methods underperform due to small problem sizes per k-point.
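Why small problems per k-point underuse a GPU can be seen with simple arithmetic. The sketch below uses deliberately hypothetical figures (20 bands, 300 plane waves, 200,000 resident GPU threads) and an invented helper `launch_fill`; none of these numbers come from the article or from measurements. It estimates what fraction of a GPU's concurrent thread capacity one launch could fill if the launch covers a batch of k-points instead of a single one.

```python
# Back-of-the-envelope sketch with HYPOTHETICAL numbers: how much of a GPU's
# concurrent thread capacity one kernel launch can fill, if a launch covers
# a batch of k-points. "work_per_k" and "gpu_concurrency" are illustrative
# parameters, not measured values for any real code or device.

def launch_fill(nks_batch, work_per_k, gpu_concurrency):
    """Fraction (capped at 1.0) of concurrent thread slots a launch can use."""
    return min(1.0, nks_batch * work_per_k / gpu_concurrency)


if __name__ == "__main__":
    work_per_k = 20 * 300       # e.g. 20 bands x 300 plane waves (small cell)
    gpu_concurrency = 200_000   # assumed number of resident GPU threads
    for nks in (1, 10, 100):
        print(nks, launch_fill(nks, work_per_k, gpu_concurrency))
```

With these assumed figures, a launch that handles one k-point fills only about 3% of the available slots, while a batch of 100 k-points saturates them. Real occupancy also depends on memory bandwidth, register use, and kernel design, which this estimate ignores.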
The method was implemented in thermo_pw, a MaX code that drives Quantum ESPRESSO for thermodynamic and phonon calculations. Its new many_k mode processes multiple k-points on the GPU simultaneously, improving throughput for metallic systems.
By overcoming inefficient GPU scaling in small-unit-cell, many-k-point scenarios, this work opens the door to faster and more energy-efficient simulations of metallic properties, phonons, and finite-temperature effects.
This GPU strategy applies the Hamiltonian in parallel across many k-points, making GPUs highly effective for metallic systems with dense Brillouin-zone sampling. It offers notable speedups and broadens the range of materials simulations that can exploit GPUs effectively.
More about thermo_pw
thermo_pw is a Quantum ESPRESSO driver developed within MaX. It automates phonon and thermodynamic workflows and now includes many_k GPU mode for parallel multi–k-point wavefunction processing.
Reference article
X. Gong, A. Dal Corso, An alternative GPU acceleration for a pseudopotential plane-waves density functional theory code with applications to metallic systems, Comput. Phys. Commun. (2024). DOI: 10.1016/j.cpc.2024.109439