BigDFT is a modern density functional theory (DFT) code for ab-initio atomistic simulation of challenging materials and biological systems. It handles periodic systems, surfaces, wires, and isolated molecules, and, thanks to its linear-scaling approach, can also treat very large systems containing thousands of atoms. The distinguishing properties of BigDFT stem from its Daubechies wavelet basis set: wavelets form a flexible, systematic, and accurate basis that allows for an adaptive mesh [1]. The code provides high-precision cubic-scaling DFT functionalities for molecular, slab-like, and extended systems, and has efficiently supported hardware accelerators such as GPUs since 2009.
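
As a rough sketch of what this basis set looks like (schematic notation only, not the code's exact conventions), each Kohn-Sham orbital is expanded in Daubechies scaling functions on a uniform real-space grid, supplemented by wavelets in a high-resolution region around the atoms:

```latex
% Schematic two-resolution expansion of a Kohn-Sham orbital \psi:
% \phi are scaling functions on the coarse grid, \psi^{\nu} are the seven
% wavelets attached to each fine-resolution grid point, and s, d are the
% expansion coefficients; the fine region provides the adaptive mesh.
\psi(\mathbf{r}) \;=\; \sum_{i_1 i_2 i_3} s_{i_1 i_2 i_3}\,
      \phi_{i_1 i_2 i_3}(\mathbf{r})
  \;+\; \sum_{j_1 j_2 j_3} \sum_{\nu=1}^{7} d^{\nu}_{j_1 j_2 j_3}\,
      \psi^{\nu}_{j_1 j_2 j_3}(\mathbf{r})
```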

License: BigDFT is free and open-source software, made available under the GPL license.

Key information

  • In addition to the traditional cubic-scaling DFT approach, the wavelet basis has enabled the implementation of an algorithm for DFT calculations of large systems containing many thousands of atoms, with a computational effort that scales linearly with the number of atoms [2]. This feature enables electronic structure calculations of systems that were, until very recently, impractical to simulate (a schematic sketch of the formalism is given after this list).

  • Uses dual-space Gaussian-type norm-conserving pseudopotentials, including those with non-linear core corrections, which have been shown to deliver all-electron precision for various ground-state quantities (see the local-potential form sketched after this list).

  • Its flexible Poisson solver can handle a number of different boundary conditions, including free, wire, surface, and periodic. It is also possible to include implicit solvents as well as external electric fields. The same solver is employed in the computation of hybrid functionals and in time-dependent DFT (TDDFT) calculations.
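
A minimal sketch of the linear-scaling formalism referred to above (in the spirit of Ref. [2], not the code's exact notation): the density matrix is written in terms of a set of localized, in-situ optimized support functions and a density kernel,

```latex
% Density matrix expressed in localized support functions \phi_\alpha with a
% density kernel K; strict localization makes both the overlap matrix and K
% sparse, which is what yields the O(N) scaling with the number of atoms.
\rho(\mathbf{r},\mathbf{r}') \;=\; \sum_{\alpha\beta}
    \phi_\alpha(\mathbf{r})\, K^{\alpha\beta}\, \phi_\beta(\mathbf{r}')
```

For the pseudopotentials, the local part of the dual-space Gaussian (GTH/HGH-type) form is, schematically,

```latex
% Local part of a GTH/HGH-type pseudopotential: a long-range erf-screened
% Coulomb term plus a short-range Gaussian-damped polynomial.
V_{\mathrm{loc}}(r) = -\frac{Z_{\mathrm{ion}}}{r}\,
    \operatorname{erf}\!\left(\frac{r}{\sqrt{2}\,r_{\mathrm{loc}}}\right)
  + e^{-\frac{1}{2}\left(r/r_{\mathrm{loc}}\right)^2}
    \left[ C_1 + C_2\!\left(\tfrac{r}{r_{\mathrm{loc}}}\right)^{2}
         + C_3\!\left(\tfrac{r}{r_{\mathrm{loc}}}\right)^{4}
         + C_4\!\left(\tfrac{r}{r_{\mathrm{loc}}}\right)^{6} \right]
```

where r_loc and C_1…C_4 are tabulated, element-dependent parameters, and the nonlocal part is built from Gaussian-type projectors.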

Features

  • Pseudopotentials

  • Boundary conditions

  • Electronic structure

Libraries

The compilation of the code suite relies on splitting the code components into modules, which are compiled by the bundler package. This package lays the groundwork for a common infrastructure for compiling and linking together libraries for electronic structure codes, and it is employed as the basis for the ESL bundle.

I/O requirements

The largest data files output by BigDFT are those containing the support functions. Thanks to the strict localization of the support functions, the size of each file remains constant as the system size increases (approximately 1 MB per file), and thanks to the fragment approach the number of files should not increase significantly, so even very large systems have small data requirements. I/O requirements are likewise minimal: for BigDFT, I/O is limited to the start and end of a calculation, where the support functions are read from and written to disk, and the size and number of files increase only to a very limited extent with system size. Currently, multiple MPI tasks may read from disk; so far this has not proven to be a bottleneck, but should it become a problem when running at scale, it would be straightforward to modify the code so that only one MPI task reads each file and communicates the data.

Due to the small size of the generated data, analysis does not require significant computational resources and is limited to tasks such as calculating pair distribution functions and analyzing atomic coordination (a minimal sketch of such an analysis is given below). These can be performed either as a post-processing step on a desktop computer or during the simulation.
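
As an illustration of how lightweight such post-processing is, the following is a minimal sketch (not code shipped with BigDFT; the function name and the orthorhombic periodic cell are assumptions) of a radial pair distribution function computed with NumPy:

```python
# Minimal sketch of a post-processing analysis: a radial pair distribution
# function g(r) from one set of atomic coordinates in an orthorhombic
# periodic cell, using the minimum-image convention.
import numpy as np

def pair_distribution(positions: np.ndarray, box: np.ndarray,
                      r_max: float, n_bins: int = 200):
    """Return (r, g(r)) for N atoms; box holds the three cell edge lengths."""
    n_atoms = len(positions)
    # All pairwise displacement vectors, wrapped by the minimum image.
    diff = positions[:, None, :] - positions[None, :, :]
    diff -= box * np.round(diff / box)
    dist = np.linalg.norm(diff, axis=-1)
    # Keep each pair once and discard self-distances.
    iu = np.triu_indices(n_atoms, k=1)
    hist, edges = np.histogram(dist[iu], bins=n_bins, range=(0.0, r_max))
    r = 0.5 * (edges[1:] + edges[:-1])
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density = n_atoms / np.prod(box)              # average number density
    ideal = shell_vol * density * n_atoms / 2.0   # ideal-gas pair counts
    return r, hist / ideal

# Illustrative usage with random coordinates in a 20 x 20 x 20 cell.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = rng.uniform(0.0, 20.0, size=(500, 3))
    r, g = pair_distribution(pos, np.array([20.0, 20.0, 20.0]), r_max=10.0)
    print(g[:5])
```

For uncorrelated positions, g(r) fluctuates around one; structure in real data shows up as peaks at the characteristic neighbour distances.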

Diffusion

The code is developed by a core team of between 6 and 15 people. The active code developers are, or have been, located in various groups around the world, including the EU, UK, US, and Japan. For these reasons, in addition to production calculations aimed at scientific results, the code has often been employed as a test-bed for numerous case studies in computer science, and by hardware/software vendors to test the behavior of novel or prototype computer architectures under realistic workloads.

References

[1] Ratcliff et al., "Flexibilities of wavelets as a computational basis set for large-scale electronic structure calculations", J. Chem. Phys. 152, 194110 (2020).

[2] Mohr et al., "Accurate and efficient linear scaling DFT calculations with universal applicability", Phys. Chem. Chem. Phys. 17, 31360-31370 (2015).

Performance

Scalability

Quantum ESPRESSO shows very good scalability by exploiting a hybrid MPI-OpenMP paradigm. Quantum ESPRESSO (more specifically PWscf) is parallelized on different levels: k-points (linear scaling with the number of processors), bands, and plane waves/real-space grids (reaching high CPU scaling and memory distribution). Custom (domain-specific) FFTs are implemented, parallelized over planes or sticks, and employ task-group techniques. Parallel dense linear algebra is also exploited to improve scalability and memory distribution. Thanks to this multi-level parallelism, both computation and data structures are distributed in order to fully exploit massively multi-core parallel architectures.

Parallel programming

Two different parallelization paradigms are currently implemented in Quantum ESPRESSO, namely MPI and OpenMP. MPI is a well-established, general-purpose parallelization scheme. In Quantum ESPRESSO, several parallelization levels, specified at run time via command-line options to the executable, are implemented with MPI; this is the first choice for execution on a parallel machine (an illustrative launch line is sketched below). OpenMP can be enabled via compiler directives (explicit OpenMP) or via multithreading libraries (library OpenMP). Explicit OpenMP requires compilation for OpenMP execution; library OpenMP only requires linking against a multithreaded version of the mathematical libraries, e.g. ESSLSMP, ACML MP, or MKL (the latter is natively multithreaded).
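
As a hedged illustration only (the -nk/-nb/-nt/-nd parallelization flags are standard pw.x options, while the launcher, rank and thread counts, and input file name are assumptions chosen for the example), a hybrid MPI+OpenMP run could be driven as follows:

```python
# Hedged sketch: driving a hybrid MPI+OpenMP pw.x run from Python.
# The -nk/-nb/-nt/-nd flags select the MPI parallelization levels described
# above; OMP_NUM_THREADS controls the OpenMP threads per MPI rank.
import os
import subprocess

env = dict(os.environ, OMP_NUM_THREADS="4")  # OpenMP threads per MPI rank

cmd = [
    "mpirun", "-np", "16", "pw.x",
    "-nk", "4",          # k-point pools
    "-nb", "2",          # band groups
    "-nt", "2",          # FFT task groups
    "-nd", "4",          # ranks used for parallel dense linear algebra
    "-in", "pw.scf.in",  # hypothetical input file
]
subprocess.run(cmd, env=env, check=True)
```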

HPC environments

Most of the applications of the suite are designed for efficient usage of state-of-the-art HPC machines through multiple parallelization levels. The base workload distribution can be done using MPI plus OpenMP multithreading, or by offloading it to GPGPUs, depending on the node architecture. This parallelization level also provides an efficient data distribution among the MPI ranks, which makes it possible to treat systems with up to ~10^4 atoms. Offloading, or the use of a growing number of MPI ranks, reduces the computational cost of 3D FFTs and other operations on 3D data grids.


Figure 1:
Comparison of a medium-sized benchmark calculation on various parallel systems. The plot shows the performance of two nodes of a homogeneous parallel cluster (LEONARDO-DCGP@CINECA) compared with two nodes of two Tier-0 heterogeneous parallel clusters (LUMI-G and LEONARDO-BOOSTER@CINECA). On LUMI-G, we show the results of several implementations of the communication scheme to demonstrate the incremental improvements of AMD HIP-based support for parallel accelerated machines over the last two years.


Figure 2:
Benchmark of pool-parallelism scaling efficiency of QUANTUM ESPRESSO kernels on heterogeneous parallel machines with NVIDIA A100 GPUs (LEONARDO-BOOSTER@CINECA) and AMD MI250X GPUs (LUMI-G@LUMI). For small cases, the QUANTUM ESPRESSO kernels on heterogeneous machines are already very efficient on a single node. The suite provides several throughput parallelization schemes to speed up these calculations.