BigDFT is an electronic structure pseudopotential code that employs Daubechies wavelets as a computational basis and is designed for use on massively parallel architectures. It features high-precision cubic-scaling DFT functionalities that enable the treatment of molecular, slab-like and extended systems, and it has efficiently supported hardware accelerators such as GPUs since 2009. It also features a linear-scaling algorithm that employs adaptive Support Functions (generalized Wannier orbitals), enabling the treatment of systems of many thousands of atoms. The code is developed and released as a software suite made of independent, interoperable components, some of which have already been linked and distributed in other DFT codes.
The BigDFT package has been used in production for eight years, mainly in the domain of structure prediction calculations. The recent activities of the BigDFT consortium, performed in the context of various EU projects, have focused on finding solutions to present-day problems for HPC at the petascale level, from both the developers' and the users' perspectives. Among the various actions, it is worth mentioning in particular that the entire code package has been restructured and redesigned so as to be:
BigDFT is freely available under the GPL license. The code is distributed both as tarball releases and directly from the developers' source repository, in a rolling-release format. Developers and users can thus stay up to date with new functionalities and integrate ongoing developments from the development branch. Together with the release of the complete package, the BigDFT suite also provides releases of all the independent libraries that constitute the building blocks of the code. A mechanism to bind these together is provided in a code-agnostic way, and this mechanism is also used in the Electronic Structure Library (ESL) bundle. Figure 1 provides an example of how these libraries are linked together in the complete package suite.
Performance in Parallel Computation environments
BigDFT is mainly written in Fortran, with some modules employing OpenCL and CUDA. It also provides C and Python bindings to its high-level routines. From its initial conception the code has been parallelized with both distributed- and shared-memory paradigms (MPI and OpenMP), and it exhibits excellent parallel scaling on supercomputers. Since the early days of GPGPU computing (from 2009) the code has benefited from hardware accelerators such as GPUs. This acceleration is continually improved, and it has recently been added to the PSolver package for the efficient calculation of the Fock exchange term. The linear-scaling version of the code also shows excellent scaling and state-of-the-art time-to-solution figures for a complete SCF calculation.
Figure 2 shows the capability of the code to be adapted to the different generations of GPU cards by showing the speedup and walltime (in seconds) for the calculation of the Fock exchange operator for a system of 64 water molecules (256 orbitals) with respect to a single-node calculation with 32 CPU cores. These runs were accelerated by using 4 GPU cards per node.
In Figure 3 the time [left panels] and memory [right panels] scaling for a carbon nanoribbon [top] and for a graphene sheet [bottom] are shown, for the cubic scaling, linear scaling and fragment approaches. The cost of the template calculation has been included in the total cost for the fragment approach. Calculations were performed on 20 nodes using 6 MPI tasks and 4 OpenMP threads per node. The memory is the peak memory usage of the root node.
BigDFT incorporates a hybrid MPI/OpenMP parallelization scheme, with parts of the code also ported to GPUs. For both the original cubic scaling version of BigDFT and the more recently developed localized support function approach, the MPI parallelization is at the highest level with orbitals, or in the latter case support functions, divided between MPI tasks. Two different data layouts are used to divide the orbitals/support functions among the tasks depending on the operation involved, such that each task is either responsible for a fixed number of orbitals, or each task is responsible for all orbitals which are defined over a given set of grid points. OpenMP is then used to parallelize operations at a lower level, for example in the calculation of the charge density and when generating the overlap and Hamiltonian matrices.
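The two data layouts described above can be illustrated with a small, serial Python sketch. It is not BigDFT code: the function names, the toy data and the even block distribution are all assumptions made for illustration, and the MPI tasks are simulated with plain lists. The regrouping step stands in for what would be a collective exchange in an MPI code, and the overlap matrix is accumulated from per-task partial sums, as it could be in the grid-point layout.

```python
def split_evenly(n_items, n_tasks):
    """Return one contiguous index range per task, sizes differing by at most one."""
    base, extra = divmod(n_items, n_tasks)
    ranges, start = [], 0
    for rank in range(n_tasks):
        count = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges


def to_orbital_layout(psi, n_tasks):
    """Layout 1: each simulated task holds a fixed set of orbitals on all grid points."""
    return [[list(psi[i]) for i in block]
            for block in split_evenly(len(psi), n_tasks)]


def to_grid_layout(orbital_chunks, n_grid, n_tasks):
    """Layout 2: each simulated task holds all orbitals on its own grid points.

    Assumption: in an MPI code this regrouping would be a collective
    exchange between tasks; here it is plain list slicing.
    """
    all_orbitals = [orb for chunk in orbital_chunks for orb in chunk]
    return [[[orb[g] for g in block] for orb in all_orbitals]
            for block in split_evenly(n_grid, n_tasks)]


def overlap_from_grid_layout(grid_chunks):
    """Accumulate S_ij = sum_g psi_i(g) * psi_j(g) from per-task partial sums."""
    n_orb = len(grid_chunks[0])
    overlap = [[0.0] * n_orb for _ in range(n_orb)]
    for chunk in grid_chunks:  # one iteration per simulated task
        for i in range(n_orb):
            for j in range(n_orb):
                overlap[i][j] += sum(a * b for a, b in zip(chunk[i], chunk[j]))
    return overlap


# Toy data (hypothetical): 5 orbitals on 8 grid points, 3 simulated tasks.
psi = [[float((i + 1) * (g + 2) % 7) for g in range(8)] for i in range(5)]
orbital_chunks = to_orbital_layout(psi, n_tasks=3)
grid_chunks = to_grid_layout(orbital_chunks, n_grid=8, n_tasks=3)
overlap = overlap_from_grid_layout(grid_chunks)
```

In the first layout an operation that acts on one orbital at a time needs no data from other tasks, whereas in the second layout quantities that couple all orbitals at a grid point, such as the overlap matrix in this sketch, reduce to local partial sums followed by a global sum.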
The largest data files output by BigDFT are those containing the support functions. Thanks to the strict localization of these support functions, the file size remains constant as the system size increases (approximately 1 MB per file), and thanks to the fragment approach, the number of files should also not increase significantly, so that even very large systems will have small data requirements. The I/O requirements are therefore also minimal: for BigDFT, I/O is limited to the start and end of a calculation, when the support functions are read from and written to disk. As mentioned above, the size and number of files will not increase with system size, or will do so only to a very limited extent. Currently there may be multiple MPI tasks reading from disk; so far this has not proven to be a bottleneck, but should it become a problem when running at scale, it would be straightforward to modify the code so that only one MPI task reads each file and communicates the data. Due to the small file sizes of the generated data, analysis will not require significant computational resources and will be limited to tasks such as calculating pair distribution functions and analyzing atomic coordination. These can be performed either as a post-processing step on a desktop computer or during the simulation. Likewise, there is no intensive workflow scheme, as the fragment approach merely requires the small 'template' calculation to be run once, after which multiple large calculations can be run. However, as discussed above, we will make use of scripting to couple runs together, e.g. for different temperatures and different defect types.
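To give an idea of how light the post-processing analysis mentioned above can be, here is a minimal Python sketch of a pair-distance histogram (the unnormalized starting point for a pair distribution function) and a cutoff-based coordination count. It is not part of BigDFT: the function names, the absence of periodic boundary conditions, the toy coordinates and the cutoff value are all assumptions chosen for illustration.

```python
import math


def pair_distance_histogram(positions, r_max, n_bins):
    """Count atom pairs per distance bin of width r_max / n_bins.

    Assumption: isolated (non-periodic) system. The counts are raw and
    not normalized by shell volume or density.
    """
    width = r_max / n_bins
    counts = [0] * n_bins
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = math.dist(positions[i], positions[j])
            if d < r_max:
                counts[int(d / width)] += 1
    return counts


def coordination_numbers(positions, cutoff):
    """Number of neighbours within `cutoff` for each atom."""
    coord = [0] * len(positions)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) <= cutoff:
                coord[i] += 1
                coord[j] += 1
    return coord


# Toy input (hypothetical): four atoms on a unit square, arbitrary length units.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
histogram = pair_distance_histogram(atoms, r_max=2.0, n_bins=8)
coordination = coordination_numbers(atoms, cutoff=1.1)
```

Both functions loop over all atom pairs, so the cost grows quadratically with the number of atoms; a production analysis of a very large system would more likely use a neighbour list or an existing analysis package.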
BigDFT uses a hybrid MPI/OpenMP parallelization scheme. The hybrid approach allows good scaling behaviour to be achieved on different architectures for both versions of BigDFT, and where memory is a limiting factor, the use of OpenMP ensures that all processors on a given node are performing computational work. Further details on the parallelization scheme are given elsewhere.