YAMBO is an open-source code released under the GPL licence. It calculates and predicts the physical properties of materials related to light-matter interaction. It uses ab-initio methods, meaning that the calculated properties do not rely on any adjustable parameters but are obtained by solving the fundamental equations of quantum mechanics. In particular, YAMBO implements Many-Body Perturbation Theory (MBPT) methods (such as GW and BSE) and Time-Dependent Density Functional Theory (TDDFT), which allow for accurate prediction of fundamental properties such as band gaps of semiconductors, band alignments, defect quasi-particle energies, optics and out-of-equilibrium properties of materials, including nano-structured systems. The code relies on a previously computed electronic structure, usually at the Density Functional Theory (DFT) level, and for this reason it is interfaced with two of the most widely used plane-wave DFT codes in the scientific community, Quantum ESPRESSO and Abinit.
Among the variety of physical quantities that can be described by YAMBO, we mention:
All these quantities are ubiquitously adopted for the description and understanding of the optical and electronic properties of materials. Moreover, YAMBO also provides non-standard conceptual and computational tools for emerging fields such as ultrafast optics or pump-and-probe experiments. Thanks to its capacity to calculate a wealth of physical properties, and to its very good performance in parallel computation (in particular on HPC facilities), YAMBO is widely used within the materials science and materials engineering communities, where it is mainly adopted to:
The code is under constant development and is fully documented. YAMBO has a user-friendly command-line interface and flexible I/O procedures.
Over the years YAMBO has attracted a growing community of users and developers. The code is routinely used in hands-on schools, where the most fundamental concepts of the underlying theory are presented together with practical tutorial sessions. A lively dedicated user forum is active and supported by the developers, who answer users' questions and doubts. The YAMBO reference paper, published in 2009, has been cited more than 500 times to date. The code has been used to produce the results published in hundreds of papers by many different groups all over the world. YAMBO currently counts more than 20 active developers, and the source project is publicly hosted on the GitHub platform (as well as on the MAX GitLab space).
Performance in Parallel Computation environments
YAMBO is parallelised using a hybrid MPI plus OpenMP paradigm. In particular, the YAMBO multilevel MPI structure provides different levels of parallelism for the polarizability, dipoles, self-energy and BSE kernel, together with a coarse-grained OpenMP implementation. At present YAMBO has been shown to be efficient over a wide range of situations, including large-scale simulations on HPC machines (several thousands of MPI tasks combined with tens of OpenMP threads) for most of its calculation kernels. As an example, in Figure 1 we report the scaling tests performed for the calculation of the quasiparticle corrections for the precursor polymer of a chevron-like graphene nanoribbon containing 136 atoms and 388 electrons. The main runlevels (linear response and self-energy kernels) scale up to 1000 Intel KNL nodes on Marconi at Cineca, A2 partition, corresponding to a computational partition of about 3 PetaFlops.
Fig. 1: In the left panel, chemical structure of the precursor polymer of a chevron-like graphene nanoribbon (ch-GNR); in the right panel, YAMBO speedup of the linear response (χ0) and self-energy (Σc) kernels during a GW run. The scaling is shown up to 1000 Intel KNL nodes on Marconi at Cineca, A2 partition, corresponding to a computational partition of about 3 PetaFlops. The dashed line indicates the ideal scaling slope.
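A speedup curve such as the one in Fig. 1 is usually read against the ideal slope by converting wall times into speedup and parallel efficiency. The following is a minimal sketch of that arithmetic; the function name `strong_scaling` and the node counts and timings are illustrative assumptions, not measured YAMBO data.

```python
def strong_scaling(nodes, times):
    """Return (speedup, efficiency) lists relative to the smallest run."""
    base_nodes, base_time = nodes[0], times[0]
    # Measured speedup: how much faster each run is than the baseline run.
    speedup = [base_time / t for t in times]
    # Ideal speedup grows linearly with the ratio of node counts.
    ideal = [n / base_nodes for n in nodes]
    # Parallel efficiency: fraction of the ideal speedup actually achieved.
    efficiency = [s / i for s, i in zip(speedup, ideal)]
    return speedup, efficiency


# Hypothetical timings in seconds (NOT taken from the YAMBO benchmarks).
nodes = [125, 250, 500, 1000]
times = [800.0, 420.0, 230.0, 140.0]

speedup, efficiency = strong_scaling(nodes, times)
for n, s, e in zip(nodes, speedup, efficiency):
    print(f"{n:5d} nodes  speedup {s:5.2f}  efficiency {e:4.0%}")
```

An efficiency close to 1 means the measured points lie on the dashed ideal line; values that drop as the node count grows indicate where a kernel stops scaling.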
GW toward exascale: Yambo on GPUs
Recently, an extensive effort has been made to port Yambo to heterogeneous architectures, GPUs in particular. This work has addressed the kernels computing dipoles, Hartree-Fock, linear response, GW, and the Bethe-Salpeter equation (BSE). Technically, the porting has been achieved by taking advantage of CUDA Fortran, a programming model which provides native support for NVIDIA architectures, and by exploiting both CUF kernel directives and CUDA libraries such as cuBLAS, cuFFT, and cuSOLVER. The porting strategy is based on reading the DFT wavefunctions and copying them to the GPU memory (feasible since Yambo fully distributes such memory at the MPI level), making them available to the heavy computational kernels. Similarly, the response function is also calculated and temporarily stored in the card memory. As a design principle, the number of data transfers between host and device is minimised in order to improve performance. Thanks to the modularity of Yambo and to the use of DevXlib, the adoption of CUDA Fortran has a small impact on the source code, and the accelerated parts that require duplicated sources are confined to a few routines. Notably, the optimisation of the code on GPUs has also improved the performance of Yambo on CPUs, for instance by reducing the execution time of the FFT_setup and the cutoff Coulomb potential routines.
Benchmarks have been performed on different hybrid architectures. Results obtained on Piz Daint (XC50 partition, nodes equipped with NVIDIA Tesla P100 GPUs) are reported in Fig. 2. Here, we compare the execution time for the irreducible polarisability χ0, the Hartree-Fock, and the self-energy routines for calculations performed on both CPUs and GPUs (from 2 to 8 nodes). The system considered is a poly-acetylene chain, i.e. an organic polymer with the repeating unit (C2H2)n. Simulations have been performed by adopting a version of Yambo compiled using the PGI compiler for both the CPU and GPU cases. The results point out a 5 to 10× speedup in time to solution for the ported kernels.
Fig. 2: MPI (left) and OpenMP (right) scaling for a defective H-TiO2 supercell (72+1 atoms). The time for each of the main tasks of the code is given separately. The total time taken to perform other tasks is labelled "Other".
In Table 1, instead, we report both the timing of the main routines (dipoles, non-interacting response function χ0, reducible response function χ, and the exchange (x) and correlation (c) parts of the self-energy Σ) and the wall time for a complete GW calculation performed for an AGNR-N7 graphene nanoribbon. In particular, we compare the timing recorded for a calculation performed on a single node of MARCONI-KNL@CINECA with the timings obtained on heterogeneous systems based on GPU accelerators, namely Piz Daint@CSCS (XC50 partition, nodes equipped with NVIDIA Tesla P100 GPUs), Galileo@CINECA (nodes with NVIDIA Tesla V100 GPUs), and a local cluster (Corvina) equipped with Intel chips and NVIDIA TITAN V cards. The simulations on MARCONI-KNL and on Galileo (CPU only) have been performed by compiling YAMBO with the Intel 2018 compiler, while the PGI compiler v19.x (supporting CUDA Fortran) has been adopted for the simulations performed on hybrid architectures. For all the considered systems, simulations have been run on a single node and, for the hybrid systems, on a single GPU card. In this case too, we observe a 5 to 10× speedup (and even higher in some cases) in the time to solution for the ported kernels when the run is performed on GPU cards, in particular for the most time-consuming routines, χ0 and Σc, as well as for the wall time. Most importantly, these data show the excellent portability of a complete GW calculation as performed by YAMBO on heterogeneous systems based on GPU cards, independently of the system architecture.
Table 1. Time to solution on different architectures for the AGNR-N7 use case. All times are given in seconds.
D. Sangalli et al., Many-body perturbation theory calculations using the yambo code, J. Phys.: Condens. Matter 31, 325902 (2019).
A. Marini, M. Grüning, D. Varsano and C. Hogan, Yambo: An ab initio tool for excited state calculations, Comp. Phys. Comm. 180, 1392 (2009).
YAMBO is a plane-wave ab-initio code for calculating quasiparticle energies and optical properties of electronic systems within the framework of many-body perturbation theory and time-dependent density functional theory. Quasiparticle energies are calculated within the GW approximation for the self-energy. Optical properties are evaluated either by solving the Bethe–Salpeter equation or by using the adiabatic local density approximation. With YAMBO you can perform:
YAMBO is a Fortran/C code used worldwide to calculate quasiparticle corrections and excitonic properties of materials by using the GW+BSE method. YAMBO relies on the Kohn-Sham (KS) data generated by Quantum ESPRESSO (QE). The code is parallelized over several MPI+OpenMP levels. YAMBO stores information in several database files (a few tens), the biggest reaching a few GB in size for our systems, for which a NetCDF/HDF5 format is adopted, optimizing I/O and data portability. The code has been extensively tested and used on different HPC architectures, for large-scale systems. Recent developments in YAMBO concern the implementation of a new algorithm to reduce the number of empty states needed to converge GW calculations (X and G terminators), the implementation of dense parallel linear algebra to make diagonalisation and inversion of large matrices more efficient, and the resolution of memory and parallelism bottlenecks identified by performance profiling. The YAMBO implementation of GW is parallel over the k/q grids, the band summations and the quasiparticle energies, using a hybrid MPI-OpenMP approach. Explicit OpenMP support is implemented following different strategies according to the specific kernels. The BSE routines are parallel over k-points, electron-hole basis elements, and transitions. GW and BSE calculations are computationally expensive and, for complex materials and surfaces, can be performed only by exploiting the resources offered by modern Tier-0 systems.
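The idea of distributing a GW run over several levels at once (q-points, k-points, bands) can be illustrated with a generic decomposition in which each MPI rank receives one coordinate per level. This is only a sketch of the general technique under stated assumptions: the helper names `rank_to_roles` and `my_share`, the level names, and the group sizes are hypothetical and do not reproduce YAMBO's internal scheme or its input variables.

```python
def rank_to_roles(rank, role_sizes):
    """Map a flat MPI rank to one coordinate per parallel level.

    `role_sizes` maps a level name to its number of groups; the product of
    the sizes must equal the total number of MPI tasks. The last level in
    the dictionary varies fastest with the rank.
    """
    coords = {}
    for name, size in reversed(list(role_sizes.items())):
        rank, coords[name] = divmod(rank, size)
    return coords


def my_share(n_items, n_groups, group):
    """Round-robin list of item indices handled by one group of a level."""
    return list(range(group, n_items, n_groups))


# Hypothetical layout: 2 x 3 x 4 = 24 MPI tasks.
sizes = {"q": 2, "k": 3, "bands": 4}
coords = rank_to_roles(17, sizes)
print(coords)

# Bands handled by this rank if 10 bands are spread over the 4 band groups.
print(my_share(10, sizes["bands"], coords["bands"]))
```

Each rank then works only on the q-points, k-points and bands selected by its coordinates, which is what allows the memory and the workload to shrink together as the number of tasks grows.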
YAMBO is an open-source code distributed under the GNU General Public Licence. The source code is hosted on GitHub at https://github.com/yambo-code. YAMBO is a Fortran/C code that exploits a number of optimized libraries such as BLAS, LAPACK, FFTW, ScaLAPACK, NetCDF/HDF5, PETSc, and SLEPc.
YAMBO parallelism is based on a hybrid strategy combining MPI with shared-memory OpenMP. YAMBO tends to use MPI parallelism as far as memory allows, and then resorts to OpenMP parallelism. YAMBO makes use of several levels of MPI parallelism [D. Sangalli et al., J. Phys.: Condens. Matter 31 (2019) 325902]. OpenMP multi-thread parallelization has recently been included at a low level, in the plane-wave summations and FFTs.
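The rule of thumb "MPI as far as memory allows, then OpenMP" can be turned into a small back-of-the-envelope calculation for a single node. The sketch below is a toy model, not a YAMBO tool: the function `choose_layout` is hypothetical, and it assumes a fixed memory footprint per MPI task, whereas in a real run the footprint depends on how the data are distributed.

```python
def choose_layout(cores_per_node, mem_per_node_gb, mem_per_task_gb):
    """Pick MPI tasks and OpenMP threads per node (toy model).

    Assumes every MPI task needs `mem_per_task_gb` regardless of the
    number of tasks, which is a simplification.
    """
    if mem_per_task_gb > mem_per_node_gb:
        raise ValueError("a single MPI task does not fit in node memory")
    # Largest number of MPI tasks allowed by memory and by the core count.
    max_by_memory = int(mem_per_node_gb // mem_per_task_gb)
    mpi_tasks = min(cores_per_node, max_by_memory)
    # Shrink to a divisor of the core count so threads fill the node evenly.
    while cores_per_node % mpi_tasks != 0:
        mpi_tasks -= 1
    # The cores left idle by the memory limit are used as OpenMP threads.
    omp_threads = cores_per_node // mpi_tasks
    return mpi_tasks, omp_threads


# Hypothetical nodes: (cores, memory in GB, memory per MPI task in GB).
print(choose_layout(16, 64, 2))    # memory is not a limit: pure MPI
print(choose_layout(48, 192, 6))   # memory caps the number of MPI tasks
print(choose_layout(68, 96, 10))   # tight memory: few tasks, many threads
```

When memory is plentiful the model returns one thread per task; as the per-task footprint grows, fewer tasks fit on the node and the remaining cores are recovered through OpenMP threads.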
Electronic structure data, such as the wavefunctions, are written and read by QE's PWscf executable using direct file access: each task uses a raw format to store its information. These files can be used to produce checkpoints (restart). YAMBO calculates quasiparticle corrections and solves the Bethe-Salpeter equation by reading and processing the raw files generated by PWscf. One of the most useful features of YAMBO is that its entire database I/O is carried out through the NetCDF/HDF5 libraries. YAMBO relies on the KS wavefunctions generated by the DFT code PWscf through the p2y interface (PWscf to yambo), which can read and process the band structure output generated by PWscf. Job-dependent databases are also created at run time, with information specific to each runlevel. Output files are created at the end of each calculation and are intended for human use (for instance for post-processing operations or for plotting). The creation of each database is controlled by its own Fortran subroutine; data writing, however, is performed at a low level by common modules, in order to minimize portability problems. YAMBO calculations can be restarted, making the code suitable for HPC machines with very diverse scheduler policies: relatively short wall times with large machine partitions can be safely exploited.
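The reason restartable calculations fit short wall-time limits can be illustrated with a generic checkpointing pattern: results are stored per unit of work, and a resubmitted job skips whatever is already on disk. The sketch below is an illustration of that pattern only. It uses JSON files instead of YAMBO's NetCDF/HDF5 databases, and the function `run_with_restart`, the `budget` parameter (standing in for the wall-time limit), and the toy kernel are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_with_restart(workdir, units, compute, budget):
    """Process up to `budget` pending units, skipping those already on disk.

    Returns True when every unit has a stored result, False when the job
    ran out of budget and must be resubmitted.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    done_now = 0
    for unit in units:
        target = workdir / f"unit_{unit}.json"
        if target.exists():
            continue  # result from a previous job: nothing to redo
        if done_now == budget:
            return False  # budget exhausted; resubmit to continue
        # Write to a temporary name first, then rename, so that a job killed
        # mid-write never leaves a truncated file that looks complete.
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_text(json.dumps({"unit": unit, "value": compute(unit)}))
        tmp.replace(target)
        done_now += 1
    return True


with tempfile.TemporaryDirectory() as scratch:
    calls = []

    def toy_kernel(unit):
        calls.append(unit)  # record which units were actually computed
        return unit * unit

    # Two consecutive "jobs", each allowed to compute at most 3 units.
    first = run_with_restart(scratch, range(5), toy_kernel, budget=3)
    second = run_with_restart(scratch, range(5), toy_kernel, budget=3)
    print(first, second, calls)
```

In this toy run the first job stops after three units and the second one computes only the remaining two, so no unit is evaluated twice across the two submissions.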