YAMBO is a code that computes ground-state as well as excited-state properties in an ab initio context. YAMBO is released under the GPL license and is currently interfaced with two of the most widely used plane-wave codes in the scientific community: QUANTUM ESPRESSO and Abinit.
The code implements many-body perturbation theory (MBPT), density-functional theory (DFT) and non-equilibrium Green's function theory (NEGF), allowing the user to calculate a wealth of physical properties: reliable band gaps, band alignments, defect quasi-particle energies, and optical and out-of-equilibrium properties.
Among the variety of physical quantities that can be described with YAMBO, we mention:
- Quasi-particle energies and widths;
- Temperature dependent electronic and optical properties;
- Linear response properties within the Bethe-Salpeter equation and Time-Dependent DFT;
- Total energies within the adiabatic-connection fluctuation-dissipation method;
- Non–linear optical properties within the NEGF;
- Advanced post-processing tools to analyse the data produced by the simulations.
All these properties are essential for understanding the optical and electronic behaviour of a wealth of materials and, more importantly, they provide valid tools in fields such as pump-and-probe spectroscopy, where numerical tools are scarce if not nonexistent. The code is under constant development and is fully documented. YAMBO has a user-friendly command-line interface and flexible I/O procedures, and it is parallelised using a hybrid MPI plus OpenMP infrastructure. This has allowed YAMBO to run on several tens of thousands of cores.
YAMBO has attracted over the years a growing community of users and developers. The code is routinely used to organise hands-on schools where the most fundamental concepts of the underlying theory are taught. A dedicated user forum is commonly used to answer users' questions and doubts. The YAMBO reference paper [26], published in 2009, has been cited 358 times to date (March 2018), and it has been used to produce the results published in hundreds of papers by many different groups all over the world. YAMBO counts at the moment 12 active developers [103], and the source project is publicly hosted on the GitHub platform.
The YAMBO implementation is based on a novel structure built on a hybrid MPI-OpenMP approach. This allows the code to distribute the workload over a large number of parallel levels. In practice, depending on the kind of calculation, all the quantities involved (k/q grids, bands, quasiparticles, etc.) are distributed along the different levels of parallelisation. The total number of levels, Npar, is linked to the total number of MPI tasks, Nmpi, by the relation Npar = logM(Nmpi), with M the number of CPUs participating in the elemental computing unit (M is, in general, a small prime number, such as 2 or 3). This means that for a run with 8 (12) MPI tasks there are 3 (4) parallelisation levels to use, since M = 2 (3). It is clear that, in the petascale regime, this gives the user a huge number of potential parallelisation levels. OpenMP multi-threading has recently been added at the low level for plane-wave summations and FFTs. At the same time, I/O and memory allocation have also been extensively optimised.
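The relation Npar = logM(Nmpi) can be sketched in a few lines of Python. This is purely illustrative: the function name and the power-of-M check are ours, not part of YAMBO.

```python
import math

def parallel_levels(n_mpi: int, m: int) -> int:
    """Number of parallelisation levels N_par = log_M(N_mpi),
    where m is the size of the elemental computing unit
    (in general a small prime, such as 2 or 3)."""
    levels = round(math.log(n_mpi, m))
    # The relation only yields an integer level count when the
    # MPI task count is an exact power of m.
    if m ** levels != n_mpi:
        raise ValueError("n_mpi must be an integer power of m")
    return levels

# 8 MPI tasks with M = 2 give 3 parallelisation levels:
print(parallel_levels(8, 2))     # 3
# At larger scale the count keeps growing, e.g. 4096 tasks:
print(parallel_levels(4096, 2))  # 12
```

The logarithmic dependence is the point of the example: even a petascale task count translates into a modest, manageable number of distribution levels.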
At present YAMBO has been shown to be efficient in large-scale simulations (several thousand MPI tasks combined with OpenMP parallelism) for most of its calculation environments. As an example, we report the scaling tests of calculations of quasiparticle corrections for a chevron-like graphene nanoribbon containing 136 atoms and 388 electrons. Simulations have been performed using 14 k-points in the BZ, about 3,500,000 G-vectors to represent the charge density, and 800 empty states. A large amount of vacuum has been included to avoid spurious interactions between replicas. This represents a prototype calculation, as it involves the evaluation of a response function, of the Hartree-Fock self-energy and, finally, of the correlation part of the self-energy for a realistic system. The speedup normalised with respect to 4096 cores, calculated for the linear-response (χ0) and self-energy (Σc) routines (the most expensive routines in terms of CPU time for a single GW run), is reported in Fig. 13 as a function of the number of cores. The results show very good scalability up to 60,000 cores.
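For reference, a speedup normalised to a baseline core count, and the corresponding parallel efficiency, can be computed from measured wall-times as below. This is a generic sketch with hypothetical timings, not data from the benchmark in Fig. 13.

```python
def relative_speedup(t_ref: float, t_n: float) -> float:
    """Speedup with respect to the reference core count:
    S(N) = T(N_ref) / T(N)."""
    return t_ref / t_n

def parallel_efficiency(t_ref: float, t_n: float,
                        n_cores: int, n_ref: int) -> float:
    """Measured speedup divided by the ideal linear speedup
    n_cores / n_ref; 1.0 means perfect scaling."""
    return relative_speedup(t_ref, t_n) / (n_cores / n_ref)

# Hypothetical wall-times (seconds) for the reference run on
# 4096 cores and a larger run on 16384 cores:
t_4096, t_16384 = 1000.0, 260.0
print(relative_speedup(t_4096, t_16384))                  # ~3.85
print(parallel_efficiency(t_4096, t_16384, 16384, 4096))  # ~0.96
```

Normalising to 4096 cores rather than to a single core is the usual choice for systems too large to run serially: the baseline is simply the smallest feasible run.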