A new scalability record in a materials science application

Another step towards the exascale was made by a team of MaX researchers at CNR (NANO & ISM), who ran a multi-petaFlop simulation with the MaX flagship application Yambo.

A single GW run reached 3 petaFlop/s on 1000 Intel Knights Landing (KNL) nodes (68000 cores) of the new Tier-0 MARCONI KNL partition (50 racks, 3600 nodes, 68 cores/node, and a peak performance of about 11 PFlop/s). The simulation, related to the growth of complex graphene nanoribbons on a metal surface, is part of an active research project combining computational spectroscopy with cutting-edge experimental data from teams in Austria, Italy, and Switzerland. The simulations were performed using computational resources granted by PRACE (via call 14).

This result was made possible by the intense work of the Yambo developer team on improving the performance of Yambo on large-scale HPC architectures. The parallel scaling of two of the main kernels of Yambo in a GW run, i.e., the independent-particle linear response Χ0 and the correlation self-energy Σc, has been benchmarked by the CNR MaX team on the KNL partition of MARCONI, by executing GW simulations for a realistic polymer used as a chemical precursor of chevron-shaped graphene nanoribbons. This one-dimensional system counts 136 atoms and 388 electrons in the unit cell, 8 k-points in the irreducible Brillouin zone, and about 3500000 G vectors to represent the charge density. For this simulation, an FFT grid of (144,288,180) was adopted and 800 empty states were used in the GW calculation.

In Fig. 1, panel (a) reports the execution times of the Χ0 and Σc routines (the most expensive in terms of CPU time for a single GW run) as a function of the number of cores, and panel (b) reports the speedup for the same routines. The results show good scalability up to 68000 cores.

Figure 1: Panel (a): execution time of the Χ0 and Σc routines as a function of the number of cores. Panel (b): speedup of the Χ0 and Σc routines.
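
For readers less familiar with scaling plots, the following minimal sketch shows how a speedup curve like the one in panel (b) can be derived from execution times like those in panel (a). It assumes the common convention of measuring speedup relative to the smallest run; the article does not state which reference was used. The function name, core counts, and timings are hypothetical placeholders, not the measured Yambo data.

```python
def speedup_and_efficiency(cores, times):
    """Compute speedup and parallel efficiency relative to the smallest run.

    cores: core counts in increasing order
    times: wall-clock time of one routine at each core count
    """
    base_cores, base_time = cores[0], times[0]
    # Speedup: how many times faster each run is than the reference run.
    speedup = [base_time / t for t in times]
    # Efficiency: achieved speedup divided by the ideal one (cores / base_cores).
    efficiency = [s * base_cores / c for s, c in zip(speedup, cores)]
    return speedup, efficiency


# Placeholder values for illustration only (NOT the benchmark results).
cores = [1000, 2000, 4000]
times = [800.0, 400.0, 250.0]

speedup, efficiency = speedup_and_efficiency(cores, times)
for c, s, e in zip(cores, speedup, efficiency):
    print(f"{c:6d} cores: speedup {s:.2f}, efficiency {e:.0%}")
```

An efficiency close to 100% at the largest core count is what "good scalability" means in this context: doubling the cores roughly halves the execution time.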
