Improving towards green computation: QuantumESPRESSO scaling on the new MARCONI-KNL partition

New tests of the parallel scaling performance of QuantumESPRESSO, in particular of the CP kernel, are now available thanks to the MARCONI Tier-0 system recently established at CINECA and based on the Lenovo NeXtScale platform. The HPC cluster, co-designed by CINECA, was recently upgraded with a new partition equipped with Intel Knights Landing (KNL) processors; this is the second step in the development plan of the MARCONI system. MARCONI currently comprises two partitions: A1, based on Broadwell (BDW) processors (21 racks, 1512 nodes, 36 cores/node and a peak performance of 2 PFlop/s), and the new A2, based on KNL processors (50 racks, 3600 nodes, 68 cores/node and a peak performance of about 11 PFlop/s).
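To put the two partitions side by side, the quoted figures can be turned into total core counts and an approximate peak per node. The short Python sketch below uses only the numbers given above; the derived values are simple arithmetic on rounded peak figures, so they should be read as rough estimates rather than official specifications.

```python
# Partition figures as quoted in the text (peak values are rounded).
partitions = {
    "A1 (BDW)": {"nodes": 1512, "cores_per_node": 36, "peak_pflops": 2.0},
    "A2 (KNL)": {"nodes": 3600, "cores_per_node": 68, "peak_pflops": 11.0},
}


def summarize(p):
    """Return (total cores, approximate peak TFlop/s per node)."""
    total_cores = p["nodes"] * p["cores_per_node"]
    tflops_per_node = p["peak_pflops"] * 1000.0 / p["nodes"]
    return total_cores, tflops_per_node


summary = {name: summarize(p) for name, p in partitions.items()}

for name, (cores, tf_node) in summary.items():
    print(f"{name}: {cores} cores, ~{tf_node:.2f} TFlop/s peak per node")
```

With these inputs, A1 comes to 54,432 cores and A2 to 244,800 cores, with roughly 1.3 and 3.1 TFlop/s of peak per node respectively. Peak figures say nothing about delivered application performance, which is what the scaling tests measure.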

The parallel scaling performance of QuantumESPRESSO (and in particular of the CP kernel) has been measured by the MaX team of C. Cavazzoni at CINECA on both the KNL and BDW architectures, by running Car-Parrinello (CP) molecular dynamics simulations of a system of 256 water molecules. Fig. 1 reports the wall-clock execution time as a function of the number of nodes (panel a) and of the number of cores (panel b).

Panel (a) shows that QuantumESPRESSO scales well up to 30 nodes on the KNL partition, and performs better on the KNL nodes than on the BDW nodes. Because the KNL processors have lower power consumption than the BDW processors, simulations performed on the KNL partition can lead to a major improvement in energy efficiency: around 30% on the A2 partition with respect to A1 for simulations with the same wall time and the same number of nodes (estimated from E. Pascolo, F. Affinito and C. Cavazzoni, “Performance and energy efficiency in material science simulation on heterogeneous architectures”, HPCS 2014, July 2014, DOI: 10.1109/HPCSim.2014.6903788, and from the TOP500 LINPACK run power measurements). Panel (b) points to good scalability of the CP kernel up to two thousand cores on A2. Notably, the difference in single-core performance (BDW cores are faster than KNL cores) does not lead to a significant difference in the scaling behavior of the code. Note that the code performance is influenced by the choice of the parallelization parameters.
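For readers who want to run the same kind of analysis on their own timings, the following minimal Python sketch shows how speedup, parallel efficiency and energy to solution are commonly computed. All node counts, wall times and node power draws in it are hypothetical placeholders chosen for illustration; they are not the measurements shown in Fig. 1, and the function names are ours. It also assumes one reading of the 30% figure, namely 30% less energy for the same wall time and node count.

```python
def scaling_table(nodes, wall_times):
    """Speedup and parallel efficiency relative to the smallest run."""
    n0, t0 = nodes[0], wall_times[0]
    rows = []
    for n, t in zip(nodes, wall_times):
        speedup = t0 / t                  # how much faster than the baseline
        efficiency = speedup / (n / n0)   # 1.0 means ideal (linear) scaling
        rows.append((n, speedup, efficiency))
    return rows


def energy_to_solution(node_power_w, n_nodes, wall_time_s):
    """Energy in joules, assuming constant average power per node."""
    return node_power_w * n_nodes * wall_time_s


# HYPOTHETICAL timings (seconds), for illustration only.
nodes = [4, 8, 16, 32]
wall_times = [400.0, 210.0, 115.0, 70.0]
rows = scaling_table(nodes, wall_times)
for n, s, e in rows:
    print(f"{n:3d} nodes: speedup {s:.2f}, efficiency {e:.0%}")

# HYPOTHETICAL average node power draws (watts), same wall time and node count.
e_a1 = energy_to_solution(node_power_w=300.0, n_nodes=8, wall_time_s=100.0)
e_a2 = energy_to_solution(node_power_w=210.0, n_nodes=8, wall_time_s=100.0)
saving = 1.0 - e_a2 / e_a1
print(f"Relative energy saving of A2 vs A1: {saving:.0%}")
```

When the wall time and node count are equal, the energy ratio reduces to the ratio of node power draws, which is why a lower-power processor translates directly into an energy saving in that comparison.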

With MaX efforts on applications and the evolution of MARCONI, we are thus meeting green computing targets: steadily increasing computing performance while reducing power consumption.
