Thursday, 9 June 2016 - 14:30 - Seminar - "Performance and portability of accelerated lattice Boltzmann applications with OpenACC" - Enrico Calore | Department of Mathematical, Physical and Computer Sciences

Venue: Aula Maxwell - Plesso Fisico

Speaker: Dr. Enrico Calore (Università di Ferrara and INFN Ferrara)

Organizer e-mail: roberto.alfieri@fis.unipr.it

Abstract:

An increasingly large number of HPC systems rely on heterogeneous architectures
combining traditional multi-core CPUs with power-efficient accelerators. Designing efficient
applications for these systems has been troublesome in the past, as accelerators could
usually be programmed only with specific programming languages, threatening maintainability,
portability, and correctness.
Several new programming environments try to tackle this problem. Among them,
OpenACC offers a high-level approach based on compiler directives to mark regions of
existing C, C++, or Fortran codes to run on accelerators. This approach directly addresses
code portability, leaving to compilers the support of each different accelerator, but one has
to carefully assess the relative costs of portable approaches versus computing efficiency.
This talk addresses precisely this issue, using a massively parallel lattice Boltzmann
algorithm as a test bench.
First, our multi-node implementation, using OpenACC and MPI, will be introduced. Then
benchmarks of the code on a variety of processors, including traditional CPUs and GPUs,
will be presented, with accurate performance comparisons against other GPU
implementations of the same algorithm written in CUDA and OpenCL. Finally, the
performance impact associated with portable programming will be assessed, highlighting
the actual portability and performance portability of OpenACC-based applications across
several state-of-the-art architectures.

Alongside the seminar, the features of the cluster recently installed at the
Università di Ferrara, and the results obtained with it, will be presented.
The cluster consists of 5 nodes, each accelerated with 8 NVIDIA K80 GPUs.