New Publication: CHAMELEON: Reactive Load Balancing for Hybrid MPI+OpenMP Task-Parallel Applications
Many applications in high performance computing are designed based on underlying performance and execution models. While these models could successfully be employed in the past for balancing load within and between compute nodes, modern software and hardware increasingly make performance predictability difficult if not impossible. Consequently, balancing computational load becomes much more difficult. Aiming to tackle these challenges in search for a general solution, we present a novel library for fine-granular task-based reactive load balancing in distributed memory based on MPI and OpenMP. With our approach, individual migratable tasks can be executed on any MPI rank. The actual executing rank is determined at run time based on online performance data. We evaluate our approach under an enforced power cap and under enforced clock frequency changes for a synthetic benchmark and show its robustness for work-induced imbalances for a realistic application. Our experiments demonstrate speedups of up to 1.31X.
The full text is available on ScienceDirect.
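To make the idea of choosing the executing rank "at run time based on online performance data" more concrete, here is a minimal sketch of one possible reactive balancing decision. It is not the CHAMELEON API or the algorithm from the paper: the function name `plan_migrations`, its inputs (a per-rank count of queued tasks and a per-rank measured throughput), and the proportional-share policy are all assumptions made for illustration. A real implementation would gather these values across ranks with MPI and would move task data along with the decision; the sketch only shows the planning step.

```python
def plan_migrations(queued, rate):
    """Plan task moves so each rank's queue matches its measured speed.

    queued[i]: number of migratable tasks currently waiting on rank i
    rate[i]:   measured throughput of rank i (tasks per second), i.e. the
               "online performance data" (hypothetical input format)

    Returns a list of (source_rank, target_rank, n_tasks) tuples.
    Illustrative policy only; not taken from the CHAMELEON library.
    """
    if len(queued) != len(rate) or any(r <= 0 for r in rate):
        raise ValueError("need exactly one positive rate per rank")

    total_tasks = sum(queued)
    total_rate = sum(rate)

    # Ideal share of the work is proportional to measured throughput,
    # so that all ranks would finish at roughly the same time.
    ideal = [total_tasks * r / total_rate for r in rate]

    # Positive surplus: rank is overloaded. Negative: rank has spare capacity.
    # int() truncates toward zero, so no rank gives away more than its excess.
    surplus = [int(q - t) for q, t in zip(queued, ideal)]

    senders = [[i, s] for i, s in enumerate(surplus) if s > 0]
    receivers = [[i, -s] for i, s in enumerate(surplus) if s < 0]

    moves = []
    si = ri = 0
    # Greedily pair overloaded ranks with underloaded ones.
    while si < len(senders) and ri < len(receivers):
        n = min(senders[si][1], receivers[ri][1])
        moves.append((senders[si][0], receivers[ri][0], n))
        senders[si][1] -= n
        receivers[ri][1] -= n
        if senders[si][1] == 0:
            si += 1
        if receivers[ri][1] == 0:
            ri += 1
    return moves


# Rank 1 runs at half speed (for example, under a power cap),
# so part of its queue is moved to rank 0.
print(plan_migrations([100, 100], [1.0, 0.5]))  # [(1, 0, 33)]
```

Because the plan is recomputed from fresh measurements each time, a rank that slows down mid-run (as in the power-cap and clock-frequency experiments the abstract mentions) would shed work at the next balancing step without any prior performance model.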