- Automatische Datenflussoptimierung für parallele Muster auf Architekturen mit hierarchischem Speicher [Automatic data flow optimization for parallel patterns on architectures with hierarchical memory]
Wassermann, Christian; Müller, Matthias S. (Thesis advisor); Lübbecke, Marco (Thesis advisor); Schmitz, Adrian (Consultant); Miller, Julian (Consultant)
Aachen : RWTH Aachen University (2023)
Master Thesis
Master's thesis, RWTH Aachen University, 2023
Abstract
The use of large-scale high performance computing (HPC) resources has become the norm in a wide variety of disciplines and enables numerous advances in scientific research. As the demand for compute power grows, HPC systems have adopted an increasing amount of heterogeneity in their hardware setups to suit the needs of specialized applications. This includes not just accelerator devices like general-purpose graphics processing units (GPGPUs) but also emerging memory technologies like high-bandwidth memory (HBM). The additional complexity caused by these developments challenges domain scientists and HPC experts alike to utilize the available hardware to its fullest potential. The parallel pattern language (PPL) project was created by the HPC group of RWTH Aachen University to address these concerns. Its theoretical representation considers parallel applications as a hierarchical combination of parallel patterns. As part of the project, a prototypical implementation with a domain-specific language for productive and performance-portable parallel programming is provided. This thesis builds on the ideas introduced within the PPL project and focuses on its optimization component, which computes the execution schedule of the parallel program for a given hardware composition at compile time. To model this variant of a parallel machine scheduling problem, three new mixed-integer linear programming (MILP) models are formulated, with particular attention to the extensions needed to incorporate heterogeneous and hierarchical memory resources. Appropriate conversions are laid out to integrate with the existing foundation of the PPL project. The prototypical implementation is evaluated on a configurable benchmark with four different MILP solvers. The results first highlight the runtime advantage of a single model and solver combination; model variations and the benefit of shared-memory parallelism are then assessed.
A detailed analysis of the computational behavior of the solving process and a comparison to the previously used layer-based scheduling in the PPL project complete the contributions of this thesis.
Institutions
- Department of Computer Science [120000]
- Chair of High Performance Computing (Computer Science 12) [123010]