Topics
Suitable Performance Optimization Approaches for the SPMD IR
In high-performance computing (HPC), compute cluster systems are growing ever larger and their architectures increasingly heterogeneous (multiple compute nodes with CPUs, GPUs, …). This development aims to satisfy the ever-growing computational demands (for example, from large simulations or AI models) and has led to multiple different parallel programming models. These programming models are used in addition to programming languages to implement software that uses the fast, parallel HPC hardware efficiently and effectively. To ease the development of tools and compiler passes for parallel programming models that follow the single program, multiple data (SPMD) principle, the SPMD IR (intermediate representation) was introduced. It addresses the problem that tools often support only one model or implement the necessary abstraction internally, which limits extensibility and reusability. The prototype of the SPMD IR is implemented in MLIR/LLVM and supports MPI, NCCL, SHMEM, and NVSHMEM. While its usefulness has been shown for the verification of collective communication and for data race detection, the question remains which performance optimization approaches or compiler passes in modern compiler systems can make use of it, and which are not applicable by design.
This seminar thesis is to conduct a systematic literature review of approaches that perform performance optimization in the context of any of the supported programming models, or that are part of LLVM or GCC as compiler passes, and that could benefit from the additional information provided by the SPMD IR. After analyzing and understanding the SPMD IR, the student is to give an overview of the approaches found and discuss their applicability to the SPMD IR.
Kind of topic: overview
Supervisor: Semih Burak
Emerging MLIR Dialects and their Suitability for the SPMD IR
In high-performance computing (HPC), compute cluster systems are growing ever larger and their architectures increasingly heterogeneous (multiple compute nodes with CPUs, GPUs, …). This development aims to satisfy the ever-growing computational demands (for example, from large simulations or AI models) and has led to multiple different parallel programming models. These programming models are used in addition to programming languages to implement software that uses the fast, parallel HPC hardware efficiently and effectively. To ease the development of tools and compiler passes for parallel programming models that follow the single program, multiple data (SPMD) principle, the SPMD IR (intermediate representation) was introduced. It addresses the problem that tools often support only one model or implement the necessary abstraction internally, which limits extensibility and reusability. The prototype of the SPMD IR is implemented on top of MLIR/LLVM and supports MPI, NCCL, SHMEM, and NVSHMEM. MLIR is evolving rapidly, with new dialects continuously emerging and existing ones being extended on a regular basis. In addition to the core upstream dialects, the SPMD IR introduces its own dedicated SPMD dialect.
This seminar thesis aims to conduct a systematic literature review of existing MLIR dialects. The student will analyze their usage and assess their applicability in combination with the SPMD dialect. As a representative example of a recently introduced dialect, the OpenSHMEM dialect should be examined in detail, and it should be discussed whether its concepts can be integrated into or inspire extensions of the SPMD IR.
Kind of topic: overview
Supervisor: Semih Burak
Parallel File System Parameter Tuning
Modern high-performance computing systems need to provide high-performance file systems to their users. For this reason, parallel file systems are used to enable concurrent access by a multitude of users and/or processes. These file systems typically support a wide range of configuration parameters. Since different use cases require different parameter settings for optimal performance, it is not trivial to determine the best configuration for a given system.
For this seminar topic, the student will perform an extensive literature survey of different parameter tuning mechanisms. Several heuristics and automatic tuning processes have been proposed in previous research, and the student should evaluate these mechanisms, especially with regard to their applicability to a university cluster like RWTH Aachen's CLAIX. There is an opportunity to try out these mechanisms with the ad-hoc file system BeeOND.
Kind of topic: overview
Supervisor: Philipp Martin
Utilisation of GPU Direct Storage in High Performance Computing
On modern systems, the number of HPC workloads using GPUs has risen dramatically. A majority of these workloads in the Artificial Intelligence and Big Data segments also require access to large amounts of data. Traditionally, this data has to be loaded into system memory by the CPU and then transferred to GPU memory before it can be used. GPU Direct Storage avoids this additional step by loading the data directly into GPU memory, bypassing system memory.
In this seminar thesis, the student should evaluate the possible advantages of GPU Direct Storage and recent advancements in this area. This involves some literature review and a deep dive into the HPC storage stack. There may be an opportunity to try out GPU Direct Storage on the RWTH Aachen CLAIX cluster and to compare it to the traditional approach of transferring data to the GPUs.
Kind of topic: in-depth
Supervisor: Philipp Martin
Evaluating Checkpointing Mechanisms for Enhanced Fault Tolerance in ML/DL Training on HPC Systems
Checkpointing is a critical technique in distributed training of machine learning (ML) and deep learning (DL) models, aimed at recovering from failures that may occur during long-running computations. While frequent checkpointing allows for quick recovery, it can lead to significant performance overhead due to the generation of numerous checkpoints. Recent advancements such as differential checkpointing have shown potential for reducing these costs, making them more relevant for systems with constrained computation time, such as shared HPC clusters.
The seminar thesis will provide an overview of existing checkpointing mechanisms and evaluate their effectiveness in improving fault tolerance while minimizing computational overhead. It will compare traditional frequent checkpointing strategies with innovative approaches. The thesis will discuss implementation considerations for deploying these mechanisms in shared HPC environments.
Kind of topic: overview
Supervisor: Dominik Viehhauser
Assessing Mixed-Precision Benchmarking for State-Of-The-Art GPU Architectures
The rapid growth of machine‑learning workloads on large‑scale systems has driven a shift toward accelerator hardware that is optimized for low‑precision arithmetic (16‑bit floating‑point and below). Traditional HPC benchmarks such as the High‑Performance Linpack (HPL) evaluate only double‑precision performance and therefore no longer reflect the demands of modern applications. To address this gap, mixed‑precision variants—most notably HPL‑MxP and HPG‑MxP—have been introduced to benchmark low‑precision workloads.
This thesis investigates the current state of low‑precision benchmarks and evaluates how their different computational approaches affect their representativeness for HPC and ML applications. This also includes examining whether these benchmarks succeed in capturing the increasingly memory-bound behavior of state-of-the-art accelerators.
Kind of topic: in-depth
Supervisor: Dominik Viehhauser
Evaluating Job Scheduler Modifications for HPC Sustainability Research
With the growing computational demands of scientific applications, the energy consumption and the resulting carbon emissions of HPC clusters are steadily increasing. Operators of HPC clusters seek to reduce their energy bill and carbon emissions by exploiting the natural variability of energy prices and carbon intensities of the energy mix. Energy prices and carbon intensities vary throughout the day due to the naturally fluctuating energy generation from renewable energy sources like solar and wind. To align cluster usage to these daily patterns, modifications to the job scheduler are necessary. Since the job scheduler provides the sole access point for users to submit work to the HPC cluster, any configuration changes need to be carefully evaluated before they can be deployed in production to ensure a stable operation.
Therefore, this thesis should compare different implementations and methodologies from related research for evaluating scheduler modifications. The discussion in the thesis needs to address the following key aspects: Which modeling assumptions does each method make? How accurately is the real system modeled, and how does that influence the accuracy of the results? What data is required to employ the presented approaches? How quickly can scheduling experiments be repeated with modified inputs or configuration settings?
Kind of topic: overview
Supervisor: Christian Wassermann
Enhancing the Observability of HPC Applications with eBPF
The complexity and scale of today’s HPC systems challenge application developers and performance engineers alike. To assess the utilization achieved by a given application, runtime measurements form the basis of the typical HPC performance analysis workflow. For maximum utility, the data collection should be as accurate as possible while not distorting the execution of the analyzed application. A recent addition to the Linux toolbelt is eBPF, which allows sandboxed execution of user-defined programs within kernel space. Through the interception of kernel events, eBPF enables previously infeasible forms of observability.
In this thesis, the student should investigate the potential of eBPF in HPC environments, specifically considering use cases related to performance analysis and system monitoring. By diving into a few selected papers, the benefits and drawbacks of eBPF compared to traditional tools should be highlighted and critically assessed. Throughout the evaluation, technical details should be included to illustrate important low-level eBPF-specific concepts.
Kind of topic: in-depth
Supervisor: Christian Wassermann
More topics following soon...
Supervisors & Organization
Semih Burak
Jannis Klinkenberg
Philipp Martin
Ben Thärigen
Dominik Viehhauser
Christian Wassermann