Topics
Suitable Performance Optimization Approaches for the SPMD IR
In high-performance computing (HPC), compute cluster systems are getting increasingly larger and their architectures more and more heterogeneous (multiple compute nodes with CPUs, GPUs, …). The ever-growing computational demands (for example, from large simulations or AI models) that drive this trend have also led to a multitude of parallel programming models. These programming models are used in addition to programming languages to implement software that uses the fast and parallel HPC hardware efficiently and effectively. To ease the development of tools and compiler passes for parallel programming models following the single program, multiple data (SPMD) principle, the SPMD IR (intermediate representation) was introduced. It addresses the problem that tools often support only one model or implement the necessary abstraction internally, which limits extensibility and reusability. The SPMD IR's prototype is implemented in MLIR/LLVM and supports MPI, NCCL, SHMEM, and NVSHMEM. While its usefulness has been shown for the verification of collective communication and for data race detection, the question remains which performance optimization approaches and compiler passes in modern compiler systems are suitable and can make use of it, and which are inapplicable by design.
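To give a flavor of the verification use case mentioned above, the following toy Python sketch checks that all ranks issue a matching sequence of collectives. The trace format and operation names are invented for illustration; they are not the SPMD IR's actual representation, which operates on compiler IR rather than runtime traces.

```python
# Hypothetical sketch: check that all ranks issue the same sequence of
# collectives, the kind of cross-rank analysis a unified SPMD IR enables.
from itertools import zip_longest

def verify_collectives(traces):
    """traces: one list of (operation, element_count) pairs per rank.
    Returns the index of the first mismatching step, or None if all agree."""
    for i, ops in enumerate(zip_longest(*traces)):
        if len(set(ops)) != 1:   # ranks disagree (or one rank ran out of ops)
            return i
    return None

ranks = [
    [("bcast", 4), ("allreduce", 8)],   # rank 0
    [("bcast", 4), ("allreduce", 8)],   # rank 1
    [("bcast", 4), ("reduce", 8)],      # rank 2: mismatching collective
]
print(verify_collectives(ranks))  # → 1
```

Because the SPMD IR abstracts over MPI, NCCL, SHMEM, and NVSHMEM, an analysis of this shape would only need to be written once instead of once per model.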
This seminar thesis is supposed to conduct a systematic literature review of approaches that perform performance optimization in the context of any of the supported programming models, or that are part of LLVM or GCC as compiler passes, and that could leverage the additional information provided by the SPMD IR. After analyzing and understanding the SPMD IR, the student is supposed to give an overview of the approaches found and discuss their applicability to the SPMD IR.
Kind of topic: overview
Supervisor: Semih Burak
Emerging MLIR Dialects and their Suitability for the SPMD IR
In high-performance computing (HPC), compute cluster systems are getting increasingly larger and their architectures more and more heterogeneous (multiple compute nodes with CPUs, GPUs, …). The ever-growing computational demands (for example, from large simulations or AI models) that drive this trend have also led to a multitude of parallel programming models. These programming models are used in addition to programming languages to implement software that uses the fast and parallel HPC hardware efficiently and effectively. To ease the development of tools and compiler passes for parallel programming models following the single program, multiple data (SPMD) principle, the SPMD IR (intermediate representation) was introduced. It addresses the problem that tools often support only one model or implement the necessary abstraction internally, which limits extensibility and reusability. The prototype of the SPMD IR is implemented on top of MLIR/LLVM and supports MPI, NCCL, SHMEM, and NVSHMEM. MLIR is evolving rapidly, with new dialects continuously emerging and existing ones being extended on a regular basis. In addition to the core upstream dialects, the SPMD IR introduces its own dedicated SPMD dialect.
This seminar thesis aims to conduct a systematic literature review of existing MLIR dialects. The student will analyze their usage and assess their applicability in combination with the SPMD dialect. As a representative example of a recently introduced dialect, the OpenSHMEM dialect should be examined in detail, and it should be discussed whether its concepts can be integrated into or inspire extensions of the SPMD IR.
Kind of topic: overview
Supervisor: Semih Burak
Quantifying Energy Consumption and Carbon Footprint of Large Language Model Inference
Large Language Models (LLMs) have shown remarkable capabilities across a wide spectrum of tasks, attracting many users worldwide in the private sector as well as in academia and industry. While proprietary offerings from OpenAI, Anthropic, and Google deliver state-of-the-art performance and cutting-edge features, many public-sector organizations, universities, research institutes, and sensitive domains such as healthcare are constrained by data-privacy regulations that prohibit the use of commercial APIs for personal or patient data. To meet these stricter requirements, an increasing number of institutions are deploying locally hosted LLMs based on open-source or open-weight models, thereby providing a data-sovereign alternative to commercial services. While inference speed and throughput remain important, the energy consumption and resulting carbon emissions of these deployments have emerged as key sustainability issues. Further, inefficient prompting, such as overly long or poorly structured queries, can markedly raise computational load, driving up both energy consumption and operating costs. Implementing per-prompt and regular (weekly or monthly) reporting of energy consumption, emissions, and costs can make users aware of the environmental and monetary impact of their LLM usage and encourage more sustainable prompting behavior.
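The per-prompt reporting described above can be as simple as multiplying measured generation time by power draw and applying a carbon-intensity and price factor. The following back-of-the-envelope sketch illustrates this; all constants (GPU power, PUE, carbon intensity, electricity price) are illustrative assumptions, not measured values.

```python
# Hypothetical per-prompt report; all default numbers are invented for
# illustration and would be measured or configured in a real deployment.
def prompt_report(gen_seconds, gpu_watts=350.0, pue=1.4,
                  g_co2_per_kwh=380.0, eur_per_kwh=0.30):
    """Estimate energy, emissions, and cost of one inference request."""
    kwh = gpu_watts * pue * gen_seconds / 3.6e6   # W * s -> kWh, incl. cooling overhead
    return {
        "energy_kWh": kwh,
        "co2_g": kwh * g_co2_per_kwh,
        "cost_eur": kwh * eur_per_kwh,
    }

r = prompt_report(gen_seconds=12.0)
print(f"{r['energy_kWh'] * 1000:.2f} Wh, {r['co2_g']:.2f} g CO2, "
      f"{r['cost_eur'] * 100:.3f} ct")
```

Real methodologies surveyed in the thesis refine each factor here, e.g., measuring power per device via NVML instead of assuming a constant draw, or using time-varying grid carbon intensity.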
This seminar thesis provides a systematic survey comparing several existing approaches and methodologies for quantifying the energy consumption and carbon footprint of various LLMs. It evaluates reported inference performance, energy demand, and emissions, and discusses the main factors influencing these metrics such as model size, hardware configuration, batch size, etc.
Kind of topic: overview
Supervisor: Jannis Klinkenberg
Improving Fine-Grained Task Parallelism and Dynamic Load Balancing on Multi-Socket Many-Core Systems
The continuous growth of core counts in modern cloud and HPC systems, which span across sockets and Non-Uniform Memory Access (NUMA) domains, poses increasing challenges for efficient application parallelization. To address this, shared-memory programming paradigms have been developed to simplify parallel programming by abstracting low-level details. A prominent example is OpenMP, which supports both work-sharing and task-based parallelism. In particular, task-based models are well suited for irregular, recursive, or complex workloads that can exploit fine-grained parallelism. However, as system complexity and core counts grow, managing synchronization and shared-memory access across many threads becomes increasingly difficult. Task creation, queuing, and scheduling introduce additional overheads, especially in fine-grained execution scenarios. Traditional OpenMP runtime systems such as GNU OpenMP and LLVM OpenMP often struggle with scalability due to their reliance on synchronization mechanisms like locks. Recent research has therefore explored alternative approaches, including lock-free and lock-less synchronization techniques as well as NUMA-aware scheduling strategies to improve efficiency.
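The decentralized queue designs mentioned above can be illustrated with a minimal work-stealing scheme: each worker owns a deque, pops tasks from its hot (LIFO) end, and steals from the cold (FIFO) end of a victim's deque when its own runs empty. This Python sketch is a toy illustration of the concept, not a production OpenMP runtime; real implementations use lock-free deques and NUMA-aware victim selection.

```python
# Toy work-stealing scheduler: one deque per worker, local LIFO pops,
# FIFO steals. Tasks are plain callables and do not spawn new tasks.
import threading
from collections import deque

def run(num_workers, tasks):
    queues = [deque() for _ in range(num_workers)]
    for i, t in enumerate(tasks):                 # round-robin initial distribution
        queues[i % num_workers].append(t)
    results, lock = [], threading.Lock()

    def worker(wid):
        while True:
            task = None
            try:
                task = queues[wid].pop()          # hot local end (LIFO)
            except IndexError:
                for victim in range(num_workers): # steal from cold end (FIFO)
                    try:
                        task = queues[victim].popleft()
                        break
                    except IndexError:
                        continue
            if task is None:
                return                            # all queues empty: done
            with lock:
                results.append(task())

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

out = run(4, [lambda i=i: i * i for i in range(100)])
print(sorted(out) == [i * i for i in range(100)])  # → True
```

The design point to note is that contention only arises on steals, which are rare when work is well balanced, whereas a single shared queue is contended on every task operation.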
This seminar thesis aims to provide an in-depth overview of advancements in OpenMP task scheduling, including key technical concepts and implementation strategies. Furthermore, it will critically evaluate and compare different task scheduling approaches and policies, particularly those used in GOMP and LLVM OpenMP, with respect to their performance and scalability.
Kind of topic: dive-in
Supervisor: Jannis Klinkenberg
Parallel File System Parameter Tuning
Modern high-performance computing systems need to provide highly performant file systems to their users. For this reason, parallel file systems are used to enable concurrent accesses by a multitude of users and processes. These file systems typically support a wide range of configuration parameters. Since different use cases require different parameter settings for optimal performance, determining the best configuration for a given system is not trivial.
For this seminar topic, the student will have to perform an extensive literature survey of different parameter tuning mechanisms. Several heuristics and automatic tuning processes have been proposed in previous research, and the student should evaluate these mechanisms, especially with an eye to their applicability to a university cluster like RWTH Aachen's CLAIX. There is an opportunity to try out these mechanisms with the ad-hoc file system BeeOND.
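One of the simplest automatic tuning mechanisms of the kind surveyed here is hill climbing over a single parameter, such as the stripe count. The sketch below uses an invented synthetic throughput curve in place of a real I/O benchmark run; in practice, measure() would execute something like IOR against the configured file system.

```python
# Hypothetical hill-climbing tuner for one parameter (stripe count).
# measure() is a stand-in for a real benchmark; its curve is invented.
def measure(stripe_count):
    # Synthetic throughput model: gains from striping, then contention.
    return 100 * stripe_count / (1 + 0.02 * stripe_count ** 2)

def hill_climb(candidates, start):
    best = start
    while True:
        idx = candidates.index(best)
        neighbors = candidates[max(0, idx - 1):idx + 2]  # self + direct neighbors
        nxt = max(neighbors, key=measure)
        if nxt == best:
            return best                                  # local optimum reached
        best = nxt

stripes = [1, 2, 4, 8, 16, 32, 64]
print(hill_climb(stripes, start=1))  # → 8
```

More sophisticated approaches from the literature replace this greedy walk with Bayesian optimization or model-based search to cope with noisy measurements and multiple interacting parameters.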
Kind of topic: overview
Supervisor: Philipp Martin
Utilisation of GPU Direct Storage in High Performance Computing
On modern systems, HPC workloads using GPUs have seen a dramatic rise. A majority of these workloads, particularly in the Artificial Intelligence and Big Data segments, also require access to large amounts of data. Traditionally, this data has to be loaded into system memory by the CPU and then transferred to GPU memory to be utilised. GPU Direct Storage circumvents this additional data path by loading the data directly into GPU memory, bypassing the system memory.
In this seminar thesis, the student should evaluate the possible advantages of GPU Direct Storage and recent advancements in this area. This involves some literature review and a deep dive into the HPC storage stack. There may be an opportunity to try out GPU Direct Storage on the RWTH Aachen CLAIX cluster and to compare it to the traditional approach of transferring data to the GPUs.
Kind of topic: dive-in
Supervisor: Philipp Martin
Evaluating Checkpointing Mechanisms for Enhanced Fault Tolerance in ML/DL Training on HPC Systems
Checkpointing is a critical technique in distributed training of machine learning (ML) and deep learning (DL) models, aimed at recovering from failures that may occur during long-running computations. While frequent checkpointing allows for quick recovery, it can lead to significant performance overhead due to the generation of numerous checkpoints. Recent advancements such as differential checkpointing have shown potential in reducing these costs, making them more relevant for use on computation-time-constrained systems like shared HPC clusters.
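The core idea of differential checkpointing can be sketched in a few lines: write a full snapshot periodically and, in between, store only the entries that changed. The dict-based state below is a deliberate simplification; real frameworks operate on tensors, optimizer state, and data-loader positions.

```python
# Sketch of differential checkpointing on a toy dict-based model state.
def make_delta(prev, curr):
    """Keep only the entries that differ from the previous checkpoint."""
    return {k: v for k, v in curr.items() if prev.get(k) != v}

def restore(full, deltas):
    """Rebuild the latest state from a full snapshot plus a chain of deltas."""
    state = dict(full)
    for d in deltas:
        state.update(d)
    return state

step0 = {"w1": 0.5, "w2": -1.2, "epoch": 0}
step1 = {"w1": 0.4, "w2": -1.2, "epoch": 1}   # only w1 and epoch changed
delta = make_delta(step0, step1)
print(delta)                                   # → {'w1': 0.4, 'epoch': 1}
print(restore(step0, [delta]) == step1)        # → True
```

The trade-off visible even in this sketch is central to the thesis topic: deltas shrink I/O volume per checkpoint, but recovery cost grows with the length of the delta chain since the last full snapshot.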
The seminar thesis will provide an overview of existing checkpointing mechanisms and evaluate their effectiveness in improving fault tolerance while minimizing computational overhead. It will compare traditional frequent checkpointing strategies with innovative approaches such as differential checkpointing. The thesis will discuss implementation considerations for deploying these mechanisms in shared HPC environments.
Kind of topic: overview
Supervisor: Dominik Viehhauser
Assessing Mixed-Precision Benchmarking for State-Of-The-Art GPU Architectures
The rapid growth of machine-learning workloads on large-scale systems has driven a shift toward accelerator hardware that is optimized for low-precision arithmetic (16-bit floating-point and below). Traditional HPC benchmarks such as the High-Performance Linpack (HPL) evaluate only double-precision performance and therefore no longer reflect the demands of modern applications. To address this gap, mixed-precision variants, most notably HPL-MxP and HPG-MxP, have been introduced to benchmark low-precision workloads.
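The computational approach at the heart of HPL-MxP is iterative refinement: solve the system cheaply in low precision, then correct the solution using residuals computed in high precision. The toy sketch below emulates half precision with the struct module and uses an invented 2x2 system; the real benchmark applies the same idea to a large LU factorization.

```python
# Toy iterative refinement, the core idea behind mixed-precision benchmarks:
# low-precision solve, double-precision residual correction.
import struct

def fp16(x):
    """Round to IEEE half precision, emulating a low-precision unit."""
    return struct.unpack('e', struct.pack('e', x))[0]

def solve2(A, rhs, prec=lambda v: v):
    """Direct 2x2 solve (Cramer's rule); prec rounds each result component."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [prec((rhs[0] * a22 - a12 * rhs[1]) / det),
            prec((a11 * rhs[1] - rhs[0] * a21) / det)]

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = solve2(A, b, prec=fp16)                    # cheap low-precision solve
for _ in range(3):                             # refinement in double precision
    r = [b[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    s = max(abs(v) for v in r) or 1.0          # scale to avoid fp16 underflow
    d = solve2(A, [v / s for v in r], prec=fp16)
    x = [x[i] + s * d[i] for i in range(2)]
exact = solve2(A, b)                           # double-precision reference
print(max(abs(x[i] - exact[i]) for i in range(2)) < 1e-8)  # → True
```

The sketch shows why such benchmarks are representative of modern accelerators: the bulk of the arithmetic runs at low precision, yet the final accuracy matches a double-precision solve.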
This thesis investigates the current state of low-precision benchmarks and evaluates how their different computational approaches affect their representativeness for HPC and ML applications. This also includes examining whether these benchmarks succeed in capturing the increasing memory-boundedness of state-of-the-art accelerators.
Kind of topic: dive-in
Supervisor: Dominik Viehhauser
Evaluating Job Scheduler Modifications for HPC Sustainability Research
With the growing computational demands of scientific applications, the energy consumption and the resulting carbon emissions of HPC clusters are steadily increasing. Operators of HPC clusters seek to reduce their energy bill and carbon emissions by exploiting the natural variability of energy prices and carbon intensities of the energy mix. Energy prices and carbon intensities vary throughout the day due to the naturally fluctuating energy generation from renewable sources like solar and wind. To align cluster usage with these daily patterns, modifications to the job scheduler are necessary. Since the job scheduler provides the sole access point for users to submit work to the HPC cluster, any configuration changes need to be carefully evaluated before they can be deployed in production to ensure stable operation.
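A minimal scheduling experiment of the kind evaluated in this topic can be sketched by replaying a job list against an hourly carbon-intensity profile and comparing immediate submission with carbon-aware start times. All numbers below are invented, and the sketch deliberately ignores node capacity and queue interactions, which real evaluation methodologies must model.

```python
# Toy carbon-aware scheduling experiment on an invented intensity profile.
def emissions(schedule, intensity, power_kw=100.0):
    """schedule: list of (start_hour, duration_h); returns kg CO2."""
    total = 0.0
    for start, dur in schedule:
        for h in range(start, start + dur):
            total += power_kw * intensity[h % 24] / 1000.0  # g/kWh -> kg
    return total

def best_start(duration, intensity):
    """Start hour minimizing total carbon intensity over the job's runtime."""
    return min(range(24),
               key=lambda s: sum(intensity[(s + h) % 24] for h in range(duration)))

# Invented grid profile: cheap nights, midday solar dip, evening peak.
intensity = [300] * 6 + [450] * 6 + [250] * 4 + [500] * 8   # g CO2/kWh per hour
jobs = [3, 2]                                               # durations in hours
baseline = emissions([(8, d) for d in jobs], intensity)     # all start at 08:00
shifted = emissions([(best_start(d, intensity), d) for d in jobs], intensity)
print(baseline, shifted)  # → 225.0 125.0
```

Even this toy model surfaces the evaluation questions listed below: the result depends entirely on the assumed intensity profile, power model, and the absence of capacity constraints.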
Therefore, this thesis should compare different implementations and methodologies from related research for evaluating scheduler modifications. The discussion in this paper needs to address the following key aspects: Which modeling assumptions are made by each method? How accurately is the real system modeled and how does that influence the result accuracy? What data is required to employ the presented approaches? How quickly can scheduling experiments be repeated with modified inputs or configuration settings?
Kind of topic: overview
Supervisor: Christian Wassermann
Enhancing the Observability of HPC Applications with eBPF
The complexity and scale of today’s HPC systems challenge application developers and performance engineers alike. To assess the utilization achieved by a given application, runtime measurements form the basis of the typical HPC performance analysis workflow. For maximum utility, the data collection should be as accurate as possible while not distorting the execution of the analyzed application. A recent addition to the Linux toolbelt is eBPF, allowing sandboxed execution of user-defined programs within kernel space. Through the interception of kernel events, eBPF enables previously infeasible forms of observability.
In this thesis, the student should investigate the potential for eBPF in HPC environments specifically considering use cases related to performance analysis and system monitoring. By diving into a few selected papers, benefits and drawbacks of eBPF compared to traditional tools should be highlighted and critically assessed. Throughout the evaluation, technical details should be included to illustrate important low-level eBPF-specific concepts.
Kind of topic: dive-in
Supervisor: Christian Wassermann
Supervisors & Organization
Semih Burak
Jannis Klinkenberg
Philipp Martin
Ben Thärigen
Dominik Viehhauser
Christian Wassermann