Seminar Current Topics in High-Performance Computing

Content

High-performance computing is applied to speed up long-running scientific applications, for instance the simulation of computational fluid dynamics (CFD). Today's supercomputers are often based on commodity processors, but they come in different forms: from clusters and (large) shared-memory systems to accelerators (e.g., GPUs). To leverage these systems, parallel programming with, e.g., MPI, OpenMP, or CUDA is required.

This seminar focuses on current research topics in the area of HPC and is based on conference and journal papers. Topics might cover, e.g., novel parallel computer architectures and technologies, parallel programming models, current methods for performance analysis & correctness checking of parallel programs, performance modeling or energy efficiency of HPC systems. The seminar consists of a written study and the presentation of a specific topic.

The objectives of this seminar are the independent elaboration of an advanced topic in the area of high-performance computing and the placement of the topic in its broader context. This includes the appropriate preparation of the concepts, approaches, and results of the given topic (also with respect to formalities and the time schedule), as well as a clear presentation of the contents. Furthermore, the students' independent work is to be emphasized by looking beyond their own horizon.

Schedule

This seminar belongs to the area of applied computer science. The topics are assigned during the introductory event. The students then work out their topics over the course of the semester. The corresponding presentations take place as a block course on one (or two) days at the end of the lecture period or in the exam period. Attendance is compulsory for the introductory event and the presentation block.
Furthermore, we will introduce the students to "Scientific Writing" and "Scientific Presenting" in computer science. Attendance at these two events is also compulsory.

The compulsory introductory event (kickoff) is scheduled for Wednesday, October 15, 9 a.m. - 11 a.m. The next compulsory meetings are planned for Wednesday, October 22, 10:30 a.m. - 12:00 p.m. and Friday, October 24, 10:30 a.m. - 12:00 p.m.

Furthermore, the seminar is an in-person event. This means that you must be present in person for all compulsory parts of the seminar.

Registration/Application

Seats for this seminar are distributed exclusively through the central registration process of the computer science department. We appreciate it if you state your interest in HPC and your prior knowledge of HPC (e.g., relevant lectures, software labs, and seminars that you have passed) in the corresponding section of the registration form.

Requisites

The goals of a seminar series are described in the corresponding Bachelor and Master modules. In addition to the seminar thesis and its presentation, Master students will have to lead one set of presentations (roughly 3 presentations) as session chair. A session chair makes sure that the session runs smoothly. This includes introducing the title of each presentation and its authors, keeping track of the speaking time, and leading a short discussion after the presentation. Further instructions will be given during the seminar.

Prerequisites

Attending the lecture "Introduction to High-Performance Computing" (Prof. Müller) is helpful, but not required.

Language

We encourage students to write the report and give the presentation in English. However, German is also possible.

Types of Topics

We provide two flavors of seminar topics, depending on the particular topic: (a) overview topics and (b) dive-in topics. They work as the names suggest. Nevertheless, this categorization does not necessarily imply a strict "either-or" but rather provides a guideline for addressing the topic. In general, both types of topics are equally difficult to work on; however, they pose different challenges. In the topic list below, you can also find the corresponding category for each seminar topic.

Topics

Exploring the Impact of HBM-Enabled CPUs

High-performance computing (HPC) continues to demand advanced memory solutions to handle increasingly complex and data-intensive workloads. While high-bandwidth memory (HBM) has already been employed extensively in GPU accelerators, its performance impact on CPU-based HPC systems has only recently been explored.

This seminar thesis investigates how HBM architectures influence the performance of demanding HPC workloads and compares the trade-offs between HBM and conventional DDR memory, including various memory modes. In addition, part of the evaluation will be conducted on the CLAIX supercomputer, offering practical insights into both the advantages and the limitations of next-generation memory technologies in real-world HPC environments.

Kind of topic: overview
Supervisor: Tim Cramer

Investigating the Influence of Write-Allocate Evasion on HPC Workloads

Modern processors rely heavily on caches to enhance performance by keeping recently accessed data close to the execution units. In conventional write-allocate cache architectures, however, the required read-before-write can introduce unnecessary overhead in certain scenarios. To mitigate this, many systems offer non-temporal (or “streaming”) stores that bypass the cache. Starting with the Ice Lake microarchitecture, Intel introduced a hardware optimization called Write-Allocate Evasion to reduce redundant memory traffic.
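To illustrate the overhead, here is a back-of-the-envelope estimate for an assumed example that is not part of the topic description: a copy loop a[i] = b[i] on double-precision arrays much larger than the caches. With write-allocate, each store first reads the target cache line, so every element causes three 8-byte memory transfers; if this read is avoided (e.g., by non-temporal stores or by Write-Allocate Evasion), only two remain:

```latex
V_{\mathrm{WA}} = \underbrace{8\,\mathrm{B}}_{\text{load } b_i}
               + \underbrace{8\,\mathrm{B}}_{\text{write-allocate read of } a_i}
               + \underbrace{8\,\mathrm{B}}_{\text{write-back of } a_i}
               = 24\,\mathrm{B},
\qquad
V_{\mathrm{no\text{-}WA}} = 8\,\mathrm{B} + 8\,\mathrm{B} = 16\,\mathrm{B},
\qquad
\frac{V_{\mathrm{no\text{-}WA}}}{V_{\mathrm{WA}}} = \frac{16}{24} = \frac{2}{3}
```

For a purely memory-bandwidth-bound loop, this caps the possible runtime reduction at one third; how much of it materializes on real workloads depends on when the hardware can actually apply the optimization.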

This seminar thesis requires the student to explain the Write-Allocate Evasion mechanism and analyze its impact on typical HPC workloads. An in-depth study of a selected benchmark (e.g., CloverLeaf) is expected. The student will have the opportunity to evaluate the effects on the CLAIX supercomputer.

Kind of topic: dive-in
Supervisor: Tim Cramer

Governing OpenMP Task Scheduling Policies

OpenMP runtime implementations do not always choose the optimal strategy for queueing and scheduling tasks, and different implementations behave differently. A proposal suggests adding a new hint through which the application developer can guide the runtime toward improved behavior.

This seminar thesis will provide an overview of the documented (or empirically observed) queueing and scheduling strategies of different OpenMP runtime implementations. It will survey OpenMP tasking-related publications and provide an overview of strategies to improve task scheduling behavior, both with existing techniques and through proposals for extending the OpenMP API.

Kind of topic: overview
Supervisor: Joachim Jenke

Mitigating Load Imbalances in Hybrid MPI+OpenMP codes

Load imbalances are one limiting factor for parallel scalability. In cases where avoiding the load imbalance is difficult, thread malleability can help to temporarily increase the resources available to a process with a higher workload. Different frameworks have been presented that provide such malleability to applications.

The seminar thesis will provide an overview of the available works and compare their approaches. It will further discuss the performance gains reported in the related works.

Kind of topic: overview
Supervisor: Joachim Jenke

Evaluating Task-Based Distributed Computing for Heterogeneous Architectures

Maximizing performance in HPC applications targeting heterogeneous architectures across multiple nodes has traditionally relied on manual workload distribution and resource tuning. These systems, composed of diverse compute units such as CPUs, GPUs, and specialized accelerators, present significant challenges in achieving balanced execution. Manual approaches often struggle to adapt to dynamic workloads or evolving resource availability, especially at scale.

To address these challenges, task-based programming models have emerged as a flexible alternative. By abstracting computation into fine-grained tasks with explicit dependencies, task-based runtimes can dynamically schedule and balance workloads across heterogeneous resources and distributed memory nodes. This seminar thesis should investigate the capabilities and conceptual foundations of task-based runtimes for heterogeneous, distributed systems. It should further compare selected frameworks in terms of scheduling, load balancing, and dependency management, and present performance measurements to assess their effectiveness in real-world scenarios.

Kind of topic: overview
Supervisor: Jan Kraus

Investigating the Performance of Stdpar Offloading Implementations

With the introduction of C++17, the C++ standard library gained support for parallel algorithms through execution policies (std::execution), enabling expressive and efficient parallelism on CPUs. While originally limited to CPU execution, recent compiler developments have extended this capability to GPU offloading, making it possible to run standard parallel algorithms on heterogeneous systems without abandoning familiar C++ abstractions.
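A minimal sketch of what such code looks like, assuming a C++17 toolchain with parallel algorithm support; the function name daxpy is illustrative. The same source can, for instance, be compiled with nvc++ -stdpar=gpu to offload the loop to an NVIDIA GPU, while a plain host compiler executes it on the CPU (GCC's libstdc++, for example, may require TBB as its parallel backend).

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Computes y = a * x + y (DAXPY) with a standard parallel algorithm.
// par_unseq allows parallel and vectorized execution; whether it runs on
// a CPU or a GPU is decided by the toolchain, not by this source code.
void daxpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(),   // first input range: x
                   y.begin(),            // second input range: y
                   y.begin(),            // output overwrites y in place
                   [a](double xi, double yi) { return a * xi + yi; });
}
```

The lambda captures the scalar a by value rather than by reference; capturing references to host stack variables is a common pitfall when such code is offloaded to a GPU.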

This seminar thesis should examine the current state of GPU-enabled stdpar implementations, focusing in particular on the support provided by NVIDIA's nvc++ compiler and AdaptiveCpp. The student should analyze how each implementation maps standard parallel algorithms to the GPU, and evaluate their strengths and limitations in terms of programmability and performance. Comparative benchmarks on representative workloads can be conducted optionally to assess the efficiency and practicality of using stdpar as a unified parallel programming model for both CPU and GPU execution.

Kind of topic: dive-in
Supervisor: Jan Kraus

Comparison of Performance Analysis Tools for (OpenMP) Task-Based Applications

Modern applications often have complex dependencies between work packages, and standard bulk-synchronous parallelization approaches introduce unnecessary waiting times. Tasking allows developers to define work packages and their dependencies directly; the runtime then schedules these tasks onto threads. This fine-grained synchronization can bring performance benefits. Although tasking has existed in OpenMP for more than 15 years, common performance analysis tools still provide very few task-specific analysis options. Task-specific tools were proposed more than 10 years ago, but most of them are no longer maintained or no longer exist at all. However, a few publications in recent years propose new tools for task-based analysis.
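The dependency-driven scheduling described above can be sketched with OpenMP's depend clause. This is a minimal illustration and not taken from any of the tools to be surveyed; the function task_pipeline and its arithmetic are hypothetical.

```cpp
// Two independent producer tasks write a and b; a consumer task reads both.
// The runtime may run the producers concurrently and delays only the
// consumer until its inputs are ready, instead of using a global barrier.
// Compile with OpenMP enabled (e.g., -fopenmp); without it the pragmas are
// ignored and the code runs serially with the same result.
int task_pipeline(int n) {
    int a = 0, b = 0, sum = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a) shared(a)
        a = n * 2;    // producer 1
        #pragma omp task depend(out: b) shared(b)
        b = n + 1;    // producer 2
        #pragma omp task depend(in: a, b) depend(out: sum) shared(a, b, sum)
        sum = a + b;  // consumer: starts only after both producers finish
        #pragma omp taskwait
    }
    return sum;
}
```

Such producer/consumer chains between individual tasks are exactly what a task-specific analysis tool needs to visualize, whereas a purely thread-centric view only shows busy and idle threads.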

This thesis should provide an overview of old and new performance analysis tools that support (OpenMP) task-specific analysis, based on an extensive literature review. The tools should be compared with regard to their functionality, usability, and overhead. If applicable, a categorization of the tools can also be included.

Kind of topic: overview
Supervisor: Ben Thärigen

Determining Parallelization Potential in Parallel Programs

In a world where applications grow larger and more complex by the day and computer systems are dominated by multi-core architectures, it becomes crucial to parallelize programs effectively to achieve good performance. However, this can be quite a challenge when millions of lines of code and hundreds of code regions must be considered. A lot of work has been done to (partially) automate the process of identifying code regions that are not only parallelizable but also promise adequate performance improvements.

For this paper, the student should perform an extensive literature review to discover different strategies for determining parallelization potential of code regions in modern programs. In the paper, the different approaches should be presented and compared to each other by evaluating their strengths and weaknesses.

Kind of topic: overview
Supervisor: Ben Thärigen

GPU Stream Semantics for MPI

Modern HPC systems make extensive use of compute accelerators. Recent communication libraries, including the collective communication libraries NCCL and RCCL, define stream-based semantics to better support GPU-accelerated applications. Although MPI is the de facto standard for distributed-memory communication in HPC, as of MPI 5.0, the MPI standard still does not define GPU support, e.g., in the form of stream semantics.

In this thesis, the student should conduct a literature review on approaches to address GPU support in MPI programs. Optionally, the student may evaluate the status of proposed prototypes on CLAIX-2023.

Kind of topic: overview
Supervisor: Felix Tomski

Collective Contracts for Message-Passing Parallel Programs

Extensive research exists on correctness checking of MPI programs, mainly focusing on dynamic approaches. One static approach to program verification is the use of procedure contracts, which are widely applied outside the HPC world, e.g., for serial C or Java programs.

In this thesis, the student should present the proposed contract theory for collective message-passing procedures and explain how it can be employed to verify the correctness of MPI programs. Optionally, the student may conduct their own experiments by evaluating the proposed approach on test cases from commonly used benchmark suites for correctness checking.

Kind of topic: dive-in
Supervisor: Felix Tomski

Evaluation Methodologies of HPC Application Benchmarks for HPC Procurements

Benchmarking is an essential part of the procurement of HPC systems to ensure that the requested features of the new cluster are delivered by the vendor as promised. In particular, before the request for proposal (RFP) of an HPC procurement is published, rigorous benchmarking on current HPC clusters and new testbed hardware must be carried out to be able to "predict" the performance and energy consumption of the benchmarks on the potential hardware that the vendors will deliver in the future. To this end, the measured results must be evaluated and conclusions drawn for the HPC tender document. An additional consideration is which benchmarks to include in the HPC tender: HPC application benchmarks are meant to represent the workload on the cluster and shall be the focus of this work.

This seminar thesis shall investigate and compare different evaluation methodologies of HPC application benchmarks with respect to their usage in HPC procurements and acceptance tests. "Methodologies" refers to the various approaches to compare and predict benchmark evaluation results. Here, the figures of merit for evaluating benchmarks shall cover performance (runtime, bandwidth, efficiency, ...), energy consumption, and categories that represent the cluster workload, e.g., so-called motifs or simply hardware-dominant behavior (compute, memory, I/O, ...). HPC benchmark suites used in procurements that shall (at least) be investigated in this seminar thesis include, e.g., the JUPITER Benchmark Suite, the PRACE UEABS, the NERSC-10 Benchmark Suite, or the CORAL-2 benchmarks.

Kind of topic: overview
Supervisor: Sandra Wienke

Supervisors & Organization

Tim Cramer
Joachim Jenke
Jan Kraus
Ben Thärigen
Felix Tomski
Sandra Wienke

Contact

Ben Thärigen

Institutions

High Performance Computing (Computer Science 12)