Topics
Is CIVL-C a Suitable IR for Static Correctness Verification?
Verifying the correctness of parallel programs is important, as errors are often not trivially visible to the developer and may lead to program crashes without any information on the root problem. Correctness verification approaches can be dynamic or static. While dynamic approaches make use of actual runtime information, static approaches are independent of specific runs but also have less information to work with. Symbolic execution and model checking are two subcategories of static approaches. One of the more advanced tools that includes its own IR is CIVL (Concurrency Intermediate Verification Language). The idea behind its IR, CIVL-C, is to first abstract multiple parallel programming models into one common representation before carrying out the verification through model checking.
This seminar thesis should analyze CIVL-C for its expressiveness and its suitability for other static (verification) approaches, so that its abstraction can be reused across tools. By conducting a systematic literature review, CIVL-C should be compared to other related IRs that abstract parallel programming models and concurrency in general. Of particular interest are the benefits and disadvantages of reimplementing CIVL-C in MLIR, a multi-level IR infrastructure built on top of LLVM.
Kind of topic: dive-in
Supervisor: Semih Burak
Static Analysis, Verification, and Optimization on MLIR
Recently, MLIR, a multi-level IR infrastructure, has opened up many research directions. It eases the definition of custom IRs (dialects) that can work side by side with existing ones. Being able to switch from high-level to low-level abstractions and back, and even to mix levels of abstraction, is an inherent feature of MLIR. Many compiler tasks that previously had to be reimplemented for each new IR can now be avoided by reusing MLIR's rich infrastructure, which builds upon that of LLVM. Static analysis, e.g., for performance optimization or correctness verification, only makes use of information available at compile time, in contrast to dynamic approaches, which also use run-time information. Such static analyses are often implemented as compiler passes and can be realized with comparatively little effort within the MLIR infrastructure.
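As a rough sketch of what such a pass can look like (assuming the MLIR C++ pass API; the pass name and the analysis itself are illustrative, and details may differ between MLIR versions), the following pass counts the operations per dialect in a module:

// Minimal sketch of a static analysis implemented as an MLIR pass.
// The pass name and the analysis are illustrative; only the MLIR/LLVM APIs are real.
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/Pass.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/Support/raw_ostream.h"

namespace {
struct OpCountPass
    : mlir::PassWrapper<OpCountPass, mlir::OperationPass<mlir::ModuleOp>> {
  MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(OpCountPass)

  llvm::StringRef getArgument() const final { return "op-count"; }

  void runOnOperation() override {
    // Walk the whole module and count operations per dialect,
    // using only compile-time information.
    llvm::StringMap<unsigned> counts;
    getOperation().walk([&](mlir::Operation *op) {
      counts[op->getDialect()->getNamespace()]++;
    });
    for (auto &entry : counts)
      llvm::errs() << entry.getKey() << ": " << entry.getValue() << "\n";
  }
};
} // namespace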
This seminar thesis should conduct a systematic literature review of approaches that perform static analysis for correctness verification or performance optimization, either on existing MLIR dialects or on newly introduced ones. In particular, both the introduced MLIR dialects and the related approaches should be presented, and their strengths and shortcomings compared to non-MLIR variants should be pointed out.
Kind of topic: overview
Supervisor: Semih Burak
Modern C++ Language Binding for MPI
As the official C++ bindings for the Message Passing Interface (MPI) have been deprecated and removed from the MPI standard, C++ application developers have to fall back on the C bindings to enable MPI communication in their applications. Since modern C++ capabilities and practices in particular have come a long way from their common root with C, application developers have strived to create abstraction libraries that enable the use of MPI functionality within a modern C++ context.
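As a sketch of the kind of abstraction such libraries provide (the wrapper classes below are hypothetical and not taken from any existing library; only the MPI_* calls are part of the MPI standard), consider a minimal RAII-style wrapper around the C bindings:

// Sketch: wrapping the MPI C bindings in a small modern-C++ interface.
// Environment and Comm are hypothetical wrapper types, not from a real library.
#include <mpi.h>
#include <vector>

class Environment {  // RAII: MPI_Init in the constructor, MPI_Finalize in the destructor
public:
  Environment(int &argc, char **&argv) { MPI_Init(&argc, &argv); }
  ~Environment() { MPI_Finalize(); }
  Environment(const Environment &) = delete;
  Environment &operator=(const Environment &) = delete;
};

class Comm {
public:
  explicit Comm(MPI_Comm c = MPI_COMM_WORLD) : comm_(c) {}
  int rank() const { int r; MPI_Comm_rank(comm_, &r); return r; }
  int size() const { int s; MPI_Comm_size(comm_, &s); return s; }

  // Type-safe send/recv for contiguous double buffers; a real library would
  // map arbitrary C++ types to MPI datatypes.
  void send(const std::vector<double> &buf, int dest, int tag = 0) const {
    MPI_Send(buf.data(), static_cast<int>(buf.size()), MPI_DOUBLE, dest, tag, comm_);
  }
  void recv(std::vector<double> &buf, int source, int tag = 0) const {
    MPI_Recv(buf.data(), static_cast<int>(buf.size()), MPI_DOUBLE, source, tag,
             comm_, MPI_STATUS_IGNORE);
  }

private:
  MPI_Comm comm_;
};

int main(int argc, char **argv) {
  Environment env(argc, argv);
  Comm world;
  std::vector<double> data(4, world.rank());
  if (world.rank() == 0 && world.size() > 1) world.send(data, 1);
  else if (world.rank() == 1)                world.recv(data, 0);
  return 0;
}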
This seminar thesis should look into different approaches of those libraries, their similarities and differences, how they enable modern C++ application development, and how far their support for the full range of MPI functionality has grown so far.
Kind of topic: overview
Supervisor: Marc-André Hermanns
Agile Performance Analysis Interfaces
Performance analysis tools have a long-standing tradition in HPC, with its inherent focus on application performance. Classic tools often provide their data in fixed reports and visual representations pre-configured by the tool developers. However, identifying an underlying performance bug sometimes requires combining performance data in ways not initially anticipated by the tool developer. Agile interfaces to performance data allow users to combine and relate measurement data across performance metrics. Specifying such combinations ad hoc may aid in situations where the source of a performance problem is not initially obvious.
This work should explore the benefits of agile interfaces in HPC performance analysis tools, identify their limits, and assess how such tools target users across the spectrum from novices to experts.
Kind of topic: overview
Supervisor: Marc-André Hermanns
Hyper-Parameter Optimization for Large Scale Deep Learning Workloads
As machine and deep learning models continue to advance, the role of hyper-parameter optimization becomes increasingly critical. Hyper-parameters can be used to configure and adjust different aspects including, but not limited to, the number or dimensions of hidden model layers, the optimization algorithms and the learning rates in those optimization steps, the choice of cost functions, or the number of epochs and batch sizes used for the training procedure. While hyper-parameters significantly impact the performance of these models, influencing their accuracy, convergence speed, and generalization ability, finding the best parameters for a particular situation and dataset requires solving a high-dimensional optimization problem and demands extensive compute capabilities to train a model for each parameter combination, potentially not making the most efficient use of the available HPC resources.
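As a minimal illustration of the underlying search problem, and not of any specific HPO framework, the following sketch performs a naive random search over a small hyper-parameter space; train_and_evaluate is a placeholder for a full, expensive training run:

// Sketch: naive random search over a hyper-parameter space.
// train_and_evaluate() stands in for an expensive training job on an HPC system;
// real HPO tools replace random sampling with e.g. Bayesian or genetic strategies.
#include <iostream>
#include <random>

struct HyperParams {
  double learning_rate;
  int    batch_size;
  int    hidden_layers;
};

// Placeholder objective: in practice this trains and validates a model.
double train_and_evaluate(const HyperParams &hp) {
  return -(hp.learning_rate - 0.01) * (hp.learning_rate - 0.01)
         - 0.001 * hp.hidden_layers;  // toy surrogate for a validation score
}

int main() {
  std::mt19937 rng(42);
  std::uniform_real_distribution<double> lr(1e-4, 1e-1);
  std::uniform_int_distribution<int> batch(32, 512), layers(1, 8);

  HyperParams best{};
  double best_score = -1e300;
  for (int trial = 0; trial < 100; ++trial) {  // each trial = one full training run
    HyperParams hp{lr(rng), batch(rng), layers(rng)};
    double score = train_and_evaluate(hp);
    if (score > best_score) { best_score = score; best = hp; }
  }
  std::cout << "best lr=" << best.learning_rate
            << " batch=" << best.batch_size
            << " layers=" << best.hidden_layers << "\n";
  return 0;
}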
In this seminar, the student should delve into the realm of hyper-parameter optimization, exploring techniques that enhance model accuracy, such as Bayesian optimizers as well as bio-inspired or genetic algorithms, while making efficient use of the underlying compute resources. The student should characterize these approaches and provide an overview of their strengths and weaknesses as well as their potential and usability for AI users on HPC infrastructures.
Kind of topic: overview
Supervisor: Jannis Klinkenberg
Investigating the Performance and Portability of SYCL Compiler Implementations on HPC Systems
Reviewing the TOP500 list of the largest supercomputers in the world reveals that more and more installations feature heterogeneous compute nodes that comprise not only multi-core CPUs but also several accelerators such as GPGPUs. Programming such platforms and distributing work and data between CPU and GPU typically requires the adoption of vendor-specific programming models such as HIP or CUDA, which in turn may limit portability. SYCL is a high-level, single-source language based on C++17, developed by the Khronos Group to overcome the shortcomings of those vendor-specific HPC programming models and to improve code portability. Today, several SYCL compilers and backends are available, such as Open SYCL, Intel's DPC++, and ComputeCpp. Nevertheless, there is still a need to investigate SYCL's performance portability across different architectures and whether it can compete with other (potentially vendor-specific) programming models.
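A minimal single-source SYCL example (a sketch assuming a SYCL 2020 compiler; the header path and device selection may differ between implementations) illustrates the programming model with a simple vector addition:

// Minimal SYCL sketch: single-source vector addition.
#include <sycl/sycl.hpp>   // some implementations still use <CL/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
  constexpr size_t N = 1024;
  std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

  sycl::queue q;  // the runtime picks a default device (GPU, CPU, ...)
  {
    sycl::buffer<float> a_buf(a.data(), sycl::range<1>(N));
    sycl::buffer<float> b_buf(b.data(), sycl::range<1>(N));
    sycl::buffer<float> c_buf(c.data(), sycl::range<1>(N));

    q.submit([&](sycl::handler &h) {
      sycl::accessor A(a_buf, h, sycl::read_only);
      sycl::accessor B(b_buf, h, sycl::read_only);
      sycl::accessor C(c_buf, h, sycl::write_only);
      h.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) { C[i] = A[i] + B[i]; });
    });
  }  // buffer destructors copy the results back to the host vectors

  std::cout << "c[0] = " << c[0] << "\n";  // expected: 3
  return 0;
}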
In this seminar thesis, the student should provide a short overview of SYCL and its advantages and shortcomings compared to other heterogeneous programming models. Further, the student should delve into the different SYCL compiler implementations and point out their differences, followed by a performance evaluation and a comparison to other programming models.
Kind of topic: overview/dive-in
Supervisor: Jannis Klinkenberg
I/O Sharing and Competition in Burst Buffers
This topic deals with the difficulties of using shared resources in a High Performance Computing context. The performance of file systems usually degrades when several different applications use them concurrently. A newly proposed algorithm implements dynamic sharing in order to minimize the waste of I/O resources.
The seminar thesis requires background research about I/O in HPC, specifically about burst buffers. In addition, the sharing algorithm needs to be understood and explained, which will require additional research into how sharing and contention algorithms work. The proposed algorithm should be compared with other techniques, and its usefulness should be evaluated. If desired, the CLAIX supercomputer may be used to conduct original experiments.
Kind of topic: dive-in
Supervisor: Philipp Martin
File System Traversal for Large Parallel File Systems
This topic looks at the problem of dealing with the high number of files in parallel file systems. Modern HPC systems may hold billions of files, and due to the architecture of parallel file systems, metadata performance may be lacking when trying to aggregate information about all or a significant part of these files. Using the widely deployed Lustre file system as a case study, a number of optimizations are evaluated that improve the performance of directory traversal operations by up to several orders of magnitude.
In the thesis, it is crucial to explain the background and architecture of parallel file systems and Lustre in particular. Some of the mentioned optimizations will have to be explained in depth and require additional research to fully understand. The thesis should compare and contrast the optimizations and evaluate the experimental findings.
Kind of topic: dive-in
Supervisor: Philipp Martin
Determining Parallelization Potential in Parallel Programs
In a world where applications grow larger and more complex by the day and computer systems are dominated by multi-core architectures, it becomes crucial to parallelize programs effectively to obtain performant executions. However, doing so can be quite a challenge when millions of lines of code and hundreds of code regions have to be considered. A lot of work has been done to (partially) automate the workflow of determining code regions that are not only parallelizable but also promise adequate performance improvements.
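As a small illustration of the core decision such approaches have to make (shown here with OpenMP purely for illustration), consider two loops: the first has independent iterations and can be parallelized, while the second carries a dependence across iterations:

// Illustration of the core question behind parallelization-potential analysis.
#include <vector>

void example(std::vector<double> &a, const std::vector<double> &b) {
  // Parallelizable: every iteration writes a distinct element and only reads b.
  #pragma omp parallel for
  for (size_t i = 0; i < a.size(); ++i)
    a[i] = 2.0 * b[i];

  // Not (trivially) parallelizable: iteration i reads the value written in
  // iteration i-1, i.e., there is a loop-carried dependence.
  for (size_t i = 1; i < a.size(); ++i)
    a[i] = a[i] + a[i - 1];
}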
For this paper, the student should perform an extensive literature review to discover different strategies for determining the parallelization potential of code regions in modern programs. In the paper, the different approaches should be presented and compared to each other by evaluating their strengths and weaknesses.
Kind of topic: overview
Supervisor: Isa Thärigen
Analyzing Trace Data from Multiple Sources to Detect Performance Issues in Parallel Programs
Tracing is a technique to obtain chronological information about the execution of a program. Analyzing the generated traces can provide crucial information about performance issues and helps to increase the efficiency of the program. As applications become more complex, an analysis can often no longer consider only a single trace, but must instead include information from multiple traces or similar performance logs.
For this paper, the student should conduct an extensive literature review to gain an overview of existing tools that have been proposed to analyze trace data from multiple sources. The paper should give an introduction to the different approaches, evaluate their strengths and weaknesses, and compare them with each other.
Kind of topic: overview
Supervisor: Isa Thärigen
Calculating Carbon Footprints in Today’s HPC Systems
To limit climate change, net-zero emissions are being targeted by many countries. A significant portion of today's emissions originates from data centers and HPC systems due to their reliance on electricity for operation. A first step towards reducing their carbon emissions is an accurate and traceable calculation methodology. This calculation should include not only the operational carbon footprint from continued operation but also the embodied carbon footprint attributed to one-off actions like the production of all components.
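In simplified form, and with symbols chosen here purely for illustration, such a methodology typically combines both parts along the lines of

C_{\mathrm{total}} \;=\; \underbrace{E \cdot \mathrm{CI}}_{\text{operational}} \;+\; \underbrace{C_{\mathrm{embodied}} \cdot \frac{t}{T_{\mathrm{lifetime}}}}_{\text{embodied, amortized}}

where E is the energy consumed in the considered period, CI is the carbon intensity of the consumed electricity, and the embodied emissions of the hardware are amortized over its expected lifetime.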
This seminar thesis should compare different carbon accounting methodologies from the related literature, assessing the following questions: Which system components are accounted for? How is each component accounted for? What data are the calculations based on? What is the scope of the calculations? Can the presented methods be transferred easily to other systems?
Kind of topic: overview
Supervisor: Christian Wassermann
Assessing the Performance Portability of the SPEChpc2021 Benchmark Suite
In the HPC community, standardized benchmarks are frequently used to evaluate new methods or to validate hardware performance. The Standard Performance Evaluation Corporation (SPEC) is a non-profit organization dedicated to providing portable benchmark suites for such purposes. However, are these benchmarks also performance portable, i.e., do they exhibit similar computational behavior on various HPC clusters?
To assess this question, the student should compare reported results and published analyses of the recently introduced SPEChpc2021 suite from different HPC systems regarding the observed performance characteristics and scaling behaviors. Hands-on benchmarking experiments on the newly available CLAIX-2023 cluster of RWTH Aachen University are welcome, although previous measurements are also available as a fallback.
Kind of topic: dive-in
Supervisor: Christian Wassermann
Supervisors & Organization
Semih Burak
Marc-André Hermanns
Jannis Klinkenberg
Philipp Martin
Isa Thärigen
Christian Wassermann