Seminar Current Topics in High-Performance Computing

Content

High-performance computing is applied to speed up long-running scientific applications, for instance the simulation of computational fluid dynamics (CFD). Today's supercomputers are often based on commodity processors, but come in different facets: from clusters over (large) shared-memory systems to accelerators (e.g., GPUs). To leverage these systems, parallel programming with, e.g., MPI, OpenMP or CUDA must be applied.

This seminar focuses on current research topics in the area of HPC and is based on conference and journal papers. Topics might cover, e.g., novel parallel computer architectures and technologies, parallel programming models, current methods for performance analysis & correctness checking of parallel programs, performance modeling or energy efficiency of HPC systems. The seminar consists of a written study and the presentation of a specific topic.

The objectives of this seminar are the independent elaboration of an advanced topic in the area of high-performance computing and the classification of the topic in the overall context. This includes the appropriate preparation of concepts, approaches and results of the given topic (also with respect to formalities and time schedule), as well as a clear presentation of the contents. Furthermore, students are expected to demonstrate independent work by looking beyond the immediate scope of their own topic.

Schedule

This seminar belongs to the area of applied computer science. The topics are assigned during the introductory event. Then, the students work out the topics over the course of the semester. The corresponding presentations take place as a block course on one day (or two days) at the end of the lecture period or in the exam period. Attendance is compulsory for the introductory event and the presentation block.

More information is available in RWTHmoodle.

Registration / Application

Seats for this seminar are distributed by the global registration process of the computer science department only. We appreciate it if you state your interest in HPC, as well as your prior knowledge in HPC (e.g., relevant lectures, software labs, and seminars that you have passed), in the corresponding section of the registration process.

Requirements

The goals of a seminar series are described in the corresponding Bachelor and Master modules. In addition to the seminar thesis and its presentation, Master students will have to lead one set of presentations (roughly 3 presentations) as session chair. A session chair makes sure that the session runs smoothly. This includes introducing the title of each presentation and its authors, keeping track of the speaker's time, and leading a short discussion after the presentation. Further instructions will be given during the seminar.

Prerequisites

The attendance of the lecture "Introduction to High-Performance Computing" (Prof. Müller) is helpful, but not required.

Language

We prefer and encourage students to write the report and give the presentation in English. However, German is also possible.

Topics

Virtual Topologies in MPI

Virtual Topologies have been a part of the Message Passing Interface (MPI) since version 2.0 and are to be extended with the upcoming version 4.0. Their benefit is meant to be two-fold. First, they provide an easy-to-use interface to facilitate neighbor communication in a fixed process topology, and second, they should enable internal optimizations in the MPI library to optimize rank placement within the communicator.

This seminar thesis will address some of the following questions: Which MPI implementations (available or made available on CLAIX) support rank reordering based on virtual topology information? Can a performance benefit be observed? Which techniques can be employed by the user and/or the MPI library to optimize rank placement for communication?
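As an illustration only (not part of the thesis requirements), the following minimal C sketch shows how a 2D Cartesian virtual topology can be set up with reordering enabled; the grid layout and periodicity are illustrative assumptions.

    /* Minimal sketch: create a 2D periodic Cartesian topology and allow
     * the MPI library to reorder ranks to match the hardware topology. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int size, rank, dims[2] = {0, 0}, periods[2] = {1, 1}, coords[2];
        MPI_Comm cart;

        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Dims_create(size, 2, dims);   /* factor size into a 2D grid */

        /* reorder = 1: the library may assign new ranks for better placement */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);

        MPI_Comm_rank(cart, &rank);
        MPI_Cart_coords(cart, rank, 2, coords);
        printf("rank %d at (%d, %d)\n", rank, coords[0], coords[1]);

        /* neighbor collectives such as MPI_Neighbor_alltoall can then be
         * used for halo exchanges along the topology */
        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }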

Supervisor
Marc-André Hermanns

Understanding Progress in MPI

MPI does not provide a strong progress guarantee for communication. As a result, MPI implementations have a variety of options to implement progress. However, the different options may come with different performance trade-offs, and it may be hard for users to choose the right form of progress for their application.

This seminar thesis will address some of the following questions: What different progress options are supported by current MPI implementations (available or made available on CLAIX)? What are the performance implications of the different progress approaches? Can any recommendations be given to the user upfront?
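To illustrate why progress matters (a minimal sketch, not a definitive recipe), the following C fragment overlaps a non-blocking send with computation; whether the message actually moves during the computation depends on the library's progress mechanism, and calling MPI_Test periodically is one user-level way to drive progress. compute_chunk() is a hypothetical placeholder for application work.

    #include <mpi.h>

    void compute_chunk(int i);   /* hypothetical application work */

    void overlap_example(double *buf, int n, int peer, MPI_Comm comm) {
        MPI_Request req;
        int done = 0;

        MPI_Isend(buf, n, MPI_DOUBLE, peer, 0, comm, &req);

        for (int i = 0; i < 100; ++i) {
            compute_chunk(i);
            if (!done)
                MPI_Test(&req, &done, MPI_STATUS_IGNORE); /* poke the progress engine */
        }
        if (!done)
            MPI_Wait(&req, MPI_STATUS_IGNORE);
    }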

Supervisor
Marc-André Hermanns

I/O Benchmarking Interoperability in Cloud Service and HPC Clusters

Cloud computing is becoming more prominent in scientific computing, a field traditionally occupied by HPC clusters. Although the computation and data workloads in the cloud should be similar to those on HPC clusters, the methods for assessing cloud performance can be completely different.

In this seminar, we want to focus on the I/O part of the assessment and discuss where this difference in approaches comes from. Another expected discussion concerns the possibility of using cloud I/O benchmarks for HPC clusters and vice versa.

Supervisor
Radita Liem

PIOM-PX: Usage, Analysis, and Use Cases

Understanding I/O behavior is fundamental to evaluating the I/O performance of HPC applications, and I/O performance measurement tools play an important role in helping application developers and analysts understand and improve their applications. New tools are constantly being developed for this kind of I/O analysis that complement or improve upon existing tools.

PIOM-PX is an I/O performance evaluation tool that obtains the main parameters at the POSIX-I/O level to define an I/O behavior model. It can be used to evaluate the impact of the I/O phases on the I/O system and to replicate an application's I/O behavior on different HPC systems.

In this seminar topic, we will take a closer look at PIOM-PX and discuss its usage, how it compares to other existing I/O performance measurement tools, and which use cases call for PIOM-PX.

Supervisor
Radita Liem

Characterizing Parallel I/O Libraries Performance in CLAIX

Parallel I/O libraries (MPI-IO, HDF5, and PNetCDF) can be utilized to improve the performance of I/O-heavy HPC applications. However, the performance improvement is not uniform across libraries and depends on many factors such as the file system and the library configuration. A comprehensive understanding of these libraries, based on benchmark data from a specific system, can benefit many stakeholders, from application developers tuning their applications to system administrators monitoring the health of the service.

This seminar thesis will run several synthetic benchmarks for the parallel I/O libraries on the CLAIX HPC cluster and analyze the benchmark results.
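As a point of reference (a minimal sketch, not the benchmark setup used in the thesis), the following C fragment shows a collective MPI-IO write; the file name, block size, and offsets are illustrative assumptions, and HDF5/PNetCDF offer higher-level interfaces on top of similar collective I/O machinery.

    #include <mpi.h>

    void write_block(const double *data, int n, MPI_Comm comm) {
        int rank;
        MPI_File fh;
        MPI_Offset offset;

        MPI_Comm_rank(comm, &rank);
        MPI_File_open(comm, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* each rank writes its contiguous block at a rank-dependent offset */
        offset = (MPI_Offset)rank * n * sizeof(double);
        MPI_File_write_at_all(fh, offset, data, n, MPI_DOUBLE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
    }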

Supervisor
Radita Liem

Static data race / data mapping issue detection of OpenMP applications

In correctness analysis we distinguish static (e.g., compile-time), dynamic (execution-time/on-the-fly) and post-mortem (after the execution) analysis. In this case we look at static analysis of OpenMP applications.

The seminar thesis will answer some of the following questions: Which static analysis techniques are used to detect data races and/or data mapping issues in OpenMP applications? How do the techniques differ in the key detection metrics? How do the static analysis results differ from dynamic analysis?
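For illustration, the following small OpenMP kernel contains the kind of defect such tools should report: all threads update sum without synchronization, i.e., a data race (adding reduction(+:sum) to the pragma would fix it). This is an assumed toy example, not taken from a specific benchmark.

    #include <stdio.h>

    int main(void) {
        double a[1000], sum = 0.0;
        for (int i = 0; i < 1000; ++i)
            a[i] = i;

        #pragma omp parallel for
        for (int i = 0; i < 1000; ++i)
            sum += a[i];     /* unsynchronized concurrent update: data race */

        printf("%f\n", sum);
        return 0;
    }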

Supervisor
Joachim Protze

Evaluation of the expressiveness of DataRaceBench

DataRaceBench is a benchmark to evaluate the detection rate of data race detection tools. The benchmark was extended in several iterations and now consists of more than a hundred OpenMP kernel applications.

The seminar thesis will answer some of the following questions: What is DRB? How does it work / how is it supposed to work? How are the metrics calculated? What is the coverage of DRB? Does the automatic evaluation executed by DRB withstand manual evaluation? If not, how could the automatic evaluation be refined?

Supervisor
Joachim Protze

Floating point arithmetic: ordering matters

Due to rounding errors and cancellation, the ordering of arithmetic operations can impact the result. What is (1.0 + 1e-20 - 1.0)? Since multiplication and addition are commutative, sorting the summands/factors can reduce the impact of rounding errors and cancellation. Various approaches and frameworks can be found in the literature.
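A small worked example (assuming IEEE-754 double precision): the same three summands can yield different results depending on evaluation order, because 1e-20 is absorbed when added to 1.0.

    #include <stdio.h>

    int main(void) {
        double a = (1.0 + 1e-20) - 1.0;  /* 1e-20 is lost in the addition -> 0.0   */
        double b = (1.0 - 1.0) + 1e-20;  /* reordered: cancellation first -> 1e-20 */
        printf("%.3e  %.3e\n", a, b);    /* prints 0.000e+00  1.000e-20 */
        return 0;
    }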

The seminar thesis will compare different approaches for reducing the impact of floating-point errors. What are the strengths/weaknesses of these approaches? Can the reordering make things worse? What is the possible performance penalty?

Supervisor
Joachim Protze

Power Management of Heterogeneous Systems

Nowadays, large-scale clusters are often built with heterogeneous hardware, e.g. the flagship cluster “Summit” in 2019. Several accelerators and processors are integrated into a compute node to achieve higher performance and better power efficiency. However, such large-scale clusters still face considerable power and energy challenges because of limited power supply or high energy costs.

In this seminar thesis, the power consumption of different node components should be analyzed. Advanced strategies for managing the power draw of such heterogeneous systems should also be examined.

Supervisor
Bo Wang

DRAM Latency or Bandwidth: Where is The Performance Bottleneck?

As a standard component of a compute node, DRAM is frequently used to store temporary results. Compared to processors, DRAM's performance is low. Therefore, DRAM often determines the execution performance when temporary results are accessed intensively.

The goal of this topic is to understand how DRAM determines a job's performance with respect to data access latency and memory bandwidth. The candidate should explore methods to distinguish the two kinds of contention. Optimally, he or she can construct a model that captures both aspects.
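As an illustration of the two cases (a sketch with assumed, illustrative kernels; the working set must exceed the last-level cache to actually hit DRAM), dependent loads expose access latency, while independent streaming accesses expose bandwidth:

    #include <stddef.h>

    /* latency-bound: each load depends on the previous one (pointer chasing) */
    size_t chase(const size_t *next, size_t start, size_t steps) {
        size_t idx = start;
        for (size_t i = 0; i < steps; ++i)
            idx = next[idx];
        return idx;
    }

    /* bandwidth-bound: independent, streaming accesses (STREAM-like triad) */
    void triad(double *a, const double *b, const double *c, double s, size_t n) {
        for (size_t i = 0; i < n; ++i)
            a[i] = b[i] + s * c[i];
    }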

Supervisor
Bo Wang

A Deep Dive into GPGPU: Performance and Power Management

GPGPUs and similar accelerators have almost become standard components of large-scale clusters. Besides their remarkably high computing power, they also account for a large share of the clusters' electric power consumption: compared to a commodity CPU with around 150 W, a GPU has a power draw of up to 400 W.

This topic should evaluate GPUs regarding total cost of ownership and return on investment. In a further step, the candidate needs to explore the potential and possibilities for managing the power and performance of GPUs.

Supervisor
Bo Wang

Supercomputers for COVID-19 Research

The COVID-19 pandemic impacts our daily life. The urgent search for drugs that can inhibit SARS-CoV-2 has been supported by (big) HPC systems. For instance, these supercomputers accelerate the search within the extensive chemical space of potential drugs.

In this seminar thesis, you should examine the usage of supercomputers for COVID-19 research. For example, the COVID-19 HPC Consortium has been established whose members provide compute time and resources. These include top supercomputers in the Top500 list.

Supervisor
Sandra Wienke