Topics
Is CIVL-C a Suitable IR for Static Correctness Verification?
Verifying the correctness of parallel programs is important, as errors are often not trivially visible to the developer and may lead to program crashes without any information on the root problem. Correctness verification approaches can be dynamic or static. While dynamic approaches make use of actual runtime information, static approaches are independent of specific runs but also have less information to work with. Symbolic execution and model checking are two subcategories of static approaches. One of the more advanced tools that includes its own IR is CIVL (Concurrency Intermediate Verification Language). The idea behind its IR, CIVL-C, is to first abstract multiple parallel programming models into one common representation before carrying out the verification through model checking.
This seminar thesis should analyze CIVL-C for its expressiveness and its suitability for other static (verification) approaches, so that its abstraction can be reused across tools. By conducting a systematic literature review, CIVL-C should be compared to other related IRs that abstract parallel programming models and concurrency in general. Of particular interest are the benefits and disadvantages of reimplementing CIVL-C in MLIR, a multi-level IR infrastructure built on top of LLVM.
Kind of topic: dive-in
Supervisor: Semih Burak
Static Analysis, Verification, and Optimization on MLIR
Recently, MLIR, a multi-level IR infrastructure, has opened up many research directions. It eases the definition of custom IRs (dialects) that can work side by side with existing ones. Being able to switch from high-level to low-level abstractions and back, and even to mix levels of abstraction, is an inherent feature of MLIR. Many compiler tasks that previously had to be reimplemented for each new IR can now be avoided by reusing MLIR's rich infrastructure, which builds upon that of LLVM. Static analysis, e.g., for performance optimization or correctness verification, only makes use of information available at compile time, in contrast to dynamic approaches, which also use run-time information. Such static analyses are often implemented as compiler passes and can be realized with comparatively little effort within the MLIR infrastructure.
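As a rough sketch of what such a pass can look like (assuming the MLIR C++ pass API; the pass name and the analysis itself are illustrative, and details may differ between MLIR versions), the following pass counts the operations per dialect in a module:

// Minimal sketch of a static analysis implemented as an MLIR pass.
// The pass name and the analysis are illustrative; only the MLIR/LLVM APIs are real.
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/Pass.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/Support/raw_ostream.h"

namespace {
struct OpCountPass
    : mlir::PassWrapper<OpCountPass, mlir::OperationPass<mlir::ModuleOp>> {
  MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(OpCountPass)

  llvm::StringRef getArgument() const final { return "op-count"; }

  void runOnOperation() override {
    // Walk the whole module and count operations per dialect,
    // using only compile-time information.
    llvm::StringMap<unsigned> counts;
    getOperation().walk([&](mlir::Operation *op) {
      counts[op->getDialect()->getNamespace()]++;
    });
    for (auto &entry : counts)
      llvm::errs() << entry.getKey() << ": " << entry.getValue() << "\n";
  }
};
} // namespace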
This seminar thesis should conduct a systematic literature review of approaches that perform static analysis for correctness verification or performance optimization, either on existing MLIR dialects or on newly introduced ones. In particular, both the introduced MLIR dialects and the related approaches should be presented, and their strengths and shortcomings compared to non-MLIR variants should be pointed out.
Kind of topic: overview
Supervisor: Semih Burak
Modern C++ Language Binding for MPI
As the official C++ bindings for the Message Passing Interface (MPI) have been deprecated and removed from the MPI standard, C++ application developers have to fall back on the C bindings to enable MPI communication in their applications. Since modern C++ capabilities and practices in particular have come a long way from their common root with C, application developers have strived to create abstraction libraries that enable the use of MPI functionality within a modern C++ context.
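As a sketch of the kind of abstraction such libraries provide (the wrapper classes below are hypothetical and not taken from any existing library; only the MPI_* calls are part of the MPI standard), consider a minimal RAII-style wrapper around the C bindings:

// Sketch: wrapping the MPI C bindings in a small modern-C++ interface.
// Environment and Comm are hypothetical wrapper types, not from a real library.
#include <mpi.h>
#include <vector>

class Environment {  // RAII: MPI_Init in the constructor, MPI_Finalize in the destructor
public:
  Environment(int &argc, char **&argv) { MPI_Init(&argc, &argv); }
  ~Environment() { MPI_Finalize(); }
  Environment(const Environment &) = delete;
  Environment &operator=(const Environment &) = delete;
};

class Comm {
public:
  explicit Comm(MPI_Comm c = MPI_COMM_WORLD) : comm_(c) {}
  int rank() const { int r; MPI_Comm_rank(comm_, &r); return r; }
  int size() const { int s; MPI_Comm_size(comm_, &s); return s; }

  // Type-safe send/recv for contiguous double buffers; a real library would
  // map arbitrary C++ types to MPI datatypes.
  void send(const std::vector<double> &buf, int dest, int tag = 0) const {
    MPI_Send(buf.data(), static_cast<int>(buf.size()), MPI_DOUBLE, dest, tag, comm_);
  }
  void recv(std::vector<double> &buf, int source, int tag = 0) const {
    MPI_Recv(buf.data(), static_cast<int>(buf.size()), MPI_DOUBLE, source, tag,
             comm_, MPI_STATUS_IGNORE);
  }

private:
  MPI_Comm comm_;
};

int main(int argc, char **argv) {
  Environment env(argc, argv);
  Comm world;
  std::vector<double> data(4, world.rank());
  if (world.rank() == 0 && world.size() > 1) world.send(data, 1);
  else if (world.rank() == 1)                world.recv(data, 0);
  return 0;
}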
This seminar thesis should look into different approaches of those libraries, their similarities and differences, how they enable modern C++ application development, and how far their support for the full range of MPI functionality has grown so far.
Kind of topic: overview
Supervisor: Marc-André Hermanns
Agile Performance Analysis Interfaces
Performance analysis tools have a long-standing tradition in HPC, with its inherent focus on application performance. Classic tools often provide their data in fixed reports and visual representations pre-configured by the tool developers. However, identifying an underlying performance bug sometimes requires combining performance data in ways not initially anticipated by the tool developer. Agile interfaces to performance data allow users to combine and relate measurement data across performance metrics. Specifying such combinations ad hoc may aid in situations where the source of a performance problem is not initially obvious.
This work should explore the benefits of agile interfaces in HPC performance analysis tools, identify their limits, and assess how such tools target users across the spectrum from novices to experts.
Kind of topic: overview
Supervisor: Marc-André Hermanns
Hyper-Parameter Optimization for Large Scale Deep Learning Workloads
As machine and deep learning models continue to advance, the role of hyper-parameter optimization becomes increasingly critical. Hyper-parameters can be used to configure and adjust different aspects including, but not limited to, the number or dimensions of hidden model layers, the optimization algorithms and the learning rates in those optimization steps, the choice of cost functions, or the number of epochs and batch sizes used for the training procedure. While hyper-parameters significantly impact the performance of these models, influencing their accuracy, convergence speed, and generalization ability, finding the best parameters for a particular situation and dataset requires solving a high-dimensional optimization problem and demands extensive compute capabilities to train a model for each parameter combination, potentially not making the most efficient use of the available HPC resources.
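As a minimal illustration of the underlying search problem, and not of any specific HPO framework, the following sketch performs a naive random search over a small hyper-parameter space; train_and_evaluate is a placeholder for a full, expensive training run:

// Sketch: naive random search over a hyper-parameter space.
// train_and_evaluate() stands in for an expensive training job on an HPC system;
// real HPO tools replace random sampling with e.g. Bayesian or genetic strategies.
#include <iostream>
#include <random>

struct HyperParams {
  double learning_rate;
  int    batch_size;
  int    hidden_layers;
};

// Placeholder objective: in practice this trains and validates a model.
double train_and_evaluate(const HyperParams &hp) {
  return -(hp.learning_rate - 0.01) * (hp.learning_rate - 0.01)
         - 0.001 * hp.hidden_layers;  // toy surrogate for a validation score
}

int main() {
  std::mt19937 rng(42);
  std::uniform_real_distribution<double> lr(1e-4, 1e-1);
  std::uniform_int_distribution<int> batch(32, 512), layers(1, 8);

  HyperParams best{};
  double best_score = -1e300;
  for (int trial = 0; trial < 100; ++trial) {  // each trial = one full training run
    HyperParams hp{lr(rng), batch(rng), layers(rng)};
    double score = train_and_evaluate(hp);
    if (score > best_score) { best_score = score; best = hp; }
  }
  std::cout << "best lr=" << best.learning_rate
            << " batch=" << best.batch_size
            << " layers=" << best.hidden_layers << "\n";
  return 0;
}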
In this seminar, the student should delve into the realm of hyper-parameter optimization, exploring techniques that enhance model accuracy, such as Bayesian optimizers as well as bio-inspired or genetic algorithms, while making efficient use of the underlying compute resources. The student should characterize these approaches and provide an overview of their strengths and weaknesses as well as their potential and usability for AI users on HPC infrastructures.
Kind of topic: overview
Supervisor: Jannis Klinkenberg
Investigating the Performance and Portability of SYCL Compiler Implementations on HPC Systems
Reviewing the TOP500 list of the largest supercomputers in the world reveals that more and more installations feature heterogeneous compute nodes that comprise not only multi-core CPUs but also several accelerators such as GPGPUs. Programming such platforms and distributing work and data between CPU and GPU typically requires the adoption of vendor-specific programming models such as HIP or CUDA, which in turn may limit portability. SYCL is a high-level, single-source language based on C++17, developed by the Khronos Group to overcome the shortcomings of those vendor-specific HPC programming models and to improve code portability. Today, several SYCL compilers and backends are available, such as Open SYCL, Intel's DPC++, and ComputeCpp. Nevertheless, there is still a need to investigate SYCL's performance portability across different architectures and whether it can compete with other (potentially vendor-specific) programming models.
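A minimal single-source SYCL example (a sketch assuming a SYCL 2020 compiler; the header path and device selection may differ between implementations) illustrates the programming model with a simple vector addition:

// Minimal SYCL sketch: single-source vector addition.
#include <sycl/sycl.hpp>   // some implementations still use <CL/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
  constexpr size_t N = 1024;
  std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

  sycl::queue q;  // the runtime picks a default device (GPU, CPU, ...)
  {
    sycl::buffer<float> a_buf(a.data(), sycl::range<1>(N));
    sycl::buffer<float> b_buf(b.data(), sycl::range<1>(N));
    sycl::buffer<float> c_buf(c.data(), sycl::range<1>(N));

    q.submit([&](sycl::handler &h) {
      sycl::accessor A(a_buf, h, sycl::read_only);
      sycl::accessor B(b_buf, h, sycl::read_only);
      sycl::accessor C(c_buf, h, sycl::write_only);
      h.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) { C[i] = A[i] + B[i]; });
    });
  }  // buffer destructors copy the results back to the host vectors

  std::cout << "c[0] = " << c[0] << "\n";  // expected: 3
  return 0;
}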
In this seminar thesis, the student should provide a short overview of SYCL and its advantages and shortcomings compared to other heterogeneous programming models. Further, the student should delve into the different SYCL compiler implementations and point out their differences, followed by a performance evaluation and a comparison to other programming models.
Kind of topic: overview/dive-in
Supervisor: Jannis Klinkenberg
I/O Sharing and Competition in Burst Buffers
This topic deals with the difficulties of using shared resources in a High Performance Computing context. The performance of file systems usually degrades when several different applications use them concurrently. A newly proposed algorithm implements dynamic sharing in order to minimize the waste of I/O resources.
The seminar thesis requires background research about I/O in HPC, specifically about burst buffers. In addition, the sharing algorithm needs to be understood and explained, which will require additional research into how sharing and contention algorithms work. The proposed algorithm should be compared with other techniques, and its usefulness should be evaluated. If desired, the CLAIX supercomputer may be used to conduct original experiments.
Kind of topic: dive-in
Supervisor: Philipp Martin
File System Traversal for Large Parallel File Systems
This topic looks at the problem of dealing with the high number of files in parallel file systems. Modern HPC systems may hold billions of files, and due to the architecture of parallel file systems, metadata performance may be lacking when trying to aggregate information about all or a significant part of these files. Using the widely deployed Lustre file system as a case study, a number of optimizations are evaluated that improve the performance of directory traversal operations by up to several orders of magnitude.
In the thesis, it is crucial to explain the background and architecture of parallel file systems and Lustre in particular. Some of the mentioned optimizations will have to be explained in depth and require additional research to fully understand. The thesis should compare and contrast the optimizations and evaluate the experimental findings.
Kind of topic: dive-in
Supervisor: Philipp Martin
Determining Parallelization Potential in Parallel Programs
In a world where applications grow larger and more complex by the day and computer systems are dominated by multi-core architectures, it becomes crucial to parallelize programs effectively to obtain performant executions. However, doing so can be quite a challenge when millions of lines of code and hundreds of code regions have to be considered. A lot of work has been done to (partially) automate the workflow of determining code regions that are not only parallelizable but also promise adequate performance improvements.
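As a small illustration of the core decision such approaches have to make (shown here with OpenMP purely for illustration), consider two loops: the first has independent iterations and can be parallelized, while the second carries a dependence across iterations:

// Illustration of the core question behind parallelization-potential analysis.
#include <vector>

void example(std::vector<double> &a, const std::vector<double> &b) {
  // Parallelizable: every iteration writes a distinct element and only reads b.
  #pragma omp parallel for
  for (size_t i = 0; i < a.size(); ++i)
    a[i] = 2.0 * b[i];

  // Not (trivially) parallelizable: iteration i reads the value written in
  // iteration i-1, i.e., there is a loop-carried dependence.
  for (size_t i = 1; i < a.size(); ++i)
    a[i] = a[i] + a[i - 1];
}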
For this paper, the student should perform an extensive literature review to discover different strategies for determining the parallelization potential of code regions in modern programs. In the paper, the different approaches should be presented and compared to each other by evaluating their strengths and weaknesses.
Kind of topic: overview
Supervisor: Isa Thärigen
Analyzing Trace Data from Multiple Sources to Detect Performance Issues in Parallel Programs
Tracing is a technique to obtain chronological information about the execution of a program. Analyzing the generated traces can provide crucial information about performance issues and helps to increase the efficiency of the program. As applications become more complex, an analysis can often no longer consider only a single trace, but must instead include information from multiple traces or similar performance logs.
For this paper, the student should conduct an extensive literature review to gain an overview of existing tools that have been proposed to analyze trace data from multiple sources. The paper should give an introduction to the different approaches, evaluate their strengths and weaknesses, and compare them with each other.
Kind of topic: overview
Supervisor: Isa Thärigen
Calculating Carbon Footprints in Today’s HPC Systems
To limit climate change, net-zero emissions are being targeted by many countries. A significant portion of today's emissions originates from data centers and HPC systems due to their reliance on electricity for operation. A first step towards reducing their carbon emissions is an accurate and traceable calculation methodology. This calculation should include not only the operational carbon footprint from continued operation but also the embodied carbon footprint attributed to one-off actions like the production of all components.
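In simplified form, and with symbols chosen here purely for illustration, such a methodology typically combines both parts along the lines of

C_{\mathrm{total}} \;=\; \underbrace{E \cdot \mathrm{CI}}_{\text{operational}} \;+\; \underbrace{C_{\mathrm{embodied}} \cdot \frac{t}{T_{\mathrm{lifetime}}}}_{\text{embodied, amortized}}

where E is the energy consumed in the considered period, CI is the carbon intensity of the consumed electricity, and the embodied emissions of the hardware are amortized over its expected lifetime.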
This seminar thesis should compare different carbon accounting methodologies from the related literature, assessing the following questions: Which system components are accounted for? How is each component accounted for? What data are the calculations based on? What is the scope of the calculations? Can the presented methods be transferred easily to other systems?
Kind of topic: overview
Supervisor: Christian Wassermann
Assessing the Performance Portability of the SPEChpc2021 Benchmark Suite
In the HPC community, standardized benchmarks are frequently used to evaluate new methods or to validate hardware performance. The Standard Performance Evaluation Corporation (SPEC) is a non-profit organization dedicated to providing portable benchmark suites for such purposes. However, are these benchmarks also performance portable, i.e., do they exhibit similar computational behavior on various HPC clusters?
To assess this question, the student should compare reported results and published analyses of the recently introduced SPEChpc2021 suite from different HPC systems regarding the observed performance characteristics and scaling behaviors. Hands-on benchmarking experiments on the newly available CLAIX-2023 cluster of RWTH Aachen University are welcome, although previous measurements are also available as a fallback.
Kind of topic: dive-in
Supervisor: Christian Wassermann
Supervisors & Organization
Semih Burak
Marc-André Hermanns
Jannis Klinkenberg
Philipp Martin
Isa Thärigen
Christian Wassermann