Seminar Current Topics in High-Performance Computing
High-performance computing is applied to speed up long-running scientific applications, for instance simulations in computational fluid dynamics (CFD). Today's supercomputers are often based on commodity processors, but come in different forms, ranging from clusters and (large) shared-memory systems to accelerators (e.g., GPUs). To leverage these systems, parallel programming with, e.g., MPI, OpenMP, or CUDA must be applied.
This seminar focuses on current research topics in the area of HPC and is based on conference and journal papers. Topics might cover, e.g., novel parallel computer architectures and technologies, parallel programming models, current methods for performance analysis & correctness checking of parallel programs, performance modeling or energy efficiency of HPC systems. The seminar consists of a written study and the presentation of a specific topic.
The objectives of this seminar are the independent elaboration of an advanced topic in the area of high-performance computing and the classification of the topic in the overall context. This includes the appropriate preparation of the concepts, approaches, and results of the given topic (also with respect to formalities and time schedule), as well as a clear presentation of the contents. Furthermore, students are expected to demonstrate independent work by looking beyond the immediate scope of their assigned topic.
This seminar belongs to the area of applied computer science. The topics are assigned during the introductory event. The students then work on their topics over the course of the semester. The corresponding presentations take place as a block course on one day (or two days) at the end of the lecture period or in the exam period. Attendance is compulsory for the introductory event and the presentation block.
Furthermore, we will introduce the students to "Scientific Writing" and "Scientific Presenting" in computer science. Attendance at these two events is also compulsory.
The introductory event (kickoff) is scheduled for October 8th, 2021, from 9:30 am to 11:30 am.
More information is available in RWTHmoodle.
Seats for this seminar are distributed only through the global registration process of the computer science department. We would appreciate it if you stated your interest in HPC and your prior knowledge of HPC (e.g., relevant lectures, software labs, and seminars that you have passed) in the corresponding section during the registration process.
The goals of a seminar series are described in the corresponding Bachelor and Master modules. In addition to the seminar thesis and its presentation, Master students will have to lead one set of presentations (roughly 3 presentations) as session chair. A session chair makes sure that the session runs smoothly. This includes introducing the title of the presentation and its authors, keeping track of the speaking time, and leading a short discussion after the presentation. Further instructions will be given during the seminar.
Attending the lecture "Introduction to High-Performance computing" (Prof. Müller) is helpful but not required.
We prefer and encourage students to write the report and give the presentation in English, but German is also possible.
Emerging Memory Technologies in High Performance Computing
Over the last decades, architectural enhancements in high performance computing have resulted in a steady increase in the computational power of such machines by increasing core frequency, improving the manufacturing process, and introducing multi- and many-core CPUs. In contrast, enhancements on the memory side, where most systems use classical DRAM, have not kept pace with that development, resulting in a gap between computational power on the one hand and memory latency and memory bandwidth on the other. Consequently, several memory technologies have emerged that aim to close that gap by addressing the limitations of DRAM. Some examples are High Bandwidth Memory (HBM and HBM2), which provides higher bandwidth but smaller memory capacity, and Non-Volatile Memory (NVM), which has a larger capacity than DRAM but higher latency and lower bandwidth.
In this seminar thesis, the student should present an overview of different emerging memory technologies, explain how they work, and highlight the strengths and weaknesses of each technology. Further, the seminar thesis should discuss performance implications for scientific workloads and which application types can benefit or suffer from changing to a different technology.
Efficient Data Mapping in Systems with Heterogeneous Memory
Over the last decades, architectural enhancements in high performance computing have resulted in a steady increase in the computational power of such machines by increasing core frequency, improving the manufacturing process, and introducing multi- and many-core CPUs. In contrast, enhancements on the memory side, where most systems use classical DRAM, have not kept pace with that development, resulting in a gap between computational power on the one hand and memory latency and memory bandwidth on the other. Consequently, several memory technologies have emerged that aim to close that gap by addressing the limitations of DRAM. Some examples are High Bandwidth Memory (HBM and HBM2), which provides higher bandwidth but smaller memory capacity, and Non-Volatile Memory (NVM), which has a larger capacity than DRAM but higher latency and lower bandwidth. Compute nodes of the next generation of supercomputers will most likely be equipped with a heterogeneous memory system combining classical DRAM and one or more of the aforementioned technologies. This will pose new questions, e.g., where to place data and when to move data between different memory types in order to use resources efficiently and reduce power consumption and execution time.
In this seminar thesis, the student should present an overview of approaches for managing data placement and data movement between different kinds of memory. Further, the seminar thesis should discuss the design choices that address the aforementioned questions and illustrate power consumption and performance results for scientific applications.
Investigating Task-based Programming Frameworks for Distributed Memory Systems
In the past, scientific and industrial applications mainly consisted of well-balanced and regular workloads. Those workloads allowed a straightforward domain decomposition to evenly distribute work across compute nodes and the use of simple work-sharing techniques to evenly distribute the work across cores/threads within a compute node. Over the last years and decades, the complexity and variety of applications have increased significantly. Applications employ recursive and irregular workloads where the work to perform might differ between processing units or change over time, so that classical domain decomposition and work-sharing no longer suffice to parallelize the application efficiently and ensure a proper load balance. Consequently, task-based programming paradigms were introduced, first for shared memory systems, that allow a more flexible specification of work packages and even of dependencies between them. Several emerging frameworks aim to bring the task-based programming paradigm to applications running on distributed memory systems as well, applying different methodologies and concepts. Examples include Charm++, Legion, Chameleon, and TaskTorrent.
In this seminar thesis, the student should investigate the capabilities and fundamental conceptual differences of various task-based programming solutions for distributed memory. Further, the strengths and weaknesses of the approaches with respect to characteristics like scheduling, load balancing, and dependencies between tasks should be discussed. Finally, some performance measurements for each approach should be presented.
A Deeper Look into Deep Learning Frameworks on Heterogeneous Architectures (English only)
Deep Learning (DL) frameworks implement Neural Network (NN) architectures while employing different optimization techniques. The increasing popularity of heterogeneous machines requires a better understanding of the performance components of DL frameworks, i.e., how certain hybrid hardware configurations impact the efficiency of these frameworks.
In this seminar thesis, we will focus on a set of the most common DL frameworks and NN architectures. In order to study their performance variations in heterogeneous environments, we will evaluate NN models, resource consumption, and memory access patterns.
Detection and Classification of Memory Access Patterns (English only)
To design intelligent runtime systems that improve the utilization of future heterogeneous memory systems, identifying, classifying, and accelerating common memory access patterns is crucial. This is still an evolving research subject as more efficient and complex high-performance computing devices emerge.
In this seminar, we (1) survey existing memory access pattern classifications and (2) summarize techniques to identify them. Finally, we will evaluate the pros and cons of these approaches and propose potential research directions.
Memory Allocators in HPC (English only)
Dynamic memory allocation is a part of the memory management routine that can become a bottleneck in HPC systems. Different memory allocators have been developed (jemalloc, tcmalloc, etc.), which differ in performance characteristics such as memory overhead, scalability, and execution speed. In order to utilize memory resources efficiently, it is crucial to understand which memory allocators benefit specific applications.
In this seminar thesis, the student will review available dynamic memory allocation approaches and provide an analysis of their performance in HPC environments. Optional experimentation with the memory allocators can be carried out.
Convergence of Directive-based Offloading Paradigms?
With the increasing demand for computational resources driven by advances in science and engineering, specialized hardware accelerators were introduced. They provide high performance and energy efficiency for a limited set of operations. There exists a variety of platforms and devices with multiple target-specific programming models. This leads to increased porting and tuning efforts. To reduce these, compiler directives provide a means for the efficient and productive development of accelerator code. Here, OpenMP offloading and OpenACC are two commonly used programming models in HPC. However, the two models feature different interfaces and tool support, and are in general incompatible. To address this, CCAMP provides a translation and optimization framework aimed at a unified code base and performance portability.
The goal of this seminar thesis is to investigate the current state of the OpenMP and OpenACC programming models and the translation and optimization capabilities of CCAMP. If possible, experiments with CCAMP can be carried out.
Static Analysis of Codes to Identify Parallel Patterns
Parallel programming is a challenging and time-consuming task. To improve the development process, best practices in the form of parallel design patterns can be leveraged. They contain template solutions for commonly occurring problems such as map and reduce operations. To parallelize and optimize a code, parallel patterns need to be identified in the sequential code and specified in a parallel pattern framework, which then provides a parallel and optimized version of the pattern. To aid developers in the identification process, static analysis can be used.
The goal of this seminar thesis is to investigate approaches for identifying and annotating parallel patterns in sequential code. If possible, experiments with these tools can be carried out.
Performance Evaluation of SYCL
The demand for computational resources is rising rapidly due to increased problem and data sizes. To satisfy this demand, computing systems are becoming more specialized and heterogeneous. Thus, developers must invest considerable effort to map applications efficiently to such hardware architectures. One approach to minimizing this effort is the use of high-level programming models, which use a unified source code that can be optimized for the specific hardware characteristics of the targeted platform. SYCL is an open standard parallel programming model that provides such abstractions.
The goal of this seminar thesis is to investigate the performance and productivity of SYCL for HPC applications. If possible, experiments with small kernels on different platforms can be carried out.
Analysis of Job Outcomes in Relation to System Logs
In today's world of heterogeneous architectures and increasing node sizes, hardware and software failures and misallocation are receiving increasing attention. By combining the exit conditions of jobs (completions, failures, and timeouts) with system logs, information can be gained about the causes of these conditions. Building on this, one could predict the exit status of a running job in order to give operators or users information and enough lead time to react.
The focus of this seminar thesis is an overview of existing prediction systems and an assessment of whether they could be employed on the Claix HPC system.
Self-Optimizing HPC Batch Scheduler
Effective batch job scheduling is essential for the operation of current HPC systems. While traditional scheduling strategies are established and work well for most workloads, they can be adapted to site-specific characteristics or differences only to a limited extent. A self-optimizing batch job scheduling system that learns by trial and error can alleviate some of these problems and find a better local solution to this optimization problem without relying on expert knowledge.
The focus of this seminar thesis is an overview of the employed strategies and existing automatic schedulers and an assessment of whether such an approach would be feasible for the Claix HPC system.
Energy Measurements on Heterogeneous HPC Systems
Power consumption and energy requirements are two of the main characteristics of current HPC systems. With increasing efficiency and the corresponding FLOPS/W requirements, a solid energy measurement foundation for these systems is paramount. While the peculiarities of different systems with their own measurement methods have to be addressed, these can ideally be hidden behind a generic interface. This would enable users to interact with the data independently of the concrete measurement backends and would enable comparisons between different systems.
The focus of this seminar thesis is an overview of the current state of power measurement in HPC systems (e.g., Claix) and how this could be improved with a unified interface.
Supervisors & Organization
Dr. rer. nat.