Thesis: Bc. Matúš Madzin, učo 207505: Source-to-source compilation of mapped functions sequences in CUDA
Master's thesis
Source-to-source compilation of mapped functions sequences in CUDA
Annotation
The result of this thesis is a high-level language and a compiler that translates code written in the designed language into a corresponding representation in the CUDA C language. The compiler is an automated tool for fusing GPU implementations of mapped functions. This thesis describes the formal definition of the designed language and the code generation process.
Abstract
The result of this thesis is a high-level language and a compiler that translates source code written in the designed language into a corresponding representation in the CUDA C language. The compiler is an automated tool for fusing GPU implementations of mapped functions. This thesis describes the formal definition of the designed language, its translation into an intermediate representation, and code generation from the optimized intermediate representation.
Thesis assignment
Modern GPUs outperform contemporary CPUs by an order of magnitude in both memory bandwidth and arithmetic performance. To unleash the power of a GPU, it is necessary to carry out enough arithmetic operations relative to the amount of data transferred between GPU on-chip memory and global memory. It is generally not possible to achieve a good ratio of arithmetic operations to memory transfers when mapping functions over a set of small elements, because the data are not reused enough once they are loaded from global memory. If several consecutive functions are mapped, it is possible to fuse them and store the intermediate results in faster on-chip memory. However, higher consumption of this memory can reduce the degree of parallelism achieved on the GPU and consequently decrease performance.
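The memory-traffic argument above can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the thesis: it assumes that every mapped function reads one element and writes one element of the same size, and the names `unfused_bytes` and `fused_bytes` are hypothetical.

```python
def unfused_bytes(n_elements: int, element_bytes: int, n_functions: int) -> int:
    """Global-memory traffic when each mapped function runs as its own kernel.

    Assumption: every function reads one input element and writes one
    output element per mapped item, all through global memory.
    """
    return 2 * n_functions * n_elements * element_bytes


def fused_bytes(n_elements: int, element_bytes: int) -> int:
    """Global-memory traffic when all functions are fused into one kernel.

    Assumption: intermediate results stay on-chip, so only the first
    input and the last output touch global memory.
    """
    return 2 * n_elements * element_bytes


if __name__ == "__main__":
    n, size, k = 1_000_000, 16, 3  # illustrative values only
    print(unfused_bytes(n, size, k) / fused_bytes(n, size))
```

Under these assumptions the traffic ratio equals the number of fused functions. The model deliberately ignores the on-chip memory cost of fusion, which is the other side of the trade-off described above.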
Implementing a source-to-source compiler seems to be a suitable solution: it makes it possible to fuse particular functions automatically, based on a simple description of the computation that defines the function call sequence. The compiler will be able to find an effective balance between reducing memory transfers by fusing particular functions and limiting the consumption of on-chip resources, which constrains the attainable parallelism.
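To make the balance between fusion and on-chip resource consumption concrete, here is a toy heuristic. It is only a sketch under strong assumptions and is not the optimizer discussed in this assignment: it treats on-chip consumption as additive across fused functions, handles only a linear call sequence, and the function name `greedy_fusion_groups` and all cost figures are hypothetical.

```python
from typing import List, Tuple


def greedy_fusion_groups(
    costs: List[Tuple[str, int]], on_chip_budget: int
) -> List[List[str]]:
    """Split a linear call sequence into fusion groups.

    costs: (function name, on-chip bytes it needs); the numbers are
    made up for illustration.
    A function joins the current group while the summed cost fits the
    budget; otherwise a new group (a new kernel) is started.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, cost in costs:
        if current and used + cost > on_chip_budget:
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    print(greedy_fusion_groups([("f", 4), ("g", 4), ("h", 4)], 8))
```

A larger budget yields fewer, larger kernels with less global-memory traffic; a smaller budget yields more kernels that each leave more on-chip resources available for parallelism.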
The subject of this thesis is to design a simple language that allows the programmer to define a sequence of function calls, and to generate the code of fused kernels and kernel calls according to the defined call sequence.
The language for function calls will make it possible to allocate particular data elements and apply functions to them. Moreover, it will support loops of fixed length and indexing of data elements. A parser will be developed that transforms code in this language into a directed acyclic graph (DAG) representing function calls and parameter passing.
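The step from a call sequence to a dependency graph can be sketched as follows. The input syntax `out = func(arg, ...)` is a hypothetical stand-in, since the actual language is not specified here; fixed-length loops are not handled, and `parse_calls` and `build_dag` are illustrative names only.

```python
import re
from typing import Dict, List, Tuple

Call = Tuple[str, str, List[str]]  # (output name, function name, argument names)

# Hypothetical syntax: `out = func(a, b)`; names may carry an index, e.g. `v[0]`.
CALL = re.compile(r"^\s*(\w+(?:\[\d+\])?)\s*=\s*(\w+)\((.*)\)\s*$")


def parse_calls(source: str) -> List[Call]:
    """Parse one call per line; blank lines are skipped."""
    calls: List[Call] = []
    for line in source.splitlines():
        if not line.strip():
            continue
        match = CALL.match(line)
        if match is None:
            raise ValueError(f"cannot parse: {line!r}")
        out, func, args = match.groups()
        calls.append((out, func, [a.strip() for a in args.split(",") if a.strip()]))
    return calls


def build_dag(calls: List[Call]) -> List[Tuple[int, int]]:
    """Return edges (producer call index, consumer call index).

    A data element that is read before any call has written it is an
    external input and produces no edge. Edges always point from an
    earlier call to a later one, so the graph is acyclic by construction.
    """
    last_writer: Dict[str, int] = {}
    edges: List[Tuple[int, int]] = []
    for i, (out, _func, args) in enumerate(calls):
        for arg in args:
            if arg in last_writer:
                edges.append((last_writer[arg], i))
        last_writer[out] = i
    return edges


if __name__ == "__main__":
    program = "b = f(a)\nc = g(b)\nd = h(b, c)"
    print(build_dag(parse_calls(program)))
```

Tracking only the last writer of each name is enough for this sketch because every edge represents a value passed from the call that produced it to a call that consumes it.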
The DAG is processed by the optimizing part of the compiler, which is the subject of another thesis. The output of the optimizer defines the subsets of functions to be fused as well as the implementation of particular fusions (linearization of function calls, memory allocation and reuse, etc.). The student will implement the code generator, which transforms the output of the optimizer into CUDA source code. The code generator will be able to generate GPU kernels, CPU functions calling them, and helper code that makes it possible to check GPU results against the CPU and to benchmark the performance of the GPU implementation.
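As a minimal illustration of what such a code generator might emit, the sketch below produces CUDA C text for one fusion group and a CPU-side launcher. It is not the generator developed in the thesis. It assumes that each fused function is a `__device__ float f(float)` defined elsewhere, that one thread handles one element, and that intermediates fit in a register; `generate_fused_kernel` and `generate_launcher` are hypothetical names.

```python
from typing import List


def generate_fused_kernel(kernel_name: str, functions: List[str]) -> str:
    """Emit CUDA C source for a kernel that applies `functions` in order."""
    lines = [
        f"__global__ void {kernel_name}(const float *in, float *out, int n)",
        "{",
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;",
        "    if (i < n) {",
        "        float v = in[i];  // single load from global memory",
    ]
    for func in functions:
        # The intermediate result is kept in a register between calls.
        lines.append(f"        v = {func}(v);")
    lines += [
        "        out[i] = v;  // single store to global memory",
        "    }",
        "}",
    ]
    return "\n".join(lines)


def generate_launcher(kernel_name: str, block_size: int = 256) -> str:
    """Emit a CPU function that launches the kernel on device pointers."""
    lines = [
        f"void run_{kernel_name}(const float *d_in, float *d_out, int n)",
        "{",
        f"    int blocks = (n + {block_size} - 1) / {block_size};",
        f"    {kernel_name}<<<blocks, {block_size}>>>(d_in, d_out, n);",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_fused_kernel("fused_fg", ["f", "g"]))
    print(generate_launcher("fused_fg"))
```

Emitting the source as a list of lines keeps the linearized call order explicit: the order of `functions` is exactly the order of the generated statements.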
31. 5. 2011 14:29, doc. RNDr. Jiří Filipovič, Ph.D., učo 72898
Supervisor
Literature
- AHO, Alfred V. Compilers : principles, techniques, & tools. 2nd ed. Boston: Pearson/Addison Wesley, 2007, xxiv, 1009. ISBN 0321486811.
- PARR, Terence. The definitive ANTLR reference : building domain-specific languages. Raleigh, N.C.: Pragmatic Bookshelf, 2007, xx, 361. ISBN 9780978739256.
- FILIPOVIČ, Jiří; Igor PETERLÍK and Jan FOUSEK. GPU Acceleration of Equations Assembly in Finite Elements Method -- Preliminary Results. In Symposium on Application Accelerators in High Performance Computing 2009. 2009.
- MARKALL, G. R.; D. A. HAM and P. H. J. KELLY. Towards generating optimised finite element solvers for GPUs from high-level specifications. In Proceedings of the 10th International Conference on Computational Science (ICCS 2010). 2010.