Master's thesis

Source-to-source compilation of mapped functions sequences in CUDA

Bc. Matúš Madzin, učo 207505
Annotation

The result of this thesis is a high-level language and a compiler that translates code written in the designed language into the corresponding representation in CUDA C. The compiler is an automated tool for fusing GPU implementations of mapped functions. This thesis describes the formal definition of the designed language and the code generation process.

Abstract

The result of this thesis is a high-level language and a compiler that translates source code written in the designed language into a corresponding representation in CUDA C. The compiler is an automated tool for fusing mapped function implementations on the GPU. This thesis describes a formal definition of the designed language, its translation into an intermediate representation, and code generation from the optimized intermediate representation.

Thesis assignment

Modern GPUs outperform contemporary CPUs by an order of magnitude in both memory bandwidth and arithmetic performance. To unleash the power of a GPU, it is necessary to carry out enough arithmetic operations relative to the amount of data transferred between the GPU's on-chip memory and global memory. It is generally not possible to achieve a good ratio of arithmetic operations to memory transfers when mapping functions over a set of small elements, because the data are not reused enough once they are loaded from global memory. If several consecutive functions are mapped, it is possible to fuse them and store the intermediate results in faster on-chip memory. However, higher consumption of this memory can reduce the degree of parallelism achieved on the GPU and consequently decrease performance.
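The trade-off described above can be illustrated with a small CPU-side sketch. It is not taken from the thesis: the function names `run_unfused` and `run_fused` and the transfer counter are hypothetical, and a Python local variable merely stands in for on-chip memory. The sketch only counts how many simulated global-memory loads and stores a sequence of mapped functions needs with and without fusion.

```python
def run_unfused(data, funcs):
    """Apply each mapped function as a separate pass over the data.

    Every pass loads each element from (simulated) global memory and
    stores the result back, so transfers grow with the number of functions.
    """
    transfers = 0
    current = list(data)
    for f in funcs:
        out = []
        for x in current:
            transfers += 1          # load the element
            out.append(f(x))
            transfers += 1          # store the intermediate result
        current = out
    return current, transfers


def run_fused(data, funcs):
    """Apply all mapped functions to one element before moving on.

    Intermediate values stay in a local variable (a stand-in for on-chip
    memory), so each element is loaded once and stored once.
    """
    transfers = 0
    out = []
    for x in data:
        transfers += 1              # single load
        v = x
        for f in funcs:
            v = f(v)                # intermediate never leaves "on-chip" storage
        out.append(v)
        transfers += 1              # single store
    return out, transfers
```

With three mapped functions over four elements, the unfused version performs 24 simulated transfers and the fused version 8, while both produce the same results. The sketch does not model the cost side of the trade-off, namely that the fused version needs more on-chip storage per element.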

Implementing a source-to-source compiler seems to be a suitable solution: it makes it possible to fuse particular functions automatically, based on a simple description of the computation that defines the function call sequence. The compiler will be able to find an effective balance between reducing memory transfers by fusing particular functions and limiting the consumption of on-chip resources, which constrains the attainable parallelism.

The subject of this thesis is to design a simple language that allows the programmer to define a sequence of function calls, and to generate the code of fused kernels and kernel calls according to the defined call sequence.

The language for function calls will make it possible to allocate particular data elements and apply functions to them. Moreover, it will allow loops of fixed length and indexing of the data elements. A parser will be developed that transforms code in this language into a directed acyclic graph (DAG) representing function calls and parameter passing.
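As a rough illustration of the graph construction described above, the following sketch builds a DAG from an already parsed call sequence. The input format (a list of `(output, function, inputs)` tuples) and the name `build_dag` are assumptions made for this example; the actual syntax of the designed language and the parser's data structures are not given here.

```python
def build_dag(calls):
    """Build a DAG from a linear sequence of function calls.

    `calls` is a list of (output_name, function_name, input_names) tuples,
    a hypothetical stand-in for the parser's output. Nodes are function
    calls; an edge (producer, consumer, name) records that the data element
    `name` written by call `producer` is read by call `consumer`.
    """
    producer = {}   # data element name -> index of the call that produced it
    nodes = []
    edges = []
    for idx, (out, func, inputs) in enumerate(calls):
        nodes.append(func)
        for name in inputs:
            if name in producer:    # inputs without a producer are program inputs
                edges.append((producer[name], idx, name))
        producer[out] = idx
    return nodes, edges


def unroll_loop(func, first_input, count):
    """Expand a fixed-length loop into `count` chained calls (hypothetical helper)."""
    calls = []
    prev = first_input
    for i in range(count):
        out = f"{first_input}_{i + 1}"
        calls.append((out, func, [prev]))
        prev = out
    return calls
```

Because every edge points from an earlier call to a later one, the resulting graph cannot contain a cycle. Fixed-length loops fit this scheme because they can be unrolled into a plain call sequence before the graph is built.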

The DAG is processed by the optimizing part of the compiler, which is the subject of another thesis. The optimizer's output defines the subsets of functions that are to be fused as well as the implementation of particular fusions (function call linearization, memory allocation and reuse, etc.). The student will implement the code generator, which transforms the output of the optimizer into CUDA source code. The code generator will be able to generate GPU kernels, CPU functions that call them, and helper code for checking GPU results against the CPU and benchmarking the performance of the GPU implementation.
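To make the code generation step more concrete, here is a minimal sketch of a generator that emits CUDA C source text for one fused kernel. It is an assumption-laden toy, not the generator developed in the thesis: the name `generate_fused_kernel`, the fixed `float` element type, and the one-element-per-thread layout are all chosen for illustration, and the listed functions are assumed to exist as `__device__` functions taking and returning a `float`.

```python
def generate_fused_kernel(func_names):
    """Emit CUDA C source for a kernel that applies `func_names` in order.

    Hypothetical sketch: each thread handles one float element and keeps
    the intermediate value in a register between the fused calls.
    """
    kernel_name = "fused_" + "_".join(func_names)
    lines = [
        f"__global__ void {kernel_name}(const float *in, float *out, int n) {{",
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;",
        "    if (i < n) {",
        "        float v = in[i];  // single load from global memory",
    ]
    for fn in func_names:
        # One call per fused function, in the linearized order
        lines.append(f"        v = {fn}(v);")
    lines += [
        "        out[i] = v;  // single store to global memory",
        "    }",
        "}",
    ]
    return "\n".join(lines)
```

Calling `generate_fused_kernel(["f", "g"])` returns the text of a kernel named `fused_f_g` whose body calls `f` and then `g` on the loaded value. A real generator would also have to emit the host-side launch code, the memory allocation chosen by the optimizer, and the checking and benchmarking helpers mentioned above.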

Thesis checked:
31. 5. 2011 14:29, doc. RNDr. Jiří Filipovič, Ph.D., učo 72898
Full text of the thesis
752 KB / PDF file
Language of the thesis
English
Defence date
28. 6. 2011
The thesis was successfully defended

Supervisor

doc. RNDr. Jiří Filipovič, Ph.D., učo 72898
VSJF CERIT-SC ÚVT MU

Reviewer

Mgr. Aleš Křenek, Ph.D., učo 3086
CUSupp ScColl CERIT-SC ÚVT MU

Literature

  • AHO, Alfred V. Compilers : principles, techniques, & tools. 2nd ed. Boston: Pearson/Addison Wesley, 2007, xxiv, 1009. ISBN 0321486811.
  • PARR, Terence. The definitive ANTLR reference : building domain-specific languages. Raleigh, N.C.: Pragmatic Bookshelf, 2007, xx, 361. ISBN 9780978739256.
  • FILIPOVIČ, Jiří; Igor PETERLÍK and Jan FOUSEK. GPU Acceleration of Equations Assembly in Finite Elements Method -- Preliminary Results. In Symposium on Application Accelerators in High Performance Computing 2009. 2009.
  • MARKALL, G. R.; D. A. HAM and P. H. J. KELLY. Towards generating optimised finite element solvers for GPUs from high-level specifications. In Proceedings of the 10th International Conference on Computational Science (ICCS 2010). 2010.

Masaryk University, Faculty of Informatics
Study programme
Informatics
 
Name
Uploaded by
Uploaded
Rights
Thesis archive Matúš Madzin FI N-IN PDS, učo 207505 jdfkx/7
30. 5. 2011