Overview
A compiler can be loosely defined as a program that transforms other programs. Usually, a program written in a high-level programming language is transformed into a sequence of instructions executable by a processor or sometimes by a virtual machine. This transformation, however, does not happen in a single step. The high-level code goes through a series of transformations. Most compilers are divided into three parts: the frontend, the middle-end, and the backend. The frontend is responsible for parsing the high-level code. In the middle-end, the parsed code takes a form known as the intermediate representation that is independent from any high-level language and machine architecture. In fact, in the middle-end, multiple intermediate representations can be used. For instance, in GNU Compiler Collection (GCC), there are three such intermediate representations: GENERIC, GIMPLE and register transfer language (RTL). Each representation enables a different set of optimization passes. The backend finally converts the last intermediate representation into the machine-specific assembly code. Introducing the middle-end and intermediate representations is indispensable for minimizing the effort to build new compilers. Suppose that we want to be able to compile programs written in M programming languages, and for each program we want to target N hardware architectures. Had there been no language- and machine-independent intermediate representation, we would then have to develop M × N compilers. With the middle-end and intermediate representations, it becomes possible to just write M frontends and N backends, and get M × N compilers for free!
In this series of notes, we will present the process of writing a GCC backend by explaining the backend code written for the mrmr16 architecture. mrmr16 is a simple ISA with a small number of instructions, which makes it very suitable for demonstration purposes. Before diving into the backend code, in the remainder of this chapter we will take a glance at how a piece of C code is represented during different phases of GCC compilation, and we will lay out the file organization of a GCC backend.
Evolution of a C Source through GCC
Consider the following C code:
#include <stdint.h> #include <stdio.h> int main(int argc, char *argv[]) { int32_t a = 32; int32_t x = 1234; int32_t b = 5678; int32_t z = a * x + b; // int32_t i = 0; for (int32_t i = 0; i < 128; i++) { z = z + 999; } return 0; }
GCC allows us to peek into some of the representations during compilation via developer options. Put the above C code in a file named
main.c
and rungcc -O0 -fdump-tree-all-graph -fdump-final-insns -S main.c
This command dumps the GIMPLE and RTL forms, and the assembly code of
main.c
. Let us take a look at the dumped files to gain some intuition.
GIMPLE
At the frontend, the code in main.c
is scanned and parsed, and a language-specific tree produced. According to GCC Internals, C and C++ frontends directly transform this language-specific frontend tree into language- and machine-independent GIMPLE representation. After some number of passes the GIMPLE representation takes the following form depicted in C-like code as found in the dumped file main.c.252t.optimized
:
;; Function main (main, funcdef_no=0, decl_uid=2410, cgraph_uid=1, symbol_order=0) __attribute__((access ("^1[ ]", ))) int main (int argc, char * * argv) { int32_t i; int32_t z; int32_t b; int32_t x; int32_t a; int D.2422; int _1; int _9; <bb 2> : a_4 = 32; x_5 = 1234; b_6 = 5678; _1 = a_4 * x_5; z_7 = b_6 + _1; i_8 = 0; goto <bb 4>; [INV] <bb 3> : z_11 = z_2 + 999; i_12 = i_3 + 1; <bb 4> : # z_2 = PHI <z_7(2), z_11(3)> # i_3 = PHI <i_8(2), i_12(3)> if (i_3 <= 127) goto <bb 3>; [INV] else goto <bb 5>; [INV] <bb 5> : _9 = 0; <bb 6> : <L3>: return _9; }
By now, the control flow of the program is understood, basic blocks are determined (here indicated by
<bb #>
), and if enabled, some optimizations are applied. While resembling C code, this representation is actually more basic and lower-level, and can be viewed as some kind of pseudo-assembly. Notice the lack of for loop expressions. Also, relatively complex expressions are reduced to simple three-address code statements. For instance,z = a * x + b;
becomes
_1 = a_4 * x_5; z_7 = b_6 + _1;
A control flow graph depiction of the above code, obtained by rendering the dumped Graphviz file
main.c.252t.optimized.dot
, is given below:
RTL
Next, the GIMPLE representation given above gets converted into a sequence of RTL instructions. During the conversion process, GCC uses the machine-specific details provided by the backend to generate these instructions. As such, for each architecture, different RTL representation of a program is generated. Similar to the GIMPLE phase, the initial RTL representation goes through many passes, some of which are optimization passes. main.c.gkd
contains a Lisp-like code representation of the final sequence of RTL instructions generated during the compilation of main.c
for the AMD64 architecture:
;; Function main (main, funcdef_no=0, decl_uid=2410, cgraph_uid=1, symbol_order=0) (note # 0 0 NOTE_INSN_DELETED) (note # 0 0 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn/f # 0 0 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [ S8 A8]) (reg/f:DI 6 bp)) "main.c":4:34# {*pushdi2_rex64} (nil)) (insn/f # 0 0 2 (set (reg/f:DI 6 bp) (reg/f:DI 7 sp)) "main.c":4:34# {*movdi_internal} (nil)) (insn # 0 0 2 (set (mem/v:BLK (scratch:DI) [ A8]) (unspec:BLK [ (mem/v:BLK (scratch:DI) [ A8]) ] UNSPEC_MEMORY_BLOCKAGE)) "main.c":4:34# {*memory_blockage} (nil)) (note # 0 0 NOTE_INSN_PROLOGUE_END) (insn # 0 0 2 (set (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -36 [0xffffffffffffffdc])) [ argc+0 S4 A32]) (reg:SI 5 di [ argc ])) "main.c":4:34# {*movsi_internal} (nil)) (insn # 0 0 2 (set (mem/f/c:DI (plus:DI (reg/f:DI 6 bp) (const_int -48 [0xffffffffffffffd0])) [ argv+0 S8 A64]) (reg:DI 4 si [ argv ])) "main.c":4:34# {*movdi_internal} (nil)) (note # 0 0 NOTE_INSN_FUNCTION_BEG) (insn # 0 0 2 (set (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -12 [0xfffffffffffffff4])) [ a+0 S4 A32]) (const_int 32 [0x20])) "main.c":5:13# {*movsi_internal} (nil)) (insn # 0 0 2 (set (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -8 [0xfffffffffffffff8])) [ x+0 S4 A64]) (const_int 1234 [0x4d2])) "main.c":6:13# {*movsi_internal} (nil)) (insn # 0 0 2 (set (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -4 [0xfffffffffffffffc])) [ b+0 S4 A32]) (const_int 5678 [0x162e])) "main.c":7:13# {*movsi_internal} (nil)) (insn # 0 0 2 (set (reg:SI 0 ax [85]) (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -12 [0xfffffffffffffff4])) [ a+0 S4 A32])) "main.c":8:19# {*movsi_internal} (nil)) (insn # 0 0 2 (parallel [ (set (reg:SI 0 ax [85]) (mult:SI (reg:SI 0 ax [85]) (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -8 [0xfffffffffffffff8])) [ x+0 S4 A64]))) (clobber (reg:CC 17 flags)) ]) "main.c":8:19# {*mulsi3_1} (nil)) (insn # 0 0 2 (set (reg:SI 1 dx [orig:82 _1 ] [82]) (reg:SI 0 ax [85])) "main.c":8:19# {*movsi_internal} (nil)) (insn # 0 0 2 (set (reg:SI 0 ax [89]) (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -4 [0xfffffffffffffffc])) [ b+0 S4 A32])) "main.c":8:13# {*movsi_internal} (nil)) (insn # 0 0 2 (parallel [ (set (reg:SI 0 ax [88]) (plus:SI (reg:SI 0 ax [89]) (reg:SI 1 dx [orig:82 _1 ] [82]))) (clobber (reg:CC 17 flags)) ]) "main.c":8:13# {*addsi_1} (expr_list:REG_EQUAL (plus:SI (reg:SI 1 dx [orig:82 _1 ] [82]) (mem/c:SI (plus:DI (reg/f:DI 19 frame) (const_int -4 [0xfffffffffffffffc])) [ b+0 S4 A32])) (nil))) (insn # 0 0 2 (set (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -20 [0xffffffffffffffec])) [ z+0 S4 A32]) (reg:SI 0 ax [88])) "main.c":8:13# {*movsi_internal} (nil)) (insn # 0 0 2 (set (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -16 [0xfffffffffffffff0])) [ i+0 S4 A64]) (const_int 0 [0])) "main.c":10:18# {*movsi_internal} (nil)) (jump_insn # 0 0 2 (set (pc) (label_ref #)) "main.c":10:5# {jump} (nil) -> 2) (barrier # 0 0) (code_label # 0 0 3 3 (nil) [1 uses]) (note # 0 0 [bb 3] NOTE_INSN_BASIC_BLOCK) (insn # 0 0 3 (parallel [ (set (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -20 [0xffffffffffffffec])) [ z+0 S4 A32]) (plus:SI (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -20 [0xffffffffffffffec])) [ z+0 S4 A32]) (const_int 999 [0x3e7]))) (clobber (reg:CC 17 flags)) ]) "main.c":11:11# {*addsi_1} (nil)) (insn # 0 0 3 (parallel [ (set (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -16 [0xfffffffffffffff0])) [ i+0 S4 A64]) (plus:SI (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -16 [0xfffffffffffffff0])) [ i+0 S4 A64]) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags)) ]) "main.c":10:35# {*addsi_1} (nil)) (code_label # 0 0 4 2 (nil) [1 uses]) (note # 0 0 [bb 4] NOTE_INSN_BASIC_BLOCK) (insn # 0 0 4 (set (reg:CCGC 17 flags) (compare:CCGC (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -16 [0xfffffffffffffff0])) [ i+0 S4 A64]) (const_int 127 [0x7f]))) "main.c":10:27# {*cmpsi_1} (nil)) (jump_insn # 0 0 4 (set (pc) (if_then_else (le (reg:CCGC 17 flags) (const_int 0 [0])) (label_ref #) (pc))) "main.c":10:27# {*jcc} (nil) -> 3) (note # 0 0 [bb 5] NOTE_INSN_BASIC_BLOCK) (insn # 0 0 5 (set (reg:SI 0 ax [orig:83 _9 ] [83]) (const_int 0 [0])) "main.c":13:12# {*movsi_internal} (nil)) (insn # 0 0 5 (use (reg/i:SI 0 ax)) "main.c":14:1# (nil)) (note # 0 0 NOTE_INSN_EPILOGUE_BEG) (insn # 0 0 5 (set (mem/v:BLK (scratch:DI) [ A8]) (unspec:BLK [ (mem/v:BLK (scratch:DI) [ A8]) ] UNSPEC_MEMORY_BLOCKAGE)) "main.c":14:1# {*memory_blockage} (nil)) (insn/f # 0 0 5 (set (reg/f:DI 6 bp) (mem:DI (post_inc:DI (reg/f:DI 7 sp)) [ S8 A8])) "main.c":14:1# {*popdi1} (expr_list:REG_CFA_DEF_CFA (plus:DI (reg/f:DI 7 sp) (const_int 8 [0x8])) (nil))) (jump_insn # 0 0 5 (simple_return) "main.c":14:1# {simple_return_internal} (nil) -> simple_return) (barrier # 0 0) (note # 0 0 NOTE_INSN_DELETED)
A very important property of this final form is that there is a one-to-one correspondence between this sequence of RTL instructions and the emitted assembly instructions, as we will demonstrate next.
Assembly Code
The final step in compilation is to emit assembly instructions from the aforementioned final sequence of RTL instructions. main.s
contains the assembly code produced as a result of compiling main.c
, targeting AMD64:
.file "main.c" .text .globl main .type main, @function main: .LFB0: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq %rsp, %rbp .cfi_def_cfa_register 6 movl %edi, -36(%rbp) movq %rsi, -48(%rbp) movl $32, -12(%rbp) movl $1234, -8(%rbp) movl $5678, -4(%rbp) movl -12(%rbp), %eax imull -8(%rbp), %eax movl %eax, %edx movl -4(%rbp), %eax addl %edx, %eax movl %eax, -20(%rbp) movl $0, -16(%rbp) jmp .L2 .L3: addl $999, -20(%rbp) addl $1, -16(%rbp) .L2: cmpl $127, -16(%rbp) jle .L3 movl $0, %eax popq %rbp .cfi_def_cfa 7, 8 ret .cfi_endproc .LFE0: .size main, .-main .ident "GCC: (GNU) 12.2.0" .section .note.GNU-stack,"",@progbits
We have previously mentioned the property of correspondence between RTL and the assembly code. Now, as an example, consider the RTL instruction at lines 35–38 in the previously given RTL representation. This is mapped directly to the instruction at line 17 in the above assembly code. For now, compare the RTL instruction and the assembly instruction, and try to see the semantic similarities between them. The RTL representation will become more meaningful after reading the next chapter.
File Organization of a GCC Backend
Assuming we are at the root directory of GCC source tree, a GCC backend <target>
is placed in the gcc/config/<target>/
directory. In this directory, at least two files must be provided:
<target>.md
: Contains a set of patterns that describe the instructions supported by<target>
. Written in a language akin to RTL.<target>.h
: Contains a set of preprocessor macro definitions that inform GCC about the properties of<target>
such as the number of general purpose registers, the size of memory addresses etc. Usually accompanied with the source file<target>.c
.
With these files we are essentially defining an abstract machine that has the same properties as our real machine. Using the information provided in these files, GCC is able to create a mapping from GIMPLE to <target>
assembly.
GCC uses Automake as its build system. In order to make the backend <target>
visible to Automake, following build configuration files need to be modified:
gcc/config.gcc
config.sub
configure.ac
In our case, for the mrmr16 backend, <target>
will be mrmr16
. You can find the full source code of the mrmr16 backend in this repository.