TAC Three address code Three-address code (tac) will be the intermed iate representation used in our Decaf compiler. It is essentially a generic assembly language that falls in the lower-end of the mid-level IRs. Many compilers use an IR similar to TAC. It is a sequence of instructions, each of which can have at most three operands. The operands could be two operands to a binary arithmetic op and the third the result location, or an operand to compare to zero and a second location to branch to, and so on. For example, here is an arithmetic expression and its corresponding TAC translation t1=b*c t3=t1+t2 a t3 Notice the use of temp variables created by the compiler as needed to keep the number of operands down to three. Of course, it's a little more complicated than the above example, because we have to translate branching and looping instructions, as well as subprogram calls. Here is an example of the tac for a branching translation for an if-statement if (a< b+ c) t1 = b+c t2 = a< t1 工fZt2Goto0 L0:t4=b*c; And here is an example of the translation involving a function call and array access Int n Binky(arr [n] t1=4 t2=t1*n; t3=arr+ t2: LCall Binky(t4) Decaf TAc instruction formats The convention followed in the examples below is that tl, t2, and so on refer to variables(either from source program or temporaries)and Ll, L2, etc are used for
TAC Three address code Three-address code (TAC) will be the intermediate representation used in our Decaf compiler. It is essentially a generic assembly language that falls in the lower-end of the mid-level IRs. Many compilers use an IR similar to TAC. It is a sequence of instructions, each of which can have at most three operands. The operands could be two operands to a binary arithmetic op and the third the result location, or an operand to compare to zero and a second location to branch to, and so on. For example, here is an arithmetic expression and its corresponding TAC translation: a = b * c + b * d t1 = b * c; t2 = b * d; t3 = t1 + t2; a = t3; Notice the use of temp variables created by the compiler as needed to keep the number of operands down to three. Of course, it's a little more complicated than the above example, because we have to translate branching and looping instructions, as well as subprogram calls. Here is an example of the TAC for a branching translation for an if-statement: if (a < b + c) a = a - c; c = b * c; t1 = b + c; t2 = a < t1; IfZ t2 Goto L0; t3 = a - c; a = t3; L0: t4 = b * c; c = t4; And here is an example of the translation involving a function call and array access: int n; n = ReadInteger(); Binky(arr[n]); n = LCall _ReadInteger(); t1 = 4; t2 = t1 * n; t3 = arr + t2; t4 = *(t3); LCall _Binky(t4); Decaf TAC instruction formats The convention followed in the examples below is that t1, t2, and so on refer to variables (either from source program or temporaries) and L1, L2, etc. are used for
labels. Labels are attached to the instruction that serve as targets for goto/branch instructions and are used to identify function/method definitions and vtables Variable declarations Assignment tl ="abcdefg (rvalue can be variable, or string/int constant) Arithmetic: t3=t2/t1 t3=t28t1 t2=-t1; Relationalequality/logical 3=t2==t1 3=t2<t1 t3=t2&&t1 t3=t2||t1; (not all relational operators are present, must synthesize others using the primitives Labels and branchess Goto ll 工fzt1 Goto I1; ifNz tl Goto Ll: (branch test is zero/non-zero Function/method calls Lca11L1(t1,t2,t3); t1 LCall Ll(tl) (different for void vs non-void return) ACa11t1(t2,t3); t0=ACa11t1(); (LCall used for a function with label known at compile-time, ACall for a computed function address, most likely from object vtable) Function definitions BeginFuncWithParams tl, t2 (list can be empty if no params Return tl;
labels. Labels are attached to the instruction that serve as targets for goto/branch instructions and are used to identify function/method definitions and vtables. Variable declarations: Var t1; Assignment: t2 = t1; t1 = "abcdefg"; t1 = 8; (rvalue can be variable, or string/int constant) Arithmetic: t3 = t2 + t1; t3 = t2 - t1; t3 = t2 * t1; t3 = t2 / t1; t3 = t2 % t1; t2 = -t1; Relational/equality/logical: t3 = t2 == t1; t3 = t2 < t1; t3 = t2 && t1; t3 = t2 || t1; t2 = !t1; (not all relational operators are present, must synthesize others using the primitives available) Labels and branches: L1: Goto L1; IfZ t1 Goto L1; ifNZ t1 Goto L1; (branch test is zero/non-zero) Function/method calls: LCall L1(t1, t2, t3); t1 = LCall L1(t1); (different for void vs non-void return) ACall t1(t2, t3); t0 = ACall t1(); (LCall used for a function with label known at compile-time, ACall for a computed function address, most likely from object vtable) Function definitions: BeginFuncWithParams t1, t2; (list can be empty if no params) EndFunc; Return t1; Return;
Memory references t1=*(t2+8) *(t1+-4)=t2 (optional offset must be integer constant, can be positive or negative) Array indexing To access arr[5], add offset multiplied by elem size to base and deref Object fields, method dispatch To access ivars. add offset to base. deref To call method retrieve function address from vtable. use ACall Miscellaneous: (used for runtime errors) Data specification VTable ClassName Ll, L2, Here is an example of a simple Decaf program and its TAC translation void main(( c= a+ b Print(c) BeginFuncWithParams Var ci Var to Var t1 Var t2 t2=a + b LCall PrintInt(c)i What we have to do is figure out how to get from one to the other as we parse. This includes not only generating the TAC, but figuring out the use of temp variables
Memory references: t1 = *(t2); t1 = *(t2 + 8); *(t1) = t2; *(t1 + -4) = t2; (optional offset must be integer constant, can be positive or negative) Array indexing: To access arr[5], add offset multiplied by elem size to base and deref Object fields, method dispatch: To access ivars, add offset to base, deref To call method, retrieve function address from vtable, use ACall Miscellaneous: Halt; (used for runtime errors) Data specification: VTable ClassName = L1, L2, ...; Here is an example of a simple Decaf program and its TAC translation: void main() { int a; int b; int c; a = 1; b = 2; c = a + b; Print(c); } main: BeginFuncWithParams; Var a; Var b; Var c; Var _t0; _t0 = 1; a = _t0; Var _t1; _t1 = 2; b = _t1; Var _t2; _t2 = a + b; c = _t2; LCall _PrintInt(c); EndFunc; What we have to do is figure out how to get from one to the other as we parse. This includes not only generating the TAC, but figuring out the use of temp variables
creating labels, calling functions, etc. Since we have a lot to do, we will make the mechanics of generating the TAC as easy as possible. In our parser, we will create the TAC instructions one at a time. We can immediately print them out or store them for further processing. We will simplify the Decaf language a little by excluding doubles for code generation and internally treating bools as 4-byte integers. Classes, arrays, and strings will be implemented with 4-byte pointers. This means we only ever need to deal with 4-byte integer/pointer variables As each production is reduced, we will create the necessary instructions. This strategy makes our code-generation a bit limited--particularly for the way we would have to do switch statements-but we can translate more-or-less any language structure into an executable program in a single pass, without needing to go back and edit anything which is pretty convenient. To see how a syntax-directed translation can generate TAC, we need to look at the derivation, and figure out where the different TaC statements should be generated as the productions are reduced Lets start with a trivial program void main() Print (hello world") to = hello world call Printstring( to)i Endfunc Notice that we call the library function labelled Print String to do the actual printing Library functions are called like any ordinary global function, but the code for the definition is provided by the compiler as part of linking with the standard language libraries. Here is the derivation of the source program. The trick is to identify where and what processing occurs as these productions are reduced to generate the given TAC Dec⊥List-> ype - Void Formals - Constant - string Constant E -> Constant 4 ExprList-> Expr Printstmt - Print ExprList stmt - printstmt stmtlist - stmtlist stmt StmtBlock - Stmtlist 1 FunctionDefn - Type identifier Formals StmtBlock Decl -> Functiondefn
creating labels, calling functions, etc. Since we have a lot to do, we will make the mechanics of generating the TAC as easy as possible. In our parser, we will create the TAC instructions one at a time. We can immediately print them out or store them for further processing. We will simplify the Decaf language a little by excluding doubles for code generation and internally treating bools as 4-byte integers. Classes, arrays, and strings will be implemented with 4-byte pointers. This means we only ever need to deal with 4-byte integer/pointer variables. As each production is reduced, we will create the necessary instructions. This strategy makes our code-generation a bit limited—particularly for the way we would have to do switch statements—but we can translate more-or-less any language structure into an executable program in a single pass, without needing to go back and edit anything, which is pretty convenient. To see how a syntax-directed translation can generate TAC, we need to look at the derivation, and figure out where the different TAC statements should be generated as the productions are reduced. Let’s start with a trivial program: void main() { Print("hello world"); } main: BeginFuncWithParams; Var _t0; _t0 = "hello world"; LCall _PrintString(_t0); EndFunc; Notice that we call the library function labelled _PrintString to do the actual printing. Library functions are called like any ordinary global function, but the code for the definition is provided by the compiler as part of linking with the standard language libraries. Here is the derivation of the source program. The trick is to identify where and what processing occurs as these productions are reduced to generate the given TAC: DeclList -> Type -> Void Formals -> StmtList -> Constant -> stringConstant Expr -> Constant 4 ExprList-> Expr PrintStmt -> Print ( ExprList ) Stmt -> PrintStmt ; StmtList -> StmtList Stmt StmtBlock -> { StmtList } FunctionDefn -> Type identifier ( Formals ) StmtBlock Decl -> FunctionDefn
Decllist - DeclList Decl Here is another simple program with a bit more complex expression t1 t0+a; LCall PrintInt(a)i Endell Here is the derivation. again, consider where the instructions above must be emitted relative to the parsing stmtlist -> optReceiver - OptReceiver identifier Constant - incOnstant OptReceiver LValue - OptReceiver identifier E simpl Stmt - Simplestmt stmtlist ->stmtlist stmt
DeclList -> DeclList Decl Program -> DeclList Here is another simple program with a bit more complex expression: void main() { int a; a = 2 + a; Print(a); } main: BeginFuncWithParams; Var a; Var _t0; _t0 = 2; Var _t1; _t1 = _t0 + a; a = _t1; LCall _PrintInt(a); EndFunc; Here is the derivation. Again, consider where the instructions above must be emitted relative to the parsing activity: DeclList -> Type -> void Formals -> StmtList -> Type -> int Variable -> Type identifier VariableDecl -> Variable ; Stmt -> VariableDecl StmtList -> StmtList Stmt OptReceiver -> LValue -> OptReceiver identifier Constant -> intConstant Expr -> Constant OptReceiver -> LValue -> OptReceiver identifier Expr -> LValue Expr -> Expr + Expr SimpleStmt -> LValue = Expr Stmt -> SimpleStmt ; StmtList -> StmtList Stmt OptReceiver -> LValue -> OptReceiver identifier Expr -> LValue