purportedly optimized, but at least with the small sample that we have, it looks as if each could be better. we begin with three parallel walkdowns Notes as needed are provided below SPARC character from m→D# Set(byte) DIGIT to 0 Set(byte) DIGit to 0 Is(D3-0)<=0? Is( source-0”)<=0? character from Mfi outl If branch to label zero If < branch to Zero *source-/)<=0? Is(D3-9)<=0? Is (*source-9)<=0? If <= branch to ZeRo If <= branch to label one If neither, branch to zer Add(byte)1 to DIGIt If neither, branch to ZERO: ZERO Add 1 to digit Put a longword 0 in DO ZERO Branch to label done ONE Put a longword I in DO DONE. Put value in do into digit Notes: 1. Moving the character into register to compare it with and a. The first 68000 line moves the next character as a byte into register D3. The other 3 bytes will be ignored in the byte operations. Remember that the program had already moved the pointer to the string into A4 b. The SPARC does the same sort of thing with a pointer in %02, except with the difference that it extends the byte to a longword Sign extension simply copies the sign-bit into the higher-order effectively making 3E into 0000 003E or C2 into FFFF FFC2. That is what the mnemonic means LoaD Signed Byte. C. The VAX compiler takes a totally different approach-a rather poor one, actually. It leaves not only the byte in memory but even the pointer to the byte. Thus, every time it wants to refer to the byte-and it does so numerous times-it must first fetch the address from memory and then fetch the byte itself. This double memory reference is what @4(ap) means: At the address you will find at address 4(ap). The only thing that makes all this apparent coming and going even remotely acceptable is that the VAX will cache(place in a fast local memory) both the address and the character the first time that it gets them. Then, it can refer to them rapidly again. Cache references, however, are not as fast as 2. Testing the character: The next line(3rd for the SPARC)does a comparison between 48 (or 47)and the character. Compare instruction which subtracts one operand from the other, but instead of putting the results some where, it stores only the facts on whether the operation delivered a negative number or zero or resulte in either an overflow or a carry. These are stored in flags, single bits associated with the arithmetic unit The bits can contain only one result at a time. The 68000 and VAX must test immediately after the comparison or they will lose the bits. The SPARC changes the bits only when the instruction says so (the CC on the instruction-"change condition codes"). Thus, the subtraction can be remote from the test-and-branch The SPARC is explicit about where to store the subtraction-in %g0. %g0 is a pseudo-register. It is always a 0 as a source and is a garbage can as a destination. The availability of the 0 and the garbage can serves all the same functions that the special instructions for zeros and comparisons do on the CISCs. 3. The differences in the algorithm to do the tests: There are two different paradigms expressed in these three examples. One says: Figure out which you want to do and then do that one thing. The other says: First set the result false and then if you should set it true. While the second version would seem to do a lot of unnecessary settings to zero, the other algorithm will execute one less branch. That would make it roughly equivalent e 2000 by CRC Press LLC
© 2000 by CRC Press LLC purportedly optimized, but at least with the small sample that we have, it looks as if each could be better. We begin with three parallel walkdowns. Notes as needed are provided below. MC68000 VAX SPARC character from M Æ D# Set (byte) DIGIT to 0 Set (byte) DIGIT to 0 Is (D3-'0') <=0? Is (*source-'0') <=0? character from Mfi out1 If <, branch to label ZERO If <, branch to ZERO Is (*source-'/') <=0? Is (D3-'9')<=0? Is (*source-'9') <=0? If <=, branch to ZERO If <=. branch to label ONE If neither, branch to ZERO Is (*source-'9') <=0? Add (byte) 1 to DIGIT If neither, branch to ZERO: ZERO: Add 1 to DIGIT Put a longword 0 in D0 ZERO: Branch to label DONE ONE: Put a longword 1 in D0 DONE: Put value in D0 into DIGIT Notes: 1. Moving the character into register to compare it with ‘0’ and ‘9’: a. The first 68000 line moves the next character as a byte into register D3. The other 3 bytes will be ignored in the byte operations. Remember that the program had already moved the pointer to the string into A4. b. The SPARC does the same sort of thing with a pointer in %o2, except with the difference that it signextends the byte to a longword. Sign extension simply copies the sign-bit into the higher-order bits, effectively making 3E into 0000 003E or C2 into FFFF FFC2. That is what the mnemonic means: “LoaD Signed Byte.” c. The VAX compiler takes a totally different approach—a rather poor one, actually. It leaves not only the byte in memory but even the pointer to the byte. Thus, every time it wants to refer to the byte—and it does so numerous times—it must first fetch the address from memory and then fetch the byte itself. This double memory reference is what @4(ap) means: “At the address you will find at address 4(ap).” The only thing that makes all this apparent coming and going even remotely acceptable is that the VAX will cache (place in a fast local memory) both the address and the character the first time that it gets them. Then, it can refer to them rapidly again. Cache references, however, are not as fast as register references. 2. Testing the character: The next line (3rd for the SPARC) does a comparison between 48 (or 47) and the character. Compare is an instruction which subtracts one operand from the other, but instead of putting the results somewhere, it stores only the facts on whether the operation delivered a negative number or zero or resulted in either an overflow or a carry. These are stored in flags, single bits associated with the arithmetic unit. The bits can contain only one result at a time. The 68000 and VAX must test immediately after the comparison or they will lose the bits. The SPARC changes the bits only when the instruction says so (the CC on the instruction — “change condition codes”). Thus, the subtraction can be remote from the test-and-branch. The SPARC is explicit about where to store the subtraction—in %g0. %g0 is a pseudo-register. It is always a 0 as a source and is a garbage can as a destination. The availability of the 0 and the garbage can serves all the same functions that the special instructions for zeros and comparisons do on the CISCs. 3. The differences in the algorithm to do the tests: There are two different paradigms expressed in these three examples. One says: “Figure out which thing you want to do and then do that one thing.” The other says: “First set the result false and then figure out if you should set it true.” While the second version would seem to do a lot of unnecessary settings to zero, the other algorithm will execute one less branch. That would make it roughly equivalent
8 sign 2 As block FIGURE 87.7 The stack area of the 68000's memory and the register assignments that the Think C compiler made with global optimization turned of. The stack is shown just after the MOVEM instruction. The items in bold are as they would be after that instruction. while the registers all hold longwords, in typical PC/Macintosh compilers, integers are defined as words. This figure is the programmatic successor to Fig. 87.6. However, the 68000 algorithm is definitely longer---uses more memory for code. That is not really much of an issue, but why put the result first into a temporary register and then where you really want it? Compiler Optimization and assembly language Compiler Operations To understand the optimizing features of compilers and their relation to assembly language, it is best to understand some of the chores for the compiler. This section examines variable allocation and how it can be optimized, and the optimization task of constant expression removal. Examples of how compilers perfor these operations are taken from the various systems used in the article Variable Allocation. Variables in high-level languages are an abstraction of memory cells or locations. One of the compiler's tasks is to assign the abstract variables into physical locations--either registers within the processor or locations within memory. Assignment strategies vary, but an easy and often-used strategy is to place all variables in memory. Easy, indeed, but wasteful of execution time in that it requires memory fetches all HLL variables. Another assignment strategy is to assign as many variables to the registers as possible and then assign any remaining variables to memory; this method is typically sufficient, except when there is a limited number of registers, such as in the 68000. In these cases, the best assignment strategy is to assign registers to the variables which have the greatest use and then assign any remaining variables to memory. In examining the compilers and architecture used in this article, we find examples of all these methods. In the unoptimized mode, VAX and Sparc compilers are among the many which take the easy approach and assign variables only to memory locations. In Figs. 87.6 and 87. 7, the variable assignments are presented for the unoptimized and optimized options. Note that only one or two registers are used, both as scratch pads, in the unoptimized option, whereas the optimization assigns registers to all variables. The expected execution time savings is approximately 42 of the 50 memory references per loop iteration. That does not include additional savings caused by compact code Detailed comparisons are not presented since the interpretation of architectural comparisons is highly subjective Unlike the VAX and Sparc compilers, the 68000 compiler assigns variables to registers in both the un mized nd unoptimized options; these assignments are depicted in Figs. 87.7 and 87. 8. Since there are only eight general-purpose data registers in the 68000 and two are assigned as scratch pads, only six of the programs ten e 2000 by CRC Press LLC
© 2000 by CRC Press LLC However, the 68000 algorithm is definitely longer—uses more memory for code. That is not really much of an issue, but why put the result first into a temporary register and then where you really want it? Compiler Optimization and Assembly Language Compiler Operations To understand the optimizing features of compilers and their relation to assembly language, it is best to understand some of the chores for the compiler. This section examines variable allocation and how it can be optimized, and the optimization task of constant expression removal. Examples of how compilers perform these operations are taken from the various systems used in the article. Variable Allocation. Variables in high-level languages are an abstraction of memory cells or locations. One of the compiler’s tasks is to assign the abstract variables into physical locations—either registers within the processor or locations within memory. Assignment strategies vary, but an easy and often-used strategy is to place all variables in memory. Easy, indeed, but wasteful of execution time in that it requires memory fetches for all HLL variables. Another assignment strategy is to assign as many variables to the registers as possible and then assign any remaining variables to memory; this method is typically sufficient, except when there is a limited number of registers, such as in the 68000. In these cases, the best assignment strategy is to assign registers to the variables which have the greatest use and then assign any remaining variables to memory. In examining the compilers and architecture used in this article, we find examples of all these methods. In the unoptimized mode, VAX and Sparc compilers are among the many which take the easy approach and assign variables only to memory locations. In Figs. 87.6 and 87.7, the variable assignments are presented for the unoptimized and optimized options. Note that only one or two registers are used, both as scratch pads, in the unoptimized option, whereas the optimization assigns registers to all variables. The expected execution time savings is approximately 42 of the 50 memory references per loop iteration. That does not include additional savings caused by compact code. Detailed comparisons are not presented since the interpretation of architectural comparisons is highly subjective. Unlike the VAX and Sparc compilers, the 68000 compiler assigns variables to registers in both the unoptimized and unoptimized options; these assignments are depicted in Figs. 87.7 and 87.8. Since there are only eight general-purpose data registers in the 68000 and two are assigned as scratch pads, only six of the program’s ten FIGURE 87.7 The stack area of the 68000’s memory and the register assignments that the Think C® compiler made with global optimization turned off. The stack is shown just after the MOVEM instruction. The items in bold are as they would be after that instruction. While the registers all hold longwords, in typical PC/Macintosh compilers, integers are defined as words. This figure is the programmatic successor to Fig. 87.6