© 2000 by CRC Press LLC

    ;ees = ((*source=='E') || (*source=='e')) && inside && !latche;
            CMPI.B  #$45,(A4)   ; 'E' “compare immediate” literal hex 45, what A4 points at
            BEQ     first       ; “branch if equal” to label first
            CMPI.B  #$65,(A4)   ; 'e' “compare immediate” literal hex 65, what A4 points at
            BNE     second      ; “branch if not equal” to label second
    first:  TST.W   D6          ; “test word” (subtract 0) D6 (‘inside’)
            BEQ     second      ; “branch if equal” to label second
            TST.W   D3          ; “test word” (subtract 0) D3 (‘latche’)
            BEQ     third       ; “branch if equal” to label third
    second: MOVEQ   #00,D0      ; “move quick” literal 0 to D0
            BRA     fourth      ; “branch always” to label fourth
    third:  MOVEQ   #$01,D0     ; “move quick” literal 1 to D0
    fourth: MOVE.W  D0,-6(A6)   ; “move word” from D0 to -6(FP)

There are all sorts of little details in this short example. For example, a common way to indicate a comment is to start with a “;”. The assembler will ignore your comments. The “#” indicates a literal, and the “$” that the literal is written in hexadecimal notation. The VAX would use #^x to express the same idea. “Compare” means “subtract but save only the condition codes of the result” (v or overflow, n or negative, z or zero, and c or carry). Thus, the first two lines subtract the ASCII value for ‘E’ from whatever A4 is pointing at (*source) and then, if the two were equal (the result is zero), the program jumps to line 5. If *source is not ‘E’, then it simply goes to the next line, line 3.

The instruction TST.W D6 is equivalent to CMPI.W #0,D6, but the TST instruction is inherently shorter and faster. On a SPARC, where it would be neither shorter nor faster, TST does not exist as a machine instruction (assemblers accept tst only as a synthetic form of orcc). Exactly what the assembler or linker does to replace the label references with proper addresses, while interesting, is not particularly germane to our current topic. Note that the range of the branch is somewhat limited. In the 68000, the maximum branch is ±32K bytes and in the VAX a mere +127 to -128 bytes.
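One way to convince yourself that the 68000 listing really computes ees is to transcribe it into C label for label, with a goto for each branch, and compare it against the original C statement. The following is a minimal sketch under that reading; the function names ees_branchy, ees_direct and check_ees_equivalence are hypothetical, and plain parameters stand in for *source (what A4 points at), D6 (inside) and D3 (latche).

```c
#include <assert.h>

/* Mirrors the 68000 listing label for label; each goto is one branch.
   source stands for *source (what A4 points at), inside for D6, latche for D3. */
int ees_branchy(char source, short inside, short latche)
{
    int d0;
    if (source == 'E') goto first;    /* CMPI.B #$45,(A4) ; BEQ first  */
    if (source != 'e') goto second;   /* CMPI.B #$65,(A4) ; BNE second */
first:
    if (inside == 0) goto second;     /* TST.W D6 ; BEQ second */
    if (latche == 0) goto third;      /* TST.W D3 ; BEQ third  */
second:
    d0 = 0;                           /* MOVEQ #00,D0 */
    goto fourth;                      /* BRA fourth   */
third:
    d0 = 1;                           /* MOVEQ #$01,D0 */
fourth:
    return d0;                        /* the listing stores D0 to -6(FP) here */
}

/* The C statement the listing was compiled from. */
int ees_direct(char source, short inside, short latche)
{
    return ((source == 'E') || (source == 'e')) && inside && !latche;
}

/* Compares the two over all 7-bit characters and a few flag values. */
void check_ees_equivalence(void)
{
    for (int ch = 0; ch < 128; ch++)
        for (short in = 0; in <= 2; in++)
            for (short la = 0; la <= 2; la++)
                assert(ees_branchy((char)ch, in, la) == ees_direct((char)ch, in, la));
}
```

Tracing the goto version shows why there is no separate test for !latche: a nonzero latche simply fails the last BEQ and falls through into the code at second, which produces 0.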
If you need to go further, you must combine a branch with a jump. For example, if you were doing BEQ farlabel, you would instead do:

            BNE     nearlabel
            JMP     farlabel    ; this instruction can go any distance
    nearlabel:

Follow through the example above until the short steps of logic and the addressing modes are clear. Then progress to the next section where we use the addressing modes to introduce the general topic of subroutine calling conventions.

Calling Conventions

Whenever you invoke a subroutine in a HLL, the calling routine (caller) must pass to the called routine (callee) the parameters that the subroutine requires. These parameters are defined at compile time to be either pass-by-value or pass-by-pointer (or pass-by-reference), and they are listed in some particular order. The convention for passing the parameters varies from architecture to architecture and HLL to HLL, but basically it always consists of building a call block which contains all of the parameters and which will be found where the recipient expects to find it. Along with the passing of parameters, for each system, a convention is defined for register and stack use which establishes:
• Which registers must be returned from callee to caller with the same contents that the callee received (such registers are said to be preserved across a call)
• Which registers may be used without worrying about their contents (such registers are called scratch registers)
• Where the return address is to be found
• Where the value returned by a function will be found

The convention may be supported by hardware or simply a gentlemanly rule of the road. However the rules come into being, they define the steps which must be accomplished coming into and out of a subroutine. The whole collection of such rules forms the calling convention for that machine. In this section, we look at our three different machines to see how they all accomplish the same tasks but by rather different mechanisms.

The two CISCs do almost all of their passing and saving on the stack. The call block will be built on the stack; the return address will be put on the stack; saved registers will be put on the stack. Only a few stack references are passed forward in register; the value returned by the function will be passed back in register.

How different is the SPARC! The parameters to be passed are placed in the out registers (six are available for this purpose). Only the overflow, if any, would go on the stack. In general, registers are saved by window-blinding (sliding the register window so that the caller's out registers become the callee's in registers) rather than by moving them to the stack. On return, data is returned in the in registers and the registers restored by reverse window-blinding.

MC68000 Call and Return. Let us look at the details for two of the machines. We start with the 68000, because that is the most open and “conventional.” We continue with the function NumberCount. Only a single parameter must be passed: the pointer to the text block. The HLL caller sees NumberCount(block) as an integer (i.e., what will be returned), but the assembly program must do a call and then use the returned integer as instructed.
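The SPARC window-blinding mentioned a few paragraphs back is easy to misread, so here is a minimal C sketch of it under stated assumptions: NWINDOWS is taken to be 8 (implementations differ), the global registers and the window-overflow trap that spills a window to the stack are left out, and the names RegFile, save_window and restore_window are hypothetical. The one property it demonstrates is the overlap: after SAVE, the outs of the caller's window are physically the ins of the callee's window.

```c
#include <assert.h>

#define NWINDOWS 8               /* assumption: 8 windows; real parts vary */
#define NPHYS (NWINDOWS * 16)    /* each window owns 8 locals + 8 ins      */

typedef struct {
    long phys[NPHYS];   /* windowed part of the register file; globals omitted */
    int  cwp;           /* current window pointer */
} RegFile;

static int local_index(int w, int i) { return w * 16 + i; }
static int in_index(int w, int i)    { return w * 16 + 8 + i; }
/* The outs of window w are physically the ins of window w-1. */
static int out_index(int w, int i)   { return in_index((w + NWINDOWS - 1) % NWINDOWS, i); }

/* Accessors for %l0..%l7, %i0..%i7 and %o0..%o7 of the current window. */
long *local_reg(RegFile *r, int i) { assert(i >= 0 && i < 8); return &r->phys[local_index(r->cwp, i)]; }
long *in_reg(RegFile *r, int i)    { assert(i >= 0 && i < 8); return &r->phys[in_index(r->cwp, i)]; }
long *out_reg(RegFile *r, int i)   { assert(i >= 0 && i < 8); return &r->phys[out_index(r->cwp, i)]; }

/* SAVE (callee entry): CWP <- CWP-1, so the caller's outs become the callee's ins.
   Window-overflow traps that spill a window to the stack are not modeled. */
void save_window(RegFile *r)    { r->cwp = (r->cwp + NWINDOWS - 1) % NWINDOWS; }

/* RESTORE (callee exit): CWP <- CWP+1, undoing the slide. */
void restore_window(RegFile *r) { r->cwp = (r->cwp + 1) % NWINDOWS; }
```

In use, a caller would place the pointer to the block in %o0, the callee would find it in %i0 after SAVE and leave its answer in %i0, and the caller would read that answer back from %o0 after RESTORE. Nothing is copied to memory in either direction, and the callee's locals are a fresh set that cannot disturb the caller's.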
A typical assembly routine would be:

            MOVE.L  A2,-(SP)      ; move pointer to block onto the stack
            JSR     NumberCount   ; save return address on the stack and start
                                  ; executing NumberCount
                                  ; do something with value returned in D0

The first instruction puts the pointer to the block, which is in A2, on the stack. It first must make room, so the “-” in -(A7) first subtracts 4 from A7 (making room for the longword) and then moves the longword into the space pointed to by the now-modified A7. The one instruction does two things: the decrementing of SP and the storing of the longword in memory.

    MOVE.L A2,-(A7)     A7 ← A7-4        ;A7 = SP
                        M(A7) ← A2       ;M(x) = memory(address x)

The next instruction, jump subroutine (JSR), does three things. It decrements SP (i.e., A7) by 4, stores the return address on the top of the stack, and puts the address of NumberCount in the program counter. We have just introduced two items which need specific definition:

Return address (RA): This will always be the address of the instruction to which the callee should return. In the 68000 and the VAX (and all other CISCs), the RA points to the first instruction after the JSR. In the SPARC and almost any RISC, RA will point to the second instruction after the call instruction. That curious difference will be discussed later.

Program counter (PC): This register (which is a general register on the VAX but a special register on the other machines) points to the place (memory location) in the machine language instruction stream where the program is currently operating. As each instruction is fetched, the PC is automatically incremented. The action of the JSR is to save the last version of the PC (the one for the next fetch) and replace it with the starting address of the routine to be jumped to. Summing up these transactions in algebraic form:
    JSR NumberCount     SP ← SP-4                   ;A7 = SP
                        M(SP) ← PC                  ;M(x) = memory(address x)
                        PC ← address of NumberCount

Should you wonder how the address of NumberCount gets in there, the linker, which assigns each section of code to its proper place in memory and therefore knows where all the labels are, will insert the proper address in place of the name.

This completes the call as far as building the call block, doing the call itself, and picking up the result. Had there been more parameters to pass, that first instruction would have been replicated enough times to push all of the parameters, one at a time, onto the stack.

Now let us look at the conventions from the point of view of the callee. The callee has more work. When the callee picks up the action, the stack and registers are as shown in Figure 87.5. With the exception of D0 and A7, the callee has no registers . . . yet. The callee must make room for local variables in either register or memory. If it wants to use registers, it must save the caller’s data from the registers. The subroutine can get whatever space it needs on the stack. Only after the setup will it get down to work. The entire section of stack used for local variables and saving registers is called the callee’s frame.

It is useful to have a pointer (FP) to the bottom of the frame to provide a static reference to the return address, the passed parameters, and the subroutine’s local variable area on the stack. In the 68000, the convention is to use A6 as FP. When our routine, NumberCount, begins, the address in A6 points to the start of the caller’s frame. The first thing the callee must do is to establish a local frame. It does that with the instruction LINK. Typical of a CISC, each instruction does a large piece of the action. The whole entry operation for the 68000 is contained in two instructions:

            LINK    A6,#$FFF8
            MOVEM.L D3-D7/A4,-(SP)

The first instruction does the frame making; the second does the saving of registers.
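Stepping back to the caller for a moment, the register-transfer summaries for MOVE.L A2,-(A7) and JSR can be run as a tiny C model. This is a sketch under assumptions: memory is a hypothetical 256-byte big-endian array, only A7 and the PC are modeled, and the struct Cpu and the functions push_long and jsr are illustrative names, not part of any real simulator.

```c
#include <assert.h>
#include <stdint.h>

#define MEMSIZE 256   /* hypothetical toy memory size, in bytes */

/* Just the pieces of 68000 state the call sequence touches. */
typedef struct {
    uint8_t  mem[MEMSIZE];   /* byte-addressed, big-endian like the 68000 */
    uint32_t a7;             /* A7 = SP */
    uint32_t pc;             /* address of the next instruction to fetch */
} Cpu;

/* M(addr) <- v, most significant byte first */
void write_long(Cpu *c, uint32_t addr, uint32_t v)
{
    assert(addr + 4 <= MEMSIZE);
    for (int i = 3; i >= 0; i--) {
        c->mem[addr + i] = (uint8_t)(v & 0xFF);
        v >>= 8;
    }
}

uint32_t read_long(const Cpu *c, uint32_t addr)
{
    assert(addr + 4 <= MEMSIZE);
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = (v << 8) | c->mem[addr + i];
    return v;
}

/* MOVE.L src,-(A7):  A7 <- A7-4 ; M(A7) <- src */
void push_long(Cpu *c, uint32_t src)
{
    assert(c->a7 >= 4);
    c->a7 -= 4;
    write_long(c, c->a7, src);
}

/* JSR target:  SP <- SP-4 ; M(SP) <- PC ; PC <- target.
   Assumes c->pc already holds the address of the instruction after the JSR,
   as it does once the JSR has been fetched; that value is the return address. */
void jsr(Cpu *c, uint32_t target)
{
    push_long(c, c->pc);
    c->pc = target;
}
```

Starting with SP at a hypothetical 200 and a block pointer of 0x40, the push leaves SP at 196, and a JSR fetched with the PC at 0x1006 leaves SP at 192 with 0x1006 on top of the stack and the block pointer just above it, which is the picture the callee sees on entry.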
There are multiple steps in each. Each double step of decrementing SP and moving a value onto the stack will be called a push. The steps are as follows:

    LINK A6,#$FFF8      ;push A6 (A7 ← A7-4, M(A7) ← A6)
                        ;move A7 to A6 (SP to FP)
                        ;add FFF8 (-8) to SP (4 words for local variables)

FIGURE 87.5 The stack area of the 68000’s memory and the register assignments that the called subroutine sees as it is entered at the top. The registers all hold longwords, the size of an address. In typical PC/Macintosh compilers, integers are defined as 16-bit words. Accordingly, the stack area of memory is shown as words, or half the width of a register.
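The LINK steps above, together with the MOVEM save and the MOVEM/UNLK/RTS exit sequence described in the text that follows, can be modeled as one round trip in C. This is a minimal sketch, assuming a hypothetical 256-byte big-endian memory, a saved[] array standing in for D3-D7 and A4, and illustrative names (link_a6, movem_save, unlk_a6, rts) that are not taken from any real tool. It checks the frame layout the text relies on: saved FP at FP, return address at FP+4, first parameter at FP+8.

```c
#include <assert.h>
#include <stdint.h>

#define MEMSIZE 256   /* hypothetical toy memory size, in bytes */

typedef struct {
    uint8_t  mem[MEMSIZE];   /* byte-addressed, big-endian */
    uint32_t a6, a7, pc;     /* A6 = FP, A7 = SP */
    uint32_t saved[6];       /* stand-ins for D3, D4, D5, D6, D7, A4 */
} Cpu;

void write_long(Cpu *c, uint32_t addr, uint32_t v)
{
    assert(addr + 4 <= MEMSIZE);
    for (int i = 3; i >= 0; i--) { c->mem[addr + i] = (uint8_t)(v & 0xFF); v >>= 8; }
}

uint32_t read_long(const Cpu *c, uint32_t addr)
{
    assert(addr + 4 <= MEMSIZE);
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) v = (v << 8) | c->mem[addr + i];
    return v;
}

void push(Cpu *c, uint32_t v) { assert(c->a7 >= 4); c->a7 -= 4; write_long(c, c->a7, v); }
uint32_t pop(Cpu *c) { uint32_t v = read_long(c, c->a7); c->a7 += 4; return v; }

/* Sign-extend a 16-bit immediate: $FFF8 becomes -8. */
int32_t sext16(uint16_t d) { return (d & 0x8000) ? (int32_t)d - 0x10000 : (int32_t)d; }

/* LINK A6,#disp:  push A6 ; A6 <- A7 ; A7 <- A7 + disp */
void link_a6(Cpu *c, uint16_t disp)
{
    push(c, c->a6);
    c->a6 = c->a7;
    c->a7 += (uint32_t)sext16(disp);   /* unsigned wraparound, so $FFF8 subtracts 8 */
}

/* MOVEM.L D3-D7/A4,-(A7): A4 goes first, so D3 ends at the lowest address. */
void movem_save(Cpu *c) { for (int i = 5; i >= 0; i--) push(c, c->saved[i]); }

/* MOVEM.L (A7)+,D3-D7/A4: reload D3 first, from the lowest address upward. */
void movem_restore(Cpu *c) { for (int i = 0; i < 6; i++) c->saved[i] = pop(c); }

/* UNLK A6:  SP <- FP ; FP <- M(SP) ; SP <- SP+4 */
void unlk_a6(Cpu *c) { c->a7 = c->a6; c->a6 = pop(c); }

/* RTS:  PC <- M(SP) ; SP <- SP+4 */
void rts(Cpu *c) { c->pc = pop(c); }
```

With the caller's push and the JSR reproduced as two pushes (a block pointer and a return address), the 8 bytes that LINK reserves sit between SP and FP, which is where locals such as $FFFE(A6) and $FFFC(A6) live. After the exit sequence, FP and the saved registers are back to the caller's values and SP points at the block pointer, matching the description of the return.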
    MOVEM.L D3-D7/A4,-(A7)  ;push 5 data registers (D3..D7) and 1 address
                            ;register (A4)

At this point, the stack looks like Fig. 87.6. The subroutine is prepared to proceed. How it uses those free registers and the working space set aside on the stack is the subject of the section on optimization in this chapter. For the moment, however, we simply assume that it will do its thing in exemplary fashion, get the count of the numbers, and return.

We continue in this section by considering the rather simple transaction of getting back. The callee is obliged to put the answer back where the caller expects to find it. Two paradigms for return are common. The one that our compiler uses is to put the answer in D0. The other common paradigm is to put the answer back on the stack. The caller will have left enough room for that answer at FP+8, whether or not that space was also used for transferring parameters in. Using our paradigm, the return becomes:

            MOVE.W  $FFFC(A6),D0     ;answer from callee’s stack frame [-4(FP)] to D0
            MOVEM.L (A7)+,D3-D7/A4   ;registers restored to former values
            UNLK    A6               ;SP ← FP, FP ← M(SP), SP ← SP+4
            RTS                      ;PC ← M(SP), SP ← SP+4

When all of this has transpired, the machine is back to the caller with SP pointing at block. The registers look like Fig. 87.5 except for two important changes: SP is back where the caller left it, and D0 contains the answer that the caller asked for.

Transactional Paradigms

The final topic in this section is the description of some of the translations of the simple and ordinary phrases of the HLLs into assembly language. We will give examples on each of our three machines to show both the similarities and the slightly different flavors that each machine architecture gives to the translation.
The paradigms that we will discuss comprise:
• Arithmetic
• Replacement
• Testing and branching, particularly multiple Boolean expressions
• Stepping through a structure

FIGURE 87.6 The stack area of the 68000’s memory and the register situation just after MOVEM has been executed. The memory area between the two arrows is the subroutine’s frame.
Many studies have shown that most computer arithmetic is concerned with addressing, testing, and indexing. In NumberCount there are several examples of each. For example, near the bottom of the program, there are statements such as:

    count++;

For all three machines, the basic translation is the same: add an immediate (a constant stored right in the instruction) to a number in register. However, for the CISCs, one may also ask that the number be brought in from memory and even put back in memory. The three translations of this statement are:

    MC68000                 VAX         SPARC
    ADDQ.W #$1,$FFFE(A6)    INCL R0     add %o2,1,%o2

Typical of the VAX, it makes a special case out of adding 1. There is no essential difference in asking it to add 1 by saying “1,” but if one has a special instruction, it saves a byte of program length. With today’s inexpensive memories, a byte is no longer a big deal, but when the VAX first emerged (1978), VAXes were delivered with less memory than a PC or Mac would have today. The VAX, of course, can say ADDL #1,R0 just like the 68000, and for any number other than 1 or 0, it would. Note also that the VAX compiler chose to keep count in register, while Think C® decided to put it on the stack at -2(A6), that is, -2(FP). A RISC has no choice. If you want arithmetic, your numbers must be in register. However, once again, we are really talking about the length of the code, not the speed of the transaction. All transactions take place from registers. The only issues are whether the programmer can see the registers and whether a single instruction can include both moving the operands and doing the operand arithmetic. The RISC separates the address arithmetic (e.g., -2(FP)) from the operand arithmetic, putting each in its own instruction. Both get the job done.

The next items we listed were replacement and testing and branching.
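The displacements $FFFE, $FFFC, $FFF8 and $FFF6 that appear in these 68000 listings are 16-bit values that the processor sign-extends, which is why $FFFE(A6) means -2(A6). The sketch below shows that, and spells count++ out as the separate address, load, add and store steps a RISC needs, which ADDQ.W folds into one instruction. It is illustrative only: mem_words is a hypothetical toy memory of 16-bit words indexed by byte address / 2, and the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit displacement the way the 68000 does for d16(An). */
int32_t sext16(uint16_t d)
{
    return (d & 0x8000) ? (int32_t)d - 0x10000 : (int32_t)d;
}

/* Effective address of d16(An): An plus the sign-extended displacement. */
uint32_t ea_d16_an(uint32_t an, uint16_t d16)
{
    return an + (uint32_t)sext16(d16);   /* modulo 2^32, so $FFFE subtracts 2 */
}

/* count++ on a 16-bit local, in the four steps a load/store machine needs.
   mem_words is a hypothetical toy memory of 16-bit words (byte address / 2). */
void increment_local(uint16_t *mem_words, uint32_t fp, uint16_t d16)
{
    uint32_t addr = ea_d16_an(fp, d16);   /* 1. address arithmetic              */
    assert(addr % 2 == 0);                /* 68000 word accesses must be even   */
    uint16_t reg = mem_words[addr / 2];   /* 2. load into a register            */
    reg = (uint16_t)(reg + 1);            /* 3. operand arithmetic              */
    mem_words[addr / 2] = reg;            /* 4. store back                      */
}
```

With FP at a hypothetical byte address 100, the local at $FFFE(A6) is at address 98, and one call to increment_local takes it from 7 to 8, the same effect as ADDQ.W #$1,$FFFE(A6).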
We have both within the statement:

    digit = (*source >= '0') && (*source <= '9');

The translation requires several statements:

    MC68000               VAX                 SPARC
    MOVE.B (A4),D3        clrb r1             add %g0,0,%o1
    CMPI.B #$30,D3        cmpb @4(ap),#48     ldsb [%o2],%o0
    BLT ZERO              blss ZERO           subcc %o0,47,%g0
    CMPI.B #$39,D3        cmpb @4(ap),#57     ble ZERO
    BLE ONE               bgtr ZERO           nop
                          incb r1             subcc %o0,57,%g0
                                              bg ZERO
                                              nop
                                              add %g0,1,%o1
    ZERO:                 ZERO:               ZERO: add %o1,0,%l3
    MOVEQ #$00,D0
    BRA DONE
    ONE: MOVEQ #$01,D0
    DONE: MOVE.W D0,$FFF6(A6)

To begin with, all three do roughly the same thing. The only noticeable difference in concept is that the SPARC compiler chose to compare the incoming character (*source) to 47 (the character before ‘0’) and then branch if the result showed the character to be “less than or equal,” while the other two compared it to ‘0’ as asked and then branched if the result was “less than.” No big deal. But let us walk down the several columns to see the specific details. Prior to beginning, note that all three must bring in the character, run one or two tests, and then set an integer to either zero (false) or not zero (true). Also, let it be said that each snatch of code is