
Principles and Techniques of Compilers, course teaching resources (study material): Assemblers, Linkers, and the SPIM Simulator (MIPS32 and SPIM)

A.1 Introduction
A.2 Assemblers
A.3 Linkers
A.4 Loading
A.5 Memory Usage
A.6 Procedure Call Convention
A.7 Exceptions and Interrupts
A.8 Input and Output
A.9 SPIM
A.10 MIPS R2000 Assembly Language
A.11 Concluding Remarks
A.12 Exercises

Appendix A Assemblers, Linkers, and the SPIM Simulator

An assembler’s first pass reads each line of an assembly file and breaks it into its component pieces. These pieces, which are called lexemes, are individual words, numbers, and punctuation characters. For example, the line ble $t0, 100, loop contains six lexemes: the opcode ble, the register specifier $t0, a comma, the number 100, a comma, and the symbol loop.

If a line begins with a label, the assembler records in its symbol table the name of the label and the address of the memory word that the instruction occupies. The assembler then calculates how many words of memory the instruction on the current line will occupy. By keeping track of the instructions’ sizes, the assembler can determine where the next instruction goes. To compute the size of a variable-length instruction, like those on the VAX, an assembler has to examine it in detail. Fixed-length instructions, like those on MIPS, on the other hand, require only a cursory examination. The assembler performs a similar calculation to compute the space required for data statements. When the assembler reaches the end of an assembly file, the symbol table records the location of each label defined in the file.

symbol table: A table that matches names of labels to the addresses of the memory words that instructions occupy.

The assembler uses the information in the symbol table during a second pass over the file, which actually produces machine code. The assembler again examines each line in the file. If the line contains an instruction, the assembler combines the binary representations of its opcode and operands (register specifiers or memory address) into a legal instruction. The process is similar to the one used in Section 2.4 in Chapter 2. Instructions and data words that reference an external symbol defined in another file cannot be completely assembled (they are unresolved) since the symbol’s address is not in the symbol table. An assembler does not complain about unresolved references since the corresponding label is likely to be defined in another file.

The BIG Picture

Assembly language is a programming language. Its principal difference from high-level languages such as BASIC, Java, and C is that assembly language provides only a few, simple types of data and control flow. Assembly language programs do not specify the type of value held in a variable. Instead, a programmer must apply the appropriate operations (e.g., integer or floating-point addition) to a value. In addition, in assembly language, programs must implement all control flow with go tos. Both factors make assembly language programming for any machine (MIPS or 80x86) more difficult and error-prone than writing in a high-level language.
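The two-pass scheme described in this section can be sketched in Python. This is only an illustration under simplifying assumptions, not the assembler the text describes: every instruction is assumed to occupy one 4-byte word (pseudoinstruction expansion and data statements are ignored), operands are kept symbolic rather than encoded into binary, and the names lex, first_pass, and second_pass are hypothetical.

```python
import re

WORD_SIZE = 4  # assumption: every instruction is one fixed-length 4-byte word


def lex(line):
    """Break a line into lexemes: words, numbers, and punctuation characters."""
    return re.findall(r"[$.\w]+|[^\s\w]", line)


def strip_label(lexemes):
    """Return (label or None, remaining lexemes) for one line."""
    if len(lexemes) >= 2 and lexemes[1] == ":":
        return lexemes[0], lexemes[2:]
    return None, lexemes


def first_pass(lines, base=0):
    """Build the symbol table: label name -> address of the word it marks."""
    symbols, address = {}, base
    for line in lines:
        label, rest = strip_label(lex(line))
        if label is not None:
            symbols[label] = address
        if rest:  # the line holds an instruction, so it takes up one word
            address += WORD_SIZE
    return symbols


def second_pass(lines, symbols, base=0):
    """Substitute label addresses; labels not in the table stay unresolved."""
    resolved, unresolved, address = [], [], base
    for line in lines:
        _, rest = strip_label(lex(line))
        if not rest:
            continue
        opcode = rest[0]
        operands = []
        for op in (x for x in rest[1:] if x != ","):
            if op.startswith("$") or op.isdigit():
                operands.append(op)           # register or literal number
            elif op in symbols:
                operands.append(symbols[op])  # label defined in this file
            else:
                operands.append(None)         # external symbol, left for the linker
                unresolved.append((address, op))
        resolved.append((address, opcode, operands))
        address += WORD_SIZE
    return resolved, unresolved


program = [
    "loop: beq $t0, $t1, done",
    "      add $t0, $t0, $t1",
    "      jal printf",
    "      j loop",
    "done: jr $ra",
]

symbols = first_pass(program)
resolved, unresolved = second_pass(program, symbols)
print(symbols)
print(unresolved)
```

The reference to printf is reported as unresolved rather than treated as an error, which mirrors the point that the label is likely to be defined in another file.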

A.2 Assemblers

Elaboration: If an assembler’s speed is important, this two-step process can be done in one pass over the assembly file with a technique known as backpatching. In its pass over the file, the assembler builds a (possibly incomplete) binary representation of every instruction. If the instruction references a label that has not yet been defined, the assembler records the label and instruction in a table. When a label is defined, the assembler consults this table to find all instructions that contain a forward reference to the label. The assembler goes back and corrects their binary representation to incorporate the address of the label. Backpatching speeds assembly because the assembler only reads its input once. However, it requires an assembler to hold the entire binary representation of a program in memory so instructions can be backpatched. This requirement can limit the size of programs that can be assembled. The process is complicated by machines with several types of branches that span different ranges of instructions. When the assembler first sees an unresolved label in a branch instruction, it must either use the largest possible branch or risk having to go back and readjust many instructions to make room for a larger branch.

backpatching: A method for translating from assembly language to machine instructions in which the assembler builds a (possibly incomplete) binary representation of every instruction in one pass over a program and then returns to fill in previously undefined labels.

Object File Format

Assemblers produce object files. An object file on UNIX contains six distinct sections (see Figure A.2.1):

■ The object file header describes the size and position of the other pieces of the file.
■ The text segment contains the machine language code for routines in the source file. These routines may be unexecutable because of unresolved references.
■ The data segment contains a binary representation of the data in the source file. The data also may be incomplete because of unresolved references to labels in other files.
■ The relocation information identifies instructions and data words that depend on absolute addresses. These references must change if portions of the program are moved in memory.
■ The symbol table associates addresses with external labels in the source file and lists unresolved references.
■ The debugging information contains a concise description of the way in which the program was compiled, so a debugger can find which instruction addresses correspond to lines in a source file and print the data structures in readable form.

text segment: The segment of a UNIX object file that contains the machine language code for routines in the source file.

data segment: The segment of a UNIX object or executable file that contains a binary representation of the initialized data used by the program.

relocation information: The segment of a UNIX object file that identifies instructions and data words that depend on absolute addresses.

absolute address: A variable’s or routine’s actual address in memory.

The assembler produces an object file that contains a binary representation of the program and data and additional information to help link pieces of a program. This relocation information is necessary because the assembler does not know which memory locations a procedure or piece of data will occupy after it is linked with the rest of the program. Procedures and data from a file are stored in a contiguous piece of memory, but the assembler does not know where this memory will be located. The assembler also passes some symbol table entries to the linker. In particular, the assembler must record which external symbols are defined in a file and what unresolved references occur in a file.

Elaboration: For convenience, assemblers assume each file starts at the same address (for example, location 0) with the expectation that the linker will relocate the code and data when they are assigned locations in memory. The assembler produces relocation information, which contains an entry describing each instruction or data word in the file that references an absolute address. On MIPS, only the subroutine call, load, and store instructions reference absolute addresses. Instructions that use PC-relative addressing, such as branches, need not be relocated.

Additional Facilities

Assemblers provide a variety of convenience features that help make assembler programs short and easier to write, but do not fundamentally change assembly language. For example, data layout directives allow a programmer to describe data in a more concise and natural manner than its binary representation. In Figure A.1.4, the directive

    .asciiz "The sum from 0 .. 100 is %d\n"

stores characters from the string in memory. Contrast this line with the alternative of writing each character as its ASCII value (Figure 2.21 in Chapter 2 describes the ASCII encoding for characters):

    .byte 84, 104, 101, 32, 115, 117, 109, 32
    .byte 102, 114, 111, 109, 32, 48, 32, 46
    .byte 46, 32, 49, 48, 48, 32, 105, 115
    .byte 32, 37, 100, 10, 0

The .asciiz directive is easier to read because it represents characters as letters, not binary numbers. An assembler can translate characters to their binary representation much faster and more accurately than a human. Data layout directives

FIGURE A.2.1 Object file. A UNIX assembler produces an object file with six distinct sections: object file header, text segment, data segment, relocation information, symbol table, and debugging information.
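The equivalence between the .asciiz directive and the .byte lines in the Additional Facilities discussion can be checked mechanically. The Python sketch below is only an illustration; the helper names asciiz_bytes and as_byte_directives are hypothetical and do not belong to any assembler.

```python
def asciiz_bytes(text):
    """Byte values an .asciiz directive stores: ASCII codes plus a final 0."""
    return list(text.encode("ascii")) + [0]


def as_byte_directives(values, per_line=8):
    """Format byte values as .byte lines, per_line values on each line."""
    return [
        ".byte " + ", ".join(str(v) for v in values[i:i + per_line])
        for i in range(0, len(values), per_line)
    ]


message = "The sum from 0 .. 100 is %d\n"

for directive in as_byte_directives(asciiz_bytes(message)):
    print(directive)
```

The printed lines carry the same values as the four .byte lines in the text, including the newline (10) and the terminating null byte (0).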
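The backpatching technique from the Elaboration earlier in this section can be sketched in Python as well. This is a minimal sketch under stated assumptions, not a real assembler: each instruction is assumed to occupy one 4-byte word, operands stay symbolic instead of being encoded in binary, the problem of choosing between branch sizes is ignored, and the name assemble_one_pass is hypothetical.

```python
WORD_SIZE = 4  # assumption: every instruction is one fixed-length 4-byte word


def assemble_one_pass(lines):
    """Assemble in a single pass, patching forward references afterwards."""
    code = []      # (opcode, operands); None marks an address not yet known
    symbols = {}   # label -> address
    pending = {}   # undefined label -> [(instruction index, operand index)]
    for line in lines:
        text = line.strip()
        if ":" in text:
            label, text = text.split(":", 1)
            label, text = label.strip(), text.strip()
            symbols[label] = len(code) * WORD_SIZE
            # Go back and fill in every instruction that was waiting on it.
            for instr, slot in pending.pop(label, []):
                code[instr][1][slot] = symbols[label]
        if not text:
            continue
        opcode, _, rest = text.partition(" ")
        operands = [op.strip() for op in rest.split(",") if op.strip()]
        for slot, op in enumerate(operands):
            if op.startswith("$") or op.isdigit():
                continue                      # register or literal number
            if op in symbols:
                operands[slot] = symbols[op]  # backward reference
            else:
                operands[slot] = None         # forward reference, patch later
                pending.setdefault(op, []).append((len(code), slot))
        code.append((opcode, operands))
    return code, symbols, pending


source = [
    "loop: beq $t0, $t1, done",
    "      addi $t0, $t0, 1",
    "      j loop",
    "done: jr $ra",
]

code, symbols, pending = assemble_one_pass(source)
print(code)
print(pending)
```

The whole code list stays in memory until the end, which is the cost the text mentions. Any labels still left in pending after the pass are references that this file never defines.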