Feldman, J M, Czeck, E W, Lewis, T.G., Martin, JJ."Programm The Electrical Engineering Handbook Ed. Richard C. Dorf Boca raton crc Press llc. 2000
Feldman, J.M., Czeck, E.W., Lewis, T.G., Martin, J.J. “Programming” The Electrical Engineering Handbook Ed. Richard C. Dorf Boca Raton: CRC Press LLC, 2000
87 Programming James M. Feldman 87.1 Assembly Language Edward W. czeck Numbercount().comparisOnsDownontheFactory Floor.compilerOptimizationandASsemblyLanguage 87.2 High-Level Languages Ted G. Lewis What Is a HLL?. How High Is a HLL?. HLLs and Paradigms 87.3 Data Types and Data Structures Abstract Data Types. Fundamental Data Types. Type Johannes J Martin Constructors. Dynamic Types. More Dynamic Data University of New Orleans Types.Object-Oriented Programming 87.1 Assembly Language James M. Feldman and Edward W. Czeck The true language of computers is a stream of Is and Os-bits. Everything in the computer, be it numbers or text or program, spreadsheet or database or 3-D rendering, is nothing but an array of bits. The meaning of the bits is in the"eye of the beholder; it is determined entirely by context. Bits are not a useful medium for human consumption. Instead, we insist that what we read be formatted spatially and presented in a modest range of isually distinguishable characters. 0 and 1 arranged in a dense, page-filling array do not fulfill these require- ments in any way. The several languages that are presented in this handbook are all intended to make something readable to two quite different readers On the one hand, they serve the human reader with his/her requirements on symbols and layout; on the other, they provide a grammatically regular language for interpretation by a mpiler. A compiler, of course, is normally a program running on a computer, but human beings can and sometimes do play both sides of this game. They want to play with both the input and output. Such accessibility requires that not only the input but the output of the compilation process be comfortably readable by humans The language of the input is called a high-level language(HLL). Examples are C, Pascal, Ada and Modula II They are designed to express both regularly and concisely the kinds of operations and the kinds of constructs that programmers manipulate The output end of the compiler generates object code-a generally unreadable, binary representation of machine language, lacking only the services of a linker to turn it into true machine language. The language that has been constructed to represent object code for human consumption is assembly language. That is the subject of this section. Some might object to our statement of purpose for assembly language. While few will contest the concept of assembly language as the readable form of object code, some see writing assembly code as the way to"get their hands on the inner workings of the machine. They see it as a"control"issue. Since most HLLs today give the user reasonably direct ways to access hardware, where does the"control"issue arise? What assembly proponents see as the essential reason for having an assembly language is the option to optimize the"important sections of a program by doing a better job of machine code generation than the compiler does. This perspective was valid enough when compilers were mediocre optimizers. It was not unlike the old days when a car came with a complete set of tools because you needed them. The same thing that has happened to cars has happened to compilers. They are engineered to be"fuel efficient "and perform their assigned functions with remarkable c 2000 by CRC Press LLC
© 2000 by CRC Press LLC 87 Programming 87.1 Assembly Language NumberCount( ) • Comparisons Down on the Factory Floor • Compiler Optimization and Assembly Language 87.2 High-Level Languages What Is a HLL? • How High Is a HLL? • HLLs and Paradigms 87.3 Data Types and Data Structures Abstract Data Types • Fundamental Data Types • Type Constructors • Dynamic Types • More Dynamic Data Types • Object-Oriented Programming 87.1 Assembly Language James M. Feldman and Edward W. Czeck The true language of computers is a stream of 1s and 0s—bits. Everything in the computer, be it numbers or text or program, spreadsheet or database or 3-D rendering, is nothing but an array of bits. The meaning of the bits is in the “eye of the beholder”; it is determined entirely by context. Bits are not a useful medium for human consumption. Instead, we insist that what we read be formatted spatially and presented in a modest range of visually distinguishable characters. 0 and 1 arranged in a dense, page-filling array do not fulfill these requirements in any way. The several languages that are presented in this handbook are all intended to make something readable to two quite different readers. On the one hand, they serve the human reader with his/her requirements on symbols and layout; on the other, they provide a grammatically regular language for interpretation by a compiler. A compiler, of course, is normally a program running on a computer, but human beings can and sometimes do play both sides of this game. They want to play with both the input and output. Such accessibility requires that not only the input but the output of the compilation process be comfortably readable by humans. The language of the input is called a high-level language (HLL). Examples are C, Pascal, Ada and Modula II. They are designed to express both regularly and concisely the kinds of operations and the kinds of constructs that programmers manipulate. The output end of the compiler generates object code—a generally unreadable, binary representation of machine language, lacking only the services of a linker to turn it into true machine language. The language that has been constructed to represent object code for human consumption is assembly language. That is the subject of this section. Some might object to our statement of purpose for assembly language. While few will contest the concept of assembly language as the readable form of object code, some see writing assembly code as the way to “get their hands on the inner workings of the machine.” They see it as a “control” issue. Since most HLLs today give the user reasonably direct ways to access hardware, where does the “control” issue arise? What assembly proponents see as the essential reason for having an assembly language is the option to optimize the “important” sections of a program by doing a better job of machine code generation than the compiler does. This perspective was valid enough when compilers were mediocre optimizers. It was not unlike the old days when a car came with a complete set of tools because you needed them. The same thing that has happened to cars has happened to compilers. They are engineered to be “fuel efficient” and perform their assigned functions with remarkable James M. Feldman Northeastern University Edward W. Czeck Northeastern University Ted G. Lewis Navel Postgraduate School Johannes J. Martin University of New Orleans
ability. When the cars or compilers get good enough and complex enough, the tinkerer may do more harm than good. IBM's superscalar RISC computer-the RS6000--comes with superb compilers and no assembl at all. The Pentagon took a long look at their costs of programming their immense array of computers. Contrary to popular legend, they decided to save money. The first amendment not withstanding, their conclusion was Thou shalt not assemble The Any sizable programming job gets done at least four times faster in a HLL Most modern compilers are good optimizers of code; some are superb. Almost all important code goes through revisions--maintenance Reworking old assembly code is similar to breaking good encryption; it takes fore Most important of all is portability. To move any program to a different computer, you must generate machine code for that new platform. With a program in a HLL, a new platform is almost free; all it requires is another pass through the compiler for the target platform. with assembly code, you are back to square one. Assembly code is unique to the platform Given all of that, the question naturally arises: Why have an article on assembly language? We respond with two reasons, both of which we employ in our work as teachers and programmers An essential ingredient in understanding computer hardware and in designing new computer systems and compilers is a detailed appreciation of the operations of central processing units(CPUs). These are best expressed in assembly language. Our undergraduate Computer Engineering courses include a healthy dose of assembly language programming for this specific reason. If you are concerned about either cpu design or compiler effectiveness, you have to be able to look in great detail at the interface between them-machine language. As we have said, the easiest way to read machine language is by translating it to assembly language. This is one way to get assembly language, not by writing in it as a source of code but by running the object code itself through a backward translator called a disassembler. While many compilers will oblige you by providing an assembly listing if asked, often that listing does not include optimizations that occur only when the several modules are linked together, providing opportunities for truly global optimization. Some compilers"help"the reader by sing macros(names for predefined blocks of code)in place of the real machine instructions and register assignments. The absence of the optimizations and the inclusion of unexpected macros can make the assembly listing almost useless for obtaining insight into the programs fine detail. The compilers that we have used on the DECstations and SPARC machines do macro inclusion. To see what is really going in these machines, you must disassemble the machine code. That is precisely what the Think C& ompiler on the Macintosh does when you ask for machine code. It disassembles what it just did in compiling and linking the whole program. What you see is what is really there. The code we present for the 68000 was obtained in that way. These are important applications. Even if most or all other programming needs can be better met in HLLs, these applications are sufficient reason for many engineers to want to know something about assembly language. There are other applications of assembly language, but they tend to be specific to rather specialized and infrequent tasks. For example, the back end of most HLL compilers is a machine code generator. To write one f those, you certainly must know something about assembly language. On rare occasions, you may find som necessary machine-specific transaction which is not supported by the hLL of choice or which requires some special micro optimization. A"patch" of assembly code is a way to fit this inexpressible thought into the program's vocabulary. These are rare events. The reason why we recommend to you this section on assembly code is that it improves your understanding of HLLs and of computer architecture We will take a single subroutine which we express in C and look at the machine code that is generated or two representative machines. The machines include two widely used complex instruction set computers( CISCs and one reduced instruction set computer(RISC). These are the 68000, the VAX@, and a SParC. We will have e 2000 by CRC Press LLC
© 2000 by CRC Press LLC ability. When the cars or compilers get good enough and complex enough, the tinkerer may do more harm than good. IBM’s superscalar RISC computer—the RS6000—comes with superb compilers and no assembler at all. The Pentagon took a long look at their costs of programming their immense array of computers. Contrary to popular legend, they decided to save money. The first amendment not withstanding, their conclusion was: “Thou shalt not assemble.” The four principal reasons for not writing assembly language are • Any sizable programming job gets done at least four times faster in a HLL. • Most modern compilers are good optimizers of code; some are superb. • Almost all important code goes through revisions—maintenance. Reworking old assembly code is similar to breaking good encryption; it takes forever. • Most important of all is portability. To move any program to a different computer, you must generate machine code for that new platform. With a program in a HLL, a new platform is almost free; all it requires is another pass through the compiler for the target platform. With assembly code, you are back to square one. Assembly code is unique to the platform. Given all of that, the question naturally arises: Why have an article on assembly language? We respond with two reasons, both of which we employ in our work as teachers and programmers: • An essential ingredient in understanding computer hardware and in designing new computer systems and compilers is a detailed appreciation of the operations of central processing units (CPUs). These are best expressed in assembly language. Our undergraduate Computer Engineering courses include a healthy dose of assembly language programming for this specific reason. • If you are concerned about either CPU design or compiler effectiveness, you have to be able to look in great detail at the interface between them—machine language. As we have said, the easiest way to read machine language is by translating it to assembly language. This is one way to get assembly language, not by writing in it as a source of code but by running the object code itself through a backward translator called a disassembler. While many compilers will oblige you by providing an assembly listing if asked, often that listing does not include optimizations that occur only when the several modules are linked together, providing opportunities for truly global optimization. Some compilers “help” the reader by using macros (names for predefined blocks of code) in place of the real machine instructions and register assignments. The absence of the optimizations and the inclusion of unexpected macros can make the assembly listing almost useless for obtaining insight into the program’s fine detail. The compilers that we have used on the DECstations and SPARC machines do macro inclusion. To see what is really going on in these machines, you must disassemble the machine code. That is precisely what the Think C® compiler on the Macintosh does when you ask for machine code. It disassembles what it just did in compiling and linking the whole program. What you see is what is really there. The code we present for the 68000 was obtained in that way. These are important applications. Even if most or all other programming needs can be better met in HLLs, these applications are sufficient reason for many engineers to want to know something about assembly language. There are other applications of assembly language, but they tend to be specific to rather specialized and infrequent tasks. For example, the back end of most HLL compilers is a machine code generator. To write one of those, you certainly must know something about assembly language. On rare occasions, you may find some necessary machine-specific transaction which is not supported by the HLL of choice or which requires some special micro optimization. A “patch” of assembly code is a way to fit this inexpressible thought into the program’s vocabulary. These are rare events. The reason why we recommend to you this section on assembly code is that it improves your understanding of HLLs and of computer architecture. We will take a single subroutine which we express in C and look at the machine code that is generated on two representative machines. The machines include two widely used complex instruction set computers (CISCs) and one reduced instruction set computer (RISC). These are the 68000®, the VAX®, and a SPARC®. We will have two objectives:
To see how a variety of paradigms in HLLs are translated (or, in other words, to see what is really going on when you ask for a particular HLL operation To compare the several architectures to see how they are the same and how they differ The routine attempts to get a count of the number of numbers which occur in a block of text. Since we are seeking numbers and not digits, the task is more complex than you might first assume. This is why we say attempts. The function that we present below handles all of the normal text forms mbers written in a fixed-point format, such as 12.3 or 0. 1738 Numbers written in a floating-point format, such as-127e +19 or 6.781E2 If our program were to scan the indented block of code above, it would report finding six numbers. The symbols that the program recognizes as potentially part of a number include the digits 0 to 9 and the symbols,E, and+. Now it is certainly possible to include other symbols in legitimate numbers, such as HEX numbers or the like, but this little routine will not properly deal with them. Our purpose was not to handle all comers but to provide a routine with some variety of expression and possible application. Let us be Number Count() We enter the program at the top with one pointer passed from the calling routine and a set of local variables comprising two integers and eight Boolean variables. Most of the Boolean variables will be used in pairs. The first element of a pair, for instance, ees of ees and latch, indicates that the current character is one of a particular class of non-numeric characters which might be found inside a number. If you consider that the number begin at the first digit, then these characters can occur legally only once within a given number ees will be set true if the current character is the first instance of either 'e or 'E. The paired variable, latch, is set true if there has ever been one of those characters in the current number. The other pairs are period and latch and sign and latchs There is also a pair of Booleans which indicate if the current character is a digit and if the scanner is currently inside a number. Were you to limit your numbers to integers, these two are the only booleans which would be needed. At the top of the program, all Booleans are reset(made FALSE). Then we step through the block looking for numbers. The search stops when we encounter the first null [char()] marking the end of the block. ry running through the routine with text containing the three forms of number. You will quickly convince yourself that the routine works with all normal numbers. If someone writes.14"or 3.14ee6, the program will count 2 numbers. That is probably right in the first two cases. Who knows in the third? Let us look at this short routine in C define blk length 20001 int Numbercount(char block() atche=o, latch=0, period=O, latchs=o, sign=o char source block digit=(* source>=“0〃)&&(* source<=‘9); period =(*source='')&& inside & !latchp; & !lathe latch =(latch I period); ees=((*source=E)I(*source='e))&& inside &&!lathe; latch =(latch I ees); I (*source=-))&&inside & latch &&!latchs latchs =(latchs I sign) if (!(digit I ees period I sign)) inside=latch=latche=latchs=0; } else if (digit) e 2000 by CRC Press LLC
© 2000 by CRC Press LLC • To see how a variety of paradigms in HLLs are translated (or, in other words, to see what is really going on when you ask for a particular HLL operation) • To compare the several architectures to see how they are the same and how they differ The routine attempts to get a count of the number of numbers which occur in a block of text. Since we are seeking numbers and not digits, the task is more complex than you might first assume. This is why we say “attempts.” The function that we present below handles all of the normal text forms: • Integers, such as 123 or –17 • Numbers written in a fixed-point format, such as 12.3 or 0.1738 • Numbers written in a floating-point format, such as –12.7e+19 or 6.781E2 If our program were to scan the indented block of code above, it would report finding six numbers. The symbols that the program recognizes as potentially part of a number include the digits 0 to 9 and the symbols ‘e’, ‘E’, ‘ . ’, ‘–’ and ‘+’. Now it is certainly possible to include other symbols in legitimate numbers, such as HEX numbers or the like, but this little routine will not properly deal with them. Our purpose was not to handle all comers but to provide a routine with some variety of expression and possible application. Let us begin. NumberCount( ) We enter the program at the top with one pointer passed from the calling routine and a set of local variables comprising two integers and eight Boolean variables. Most of the Boolean variables will be used in pairs. The first element of a pair, for instance, ees of ees and latche, indicates that the current character is one of a particular class of non-numeric characters which might be found inside a number. If you consider that the number begins at the first digit, then these characters can occur legally only once within a given number. ees will be set true if the current character is the first instance of either ‘e’ or ‘E’. The paired variable, latche, is set true if there has ever been one of those characters in the current number. The other pairs are period and latchp and sign and latchs. There is also a pair of Booleans which indicate if the current character is a digit and if the scanner is currently inside a number. Were you to limit your numbers to integers, these two are the only Booleans which would be needed. At the top of the program, all Booleans are reset (made FALSE). Then we step through the block looking for numbers. The search stops when we encounter the first null [char(0)] marking the end of the block. Try running through the routine with text containing the three forms of number. You will quickly convince yourself that the routine works with all normal numbers. If someone writes “3..14” or “3.14ee6”, the program will count 2 numbers. That is probably right in the first two cases. Who knows in the third? Let us look at this short routine in C. # define blk_length 20001 int NumberCount(char block[]) { int count=0,inside=0,digit; int ees=0, latche=0, latchp=0, period=0, latchs=0, sign=0; char *source; source = block; do { digit = (*source >= ‘0’) && (*source <= ‘9’); period = (*source==’.’) && inside && !latchp; && !latche; latchp = (latchp || period); ees = ((*source==’E’) || (*source==’e’)) && inside && !latche; latche = (latche || ees); sign = ((*source==’+’) || (*source==’-’)) && inside && latche && !latchs; latchs = (latchs || sign); if (inside) { if (!(digit || ees || period || sign)) inside=latchp=latche=latchs=0; } else if (digit) {
inside 1 source+t while ((*source !=\0')&&(( source-block )<blk length+1)); return counti To access values within the character array, the normal C paradigm is to step a pointer along the array Source points at the current character in the array; *source is the character("what source points at). source is initialized at the top of the program before the loop (source block; )and incremented(source++;) at the bottom of the loop. Note the many repetitions of *source. Each one means the same current character. If you read that expression as the character which source is pointing to, it looks like an invitation to fetch the same character from memory eight times. A compiler that optimizes by removing common subexpressions should eliminate all but the first such fetch. This optimization is one of the things that we want to look for For those less familiar with C, the meanings of the less familiar symbols are qual (in the logical sense) not equal count++ increment count by 1 unit(after using it) C uses 0 as FALSE and anything else as TRUE. Comparisons Down on the Factory Floor Now let us see what we can learn by running this program through compilers on several quite different hosts. The items that we wish to examine include: I. Subroutine operations comprising: A. Building the call block C. Obtaining memory space for local variables D. Accessing th E. Returning the function value F. Returning to th g routin A. Load and store B. Arithmetic C. Logical IIL. Pro control B. if and the issue of multiple tests Our objectives are to build three quite different pictures An appreciation for the operations underlying the HLL statements An overview of the architectures of several important examples of CISC and RISC processors An appreciation for what a HLL optimizer should be doing for you We will attempt to do all three all of the time. e 2000 by CRC Press LLC
© 2000 by CRC Press LLC count++; inside = 1; } source++; } while ((*source != ‘\0’) && ((source-block)<blk_length+1)); return count; } To access values within the character array, the normal C paradigm is to step a pointer along the array. Source points at the current character in the array; *source is the character (“what source points at”). source is initialized at the top of the program before the loop (source = block;) and incremented (source++;) at the bottom of the loop. Note the many repetitions of *source. Each one means the same current character. If you read that expression as the character which source is pointing to, it looks like an invitation to fetch the same character from memory eight times. A compiler that optimizes by removing common subexpressions should eliminate all but the first such fetch. This optimization is one of the things that we want to look for. For those less familiar with C, the meanings of the less familiar symbols are: == equal (in the logical sense) ! not != not equal && and || or count++ increment count by 1 unit (after using it) C uses 0 as FALSE and anything else as TRUE. Comparisons Down on the Factory Floor Now let us see what we can learn by running this program through compilers on several quite different hosts. The items that we wish to examine include: I. Subroutine operations comprising: A. Building the call block B. The call itself C. Obtaining memory space for local variables D. Accessing the call block E. Returning the function value F. Returning to the calling routine II. Data operations A. Load and store B. Arithmetic C. Logical D. Text III. Program control A. Looping B. if and the issue of multiple tests Our objectives are to build three quite different pictures: • An appreciation for the operations underlying the HLL statements • An overview of the architectures of several important examples of CISC and RISC processors • An appreciation for what a HLL optimizer should be doing for you We will attempt to do all three all of the time