There is also the option of rolling your own version even if a particular service is already available in a library. Libraries are often designed with flexibility and reusability in mind. Often, flexibility and reusability trade off with performance. If, for some critical code fragment, you choose to put performance considerations above the other two, it might be reasonable to override a library service with your own home-grown implementation. Applications are so diverse in their specific needs, it is hard to design a library that will be the perfect solution for everybody, everywhere, all the time.

• Compiler optimizations

Simply a more descriptive name than "miscellaneous," this category includes all those small coding tricks that don't fit in the other coding categories, such as loop unrolling, lifting constant expressions out of loops, and similar techniques for elimination of computational redundancies. Most compilers will perform many of those optimizations for you. But you cannot count on any specific compiler to perform a specific optimization. One compiler may unroll a loop twice, another will unroll it four times, and yet another compiler will not unroll it at all. For ultimate control, you have to take coding matters into your own hands.
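A minimal sketch of those two tricks (the function and data here are invented purely for illustration) shows a loop-invariant strlen() call lifted out of a loop, and the loop then unrolled by hand:

    #include <cstring>

    // Before: strlen(s) is loop-invariant, but a compiler that cannot
    // prove that will re-evaluate it on every iteration.
    void scale(int *v, const char *s)
    {
        for (std::size_t i = 0; i < std::strlen(s); i++) {
            v[i] *= 2;
        }
    }

    // After: the invariant is lifted out, and the loop is unrolled twice
    // to halve the loop-control overhead (this sketch assumes an even length).
    void scaleFast(int *v, const char *s)
    {
        const std::size_t len = std::strlen(s);
        for (std::size_t i = 0; i < len; i += 2) {
            v[i]     *= 2;
            v[i + 1] *= 2;
        }
    }

A good optimizer may produce scaleFast() from scale() on its own; the point is that you cannot rely on it.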
Our Goal

Many books and articles have extolled the virtues of C++ as a language supporting the OO paradigm. C++ is positioned as the latest cure for the software crisis. For those not familiar with the term, the software crisis is our current inability to develop code that is simple enough to be understood, maintained, and extended by a mere mortal, yet powerful enough to provide solutions to complex problems [CE95]. Developers who migrate from other structured languages to C++ have been bombarded with information pertaining to the use of C++ in creating highly flexible and reusable code that will lend itself nicely to easy maintenance and extension.

One important issue, however, has received little attention: run-time efficiency. We will examine the relevant performance topics from the perspective of C++ programming. After reading this book you should emerge with a clear understanding of the common C++ performance pitfalls and how to avoid them without compromising the clarity and simplicity of your design. In fact, the high-performance solution is frequently also the simplest solution. This book should also help developers produce C++ code as efficient as its C counterpart while still benefiting from the extended features of C++ and the inherent superiority of the OO paradigm.

A famous physicist once said that an expert is one who has made all possible mistakes in a very narrow field. Although making mistakes is a good way to learn, learning from the mistakes of others (the authors, in this case) is even better.

A secondary goal of this book is to construct a one-stop shop for C++ performance issues. As a C++ developer, the answers to your performance concerns are not readily available. They are scattered over a long list of books and magazine articles that address different pieces of this puzzle. You would have to research this topic and put it all together yourself. Not many developers are going to do that. We are too busy. It will be helpful to have a one-stop shop that focuses entirely on the important topic of C++ performance.

Software Efficiency: Does It Matter?

In an era where processor speed doubles every 18 months (Moore's law), do we really need to worry about software efficiency?
The fact is that regardless of phenomenal advances in chip technology, software efficiency is still a prominent concern. In 1971, the Intel 4004 was the first commercial processor to fit on a single chip. It was named a microprocessor. Since then, microprocessor technology has embarked on a 25-year streak of doubling processor speed every 18 months. Today's microprocessors are tens of thousands of times faster than the Intel 4004. If processor speed was the answer to inefficient software, the issue would have been resolved and long forgotten. Yet, software efficiency is still a concern with most development organizations. Why?
Imagine that you are trying to sell your product, say a Web application server, to a Fortune 500 company. They need 600 transactions per second to run their business online. Your application server can support only 60 transactions per second before running out of steam on the latest and greatest server hardware. If the customer is to use your software, they need to string together a cluster of at least 10 servers to reach their 600-transaction-per-second goal, raising the cost of your solution in terms of hardware, software licenses, network administration, and maintenance. To make matters worse, the customer has invited two of your competitors to pitch their own solutions. If a competitor has a more efficient implementation, they will need less hardware to deliver the required performance, and they will offer a cheaper solution. The speed of the processor is a constant in this situation: the software vendors in this story compete over the same hardware. It is often the case that the most efficient solution wins the bid.

You also must examine how processing speed compares to communication speed. If we can transmit data faster than the computer can generate it, then the computer (processor plus software) is the new bottleneck. The limits imposed by physics might soon put the brakes on the fantastic growth of processor speed [Lew1]. Not so for communication speed. Like processing speed, communication speed has enjoyed phenomenal growth. Back in 1970, 4800 bits per second was considered high-speed communication. Today, hundreds of megabits per second is common. The end of the road for communication speed is nowhere in sight [Lew2]. Optical communication technology does not seem to have show-stopping technological roadblocks that will threaten progress in the near future. Several research labs are already experimenting with 100-gigabit-per-second all-optical networking. The biggest obstacle currently is not of a technical nature; it is the infrastructure. High-speed networking necessitates the rewiring of the information society from copper cables to optical fiber. This campaign is already underway.

Communication adapters are already faster than the computing devices attached to them. Emerging network technologies such as 100 Mbps LAN adapters and high-speed ATM switches make computer speed critical. In the past, inefficient software has been masked by slow links. Popular communication protocols such as SNA and TCP/IP could easily overwhelm a 16 Mbps token ring adapter, leaving software performance bottlenecks undetected. Not so with 100 Mbps FDDI or Fast Ethernet. If 1,000 instructions are added to the protocol's send/receive path, they may not degrade throughput on a token ring connection because the protocol implementation can still pump data faster than the token ring can consume it. But an extra 1,000 instructions show up instantly as degraded throughput on a Fast Ethernet adapter. Today, very few computers are capable of saturating a high-speed link, and it is only going to get more difficult. Optical communication technology is now surpassing the growth rate of microprocessor speed. The computer (processor plus software) is quickly becoming the new bottleneck, and it's going to stay that way.

To make a long story short, software performance is important and always will be. This one is not going away. As processor and communication technology march on, they redefine what "fast" means.
They give rise to a new breed of bandwidth- and cycle-hungry applications that push the boundaries of technology. You never have enough horsepower. Software efficiency now becomes even more crucial than before. Whether the growth of processor speed is coming to an end or not, it will definitely trail communication speed. This puts the efficiency burden on the software. Further advances in execution speed will depend heavily on the efficiency of the software, not just the processor.

Terminology

Before moving on, here are a few words to clarify the terminology. "Performance" can stand for various metrics, the most common ones being space efficiency and time efficiency. Space efficiency seeks to minimize the use of memory in a software solution. Likewise, time efficiency seeks to minimize the use of processor cycles. Time efficiency is often represented in terms of response time and throughput. Other metrics include compile time and executable size.

The rapidly falling price of memory has moved the topic of space efficiency for its own sake to the back burner. Desktop PCs with plenty of RAM (Random Access Memory) are common. Corporate customers
are not that concerned about space issues these days. In our work with customers we have encountered concerns with run-time efficiency for the most part. Since customers drive requirements, we will adopt their focus on time efficiency. From here on, we will restrict performance to its time-efficiency interpretation. Generally we will look at space considerations only when they interfere with run-time performance, as in caching and paging.

In discussing time efficiency, we will often mention the terms "pathlength" and "instruction count" interchangeably. Both stand for the number of assembler language instructions generated by a fragment of code. In a RISC architecture, if a code fragment exhibits a reasonable "locality of reference" (i.e., cache hits), the ratio between instruction counts and clock cycles will approximate one. On CISC architectures it may average two or more, but in any event, poor instruction counts always indicate poor execution time, regardless of processor architecture. A good instruction count is necessary but not sufficient for high performance. Consequently, it is a crude performance indicator, but still useful. It will be used in conjunction with time measurements to evaluate efficiency.

Organization of This Book

We start the performance tour close to home with a real-life example. Chapter 1 is a war story of C++ code that exhibited atrocious performance, and what we did to resolve it. This example will drive home some performance lessons that might very well apply to diverse scenarios.

Object-oriented design in C++ might harbor a performance cost. This is what we pay for the power of OO support. The significance of this cost, the factors affecting it, and how and when you can get around it are discussed in Chapters 2, 3, and 4.

Chapter 5 is dedicated to temporaries. The creation of temporary objects is a C++ feature that catches new C++ programmers off guard. C programmers are not used to the C compiler generating significant overhead "under the covers." If you aim to write high-efficiency C++, it is essential that you know when temporaries are generated by the C++ compiler and how to avoid them.
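As a quick preview of that theme, the following sketch (the log() function is hypothetical, invented only for illustration) shows two innocent-looking calls that each manufacture a temporary string object:

    #include <string>

    void log(std::string msg);   // hypothetical: takes its argument by value

    void example(const std::string &a, const std::string &b)
    {
        log(a + b);      // operator+ materializes a temporary string,
                         // which then initializes the by-value parameter
        log("done");     // a temporary std::string is built from the literal
    }

Declaring the parameter as const std::string& would at least eliminate the extra copy into msg; Chapter 5 explores when such temporaries arise and how to avoid them.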
Memory management is the subject of Chapters 6 and 7. Allocating and deallocating memory on the fly is expensive. Functions such as new() and delete() are designed to be flexible and general. They deal with variable-sized memory chunks in a multithreaded environment. As such, their speed is compromised. Oftentimes, you are in a position to make simplifying assumptions about your code that will significantly boost the speed of memory allocation and deallocation. These chapters will discuss several simplifying assumptions that can be made and the efficient memory managers that are designed to leverage them.
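For a taste of what such assumptions buy, here is a minimal sketch (a toy, not one of the memory managers developed in Chapters 6 and 7) that assumes fixed-size blocks and a single thread. Freed blocks are recycled through a free list, so the common case is a couple of pointer operations instead of a trip through the general-purpose heap:

    #include <cstddef>

    class FixedPool {
        struct Node { Node *next; };   // freed blocks are linked through here
    public:
        explicit FixedPool(std::size_t blockSize)
            : blockSize_(blockSize < sizeof(Node) ? sizeof(Node) : blockSize),
              freeList_(0) {}

        void *allocate()
        {
            if (freeList_) {                    // fast path: reuse a freed block
                Node *n = freeList_;
                freeList_ = n->next;
                return n;
            }
            return ::operator new(blockSize_);  // slow path: hit the real heap
        }

        void deallocate(void *p)                // constant time: push on the list
        {
            Node *n = static_cast<Node *>(p);
            n->next = freeList_;
            freeList_ = n;
        }

    private:
        std::size_t blockSize_;
        Node *freeList_;
    };

(A destructor that returns the pooled blocks to the heap is omitted for brevity.)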
Inlining is probably the second most popular performance tip, right after passing objects by reference. It is not as simple as it sounds. The inline keyword, just like register, is just a hint that the compiler often ignores. Situations in which inline is likely to be ignored and other unexpected consequences are discussed in Chapters 8, 9, and 10.

Performance, flexibility, and reuse seldom go hand-in-hand. The Standard Template Library is an attempt to buck that trend and to combine these three into a powerful component. We will examine the performance of the STL in Chapter 11.

Reference counting is a technique often used by experienced C++ programmers. You cannot dedicate a book to C++ performance without coverage of this technique, discussed in Chapter 12.

Software performance cannot always be salvaged by a single "silver bullet" fix. Performance degradation is often a result of many small local inefficiencies, each of which is insignificant by itself. It is the combination that results in a significant degradation. Over the years, while resolving many performance bugs in various C++ products, we have come to identify certain bugs that seem to float to the surface
frequently. We divided the list into two sets: coding and design inefficiencies. The coding set contains "low-hanging fruit": small-scale, local coding optimizations you can perform without needing to understand the overall design. In Chapter 13 we discuss various items of that nature.

The second set contains design optimizations that are global in nature. Those optimizations modify code that is spread across the source code, and are the subject of Chapter 14.

Chapter 15 covers scalability issues, unique performance considerations present in a multiprocessor environment that we don't encounter on a uniprocessor. This chapter discusses design and coding issues aimed at exploiting parallelism. This chapter will also provide some help with the terminology and concepts of multithreaded programming and synchronization. We refer to thread synchronization concepts in several other places in the book. If your exposure to those concepts is limited, Chapter 15 should help level the playing field.

Chapter 16 takes a look at the underlying system. Top-notch performance also necessitates a rudimentary understanding of underlying operating systems and processor architectures. Issues such as caching, paging, and threading are discussed here.
Chapter 1. The Tracing War Story

Every software product we have ever worked on contained tracing functionality in one form or another. Any time your source code exceeds a few thousand lines, tracing becomes essential. It is important for debugging, maintaining, and understanding execution flow of nontrivial software. You would not expect a trace discussion in a performance book, but the reality is, on more than one occasion, we have run into severe performance degradation due to poor implementations of tracing. Even slight inefficiencies can have a dramatic effect on performance.

The goal of this chapter is not necessarily to teach proper trace implementation, but to use the trace vehicle to deliver some important performance principles that often surface in C++ code. The implementation of trace functionality runs into typical C++ performance obstacles, which makes it a good candidate for performance discussion. It is simple and familiar. We don't have to drown you in a sea of irrelevant details in order to highlight the important issues. Yet, simple or not, trace implementations drive home many performance issues that you are likely to encounter in any random fragment of C++ code.

Many C++ programmers define a simple Trace class to print diagnostic information to a log file. Programmers can define a Trace object in each function that they want to trace, and the Trace class can write a message on function entry and function exit. The Trace objects will add extra execution overhead, but they will help a programmer find problems without using a debugger. If your C++ code happens to be embedded as native code in a Java program, using a Java debugger to trace your native code would be a challenge.
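One minimal shape such a class might take is sketched below (an illustration only, not necessarily the implementation discussed in this chapter): the constructor logs function entry and the destructor logs function exit, so a single local object brackets the whole function.

    #include <cstdio>
    #include <string>

    class Trace {
    public:
        Trace(const std::string &name) : name_(name)    // entry message
        {
            std::printf("Enter %s\n", name_.c_str());
        }
        ~Trace()                                        // exit message
        {
            std::printf("Exit %s\n", name_.c_str());
        }
        void debug(const std::string &msg)              // arbitrary message
        {
            std::printf("%s: %s\n", name_.c_str(), msg.c_str());
        }
    private:
        std::string name_;
    };

    void updateAccount()           // hypothetical traced function
    {
        Trace t("updateAccount");  // prints "Enter" here, "Exit" on return
        // ... real work ...
    }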
The most extreme form of trace performance optimization would be to eliminate the performance cost altogether by embedding trace calls inside #ifdef blocks:

    #ifdef TRACE
    Trace t("myFunction"); // Constructor takes a function name argument
    t.debug("Some information message");
    #endif

The weakness of the #ifdef approach is that you must recompile to turn tracing on and off. This is definitely something your customers will not be able to do unless you jump on the free software bandwagon and ship them your source code. Alternatively, you can control tracing dynamically by communicating with the running program. The Trace class implementation could check the trace state prior to logging any trace information:

    void Trace::debug(string &msg)
    {
        if (traceIsActive) {
            // log message here
        }
    }

We don't care about performance when tracing is active. It is assumed that tracing will be turned on only during problem determination. During normal operation, tracing would be inactive by default, and we expect our code to exhibit peak performance. For that to happen, the trace overhead must be minimal. A typical trace statement will look something like

    t.debug("x = " + itoa(x)); // itoa() converts an int to ascii

This typical statement presents a serious performance problem. Even when tracing is off, we still must create the string argument that is passed in to the debug() function. This single statement hides substantial computation:
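To make that concrete, here is a hedged expansion (it assumes itoa() returns a string object, which is what allows the + to compile at all, and a string class that behaves like std::string):

    #include <string>

    std::string itoa(int x);    // assumed helper: converts an int to a string

    void traced(Trace &t, int x)
    {
        // What t.debug("x = " + itoa(x)); does even when tracing is off:
        std::string digits = itoa(x);       // construct a string of digits
        std::string msg = "x = " + digits;  // allocate and copy to concatenate
        t.debug(msg);                       // only inside debug() do we learn
                                            // that tracing is inactive
    }   // msg and digits are destroyed here, though nothing was logged

All of that construction, copying, and destruction is pure overhead whenever tracing is inactive.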