    }
}

Another measurement has shown a significant performance improvement. Response time has dropped from 2,500 ms to 185 ms (see Figure 1.3).

Figure 1.3. Impact of conditional creation of the string member.

So we have arrived. We took the Trace implementation from 3,500 ms down to 185 ms. You may still contend that 185 ms looks pretty bad compared to a 55-ms execution time when addOne had no tracing logic at all. This is more than 3x degradation. So how can we claim victory?

The point is that the original addOne function (without trace) did very little. It added one to its input argument and returned immediately. The addition of any code to addOne would have a profound effect on its execution time. If you add four instructions to trace the behavior of only two instructions, you have tripled your execution time. Conversely, if you increase by four instructions an execution path already containing 200, you have only degraded execution time by 2%. If addOne consisted of more complex computations, the addition of Trace would have been closer to being negligible.

In some ways, this is similar to inlining. The influence of inlining on heavyweight functions is negligible. Inlining plays a major role only for simple functions that are dominated by the call and return overhead. The functions that make excellent candidates for inlining are precisely the ones that are bad candidates for tracing. It follows that Trace objects should not be added to small, frequently executed functions.

Key Points

• Object definitions trigger silent execution in the form of object constructors and destructors. We call it "silent execution" as opposed to "silent overhead" because object construction and destruction are not usually overhead. If the computations performed by the constructor and destructor are always necessary, then they would be considered efficient code (inlining would alleviate the cost of call and return overhead).
As we have seen, constructors and destructors do not always have such "pure" characteristics, and they can create significant overhead. In some
situations, computations performed by the constructor (and/or destructor) are left unused. We should also point out that this is more of a design issue than a C++ language issue. However, it is seen less often in C because it lacks constructor and destructor support.

• Just because we pass an object by reference does not guarantee good performance. Avoiding object copy helps, but it would be helpful if we didn't have to construct and destroy the object in the first place.

• Don't waste effort on computations whose results are not likely to be used. When tracing is off, the creation of the string member is worthless and costly.

• Don't aim for the world record in design flexibility. All you need is a design that's sufficiently flexible for the problem domain. A char pointer can sometimes do the simple jobs just as well, and more efficiently, than a string.

• Inline. Eliminate the function call overhead that comes with small, frequently invoked function calls. Inlining the Trace constructor and destructor makes it easier to digest the Trace overhead.
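The key points above can be pulled together in a small sketch. This is a minimal reconstruction modeled on the chapter's Trace class, not the book's exact code: the member names and the use of a string pointer created on demand are assumptions. The idea is that when tracing is off, the constructor and destructor do almost nothing; the expensive string object is built only when it will actually be used.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Sketch of a Trace class that creates its string member conditionally.
// Names (traceIsActive, theFunctionName) are illustrative guesses.
class Trace {
public:
    static bool traceIsActive;

    // Inline constructor: when tracing is off, the only cost is a
    // pointer initialization and one branch.
    Trace(const char* name) : theFunctionName(nullptr) {
        if (traceIsActive) {
            theFunctionName = new std::string(name);
            std::cout << "Enter function " << *theFunctionName << '\n';
        }
    }

    // Inline destructor: the string is destroyed only if it was built.
    ~Trace() {
        if (theFunctionName) {
            std::cout << "Exit function " << *theFunctionName << '\n';
            delete theFunctionName;
        }
    }

private:
    std::string* theFunctionName; // built on demand, not unconditionally
};

bool Trace::traceIsActive = false;

// The chapter's running example: a tiny function with a Trace object.
int addOne(int x) {
    Trace t("addOne"); // near-zero cost while tracing is off
    return x + 1;
}
```

With tracing off, the per-call overhead shrinks to a branch test, which is why the measured time drops so dramatically; the string construction cost is paid only in the rare traced runs.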
Chapter 2. Constructors and Destructors

In an ideal world, there would never be a chapter dedicated to the performance implications of constructors and destructors. In that ideal world, constructors and destructors would have no overhead. They would perform only mandatory initialization and cleanup, and the average compiler would inline them. C code such as

{
    struct X x1;
    init(&x1);
    ...
    cleanup(&x1);
}

would be accomplished in C++ by:

{
    X x1;
    ...
}

and the cost would be identical.

That's the theory. Down here in the trenches of software development, the reality is a little different. We often encounter inheritance and composition implementations that are too flexible and too generic for the problem domain. They may perform computations that are rarely or never required. In practice, it is not surprising to discover performance overhead associated with inheritance and composition. This is a limited manifestation of a bigger issue: the fundamental tension between code reuse and performance. Inheritance and composition involve code reuse. Oftentimes, reusable code will compute things you don't really need in a specific scenario. Any time you call functions that do more than you really need, you will take a performance hit.

Inheritance

Inheritance and composition are two ways in which classes are tied together in an object-oriented design. In this section we want to examine the connection between inheritance-based designs and the cost of constructors and destructors. We drive this discussion with a practical example: the implementation of thread synchronization constructs.[1]

In multithreaded applications, you often need to provide thread synchronization to restrict concurrent access to shared resources. Thread synchronization constructs appear in varied forms. The three most common ones are semaphore, mutex, and critical section.

[1] Chapter 15 provides more information on the fundamental concepts and terminology of multithreaded programming.
A semaphore provides restricted concurrency. It allows multiple threads to access a shared resource up to a given maximum. When the maximum number of concurrent threads is set to 1, we end up with a special semaphore called a mutex (MUTual EXclusion). A mutex protects shared resources by allowing one and only one thread to operate on the resource at any one time.

A shared resource typically is manipulated in separate code fragments spread over the application's code. Take a shared queue, for example. The number of elements in the queue is manipulated by both the enqueue() and dequeue() routines. Modifying the number of elements should not be done simultaneously by multiple threads, for obvious reasons:
Type& dequeue()
{
    get_the_lock(queueLock);
    ...
    numberOfElements--;
    ...
    release_the_lock(queueLock);
    ...
}

void enqueue(const Type& value)
{
    get_the_lock(queueLock);
    ...
    numberOfElements++;
    ...
    release_the_lock(queueLock);
}

If both enqueue() and dequeue() could modify numberOfElements concurrently, we easily could end up with numberOfElements containing a wrong value. Modifying this variable must be done atomically.

The simplest application of a mutex lock appears in the form of a critical section. A critical section is a single fragment of code that should be executed only by one thread at a time. To achieve mutual exclusion, the threads must contend for the lock prior to entering the critical section. The thread that succeeds in getting the lock enters the critical section. Upon exiting the critical section,[2] the thread releases the lock to allow other threads to enter.

[2] We must point out that the Win32 definition of critical section is slightly different than ours. In Win32, a critical section consists of one or more distinct code fragments, of which one, and only one, can execute at any one time. The difference between a critical section and a mutex in Win32 is that a critical section is confined to a single process, whereas mutex locks can span process boundaries and synchronize threads running in separate processes. The inconsistency between our use of the terminology and that of Win32 will not affect our C++ discussion. We are just pointing it out to avoid confusion.

get_the_lock(CSLock);
{  // Critical section begins
    ...  // Protected computation
}  // Critical section ends
release_the_lock(CSLock);

In the dequeue() example it is pretty easy to inspect the code and verify that every lock operation is matched with a corresponding unlock. In practice, we have seen routines that consisted of hundreds of lines of code containing multiple return statements.
If a lock was obtained somewhere along the way, we had to release the lock prior to executing any one of the return statements. As you can imagine, this was a maintenance nightmare and a sure bug waiting to surface. Large-scale projects may have scores of people writing code and fixing bugs. If you add a return statement to a 100-line routine, you may overlook the fact that a lock was obtained earlier. That's problem number one. The second one is exceptions: if an exception is thrown while a lock is held, you'll have to catch the exception and manually release the lock. Not very elegant.

C++ provides an elegant solution to those two difficulties. When an object reaches the end of the scope in which it was defined, its destructor is called automatically. You can utilize this automatic destruction to solve the lock maintenance problem. Encapsulate the lock in an object and let the constructor obtain the lock. The destructor will release the lock automatically. If such an object is defined in the function scope
of a 100-line routine, you no longer have to worry about multiple return statements. The compiler inserts a call to the lock destructor prior to each return statement, and the lock is always released.

Using the constructor-destructor pair to acquire and release a shared resource [ES90, Lip96C] leads to lock class implementations such as the following:

class Lock {
public:
    Lock(pthread_mutex_t& key) : theKey(key) { pthread_mutex_lock(&theKey); }
    ~Lock() { pthread_mutex_unlock(&theKey); }
private:
    pthread_mutex_t &theKey;
};

A programming environment typically provides multiple flavors of synchronization constructs. The flavors you may encounter will vary according to:

• Concurrency level. A semaphore allows multiple threads to share a resource up to a given maximum. A mutex allows only one thread to access a shared resource.

• Nesting. Some constructs allow a thread to acquire a lock when the thread already holds the lock. Other constructs will deadlock on this lock-nesting.

• Notify. When the resource becomes available, some synchronization constructs will notify all waiting threads. This is very inefficient, as all but one thread wake up to find out that they were not fast enough and the resource has already been acquired. A more efficient notification scheme will wake up only a single waiting thread.

• Reader/Writer locks. Allow many threads to read a protected value but allow only one to modify it.

• Kernel/User space. Some synchronization mechanisms are available only in kernel space.

• Inter/Intra process. Typically, synchronization is more efficient among threads of the same process than among threads of distinct processes.

Although these synchronization constructs differ significantly in semantics and performance, they all share the same lock/unlock protocol. It is very tempting, therefore, to translate this similarity into an inheritance-based hierarchy of lock classes that are rooted in a unifying base class.
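Before turning to that inheritance-based design, it is worth seeing the scope-based Lock class in action. The sketch below is illustrative: tryDequeue() and its globals are hypothetical, standing in for the 100-line routine with multiple return statements discussed above. The point is that the compiler invokes Lock's destructor on every exit path, so the mutex is released no matter which return executes.

```cpp
#include <cassert>
#include <pthread.h>

// The scope-based Lock class from the text.
class Lock {
public:
    Lock(pthread_mutex_t& key) : theKey(key) { pthread_mutex_lock(&theKey); }
    ~Lock() { pthread_mutex_unlock(&theKey); }
private:
    pthread_mutex_t &theKey;
};

// Hypothetical shared state, for illustration only.
pthread_mutex_t queueLock = PTHREAD_MUTEX_INITIALIZER;
int numberOfElements = 0;

// A routine with multiple return paths. No matter which return
// executes, guard's destructor runs and the mutex is released.
bool tryDequeue()
{
    Lock guard(queueLock);
    if (numberOfElements == 0)
        return false;        // unlock happens here, automatically
    --numberOfElements;
    return true;             // ...and here as well
}
```

The same mechanism covers exceptions: if a protected computation throws while the Lock object is alive, stack unwinding calls its destructor and the mutex is released without any explicit catch-and-unlock code.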
In one product we worked on, initially we found an implementation that looked roughly like this:

class BaseLock {
public:
    // (The LogSource object will be explained shortly.)
    BaseLock(pthread_mutex_t &key, LogSource &lsrc) {};
    virtual ~BaseLock() {};
};

The BaseLock class, as you can tell, doesn't do much. Its constructor and destructor are empty. The BaseLock class was intended as a root class for the various lock classes that were expected to be derived from it. These distinct flavors would naturally be implemented as distinct subclasses of BaseLock. One derivation was the MutexLock:

class MutexLock : public BaseLock {
public:
    MutexLock(pthread_mutex_t &key, LogSource &lsrc);
    ~MutexLock();
private:
    pthread_mutex_t &theKey;