    }
}

Another measurement has shown a significant performance improvement. Response time has dropped from 2,500 ms to 185 ms (see Figure 1.3).

Figure 1.3. Impact of conditional creation of the string member.

So we have arrived. We took the Trace implementation from 3,500 ms down to 185 ms. You may still contend that 185 ms looks pretty bad compared to a 55-ms execution time when addOne had no tracing logic at all. This is more than 3x degradation. So how can we claim victory?

The point is that the original addOne function (without trace) did very little. It added one to its input argument and returned immediately. The addition of any code to addOne would have a profound effect on its execution time. If you add four instructions to trace the behavior of only two instructions, you have tripled your execution time. Conversely, if you increase by four instructions an execution path already containing 200, you have only degraded execution time by 2%. If addOne consisted of more complex computations, the addition of Trace would have been closer to being negligible.

In some ways, this is similar to inlining. The influence of inlining on heavyweight functions is negligible. Inlining plays a major role only for simple functions that are dominated by the call and return overhead. The functions that make excellent candidates for inlining are precisely the ones that are bad candidates for tracing. It follows that Trace objects should not be added to small, frequently executed functions.

Key Points

• Object definitions trigger silent execution in the form of object constructors and destructors. We call it "silent execution" as opposed to "silent overhead" because object construction and destruction are not usually overhead. If the computations performed by the constructor and destructor are always necessary, then they would be considered efficient code (inlining would alleviate the cost of call and return overhead).
As we have seen, constructors and destructors do not always have such "pure" characteristics, and they can create significant overhead. In some
situations, computations performed by the constructor (and/or destructor) are left unused. We should also point out that this is more of a design issue than a C++ language issue. However, it is seen less often in C because it lacks constructor and destructor support.

• Just because we pass an object by reference does not guarantee good performance. Avoiding object copy helps, but it would be helpful if we didn't have to construct and destroy the object in the first place.

• Don't waste effort on computations whose results are not likely to be used. When tracing is off, the creation of the string member is worthless and costly.

• Don't aim for the world record in design flexibility. All you need is a design that's sufficiently flexible for the problem domain. A char pointer can sometimes do the simple jobs just as well, and more efficiently, than a string.

• Inline. Eliminate the function call overhead that comes with small, frequently invoked function calls. Inlining the Trace constructor and destructor makes it easier to digest the Trace overhead.
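The key points above can be pulled together in a small sketch. This is a minimal reconstruction modeled on the chapter's Trace class, not the book's exact code: the member names and the use of a string pointer created on demand are assumptions. The idea is that when tracing is off, the constructor and destructor do almost nothing; the expensive string object is built only when it will actually be used.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Sketch of a Trace class that creates its string member conditionally.
// Names (traceIsActive, theFunctionName) are illustrative guesses.
class Trace {
public:
    static bool traceIsActive;

    // Inline constructor: when tracing is off, the only cost is a
    // pointer initialization and one branch.
    Trace(const char* name) : theFunctionName(nullptr) {
        if (traceIsActive) {
            theFunctionName = new std::string(name);
            std::cout << "Enter function " << *theFunctionName << '\n';
        }
    }

    // Inline destructor: the string is destroyed only if it was built.
    ~Trace() {
        if (theFunctionName) {
            std::cout << "Exit function " << *theFunctionName << '\n';
            delete theFunctionName;
        }
    }

private:
    std::string* theFunctionName; // built on demand, not unconditionally
};

bool Trace::traceIsActive = false;

// The chapter's running example: a tiny function with a Trace object.
int addOne(int x) {
    Trace t("addOne"); // near-zero cost while tracing is off
    return x + 1;
}
```

With tracing off, the per-call overhead shrinks to a branch test, which is why the measured time drops so dramatically; the string construction cost is paid only in the rare traced runs.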
Chapter 2. Constructors and Destructors

In an ideal world, there would never be a chapter dedicated to the performance implications of constructors and destructors. In that ideal world, constructors and destructors would have no overhead. They would perform only mandatory initialization and cleanup, and the average compiler would inline them. C code such as

{
    struct X x1;
    init(&x1);
    ...
    cleanup(&x1);
}

would be accomplished in C++ by:

{
    X x1;
    ...
}

and the cost would be identical.

That's the theory. Down here in the trenches of software development, the reality is a little different. We often encounter inheritance and composition implementations that are too flexible and too generic for the problem domain. They may perform computations that are rarely or never required. In practice, it is not surprising to discover performance overhead associated with inheritance and composition. This is a limited manifestation of a bigger issue: the fundamental tension between code reuse and performance. Inheritance and composition involve code reuse. Oftentimes, reusable code will compute things you don't really need in a specific scenario. Any time you call functions that do more than you really need, you will take a performance hit.

Inheritance

Inheritance and composition are two ways in which classes are tied together in an object-oriented design. In this section we want to examine the connection between inheritance-based designs and the cost of constructors and destructors. We drive this discussion with a practical example: the implementation of thread synchronization constructs.[1]

In multithreaded applications, you often need to provide thread synchronization to restrict concurrent access to shared resources. Thread synchronization constructs appear in varied forms. The three most common ones are semaphore, mutex, and critical section.

[1] Chapter 15 provides more information on the fundamental concepts and terminology of multithreaded programming.
A semaphore provides restricted concurrency. It allows multiple threads to access a shared resource up to a given maximum. When the maximum number of concurrent threads is set to 1, we end up with a special semaphore called a mutex (MUTual EXclusion). A mutex protects shared resources by allowing one and only one thread to operate on the resource at any one time.

A shared resource typically is manipulated in separate code fragments spread over the application's code. Take a shared queue, for example. The number of elements in the queue is manipulated by both the enqueue() and dequeue() routines. Modifying the number of elements should not be done simultaneously by multiple threads, for obvious reasons:
Type& dequeue()
{
    get_the_lock(queueLock);
    ...
    numberOfElements--;
    ...
    release_the_lock(queueLock);
    ...
}

void enqueue(const Type& value)
{
    get_the_lock(queueLock);
    ...
    numberOfElements++;
    ...
    release_the_lock(queueLock);
}

If both enqueue() and dequeue() could modify numberOfElements concurrently, we easily could end up with numberOfElements containing a wrong value. Modifying this variable must be done atomically.

The simplest application of a mutex lock appears in the form of a critical section. A critical section is a single fragment of code that should be executed only by one thread at a time. To achieve mutual exclusion, the threads must contend for the lock prior to entering the critical section. The thread that succeeds in getting the lock enters the critical section. Upon exiting the critical section,[2] the thread releases the lock to allow other threads to enter.

[2] We must point out that the Win32 definition of critical section is slightly different than ours. In Win32, a critical section consists of one or more distinct code fragments, of which one, and only one, can execute at any one time. The difference between a critical section and a mutex in Win32 is that a critical section is confined to a single process, whereas mutex locks can span process boundaries and synchronize threads running in separate processes. The inconsistency between our use of the terminology and that of Win32 will not affect our C++ discussion. We are just pointing it out to avoid confusion.

get_the_lock(CSLock);
{  // Critical section begins
    ...  // Protected computation
}  // Critical section ends
release_the_lock(CSLock);

In the dequeue() example it is pretty easy to inspect the code and verify that every lock operation is matched with a corresponding unlock. In practice, we have seen routines that consisted of hundreds of lines of code containing multiple return statements.
If a lock was obtained somewhere along the way, we had to release the lock prior to executing any one of the return statements. As you can imagine, this was a maintenance nightmare and a sure bug waiting to surface. Large-scale projects may have scores of people writing code and fixing bugs. If you add a return statement to a 100-line routine, you may overlook the fact that a lock was obtained earlier. That's problem number one. The second one is exceptions: if an exception is thrown while a lock is held, you'll have to catch the exception and manually release the lock. Not very elegant.

C++ provides an elegant solution to those two difficulties. When an object reaches the end of the scope in which it was defined, its destructor is called automatically. You can utilize this automatic destruction to solve the lock maintenance problem. Encapsulate the lock in an object and let the constructor obtain the lock. The destructor will release the lock automatically. If such an object is defined in the function scope
of a 100-line routine, you no longer have to worry about multiple return statements. The compiler inserts a call to the lock destructor prior to each return statement, and the lock is always released.

Using the constructor-destructor pair to acquire and release a shared resource [ES90, Lip96C] leads to lock class implementations such as the following:

class Lock {
public:
    Lock(pthread_mutex_t& key) : theKey(key) { pthread_mutex_lock(&theKey); }
    ~Lock() { pthread_mutex_unlock(&theKey); }
private:
    pthread_mutex_t &theKey;
};

A programming environment typically provides multiple flavors of synchronization constructs. The flavors you may encounter will vary according to:

• Concurrency level. A semaphore allows multiple threads to share a resource up to a given maximum. A mutex allows only one thread to access a shared resource.

• Nesting. Some constructs allow a thread to acquire a lock when the thread already holds the lock. Other constructs will deadlock on this lock-nesting.

• Notify. When the resource becomes available, some synchronization constructs will notify all waiting threads. This is very inefficient, as all but one thread wake up to find out that they were not fast enough and the resource has already been acquired. A more efficient notification scheme will wake up only a single waiting thread.

• Reader/Writer locks. Allow many threads to read a protected value but allow only one to modify it.

• Kernel/User space. Some synchronization mechanisms are available only in kernel space.

• Inter/Intra process. Typically, synchronization is more efficient among threads of the same process than among threads of distinct processes.

Although these synchronization constructs differ significantly in semantics and performance, they all share the same lock/unlock protocol. It is very tempting, therefore, to translate this similarity into an inheritance-based hierarchy of lock classes that are rooted in a unifying base class.
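Before turning to that inheritance-based design, it is worth seeing the scope-based Lock class in action. The sketch below is illustrative: tryDequeue() and its globals are hypothetical, standing in for the 100-line routine with multiple return statements discussed above. The point is that the compiler invokes Lock's destructor on every exit path, so the mutex is released no matter which return executes.

```cpp
#include <cassert>
#include <pthread.h>

// The scope-based Lock class from the text.
class Lock {
public:
    Lock(pthread_mutex_t& key) : theKey(key) { pthread_mutex_lock(&theKey); }
    ~Lock() { pthread_mutex_unlock(&theKey); }
private:
    pthread_mutex_t &theKey;
};

// Hypothetical shared state, for illustration only.
pthread_mutex_t queueLock = PTHREAD_MUTEX_INITIALIZER;
int numberOfElements = 0;

// A routine with multiple return paths. No matter which return
// executes, guard's destructor runs and the mutex is released.
bool tryDequeue()
{
    Lock guard(queueLock);
    if (numberOfElements == 0)
        return false;        // unlock happens here, automatically
    --numberOfElements;
    return true;             // ...and here as well
}
```

The same mechanism covers exceptions: if a protected computation throws while the Lock object is alive, stack unwinding calls its destructor and the mutex is released without any explicit catch-and-unlock code.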
In one product we worked on, initially we found an implementation that looked roughly like this:

class BaseLock {
public:
    // (The LogSource object will be explained shortly.)
    BaseLock(pthread_mutex_t &key, LogSource &lsrc) {};
    virtual ~BaseLock() {};
};

The BaseLock class, as you can tell, doesn't do much. Its constructor and destructor are empty. The BaseLock class was intended as a root class for the various lock classes that were expected to be derived from it. These distinct flavors would naturally be implemented as distinct subclasses of BaseLock. One derivation was the MutexLock:

class MutexLock : public BaseLock {
public:
    MutexLock(pthread_mutex_t &key, LogSource &lsrc);
    ~MutexLock();
private:
    pthread_mutex_t &theKey;