2 1.62 1.5 1.01 1.0 05 0 Version 1 Version 2 Version 3 More ofen than not nd destruction of objects will exacta performance penalty.Inan obiect destruction.It follows that the cost associated with an obicct is direculy related to the engt and ea8esRceooesoekdmdnrdoesmpwdo This is not to say that inheritance is fundamentally a performance obstacle.We must make a distinction between the overall c computational cost is the set of all instructin compaona pcnaltyThe ove that subse r implementation.1o make the pount more concrete,ers t he class as n class SimpleMutex public: SimpleMutex(pthread mutex t&lock):myLock(lock)(acquire();) -simpleMutex()(release();) int release()(return pthread_mutex_unlock(smyLock);) pthread_mutex_t&myLock; The simpleMutex constructor harbors the following overall computational cost: .Initialize the myLock member Call the acquire()method Invoke pthread_mutex_lock (with the acquire()method member,is computation TearTFly
17 More often than not, the creation and destruction of objects will exact a performance penalty. In an inheritance hierarchy, the creation of an object will trigger the creation of its ancestors. The same goes for object destruction. It follows that the cost associated with an object is directly related to the length and complexity of its derivation chain. The number of objects created (and later destroyed) is proportional to the complexity of the derivation. This is not to say that inheritance is fundamentally a performance obstacle. We must make a distinction between the overall computational cost, required cost, and computational penalty. The overall computational cost is the set of all instructions executed in a computation. The required cost is that subset of instructions whose results are necessary. This part of the computation is mandatory; computational penalty is the rest. This is the part of the computation that could have been eliminated by an alternative design or implementation. To make the point more concrete, let's look at the SimpleMutex class as an example: class SimpleMutex { public: SimpleMutex(pthread_mutex_t& lock) : myLock(lock) {acquire();} ~SimpleMutex() {release();} private: int acquire() {return pthread_mutex_lock(&myLock);} int release() {return pthread_mutex_unlock(&myLock);} pthread_mutex_t& myLock; }; The SimpleMutex constructor harbors the following overall computational cost: • Initialize the myLock member • Call the acquire() method • Invoke pthread_mutex_lock() with the acquire() method The third item is the required cost. Regardless of the design options, one way or another, you would have to call pthread_mutex_lock() in order to lock the resource. The first item, setting the myLock member, is computational penalty. It is our object-based design that forced us to do that. Failing to inline the acquire() call would result in additional penalty. In practice, more than likely compilers will eliminate that penalty by inlining. TEAMFLY Team-Fly®
So we cannot make a blanket statement that complex inheritance designs are necessarily bad,nor do they it isall required cost.In practice.inheritance hierarchies are not likely to be perfectn that case,they are likely to impose a computational penalty. Composition Like inheritance,object composition raises similar performance concems with regard to object creation nan object is created (destroyed).all its contained member objects must be creaed instance,the Trace object discussed in the previous cl apter contains a member class Trace string theFunctionName; 1; When you construct an object of typeace,the constructor invokes thesng constructor to initialize the string data member.Ihis beh s a me e tn on of B obje igger tha attributes,the composition hierarchy generates a simple list The overall cost of composition is ther efore related to the funetionofhe coned obet.then there is noverhead.It is just what you have to do.On the othe hand,the Trace example (as shown in the previous chapter)is one example where"ost"and"overhead dthe power of 0 the name of replaced it with a cha inter.Achar pointer is much cheaper to construct than.Sadly there are segments of the C++programming community that seem toequ te failure to use the antra:"Use n唱ap The creation and destruction of contained objects is another issue worth consideration.There is no way for ct is created (destroyed). omatically Impose ile class Trace (const char name) private string theFunctionName; 伊
18 So we cannot make a blanket statement that complex inheritance designs are necessarily bad, nor do they always carry a performance penalty. All we can say is that overall cost grows with the size of the derivation tree. If all those computations are valuable then it is all required cost. In practice, inheritance hierarchies are not likely to be perfect.[3] In that case, they are likely to impose a computational penalty. [3] In this context, software perfection means you compute what you need, all of what you need, and nothing but what you need. Composition Like inheritance, object composition raises similar performance concerns with regard to object creation and destruction. When an object is created (destroyed), all its contained member objects must be created (destroyed) as well. For instance, the Trace object discussed in the previous chapter contains a member object of class string. class Trace { ... string theFunctionName; }; When you construct an object of type Trace, the constructor invokes the string constructor to initialize the string data member. This behavior is recursive: If class A contains a member of class B and B contains a C, the constructor for A will invoke the construction of a B object which, in turn, will trigger that of a C object. Since at each level in the composition (containment) hierarchy there may be multiple attributes, the composition hierarchy generates a tree, not a simple list. The overall cost of composition is therefore related to the size of this composition tree, which can become quite large. Again, we must emphasize that "cost" does not necessarily mean "overhead." If the program needs the full-blown functionality of the contained object, then there is no overhead. It is just what you have to do. On the other hand, the Trace example (as shown in the previous chapter) is one example where "cost" and "overhead" actually coincided. We did not need the power of a string object to represent the name of the function that is being traced. We never did anything too sophisticated with that object. We easily could have replaced it with a char pointer. A char pointer is much cheaper to construct than a string object. Sadly, there are segments of the C++ programming community that seem to equate failure to use the more complex data types of C++ with a lack of programming sophistication, which results in overkill rather than adherence to a prime mantra: "Use a solution that is as simple as possible and no simpler." You can imagine that in complex hierarchies with large derivation and composition trees, the cost of constructing and destroying an object can skyrocket. This is something to keep in mind during the design phase if there is a good chance that this hierarchy may come to life during a performance-sensitive flow. The creation and destruction of contained objects is another issue worth consideration. There is no way for you to prevent the creation (destruction) of subobjects when the containing object is created (destroyed). This is automatically imposed by the compiler. Take our earlier Trace example: class Trace { public: Trace (const char *name); ... private: string theFunctionName; };
The creation of arace object will create the string subobject.The Trace destructor will similarly we can replace class Trace pub Trace (const char *name); private string *theFunctionNamet ) Now we are in command of creation and destruction of the string obiect.We still have the option of performing a full-blown initialization.This form will construct a newsn object and set a pointer to it: Trace::Trace(char *name):theFunctionName(new string(name)) We also have the option of a partial initialization.Weassign an invalid default value to these pointers were rarely us Trace:Trace (const char *name):theFunctionName(0) -new string(name) The first and second form were very inefficient because ofthe usage pattern.In the typical scenario tracing construct a The third fom was b dictates which fom is most efficient Ifour sage nattern was such that tracing is always on.the first form CSoatimabobictiotpoiteSlwoldbebecstt ould be more efficient to embed the st ng object in cace object since ume stac and is freeddurng the stack clanu part of a function cal retum. Lazy Construction Performance optimization often has to strike a delicate balance between competing forces.This is perhaps why it is refe ea to pe ance maxim ls such cost.and reuse must often give way to the demand for performance.It is unusual when a performance fix. n an otherwise high-quality pice c does no f mpromise any other softw are development goa ithout any sacrifice.The first optimization eare abou to show isone such the creation (and eventual destruction)of objects from code paths where they are never used g
19 The creation of a Trace object will create the string subobject. The Trace destructor will similarly destroy it. It is automatic and you cannot prevent it with this implementation. To gain better control over the creation and destruction of a subobject, we can replace it with a pointer: class Trace { public: Trace (const char *name); // ... private: string *theFunctionName; }; Now we are in command of creation and destruction of the string object. We still have the option of performing a full-blown initialization. This form will construct a new string object and set a pointer to it: Trace::Trace(char *name) : theFunctionName(new string(name)) { ... } We also have the option of a partial initialization. We assign an invalid default value to these pointers indicating that these objects must be constructed prior to being used. Our anticipated usage pattern was such that tracing was normally off and the Trace objects created were rarely used. Minimizing the overhead of creation and destruction was therefore critical: Trace::Trace (const char *name) : theFunctionName(0) { if (traceIsActive) { theFunctionName = new string(name); ... } } The first and second form were very inefficient because of the usage pattern. In the typical scenario tracing would be off and the effort to construct a string object is worthless. The third form was best as assigning zero to a pointer is cheaper than constructing a brand-new object. Note that it is the usage pattern that dictates which form is most efficient. If our usage pattern was such that tracing is always on, the first form (containing subobjects, not pointers) would be best. It would be more efficient to embed the string object in the Trace object since it would consume stack memory as opposed to heap memory. Heap memory is far more expensive to allocate and free. Stack-based memory is allocated at compile time and is freed during the stack cleanup part of a function call return. Lazy Construction Performance optimization often has to strike a delicate balance between competing forces. This is perhaps why it is referred to as optimization as opposed to performance maximization. Performance optimization often requires the sacrifice of some other software goal. Important goals such as flexibility, maintainability, cost, and reuse must often give way to the demand for performance. It is unusual when a performance fix, in an otherwise high-quality piece of code, does not compromise any other software development goals. Sometimes we are fortunate when the elimination of simple coding mistakes results in higher performance without any sacrifice. The first optimization we are about to show is one such example. It eliminates the creation (and eventual destruction) of objects from code paths where they are never used
版点e routine. void myFunction() int i: intji compute(i,j) return; In C++.that habit of aut atically defir bcec5thayoucndpotusngyTsh all obied ens in r that allocated and deallocated memory on the fly: class DataPacket int sz):size(sz)( buffer new char[s2]; private members 1; and dealloca thd ne西om"sud,doo void routeData(char *data,int size) s12e) on ( if'(UPSTREAM - ion //data going upstream else uteSomething(packet) Object packet was used if and only if data was going upstream,which was 50%of the time.When data headed dowr nthe ssope.Half the time.this was a perfect waste of computin cvcles packet was not used at al pac 20
20 If you are going to instantiate an object in a performance-critical path, you ought to consider the cost factors. The cheapest object, however, is the one that's never instantiated. In C, Pascal, and other popular languages, data types must be defined at the beginning of a code block. We get into the habit of defining all the variables needed in a routine right up front at the beginning of that routine: void myFunction() { int i; int j; ... compute(i,j) ... return; } In C++, that habit of automatically defining all objects up front could be wasteful—you may construct objects that you end up not using. This happens in practice. In our C++ code we had a class DataPacket that allocated and deallocated memory on the fly: class DataPacket { public: DataPacket(char *data, int sz) : size(sz) { buffer = new char[sz]; ... } ~DataPacket() { delete [] buffer; ... } ... // other public member functions private: char *buffer; ... // other private members }; Partly due to memory allocation and deallocation, DataPacket was expensive to construct and destroy. Its cost was upwards of 400 instructions, which was significant in our context. It was used in a performance-critical path that routed data from one adapter to another: void routeData(char *data, int size) { DataPacket packet(data, size); bool direction = get_direction(); ... // Some computation if (UPSTREAM == direction) {// data going upstream computeSomething(packet); } else { // data going downstream ... // packet is not used in this scope. } } Object packet was used if and only if data was going upstream, which was 50% of the time. When data headed downstream, packet was not used at all. Object packet, however, was constructed unconditionally at the beginning of the scope. Half the time, this was a perfect waste of computing cycles
The solution here,as you can imagine,was very easy.Object packet should be created if and only if it is actually needed.The definition of packet ought to be moved inside the scope that uses it,when data is going upstream yoid routeData(chardata,int) /Delete definition of packet bool direction -get_direction(); if(UPSTREAM-direction)(/data going upstream Datapacket packet (data,intion of packet /packe no ue this scope. 1 solete C dec happen in practice.Part of beinga better Cis its varable creation It is worth noting that the definitionof able at the beginning of any scope are allc ed in C as well.It taken with regard to the placement of object definitions. Redundant Construction Class person contains an obiect of class string Person (const char *s)(name -s;)/version 0 st ing name ) Consider the following implementation of theeson default constructor Person (const char *s)(name -s; The eson object must be initialized before the body of the erson constructor is executed.The con ined string object does not appear in the Person:Person(char*)initializa ation list (in this uted The
21 The solution here, as you can imagine, was very easy. Object packet should be created if and only if it is actually needed. The definition of packet ought to be moved inside the scope that uses it, when data is going upstream: void routeData(char *data, int size) { ... // Delete definition of packet bool direction = get_direction(); ... if (UPSTREAM == direction) {// data going upstream DataPacket packet(data, size); // Add definition of packet // here... computeSomething(packet); } else { // data going downstream ... // packet is not used in this scope. } } Although the practice of delaying variable creation until its first usage has been preached by the C++ language experts since the language inception, this coding mistake was detected in real product code. The compatibility between C++ and C has been touted as one of C++'s advantages, but this is an area where that compatibility creates stylistic discontinuity. Adherence to the now obsolete C declaration syntax in C++ can have significant performance costs. Unfortunately, it is a fact that such trivial mistakes often do happen in practice. Part of C++'s being a better C is its ability to delay variable creation. It is worth noting that the definition of variables at the beginning of any scope are allowed in C as well. It is just that C programmers tend to define all of the variables right at the beginning of the function scope, since there is no run-time cost in a C variable definition. This is not the case with C++ where care must be taken with regard to the placement of object definitions. Redundant Construction Along the lines of simple but costly coding mistakes, here's another example of pure overhead. It is the double construction of a contained object [Mey97 item 12], [Lip91]. Class Person contains an object of class string: class Person { public: Person (const char *s) { name = s; }// Version 0 ... private: string name; }; Consider the following implementation of the Person default constructor: Person (const char *s) { name = s; } The Person object must be initialized before the body of the Person constructor is executed. The contained string object does not appear in the Person::Person(char*) initialization list (in this case there was none). Hence, the compiler must insert a call to invoke the string default constructor, to initialize the string member object properly. The body of the Person constructor is then executed. The