ay n om Create a tempo from the char pointer retumed bya( 。Destroy all three s+ri from the ebug (call So we go to all this trouble to construct three temporarystng objects,and proceed to drop them all over the floor when w out that trace is ctive.The overhead of creating and de stroying those ould easily hed figment of our imagination.We have ac ctually e rienced it in a real oryofanatcnproadacgcanctodeteinto ofa half million lines of C++code.Our first attempt backfired due to atrocious performance. Our Initial Trace Implementation int myFunction(int x) "myFunction" string moreInfo -"more interesting info"; debug (moreInfo) );//Trace destructor logs exit event to an output stream To enable this usage we started out with the following implementation class Trace public ce (const string sname); Hebug (const string smsg) string theFunctionName; ) The Trace constructor stores the function's name. :ace(cons ring)thePunctionlame (nAme) if (TraceIsActive)( cout <"Enter function"<<name <endli Additional information messages are logged via calls to the debug (method
2 • Create a temporary string object from "x = " • Call itoa(x) • Create a temporary string object from the char pointer returned by itoa() • Concatenate the preceding string objects to create a third temporary string • Destroy all three string temporaries after returning from the debug() call So we go to all this trouble to construct three temporary string objects, and proceed to drop them all over the floor when we find out that trace is inactive. The overhead of creating and destroying those string and Trace objects is at best hundreds of instructions. In typical OO code where functions are short and call frequencies are high, trace overhead could easily degrade performance by an order of magnitude. This is not a farfetched figment of our imagination. We have actually experienced it in a reallife product implementation. It is an educational experience to delve into this particular horror story in more detail. It is the story of an attempt to add tracing capability to a complex product consisting of a halfmillion lines of C++ code. Our first attempt backfired due to atrocious performance. Our Initial Trace Implementation Our intent was to have the trace object log event messages such as entering a function, leaving a function, and possibly other information of interest between those two events. int myFunction(int x) { string name = "myFunction"; Trace t(name); ... string moreInfo = "more interesting info"; t.debug(moreInfo); ... }; // Trace destructor logs exit event to an output stream To enable this usage we started out with the following Trace implementation: class Trace { public: Trace (const string &name); ~Trace (); void debug (const string &msg); static bool traceIsActive; private: string theFunctionName; }; The Trace constructor stores the function's name. inline Trace::Trace(const string &name) : theFunctionName(name) { if (TraceIsActive) { cout << "Enter function" << name << endl; } } Additional information messages are logged via calls to the debug() method
Trace :Trace() if (traceIsActive) cout <"Exit function "<theFunctionName <endl; Once theTrace was designed,coded,and tested,it was deployed and quickly inserted into a large part of io biccts has slowed dow that 2060fa e by a factor of five.We are talking about the case when tracing was off and performance was supposed to be unaffected What Went Wrong ance depending on their respective experience ·/Ois expensive. Our initialace implementation has adhered to all three of these principles.We avoided /O iftracing was off,all methods were inli and all strin arguments were passed by reterence stuck b tive wisdom of the previous ue fel short of t ggests that the dominant issue The of a Trace object is to log function entry and exit: int myPunction(int x) This minimal trace invokes a sequence of computations: .Create the string name local to myFunction
3 inline void Trace::debug(const string &msg) { if (TraceIsActive) { cout << msg << endl; } } inline Trace::~Trace() { if (traceIsActive) { cout << "Exit function " << theFunctionName << endl; } } Once the Trace was designed, coded, and tested, it was deployed and quickly inserted into a large part of the code. Trace objects popped up in most of the functions on the critical execution path. On a subsequent performance test we were shocked to discover that performance plummeted to 20% of its previous level. The insertion of Trace objects has slowed down performance by a factor of five. We are talking about the case when tracing was off and performance was supposed to be unaffected. What Went Wrong Programmers may have different views on C++ performance depending on their respective experiences. But there are a few basic principles that we all agree on: • I/O is expensive. • Function call overhead is a factor so we should inline short, frequently called functions. • Copying objects is expensive. Prefer pass-by-reference over pass-by-value. Our initial Trace implementation has adhered to all three of these principles. We avoided I/O if tracing was off, all methods were inlined, and all string arguments were passed by reference. We stuck by the rules and yet we got blindsided. Obviously, the collective wisdom of the previous rules fell short of the expertise required to develop high-performance C++. Our experience suggests that the dominant issue in C++ performance is not covered by these three principles. It is the creation (and eventual destruction) of unnecessary objects that were created in anticipation of being used but are not. The Trace implementation is an example of the devastating effect of useless objects on performance, evident even in the simplest use of a Trace object. The minimal usage of a Trace object is to log function entry and exit: int myFunction(int x) { string name = "myFunction"; Trace t(name); ... }; This minimal trace invokes a sequence of computations: • Create the string name local to myFunction. • Invoke the Trace constructor. • The Trace constructor invokes the string constructor to create the member string
At the end of the scope.which coincides with the end of the function,the Trace and twostng objects are destroyed 。Destroy the string .The Trace destructor invokes the string destructor for the memberstring. When tracing isoff r object never gets used.Yo pure v tracing is off This was supposed to be the fast lane. int addone (intx /version 0 return x+1; and measured execution time: int main() iveacing o /Start timing o,a86n /stop timing Next.we added aae object to addone and measured again to evaluate the performance delta This is Version 1(see Figure 1): int addone (intx) /version 1.Introducing a Trace object return x 1; Figure 1.1.The performance cost of the Trace object
4 At the end of the scope, which coincides with the end of the function, the Trace and two string objects are destroyed: • Destroy the string name. • Invoke the Trace destructor. • The Trace destructor invokes the string destructor for the member string. When tracing is off, the string member object never gets used. You could also make the case that the Trace object itself is not of much use either (when tracing is off). All the computational effort that goes into the creation and destruction of those objects is a pure waste. Keep in mind that this is the cost when tracing is off. This was supposed to be the fast lane. So how expensive does it get? For a baseline measurement, we timed the execution of a million iterations of the function addOne(): int addOne(int x) // Version 0 { return x+1; } As you can tell, addOne() doesn't do much, which is exactly the point of a baseline. We are trying to isolate the performance factors one at a time. Our main() function invoked addOne() a million times and measured execution time: int main() { Trace::traceIsActive = false;//Turn tracing off //... GetSystemTime(&t1); // Start timing for (i = 0; i < j; i++) { y = addOne(i); } GetSystemTime(&t2); // Stop timing // ... } Next, we added a Trace object to addOne and measured again to evaluate the performance delta. This is Version 1 (see Figure 1.1): int addOne(int x) // Version 1. Introducing a Trace object { string name = "addOne"; Trace t(name); return x + 1; } Figure 1.1. The performance cost of the Trace object
3,500 2 55 0 Version 0 Version 1 The cost of the for loon has skvrocketed from 55 ms to 3 500 ms In othe ords the eed of addone has plummeted by a factor of more than 60.This kind of overhead will wreak havoc on the performance of any ware.The The Recovery Plan The performance recovery plan was to eliminate objects and computations whose values get dropped when a pointer int addone(int x) return x+l; theFunctionName (name)//Version 2 Similarly,the Trace::debug()method was modified as well to acceptaochar as an input argument instead of asng.Now we don't have to create the namenprior to creating the
5 The cost of the for loop has skyrocketed from 55 ms to 3,500 ms. In other words, the speed of addOne has plummeted by a factor of more than 60. This kind of overhead will wreak havoc on the performance of any software. The cost of our tracing implementation was clearly unacceptable. But eliminating the tracing mechanism altogether was not an option—we had to have some form of tracing functionality. We had to regroup and come up with a more efficient implementation. The Recovery Plan The performance recovery plan was to eliminate objects and computations whose values get dropped when tracing is off. We started with the string argument created by addOne and given to the Trace constructor. We modified the function name argument from a string object to a plain char pointer: int addOne(int x) // Version 2. Forget the string object. // Use a char pointer instead. { char *name = "addOne"; Trace t(name); return x+1; } Along with that modification, we had to modify the Trace constructor itself to take a char pointer argument instead of a string reference: inline Trace::Trace(const char *name) : theFunctionName(name)// Version 2 { if (traceIsActive) { cout << "Enter function" << name << endl; } } Similarly, the Trace::debug() method was modified as well to accept a const *char as an input argument instead of a string. Now we don't have to create the name string prior to creating the
Trace object- ne less object toworry about.This translated into a performance boost,as was evident in our measurement Execution time dropped from 3,500 ms to 2,500 ms (see Figure 12). Figure 1.2.Impact of eliminating one string obiect 3,500 2,500 3 2 0 Version 0 Version 1 Version 2 ngobject with a plain char pointer.A char pointer gets"constructed"by a simple assignment- that's chea oth solution is to use composition instead of aggregation.Instead o em edding a pointer over obect is that we can delay ereation of theafter we have verified that tracing was on.We opted to take that route: //Version 3.Use a string pointer Trace (const char *name):theFunctionName(0) if (traceIsActive)( theFunctionName new string(name)i 1 private: string *theFunctionName Thedestructor must also be modified to delete inline Trace::~Trace() ction "<*theFunctionName <endl; delete theFunctionName; 6
6 Trace object—one less object to worry about. This translated into a performance boost, as was evident in our measurement. Execution time dropped from 3,500 ms to 2,500 ms (see Figure 1.2). Figure 1.2. Impact of eliminating one string object. The second step is to eliminate the unconditional creation of the string member object contained within the Trace object. From a performance perspective we have two equivalent solutions. One is to replace the string object with a plain char pointer. A char pointer gets "constructed" by a simple assignment— that's cheap. The other solution is to use composition instead of aggregation. Instead of embedding a string subobject in the Trace object, we could replace it with a string pointer. The advantage of a string pointer over a string object is that we can delay creation of the string after we have verified that tracing was on. We opted to take that route: class Trace { //Version 3. Use a string pointer public: Trace (const char *name) : theFunctionName(0) { if (traceIsActive) { // Conditional creation cout << "Enter function" << name < endl; theFunctionName = new string(name); } } ... private: string *theFunctionName; }; The Trace destructor must also be modified to delete the string pointer: inline Trace::~Trace() { if (traceIsActive) { cout << "Exit function " << *theFunctionName << endl; delete theFunctionName;