chapter 3), synchronizing operations between threads (see chapter 4), and low-level atomic operations (see chapter 5).

The new C++ Thread Library is heavily based on the prior experience accumulated through the use of the C++ class libraries mentioned previously. In particular, the Boost Thread Library has been used as the primary model on which the new library is based, with many of the classes sharing their names and structure with the corresponding ones from Boost. As the new standard has evolved, this has been a two-way flow, and the Boost Thread Library has itself changed to match the C++ Standard in many respects, so users transitioning from Boost should find themselves very much at home.

Concurrency support is just one of the changes with the new C++ Standard—as mentioned at the beginning of this chapter, there are many enhancements to the language itself to make programmers' lives easier. Although these are generally outside the scope of this book, some of those changes have had a direct impact on the Thread Library itself and the ways in which it can be used. Appendix A provides a brief introduction to these language features.

The support for atomic operations directly in C++ enables programmers to write efficient code with defined semantics without the need for platform-specific assembly language. This is a real boon for those trying to write efficient, portable code; not only does the compiler take care of the platform specifics, but the optimizer can be written to take into account the semantics of the operations, thus enabling better optimization of the program as a whole.

1.3.3 Efficiency in the C++ Thread Library
One of the concerns that developers involved in high-performance computing often raise regarding C++ in general, and C++ classes that wrap low-level facilities (such as those in the new Standard C++ Thread Library) in particular, is that of efficiency. If you're after the utmost in performance, then it's important to understand the implementation costs associated with using any high-level facilities, compared to using the underlying low-level facilities directly. This cost is the abstraction penalty.

The C++ Standards Committee has been very aware of this when designing the C++ Standard Library in general and the Standard C++ Thread Library in particular; one of the design goals has been that there should be little or no benefit to be gained from using the lower-level APIs directly, where the same facility is to be provided. The library has therefore been designed to allow for efficient implementation (with a very low abstraction penalty) on most major platforms.

Another goal of the C++ Standards Committee has been to ensure that C++ provides sufficient low-level facilities for those wishing to work close to the metal for the ultimate performance. To this end, along with the new memory model comes a comprehensive atomic operations library for direct control over individual bits and bytes and the inter-thread synchronization and visibility of any changes. These atomic types and the corresponding operations can now be used in many places where developers would previously have chosen to drop down to platform-specific assembly language. Code using the new standard types and operations is thus more portable and easier to maintain.
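To make that concrete, here is a minimal sketch of my own (not one of the book's listings): two threads increment a shared counter declared as std::atomic<int>, so the increments are well defined without a mutex or any platform-specific code. The names counter and work are just for illustration:

    #include <atomic>
    #include <iostream>
    #include <thread>

    std::atomic<int> counter(0);     // shared between both threads

    void work()
    {
        for(int i=0;i<100000;++i)
            ++counter;               // atomic increment: no data race, no mutex needed
    }

    int main()
    {
        std::thread t1(work);
        std::thread t2(work);
        t1.join();
        t2.join();
        std::cout<<counter.load()<<"\n";   // always prints 200000
    }

Had counter been a plain int, the same program would contain a data race and therefore undefined behavior; the atomic type is what gives the increments their defined semantics.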
The C++ Standard Library also provides higher-level abstractions and facilities that make writing multithreaded code easier and less error prone. Sometimes the use of these facilities does come with a performance cost because of the additional code that must be executed. But this performance cost doesn't necessarily imply a higher abstraction penalty; in general the cost is no higher than would be incurred by writing equivalent functionality by hand, and the compiler may well inline much of the additional code anyway.

In some cases, the high-level facilities provide additional functionality beyond what may be required for a specific use. Most of the time this is not an issue: you don't pay for what you don't use. On rare occasions, this unused functionality will impact the performance of other code. If you're aiming for performance and the cost is too high, you may be better off handcrafting the desired functionality from lower-level facilities. In the vast majority of cases, the additional complexity and chance of errors far outweigh the potential benefits from a small performance gain. Even if profiling does demonstrate that the bottleneck is in the C++ Standard Library facilities, it may be due to poor application design rather than a poor library implementation. For example, if too many threads are competing for a mutex, it will impact the performance significantly. Rather than trying to shave a small fraction of time off the mutex operations, it would probably be more beneficial to restructure the application so that there's less contention on the mutex. Designing applications to reduce contention is covered in chapter 8.

In those very rare cases where the C++ Standard Library does not provide the performance or behavior required, it might be necessary to use platform-specific facilities.

1.3.4 Platform-specific facilities
Although the C++ Thread Library provides reasonably comprehensive facilities for multithreading and concurrency, on any given platform there will be platform-specific facilities that go beyond what's offered. In order to gain easy access to those facilities without giving up the benefits of using the Standard C++ Thread Library, the types in the C++ Thread Library may offer a native_handle() member function that allows the underlying implementation to be directly manipulated using a platform-specific API. By its very nature, any operations performed using the native_handle() are entirely platform dependent and out of the scope of this book (and the Standard C++ Library itself).
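As a rough illustration (again a sketch of my own, not from the book), the following assumes a pthreads-based implementation where native_handle() yields a pthread_t and where glibc's pthread_setname_np() is available; both the handle type and the calls you can make with it vary by platform:

    #include <pthread.h>   // platform-specific; may require _GNU_SOURCE on some setups
    #include <iostream>
    #include <thread>

    void work()
    {
        std::cout<<"working\n";
    }

    int main()
    {
        std::thread t(work);
        // Hand the underlying pthread_t to a platform API the standard doesn't cover:
        pthread_setname_np(t.native_handle(),"worker");   // glibc extension; name limited to 15 chars
        t.join();
    }

Once the handle has been passed to a platform API like this, the code is no longer portable, which is exactly why such operations fall outside the scope of both the Standard Library and this book.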
Of course, before even considering using platform-specific facilities, it's important to understand what the Standard Library provides, so let's get started with an example.

1.4 Getting started
OK, so you have a nice, shiny C++11-compatible compiler. What next? What does a multithreaded C++ program look like? It looks pretty much like any other C++ program, with the usual mix of variables, classes, and functions. The only real distinction is that some functions might be running concurrently, so you need to ensure that shared data is safe for concurrent access, as described in chapter 3. Of course, in order to run functions concurrently, specific functions and objects must be used to manage the different threads.

1.4.1 Hello, Concurrent World
Let's start with a classic example: a program to print "Hello World." A really simple Hello, World program that runs in a single thread is shown here, to serve as a baseline when we move to multiple threads:

    #include <iostream>

    int main()
    {
        std::cout<<"Hello World\n";
    }

All this program does is write "Hello World" to the standard output stream. Let's compare it to the simple Hello, Concurrent World program shown in the following listing, which starts a separate thread to display the message.

Listing 1.1 A simple Hello, Concurrent World program

    #include <iostream>
    #include <thread>

    void hello()
    {
        std::cout<<"Hello Concurrent World\n";
    }

    int main()
    {
        std::thread t(hello);
        t.join();
    }

The first difference is the extra #include <thread>. The declarations for the multithreading support in the Standard C++ Library are in new headers: the functions and classes for managing threads are declared in <thread>, whereas those for protecting shared data are declared in other headers.

Second, the code for writing the message has been moved to a separate function. This is because every thread has to have an initial function, which is where the new thread of execution begins. For the initial thread in an application, this is main(), but for every other thread it's specified in the constructor of a std::thread object—in
this case, the std::thread object named t has the new function hello() as its initial function.

This is the next difference: rather than just writing directly to standard output or calling hello() from main(), this program launches a whole new thread to do it, bringing the thread count to two—the initial thread that starts at main() and the new thread that starts at hello().

After the new thread has been launched, the initial thread continues execution. If it didn't wait for the new thread to finish, it would merrily continue to the end of main() and thus end the program—possibly before the new thread had had a chance to run. This is why the call to join() is there—as described in chapter 2, this causes the calling thread (in main()) to wait for the thread associated with the std::thread object, in this case, t.

If this seems like a lot of work to go to just to write a message to standard output, it is—as described previously in section 1.2.3, it's generally not worth the effort to use multiple threads for such a simple task, especially if the initial thread has nothing to do in the meantime. Later in the book, we'll work through examples that show scenarios where there's a clear gain to using multiple threads.

1.5 Summary
In this chapter, I covered what is meant by concurrency and multithreading and why you'd choose to use it (or not) in your applications. I also covered the history of multithreading in C++ from the complete lack of support in the 1998 standard, through various platform-specific extensions, to proper multithreading support in the new C++ Standard, C++11. This support is coming just in time to allow programmers to take advantage of the greater hardware concurrency becoming available with newer CPUs, as chip manufacturers choose to add more processing power in the form of multiple cores that allow more tasks to be executed concurrently, rather than increasing the execution speed of a single core.

I also showed how simple using the classes and functions from the C++ Standard Library can be, in the examples in section 1.4. In C++, using multiple threads isn't complicated in and of itself; the complexity lies in designing the code so that it behaves as intended.

After the taster examples of section 1.4, it's time for something with a bit more substance. In chapter 2 we'll look at the classes and functions available for managing threads.
Managing threads

This chapter covers
■ Starting threads, and various ways of specifying code to run on a new thread
■ Waiting for a thread to finish versus leaving it to run
■ Uniquely identifying threads

OK, so you've decided to use concurrency for your application. In particular, you've decided to use multiple threads. What now? How do you launch these threads, how do you check that they've finished, and how do you keep tabs on them? The C++ Standard Library makes most thread-management tasks relatively easy, with just about everything managed through the std::thread object associated with a given thread, as you'll see. For those tasks that aren't so straightforward, the library provides the flexibility to build what you need from the basic building blocks.

In this chapter, I'll start by covering the basics: launching a thread, waiting for it to finish, or running it in the background. We'll then proceed to look at passing additional parameters to the thread function when it's launched and how to transfer ownership of a thread from one std::thread object to another. Finally, we'll look at choosing the number of threads to use and identifying particular threads.
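As a taste of what's coming (a sketch of my own, not one of the chapter's listings), both of those operations go through the std::thread object itself: extra arguments are simply appended to the constructor call, and ownership is transferred with std::move. The function greet and its parameters are purely illustrative:

    #include <iostream>
    #include <string>
    #include <thread>
    #include <utility>

    void greet(std::string const& name,int times)
    {
        for(int i=0;i<times;++i)
            std::cout<<"Hello, "<<name<<"\n";
    }

    int main()
    {
        std::thread t1(greet,"world",2);   // extra constructor arguments are passed to greet
        std::thread t2=std::move(t1);      // ownership moves to t2; t1 no longer owns a thread
        t2.join();                         // only the object that owns the thread can join it
    }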