Acknowledgments

In the course of working on this book, I've received considerable help from many individuals. Among them I'd like to thank the reviewers who read and commented on the initial proposal: Fikret Ercal (Missouri University of Science and Technology), Dan Harvey (Southern Oregon University), Joel Hollingsworth (Elon University), Jens Mache (Lewis and Clark College), Don McLaughlin (West Virginia University), Manish Parashar (Rutgers University), Charlie Peck (Earlham College), Stephen C. Renk (North Central College), Rolfe Josef Sassenfeld (The University of Texas at El Paso), Joseph Sloan (Wofford College), Michela Taufer (University of Delaware), Pearl Wang (George Mason University), Bob Weems (University of Texas at Arlington), and Cheng-Zhong Xu (Wayne State University).

I'm also deeply grateful to the following individuals for their reviews of various chapters of the book: Duncan Buell (University of South Carolina), Matthias Gobbert (University of Maryland, Baltimore County), Krishna Kavi (University of North Texas), Hong Lin (University of Houston–Downtown), Kathy Liszka (University of Akron), Leigh Little (The State University of New York), Xinlian Liu (Hood College), Henry Tufo (University of Colorado at Boulder), Andrew Sloss (Consultant Engineer, ARM), and Gengbin Zheng (University of Illinois). Their comments and suggestions have made the book immeasurably better. Of course, I'm solely responsible for remaining errors and omissions.

Kathy Liszka is also preparing slides that can be used by faculty who adopt the text, and a former student, Jinyoung Choi, is working on preparing a solutions manual. Thanks to both of them.

The staff of Morgan Kaufmann has been very helpful throughout this project. I'm especially grateful to the developmental editor, Nate McFadden. He gave me much valuable advice, and he did a terrific job arranging for the reviews. He's also been tremendously patient with all the problems I've encountered over the past few years. Thanks also to Marilyn Rash and Megan Guiney, who have been very prompt and efficient during the production process.

My colleagues in the computer science and mathematics departments at USF have been extremely helpful during my work on the book. I'd like to single out Professor Gregory Benson for particular thanks: his understanding of parallel computing, especially Pthreads and semaphores, has been an invaluable resource for me. I'm also very grateful to our system administrators, Alexey Fedosov and Colin Bean. They've patiently and efficiently dealt with all of the "emergencies" that cropped up while I was working on programs for the book.

I would never have been able to finish this book without the encouragement and moral support of my friends Holly Cohn, John Dean, and Robert Miller. They helped me through some very difficult times, and I'll be eternally grateful to them.

My biggest debt is to my students. They showed me what was too easy, and what was far too difficult. In short, they taught me how to teach parallel computing. My deepest thanks to all of them.
About the Author

Peter Pacheco received a PhD in mathematics from Florida State University. After completing graduate school, he became one of the first professors in UCLA's "Program in Computing," which teaches basic computer science to students at the College of Letters and Sciences there. Since leaving UCLA, he has been on the faculty of the University of San Francisco. At USF Peter has served as chair of the computer science department and is currently chair of the mathematics department.

His research is in parallel scientific computing. He has worked on the development of parallel software for circuit simulation, speech recognition, and the simulation of large networks of biologically accurate neurons. Peter has been teaching parallel computing at both the undergraduate and graduate levels for nearly twenty years. He is the author of Parallel Programming with MPI, published by Morgan Kaufmann Publishers.
CHAPTER 1
Why Parallel Computing?

From 1986 to 2002 the performance of microprocessors increased, on average, 50% per year [27]. This unprecedented increase meant that users and software developers could often simply wait for the next generation of microprocessors in order to obtain increased performance from an application program. Since 2002, however, single-processor performance improvement has slowed to about 20% per year. This difference is dramatic: at 50% per year, performance will increase by almost a factor of 60 in 10 years, while at 20%, it will only increase by about a factor of 6.

Furthermore, this difference in performance increase has been associated with a dramatic change in processor design. By 2005, most of the major manufacturers of microprocessors had decided that the road to rapidly increasing performance lay in the direction of parallelism. Rather than trying to continue to develop ever-faster monolithic processors, manufacturers started putting multiple complete processors on a single integrated circuit.

This change has a very important consequence for software developers: simply adding more processors will not magically improve the performance of the vast majority of serial programs, that is, programs that were written to run on a single processor. Such programs are unaware of the existence of multiple processors, and the performance of such a program on a system with multiple processors will be effectively the same as its performance on a single processor of the multiprocessor system.

All of this raises a number of questions:

1. Why do we care? Aren't single processor systems fast enough? After all, 20% per year is still a pretty significant performance improvement.
2. Why can't microprocessor manufacturers continue to develop much faster single processor systems? Why build parallel systems? Why build systems with multiple processors?
3. Why can't we write programs that will automatically convert serial programs into parallel programs, that is, programs that take advantage of the presence of multiple processors?

Let's take a brief look at each of these questions. Keep in mind, though, that some of the answers aren't carved in stone. For example, 20% per year may be more than adequate for many applications.
1.1 WHY WE NEED EVER-INCREASING PERFORMANCE

The vast increases in computational power that we've been enjoying for decades now have been at the heart of many of the most dramatic advances in fields as diverse as science, the Internet, and entertainment. For example, decoding the human genome, ever more accurate medical imaging, astonishingly fast and accurate Web searches, and ever more realistic computer games would all have been impossible without these increases. Indeed, more recent increases in computational power would have been difficult, if not impossible, without earlier increases. But we can never rest on our laurels. As our computational power increases, the number of problems that we can seriously consider solving also increases. The following are a few examples:

- Climate modeling. In order to better understand climate change, we need far more accurate computer models, models that include interactions between the atmosphere, the oceans, solid land, and the ice caps at the poles. We also need to be able to make detailed studies of how various interventions might affect the global climate.
- Protein folding. It's believed that misfolded proteins may be involved in diseases such as Huntington's, Parkinson's, and Alzheimer's, but our ability to study configurations of complex molecules such as proteins is severely limited by our current computational power.
- Drug discovery. There are many ways in which increased computational power can be used in research into new medical treatments. For example, there are many drugs that are effective in treating a relatively small fraction of those suffering from some disease. It's possible that we can devise alternative treatments by careful analysis of the genomes of the individuals for whom the known treatment is ineffective. This, however, will involve extensive computational analysis of genomes.
- Energy research. Increased computational power will make it possible to program much more detailed models of technologies such as wind turbines, solar cells, and batteries. These programs may provide the information needed to construct far more efficient clean energy sources.
- Data analysis. We generate tremendous amounts of data. By some estimates, the quantity of data stored worldwide doubles every two years [28], but the vast majority of it is largely useless unless it's analyzed. As an example, knowing the sequence of nucleotides in human DNA is, by itself, of little use. Understanding how this sequence affects development and how it can cause disease requires extensive analysis. In addition to genomics, vast quantities of data are generated by particle colliders such as the Large Hadron Collider at CERN, medical imaging, astronomical research, and Web search engines, to name a few.

These and a host of other problems won't be solved without vast increases in computational power.
1.2 WHY WE'RE BUILDING PARALLEL SYSTEMS

Much of the tremendous increase in single processor performance has been driven by the ever-increasing density of transistors (the electronic switches) on integrated circuits. As the size of transistors decreases, their speed can be increased, and the overall speed of the integrated circuit can be increased. However, as the speed of transistors increases, their power consumption also increases. Most of this power is dissipated as heat, and when an integrated circuit gets too hot, it becomes unreliable. In the first decade of the twenty-first century, air-cooled integrated circuits are reaching the limits of their ability to dissipate heat [26].

Therefore, it is becoming impossible to continue to increase the speed of integrated circuits. However, the increase in transistor density can continue, at least for a while. Also, given the potential of computing to improve our existence, there is an almost moral imperative to continue to increase computational power. Finally, if the integrated circuit industry doesn't continue to bring out new and better products, it will effectively cease to exist.

How, then, can we exploit the continuing increase in transistor density? The answer is parallelism. Rather than building ever-faster, more complex, monolithic processors, the industry has decided to put multiple, relatively simple, complete processors on a single chip. Such integrated circuits are called multicore processors, and core has become synonymous with central processing unit, or CPU. In this setting a conventional processor with one CPU is often called a single-core system.

1.3 WHY WE NEED TO WRITE PARALLEL PROGRAMS

Most programs that have been written for conventional, single-core systems cannot exploit the presence of multiple cores. We can run multiple instances of a program on a multicore system, but this is often of little help. For example, being able to run multiple instances of our favorite game program isn't really what we want; we want the program to run faster with more realistic graphics. In order to do this, we need to either rewrite our serial programs so that they're parallel, so that they can make use of multiple cores, or write translation programs, that is, programs that will automatically convert serial programs into parallel programs. The bad news is that researchers have had very limited success writing programs that convert serial programs in languages such as C and C++ into parallel programs.

This isn't terribly surprising. While we can write programs that recognize common constructs in serial programs, and automatically translate these constructs into efficient parallel constructs, the sequence of parallel constructs may be terribly inefficient. For example, we can view the multiplication of two n x n matrices as a sequence of dot products, but parallelizing a matrix multiplication as a sequence of parallel dot products is likely to be very slow on many systems.
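To make the dot-product view concrete, the following is a minimal C sketch (not taken from the book; the fixed size N and the variable names are illustrative assumptions). Each entry C[i][j] is the dot product of row i of A with column j of B. A translator that parallelized each of these n*n short dot products separately would pay thread creation and synchronization costs n*n times, which suggests why such an approach tends to be slow.

    #include <stdio.h>

    #define N 4   /* illustrative matrix dimension */

    int main(void) {
       double A[N][N], B[N][N], C[N][N];
       int i, j, k;

       /* Fill A and B with some arbitrary sample values. */
       for (i = 0; i < N; i++)
          for (j = 0; j < N; j++) {
             A[i][j] = i + j;
             B[i][j] = i - j;
          }

       /* Serial matrix multiplication: the innermost loop computes the
          dot product of row i of A with column j of B, so the whole
          computation is a sequence of n*n dot products. */
       for (i = 0; i < N; i++)
          for (j = 0; j < N; j++) {
             C[i][j] = 0.0;
             for (k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
          }

       printf("C[0][0] = %f\n", C[0][0]);
       return 0;
    }

A more effective parallelization would typically distribute whole rows or blocks of C among the cores, so that each core does a substantial amount of work between coordination points rather than coordinating once per dot product.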