Software and the Concurrency Revolution

Leveraging the full power of multicore processors demands new tools and new thinking from the software industry.

HERB SUTTER AND JAMES LARUS, MICROSOFT

Concurrency has long been touted as the “next big thing” and “the way of the future,” but for the past 30 years, mainstream software development has been able to ignore it. Our parallel future has finally arrived: new machines will be parallel machines, and this will require major changes in the way we develop software. The introductory article in this issue (“The Future of Microprocessors” by Kunle Olukotun and Lance Hammond) describes the hardware imperatives behind this shift in computer architecture from uniprocessors to multicore processors, also known as CMPs (chip multiprocessors). (For related analysis, see “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software.”1)

In this article we focus on the implications of concurrency for software and its consequences for both programming languages and programmers.

The hardware changes that Olukotun and Hammond describe represent a fundamental shift in computing. For the past three decades, improvements in semiconductor fabrication and processor implementation produced steady increases in the speed at which computers executed existing sequential programs. The architectural changes in multicore processors benefit only concurrent applications and therefore have little value for most existing mainstream software. For the foreseeable future, today’s desktop applications will
not run much faster than they do now. In fact, they may run slightly slower on newer chips, as individual cores become simpler and run at lower clock speeds to reduce power consumption on dense multicore processors.

That brings us to a fundamental turning point in software development, at least for mainstream software. Computers will continue to become more and more capable, but programs can no longer simply ride the hardware wave of increasing performance unless they are highly concurrent.

Although multicore performance is the forcing function, we have other reasons to want concurrency: notably, to improve responsiveness by performing work asynchronously instead of synchronously. For example, today’s applications must move work off the GUI thread so it can redraw the screen while a computation runs in the background.

But concurrency is hard. Not only are today’s languages and tools inadequate to transform applications into parallel programs, but also it is difficult to find parallelism in mainstream applications, and—worst of all—concurrency requires programmers to think in a way humans find difficult.

Nevertheless, multicore machines are the future, and we must figure out how to program them. The rest of this article delves into some of the reasons why it is hard, and some possible directions for solutions.

CONSEQUENCES: A NEW ERA IN SOFTWARE

Today’s concurrent programming languages and tools are at a level comparable to sequential programming at the beginning of the structured programming era. Semaphores and coroutines are the assembler of concurrency, and locks and threads are the slightly higher-level structured constructs of concurrency. What we need is OO for concurrency—higher-level abstractions that help build concurrent programs, just as object-oriented abstractions help build large componentized programs.

For several reasons, the concurrency revolution is likely to be more disruptive than the OO revolution. First, concurrency will be integral to higher performance. Languages such as C ignored OO and remained usable for many programs. If concurrency becomes the sole path to higher-performance hardware, commercial and systems programming languages will be valued on their support for concurrent programming. Existing languages, such as C, will gain concurrent features beyond simple models such as pthreads. Languages that fail to support concurrent programming will gradually die away and remain useful only when modern hardware is unimportant.

The second reason that concurrency will be more disruptive than OO is that, although sequential programming is hard, concurrent programming is demonstrably more difficult. For example, context-sensitive analysis of sequential programs is a fundamental technique for taking calling contexts into account when analyzing a program. Concurrent programs also require synchronization analysis, but simultaneously performing both analyses is provably undecidable.2

Finally, humans are quickly overwhelmed by concurrency and find it much more difficult to reason about concurrent than sequential code. Even careful people miss possible interleavings among simple collections of partially ordered operations.
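
To see how easily interleavings slip past review, consider a minimal sketch (a hypothetical illustration in modern Java, not an example from the article): two threads each increment a shared counter 10,000 times. The increment looks atomic on the page, but it is a read, an add, and a write, and the threads can interleave between those steps, silently losing updates.

// Hypothetical illustration: the final total is usually less than 20,000
// because counter++ is not atomic and updates from the two threads are lost.
public class LostUpdate {
    static int counter = 0;  // shared and deliberately unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter++;  // three steps: read, add, write -- threads can interleave between them
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("Expected 20000, got " + counter);
    }
}

Running it typically prints a total well short of 20,000, and the shortfall varies from run to run.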
DIFFERENCES BETWEEN CLIENT AND SERVER APPLICATIONS

Concurrency is a challenging issue for client-side applications. For many server-based programs, however, concurrency is a “solved problem,” in that we routinely architect concurrent solutions that work well, although programming them and ensuring they scale can still require a huge effort. These applications typically have an abundance of parallelism, as they simultaneously handle many independent request streams. For example, a Web server or Web site independently executes thousands of copies of the same code on mostly nonoverlapping data.

In addition, these executions are well isolated and share state via an abstract data store, such as a database that supports highly concurrent access to structured data. The net effect is that code that shares data through a database can keep its “peaceful easy feeling”—the illusion of living in a tidy, single-threaded universe.

The world of client applications is not nearly as well structured and regular. A typical client application executes a relatively small computation on behalf of a single user, so concurrency is found by dividing a computation into finer pieces. These pieces, say the user interface and program’s computation, interact and share data in myriad ways. What makes this type of program difficult
to execute concurrently are nonhomogeneous code; fine-grained, complicated interactions; and pointer-based data structures.

PROGRAMMING MODELS

Today, you can express parallelism in a number of different ways, each applicable to only a subset of programs. In many cases, it is difficult, without careful design and analysis, to know in advance which model is appropriate for a particular problem, and it is always tricky to combine several models when a given application does not fit cleanly into a single paradigm.

These parallel programming models differ significantly in two dimensions: the granularity of the parallel operations and the degree of coupling between these tasks. Different points in this space favor different programming models, so let’s examine these axes in turn.

Operations executed in parallel can range from single instructions, such as addition or multiplication, to complex programs that take hours or days to run. Obviously, for small operations, the overhead costs of the parallel infrastructure are significant; for example, parallel instruction execution generally requires hardware support. Multicore processors reduce communication and synchronization costs, as compared with conventional multiprocessors, which can reduce the overhead burden on smaller pieces of code. Still, in general, the finer grained the task, the more attention must be paid to the cost of spawning it as a separate task and providing its communication and synchronization.

The other dimension is the degree of coupling in the communication and synchronization between the operations. The ideal is none: operations run entirely independently and produce distinct outputs. In this case, the operations can run in any order, incur no synchronization or communications costs, and are easily programmed without the possibility of data races. This state of affairs is rare, as most concurrent programs share data among their operations. The complexity of ensuring correct and efficient operation increases as the operations become more diverse. The easiest case is executing the same code for each operation. This type of sharing is often regular and can be understood by analyzing only a single task. More challenging is irregular parallelism, in which the operations are distinct and the sharing patterns are more difficult to comprehend.

INDEPENDENT PARALLELISM

Perhaps the simplest and best-behaved model is independent parallelism (sometimes called “embarrassingly parallel tasks”), in which one or more operations are applied independently to each item in a data collection.

Fine-grained data parallelism relies on the independence of the operations executed concurrently. They should not share input data or results and should be executable without coordination. For example:

   double A[100][100];
   …
   A = A * 2;

multiplies each element of a 100x100 array by 2 and stores the result in the same array location. Each of the 10,000 multiplications proceeds independently and without coordination with its peers. This is probably more concurrency than necessary for most computers, and its granularity is very fine, so a more practical approach would partition the matrix into n x m blocks and execute the operations on the blocks concurrently.

At the other end of the granularity axis, some applications, such as search engines, share only a large read-only database, so concurrently processing queries requires no coordination. Similarly, large simulations, which require many runs to explore a large space of input parameters, are another embarrassingly parallel application.
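
As a concrete, purely illustrative rendering of the blocked approach described above, the following Java sketch scales the 100x100 array by 2 using one task per block of rows on a small fixed thread pool; the block size and pool size are arbitrary choices for the example, not recommendations from the article.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Sketch: scale a 100x100 array by 2, one task per block of rows rather than
// one task per element, so each task is coarse enough to be worth spawning.
public class BlockedScale {
    public static void main(String[] args) throws Exception {
        double[][] a = new double[100][100];
        for (int i = 0; i < 100; i++)
            for (int j = 0; j < 100; j++)
                a[i][j] = i + j;

        int blockRows = 25;                        // illustrative block size
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> tasks = new ArrayList<>();
        for (int start = 0; start < 100; start += blockRows) {
            final int lo = start;
            final int hi = Math.min(start + blockRows, 100);
            tasks.add(pool.submit(() -> {          // each block writes disjoint rows
                for (int i = lo; i < hi; i++)
                    for (int j = 0; j < 100; j++)
                        a[i][j] *= 2;
            }));
        }
        for (Future<?> f : tasks) f.get();         // wait for all blocks to finish
        pool.shutdown();
        System.out.println("a[0][0]=" + a[0][0] + ", a[99][99]=" + a[99][99]);
    }
}

Because the blocks write disjoint rows and share no intermediate results, no synchronization is needed beyond waiting for the tasks to complete.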
REGULAR PARALLELISM

The next step beyond independent parallelism is to apply the same operation to a collection of data when the computations are mutually dependent. An operation on one piece of data is dependent on another operation if there is communication or synchronization between the two operations.

For example, consider a stencil computation that replaces each point in an array with the average of its four nearest neighbors:

   A[i, j] = (A[i-1, j] + A[i, j-1] + A[i+1, j] + A[i, j+1]) / 4;

This computation requires careful coordination to ensure that an array location is read by its neighbors before being replaced by its average. If space is no concern, then the averages can be computed into a new array. In general, other more structured computation strategies, such as traversing the array in a diagonal wavefront, will produce the same result, with better cache locality and lower memory consumption.

Regular parallel programs may require synchronization or carefully orchestrated execution strategies to produce the correct results, but unlike general parallelism, the
code behind the operations can be analyzed to determine how to execute them concurrently and what data they share. This advantage is sometimes hypothetical, since program analysis is an imprecise discipline, and sufficiently complex programs are impossible for compilers to understand and restructure.

At the other end of the granularity axis, computations on a Web site are typically independent except for accesses to a common database. The computations run in parallel without a significant amount of coordination beyond the database transactions. This ensures that concurrent access to the same data is consistently resolved.

UNSTRUCTURED PARALLELISM

The most general, and least disciplined, form of parallelism is when the concurrent computations differ, so that their data accesses are not predictable and need to be coordinated through explicit synchronization. This is the form of parallelism most common in programs written using threads and explicit synchronization, in which each thread has a distinct role in the program. In general, it is difficult to say anything specific about this form of parallelism, except that conflicting data accesses in two threads need explicit synchronization; otherwise, the program will be nondeterministic.

THE PROBLEM OF SHARED STATE, AND WHY LOCKS AREN’T THE ANSWER

Another challenging aspect of unstructured parallelism is sharing unstructured state. A client application typically manipulates shared memory organized as unpredictably interconnected graphs of objects.

When two tasks try to access the same object, and one could modify its state, if we do nothing to coordinate the tasks, we have a data race. Races are bad, because the concurrent tasks can read and write inconsistent or corrupted values.

There is a rich variety of synchronization devices that can prevent races. The simplest of these is a lock. Each task that wants to access a piece of shared data must acquire the lock for that data, perform its computation, and then release the lock so other operations on the data can proceed. Unfortunately, although locks work, they pose serious problems for modern software development.

A fundamental problem with locks is that they are not composable. You can’t take two correct lock-based pieces of code, combine them, and know that the result is still correct. Modern software development relies on the ability to compose libraries into larger programs, and so it is a serious difficulty that we cannot build on lock-based components without examining their implementations.

The composability issue arises primarily from the possibility of deadlock. In its simplest form, deadlock happens when two locks might be acquired by two tasks in opposite order: task T1 takes lock L1, task T2 takes lock L2, and then T1 tries to take L2 while T2 tries to take L1. Both block forever. Because this can happen any time two locks can be taken in opposite order, calling into code you don’t control while holding a lock is a recipe for deadlock.

That is exactly what extensible frameworks do, however, as they call virtual functions while holding a lock. Today’s best-of-breed commercial application frameworks all do this, including the .NET Frameworks and the Java standard libraries. We have gotten away with it because developers aren’t yet writing lots of heavily concurrent programs that do frequent locking.
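
To make the ordering hazard concrete, here is a minimal sketch (hypothetical Java, not taken from any real framework) of the T1/L1, T2/L2 scenario described above; the sleep calls merely widen the window so the deadly interleaving is likely on most runs.

// Sketch of the classic lock-ordering deadlock: T1 holds L1 and waits for L2,
// while T2 holds L2 and waits for L1. Run it and both threads usually hang.
public class DeadlockSketch {
    static final Object l1 = new Object();
    static final Object l2 = new Object();

    public static void main(String[] args) {
        Thread t1 = new Thread(() -> {
            synchronized (l1) {                 // T1 takes L1
                pause(100);                     // give T2 time to take L2
                synchronized (l2) {             // ...then tries to take L2
                    System.out.println("T1 got both locks");
                }
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (l2) {                 // T2 takes L2
                pause(100);
                synchronized (l1) {             // ...then tries to take L1 -- opposite order
                    System.out.println("T2 got both locks");
                }
            }
        });
        t1.start();
        t2.start();
    }

    static void pause(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}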
Many complex models attempt to deal with the deadlock problem—with backoff-and-retry protocols, for example—but they require strict discipline by programmers, and some introduce their own problems (e.g., livelock).

Techniques for avoiding deadlock by guaranteeing that locks will always be acquired in a safe order do not compose, either. For example, lock leveling and lock hierarchies prevent programs from acquiring locks in conflicting order by requiring that all locks at a given level be acquired at once in a predetermined order, and that while holding locks at one level, you can acquire additional locks only at higher levels. Such techniques work inside a module or framework maintained by a team (although they’re underused in practice), but they assume control of an entire code base. That severely restricts their use in extensible frameworks, add-in systems, and other situations that bring together code written by different parties.

A more basic problem with locks is that they rely on programmers to strictly follow conventions. The relationship between a lock and the data that it protects is implicit, and it is preserved only through programmer discipline. A programmer must always remember to take the right lock at the right point before touching shared
data. Conventions governing locks in a program are sometimes written down, but they’re almost never stated precisely enough for a tool to check them.

Locks have other more subtle problems. Locking is a global program property, which is difficult to localize to a single procedure, class, or framework. All code that accesses a piece of shared state must know and obey the locking convention, regardless of who wrote the code or where it resides.

Attempts to make synchronization a local property do not work all the time. Consider a popular solution such as Java’s synchronized methods. Each of an object’s methods can take a lock on the object, so no two threads can directly manipulate the object’s state simultaneously. As long as an object’s state is accessed only by its methods and programmers remember to add the synchronized declaration, this approach works.

There are at least three major problems with synchronized methods. First, they are not appropriate for types whose methods call virtual functions on other objects (e.g., Java’s Vector and .NET’s SyncHashTable), because calling into third-party code while holding a lock opens the possibility of deadlock. Second, synchronized methods can perform too much locking, by acquiring and releasing locks on all object instances, even those never shared across threads (typically the majority). Third, synchronized methods can also perform too little locking, by not preserving atomicity when a program calls multiple methods on an object or on different objects. As a simple example of the latter, consider a banking transfer:

   account1.Credit(amount); account2.Debit(amount)

Per-object locking protects each call, but does not prevent another thread from seeing the inconsistent state of the two accounts between the calls. Operations of this type, whose atomicity does not correspond to a method call boundary, require additional, explicit synchronization.

LOCK ALTERNATIVES

For completeness, we note two major alternatives to locks. The first is lock-free programming. By relying on a deep knowledge of a processor’s memory model, it is possible to create data structures that can be shared without explicit locking. Lock-free programming is difficult and fragile; inventing a new lock-free data-structure implementation is still often a publishable result.

The second alternative is transactional memory, which brings the central idea of transactions from databases into programming languages. Programmers write their programs as a series of explicitly atomic blocks, which appear to execute indivisibly, so concurrently executing operations see the shared state strictly before or after an atomic action executes. Although many people view transactional memory as a promising direction, it is still a subject of active research.

WHAT WE NEED IN PROGRAMMING LANGUAGES

We need higher-level language abstractions, including evolutionary extensions to current imperative languages, so that existing applications can incrementally become concurrent. The programming model must make concurrency easy to understand and reason about, not only during initial development but also during maintenance.

EXPLICIT, IMPLICIT, AND AUTOMATIC PARALLELIZATION

Explicit programming models provide abstractions that require programmers to state exactly where concurrency can occur.
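
As one illustration of the explicit style, the following sketch (in modern Java, which postdates this article; the tasks and their sizes are made up for the example) marks exactly which two computations may run concurrently and exactly where their results are joined.

import java.util.concurrent.CompletableFuture;

// Sketch of an explicit model: the programmer states which calls may run
// concurrently (the two supplyAsync calls) and where the results join (thenCombine/join).
public class ExplicitTasks {
    public static void main(String[] args) {
        CompletableFuture<Long> sumOfSquares =
                CompletableFuture.supplyAsync(() -> squares(20_000));  // may run on another thread
        CompletableFuture<Long> sumOfCubes =
                CompletableFuture.supplyAsync(() -> cubes(20_000));    // may run concurrently with it

        long total = sumOfSquares.thenCombine(sumOfCubes, Long::sum).join();
        System.out.println("total = " + total);
    }

    static long squares(int n) {
        long s = 0;
        for (int i = 1; i <= n; i++) s += (long) i * i;
        return s;
    }

    static long cubes(int n) {
        long s = 0;
        for (int i = 1; i <= n; i++) s += (long) i * i * i;
        return s;
    }
}

Everything outside the two supplyAsync calls remains ordinary sequential code; the concurrency appears only where the programmer explicitly asked for it.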
The major advantage of expressing concurrency explicitly is that it allows programmers to take full advantage of their application domain knowledge and express the full potential concurrency in the application. It has drawbacks, however. It requires new higher-level programming abstractions and a higher level of programmer proficiency in the presence of shared data.

Implicit programming models hide concurrency inside libraries or behind APIs, so that a caller retains a sequential worldview while the library performs the work in parallel. This approach lets naïve programmers safely use concurrency. Its main drawback is that some kinds of concurrency-related performance gains can’t be realized this way. Also, it is difficult to design interfaces that do not expose the concurrency in any circumstance—for example, when a program applies the operation to several instances of the same data.

Another widely studied approach is automatic parallelization, where a compiler attempts to find parallelism, typically in programs written in a conventional language such as Fortran. As appealing as it may seem, this approach has not worked well in practice. Accurate program analysis is necessary to understand a program’s potential behavior. This analysis is challenging for simple languages such as Fortran, and far more difficult for languages, such as C, that manipulate pointer-based data. Moreover, sequential programs often use sequential algorithms and contain little concurrency.

IMPERATIVE AND FUNCTIONAL LANGUAGES

Popular commercial programming languages (e.g., Pascal, C, C++, Java, C#) are imperative languages in which a