30 Concurrency, distribution, client-server and the Internet

Like humans, computers can team up with their peers to achieve results that none of them could obtain alone; unlike humans, they can do many things at once (or with the appearance of simultaneity), and do all of them well. So far, however, the discussion has implicitly assumed that the computation is sequential — proceeds along a single thread of control. We should now see what happens when this assumption no longer holds, as we move to concurrent (also known as parallel) computation.

Concurrency is not a new subject, but for a long time interest in it remained mostly confined to four application areas: operating systems, networking, implementation of database management systems, and high-speed scientific software. Although strategic and prestigious, these tasks involve only a small subset of the software development community.

Things have changed. Concurrency is quickly becoming a required component of just about every type of application, including some which had traditionally been thought of as fundamentally sequential in nature. Beyond mere concurrency, our systems, whether or not client-server, must increasingly become distributed over networks, including the network of networks — the Internet. This evolution gives particular urgency to the central question of this chapter: can we apply object-oriented ideas in a concurrent and distributed context?

Not only is this possible: object technology can help us develop concurrent and distributed applications simply and elegantly.

30.1 A SNEAK PREVIEW

As usual, this discussion will not throw a pre-cooked answer at you, but instead will carefully build a solution from a detailed analysis of the problem and an exploration of possible avenues, including a few dead ends.
Although necessary to make you understand the techniques in depth, this thoroughness might lead you to believe that they are complex; that would be inexcusable, since the concurrency mechanism on which we will finally settle is in fact characterized by almost incredible simplicity. To avoid this risk, we will begin by examining a summary of the mechanism, without any of the rationale.

Warning: SPOILER! If you hate “spoilers”, preferring to start with the full statement of the issues and to let the drama proceed to its dénouement step by step and inference by inference, ignore the one-page summary that follows and skip directly to the next section. (The next section is 30.2, page 953.)
952 CONCURRENCY, DISTRIBUTION, CLIENT-SERVER AND THE INTERNET §30.1

The extension covering full-fledged concurrency and distribution will be as minimal as it can get starting from a sequential notation: a single new keyword — separate. How is this possible? We use the fundamental scheme of O-O computation: feature call, x ● f (a), executed on behalf of some object O1 and calling f on the object O2 attached to x, with the argument a. But instead of a single processor that handles operations on all objects, we may now rely on different processors for O1 and O2 — so that the computation on O1 can move ahead without waiting for the call to terminate, since another processor handles it.

Because the effect of a call now depends on whether the objects are handled by the same processor or different ones, the software text must tell us unambiguously what the intent is for any x. Hence the need for the new keyword: rather than just x: SOME_TYPE, we declare x: separate SOME_TYPE to indicate that x is handled by a different processor, so that calls of target x can proceed in parallel with the rest of the computation. With such a declaration, any creation instruction !! x ● make (…) will spawn off a new processor — a new thread of control — to handle future calls on x.

Nowhere in the software text should we have to specify which processor to use. All we state, through the separate declaration, is that two objects are handled by different processors, since this radically affects the system’s semantics. Actual processor assignment can wait until run time. Nor do we settle too early on the exact nature of processors: a processor can be implemented by a piece of hardware (a computer), but just as well by a task (process) of the operating system, or, on a multithreaded OS, just a thread of such a task.
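In the notation just introduced, a separate declaration and the corresponding creation might look as follows. The class and feature names (LAUNCHER, PRINTER, start, make) are hypothetical, serving only to show where the one new keyword appears:

```eiffel
class LAUNCHER feature

    printer: separate PRINTER
            -- Declared separate: the attached object will be
            -- handled by a different processor.

    start
            -- Create a printer object on its own processor.
        do
            !! printer ● make
                -- Creating a separate object spawns off a new
                -- processor -- a new thread of control -- which
                -- will handle all future calls on printer; the
                -- current computation proceeds without waiting.
        end

end
```

Whether printer’s processor ends up being a thread, an operating system process or another computer is not written here; as explained below, that mapping is deferred to run time.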
Viewed by the software, “processor” is an abstract concept; you can execute the same concurrent application on widely different architectures (time-sharing on one computer, distributed network with many computers, threads within one Unix or Windows task…) without any change to its source text. All you will change is a “Concurrency Configuration File” which specifies the last-minute mapping of abstract processors to physical resources.

We need to specify synchronization constraints. The conventions are straightforward:

• No special mechanism is required for a client to resynchronize with its supplier after a separate call x ● f (a) has gone off in parallel. The client will wait when and if it needs to: when it requests information on the object through a query call, as in value := x ● some_query. This automatic mechanism is called wait by necessity.

• To obtain exclusive access to a separate object O2, it suffices to use the attached entity a as an argument to the corresponding call, as in r (a).

• A routine precondition involving a separate argument such as a causes the client to wait until the precondition holds.

• To guarantee that we can control our software and predict the result (in particular, rest assured that class invariants will be maintained), we must allow the processor in charge of an object to execute at most one routine at any given time.

• We may, however, need to interrupt the execution of a routine to let a new, high-priority client take over. This will cause an exception, so that the spurned client can take the appropriate corrective measures — most likely retrying after a while.

This covers most of the mechanism, which will enable us to build the most advanced concurrent and distributed applications through the full extent of O-O techniques, from multiple inheritance to Design by Contract — as we will now study in detail, forgetting for a while all that we have read in this short preview. (A complete summary appears in 30.11, page 1025.)
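The synchronization conventions just listed can be pictured on a small sketch in the book’s notation. The class BUFFER and its features put, item and is_full are hypothetical:

```eiffel
class PRODUCER feature

    store (b: separate BUFFER; x: INTEGER)
            -- Store x into b.
            -- Passing the separate object b as an argument gives
            -- this routine exclusive access to it for the whole call.
        require
            not b ● is_full
                -- A precondition on a separate argument is a wait
                -- condition: the client waits until it holds.
        do
            b ● put (x)
                -- A command: it goes off in parallel; no waiting here.
        end

    last_item (b: separate BUFFER): INTEGER
            -- Current item of b.
        do
            Result := b ● item
                -- A query: wait by necessity makes the client wait
                -- here, and only here, until the result is available.
        end

end
```

Note how nothing special is written for resynchronization: the query call in last_item is the only point at which the client may have to wait.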
§30.2 THE RISE OF CONCURRENCY 953

30.2 THE RISE OF CONCURRENCY

Back to square one. We must first review the various forms of concurrency, to understand how the evolution of our field requires most software developers to make concurrency part of their mindset. In addition to the traditional concepts of multiprocessing and multiprogramming, the past few years have introduced two innovative concepts: object request brokers and remote execution through the Net.

Multiprocessing

More and more, we want to use the formidable amount of computing power available around us; less and less, we are willing to wait for the computer (although we have become quite comfortable with the idea that the computer is waiting for us). So if one processing unit would not bring us quickly enough the result that we need, we will want to rely on several units working in parallel. This form of concurrency is known as multiprocessing.

Spectacular applications of multiprocessing have involved researchers relying on hundreds of computers scattered over the Internet, at times when the computers’ (presumably consenting) owners did not need them, to solve computationally intensive problems such as breaking cryptographic algorithms. Such efforts do not just apply to computing research: Hollywood’s insatiable demand for realistic computer graphics has played its part in fueling progress in this area; the preparation of the movie Toy Story, one of the first to involve artificial characters only (only the voices are human), relied at some point on a network of more than one hundred high-end workstations — more economical, it seems, than one hundred professional animators.

Multiprocessing is also ubiquitous in high-speed scientific computing, to solve ever larger problems of physics, engineering, meteorology, statistics, investment banking.
More routinely, many computing installations use some form of load balancing: automatically dispatching computations among the various computers available at any particular time on the local network of an organization.

Another form of multiprocessing is the computing architecture known as client-server computing, which assigns various specialized roles to the computers on a network: the biggest and most expensive machines, of which a typical company network will have just one or a few, are “servers” handling shared databases, heavy computations and other strategic central resources; the cheaper machines, ubiquitously located wherever there is an end user, handle decentralizable tasks such as the human interface and simple computations; they forward to the servers any task that exceeds their competence.

The current popularity of the client-server approach is a swing of the pendulum away from the trend of the preceding decade. Initially (nineteen-sixties and seventies) architectures were centralized, forcing users to compete for resources. The personal computer and workstation revolution of the eighties was largely about empowering users with resources theretofore reserved to the Center (the “glass house” in industry jargon). Then they discovered the obvious: a personal computer cannot do everything, and some resources must be shared. Hence the emergence of client-server architectures in the nineties. The inevitable cynical comment — that we are back to the one-mainframe-many-terminals architecture of our youth, only with more expensive terminals now called “client workstations” — is not really justified: the industry is simply searching, through trial and error, for the proper tradeoff between decentralization and sharing.
954 CONCURRENCY, DISTRIBUTION, CLIENT-SERVER AND THE INTERNET §30.2

Multiprogramming

The other main form of concurrency is multiprogramming, which involves a single computer working on several tasks at once.

If we consider general-purpose systems (excluding processors that are embedded in an application device, be it a washing machine or an airplane instrument, and single-mindedly repeat a fixed set of operations), computers are almost always multiprogrammed, performing operating system tasks in parallel with application tasks. In a strict form of multiprogramming the parallelism is apparent rather than real: at any single time the processing unit is actually working on just one job; but the time to switch between jobs is so short that an outside observer can believe they proceed concurrently. In addition, the processing unit itself may do several things in parallel (as in the advance fetch schemes of many computers, where each clock cycle loads the next instruction at the same time it executes the current one), or may actually be a combination of several processing units, so that multiprogramming becomes intertwined with multiprocessing.

A common application of multiprogramming is time-sharing, allowing a single machine to serve several users at once. But except in the case of very powerful “mainframe” computers this idea is considered much less attractive now than it was when computers were a precious rarity. Today we consider our time to be the more valuable resource, so we want the system to do several things at once just for us. In particular, multi-windowing user interfaces allow several applications to proceed in parallel: in one window we browse the Web, in another we edit a document, in yet another we compile and test some software. All this requires powerful concurrency mechanisms.

Providing each computer user with a multi-windowing, multiprogramming interface is the responsibility of the operating system.
But increasingly the users of the software we develop want to have concurrency within one application. The reason is always the same: they know that computing power is available by the bountiful, and they do not want to wait idly. So if it takes a while to load incoming messages in an e-mail system, you will want to be able to send an outgoing message while this operation proceeds. With a good Web browser you can access a new site while loading pages from another. In a stock trading system, you may at any single time be accessing market information from several stock exchanges, buying here, selling there, and monitoring a client’s portfolio.

It is this need for intra-application concurrency which has suddenly brought the whole subject of concurrent computing to the forefront of software development and made it of interest far beyond its original constituencies. Meanwhile, all the traditional applications remain as important as ever, with new developments in operating systems, the Internet, local area networks, and scientific computing — where the continual quest for speed demands ever higher levels of multiprocessing.
§30.2 THE RISE OF CONCURRENCY 955

Object request brokers

Another important recent development has been the emergence of the CORBA proposal from the Object Management Group, and the OLE 2/ActiveX architecture from Microsoft. Although the precise goals, details and markets differ, both efforts promise substantial progress towards distributed computing.

The general purpose is to allow applications to access each other’s objects and services as conveniently as possible, either locally or across a network. The CORBA effort (more precisely its CORBA 2 stage, clearly the interesting one) has also placed particular emphasis on interoperability:

• CORBA-aware applications can cooperate even if they are based on “object request brokers” from different vendors.

• Interoperability also applies to the language level: an application written in one of the supported languages can access objects from an application written in another. The interaction goes through an intermediate language called IDL (Interface Definition Language); supported languages have an official IDL binding, which maps the constructs of the language to those of IDL.

IDL is a common-denominator O-O language centered on the notion of interface. An IDL interface for a class is similar in spirit to a short form, although more rudimentary (IDL in particular does not support assertions); it describes the set of features available on a certain abstraction. From a class written in an O-O language such as the notation of this book, tools will derive an IDL interface, making the class and its instances of interest to client software. A client written in the same language or another can, through an IDL interface, access across a network the features provided by such a supplier.

Remote execution

Another development of the late nineties is the mechanism for remote execution through the World-Wide Web.
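As an illustration, the kind of interface such a tool might derive from a hypothetical ACCOUNT class (with features deposit, withdraw and balance) could look like the following CORBA IDL sketch. This is a hand-written approximation, not the output of any particular binding tool:

```idl
// Hypothetical IDL interface derived from a class ACCOUNT.
// Only the feature signatures survive the translation; the
// preconditions, postconditions and invariant of the original
// class have no IDL counterpart, since IDL has no assertions.
interface Account {
    void deposit (in float sum);
    void withdraw (in float sum);
    float balance ();
};
```

A client written in any supported language can then request an Account object through its object request broker and call these operations across the network, without knowing in which language the supplier was written.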
The first Web browsers made it not just possible but also convenient to explore information stored on remote computers anywhere in the world, and to follow logical connections, or hyperlinks, at the click of a button. But this was a passive mechanism: someone prepared some information, and everyone else accessed it read-only.

The next step was to move to an active setup where clicking on a link actually triggers execution of an operation. This assumes the presence, within the Web browser, of an execution engine which can recognize the downloaded information as executable code, and execute it. The execution engine can be a built-in part of the browser, or it may be dynamically attached to it in response to the downloading of information of the corresponding type. This latter solution is known as a plug-in mechanism and assumes that users interested in a particular execution mechanism can download the execution engine, usually free, from the Internet.