• Rolf Rabenseifner, Steering committee, Terms and Definitions, and Fortran Bindings, Deprecated Functions, Annex Change-Log, and Annex Language Bindings
• Richard L. Graham, Steering committee, Meeting Convener
• Jeffrey M. Squyres, Language Bindings and MPI-3.1 Secretary
• Daniel Holmes, Point-to-Point Communication
• George Bosilca, Datatypes and Environmental Management
• Torsten Hoefler, Collective Communication and Process Topologies
• Pavan Balaji, Groups, Contexts, and Communicators, and External Interfaces
• Jeff Hammond, The Info Object
• David Solt, Process Creation and Management
• Quincey Koziol, I/O
• Kathryn Mohror, Tool Support
• Rajeev Thakur, One-Sided Communications

The following list includes some of the active participants who attended MPI Forum meetings or participated in the e-mail discussions.

Charles Archer, Pavan Balaji, Purushotham V. Bangalore, Brian Barrett, Wesley Bland, Michael Blocksome, George Bosilca, Aurelien Bouteiller, Devendar Bureddy, Yohann Burette, Mohamad Chaarawi, Alexey Cheptsov, James Dinan, Dmitry Durnov, Thomas Francois, Edgar Gabriel, Todd Gamblin, Balazs Gerofi, Paddy Gillies, David Goodell, Manjunath Gorentla Venkata, Richard L. Graham, Ryan E. Grant, William Gropp, Khaled Hamidouche, Jeff Hammond, Amin Hassani, Marc-André Hermanns, Nathan Hjelm, Torsten Hoefler, Daniel Holmes, Atsushi Hori, Yutaka Ishikawa, Hideyuki Jitsumoto, Jithin Jose, Krishna Kandalla, Christos Kavouklis, Takahiro Kawashima, Chulho Kim, Michael Knobloch, Alice Koniges, Quincey Koziol, Sameer Kumar, Joshua Ladd, Ignacio Laguna, Huiwei Lu, Guillaume Mercier, Kathryn Mohror, Adam Moody, Tomotake Nakamura, Takeshi Nanri, Steve Oyanagi, Antonio J. Peña, Sreeram Potluri, Howard Pritchard, Rolf Rabenseifner, Nicholas Radcliffe, Ken Raffenetti, Raghunath Raja, Craig Rasmussen, Davide Rossetti, Kento Sato, Martin Schulz, Sangmin Seo, Christian Siebert, Anthony Skjellum, Brian Smith, David Solt, Jeffrey M. Squyres,
Hari Subramoni, Shinji Sumimoto, Alexander Supalov, Bronis R. de Supinski, Sayantan Sur, Masamichi Takagi, Keita Teranishi, Rajeev Thakur, Fabian Tillier, Yuichi Tsujita, Geoffroy Vallée, Rolf vandeVaart, Akshay Venkatesh, Jerome Vienne, Venkat Vishwanath, Anh Vo, Huseyin S. Yildiz, Junchao Zhang, Xin Zhao

The MPI Forum also acknowledges and appreciates the valuable input from people via e-mail and in person.

The following institutions supported the MPI-3.1 effort through time and travel support for the people listed above.

Argonne National Laboratory
Auburn University
Cisco Systems, Inc.
Cray
EPCC, The University of Edinburgh
ETH Zurich
Forschungszentrum Jülich
Fujitsu
German Research School for Simulation Sciences
The HDF Group
International Business Machines
INRIA
Intel Corporation
Jülich Aachen Research Alliance, High-Performance Computing (JARA-HPC)
Kyushu University
Lawrence Berkeley National Laboratory
Lawrence Livermore National Laboratory
Lenovo
Los Alamos National Laboratory
Mellanox Technologies, Inc.
Microsoft Corporation
NEC Corporation
NVIDIA Corporation
Oak Ridge National Laboratory
The Ohio State University
RIKEN AICS
Sandia National Laboratories
Texas Advanced Computing Center
Tokyo Institute of Technology
University of Alabama at Birmingham
University of Houston
University of Illinois at Urbana-Champaign
University of Oregon
University of Stuttgart, High Performance Computing Center Stuttgart (HLRS)
University of Tennessee, Knoxville
University of Tokyo
Chapter 1

Introduction to MPI

1.1 Overview and Goals

MPI (Message-Passing Interface) is a message-passing library interface specification. All parts of this definition are significant. MPI addresses primarily the message-passing parallel programming model, in which data is moved from the address space of one process to that of another process through cooperative operations on each process. Extensions to the “classical” message-passing model are provided in collective operations, remote-memory access operations, dynamic process creation, and parallel I/O. MPI is a specification, not an implementation; there are multiple implementations of MPI. This specification is for a library interface; MPI is not a language, and all MPI operations are expressed as functions, subroutines, or methods, according to the appropriate language bindings which, for C and Fortran, are part of the MPI standard. The standard has been defined through an open process by a community of parallel computing vendors, computer scientists, and application developers. The next few sections provide an overview of the history of MPI’s development.

The main advantages of establishing a message-passing standard are portability and ease of use. In a distributed memory communication environment, in which the higher level routines and/or abstractions are built upon lower level message-passing routines, the benefits of standardization are particularly apparent. Furthermore, the definition of a message-passing standard, such as that proposed here, provides vendors with a clearly defined base set of routines that they can implement efficiently, or in some cases for which they can provide hardware support, thereby enhancing scalability.

The goal of the Message-Passing Interface, simply stated, is to develop a widely used standard for writing message-passing programs. As such, the interface should establish a practical, portable, efficient, and flexible standard for message passing. A complete list of goals follows.
• Design an application programming interface (not necessarily for compilers or a system implementation library).

• Allow efficient communication: Avoid memory-to-memory copying, allow overlap of computation and communication, and offload to communication co-processors, where available.

• Allow for implementations that can be used in a heterogeneous environment.

• Allow convenient C and Fortran bindings for the interface.
• Assume a reliable communication interface: the user need not cope with communication failures. Such failures are dealt with by the underlying communication subsystem.

• Define an interface that can be implemented on many vendors’ platforms, with no significant changes in the underlying communication and system software.

• Semantics of the interface should be language independent.

• The interface should be designed to allow for thread safety.

1.2 Background of MPI-1.0

MPI sought to make use of the most attractive features of a number of existing message-passing systems, rather than selecting one of them and adopting it as the standard. Thus, MPI was strongly influenced by work at the IBM T. J. Watson Research Center [1, 2], Intel’s NX/2 [50], Express [13], nCUBE’s Vertex [46], p4 [8, 9], and PARMACS [5, 10]. Other important contributions have come from Zipcode [53, 54], Chimp [19, 20], PVM [4, 17], Chameleon [27], and PICL [25].

The MPI standardization effort involved about 60 people from 40 organizations, mainly from the United States and Europe. Most of the major vendors of concurrent computers were involved in MPI, along with researchers from universities, government laboratories, and industry. The standardization process began with the Workshop on Standards for Message-Passing in a Distributed Memory Environment, sponsored by the Center for Research on Parallel Computing, held April 29-30, 1992, in Williamsburg, Virginia [60]. At this workshop the basic features essential to a standard message-passing interface were discussed, and a working group was established to continue the standardization process.

A preliminary draft proposal, known as MPI-1, was put forward by Dongarra, Hempel, Hey, and Walker in November 1992, and a revised version was completed in February 1993 [18]. MPI-1 embodied the main features that were identified at the Williamsburg workshop as being necessary in a message-passing standard.
Since MPI-1 was primarily intended to promote discussion and “get the ball rolling,” it focused mainly on point-to-point communications. MPI-1 brought to the forefront a number of important standardization issues, but did not include any collective communication routines and was not thread-safe.

In November 1992, a meeting of the MPI working group was held in Minneapolis, at which it was decided to place the standardization process on a more formal footing, and to generally adopt the procedures and organization of the High Performance Fortran Forum. Subcommittees were formed for the major component areas of the standard, and an email discussion service established for each. In addition, the goal of producing a draft MPI standard by the Fall of 1993 was set. To achieve this goal the MPI working group met every 6 weeks for two days throughout the first 9 months of 1993, and presented the draft MPI standard at the Supercomputing 93 conference in November 1993. These meetings and the email discussion together constituted the MPI Forum, membership of which has been open to all members of the high performance computing community.

1.3 Background of MPI-1.1, MPI-1.2, and MPI-2.0

Beginning in March 1995, the MPI Forum began meeting to consider corrections and extensions to the original MPI Standard document [22]. The first product of these deliberations
was Version 1.1 of the MPI specification, released in June of 1995 [23] (see http://www.mpi-forum.org for official MPI document releases). At that time, effort focused in five areas.

1. Further corrections and clarifications for the MPI-1.1 document.

2. Additions to MPI-1.1 that do not significantly change its types of functionality (new datatype constructors, language interoperability, etc.).

3. Completely new types of functionality (dynamic processes, one-sided communication, parallel I/O, etc.) that are what everyone thinks of as “MPI-2 functionality.”

4. Bindings for Fortran 90 and C++. MPI-2 specifies C++ bindings for both MPI-1 and MPI-2 functions, and extensions to the Fortran 77 binding of MPI-1 and MPI-2 to handle Fortran 90 issues.

5. Discussions of areas in which the MPI process and framework seem likely to be useful, but where more discussion and experience are needed before standardization (e.g., zero-copy semantics on shared-memory machines, real-time specifications).

Corrections and clarifications (items of type 1 in the above list) were collected in Chapter 3 of the MPI-2 document: “Version 1.2 of MPI.” That chapter also contains the function for identifying the version number. Additions to MPI-1.1 (items of types 2, 3, and 4 in the above list) are in the remaining chapters of the MPI-2 document, and constitute the specification for MPI-2. Items of type 5 in the above list have been moved to a separate document, the “MPI Journal of Development” (JOD), and are not part of the MPI-2 Standard.

This structure makes it easy for users and implementors to understand what level of MPI compliance a given implementation has:

• MPI-1 compliance will mean compliance with MPI-1.3. This is a useful level of compliance. It means that the implementation conforms to the clarifications of MPI-1.1 function behavior given in Chapter 3 of the MPI-2 document. Some implementations may require changes to be MPI-1 compliant.
• MPI-2 compliance will mean compliance with all of MPI-2.1.

• The MPI Journal of Development is not part of the MPI Standard.

It is to be emphasized that forward compatibility is preserved. That is, a valid MPI-1.1 program is both a valid MPI-1.3 program and a valid MPI-2.1 program, and a valid MPI-1.3 program is a valid MPI-2.1 program.

1.4 Background of MPI-1.3 and MPI-2.1

After the release of MPI-2.0, the MPI Forum kept working on errata and clarifications for both standard documents (MPI-1.1 and MPI-2.0). The short document “Errata for MPI-1.1” was released October 12, 1998. On July 5, 2001, a first ballot of errata and clarifications for MPI-2.0 was released, and a second ballot was voted on May 22, 2002. Both votes were done electronically. Both ballots were combined into one document: “Errata for MPI-2,” May 15, 2002. This errata process was then interrupted, but the Forum and its e-mail reflectors kept working on new requests for clarification.