1.11 Acknowledgments [intro.ack]

1 The C++ programming language as described in this International Standard is based on the language as described in Chapter R (Reference Manual) of Stroustrup: The C++ Programming Language (second edition, Addison-Wesley Publishing Company, ISBN 0-201-53992-6, copyright © 1991 AT&T). That, in turn, is based on the C programming language as described in Appendix A of Kernighan and Ritchie: The C Programming Language (Prentice-Hall, 1978, ISBN 0-13-110163-3, copyright © 1978 AT&T).

2 Portions of the library Clauses of this International Standard are based on work by P.J. Plauger, which was published as The Draft Standard C++ Library (Prentice-Hall, ISBN 0-13-117003-1, copyright © 1995 P.J. Plauger).

3 POSIX® is a registered trademark of the Institute of Electrical and Electronic Engineers, Inc.

4 All rights in these originals are reserved.

ISO/IEC 14882:2011(E) © ISO/IEC 2011 – All rights reserved
2 Lexical conventions [lex]

2.1 Separate translation [lex.separate]

1 The text of the program is kept in units called source files in this International Standard. A source file together with all the headers (17.6.1.2) and source files included (16.2) via the preprocessing directive #include, less any source lines skipped by any of the conditional inclusion (16.1) preprocessing directives, is called a translation unit. [ Note: A C++ program need not all be translated at the same time. — end note ]

2 [ Note: Previously translated translation units and instantiation units can be preserved individually or in libraries. The separate translation units of a program communicate (3.5) by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units can be separately translated and then later linked to produce an executable program (3.5). — end note ]

2.2 Phases of translation [lex.phases]

1 The precedence among the syntax rules of translation is specified by the following phases.¹¹

1. Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. Trigraph sequences (2.4) are replaced by corresponding single-character internal representations. Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the \uXXXX notation), are handled equivalently except where this replacement is reverted in a raw string literal.)

2. Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. If, as a result, a character sequence that matches the syntax of a universal-character-name is produced, the behavior is undefined. A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file.

3. The source file is decomposed into preprocessing tokens (2.5) and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment.¹² Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is unspecified. The process of dividing a source file's characters into preprocessing tokens is context-dependent. [ Example: see the handling of < within a #include preprocessing directive. — end example ]

11) Implementations must behave as if these separate phases occur, although in practice different phases might be folded together.

12) A partial preprocessing token would arise from a source file ending in the first portion of a multi-character token that requires a terminating sequence of characters, such as a header-name that is missing the closing " or >. A partial comment would arise from a source file ending with an unclosed /* comment.
4. Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal-character-name is produced by token concatenation (16.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.

5. Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set (2.14.3, 2.14.5); if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.¹³

6. Adjacent string literal tokens are concatenated.

7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token (2.7). The resulting tokens are syntactically and semantically analyzed and translated as a translation unit. [ Note: The process of analyzing and translating the tokens may occasionally result in one token being replaced by a sequence of other tokens (14.2). — end note ] [ Note: Source files, translation units and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation. The description is conceptual only, and does not specify any particular implementation. — end note ]

8. Translated translation units and instantiation units are combined as follows: [ Note: Some or all of these may be supplied from a library. — end note ] Each translated translation unit is examined to produce a list of required instantiations. [ Note: This may include instantiations which have been explicitly requested (14.7.2). — end note ] The definitions of the required templates are located. It is implementation-defined whether the source of the translation units containing these definitions is required to be available. [ Note: An implementation could encode sufficient information into the translated translation unit so as to ensure the source is not required here. — end note ] All the required instantiations are performed to produce instantiation units. [ Note: These are similar to translated translation units, but contain no references to uninstantiated templates and no template definitions. — end note ] The program is ill-formed if any instantiation fails.

9. All external entity references are resolved. Library components are linked to satisfy external references to entities not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.

2.3 Character sets [lex.charset]

1 The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:¹⁴

    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

13) An implementation need not convert all non-corresponding source characters to the same execution character.

14) The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.
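Some of the translation phases in 2.2 can be observed from ordinary source code. The following is a minimal sketch rather than normative text: the names GREETING, the_answer, and greeting are hypothetical and chosen only for illustration. It relies on line splicing (phase 2), replacement of a comment by one space (phase 3), and concatenation of adjacent string literal tokens (phase 6).

```cpp
#include <cassert>
#include <string>

// Phase 2: the backslash at the end of the next line is deleted together
// with the new-line that follows it, so the macro body is one logical line.
#define GREETING "hello, " \
                 "world"

// Phase 3: the comment between the two tokens is replaced by one space.
int/* comment */the_answer() { return 42; }

// Phase 6: the two adjacent string literal tokens produced by the macro
// expansion are concatenated into a single string literal.
std::string greeting() { return GREETING; }
```

Calling greeting() is expected to yield the single string formed from both literals, and the_answer() compiles as an ordinary function returning int.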
2 The universal-character-name construct provides a way to name other characters.

    hex-quad:
        hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

    universal-character-name:
        \u hex-quad
        \U hex-quad hex-quad

The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal value for a universal-character-name corresponds to a surrogate code point (in the range 0xD800–0xDFFF, inclusive), the program is ill-formed. Additionally, if the hexadecimal value for a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal corresponds to a control character (in either of the ranges 0x00–0x1F or 0x7F–0x9F, both inclusive) or to a character in the basic source character set, the program is ill-formed.¹⁵

3 The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets and the sets of additional members are locale-specific.
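Two consequences of paragraphs 2 and 3 can be illustrated with a short sketch. The names digit_value, e_acute, and short_and_long_forms_agree are hypothetical and not part of this International Standard. The sketch assumes only that the decimal digits have consecutive values (paragraph 3) and that \uNNNN and \U0000NNNN designate the same character (paragraph 2).

```cpp
#include <cassert>

// Paragraph 3: each decimal digit after 0 has a value one greater than the
// previous digit, so subtracting '0' yields the numeric value of a digit.
constexpr int digit_value(char c) { return c - '0'; }

// Paragraph 2: the short form with four hexadecimal digits and the long
// form with eight name the same ISO/IEC 10646 character when the leading
// four digits of the long form are zero.
constexpr char32_t e_acute = U'\u00E9';
constexpr bool short_and_long_forms_agree() {
    return U'\u00E9' == U'\U000000E9';
}
```

The subtraction in digit_value is guaranteed only for the ten decimal digits; no comparable guarantee is stated above for letters.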
2.4 Trigraph sequences [lex.trigraph]

1 Before any other processing takes place, each occurrence of one of the following sequences of three characters ("trigraph sequences") is replaced by the single character indicated in Table 1.

Table 1 — Trigraph sequences

    Trigraph  Replacement    Trigraph  Replacement    Trigraph  Replacement
    ??=       #              ??(       [              ??<       {
    ??/       \              ??)       ]              ??>       }
    ??'       ^              ??!       |              ??-       ~

2 [ Example:

    ??=define arraycheck(a,b) a??(b??) ??!??! b??(a??)

becomes

    #define arraycheck(a,b) a[b] || b[a]

— end example ]

3 No other trigraph sequence exists. Each ? that does not begin one of the trigraphs listed above is not changed.

15) A sequence of characters resembling a universal-character-name in an r-char-sequence (2.14.5) does not form a universal-character-name.
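The replacement described in 2.4 can be modelled as a single left-to-right scan over the source text. The following is an illustrative sketch, not the way any particular implementation performs phase 1; trigraph_replacement and replace_trigraphs are hypothetical helper names. The comments avoid spelling out trigraphs so that the example itself is unaffected by trigraph replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: given the third character of a candidate trigraph
// (two question marks followed by c), return the replacement character
// from Table 1, or '\0' if the three characters do not form a trigraph.
char trigraph_replacement(char c) {
    switch (c) {
        case '=':  return '#';
        case '(':  return '[';
        case '<':  return '{';
        case '/':  return '\\';
        case ')':  return ']';
        case '>':  return '}';
        case '\'': return '^';
        case '!':  return '|';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Hypothetical helper: replace every trigraph in the input, scanning from
// left to right. A question mark that does not begin a trigraph is copied
// unchanged, as paragraph 3 requires.
std::string replace_trigraphs(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (i + 2 < in.size() && in[i] == '?' && in[i + 1] == '?') {
            const char r = trigraph_replacement(in[i + 2]);
            if (r != '\0') {
                out += r;
                i += 3;
                continue;
            }
        }
        out += in[i];
        ++i;
    }
    return out;
}
```

When passing test input as a string literal, writing each question mark as the escape sequence \? keeps the compiler from replacing the trigraphs before replace_trigraphs sees them.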
2.5 Preprocessing tokens [lex.pptoken]

    preprocessing-token:
        header-name
        identifier
        pp-number
        character-literal
        user-defined-character-literal
        string-literal
        user-defined-string-literal
        preprocessing-op-or-punc
        each non-white-space character that cannot be one of the above

1 Each preprocessing token that is converted to a token (2.7) shall have the lexical form of a keyword, an identifier, a literal, an operator, or a punctuator.

2 A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6. The categories of preprocessing token are: header names, identifiers, preprocessing numbers, character literals (including user-defined character literals), string literals (including user-defined string literals), preprocessing operators and punctuators, and single non-white-space characters that do not lexically match the other preprocessing token categories. If a ' or a " character matches the last category, the behavior is undefined. Preprocessing tokens can be separated by white space; this consists of comments (2.8), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both. As described in Clause 16, in certain circumstances during translation phase 4, white space (or the absence thereof) serves as more than preprocessing token separation. White space can appear within a preprocessing token only as part of a header name or between the quotation characters in a character literal or string literal.

3 If the input stream has been parsed into preprocessing tokens up to a given character:

— If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified. The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern

    encoding-prefix(opt) R raw-string

— Otherwise, if the next three characters are <:: and the subsequent character is neither : nor >, the < is treated as a preprocessor token by itself and not as the first character of the alternative token <:.

— Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail.

[ Example:

    #define R "x"
    const char* s = R"y"; // ill-formed raw string, not "x" "y"

— end example ]

4 [ Example: The program fragment 1Ex is parsed as a preprocessing number token (one that is not a valid floating or integer literal token), even though a parse as the pair of preprocessing tokens 1 and Ex might produce a valid expression (for example, if Ex were a macro defined as +1). Similarly, the program fragment 1E1 is parsed as a preprocessing number (one that is a valid floating literal token), whether or not E is a macro name. — end example ]
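Two of the rules in paragraph 3 of 2.5 can be seen in well-formed code. The sketch below is illustrative only; post_increment_then_add and make_names are hypothetical names, and the second function assumes a translator that implements the <:: rule stated above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Longest-sequence rule: the characters +++ are split into the tokens
// ++ and +, so the expression below means (x++) + y, not x + (++y).
int post_increment_then_add(int& x, int y) {
    return x+++y;
}

// The <:: rule: the three characters <:: are followed by 's', which is
// neither ':' nor '>', so '<' is a token by itself and is not the start
// of the alternative token <: (which would stand for '[').
std::vector<::std::string> make_names() {
    return {"a", "b"};
}
```

With x initially 1, a call post_increment_then_add(x, 2) adds the old value of x to 2 and leaves x incremented, which is consistent only with the (x++) + y reading.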