diff options
Diffstat (limited to 'doc/rfc/rfc2234.txt')
-rw-r--r-- | doc/rfc/rfc2234.txt | 787 |
1 files changed, 787 insertions, 0 deletions
diff --git a/doc/rfc/rfc2234.txt b/doc/rfc/rfc2234.txt new file mode 100644 index 0000000..edea302 --- /dev/null +++ b/doc/rfc/rfc2234.txt @@ -0,0 +1,787 @@ + + + + + + +Network Working Group D. Crocker, Ed. +Request for Comments: 2234 Internet Mail Consortium +Category: Standards Track P. Overell + Demon Internet Ltd. + November 1997 + + + Augmented BNF for Syntax Specifications: ABNF + + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1997). All Rights Reserved. + +TABLE OF CONTENTS + + 1. INTRODUCTION .................................................. 2 + + 2. RULE DEFINITION ............................................... 2 + 2.1 RULE NAMING .................................................. 2 + 2.2 RULE FORM .................................................... 3 + 2.3 TERMINAL VALUES .............................................. 3 + 2.4 EXTERNAL ENCODINGS ........................................... 5 + + 3. OPERATORS ..................................................... 5 + 3.1 CONCATENATION RULE1 RULE2 ............................. 5 + 3.2 ALTERNATIVES RULE1 / RULE2 ................................... 6 + 3.3 INCREMENTAL ALTERNATIVES RULE1 =/ RULE2 .................... 6 + 3.4 VALUE RANGE ALTERNATIVES %C##-## ........................... 7 + 3.5 SEQUENCE GROUP (RULE1 RULE2) ................................. 7 + 3.6 VARIABLE REPETITION *RULE .................................... 8 + 3.7 SPECIFIC REPETITION NRULE .................................... 8 + 3.8 OPTIONAL SEQUENCE [RULE] ..................................... 8 + 3.9 ; COMMENT .................................................... 8 + 3.10 OPERATOR PRECEDENCE ......................................... 9 + + 4. ABNF DEFINITION OF ABNF ....................................... 9 + + 5. SECURITY CONSIDERATIONS ....................................... 10 + + + + +Crocker & Overell Standards Track [Page 1] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + + 6. APPENDIX A - CORE ............................................. 11 + 6.1 CORE RULES ................................................... 11 + 6.2 COMMON ENCODING .............................................. 12 + + 7. ACKNOWLEDGMENTS ............................................... 12 + + 8. REFERENCES .................................................... 13 + + 9. CONTACT ....................................................... 13 + + 10. FULL COPYRIGHT STATEMENT ..................................... 14 + +1. INTRODUCTION + + Internet technical specifications often need to define a format + syntax and are free to employ whatever notation their authors deem + useful. Over the years, a modified version of Backus-Naur Form + (BNF), called Augmented BNF (ABNF), has been popular among many + Internet specifications. It balances compactness and simplicity, + with reasonable representational power. In the early days of the + Arpanet, each specification contained its own definition of ABNF. + This included the email specifications, RFC733 and then RFC822 which + have come to be the common citations for defining ABNF. The current + document separates out that definition, to permit selective + reference. Predictably, it also provides some modifications and + enhancements. + + The differences between standard BNF and ABNF involve naming rules, + repetition, alternatives, order-independence, and value ranges. + Appendix A (Core) supplies rule definitions and encoding for a core + lexical analyzer of the type common to several Internet + specifications. It is provided as a convenience and is otherwise + separate from the meta language defined in the body of this document, + and separate from its formal status. + +2. RULE DEFINITION + +2.1 Rule Naming + + The name of a rule is simply the name itself; that is, a sequence of + characters, beginning with an alphabetic character, and followed by + a combination of alphabetics, digits and hyphens (dashes). + + NOTE: Rule names are case-insensitive + + The names <rulename>, <Rulename>, <RULENAME> and <rUlENamE> all refer + to the same rule. + + + + +Crocker & Overell Standards Track [Page 2] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + + Unlike original BNF, angle brackets ("<", ">") are not required. + However, angle brackets may be used around a rule name whenever their + presence will facilitate discerning the use of a rule name. This is + typically restricted to rule name references in free-form prose, or + to distinguish partial rules that combine into a string not separated + by white space, such as shown in the discussion about repetition, + below. + +2.2 Rule Form + + A rule is defined by the following sequence: + + name = elements crlf + + where <name> is the name of the rule, <elements> is one or more rule + names or terminal specifications and <crlf> is the end-of- line + indicator, carriage return followed by line feed. The equal sign + separates the name from the definition of the rule. The elements + form a sequence of one or more rule names and/or value definitions, + combined according to the various operators, defined in this + document, such as alternative and repetition. + + For visual ease, rule definitions are left aligned. When a rule + requires multiple lines, the continuation lines are indented. The + left alignment and indentation are relative to the first lines of the + ABNF rules and need not match the left margin of the document. + +2.3 Terminal Values + + Rules resolve into a string of terminal values, sometimes called + characters. In ABNF a character is merely a non-negative integer. + In certain contexts a specific mapping (encoding) of values into a + character set (such as ASCII) will be specified. + + Terminals are specified by one or more numeric characters with the + base interpretation of those characters indicated explicitly. The + following bases are currently defined: + + b = binary + + d = decimal + + x = hexadecimal + + + + + + + + +Crocker & Overell Standards Track [Page 3] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + + Hence: + + CR = %d13 + + CR = %x0D + + respectively specify the decimal and hexadecimal representation of + [US-ASCII] for carriage return. + + A concatenated string of such values is specified compactly, using a + period (".") to indicate separation of characters within that value. + Hence: + + CRLF = %d13.10 + + ABNF permits specifying literal text string directly, enclosed in + quotation-marks. Hence: + + command = "command string" + + Literal text strings are interpreted as a concatenated set of + printable characters. + + NOTE: ABNF strings are case-insensitive and + the character set for these strings is us-ascii. + + Hence: + + rulename = "abc" + + and: + + rulename = "aBc" + + will match "abc", "Abc", "aBc", "abC", "ABc", "aBC", "AbC" and "ABC". + + To specify a rule which IS case SENSITIVE, + specify the characters individually. + + For example: + + rulename = %d97 %d98 %d99 + + or + + rulename = %d97.98.99 + + + + + +Crocker & Overell Standards Track [Page 4] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + + will match only the string which comprises only lowercased + characters, abc. + +2.4 External Encodings + + External representations of terminal value characters will vary + according to constraints in the storage or transmission environment. + Hence, the same ABNF-based grammar may have multiple external + encodings, such as one for a 7-bit US-ASCII environment, another for + a binary octet environment and still a different one when 16-bit + Unicode is used. Encoding details are beyond the scope of ABNF, + although Appendix A (Core) provides definitions for a 7-bit US-ASCII + environment as has been common to much of the Internet. + + By separating external encoding from the syntax, it is intended that + alternate encoding environments can be used for the same syntax. + +3. OPERATORS + +3.1 Concatenation Rule1 Rule2 + + A rule can define a simple, ordered string of values -- i.e., a + concatenation of contiguous characters -- by listing a sequence of + rule names. For example: + + foo = %x61 ; a + + bar = %x62 ; b + + mumble = foo bar foo + + So that the rule <mumble> matches the lowercase string "aba". + + LINEAR WHITE SPACE: Concatenation is at the core of the ABNF + parsing model. A string of contiguous characters (values) is + parsed according to the rules defined in ABNF. For Internet + specifications, there is some history of permitting linear white + space (space and horizontal tab) to be freelyPand + implicitlyPinterspersed around major constructs, such as + delimiting special characters or atomic strings. + + NOTE: This specification for ABNF does not + provide for implicit specification of linear white + space. + + Any grammar which wishes to permit linear white space around + delimiters or string segments must specify it explicitly. It is + often useful to provide for such white space in "core" rules that are + + + +Crocker & Overell Standards Track [Page 5] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + + then used variously among higher-level rules. The "core" rules might + be formed into a lexical analyzer or simply be part of the main + ruleset. + +3.2 Alternatives Rule1 / Rule2 + + Elements separated by forward slash ("/") are alternatives. + Therefore, + + foo / bar + + will accept <foo> or <bar>. + + NOTE: A quoted string containing alphabetic + characters is special form for specifying alternative + characters and is interpreted as a non-terminal + representing the set of combinatorial strings with the + contained characters, in the specified order but with + any mixture of upper and lower case.. + +3.3 Incremental Alternatives Rule1 =/ Rule2 + + It is sometimes convenient to specify a list of alternatives in + fragments. That is, an initial rule may match one or more + alternatives, with later rule definitions adding to the set of + alternatives. This is particularly useful for otherwise- independent + specifications which derive from the same parent rule set, such as + often occurs with parameter lists. ABNF permits this incremental + definition through the construct: + + oldrule =/ additional-alternatives + + So that the rule set + + ruleset = alt1 / alt2 + + ruleset =/ alt3 + + ruleset =/ alt4 / alt5 + + is the same as specifying + + ruleset = alt1 / alt2 / alt3 / alt4 / alt5 + + + + + + + + +Crocker & Overell Standards Track [Page 6] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + +3.4 Value Range Alternatives %c##-## + + A range of alternative numeric values can be specified compactly, + using dash ("-") to indicate the range of alternative values. Hence: + + DIGIT = %x30-39 + + is equivalent to: + + DIGIT = "0" / "1" / "2" / "3" / "4" / "5" / "6" / + + "7" / "8" / "9" + + Concatenated numeric values and numeric value ranges can not be + specified in the same string. A numeric value may use the dotted + notation for concatenation or it may use the dash notation to specify + one value range. Hence, to specify one printable character, between + end of line sequences, the specification could be: + + char-line = %x0D.0A %x20-7E %x0D.0A + +3.5 Sequence Group (Rule1 Rule2) + + Elements enclosed in parentheses are treated as a single element, + whose contents are STRICTLY ORDERED. Thus, + + elem (foo / bar) blat + + which matches (elem foo blat) or (elem bar blat). + + elem foo / bar blat + + matches (elem foo) or (bar blat). + + NOTE: It is strongly advised to use grouping + notation, rather than to rely on proper reading of + "bare" alternations, when alternatives consist of + multiple rule names or literals. + + Hence it is recommended that instead of the above form, the form: + + (elem foo) / (bar blat) + + be used. It will avoid misinterpretation by casual readers. + + The sequence group notation is also used within free text to set off + an element sequence from the prose. + + + + +Crocker & Overell Standards Track [Page 7] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + +3.6 Variable Repetition *Rule + + The operator "*" preceding an element indicates repetition. The full + form is: + + <a>*<b>element + + where <a> and <b> are optional decimal values, indicating at least + <a> and at most <b> occurrences of element. + + Default values are 0 and infinity so that *<element> allows any + number, including zero; 1*<element> requires at least one; + 3*3<element> allows exactly 3 and 1*2<element> allows one or two. + +3.7 Specific Repetition nRule + + A rule of the form: + + <n>element + + is equivalent to + + <n>*<n>element + + That is, exactly <N> occurrences of <element>. Thus 2DIGIT is a + 2-digit number, and 3ALPHA is a string of three alphabetic + characters. + +3.8 Optional Sequence [RULE] + + Square brackets enclose an optional element sequence: + + [foo bar] + + is equivalent to + + *1(foo bar). + +3.9 ; Comment + + A semi-colon starts a comment that continues to the end of line. + This is a simple way of including useful notes in parallel with the + specifications. + + + + + + + + +Crocker & Overell Standards Track [Page 8] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + +3.10 Operator Precedence + + The various mechanisms described above have the following precedence, + from highest (binding tightest) at the top, to lowest and loosest at + the bottom: + + Strings, Names formation + Comment + Value range + Repetition + Grouping, Optional + Concatenation + Alternative + + Use of the alternative operator, freely mixed with concatenations can + be confusing. + + Again, it is recommended that the grouping operator be used to + make explicit concatenation groups. + +4. ABNF DEFINITION OF ABNF + + This syntax uses the rules provided in Appendix A (Core). + + rulelist = 1*( rule / (*c-wsp c-nl) ) + + rule = rulename defined-as elements c-nl + ; continues if next line starts + ; with white space + + rulename = ALPHA *(ALPHA / DIGIT / "-") + + defined-as = *c-wsp ("=" / "=/") *c-wsp + ; basic rules definition and + ; incremental alternatives + + elements = alternation *c-wsp + + c-wsp = WSP / (c-nl WSP) + + c-nl = comment / CRLF + ; comment or newline + + comment = ";" *(WSP / VCHAR) CRLF + + alternation = concatenation + *(*c-wsp "/" *c-wsp concatenation) + + + + +Crocker & Overell Standards Track [Page 9] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + + concatenation = repetition *(1*c-wsp repetition) + + repetition = [repeat] element + + repeat = 1*DIGIT / (*DIGIT "*" *DIGIT) + + element = rulename / group / option / + char-val / num-val / prose-val + + group = "(" *c-wsp alternation *c-wsp ")" + + option = "[" *c-wsp alternation *c-wsp "]" + + char-val = DQUOTE *(%x20-21 / %x23-7E) DQUOTE + ; quoted string of SP and VCHAR + without DQUOTE + + num-val = "%" (bin-val / dec-val / hex-val) + + bin-val = "b" 1*BIT + [ 1*("." 1*BIT) / ("-" 1*BIT) ] + ; series of concatenated bit values + ; or single ONEOF range + + dec-val = "d" 1*DIGIT + [ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ] + + hex-val = "x" 1*HEXDIG + [ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ] + + prose-val = "<" *(%x20-3D / %x3F-7E) ">" + ; bracketed string of SP and VCHAR + without angles + ; prose description, to be used as + last resort + + +5. SECURITY CONSIDERATIONS + + Security is truly believed to be irrelevant to this document. + + + + + + + + + + + +Crocker & Overell Standards Track [Page 10] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + +6. APPENDIX A - CORE + + This Appendix is provided as a convenient core for specific grammars. + The definitions may be used as a core set of rules. + +6.1 Core Rules + + Certain basic rules are in uppercase, such as SP, HTAB, CRLF, + DIGIT, ALPHA, etc. + + ALPHA = %x41-5A / %x61-7A ; A-Z / a-z + + BIT = "0" / "1" + + CHAR = %x01-7F + ; any 7-bit US-ASCII character, + excluding NUL + + CR = %x0D + ; carriage return + + CRLF = CR LF + ; Internet standard newline + + CTL = %x00-1F / %x7F + ; controls + + DIGIT = %x30-39 + ; 0-9 + + DQUOTE = %x22 + ; " (Double Quote) + + HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F" + + HTAB = %x09 + ; horizontal tab + + LF = %x0A + ; linefeed + + LWSP = *(WSP / CRLF WSP) + ; linear white space (past newline) + + OCTET = %x00-FF + ; 8 bits of data + + SP = %x20 + + + +Crocker & Overell Standards Track [Page 11] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + + ; space + + VCHAR = %x21-7E + ; visible (printing) characters + + WSP = SP / HTAB + ; white space + +6.2 Common Encoding + + Externally, data are represented as "network virtual ASCII", namely + 7-bit US-ASCII in an 8-bit field, with the high (8th) bit set to + zero. A string of values is in "network byte order" with the + higher-valued bytes represented on the left-hand side and being sent + over the network first. + +7. ACKNOWLEDGMENTS + + The syntax for ABNF was originally specified in RFC 733. Ken L. + Harrenstien, of SRI International, was responsible for re-coding the + BNF into an augmented BNF that makes the representation smaller and + easier to understand. + + This recent project began as a simple effort to cull out the portion + of RFC 822 which has been repeatedly cited by non-email specification + writers, namely the description of augmented BNF. Rather than simply + and blindly converting the existing text into a separate document, + the working group chose to give careful consideration to the + deficiencies, as well as benefits, of the existing specification and + related specifications available over the last 15 years and therefore + to pursue enhancement. This turned the project into something rather + more ambitious than first intended. Interestingly the result is not + massively different from that original, although decisions such as + removing the list notation came as a surprise. + + The current round of specification was part of the DRUMS working + group, with significant contributions from Jerome Abela , Harald + Alvestrand, Robert Elz, Roger Fajman, Aviva Garrett, Tom Harsch, Dan + Kohn, Bill McQuillan, Keith Moore, Chris Newman , Pete Resnick and + Henning Schulzrinne. + + + + + + + + + + + +Crocker & Overell Standards Track [Page 12] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + +8. REFERENCES + + [US-ASCII] Coded Character Set--7-Bit American Standard Code for + Information Interchange, ANSI X3.4-1986. + + [RFC733] Crocker, D., Vittal, J., Pogran, K., and D. Henderson, + "Standard for the Format of ARPA Network Text Message," RFC 733, + November 1977. + + [RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text + Messages", STD 11, RFC 822, August 1982. + +9. CONTACT + + David H. Crocker Paul Overell + + Internet Mail Consortium Demon Internet Ltd + 675 Spruce Dr. Dorking Business Park + Sunnyvale, CA 94086 USA Dorking + Surrey, RH4 1HN + UK + + Phone: +1 408 246 8253 + Fax: +1 408 249 6205 + EMail: dcrocker@imc.org paulo@turnpike.com + + + + + + + + + + + + + + + + + + + + + + + + + + +Crocker & Overell Standards Track [Page 13] + +RFC 2234 ABNF for Syntax Specifications November 1997 + + +10. Full Copyright Statement + + Copyright (C) The Internet Society (1997). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + + + + + + + + + + + + + + + + + + +Crocker & Overell Standards Track [Page 14] + |