Diffstat (limited to 'doc/rfc/rfc1849.txt')
-rw-r--r-- | doc/rfc/rfc1849.txt | 5939 |
1 files changed, 5939 insertions, 0 deletions
diff --git a/doc/rfc/rfc1849.txt b/doc/rfc/rfc1849.txt new file mode 100644 index 0000000..9e74d71 --- /dev/null +++ b/doc/rfc/rfc1849.txt @@ -0,0 +1,5939 @@ + + + + + + +Independent Submission H. Spencer +Request for Comments: 1849 SP Systems +Obsoleted by: 5536, 5537 March 2010 +Category: Historic +ISSN: 2070-1721 + + + "Son of 1036": News Article Format and Transmission + +Abstract + + By the early 1990s, it had become clear that RFC 1036, then the + specification for the Interchange of USENET Messages, was badly in + need of repair. This "Internet-Draft-to-be", though never formally + published at that time, was widely circulated and became the de facto + standard for implementors of News Servers and User Agents, rapidly + acquiring the nickname "Son of 1036". Indeed, under that name, it + could fairly be described as the best-known Internet Draft (n)ever + published, and it formed the starting point for the recently adopted + Proposed Standards for Netnews. + + It is being published now in order to provide the historical + background out of which those standards have grown. Present-day + implementors should be aware that it is NOT NOW APPROPRIATE for use + in current implementations. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for the historical record. + + This document defines a Historic Document for the Internet community. + This is a contribution to the RFC Series, independently of any other + RFC stream. The RFC Editor has chosen to publish this document at + its discretion and makes no statement about its value for + implementation or deployment. Documents approved for publication by + the RFC Editor are not a candidate for any level of Internet + Standard; see Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc1849. 
+ + + + + + + + + +Spencer Historic [Page 1] + +RFC 1849 Son of 1036 March 2010 + + +Copyright Notice + + Copyright (c) 2010 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. + + This document may not be modified, and derivative works of it may not + be created, except to format it for publication as an RFC or to + translate it into languages other than English. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Spencer Historic [Page 2] + +RFC 1849 Son of 1036 March 2010 + + +Table of Contents + + Preface ............................................................5 + Original Abstract ..................................................6 + 1. Introduction ....................................................6 + 2. Definitions, Notations, and Conventions .........................8 + 2.1. Textual Notations ..........................................8 + 2.2. Syntax Notation ............................................9 + 2.3. Definitions ...............................................10 + 2.4. End-of-Line ...............................................13 + 2.5. Case-Sensitivity ..........................................13 + 2.6. Language ..................................................13 + 3. Relation to MAIL (RFC822, etc.) ................................14 + 4. Basic Format ...................................................15 + 4.1. Overall Syntax ............................................15 + 4.2. Headers ...................................................16 + 4.2.1. Names and Contents .................................16 + 4.2.2. 
Undesirable Headers ................................18 + 4.2.3. White Space and Continuations ......................18 + 4.3. Body ......................................................19 + 4.3.1. Body Format Issues .................................19 + 4.3.2. Body Conventions ...................................20 + 4.4. Characters and Character Sets .............................23 + 4.5. Non-ASCII Characters in Headers ...........................26 + 4.6. Size Limits ...............................................28 + 4.7. Example ...................................................30 + 5. Mandatory Headers ..............................................30 + 5.1. Date ......................................................31 + 5.2. From ......................................................33 + 5.3. Message-ID ................................................35 + 5.4. Subject ...................................................36 + 5.5. Newsgroups ................................................38 + 5.6. Path ......................................................42 + 6. Optional Headers ...............................................45 + 6.1. Followup-To ...............................................45 + 6.2. Expires ...................................................46 + 6.3. Reply-To ..................................................47 + 6.4. Sender ....................................................47 + 6.5. References ................................................48 + 6.6. Control ...................................................50 + 6.7. Distribution ..............................................51 + 6.8. Keywords ..................................................52 + 6.9. Summary ...................................................53 + 6.10. Approved .................................................53 + 6.11. Lines ....................................................54 + 6.12. Xref .....................................................55 + 6.13. 
Organization .............................................56 + 6.14. Supersedes ...............................................57 + + + +Spencer Historic [Page 3] + +RFC 1849 Son of 1036 March 2010 + + + 6.15. Also-Control .............................................57 + 6.16. See-Also .................................................58 + 6.17. Article-Names ............................................58 + 6.18. Article-Updates ..........................................60 + 7. Control Messages ...............................................60 + 7.1. cancel ....................................................61 + 7.2. ihave, sendme .............................................64 + 7.3. newgroup ..................................................66 + 7.4. rmgroup ...................................................68 + 7.5. sendsys, version, whogets .................................68 + 7.6. checkgroups ...............................................73 + 8. Transmission Formats ...........................................74 + 8.1. Batches ...................................................74 + 8.2. Encoded Batches ...........................................75 + 8.3. News within Mail ..........................................76 + 8.4. Partial Batches ...........................................77 + 9. Propagation and Processing .....................................77 + 9.1. Relayer General Issues ....................................78 + 9.2. Article Acceptance and Propagation ........................80 + 9.3. Administrator Contact .....................................82 + 10. Gatewaying ....................................................83 + 10.1. General Gatewaying Issues ................................83 + 10.2. Header Synthesis .........................................85 + 10.3. Message ID Mapping .......................................86 + 10.4. Mail to and from News ....................................88 + 10.5. 
Gateway Administration ...................................89 + 11. Security and Related Issues ...................................90 + 11.1. Leakage ..................................................90 + 11.2. Attacks ..................................................91 + 11.3. Anarchy ..................................................92 + 11.4. Liability ................................................92 + 12. References ....................................................93 + Appendix A. Archaeological Notes ..................................96 + A.1. "A News" Article Format ...................................96 + A.2. Early "B News" Article Format .............................96 + A.3. Obsolete Headers ..........................................97 + A.4. Obsolete Control Messages .................................97 + Appendix B. A Quick Tour of MIME ..................................98 + Appendix C. Summary of Changes Since RFC 1036 ....................103 + Appendix D. Summary of Completely New Features ...................104 + Appendix E. Summary of Differences from RFCs 822 and 1123.........105 + + + + + + + + + + +Spencer Historic [Page 4] + +RFC 1849 Son of 1036 March 2010 + + +Preface + + Although [RFC1036] was published in 1987, for many years it remained + the only formally published specification for Netnews format and + processing. It was widely considered obsolete within a few years, + and it has now been superseded by the work of the USEFOR Working + Group, leading to the publication of [RFC5536] and [RFC5537]. + However, there was an intermediate step that is of some historical + interest. + + In 1993-4, Henry Spencer wrote and informally circulated a document + that became known as "Son of 1036", meant as a first draft of a + replacement for [RFC1036]. 
It went no further at the time (although, + more recently, the USEFOR Working Group started from it), but has + nevertheless seen considerable use as a technical reference and even + a de facto standard, despite its informal status. + + The USEFOR work has eliminated any further relevance of Son of 1036 + as a technical reference, but it remains of historical interest. The + USEFOR Working Group has asked that it be published as an Historic + RFC, to ensure its preservation in an accessible form and facilitate + referencing it. + + This document is identical to the last distributed version of Son of + 1036, dated 2 June 1994, except for reformatting, correction of a few + minor factual or formatting errors, completion of the then-empty + Appendix D and of the References section, minor editing to match + preferred RFC style, and changes to leading and trailing material. + Remarks enclosed within "{...}" indicate explanatory material not + present in the original version. References to the current MIME + standards (and a few others) have been added (that was an unresolved + issue in 1994). + + The technical content remains unchanged, including the references to + the document itself as a Draft rather than an RFC and the presence of + unresolved issues. The original section numbering has been + preserved, although the original pagination has not (among other + reasons, it did not fully follow IETF formatting standards). + + READERS ARE CAUTIONED THAT THIS DOCUMENT IS OBSOLETE AND SHOULD NOT + BE USED AS A TECHNICAL REFERENCE. Although Son of 1036 largely + documented existing practice, it also proposed some changes, some of + which did not catch on or are no longer considered good ideas. (Of + particular note, the MIME type "message/news" should not be used.) + Consult [RFC5536] and [RFC5537] for modern technical information. 
+ + + + + + +Spencer Historic [Page 5] + +RFC 1849 Son of 1036 March 2010 + + + Although a number of people contributed useful comments or criticism + during the preparation of this document, its contents are entirely + the opinions of the author circa 1994. Not even the author himself + agrees with them all now. + + The author thanks Charles Lindsey for his assistance in getting this + document cleaned up and formally published at last (not least, for + supplying some prodding to actually get it done!). + + The author thanks Luc Rooijakkers for supplying the MIME summary that + Appendix B is based on. + +Original Abstract + + This Draft defines the format and procedures for interchange of + network news articles. It is hoped that a later version of this + Draft will obsolete RFC 1036, reflecting more recent experience and + accommodating future directions. + + Network news articles resemble mail messages but are broadcast to + potentially large audiences, using a flooding algorithm that + propagates one copy to each interested host (or group thereof), + typically stores only one copy per host, and does not require any + central administration or systematic registration of interested + users. Network news originated as the medium of communication for + Usenet, circa 1980. Since then, Usenet has grown explosively, and + many Internet sites participate in it. In addition, the news + technology is now in widespread use for other purposes, on the + Internet and elsewhere. + + This Draft primarily codifies and organizes existing practice. A few + small extensions have been added in an attempt to solve problems that + are considered serious. Major extensions (e.g., cryptographic + authentication) that need significant development effort are left to + be undertaken as independent efforts. + +1. 
Introduction + + Network news articles resemble mail messages but are broadcast to + potentially large audiences, using a flooding algorithm that + propagates one copy to each interested host (or groups thereof), + typically stores only one copy per host, and does not require any + central administration or systematic registration of interested + users. Network news originated as the medium of communication for + Usenet, circa 1980. Since then, Usenet has grown explosively, and + many Internet sites participate in it. In addition, the news + technology is now in widespread use for other purposes, on the + Internet and elsewhere. + + + +Spencer Historic [Page 6] + +RFC 1849 Son of 1036 March 2010 + + + The earliest news interchange used the so-called "A News" article + format. Shortly thereafter, an article format vaguely resembling + Internet mail was devised and used briefly. Both of those formats + are completely obsolete; they are documented in Appendix A for + historical reasons only. With the publication of [RFC850] in 1983, + news articles came to closely resemble Internet mail messages, with + some restrictions and some additional headers. In 1987, [RFC1036] + updated [RFC850] without making major changes. + + In the intervening five years, the [RFC1036] article format has + proven quite satisfactory, although minor extensions appear desirable + to match recent developments in areas such as multi-media mail. + [RFC1036] itself has not proven quite so satisfactory. It is often + rather vague and does not address some issues at all; this has caused + significant interoperability problems at times, and implementations + have diverged somewhat. Worse, although it was intended primarily to + document existing practice, it did not precisely match existing + practice even at the time it was published, and the deviations have + grown since. 
+ + This Draft attempts to specify the format of articles, and the + procedures used to exchange them and process them, in sufficient + detail to allow full interoperability. In addition, some tentative + suggestions are made about directions for future development, in an + attempt to avert unnecessary divergence and consequent loss of + interoperability. Major extensions (e.g., cryptographic + authentication) that need significant development effort are left to + be undertaken as independent efforts. + + NOTE: One question all of this may raise is: why is there no News- + Version header, analogous to MIME-Version, specifying a version + number corresponding to this specification? The answer is: it + doesn't appear to be useful, given news's backward-compatibility + constraints. The major use of a version number is indicating + which of several INCOMPATIBLE interpretations is relevant. The + impossibility of orchestrating any sort of simultaneous change + over news's installed base makes it necessary to avoid such + incompatible changes (as opposed to extensions) entirely. MIME + has a version number mostly because it introduced incompatible + changes to the interpretation of several "Content-" headers. This + Draft attempts no changes in interpretation, and it appears + doubtful that future Drafts will find it feasible to introduce + any. + + UNRESOLVED ISSUE: Should this be reconsidered? Only if the header + has SPECIFIC IDENTIFIABLE uses today. Otherwise, it's just + useless added bulk. + + + + +Spencer Historic [Page 7] + +RFC 1849 Son of 1036 March 2010 + + + As in this Draft's predecessors, the exact means used to transmit + articles from one host to another is not specified. 
Network News + Transfer Protocol (NNTP) [RFC977] {since replaced by [RFC3977]} is + probably the most common transmission method on the Internet, but a + number of others are known to be in use, including the Unix-To-Unix + Copy Protocol [UUCP], which was extensively used in the early days of + Usenet and is still much used on its fringes today. + + Several of the mechanisms described in this Draft may seem somewhat + strange or even bizarre at first reading. As with Internet mail, + there is no reasonable possibility of updating the entire installed + base of news software promptly, so interoperability with old software + is crucial and will remain so. Compatibility with existing practice + and robustness in an imperfect world necessarily take priority over + elegance. + +2. Definitions, Notations, and Conventions + +2.1. Textual Notations + + Throughout this Draft, "MAIL" is short for "[RFC822] as amended by + [RFC1123]". ([RFC1123]'s amendments are mostly relatively small, but + they are not insignificant.) See also the discussion in Section 3 + about this Draft's relationship to MAIL. "MIME" is short for + "[RFC1341] and [RFC1342]" (or their {since} updated replacements + {[RFC2045], [RFC2046], and [RFC2047]}). + + UNRESOLVED ISSUE: Update these numbers {now resolved!}. + + {NOTE: Since the original publication of this Draft [RFC822] has + been updated, firstly to [RFC2822] and more recently to [RFC5322]; + however, this Draft is firmly rooted in the original [RFC822]. + Similarly, [RFC821] has also received two upgrades in the + meantime.} + + "ASCII" is short for "the ANSI X3.4 character set" [X3.4]. While + "ASCII" is often misused to refer to various character sets somewhat + similar to X3.4, in this Draft, "ASCII" means [X3.4] and only [X3.4]. + + NOTE: The name is traditional (to the point where the ANSI + standard sanctions it), even though it is no longer an acronym for + the name of the standard. 
+ + NOTE: ASCII, X3.4, contains 128 characters, not all of them + printable. Character sets with more characters are not ASCII, + although they may include it as a subset. + + + + + +Spencer Historic [Page 8] + +RFC 1849 Son of 1036 March 2010 + + + Certain words used to define the significance of individual + requirements are capitalized. "MUST" means that the item is an + absolute requirement of the specification. "SHOULD" means that the + item is a strong recommendation: there may be valid reasons to ignore + it in unusual circumstances, but this should be done only after + careful study of the full implications and a firm conclusion that it + is necessary, because there are serious disadvantages to doing so. + "MAY" means that the item is truly optional, and implementors and + users are warned that conformance is possible but not to be relied + on. + + The term "compliant", applied to implementations, etc., indicates + satisfaction of all relevant "MUST" and "SHOULD" requirements. The + term "conditionally compliant" indicates satisfaction of all relevant + "MUST" requirements but violation of at least one relevant "SHOULD" + requirement. + + This Draft contains explanatory notes using the following format. + These may be skipped by persons interested solely in the content of + the specification. The purpose of the notes is to explain why + choices were made, to place them in context, or to suggest possible + implementation techniques. + + NOTE: While such explanatory notes may seem superfluous in + principle, they often help the less-than-omniscient reader grasp + the purpose of the specification and the constraints involved. + Given the limitations of natural language for descriptive + purposes, this improves the probability that implementors and + users will understand the true intent of the specification in + cases where the wording is not entirely clear. + + All numeric values are given in decimal unless otherwise indicated. 
+   Octets are assumed to be unsigned values for this purpose. Large
+   numbers are written using the North American convention, in which ","
+   separates groups of three digits but otherwise has no significance.
+
+2.2. Syntax Notation
+
+   Although the mechanisms specified in this Draft are all described in
+   prose, most are also described formally in the modified BNF notation
+   of [RFC822]. Implementors will need to be familiar with this
+   notation to fully understand this specification and are referred to
+   [RFC822] for a complete explanation of the modified BNF notation.
+   Here is a brief illustrative example:
+
+
+
+
+
+
+
+Spencer                         Historic                        [Page 9]
+
+RFC 1849                       Son of 1036                    March 2010
+
+
+      sentence  = clause *( punct clause ) "."
+      punct     = ":" / ";"
+      clause    = 1*word [ "(" clause ")" / "," 1*word ]
+      word      = <any English word>
+
+   This defines a sentence as some clauses separated by puncts and ended
+   by a period, a punct as a colon or semicolon, a clause as at least
+   one <word> optionally followed by either a parenthesized clause or a
+   comma and at least one more <word>, and a <word> as (informally) any
+   English word. The characters "<>" are used to enclose names when
+   (and only when) distinguishing them from surrounding text is useful.
+   The full form of the repetition notation is "<m>*<n><thing>",
+   denoting <m> through <n> repetitions of <thing>; <m> defaults to
+   zero, <n> to infinity, and the "*" and <n> can be omitted if <m> and
+   <n> are equal, so 1*word is one or more words, 1*5word is one through
+   five words, and 2word is exactly two words.
+
+   The character "\" is not special in any way in this notation.
+
+   This Draft is intended to be self-contained; all syntax rules used in
+   it are defined within it, and a rule with the same name as one found
+   in MAIL does not necessarily have the same definition.
The lexical + layer of MAIL is NOT, repeat NOT, used in this Draft, and its + presence must not be assumed; notably, this Draft spells out all + places where white space is permitted/required and all places where + constructs resembling MAIL comments can occur. + + NOTE: News parsers historically have been much less permissive + than MAIL parsers. + +2.3. Definitions + + The term "character set", wherever it is used in this Draft, refers + to a coded character set, in the sense of ISO character set + standardization work, and must not be misinterpreted as meaning + merely "a set of characters". + + In this Draft, ASCII character 32 is referred to as "blank"; the word + "space" has a more generic meaning. + + An "article" is the unit of news, analogous to a MAIL "message". + + A "poster" is a human being (or software equivalent) submitting a + possibly compliant article to be "posted", i.e., made available for + reading on all relevant hosts. A "posting agent" is software that + assists posters to prepare articles, including determining whether + the final article is compliant, passing it on to a relayer for + posting if so, and returning it to the poster with an explanation if + + + +Spencer Historic [Page 10] + +RFC 1849 Son of 1036 March 2010 + + + not. A "relayer" is software that receives allegedly compliant + articles from posting agents and/or other relayers, files copies in a + "news database", and possibly passes copies on to other relayers. + + NOTE: While the same software may well function both as a relayer + and as part of a posting agent, the two functions are distinct and + should not be confused. The posting agent's purpose is (in part) + to validate an article, supply header information that can or + should be supplied automatically, and generally take reasonable + actions in an attempt to transform the poster's submission into a + compliant article. 
The relayer's purpose is to move already- + compliant articles around efficiently without damaging them. + + A "reader" is a human being reading news articles. A "reading agent" + is software that presents articles to a reader. + + NOTE: Informal usage often uses "reader" for both these meanings, + but this introduces considerable potential for confusion and + misunderstanding, so this Draft takes care to make the + distinction. + + A "newsgroup" is a single news forum, a logical bulletin board, + having a name and nominally intended for articles on a specific + topic. An article is "posted to" a single newsgroup or several + newsgroups. When an article is posted to more than one newsgroup, it + is said to be "cross-posted"; note that this differs from posting the + same text as part of each of several articles, one per newsgroup. A + "hierarchy" is the set of all newsgroups whose names share a first + component (see the name syntax in Section 5.5). + + A newsgroup may be "moderated", in which case submissions are not + posted directly, but mailed to a "moderator" for consideration and + possible posting. Moderators are typically human but may be + implemented partially or entirely in software. + + A "followup" is an article containing a response to the contents of + an earlier article (the followup's "precursor"). A "followup agent" + is a combination of reading agent and posting agent that aids in the + preparation and posting of a followup. + + Text comparisons are "case-sensitive" if they consider uppercase + letters (e.g., "A") different from lowercase letters (e.g., "a"), and + "case-insensitive" if letters differing only in case (e.g., "A" and + "a") are considered identical. Categories of text are said to be + case-(in)sensitive if comparisons of such texts to others are case- + (in)sensitive. 
+ + + + + +Spencer Historic [Page 11] + +RFC 1849 Son of 1036 March 2010 + + + A "cooperating subnet" is a set of news-exchanging hosts that is + sufficiently well-coordinated (typically via a central administration + of some sort) that stronger assumptions can be made about hosts in + the set than about news hosts in general. This is typically used to + relax restrictions that are otherwise required for worst-case + interoperability; members of a cooperating subnet MAY interchange + articles that do not conform to this Draft's specifications, provided + all members have agreed to this and provided the articles are not + permitted to leak out of the subnet. The word "subnet" is used to + emphasize that a cooperating subnet is typically not an isolated + universe; care must be taken that traffic leaving the subnet complies + with the restrictions of the larger net, not just those of the + cooperating subnet. + + A "message ID" is a unique identifier for an article, usually + supplied by the posting agent that posted it. It distinguishes the + article from every other article ever posted anywhere (in theory). + Articles with the same message ID are treated as identical copies of + the same article even if they are not in fact identical. + + A "gateway" is software that receives news articles and converts them + to messages of some other kind (e.g., mail to a mailing list), or + vice versa; in essence, it is a translating relayer that straddles + boundaries between different methods of message exchange. The most + common type of gateway connects newsgroup(s) to mailing list(s), + either unidirectionally or bidirectionally, but there are also + gateways between news networks using this Draft's news format and + those using other formats. + + A "control message" is an article that is marked as containing + control information; a relayer receiving such an article will + (subject to permissions, etc.) take actions beyond just filing and + passing on the article. 
+ + NOTE: "Control article" would be more consistent terminology, but + "control message" is already well established. + + An article's "reply address" is the address to which mailed replies + should be sent. This is the address specified in the article's From + header (see Section 5.2), unless it also has a Reply-To header (see + Section 6.3). + + The notation (for example) "(ASCII 17)" following a name means "this + name refers to the ASCII character having value 17". An "ASCII + printable character" is an ASCII character in the range 33-126. An + "ASCII control character" is an ASCII character in the range 0-31, or + the character DEL (ASCII 127). A "non-ASCII character" is a + character having a value exceeding 127. + + + +Spencer Historic [Page 12] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: Blank is neither an "ASCII printable character" nor an + "ASCII control character". + +2.4. End-of-Line + + How the end of a text line is represented depends on the context and + the implementation. For Internet transmission via protocols such as + SMTP [RFC821], an end-of-line is a CR (ASCII 13) followed by an LF + (ASCII 10). ISO C [ISO/IEC9899] and many modern operating systems + indicate end-of-line with a single character, typically ASCII LF (aka + "newline"), and this is the normal convention when news is + transmitted via UUCP. A variety of other methods are in use, + including out-of-band methods in which there is no specific character + that means end-of-line. + + This Draft does not constrain how end-of-line is represented in news, + except that characters other than CR and LF MUST NOT be usurped for + use in end-of-line representations. Also, obviously, all software + dealing with a particular copy of an article must agree on the + convention to be used. "EOL" is used to mean "whatever end-of-line + representation is appropriate"; it is not necessarily a character or + sequence of characters. 
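As an illustration of the end-of-line discussion above (not part of the Draft itself), here is a minimal Python sketch of converting one copy of an article between two EOL conventions. It assumes the local convention is a single LF and the wire convention is CR followed by LF; the function names `to_local` and `to_wire` are hypothetical, chosen only for this example.

```python
def to_local(data: bytes) -> bytes:
    """Convert wire-format EOLs (CR LF) to the assumed local EOL (LF)."""
    return data.replace(b"\r\n", b"\n")


def to_wire(data: bytes) -> bytes:
    """Convert local EOLs (LF) to wire-format EOLs (CR LF).

    CR LF pairs already present are normalized first, so applying the
    function twice gives the same result as applying it once.
    """
    return to_local(data).replace(b"\n", b"\r\n")
```

Only CR and LF are touched, in keeping with the requirement that no other character be usurped for end-of-line purposes. A bare CR that is not followed by LF is left alone, since the text does not say how such a character should be treated.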
+ + NOTE: If faced with picking an EOL representation in the absence + of other constraints, use of a single character simplifies + processing, and the ASCII standard [X3.4] specifies that if one + character is to be used for this purpose, it should be LF (ASCII + 10). + + NOTE: Inside MIME encodings, use of the Internet canonical EOL + representation (CR followed by LF) is mandatory. See [RFC2049]. + +2.5. Case-Sensitivity + + Text in newsgroup names, header parameters, etc. is case-sensitive + unless stated otherwise. + + NOTE: This is at variance with MAIL, which is case-insensitive + unless stated otherwise, but is consistent with news historical + practice and existing news software. See the comments on backward + compatibility in Section 1. + +2.6. Language + + Various constant strings in this Draft, such as header names and + month names, are derived from English words. Despite their + derivation, these words do NOT change when the poster or reader + employing them is interacting in a language other than English. + + + +Spencer Historic [Page 13] + +RFC 1849 Son of 1036 March 2010 + + + Posting and reading agents SHOULD translate as appropriate in their + interaction with the poster or reader, but the forms that actually + appear in articles are always the English-derived ones defined in + this Draft. + +3. Relation to MAIL (RFC822, etc.) + + The primary intent of this Draft is to completely describe the news + article format as a subset of MAIL's message format (augmented by + some new headers). Unless explicitly noted otherwise, the intent + throughout is that an article MUST also be a valid MAIL message. + + NOTE: Despite obvious similarities between news and mail, opinions + vary on whether it is possible or desirable to unify them into a + single service. 
However, it is unquestionably both possible and + useful to employ some of the same tools for manipulating both mail + messages and news articles, so there is specific advantage to be + had in defining them compatibly. Furthermore, there is no + apparent need to re-invent the wheel when slight extensions to an + existing definition will suffice. + + Given that this Draft attempts to be self-contained, it inevitably + contains considerable repetition of information found in MAIL. This + raises the possibility of unintentional conflicts. Unless + specifically noted otherwise, any wording in this Draft that permits + behavior that is not MAIL-compliant is erroneous and should be + followed only to the extent that the result remains compliant with + MAIL. + + NOTE: [RFC1036] said "where this standard conflicts with the + Internet Standard, RFC 822 should be considered correct and this + standard in error". Taken literally, this was obviously + incorrect, since [RFC1036] imposed a number of restrictions not + found in [RFC822]. The intent, however, was reasonable: to + indicate that UNINTENTIONAL differences were errors in [RFC1036]. + + Implementors and users should note that MAIL is deliberately an + extensible standard, and most extensions devised for mail are also + relevant to (and compatible with) news. Note particularly MIME, + summarized briefly in Appendix B, which extends MAIL in a number of + useful ways that are definitely relevant to news. Also of note is + the work in progress on reconciling Privacy Enhanced Mail (PEM, + which defines extensions for authentication and security) with MIME, + after which this may also be relevant to news. + + UNRESOLVED ISSUE: Update the MIME/PEM information. + + + + + +Spencer Historic [Page 14] + +RFC 1849 Son of 1036 March 2010 + + + Similarly, descriptions here of MIME facilities should be considered + correct only to the extent that they do not require or legitimize + practices that would violate those RFCs. 
(Note that this Draft does + extend the application of some MIME facilities, but this is an + extension rather than an alteration.) + +4. Basic Format + +4.1. Overall Syntax + + The overall syntax of a news article is: + + article = 1*header separator body + header = start-line *continuation + start-line = header-name ":" space [ nonblank-text ] eol + continuation = space nonblank-text eol + header-name = 1*name-character *( "-" 1*name-character ) + name-character = letter / digit + letter = <ASCII letter A-Z or a-z> + digit = <ASCII digit 0-9> + separator = eol + body = *( [ nonblank-text / space ] eol ) + eol = <EOL> + nonblank-text = [ space ] text-character *( space-or-text ) + text-character = <any ASCII character except NUL (ASCII 0), + HT (ASCII 9), LF (ASCII 10), CR (ASCII 13), + or blank (ASCII 32)> + space = 1*( <HT (ASCII 9)> / <blank (ASCII 32)> ) + space-or-text = space / text-character + + An article consists of some headers followed by a body. An empty + line separates the two. The headers contain structured information + about the article and its transmission. A header begins with a + header name identifying it, and can be continued onto subsequent + lines by beginning the continuation line(s) with white space. (Note + that Section 4.2.3 adds some restrictions to the header syntax + indicated here.) The body is largely unstructured text significant + only to the poster and the readers. + + NOTE: Terminology here follows the current custom in the news + community, rather than the MAIL convention of (sometimes) + referring to what is here called a "header" as a "header field" or + "field". + + Note that the separator line must be truly empty, and not just a line + containing white space. Further empty lines following it are part of + the body, as are empty lines at the end of the article. 
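The separator rule can be sketched as follows. This is a minimal illustration in Python, not a full parser for the grammar: it assumes LF is the EOL representation, and the function name is hypothetical. It locates the first truly empty line, treats everything before it as header lines, and keeps every later line (including empty ones) as body.

```python
def split_article(article: str):
    """Split an article into (header_lines, body_lines), assuming LF as EOL."""
    lines = article.split("\n")
    # The text ends with an EOL, so the final split element is an
    # empty string that is not a real line; discard it.
    if lines and lines[-1] == "":
        lines.pop()
    try:
        # Only a truly empty line counts; a line of white space does not.
        sep = lines.index("")
    except ValueError:
        raise ValueError("no separator line; article may be truncated")
    return lines[:sep], lines[sep + 1:]
```

For example, `split_article("Subject: hi\n\n\nbody\n")` yields one header line and a two-line body whose first line is empty.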
+ + + + +Spencer Historic [Page 15] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: Some systems make no distinction between empty lines and + lines consisting entirely of white space; indeed, some systems + cannot represent entirely empty lines. The grammar's requirement + that header continuation lines contain some printable text is + meant to ensure that the empty/space distinction cannot confuse + identification of the separator line. + + NOTE: It is tempting to authorize posting agents to strip empty + lines at the beginning and end of the body, but such empty lines + could possibly be part of a preformatted document. + + Implementors are warned that trailing white space, whether alone on + the line or not, MAY be significant in the body, notably in early + versions of the "uuencode" encoding for binary data. Trailing white + space MUST be preserved unless the article is known to have + originated within a cooperating subnet that avoids using significant + trailing white space, and SHOULD be preserved regardless. Posters + SHOULD avoid using conventions or encodings that make trailing white + space significant; for encoding of binary data, MIME's "base64" + encoding is recommended. Implementors are warned that ISO C + implementations are not required to preserve trailing white space, + and special precautions may be necessary in implementations that do + not. + + NOTE: Unfortunately, the signature-delimiter convention (described + in Section 4.3.2) does use significant trailing white space. It's + too late to fix this; there is work underway on defining an + organized signature convention as part of MIME, which is a + preferable solution in the long run. + + Posters are warned that some very old relayer software misbehaves + when the first non-empty line of an article body begins with white + space. + +4.2. Headers + +4.2.1. 
Names and Contents + + Despite the restrictions on header-name syntax imposed by the + grammar, relayers and reading agents SHOULD tolerate header names + containing any ASCII printable character other than colon (":", + ASCII 58). + + NOTE: MAIL header names can contain any ASCII printable character + (other than colon) in theory, but in practice, arbitrary header + names are known to cause trouble for some news software. Section + 4.1's restriction to alphanumeric sequences separated by hyphens + is believed to permit all widely used header names without causing + + + +Spencer Historic [Page 16] + +RFC 1849 Son of 1036 March 2010 + + + problems for any widely used software. Software is nevertheless + encouraged to cope correctly with the full range of possibilities, + since aberrations are known to occur. + + Relayers MUST disregard headers not described in this Draft (that is, + with header names not mentioned in this Draft) and pass them on + unaltered. + + Posters wishing to convey non-standard information in headers SHOULD + use header names beginning with "X-". No standard header name will + ever be of this form. Reading agents SHOULD ignore "X-" headers, or + at least treat them with great care. + + The order of headers in an article is not significant. However, + posting agents are encouraged to put mandatory headers (see + Section 5) first, followed by optional headers (see Section 6), + followed by headers not defined in this Draft. + + NOTE: While relayers and reading agents must be prepared to handle + any order, having the significant headers (the precise definition + of "significant" depends on context) first can noticeably improve + efficiency, especially in memory-limited environments where it is + difficult to buffer up an arbitrary quantity of headers while + searching for the few that matter. + + Header names are case-insensitive. 
There is a preferred case + convention, which posters and posting agents SHOULD use: each hyphen- + separated "word" has its initial letter (if any) in uppercase and the + rest in lowercase, except that some abbreviations have all letters + uppercase (e.g., "Message-ID" and "MIME-Version"). The forms used in + this Draft are the preferred forms for the headers described herein. + Relayers and reading agents are warned that articles might not obey + this convention. + + NOTE: Although software must be prepared for the possibility of + random use of case in header names (and other case-independent + text), establishing a preferred convention reduces pointless + diversity and may permit optimized software that looks for the + preferred forms before resorting to less-efficient case- + insensitive searches. + + In general, a header can consist of several lines, with each + continuation line beginning with white space. The EOLs preceding + continuation lines are ignored when processing such a header, + effectively combining the start-line and the continuations into a + single logical line. The logical line, less the header name, colon, + and any white space following the colon, is the "header content". + + + + +Spencer Historic [Page 17] + +RFC 1849 Son of 1036 March 2010 + + +4.2.2. Undesirable Headers + + A header whose content is empty is said to be an empty header. + Relayers and reading agents SHOULD NOT consider presence or absence + of an empty header to alter the semantics of an article (although + syntactic rules, such as requirements that certain header names + appear at most once in an article, MUST still be satisfied). Posting + agents SHOULD delete empty headers from articles before posting them. 
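The combination of start-lines and continuations into logical lines, the derivation of header content, and the removal of empty headers can be sketched as below. This is an illustrative Python sketch only: the function names are hypothetical, the input is assumed to be a list of header lines with EOLs already removed, and no validation against the header grammar is attempted.

```python
def unfold_headers(header_lines):
    """Return (name, content) pairs, one per logical header line."""
    logical = []
    for line in header_lines:
        if line[:1] in (" ", "\t") and logical:
            # Continuation: the EOL is ignored, the white space is kept.
            logical[-1] += line
        else:
            logical.append(line)
    headers = []
    for text in logical:
        name, _, rest = text.partition(":")
        # Content is the logical line less the name, the colon, and
        # any white space following the colon.
        headers.append((name, rest.lstrip(" \t")))
    return headers


def drop_empty(headers):
    """Remove headers whose content is empty, as a posting agent might."""
    return [(n, c) for n, c in headers if c.strip(" \t") != ""]


def find_header(headers, name):
    """Look up a header by name, ignoring case in the name."""
    wanted = name.lower()
    return [c for n, c in headers if n.lower() == wanted]
```

A posting agent could call `drop_empty(unfold_headers(lines))` before emitting an article; relayers would normally skip `drop_empty` and pass headers on unaltered.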
+ + Headers that merely state defaults explicitly (e.g., a Followup-To + header with the same content as the Newsgroups header, or a MIME + Content-Type header with contents "text/plain; charset=us-ascii") or + state information that reading agents can typically determine easily + themselves (e.g., the length of the body in octets) are redundant, + conveying no information whatsoever. Headers that state information + that cannot possibly be of use to a significant number of relayers, + reading agents, or readers (e.g., the name of the software package + used as the posting agent) are useless and pointless. Posters and + posting agents SHOULD avoid including redundant or useless headers in + articles. + + NOTE: Information that someone, somewhere, might someday find + useful is best omitted from headers. (There's quite enough of it + in article bodies.) Headers should contain information of known + utility only. This is not meant to preclude inclusion of + information primarily meant for news-software debugging, but such + information should be included only if there is real reason, + preferably based on experience, to suspect that it may be + genuinely useful. Articles passing through gateways are the only + obvious case where inclusion of debugging information appears + clearly legitimate. (See Section 10.1.) + + NOTE: A useful rule of thumb for software implementors is: "if I + had to pay a dollar a day for the transmission of this header, + would I still think it worthwhile?". + +4.2.3. White Space and Continuations + + The colon following the header name on the start-line MUST be + followed by white space, even if the header is empty. If the header + is not empty, at least some of the content MUST appear on the start- + line. Posting agents MUST enforce these restrictions, but relayers + (etc.) SHOULD accept even articles that violate them. + + NOTE: MAIL does not require white space after the colon, but it is + usual. 
[RFC1036] required the white space, even in empty headers, + and some existing software demands it. In MAIL, and arguably in + [RFC1036] (although the wording is vague), it is technically + + + +Spencer Historic [Page 18] + +RFC 1849 Son of 1036 March 2010 + + + legitimate for the white space to be part of a continuation line + rather than the start-line, but not all existing software will + accept this. Deleting empty headers and placing some content on + the start-line avoids this issue; this is desirable because + trailing blanks, easily deleted by accident, are best not made + significant in headers. + + In general, posters and posting agents SHOULD use blank (ASCII 32), + not tab (ASCII 9), where white space is desired in headers. Existing + software does not consistently accept tab as synonymous with blank in + all contexts. In particular, [RFC1036] appeared to specify that the + character immediately following the colon after a header name was + required to be a blank, and some news software insists on that, so + this character MUST be a blank. Again, posting agents MUST enforce + these restrictions but relayers SHOULD be more tolerant. + + Since the white space beginning a continuation line remains a part of + the logical line, headers can be "broken" into multiple lines only at + white space. Posting agents SHOULD NOT break headers unnecessarily. + Relayers SHOULD preserve existing header breaks, and SHOULD NOT + introduce new breaks. Breaking headers SHOULD be a last resort; + relayers and reading agents SHOULD handle long header lines + gracefully. (See the discussion of size limits in Section 4.6.) + +4.3. Body + + Although the article body is unstructured for most of the purposes of + this Draft, structure MAY be imposed on it by other means, notably + MIME headers (see Appendix B). + +4.3.1. 
Body Format Issues + + The body of an article MAY be empty, although posting agents SHOULD + consider this an error condition (meriting returning the article to + the poster for revision). A posting agent that does not reject such + an article SHOULD issue a warning message to the poster and supply a + non-empty body. Note that the separator line MUST be present even if + the body is empty. + + NOTE: An empty body is probably a poster error except, arguably, + for some control messages, and even they really ought to have a + body explaining the reason for the control message. Some old + reading agents are known to generate empty bodies for "cancel" + control messages, so posting agents might opt not to reject + bodyless articles in such cases (although it would be better to + fix the reading agents to request a body). However, some existing + news software is known to react badly to bodyless articles, hence + the request for posting agents to insert a body in such cases. + + + +Spencer Historic [Page 19] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: A possible posting-agent-supplied body text (already used by + one widespread posting agent) is "This article was probably + generated by a buggy news reader". (The use of "reader" to refer + to the reading agent is traditional, although this Draft uses more + precise terminology.) + + NOTE: The requirement for the separator line even in a bodyless + article is inherited from MAIL and also distinguishes legitimately + bodyless articles from articles accidentally truncated in the + middle of the headers. + + Note that an article body is a sequence of lines terminated by EOLs, + not arbitrary binary data, and in particular it MUST end with an EOL. + However, relayers SHOULD treat the body of an article as an + uninterpreted sequence of octets (except as mandated by changes of + EOL representation and by control-message processing) and SHOULD + avoid imposing constraints on it. See also Section 4.6. + +4.3.2. 
Body Conventions + + Although body lines can in principle be very long (see Section 4.6 + for some discussion of length limits), posters SHOULD restrict body + line lengths to circa 70-75 characters. On systems where text is + conventionally stored with EOLs only at paragraph breaks and other + "hard return" points, with software breaking lines as appropriate for + display or manipulation, posting agents SHOULD insert EOLs as + necessary so that posted articles comply with this restriction. + + NOTE: News originated in environments where line breaks in plain + text files were supplied by the user, not the software. Be this + good or bad, much reading-agent and posting-agent software assumes + that news articles follow this convention, so it is often + inconvenient to read or respond to articles that violate it. The + "70-75" number comes from the widespread use of display devices + that are 80 columns wide (with the number reduced to provide a bit + of margin for quoting, see below). + + Reading agents confronted with body lines much longer than the + available output-device width SHOULD break lines as appropriate. + Posters are warned that such breaks may not occur exactly where the + poster intends. + + NOTE: "As appropriate" would typically include breaking lines when + supplying the text of an article to be quoted in a reply or + followup, something that line-breaking reading agents often + neglect to do now. + + + + + +Spencer Historic [Page 20] + +RFC 1849 Son of 1036 March 2010 + + + Although styles vary widely, for plain text it is usual to use no + left margin, leave the right edge ragged, use a single empty line to + separate paragraphs, and employ normal natural-language usage on + matters such as upper/lowercase. (In particular, articles SHOULD NOT + be written entirely in uppercase. In environments where posters have + access only to uppercase, posting agents SHOULD translate it to + lowercase.) 
+ + NOTE: Most people find substantial bodies of text entirely in + uppercase relatively hard to read, while all-lowercase text merely + looks slightly odd. The common association of uppercase with + strong emphasis adds to this. + + Tone of voice does not carry well in written text, and + misunderstandings are common when sarcasm, parody, or exaggeration + for humorous effect is attempted without explicit warning. It has + become conventional to use the sequence ":-)", which (on most output + devices) resembles a rotated "smiley face" symbol, as a marker for + text not meant to be taken literally, especially when humor is + intended. This practice aids communication and averts unintended + ill-will; posters are urged to use it. A variety of analogous + sequences are used with less-standardized meanings [Sanderson]. + + The order of arrival of news articles at a particular host depends + somewhat on transmission paths, and occasionally articles are lost + for various reasons. When responding to a previous article, posters + SHOULD NOT assume that all readers understand the exact context. It + is common to quote some of the previous article to establish context. + This SHOULD be done by prefacing each quoted line (even if it is + empty) with the character ">". This will result in multiple levels + of ">" when quoted context itself contains quoted context. + + NOTE: It may seem superfluous to put a prefix on empty lines, but + it simplifies implementation of functions such as "skip all quoted + text" in reading agents. + + Readability is enhanced if quoted text and new text are separated by + an empty line. + + Posters SHOULD edit quoted context to trim it down to the minimum + necessary. However, posting agents SHOULD NOT attempt to enforce + this by imposing overly simplistic rules like "no more than 50% of + the lines should be quotes". 
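The quoting convention described above can be sketched in a few lines of Python. The function names are hypothetical. Every quoted line, including empty ones, receives the ">" prefix, and an empty line separates the quoted context from the new text.

```python
def quote_lines(body_lines):
    """Prefix every line, even an empty one, with the character '>'."""
    return [">" + line for line in body_lines]


def followup_body(quoted_lines, new_lines):
    """Quoted context, one empty line, then the poster's new text."""
    return quote_lines(quoted_lines) + [""] + list(new_lines)
```

Quoting text that already contains quoted context yields multiple levels of ">" with no special handling.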
+ + NOTE: While encouraging trimming is desirable, the 50% rule + imposed by some old posting agents is both inadequate and + counterproductive. Posters do not respond to it by being more + selective about quoting; they respond by padding short responses, + + + +Spencer Historic [Page 21] + +RFC 1849 Son of 1036 March 2010 + + + or by using different quoting styles to defeat automatic analysis. + The former adds unnecessary noise and volume, while the latter + also defeats more useful forms of automatic analysis that reading + agents might wish to do. + + NOTE: At the very least, if a minimum-unquoted quota is being set, + article bodies shorter than (say) 20 lines, or perhaps articles + that exceed the quota by only a few lines, should be exempt. This + avoids the ridiculous situation of complaining about a 5-line + response to a 6-line quote. + + NOTE: A more subtle posting-agent rule, suggested for experimental + use, is to reject articles that appear to contain quoted + signatures (see below). This is almost certainly the result of a + careless poster not bothering to trim down quoted context. Also, + if a posting agent or followup agent presents an article template + to the poster for editing, it really should take note of whether + the poster actually made any changes, and refrain from posting an + unmodified template. + + Some followup agents supply "attribution" lines for quoted context, + indicating where it first appeared and under whose name. When + multiple levels of quoting are present and quoted context is edited + for brevity, "inner" attribution lines are not always retained. The + editing process is also somewhat error-prone. Reading agents (and + readers) are warned not to assume that attributions are accurate. + + UNRESOLVED ISSUE: Should a standard format for attribution lines + be defined? There is already considerable diversity, but + automatic news analysis would be substantially aided by a standard + convention. 
+ + Early difficulties in inferring return addresses from article headers + led to "signatures": short closing texts, automatically added to the + end of articles by posting agents, identifying the poster and giving + his network addresses, etc. If a poster or posting agent does append + a signature to an article, the signature SHOULD be preceded with a + delimiter line containing (only) two hyphens (ASCII 45) followed by + one blank (ASCII 32). Posting agents SHOULD limit the length of + signatures, since verbose excess bordering on abuse is common if no + restraint is imposed; 4 lines is a common limit. + + NOTE: While signatures are arguably a blemish, they are a well- + understood convention, and conveying the same information in + headers exposes it to mangling and makes it rather less + conspicuous. A standard delimiter line makes it possible for + reading agents to handle signatures specially if desired. + + + + +Spencer Historic [Page 22] + +RFC 1849 Son of 1036 March 2010 + + + (This is unfortunately hampered by extensive misunderstanding of, + and misuse of, the delimiter.) + + NOTE: The choice of delimiter is somewhat unfortunate, since it + relies on preservation of trailing white space, but it is too + well-established to change. There is work underway to define a + more sophisticated signature scheme as part of MIME, and this will + presumably supersede the current convention in due time. + + NOTE: Four 75-column lines of signature text is 300 characters, + which is ample to convey name and mail-address information in all + but the most bizarre situations. + +4.4. Characters and Character Sets + + Header and body lines MAY contain any ASCII characters other than CR + (ASCII 13), LF (ASCII 10), and NUL (ASCII 0). + + NOTE: CR and LF are excluded because they clash with common EOL + conventions. NUL is excluded because it clashes with the C + end-of-string convention, which is significant to most existing + news software. 
These three characters are unlikely to be + transmitted successfully. + + However, posters SHOULD avoid using ASCII control characters except + for tab (ASCII 9), formfeed (ASCII 12), and backspace (ASCII 8). Tab + signifies sufficient horizontal white space to reach the next of a + set of fixed positions; posters are warned that there is no standard + set of positions, so tabs should be avoided if precise spacing is + essential. Formfeed signifies a point at which a reading agent + SHOULD pause and await reader interaction before displaying further + text. Backspace SHOULD be used only for underlining, done by a + sequence of underscores (ASCII 95) followed by an equal number of + backspaces, signifying that the same number of text characters + following are to be underlined. Posters are warned that underlining + is not available on all output devices and is best not relied on for + essential meaning. Reading agents SHOULD recognize underlining and + translate it to the appropriate commands for devices that support it. + + NOTE: Interpretation of almost all control characters is device- + specific to some degree, and devices differ. Tabs and underlining + are supported, to some extent, by most modern devices and reading + agents, hence the cautious exemptions for them. The underlining + method is specified because the inverse method, text and then + underscores, is tempting to the naive; however, if sent unaltered + to a device that shows only the most recent of several overstruck + characters rather than a composite, the result can be utterly + unreadable. + + + +Spencer Historic [Page 23] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: A common interpretation of tab is that it is a request to + space forward to the next position whose number is one more than a + multiple of 8, with positions numbered sequentially starting at 1. + (So tab positions are 9, 17, 25, ...) 
Reading agents not + constrained by existing system conventions might wish to use this + interpretation. + + NOTE: It will typically be necessary for a reading agent to catch + and interpret formfeed, not just send it to the output device. + The actions performed by typical output devices on receiving a + formfeed are neither adequate for, nor appropriate to, the pause- + for-interaction meaning. + + Cooperating subnets that wish to employ non-ASCII character sets by + using escape sequences (employing, e.g., ESC (ASCII 27), SO + (ASCII 14), and SI (ASCII 15)) to alter the meaning of superficially + ASCII characters MAY do so, but MUST use MIME headers to alert + reading agents to the particular character set(s) and escape + sequences in use. A reading agent SHOULD NOT pass such an escape + sequence through, unaltered, to the output device unless the agent + confirms that the sequence is one used to affect character sets and + has reason to believe that the device is capable of interpreting that + particular sequence properly. + + NOTE: Cooperating-subnet organizers are warned that some very old + relayers strip certain control characters out of articles they + pass along. ESC is known to be among the affected characters. + + NOTE: There are now standard Internet encodings for Japanese + [RFC1468] and Vietnamese [RFC1456] in particular. + + Articles MUST NOT contain any octet with value exceeding 127, i.e., + any octet that is not an ASCII character. + + NOTE: This rule, like others, may be relaxed by unanimous consent + of the members of a cooperating subnet, provided suitable + precautions are taken to ensure that rule-violating articles do + not leak out of the subnet. (This has already been done in many + areas where ASCII is not adequate for the local language(s).) + Beware that articles containing non-ASCII octets in headers are a + violation of the MAIL specifications and are not valid MAIL + messages. 
MIME offers a way to encode non-ASCII characters in + ASCII for use in headers; see Section 4.5. + + + + + + + + +Spencer Historic [Page 24] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: While there is great interest in using 8-bit character sets, + not all software can yet handle them correctly, hence the + restriction to cooperating subnets. MIME encodings can be used to + transmit such characters while remaining within the octet + restriction. + + In anticipation of the day when it is possible to use non-ASCII + characters safely anywhere, and to provide for the (substantial) + cooperating subnets that are already using them, transmission paths + SHOULD treat news articles as uninterpreted sequences of octets + (except perhaps for transformations between EOL representations) and + relayers SHOULD treat non-ASCII characters in articles as ordinary + characters. + + NOTE: 8-bit enthusiasts are warned that not all software conforms + to these recommendations yet. In particular, standard NNTP + [RFC977] is a 7-bit protocol {but in [RFC3977] it has been upped + to 8-bit}, and there may be implementations that enforce this + rule. Be warned, also, that it will never be safe to send raw + binary data in the body of news articles, because changes of EOL + representation may (will!) corrupt it. + + Except where cooperating subnets permit more direct approaches, MIME + headers and encodings SHOULD be used to transmit non-ASCII content + using ASCII characters; see Section 4.5, Appendix B, and the MIME + RFCs for details. If article content can be expressed in ASCII, it + SHOULD be. Failing that, the order of preference for character sets + is that described in MIME. + + NOTE: Using the MIME facilities, it is possible to transmit ANY + character set, and ANY form of binary data, using only ASCII + characters. Equally important, such articles are self-describing + and the reading agent can tell which octet-to-symbol mapping is + intended! 
Designation of some preferred character sets is + intended to minimize the number of character sets that a reading + agent must understand in order to display most articles properly. + + Articles containing non-ASCII characters, articles using ASCII + characters (values 0 through 127) to refer to non-ASCII symbols, and + articles using escape sequences to shift character sets SHOULD + include MIME headers indicating which character set(s) and + conventions are being used. They MUST do so unless such articles are + strictly confined to a cooperating subnet that has its own pre-agreed + conventions. MIME encodings are preferred over all of these + techniques. If it comes to a relayer's attention that it is being + asked to pass an article using such techniques outward across what it + knows to be the boundary of such a cooperating subnet, it MUST report + + + + +Spencer Historic [Page 25] + +RFC 1849 Son of 1036 March 2010 + + + this error to its administrator and MAY refuse to pass the article + beyond the subnet boundary. If it does pass the article, it MUST + re-encode it with MIME encodings to make it conform to this Draft. + + NOTE: Such re-encoding is a non-trivial task, due to MIME rules + such as the prohibition of nested encodings. It's not just a + matter of pouring the body through a simple filter. + + Reading agents SHOULD note MIME headers and attempt to show the + reader the closest possible approximation to the intended content. + They SHOULD NOT just send the octets of the article to the output + device unaltered, unless there is reason to believe that the output + device will indeed interpret them correctly. Reading agents MUST NOT + pass ASCII control characters or escape sequences, other than as + discussed above, unaltered to the output device; only by chance would + the result be the desired one, and there is serious potential for + harmful side effects, either accidental or malicious. 
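One way a reading agent might avoid passing unwanted octets to the output device is sketched below in Python. This is an assumption-laden illustration: the function name, the choice to pass only tab through, and the "#" placeholder are all choices made for the example. Formfeed, backspace underlining, and MIME-declared character sets need real interpretation, which is not shown.

```python
PASS_THROUGH = {9}  # tab only; formfeed and backspace need handling not shown


def sanitize_line(octets: bytes, placeholder: str = "#") -> str:
    """Replace control characters and non-ASCII octets with a placeholder."""
    out = []
    for value in octets:
        if value in PASS_THROUGH or 32 <= value <= 126:
            # Blank and ASCII printable characters are shown as-is.
            out.append(chr(value))
        else:
            # ASCII control characters (0-31, 127) and octets above 127.
            out.append(placeholder)
    return "".join(out)
```

With this sketch an ESC-introduced sequence loses only its ESC octet, which is enough to stop the output device from interpreting the remainder as a command.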
+ + NOTE: Exactly what to do with unwanted control + characters/sequences depends on the philosophy of the reading + agent, but passing them straight to the output device is almost + always wrong. If the reading agent wants to mark the presence of + such a character/sequence in circumstances where only ASCII + printable characters are available, translating it to "#" might be + a suitable method; "#" is a conspicuous character seldom used in + normal text. + + NOTE: Reading agents should be aware that many old output devices + (or the transmission paths to them) zero out the top bit of octets + sent to them. This can transform non-ASCII characters into ASCII + control characters. + + Followup agents MUST be careful to apply appropriate transformations + of representation to the outbound followup as well as the inbound + precursor. A followup to an article containing non-ASCII material is + very likely to contain non-ASCII material itself. + +4.5. Non-ASCII Characters in Headers + + All octets found in headers MUST be ASCII characters. However, it is + desirable to have a way of encoding non-ASCII characters, especially + in "human-readable" headers such as Subject. MIME provides a way to + do this. Full details may be found in the MIME specifications; + herewith a quick summary to alert software authors to the issues. + + + + + + + +Spencer Historic [Page 26] + +RFC 1849 Son of 1036 March 2010 + + + encoded-word = "=?" charset "?" encoding "?" codes "?=" + charset = 1*tag-char + encoding = 1*tag-char + tag-char = < ASCII printable character except + !()<>@,;:\"[]/?= > + codes = 1*code-char + code-char = <ASCII printable character except ?> + + An encoded word is a sequence of ASCII printable characters that + specifies the character set, encoding method, and bits of + (potentially) non-ASCII characters. Encoded words are allowed only + in certain positions in certain headers. 
Specific headers impose + restrictions on the content of encoded words beyond that specified in + this section. Posting agents MUST ensure that any material + resembling an encoded word (complete with all delimiters), in a + context where encoded words may appear, really is an encoded word. + + NOTE: The syntax is a bit ugly, but it was designed to minimize + chances of confusion with legitimate header contents, and to + satisfy difficult constraints on use within existing headers. + + An encoded word MUST NOT be more than 75 octets long. Each line of a + header containing encoded word(s) MUST be at most 76 octets long, not + counting the EOL. + + NOTE: These limits are meant to bound the lookahead needed to + determine whether text that begins with "=?" is really an encoded + word. + + The details of charsets and encodings are defined by MIME; the + sequence of preferred character sets is the same as MIME's. Encoded + words SHOULD NOT be used for content expressible in ASCII. + + When an encoded word is used, other than in a newsgroup name (see + Section 5.5), it MUST be separated from any adjacent non-space + characters (including other encoded words) by white space. Reading + agents displaying the contents of encoded words (as opposed to their + encoded form) should ignore white space adjacent to encoded words. + + UNRESOLVED ISSUE: Should this section be deleted entirely, or made + much more terse? The material is relevant, but too complex to + discuss fully. + + NOTE: The deletion of intervening white space permits using + multiple encoded words, implicitly concatenated by the deletion, + to encode text that will not fit within a single 75-character + encoded word. 
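The encoded-word grammar and length limits of this section can be sketched as a regular expression plus two checks. This is only an illustration: the names below are hypothetical, and "ASCII printable" is assumed here to mean octets 33 through 126. It does not decode anything and does not check that the charset or encoding is one MIME defines.

```python
import re

# Assumption of this sketch: "ASCII printable" means octets 33 through 126.
_PRINTABLE = [chr(c) for c in range(33, 127)]
_TAG_EXCLUDED = set('!()<>@,;:\\"[]/?=')  # characters barred from tag-char
_TAG_CLASS = "[" + re.escape(
    "".join(c for c in _PRINTABLE if c not in _TAG_EXCLUDED)) + "]"
_CODE_CLASS = "[" + re.escape(
    "".join(c for c in _PRINTABLE if c != "?")) + "]"

# encoded-word = "=?" charset "?" encoding "?" codes "?="
ENCODED_WORD = re.compile(
    r"=\?(" + _TAG_CLASS + r"+)\?(" + _TAG_CLASS + r"+)\?("
    + _CODE_CLASS + r"+)\?="
)

MAX_ENCODED_WORD = 75  # octets per encoded word
MAX_HEADER_LINE = 76   # octets per header line holding one, EOL excluded


def is_encoded_word(token: str) -> bool:
    """True if the token matches the grammar and the 75-octet limit."""
    return (len(token) <= MAX_ENCODED_WORD
            and ENCODED_WORD.fullmatch(token) is not None)


def header_line_ok(line: str) -> bool:
    """Check the limits for one header line, given without its EOL.

    Lines with no encoded word are not constrained by this check.
    Headers are ASCII, so character counts equal octet counts.
    """
    words = [m.group(0) for m in ENCODED_WORD.finditer(line)]
    if not words:
        return True
    if len(line) > MAX_HEADER_LINE:
        return False
    return all(len(word) <= MAX_ENCODED_WORD for word in words)
```

For example, is_encoded_word("=?ISO-8859-1?Q?Andr=E9?=") holds, while a token with a blank inside the charset does not match.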
+ + + + +Spencer Historic [Page 27] + +RFC 1849 Son of 1036 March 2010 + + + Reading-agent implementors are warned that although this Draft + completely specifies where encoded words may appear in the headers it + defines, there are other headers (e.g., the MIME Content-Description + header) that MAY contain them. + +4.6. Size Limits + + Implementations SHOULD avoid fixed constraints on the sizes of lines + within an article and on the size of the entire article. + + Relayers SHOULD treat the body of an article as an uninterpreted + sequence of octets (except as mandated by changes of EOL + representation and processing of control messages), not to be altered + or constrained in any way. + + If it is absolutely necessary for an implementation to impose a limit + on the length of header lines, body lines, or header logical lines, + that limit shall be at least 1000 octets, including EOL + representations. Relayers and transmission paths confronted with + lines beyond their internal limits (if any) MUST NOT simply inject + EOLs at random places; they MAY break headers (as described in + Section 4.2.3) as a last resort, and otherwise they MUST either pass + the long lines through unaltered, or refuse to pass the article at + all (see Section 9.1 for further discussion). + + NOTE: The limit here is essentially the same minimum as that + specified for SMTP mail [RFC821]. Implementors are warned that + Path (see Section 5.6) and References (see Section 6.5) headers, + in particular, often become several hundred characters long, so + 1000 is not an overly generous limit. + + All implementations MUST be able to handle an article totalling at + least 65,000 octets, including headers and EOL representations, + gracefully and efficiently. All implementations SHOULD be able to + handle an article totalling at least 1,000,000 (one million) octets, + including headers and EOL representations, gracefully and + efficiently. 
"Gracefully and efficiently" is intended to preclude + not only failures, but also major loss of performance, serious + problems in error recovery, or resource consumption beyond what is + reasonably necessary. + + NOTE: The intent here is to prohibit lowering the existing de + facto limit any further, while strongly encouraging movement + towards a higher one. Actually, although improvements are + desirable in some cases, much news software copes reasonably well + with very large articles. The same cannot be said of the + communications software and protocols used to transmit news from + one host to another, especially when slow communications links are + + + +Spencer Historic [Page 28] + +RFC 1849 Son of 1036 March 2010 + + + involved. Occasional huge articles that appear now (by accident + or through ignorance) typically leave trails of failing software, + system problems, and irate administrators in their wake. + + NOTE: It is intended that the successor to this Draft will raise + the "MUST" limit to 1,000,000 and the "SHOULD" limit still + further. + + Posters SHOULD limit posted articles to at most 60,000 octets, + including headers and EOL representations, unless the articles are + being posted only within a cooperating subnet that is known to be + capable of handling larger articles gracefully. Posting agents + presented with a large article SHOULD warn the poster and request + confirmation. + + NOTE: The difference between this and the earlier "MUST" limit is + due to margin for header growth, differing EOL representations, + and transmission overheads. + + NOTE: Disagreeable though these limits are, it is a fact that in + current networks, an article larger than 64K (after header growth, + etc.) simply is not transmitted reliably. Note also the comments + above on the trauma caused by single extremely large articles now; + the problems are real and current. 
These problems arguably should + be fixed, but this will not happen network-wide in the immediate + future, hence the restriction of larger articles to cooperating + subnets, for now. + + Posters using non-ASCII characters in their text MUST take into + account the overhead involved in MIME encoding, unless the article's + propagation will be entirely limited to a cooperating subnet that + does not use MIME encodings for non-ASCII characters. For example, + MIME base64 encoding involves growth by a factor of approximately + 4/3, so an article that would likely have to use this encoding should + be at most about 45,000 octets before encoding. + + Posters SHOULD use MIME "message/partial" conventions to facilitate + automatic reassembly of a large document split into smaller pieces + for posting. It is recommended that the content identifier used + should be a message ID, generated by the same means as article + message IDs (see Section 5.3), and that all parts should have a + See-Also header (see Section 6.16) giving the message IDs of at least + the previous parts and preferably all of the parts. + + NOTE: See-Also is more correct for this purpose than References, + although References is in common use today (with less-formal + reassembly arrangements). MIME reassemblers should probably + examine articles suggested by References headers if See-Also + headers are not present to indicate the whereabouts of the other + parts of "message/partial" articles. + + To repeat: implementations SHOULD avoid fixed constraints on the + sizes of lines within an article and on the size of the entire + article. + +4.7.
Example + + Here is a sample article: + + From: jerry@eagle.ATT.COM (Jerry Schwarz) + Path: cbosgd!mhuxj!mhuxt!eagle!jerry + Newsgroups: news.announce + Subject: Usenet Etiquette -- Please Read + Message-ID: <642@eagle.ATT.COM> + Date: Mon, 17 Jan 1994 11:14:55 -0500 (EST) + Followup-To: news.misc + Expires: Wed, 19 Jan 1994 00:00:00 -0500 + Organization: AT&T Bell Laboratories, Murray Hill + + body + body + body + +5. Mandatory Headers + + An article MUST have one, and only one, of each of the following + headers: Date, From, Message-ID, Subject, Newsgroups, Path. + + NOTE: MAIL specifies (if read most carefully) that there must be + exactly one Date header and exactly one From header, but otherwise + does not restrict multiple appearances of headers. (Notably, it + permits multiple Message-ID headers!) This appears singularly + useless, or even harmful, in the context of news, and much current + news software will not tolerate multiple appearances of mandatory + headers. + + Note also that there are situations, discussed in the relevant parts + of Section 6, where References, Sender, or Approved headers are + mandatory. + + In the discussions of the individual headers, the content of each is + specified using the syntax notation. The convention used is that the + content of, for example, the Subject header is defined as + <Subject-content>. + + + + +Spencer Historic [Page 30] + +RFC 1849 Son of 1036 March 2010 + + +5.1. 
Date + + The Date header contains the date and time when the article was + submitted for transmission: + + Date-content = [ weekday "," space ] date space time + weekday = "Mon" / "Tue" / "Wed" / "Thu" + / "Fri" / "Sat" / "Sun" + date = day space month space year + day = 1*2digit + month = "Jan" / "Feb" / "Mar" / "Apr" / "May" / "Jun" + / "Jul" / "Aug" / "Sep" / "Oct" / "Nov" / "Dec" + year = 4digit / 2digit + time = hh ":" mm [ ":" ss ] space timezone + timezone = "UT" / "GMT" + / ( "+" / "-" ) hh mm [ space "(" zone-name ")" ] + hh = 2digit + mm = 2digit + ss = 2digit + zone-name = 1*( <ASCII printable character except ()\> + / space ) + + This is a restricted subset of the MAIL date format. + + If a weekday is given, it MUST be consistent with the date. The + modern Gregorian calendar is used, and dates MUST be consistent with + its usual conventions; for example, if the month is May, the day must + be between 1 and 31 inclusive. The year SHOULD be given as four + digits, and posting agents SHOULD enforce this; however, relayers + MUST accept the two-digit form, and MUST interpret it as having the + implicit prefix "19". + + NOTE: Two-digit year numbers can, should, and must be phased out + by 1999. + + The time is given on the 24-hour clock, e.g., two hours before + midnight is "22:00" or "22:00:00". The hh must be between 00 and 23 + inclusive, the mm between 0 and 59 inclusive, and the ss between 0 + and 60 inclusive. + + NOTE: Leap seconds very occasionally result in minutes that are 61 + seconds long. + + The date and time SHOULD be given in the poster's local time zone, + including a specification of that time zone as a numeric offset + (which SHOULD include the time zone name, e.g., "EST", supplied in + parentheses like a MAIL comment). If not, they MUST be given in + Universal Time (abbreviated "UT"; "GMT" is a historical synonym for + + + +Spencer Historic [Page 31] + +RFC 1849 Son of 1036 March 2010 + + + "UT"). 
The time zone name in parentheses, if present, is a comment; + software MUST ignore it, except that reading agents might wish to + display it to the reader. Time zone names other than "UT" and "GMT" + MUST appear only in the comment. + + NOTE: Attempts to deal with a full set of time zone names have all + foundered on the vast number of such names in use and the + duplications (for example, there are at least FIVE different time + zones called "EST" by somebody). Even the limited set of North + American zone names authorized by MAIL is subject to confusion and + misinterpretation, hence the flat ban on non-UT time zone names, + except as comments. + + NOTE: [RFC1036] specified that use of GMT (aka UT, UTC) was + preferred. However, the local time (in the poster's time zone) is + arguably information of possible interest to the reader, and this + requires some indication of the poster's time zone. Numeric + offsets are an unambiguous way of doing this, and their use was + indeed sanctioned by [RFC1036] (that is, this is a change of + preference only). + + NOTE: There is frequent confusion, including errors in some news + software, regarding the sign of numeric time zones. Zones west of + Greenwich have negative offsets. For example, North American + Eastern Standard Time is zone -0500 and North American Eastern + Daylight Time is zone -0400. + + NOTE: Implementors are warned that the hh in a time zone can go up + to about 14; it is not limited to 12. This is because the + International Date Line does not run exactly along the boundary + between zone -1200 and zone +1200. + + NOTE: The comments in Section 2.6 regarding translation to other + languages are relevant here. The Date-content format, and the + spellings of its components, as found in articles themselves, are + always as defined in this Draft, regardless of the language used + to interact with readers and posters. Reading and posting agents + should translate as appropriate. 
Actually, even English-language + reading and posting agents will probably want to do some degree of + translation on dates, if only to abbreviate the lengthy format and + (perhaps) translate to and from the reader's time zone. + + + + + + + + + + +Spencer Historic [Page 32] + +RFC 1849 Son of 1036 March 2010 + + +5.2. From + + The From header contains the electronic address, and possibly the + full name, of the article's author: + + From-content = address [ space "(" paren-phrase ")" ] + / [ plain-phrase space ] "<" address ">" + paren-phrase = 1*( paren-char / space / encoded-word ) + paren-char = <ASCII printable character except ()<>\> + plain-phrase = plain-word *( space plain-word ) + plain-word = unquoted-word / quoted-word / encoded-word + unquoted-word = 1*unquoted-char + unquoted-char = <ASCII printable character except !()<>@,;:\".[]> + quoted-word = quote 1*( quoted-char / space ) quote + quote = <" (ASCII 34)> + quoted-char = <ASCII printable character except "()<>\> + address = local-part "@" domain + local-part = unquoted-word *( "." unquoted-word ) + domain = unquoted-word *( "." unquoted-word ) + + (Encoded words are described in Section 4.5.) The full name is + distinguished from the electronic address either by enclosing the + former in parentheses (making it resemble a MAIL comment, after the + address) or by enclosing the latter in angle brackets. The second + form is preferred. In the first form, encoded words inside the full + name MUST be composed entirely of <paren-char>s. In the second form, + encoded words inside the full name may not contain characters other + than letters (of either case), digits, and the characters "!", "*", + "+", "-", "/", "=", and "_". 
The local part is case-sensitive + (except that all case counterparts of "postmaster" are deemed + equivalent), the domain is case-insensitive, and all other parts of + the From content are comments that MUST be ignored by news software + (except insofar as reading agents may wish to display them to the + reader). Posters and posting agents MUST restrict themselves to this + subset of the MAIL From syntax; relayers MAY accept a broader subset, + but see the discussion in Section 9.1. + + NOTE: The syntax here is a restricted subset of the MAIL From + syntax, with quoting particularly restricted, for simple parsing. + In particular, the presence of "<" in the From content indicates + that the second form is being used; otherwise, the first form is + being used. The major restrictions here are those already de + facto imposed by existing software. + + NOTE: Overly lenient posting agents sometimes permit the second + form with a full name containing "(" or ")", but it is extremely + rare for a full name to contain "<" or ">", even in mail. + + + + +Spencer Historic [Page 33] + +RFC 1849 Son of 1036 March 2010 + + + Accordingly, reading agents wishing to robustly determine which + form is in use in a particular article should key on the presence + or absence of "<", not the presence or absence of "(". + + The address SHOULD be a valid and complete Internet domain address, + capable of being successfully mailed to by an Internet host (possibly + via an MX (Mail Exchange) record and a forwarder). The pseudo-domain + ".uucp" MAY be used for hosts registered in the UUCP maps (e.g., name + "xyz.uucp" for registered site "xyz"), but such hosts SHOULD + discontinue this usage (either by arranging a proper Internet address + and forwarder, or by using the "% hack" (see below)), as soon as + possible. Bitnet hosts SHOULD use Internet addresses, avoiding the + obsolescent ".bitnet" pseudo-domain. Other forms of address MUST NOT + be used. 
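The two From forms and the case rules above can be sketched as follows. The function names are hypothetical and the parsing is deliberately shallow: it keys on the presence of "<" to pick the form, as the NOTE above recommends, and it returns the full name exactly as written (quoted words and encoded words are not decoded). It does not validate the address against the grammar.

```python
def split_from(content: str):
    """Split From-content into (address, full_name or None).

    The second form is assumed whenever "<" is present; otherwise the
    first form (address, then an optional parenthesized name) is assumed.
    """
    if "<" in content:
        name, _, rest = content.partition("<")
        address, closing, trailing = rest.partition(">")
        if not closing or trailing.strip():
            raise ValueError("malformed angle-bracket form")
        return address, name.strip() or None
    address, _, rest = content.partition(" ")
    rest = rest.strip()
    if not rest:
        return address, None
    if not (rest.startswith("(") and rest.endswith(")")):
        raise ValueError("malformed parenthesized form")
    return address, rest[1:-1]


def same_address(a: str, b: str) -> bool:
    """Compare two addresses: domain case-insensitive, local part
    case-sensitive, except that all case variants of "postmaster" match."""
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    if domain_a.lower() != domain_b.lower():
        return False
    if local_a.lower() == local_b.lower() == "postmaster":
        return True
    return local_a == local_b
```

Everything other than the address is a comment for news software, so only the address should ever feed into comparisons.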
+ + NOTE: "Other forms" specifically include UK-style "backward" + domains ("uk.oxbridge.cs" is in the Czech Republic, not the UK), + pure-UUCP addressing ("knee!shin!foot" instead of + "foot%shin@knee.uucp"), and abbreviated domains ("zebra.zoo" + instead of "zebra.zoo.toronto.edu"). + + If it is necessary to use the local part to specify a routing + relative to the nearest Internet host, this MUST be done using the "% + hack", using "%" as a secondary "@". For example, to specify that + mail to the address should go to Internet host "foo.bar.edu", then to + non-Internet host "ein", then to non-Internet host "deux", for + delivery there to mailbox "fred", a suitable address would be: + + fred%deux%ein@foo.bar.edu + + Analogous forms using "!" in the local part MUST NOT be used, as they + are ambiguous; they should be expressed in the "%" form. + + NOTE: "a!b@c" can be interpreted as either "b%c@a" or "b%a@c", and + there is no consistency in which choice is made. Such addresses + consequently are unreliable. The "%" form does not suffer from + this problem, and although its use is officially discouraged, it + is a de facto standard, to the point that MAIL recognizes it. + + Relayers MUST NOT, repeat MUST NOT, repeat MUST NOT, rewrite From + lines, in any way, however minor or seemingly innocent. Trying to + "fix" a non-conforming address has a very high probability of making + things worse. Either pass it along unchanged or reject the article. + + NOTE: An additional reason for banning the use of "!" addressing + is that it has a much higher probability of being rewritten into + mangled unrecognizability by old relayers. + + + + +Spencer Historic [Page 34] + +RFC 1849 Son of 1036 March 2010 + + + Posters and posting agents SHOULD avoid use of the characters "!" and + "@" in full names, as they may trigger unwanted header rewriting by + old, simple-minded news software. + + NOTE: Also, the characters "." and ",", not infrequently found in + names (e.g., "John W. 
Campbell, Jr."), are NOT, repeat NOT, + allowed in an unquoted word. A From header like the following + MUST NOT be written without the quotation marks: + + From: "John W. Campbell, Jr." <editor@analog.com> + +5.3. Message-ID + + The Message-ID header contains the article's message ID, a unique + identifier distinguishing the article from every other article: + + Message-ID-content = message-id + message-id = "<" local-part "@" domain ">" + + As with From addresses, a message ID's local part is case-sensitive, + and its domain is case-insensitive. The "<" and ">" are parts of the + message ID, not peculiarities of the Message-ID header. + + NOTE: News message IDs are a restricted subset of MAIL message + IDs. In particular, no existing news software copes properly with + MAIL quoting conventions within the local part, so they are + forbidden. This is unfortunate, particularly for X.400 gateways + that often wish to include characters that are not legal in + unquoted message IDs, but it is impossible to fix net-wide. See + the notes on gatewaying in Section 10. + + The domain in the message ID SHOULD be the full Internet domain name + of the posting agent's host. Use of the ".uucp" pseudo-domain (for + hosts registered in the UUCP maps) or the ".bitnet" pseudo-domain + (for Bitnet hosts) is permissible but SHOULD be avoided. + + Posters and posting agents MUST generate the local part of a message + ID using an algorithm that obeys the specified syntax (words + separated by ".", with certain characters not permitted) (see Section + 5.2 for details) and will not repeat itself (ever). The algorithm + SHOULD NOT generate message IDs that differ only in case of letters. + Note the specification in Section 6.5 of a recommended convention for + indicating subject changes. Otherwise, the algorithm is up to the + implementor. + + NOTE: The crucial use of message IDs is to distinguish circulating + articles from each other and from articles circulated recently. 
+ They are also potentially useful as permanent indexing keys, hence + the requirement for permanent uniqueness, but indexers cannot + absolutely rely on this because the earlier RFCs urged it but did + not demand it. All major implementations have always generated + permanently unique message IDs by design, but in some cases this + is sensitive to proper administration, and duplicates may have + occurred by accident. + + NOTE: The most popular method of generating local parts is to use + the date and time, plus some way of distinguishing between + simultaneous postings on the same host (e.g., a process number), + and encode them in a suitably restricted alphabet. An older but + now less-popular alternative is to use a sequence number, + incremented each time the host generates a new message ID; this is + workable but requires careful design to cope properly with + simultaneous posting attempts, and it is not as robust in the + presence of crashes and other malfunctions. + + NOTE: Some buggy news software considers message IDs completely + case-insensitive, hence the advice to avoid relying on case + distinctions. The restrictions placed on the "alphabet" of local + parts and domains in Section 5.2 have the useful side effect of + making it unnecessary to parse message IDs in complex ways to + break them into case-sensitive and case-insensitive portions. + + The local part of a message ID MUST NOT be "postmaster" or any other + string that would compare equal to "postmaster" in a case-insensitive + comparison. Message IDs MUST be no longer than 250 octets, including + the "<" and ">". + + NOTE: "Postmaster" is an irksome exception to case-sensitivity in + local parts, inherited from MAIL, and simply avoiding it is the + best way to deal with it (not that it's likely, but the issue + needs to be dealt with). The length limit is undesirable but is + present in widely used existing software.
The limit is actually + 255, but a small safety margin is wise. + +5.4. Subject + + The Subject header's content (the "subject" of the article) is a + short phrase describing the topic of the article: + + Subject-content = [ "Re: " ] nonblank-text + + Encoded words MAY appear in this header. + + + + + + + +Spencer Historic [Page 36] + +RFC 1849 Son of 1036 March 2010 + + + If the article is a followup, the subject SHOULD begin with "Re: " (a + "back reference"). If the article is not a followup, the subject + MUST NOT begin with a back reference. Back references are case- + insensitive, although "Re: " is the preferred form. A followup agent + assisting a poster in preparing a followup SHOULD prepend a back + reference, UNLESS the subject already begins with one. If the poster + determines that the topic of the followup differs significantly from + what is described in the subject, a new, more descriptive subject + SHOULD be substituted (with no back reference). An article whose + subject begins with a back reference MUST have a References header + referencing the precursor. + + NOTE: A back reference is FOUR characters, the fourth being a + blank. [RFC1036] was confused about this. Observe also that only + ONE back reference should be present. + + NOTE: There is a semi-standard convention, often used, in which a + subject change is flagged by making the new Subject-content of the + form: + + new topic (was: old topic) + + possibly with "old topic" somewhat truncated. Posters wishing to + do something like this are urged to use this exact form, to + simplify automated analysis. + + For historical reasons, the subject MUST NOT begin with "cmsg " (note + that this sequence ends with a blank). + + NOTE: Some old news software takes a subject beginning with + "cmsg " as an indication that the article is a control message + (see Sections 6.6 and 7). This mechanism is obsolete and + undesirable, but accidental triggering of it is still possible. 
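The back-reference rules above are simple enough to sketch directly. The helper names are hypothetical. The check for "cmsg " is done as an exact, case-sensitive match, which is an assumption of this sketch since the text does not say how case is to be treated there.

```python
BACK_REFERENCE = "Re: "  # four characters, the fourth being a blank


def has_back_reference(subject: str) -> bool:
    """Back references are case-insensitive ("RE: " and "re: " count)."""
    return subject[:4].lower() == BACK_REFERENCE.lower()


def followup_subject(precursor_subject: str) -> str:
    """Subject a followup agent might propose: prepend one back
    reference unless the precursor's subject already begins with one."""
    if has_back_reference(precursor_subject):
        return precursor_subject
    return BACK_REFERENCE + precursor_subject


def subject_allowed(subject: str) -> bool:
    """Reject subjects that begin with the obsolete "cmsg " marker."""
    return not subject.startswith("cmsg ")
```

Note that "Re:x" (no blank) is not treated as a back reference, so a followup to it would become "Re: Re:x".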
+ + The subject SHOULD be terse. Posters SHOULD avoid trying to cram + their entire article into the headers; even the simplest query + usually benefits from a sentence or two of elaboration and context, + and the details of header display vary widely among reading agents. + + NOTE: All-in-the-subject articles are sometimes the result of + misunderstandings over the interaction protocol of a posting + agent. Posting agents might wish to give special attention to the + possibility that a poster specifying a very long subject might + have thought he was typing the body of the article. + + + + + + + +Spencer Historic [Page 37] + +RFC 1849 Son of 1036 March 2010 + + +5.5. Newsgroups + + The Newsgroups header's content specifies to which newsgroup(s) the + article is posted: + + Newsgroups-content = newsgroup-name *( ng-delim newsgroup-name ) + newsgroup-name = plain-component *( "." component ) + component = plain-component / encoded-word + plain-component = component-start *13component-rest + component-start = lowercase / digit + lowercase = <letter a-z> + component-rest = component-start / "+" / "-" / "_" + ng-delim = "," + + Encoded words used in newsgroup names MUST NOT contain characters + other than letters, digits, "+", "-", "/", "_", "=", and "?" + (although they may encode them). + + A newsgroup name consists of one or more components, which may be + plain components or (except for the first) encoded words. A plain + component MUST contain at least one letter, MUST begin with a letter + or digit, and MUST NOT be longer than 14 characters. The first + component MUST begin with a letter; subsequent components SHOULD + begin with a letter. Newsgroup names MUST NOT contain uppercase + letters, except where required by encodings in encoded words. The + sequences "all" and "ctl" MUST NOT be used as components. 
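The rules for plain components stated above might be checked along the following lines. The function names are hypothetical, and the sketch handles plain components only: a name containing an encoded-word component is simply rejected here, which is a simplification rather than what this Draft specifies.

```python
import re

# plain-component = component-start *13component-rest  (at most 14 characters)
_PLAIN_COMPONENT = re.compile(r"[a-z0-9][a-z0-9+\-_]{0,13}")
_RESERVED = {"all", "ctl"}


def valid_plain_component(component: str) -> bool:
    """Lowercase letters, digits, "+", "-", "_"; starts with a letter or
    digit; contains at least one letter; is not "all" or "ctl"."""
    return (
        _PLAIN_COMPONENT.fullmatch(component) is not None
        and any("a" <= ch <= "z" for ch in component)
        and component not in _RESERVED
    )


def valid_newsgroup_name(name: str) -> bool:
    """Check a name made only of plain components (assumption: encoded-word
    components are out of scope for this sketch)."""
    components = name.split(".")
    if not all(valid_plain_component(c) for c in components):
        return False
    # The first component MUST begin with a letter.
    return "a" <= components[0][0] <= "z"


def valid_newsgroups_content(content: str) -> bool:
    """Newsgroups-content: names separated by "," with no white space."""
    return all(valid_newsgroup_name(n) for n in content.split(","))
```

The preference that later components begin with a letter is only a SHOULD, so the sketch does not enforce it.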
+ + NOTE: The alphabet and syntax specified encompass all existing + names of widespread newsgroups, while avoiding various forms that + are known to cause problems. Important existing software uses + various non-alphanumeric characters as punctuation adjacent to + newsgroup names. (It would, in fact, be preferable to ban "+" + from newsgroup names, were it not that several widespread + newsgroups related to the C++ programming language already use + it.) + + NOTE: Much existing software converts the newsgroup name into a + directory path and stores the articles themselves using numeric + filenames, so all-digit name components can be troublesome; the + "Great Renaming" early in the history of Usenet included revisions + of several newsgroup names to eliminate such components. + + NOTE: The same storage technique is the reason for the + 14-character limit. The limit is now largely historical, since + most modern systems have much larger limits on the length of a + directory entry's name, but many old systems are still in use. + Systems with shorter limits also exist, but news software on such + systems has had to deal with the problem already, since there are + several widespread newsgroups with 14-character components in + their names. Implementors are warned that it is intended that the + successor to this Draft will increase the 14-character limit, and + they are urged to fix their software to handle longer names + gracefully (if such fixes are necessary, given the intended domain + of application of the particular software). + + NOTE: The requirement that the first character of a name be a + letter accommodates existing software that assumes it can tell the + difference between a newsgroup name and other possible syntactic + entities by inspecting the first character.
Similar + considerations motivate excluding "+", "-", and "_" from coming + first in a component, and the preference for components that do + not begin with digits. The "all" sequence is used as a wildcard + symbol in much existing software, and the "ctl" sequence was + involved in an obsolete historical mechanism for marking control + messages, so they are best avoided. + + NOTE: Possibly newsgroup names should have been case-insensitive, + but all existing software treats them as case-sensitive. + ([RFC977] claims that they are case-insensitive in NNTP, but + existing implementations are believed to ignore this.) The + simplest solution is just to ban use of uppercase letters, since + no widespread newsgroup name uses them anyway; this avoids any + possibility of confusion. + + NOTE: The syntax has the disadvantage of containing no white + space, making it impossible to continue a Newsgroups header across + several lines. Implementors of relayers and reading agents are + warned that it is intended that the successor to this Draft will + change the definition of ng-delim to: + + ng-delim = "," [ space ] + + and are urged to fix their software to handle (i.e., ignore) white + space following the commas. Meanwhile, posters must avoid + inserting such space (despite the natural-language convention that + permits it), and posting agents should strip it out. + + NOTE: Encoded words as components are somewhat problematic but are + clearly desirable for use in non-English-speaking nations. They + are not subject to the 14-character limit, and this (plus the + possibility of "/" within them) may require special handling in + news software. 
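The advice in the ng-delim NOTE of this section (relayers and reading agents should ignore white space after commas, posting agents should strip it) can be sketched in a few lines. The function names are hypothetical, and treating both blanks and tabs as white space is an assumption of this sketch.

```python
import re

# Tolerant form of ng-delim: a comma followed by optional blanks or tabs.
_NG_DELIM = re.compile(r",[ \t]*")


def split_newsgroups(content: str) -> list:
    """Split Newsgroups-content into names, ignoring white space that
    follows a comma (the tolerant reading urged for relayers and
    reading agents)."""
    return _NG_DELIM.split(content.strip())


def normalize_for_posting(content: str) -> str:
    """What a posting agent might emit: names joined by bare commas."""
    return ",".join(split_newsgroups(content))
```

White space before a comma is left alone, since neither the current nor the intended future syntax permits it.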
+ + + + + + + +Spencer Historic [Page 39] + +RFC 1849 Son of 1036 March 2010 + + + Encoded words are allowed in newsgroup names ONLY where non-ASCII + characters are necessary to the name, and they must use the "b" + encoding [RFC2045] and the first suitable character set in the MIME + order of preferred character sets [RFC2047] {ASCII before ISO-8859-* + before anything else}. + + NOTE: Since the newsgroup name is the encoded form, NOT the + underlying non-ASCII form, there is room for terrible confusion + here if the choice of encoding for a particular name is not fully + standardized. + + Posters SHOULD use only the names of existing newsgroups in the + Newsgroups header, because newsgroups are NOT created simply by being + posted to. However, it is legitimate to cross-post to newsgroup(s) + that do not exist on the posting agent's host, provided that at least + one of the newsgroups DOES exist there, and followup agents MUST + accept this (posting agents MAY accept it, but SHOULD at least alert + the poster to the situation and request confirmation). Relayers MUST + NOT rewrite Newsgroups headers in any way, even if some or all of the + newsgroups do not exist on the relayer's host. + + NOTE: Early experience with news software that created newsgroups + when they were mentioned in a Newsgroups header was thoroughly + negative: posters frequently mistype newsgroup names. + + NOTE: While it is legitimate for some of an article's newsgroups + not to exist on the host where it is posted, this IS a rather + unusual situation except in followups (which should go to all + newsgroups the precursor was posted to, even if not all of them + reach the site where the followup is being posted). + + NOTE: Rewriting Newsgroups headers to strip locally unknown + newsgroups is superficially attractive. 
However, early experience + with exactly that policy was thoroughly negative: news propagation + is more redundant and much less orderly than many people imagine, + and in particular it is not unheard of for the (sometimes) fastest + path between two (say) University of Toronto sites to pass outside + the University of Toronto, in which case newsgroup stripping can + cause incomplete propagation. Having an article's set of + newsgroups change as it propagates can also result in followups + not achieving the same propagation as the original. It's been + tried; it's more trouble than it's worth; don't do it. + + NOTE: In particular, newsgroup stripping superficially looks like + a solution to the problem of duplicate regional newsgroup names. + For example, both the University of Toronto and the University of + Texas have "ut.general" newsgroups, and material cross-posted to + that name and a global newsgroup appears in both universities' + + + +Spencer Historic [Page 40] + +RFC 1849 Son of 1036 March 2010 + + + local newsgroups. However, the side effects of stripping are + sufficiently unacceptable to disqualify it for this purpose. + Don't do it. + + Cross-posting an article to several relevant newsgroups is far + superior to posting separate articles with duplicated content to each + newsgroup, because reading agents can detect the situation and show + the article to a reader only once. Posters SHOULD cross-post rather + than duplicate-post. + + NOTE: On the other hand, cross-posting to a large number of + newsgroups usually indicates that the poster has not thought about + his audience; articles are rarely pertinent to more than (say) + half a dozen newsgroups. Posting agents might wish to request + confirmation when the number of newsgroups exceeds (say) five in + the presence of a Followup-To header, or (say) two in the absence + of such a header. 
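A posting agent could apply the confirmation heuristic from the NOTE above roughly as follows. The function name and parameters are hypothetical; the defaults of five and two are the "(say)" figures from the NOTE, not fixed requirements.

```python
def needs_crosspost_confirmation(newsgroups, has_followup_to,
                                 limit_with_followup_to=5,
                                 limit_without_followup_to=2):
    """Return True if the poster should be asked to confirm.

    newsgroups is the list of names from the Newsgroups header;
    duplicates are counted once.
    """
    if has_followup_to:
        limit = limit_with_followup_to
    else:
        limit = limit_without_followup_to
    return len(set(newsgroups)) > limit
```

For instance, three newsgroups with no Followup-To header would trigger a confirmation request, while the same three with a Followup-To header would not.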
+ + NOTE: One problem with cross-postings is what to do with an + article cross-posted to a set of newsgroups including both + moderated and unmoderated ones. Posters tend to expect such an + article to show up immediately in the unmoderated newsgroups, + especially if they do not realize that one or more of the + newsgroups is moderated. However, since it is not possible for a + moderator to retroactively add an already-posted article to a + moderated newsgroup, the only correct action is to mail such an + article to one (and only one) of the moderators for action. It is + probably best for the posting agent to detect this situation and + ask the poster what action is preferred. The acceptable choices + are to alter the newsgroup list or to mail to a moderator of the + poster's choice; the posting agent should NOT offer duplicate- + posting as an easy-to-request option (if only because many + moderators will reject a submission that has already been posted + to unmoderated newsgroups). + + NOTE: An article cross-posted to multiple moderated newsgroups + really should have approval from all of the moderators involved. + In practice, the only straightforward way to do this is to send + the article to one of them and have him consult the others. + + A newsgroup SHOULD NOT appear more than once in the Newsgroups + header. + + Newsgroup names having only one component are reserved for newsgroups + whose propagation is restricted to a single host (or the + administrative equivalent). It is inadvisable to name a newsgroup + + + + + +Spencer Historic [Page 41] + +RFC 1849 Son of 1036 March 2010 + + + "poster" because that word has special meaning in the Followup-To + header (see Section 6.1). The names "control" and "junk" are + frequently used for pseudo-newsgroups internal to relayer + implementations, and hence are also best avoided. + + NOTE: Beware of the duplicate-regional-newsgroup-names problem + mentioned above. 
In particular, there are many, many hosts with a + newsgroup named "general", and some surprising things show up in + such newsgroups when people cross-post. It is probably better to + use multi-component names, which are less likely to be duplicated. + Fred's Widget House should use "fwh.general" rather than just + "general" as its in-house general-topics newsgroup. + + It is conventional to reserve newsgroup names beginning with "to." + for test messages sent on an essentially point-to-point basis (see + also the ihave/sendme protocol described in Section 7.2); newsgroup + names beginning with "to." SHOULD NOT be used for any other purpose. + The second (and possibly later) components of such a name should, + together, comprise the relayer name (see Section 5.6) of a relayer. + The newsgroup exists only at the named relayer and its neighbors. + The neighbors all pass that newsgroup to the named relayer, while the + named relayer does not pass it to anyone. + + The order of newsgroup names in the Newsgroups header is not + significant. + +5.6. Path + + The Path header's content indicates which relayers the article has + already visited, so that unnecessary redundant transmission can be + avoided: + + Path-content = [ path-list path-delimiter ] local-part + path-list = relayer-name *( path-delimiter relayer-name ) + relayer-name = 1*rn-char + rn-char = letter / digit / "." / "-" / "_" + path-delimiter = "!" + + The Path content is a list of relayer names, separated by path + delimiters, followed (after a final delimiter) by the local part of a + mailing address. Each relayer MUST prepend its name, and a + delimiter, to the Path content in all articles it processes. A + relayer MUST NOT pass an article to a neighboring relayer whose name + is already mentioned in an article's path list, unless this is + explicitly requested by the neighbor in some way. The Path content + is case-sensitive. 
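As a minimal sketch of the Path rules above, the following Python splits Path content into its path list and local part, prepends a relayer name, and decides whether a neighbor may be sent the article. The function names are hypothetical, "letter" and "digit" are assumed to mean ASCII letters and digits, and the "explicitly requested by the neighbor" exception is not modeled.

```python
import re

# rn-char = letter / digit / "." / "-" / "_"  (ASCII assumed)
_RELAYER_NAME = re.compile(r"[A-Za-z0-9._-]+")


def split_path(path_content):
    """Split Path content into (path_list, local_part).

    Everything before the final "!" is the path list; the last
    element is the local part and is not a relayer name.
    """
    parts = path_content.split("!")
    path_list, local_part = parts[:-1], parts[-1]
    for name in path_list:
        if not _RELAYER_NAME.fullmatch(name):
            raise ValueError("bad relayer name: %r" % name)
    return path_list, local_part


def prepend_relayer(path_content, my_name):
    """Return the Path content after this relayer has processed it."""
    return my_name + "!" + path_content


def may_pass_to(path_content, neighbor_name):
    """True if neighbor_name is not already in the path list.

    The comparison is deliberately case-sensitive.
    """
    path_list, _ = split_path(path_content)
    return neighbor_name not in path_list
```

With the Path content "a!b!user", the path list is ["a", "b"]; a relayer that happens to be named "user" is still eligible to receive the article, because the trailing local part is not searched.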
+ + + + + +Spencer Historic [Page 42] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: The Path header supplied by a posting agent should normally + contain only the local part. The relayer that the posting agent + passes the article to for posting will prepend its relayer name to + get the path list started. + + NOTE: Observe that the trailing local part is NOT part of the path + list. This Path header: + + Path: fee!fie!foe!fum + + contains three relayer names: "fee", "fie", and "foe". A relayer + named "fum" is still eligible to be sent this article. + + NOTE: This syntax has the disadvantage of containing no white + space, making it impossible to continue a Path header across + several lines. Implementors of relayers and reading agents are + warned that it is intended that the successor to this Draft will + change the definition of path delimiter to: + + path-delimiter = "!" [ space ] + + and are urged to fix their software to handle (i.e., ignore) white + space following the exclamation points. They are urged to hurry; + some ill-behaved systems reportedly already feel free to add such + white space. + + NOTE: [RFC1036] allows considerably more flexibility in choice of + delimiter, in theory, but this flexibility has never been used, + and most news software does not implement it properly. The + grammar reflects the current reality. Note, in particular, that + [RFC1036] treats "_" as a delimiter, but in fact it is known to + appear in relayer names occasionally. + + Because an article will not propagate to a relayer already mentioned + in its path list, the path list MUST NOT contain any names other than + those of relayers the article has passed through AS NEWS. This is + trivially obvious for normal news articles but requires attention + from the moderators of moderated newsgroups and the implementors and + maintainers of gateways. 
+ + NOTE: For the same reason, a relayer and its neighbors need to + agree on the choice of relayer name, and names should not be + changed without notifying neighbors. + + Relayer names need to be unique among all relayers that will ever see + the articles using them. A relayer name is normally either an + "official" name for the host the relayer runs on, or some other + "official" name controlled by the same organization. Except in + + + +Spencer Historic [Page 43] + +RFC 1849 Son of 1036 March 2010 + + + cooperating subnets that agree to some other convention and don't let + articles using it escape beyond the subnet, a relayer name MUST be + either a UUCP name registered in the UUCP maps (without any domain + suffix such as ".UUCP") or a complete Internet domain name. Use of a + (registered) UUCP name is recommended, where practical, to keep the + length of the path list down. + + The use of Internet domain names in the path list presents one + problem: domain names are case-insensitive, but the path list is + case-sensitive. Relayers using domain names as their relayer names + MUST pick a standard form for the name and use that form consistently + to the exclusion of all others. The preferred form for this purpose, + which relayers SHOULD use, is the all-lowercase form. + + NOTE: It is arguably unfortunate that the path list is case- + sensitive, but it is much too late to change this. Most Internet + sites do, in any event, use one standardized form of their name + almost everywhere. + + In the ordinary case, where the poster is the author of the article, + the local part following the path list SHOULD be the local part of + the poster's full Internet domain mailing address. + + NOTE: It should be just the local part, not the full address. The + character "@" does not appear in a Path header. + + The Path content somewhat resembles a mailing address, particularly + in the UUCP world with its manual routing and "!" address syntax. 
+ Historically, this resemblance was important, and the Path content + was often used as a reply address. This practice has always been + somewhat unreliable, since news paths are not always mail paths and + news relayer names are not always recognized by mail handlers, and + its reliability has generally worsened in recent times. The + widespread use of and recognition of Internet domain addresses, even + outside the actual Internet, has largely eliminated the problem. + Readers SHOULD NOT use the Path content as a reply address. On the + other hand, relayer administrators are urged not to break this usage + without good reason; where practical, paths followed by news SHOULD + be traversable by mail, and mail handlers SHOULD recognize relayer + names as host names. + + It will typically be difficult or impractical for gateways and + moderators to supply a Path content that is useful as a reply address + for the author, bearing in mind that the path list they supply will + normally be empty. (To reiterate: the path list MUST NOT contain any + names other than those of relayers the article has passed through AS + NEWS.) They SHOULD supply a local part that will result in replies + + + + +Spencer Historic [Page 44] + +RFC 1849 Son of 1036 March 2010 + + + to a Path-derived address being returned to the sender with a brief + explanation. Software permitting, the local part "not-for-mail" is + recommended. + + NOTE: A moderator or gateway administrator who supplies a local + part that delivers such mail to an administrative mailbox will + quickly discover why it should be bounced automatically! It is + best, however, for the returned message to include an explanation + of what has probably happened, rather than just a mysterious + "undeliverable mail" complaint, since the sender may not be aware + that his/her software is unwisely using the Path content as a + reply address. 
Reply software might wish to question attempts to + reply to a Path-derived address ending in "not-for-mail" (which is + why a specific name is being recommended here). + +6. Optional Headers + + Many MAIL headers, and many of those specified in present and future + MAIL extensions, are potentially applicable to news. Headers + specific to MAIL's point-to-point transmission paradigm, e.g., To and + Cc, SHOULD NOT appear in news articles. (Gateways wishing to + preserve such information for debugging probably SHOULD hide it under + different names; prefixing "X-" to the original headers, resulting in + forms like "X-To", is suggested.) + + The following optional headers are either specific to news or of + particular note in news articles; an article MAY contain some or all + of them. (Note that there are some circumstances in which some of + them are mandatory; these are explained under the individual + headers.) An article MUST NOT contain two or more headers with any + one of these header names. + + NOTE: The ban on duplicate header names does not apply to headers + not specified in this Draft, such as "X-" headers. Software + should not assume that all header names in a given article are + unique. + +6.1. Followup-To + + The Followup-To header contents specify to which newsgroup(s) + followups should be posted: + + Followup-To-content = Newsgroups-content / "poster" + + + + + + + + +Spencer Historic [Page 45] + +RFC 1849 Son of 1036 March 2010 + + + The syntax is the same as that of the Newsgroups content, with the + exception that the magic word "poster" means that followups should be + mailed to the article's reply address rather than posted. In the + absence of Followup-To, the default newsgroup(s) for a followup are + those in the Newsgroups header. + + NOTE: The way to request that followups be mailed to a specific + address other than that in the From line is to supply + "Followup-To: poster" and a Reply-To header. 
Putting a mailing + address in the Followup-To line is incorrect; posting agents + should reject or rewrite such headers. + + NOTE: There is no syntax for "no followups allowed" because + "Followup-To: poster" accomplishes this effect without extra + machinery. + + Although it is generally desirable to limit followups to the smallest + reasonable set of newsgroups, especially when the precursor was + cross-posted widely, posting agents SHOULD NOT supply a Followup-To + header except at the poster's explicit request. + + NOTE: In particular, it is incorrect for the posting agent to + assume that followups to a cross-posted article should be directed + to the first newsgroup only. Trimming the list of newsgroups + should be the poster's decision, not the posting agent's. + However, when an article is to be cross-posted to a considerable + number of newsgroups, a posting agent might wish to SUGGEST to the + poster that followups go to a shorter list. + +6.2. Expires + + The Expires header content specifies a date and time when the article + is deemed to be no longer useful and should be removed ("expired"): + + Expires-content = Date-content + + The content syntax is the same as that of the Date content. In the + absence of Expires, the default is decided by the administrators of + each host the article reaches, who MAY also restrict the extent to + which the Expires header is honored. + + The Expires header has two main applications: removing articles whose + utility ends on a specific date (e.g., event announcements that can + be removed once the day of the event has passed) and preserving + articles expected to be of prolonged usefulness (e.g., information + aimed at new readers of a newsgroup). The latter application is + sometimes abused. 
Since individual hosts have local policies for + expiration of news (depending on available disk space, for instance), + + + +Spencer Historic [Page 46] + +RFC 1849 Son of 1036 March 2010 + + + posters SHOULD NOT provide Expires headers for articles unless there + is a natural expiration date associated with the topic. Posting + agents MUST NOT provide a default Expires header. Leave it out and + allow local policies to be used unless there is a good reason not to. + Expiry dates are properly the decision of individual host + administrators; posters and moderators SHOULD set only expiry dates + with which most administrators would agree. + + NOTE: A poster preparing an Expires header for an article whose + utility ends on a specific day should typically specify the NEXT + day as the expiry date. A meeting on July 7th remains of interest + on the 7th. + +6.3. Reply-To + + The Reply-To header content specifies a reply address different from + the author's address given in the From header: + + Reply-To-content = From-content + + In the absence of Reply-To, the reply address is the address in the + From header. + + Use of a Reply-To header is preferable to including a similar request + in the article body, because reply-preparation software can take + account of Reply-To automatically. + +6.4. Sender + + The Sender header identifies the poster, in the event that this + differs from the author identified in the From header: + + Sender-content = From-content + + In the absence of Sender, the default poster is the author (named in + the From header). + + NOTE: The intent is that the Sender header have a fairly high + probability of identifying the person who really posted the + article. The ability to specify a From header naming someone + other than the poster is useful but can be abused. 
+ + If the poster supplies a From header, the posting agent MUST ensure + that a Sender header is present, unless it can verify that the + mailing address in the From header is a valid mailing address for the + poster. A poster-supplied Sender header MAY be used, if its mailing + address is verifiably a valid mailing address for the poster; + + + + +Spencer Historic [Page 47] + +RFC 1849 Son of 1036 March 2010 + + + otherwise, the posting agent MUST supply a Sender header and delete + (or rename, for example, to X-Unverifiable-Sender) any poster- + supplied Sender header. + + NOTE: It might be useful to preserve a poster-supplied Sender + header so that the poster can supply the full-name part of the + content. The mailing address, however, must be right, hence, the + posting agent must generate the Sender header if it is unable to + verify the mailing address of a poster-supplied one. + + NOTE: NNTP implementors, in particular, are urged to note this + requirement (which would eliminate the need for ad hoc headers + like NNTP-Posting-Host), although there are admittedly some + implementation difficulties. A user name from an [RFC1413] server + and a host name from an inverse mapping of the address, perhaps + with a "full name" comment noting the origin of the information, + would be at least a first approximation: + + Sender: fred@zoo.toronto.edu (RFC-1413@reverse-lookup; + not verified) + + While this does not completely meet the specs, it comes a lot closer + than not having a Sender header at all. Even just supplying a + placeholder for the user name: + + Sender: somebody@zoo.toronto.edu (user name unknown) + + would be better than nothing. + +6.5. References + + The References header content lists message IDs of precursors: + + References-content = message-id *( space message-id ) + + A followup MUST have a References header, and an article that is not + a followup MUST NOT have a References header. 
The References-content + of a followup MUST be the precursor's References-content (if any) + followed by the precursor's message ID. + + NOTE: Use the See-Also header (Section 6.16) for interconnection + of articles that are not in a followup relationship to each other. + + NOTE: In retrospect, RFCs 850 and 1036, and the implementations + whose practice they represented, erred here. The proper MAIL + header to use for references to precursors is In-Reply-To, and the + References header is meant to be used for the purposes here + ascribed to See-Also. This incompatibility is far too solidly + + + +Spencer Historic [Page 48] + +RFC 1849 Son of 1036 March 2010 + + + established to be fixed, unfortunately. The best that can be done + is to provide a clear mapping between the two and urge gateways to + do the transformation. The news usage is (now) a deliberate + violation of the MAIL specifications; articles containing news + References headers are technically not valid MAIL messages, + although it is unlikely that much MAIL software will notice + because the incompatibility is at a subtle semantic level that + does not affect the syntax. + + UNRESOLVED ISSUE: Would it be better to just give up and admit + that news uses References for both purposes? + + UNRESOLVED ISSUE: Should the syntax be generalized to include URLs + as alternatives to message IDs? Perhaps not; too many things know + about References already. And non-articles can't be precursors of + articles, not really. + + Followup agents SHOULD NOT shorten References headers. If it is + absolutely necessary to shorten the header, as a desperate last + resort, a followup agent MAY do this by deleting some of the message + IDs. However, it MUST NOT delete the first message ID, the last + three message IDs (including that of the immediate precursor), or any + message ID mentioned in the body of the followup. 
If it is possible + for the followup agent to determine the Subject content of the + articles identified in the References header, it MUST NOT delete the + message ID of any article where the Subject content changed (other + than by prepending of a back reference). The followup agent MUST NOT + delete any message ID whose local part ends with "_-_" (underscore + (ASCII 95), hyphen (ASCII 45), underscore); followup agents are urged + to use this form to mark subject changes and to avoid using it + otherwise. + + NOTE: As software capable of exploiting References chains has + grown more common, the random shortening permitted by [RFC1036] + has become increasingly troublesome. ANY shortening is + undesirable, and software should do it only in cases of dire + necessity. In such cases, these rules attempt to limit the + damage. + + NOTE: The first message ID is very important as the starting point + of the "thread" of discussion and absolutely should not be + deleted. Keeping the last three message IDs gives thread- + following software a fighting chance to reconstruct a full thread + even if an article or two is missing. Keeping message IDs + mentioned in the body is obviously desirable. + + + + + + +Spencer Historic [Page 49] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: Subject changes are difficult to determine, but they are + significant as possible beginnings of new threads. The "_-_" + convention is provided so that posting agents (which have more + information about subjects) can flag articles containing a subject + change in a way that followup agents can detect without access to + the articles themselves. The sequence is chosen as one that is + fairly unlikely to occur by accident. + + UNRESOLVED ISSUE: Is "_-_" really worth having? + + When a References header is shortened, at least three blanks SHOULD + be left between adjacent message IDs at each point where deletions + were made. 
Software preparing new References headers SHOULD preserve + multiple blanks in older References content. + + NOTE: It's desirable to have some marker of where deletions + occurred, but the restricted syntax of the header makes this + difficult. Extra white space is not a very good marker, since it + may be deleted by software that ill-advisedly rewrites headers, + but at least it doesn't break existing software. + + To repeat: followup agents SHOULD NOT shorten References headers. + + NOTE: Unfortunately, reading agents and other software analyzing + References patterns have to be prepared for the worst anyway. The + worst includes random deletions and the possibility of circular + References chains (when References is misused in place of See-Also + (Section 6.16)). + +6.6. Control + + The Control header content marks the article as a control message and + specifies the desired actions (other than the usual ones of filing + and passing on the article): + + Control-content = verb *( space argument ) + verb = 1*( letter / digit ) + argument = 1*<ASCII printable character> + + The verb indicates what action should be taken, and the argument(s) + (if any) supply details. In some cases, the body of the article may + also contain details. Section 7 describes the standard verbs. See + also the Also-Control header (Section 6.15). + + NOTE: Control messages are often processed and filed rather + differently than normal articles. + + + + + +Spencer Historic [Page 50] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: The restriction of verbs to letters and digits is new but is + consistent with existing practice and potentially simplifies + implementation by avoiding characters significant to command + interpreters. Beware that the arguments are under no such + restriction in general. 
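The Control grammar above might be parsed along the following lines. This Python sketch makes two assumptions not stated in this section: that "space" means a run of blanks or tabs, and that "ASCII printable character" means codes 33 through 126, so that an argument cannot itself contain white space. The function name is hypothetical, and the header content is assumed to have had leading and trailing white space removed already.

```python
import re

# Control-content = verb *( space argument )
# verb            = 1*( letter / digit )
# argument        = 1*<ASCII printable character>   (33-126 assumed)
_CONTROL = re.compile(r"([A-Za-z0-9]+)((?:[ \t]+[\x21-\x7e]+)*)")


def parse_control(content):
    """Return (verb, [argument, ...]) or raise ValueError."""
    match = _CONTROL.fullmatch(content)
    if match is None:
        raise ValueError("malformed Control content: %r" % content)
    # Arguments contain no white space, so a plain split is safe.
    return match.group(1), match.group(2).split()
```

Because arguments may contain characters significant to command interpreters, the returned list should be passed to any external program as separate arguments and never interpolated into a shell command line.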
+ + NOTE: Two other conventions for distinguishing control messages + from normal articles were formerly in use: a three-component + newsgroup name ending in ".ctl" or a subject beginning with + "cmsg " was considered to imply that the article was a control + message. These conventions are obsolete. Do not use them. + + An article with a Control header MUST NOT have an Also-Control or + Supersedes header. + +6.7. Distribution + + The Distribution header content specifies geographic or + organizational limits on an article's propagation: + + Distribution-content = distribution *( dist-delim distribution ) + dist-delim = "," + distribution = plain-component + + A distribution is syntactically identical to a one-component + newsgroup name and must satisfy the same rules and restrictions. In + the absence of Distribution, the default distribution is "world". + + NOTE: This syntax has the disadvantage of containing no white + space, making it impossible to continue a Distribution header + across several lines. Implementors of relayers and reading agents + are warned that it is intended that the successor to this Draft + will change the definition of dist delimiter to: + + dist-delim = "," [ space ] + + and are urged to fix their software to handle (i.e., ignore) white + space following the commas. + + A relayer MUST NOT pass an article to another relayer unless + configuration information specifies transmission to that other + relayer of BOTH (a) at least one of the article's newsgroup(s), and + (b) at least one of the article's distribution(s). In effect, the + only role of distributions is to limit propagation, by preventing + transmission of articles that would have been transmitted had the + decision been based solely on newsgroups. 
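The two-part transmission test above can be illustrated with a short Python sketch. The function names are hypothetical, and the neighbor's configuration is modeled as two plain sets of exact names, which is a simplifying assumption. White space after commas is tolerated, as the NOTE above urges.

```python
import re


def parse_distributions(distribution_content=None):
    """Return the article's distributions as a list.

    With no Distribution header, the default distribution is "world".
    """
    if distribution_content is None:
        return ["world"]
    # Accept "," optionally followed by blanks or tabs.
    return re.split(r",[ \t]*", distribution_content)


def may_transmit(newsgroups, distributions,
                 neighbor_newsgroups, neighbor_distributions):
    """True only if BOTH conditions hold for the neighbor.

    (a) at least one of the article's newsgroups is sent to it, and
    (b) at least one of the article's distributions is sent to it.
    """
    group_ok = any(g in neighbor_newsgroups for g in newsgroups)
    dist_ok = any(d in neighbor_distributions for d in distributions)
    return group_ok and dist_ok
```

Note that a matching distribution can never cause transmission on its own; it can only withhold an article that the newsgroup test alone would have sent.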
+ + + + + +Spencer Historic [Page 51] + +RFC 1849 Son of 1036 March 2010 + + + A posting agent might wish to present a menu of possible + distributions, or suggest a default, but normally SHOULD NOT supply a + default without giving the poster a chance to override it. A + followup agent SHOULD initially supply the same Distribution header + as found in the precursor, although the poster MAY alter this if + appropriate. + + Despite the syntactic similarity and some historical confusion, + distributions are NOT newsgroup names. The whole point of putting a + distribution on an article is that it is DIFFERENT from the + newsgroup(s). In general, a meaningful distribution corresponds to + some sort of region of propagation: a geographical area, an + organization, or a cooperating subnet. + + NOTE: Distributions have historically suffered from the completely + uncontrolled nature of their name space, the lack of feedback to + posters on incomplete propagation resulting from use of random + trash in Distribution headers, and confusion with newsgroups + (arising partly because many regions and organizations DO have + internal newsgroups with names resembling their internal + distributions). This has resulted in much garbage in Distribution + headers, notably the pointless practice of automatically supplying + the first component of the newsgroup name as a distribution (which + is MOST unlikely to restrict propagation!). Many sites have opted + to maximize propagation of such ill-formed articles by essentially + ignoring distributions. This unfortunately interferes with + legitimate uses. The situation is bad enough that distributions + must be considered largely useless except within cooperating + subnets that make an organized effort to restrain propagation of + their internal distributions. + + NOTE: The distributions "world" and "local" have no standard magic + meaning (except that the former is the default distribution if + none is given). 
Some pieces of software do assign such meanings + to them. + +6.8. Keywords + + The Keywords header content is one or more phrases intended to + describe some aspect of the content of the article: + + Keywords-content = plain-phrase *( "," [ space ] plain-phrase ) + + Keywords, separated by commas, each follow the <plain-phrase> syntax + defined in Section 5.2. Encoded words in keywords MUST NOT contain + characters other than letters (of either case), digits, and the + characters "!", "*", "+", "-", "/", "=", and "_". + + + + +Spencer Historic [Page 52] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: Posters and posting agents are asked to take note that + keywords are separated by commas, not by white space. The + following Keywords header contains only one keyword (a rather + unlikely and improbable one): + + Keywords: Thompson Ritchie Multics Linux + + and should probably have been written: + + Keywords: Thompson, Ritchie, Multics, Linux + + This particular error is unfortunately rather widespread. + + NOTE: Reading agents and archivers preparing indexes of articles + should bear in mind that user-chosen keywords are notoriously poor + for indexing purposes unless the keywords are picked from a + predefined set (which they are not in this case). Also, some + followup agents unwisely propagate the Keywords header from the + precursor into the followup by default. At least one news-based + experiment has found the contents of Keywords headers to be + completely valueless for indexing. + +6.9. Summary + + The Summary header content is a short phrase summarizing the + article's content: + + Summary-content = nonblank-text + + As with the subject, no restriction is placed on the content since it + is intended solely for display to humans. + + NOTE: Reading agents should be aware that the Summary header is + often used as a sort of secondary Subject header, and (if present) + its contents should perhaps be displayed when the subject is + displayed. 
+ + The summary SHOULD be terse. Posters SHOULD avoid trying to cram + their entire article into the headers; even the simplest query + usually benefits from a sentence or two of elaboration and context, + and not all reading agents display all headers. + +6.10. Approved + + The Approved header content indicates the mailing addresses (and + possibly the full names) of the persons or entities approving the + article for posting: + + + + +Spencer Historic [Page 53] + +RFC 1849 Son of 1036 March 2010 + + + Approved-content = From-content *( "," [ space ] From-content ) + + An Approved header is required in all postings to moderated + newsgroups; the presence or absence of this header allows a posting + agent to distinguish between articles posted by the moderator (which + are normal articles to be posted normally) and attempted + contributions by others (which should be mailed to the moderator for + approval). An Approved header is also required in certain control + messages, to reduce the probability of accidental posting of same; + see the relevant parts of Section 7. + + NOTE: There is, at present, no way to authenticate Approved + headers to ensure that the claimed approval really was bestowed. + Nor is there an established mechanism for even maintaining a list + of legitimate approvers (such a list would quickly become out of + date if it had to be maintained by hand). Such mechanisms, + presumably relying on cryptographic authentication, would be a + worthwhile extension to this Draft, and experimental work in this + area is encouraged. (The problem is harder than it sounds because + news is used on many systems that do not have real-time access to + key servers.) + + NOTE: Relayer implementors, please note well: it is the POSTING + AGENT that is authorized to distinguish between moderator postings + and attempted contributions, and to mail the latter to the + moderator. 
As discussed in Section 9.1, relayers MUST NOT, repeat + MUST NOT, send such mail; on receipt of an unApproved article in a + moderated newsgroup, they should discard the article, NOT + transform it into a mail message (except perhaps to a local + administrator). + + NOTE: [RFC1036] restricted Approved to a single From-content. + However, multiple moderation is no longer rare, and multi- + moderator Approved headers are already in use. + +6.11. Lines + + The Lines header content indicates the number of lines in the body of + the article: + + Lines-content = 1*digit + + The line count includes all body lines, including the signature (if + any) and including empty lines (if any) at the beginning or end of + the body. (The single empty separator line between the headers and + the body is not part of the body.) The "body" here is the body as + found in the posted article, AFTER all transformations such as MIME + encodings. + + + +Spencer Historic [Page 54] + +RFC 1849 Son of 1036 March 2010 + + + Reading agents SHOULD NOT rely on the presence of this header, since + it is optional (and some posting agents do not supply it). They MUST + NOT rely on it being precise, since it frequently is not. + + NOTE: The average line length in article bodies is surprisingly + consistent at about 40 characters, and since the line count + typically is used only for approximate judgements ("is this too + long to read quickly?"), dividing the byte count of the body by 40 + gives an estimate of the body line count that is adequate for + normal use. This estimate is NOT adequate if the body has been + MIME encoded, but neither is the Lines header: at least one major + relayer will add a Lines header to an article that lacks one, + without considering the possibility of MIME encodings when + computing the line count. 
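For illustration, the exact count and the divide-by-40 estimate described above might look like this in Python. The function names are hypothetical, and the body is assumed to be a byte string whose lines end in a newline (ASCII 10) character.

```python
def count_body_lines(body):
    """Exact number of lines in the body, empty lines included.

    body is the article body as bytes, after any encoding has been
    applied, and excludes the separator line after the headers.
    """
    if not body:
        return 0
    count = body.count(b"\n")
    if not body.endswith(b"\n"):
        count += 1  # final line lacks a terminator
    return count


def estimate_body_lines(body_size_in_bytes, average_line_length=40):
    """Rough line count from the byte count alone.

    Adequate only for judgements such as "is this too long to read
    quickly?", and not for MIME-encoded bodies.
    """
    return body_size_in_bytes // average_line_length
```

A reading agent that finds no Lines header, or has reason to distrust it, could fall back on the estimate when the body size is cheaply available but the body itself is not.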
+ + NOTE: It would be better to have a Content-Size header as part of + MIME, so that body parts could have their own sizes, and so that + the units used could be appropriate to the data type (line count + is not a useful measure of the size of an encoded image, for + example). Doing this is preferable to trying to fix Lines. + + UNRESOLVED ISSUE: Update on Content-Size? + + Relayers SHOULD discard this header if they find it necessary to + re-encode the article in such a way that the original Lines header + would be rendered incorrect. + +6.12. Xref + + The Xref header content indicates where an article was filed by the + last relayer to process it: + + Xref-content = relayer 1*( space location ) + relayer = relayer-name + location = newsgroup-name ":" article-locator + article-locator = 1*<ASCII printable character> + + The relayer's name is included so that software can determine which + relayer generated the header (and specifically, whether it really was + the one that filed the copy being examined). The locations specify + what newsgroups the article was filed under (which may differ from + those in the Newsgroups header) and where it was filed under them. + The exact form of an article locator is implementation-specific. + + NOTE: Reading agents can exploit this information to avoid + presenting the same article to a reader several times. The + information is sometimes available in system databases, but having + it in the article is convenient. Relayers traditionally generate + + + +Spencer Historic [Page 55] + +RFC 1849 Son of 1036 March 2010 + + + an Xref header only if the article is cross-posted, but this is + not mandatory, and there is at least one new application + ("mirroring": keeping news databases on two hosts identical) where + the header is useful in all articles. + + NOTE: The traditional form of an article locator is a decimal + number, with articles in each newsgroup numbered consecutively + starting from 1. 
NNTP [RFC977] demands that such a model be + provided, and there may be other software that expects it, but it + seems desirable to permit flexibility for unorthodox + implementations. + + A relayer inserting an Xref header into an article MUST delete any + previous Xref header. A relayer that is not inserting its own Xref + header SHOULD delete any previous Xref header. A relayer MAY delete + the Xref header when passing an article on to another relayer. + + NOTE: [RFC1036] specified that the Xref header was not transmitted + when an article was passed to another relayer, but the major news + implementations have never obeyed this rule, and applications like + mirroring depend on this disobedience. + + A relayer MUST use the same name in Xref headers as it uses in Path + headers. Reading agents MUST ignore an Xref header containing a + relayer name that differs from the one that begins the path list. + +6.13. Organization + + The Organization header content is a short phrase identifying the + poster's organization: + + Organization-content = nonblank-text + + This header is typically supplied by the posting agent. The + Organization content SHOULD mention geographical location (e.g., city + and country) when it is not obvious from the organization's name. + + NOTE: The motive here is that the organization is often difficult + to guess from the mailing address, is not always supplied in a + signature, and can help identify the poster to the reader. + + NOTE: There is no "s" in "Organization". + + The Organization content is provided for identification only and does + not imply that the poster speaks for the organization or that the + article represents organization policy. Posting agents SHOULD permit + the poster to override a local default Organization header. + + + + +Spencer Historic [Page 56] + +RFC 1849 Son of 1036 March 2010 + + +6.14. 
Supersedes + + The Supersedes header content specifies articles to be cancelled on + arrival of this one: + + Supersedes-content = message-id *( space message-id ) + + Supersedes is equivalent to Also-Control (Section 6.15) with an + implicit verb of "cancel" (Section 7.1). + + NOTE: Supersedes is normally used where the article is an updated + version of the one(s) being cancelled. + + NOTE: Although the ability to use multiple message IDs in + Supersedes is highly desirable (see Section 7.1), posters are + warned that existing implementations often do not correctly handle + more than one. + + NOTE: There is no "c" in "Supersedes". + + An article with a Supersedes header MUST NOT have an Also-Control or + Control header. + +6.15. Also-Control + + The Also-Control header content marks the article as being a control + message IN ADDITION to being a normal news article and specifies the + desired actions: + + Also-Control-content = Control-content + + An article with an Also-Control header is filed and passed on + normally, but the content of the Also-Control header is processed as + if it were found in a Control header. + + NOTE: It is sometimes desirable to piggyback control actions on a + normal article, so that the article will be filed normally but + will also be acted on as a control message. This header is + essentially a generalization of Supersedes. + + NOTE: Be warned that some old relayers do not implement + Also-Control. + + An article with an Also-Control header MUST NOT have a Control or + Supersedes header. + + + + + + +Spencer Historic [Page 57] + +RFC 1849 Son of 1036 March 2010 + + +6.16. See-Also + + The See-Also header content lists message IDs of articles that are + related to this one but are not its precursors: + + See-Also-content = message-id *( space message-id ) + + See-Also resembles References, but without the restrictions imposed + on References by the followup rules. 
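Looking back at Sections 6.14 and 6.15, the relationship between Control, Also-Control, and Supersedes can be sketched in Python as follows. This is only an illustration. The function name, the dictionary representation of headers (with exact-case names), and the choice to raise an error on conflicting headers are all assumptions; this Draft does not say what software should do with an article that violates the MUST NOT rules.

```python
from typing import Dict, Optional


def effective_control(headers: Dict[str, str]) -> Optional[str]:
    """Return the control text to act on, or None for an ordinary article.

    `headers` maps header names to header contents. This is a
    simplification: a real article needs a proper header parser.
    """
    present = [name for name in ("Control", "Also-Control", "Supersedes")
               if name in headers]
    if len(present) > 1:
        # Sections 6.14 and 6.15: at most one of these may appear.
        raise ValueError("conflicting headers: " + ", ".join(present))
    if not present:
        return None
    name = present[0]
    if name == "Supersedes":
        # Supersedes is Also-Control with an implicit verb of "cancel".
        return "cancel " + headers[name]
    return headers[name]
```

The caller still has to remember which header supplied the text: an article carrying Control is a pure control message, while one carrying Also-Control or Supersedes is also filed and passed on as a normal article.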
+ + NOTE: See-Also provides a way to group related articles, such as + the parts of a single document that had to be split across + multiple articles due to its size, or to cross-reference between + parallel threads. + + NOTE: See the discussion (in Section 6.5) on MAIL compatibility + issues of References and See-Also. + + NOTE: In the specific case where it is desired to essentially make + another article PART of the current one, e.g., for annotation of + the other article, MIME's "message/external-body" convention can + be used to do so without actual inclusion. "news-message-ID" was + registered as a standard external-body access method, with a + mandatory NAME parameter giving the message ID and an optional + SITE parameter suggesting an NNTP site that might have the article + available (if it is not available locally), by IANA 22 June 1993. + + UNRESOLVED ISSUE: Could the syntax be generalized to include URLs + as alternatives to message IDs? Here it makes much more sense + than in References. + +6.17. Article-Names + + The Article-Names header content indicates any special significance + the article may have in particular newsgroups: + + Article-Names-content = 1*( name-clause space ) + name-clause = newsgroup-name ":" article-name + article-name = letter 1*( letter / digit / "-" ) + + Each name clause specifies a newsgroup (which SHOULD be among those + in the Newsgroups header) and an article name local to that + newsgroup. Article names MAY be used by relayers to file the article + in special ways, or they MAY just be noted for possible special + attention by reading agents. Article names are case-sensitive. + + + + + + +Spencer Historic [Page 58] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: This header provides a way to mark special postings, such as + introductions, frequently-asked-question lists, etc., so that + reading agents have a way of finding them automatically. 
The + newsgroup name is specified for each article name because the + names may be newsgroup-specific; for example, many frequently- + asked-question lists are posted to "news.answers" in addition to + their "home" newsgroup, and they would not be known by the same + name(s) in both newsgroups. + + The Article-Names header SHOULD be ignored unless the article also + contains an Approved header. + + NOTE: This stipulation is made in anticipation of the possibility + that Approved headers will be involved in cryptographic + authentication. + + The presence of an Article-Names header does not necessarily imply + that the article will be retained unusually long before expiration, + or that previous article(s) with similar Article-Names headers will + be cancelled by its arrival. Posters preparing special postings + SHOULD include appropriate other headers, such as Expires and + Supersedes, to request such actions. + + Different networks MAY establish different sets of article names for + the special postings they deem significant; it is preferable for + usage to be standardized within networks, although it might be + desirable for individual newsgroups to have different naming + conventions in some situations. Article names MUST be 14 characters + or less. The following names are suggested but are not mandatory: + + intro Introduction to the newsgroup for newcomers. + + charter Charter, rules, organization, moderation policies, etc. + + background Biographies of special participants, history of the + newsgroup, notes on related newsgroups, etc. + + subgroups Descriptions of sub-newsgroups under this newsgroup, + e.g., "sci.space.news" under "sci.space". + + facts Information relating to the purpose of the newsgroup, + e.g., an acronym glossary in "sci.space". + + references Where to get more information: books, journals, FTP + repositories, etc. + + faq Answers to frequently asked questions. 
+ + + + +Spencer Historic [Page 59] + +RFC 1849 Son of 1036 March 2010 + + + menu If present, a list of all of the other article names + local to this newsgroup, with brief descriptions of their + contents. + + Such articles may be divided into subsections using the MIME + "multipart/mixed" conventions. If size considerations make it + necessary to split such articles, names ending in a hyphen and a part + number are suggested; for example, a three-part frequently-asked- + questions list could have article names "faq-1", "faq-2", and + "faq-3". + + NOTE: It is somewhat premature to attempt to standardize article + names, since this is essentially a new feature with no experience + behind it. However, if reading agents are to attach special + significance to these names, some attempt at standard conventions + is imperative. This is a first attempt at providing some. + +6.18. Article-Updates + + The Article-Updates header content indicates what previous articles + this one is deemed (by the poster) to update (i.e., replace): + + Article-Updates-content = message-id *( space message-id ) + + Each message ID identifies a previous article that this one is deemed + to update. This MUST NOT cause the previous article(s) to be + cancelled or otherwise altered, unless this is implied by other + headers (e.g., Supersedes); Article-Updates is merely an advisory + that MAY be noted for special attention by reading agents. + + NOTE: This header provides a way to mark articles that are only + minor updates of previous ones, containing no significant new + information and not worth reading if the previous ones have been + read. + + NOTE: If suitable conventions using MIME multipart bodies and the + "message/external-body" body-part type can be developed, a + replacing article might contain only differences between the old + text and the new text, rather than a complete new copy. 
This is + the motivation for not making Article-Updates also function as + Supersedes does: the replacing article might depend on the + continued presence of the replaced article. + +7. Control Messages + + The following sections document the currently defined control + messages. "Message" is used herein as a synonym for "article" unless + context indicates otherwise. + + + +Spencer Historic [Page 60] + +RFC 1849 Son of 1036 March 2010 + + + Posting agents are warned that since certain control messages require + article bodies in quite specific formats, signatures SHOULD NOT be + appended to such articles, and it may be wise to take greater care + than usual to avoid unintended (although perhaps well-meaning) + alterations to text supplied by the poster. Relayers MUST assume + that control messages mean what they say; they MAY be obeyed as is or + rejected, but MUST NOT be reinterpreted. + + The execution of the actions requested by control messages is subject + to local administrative restrictions, which MAY deny requests or + refer them to an administrator for approval. The descriptions below + are generally phrased in terms suggesting mandatory actions, but any + or all of these MAY be subject to local administrative approval + (either as a class or case-by-case). Analogously, where the + description below specifies that a message or portion thereof is to + be ignored, this action MAY include reporting it to an administrator. + + NOTE: The exact choice of local action might depend on what action + the control message requests, who it claims to come from, etc. + + Relayers MUST propagate even control messages they do not understand. + + In the following sections, each type of control message is defined + syntactically by defining its arguments and its body. For example, + "cancel" is defined by defining cancel-arguments and cancel-body. + +7.1. 
cancel + + The cancel message requests that one or more previous articles be + "cancelled": + + cancel-arguments = message-id *( space message-id ) + cancel-body = body + + The argument(s) identify the articles to be cancelled, by message ID. + The body is a comment, which software MUST ignore, and SHOULD contain + an indication of why the cancellation was requested. The cancel + message SHOULD be posted to the same newsgroup(s), with the same + distribution(s), as the article(s) it is attempting to cancel. + + NOTE: Using the same newsgroups and distributions maximizes the + chances of the cancel message propagating everywhere the target + articles went. + + NOTE: [RFC1036] permitted only a single message-id in a cancel + message. Support for cancelling multiple articles is highly + desirable, especially for use with Supersedes (see Section 6.14). + If several revisions of an article appear in fast succession, each + + + +Spencer Historic [Page 61] + +RFC 1849 Son of 1036 March 2010 + + + using Supersedes to cancel the previous one, it is possible for a + middle revision to be destroyed by cancellation before it is + propagated onward to cancel its predecessor. Allowing each + article to cancel several predecessors greatly alleviates this + problem. (Posting agents preparing a cancel of an article that + itself cancels other articles might wish to add those articles to + the cancel-arguments.) However, posters should be aware that much + old software does not implement multiple cancellation properly and + should avoid using it when reliable cancellation is vitally + important. + + When an article (the "target article") is to be cancelled, there are + four cases of interest: the article hasn't arrived yet, it has + arrived and been filed and is available for reading, it has expired + and been archived on some less-accessible storage medium, or it has + expired and been deleted. 
The next few paragraphs discuss each case + in turn (in reverse order, which is convenient for the explanation). + + EXPIRED AND DELETED. Take no action. + + EXPIRED AND ARCHIVED. If the article is readily accessible and can + be deleted or made unreadable easily, treat as under AVAILABLE below. + Otherwise, treat as under EXPIRED AND DELETED. + + NOTE: While it is desirable for archived articles to be + cancellable, this can easily involve rewriting an entire archive + volume just to get rid of one article, perhaps with manual actions + required to arrange it. It is difficult to envision a situation + so dire as to require such measures from hundreds or thousands of + administrators, or for that matter one in which widespread + compliance with such a request is likely. + + AVAILABLE. Compare the mailing addresses from the From lines of the + cancel message and the target article, bearing in mind that local + parts (except for "postmaster") are case-sensitive and domains are + case-insensitive. If they do not match, either refer the issue to an + administrator for a case-by-case decision, or treat as if they + matched. + + NOTE: It is generally trivial to forge articles, so nothing short + of cryptographic authentication is really adequate to ensure that + a cancel came from the original article's author. Moreover, it is + highly desirable to permit authorities other than the author to + cancel articles, to allow for cases in which the author is + unavailable, uncooperative, or malicious, and in which damage + and/or legal problems may be minimized by prompt cancellation. + + + + + +Spencer Historic [Page 62] + +RFC 1849 Son of 1036 March 2010 + + + Reliable authentication that would permit such administrative + cancels would be a worthwhile extension to this Draft, and + experimental work in this area is encouraged. + + NOTE: Meanwhile, a simple check of addresses is useful accident + prevention and catches at least the most simple-minded forgers. 
+ Since the intent is accident prevention rather than ironclad + security, use of the From address is appropriate, all the more so + because in the presence of gateways (especially redundant multiple + gateways), the author may not have full control over Sender + headers. + + NOTE: The "refer... or treat as if they matched" rule is intended + to specifically forbid quietly ignoring cancels with mismatched + addresses. + + If the addresses match, then if technically possible, the relayer + MUST delete the target article completely and immediately. Failing + that, it MUST make the target article unreadable (preferably to + everyone, minimally to everyone but the administrator) and either + arrange for it to be deleted as soon as possible or notify an + administrator at once. + + NOTE: To allow for events such as criminal actions, malicious + forgeries, and copyright infringements, where damage and/or legal + problems may be minimized by prompt cancellation, complete removal + is strongly preferred over merely making the target article + unreadable. The potential for malice is outweighed by the + importance of really getting rid of the target article in some + legitimate cases. (In cases of inadvertent copyright violation in + particular, the ability to quickly remedy the violation is of + considerable legal importance.) Failing that, making it + unreadable is better than nothing. + + NOTE: Merely annotating the article so that readers see an + indication that the author wanted it cancelled is not acceptable. + Making the article unreadable is the minimum action. + + NOTE: There have been experiments with making cancelled articles + unreadable, so that local news administrators could reverse + cancellations. In practice, administrators almost never find + cause to do so. Removal appears to be clearly preferable where + technically feasible. + + + + + + + + +Spencer Historic [Page 63] + +RFC 1849 Son of 1036 March 2010 + + + NOT ARRIVED YET. 
If practical, retain the cancel message until the + target article does arrive, or until there is no further possibility + of it arriving and being accepted (see Section 9.2), and then treat + as under AVAILABLE. Failing that, arrange for the target article to + be rejected and discarded if it does arrive. + + NOTE: It may well be impractical to retain the control message, + given uncertainty about whether the target article will ever + arrive. Existing practice in such cases is to assume that + addresses would match and arrange the equivalent of deletion. + This is often done by making a spurious entry in a database of + already-seen message IDs (see Section 9.3), so that if the article + does arrive, it will be rejected as a duplicate. + + The cancel message MUST be propagated onward in the usual fashion, + regardless of which of the four cases applied, so that the target + article will be cancelled everywhere even if cancellation and target + article follow different routes. + + NOTE: [RFC1036] appeared to require stopping cancel propagation in + the NOT ARRIVED YET case, although the wording was somewhat + unclear. This appears to have been an unwise decision; there are + known cases of important cancellations (in situations of + inadvertent copyright violation, for example) achieving rather + poorer propagation than the target article. News propagation is + often a much less orderly process than the authors of [RFC1036] + apparently envisioned. Modern implementations generally propagate + the cancellation regardless. + + Posting agents meant for use by ordinary posters SHOULD reject an + attempt to post a cancel message if the target article is available + and the mailing address in its From header does not match the one in + the cancel message's From header. + + NOTE: This, again, is primarily accident prevention. + +7.2. ihave, sendme + + The ihave and sendme control messages implement a crude batched + predecessor of the NNTP [RFC977] protocol. 
They are largely obsolete + in the Internet but still see use in the UUCP environment, especially + for backup feeds that normally are active only when a primary feed + path has failed. + + NOTE: The ihave and sendme messages defined here have ABSOLUTELY + NOTHING TO DO WITH NNTP, despite similarities of terminology. + + + + + +Spencer Historic [Page 64] + +RFC 1849 Son of 1036 March 2010 + + + The two messages share the same syntax: + + ihave-arguments = *( message-id space ) relayer-name + sendme-arguments = ihave-arguments + ihave-body = *( message-id eol ) + sendme-body = ihave-body + + Message IDs MUST appear in either the arguments or the body, but not + both. Relayers SHOULD generate the form putting message IDs in the + body, but the other form MUST be supported for backward + compatibility. + + NOTE: [RFC1036] made the relayer name optional, but difficulties + could easily ensue in determining the origin of the message, and + this option is believed to be unused nowadays. Putting the + message IDs in the body is strongly preferred over putting them in + the arguments because it lends itself much better to large numbers + of message IDs and avoids the empty-body problem mentioned in + Section 4.3.1. + + The ihave message states that the named relayer has filed articles + with the specified message IDs, which may be of interest to the + relayer(s) receiving the ihave message. The sendme message requests + that the relayer receiving it send the articles having the specified + message IDs to the named relayer. + + These control messages are normally sent essentially as point-to- + point messages, by using "to." newsgroups (see Section 5.5) that are + sent only to the relayer for which the messages are intended. The + two relayers MUST be neighbors, exchanging news directly with each + other. Each relayer advertises its new arrivals to the other using + ihave messages, and each uses sendme messages to request the articles + it lacks. 
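The shared ihave/sendme syntax above can be sketched in Python as follows. The function name and the error handling are assumptions made for illustration. The sketch accepts message IDs in either the arguments or the body (for backward compatibility) and rejects a message that has them in both. It does not validate the message IDs themselves beyond noticing that a relayer name cannot begin with "<".

```python
from typing import List, Tuple


def parse_ihave(arguments: str, body: str) -> Tuple[str, List[str]]:
    """Split an ihave or sendme message into (relayer_name, message_ids)."""
    fields = arguments.split()
    if not fields:
        raise ValueError("relayer name is required")
    # The relayer name is always the last argument.
    *ids_in_arguments, relayer_name = fields
    if relayer_name.startswith("<"):
        raise ValueError("last argument looks like a message ID, not a relayer name")
    # In the body form, each non-empty line carries one message ID.
    ids_in_body = [line.strip() for line in body.splitlines() if line.strip()]
    if ids_in_arguments and ids_in_body:
        raise ValueError("message IDs found in both arguments and body")
    # A message with no IDs at all is passed through for the caller to judge.
    return relayer_name, ids_in_arguments or ids_in_body
```

For example, `parse_ihave("relay.example", "<1@a.example>\n<2@a.example>\n")` yields the relayer name and the two message IDs from the body; "relay.example" and the message IDs here are made-up values.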
+ + NOTE: Arguably these point-to-point control messages should flow + by some other protocol, e.g., mail, but administrative and + interfacing issues are simplified if the news system doesn't need + to talk to the mail system. + + To reduce overhead, ihave and sendme messages SHOULD be sent + relatively infrequently and SHOULD contain substantial numbers of + message IDs. If ihave and sendme are being used to implement a + backup feed, it may be desirable to insert a delay between reception + of an ihave and generation of a sendme, so that a slightly slow + primary feed will not cause large numbers of articles to be requested + unnecessarily via sendme. + + + + + +Spencer Historic [Page 65] + +RFC 1849 Son of 1036 March 2010 + + +7.3. newgroup + + The newgroup control message requests that a new newsgroup be + created: + + newgroup-arguments = newsgroup-name [ space moderation ] + moderation = "moderated" / "unmoderated" + newgroup-body = body + / [ body ] descriptor [ body ] + descriptor = descriptor-tag eol description-line eol + descriptor-tag = "For your newsgroups file:" + description-line = newsgroup-name space description + description = nonblank-text [ " (Moderated)" ] + + The first argument names the newsgroup to be created, and the second + one (if present) indicates whether it is moderated. If there is no + second argument, the default is "unmoderated". + + NOTE: Implementors are warned that there is occasional use of + other forms in the second argument. It is suggested that such + violations of this Draft, which are also violations of [RFC1036], + cause the newgroup message to be ignored. 
[RFC1036] was slightly + vague about how second arguments other than "moderated" were to be + treated (specifically, whether they were illegal or just ignored), + but it is thought that all existing major implementations will + handle "unmoderated" correctly, and it appears desirable to + tighten up the specs to make it possible for other forms to be + used in future. + + The body is a comment, which software MUST ignore, except that if it + contains a descriptor, the description line is intended to be + suitable for addition to a list of newsgroup descriptions. The + description cannot be continued onto later lines but is not + constrained to any particular length. Moderated newsgroups have + descriptions that end with the string " (Moderated)" (note that this + string begins with a blank). + + NOTE: It is unfortunate that the description line is part of the + body, rather than being supplied in a header, but this is + established practice. Newsgroup creators are cautioned that the + descriptor tag must be reproduced exactly as given above, must be + alone on a line, and that it is case-sensitive. (To reduce errors + in this regard, posting agents might wish to question or reject + newgroup messages that do not contain a descriptor.) Given the + desire for short lines, description writers should avoid content- + free phrases like "discussion of" and "news about", and stick to + defining what the newsgroup is about. + + + + +Spencer Historic [Page 66] + +RFC 1849 Son of 1036 March 2010 + + + The remainder of the body SHOULD contain an explanation of the + purpose of the newsgroup and the decision to create it. + + NOTE: Criteria for newsgroup creation vary widely and are outside + the scope of this Draft, but if formal procedures of one kind or + another were followed in the decision, the body should mention + this. Administrators often look for such information when + deciding whether to comply with creation/deletion requests. 
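The newgroup argument and descriptor rules above can be sketched in Python as follows. The function name and the return convention are assumptions made for illustration. Returning None for an unrecognized second argument follows the suggestion in the NOTE that such messages be ignored. The descriptor tag is matched exactly, alone on its line and case-sensitively, as the text requires. The Approved header check is not shown.

```python
from typing import Optional, Tuple

DESCRIPTOR_TAG = "For your newsgroups file:"


def parse_newgroup(arguments: str,
                   body: str) -> Optional[Tuple[str, bool, Optional[str]]]:
    """Return (newsgroup, moderated, description), or None to ignore the message."""
    fields = arguments.split()
    if len(fields) == 1:
        moderated = False  # no second argument: default is "unmoderated"
    elif len(fields) == 2 and fields[1] in ("moderated", "unmoderated"):
        moderated = fields[1] == "moderated"
    else:
        return None  # missing name, or some other form of second argument
    newsgroup = fields[0]

    description = None
    lines = body.splitlines()
    for index in range(len(lines) - 1):
        if lines[index] == DESCRIPTOR_TAG:
            # description-line = newsgroup-name space description
            parts = lines[index + 1].split(None, 1)
            if len(parts) == 2:
                description = parts[1]
            break
    return newsgroup, moderated, description
```

Everything in the body other than the description line is treated as a comment and ignored, so a message without a descriptor simply yields a description of None.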
+ + A newgroup message that lacks an Approved header MUST be ignored. + + NOTE: It would also be desirable to ignore a newgroup message + unless its Approved header names a person who is authorized (in + some sense) to create such a newsgroup. A cooperating subnet with + sufficiently strong coordination to maintain a correct and current + list of authorized creators might wish to do so for its internal + newsgroups. It also (or alternatively) might wish to ignore a + newgroup message for an internal newsgroup that was posted (or + cross-posted) to a non-internal newsgroup. + + NOTE: As mentioned in Section 6.10, some form of (cryptographic?) + authentication of Approved headers would be highly desirable, + especially for control messages. + + It would be desirable to provide some way of supplying a moderator's + address in a newgroup message for a moderated newsgroup, but this + will cause problems unless effective authentication is available, so + it is left for future work. + + NOTE: This leaves news administrators stuck with the annoying + chore of arranging proper mailing of moderated-newsgroup + submissions. On Usenet, this can be simplified by exploiting a + forwarding facility that some major sites provide: they maintain + forwarding addresses, each the name of a moderated newsgroup with + all periods (".", ASCII 46) replaced by hyphens ("-", ASCII 45), + which forward mail to the current newsgroup moderators. More + advice on the subject of forwarding to moderators can be found in + the document titled "How to Construct the Mailpaths File", posted + regularly to the Usenet newsgroups news.lists, news.admin.misc, + and news.answers. + + A newgroup message naming a newsgroup that already exists is + requesting a change in the moderation status or description of the + newsgroup. The same rules apply. + + + + + + + +Spencer Historic [Page 67] + +RFC 1849 Son of 1036 March 2010 + + +7.4. 
rmgroup + + The rmgroup message requests that a newsgroup be deleted: + + rmgroup-arguments = newsgroup-name + rmgroup-body = body + + The sole argument is the newsgroup name. The body is a comment, + which software MUST ignore; it SHOULD contain an explanation of the + decision to delete the newsgroup. + + NOTE: Criteria for newsgroup deletion vary widely and are outside + the scope of this Draft, but if formal procedures of one kind or + another were followed in the decision, the body should mention + this. Administrators often look for such information when + deciding whether to comply with creation/deletion requests. + + A rmgroup message that lacks an Approved header MUST be ignored. + + NOTE: It would also be desirable to ignore a rmgroup message + unless its Approved header names a person who is authorized (in + some sense) to delete such a newsgroup. A cooperating subnet with + sufficiently strong coordination to maintain a correct and current + list of authorized deleters might wish to do so for its internal + newsgroups. It also (or alternatively) might wish to ignore a + rmgroup message for an internal newsgroup that was posted (or + cross-posted) to a non-internal newsgroup. + + Unexpected deletion of a newsgroup being a disruptive action, + implementations are strongly advised to refer rmgroup messages to an + administrator by default, unless perhaps the message can be + determined to have originated within a cooperating subnet whose + members are considered trustworthy. Abuses have occurred. + +7.5. sendsys, version, whogets + + The sendsys message requests that a description of the relayer's news + feeds to other relayers be mailed to the article's reply address: + + sendsys-arguments = [ relayer-name ] + sendsys-body = body + + If there is an argument, relayers other than the one named by the + argument MUST NOT respond. The body is a comment, which software + MUST ignore; it SHOULD contain an explanation of the reason for the + request. 
+ + + + + +Spencer Historic [Page 68] + +RFC 1849 Son of 1036 March 2010 + + + The version message requests that the name and version of the relayer + software be mailed to the reply address: + + version-arguments = + version-body = body + + There are no arguments. The body is a comment, which software MUST + ignore; it SHOULD contain an explanation of the reason for the + request. + + The whogets message requests that a description of the relayer and + its news feeds to other relayers be mailed to the article's reply + address: + + whogets-arguments = newsgroup-name [ space relayer-name ] + whogets-body = body + + The first argument is the name of the "target newsgroup", specifying + the newsgroup for which propagation information is desired. This + MUST be a complete newsgroup name, not the name of a hierarchy or a + portion of a newsgroup name that is not itself the name of a + newsgroup. If there is a second argument, only the relayer named by + that argument should respond. The body is a comment, which software + MUST ignore; it SHOULD contain an explanation of the reason for the + request. + + NOTE: Whogets is intended as a replacement for sendsys (and + version) with a precisely specified reply format. Since the + syntax for specifying what newsgroups get sent to what other + relayers varies widely between different forms of relayer + software, the only practical way to standardize the reply format + is to indicate a specific newsgroup and ask where THAT newsgroup + propagates. The requirement that it be a complete newsgroup name + is intended to (largely) avoid the problem of having to answer + "yes and no" in cases where not all newsgroups in a hierarchy are + sent. + + Any of these messages lacking an Approved header MUST be ignored. + Response to any of these messages SHOULD be delayed for at least + 24 hours, and no response should be attempted if the message has been + cancelled in that time. 
Also, no response SHOULD be attempted unless + the local part of the destination address is "newsmap". News + administrators SHOULD arrange for mail to "newsmap" on their systems + to be discarded (without reply) unless legitimate use is in progress. + + NOTE: Because these messages can cause many, many relayers to send + mail to one person, such messages, specifying mailing to an + innocent person's mailbox, have been forged as a half-witted + + + +Spencer Historic [Page 69] + +RFC 1849 Son of 1036 March 2010 + + + practical joke. A delay gives administrators time to notice a + fraudulent message and act (by cancelling the message, preparing + to divert the flood of mail into the bit bucket, or both). + Restriction of the destination address to "newsmap" reduces the + appeal of fraud by making it impossible to use it to harass a + normal user. (A site that does NOT discard mail to "newsmap", but + rather bounces it back, may incur higher communications costs than + if the mail had been accepted into a user's mailbox, but a + malicious forger could accomplish this anyway, by using an address + whose local part is very unlikely to be a legitimate mailbox + name.) + + NOTE: [RFC1036] did not require the Approved header for these + control messages. This has been added because of the possibility + that cryptographic authentication of Approved headers will become + available. + + The body of the reply to a sendsys message SHOULD be of the form: + + sendsys-reply = responder 1*sys-line + responder = "Responding-System:" space domain eol + sys-line = relayer-name ":" newsgroup-patterns + [ ":" text ] eol + newsgroup-patterns = newsgroup-name *( "," newsgroup-name ) + + The first line identifies the responding system, using a syntax + resembling a header (but note that it is part of the BODY). + Remaining lines indicate what newsgroups are sent to what other + systems. 
The syntax of newsgroup patterns is not well standardized; + the form described is common (often with newsgroup names only + partially given, denoting all names starting with a particular set of + components) but not universal. The whogets message provides a + better-defined alternative. + + The reply to a version message is of somewhat ill-defined form, with + a body normally consisting of a single line of text that somehow + describes the version of the relayer software. The whogets message + provides a better-defined alternative. + + + + + + + + + + + + + +Spencer Historic [Page 70] + +RFC 1849 Son of 1036 March 2010 + + + The body of the reply to a whogets message MUST be of the form: + + whogets-reply = responder-domain responder-relayer + response-date responding-to arrived-via + responder-version whogets-delimiter + *pass-line + responder-domain = "Responding-System:" space domain eol + responder-relayer = "Responding-Relayer:" space relayer-name eol + response-date = "Response-Date:" space date eol + responding-to = "Responding-To:" space message-id eol + arrived-via = "Arrived-Via:" path-list eol + responder-version = "Responding-Version:" space nonblank-text eol + whogets-delimiter = eol + pass-line = relayer-name [ space domain ] eol + + The first six lines identify the responding relayer by its Internet + domain name (use of the ".uucp" and ".bitnet" pseudo-domains is + permissible, for registered hosts in them, but discouraged) and its + relayer name; specify the date when the reply was generated and the + message ID of the whogets message being replied to; give the path + list (from the Path header) of the whogets message (which MAY, if + absolutely necessary, be truncated to a convenient length, but MUST + contain at least the leading three relayer names); and indicate the + version of relayer software responding. Note that these lines are + part of the BODY even though their format resembles that of headers. 
+ Despite the apparently fixed order specified by the syntax above, + they can appear in any order, but there must be exactly one of each. + + After those preliminaries, and an empty line to unambiguously define + their end, the remaining lines are the relayer names (which MAY be + accompanied by the corresponding domain names, if known) of systems + to which the responding system passes the target newsgroup. Only the + names of news relayers are to be included. + + NOTE: It is desirable for a reply to identify its source by both + domain name and relayer name because news propagation is governed + by the latter but location in a broader context is best determined + by the former. The date and whogets message ID should, in + principle, be present in the MAIL headers but are included in the + body for robustness in the presence of uncooperative mail systems. + The reason for the path list is discussed below. Adding version + information eliminates the need for a separate message to gather + it. + + + + + + + + +Spencer Historic [Page 71] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: The limitation of pass lines to contain only names of news + relayers is meant to exclude names used within a single host (as + identifiers for mail gateways, portions of ihave/sendme + implementations, etc.), which do not actually refer to other + hosts. + + A relayer that is unaware of the existence of the target newsgroup + MUST NOT reply to a whogets message at all, although this MUST NOT + influence decisions on whether to pass the article on to other + relayers. + + NOTE: While this may result in discontinuous maps in cases where + some hosts have not honored requests for creation of a newsgroup, + it will also prevent a flood of useless responses in the event + that a whogets message intended to map a small region "leaks" out + to a larger one. 
The possibility of discontinuous recognition of + a newsgroup does make it important that the whogets message itself + continue to propagate (if other criteria permit). This is also + the reason for the inclusion of the whogets message's path list, + or at least the leading portion of it, in the reply: to permit + reconstruction of at least small gaps in maps. + + Different networks set different rules for the legitimacy of these + messages, given that they may reveal details of organization-internal + topology that are sometimes considered proprietary. + + NOTE: On Usenet, in particular, willingness to respond to these + messages is held to be a condition of network membership: the + topology of Usenet is public information. Organizations wishing + to belong to such networks while keeping their internal topology + confidential might wish to organize their internal news software + so that all articles reaching outsiders appear to be from a single + "gatekeeper" system, with the details of internal topology hidden + behind that system. + + UNRESOLVED ISSUE: It might be useful to have a way to set some + sort of hop limit for these. + + + + + + + + + + + + + + +Spencer Historic [Page 72] + +RFC 1849 Son of 1036 March 2010 + + +7.6. checkgroups + + The checkgroups control message contains a supposedly authoritative + list of the valid newsgroups within some subset of the newsgroup name + space: + + checkgroups-arguments = + checkgroups-body = [ invalidation ] valid-groups + / invalidation + invalidation = "!" plain-component + *( "," plain-component ) eol + valid-groups = 1*( description-line eol ) + + There are no arguments. The body lines (except possibly for an + initial invalidation) each contain a description line for a + newsgroup, as defined under the newgroup message (Section 7.3). + + NOTE: Some other, ill-defined, forms of the checkgroups body were + formerly used. See Appendix A. 
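Illustration (not part of the Draft): the checkgroups body syntax above can be read with a few lines of code. The sketch below is a minimal one under stated assumptions. The description-line syntax is defined in Section 7.3 and is not reproduced here, so the code simply treats the first whitespace-delimited token as the newsgroup name and the rest as the description. It also assumes that a "hierarchy" is identified by the first component of a newsgroup name. The function names are hypothetical. The comparison only reports differences, and moderation status is not examined.

```python
def parse_checkgroups_body(body):
    """Return (invalidated_hierarchies, {newsgroup_name: description})."""
    # Blank lines are skipped as a leniency; the grammar itself has none.
    lines = [ln for ln in body.splitlines() if ln.strip()]
    invalidated = []
    if lines and lines[0].startswith("!"):
        # invalidation = "!" plain-component *( "," plain-component )
        invalidated = lines[0][1:].split(",")
        lines = lines[1:]
    groups = {}
    for ln in lines:
        # Assumption: name, whitespace, then free-form description text.
        parts = ln.split(None, 1)
        groups[parts[0]] = parts[1] if len(parts) > 1 else ""
    return invalidated, groups


def compare_with_local(local_groups, invalidated, groups):
    """Return (missing_locally, not_listed) for the hierarchies covered.

    Nothing is created or deleted here; the result is meant to be shown
    to an administrator.
    """
    # Assumption: a hierarchy is named by the first component of a name.
    scope = set(invalidated) | {name.split(".", 1)[0] for name in groups}
    in_scope = {g for g in local_groups if g.split(".", 1)[0] in scope}
    missing_locally = sorted(set(groups) - in_scope)
    not_listed = sorted(in_scope - set(groups))
    return missing_locally, not_listed
```

Local newsgroups outside the hierarchies named or implied by the message are never reported, since the message makes no assertion about them.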
+ + The checkgroups message applies to all hierarchies containing any of + the newsgroups listed in the body. The checkgroups message asserts + that the newsgroups it lists are the only newsgroups in those + hierarchies. If there is an invalidation, it asserts that the + hierarchies it names no longer contain any newsgroups. + + Processing a checkgroups message MAY cause a local list of newsgroup + descriptions to be updated. It SHOULD also cause the local lists of + newsgroups (and their moderation statuses) in the mentioned + hierarchies to be checked against the message. The results of the + check MAY be used for automatic corrective action or MAY be reported + to the news administrator in some way. + + NOTE: Automatically updating descriptions of existing newsgroups + is relatively safe. In the case of newsgroup additions or + deletions, simply notifying the administrator is generally the + wisest action, unless perhaps the message can be determined to + have originated within a cooperating subnet whose members are + considered trustworthy. + + NOTE: There is a problem with the checkgroups concept: not all + newsgroups in a hierarchy necessarily propagate to the same set of + machines. (Notably, there is a set of newsgroups known as the + "inet" newsgroups, which have relatively limited distribution but + coexist in several hierarchies with more widely distributed + newsgroups.) The advice of checkgroups should always be taken + with a grain of salt and should never be followed blindly. + + + + +Spencer Historic [Page 73] + +RFC 1849 Son of 1036 March 2010 + + +8. Transmission Formats + + While this Draft does not specify transmission methods, except to + place a few constraints on them, there are some data formats used + only for transmission that are unique to news. + +8.1. Batches + + For efficient bulk transmission and processing of news articles, it + is often desirable to transmit a number of them as a single block of + data, i.e., a "batch". 
The format of a batch is: + + batch = 1*( batch-header article ) + batch-header = "#! rnews " article-size eol + article-size = 1*digit + + A batch is a sequence of articles, each prefixed by a header line + that includes its size. The article size is a decimal count of the + octets in the article, counting each EOL as one octet regardless of + how it is actually represented. + + NOTE: A relayer might wish to accept either a single article or a + batch as input. Since "#" cannot appear in a header name, + examination of the first octet of the input will reveal its + nature. + + NOTE: In the header line, there is exactly one blank before + "rnews", there is exactly one blank after "rnews", and the EOL + immediately follows the article size. Beware that some software + inserts non-standard trash after the size. + + NOTE: Despite the similarity of this format to the executable- + script format used by some operating systems, it is EXTREMELY + unwise to just feed incoming batches to a command interpreter in + the anticipation that it will run a command named "rnews" to + process the batch. Unless arrangements are made to very tightly + restrict the range of commands that can be executed by this means, + the security implications are disastrous. + + + + + + + + + + + + + +Spencer Historic [Page 74] + +RFC 1849 Son of 1036 March 2010 + + +8.2. Encoded Batches + + When transmitting news, especially over communications links that are + slow or are billed by the bit, it is often desirable to batch news + and apply data compression to the batches. Transmission links + sending compressed batches SHOULD use out-of-band means of + communication to specify the compression algorithm being used. If + there is no way to send out-of-band information along with a batch, + the following encapsulation for a compressed batch MAY be used: + + ec-batch = "#! 
" compression-keyword eol + compressed-batch + compression-keyword = "cunbatch" + + A line containing a keyword indicating the type of compression is + followed by the compressed batch. The only truly widespread + compression keyword at present is "cunbatch", indicating compression + using the widely distributed "compress" program. Other compression + keywords MAY be used by mutual agreement between the hosts involved. + + NOTE: An encapsulated compressed batch is NOT, in general, a text + file, despite having an initial text line. This combination of + text and non-text data is often awkward to handle; for example, + standard decompression programs cannot be used without first + stripping off the initial line, and that in turn is painful to do + because many text-handling tools that are superficially suited to + the job do not cope well with non-text data, hence the + recommendation that out-of-band communication be used instead when + possible. + + NOTE: For UUCP transmission, where a batch is typically + transmitted by invoking the remote command "rnews" with the batch + as its input stream, a plausible out-of-band method for indicating + a compression type would be to give a compression keyword in an + option to "rnews", perhaps in the form: + + rnews -d decompressor + + where "decompressor" is the name of a decompression program (e.g., + "uncompress" for a batch compressed with "compress" or "gunzip" + for a batch compressed with "gzip"). How this decompression + program is located and invoked by the receiving relayer is + implementation-specific. + + NOTE: See the notes in Section 8.1 on the inadvisability of + feeding batches directly to command interpreters. + + + + + +Spencer Historic [Page 75] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: There is exactly one blank between "#!" and the compression + keyword, and the EOL immediately follows the keyword. + +8.3. 
News within Mail + + It is often desirable to transmit news as mail, either for the + convenience of a human recipient or because that is the only type of + transmission available on a restrictive communication path. + + Given the similarity between the news format and the MAIL format, it + is superficially attractive to just send the news article as a mail + message. This is typically a mistake: mail-handling software often + feels free to manipulate various headers in undesirable ways (in some + cases, such as Sender, such manipulation is actually mandatory), and + mail transmission problems, etc. MUST be reported to the + administrators responsible for the mail transmission rather than to + the article's author. In general, news sent as mail should be + encapsulated to separate the MAIL headers and the news headers. + + When the intended recipient is a human, any convenient form of + encapsulation may be used. Recommended practice is to use MIME + encapsulation with a content type of "message/news", given that news + articles have additional semantics beyond what "message/rfc822" + implies. + + NOTE: "message/news" was registered as a standard subtype by IANA + 22 June 1993. + + When mail is being used as a transmission path between two relayers, + however, a standard method is desirable. Currently the standard + method is to send the mail to an address whose local part is "rnews", + with whatever MAIL headers are necessary for successful transmission. + The news article (including its headers) is sent as the body of the + mail message, with an "N" prepended to each line. + + NOTE: The "N" reduces the probability of an innocent line in a + news article being taken as a magic command to mail software and + makes it easy for receiving software to strip off any lines added + by mail software (e.g., the trailing empty line added by some UUCP + mail software). + + This method has its weaknesses. 
In particular, it assumes that the + mail transmission channel can transmit nearly arbitrary body text + undamaged. When mail is being used as a transmission path of last + resort, however, the mail system often has inconvenient preconceived + notions about the format of message bodies. Various ad hoc encoding + schemes have been used to avoid such problems. The recommended + method is to send a news article or batch as the body of a MIME mail + + + +Spencer Historic [Page 76] + +RFC 1849 Son of 1036 March 2010 + + + message, using content type "application/news-transmission" and + MIME's "base64" encoding (which is specifically designed to survive + all known major mail systems). + + NOTE: In the process, MIME conventions could be used to fragment + and reassemble an article that is too large to be sent as a single + mail message over a transmission path that restricts message + length. In addition, the "conversions" parameter to the content + type could be used to indicate what (if any) compression method + has been used. Also, the Content-MD5 header [RFC1544] can be used + as a "checksum" to provide high confidence of detecting accidental + damage to the contents. + + UNRESOLVED ISSUE: The "conversions" parameter no longer exists. + What should be done about this, if anything? + + NOTE: It might look tempting to use a content type such as + "message/X-netnews", but MIME bans non-trivial encodings of the + entire body of messages with content type "message". The intent + is to avoid obscuring nested structure underneath encodings. For + inter-relayer news transmission, there is no nested structure of + interest, and it is important that the entire article (including + its headers, not just its body) be protected against the vagaries + of intervening mail software. This situation appears to fit the + MIME description of circumstances in which "application" is the + proper content type. 
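Illustration (not part of the Draft): both encapsulations described in this section can be sketched briefly. The first pair of functions implements the "N"-prefix convention, and the third builds a MIME message with content type "application/news-transmission" and base64 encoding using Python's standard email package. The function names and any addresses passed in are hypothetical, and the sketch makes no attempt at compression or fragmentation.

```python
from email.message import EmailMessage


def n_encode(article):
    """Prepend "N" to every line of the article (headers and body)."""
    return "".join("N" + line for line in article.splitlines(keepends=True))


def n_decode(mail_body):
    """Keep only N-prefixed lines and strip the prefix.

    Lines without the prefix (for example, a trailing empty line added by
    mail software) are dropped.
    """
    return "".join(
        line[1:]
        for line in mail_body.splitlines(keepends=True)
        if line.startswith("N")
    )


def mime_wrap(article_bytes, sender, recipient):
    """Wrap an article or batch as application/news-transmission, base64."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        article_bytes,
        maintype="application",
        subtype="news-transmission",
        cte="base64",
    )
    return msg
```

On the receiving side, calling get_content() on a message built this way returns the original octets, so the article's own headers reach the relayer untouched by mail software.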
+ + NOTE: "application/news-transmission", with a "conversions" + parameter, was registered as a standard subtype by IANA + 22 June 1993. + + UNRESOLVED ISSUE: The "conversions" parameter no longer exists in + MIME. What should we do about this? + +8.4. Partial Batches + + UNRESOLVED ISSUE: The existing batch conventions assemble + (potentially) many articles into one batch. Handling very large + articles would be substantially less troublesome if there was also + a fragmentation convention for splitting a large article into + several batches. Is this worth defining at this time? + +9. Propagation and Processing + + Most aspects of news propagation and processing are implementation- + specific. The basic propagation algorithms, and certain details of + how they are implemented, nevertheless need to be standard. + + + + +Spencer Historic [Page 77] + +RFC 1849 Son of 1036 March 2010 + + + There are two important principles that news implementors (and + administrators) need to keep in mind. The first is the well-known + Internet Robustness Principle: + + Be liberal in what you accept, and conservative in what you send. + + However, in the case of news there is an even more important + principle, derived from a much older code of practice, the + Hippocratic Oath (we will thus call this the Hippocratic Principle): + + First, do no harm. + + It is VITAL to realize that decisions that might be merely suboptimal + in a smaller context can become devastating mistakes when amplified + by the actions of thousands of hosts within a few hours. + +9.1. Relayer General Issues + + Relayers MUST NOT alter the content of articles unnecessarily. Well- + intentioned attempts to "improve" headers, in particular, typically + do more harm than good. It is necessary for a relayer to prepend its + own name to the Path content (see Section 5.6) and permissible for it + to rewrite or delete the Xref header (see Section 6.12). 
Relayers + MAY delete the thoroughly obsolete headers described in Appendix A.3, + although this behavior no longer seems useful enough to encourage. + Other alterations SHOULD be avoided at all costs, as per the + Hippocratic Principle. + + NOTE: As discussed in Section 2.3, tidying up the headers of a + user-prepared article is the job of the posting agent, not the + relayer. The relayer's purpose is to move already-compliant + articles around efficiently without damaging them. Note that in + existing implementations, specific programs may contain both + posting-agent functions and relayer functions. The distinction is + that posting-agent functions are invoked only on articles posted + by local posters, never on articles received from other relayers. + + NOTE: A particular corollary of this rule is that relayers should + not add headers unless truly necessary. In particular, this is + not SMTP; do not add Received headers. + + Relayers MUST NOT pass non-conforming articles on to other relayers, + except perhaps in a cooperating subnet that has agreed to permit + certain kinds of non-conforming behavior. This is a direct + consequence of the Internet Robustness Principle. + + + + + + +Spencer Historic [Page 78] + +RFC 1849 Son of 1036 March 2010 + + + The two preceding paragraphs may appear to be in conflict. What is + to be done when a non-conforming article is received? The Robustness + Principle argues that it should be accepted but must not be passed on + to other relayers while still non-conforming, and the Hippocratic + Principle strongly discourages attempts at repair. The conclusion + that this appears to lead to is correct: a non-conforming article MAY + be accepted for local filing and processing, or it MAY be discarded + entirely, but it MUST NOT be passed on to other relayers. 
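Illustration (not part of the Draft): the conclusion above amounts to a small decision rule. The sketch below assumes that the conformance check has already been made and that the other acceptance tests of Section 9.2 have passed. The function names and action labels are hypothetical. The "!" delimiter used when prepending to the Path content is also an assumption, since the Path syntax is given in Section 5.6 and is not reproduced here.

```python
def disposition(conforming, keep_nonconforming_locally):
    """Return the set of actions a relayer may take for an article."""
    if conforming:
        return {"file_locally", "relay"}
    # Non-conforming: local filing is optional, relaying is never allowed,
    # and no repair is attempted.
    return {"file_locally"} if keep_nonconforming_locally else set()


def prepend_to_path(path_content, relayer_name):
    """Prepend this relayer's name to the Path content.

    Assumption: "!" as the delimiter between relayer names.
    """
    return relayer_name + "!" + path_content
```

Apart from the Path change, the article is passed on exactly as received.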
+ + A relayer MUST NOT respond to the arrival of an article by sending + mail to any destination, other than a local administrator, except by + explicit prearrangement with the recipient. Neither posting an + article (other than certain types of control messages; see + Section 7.5) nor being the moderator of a moderated newsgroup + constitutes such prearrangement. UNDER NO CIRCUMSTANCES WHATSOEVER + may a relayer attempt to send mail to either an article's originator + or a moderator. + + NOTE: Reporting apparent errors in message composition is the job + of a posting agent, not a relayer. The same is true of mailing + moderated-newsgroup postings to moderators. In networks of + thousands of cooperating relayers, it is simply unacceptable for + there to be any circumstance whatsoever that causes any + significant fraction of them to simultaneously send mail to the + same destination. (Some control messages are exceptions, although + perhaps ill-advised ones.) What might, in a smaller network, be a + useful notification or forwarding becomes a deluge of nearly + identical messages that can bring mail software to its knees and + severely inconvenience recipients. Moderators, in particular, + historically have suffered grievously from this. + + Notification of problems in incoming articles MAY go to local + administrators, or at most (by prearrangement!) to the + administrators of the neighboring relayer(s) that passed on the + problematic articles. + + NOTE: It would be desirable to notify the author that his posting + is not propagating as he expects. However, there is no known + method for doing this that will scale up gracefully. (In + particular, "notify only if within N relayers of the originator" + falls down in the presence of commercial news services like UUNET: + there may be hundreds or thousands of relayers within a couple of + hops of the originator.) 
The best that can be done right now is + to notify neighbors, in hopes that the word will eventually + propagate up the line, or organize regional monitoring at major + hubs. + + + + + +Spencer Historic [Page 79] + +RFC 1849 Son of 1036 March 2010 + + + If it is necessary to alter an article, e.g., translate it to another + character set or alter its EOL representation, strenuous efforts + should be made to ensure that such transformations are reversible, + and that relayers or other software that might wish to reverse them + know exactly how to do so. + + NOTE: For example, a cooperating subnet that exchanges articles + using a non-ASCII character set like EBCDIC should define a + standard, reversible ASCII-EBCDIC mapping and take pains to see + that it is used at all points where the subnet meets the outside. + If the only reason for using EBCDIC is that the readers typically + employ EBCDIC devices, it would be more robust to employ ASCII as + the interchange format and do the transformation in the reading + and posting agents. + +9.2. Article Acceptance and Propagation + + When a relayer first receives an article, it must decide whether to + accept it. (This applies regardless of whether the article arrived + by itself or as part of a batch, and in principle regardless of + whether it originated as a local posting or as traffic from another + relayer.) In a cooperating subnet with well-controlled propagation + paths, some of the tests specified here MAY be delegated to centrally + located relayers; that is, relayers that can receive news ONLY via + one of the central relayers might simplify acceptance testing based + on the assumption that incoming traffic has already passed the full + set of tests at a central relayer. + + The wording that follows is based on a model in which articles arrive + on a relayer's host before acceptance tests are done. 
However, + depending on the degree of integration of the transport mechanisms + and the relayer, some or all of these tests MAY be done before the + article is actually transmitted, so that articles that definitely + will not be accepted need not be transmitted at all. + + The wording that follows also specifies a particular order for the + acceptance tests. While this order is the obvious one, the tests MAY + be done in any order. + + First, the relayer MUST verify that the article is a legal news + article, with all mandatory headers present with legal contents. + + NOTE: This check in principle is done by the first relayer to see + an article, so an article received from another relayer should + always be legal, but there is enough old software still + operational that this cannot be taken for granted; see the + discussion of the Internet Robustness Principle in Section 9.1. + + + + +Spencer Historic [Page 80] + +RFC 1849 Son of 1036 March 2010 + + + Second, the relayer MUST determine whether it has already seen this + article (identified by its message ID). This is normally done by + retaining a history of all article message IDs seen in the last + N days, where the value of N is decided by the relayer's + administrator but SHOULD be at least 7. Since N cannot practically + be infinite, articles whose Date content indicates that they are + older than N days are declared "stale" and are deemed to have been + seen already. + + NOTE: This check is important because news propagation topology is + typically redundant, often highly so, and it is not at all + uncommon for a relayer to receive the same article from several + neighbors. The history of already-seen message IDs can get quite + large, hence, the desire to limit its length, but it is important + that it be long enough that slowly propagating articles are not + classed as stale. 
News propagation within the Internet is + normally very rapid, but when UUCP links are involved, end-to-end + delays of several days are not rare, so a week is not a + particularly generous minimum. + + NOTE: Despite generally more rapid propagation in recent times, it + is still not unheard of for some propagation paths to be very + slow. This can introduce the possibility of old articles arriving + again after they are gone from the history, hence the "stale" + rule. + + Third, the relayer MUST determine whether any of the article's + newsgroups are "subscribed to" by the host, i.e., fit a description + of what hierarchies or newsgroups the site wants to receive. + + NOTE: This check is significant because information on what + newsgroups a relayer wishes to receive is often stored at its + neighbors, who may not have up-to-date information or may simplify + the rules for implementation reasons. As a hedge against the + possibility of missed or delayed newgroup control messages, + relayers may wish to observe a notion of a newsgroup subscription + that is independent of the list of newsgroups actually known to + the relayer. This would permit reception and relaying of articles + in newsgroups that the relayer is not (yet) aware of, subject to + more general criteria indicating that they are likely to be of + interest. + + Once an article has been accepted, it may be passed on to other + relayers. The fundamental news propagation rule is a flooding + algorithm: on receiving and accepting an article, send it to all + neighboring relayers not already in its path list that are sent its + newsgroup(s) and distribution(s). + + + + +Spencer Historic [Page 81] + +RFC 1849 Son of 1036 March 2010 + + + NOTE: The path list's role in loop prevention may appear + relatively unimportant, given that looping articles would + typically be rejected as duplicates anyway. However, the path + list's role in preventing superfluous transmissions is not + trivial. 
In particular, the path list is the only thing that + prevents relayer X, on receiving an article from relayer Y, from + sending it back to Y again. (Indeed, the usual symptom of + confusion about relayer names is that incoming news loops back in + this manner.) The looping articles would be rejected as + duplicates, but doubling the communications load on every news + transmission path is not to be taken lightly! + + In general, relayers SHOULD NOT make propagation decisions by + "anticipation": relayer X, noting that the article's path list + already contains relayer Y, decides not to send it to relayer Z + because X anticipates that Z will get the article by a better path. + If that is generally true, then why is there a news feed from X to Z + at all? In fact, the "better path" may be running slowly or may be + down. News propagation is very robust precisely because some + redundant transmission is done "just in case". If it is imperative + to limit unnecessary traffic on a path, use of NNTP [RFC977] or + ihave/sendme (see Section 7.2) to pass articles only when necessary + is better than arbitrary decisions not to pass articles at all. + + Anticipation is occasionally justified in special cases. Such cases + should involve both (1) a cooperating subnet whose propagation paths + are well-understood and well-monitored, with failures and slowdowns + noticed and dealt with promptly, and (2) a persistent pattern of + heavy unnecessary traffic on a path that is either slow or costly. + In addition, there should be some reason why neither NNTP nor + ihave/sendme is suitable as a solution to the problem. + +9.3. Administrator Contact + + It is desirable to have a standardized contact address for a + relayer's administrators, in the spirit of the "postmaster" address + for mail administrators. Mail addressed to "newsmaster" on a + relayer's host MUST go to the administrator(s) of that relayer. 
Mail + addressed to "usenet" on the relayer's host SHOULD be handled + likewise. Mail addressed to either address on other hosts using the + same news database SHOULD be handled likewise. + + NOTE: These addresses are case-sensitive, although it would be + desirable for sequences equivalent to them using case-insensitive + comparison to be handled likewise. While "newsmaster" seems the + preferred network-independent address, by analogy to "postmaster", + there is an existing practice of using "usenet" for this purpose, + + + + +Spencer Historic [Page 82] + +RFC 1849 Son of 1036 March 2010 + + + and so "usenet" should be supported if at all possible (especially + on hosts belonging to Usenet!). The address "news" is also + sometimes used for purposes like this, but less consistently. + +10. Gatewaying + + Gatewaying of traffic between news networks using this Draft and + those using other exchange mechanisms can be useful but must be done + cautiously. Gateway administrators are taking on significant + responsibilities and must recognize that the consequences of error + can be quite serious. + +10.1. General Gatewaying Issues + + This section will primarily address the problems of gatewaying + traffic INTO news networks. Little can be said about the other + direction without some specific knowledge of the network(s) involved. + However, the two issues are not entirely independent: if a non-news + network is gatewayed into a news network at more than one point, + traffic injected into the non-news network by one gateway may appear + at another as a candidate for injection back into the news network. + + This raises a more general principle, the single most important issue + for gatewaying: + + Above all, prevent loops. + + The normal loop prevention of news transmission is vitally dependent + on the Message-ID header. 
Any gateway that finds it necessary to + remove this header, alter it, or supersede it (by moving it into the + body) MUST take equally effective precautions against looping. + + NOTE: There are few things more effective at turning news readers + into a lynch mob than a malfunctioning gateway, or pair of + gateways, that takes in news articles, mangles them just enough to + prevent news relayers from recognizing them as duplicates, and + regurgitates them back into the news stream. This happens rather + too often. + + Gateway implementors should realize that gateways have all of the + responsibilities of relayers, plus the added complications introduced + by transformations between different information formats. Much of + the discussion in Section 9 about relayer issues is relevant to + gateways as well. In particular, gateways SHOULD keep a history of + recently seen articles, as described in Section 9.2, and not assume + that articles will never reappear. This is particularly important + for networks that have their own concept analogous to message IDs: a + gateway should keep a history of traffic seen from BOTH directions. + + If at all possible, articles entering the non-news network SHOULD be + marked in some way so that they will NOT be re-gatewayed back into + news. Multiple gateways obviously must agree on the marking method + used; if it is done by having them know each other's names, name + changes MUST be coordinated with great care. If marking cannot be + done, all transformations MUST be reversible so that a re-gatewayed + article is identical to the original (except perhaps for a longer + Path header).
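
   NOTE: As a non-normative illustration, the history and marking
   precautions described above might be sketched as follows, in
   Python.  The marker header name X-Gatewayed-From-News, the class
   GatewayHistory, the function may_enter_news, and the history limit
   are all hypothetical names chosen for this sketch; this Draft does
   not prescribe a particular marking method.

```python
from collections import OrderedDict

# Hypothetical marker header; cooperating gateways must agree on it.
MARKER = "x-gatewayed-from-news"


class GatewayHistory:
    """Bounded record of message IDs seen in EITHER direction."""

    def __init__(self, limit=100000):
        self.limit = limit          # assumed size; tune to local traffic
        self.seen = OrderedDict()

    def record(self, message_id):
        """Remember message_id; return True only if it was not yet known."""
        if message_id in self.seen:
            return False
        self.seen[message_id] = True
        if len(self.seen) > self.limit:
            self.seen.popitem(last=False)  # forget the oldest entry
        return True


def may_enter_news(history, message_id, header_names):
    """Decide whether a message from the non-news side may be injected."""
    marked = MARKER in (name.lower() for name in header_names)
    is_new = history.record(message_id)
    # Refuse anything that is marked as having come from news, or
    # whose ID has already been seen travelling in either direction.
    return is_new and not marked
```

   In this sketch, a gateway would call record() for each article it
   passes out of news, and may_enter_news() for each candidate
   arriving from the other network, so that the two directions share
   one history.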
+ + Gateways MUST NOT pass control messages (articles containing Control, + Also-Control, or Supersedes headers) without removing the headers + that make them control messages, unless there are compelling reasons + to believe that they are relevant to both sides and that conventions + are compatible. If it is truly desirable to pass them unaltered, + suitable precautions MUST be taken to ensure that there is NO + POSSIBILITY of a looping control message. + + NOTE: The damage done by looping articles is multiplied a + thousandfold if one of the affected articles is something like a + sendsys message (see Section 7.5) that requests multiple automatic + replies. Most gateways simply should not pass control messages at + all. If some unusual reason dictates doing so, gateway + implementors and administrators are urged to consider bulletproof + rate-limiting measures for the more destructive ones like sendsys, + e.g., passing only one per hour no matter how many are offered. + + Gateways, like relayers, SHOULD make determined efforts to avoid + mangling articles unnecessarily. In the case of gateways, some + transformations may be inevitable, but keeping them to a minimum and + ensuring that they are reversible is still highly desirable. + + Gateways MUST avoid destroying information. In particular, the + restrictions of Section 4.2.2 are best taken with a grain of salt in + the context of gateways. Information that does not translate + directly into news headers SHOULD be retained, perhaps in "X-" + headers, both because it may be of interest to sophisticated readers + and because it may be crucial to tracing propagation problems. + + Gateway implementors should take particular note of the discussion of + mailed replies, or more precisely the ban on same, in Section 9.1. + Gateway problems MUST be reported to the local administration, not to + the innocent originator of traffic. 
"Gateway problems" here includes + all forms of propagation anomaly on the non-news side of the gateway, + e.g., unreachable addresses on a mailing list. Note that this + requires consideration of possible misbehavior of "downstream" hosts, + not just the gateway host. + + + + + +Spencer Historic [Page 84] + +RFC 1849 Son of 1036 March 2010 + + +10.2. Header Synthesis + + News articles prepared by gateways MUST be legal news articles. In + particular, they MUST include all of the mandatory headers (see + Section 5) and MUST fully conform to the restrictions on said + headers. This often requires that a gateway function not only as a + relayer but also partly as a posting agent, aiding in the synthesis + of a conforming article from non-conforming input. + + NOTE: The full-conformance requirement needs particularly careful + attention when gatewaying mailing lists to news, because a number + of constructs that are legal in MAIL headers are NOT permissible + in news headers. (Note also that not all mail traffic fully + conforms to even the MAIL specification.) The rest of this + section will be phrased in terms of mail-to-news gatewaying, but + most of it is more generally applicable. + + The mandatory headers generally present few problems. + + If no date information is available, the gateway should supply a Date + header with the gateway's current date. If only partial information + is available (e.g., date but not time), this should be fleshed out to + a full Date header by adding default values, not by mixing in parts + of the gateway's current date. (Defaults should be chosen so that + fleshed-out dates will not be in the future!) It may be necessary to + map time zone information to the restricted forms permitted in the + news Date header. See Section 5.1. + + NOTE: The prohibition of mixing dates is on the theory that it is + better to admit ignorance than to lie. 
+ + If the author's address as supplied in the original message is not + suitable for inclusion in a From header, the gateway MUST transform + it so it is (for example, by use of the "% hack" and the domain + address of the gateway). The desire to preserve information is NOT + an excuse for violating the rules. If the transformation is drastic + enough that there is reason to suspect loss of information, it may be + desirable to include the original form in an "X-" header, but the + From header's contents MUST be as specified in Section 5.2. + + If the message contains a Message-ID header, the contents should be + dealt with as discussed in Section 10.3. If there is no message ID + present, it will be necessary to synthesize one, following the news + rules (see Section 5.3). + + Every effort should be made to produce a meaningful Subject header; + see Section 5.4. Many news readers select articles to read based on + Subject headers, and inserting a placeholder like "<no subject + available>" is considered highly objectionable. Even synthesizing a + Subject header by picking out the first half-dozen nouns and + adjectives in the article body is better than using a placeholder, + since it offers SOME indication of what the article might contain. + + The contents of the Newsgroups header (Section 5.5) are usually + predetermined by gateway configuration, but a gateway to a network + that has its own concept of newsgroups or discussions might have to + make transformations. Such transformations should be reversible; + otherwise, confusion is likely on both sides. + + It will rarely be possible for gateways to provide a Path header that + is both an accurate history of the relayers the article has passed + through AS NEWS and a usable reply address. The history function + MUST be given priority; see the discussion in Section 5.6.
It will + usually be necessary for a gateway to supply an empty path list, + abandoning the reply function. + + It is desirable for gatewayed articles to convey as much useful + information as possible, e.g., by use of optional news headers (see + Section 6) when the relevant information is available. Synthesis of + optional headers can generally follow similar rules. + + Software synthesizing References headers should note the discussion + in Section 6.5 concerning the incompatibility between MAIL and news. + Also of interest is the possibility of incorporating information from + In-Reply-To headers and from attribution lines in the body; an + incomplete or somewhat conjectural References header is much better + than none at all, and reading agents already have to cope with + incomplete or slightly erroneous References lists. + +10.3. Message ID Mapping + + This section, like the previous one, is phrased in terms of mail + being gatewayed into news, but most of the discussion should be more + generally applicable. + + A particularly sticky problem of gatewaying mail into news is + supplying legal news message IDs. Note, in particular, that not all + MAIL message IDs are legal in news; the news syntax (specified in + Section 5.3, with related material in Section 5.2) is more + restrictive. Generating a fully conforming news article from a mail + message may require transforming the message ID somewhat. + + Generation and transformation of message IDs assumes particular + importance if a given mailing list (or whatever) is being handled by + more than one gateway. It is highly desirable that the same article + contents not appear twice in the same newsgroup, which requires that + + + +Spencer Historic [Page 86] + +RFC 1849 Son of 1036 March 2010 + + + they receive the same message ID from all gateways. 
Gateways SHOULD + use the following algorithm (possibly modified by the later + discussion of gatewaying into more than one newsgroup) unless local + considerations dictate another: + + 1. Separate message ID from surroundings, if necessary. A + plausible method for this is to start at the first "<", end at + the next ">", and reject the message if no ">" is found or a + second "<" is seen before the ">". Also reject the message if + the message ID contains no "@" or more than one "@", or if it + contains no ".". Also reject the message if the message ID + contains non-ASCII characters, ASCII control characters, or + white space. + + NOTE: Any legitimate domain will include at least one ".". + [RFC822], Section 6.2.2, forbids white space in this context + when passing mail on to non-MAIL software. + + 2. Delete the leading "<" and trailing ">". Separate message ID + into local part and domain at the "@". + + 3. In both components, transliterate leading dots (".", ASCII 46), + trailing dots, and dots after the first in sequences of two or + more consecutive dots, into underscores (ASCII 95). + + 4. In both components, transliterate disallowed characters other + than dots (see the definition of <unquoted-char> in + Section 5.2) to underscores (ASCII 95). + + 5. Form the message ID as + + "<" local-part "@" domain ">" + + NOTE: This algorithm is approximately that of Rich Salz's + successful gatewaying package. + + Despite the desire to keep message IDs consistent across multiple + gateways, there is also a more subtle issue that can require a + different approach. If the same articles are being gatewayed into + more than one newsgroup, and it is not possible to arrange that all + gateways gateway them to the same cross-posted set of newsgroups, + then the message IDs in the different newsgroups MUST be DIFFERENT. 
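
   NOTE: As a non-normative illustration, the five-step algorithm
   given earlier in this section might be sketched as follows, in
   Python.  The function names are hypothetical.  The DISALLOWED set
   is an assumption standing in for the characters excluded by
   <unquoted-char> in Section 5.2, which lies outside this section;
   an implementor should check it against that definition.

```python
# Assumed stand-in for the characters that <unquoted-char>
# (Section 5.2) excludes, other than the dot handled in step 3.
DISALLOWED = set('!()<>@,;:\\"[]')


def extract_message_id(field):
    """Step 1: isolate the ID text, or return None to reject the message."""
    start = field.find("<")
    if start < 0:
        return None
    end = field.find(">", start + 1)
    if end < 0:
        return None
    inner = field[start + 1:end]
    if "<" in inner:                      # second "<" before the ">"
        return None
    if inner.count("@") != 1 or "." not in inner:
        return None
    # Reject control characters, white space, DEL, and non-ASCII.
    if any(ord(c) <= 32 or ord(c) >= 127 for c in inner):
        return None
    return inner                          # step 2: "<" and ">" removed


def fix_dots(part):
    """Step 3: leading, trailing, and repeated dots become underscores."""
    out = []
    for i, c in enumerate(part):
        repeated = i > 0 and part[i - 1] == "."
        if c == "." and (i == 0 or i == len(part) - 1 or repeated):
            out.append("_")
        else:
            out.append(c)
    return "".join(out)


def fix_chars(part):
    """Step 4: other disallowed characters become underscores."""
    return "".join("_" if c in DISALLOWED else c for c in part)


def map_message_id(field):
    """Return a news message ID for a mail Message-ID field, or None."""
    inner = extract_message_id(field)
    if inner is None:
        return None
    local, domain = inner.split("@")      # step 2: exactly one "@"
    local = fix_chars(fix_dots(local))
    domain = fix_chars(fix_dots(domain))
    return "<" + local + "@" + domain + ">"   # step 5
```

   Because the mapping depends only on the incoming message ID, two
   gateways running the same code produce the same result for the
   same mail message, which is the property this section asks for.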
+ + NOTE: Otherwise, arrival of an article in one newsgroup will + prevent it from appearing in another, and which newsgroup a + particular article appears in will be an accident of which + direction it arrives from first. It is very difficult to maintain + a coherent discussion when each participant sees a randomly + + + +Spencer Historic [Page 87] + +RFC 1849 Son of 1036 March 2010 + + + selected 50% of the traffic. The fundamental problem here is that + the basic assumption behind message IDs is being violated: the + gateways are assigning the same message ID to articles that differ + in an important respect (Newsgroups header). + + In such cases, it is suggested that the newsgroup name, or an agreed- + on abbreviation thereof, be prepended to the local part of the + message ID (with a separating ".") by the gateway. This will ensure + that multiple gateways generate the same message ID, while also + ensuring that different newsgroups can be read independently. + + NOTE: It is preferable to have the gateway(s) cross-post the + article, avoiding the issue altogether, but this may not be + feasible, especially if one newsgroup is widespread and the other + is purely local. + +10.4. Mail to and from News + + Gatewaying mail to news, and vice versa, is the most obvious form of + news gatewaying. It is common to set up gateways between news and + mail rather too casually. + + It is hard to go very wrong in gatewaying news into a mailing list, + except for the non-trivial matter of making sure that error reports + go to the local administration rather than to the authors of news + articles. (This requires attention to the "envelope address" as well + as to the message headers.) Doing the reverse connection correctly + is much harder than it looks. + + NOTE: In particular, just feeding the mail message to "inews -h" + or the equivalent is NOT, repeat NOT, adequate to gateway mail to + news. Significant gatewaying software is necessary to do it + right. 
Not all headers of mail messages conform to even the MAIL + specifications, never mind the stricter rules for news. + + It is useful to distinguish between two different forms of + mail-to-news gatewaying: gatewaying a mailing list into a newsgroup, + and operating a "post-by-mail" service in which individual articles + can be posted to a newsgroup by mailing them to a specific address. + In the first case, the message is already being "broadcast", and the + situation can be viewed as gatewaying one form of news into another. + The second case is closer to that of a moderator posting submissions + to a moderated newsgroup. + + In either case, the discussions in the preceding two sections are + relevant, as is the Hippocratic Principle of Section 9. However, + some additional considerations are specific to mail-to-news + gatewaying. + + + +Spencer Historic [Page 88] + +RFC 1849 Son of 1036 March 2010 + + + As mentioned in Section 6, point-to-point headers like To and Cc + SHOULD NOT appear as such in news, although it is suggested that they + be transformed to "X-" headers, e.g., X-To and X-Cc, to preserve + their information content for possible use by readers or + troubleshooters. The Received header is entirely specific to MAIL + and SHOULD be deleted completely during gatewaying, except perhaps + for the Received header supplied by the gateway host itself. + + The Sender header is a tricky case, one where mailing-list and post- + by-mail practice should differ. For gatewaying mailing lists, the + mailing-list host should be considered a relayer, and the From and + Sender headers supplied in its transmissions left strictly untouched. + For post-by-mail, as for a moderator posting a mailed submission, the + Sender header should reflect the poster rather than the author. If a + post-by-mail gateway receives a message with its own Sender header, + it might wish to preserve the content in an X-Sender header. 
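
   NOTE: As a non-normative illustration, the header handling
   described in the preceding two paragraphs might be sketched as
   follows, in Python.  The function gateway_headers and its
   parameters are hypothetical.  For simplicity, the sketch deletes
   every Received header, including any supplied by the gateway host
   itself, which the text above would permit keeping.

```python
# Point-to-point headers and the "X-" names they are renamed to.
RENAMED = {"to": "X-To", "cc": "X-Cc"}


def gateway_headers(headers, post_by_mail=False, poster=None):
    """Transform (name, value) mail headers for injection into news.

    headers      -- list of (name, value) pairs from the mail message
    post_by_mail -- True for a post-by-mail service, False for a
                    mailing list being gatewayed
    poster       -- address of the posting gateway (post-by-mail only)
    """
    out = []
    for name, value in headers:
        key = name.lower()
        if key in RENAMED:
            # Preserve the information, but not as To or Cc.
            out.append((RENAMED[key], value))
        elif key == "received":
            continue                      # specific to MAIL; dropped
        elif key == "sender" and post_by_mail:
            # The original Sender is kept only for reference.
            out.append(("X-Sender", value))
        else:
            # Mailing-list case: From and Sender are left untouched.
            out.append((name, value))
    if post_by_mail and poster:
        # Sender should reflect the poster rather than the author.
        out.append(("Sender", poster))
    return out
```

   A mailing-list gateway would call gateway_headers(headers), while
   a post-by-mail service would pass post_by_mail=True together with
   its own address as poster.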
+ + It will generally be necessary to transform between mail's + In-Reply-To/References convention and news's References/See-Also + convention, to preserve correct semantics of cross references. This + also requires attention when going the other way, from news to mail. + See the discussion of the difference in Section 6.5. + +10.5. Gateway Administration + + Any news system will benefit from an attentive administrator, + preferably assisted by automated monitoring for anomalies. This is + particularly true of gateways. Gateway software SHOULD be + instrumented so that unusual occurrences, such as sudden massive + surges in traffic, are reported promptly. It is desirable, in fact, + to go further: gateway software SHOULD endeavor to limit damage in + the event that the administrator does not respond promptly. + + NOTE: For example, software might limit the gatewaying rate by + queueing incoming traffic and emptying the queue at a finite + maximum rate (well below the maximum that the host is capable of!) + that is set by the administrator and is not raised automatically. + + Traffic gatewayed into a news network SHOULD include a suitable + header, perhaps X-Gateway-Administrator, giving an electronic address + that can be used to report problems. This SHOULD be an address that + goes directly to a human, and not to a "routine administrative + issues" mailbox that is examined only occasionally, since the point + is to be able to reach the administrator quickly in an emergency. + Gateway administrators SHOULD arrange substitutes to cover gateway + operation (with suitable redirection of mail) when they are on + vacation, etc. + + + + +Spencer Historic [Page 89] + +RFC 1849 Son of 1036 March 2010 + + +11. Security and Related Issues + + Although the interchange format itself raises no significant security + issues, the wider context does. + +11.1. 
Leakage + + The most obvious form of security problem with news is "leakage" of + articles that are intended to have only restricted circulation. The + flooding algorithm is EXTREMELY good at finding any path by which + articles can leave a subnet with supposedly restrictive boundaries. + Substantial administrative effort is required to ensure that local + newsgroups remain local, unless connections to the outside world are + tightly restricted. + + A related problem is that the sendme control message can be used to + ask for any article by its message ID. The usefulness of this has + declined as message-ID generation algorithms have become less + predictable, but it remains a potential problem for "secure" + newsgroups. Hosts with such newsgroups may wish to disable the + sendme control message entirely. + + The sendsys, version, and whogets control messages also allow + "outsiders" to request information from "inside", which may reveal + details of internal topology (etc.) that are considered + confidential. (Note that at least limited openness about such + matters may be a condition of membership in such networks, e.g., + Usenet.) + + Organizations wishing to control these forms of leakage are strongly + advised to designate a small number of "official gateway" hosts to + handle all news exchange with the outside world, so that a bounded + amount of administrative effort is needed to control propagation and + eliminate problems. Attempts to keep news out entirely, by refusing + to support an official gateway, typically result in large numbers of + unofficial partial gateways appearing over time. Such a + configuration is much more difficult to troubleshoot. + + A somewhat related problem is the possibility of proprietary material + being disclosed unintentionally by a poster who does not realize how + far his words will propagate, either from sheer misunderstanding or + because of errors made (by human or software) in followup + preparation. 
There is little that can be done about this except + education. + + + + + + + +Spencer Historic [Page 90] + +RFC 1849 Son of 1036 March 2010 + + +11.2. Attacks + + Although the limitations of the medium restrict what can be done to + attack a host via news, some possibilities exist, most of them + problems news shares with mail. + + If reading agents are careless about transmitting non-printable + characters to output devices, malicious posters may post articles + containing control sequences ("letterbombs") meant to have various + destructive effects on output devices. Possible effects depend on + the device, but they can include hardware damage (e.g., by repeated + writing of values into configuration memories that can tolerate only + a limited number of write cycles) and security violation (e.g., by + reprogramming function keys potentially used by privileged readers). + + A more sophisticated variation on the letterbomb is inclusion of + "Trojan horses" in programs. Obviously, readers must be cautious + about using software found in news, but more subtly, reading agents + must also exercise care. MIME messages can include material that is + executable in some sense, such as PostScript documents (which are + programs!), and letterbombs may be introduced into such material. + + Given the presence of finite resources and other software + limitations, some degree of system disruption can be achieved by + posting otherwise-innocent material in great volume, either in single + huge articles (see Section 4.6) or in a stream of modest-sized + articles. (Some would say that the steady growth of Usenet volume + constitutes a subtle and unintentional attack of the latter type; + certainly it can have disruptive effects if administrators are + inattentive.) 
Systems need some ability to cope with surges, because + single huge articles occur occasionally as the result of software + error, innocent misunderstanding, or deliberate malice; and downtime + at upstream hosts can cause droughts, followed by floods, of + legitimate articles. (There is also a certain amount of normal + variation; for example, Usenet traffic is noticeably lighter on + weekends and during Christmas holidays, and rises noticeably at the + start of the school term of North American universities.) However, a + site that normally receives little traffic may be quite vulnerable to + "swamping" attack if its software is insufficiently careful. + + In general, careless implementation may open doors that are not + intrinsic to news. In particular, implementation of control messages + (see Sections 6.6 and 7) and unbatchers (see Sections 8.1 and 8.2) + via a command interpreter requires substantial precautions to ensure + that only the intended capabilities are available. Care must also be + taken that article-supplied text is not fed to programs that have + escapes to command interpreters. + + + + +Spencer Historic [Page 91] + +RFC 1849 Son of 1036 March 2010 + + + Finally, there is considerable potential for malice in the sendsys, + version, and whogets control messages. They are not harmful to the + hosts receiving them as news, but they can be used to enlist those + hosts (by the thousands) as unwitting allies in a mail-swamping + attack on a victim who may not even receive news. The precautions + discussed in Section 7.5 can reduce the potential for such attacks + considerably, but the hazard cannot be eliminated as long as these + control messages exist. + +11.3. Anarchy + + The highly distributed nature of news propagation, and the lack of + adequate authentication protocols (especially for use over the less- + interactive transport mechanisms such as UUCP), make article forgery + relatively straightforward. 
It may be possible to at least track a + forgery to its source, once it is recognized as such, but clever + forgers can make even that relatively difficult. The assumption that + forgeries will be recognized as such is also not to be taken for + granted; readers are notoriously prone to blindly assuming + authenticity. If a forged article's initial path list includes the + relayer name of the supposed poster's host, the article will never be + sent to that host, and the alleged author may learn about the forgery + secondhand or not at all. + + A particularly noxious form of forgery is the forged "cancel" control + message. Notably, it is relatively straightforward to write software + that will automatically send out a (forged) cancel message for any + article meeting some criterion, e.g., written by a specific author. + The authentication problems discussed in Section 7.1 make it + difficult to solve this without crippling cancel's important + functionality. + + A related problem is the possibility of disagreements over newsgroup + creation, on networks where such things are not decided by central + authorities. There have been cases of "rmgroup wars", where one + poster persistently sends out newgroup messages to create a newsgroup + and another, equally persistently, sends out rmgroup messages asking + that it be removed. This is not particularly damaging, if relayers + are configured to be cautious, but it can cause serious confusion + among innocent third parties who just want to know whether or not + they can use the newsgroup for communication. + +11.4. Liability + + News shares the legal uncertainty surrounding other forms of + electronic communication: what rules apply to this new medium of + information exchange? 
News is a particularly problematic case + + + + +Spencer Historic [Page 92] + +RFC 1849 Son of 1036 March 2010 + + + because it is a broadcast medium rather than a point-to-point one + like mail, and analogies to older forms of communication are + particularly weak. + + Are news-carrying hosts common carriers, like the phone companies, + providing communications paths without having either authority over + or responsibility for content? Or are they publishers, responsible + for the content regardless of whether they are aware of it or not? + Or something in between? Such questions are particularly significant + when the content is technically criminal, e.g., some types of + sexually oriented material in some jurisdictions, in which case + ignorance of its presence may not be an adequate defense. + + Even in milder situations such as libel or copyright violation, the + responsibilities of the poster, his host, and other hosts carrying + the traffic are unclear. Note, in particular, the problems arising + when the article is a forgery, or when the alleged author claims it + is a forgery but cannot prove this. + +12. References + + [ISO/IEC9899] "Information technology - Programming Language C", + ISO/IEC 9899:1990 {more recently 9899:1999}, 1990. + + [Metamail] Borenstein, N., + <http://ftp.funet.fi/pub/unix/mail/metamail/ANNOUNCE>, + February 1994. + + [RFC821] Postel, J., "Simple Mail Transfer Protocol", STD 10, + RFC 821, August 1982. + + [RFC822] Crocker, D., "STANDARD FOR THE FORMAT OF ARPA INTERNET + TEXT MESSAGES", STD 11, RFC 822, August 1982. + + [RFC850] Horton, M., "Standard for interchange of Usenet + messages", RFC 850, June 1983. + + [RFC977] Kantor, B. and P. Lapsley, "Network News Transfer + Protocol - A Proposed Standard for the Stream-Based + Transmission of News", RFC 977, February 1986. + + [RFC1036] Horton, M. and R. Adams, "Standard for interchange of + USENET Messages", RFC 1036, December 1987. 
+ + [RFC1123] Braden, R., Ed., "Requirements for Internet Hosts - + Application and Support", STD 3, RFC 1123, + October 1989. + + + + +Spencer Historic [Page 93] + +RFC 1849 Son of 1036 March 2010 + + + [RFC1341] Borenstein, N. and N. Freed, "MIME (Multipurpose + Internet Mail Extensions): Mechanisms for Specifying + and Describing the Format of Internet Message Bodies", + RFC 1341, June 1992. + + [RFC1342] Moore, K., "Representation of Non-ASCII Text in + Internet Message Headers", RFC 1342, June 1992. + + [RFC1345] Simonsen, K., "Character Mnemonics and Character + Sets", RFC 1345, June 1992. + + [RFC1413] St. Johns, M., "Identification Protocol", RFC 1413, + February 1993. + + [RFC1456] Vietnamese Standardization Working Group, "Conventions + for Encoding the Vietnamese Language", RFC 1456, + May 1993. + + [RFC1544] Rose, M., "The Content-MD5 Header Field", RFC 1544, + November 1993. + + [RFC1896] Resnick, P. and A. Walker, "The text/enriched MIME + Content-type", RFC 1896, February 1996. + + [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet + Mail Extensions (MIME) Part One: Format of Internet + Message Bodies", RFC 2045, November 1996. + + [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet + Mail Extensions (MIME) Part Two: Media Types", + RFC 2046, November 1996. + + [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail + Extensions) Part Three: Message Header Extensions for + Non-ASCII Text", RFC 2047, November 1996. + + [RFC2049] Freed, N. and N. Borenstein, "Multipurpose Internet + Mail Extensions (MIME) Part Five: Conformance Criteria + and Examples", RFC 2049, November 1996. + + [RFC2822] Resnick, P., Ed., "Internet Message Format", RFC 2822, + April 2001. + + [RFC3977] Feather, C., "Network News Transfer Protocol (NNTP)", + RFC 3977, October 2006. + + + + + + +Spencer Historic [Page 94] + +RFC 1849 Son of 1036 March 2010 + + + [RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322, + October 2008. 
+ + [RFC5536] Murchison, K., Ed., Lindsey, C., and D. Kohn, "Netnews + Article Format", RFC 5536, November 2009. + + [RFC5537] Allbery, R., Ed., and C. Lindsey, "Netnews + Architecture and Protocols", RFC 5537, November 2009. + + [Sanderson] David Sanderson, Smileys, O'Reilly & Associates Ltd., + 1993. + + [UUCP] Tim O'Reilly and Grace Todino, Managing UUCP and + Usenet, O'Reilly & Associates Ltd., January 1992. + + [X3.4] "American National Standard for Information Systems - + Coded Character Sets - 7-Bit American National + Standard Code for Information Interchange (7-Bit + ASCII)", ANSI X3.4, March 1986. + +Appendix A. Archaeological Notes + +A.1. "A News" Article Format + + The obsolete "A News" article format consisted of exactly five lines + of header information, followed by the body. For example: + + Aeagle.642 + news.misc + cbosgd!mhuxj!mhuxt!eagle!jerry + Fri Nov 19 16:14:55 1982 + Usenet Etiquette - Please Read + body + body + body + + The first line consisted of an "A" followed by an article ID + (analogous to a message ID and used for similar purposes). The + second line was the list of newsgroups. The third line was the path. + The fourth was the date, in the format above (all fields fixed + width), resembling an Internet date but not quite the same. The + fifth was the subject. + + This format is documented for archaeological purposes only. Do not + generate articles in this format. + +A.2. Early "B News" Article Format + + This obsolete pseudo-Internet article format, used briefly during the + transition between the A News format and the modern format, followed + the general outline of a MAIL message but with some non-standard + headers.
For example: + + From: cbosgd!mhuxj!mhuxt!eagle!jerry (Jerry Schwarz) + Newsgroups: news.misc + Title: Usenet Etiquette -- Please Read + Article-I.D.: eagle.642 + Posted: Fri Nov 19 16:14:55 1982 + Received: Fri Nov 19 16:59:30 1982 + Expires: Mon Jan 1 00:00:00 1990 + + body + body + body + + The From header contained the information now found in the Path + header, plus possibly the full name now typically found in the From + header. The Title header contained what is now the Subject content. + + + +Spencer Historic [Page 96] + +RFC 1849 Son of 1036 March 2010 + + + The Posted header contained what is now the Date content. The + Article-I.D. header contained an article ID, analogous to a message + ID and used for similar purposes. The Newsgroups and Expires headers + were approximately as they are now. The Received header contained + the date when the latest relayer to process the article first saw it. + All dates were in the above format, with all fields fixed width, + resembling an Internet date but not quite the same. + + This format is documented for archaeological purposes only. Do not + generate articles in this format. + +A.3. Obsolete Headers + + Early versions of news software following the modern format sometimes + generated headers like the following: + + Relay-Version: version B 2.10 2/13/83; site cbosgd.UUCP + Posting-Version: version B 2.10 2/13/83; site eagle.UUCP + Date-Received: Friday, 19-Nov-82 16:59:30 EST + + Relay-Version contained version information about the relayer that + last processed the article. Posting-Version contained version + information about the posting agent that posted the article. Date- + Received contained the date when the last relayer to process the + article first saw it (in a slightly nonstandard format). + + These headers are documented for archaeological purposes only. Do + not generate articles using them. + +A.4. 
Obsolete Control Messages + + There once was a senduuname control message, resembling sendsys but + requesting transmission of the list of hosts to which the receiving + host had UUCP connections. This rapidly ceased to be of much use, + and many organizations consider information about their internal + connectivity to be confidential. + + Historically, a checkgroups body consisting of one or two lines, the + first of the form "-n newsgroup", caused checkgroups to apply to only + that single newsgroup. This form is documented for archaeological + purposes only; do not use it. + + Historically, an article posted to a newsgroup whose name had exactly + three components of which the third was "ctl" signified that article + was to be taken as a control message. The Subject header specified + the actions in the same way the Control header does now. This form + is documented for archaeological purposes only; do not use it; do not + implement it. + + + +Spencer Historic [Page 97] + +RFC 1849 Son of 1036 March 2010 + + +Appendix B. A Quick Tour of MIME + + (The editor wishes to thank Luc Rooijakkers; most of this appendix is + a lightly edited version of a summary he kindly supplied.) + + MIME (Multipurpose Internet Mail Extensions) is an upward-compatible + set of extensions to [RFC822], currently documented in [RFC2045], + [RFC2046], and [RFC2047]. This appendix summarizes these documents. + See the MIME RFCs for more information; they are very readable. + + UNRESOLVED ISSUE: These RFC numbers (here and elsewhere in this + Draft) need updating when the new MIME RFCs come out {now + resolved!}. + + MIME defines the following new headers: + + MIME-Version + Content-Type + Content-Transfer-Encoding + Content-ID + Content-Description + + The MIME-Version header is mandatory for all messages conforming to + the MIME specification and carries the version number of the MIME + specification. 
Example: + + MIME-Version: 1.0 + + The Content-Type header indicates the content type of the message. + Content types are split into a top-level type and a subtype, + separated by a slash. Auxiliary information can also be supplied, + using an attribute-value notation. Example: + + Content-Type: text/plain; charset=us-ascii + + (In the absence of a Content-Type header this is in fact the default + content type.) + + Important type/subtype combinations are: + + text/plain Plain text, possibly in a non-ASCII character + set. + + text/enriched A very simple wordprocessor-like language + supporting character attributes (e.g., + underlining), justification control, and + multiple character sets. (This proposal has + + + + +Spencer Historic [Page 98] + +RFC 1849 Son of 1036 March 2010 + + + gone through several iterations and has + recently split off from the main MIME RFCs + into a separate document [RFC1896].) + + message/rfc822 A mail message conforming to a slightly + relaxed version of [RFC822]. + + message/partial Part of a message (supporting the transparent + splitting and joining of messages when they + are too large to be handled by some transport + agent). + + message/external-body A message whose body is external. Possible + access methods include via mail, FTP, local + file, etc. + + multipart/mixed A message whose body consists of multiple + parts, possibly of different types, intended + to be viewed in serial order. Each part + looks like an [RFC822] message, consisting of + headers and a body. Most of the [RFC822] + headers have no defined semantics for body + parts. + + multipart/parallel Likewise, except that the parts are intended + to be viewed in parallel (on user agents that + support it). + + multipart/alternative Likewise, except that the parts are intended + to be semantically equivalent such that the + part that best matches the capabilities of + the environment should be displayed. 
For + example, a message may include plain-text, + enriched-text, and postscript versions of + some document. + + multipart/digest A variant of multipart/mixed especially + intended for message digests (the default + type of the parts is message/rfc822 instead + of text/plain, saving on the number of + headers for the parts). + + application/postscript A PostScript document. (PostScript is a + trademark of Adobe.) + + Other top-level types exist for still images, audio, and video + samples. + + + + +Spencer Historic [Page 99] + +RFC 1849 Son of 1036 March 2010 + + + Some of the above types require the ability to transport binary data. + Since the existing message systems usually do not support this, MIME + provides a Content-Transfer-Encoding header to indicate the kind of + encoding used. The possible encodings are: + + 7bit No encoding; the data consists of short (less than + 1000 characters) lines of 7-bit ASCII data, + delimited by EOL sequences. This is the default + encoding. + + 8bit Like 7bit, except that bytes with the high-order + bit set may be present. Many transmission paths + are incapable of carrying messages that use this + encoding. + + binary No encoding; any sequence of bytes may be present. + Many transmission paths are incapable of carrying + messages that use this encoding. + + base64 The data is encoded by representing every group of + 3 bytes as 4 characters from the alphabet + "A-Za-z0-9+/", which was chosen for its high + robustness through mail gateways (the alphabet used + by uuencode does not survive ASCII-EBCDIC-ASCII + translations). In the final group of 4 characters, + "=" is used for those characters not representing + data bytes. Line length is limited, and EOLs in + the encoded form are ignored. + + quoted-printable Any byte can be represented by a three-character + "=XX" sequence where the X's are uppercase + hexadecimal digits. Bytes representing printable + 7-bit US-ASCII characters except "=" may be + represented literally. 
Tabs and blanks may be + represented literally if not at the end of a line. + Line length is limited, and an EOL preceded by "=" + was inserted for this purpose and is not present in + the original. + + The base64 and quoted-printable encodings are applied to data in + Internet canonical form, which means that any EOL encoded as anything + but EOL must be an Internet canonical EOL: CR followed by LF. + + The Content-Description header allows further description of a body + part, analogous to the use of Subject for messages. + + + + + + +Spencer Historic [Page 100] + +RFC 1849 Son of 1036 March 2010 + + + Finally, the Content-ID header can be used to assign an + identification to body parts, analogous to the assignment of + identifications to messages by Message-ID. + + Note that most of these headers are structured header fields, as + defined in [RFC822]. Consequently, comments are allowed in their + values. The following is a legal MIME header: + + Content-Type: (a comment) text (yeah) / + plain (and now some params:) ; charset= (guess what) + iso-8859-1 (we don't have iso-10646 yet, pity) + + NOTE: Although the MIME specification was developed for mail, + there is nothing precluding its use for news as well. While it + might simplify implementation to restrict the MIME headers + somewhat, in the same way that other news headers (e.g., From) are + restricted subsets of the [RFC822] originals, this would add yet + another divergence between two formats that ought to be as + compatible as possible. In the case of the MIME headers, there is + no body of existing code posing compatibility concerns. A full- + featured MIME reading agent needs a full [RFC822] parser anyway, + to properly handle body parts of types like message/rfc822, so + there is little gain from restricting MIME headers. Adopting the + MIME specification unchanged seems best. 
However, article-level + MIME headers must still comply with the overall news header syntax + given in Section 4, so that news software that is NOT interested + in MIME need not contain a full [RFC822] parser. + + "MIME (Multipurpose Internet Mail Extensions) Part Three: Message + Header Extensions for Non-ASCII Text" [RFC2047] addresses the problem + of non-ASCII characters in headers. An example of a header using the + [RFC2047] mechanism is + + From: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be> + + Such encodings are allowed in selected headers, subject to the + restrictions listed in [RFC2047]. + + The MIME effort has also produced an RFC defining a Content-MD5 + header [RFC1544] containing an MD5-based "checksum" of the contents + of an article or body part, giving high confidence of detecting + accidental modifications to the contents. + + The "metamail" software package [Metamail] helps provide MIME support + with minimal changes to mailers and may also be relevant to news + reading agents. + + + + + +Spencer Historic [Page 101] + +RFC 1849 Son of 1036 March 2010 + + + The PEM (Privacy Enhanced Mail) effort is pursuing analogous + facilities to offer stronger guarantees against malicious + modifications, unauthorized eavesdropping, and forgery. This work + too may be applicable to news, once it is reconciled with MIME (by + efforts now underway). + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Spencer Historic [Page 102] + +RFC 1849 Son of 1036 March 2010 + + +Appendix C. Summary of Changes Since RFC 1036 + + This Draft is much longer than [RFC1036], so there is obviously much + change in content. Much of this is just increased precision and + rigor. 
Noteworthy changes and additions include: + + + restrictions on article bodies (Section 4.3) + + + all references to MIME facilities + + + size limits on articles + + + precise specification of Date-content syntax + + + message IDs must never be re-used, ever + + + "!" is the only Path delimiter + + + multiple moderators in the Approved header + + + rules on References trimming, and the _-_ mechanism + + + generalization of the Xref rules + + + multiple message IDs in Cancel and Supersedes + + + Also-Control + + + See-Also + + + Article-Names + + + Article-Updates + + + more precise rules for cancellation + + + cancellation authorization based on From, not Sender + + + "unmoderated" and descriptors in newgroup messages + + + restrictive rules on handling of sendsys and version messages + + + the whogets control message + + + precise specification of checkgroups messages + + + compression type preferably specified out-of-band + + + + +Spencer Historic [Page 103] + +RFC 1849 Son of 1036 March 2010 + + + + rules for encapsulating news in MIME mail + + + tighter specification of relayer functioning (Section 9.1) + + + the "newsmaster" contact address + + + rules for gatewaying (Section 10) + + + discussion of security issues (Section 11) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Spencer Historic [Page 104] + +RFC 1849 Son of 1036 March 2010 + + +Appendix D. Summary of Completely New Features + + Most of this Draft merely documents existing practice, preferred + versions thereof, or straightforward generalizations of it, but there + are a few outright inventions. These are: + + + the _-_ mechanism for References trimming + + + Also-Control + + + See-Also + + + Article-Names + + + Article-Updates + + + the whogets control message + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Spencer Historic [Page 105] + +RFC 1849 Son of 1036 March 2010 + + +Appendix E. 
Summary of Differences from RFCs 822 and 1123 + + The following are noteworthy differences between this Draft's + articles and MAIL messages: + + + generally less-permissive header syntax + + + notably, limited From syntax + + + MAIL header comments allowed in only a few contexts + + + slightly more restricted message-ID syntax + + + several more mandatory headers + + + duplicate headers forbidden + + + References/See-Also versus In-Reply-To/References (Section 6.5) + + + case sensitivity in some contexts + + + point-to-point headers, e.g., To and Cc, forbidden (Section 6) + + + several new headers + +Author's Address + + Henry Spencer + SP Systems + Box 280 Stn. A + Toronto, Ontario M5W1B2 + Canada + + EMail: henry@zoo.utoronto.ca + + + + + + + + + + + + + + + + + +Spencer Historic [Page 106] +