Diffstat (limited to 'doc/rfc/rfc1122.txt')
-rw-r--r-- | doc/rfc/rfc1122.txt | 6844 |
1 files changed, 6844 insertions, 0 deletions
diff --git a/doc/rfc/rfc1122.txt b/doc/rfc/rfc1122.txt new file mode 100644 index 0000000..c14f2e5 --- /dev/null +++ b/doc/rfc/rfc1122.txt @@ -0,0 +1,6844 @@ + + + + + + +Network Working Group Internet Engineering Task Force +Request for Comments: 1122 R. Braden, Editor + October 1989 + + + Requirements for Internet Hosts -- Communication Layers + + +Status of This Memo + + This RFC is an official specification for the Internet community. It + incorporates by reference, amends, corrects, and supplements the + primary protocol standards documents relating to hosts. Distribution + of this document is unlimited. + +Summary + + This is one RFC of a pair that defines and discusses the requirements + for Internet host software. This RFC covers the communications + protocol layers: link layer, IP layer, and transport layer; its + companion RFC-1123 covers the application and support protocols. + + + + Table of Contents + + + + + 1. INTRODUCTION ............................................... 5 + 1.1 The Internet Architecture .............................. 6 + 1.1.1 Internet Hosts .................................... 6 + 1.1.2 Architectural Assumptions ......................... 7 + 1.1.3 Internet Protocol Suite ........................... 8 + 1.1.4 Embedded Gateway Code ............................. 10 + 1.2 General Considerations ................................. 12 + 1.2.1 Continuing Internet Evolution ..................... 12 + 1.2.2 Robustness Principle .............................. 12 + 1.2.3 Error Logging ..................................... 13 + 1.2.4 Configuration ..................................... 14 + 1.3 Reading this Document .................................. 15 + 1.3.1 Organization ...................................... 15 + 1.3.2 Requirements ...................................... 16 + 1.3.3 Terminology ....................................... 17 + 1.4 Acknowledgments ........................................ 20 + + 2. 
LINK LAYER .................................................. 21 + 2.1 INTRODUCTION ........................................... 21 + + + +Internet Engineering Task Force [Page 1] + + + + +RFC1122 INTRODUCTION October 1989 + + + 2.2 PROTOCOL WALK-THROUGH .................................. 21 + 2.3 SPECIFIC ISSUES ........................................ 21 + 2.3.1 Trailer Protocol Negotiation ...................... 21 + 2.3.2 Address Resolution Protocol -- ARP ................ 22 + 2.3.2.1 ARP Cache Validation ......................... 22 + 2.3.2.2 ARP Packet Queue ............................. 24 + 2.3.3 Ethernet and IEEE 802 Encapsulation ............... 24 + 2.4 LINK/INTERNET LAYER INTERFACE .......................... 25 + 2.5 LINK LAYER REQUIREMENTS SUMMARY ........................ 26 + + 3. INTERNET LAYER PROTOCOLS .................................... 27 + 3.1 INTRODUCTION ............................................ 27 + 3.2 PROTOCOL WALK-THROUGH .................................. 29 + 3.2.1 Internet Protocol -- IP ............................ 29 + 3.2.1.1 Version Number ............................... 29 + 3.2.1.2 Checksum ..................................... 29 + 3.2.1.3 Addressing ................................... 29 + 3.2.1.4 Fragmentation and Reassembly ................. 32 + 3.2.1.5 Identification ............................... 32 + 3.2.1.6 Type-of-Service .............................. 33 + 3.2.1.7 Time-to-Live ................................. 34 + 3.2.1.8 Options ...................................... 35 + 3.2.2 Internet Control Message Protocol -- ICMP .......... 38 + 3.2.2.1 Destination Unreachable ...................... 39 + 3.2.2.2 Redirect ..................................... 40 + 3.2.2.3 Source Quench ................................ 41 + 3.2.2.4 Time Exceeded ................................ 41 + 3.2.2.5 Parameter Problem ............................ 42 + 3.2.2.6 Echo Request/Reply ........................... 
42 + 3.2.2.7 Information Request/Reply .................... 43 + 3.2.2.8 Timestamp and Timestamp Reply ................ 43 + 3.2.2.9 Address Mask Request/Reply ................... 45 + 3.2.3 Internet Group Management Protocol IGMP ........... 47 + 3.3 SPECIFIC ISSUES ........................................ 47 + 3.3.1 Routing Outbound Datagrams ........................ 47 + 3.3.1.1 Local/Remote Decision ........................ 47 + 3.3.1.2 Gateway Selection ............................ 48 + 3.3.1.3 Route Cache .................................. 49 + 3.3.1.4 Dead Gateway Detection ....................... 51 + 3.3.1.5 New Gateway Selection ........................ 55 + 3.3.1.6 Initialization ............................... 56 + 3.3.2 Reassembly ........................................ 56 + 3.3.3 Fragmentation ..................................... 58 + 3.3.4 Local Multihoming ................................. 60 + 3.3.4.1 Introduction ................................. 60 + 3.3.4.2 Multihoming Requirements ..................... 61 + 3.3.4.3 Choosing a Source Address .................... 64 + 3.3.5 Source Route Forwarding ........................... 65 + + + +Internet Engineering Task Force [Page 2] + + + + +RFC1122 INTRODUCTION October 1989 + + + 3.3.6 Broadcasts ........................................ 66 + 3.3.7 IP Multicasting ................................... 67 + 3.3.8 Error Reporting ................................... 69 + 3.4 INTERNET/TRANSPORT LAYER INTERFACE ..................... 69 + 3.5 INTERNET LAYER REQUIREMENTS SUMMARY .................... 72 + + 4. TRANSPORT PROTOCOLS ......................................... 77 + 4.1 USER DATAGRAM PROTOCOL -- UDP .......................... 77 + 4.1.1 INTRODUCTION ...................................... 77 + 4.1.2 PROTOCOL WALK-THROUGH ............................. 77 + 4.1.3 SPECIFIC ISSUES ................................... 77 + 4.1.3.1 Ports ........................................ 
77 + 4.1.3.2 IP Options ................................... 77 + 4.1.3.3 ICMP Messages ................................ 78 + 4.1.3.4 UDP Checksums ................................ 78 + 4.1.3.5 UDP Multihoming .............................. 79 + 4.1.3.6 Invalid Addresses ............................ 79 + 4.1.4 UDP/APPLICATION LAYER INTERFACE ................... 79 + 4.1.5 UDP REQUIREMENTS SUMMARY .......................... 80 + 4.2 TRANSMISSION CONTROL PROTOCOL -- TCP ................... 82 + 4.2.1 INTRODUCTION ...................................... 82 + 4.2.2 PROTOCOL WALK-THROUGH ............................. 82 + 4.2.2.1 Well-Known Ports ............................. 82 + 4.2.2.2 Use of Push .................................. 82 + 4.2.2.3 Window Size .................................. 83 + 4.2.2.4 Urgent Pointer ............................... 84 + 4.2.2.5 TCP Options .................................. 85 + 4.2.2.6 Maximum Segment Size Option .................. 85 + 4.2.2.7 TCP Checksum ................................. 86 + 4.2.2.8 TCP Connection State Diagram ................. 86 + 4.2.2.9 Initial Sequence Number Selection ............ 87 + 4.2.2.10 Simultaneous Open Attempts .................. 87 + 4.2.2.11 Recovery from Old Duplicate SYN ............. 87 + 4.2.2.12 RST Segment ................................. 87 + 4.2.2.13 Closing a Connection ........................ 87 + 4.2.2.14 Data Communication .......................... 89 + 4.2.2.15 Retransmission Timeout ...................... 90 + 4.2.2.16 Managing the Window ......................... 91 + 4.2.2.17 Probing Zero Windows ........................ 92 + 4.2.2.18 Passive OPEN Calls .......................... 92 + 4.2.2.19 Time to Live ................................ 93 + 4.2.2.20 Event Processing ............................ 93 + 4.2.2.21 Acknowledging Queued Segments ............... 94 + 4.2.3 SPECIFIC ISSUES ................................... 
95 + 4.2.3.1 Retransmission Timeout Calculation ........... 95 + 4.2.3.2 When to Send an ACK Segment .................. 96 + 4.2.3.3 When to Send a Window Update ................. 97 + 4.2.3.4 When to Send Data ............................ 98 + + + +Internet Engineering Task Force [Page 3] + + + + +RFC1122 INTRODUCTION October 1989 + + + 4.2.3.5 TCP Connection Failures ...................... 100 + 4.2.3.6 TCP Keep-Alives .............................. 101 + 4.2.3.7 TCP Multihoming .............................. 103 + 4.2.3.8 IP Options ................................... 103 + 4.2.3.9 ICMP Messages ................................ 103 + 4.2.3.10 Remote Address Validation ................... 104 + 4.2.3.11 TCP Traffic Patterns ........................ 104 + 4.2.3.12 Efficiency .................................. 105 + 4.2.4 TCP/APPLICATION LAYER INTERFACE ................... 106 + 4.2.4.1 Asynchronous Reports ......................... 106 + 4.2.4.2 Type-of-Service .............................. 107 + 4.2.4.3 Flush Call ................................... 107 + 4.2.4.4 Multihoming .................................. 108 + 4.2.5 TCP REQUIREMENT SUMMARY ........................... 108 + + 5. REFERENCES ................................................. 112 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 4] + + + + +RFC1122 INTRODUCTION October 1989 + + +1. INTRODUCTION + + This document is one of a pair that defines and discusses the + requirements for host system implementations of the Internet protocol + suite. This RFC covers the communication protocol layers: link + layer, IP layer, and transport layer. Its companion RFC, + "Requirements for Internet Hosts -- Application and Support" + [INTRO:1], covers the application layer protocols. This document + should also be read in conjunction with "Requirements for Internet + Gateways" [INTRO:2]. 
+ + These documents are intended to provide guidance for vendors, + implementors, and users of Internet communication software. They + represent the consensus of a large body of technical experience and + wisdom, contributed by the members of the Internet research and + vendor communities. + + This RFC enumerates standard protocols that a host connected to the + Internet must use, and it incorporates by reference the RFCs and + other documents describing the current specifications for these + protocols. It corrects errors in the referenced documents and adds + additional discussion and guidance for an implementor. + + For each protocol, this document also contains an explicit set of + requirements, recommendations, and options. The reader must + understand that the list of requirements in this document is + incomplete by itself; the complete set of requirements for an + Internet host is primarily defined in the standard protocol + specification documents, with the corrections, amendments, and + supplements contained in this RFC. + + A good-faith implementation of the protocols that was produced after + careful reading of the RFC's and with some interaction with the + Internet technical community, and that followed good communications + software engineering practices, should differ from the requirements + of this document in only minor ways. Thus, in many cases, the + "requirements" in this RFC are already stated or implied in the + standard protocol documents, so that their inclusion here is, in a + sense, redundant. However, they were included because some past + implementation has made the wrong choice, causing problems of + interoperability, performance, and/or robustness. + + This document includes discussion and explanation of many of the + requirements and recommendations. A simple list of requirements + would be dangerous, because: + + o Some required features are more important than others, and some + features are optional. 
+ + + +Internet Engineering Task Force [Page 5] + + + + +RFC1122 INTRODUCTION October 1989 + + + o There may be valid reasons why particular vendor products that + are designed for restricted contexts might choose to use + different specifications. + + However, the specifications of this document must be followed to meet + the general goal of arbitrary host interoperation across the + diversity and complexity of the Internet system. Although most + current implementations fail to meet these requirements in various + ways, some minor and some major, this specification is the ideal + towards which we need to move. + + These requirements are based on the current level of Internet + architecture. This document will be updated as required to provide + additional clarifications or to include additional information in + those areas in which specifications are still evolving. + + This introductory section begins with a brief overview of the + Internet architecture as it relates to hosts, and then gives some + general advice to host software vendors. Finally, there is some + guidance on reading the rest of the document and some terminology. + + 1.1 The Internet Architecture + + General background and discussion on the Internet architecture and + supporting protocol suite can be found in the DDN Protocol + Handbook [INTRO:3]; for background see for example [INTRO:9], + [INTRO:10], and [INTRO:11]. Reference [INTRO:5] describes the + procedure for obtaining Internet protocol documents, while + [INTRO:6] contains a list of the numbers assigned within Internet + protocols. + + 1.1.1 Internet Hosts + + A host computer, or simply "host," is the ultimate consumer of + communication services. A host generally executes application + programs on behalf of user(s), employing network and/or + Internet communication services in support of this function. + An Internet host corresponds to the concept of an "End-System" + used in the OSI protocol suite [INTRO:13]. 
+ + An Internet communication system consists of interconnected + packet networks supporting communication among host computers + using the Internet protocols. The networks are interconnected + using packet-switching computers called "gateways" or "IP + routers" by the Internet community, and "Intermediate Systems" + by the OSI world [INTRO:13]. The RFC "Requirements for + Internet Gateways" [INTRO:2] contains the official + specifications for Internet gateways. That RFC together with + + + +Internet Engineering Task Force [Page 6] + + + + +RFC1122 INTRODUCTION October 1989 + + + the present document and its companion [INTRO:1] define the + rules for the current realization of the Internet architecture. + + Internet hosts span a wide range of size, speed, and function. + They range in size from small microprocessors through + workstations to mainframes and supercomputers. In function, + they range from single-purpose hosts (such as terminal servers) + to full-service hosts that support a variety of online network + services, typically including remote login, file transfer, and + electronic mail. + + A host is generally said to be multihomed if it has more than + one interface to the same or to different networks. See + Section 1.3.3 on "Terminology". + + 1.1.2 Architectural Assumptions + + The current Internet architecture is based on a set of + assumptions about the communication system. The assumptions + most relevant to hosts are as follows: + + (a) The Internet is a network of networks. + + Each host is directly connected to some particular + network(s); its connection to the Internet is only + conceptual. Two hosts on the same network communicate + with each other using the same set of protocols that they + would use to communicate with hosts on distant networks. + + (b) Gateways don't keep connection state information. 
+ + To improve robustness of the communication system, + gateways are designed to be stateless, forwarding each IP + datagram independently of other datagrams. As a result, + redundant paths can be exploited to provide robust service + in spite of failures of intervening gateways and networks. + + All state information required for end-to-end flow control + and reliability is implemented in the hosts, in the + transport layer or in application programs. All + connection control information is thus co-located with the + end points of the communication, so it will be lost only + if an end point fails. + + (c) Routing complexity should be in the gateways. + + Routing is a complex and difficult problem, and ought to + be performed by the gateways, not the hosts. An important + + + +Internet Engineering Task Force [Page 7] + + + + +RFC1122 INTRODUCTION October 1989 + + + objective is to insulate host software from changes caused + by the inevitable evolution of the Internet routing + architecture. + + (d) The System must tolerate wide network variation. + + A basic objective of the Internet design is to tolerate a + wide range of network characteristics -- e.g., bandwidth, + delay, packet loss, packet reordering, and maximum packet + size. Another objective is robustness against failure of + individual networks, gateways, and hosts, using whatever + bandwidth is still available. Finally, the goal is full + "open system interconnection": an Internet host must be + able to interoperate robustly and effectively with any + other Internet host, across diverse Internet paths. + + Sometimes host implementors have designed for less + ambitious goals. For example, the LAN environment is + typically much more benign than the Internet as a whole; + LANs have low packet loss and delay and do not reorder + packets. Some vendors have fielded host implementations + that are adequate for a simple LAN environment, but work + badly for general interoperation. 
The vendor justifies + such a product as being economical within the restricted + LAN market. However, isolated LANs seldom stay isolated + for long; they are soon gatewayed to each other, to + organization-wide internets, and eventually to the global + Internet system. In the end, neither the customer nor the + vendor is served by incomplete or substandard Internet + host software. + + The requirements spelled out in this document are designed + for a full-function Internet host, capable of full + interoperation over an arbitrary Internet path. + + + 1.1.3 Internet Protocol Suite + + To communicate using the Internet system, a host must implement + the layered set of protocols comprising the Internet protocol + suite. A host typically must implement at least one protocol + from each layer. + + The protocol layers used in the Internet architecture are as + follows [INTRO:4]: + + + o Application Layer + + + +Internet Engineering Task Force [Page 8] + + + + +RFC1122 INTRODUCTION October 1989 + + + The application layer is the top layer of the Internet + protocol suite. The Internet suite does not further + subdivide the application layer, although some of the + Internet application layer protocols do contain some + internal sub-layering. The application layer of the + Internet suite essentially combines the functions of the + top two layers -- Presentation and Application -- of the + OSI reference model. + + We distinguish two categories of application layer + protocols: user protocols that provide service directly + to users, and support protocols that provide common system + functions. Requirements for user and support protocols + will be found in the companion RFC [INTRO:1]. + + The most common Internet user protocols are: + + o Telnet (remote login) + o FTP (file transfer) + o SMTP (electronic mail delivery) + + There are a number of other standardized user protocols + [INTRO:4] and many private user protocols. 
+ + Support protocols, used for host name mapping, booting, + and management, include SNMP, BOOTP, RARP, and the Domain + Name System (DNS) protocols. + + + o Transport Layer + + The transport layer provides end-to-end communication + services for applications. There are two primary + transport layer protocols at present: + + o Transmission Control Protocol (TCP) + o User Datagram Protocol (UDP) + + TCP is a reliable connection-oriented transport service + that provides end-to-end reliability, resequencing, and + flow control. UDP is a connectionless ("datagram") + transport service. + + Other transport protocols have been developed by the + research community, and the set of official Internet + transport protocols may be expanded in the future. + + Transport layer protocols are discussed in Chapter 4. + + + +Internet Engineering Task Force [Page 9] + + + + +RFC1122 INTRODUCTION October 1989 + + + o Internet Layer + + All Internet transport protocols use the Internet Protocol + (IP) to carry data from source host to destination host. + IP is a connectionless or datagram internetwork service, + providing no end-to-end delivery guarantees. Thus, IP + datagrams may arrive at the destination host damaged, + duplicated, out of order, or not at all. The layers above + IP are responsible for reliable delivery service when it + is required. The IP protocol includes provision for + addressing, type-of-service specification, fragmentation + and reassembly, and security information. + + The datagram or connectionless nature of the IP protocol + is a fundamental and characteristic feature of the + Internet architecture. Internet IP was the model for the + OSI Connectionless Network Protocol [INTRO:12]. + + ICMP is a control protocol that is considered to be an + integral part of IP, although it is architecturally + layered upon IP, i.e., it uses IP to carry its data end- + to-end just as a transport protocol like TCP or UDP does. 
+ ICMP provides error reporting, congestion reporting, and + first-hop gateway redirection. + + IGMP is an Internet layer protocol used for establishing + dynamic host groups for IP multicasting. + + The Internet layer protocols IP, ICMP, and IGMP are + discussed in Chapter 3. + + + o Link Layer + + To communicate on its directly-connected network, a host + must implement the communication protocol used to + interface to that network. We call this a link layer or + media-access layer protocol. + + There is a wide variety of link layer protocols, + corresponding to the many different types of networks. + See Chapter 2. + + + 1.1.4 Embedded Gateway Code + + Some Internet host software includes embedded gateway + functionality, so that these hosts can forward packets as a + + + +Internet Engineering Task Force [Page 10] + + + + +RFC1122 INTRODUCTION October 1989 + + + gateway would, while still performing the application layer + functions of a host. + + Such dual-purpose systems must follow the Gateway Requirements + RFC [INTRO:2] with respect to their gateway functions, and + must follow the present document with respect to their host + functions. In all overlapping cases, the two specifications + should be in agreement. + + There are varying opinions in the Internet community about + embedded gateway functionality. The main arguments are as + follows: + + o Pro: in a local network environment where networking is + informal, or in isolated internets, it may be convenient + and economical to use existing host systems as gateways. + + There is also an architectural argument for embedded + gateway functionality: multihoming is much more common + than originally foreseen, and multihoming forces a host to + make routing decisions as if it were a gateway. If the + multihomed host contains an embedded gateway, it will + have full routing knowledge and as a result will be able + to make more optimal routing decisions. 
+ + o Con: Gateway algorithms and protocols are still changing, + and they will continue to change as the Internet system + grows larger. Attempting to include a general gateway + function within the host IP layer will force host system + maintainers to track these (more frequent) changes. Also, + a larger pool of gateway implementations will make + coordinating the changes more difficult. Finally, the + complexity of a gateway IP layer is somewhat greater than + that of a host, making the implementation and operation + tasks more complex. + + In addition, the style of operation of some hosts is not + appropriate for providing stable and robust gateway + service. + + There is considerable merit in both of these viewpoints. One + conclusion can be drawn: a host administrator must have + conscious control over whether or not a given host acts as a + gateway. See Section 3.1 for the detailed requirements. + + + + + + + +Internet Engineering Task Force [Page 11] + + + + +RFC1122 INTRODUCTION October 1989 + + + 1.2 General Considerations + + There are two important lessons that vendors of Internet host + software have learned and which a new vendor should consider + seriously. + + 1.2.1 Continuing Internet Evolution + + The enormous growth of the Internet has revealed problems of + management and scaling in a large datagram-based packet + communication system. These problems are being addressed, and + as a result there will be continuing evolution of the + specifications described in this document. These changes will + be carefully planned and controlled, since there is extensive + participation in this planning by the vendors and by the + organizations responsible for operations of the networks. + + Development, evolution, and revision are characteristic of + computer network protocols today, and this situation will + persist for some years. A vendor who develops computer + communication software for the Internet protocol suite (or any + other protocol suite!) 
and then fails to maintain and update + that software for changing specifications is going to leave a + trail of unhappy customers. The Internet is a large + communication network, and the users are in constant contact + through it. Experience has shown that knowledge of + deficiencies in vendor software propagates quickly through the + Internet technical community. + + 1.2.2 Robustness Principle + + At every layer of the protocols, there is a general rule whose + application can lead to enormous benefits in robustness and + interoperability [IP:1]: + + "Be liberal in what you accept, and + conservative in what you send" + + Software should be written to deal with every conceivable + error, no matter how unlikely; sooner or later a packet will + come in with that particular combination of errors and + attributes, and unless the software is prepared, chaos can + ensue. In general, it is best to assume that the network is + filled with malevolent entities that will send in packets + designed to have the worst possible effect. This assumption + will lead to suitable protective design, although the most + serious problems in the Internet have been caused by + unenvisaged mechanisms triggered by low-probability events; + + + +Internet Engineering Task Force [Page 12] + + + + +RFC1122 INTRODUCTION October 1989 + + + mere human malice would never have taken so devious a course! + + Adaptability to change must be designed into all levels of + Internet host software. As a simple example, consider a + protocol specification that contains an enumeration of values + for a particular header field -- e.g., a type field, a port + number, or an error code; this enumeration must be assumed to + be incomplete. Thus, if a protocol specification defines four + possible error codes, the software must not break when a fifth + code shows up. An undefined code might be logged (see below), + but it must not cause a failure. 
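The enumeration example above can be sketched in a few lines. This is only an illustration, not part of the RFC: the four error codes, the `ErrorCode` enum, the `classify_error` function, and the `unknown_code_counts` counter are all invented names for a hypothetical protocol. The sketch assumes an undefined code should be counted (and could be logged) rather than treated as fatal.

```python
from collections import Counter
from enum import IntEnum


class ErrorCode(IntEnum):
    """Four error codes defined by a hypothetical protocol specification."""
    CODE_A = 0
    CODE_B = 1
    CODE_C = 2
    CODE_D = 3


# Tally of undefined codes seen so far; an assumption of this sketch,
# standing in for whatever counting or logging facility a host provides.
unknown_code_counts = Counter()


def classify_error(raw_code: int) -> str:
    """Return a label for raw_code without raising on an undefined value."""
    try:
        return ErrorCode(raw_code).name
    except ValueError:
        # The enumeration is assumed incomplete: count the surprise and
        # keep running instead of failing.
        unknown_code_counts[raw_code] += 1
        return "UNKNOWN"
```

Calling `classify_error(1)` yields the label of a defined code, while `classify_error(4)`, a fifth code the specification never mentioned, yields `"UNKNOWN"` and increments the tally instead of raising an exception.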
+ + The second part of the principle is almost as important: + software on other hosts may contain deficiencies that make it + unwise to exploit legal but obscure protocol features. It is + unwise to stray far from the obvious and simple, lest untoward + effects result elsewhere. A corollary of this is "watch out + for misbehaving hosts"; host software should be prepared, not + just to survive other misbehaving hosts, but also to cooperate + to limit the amount of disruption such hosts can cause to the + shared communication facility. + + 1.2.3 Error Logging + + The Internet includes a great variety of host and gateway + systems, each implementing many protocols and protocol layers, + and some of these contain bugs and mis-features in their + Internet protocol software. As a result of complexity, + diversity, and distribution of function, the diagnosis of + Internet problems is often very difficult. + + Problem diagnosis will be aided if host implementations include + a carefully designed facility for logging erroneous or + "strange" protocol events. It is important to include as much + diagnostic information as possible when an error is logged. In + particular, it is often useful to record the header(s) of a + packet that caused an error. However, care must be taken to + ensure that error logging does not consume prohibitive amounts + of resources or otherwise interfere with the operation of the + host. + + There is a tendency for abnormal but harmless protocol events + to overflow error logging files; this can be avoided by using a + "circular" log, or by enabling logging only while diagnosing a + known failure. It may be useful to filter and count duplicate + successive messages. 
One strategy that seems to work well is: + (1) always count abnormalities and make such counts accessible + through the management protocol (see [INTRO:1]); and (2) allow + + + +Internet Engineering Task Force [Page 13] + + + + +RFC1122 INTRODUCTION October 1989 + + + the logging of a great variety of events to be selectively + enabled. For example, it might be useful to be able to "log + everything" or to "log everything for host X". + + Note that different managements may have differing policies + about the amount of error logging that they want normally + enabled in a host. Some will say, "if it doesn't hurt me, I + don't want to know about it", while others will want to take a + more watchful and aggressive attitude about detecting and + removing protocol abnormalities. + + 1.2.4 Configuration + + It would be ideal if a host implementation of the Internet + protocol suite could be entirely self-configuring. This would + allow the whole suite to be implemented in ROM or cast into + silicon, it would simplify diskless workstations, and it would + be an immense boon to harried LAN administrators as well as + system vendors. We have not reached this ideal; in fact, we + are not even close. + + At many points in this document, you will find a requirement + that a parameter be a configurable option. There are several + different reasons behind such requirements. In a few cases, + there is current uncertainty or disagreement about the best + value, and it may be necessary to update the recommended value + in the future. In other cases, the value really depends on + external factors -- e.g., the size of the host and the + distribution of its communication load, or the speeds and + topology of nearby networks -- and self-tuning algorithms are + unavailable and may be insufficient. In some cases, + configurability is needed because of administrative + requirements. 
+ + Finally, some configuration options are required to communicate + with obsolete or incorrect implementations of the protocols, + distributed without sources, that unfortunately persist in many + parts of the Internet. To make correct systems coexist with + these faulty systems, administrators often have to "mis- + configure" the correct systems. This problem will correct + itself gradually as the faulty systems are retired, but it + cannot be ignored by vendors. + + When we say that a parameter must be configurable, we do not + intend to require that its value be explicitly read from a + configuration file at every boot time. We recommend that + implementors set up a default for each parameter, so a + configuration file is only necessary to override those defaults + + + +Internet Engineering Task Force [Page 14] + + + + +RFC1122 INTRODUCTION October 1989 + + + that are inappropriate in a particular installation. Thus, the + configurability requirement is an assurance that it will be + POSSIBLE to override the default when necessary, even in a + binary-only or ROM-based product. + + This document requires a particular value for such defaults in + some cases. The choice of default is a sensitive issue when + the configuration item controls the accommodation to existing + faulty systems. If the Internet is to converge successfully to + complete interoperability, the default values built into + implementations must implement the official protocol, not + "mis-configurations" to accommodate faulty implementations. + Although marketing considerations have led some vendors to + choose mis-configuration defaults, we urge vendors to choose + defaults that will conform to the standard. + + Finally, we note that a vendor needs to provide adequate + documentation on all configuration parameters, their limits and + effects. 
+ + + 1.3 Reading this Document + + 1.3.1 Organization + + Protocol layering, which is generally used as an organizing + principle in implementing network software, has also been used + to organize this document. In describing the rules, we assume + that an implementation does strictly mirror the layering of the + protocols. Thus, the following three major sections specify + the requirements for the link layer, the internet layer, and + the transport layer, respectively. A companion RFC [INTRO:1] + covers application level software. This layerist organization + was chosen for simplicity and clarity. + + However, strict layering is an imperfect model, both for the + protocol suite and for recommended implementation approaches. + Protocols in different layers interact in complex and sometimes + subtle ways, and particular functions often involve multiple + layers. There are many design choices in an implementation, + many of which involve creative "breaking" of strict layering. + Every implementor is urged to read references [INTRO:7] and + [INTRO:8]. + + This document describes the conceptual service interface + between layers using a functional ("procedure call") notation, + like that used in the TCP specification [TCP:1]. A host + implementation must support the logical information flow + + + +Internet Engineering Task Force [Page 15] + + + + +RFC1122 INTRODUCTION October 1989 + + + implied by these calls, but need not literally implement the + calls themselves. For example, many implementations reflect + the coupling between the transport layer and the IP layer by + giving them shared access to common data structures. These + data structures, rather than explicit procedure calls, are then + the agency for passing much of the information that is + required. 
+ + In general, each major section of this document is organized + into the following subsections: + + (1) Introduction + + (2) Protocol Walk-Through -- considers the protocol + specification documents section-by-section, correcting + errors, stating requirements that may be ambiguous or + ill-defined, and providing further clarification or + explanation. + + (3) Specific Issues -- discusses protocol design and + implementation issues that were not included in the walk- + through. + + (4) Interfaces -- discusses the service interface to the next + higher layer. + + (5) Summary -- contains a summary of the requirements of the + section. + + + Under many of the individual topics in this document, there is + parenthetical material labeled "DISCUSSION" or + "IMPLEMENTATION". This material is intended to give + clarification and explanation of the preceding requirements + text. It also includes some suggestions on possible future + directions or developments. The implementation material + contains suggested approaches that an implementor may want to + consider. + + The summary sections are intended to be guides and indexes to + the text, but are necessarily cryptic and incomplete. The + summaries should never be used or referenced separately from + the complete RFC. + + 1.3.2 Requirements + + In this document, the words that are used to define the + significance of each particular requirement are capitalized. + + + +Internet Engineering Task Force [Page 16] + + + + +RFC1122 INTRODUCTION October 1989 + + + These words are: + + * "MUST" + + This word or the adjective "REQUIRED" means that the item + is an absolute requirement of the specification. + + * "SHOULD" + + This word or the adjective "RECOMMENDED" means that there + may exist valid reasons in particular circumstances to + ignore this item, but the full implications should be + understood and the case carefully weighed before choosing + a different course. 
+ + * "MAY" + + This word or the adjective "OPTIONAL" means that this item + is truly optional. One vendor may choose to include the + item because a particular marketplace requires it or + because it enhances the product, for example; another + vendor may omit the same item. + + + An implementation is not compliant if it fails to satisfy one + or more of the MUST requirements for the protocols it + implements. An implementation that satisfies all the MUST and + all the SHOULD requirements for its protocols is said to be + "unconditionally compliant"; one that satisfies all the MUST + requirements but not all the SHOULD requirements for its + protocols is said to be "conditionally compliant". + + 1.3.3 Terminology + + This document uses the following technical terms: + + Segment + A segment is the unit of end-to-end transmission in the + TCP protocol. A segment consists of a TCP header followed + by application data. A segment is transmitted by + encapsulation inside an IP datagram. + + Message + In this description of the lower-layer protocols, a + message is the unit of transmission in a transport layer + protocol. In particular, a TCP segment is a message. A + message consists of a transport protocol header followed + by application protocol data. To be transmitted end-to- + + + +Internet Engineering Task Force [Page 17] + + + + +RFC1122 INTRODUCTION October 1989 + + + end through the Internet, a message must be encapsulated + inside a datagram. + + IP Datagram + An IP datagram is the unit of end-to-end transmission in + the IP protocol. An IP datagram consists of an IP header + followed by transport layer data, i.e., of an IP header + followed by a message. + + In the description of the internet layer (Section 3), the + unqualified term "datagram" should be understood to refer + to an IP datagram. + + Packet + A packet is the unit of data passed across the interface + between the internet layer and the link layer. It + includes an IP header and data. 
A packet may be a + complete IP datagram or a fragment of an IP datagram. + + Frame + A frame is the unit of transmission in a link layer + protocol, and consists of a link-layer header followed by + a packet. + + Connected Network + A network to which a host is interfaced is often known as + the "local network" or the "subnetwork" relative to that + host. However, these terms can cause confusion, and + therefore we use the term "connected network" in this + document. + + Multihomed + A host is said to be multihomed if it has multiple IP + addresses. For a discussion of multihoming, see Section + 3.3.4 below. + + Physical network interface + This is a physical interface to a connected network and + has a (possibly unique) link-layer address. Multiple + physical network interfaces on a single host may share the + same link-layer address, but the address must be unique + for different hosts on the same physical network. + + Logical [network] interface + We define a logical [network] interface to be a logical + path, distinguished by a unique IP address, to a connected + network. See Section 3.3.4. + + + + +Internet Engineering Task Force [Page 18] + + + + +RFC1122 INTRODUCTION October 1989 + + + Specific-destination address + This is the effective destination address of a datagram, + even if it is broadcast or multicast; see Section 3.2.1.3. + + Path + At a given moment, all the IP datagrams from a particular + source host to a particular destination host will + typically traverse the same sequence of gateways. We use + the term "path" for this sequence. Note that a path is + uni-directional; it is not unusual to have different paths + in the two directions between a given host pair. + + MTU + The maximum transmission unit, i.e., the size of the + largest packet that can be transmitted. + + + The terms frame, packet, datagram, message, and segment are + illustrated by the following schematic diagrams: + + A. 
Transmission on connected network:
+           _______________________________________________
+          | LL hdr | IP hdr |         (data)              |
+          |________|________|_____________________________|
+
+           <---------- Frame ----------------------------->
+                    <----------Packet -------------------->
+
+
+         B. Before IP fragmentation or after IP reassembly:
+                    ______________________________________
+                   | IP hdr | transport| Application Data |
+                   |________|____hdr___|__________________|
+
+                    <--------  Datagram ------------------>
+                             <-------- Message ----------->
+           or, for TCP:
+                    ______________________________________
+                   | IP hdr | TCP hdr  | Application Data |
+                   |________|__________|__________________|
+
+                    <--------  Datagram ------------------>
+                             <-------- Segment ----------->
+ + + 1.4 Acknowledgments + + This document incorporates contributions and comments from a large + group of Internet protocol experts, including representatives of + university and research labs, vendors, and government agencies. + It was assembled primarily by the Host Requirements Working Group + of the Internet Engineering Task Force (IETF). + + The Editor would especially like to acknowledge the tireless + dedication of the following people, who attended many long + meetings and generated 3 million bytes of electronic mail over the + past 18 months in pursuit of this document: Philip Almquist, Dave + Borman (Cray Research), Noel Chiappa, Dave Crocker (DEC), Steve + Deering (Stanford), Mike Karels (Berkeley), Phil Karn (Bellcore), + John Lekashman (NASA), Charles Lynn (BBN), Keith McCloghrie (TWG), + Paul Mockapetris (ISI), Thomas Narten (Purdue), Craig Partridge + (BBN), Drew Perkins (CMU), and James Van Bokkelen (FTP Software).
+ + In addition, the following people made major contributions to the + effort: Bill Barns (Mitre), Steve Bellovin (AT&T), Mike Brescia + (BBN), Ed Cain (DCA), Annette DeSchon (ISI), Martin Gross (DCA), + Phill Gross (NRI), Charles Hedrick (Rutgers), Van Jacobson (LBL), + John Klensin (MIT), Mark Lottor (SRI), Milo Medin (NASA), Bill + Melohn (Sun Microsystems), Greg Minshall (Kinetics), Jeff Mogul + (DEC), John Mullen (CMC), Jon Postel (ISI), John Romkey (Epilogue + Technology), and Mike StJohns (DCA). The following also made + significant contributions to particular areas: Eric Allman + (Berkeley), Rob Austein (MIT), Art Berggreen (ACC), Keith Bostic + (Berkeley), Vint Cerf (NRI), Wayne Hathaway (NASA), Matt Korn + (IBM), Erik Naggum (Naggum Software, Norway), Robert Ullmann + (Prime Computer), David Waitzman (BBN), Frank Wancho (USA), Arun + Welch (Ohio State), Bill Westfield (Cisco), and Rayan Zachariassen + (Toronto). + + We are grateful to all, including any contributors who may have + been inadvertently omitted from this list. + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 20] + + + + +RFC1122 LINK LAYER October 1989 + + +2. LINK LAYER + + 2.1 INTRODUCTION + + All Internet systems, both hosts and gateways, have the same + requirements for link layer protocols. These requirements are + given in Chapter 3 of "Requirements for Internet Gateways" + [INTRO:2], augmented with the material in this section. + + 2.2 PROTOCOL WALK-THROUGH + + None. + + 2.3 SPECIFIC ISSUES + + 2.3.1 Trailer Protocol Negotiation + + The trailer protocol [LINK:1] for link-layer encapsulation MAY + be used, but only when it has been verified that both systems + (host or gateway) involved in the link-layer communication + implement trailers. If the system does not dynamically + negotiate use of the trailer protocol on a per-destination + basis, the default configuration MUST disable the protocol. 
+ + DISCUSSION: + The trailer protocol is a link-layer encapsulation + technique that rearranges the data contents of packets + sent on the physical network. In some cases, trailers + improve the throughput of higher layer protocols by + reducing the amount of data copying within the operating + system. Higher layer protocols are unaware of trailer + use, but both the sending and receiving host MUST + understand the protocol if it is used. + + Improper use of trailers can result in very confusing + symptoms. Only packets with specific size attributes are + encapsulated using trailers, and typically only a small + fraction of the packets being exchanged have these + attributes. Thus, if a system using trailers exchanges + packets with a system that does not, some packets + disappear into a black hole while others are delivered + successfully. + + IMPLEMENTATION: + On an Ethernet, packets encapsulated with trailers use a + distinct Ethernet type [LINK:1], and trailer negotiation + is performed at the time that ARP is used to discover the + link-layer address of a destination system. + + + +Internet Engineering Task Force [Page 21] + + + + +RFC1122 LINK LAYER October 1989 + + + Specifically, the ARP exchange is completed in the usual + manner using the normal IP protocol type, but a host that + wants to speak trailers will send an additional "trailer + ARP reply" packet, i.e., an ARP reply that specifies the + trailer encapsulation protocol type but otherwise has the + format of a normal ARP reply. If a host configured to use + trailers receives a trailer ARP reply message from a + remote machine, it can add that machine to the list of + machines that understand trailers, e.g., by marking the + corresponding entry in the ARP cache. + + Hosts wishing to receive trailer encapsulations send + trailer ARP replies whenever they complete exchanges of + normal ARP messages for IP. 
Thus, a host that received an + ARP request for its IP protocol address would send a + trailer ARP reply in addition to the normal IP ARP reply; + a host that sent the IP ARP request would send a trailer + ARP reply when it received the corresponding IP ARP reply. + In this way, either the requesting or responding host in + an IP ARP exchange may request that it receive trailer + encapsulations. + + This scheme, using extra trailer ARP reply packets rather + than sending an ARP request for the trailer protocol type, + was designed to avoid a continuous exchange of ARP packets + with a misbehaving host that, contrary to any + specification or common sense, responded to an ARP reply + for trailers with another ARP reply for IP. This problem + is avoided by sending a trailer ARP reply in response to + an IP ARP reply only when the IP ARP reply answers an + outstanding request; this is true when the hardware + address for the host is still unknown when the IP ARP + reply is received. A trailer ARP reply may always be sent + along with an IP ARP reply responding to an IP ARP + request. + + 2.3.2 Address Resolution Protocol -- ARP + + 2.3.2.1 ARP Cache Validation + + An implementation of the Address Resolution Protocol (ARP) + [LINK:2] MUST provide a mechanism to flush out-of-date cache + entries. If this mechanism involves a timeout, it SHOULD be + possible to configure the timeout value. + + A mechanism to prevent ARP flooding (repeatedly sending an + ARP Request for the same IP address, at a high rate) MUST be + included. The recommended maximum rate is 1 per second per + + + +Internet Engineering Task Force [Page 22] + + + + +RFC1122 LINK LAYER October 1989 + + + destination. + + DISCUSSION: + The ARP specification [LINK:2] suggests but does not + require a timeout mechanism to invalidate cache entries + when hosts change their Ethernet addresses. 
The + prevalence of proxy ARP (see Section 2.4 of [INTRO:2]) + has significantly increased the likelihood that cache + entries in hosts will become invalid, and therefore + some ARP-cache invalidation mechanism is now required + for hosts. Even in the absence of proxy ARP, a long- + period cache timeout is useful in order to + automatically correct any bad ARP data that might have + been cached. + + IMPLEMENTATION: + Four mechanisms have been used, sometimes in + combination, to flush out-of-date cache entries. + + (1) Timeout -- Periodically time out cache entries, + even if they are in use. Note that this timeout + should be restarted when the cache entry is + "refreshed" (by observing the source fields, + regardless of target address, of an ARP broadcast + from the system in question). For proxy ARP + situations, the timeout needs to be on the order + of a minute. + + (2) Unicast Poll -- Actively poll the remote host by + periodically sending a point-to-point ARP Request + to it, and delete the entry if no ARP Reply is + received from N successive polls. Again, the + timeout should be on the order of a minute, and + typically N is 2. + + (3) Link-Layer Advice -- If the link-layer driver + detects a delivery problem, flush the + corresponding ARP cache entry. + + (4) Higher-layer Advice -- Provide a call from the + Internet layer to the link layer to indicate a + delivery problem. The effect of this call would + be to invalidate the corresponding cache entry. + This call would be analogous to the + "ADVISE_DELIVPROB()" call from the transport layer + to the Internet layer (see Section 3.4), and in + fact the ADVISE_DELIVPROB routine might in turn + call the link-layer advice routine to invalidate + + + +Internet Engineering Task Force [Page 23] + + + + +RFC1122 LINK LAYER October 1989 + + + the ARP cache entry. + + Approaches (1) and (2) involve ARP cache timeouts on + the order of a minute or less. 
In the absence of proxy + ARP, a timeout this short could create noticeable + overhead traffic on a very large Ethernet. Therefore, + it may be necessary to configure a host to lengthen the + ARP cache timeout. + + 2.3.2.2 ARP Packet Queue + + The link layer SHOULD save (rather than discard) at least + one (the latest) packet of each set of packets destined to + the same unresolved IP address, and transmit the saved + packet when the address has been resolved. + + DISCUSSION: + Failure to follow this recommendation causes the first + packet of every exchange to be lost. Although higher- + layer protocols can generally cope with packet loss by + retransmission, packet loss does impact performance. + For example, loss of a TCP open request causes the + initial round-trip time estimate to be inflated. UDP- + based applications such as the Domain Name System are + more seriously affected. + + 2.3.3 Ethernet and IEEE 802 Encapsulation + + The IP encapsulation for Ethernets is described in RFC-894 + [LINK:3], while RFC-1042 [LINK:4] describes the IP + encapsulation for IEEE 802 networks. RFC-1042 elaborates and + replaces the discussion in Section 3.4 of [INTRO:2]. + + Every Internet host connected to a 10Mbps Ethernet cable: + + o MUST be able to send and receive packets using RFC-894 + encapsulation; + + o SHOULD be able to receive RFC-1042 packets, intermixed + with RFC-894 packets; and + + o MAY be able to send packets using RFC-1042 encapsulation. + + + An Internet host that implements sending both the RFC-894 and + the RFC-1042 encapsulations MUST provide a configuration switch + to select which is sent, and this switch MUST default to RFC- + 894. 
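Returning briefly to the ARP requirements of Sections 2.3.2.1 and 2.3.2.2 above, the following is a minimal sketch of how the three rules might fit together: flushing out-of-date entries by a configurable timeout, limiting ARP Requests to one per second per destination, and saving the latest packet for an unresolved address. The class and method names are hypothetical, time is passed in explicitly to keep the example deterministic, and only approach (1), the timeout, is shown.

```python
class ArpCache:
    """Illustrative ARP cache; not taken from any real implementation."""

    def __init__(self, timeout_s=60.0, min_request_interval_s=1.0):
        self.timeout_s = timeout_s                    # SHOULD be configurable
        self.min_request_interval_s = min_request_interval_s
        self._entries = {}        # ip -> (link_addr, time stored)
        self._last_request = {}   # ip -> time of the last ARP Request sent
        self._pending = {}        # ip -> latest packet awaiting resolution

    def lookup(self, ip, now):
        """Return the cached link-layer address, flushing it if stale."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        link_addr, stored = entry
        if now - stored >= self.timeout_s:
            del self._entries[ip]     # timed out, even if it was in use
            return None
        return link_addr

    def may_send_request(self, ip, now):
        """Flood prevention: allow at most one request per interval."""
        last = self._last_request.get(ip)
        if last is not None and now - last < self.min_request_interval_s:
            return False
        self._last_request[ip] = now
        return True

    def hold(self, ip, packet):
        """Save (rather than discard) the latest packet for this address."""
        self._pending[ip] = packet

    def resolved(self, ip, link_addr, now):
        """Record a resolution; return the saved packet to transmit, if any."""
        self._entries[ip] = (link_addr, now)
        return self._pending.pop(ip, None)
```

Keeping a single slot per destination is the minimum the text asks for. An implementation could hold a short queue instead, at the cost of more buffer space.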
+ + + +Internet Engineering Task Force [Page 24] + + + + +RFC1122 LINK LAYER October 1989 + + + Note that the standard IP encapsulation in RFC-1042 does not + use the protocol id value (K1=6) that IEEE reserved for IP; + instead, it uses a value (K1=170) that implies an extension + (the "SNAP") which can be used to hold the Ether-Type field. + An Internet system MUST NOT send 802 packets using K1=6. + + Address translation from Internet addresses to link-layer + addresses on Ethernet and IEEE 802 networks MUST be managed by + the Address Resolution Protocol (ARP). + + The MTU for an Ethernet is 1500 and for 802.3 is 1492. + + DISCUSSION: + The IEEE 802.3 specification provides for operation over a + 10Mbps Ethernet cable, in which case Ethernet and IEEE + 802.3 frames can be physically intermixed. A receiver can + distinguish Ethernet and 802.3 frames by the value of the + 802.3 Length field; this two-octet field coincides in the + header with the Ether-Type field of an Ethernet frame. In + particular, the 802.3 Length field must be less than or + equal to 1500, while all valid Ether-Type values are + greater than 1500. + + Another compatibility problem arises with link-layer + broadcasts. A broadcast sent with one framing will not be + seen by hosts that can receive only the other framing. + + The provisions of this section were designed to provide + direct interoperation between 894-capable and 1042-capable + systems on the same cable, to the maximum extent possible. + It is intended to support the present situation where + 894-only systems predominate, while providing an easy + transition to a possible future in which 1042-capable + systems become common. + + Note that 894-only systems cannot interoperate directly + with 1042-only systems. If the two system types are set + up as two different logical networks on the same cable, + they can communicate only through an IP gateway. 
+ Furthermore, it is not useful or even possible for a + dual-format host to discover automatically which format to + send, because of the problem of link-layer broadcasts. + + 2.4 LINK/INTERNET LAYER INTERFACE + + The packet receive interface between the IP layer and the link + layer MUST include a flag to indicate whether the incoming packet + was addressed to a link-layer broadcast address. + + DISCUSSION: + Although the IP layer does not generally know link layer + addresses (since every different network medium typically has + a different address format), the broadcast address on a + broadcast-capable medium is an important special case. See + Section 3.2.2, especially the DISCUSSION concerning broadcast + storms. + + The packet send interface between the IP and link layers MUST + include the 5-bit TOS field (see Section 3.2.1.6). + + The link layer MUST NOT report a Destination Unreachable error to + IP solely because there is no ARP cache entry for a destination. + + 2.5 LINK LAYER REQUIREMENTS SUMMARY
+
+                                                  |       | | | |S| |
+                                                  |       | | | |H| |F
+                                                  |       | | | |O|M|o
+                                                  |       | |S| |U|U|o
+                                                  |       | |H| |L|S|t
+                                                  |       |M|O| |D|T|n
+                                                  |       |U|U|M| | |o
+                                                  |       |S|L|A|N|N|t
+                                                  |       |T|D|Y|O|O|t
+FEATURE                                           |SECTION| | | |T|T|e
+--------------------------------------------------|-------|-|-|-|-|-|--
+                                                  |       | | | | | |
+Trailer encapsulation                             |2.3.1  | | |x| | |
+Send Trailers by default without negotiation      |2.3.1  | | | | |x|
+ARP                                               |2.3.2  | | | | | |
+  Flush out-of-date ARP cache entries             |2.3.2.1|x| | | | |
+  Prevent ARP floods                              |2.3.2.1|x| | | | |
+  Cache timeout configurable                      |2.3.2.1| |x| | | |
+  Save at least one (latest) unresolved pkt       |2.3.2.2| |x| | | |
+Ethernet and IEEE 802 Encapsulation               |2.3.3  | | | | | |
+  Host able to:                                   |2.3.3  | | | | | |
+    Send & receive RFC-894 encapsulation          |2.3.3  |x| | | | |
+    Receive RFC-1042 encapsulation                |2.3.3  | |x| | | |
+    Send RFC-1042 encapsulation                   |2.3.3  | | |x| | |
+      Then config. sw. to select, RFC-894 dflt    |2.3.3  |x| | | | |
+  Send K1=6 encapsulation                         |2.3.3  | | | | |x|
+  Use ARP on Ethernet and IEEE 802 nets           |2.3.3  |x| | | | |
+Link layer report b'casts to IP layer             |2.4    |x| | | | |
+IP layer pass TOS to link layer                   |2.4    |x| | | | |
+No ARP cache entry treated as Dest. Unreach.      |2.4    | | | | |x|
+ + +3. INTERNET LAYER PROTOCOLS + + 3.1 INTRODUCTION + + The Robustness Principle: "Be liberal in what you accept, and + conservative in what you send" is particularly important in the + Internet layer, where one misbehaving host can deny Internet + service to many other hosts. + + The protocol standards used in the Internet layer are: + + o RFC-791 [IP:1] defines the IP protocol and gives an + introduction to the architecture of the Internet. + + o RFC-792 [IP:2] defines ICMP, which provides routing, + diagnostic and error functionality for IP. Although ICMP + messages are encapsulated within IP datagrams, ICMP + processing is considered to be (and is typically implemented + as) part of the IP layer. See Section 3.2.2. + + o RFC-950 [IP:3] defines the mandatory subnet extension to the + addressing architecture. + + o RFC-1112 [IP:4] defines the Internet Group Management + Protocol IGMP, as part of a recommended extension to hosts + and to the host-gateway interface to support Internet-wide + multicasting at the IP level. See Section 3.2.3. + + The target of an IP multicast may be an arbitrary group of + Internet hosts. IP multicasting is designed as a natural + extension of the link-layer multicasting facilities of some + networks, and it provides a standard means for local access + to such link-layer multicasting facilities. + + Other important references are listed in Section 5 of this + document. + + The Internet layer of host software MUST implement both IP and + ICMP. See Section 3.3.7 for the requirements on support of IGMP.
+ + The host IP layer has two basic functions: (1) choose the "next + hop" gateway or host for outgoing IP datagrams and (2) reassemble + incoming IP datagrams. The IP layer may also (3) implement + intentional fragmentation of outgoing datagrams. Finally, the IP + layer must (4) provide diagnostic and error functionality. We + expect that IP layer functions may increase somewhat in the + future, as further Internet control and management facilities are + developed. + + + +Internet Engineering Task Force [Page 27] + + + + +RFC1122 INTERNET LAYER October 1989 + + + For normal datagrams, the processing is straightforward. For + incoming datagrams, the IP layer: + + (1) verifies that the datagram is correctly formatted; + + (2) verifies that it is destined to the local host; + + (3) processes options; + + (4) reassembles the datagram if necessary; and + + (5) passes the encapsulated message to the appropriate + transport-layer protocol module. + + For outgoing datagrams, the IP layer: + + (1) sets any fields not set by the transport layer; + + (2) selects the correct first hop on the connected network (a + process called "routing"); + + (3) fragments the datagram if necessary and if intentional + fragmentation is implemented (see Section 3.3.3); and + + (4) passes the packet(s) to the appropriate link-layer driver. + + + A host is said to be multihomed if it has multiple IP addresses. + Multihoming introduces considerable confusion and complexity into + the protocol suite, and it is an area in which the Internet + architecture falls seriously short of solving all problems. There + are two distinct problem areas in multihoming: + + (1) Local multihoming -- the host itself is multihomed; or + + (2) Remote multihoming -- the local host needs to communicate + with a remote multihomed host. + + At present, remote multihoming MUST be handled at the application + layer, as discussed in the companion RFC [INTRO:1]. 
A host MAY + support local multihoming, which is discussed in this document, + and in particular in Section 3.3.4. + + Any host that forwards datagrams generated by another host is + acting as a gateway and MUST also meet the specifications laid out + in the gateway requirements RFC [INTRO:2]. An Internet host that + includes embedded gateway code MUST have a configuration switch to + disable the gateway function, and this switch MUST default to the + + + +Internet Engineering Task Force [Page 28] + + + + +RFC1122 INTERNET LAYER October 1989 + + + non-gateway mode. In this mode, a datagram arriving through one + interface will not be forwarded to another host or gateway (unless + it is source-routed), regardless of whether the host is single- + homed or multihomed. The host software MUST NOT automatically + move into gateway mode if the host has more than one interface, as + the operator of the machine may neither want to provide that + service nor be competent to do so. + + In the following, the action specified in certain cases is to + "silently discard" a received datagram. This means that the + datagram will be discarded without further processing and that the + host will not send any ICMP error message (see Section 3.2.2) as a + result. However, for diagnosis of problems a host SHOULD provide + the capability of logging the error (see Section 1.2.3), including + the contents of the silently-discarded datagram, and SHOULD record + the event in a statistics counter. + + DISCUSSION: + Silent discard of erroneous datagrams is generally intended + to prevent "broadcast storms". + + 3.2 PROTOCOL WALK-THROUGH + + 3.2.1 Internet Protocol -- IP + + 3.2.1.1 Version Number: RFC-791 Section 3.1 + + A datagram whose version number is not 4 MUST be silently + discarded. + + 3.2.1.2 Checksum: RFC-791 Section 3.1 + + A host MUST verify the IP header checksum on every received + datagram and silently discard every datagram that has a bad + checksum. 
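The two checks in Sections 3.2.1.1 and 3.2.1.2 can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation. The function names are hypothetical, and the length sanity checks are an addition that this section does not spell out. The checksum is the usual 16-bit one's complement sum over the IP header, as defined in RFC-791.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def accept_header(packet: bytes) -> bool:
    """Return False when the datagram is to be silently discarded."""
    if len(packet) < 20:                      # assumed check: shorter than any header
        return False
    version = packet[0] >> 4
    header_len = (packet[0] & 0x0F) * 4       # IHL counts 32-bit words
    if version != 4:                          # Section 3.2.1.1
        return False
    if header_len < 20 or len(packet) < header_len:   # assumed check
        return False
    # Summing a header that carries a correct checksum yields 0xFFFF,
    # so the complemented result is zero (Section 3.2.1.2).
    return internet_checksum(packet[:header_len]) == 0
```

A caller that gets `False` would drop the datagram without sending an ICMP error, and would ideally bump a statistics counter as Section 3.1 recommends.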
+ + 3.2.1.3 Addressing: RFC-791 Section 3.2 + + There are now five classes of IP addresses: Class A through + Class E. Class D addresses are used for IP multicasting + [IP:4], while Class E addresses are reserved for + experimental use. + + A multicast (Class D) address is a 28-bit logical address + that stands for a group of hosts, and may be either + permanent or transient. Permanent multicast addresses are + allocated by the Internet Assigned Number Authority + [INTRO:6], while transient addresses may be allocated + + + +Internet Engineering Task Force [Page 29] + + + + +RFC1122 INTERNET LAYER October 1989 + + + dynamically to transient groups. Group membership is + determined dynamically using IGMP [IP:4]. + + We now summarize the important special cases for Class A, B, + and C IP addresses, using the following notation for an IP + address: + + { <Network-number>, <Host-number> } + + or + { <Network-number>, <Subnet-number>, <Host-number> } + + and the notation "-1" for a field that contains all 1 bits. + This notation is not intended to imply that the 1-bits in an + address mask need be contiguous. + + (a) { 0, 0 } + + This host on this network. MUST NOT be sent, except as + a source address as part of an initialization procedure + by which the host learns its own IP address. + + See also Section 3.3.6 for a non-standard use of {0,0}. + + (b) { 0, <Host-number> } + + Specified host on this network. It MUST NOT be sent, + except as a source address as part of an initialization + procedure by which the host learns its full IP address. + + (c) { -1, -1 } + + Limited broadcast. It MUST NOT be used as a source + address. + + A datagram with this destination address will be + received by every host on the connected physical + network but will not be forwarded outside that network. + + (d) { <Network-number>, -1 } + + Directed broadcast to the specified network. It MUST + NOT be used as a source address. 
+ + (e) { <Network-number>, <Subnet-number>, -1 } + + Directed broadcast to the specified subnet. It MUST + NOT be used as a source address. + + + +Internet Engineering Task Force [Page 30] + + + + +RFC1122 INTERNET LAYER October 1989 + + + (f) { <Network-number>, -1, -1 } + + Directed broadcast to all subnets of the specified + subnetted network. It MUST NOT be used as a source + address. + + (g) { 127, <any> } + + Internal host loopback address. Addresses of this form + MUST NOT appear outside a host. + + The <Network-number> is administratively assigned so that + its value will be unique in the entire world. + + IP addresses are not permitted to have the value 0 or -1 for + any of the <Host-number>, <Network-number>, or <Subnet- + number> fields (except in the special cases listed above). + This implies that each of these fields will be at least two + bits long. + + For further discussion of broadcast addresses, see Section + 3.3.6. + + A host MUST support the subnet extensions to IP [IP:3]. As + a result, there will be an address mask of the form: + {-1, -1, 0} associated with each of the host's local IP + addresses; see Sections 3.2.2.9 and 3.3.1.1. + + When a host sends any datagram, the IP source address MUST + be one of its own IP addresses (but not a broadcast or + multicast address). + + A host MUST silently discard an incoming datagram that is + not destined for the host. An incoming datagram is destined + for the host if the datagram's destination address field is: + + (1) (one of) the host's IP address(es); or + + (2) an IP broadcast address valid for the connected + network; or + + (3) the address for a multicast group of which the host is + a member on the incoming physical interface. 
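The address rules of this section can be illustrated with a minimal sketch covering three things: the address class, taken from the high-order bits; the directed broadcast address {<Network-number>, <Subnet-number>, -1}, derived from a local address and its mask; and the three-way test for whether an incoming datagram is destined for the host. The function names are hypothetical. Addresses are compared as dotted-decimal strings for brevity, which assumes they are already in canonical form.

```python
import ipaddress


def address_class(addr: str) -> str:
    """Classify an IPv4 address by the high-order bits of its first octet."""
    first = int(ipaddress.IPv4Address(addr)) >> 24
    if first < 128:
        return "A"      # leading bit  0
    if first < 192:
        return "B"      # leading bits 10
    if first < 224:
        return "C"      # leading bits 110
    if first < 240:
        return "D"      # leading bits 1110 (multicast)
    return "E"          # leading bits 1111 (experimental)


def directed_broadcast(local_addr: str, mask: str) -> str:
    """Set every host bit (the zero bits of the mask) to 1."""
    a = int(ipaddress.IPv4Address(local_addr))
    m = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address((a & m) | (~m & 0xFFFFFFFF)))


def is_destined_for_host(dst, local_addrs, broadcast_addrs, groups_on_interface):
    """Apply tests (1)-(3); a False result means 'silently discard'.

    groups_on_interface holds the multicast groups joined on the physical
    interface the datagram arrived on, not on the host as a whole.
    """
    return (dst in local_addrs
            or dst in broadcast_addrs
            or dst in groups_on_interface)
```

For a host at 128.9.4.7 with mask 255.255.255.0, `broadcast_addrs` would typically contain the limited broadcast address 255.255.255.255 together with `directed_broadcast("128.9.4.7", "255.255.255.0")`.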
+ + For most purposes, a datagram addressed to a broadcast or + multicast destination is processed as if it had been + addressed to one of the host's IP addresses; we use the term + "specific-destination address" for the equivalent local IP + + + +Internet Engineering Task Force [Page 31] + + + + +RFC1122 INTERNET LAYER October 1989 + + + address of the host. The specific-destination address is + defined to be the destination address in the IP header + unless the header contains a broadcast or multicast address, + in which case the specific-destination is an IP address + assigned to the physical interface on which the datagram + arrived. + + A host MUST silently discard an incoming datagram containing + an IP source address that is invalid by the rules of this + section. This validation could be done in either the IP + layer or by each protocol in the transport layer. + + DISCUSSION: + A mis-addressed datagram might be caused by a link- + layer broadcast of a unicast datagram or by a gateway + or host that is confused or mis-configured. + + An architectural goal for Internet hosts was to allow + IP addresses to be featureless 32-bit numbers, avoiding + algorithms that required a knowledge of the IP address + format. Otherwise, any future change in the format or + interpretation of IP addresses will require host + software changes. However, validation of broadcast and + multicast addresses violates this goal; a few other + violations are described elsewhere in this document. + + Implementers should be aware that applications + depending upon the all-subnets directed broadcast + address (f) may be unusable on some networks. All- + subnets broadcast is not widely implemented in vendor + gateways at present, and even when it is implemented, a + particular network administration may disable it in the + gateway configuration. + + 3.2.1.4 Fragmentation and Reassembly: RFC-791 Section 3.2 + + The Internet model requires that every host support + reassembly. 
See Sections 3.3.2 and 3.3.3 for the + requirements on fragmentation and reassembly. + + 3.2.1.5 Identification: RFC-791 Section 3.2 + + When sending an identical copy of an earlier datagram, a + host MAY optionally retain the same Identification field in + the copy. + + + + + + +Internet Engineering Task Force [Page 32] + + + + +RFC1122 INTERNET LAYER October 1989 + + + DISCUSSION: + Some Internet protocol experts have maintained that + when a host sends an identical copy of an earlier + datagram, the new copy should contain the same + Identification value as the original. There are two + suggested advantages: (1) if the datagrams are + fragmented and some of the fragments are lost, the + receiver may be able to reconstruct a complete datagram + from fragments of the original and the copies; (2) a + congested gateway might use the IP Identification field + (and Fragment Offset) to discard duplicate datagrams + from the queue. + + However, the observed patterns of datagram loss in the + Internet do not favor the probability of retransmitted + fragments filling reassembly gaps, while other + mechanisms (e.g., TCP repacketizing upon + retransmission) tend to prevent retransmission of an + identical datagram [IP:9]. Therefore, we believe that + retransmitting the same Identification field is not + useful. Also, a connectionless transport protocol like + UDP would require the cooperation of the application + programs to retain the same Identification value in + identical datagrams. + + 3.2.1.6 Type-of-Service: RFC-791 Section 3.2 + + The "Type-of-Service" byte in the IP header is divided into + two sections: the Precedence field (high-order 3 bits), and + a field that is customarily called "Type-of-Service" or + "TOS" (low-order 5 bits). In this document, all references + to "TOS" or the "TOS field" refer to the low-order 5 bits + only. + + The Precedence field is intended for Department of Defense + applications of the Internet protocols. 
The use of non-zero + values in this field is outside the scope of this document + and the IP standard specification. Vendors should consult + the Defense Communication Agency (DCA) for guidance on the + IP Precedence field and its implications for other protocol + layers. However, vendors should note that the use of + precedence will most likely require that its value be passed + between protocol layers in just the same way as the TOS + field is passed. + + The IP layer MUST provide a means for the transport layer to + set the TOS field of every datagram that is sent; the + default is all zero bits. The IP layer SHOULD pass received + + + +Internet Engineering Task Force [Page 33] + + + + +RFC1122 INTERNET LAYER October 1989 + + + TOS values up to the transport layer. + + The particular link-layer mappings of TOS contained in RFC- + 795 SHOULD NOT be implemented. + + DISCUSSION: + While the TOS field has been little used in the past, + it is expected to play an increasing role in the near + future. The TOS field is expected to be used to + control two aspects of gateway operations: routing and + queueing algorithms. See Section 2 of [INTRO:1] for + the requirements on application programs to specify TOS + values. + + The TOS field may also be mapped into link-layer + service selectors. This has been applied to provide + effective sharing of serial lines by different classes + of TCP traffic, for example. However, the mappings + suggested in RFC-795 for networks that were included in + the Internet as of 1981 are now obsolete. + + 3.2.1.7 Time-to-Live: RFC-791 Section 3.2 + + A host MUST NOT send a datagram with a Time-to-Live (TTL) + value of zero. + + A host MUST NOT discard a datagram just because it was + received with TTL less than 2. + + The IP layer MUST provide a means for the transport layer to + set the TTL field of every datagram that is sent. When a + fixed TTL value is used, it MUST be configurable. 
The + current suggested value will be published in the "Assigned + Numbers" RFC. + + DISCUSSION: + The TTL field has two functions: limit the lifetime of + TCP segments (see RFC-793 [TCP:1], p. 28), and + terminate Internet routing loops. Although TTL is a + time in seconds, it also has some attributes of a hop- + count, since each gateway is required to reduce the TTL + field by at least one. + + The intent is that TTL expiration will cause a datagram + to be discarded by a gateway but not by the destination + host; however, hosts that act as gateways by forwarding + datagrams must follow the gateway rules for TTL. + + + + +Internet Engineering Task Force [Page 34] + + + + +RFC1122 INTERNET LAYER October 1989 + + + A higher-layer protocol may want to set the TTL in + order to implement an "expanding scope" search for some + Internet resource. This is used by some diagnostic + tools, and is expected to be useful for locating the + "nearest" server of a given class using IP + multicasting, for example. A particular transport + protocol may also want to specify its own TTL bound on + maximum datagram lifetime. + + A fixed value must be at least big enough for the + Internet "diameter," i.e., the longest possible path. + A reasonable value is about twice the diameter, to + allow for continued Internet growth. + + 3.2.1.8 Options: RFC-791 Section 3.2 + + There MUST be a means for the transport layer to specify IP + options to be included in transmitted IP datagrams (see + Section 3.4). + + All IP options (except NOP or END-OF-LIST) received in + datagrams MUST be passed to the transport layer (or to ICMP + processing when the datagram is an ICMP message). The IP + and transport layer MUST each interpret those IP options + that they understand and silently ignore the others. + + Later sections of this document discuss specific IP option + support required by each of ICMP, TCP, and UDP. 
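A minimal sketch, in Python, of how a receiver might split the option area of an IP header before handing the options upward. It relies on the RFC-791 encoding: END-OF-LIST is type 0, NOP is type 1, and every other option carries a length octet that counts the type and length octets themselves. The function name `split_ip_options` is hypothetical. The sketch rejects impossible lengths instead of looping on them.

```python
END_OF_LIST = 0
NOP = 1


def split_ip_options(option_bytes):
    """Return a list of (type, raw_option_bytes), skipping NOP and END-OF-LIST.

    Raises ValueError when an option length is outside the possible range.
    """
    options = []
    i = 0
    n = len(option_bytes)
    while i < n:
        opt_type = option_bytes[i]
        if opt_type == END_OF_LIST:
            break                      # nothing after END-OF-LIST is examined
        if opt_type == NOP:
            i += 1                     # single-octet padding option
            continue
        if i + 1 >= n:
            raise ValueError("option truncated before its length octet")
        length = option_bytes[i + 1]
        # A length below 2 would never advance the index; a length running
        # past the option area would read beyond the header.
        if length < 2 or i + length > n:
            raise ValueError("option length out of range")
        options.append((opt_type, bytes(option_bytes[i:i + length])))
        i += length
    return options
```

Each layer can then pick the option types it understands from the returned list and ignore the rest.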
+ + DISCUSSION: + Passing all received IP options to the transport layer + is a deliberate "violation of strict layering" that is + designed to ease the introduction of new transport- + relevant IP options in the future. Each layer must + pick out any options that are relevant to its own + processing and ignore the rest. For this purpose, + every IP option except NOP and END-OF-LIST will include + a specification of its own length. + + This document does not define the order in which a + receiver must process multiple options in the same IP + header. Hosts sending multiple options must be aware + that this introduces an ambiguity in the meaning of + certain options when combined with a source-route + option. + + IMPLEMENTATION: + The IP layer must not crash as the result of an option + + + +Internet Engineering Task Force [Page 35] + + + + +RFC1122 INTERNET LAYER October 1989 + + + length that is outside the possible range. For + example, erroneous option lengths have been observed to + put some IP implementations into infinite loops. + + Here are the requirements for specific IP options: + + + (a) Security Option + + Some environments require the Security option in every + datagram; such a requirement is outside the scope of + this document and the IP standard specification. Note, + however, that the security options described in RFC-791 + and RFC-1038 are obsolete. For DoD applications, + vendors should consult [IP:8] for guidance. + + + (b) Stream Identifier Option + + This option is obsolete; it SHOULD NOT be sent, and it + MUST be silently ignored if received. + + + (c) Source Route Options + + A host MUST support originating a source route and MUST + be able to act as the final destination of a source + route. 
+ + If a host receives a datagram containing a completed + source route (i.e., the pointer points beyond the last + field), the datagram has reached its final destination; + the option as received (the recorded route) MUST be + passed up to the transport layer (or to ICMP message + processing). This recorded route will be reversed and + used to form a return source route for reply datagrams + (see discussion of IP Options in Section 4). When a + return source route is built, it MUST be correctly + formed even if the recorded route included the source + host (see case (B) in the discussion below). + + An IP header containing more than one Source Route + option MUST NOT be sent; the effect on routing of + multiple Source Route options is implementation- + specific. + + Section 3.3.5 presents the rules for a host acting as + an intermediate hop in a source route, i.e., forwarding + + + +Internet Engineering Task Force [Page 36] + + + + +RFC1122 INTERNET LAYER October 1989 + + + a source-routed datagram. + + DISCUSSION: + If a source-routed datagram is fragmented, each + fragment will contain a copy of the source route. + Since the processing of IP options (including a + source route) must precede reassembly, the + original datagram will not be reassembled until + the final destination is reached. + + Suppose a source-routed datagram is to be routed + from host S to host D via gateways G1, G2, ... Gn. + There was an ambiguity in the specification over + whether the source route option in a datagram sent + out by S should be (A) or (B): + + (A): {>>G2, G3, ... Gn, D} <--- CORRECT + + (B): {S, >>G2, G3, ... Gn, D} <---- WRONG + + (where >> represents the pointer). If (A) is + sent, the datagram received at D will contain the + option: {G1, G2, ... Gn >>}, with S and D as the + IP source and destination addresses.
If (B) were + sent, the datagram received at D would again + contain S and D as the same IP source and + destination addresses, but the option would be: + {S, G1, ...Gn >>}; i.e., the originating host + would be the first hop in the route. + + + (d) Record Route Option + + Implementation of originating and processing the Record + Route option is OPTIONAL. + + + (e) Timestamp Option + + Implementation of originating and processing the + Timestamp option is OPTIONAL. If it is implemented, + the following rules apply: + + o The originating host MUST record a timestamp in a + Timestamp option whose Internet address fields are + not pre-specified or whose first pre-specified + address is the host's interface address. + + + + +Internet Engineering Task Force [Page 37] + + + + +RFC1122 INTERNET LAYER October 1989 + + + o The destination host MUST (if possible) add the + current timestamp to a Timestamp option before + passing the option to the transport layer or to + ICMP for processing. + + o A timestamp value MUST follow the rules given in + Section 3.2.2.8 for the ICMP Timestamp message. + + + 3.2.2 Internet Control Message Protocol -- ICMP + + ICMP messages are grouped into two classes. + + * + ICMP error messages: + + Destination Unreachable (see Section 3.2.2.1) + Redirect (see Section 3.2.2.2) + Source Quench (see Section 3.2.2.3) + Time Exceeded (see Section 3.2.2.4) + Parameter Problem (see Section 3.2.2.5) + + + * + ICMP query messages: + + Echo (see Section 3.2.2.6) + Information (see Section 3.2.2.7) + Timestamp (see Section 3.2.2.8) + Address Mask (see Section 3.2.2.9) + + + If an ICMP message of unknown type is received, it MUST be + silently discarded. + + Every ICMP error message includes the Internet header and at + least the first 8 data octets of the datagram that triggered + the error; more than 8 octets MAY be sent; this header and data + MUST be unchanged from the received datagram. 
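The rule that an ICMP error message carries the original Internet header plus at least 8 data octets, unchanged, can be sketched as follows. The sketch assumes the RFC-791 header layout, in which the low-order 4 bits of the first octet give the header length in 32-bit words. The function name `icmp_error_payload` and the `extra_octets` parameter are hypothetical.

```python
def icmp_error_payload(datagram, extra_octets=8):
    """Return the bytes of `datagram` to quote inside an ICMP error message.

    The result is the full IP header (including options) followed by the
    first `extra_octets` octets of data, copied without modification.
    """
    if extra_octets < 8:
        raise ValueError("at least 8 data octets must be included")
    if len(datagram) < 20:
        raise ValueError("datagram shorter than a minimal IP header")
    header_len = (datagram[0] & 0x0F) * 4   # IHL is counted in 32-bit words
    if header_len < 20 or header_len > len(datagram):
        raise ValueError("invalid header length")
    # Slicing copies the octets as received; a datagram with fewer data
    # octets than requested is simply quoted in full.
    return bytes(datagram[:header_len + extra_octets])
```

Quoting the options along with the fixed header matters because the first 8 data octets are what let a transport protocol such as TCP or UDP identify the affected ports.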
+ + In those cases where the Internet layer is required to pass an + ICMP error message to the transport layer, the IP protocol + number MUST be extracted from the original header and used to + select the appropriate transport protocol entity to handle the + error. + + An ICMP error message SHOULD be sent with normal (i.e., zero) + TOS bits. + + + +Internet Engineering Task Force [Page 38] + + + + +RFC1122 INTERNET LAYER October 1989 + + + An ICMP error message MUST NOT be sent as the result of + receiving: + + * an ICMP error message, or + + * a datagram destined to an IP broadcast or IP multicast + address, or + + * a datagram sent as a link-layer broadcast, or + + * a non-initial fragment, or + + * a datagram whose source address does not define a single + host -- e.g., a zero address, a loopback address, a + broadcast address, a multicast address, or a Class E + address. + + NOTE: THESE RESTRICTIONS TAKE PRECEDENCE OVER ANY REQUIREMENT + ELSEWHERE IN THIS DOCUMENT FOR SENDING ICMP ERROR MESSAGES. + + DISCUSSION: + These rules will prevent the "broadcast storms" that have + resulted from hosts returning ICMP error messages in + response to broadcast datagrams. For example, a broadcast + UDP segment to a non-existent port could trigger a flood + of ICMP Destination Unreachable datagrams from all + machines that do not have a client for that destination + port. On a large Ethernet, the resulting collisions can + render the network useless for a second or more. + + Every datagram that is broadcast on the connected network + should have a valid IP broadcast address as its IP + destination (see Section 3.3.6). However, some hosts + violate this rule. To be certain to detect broadcast + datagrams, therefore, hosts are required to check for a + link-layer broadcast as well as an IP-layer broadcast + address. + + IMPLEMENTATION: + This requires that the link layer inform the IP layer when + a link-layer broadcast datagram has been received; see + Section 2.4. 
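The five restrictions on sending ICMP error messages can be sketched as a single predicate. The `ReceivedDatagram` record and `may_send_icmp_error` function are hypothetical names. The sketch assumes that the link layer has already reported whether the frame was a link-layer broadcast, and it treats only the all-zero address as the "zero address".

```python
import ipaddress
from dataclasses import dataclass

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


@dataclass
class ReceivedDatagram:
    src: str
    dst: str
    fragment_offset: int = 0            # non-zero means a non-initial fragment
    link_layer_broadcast: bool = False  # reported by the link layer
    is_icmp_error: bool = False


def may_send_icmp_error(dgram, connected_networks):
    """Return False if any restriction forbids answering with an ICMP error."""
    src = ipaddress.IPv4Address(dgram.src)
    dst = ipaddress.IPv4Address(dgram.dst)
    nets = [ipaddress.IPv4Network(n) for n in connected_networks]

    def is_broadcast(addr):
        return addr == LIMITED_BROADCAST or any(
            addr == net.broadcast_address for net in nets)

    if dgram.is_icmp_error:
        return False
    if is_broadcast(dst) or dst.is_multicast:
        return False
    if dgram.link_layer_broadcast:
        return False
    if dgram.fragment_offset != 0:
        return False
    # The source address must identify a single host.
    class_e = (int(src) >> 28) == 0xF
    if (int(src) == 0 or src.is_loopback or is_broadcast(src)
            or src.is_multicast or class_e):
        return False
    return True
```

Because these restrictions take precedence over every other requirement to send an ICMP error, a predicate of this kind would be evaluated before any error message is generated.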
+ + 3.2.2.1 Destination Unreachable: RFC-792 + + The following additional codes are hereby defined: + + 6 = destination network unknown + + + +Internet Engineering Task Force [Page 39] + + + + +RFC1122 INTERNET LAYER October 1989 + + + 7 = destination host unknown + + 8 = source host isolated + + 9 = communication with destination network + administratively prohibited + + 10 = communication with destination host + administratively prohibited + + 11 = network unreachable for type of service + + 12 = host unreachable for type of service + + A host SHOULD generate Destination Unreachable messages with + code: + + 2 (Protocol Unreachable), when the designated transport + protocol is not supported; or + + 3 (Port Unreachable), when the designated transport + protocol (e.g., UDP) is unable to demultiplex the + datagram but has no protocol mechanism to inform the + sender. + + A Destination Unreachable message that is received MUST be + reported to the transport layer. The transport layer SHOULD + use the information appropriately; for example, see Sections + 4.1.3.3, 4.2.3.9, and 4.2.4 below. A transport protocol + that has its own mechanism for notifying the sender that a + port is unreachable (e.g., TCP, which sends RST segments) + MUST nevertheless accept an ICMP Port Unreachable for the + same purpose. + + A Destination Unreachable message that is received with code + 0 (Net), 1 (Host), or 5 (Bad Source Route) may result from a + routing transient and MUST therefore be interpreted as only + a hint, not proof, that the specified destination is + unreachable [IP:11]. For example, it MUST NOT be used as + proof of a dead gateway (see Section 3.3.1). + + 3.2.2.2 Redirect: RFC-792 + + A host SHOULD NOT send an ICMP Redirect message; Redirects + are to be sent only by gateways. + + A host receiving a Redirect message MUST update its routing + information accordingly. 
Every host MUST be prepared to + + + +Internet Engineering Task Force [Page 40] + + + + +RFC1122 INTERNET LAYER October 1989 + + + accept both Host and Network Redirects and to process them + as described in Section 3.3.1.2 below. + + A Redirect message SHOULD be silently discarded if the new + gateway address it specifies is not on the same connected + (sub-) net through which the Redirect arrived [INTRO:2, + Appendix A], or if the source of the Redirect is not the + current first-hop gateway for the specified destination (see + Section 3.3.1). + + 3.2.2.3 Source Quench: RFC-792 + + A host MAY send a Source Quench message if it is + approaching, or has reached, the point at which it is forced + to discard incoming datagrams due to a shortage of + reassembly buffers or other resources. See Section 2.2.3 of + [INTRO:2] for suggestions on when to send Source Quench. + + If a Source Quench message is received, the IP layer MUST + report it to the transport layer (or ICMP processing). In + general, the transport or application layer SHOULD implement + a mechanism to respond to Source Quench for any protocol + that can send a sequence of datagrams to the same + destination and which can reasonably be expected to maintain + enough state information to make this feasible. See Section + 4 for the handling of Source Quench by TCP and UDP. + + DISCUSSION: + A Source Quench may be generated by the target host or + by some gateway in the path of a datagram. The host + receiving a Source Quench should throttle itself back + for a period of time, then gradually increase the + transmission rate again. The mechanism to respond to + Source Quench may be in the transport layer (for + connection-oriented protocols like TCP) or in the + application layer (for protocols that are built on top + of UDP). 
+ + A mechanism has been proposed [IP:14] to make the IP + layer respond directly to Source Quench by controlling + the rate at which datagrams are sent; however, this + proposal is experimental and not currently + recommended. + + 3.2.2.4 Time Exceeded: RFC-792 + + An incoming Time Exceeded message MUST be passed to the + transport layer. + + + +Internet Engineering Task Force [Page 41] + + + + +RFC1122 INTERNET LAYER October 1989 + + + DISCUSSION: + A gateway will send a Time Exceeded Code 0 (In Transit) + message when it discards a datagram due to an expired + TTL field. This indicates either a gateway routing + loop or too small an initial TTL value. + + A host may receive a Time Exceeded Code 1 (Reassembly + Timeout) message from a destination host that has timed + out and discarded an incomplete datagram; see Section + 3.3.2 below. In the future, receipt of this message + might be part of some "MTU discovery" procedure, to + discover the maximum datagram size that can be sent on + the path without fragmentation. + + 3.2.2.5 Parameter Problem: RFC-792 + + A host SHOULD generate Parameter Problem messages. An + incoming Parameter Problem message MUST be passed to the + transport layer, and it MAY be reported to the user. + + DISCUSSION: + The ICMP Parameter Problem message is sent to the + source host for any problem not specifically covered by + another ICMP message. Receipt of a Parameter Problem + message generally indicates some local or remote + implementation error. + + A new variant on the Parameter Problem message is hereby + defined: + Code 1 = required option is missing. + + DISCUSSION: + This variant is currently in use in the military + community for a missing security option. + + 3.2.2.6 Echo Request/Reply: RFC-792 + + Every host MUST implement an ICMP Echo server function that + receives Echo Requests and sends corresponding Echo Replies.
+ A host SHOULD also implement an application-layer interface + for sending an Echo Request and receiving an Echo Reply, for + diagnostic purposes. + + An ICMP Echo Request destined to an IP broadcast or IP + multicast address MAY be silently discarded. + + + + + + +Internet Engineering Task Force [Page 42] + + + + +RFC1122 INTERNET LAYER October 1989 + + + DISCUSSION: + This neutral provision results from a passionate debate + between those who feel that ICMP Echo to a broadcast + address provides a valuable diagnostic capability and + those who feel that misuse of this feature can too + easily create packet storms. + + The IP source address in an ICMP Echo Reply MUST be the same + as the specific-destination address (defined in Section + 3.2.1.3) of the corresponding ICMP Echo Request message. + + Data received in an ICMP Echo Request MUST be entirely + included in the resulting Echo Reply. However, if sending + the Echo Reply requires intentional fragmentation that is + not implemented, the datagram MUST be truncated to maximum + transmission size (see Section 3.3.3) and sent. + + Echo Reply messages MUST be passed to the ICMP user + interface, unless the corresponding Echo Request originated + in the IP layer. + + If a Record Route and/or Time Stamp option is received in an + ICMP Echo Request, this option (these options) SHOULD be + updated to include the current host and included in the IP + header of the Echo Reply message, without "truncation". + Thus, the recorded route will be for the entire round trip. + + If a Source Route option is received in an ICMP Echo + Request, the return route MUST be reversed and used as a + Source Route option for the Echo Reply message. + + 3.2.2.7 Information Request/Reply: RFC-792 + + A host SHOULD NOT implement these messages. 
+ + DISCUSSION: + The Information Request/Reply pair was intended to + support self-configuring systems such as diskless + workstations, to allow them to discover their IP + network numbers at boot time. However, the RARP and + BOOTP protocols provide better mechanisms for a host to + discover its own IP address. + + 3.2.2.8 Timestamp and Timestamp Reply: RFC-792 + + A host MAY implement Timestamp and Timestamp Reply. If they + are implemented, the following rules MUST be followed. + + + + +Internet Engineering Task Force [Page 43] + + + + +RFC1122 INTERNET LAYER October 1989 + + + o The ICMP Timestamp server function returns a Timestamp + Reply to every Timestamp message that is received. If + this function is implemented, it SHOULD be designed for + minimum variability in delay (e.g., implemented in the + kernel to avoid delay in scheduling a user process). + + The following cases for Timestamp are to be handled + according to the corresponding rules for ICMP Echo: + + o An ICMP Timestamp Request message to an IP broadcast or + IP multicast address MAY be silently discarded. + + o The IP source address in an ICMP Timestamp Reply MUST + be the same as the specific-destination address of the + corresponding Timestamp Request message. + + o If a Source Route option is received in an ICMP + Timestamp Request, the return route MUST be reversed and + used as a Source Route option for the Timestamp Reply + message. + + o If a Record Route and/or Timestamp option is received + in a Timestamp Request, this (these) option(s) SHOULD + be updated to include the current host and included in + the IP header of the Timestamp Reply message. + + o Incoming Timestamp Reply messages MUST be passed up to + the ICMP user interface. + + The preferred form for a timestamp value (the "standard + value") is in units of milliseconds since midnight Universal + Time. However, it may be difficult to provide this value + with millisecond resolution.
For example, many systems use + clocks that update only at line frequency, 50 or 60 times + per second. Therefore, some latitude is allowed in a + "standard value": + + (a) A "standard value" MUST be updated at least 15 times + per second (i.e., at most the six low-order bits of the + value may be undefined). + + (b) The accuracy of a "standard value" MUST approximate + that of operator-set CPU clocks, i.e., correct within a + few minutes. + + + + + + + + +Internet Engineering Task Force [Page 44] + + + + +RFC1122 INTERNET LAYER October 1989 + + + 3.2.2.9 Address Mask Request/Reply: RFC-950 + + A host MUST support the first, and MAY implement all three, + of the following methods for determining the address mask(s) + corresponding to its IP address(es): + + (1) static configuration information; + + (2) obtaining the address mask(s) dynamically as a side- + effect of the system initialization process (see + [INTRO:1]); and + + (3) sending ICMP Address Mask Request(s) and receiving ICMP + Address Mask Reply(s). + + The choice of method to be used in a particular host MUST be + configurable. + + When method (3), the use of Address Mask messages, is + enabled, then: + + (a) When it initializes, the host MUST broadcast an Address + Mask Request message on the connected network + corresponding to the IP address. It MUST retransmit + this message a small number of times if it does not + receive an immediate Address Mask Reply. + + (b) Until it has received an Address Mask Reply, the host + SHOULD assume a mask appropriate for the address class + of the IP address, i.e., assume that the connected + network is not subnetted. + + (c) The first Address Mask Reply message received MUST be + used to set the address mask corresponding to the + particular local IP address. This is true even if the + first Address Mask Reply message is "unsolicited", in + which case it will have been broadcast and may arrive + after the host has ceased to retransmit Address Mask + Requests. 
Once the mask has been set by an Address + Mask Reply, later Address Mask Reply messages MUST be + (silently) ignored. + + Conversely, if Address Mask messages are disabled, then no + ICMP Address Mask Requests will be sent, and any ICMP + Address Mask Replies received for that local IP address MUST + be (silently) ignored. + + A host SHOULD make some reasonableness check on any address + + + +Internet Engineering Task Force [Page 45] + + + + +RFC1122 INTERNET LAYER October 1989 + + + mask it installs; see the IMPLEMENTATION section below. + + A system MUST NOT send an Address Mask Reply unless it is an + authoritative agent for address masks. An authoritative + agent may be a host or a gateway, but it MUST be explicitly + configured as an address mask agent. Receiving an address + mask via an Address Mask Reply does not give the receiver + authority and MUST NOT be used as the basis for issuing + Address Mask Replies. + + With a statically configured address mask, there SHOULD be + an additional configuration flag that determines whether the + host is to act as an authoritative agent for this mask, + i.e., whether it will answer Address Mask Request messages + using this mask. + + If it is configured as an agent, the host MUST broadcast an + Address Mask Reply for the mask on the appropriate interface + when it initializes. + + See "System Initialization" in [INTRO:1] for more + information about the use of Address Mask Request/Reply + messages. + + DISCUSSION: + Hosts that casually send Address Mask Replies with + invalid address masks have often been a serious + nuisance. To prevent this, Address Mask Replies ought + to be sent only by authoritative agents that have been + selected by explicit administrative action. + + When an authoritative agent receives an Address Mask + Request message, it will send a unicast Address Mask + Reply to the source IP address. If the network part of + this address is zero (see (a) and (b) in 3.2.1.3), the + Reply will be broadcast.
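The host-side handling of Address Mask Replies described in this section (class-based default, first reply wins, later replies ignored, replies ignored when the method is disabled) can be sketched as a small state holder. All names here (`classful_mask`, `mask_is_reasonable`, `AddressMaskState`) are hypothetical. The reasonableness test follows the check suggested in the IMPLEMENTATION note of this section. Retransmission of Requests and agent behaviour are out of scope for the sketch.

```python
import ipaddress


def classful_mask(addr):
    """Mask implied by the address class, i.e. assuming no subnetting."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    if first_octet < 128:
        return 0xFF000000   # Class A
    if first_octet < 192:
        return 0xFFFF0000   # Class B
    if first_octet < 224:
        return 0xFFFFFF00   # Class C
    raise ValueError("not a Class A, B, or C address")


def mask_is_reasonable(mask):
    """Not all ones, and either zero or the 8 highest-order bits set."""
    return mask != 0xFFFFFFFF and (mask == 0 or (mask >> 24) == 0xFF)


class AddressMaskState:
    def __init__(self, local_ip, use_icmp, static_mask=None):
        self.use_icmp = use_icmp
        # Until a reply arrives, assume the network is not subnetted,
        # unless a mask was configured statically.
        self.mask = static_mask if static_mask is not None else classful_mask(local_ip)
        self.set_by_reply = False

    def on_address_mask_reply(self, mask):
        """Return True if the reply was installed, False if it was ignored."""
        if not self.use_icmp or self.set_by_reply:
            return False            # disabled, or a reply was already accepted
        if not mask_is_reasonable(mask):
            return False
        self.mask = mask
        self.set_by_reply = True
        return True
```

The same handler would be called for solicited and unsolicited replies alike, since the first reply received sets the mask in either case.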
+ + Getting no reply to its Address Mask Request messages, + a host will assume there is no agent and use an + unsubnetted mask, but the agent may be only temporarily + unreachable. An agent will broadcast an unsolicited + Address Mask Reply whenever it initializes, in order to + update the masks of all hosts that have initialized in + the meantime. + + IMPLEMENTATION: + The following reasonableness check on an address mask + is suggested: the mask is not all 1 bits, and it is + + + +Internet Engineering Task Force [Page 46] + + + + +RFC1122 INTERNET LAYER October 1989 + + + either zero or else the 8 highest-order bits are on. + + 3.2.3 Internet Group Management Protocol IGMP + + IGMP [IP:4] is a protocol used between hosts and gateways on a + single network to establish hosts' membership in particular + multicast groups. The gateways use this information, in + conjunction with a multicast routing protocol, to support IP + multicasting across the Internet. + + At this time, implementation of IGMP is OPTIONAL; see Section + 3.3.7 for more information. Without IGMP, a host can still + participate in multicasting local to its connected networks. + + 3.3 SPECIFIC ISSUES + + 3.3.1 Routing Outbound Datagrams + + The IP layer chooses the correct next hop for each datagram it + sends. If the destination is on a connected network, the + datagram is sent directly to the destination host; otherwise, + it has to be routed to a gateway on a connected network. + + 3.3.1.1 Local/Remote Decision + + To decide if the destination is on a connected network, the + following algorithm MUST be used [see IP:3]: + + (a) The address mask (particular to a local IP address for + a multihomed host) is a 32-bit mask that selects the + network number and subnet number fields of the + corresponding IP address. 
+ + (b) If the IP destination address bits extracted by the + address mask match the IP source address bits extracted + by the same mask, then the destination is on the + corresponding connected network, and the datagram is to + be transmitted directly to the destination host. + + (c) If not, then the destination is accessible only through + a gateway. Selection of a gateway is described below + (3.3.1.2). + + A special-case destination address is handled as follows: + + * For a limited broadcast or a multicast address, simply + pass the datagram to the link layer for the appropriate + interface. + + + +Internet Engineering Task Force [Page 47] + + + + +RFC1122 INTERNET LAYER October 1989 + + + * For a (network or subnet) directed broadcast, the + datagram can use the standard routing algorithms. + + The host IP layer MUST operate correctly in a minimal + network environment, and in particular, when there are no + gateways. For example, if the IP layer of a host insists on + finding at least one gateway to initialize, the host will be + unable to operate on a single isolated broadcast net. + + 3.3.1.2 Gateway Selection + + To efficiently route a series of datagrams to the same + destination, the source host MUST keep a "route cache" of + mappings to next-hop gateways. A host uses the following + basic algorithm on this cache to route a datagram; this + algorithm is designed to put the primary routing burden on + the gateways [IP:11]. + + (a) If the route cache contains no information for a + particular destination, the host chooses a "default" + gateway and sends the datagram to it. It also builds a + corresponding Route Cache entry. + + (b) If that gateway is not the best next hop to the + destination, the gateway will forward the datagram to + the best next-hop gateway and return an ICMP Redirect + message to the source host. 
+ + (c) When it receives a Redirect, the host updates the + next-hop gateway in the appropriate route cache entry, + so later datagrams to the same destination will go + directly to the best gateway. + + Since the subnet mask appropriate to the destination address + is generally not known, a Network Redirect message SHOULD be + treated identically to a Host Redirect message; i.e., the + cache entry for the destination host (only) would be updated + (or created, if an entry for that host did not exist) for + the new gateway. + + DISCUSSION: + This recommendation is to protect against gateways that + erroneously send Network Redirects for a subnetted + network, in violation of the gateway requirements + [INTRO:2]. + + When there is no route cache entry for the destination host + address (and the destination is not on the connected + + + +Internet Engineering Task Force [Page 48] + + + + +RFC1122 INTERNET LAYER October 1989 + + + network), the IP layer MUST pick a gateway from its list of + "default" gateways. The IP layer MUST support multiple + default gateways. + + As an extra feature, a host IP layer MAY implement a table + of "static routes". Each such static route MAY include a + flag specifying whether it may be overridden by ICMP + Redirects. + + DISCUSSION: + A host generally needs to know at least one default + gateway to get started. This information can be + obtained from a configuration file or else from the + host startup sequence, e.g., the BOOTP protocol (see + [INTRO:1]). + + It has been suggested that a host can augment its list + of default gateways by recording any new gateways it + learns about. For example, it can record every gateway + to which it is ever redirected. Such a feature, while + possibly useful in some circumstances, may cause + problems in other cases (e.g., gateways are not all + equal), and it is not recommended. 
+ + A static route is typically a particular preset mapping + from destination host or network into a particular + next-hop gateway; it might also depend on the Type-of- + Service (see next section). Static routes would be set + up by system administrators to override the normal + automatic routing mechanism, to handle exceptional + situations. However, any static routing information is + a potential source of failure as configurations change + or equipment fails. + + 3.3.1.3 Route Cache + + Each route cache entry needs to include the following + fields: + + (1) Local IP address (for a multihomed host) + + (2) Destination IP address + + (3) Type(s)-of-Service + + (4) Next-hop gateway IP address + + Field (2) MAY be the full IP address of the destination + + + +Internet Engineering Task Force [Page 49] + + + + +RFC1122 INTERNET LAYER October 1989 + + + host, or only the destination network number. Field (3), + the TOS, SHOULD be included. + + See Section 3.3.4.2 for a discussion of the implications of + multihoming for the lookup procedure in this cache. + + DISCUSSION: + Including the Type-of-Service field in the route cache + and considering it in the host route algorithm will + provide the necessary mechanism for the future when + Type-of-Service routing is commonly used in the + Internet. See Section 3.2.1.6. + + Each route cache entry defines the endpoints of an + Internet path. Although the connecting path may change + dynamically in an arbitrary way, the transmission + characteristics of the path tend to remain + approximately constant over a time period longer than a + single typical host-host transport connection. + Therefore, a route cache entry is a natural place to + cache data on the properties of the path. Examples of + such properties might be the maximum unfragmented + datagram size (see Section 3.3.3), or the average + round-trip delay measured by a transport protocol. 
+ This data will generally be both gathered and used by a + higher layer protocol, e.g., by TCP, or by an + application using UDP. Experiments are currently in + progress on caching path properties in this manner. + + There is no consensus on whether the route cache should + be keyed on destination host addresses alone, or allow + both host and network addresses. Those who favor the + use of only host addresses argue that: + + (1) As required in Section 3.3.1.2, Redirect messages + will generally result in entries keyed on + destination host addresses; the simplest and most + general scheme would be to use host addresses + always. + + (2) The IP layer may not always know the address mask + for a network address in a complex subnetted + environment. + + (3) The use of only host addresses allows the + destination address to be used as a pure 32-bit + number, which may allow the Internet architecture + to be more easily extended in the future without + + + +Internet Engineering Task Force [Page 50] + + + + +RFC1122 INTERNET LAYER October 1989 + + + any change to the hosts. + + The opposing view is that allowing a mixture of + destination hosts and networks in the route cache: + + (1) Saves memory space. + + (2) Leads to a simpler data structure, easily + combining the cache with the tables of default and + static routes (see below). + + (3) Provides a more useful place to cache path + properties, as discussed earlier. + + + IMPLEMENTATION: + The cache needs to be large enough to include entries + for the maximum number of destination hosts that may be + in use at one time. + + A route cache entry may also include control + information used to choose an entry for replacement. + This might take the form of a "recently used" bit, a + use count, or a last-used timestamp, for example. It + is recommended that it include the time of last + modification of the entry, for diagnostic purposes. 
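The route cache entry described above can be sketched as a small data structure. This is only an illustrative sketch, not part of the specification: the names RouteCacheEntry, RouteCache, lookup, and update are hypothetical, the key of (local address, destination address, TOS) is one possible choice, and a logical counter stands in for the "last-used timestamp" so that replacement is deterministic.

```python
import itertools
import time
from dataclasses import dataclass


@dataclass
class RouteCacheEntry:
    local_addr: str           # (1) local IP address (for a multihomed host)
    dest_addr: str            # (2) destination IP address
    tos: int                  # (3) Type-of-Service
    next_hop: str             # (4) next-hop gateway IP address
    last_used: int = 0        # logical tick, used only to choose a victim
    modified_at: float = 0.0  # wall-clock time of last modification (diagnostics)


class RouteCache:
    """Hypothetical cache keyed on fields (1), (2), and (3)."""

    def __init__(self, capacity):
        self.capacity = capacity          # assumed to be at least 1
        self.entries = {}
        self._tick = itertools.count(1)

    def lookup(self, local_addr, dest_addr, tos):
        entry = self.entries.get((local_addr, dest_addr, tos))
        if entry is not None:
            entry.last_used = next(self._tick)
        return entry

    def update(self, local_addr, dest_addr, tos, next_hop):
        """Create or update an entry, e.g. on first use or on a Redirect."""
        key = (local_addr, dest_addr, tos)
        entry = self.entries.get(key)
        if entry is None:
            if len(self.entries) >= self.capacity:
                # Replace the least recently used entry.
                victim = min(self.entries, key=lambda k: self.entries[k].last_used)
                del self.entries[victim]
            entry = RouteCacheEntry(local_addr, dest_addr, tos, next_hop)
            self.entries[key] = entry
        entry.next_hop = next_hop
        entry.last_used = next(self._tick)
        entry.modified_at = time.time()
        return entry
```

A host would call update() when it first picks a default gateway for a destination and again when a Redirect names a better next hop; lookup() serves every later datagram to that destination.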
+ + An implementation may wish to reduce the overhead of + scanning the route cache for every datagram to be + transmitted. This may be accomplished with a hash + table to speed the lookup, or by giving a connection- + oriented transport protocol a "hint" or temporary + handle on the appropriate cache entry, to be passed to + the IP layer with each subsequent datagram. + + Although we have described the route cache, the lists + of default gateways, and a table of static routes as + conceptually distinct, in practice they may be combined + into a single "routing table" data structure. + + 3.3.1.4 Dead Gateway Detection + + The IP layer MUST be able to detect the failure of a "next- + hop" gateway that is listed in its route cache and to choose + an alternate gateway (see Section 3.3.1.5). + + Dead gateway detection is covered in some detail in RFC-816 + [IP:11]. Experience to date has not produced a complete + + + +Internet Engineering Task Force [Page 51] + + + + +RFC1122 INTERNET LAYER October 1989 + + + algorithm which is totally satisfactory, though it has + identified several forbidden paths and promising techniques. + + * A particular gateway SHOULD NOT be used indefinitely in + the absence of positive indications that it is + functioning. + + * Active probes such as "pinging" (i.e., using an ICMP + Echo Request/Reply exchange) are expensive and scale + poorly. In particular, hosts MUST NOT actively check + the status of a first-hop gateway by simply pinging the + gateway continuously. + + * Even when it is the only effective way to verify a + gateway's status, pinging MUST be used only when + traffic is being sent to the gateway and when there is + no other positive indication to suggest that the + gateway is functioning. + + * To avoid pinging, the layers above and/or below the + Internet layer SHOULD be able to give "advice" on the + status of route cache entries when either positive + (gateway OK) or negative (gateway dead) information is + available. 
+ + + DISCUSSION: + If an implementation does not include an adequate + mechanism for detecting a dead gateway and re-routing, + a gateway failure may cause datagrams to apparently + vanish into a "black hole". This failure can be + extremely confusing for users and difficult for network + personnel to debug. + + The dead-gateway detection mechanism must not cause + unacceptable load on the host, on connected networks, + or on first-hop gateway(s). The exact constraints on + the timeliness of dead gateway detection and on + acceptable load may vary somewhat depending on the + nature of the host's mission, but a host generally + needs to detect a failed first-hop gateway quickly + enough that transport-layer connections will not break + before an alternate gateway can be selected. + + Passing advice from other layers of the protocol stack + complicates the interfaces between the layers, but it + is the preferred approach to dead gateway detection. + Advice can come from almost any part of the IP/TCP + architecture, but it is expected to come primarily from + the transport and link layers. Here are some possible + sources for gateway advice: + + o TCP or any connection-oriented transport protocol + should be able to give negative advice, e.g., + triggered by excessive retransmissions. + + o TCP may give positive advice when (new) data is + acknowledged. Even though the route may be + asymmetric, an ACK for new data proves that the + acknowledged data must have been transmitted + successfully. + + o An ICMP Redirect message from a particular gateway + should be used as positive advice about that + gateway. + + o Link-layer information that reliably detects and + reports host failures (e.g., ARPANET Destination + Dead messages) should be used as negative advice. 
+ + o Failure to ARP or to re-validate ARP mappings may + be used as negative advice for the corresponding + IP address. + + o Packets arriving from a particular link-layer + address are evidence that the system at this + address is alive. However, turning this + information into advice about gateways requires + mapping the link-layer address into an IP address, + and then checking that IP address against the + gateways pointed to by the route cache. This is + probably prohibitively inefficient. + + Note that positive advice that is given for every + datagram received may cause unacceptable overhead in + the implementation. + + While advice might be passed using required arguments + in all interfaces to the IP layer, some transport and + application layer protocols cannot deduce the correct + advice. These interfaces must therefore allow a + neutral value for advice, since either always-positive + or always-negative advice leads to incorrect behavior. + + There is another technique for dead gateway detection + that has been commonly used but is not recommended. + + + +Internet Engineering Task Force [Page 53] + + + + +RFC1122 INTERNET LAYER October 1989 + + + This technique depends upon the host passively + receiving ("wiretapping") the Interior Gateway Protocol + (IGP) datagrams that the gateways are broadcasting to + each other. This approach has the drawback that a host + needs to recognize all the interior gateway protocols + that gateways may use (see [INTRO:2]). In addition, it + only works on a broadcast network. + + At present, pinging (i.e., using ICMP Echo messages) is + the mechanism for gateway probing when absolutely + required. A successful ping guarantees that the + addressed interface and its associated machine are up, + but it does not guarantee that the machine is a gateway + as opposed to a host. 
The normal inference is that if + a Redirect or other evidence indicates that a machine + was a gateway, successful pings will indicate that the + machine is still up and hence still a gateway. + However, since a host silently discards packets that a + gateway would forward or redirect, this assumption + could sometimes fail. To avoid this problem, a new + ICMP message under development will ask "are you a + gateway?" + + IMPLEMENTATION: + The following specific algorithm has been suggested: + + o Associate a "reroute timer" with each gateway + pointed to by the route cache. Initialize the + timer to a value Tr, which must be small enough to + allow detection of a dead gateway before transport + connections time out. + + o Positive advice would reset the reroute timer to + Tr. Negative advice would reduce or zero the + reroute timer. + + o Whenever the IP layer used a particular gateway to + route a datagram, it would check the corresponding + reroute timer. If the timer had expired (reached + zero), the IP layer would send a ping to the + gateway, followed immediately by the datagram. + + o The ping (ICMP Echo) would be sent again if + necessary, up to N times. If no ping reply was + received in N tries, the gateway would be assumed + to have failed, and a new first-hop gateway would + be chosen for all cache entries pointing to the + failed gateway. + + + +Internet Engineering Task Force [Page 54] + + + + +RFC1122 INTERNET LAYER October 1989 + + + Note that the size of Tr is inversely related to the + amount of advice available. 
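The suggested reroute-timer algorithm can be sketched as follows. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, the ping is modelled as a synchronous callable returning True on a reply (a real IP layer would send the Echo and the datagram without blocking), and a ping reply is treated like positive advice.

```python
class RerouteTimers:
    """Hypothetical per-gateway reroute timers; Tr and N come from the caller."""

    def __init__(self, tr, max_tries, ping):
        self.tr = tr                # initial timer value Tr, in seconds
        self.max_tries = max_tries  # N: number of pings before giving up
        self.ping = ping            # assumed callable: ping(gateway) -> bool
        self.expiry = {}            # gateway -> time at which its timer reaches zero

    def positive_advice(self, gateway, now):
        # Positive advice resets the reroute timer to Tr.
        self.expiry[gateway] = now + self.tr

    def negative_advice(self, gateway, now):
        # Negative advice zeroes the timer (reducing it is the other option).
        self.expiry[gateway] = now

    def check_before_send(self, gateway, now):
        """Return True if the gateway may still be used, False if assumed dead."""
        if now < self.expiry.setdefault(gateway, now + self.tr):
            return True             # timer still running: no probe needed
        for _ in range(self.max_tries):
            if self.ping(gateway):
                self.positive_advice(gateway, now)
                return True
        # No reply in N tries: the caller should choose a new first-hop
        # gateway for every cache entry that points to this one.
        return False
```

Because pings are sent only from check_before_send(), probing happens only while traffic is actually being routed through the gateway and no positive advice has arrived within Tr.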
Tr should be large enough + to ensure that: + + * Any pinging will be at a low level (e.g., <10%) of + all packets sent to a gateway from the host, AND + + * pinging is infrequent (e.g., every 3 minutes) + + Since the recommended algorithm is concerned with the + gateways pointed to by route cache entries, rather than + the cache entries themselves, a two-level data + structure (perhaps coordinated with ARP or similar + caches) may be desirable for implementing a route + cache. + + 3.3.1.5 New Gateway Selection + + If the failed gateway is not the current default, the IP + layer can immediately switch to a default gateway. If it is + the current default that failed, the IP layer MUST select a + different default gateway (assuming more than one default is + known) for the failed route and for establishing new routes. + + DISCUSSION: + When a gateway does fail, the other gateways on the + connected network will learn of the failure through + some inter-gateway routing protocol. However, this + will not happen instantaneously, since gateway routing + protocols typically have a settling time of 30-60 + seconds. If the host switches to an alternative + gateway before the gateways have agreed on the failure, + the new target gateway will probably forward the + datagram to the failed gateway and send a Redirect back + to the host pointing to the failed gateway (!). The + result is likely to be a rapid oscillation in the + contents of the host's route cache during the gateway + settling period. It has been proposed that the dead- + gateway logic should include some hysteresis mechanism + to prevent such oscillations. However, experience has + not shown any harm from such oscillations, since + service cannot be restored to the host until the + gateways' routing information does settle down. + + IMPLEMENTATION: + One implementation technique for choosing a new default + gateway is to simply round-robin among the default + gateways in the host's list. 
Another is to rank the + + + +Internet Engineering Task Force [Page 55] + + + + +RFC1122 INTERNET LAYER October 1989 + + + gateways in priority order, and when the current + default gateway is not the highest priority one, to + "ping" the higher-priority gateways slowly to detect + when they return to service. This pinging can be at a + very low rate, e.g., 0.005 per second. + + 3.3.1.6 Initialization + + The following information MUST be configurable: + + (1) IP address(es). + + (2) Address mask(s). + + (3) A list of default gateways, with a preference level. + + A manual method of entering this configuration data MUST be + provided. In addition, a variety of methods can be used to + determine this information dynamically; see the section on + "Host Initialization" in [INTRO:1]. + + DISCUSSION: + Some host implementations use "wiretapping" of gateway + protocols on a broadcast network to learn what gateways + exist. A standard method for default gateway discovery + is under development. + + 3.3.2 Reassembly + + The IP layer MUST implement reassembly of IP datagrams. + + We designate the largest datagram size that can be reassembled + by EMTU_R ("Effective MTU to receive"); this is sometimes + called the "reassembly buffer size". EMTU_R MUST be greater + than or equal to 576, SHOULD be either configurable or + indefinite, and SHOULD be greater than or equal to the MTU of + the connected network(s). + + DISCUSSION: + A fixed EMTU_R limit should not be built into the code + because some application layer protocols require EMTU_R + values larger than 576. + + IMPLEMENTATION: + An implementation may use a contiguous reassembly buffer + for each datagram, or it may use a more complex data + structure that places no definite limit on the reassembled + datagram size; in the latter case, EMTU_R is said to be + + + +Internet Engineering Task Force [Page 56] + + + + +RFC1122 INTERNET LAYER October 1989 + + + "indefinite". 
+ + Logically, reassembly is performed by simply copying each + fragment into the packet buffer at the proper offset. + Note that fragments may overlap if successive + retransmissions use different packetizing but the same + reassembly Id. + + The tricky part of reassembly is the bookkeeping to + determine when all bytes of the datagram have been + reassembled. We recommend Clark's algorithm [IP:10] that + requires no additional data space for the bookkeeping. + However, note that, contrary to [IP:10], the first + fragment header needs to be saved for inclusion in a + possible ICMP Time Exceeded (Reassembly Timeout) message. + + There MUST be a mechanism by which the transport layer can + learn MMS_R, the maximum message size that can be received and + reassembled in an IP datagram (see GET_MAXSIZES calls in + Section 3.4). If EMTU_R is not indefinite, then the value of + MMS_R is given by: + + MMS_R = EMTU_R - 20 + + since 20 is the minimum size of an IP header. + + There MUST be a reassembly timeout. The reassembly timeout + value SHOULD be a fixed value, not set from the remaining TTL. + It is recommended that the value lie between 60 seconds and 120 + seconds. If this timeout expires, the partially-reassembled + datagram MUST be discarded and an ICMP Time Exceeded message + sent to the source host (if fragment zero has been received). + + DISCUSSION: + The IP specification says that the reassembly timeout + should be the remaining TTL from the IP header, but this + does not work well because gateways generally treat TTL as + a simple hop count rather than an elapsed time. If the + reassembly timeout is too small, datagrams will be + discarded unnecessarily, and communication may fail. The + timeout needs to be at least as large as the typical + maximum delay across the Internet. A realistic minimum + reassembly timeout would be 60 seconds. 
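The reassembly rules above (copy each fragment to its offset, save the first fragment header, use a fixed timeout) can be illustrated with a short sketch. It rests on several assumptions: the names are hypothetical, offsets are given in bytes rather than in the 8-octet units of the IP header, and coverage is tracked with a plain set of byte positions, which is simpler than, and not the same as, Clark's algorithm [IP:10].

```python
REASSEMBLY_TIMEOUT = 60.0   # seconds; fixed value, not taken from the TTL
MIN_IP_HEADER = 20          # minimum size of an IP header, in bytes


def mms_r(emtu_r):
    """MMS_R = EMTU_R - 20, valid when EMTU_R is not indefinite."""
    return emtu_r - MIN_IP_HEADER


class Reassembly:
    """Hypothetical reassembly state for one datagram."""

    def __init__(self, now):
        self.deadline = now + REASSEMBLY_TIMEOUT
        self.data = bytearray()
        self.received = set()     # byte positions that have been filled
        self.total_len = None     # known once the last fragment arrives
        self.first_header = None  # kept for a possible ICMP Time Exceeded

    def add_fragment(self, offset, payload, more_fragments, header=None):
        end = offset + len(payload)
        if len(self.data) < end:
            self.data.extend(bytes(end - len(self.data)))
        # Overlapping fragments simply overwrite the same positions.
        self.data[offset:end] = payload
        self.received.update(range(offset, end))
        if offset == 0:
            self.first_header = header
        if not more_fragments:
            self.total_len = end

    def complete(self):
        return (self.total_len is not None
                and self.received.issuperset(range(self.total_len)))

    def expired(self, now):
        # On expiry the partial datagram is discarded; Time Exceeded is
        # sent only if first_header is not None (fragment zero was seen).
        return now >= self.deadline
```

With EMTU_R at its minimum of 576, mms_r(576) gives the 556-byte message size that a transport layer can count on.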
+ + It has been suggested that a cache might be kept of + round-trip times measured by transport protocols for + various destinations, and that these values might be used + to dynamically determine a reasonable reassembly timeout + + + +Internet Engineering Task Force [Page 57] + + + + +RFC1122 INTERNET LAYER October 1989 + + + value. Further investigation of this approach is + required. + + If the reassembly timeout is set too high, buffer + resources in the receiving host will be tied up too long, + and the MSL (Maximum Segment Lifetime) [TCP:1] will be + larger than necessary. The MSL controls the maximum rate + at which fragmented datagrams can be sent using distinct + values of the 16-bit Ident field; a larger MSL lowers the + maximum rate. The TCP specification [TCP:1] arbitrarily + assumes a value of 2 minutes for MSL. This sets an upper + limit on a reasonable reassembly timeout value. + + 3.3.3 Fragmentation + + Optionally, the IP layer MAY implement a mechanism to fragment + outgoing datagrams intentionally. + + We designate by EMTU_S ("Effective MTU for sending") the + maximum IP datagram size that may be sent, for a particular + combination of IP source and destination addresses and perhaps + TOS. + + A host MUST implement a mechanism to allow the transport layer + to learn MMS_S, the maximum transport-layer message size that + may be sent for a given {source, destination, TOS} triplet (see + GET_MAXSIZES call in Section 3.4). If no local fragmentation + is performed, the value of MMS_S will be: + + MMS_S = EMTU_S - <IP header size> + + and EMTU_S must be less than or equal to the MTU of the network + interface corresponding to the source address of the datagram. + Note that <IP header size> in this equation will be 20, unless + the IP reserves space to insert IP options for its own purposes + in addition to any options inserted by the transport layer. 
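As a worked example of the MMS_S equation above, the following sketch computes MMS_S when no local fragmentation is performed. The function name and parameters are hypothetical; the 40-byte limit and 4-byte granularity for options follow from the IP header length field and are included only as sanity checks.

```python
MIN_IP_HEADER = 20   # bytes; IP header with no options
MAX_IP_OPTIONS = 40  # bytes; the IP header cannot exceed 60 bytes


def mms_s(emtu_s, interface_mtu, ip_option_bytes=0):
    """MMS_S = EMTU_S - <IP header size>, with no local fragmentation.

    ip_option_bytes is the space the IP layer reserves for its own options.
    """
    if emtu_s > interface_mtu:
        raise ValueError("EMTU_S must not exceed the MTU of the sending interface")
    if ip_option_bytes % 4 or not 0 <= ip_option_bytes <= MAX_IP_OPTIONS:
        raise ValueError("IP options occupy 0 to 40 bytes in multiples of 4")
    return emtu_s - (MIN_IP_HEADER + ip_option_bytes)
```

For an interface with an MTU of 1500, mms_s(1500, 1500) is 1480, while mms_s(576, 1500) is 556 and mms_s(576, 1500, ip_option_bytes=8) is 548.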
+ + A host that does not implement local fragmentation MUST ensure + that the transport layer (for TCP) or the application layer + (for UDP) obtains MMS_S from the IP layer and does not send a + datagram exceeding MMS_S in size. + + It is generally desirable to avoid local fragmentation and to + choose EMTU_S low enough to avoid fragmentation in any gateway + along the path. In the absence of actual knowledge of the + minimum MTU along the path, the IP layer SHOULD use + EMTU_S <= 576 whenever the destination address is not on a + connected network, and otherwise use the connected network's + + + +Internet Engineering Task Force [Page 58] + + + + +RFC1122 INTERNET LAYER October 1989 + + + MTU. + + The MTU of each physical interface MUST be configurable. + + A host IP layer implementation MAY have a configuration flag + "All-Subnets-MTU", indicating that the MTU of the connected + network is to be used for destinations on different subnets + within the same network, but not for other networks. Thus, + this flag causes the network class mask, rather than the subnet + address mask, to be used to choose an EMTU_S. For a multihomed + host, an "All-Subnets-MTU" flag is needed for each network + interface. + + DISCUSSION: + Picking the correct datagram size to use when sending data + is a complex topic [IP:9]. + + (a) In general, no host is required to accept an IP + datagram larger than 576 bytes (including header and + data), so a host must not send a larger datagram + without explicit knowledge or prior arrangement with + the destination host. Thus, MMS_S is only an upper + bound on the datagram size that a transport protocol + may send; even when MMS_S exceeds 556, the transport + layer must limit its messages to 556 bytes in the + absence of other knowledge about the destination + host. + + (b) Some transport protocols (e.g., TCP) provide a way to + explicitly inform the sender about the largest + datagram the other end can receive and reassemble + [IP:7]. 
There is no corresponding mechanism in the + IP layer. + + A transport protocol that assumes an EMTU_R larger + than 576 (see Section 3.3.2), can send a datagram of + this larger size to another host that implements the + same protocol. + + (c) Hosts should ideally limit their EMTU_S for a given + destination to the minimum MTU of all the networks + along the path, to avoid any fragmentation. IP + fragmentation, while formally correct, can create a + serious transport protocol performance problem, + because loss of a single fragment means all the + fragments in the segment must be retransmitted + [IP:9]. + + + + +Internet Engineering Task Force [Page 59] + + + + +RFC1122 INTERNET LAYER October 1989 + + + Since nearly all networks in the Internet currently + support an MTU of 576 or greater, we strongly recommend + the use of 576 for datagrams sent to non-local networks. + + It has been suggested that a host could determine the MTU + over a given path by sending a zero-offset datagram + fragment and waiting for the receiver to time out the + reassembly (which cannot complete!) and return an ICMP + Time Exceeded message. This message would include the + largest remaining fragment header in its body. More + direct mechanisms are being experimented with, but have + not yet been adopted (see e.g., RFC-1063). + + 3.3.4 Local Multihoming + + 3.3.4.1 Introduction + + A multihomed host has multiple IP addresses, which we may + think of as "logical interfaces". These logical interfaces + may be associated with one or more physical interfaces, and + these physical interfaces may be connected to the same or + different networks. + + Here are some important cases of multihoming: + + (a) Multiple Logical Networks + + The Internet architects envisioned that each physical + network would have a single unique IP network (or + subnet) number. 
However, LAN administrators have + sometimes found it useful to violate this assumption, + operating a LAN with multiple logical networks per + physical connected network. + + If a host connected to such a physical network is + configured to handle traffic for each of N different + logical networks, then the host will have N logical + interfaces. These could share a single physical + interface, or might use N physical interfaces to the + same network. + + (b) Multiple Logical Hosts + + When a host has multiple IP addresses that all have the + same <Network-number> part (and the same <Subnet- + number> part, if any), the logical interfaces are known + as "logical hosts". These logical interfaces might + share a single physical interface or might use separate + + + +Internet Engineering Task Force [Page 60] + + + + +RFC1122 INTERNET LAYER October 1989 + + + physical interfaces to the same physical network. + + (c) Simple Multihoming + + In this case, each logical interface is mapped into a + separate physical interface and each physical interface + is connected to a different physical network. The term + "multihoming" was originally applied only to this case, + but it is now applied more generally. + + A host with embedded gateway functionality will + typically fall into the simple multihoming case. Note, + however, that a host may be simply multihomed without + containing an embedded gateway, i.e., without + forwarding datagrams from one connected network to + another. + + This case presents the most difficult routing problems. + The choice of interface (i.e., the choice of first-hop + network) may significantly affect performance or even + reachability of remote parts of the Internet. + + + Finally, we note another possibility that is NOT + multihoming: one logical interface may be bound to multiple + physical interfaces, in order to increase the reliability or + throughput between directly connected machines by providing + alternative physical paths between them. 
For instance, two + systems might be connected by multiple point-to-point links. + We call this "link-layer multiplexing". With link-layer + multiplexing, the protocols above the link layer are unaware + that multiple physical interfaces are present; the link- + layer device driver is responsible for multiplexing and + routing packets across the physical interfaces. + + In the Internet protocol architecture, a transport protocol + instance ("entity") has no address of its own, but instead + uses a single Internet Protocol (IP) address. This has + implications for the IP, transport, and application layers, + and for the interfaces between them. In particular, the + application software may have to be aware of the multiple IP + addresses of a multihomed host; in other cases, the choice + can be made within the network software. + + 3.3.4.2 Multihoming Requirements + + The following general rules apply to the selection of an IP + source address for sending a datagram from a multihomed + + + +Internet Engineering Task Force [Page 61] + + + + +RFC1122 INTERNET LAYER October 1989 + + + host. + + (1) If the datagram is sent in response to a received + datagram, the source address for the response SHOULD be + the specific-destination address of the request. See + Sections 4.1.3.5 and 4.2.3.7 and the "General Issues" + section of [INTRO:1] for more specific requirements on + higher layers. + + Otherwise, a source address must be selected. + + (2) An application MUST be able to explicitly specify the + source address for initiating a connection or a + request. + + (3) In the absence of such a specification, the networking + software MUST choose a source address. Rules for this + choice are described below. + + + There are two key requirement issues related to multihoming: + + (A) A host MAY silently discard an incoming datagram whose + destination address does not correspond to the physical + interface through which it is received. 
+ + (B) A host MAY restrict itself to sending (non-source- + routed) IP datagrams only through the physical + interface that corresponds to the IP source address of + the datagrams. + + + DISCUSSION: + Internet host implementors have used two different + conceptual models for multihoming, briefly summarized + in the following discussion. This document takes no + stand on which model is preferred; each seems to have a + place. This ambivalence is reflected in the issues (A) + and (B) being optional. + + o Strong ES Model + + The Strong ES (End System, i.e., host) model + emphasizes the host/gateway (ES/IS) distinction, + and would therefore substitute MUST for MAY in + issues (A) and (B) above. It tends to model a + multihomed host as a set of logical hosts within + the same physical host. + + + +Internet Engineering Task Force [Page 62] + + + + +RFC1122 INTERNET LAYER October 1989 + + + With respect to (A), proponents of the Strong ES + model note that automatic Internet routing + mechanisms could not route a datagram to a + physical interface that did not correspond to the + destination address. + + Under the Strong ES model, the route computation + for an outgoing datagram is the mapping: + + route(src IP addr, dest IP addr, TOS) + -> gateway + + Here the source address is included as a parameter + in order to select a gateway that is directly + reachable on the corresponding physical interface. + Note that this model logically requires that in + general there be at least one default gateway, and + preferably multiple defaults, for each IP source + address. + + o Weak ES Model + + This view de-emphasizes the ES/IS distinction, and + would therefore substitute MUST NOT for MAY in + issues (A) and (B). This model may be the more + natural one for hosts that wiretap gateway routing + protocols, and is necessary for hosts that have + embedded gateway functionality. + + The Weak ES Model may cause the Redirect mechanism + to fail. 
If a datagram is sent out a physical + interface that does not correspond to the + destination address, the first-hop gateway will + not realize when it needs to send a Redirect. On + the other hand, if the host has embedded gateway + functionality, then it has routing information + without listening to Redirects. + + In the Weak ES model, the route computation for an + outgoing datagram is the mapping: + + route(dest IP addr, TOS) -> gateway, interface + + + + + + + + + +Internet Engineering Task Force [Page 63] + + + + +RFC1122 INTERNET LAYER October 1989 + + + 3.3.4.3 Choosing a Source Address + + DISCUSSION: + When it sends an initial connection request (e.g., a + TCP "SYN" segment) or a datagram service request (e.g., + a UDP-based query), the transport layer on a multihomed + host needs to know which source address to use. If the + application does not specify it, the transport layer + must ask the IP layer to perform the conceptual + mapping: + + GET_SRCADDR(remote IP addr, TOS) + -> local IP address + + Here TOS is the Type-of-Service value (see Section + 3.2.1.6), and the result is the desired source address. + The following rules are suggested for implementing this + mapping: + + (a) If the remote Internet address lies on one of the + (sub-) nets to which the host is directly + connected, a corresponding source address may be + chosen, unless the corresponding interface is + known to be down. + + (b) The route cache may be consulted, to see if there + is an active route to the specified destination + network through any network interface; if so, a + local IP address corresponding to that interface + may be chosen. + + (c) The table of static routes, if any (see Section + 3.3.1.2) may be similarly consulted. + + (d) The default gateways may be consulted. If these + gateways are assigned to different interfaces, the + interface corresponding to the gateway with the + highest preference may be chosen. 
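Rules (a) through (d) above can be sketched as one function. This is an illustration under assumptions, not a prescribed implementation: the Interface record, the shapes of the route cache and static route tables (plain dictionaries mapping to a local address), and the convention that a larger number means a higher gateway preference are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Interface:
    address: str      # local IP address of this logical interface
    mask: str         # its address mask, e.g. "255.255.255.0"
    up: bool = True


def on_connected_net(iface, addr):
    """True if addr matches the interface's address under its address mask."""
    net = ipaddress.ip_network(f"{iface.address}/{iface.mask}", strict=False)
    return ipaddress.ip_address(addr) in net


def get_srcaddr(remote, tos, interfaces, route_cache, static_routes, default_gateways):
    # (a) Remote address lies on a directly connected (sub-)net.
    for iface in interfaces:
        if iface.up and on_connected_net(iface, remote):
            return iface.address
    # (b) Route cache: assumed to map (destination, TOS) -> local address.
    if (remote, tos) in route_cache:
        return route_cache[(remote, tos)]
    # (c) Static routes: assumed to map destination -> local address.
    if remote in static_routes:
        return static_routes[remote]
    # (d) Default gateways as (preference, gateway address) pairs; use the
    #     interface that reaches the gateway with the highest preference.
    for _, gateway in sorted(default_gateways, reverse=True):
        for iface in interfaces:
            if iface.up and on_connected_net(iface, gateway):
                return iface.address
    return None  # no usable source address found
```

The on_connected_net() helper is the same mask comparison used for the local/remote decision in Section 3.3.1.1, which is why source address selection and datagram routing can share an implementation.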
+ + In the future, there may be a defined way for a + multihomed host to ask the gateways on all connected + networks for advice about the best network to use for a + given destination. + + IMPLEMENTATION: + It will be noted that this process is essentially the + same as datagram routing (see Section 3.3.1), and + therefore hosts may be able to combine the + + + +Internet Engineering Task Force [Page 64] + + + + +RFC1122 INTERNET LAYER October 1989 + + + implementation of the two functions. + + 3.3.5 Source Route Forwarding + + Subject to restrictions given below, a host MAY be able to act + as an intermediate hop in a source route, forwarding a source- + routed datagram to the next specified hop. + + However, in performing this gateway-like function, the host + MUST obey all the relevant rules for a gateway forwarding + source-routed datagrams [INTRO:2]. This includes the following + specific provisions, which override the corresponding host + provisions given earlier in this document: + + (A) TTL (ref. Section 3.2.1.7) + + The TTL field MUST be decremented and the datagram perhaps + discarded as specified for a gateway in [INTRO:2]. + + (B) ICMP Destination Unreachable (ref. Section 3.2.2.1) + + A host MUST be able to generate Destination Unreachable + messages with the following codes: + + 4 (Fragmentation Required but DF Set) when a source- + routed datagram cannot be fragmented to fit into the + target network; + + 5 (Source Route Failed) when a source-routed datagram + cannot be forwarded, e.g., because of a routing + problem or because the next hop of a strict source + route is not on a connected network. + + (C) IP Source Address (ref. Section 3.2.1.3) + + A source-routed datagram being forwarded MAY (and normally + will) have a source address that is not one of the IP + addresses of the forwarding host. + + (D) Record Route Option (ref. 
Section 3.2.1.8d) + + A host that is forwarding a source-routed datagram + containing a Record Route option MUST update that option, + if it has room. + + (E) Timestamp Option (ref. Section 3.2.1.8e) + + A host that is forwarding a source-routed datagram + + + +Internet Engineering Task Force [Page 65] + + + + +RFC1122 INTERNET LAYER October 1989 + + + containing a Timestamp Option MUST add the current + timestamp to that option, according to the rules for this + option. + + To define the rules restricting host forwarding of source- + routed datagrams, we use the term "local source-routing" if the + next hop will be through the same physical interface through + which the datagram arrived; otherwise, it is "non-local + source-routing". + + o A host is permitted to perform local source-routing + without restriction. + + o A host that supports non-local source-routing MUST have a + configurable switch to disable forwarding, and this switch + MUST default to disabled. + + o The host MUST satisfy all gateway requirements for + configurable policy filters [INTRO:2] restricting non- + local forwarding. + + If a host receives a datagram with an incomplete source route + but does not forward it for some reason, the host SHOULD return + an ICMP Destination Unreachable (code 5, Source Route Failed) + message, unless the datagram was itself an ICMP error message. + + 3.3.6 Broadcasts + + Section 3.2.1.3 defined the four standard IP broadcast address + forms: + + Limited Broadcast: {-1, -1} + + Directed Broadcast: {<Network-number>,-1} + + Subnet Directed Broadcast: + {<Network-number>,<Subnet-number>,-1} + + All-Subnets Directed Broadcast: {<Network-number>,-1,-1} + + A host MUST recognize any of these forms in the destination + address of an incoming datagram. + + There is a class of hosts* that use non-standard broadcast + address forms, substituting 0 for -1. All hosts SHOULD +_________________________ +*4.2BSD Unix and its derivatives, but not 4.3BSD. 
+ + + + +Internet Engineering Task Force [Page 66] + + + + +RFC1122 INTERNET LAYER October 1989 + + + recognize and accept any of these non-standard broadcast + addresses as the destination address of an incoming datagram. + A host MAY optionally have a configuration option to choose the + 0 or the -1 form of broadcast address, for each physical + interface, but this option SHOULD default to the standard (-1) + form. + + When a host sends a datagram to a link-layer broadcast address, + the IP destination address MUST be a legal IP broadcast or IP + multicast address. + + A host SHOULD silently discard a datagram that is received via + a link-layer broadcast (see Section 2.4) but does not specify + an IP multicast or broadcast destination address. + + Hosts SHOULD use the Limited Broadcast address to broadcast to + a connected network. + + + DISCUSSION: + Using the Limited Broadcast address instead of a Directed + Broadcast address may improve system robustness. Problems + are often caused by machines that do not understand the + plethora of broadcast addresses (see Section 3.2.1.3), or + that may have different ideas about which broadcast + addresses are in use. The prime example of the latter is + machines that do not understand subnetting but are + attached to a subnetted net. Sending a Subnet Broadcast + for the connected network will confuse those machines, + which will see it as a message to some other host. + + There has been discussion on whether a datagram addressed + to the Limited Broadcast address ought to be sent from all + the interfaces of a multihomed host. This specification + takes no stand on the issue. + + 3.3.7 IP Multicasting + + A host SHOULD support local IP multicasting on all connected + networks for which a mapping from Class D IP addresses to + link-layer addresses has been specified (see below). 
Support + for local IP multicasting includes sending multicast datagrams, + joining multicast groups and receiving multicast datagrams, and + leaving multicast groups. This implies support for all of + [IP:4] except the IGMP protocol itself, which is OPTIONAL. + + DISCUSSION: + IGMP provides gateways that are capable of multicast + routing with the information required to support IP + multicasting across multiple networks. At this time, + multicast-routing gateways are in the experimental stage + and are not widely available. For hosts that are not + connected to networks with multicast-routing gateways or + that do not need to receive multicast datagrams + originating on other networks, IGMP serves no purpose and + is therefore optional for now. However, the rest of + [IP:4] is currently recommended for the purpose of + providing IP-layer access to local network multicast + addressing, as a preferable alternative to local broadcast + addressing. It is expected that IGMP will become + recommended at some future date, when multicast-routing + gateways have become more widely available. + + If IGMP is not implemented, a host SHOULD still join the + "all-hosts" group (224.0.0.1) when the IP layer is initialized and + remain a member for as long as the IP layer is active. + + DISCUSSION: + Joining the "all-hosts" group will support strictly local + uses of multicasting, e.g., a gateway discovery protocol, + even if IGMP is not implemented. + + The mapping of IP Class D addresses to local addresses is + currently specified for the following types of networks: + + o Ethernet/IEEE 802.3, as defined in [IP:4]. + + o Any network that supports broadcast but not multicast + addressing: all IP Class D addresses map to the local + broadcast address. + + o Any type of point-to-point link (e.g., SLIP or HDLC + links): no mapping required.
All IP multicast datagrams + are sent as-is, inside the local framing. + + Mappings for other types of networks will be specified in the + future. + + A host SHOULD provide a way for higher-layer protocols or + applications to determine which of the host's connected + network(s) support IP multicast addressing. + + + + + + +Internet Engineering Task Force [Page 68] + + + + +RFC1122 INTERNET LAYER October 1989 + + + 3.3.8 Error Reporting + + Wherever practical, hosts MUST return ICMP error datagrams on + detection of an error, except in those cases where returning an + ICMP error message is specifically prohibited. + + DISCUSSION: + A common phenomenon in datagram networks is the "black + hole disease": datagrams are sent out, but nothing comes + back. Without any error datagrams, it is difficult for + the user to figure out what the problem is. + + 3.4 INTERNET/TRANSPORT LAYER INTERFACE + + The interface between the IP layer and the transport layer MUST + provide full access to all the mechanisms of the IP layer, + including options, Type-of-Service, and Time-to-Live. The + transport layer MUST either have mechanisms to set these interface + parameters, or provide a path to pass them through from an + application, or both. + + DISCUSSION: + Applications are urged to make use of these mechanisms where + applicable, even when the mechanisms are not currently + effective in the Internet (e.g., TOS). This will allow these + mechanisms to be immediately useful when they do become + effective, without a large amount of retrofitting of host + software. + + We now describe a conceptual interface between the transport layer + and the IP layer, as a set of procedure calls. This is an + extension of the information in Section 3.3 of RFC-791 [IP:1]. + + + * Send Datagram + + SEND(src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt + => result ) + + where the parameters are defined in RFC-791. Passing an Id + parameter is optional; see Section 3.2.1.5. 
+ + + * Receive Datagram + + RECV(BufPTR, prot + => result, src, dst, SpecDest, TOS, len, opt) + + + + +Internet Engineering Task Force [Page 69] + + + + +RFC1122 INTERNET LAYER October 1989 + + + All the parameters are defined in RFC-791, except for: + + SpecDest = specific-destination address of datagram + (defined in Section 3.2.1.3) + + The result parameter dst contains the datagram's destination + address. Since this may be a broadcast or multicast address, + the SpecDest parameter (not shown in RFC-791) MUST be passed. + The parameter opt contains all the IP options received in the + datagram; these MUST also be passed to the transport layer. + + + * Select Source Address + + GET_SRCADDR(remote, TOS) -> local + + remote = remote IP address + TOS = Type-of-Service + local = local IP address + + See Section 3.3.4.3. + + + * Find Maximum Datagram Sizes + + GET_MAXSIZES(local, remote, TOS) -> MMS_R, MMS_S + + MMS_R = maximum receive transport-message size. + MMS_S = maximum send transport-message size. + (local, remote, TOS defined above) + + See Sections 3.3.2 and 3.3.3. + + + * Advice on Delivery Success + + ADVISE_DELIVPROB(sense, local, remote, TOS) + + Here the parameter sense is a 1-bit flag indicating whether + positive or negative advice is being given; see the + discussion in Section 3.3.1.4. The other parameters were + defined earlier. + + + * Send ICMP Message + + SEND_ICMP(src, dst, TOS, TTL, BufPTR, len, Id, DF, opt) + -> result + + + +Internet Engineering Task Force [Page 70] + + + + +RFC1122 INTERNET LAYER October 1989 + + + (Parameters defined in RFC-791). + + Passing an Id parameter is optional; see Section 3.2.1.5. + The transport layer MUST be able to send certain ICMP + messages: Port Unreachable or any of the query-type + messages. This function could be considered to be a special + case of the SEND() call, of course; we describe it separately + for clarity. 
+ + + * Receive ICMP Message + + RECV_ICMP(BufPTR ) -> result, src, dst, len, opt + + (Parameters defined in RFC-791). + + The IP layer MUST pass certain ICMP messages up to the + appropriate transport-layer routine. This function could be + considered to be a special case of the RECV() call, of + course; we describe it separately for clarity. + + For an ICMP error message, the data that is passed up MUST + include the original Internet header plus all the octets of + the original message that are included in the ICMP message. + This data will be used by the transport layer to locate the + connection state information, if any. + + In particular, the following ICMP messages are to be passed + up: + + o Destination Unreachable + + o Source Quench + + o Echo Reply (to ICMP user interface, unless the Echo + Request originated in the IP layer) + + o Timestamp Reply (to ICMP user interface) + + o Time Exceeded + + + DISCUSSION: + In the future, there may be additions to this interface to + pass path data (see Section 3.3.1.3) between the IP and + transport layers. 
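The RECV_ICMP() requirement above, that an ICMP error carry the original Internet header plus the leading octets of the original message so the transport layer can locate its connection state, can be illustrated with a short parser. This is a sketch under stated assumptions: the function name and the returned tuple are hypothetical, and only TCP and UDP (whose first four octets are the source and destination ports) are handled.

```python
import struct
from ipaddress import IPv4Address

IPPROTO_TCP = 6
IPPROTO_UDP = 17


def connection_key_from_icmp_error(data: bytes):
    """Return (protocol, src, sport, dst, dport) for the datagram that
    triggered an ICMP error.

    `data` is the part of the ICMP error that follows the 8-octet ICMP
    header: the original IP header plus at least the first 8 octets of
    the original message.
    """
    if len(data) < 20:
        raise ValueError("embedded IP header is truncated")
    if data[0] >> 4 != 4:
        raise ValueError("embedded datagram is not IPv4")

    header_len = (data[0] & 0x0F) * 4      # IHL is counted in 32-bit words
    if header_len < 20 or len(data) < header_len + 4:
        raise ValueError("embedded datagram is too short")

    protocol = data[9]
    if protocol not in (IPPROTO_TCP, IPPROTO_UDP):
        raise ValueError("no port numbers for protocol %d" % protocol)

    src = IPv4Address(data[12:16])
    dst = IPv4Address(data[16:20])
    # Both TCP and UDP begin with source port, then destination port.
    sport, dport = struct.unpack_from("!HH", data, header_len)
    return protocol, src, sport, dst, dport
```

The source address and port in the returned key are the local ones, since the embedded datagram is one this host originally sent.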
+ + + + + +Internet Engineering Task Force [Page 71] + + + + +RFC1122 INTERNET LAYER October 1989 + + + 3.5 INTERNET LAYER REQUIREMENTS SUMMARY + + + | | | | |S| | + | | | | |H| |F + | | | | |O|M|o + | | |S| |U|U|o + | | |H| |L|S|t + | |M|O| |D|T|n + | |U|U|M| | |o + | |S|L|A|N|N|t + | |T|D|Y|O|O|t +FEATURE |SECTION | | | |T|T|e +-------------------------------------------------|--------|-|-|-|-|-|-- + | | | | | | | +Implement IP and ICMP |3.1 |x| | | | | +Handle remote multihoming in application layer |3.1 |x| | | | | +Support local multihoming |3.1 | | |x| | | +Meet gateway specs if forward datagrams |3.1 |x| | | | | +Configuration switch for embedded gateway |3.1 |x| | | | |1 + Config switch default to non-gateway |3.1 |x| | | | |1 + Auto-config based on number of interfaces |3.1 | | | | |x|1 +Able to log discarded datagrams |3.1 | |x| | | | + Record in counter |3.1 | |x| | | | + | | | | | | | +Silently discard Version != 4 |3.2.1.1 |x| | | | | +Verify IP checksum, silently discard bad dgram |3.2.1.2 |x| | | | | +Addressing: | | | | | | | + Subnet addressing (RFC-950) |3.2.1.3 |x| | | | | + Src address must be host's own IP address |3.2.1.3 |x| | | | | + Silently discard datagram with bad dest addr |3.2.1.3 |x| | | | | + Silently discard datagram with bad src addr |3.2.1.3 |x| | | | | +Support reassembly |3.2.1.4 |x| | | | | +Retain same Id field in identical datagram |3.2.1.5 | | |x| | | + | | | | | | | +TOS: | | | | | | | + Allow transport layer to set TOS |3.2.1.6 |x| | | | | + Pass received TOS up to transport layer |3.2.1.6 | |x| | | | + Use RFC-795 link-layer mappings for TOS |3.2.1.6 | | | |x| | +TTL: | | | | | | | + Send packet with TTL of 0 |3.2.1.7 | | | | |x| + Discard received packets with TTL < 2 |3.2.1.7 | | | | |x| + Allow transport layer to set TTL |3.2.1.7 |x| | | | | + Fixed TTL is configurable |3.2.1.7 |x| | | | | + | | | | | | | +IP Options: | | | | | | | + Allow transport layer to send IP options |3.2.1.8 |x| | | | | + Pass all IP options 
rcvd to higher layer |3.2.1.8 |x| | | | | + IP layer silently ignore unknown options |3.2.1.8 |x| | | | | + Security option |3.2.1.8a| | |x| | | + Send Stream Identifier option |3.2.1.8b| | | |x| | + Silently ignore Stream Identifier option |3.2.1.8b|x| | | | | + Record Route option |3.2.1.8d| | |x| | | + Timestamp option |3.2.1.8e| | |x| | | +Source Route Option: | | | | | | | + Originate & terminate Source Route options |3.2.1.8c|x| | | | | + Datagram with completed SR passed up to TL |3.2.1.8c|x| | | | | + Build correct (non-redundant) return route |3.2.1.8c|x| | | | | + Send multiple SR options in one header |3.2.1.8c| | | | |x| + | | | | | | | +ICMP: | | | | | | | + Silently discard ICMP msg with unknown type |3.2.2 |x| | | | | + Include more than 8 octets of orig datagram |3.2.2 | | |x| | | + Included octets same as received |3.2.2 |x| | | | | + Demux ICMP Error to transport protocol |3.2.2 |x| | | | | + Send ICMP error message with TOS=0 |3.2.2 | |x| | | | + Send ICMP error message for: | | | | | | | + - ICMP error msg |3.2.2 | | | | |x| + - IP b'cast or IP m'cast |3.2.2 | | | | |x| + - Link-layer b'cast |3.2.2 | | | | |x| + - Non-initial fragment |3.2.2 | | | | |x| + - Datagram with non-unique src address |3.2.2 | | | | |x| + Return ICMP error msgs (when not prohibited) |3.3.8 |x| | | | | + | | | | | | | + Dest Unreachable: | | | | | | | + Generate Dest Unreachable (code 2/3) |3.2.2.1 | |x| | | | + Pass ICMP Dest Unreachable to higher layer |3.2.2.1 |x| | | | | + Higher layer act on Dest Unreach |3.2.2.1 | |x| | | | + Interpret Dest Unreach as only hint |3.2.2.1 |x| | | | | + Redirect: | | | | | | | + Host send Redirect |3.2.2.2 | | | |x| | + Update route cache when recv Redirect |3.2.2.2 |x| | | | | + Handle both Host and Net Redirects |3.2.2.2 |x| | | | | + Discard illegal Redirect |3.2.2.2 | |x| | | | + Source Quench: | | | | | | | + Send Source Quench if
buffering exceeded |3.2.2.3 | | |x| | | + Pass Source Quench to higher layer |3.2.2.3 |x| | | | | + Higher layer act on Source Quench |3.2.2.3 | |x| | | | + Time Exceeded: pass to higher layer |3.2.2.4 |x| | | | | + Parameter Problem: | | | | | | | + Send Parameter Problem messages |3.2.2.5 | |x| | | | + Pass Parameter Problem to higher layer |3.2.2.5 |x| | | | | + Report Parameter Problem to user |3.2.2.5 | | |x| | | + | | | | | | | + ICMP Echo Request or Reply: | | | | | | | + Echo server and Echo client |3.2.2.6 |x| | | | | + + + +Internet Engineering Task Force [Page 73] + + + + +RFC1122 INTERNET LAYER October 1989 + + + Echo client |3.2.2.6 | |x| | | | + Discard Echo Request to broadcast address |3.2.2.6 | | |x| | | + Discard Echo Request to multicast address |3.2.2.6 | | |x| | | + Use specific-dest addr as Echo Reply src |3.2.2.6 |x| | | | | + Send same data in Echo Reply |3.2.2.6 |x| | | | | + Pass Echo Reply to higher layer |3.2.2.6 |x| | | | | + Reflect Record Route, Time Stamp options |3.2.2.6 | |x| | | | + Reverse and reflect Source Route option |3.2.2.6 |x| | | | | + | | | | | | | + ICMP Information Request or Reply: |3.2.2.7 | | | |x| | + ICMP Timestamp and Timestamp Reply: |3.2.2.8 | | |x| | | + Minimize delay variability |3.2.2.8 | |x| | | |1 + Silently discard b'cast Timestamp |3.2.2.8 | | |x| | |1 + Silently discard m'cast Timestamp |3.2.2.8 | | |x| | |1 + Use specific-dest addr as TS Reply src |3.2.2.8 |x| | | | |1 + Reflect Record Route, Time Stamp options |3.2.2.6 | |x| | | |1 + Reverse and reflect Source Route option |3.2.2.8 |x| | | | |1 + Pass Timestamp Reply to higher layer |3.2.2.8 |x| | | | |1 + Obey rules for "standard value" |3.2.2.8 |x| | | | |1 + | | | | | | | + ICMP Address Mask Request and Reply: | | | | | | | + Addr Mask source configurable |3.2.2.9 |x| | | | | + Support static configuration of addr mask |3.2.2.9 |x| | | | | + Get addr mask dynamically during booting |3.2.2.9 | | |x| | | + Get addr via ICMP Addr Mask Request/Reply 
|3.2.2.9 | | |x| | | + Retransmit Addr Mask Req if no Reply |3.2.2.9 |x| | | | |3 + Assume default mask if no Reply |3.2.2.9 | |x| | | |3 + Update address mask from first Reply only |3.2.2.9 |x| | | | |3 + Reasonableness check on Addr Mask |3.2.2.9 | |x| | | | + Send unauthorized Addr Mask Reply msgs |3.2.2.9 | | | | |x| + Explicitly configured to be agent |3.2.2.9 |x| | | | | + Static config=> Addr-Mask-Authoritative flag |3.2.2.9 | |x| | | | + Broadcast Addr Mask Reply when init. |3.2.2.9 |x| | | | |3 + | | | | | | | +ROUTING OUTBOUND DATAGRAMS: | | | | | | | + Use address mask in local/remote decision |3.3.1.1 |x| | | | | + Operate with no gateways on conn network |3.3.1.1 |x| | | | | + Maintain "route cache" of next-hop gateways |3.3.1.2 |x| | | | | + Treat Host and Net Redirect the same |3.3.1.2 | |x| | | | + If no cache entry, use default gateway |3.3.1.2 |x| | | | | + Support multiple default gateways |3.3.1.2 |x| | | | | + Provide table of static routes |3.3.1.2 | | |x| | | + Flag: route overridable by Redirects |3.3.1.2 | | |x| | | + Key route cache on host, not net address |3.3.1.3 | | |x| | | + Include TOS in route cache |3.3.1.3 | |x| | | | + | | | | | | | + Able to detect failure of next-hop gateway |3.3.1.4 |x| | | | | + Assume route is good forever |3.3.1.4 | | | |x| | + + + +Internet Engineering Task Force [Page 74] + + + + +RFC1122 INTERNET LAYER October 1989 + + + Ping gateways continuously |3.3.1.4 | | | | |x| + Ping only when traffic being sent |3.3.1.4 |x| | | | | + Ping only when no positive indication |3.3.1.4 |x| | | | | + Higher and lower layers give advice |3.3.1.4 | |x| | | | + Switch from failed default g'way to another |3.3.1.5 |x| | | | | + Manual method of entering config info |3.3.1.6 |x| | | | | + | | | | | | | +REASSEMBLY and FRAGMENTATION: | | | | | | | + Able to reassemble incoming datagrams |3.3.2 |x| | | | | + At least 576 byte datagrams |3.3.2 |x| | | | | + EMTU_R configurable or indefinite |3.3.2 | |x| | | | + Transport layer 
able to learn MMS_R |3.3.2 |x| | | | | + Send ICMP Time Exceeded on reassembly timeout |3.3.2 |x| | | | | + Fixed reassembly timeout value |3.3.2 | |x| | | | + | | | | | | | + Pass MMS_S to higher layers |3.3.3 |x| | | | | + Local fragmentation of outgoing packets |3.3.3 | | |x| | | + Else don't send bigger than MMS_S |3.3.3 |x| | | | | + Send max 576 to off-net destination |3.3.3 | |x| | | | + All-Subnets-MTU configuration flag |3.3.3 | | |x| | | + | | | | | | | +MULTIHOMING: | | | | | | | + Reply with same addr as spec-dest addr |3.3.4.2 | |x| | | | + Allow application to choose local IP addr |3.3.4.2 |x| | | | | + Silently discard d'gram in "wrong" interface |3.3.4.2 | | |x| | | + Only send d'gram through "right" interface |3.3.4.2 | | |x| | |4 + | | | | | | | +SOURCE-ROUTE FORWARDING: | | | | | | | + Forward datagram with Source Route option |3.3.5 | | |x| | |1 + Obey corresponding gateway rules |3.3.5 |x| | | | |1 + Update TTL by gateway rules |3.3.5 |x| | | | |1 + Able to generate ICMP err code 4, 5 |3.3.5 |x| | | | |1 + IP src addr not local host |3.3.5 | | |x| | |1 + Update Timestamp, Record Route options |3.3.5 |x| | | | |1 + Configurable switch for non-local SRing |3.3.5 |x| | | | |1 + Defaults to OFF |3.3.5 |x| | | | |1 + Satisfy gwy access rules for non-local SRing |3.3.5 |x| | | | |1 + If not forward, send Dest Unreach (cd 5) |3.3.5 | |x| | | |2 + | | | | | | | +BROADCAST: | | | | | | | + Broadcast addr as IP source addr |3.2.1.3 | | | | |x| + Receive 0 or -1 broadcast formats OK |3.3.6 | |x| | | | + Config'ble option to send 0 or -1 b'cast |3.3.6 | | |x| | | + Default to -1 broadcast |3.3.6 | |x| | | | + Recognize all broadcast address formats |3.3.6 |x| | | | | + Use IP b'cast/m'cast addr in link-layer b'cast |3.3.6 |x| | | | | + Silently discard link-layer-only b'cast dg's |3.3.6 | |x| | | | + Use Limited Broadcast addr for connected net |3.3.6 | |x| | | | + + + +Internet Engineering Task Force [Page 75] + + + + +RFC1122 INTERNET LAYER October 1989 
+ + + | | | | | | | +MULTICAST: | | | | | | | + Support local IP multicasting (RFC-1112) |3.3.7 | |x| | | | + Support IGMP (RFC-1112) |3.3.7 | | |x| | | + Join all-hosts group at startup |3.3.7 | |x| | | | + Higher layers learn i'face m'cast capability |3.3.7 | |x| | | | + | | | | | | | +INTERFACE: | | | | | | | + Allow transport layer to use all IP mechanisms |3.4 |x| | | | | + Pass interface ident up to transport layer |3.4 |x| | | | | + Pass all IP options up to transport layer |3.4 |x| | | | | + Transport layer can send certain ICMP messages |3.4 |x| | | | | + Pass spec'd ICMP messages up to transp. layer |3.4 |x| | | | | + Include IP hdr+8 octets or more from orig. |3.4 |x| | | | | + Able to leap tall buildings at a single bound |3.5 | |x| | | | + +Footnotes: + +(1) Only if feature is implemented. + +(2) This requirement is overruled if datagram is an ICMP error message. + +(3) Only if feature is implemented and is configured "on". + +(4) Unless has embedded gateway functionality or is source routed. + + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 76] + + + + +RFC1122 TRANSPORT LAYER -- UDP October 1989 + + +4. TRANSPORT PROTOCOLS + + 4.1 USER DATAGRAM PROTOCOL -- UDP + + 4.1.1 INTRODUCTION + + The User Datagram Protocol UDP [UDP:1] offers only a minimal + transport service -- non-guaranteed datagram delivery -- and + gives applications direct access to the datagram service of the + IP layer. UDP is used by applications that do not require the + level of service of TCP or that wish to use communications + services (e.g., multicast or broadcast delivery) not available + from TCP. + + UDP is almost a null protocol; the only services it provides + over IP are checksumming of data and multiplexing by port + number. 
Therefore, an application program running over UDP + must deal directly with end-to-end communication problems that + a connection-oriented protocol would have handled -- e.g., + retransmission for reliable delivery, packetization and + reassembly, flow control, congestion avoidance, etc., when + these are required. The fairly complex coupling between IP and + TCP will be mirrored in the coupling between UDP and many + applications using UDP. + + 4.1.2 PROTOCOL WALK-THROUGH + + There are no known errors in the specification of UDP. + + 4.1.3 SPECIFIC ISSUES + + 4.1.3.1 Ports + + UDP well-known ports follow the same rules as TCP well-known + ports; see Section 4.2.2.1 below. + + If a datagram arrives addressed to a UDP port for which + there is no pending LISTEN call, UDP SHOULD send an ICMP + Port Unreachable message. + + 4.1.3.2 IP Options + + UDP MUST pass any IP option that it receives from the IP + layer transparently to the application layer. + + An application MUST be able to specify IP options to be sent + in its UDP datagrams, and UDP MUST pass these options to the + IP layer. + + + +Internet Engineering Task Force [Page 77] + + + + +RFC1122 TRANSPORT LAYER -- UDP October 1989 + + + DISCUSSION: + At present, the only options that need be passed + through UDP are Source Route, Record Route, and Time + Stamp. However, new options may be defined in the + future, and UDP need not and should not make any + assumptions about the format or content of options it + passes to or from the application; an exception to this + might be an IP-layer security option. + + An application based on UDP will need to obtain a + source route from a request datagram and supply a + reversed route for sending the corresponding reply. + + 4.1.3.3 ICMP Messages + + UDP MUST pass to the application layer all ICMP error + messages that it receives from the IP layer. Conceptually + at least, this may be accomplished with an upcall to the + ERROR_REPORT routine (see Section 4.2.4.1). 
+ + DISCUSSION: + Note that ICMP error messages resulting from sending a + UDP datagram are received asynchronously. A UDP-based + application that wants to receive ICMP error messages + is responsible for maintaining the state necessary to + demultiplex these messages when they arrive; for + example, the application may keep a pending receive + operation for this purpose. The application is also + responsible for avoiding confusion from a delayed ICMP + error message resulting from an earlier use of the same + port(s). + + 4.1.3.4 UDP Checksums + + A host MUST implement the facility to generate and validate + UDP checksums. An application MAY optionally be able to + control whether a UDP checksum will be generated, but it + MUST default to checksumming on. + + If a UDP datagram is received with a checksum that is + non-zero and invalid, UDP MUST silently discard the datagram. + An application MAY optionally be able to control whether UDP + datagrams without checksums should be discarded or passed to + the application. + + DISCUSSION: + Some applications that normally run only across local + area networks have chosen to turn off UDP checksums for + efficiency. As a result, numerous cases of undetected + errors have been reported. The advisability of ever + turning off UDP checksumming is very controversial. + + IMPLEMENTATION: + There is a common implementation error in UDP + checksums. Unlike the TCP checksum, the UDP checksum + is optional; the value zero is transmitted in the + checksum field of a UDP header to indicate the absence + of a checksum. If the transmitter really calculates a + UDP checksum of zero, it must transmit the checksum as + all 1's (65535). No special action is required at the + receiver, since zero and 65535 are equivalent in 1's + complement arithmetic.
+ + 4.1.3.5 UDP Multihoming + + When a UDP datagram is received, its specific-destination + address MUST be passed up to the application layer. + + An application program MUST be able to specify the IP source + address to be used for sending a UDP datagram or to leave it + unspecified (in which case the networking software will + choose an appropriate source address). There SHOULD be a + way to communicate the chosen source address up to the + application layer (e.g., so that the application can later + receive a reply datagram only from the corresponding + interface). + + DISCUSSION: + A request/response application that uses UDP should use + a source address for the response that is the same as + the specific-destination address of the request. See + the "General Issues" section of [INTRO:1]. + + 4.1.3.6 Invalid Addresses + + A UDP datagram received with an invalid IP source address + (e.g., a broadcast or multicast address) must be discarded + by UDP or by the IP layer (see Section 3.2.1.3). + + When a host sends a UDP datagram, the source address MUST be + (one of) the IP address(es) of the host. + + 4.1.4 UDP/APPLICATION LAYER INTERFACE + + The application interface to UDP MUST provide the full services + of the IP/transport interface described in Section 3.4 of this + document. Thus, an application using UDP needs the functions + of the GET_SRCADDR(), GET_MAXSIZES(), ADVISE_DELIVPROB(), and + RECV_ICMP() calls described in Section 3.4. For example, + GET_MAXSIZES() can be used to learn the effective maximum UDP + datagram size for a particular {interface, remote host, TOS} + triplet. + + An application-layer program MUST be able to set the TTL and + TOS values as well as IP options for sending a UDP datagram, + and these values must be passed transparently to the IP layer. + UDP MAY pass the received TOS up to the application layer.
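As one illustration of an application setting TTL and TOS for its UDP datagrams, the sketch below assumes a BSD-style sockets interface as exposed by Python's socket module; the helper name open_udp_socket is hypothetical, and the availability of the IP_TOS option varies by platform. It is one possible realization of the conceptual interface, not a required one.

```python
import socket


def open_udp_socket(ttl: int = 64, tos: int = 0) -> socket.socket:
    """Create a UDP socket whose outgoing datagrams use the given
    Time-to-Live and Type-of-Service values."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Both values are handed to the IP layer, which places them in
    # the header of every datagram sent on this socket.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock
```

An application would then call sendto() on the returned socket as usual; the options stay in effect until they are changed or the socket is closed.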
+
+      4.1.5  UDP REQUIREMENTS SUMMARY
+
+
+                                                 |        | | | |S| |
+                                                 |        | | | |H| |F
+                                                 |        | | | |O|M|o
+                                                 |        | |S| |U|U|o
+                                                 |        | |H| |L|S|t
+                                                 |        |M|O| |D|T|n
+                                                 |        |U|U|M| | |o
+                                                 |        |S|L|A|N|N|t
+                                                 |        |T|D|Y|O|O|t
+FEATURE                                          |SECTION | | | |T|T|e
+-------------------------------------------------|--------|-|-|-|-|-|--
+                                                 |        | | | | | |
+    UDP                                          |        | | | | | |
+-------------------------------------------------|--------|-|-|-|-|-|--
+                                                 |        | | | | | |
+UDP send Port Unreachable                        |4.1.3.1 | |x| | | |
+                                                 |        | | | | | |
+IP Options in UDP                                |        | | | | | |
+  - Pass rcv'd IP options to applic layer        |4.1.3.2 |x| | | | |
+  - Applic layer can specify IP options in Send  |4.1.3.2 |x| | | | |
+  - UDP passes IP options down to IP layer       |4.1.3.2 |x| | | | |
+                                                 |        | | | | | |
+Pass ICMP msgs up to applic layer                |4.1.3.3 |x| | | | |
+                                                 |        | | | | | |
+UDP checksums:                                   |        | | | | | |
+  - Able to generate/check checksum              |4.1.3.4 |x| | | | |
+  - Silently discard bad checksum                |4.1.3.4 |x| | | | |
+  - Sender Option to not generate checksum       |4.1.3.4 | | |x| | |
+  - Default is to checksum                       |4.1.3.4 |x| | | | |
+  - Receiver Option to require checksum          |4.1.3.4 | | |x| | |
+                                                 |        | | | | | |
+UDP Multihoming                                  |        | | | | | |
+  - Pass spec-dest addr to application           |4.1.3.5 |x| | | | |
+  - Applic layer can specify Local IP addr       |4.1.3.5 |x| | | | |
+  - Applic layer specify wild Local IP addr      |4.1.3.5 |x| | | | |
+  - Applic layer notified of Local IP addr used  |4.1.3.5 | |x| | | |
+                                                 |        | | | | | |
+Bad IP src addr silently discarded by UDP/IP     |4.1.3.6 |x| | | | |
+Only send valid IP source address                |4.1.3.6 |x| | | | |
+UDP Application Interface Services               |        | | | | | |
+Full IP interface of 3.4 for application         |4.1.4   |x| | | | |
+  - Able to spec TTL, TOS, IP opts when send dg  |4.1.4   |x| | | | |
+  - Pass received TOS up to applic layer         |4.1.4   | | |x| | |
+
+   4.2  TRANSMISSION CONTROL PROTOCOL -- TCP
+
+      4.2.1  INTRODUCTION
+
+         The Transmission Control Protocol TCP [TCP:1] is the primary
+         virtual-circuit transport protocol for the Internet suite. TCP
+         provides reliable, in-sequence delivery of a full-duplex stream
+         of octets (8-bit bytes). TCP is used by those applications
+         needing reliable, connection-oriented transport service, e.g.,
+         mail (SMTP), file transfer (FTP), and virtual terminal service
+         (Telnet); requirements for these application-layer protocols
+         are described in [INTRO:1].
+
+      4.2.2  PROTOCOL WALK-THROUGH
+
+         4.2.2.1  Well-Known Ports: RFC-793 Section 2.7
+
+            DISCUSSION:
+                 TCP reserves port numbers in the range 0-255 for
+                 "well-known" ports, used to access services that are
+                 standardized across the Internet. The remainder of the
+                 port space can be freely allocated to application
+                 processes. Current well-known port definitions are
+                 listed in the RFC entitled "Assigned Numbers"
+                 [INTRO:6]. A prerequisite for defining a new
+                 well-known port is an RFC documenting the proposed
+                 service in enough detail to allow new implementations.
+
+                 Some systems extend this notion by adding a third
+                 subdivision of the TCP port space: reserved ports,
+                 which are generally used for operating-system-specific
+                 services. For example, reserved ports might fall
+                 between 256 and some system-dependent upper limit.
+                 Some systems further choose to protect well-known and
+                 reserved ports by permitting only privileged users to
+                 open TCP connections with those port values. This is
+                 perfectly reasonable as long as the host does not
+                 assume that all hosts protect their low-numbered ports
+                 in this manner.
+
+         4.2.2.2  Use of Push: RFC-793 Section 2.8
+
+            When an application issues a series of SEND calls without
+            setting the PUSH flag, the TCP MAY aggregate the data
+            internally without sending it.
Similarly, when a series of + segments is received without the PSH bit, a TCP MAY queue + the data internally without passing it to the receiving + application. + + + +Internet Engineering Task Force [Page 82] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + The PSH bit is not a record marker and is independent of + segment boundaries. The transmitter SHOULD collapse + successive PSH bits when it packetizes data, to send the + largest possible segment. + + A TCP MAY implement PUSH flags on SEND calls. If PUSH flags + are not implemented, then the sending TCP: (1) must not + buffer data indefinitely, and (2) MUST set the PSH bit in + the last buffered segment (i.e., when there is no more + queued data to be sent). + + The discussion in RFC-793 on pages 48, 50, and 74 + erroneously implies that a received PSH flag must be passed + to the application layer. Passing a received PSH flag to + the application layer is now OPTIONAL. + + An application program is logically required to set the PUSH + flag in a SEND call whenever it needs to force delivery of + the data to avoid a communication deadlock. However, a TCP + SHOULD send a maximum-sized segment whenever possible, to + improve performance (see Section 4.2.3.4). + + DISCUSSION: + When the PUSH flag is not implemented on SEND calls, + i.e., when the application/TCP interface uses a pure + streaming model, responsibility for aggregating any + tiny data fragments to form reasonable sized segments + is partially borne by the application layer. + + Generally, an interactive application protocol must set + the PUSH flag at least in the last SEND call in each + command or response sequence. A bulk transfer protocol + like FTP should set the PUSH flag on the last segment + of a file or when necessary to prevent buffer deadlock. + + At the receiver, the PSH bit forces buffered data to be + delivered to the application (even if less than a full + buffer has been received). 
Conversely, the lack of a + PSH bit can be used to avoid unnecessary wakeup calls + to the application process; this can be an important + performance optimization for large timesharing hosts. + Passing the PSH bit to the receiving application allows + an analogous optimization within the application. + + 4.2.2.3 Window Size: RFC-793 Section 3.1 + + The window size MUST be treated as an unsigned number, or + else large window sizes will appear like negative windows + + + +Internet Engineering Task Force [Page 83] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + and TCP will not work. It is RECOMMENDED that + implementations reserve 32-bit fields for the send and + receive window sizes in the connection record and do all + window computations with 32 bits. + + DISCUSSION: + It is known that the window field in the TCP header is + too small for high-speed, long-delay paths. + Experimental TCP options have been defined to extend + the window size; see for example [TCP:11]. In + anticipation of the adoption of such an extension, TCP + implementors should treat windows as 32 bits. + + 4.2.2.4 Urgent Pointer: RFC-793 Section 3.1 + + The second sentence is in error: the urgent pointer points + to the sequence number of the LAST octet (not LAST+1) in a + sequence of urgent data. The description on page 56 (last + sentence) is correct. + + A TCP MUST support a sequence of urgent data of any length. + + A TCP MUST inform the application layer asynchronously + whenever it receives an Urgent pointer and there was + previously no pending urgent data, or whenever the Urgent + pointer advances in the data stream. There MUST be a way + for the application to learn how much urgent data remains to + be read from the connection, or at least to determine + whether or not more urgent data remains to be read. 
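The two points above (the unsigned window of Section 4.2.2.3 and the urgent pointer correction of Section 4.2.2.4) can be sketched as follows. This is a minimal illustration, not a prescribed implementation; the function names are hypothetical, and the header layout assumed is the fixed 20-octet TCP header of RFC-793 with the window field in octets 14-15.

```python
import struct

MOD32 = 1 << 32  # sequence numbers are 32-bit and wrap around


def window_from_header(tcp_header: bytes) -> int:
    """Return the 16-bit window field, decoded as an UNSIGNED number."""
    (window,) = struct.unpack_from("!H", tcp_header, 14)
    return window


def last_urgent_octet(seg_seq: int, urgent_pointer: int) -> int:
    """Sequence number of the LAST octet of urgent data (not LAST+1)."""
    return (seg_seq + urgent_pointer) % MOD32


# A header advertising the largest possible window.
header = bytearray(20)
struct.pack_into("!H", header, 14, 0xFFFF)

unsigned_window = window_from_header(bytes(header))
# Decoding the same field as a signed number shows the failure mode
# the text warns about: the window appears negative.
(signed_misread,) = struct.unpack_from("!h", header, 14)
```

Keeping the decoded value in an ordinary (wide) integer also follows the recommendation to do window arithmetic in at least 32 bits.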
+ + DISCUSSION: + Although the Urgent mechanism may be used for any + application, it is normally used to send "interrupt"- + type commands to a Telnet program (see "Using Telnet + Synch Sequence" section in [INTRO:1]). + + The asynchronous or "out-of-band" notification will + allow the application to go into "urgent mode", reading + data from the TCP connection. This allows control + commands to be sent to an application whose normal + input buffers are full of unprocessed data. + + IMPLEMENTATION: + The generic ERROR-REPORT() upcall described in Section + 4.2.4.1 is a possible mechanism for informing the + application of the arrival of urgent data. + + + + + +Internet Engineering Task Force [Page 84] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + 4.2.2.5 TCP Options: RFC-793 Section 3.1 + + A TCP MUST be able to receive a TCP option in any segment. + A TCP MUST ignore without error any TCP option it does not + implement, assuming that the option has a length field (all + TCP options defined in the future will have length fields). + TCP MUST be prepared to handle an illegal option length + (e.g., zero) without crashing; a suggested procedure is to + reset the connection and log the reason. + + 4.2.2.6 Maximum Segment Size Option: RFC-793 Section 3.1 + + TCP MUST implement both sending and receiving the Maximum + Segment Size option [TCP:4]. + + TCP SHOULD send an MSS (Maximum Segment Size) option in + every SYN segment when its receive MSS differs from the + default 536, and MAY send it always. + + If an MSS option is not received at connection setup, TCP + MUST assume a default send MSS of 536 (576-40) [TCP:4]. 
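A hedged sketch of the option handling in Sections 4.2.2.5 and 4.2.2.6 follows. It assumes the option kinds defined in RFC-793 (0 = End of Option List, 1 = No-Operation, 2 = Maximum Segment Size with length 4); the function name is hypothetical, and raising ValueError merely stands in for "reset the connection and log the reason".

```python
DEFAULT_SEND_MSS = 536  # 576 - 40, assumed when no MSS option is received

KIND_EOL, KIND_NOP, KIND_MSS = 0, 1, 2


def send_mss_from_options(options: bytes) -> int:
    """Walk the TCP options of a SYN segment and return the send MSS."""
    mss = DEFAULT_SEND_MSS
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == KIND_EOL:
            break
        if kind == KIND_NOP:
            i += 1
            continue
        # Every other option has a length octet that counts the kind
        # octet, the length octet itself, and the option data.
        if i + 1 >= len(options):
            raise ValueError("option truncated before its length octet")
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            # Illegal length (e.g., zero): must not crash or loop forever.
            raise ValueError(f"illegal TCP option length {length}")
        if kind == KIND_MSS and length == 4:
            mss = int.from_bytes(options[i + 2:i + 4], "big")
        # Options that are not implemented are skipped without error.
        i += length
    return mss
```

For example, an empty option list yields the default of 536, while the octets 02 04 05 B4 yield 1460.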
+
+            The maximum size of a segment that TCP really sends, the
+            "effective send MSS," MUST be the smaller of the send MSS
+            (which reflects the available reassembly buffer size at the
+            remote host) and the largest size permitted by the IP layer:
+
+               Eff.snd.MSS =
+
+                  min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize
+
+            where:
+
+            *    SendMSS is the MSS value received from the remote host,
+                 or the default 536 if no MSS option is received.
+
+            *    MMS_S is the maximum size for a transport-layer message
+                 that TCP may send.
+
+            *    TCPhdrsize is the size of the TCP header; this is
+                 normally 20, but may be larger if TCP options are to be
+                 sent.
+
+            *    IPoptionsize is the size of any IP options that TCP
+                 will pass to the IP layer with the current message.
+
+
+            The MSS value to be sent in an MSS option must be less than
+            or equal to:
+
+               MMS_R - 20
+
+            where MMS_R is the maximum size for a transport-layer
+            message that can be received (and reassembled). TCP obtains
+            MMS_R and MMS_S from the IP layer; see the generic call
+            GET_MAXSIZES in Section 3.4.
+
+            DISCUSSION:
+                 The choice of TCP segment size has a strong effect on
+                 performance. Larger segments increase throughput by
+                 amortizing header size and per-datagram processing
+                 overhead over more data bytes; however, if the packet
+                 is so large that it causes IP fragmentation, efficiency
+                 drops sharply if any fragments are lost [IP:9].
+
+                 Some TCP implementations send an MSS option only if the
+                 destination host is on a non-connected network.
+                 However, in general the TCP layer may not have the
+                 appropriate information to make this decision, so it is
+                 preferable to leave to the IP layer the task of
+                 determining a suitable MTU for the Internet path. We
+                 therefore recommend that TCP always send the option (if
+                 not 536) and that the IP layer determine MMS_R as
+                 specified in 3.3.3 and 3.4.
A proposed IP-layer + mechanism to measure the MTU would then modify the IP + layer without changing TCP. + + 4.2.2.7 TCP Checksum: RFC-793 Section 3.1 + + Unlike the UDP checksum (see Section 4.1.3.4), the TCP + checksum is never optional. The sender MUST generate it and + the receiver MUST check it. + + 4.2.2.8 TCP Connection State Diagram: RFC-793 Section 3.2, + page 23 + + There are several problems with this diagram: + + (a) The arrow from SYN-SENT to SYN-RCVD should be labeled + with "snd SYN,ACK", to agree with the text on page 68 + and with Figure 8. + + (b) There could be an arrow from SYN-RCVD state to LISTEN + state, conditioned on receiving a RST after a passive + open (see text page 70). + + + + +Internet Engineering Task Force [Page 86] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + (c) It is possible to go directly from FIN-WAIT-1 to the + TIME-WAIT state (see page 75 of the spec). + + + 4.2.2.9 Initial Sequence Number Selection: RFC-793 Section + 3.3, page 27 + + A TCP MUST use the specified clock-driven selection of + initial sequence numbers. + + 4.2.2.10 Simultaneous Open Attempts: RFC-793 Section 3.4, page + 32 + + There is an error in Figure 8: the packet on line 7 should + be identical to the packet on line 5. + + A TCP MUST support simultaneous open attempts. + + DISCUSSION: + It sometimes surprises implementors that if two + applications attempt to simultaneously connect to each + other, only one connection is generated instead of two. + This was an intentional design decision; don't try to + "fix" it. + + 4.2.2.11 Recovery from Old Duplicate SYN: RFC-793 Section 3.4, + page 33 + + Note that a TCP implementation MUST keep track of whether a + connection has reached SYN_RCVD state as the result of a + passive OPEN or an active OPEN. + + 4.2.2.12 RST Segment: RFC-793 Section 3.4 + + A TCP SHOULD allow a received RST segment to include data. 
+
+            DISCUSSION:
+                 It has been suggested that a RST segment could contain
+                 ASCII text that encoded and explained the cause of the
+                 RST. No standard has yet been established for such
+                 data.
+
+         4.2.2.13  Closing a Connection: RFC-793 Section 3.5
+
+            A TCP connection may terminate in two ways: (1) the normal
+            TCP close sequence using a FIN handshake, and (2) an "abort"
+            in which one or more RST segments are sent and the
+            connection state is immediately discarded. If a TCP
+            connection is closed by the remote site, the local
+            application MUST be informed whether it closed normally or
+            was aborted.
+
+            The normal TCP close sequence delivers buffered data
+            reliably in both directions. Since the two directions of a
+            TCP connection are closed independently, it is possible for
+            a connection to be "half closed," i.e., closed in only one
+            direction, and a host is permitted to continue sending data
+            in the open direction on a half-closed connection.
+
+            A host MAY implement a "half-duplex" TCP close sequence, so
+            that an application that has called CLOSE cannot continue to
+            read data from the connection. If such a host issues a
+            CLOSE call while received data is still pending in TCP, or
+            if new data is received after CLOSE is called, its TCP
+            SHOULD send a RST to show that data was lost.
+
+            When a connection is closed actively, it MUST linger in
+            TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
+            However, it MAY accept a new SYN from the remote TCP to
+            reopen the connection directly from TIME-WAIT state, if it:
+
+            (1)  assigns its initial sequence number for the new
+                 connection to be larger than the largest sequence
+                 number it used on the previous connection incarnation,
+                 and
+
+            (2)  returns to TIME-WAIT state if the SYN turns out to be
+                 an old duplicate.
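Condition (1) above can be sketched with 32-bit sequence-number arithmetic. This is an illustrative assumption about how an implementation might pick the value, not a required method; the names seq_gt and isn_for_reopen are hypothetical, and clock_isn stands for whatever the clock-driven ISN generator would have produced.

```python
MOD32 = 1 << 32


def seq_gt(a: int, b: int) -> bool:
    """True if sequence number a is 'larger than' b, modulo 2**32."""
    return 0 < (a - b) % MOD32 < (1 << 31)


def isn_for_reopen(clock_isn: int, prev_max_seq: int) -> int:
    """Pick an ISN for a connection reopened from TIME-WAIT state.

    The result is always larger (in sequence space) than the largest
    sequence number used by the previous incarnation.
    """
    if seq_gt(clock_isn, prev_max_seq):
        return clock_isn
    return (prev_max_seq + 1) % MOD32
```

For example, if the previous incarnation used sequence numbers up to 100 and the clock-driven value is only 50, the sketch returns 101.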
+ + + DISCUSSION: + TCP's full-duplex data-preserving close is a feature + that is not included in the analogous ISO transport + protocol TP4. + + Some systems have not implemented half-closed + connections, presumably because they do not fit into + the I/O model of their particular operating system. On + these systems, once an application has called CLOSE, it + can no longer read input data from the connection; this + is referred to as a "half-duplex" TCP close sequence. + + The graceful close algorithm of TCP requires that the + connection state remain defined on (at least) one end + of the connection, for a timeout period of 2xMSL, i.e., + 4 minutes. During this period, the (remote socket, + + + +Internet Engineering Task Force [Page 88] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + local socket) pair that defines the connection is busy + and cannot be reused. To shorten the time that a given + port pair is tied up, some TCPs allow a new SYN to be + accepted in TIME-WAIT state. + + 4.2.2.14 Data Communication: RFC-793 Section 3.7, page 40 + + Since RFC-793 was written, there has been extensive work on + TCP algorithms to achieve efficient data communication. + Later sections of the present document describe required and + recommended TCP algorithms to determine when to send data + (Section 4.2.3.4), when to send an acknowledgment (Section + 4.2.3.2), and when to update the window (Section 4.2.3.3). + + DISCUSSION: + One important performance issue is "Silly Window + Syndrome" or "SWS" [TCP:5], a stable pattern of small + incremental window movements resulting in extremely + poor TCP performance. Algorithms to avoid SWS are + described below for both the sending side (Section + 4.2.3.4) and the receiving side (Section 4.2.3.3). 
+ + In brief, SWS is caused by the receiver advancing the + right window edge whenever it has any new buffer space + available to receive data and by the sender using any + incremental window, no matter how small, to send more + data [TCP:5]. The result can be a stable pattern of + sending tiny data segments, even though both sender and + receiver have a large total buffer space for the + connection. SWS can only occur during the transmission + of a large amount of data; if the connection goes + quiescent, the problem will disappear. It is caused by + typical straightforward implementation of window + management, but the sender and receiver algorithms + given below will avoid it. + + Another important TCP performance issue is that some + applications, especially remote login to character-at- + a-time hosts, tend to send streams of one-octet data + segments. To avoid deadlocks, every TCP SEND call from + such applications must be "pushed", either explicitly + by the application or else implicitly by TCP. The + result may be a stream of TCP segments that contain one + data octet each, which makes very inefficient use of + the Internet and contributes to Internet congestion. + The Nagle Algorithm described in Section 4.2.3.4 + provides a simple and effective solution to this + problem. It does have the effect of clumping + + + +Internet Engineering Task Force [Page 89] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + characters over Telnet connections; this may initially + surprise users accustomed to single-character echo, but + user acceptance has not been a problem. + + Note that the Nagle algorithm and the send SWS + avoidance algorithm play complementary roles in + improving performance. The Nagle algorithm discourages + sending tiny segments when the data to be sent + increases in small increments, while the SWS avoidance + algorithm discourages small segments resulting from the + right window edge advancing in small increments. 
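The cost of tiny segments can be made concrete with a little arithmetic. The sketch below assumes the 40 octets of header implied by the default MSS (576-40), i.e., a 20-octet IP header plus a 20-octet TCP header with no options; the function name is hypothetical.

```python
HEADER_OCTETS = 40  # 20-octet IP header + 20-octet TCP header, no options


def wire_efficiency(data_octets: int) -> float:
    """Fraction of each datagram that is user data."""
    return data_octets / (data_octets + HEADER_OCTETS)


one_octet = wire_efficiency(1)    # 1/41, roughly 2.4% of the datagram
full_size = wire_efficiency(536)  # 536/576, roughly 93% of the datagram
```

Under these assumptions a stream of one-octet segments carries about forty header octets for every octet of data, which is the inefficiency that both algorithms aim to reduce.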
+ + A careless implementation can send two or more + acknowledgment segments per data segment received. For + example, suppose the receiver acknowledges every data + segment immediately. When the application program + subsequently consumes the data and increases the + available receive buffer space again, the receiver may + send a second acknowledgment segment to update the + window at the sender. The extreme case occurs with + single-character segments on TCP connections using the + Telnet protocol for remote login service. Some + implementations have been observed in which each + incoming 1-character segment generates three return + segments: (1) the acknowledgment, (2) a one byte + increase in the window, and (3) the echoed character, + respectively. + + 4.2.2.15 Retransmission Timeout: RFC-793 Section 3.7, page 41 + + The algorithm suggested in RFC-793 for calculating the + retransmission timeout is now known to be inadequate; see + Section 4.2.3.1 below. + + Recent work by Jacobson [TCP:7] on Internet congestion and + TCP retransmission stability has produced a transmission + algorithm combining "slow start" with "congestion + avoidance". A TCP MUST implement this algorithm. + + If a retransmitted packet is identical to the original + packet (which implies not only that the data boundaries have + not changed, but also that the window and acknowledgment + fields of the header have not changed), then the same IP + Identification field MAY be used (see Section 3.2.1.5). + + IMPLEMENTATION: + Some TCP implementors have chosen to "packetize" the + data stream, i.e., to pick segment boundaries when + + + +Internet Engineering Task Force [Page 90] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + segments are originally sent and to queue these + segments in a "retransmission queue" until they are + acknowledged. 
Another design (which may be simpler) is + to defer packetizing until each time data is + transmitted or retransmitted, so there will be no + segment retransmission queue. + + In an implementation with a segment retransmission + queue, TCP performance may be enhanced by repacketizing + the segments awaiting acknowledgment when the first + retransmission timeout occurs. That is, the + outstanding segments that fitted would be combined into + one maximum-sized segment, with a new IP Identification + value. The TCP would then retain this combined segment + in the retransmit queue until it was acknowledged. + However, if the first two segments in the + retransmission queue totalled more than one maximum- + sized segment, the TCP would retransmit only the first + segment using the original IP Identification field. + + 4.2.2.16 Managing the Window: RFC-793 Section 3.7, page 41 + + A TCP receiver SHOULD NOT shrink the window, i.e., move the + right window edge to the left. However, a sending TCP MUST + be robust against window shrinking, which may cause the + "useable window" (see Section 4.2.3.4) to become negative. + + If this happens, the sender SHOULD NOT send new data, but + SHOULD retransmit normally the old unacknowledged data + between SND.UNA and SND.UNA+SND.WND. The sender MAY also + retransmit old data beyond SND.UNA+SND.WND, but SHOULD NOT + time out the connection if data beyond the right window edge + is not acknowledged. If the window shrinks to zero, the TCP + MUST probe it in the standard way (see next Section). + + DISCUSSION: + Many TCP implementations become confused if the window + shrinks from the right after data has been sent into a + larger window. Note that TCP has a heuristic to select + the latest window update despite possible datagram + reordering; as a result, it may ignore a window update + with a smaller window than previously offered if + neither the sequence number nor the acknowledgment + number is increased. 
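The effect of a shrinking window on the sender can be sketched using the "useable window" U = SND.UNA + SND.WND - SND.NXT of Section 4.2.3.4. The function names below are hypothetical, and the sketch covers only the bookkeeping, not retransmission timing.

```python
MOD32 = 1 << 32


def usable_window(snd_una: int, snd_wnd: int, snd_nxt: int) -> int:
    """U = SND.UNA + SND.WND - SND.NXT, as a signed quantity."""
    in_flight = (snd_nxt - snd_una) % MOD32  # sent but not acknowledged
    return snd_wnd - in_flight


def may_send_new_data(snd_una: int, snd_wnd: int, snd_nxt: int) -> bool:
    """New data is held back while the useable window is not positive."""
    return usable_window(snd_una, snd_wnd, snd_nxt) > 0


def retransmittable_octets(snd_una: int, snd_wnd: int, snd_nxt: int) -> int:
    """Old data lying between SND.UNA and SND.UNA+SND.WND."""
    in_flight = (snd_nxt - snd_una) % MOD32
    return min(in_flight, snd_wnd)


# 400 octets are in flight; the receiver then shrinks 500 -> 300.
before = usable_window(1000, 500, 1400)
after = usable_window(1000, 300, 1400)
```

In this example the useable window goes from 100 to -100: the sender stops sending new data but may still retransmit the 300 octets that remain inside the offered window.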
+ + + + + + + +Internet Engineering Task Force [Page 91] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + 4.2.2.17 Probing Zero Windows: RFC-793 Section 3.7, page 42 + + Probing of zero (offered) windows MUST be supported. + + A TCP MAY keep its offered receive window closed + indefinitely. As long as the receiving TCP continues to + send acknowledgments in response to the probe segments, the + sending TCP MUST allow the connection to stay open. + + DISCUSSION: + It is extremely important to remember that ACK + (acknowledgment) segments that contain no data are not + reliably transmitted by TCP. If zero window probing is + not supported, a connection may hang forever when an + ACK segment that re-opens the window is lost. + + The delay in opening a zero window generally occurs + when the receiving application stops taking data from + its TCP. For example, consider a printer daemon + application, stopped because the printer ran out of + paper. + + The transmitting host SHOULD send the first zero-window + probe when a zero window has existed for the retransmission + timeout period (see Section 4.2.2.15), and SHOULD increase + exponentially the interval between successive probes. + + DISCUSSION: + This procedure minimizes delay if the zero-window + condition is due to a lost ACK segment containing a + window-opening update. Exponential backoff is + recommended, possibly with some maximum interval not + specified here. This procedure is similar to that of + the retransmission algorithm, and it may be possible to + combine the two procedures in the implementation. + + 4.2.2.18 Passive OPEN Calls: RFC-793 Section 3.8 + + Every passive OPEN call either creates a new connection + record in LISTEN state, or it returns an error; it MUST NOT + affect any previously created connection record. 
+ + A TCP that supports multiple concurrent users MUST provide + an OPEN call that will functionally allow an application to + LISTEN on a port while a connection block with the same + local port is in SYN-SENT or SYN-RECEIVED state. + + DISCUSSION: + + + +Internet Engineering Task Force [Page 92] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + Some applications (e.g., SMTP servers) may need to + handle multiple connection attempts at about the same + time. The probability of a connection attempt failing + is reduced by giving the application some means of + listening for a new connection at the same time that an + earlier connection attempt is going through the three- + way handshake. + + IMPLEMENTATION: + Acceptable implementations of concurrent opens may + permit multiple passive OPEN calls, or they may allow + "cloning" of LISTEN-state connections from a single + passive OPEN call. + + 4.2.2.19 Time to Live: RFC-793 Section 3.9, page 52 + + RFC-793 specified that TCP was to request the IP layer to + send TCP segments with TTL = 60. This is obsolete; the TTL + value used to send TCP segments MUST be configurable. See + Section 3.2.1.7 for discussion. + + 4.2.2.20 Event Processing: RFC-793 Section 3.9 + + While it is not strictly required, a TCP SHOULD be capable + of queueing out-of-order TCP segments. Change the "may" in + the last sentence of the first paragraph on page 70 to + "should". + + DISCUSSION: + Some small-host implementations have omitted segment + queueing because of limited buffer space. This + omission may be expected to adversely affect TCP + throughput, since loss of a single segment causes all + later segments to appear to be "out of sequence". + + In general, the processing of received segments MUST be + implemented to aggregate ACK segments whenever possible. + For example, if the TCP is processing a series of queued + segments, it MUST process them all before sending any ACK + segments. 
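The ACK aggregation rule can be sketched as below: every queued segment is processed first, and a single cumulative ACK is produced afterwards. This is a simplified illustration with assumptions stated loudly: segments are (sequence number, length) pairs, sequence-number wraparound and partially overlapping segments are ignored, and the function name is hypothetical.

```python
def process_queued_segments(rcv_nxt, segments):
    """Process all queued segments, then emit one cumulative ACK.

    Returns the new RCV.NXT and the list of ACK numbers to send
    (always exactly one entry in this sketch).
    """
    out_of_order = {}  # seq -> length, held until the gap is filled
    for seq, length in segments:
        if seq >= rcv_nxt:
            out_of_order[seq] = length
        # Deliver everything that is now contiguous with RCV.NXT.
        while rcv_nxt in out_of_order:
            rcv_nxt += out_of_order.pop(rcv_nxt)
    # Only after the whole queue is processed is an ACK generated.
    return rcv_nxt, [rcv_nxt]
```

With RCV.NXT = 1000 and queued segments starting at 1100, 1000, and 1200, the sketch sends one ACK for 1250 rather than three separate ACK segments.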
+ + Here are some detailed error corrections and notes on the + Event Processing section of RFC-793. + + (a) CLOSE Call, CLOSE-WAIT state, p. 61: enter LAST-ACK + state, not CLOSING. + + (b) LISTEN state, check for SYN (pp. 65, 66): With a SYN + + + +Internet Engineering Task Force [Page 93] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + bit, if the security/compartment or the precedence is + wrong for the segment, a reset is sent. The wrong form + of reset is shown in the text; it should be: + + <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> + + + (c) SYN-SENT state, Check for SYN, p. 68: When the + connection enters ESTABLISHED state, the following + variables must be set: + SND.WND <- SEG.WND + SND.WL1 <- SEG.SEQ + SND.WL2 <- SEG.ACK + + + (d) Check security and precedence, p. 71: The first heading + "ESTABLISHED STATE" should really be a list of all + states other than SYN-RECEIVED: ESTABLISHED, FIN-WAIT- + 1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, and + TIME-WAIT. + + (e) Check SYN bit, p. 71: "In SYN-RECEIVED state and if + the connection was initiated with a passive OPEN, then + return this connection to the LISTEN state and return. + Otherwise...". + + (f) Check ACK field, SYN-RECEIVED state, p. 72: When the + connection enters ESTABLISHED state, the variables + listed in (c) must be set. + + (g) Check ACK field, ESTABLISHED state, p. 72: The ACK is a + duplicate if SEG.ACK =< SND.UNA (the = was omitted). + Similarly, the window should be updated if: SND.UNA =< + SEG.ACK =< SND.NXT. + + (h) USER TIMEOUT, p. 77: + + It would be better to notify the application of the + timeout rather than letting TCP force the connection + closed. However, see also Section 4.2.3.5. + + + 4.2.2.21 Acknowledging Queued Segments: RFC-793 Section 3.9 + + A TCP MAY send an ACK segment acknowledging RCV.NXT when a + valid segment arrives that is in the window but not at the + left window edge. 
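A minimal sketch of the optional behavior in Section 4.2.2.21 follows. The function names are hypothetical, and only the segment's first sequence number is checked against the window, which is a simplification of the full acceptability test in RFC-793.

```python
MOD32 = 1 << 32


def in_receive_window(rcv_nxt: int, rcv_wnd: int, seg_seq: int) -> bool:
    """True if seg_seq falls in [RCV.NXT, RCV.NXT+RCV.WND), modulo 2**32."""
    return (seg_seq - rcv_nxt) % MOD32 < rcv_wnd


def ack_for_out_of_order(rcv_nxt: int, rcv_wnd: int, seg_seq: int):
    """Return the ACK number to send for an out-of-order arrival.

    Returns RCV.NXT when the segment is in the window but not at the
    left window edge; returns None when this rule does not apply.
    """
    if seg_seq != rcv_nxt and in_receive_window(rcv_nxt, rcv_wnd, seg_seq):
        return rcv_nxt
    return None
```

A segment at the left window edge is handled by normal in-order processing, so the sketch returns None for it.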
+ + + + +Internet Engineering Task Force [Page 94] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + DISCUSSION: + RFC-793 (see page 74) was ambiguous about whether or + not an ACK segment should be sent when an out-of-order + segment was received, i.e., when SEG.SEQ was unequal to + RCV.NXT. + + One reason for ACKing out-of-order segments might be to + support an experimental algorithm known as "fast + retransmit". With this algorithm, the sender uses the + "redundant" ACK's to deduce that a segment has been + lost before the retransmission timer has expired. It + counts the number of times an ACK has been received + with the same value of SEG.ACK and with the same right + window edge. If more than a threshold number of such + ACK's is received, then the segment containing the + octets starting at SEG.ACK is assumed to have been lost + and is retransmitted, without awaiting a timeout. The + threshold is chosen to compensate for the maximum + likely segment reordering in the Internet. There is + not yet enough experience with the fast retransmit + algorithm to determine how useful it is. + + 4.2.3 SPECIFIC ISSUES + + 4.2.3.1 Retransmission Timeout Calculation + + A host TCP MUST implement Karn's algorithm and Jacobson's + algorithm for computing the retransmission timeout ("RTO"). + + o Jacobson's algorithm for computing the smoothed round- + trip ("RTT") time incorporates a simple measure of the + variance [TCP:7]. + + o Karn's algorithm for selecting RTT measurements ensures + that ambiguous round-trip times will not corrupt the + calculation of the smoothed round-trip time [TCP:6]. + + This implementation also MUST include "exponential backoff" + for successive RTO values for the same segment. + Retransmission of SYN segments SHOULD use the same algorithm + as data segments. + + DISCUSSION: + There were two known problems with the RTO calculations + specified in RFC-793. 
First, the accurate measurement + of RTTs is difficult when there are retransmissions. + Second, the algorithm to compute the smoothed round- + trip time is inadequate [TCP:7], because it incorrectly + + + +Internet Engineering Task Force [Page 95] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + assumed that the variance in RTT values would be small + and constant. These problems were solved by Karn's and + Jacobson's algorithm, respectively. + + The performance increase resulting from the use of + these improvements varies from noticeable to dramatic. + Jacobson's algorithm for incorporating the measured RTT + variance is especially important on a low-speed link, + where the natural variation of packet sizes causes a + large variation in RTT. One vendor found link + utilization on a 9.6kb line went from 10% to 90% as a + result of implementing Jacobson's variance algorithm in + TCP. + + The following values SHOULD be used to initialize the + estimation parameters for a new connection: + + (a) RTT = 0 seconds. + + (b) RTO = 3 seconds. (The smoothed variance is to be + initialized to the value that will result in this RTO). + + The recommended upper and lower bounds on the RTO are known + to be inadequate on large internets. The lower bound SHOULD + be measured in fractions of a second (to accommodate high + speed LANs) and the upper bound should be 2*MSL, i.e., 240 + seconds. + + DISCUSSION: + Experience has shown that these initialization values + are reasonable, and that in any case the Karn and + Jacobson algorithms make TCP behavior reasonably + insensitive to the initial parameter choices. + + 4.2.3.2 When to Send an ACK Segment + + A host that is receiving a stream of TCP data segments can + increase efficiency in both the Internet and the hosts by + sending fewer than one ACK (acknowledgment) segment per data + segment received; this is known as a "delayed ACK" [TCP:5]. 
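For the retransmission timeout calculation of Section 4.2.3.1, the following sketch may help. It is not taken from this document: the gains of 1/8 and 1/4 and the factor of 4 on the variance term are the constants commonly used in later formulations of Jacobson's algorithm and are assumptions here, as is the 0.2 second lower bound. The initial values (RTT = 0, RTO = 3 seconds) and the 240 second upper bound follow the text above; the class name is hypothetical.

```python
class RtoEstimator:
    ALPHA = 1 / 8  # gain for the smoothed RTT (assumed constant)
    BETA = 1 / 4   # gain for the smoothed variance (assumed constant)
    K = 4          # variance multiplier (assumed constant)

    def __init__(self, min_rto: float = 0.2, max_rto: float = 240.0):
        self.srtt = 0.0              # (a) RTT = 0 seconds
        self.rttvar = 3.0 / self.K   # (b) chosen so that RTO = 3 seconds
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.backoff = 1

    @property
    def rto(self) -> float:
        base = self.srtt + self.K * self.rttvar
        return min(self.max_rto, max(self.min_rto, base * self.backoff))

    def on_ack(self, rtt_sample: float, was_retransmitted: bool) -> None:
        if was_retransmitted:
            # Karn's algorithm: the sample is ambiguous, so it is not
            # used and the backed-off RTO stays in effect.
            return
        err = rtt_sample - self.srtt
        self.srtt += self.ALPHA * err
        self.rttvar += self.BETA * (abs(err) - self.rttvar)
        self.backoff = 1

    def on_timeout(self) -> None:
        # Exponential backoff for successive RTO values.
        self.backoff *= 2
```

With these assumptions a new connection starts at RTO = 3 seconds, two successive timeouts raise it to 12 seconds, and the first unambiguous 1 second sample brings it to 3.375 seconds.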
+
+            A TCP SHOULD implement a delayed ACK, but an ACK should not
+            be excessively delayed; in particular, the delay MUST be
+            less than 0.5 seconds, and in a stream of full-sized
+            segments there SHOULD be an ACK for at least every second
+            segment.
+
+            DISCUSSION:
+                 A delayed ACK gives the application an opportunity to
+                 update the window and perhaps to send an immediate
+                 response. In particular, in the case of character-mode
+                 remote login, a delayed ACK can reduce the number of
+                 segments sent by the server by a factor of 3 (ACK,
+                 window update, and echo character all combined in one
+                 segment).
+
+                 In addition, on some large multi-user hosts, a delayed
+                 ACK can substantially reduce protocol processing
+                 overhead by reducing the total number of packets to be
+                 processed [TCP:5]. However, excessive delays on ACK's
+                 can disturb the round-trip timing and packet "clocking"
+                 algorithms [TCP:7].
+
+         4.2.3.3  When to Send a Window Update
+
+            A TCP MUST include a SWS avoidance algorithm in the receiver
+            [TCP:5].
+
+            IMPLEMENTATION:
+                 The receiver's SWS avoidance algorithm determines when
+                 the right window edge may be advanced; this is
+                 customarily known as "updating the window". This
+                 algorithm combines with the delayed ACK algorithm (see
+                 Section 4.2.3.2) to determine when an ACK segment
+                 containing the current window will really be sent to
+                 the sender. We use the notation of RFC-793; see
+                 Figures 4 and 5 in that document.
+
+                 The solution to receiver SWS is to avoid advancing the
+                 right window edge RCV.NXT+RCV.WND in small increments,
+                 even if data is received from the network in small
+                 segments.
+
+                 Suppose the total receive buffer space is RCV.BUFF. At
+                 any given moment, RCV.USER octets of this total may be
+                 tied up with data that has been received and
+                 acknowledged but which the user process has not yet
+                 consumed.
When the connection is quiescent, RCV.WND = + RCV.BUFF and RCV.USER = 0. + + Keeping the right window edge fixed as data arrives and + is acknowledged requires that the receiver offer less + than its full buffer space, i.e., the receiver must + specify a RCV.WND that keeps RCV.NXT+RCV.WND constant + as RCV.NXT increases. Thus, the total buffer space + RCV.BUFF is generally divided into three parts: + + + +Internet Engineering Task Force [Page 97] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + + |<------- RCV.BUFF ---------------->| + 1 2 3 + ----|---------|------------------|------|---- + RCV.NXT ^ + (Fixed) + + 1 - RCV.USER = data received but not yet consumed; + 2 - RCV.WND = space advertised to sender; + 3 - Reduction = space available but not yet + advertised. + + + The suggested SWS avoidance algorithm for the receiver + is to keep RCV.NXT+RCV.WND fixed until the reduction + satisfies: + + RCV.BUFF - RCV.USER - RCV.WND >= + + min( Fr * RCV.BUFF, Eff.snd.MSS ) + + where Fr is a fraction whose recommended value is 1/2, + and Eff.snd.MSS is the effective send MSS for the + connection (see Section 4.2.2.6). When the inequality + is satisfied, RCV.WND is set to RCV.BUFF-RCV.USER. + + Note that the general effect of this algorithm is to + advance RCV.WND in increments of Eff.snd.MSS (for + realistic receive buffers: Eff.snd.MSS < RCV.BUFF/2). + Note also that the receiver must use its own + Eff.snd.MSS, assuming it is the same as the sender's. + + 4.2.3.4 When to Send Data + + A TCP MUST include a SWS avoidance algorithm in the sender. + + A TCP SHOULD implement the Nagle Algorithm [TCP:9] to + coalesce short segments. However, there MUST be a way for + an application to disable the Nagle algorithm on an + individual connection. In all cases, sending data is also + subject to the limitation imposed by the Slow Start + algorithm (Section 4.2.2.15). 
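As an illustration of the receiver-side rule suggested in Section 4.2.3.3, the window-update test can be sketched as follows. This is a minimal sketch, not part of the specification: the function name update_rcv_wnd and its parameter names are assumptions chosen to mirror RCV.BUFF, RCV.USER, RCV.WND and Eff.snd.MSS, and the caller is assumed to have already shrunk rcv_wnd as data arrived, so that RCV.NXT+RCV.WND stayed fixed.

```python
from fractions import Fraction

FR = Fraction(1, 2)  # recommended value of Fr


def update_rcv_wnd(rcv_buff, rcv_user, rcv_wnd, eff_snd_mss, fr=FR):
    """Return the RCV.WND to advertise after the receiver SWS test.

    rcv_buff    -- total receive buffer space (RCV.BUFF)
    rcv_user    -- octets received but not yet consumed (RCV.USER)
    rcv_wnd     -- window currently advertised (RCV.WND)
    eff_snd_mss -- effective send MSS for the connection (Eff.snd.MSS)
    """
    # Space that is free but has not yet been advertised to the sender.
    reduction = rcv_buff - rcv_user - rcv_wnd
    threshold = min(fr * rcv_buff, eff_snd_mss)
    if reduction >= threshold:
        # Enough space has accumulated: open the window fully.
        return rcv_buff - rcv_user
    # Otherwise keep the right window edge where it is.
    return rcv_wnd
```

With an 8192-octet buffer and an effective send MSS of 536, a reduction of 500 octets leaves the advertised window unchanged, while a reduction of 1596 octets moves the window up to RCV.BUFF-RCV.USER.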
+ + DISCUSSION: + The Nagle algorithm is generally as follows: + + If there is unacknowledged data (i.e., SND.NXT > + SND.UNA), then the sending TCP buffers all user + + + +Internet Engineering Task Force [Page 98] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + data (regardless of the PSH bit), until the + outstanding data has been acknowledged or until + the TCP can send a full-sized segment (Eff.snd.MSS + bytes; see Section 4.2.2.6). + + Some applications (e.g., real-time display window + updates) require that the Nagle algorithm be turned + off, so small data segments can be streamed out at the + maximum rate. + + IMPLEMENTATION: + The sender's SWS avoidance algorithm is more difficult + than the receiver's, because the sender does not know + (directly) the receiver's total buffer space RCV.BUFF. + An approach which has been found to work well is for + the sender to calculate Max(SND.WND), the maximum send + window it has seen so far on the connection, and to use + this value as an estimate of RCV.BUFF. Unfortunately, + this can only be an estimate; the receiver may at any + time reduce the size of RCV.BUFF. To avoid a resulting + deadlock, it is necessary to have a timeout to force + transmission of data, overriding the SWS avoidance + algorithm. In practice, this timeout should seldom + occur. + + The "useable window" [TCP:5] is: + + U = SND.UNA + SND.WND - SND.NXT + + i.e., the offered window less the amount of data sent + but not acknowledged. If D is the amount of data + queued in the sending TCP but not yet sent, then the + following set of rules is recommended. 
+ + Send data: + + (1) if a maximum-sized segment can be sent, i.e., if: + + min(D,U) >= Eff.snd.MSS; + + + (2) or if the data is pushed and all queued data can + be sent now, i.e., if: + + [SND.NXT = SND.UNA and] PUSHED and D <= U + + (the bracketed condition is imposed by the Nagle + algorithm); + + + +Internet Engineering Task Force [Page 99] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + (3) or if at least a fraction Fs of the maximum window + can be sent, i.e., if: + + [SND.NXT = SND.UNA and] + + min(D,U) >= Fs * Max(SND.WND); + + + (4) or if data is PUSHed and the override timeout + occurs. + + Here Fs is a fraction whose recommended value is 1/2. + The override timeout should be in the range 0.1 - 1.0 + seconds. It may be convenient to combine this timer + with the timer used to probe zero windows (Section + 4.2.2.17). + + Finally, note that the SWS avoidance algorithm just + specified is to be used instead of the sender-side + algorithm contained in [TCP:5]. + + 4.2.3.5 TCP Connection Failures + + Excessive retransmission of the same segment by TCP + indicates some failure of the remote host or the Internet + path. This failure may be of short or long duration. The + following procedure MUST be used to handle excessive + retransmissions of data segments [IP:11]: + + (a) There are two thresholds R1 and R2 measuring the amount + of retransmission that has occurred for the same + segment. R1 and R2 might be measured in time units or + as a count of retransmissions. + + (b) When the number of transmissions of the same segment + reaches or exceeds threshold R1, pass negative advice + (see Section 3.3.1.4) to the IP layer, to trigger + dead-gateway diagnosis. + + (c) When the number of transmissions of the same segment + reaches a threshold R2 greater than R1, close the + connection. + + (d) An application MUST be able to set the value for R2 for + a particular connection. 
For example, an interactive + application might set R2 to "infinity," giving the user + control over when to disconnect. + + + + +Internet Engineering Task Force [Page 100] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + (e) TCP SHOULD inform the application of the delivery + problem (unless such information has been disabled by + the application; see Section 4.2.4.1), when R1 is + reached and before R2. This will allow a remote login + (User Telnet) application program to inform the user, + for example. + + The value of R1 SHOULD correspond to at least 3 + retransmissions, at the current RTO. The value of R2 SHOULD + correspond to at least 100 seconds. + + An attempt to open a TCP connection could fail with + excessive retransmissions of the SYN segment or by receipt + of a RST segment or an ICMP Port Unreachable. SYN + retransmissions MUST be handled in the general way just + described for data retransmissions, including notification + of the application layer. + + However, the values of R1 and R2 may be different for SYN + and data segments. In particular, R2 for a SYN segment MUST + be set large enough to provide retransmission of the segment + for at least 3 minutes. The application can close the + connection (i.e., give up on the open attempt) sooner, of + course. + + DISCUSSION: + Some Internet paths have significant setup times, and + the number of such paths is likely to increase in the + future. + + 4.2.3.6 TCP Keep-Alives + + Implementors MAY include "keep-alives" in their TCP + implementations, although this practice is not universally + accepted. If keep-alives are included, the application MUST + be able to turn them on or off for each TCP connection, and + they MUST default to off. + + Keep-alive packets MUST only be sent when no data or + acknowledgment packets have been received for the + connection within an interval. This interval MUST be + configurable and MUST default to no less than two hours. 
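The keep-alive requirements stated so far (per-connection switch, default off, configurable idle interval of at least two hours by default) can be sketched as follows. This is a minimal sketch under stated assumptions: KeepAliveConfig and should_send_probe are hypothetical names, times are plain seconds supplied by the caller, and "last_received" is assumed to be the arrival time of the most recent data or acknowledgment packet on the connection.

```python
from dataclasses import dataclass

TWO_HOURS = 2 * 60 * 60  # default idle interval, in seconds


@dataclass
class KeepAliveConfig:
    """Per-connection keep-alive settings (hypothetical structure)."""
    enabled: bool = False             # keep-alives MUST default to off
    idle_interval: float = TWO_HOURS  # configurable; default >= two hours


def should_send_probe(cfg, now, last_received):
    """Return True if a keep-alive probe may be sent at time `now`.

    A probe is allowed only when keep-alives are enabled for this
    connection and nothing has been received for the whole interval.
    """
    if not cfg.enabled:
        return False
    return (now - last_received) >= cfg.idle_interval
```

An application that wants probes on one connection would construct KeepAliveConfig(enabled=True) for that connection only; all other connections keep the default and never probe.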
+ + It is extremely important to remember that ACK segments that + contain no data are not reliably transmitted by TCP. + Consequently, if a keep-alive mechanism is implemented it + MUST NOT interpret failure to respond to any specific probe + as a dead connection. + + + +Internet Engineering Task Force [Page 101] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + An implementation SHOULD send a keep-alive segment with no + data; however, it MAY be configurable to send a keep-alive + segment containing one garbage octet, for compatibility with + erroneous TCP implementations. + + DISCUSSION: + A "keep-alive" mechanism periodically probes the other + end of a connection when the connection is otherwise + idle, even when there is no data to be sent. The TCP + specification does not include a keep-alive mechanism + because it could: (1) cause perfectly good connections + to break during transient Internet failures; (2) + consume unnecessary bandwidth ("if no one is using the + connection, who cares if it is still good?"); and (3) + cost money for an Internet path that charges for + packets. + + Some TCP implementations, however, have included a + keep-alive mechanism. To confirm that an idle + connection is still active, these implementations send + a probe segment designed to elicit a response from the + peer TCP. Such a segment generally contains SEG.SEQ = + SND.NXT-1 and may or may not contain one garbage octet + of data. Note that on a quiet connection SND.NXT = + RCV.NXT, so that this SEG.SEQ will be outside the + window. Therefore, the probe causes the receiver to + return an acknowledgment segment, confirming that the + connection is still live. If the peer has dropped the + connection due to a network partition or a crash, it + will respond with a RST instead of an acknowledgment + segment. + + Unfortunately, some misbehaved TCP implementations fail + to respond to a segment with SEG.SEQ = SND.NXT-1 unless + the segment contains data. 
Alternatively, an + implementation could determine whether a peer responded + correctly to keep-alive packets with no garbage data + octet. + + A TCP keep-alive mechanism should only be invoked in + server applications that might otherwise hang + indefinitely and consume resources unnecessarily if a + client crashes or aborts a connection during a network + failure. + + + + + + + +Internet Engineering Task Force [Page 102] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + 4.2.3.7 TCP Multihoming + + If an application on a multihomed host does not specify the + local IP address when actively opening a TCP connection, + then the TCP MUST ask the IP layer to select a local IP + address before sending the (first) SYN. See the function + GET_SRCADDR() in Section 3.4. + + At all other times, a previous segment has either been sent + or received on this connection, and TCP MUST use the same + local address that was used in those previous + segments. + + 4.2.3.8 IP Options + + When received options are passed up to TCP from the IP + layer, TCP MUST ignore options that it does not understand. + + A TCP MAY support the Time Stamp and Record Route options. + + An application MUST be able to specify a source route when + it actively opens a TCP connection, and this MUST take + precedence over a source route received in a datagram. + + When a TCP connection is OPENed passively and a packet + arrives with a completed IP Source Route option (containing + a return route), TCP MUST save the return route and use it + for all segments sent on this connection. If a different + source route arrives in a later segment, the later + definition SHOULD override the earlier one. + + 4.2.3.9 ICMP Messages + + TCP MUST act on an ICMP error message passed up from the IP + layer, directing it to the connection that created the + error. The necessary demultiplexing information can be + found in the IP header contained within the ICMP message. 
+ + o Source Quench + + TCP MUST react to a Source Quench by slowing + transmission on the connection. The RECOMMENDED + procedure is for a Source Quench to trigger a "slow + start," as if a retransmission timeout had occurred. + + o Destination Unreachable -- codes 0, 1, 5 + + Since these Unreachable messages indicate soft error + + + +Internet Engineering Task Force [Page 103] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + conditions, TCP MUST NOT abort the connection, and it + SHOULD make the information available to the + application. + + DISCUSSION: + TCP could report the soft error condition directly + to the application layer with an upcall to the + ERROR_REPORT routine, or it could merely note the + message and report it to the application only when + and if the TCP connection times out. + + o Destination Unreachable -- codes 2-4 + + These are hard error conditions, so TCP SHOULD abort + the connection. + + o Time Exceeded -- codes 0, 1 + + This should be handled the same way as Destination + Unreachable codes 0, 1, 5 (see above). + + o Parameter Problem + + This should be handled the same way as Destination + Unreachable codes 0, 1, 5 (see above). + + + 4.2.3.10 Remote Address Validation + + A TCP implementation MUST reject as an error a local OPEN + call for an invalid remote IP address (e.g., a broadcast or + multicast address). + + An incoming SYN with an invalid source address must be + ignored either by TCP or by the IP layer (see Section + 3.2.1.3). + + A TCP implementation MUST silently discard an incoming SYN + segment that is addressed to a broadcast or multicast + address. 
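The address checks of Section 4.2.3.10 can be illustrated with a short sketch. It is an assumption-laden example, not a definitive implementation: the function names and the local_networks parameter are hypothetical, only IPv4 is considered, and directed broadcasts are recognized only for the networks the caller lists as directly attached.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def is_broadcast_or_multicast(addr, local_networks=()):
    """True if addr is multicast, limited broadcast, or a directed
    broadcast for one of the given local networks (an assumed input)."""
    ip = ipaddress.IPv4Address(addr)
    if ip.is_multicast or ip == LIMITED_BROADCAST:
        return True
    return any(ip == net.broadcast_address for net in local_networks)


def validate_open(remote_addr, local_networks=()):
    """Reject a local OPEN call to an invalid remote address as an error."""
    if is_broadcast_or_multicast(remote_addr, local_networks):
        raise ValueError("invalid remote IP address: %s" % remote_addr)


def accept_syn(dst_addr, local_networks=()):
    """Return False when an incoming SYN must be silently discarded
    because it was addressed to a broadcast or multicast address."""
    return not is_broadcast_or_multicast(dst_addr, local_networks)
```

Note that the OPEN check reports an error to the caller, whereas the SYN check only returns a decision; a discarded SYN produces no reply of any kind.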
+ + 4.2.3.11 TCP Traffic Patterns + + IMPLEMENTATION: + The TCP protocol specification [TCP:1] gives the + implementor much freedom in designing the algorithms + that control the message flow over the connection -- + packetizing, managing the window, sending + + + +Internet Engineering Task Force [Page 104] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + acknowledgments, etc. These design decisions are + difficult because a TCP must adapt to a wide range of + traffic patterns. Experience has shown that a TCP + implementor needs to verify the design on two extreme + traffic patterns: + + o Single-character Segments + + Even if the sender is using the Nagle Algorithm, + when a TCP connection carries remote login traffic + across a low-delay LAN the receiver will generally + get a stream of single-character segments. If + remote terminal echo mode is in effect, the + receiver's system will generally echo each + character as it is received. + + o Bulk Transfer + + When TCP is used for bulk transfer, the data + stream should be made up (almost) entirely of + segments of the size of the effective MSS. + Although TCP uses a sequence number space with + byte (octet) granularity, in bulk-transfer mode + its operation should be as if TCP used a sequence + space that counted only segments. + + Experience has furthermore shown that a single TCP can + effectively and efficiently handle these two extremes. + + The most important tool for verifying a new TCP + implementation is a packet trace program. There is a + large volume of experience showing the importance of + tracing a variety of traffic patterns with other TCP + implementations and studying the results carefully. + + + 4.2.3.12 Efficiency + + IMPLEMENTATION: + Extensive experience has led to the following + suggestions for efficient implementation of TCP: + + (a) Don't Copy Data + + In bulk data transfer, the primary CPU-intensive + tasks are copying data from one place to another + and checksumming the data. 
It is vital to + minimize the number of copies of TCP data. Since + + + +Internet Engineering Task Force [Page 105] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + the ultimate speed limitation may be fetching data + across the memory bus, it may be useful to combine + the copy with checksumming, doing both with a + single memory fetch. + + (b) Hand-Craft the Checksum Routine + + A good TCP checksumming routine is typically two + to five times faster than a simple and direct + implementation of the definition. Great care and + clever coding are often required and advisable to + make the checksumming code "blazing fast". See + [TCP:10]. + + (c) Code for the Common Case + + TCP protocol processing can be complicated, but + for most segments there are only a few simple + decisions to be made. Per-segment processing will + be greatly speeded up by coding the main line to + minimize the number of decisions in the most + common case. + + + 4.2.4 TCP/APPLICATION LAYER INTERFACE + + 4.2.4.1 Asynchronous Reports + + There MUST be a mechanism for reporting soft TCP error + conditions to the application. Generically, we assume this + takes the form of an application-supplied ERROR_REPORT + routine that may be upcalled [INTRO:7] asynchronously from + the transport layer: + + ERROR_REPORT(local connection name, reason, subreason) + + The precise encoding of the reason and subreason parameters + is not specified here. However, the conditions that are + reported asynchronously to the application MUST include: + + * ICMP error message arrived (see 4.2.3.9) + + * Excessive retransmissions (see 4.2.3.5) + + * Urgent pointer advance (see 4.2.2.4). + + However, an application program that does not want to + receive such ERROR_REPORT calls SHOULD be able to + + + +Internet Engineering Task Force [Page 106] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + effectively disable these calls. 
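One possible shape for the ERROR_REPORT upcall is sketched below. The encoding of the reason and subreason parameters is not specified here, so the Reason enumeration, the Connection class, and the reports_enabled flag are all assumptions made for illustration.

```python
from enum import Enum


class Reason(Enum):
    """Conditions that MUST be reportable (encoding is an assumption)."""
    ICMP_ERROR = "ICMP error message arrived"
    EXCESSIVE_RETRANSMISSIONS = "excessive retransmissions"
    URGENT_POINTER_ADVANCE = "urgent pointer advance"


class Connection:
    """Hypothetical connection object holding the application's upcall."""

    def __init__(self, name, error_report=None):
        self.name = name                  # local connection name
        self.error_report = error_report  # application-supplied routine
        self.reports_enabled = True       # application may disable upcalls

    def soft_error(self, reason, subreason=None):
        """Upcall ERROR_REPORT unless the application disabled reports.

        Returns True if the application routine was actually called.
        """
        if self.reports_enabled and self.error_report is not None:
            self.error_report(self.name, reason, subreason)
            return True
        return False
```

An application that does not want the reports sets reports_enabled to False (or supplies no routine), after which soft errors are simply dropped by the transport layer.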
+ + DISCUSSION: + These error reports generally reflect soft errors that + can be ignored without harm by many applications. It + has been suggested that these error report calls should + default to "disabled," but this is not required. + + 4.2.4.2 Type-of-Service + + The application layer MUST be able to specify the Type-of- + Service (TOS) for segments that are sent on a connection. + It is not required, but the application SHOULD be able to + change the TOS during the connection lifetime. TCP SHOULD + pass the current TOS value without change to the IP layer, + when it sends segments on the connection. + + The TOS will be specified independently in each direction on + the connection, so that the receiver application will + specify the TOS used for ACK segments. + + TCP MAY pass the most recently received TOS up to the + application. + + DISCUSSION: + Some applications (e.g., SMTP) change the nature of + their communication during the lifetime of a + connection, and therefore would like to change the TOS + specification. + + Note also that the OPEN call specified in RFC-793 + includes a parameter ("options") in which the caller + can specify IP options such as source route, record + route, or timestamp. + + 4.2.4.3 Flush Call + + Some TCP implementations have included a FLUSH call, which + will empty the TCP send queue of any data for which the user + has issued SEND calls but which is still to the right of the + current send window. That is, it flushes as much queued + send data as possible without losing sequence number + synchronization. This is useful for implementing the "abort + output" function of Telnet. + + + + + + + +Internet Engineering Task Force [Page 107] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + 4.2.4.4 Multihoming + + The user interface outlined in sections 2.7 and 3.8 of RFC- + 793 needs to be extended for multihoming. The OPEN call + MUST have an optional parameter: + + OPEN( ... [local IP address,] ... 
) + + to allow the specification of the local IP address. + + DISCUSSION: + Some TCP-based applications need to specify the local + IP address to be used to open a particular connection; + FTP is an example. + + IMPLEMENTATION: + A passive OPEN call with a specified "local IP address" + parameter will await an incoming connection request to + that address. If the parameter is unspecified, a + passive OPEN will await an incoming connection request + to any local IP address, and then bind the local IP + address of the connection to the particular address + that is used. + + For an active OPEN call, a specified "local IP address" + parameter will be used for opening the connection. If + the parameter is unspecified, the networking software + will choose an appropriate local IP address (see + Section 3.3.4.2) for the connection + + 4.2.5 TCP REQUIREMENT SUMMARY + + | | | | |S| | + | | | | |H| |F + | | | | |O|M|o + | | |S| |U|U|o + | | |H| |L|S|t + | |M|O| |D|T|n + | |U|U|M| | |o + | |S|L|A|N|N|t + | |T|D|Y|O|O|t +FEATURE |SECTION | | | |T|T|e +-------------------------------------------------|--------|-|-|-|-|-|-- + | | | | | | | +Push flag | | | | | | | + Aggregate or queue un-pushed data |4.2.2.2 | | |x| | | + Sender collapse successive PSH flags |4.2.2.2 | |x| | | | + SEND call can specify PUSH |4.2.2.2 | | |x| | | + + + +Internet Engineering Task Force [Page 108] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + If cannot: sender buffer indefinitely |4.2.2.2 | | | | |x| + If cannot: PSH last segment |4.2.2.2 |x| | | | | + Notify receiving ALP of PSH |4.2.2.2 | | |x| | |1 + Send max size segment when possible |4.2.2.2 | |x| | | | + | | | | | | | +Window | | | | | | | + Treat as unsigned number |4.2.2.3 |x| | | | | + Handle as 32-bit number |4.2.2.3 | |x| | | | + Shrink window from right |4.2.2.16| | | |x| | + Robust against shrinking window |4.2.2.16|x| | | | | + Receiver's window closed indefinitely |4.2.2.17| | |x| | | + Sender probe zero window 
|4.2.2.17|x| | | | | + First probe after RTO |4.2.2.17| |x| | | | + Exponential backoff |4.2.2.17| |x| | | | + Allow window stay zero indefinitely |4.2.2.17|x| | | | | + Sender timeout OK conn with zero wind |4.2.2.17| | | | |x| + | | | | | | | +Urgent Data | | | | | | | + Pointer points to last octet |4.2.2.4 |x| | | | | + Arbitrary length urgent data sequence |4.2.2.4 |x| | | | | + Inform ALP asynchronously of urgent data |4.2.2.4 |x| | | | |1 + ALP can learn if/how much urgent data Q'd |4.2.2.4 |x| | | | |1 + | | | | | | | +TCP Options | | | | | | | + Receive TCP option in any segment |4.2.2.5 |x| | | | | + Ignore unsupported options |4.2.2.5 |x| | | | | + Cope with illegal option length |4.2.2.5 |x| | | | | + Implement sending & receiving MSS option |4.2.2.6 |x| | | | | + Send MSS option unless 536 |4.2.2.6 | |x| | | | + Send MSS option always |4.2.2.6 | | |x| | | + Send-MSS default is 536 |4.2.2.6 |x| | | | | + Calculate effective send seg size |4.2.2.6 |x| | | | | + | | | | | | | +TCP Checksums | | | | | | | + Sender compute checksum |4.2.2.7 |x| | | | | + Receiver check checksum |4.2.2.7 |x| | | | | + | | | | | | | +Use clock-driven ISN selection |4.2.2.9 |x| | | | | + | | | | | | | +Opening Connections | | | | | | | + Support simultaneous open attempts |4.2.2.10|x| | | | | + SYN-RCVD remembers last state |4.2.2.11|x| | | | | + Passive Open call interfere with others |4.2.2.18| | | | |x| + Function: simultan. LISTENs for same port |4.2.2.18|x| | | | | + Ask IP for src address for SYN if necc. |4.2.3.7 |x| | | | | + Otherwise, use local addr of conn. 
|4.2.3.7 |x| | | | | + OPEN to broadcast/multicast IP Address |4.2.3.10| | | | |x| + Silently discard seg to bcast/mcast addr |4.2.3.10|x| | | | | + + + +Internet Engineering Task Force [Page 109] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + | | | | | | | +Closing Connections | | | | | | | + RST can contain data |4.2.2.12| |x| | | | + Inform application of aborted conn |4.2.2.13|x| | | | | + Half-duplex close connections |4.2.2.13| | |x| | | + Send RST to indicate data lost |4.2.2.13| |x| | | | + In TIME-WAIT state for 2xMSL seconds |4.2.2.13|x| | | | | + Accept SYN from TIME-WAIT state |4.2.2.13| | |x| | | + | | | | | | | +Retransmissions | | | | | | | + Jacobson Slow Start algorithm |4.2.2.15|x| | | | | + Jacobson Congestion-Avoidance algorithm |4.2.2.15|x| | | | | + Retransmit with same IP ident |4.2.2.15| | |x| | | + Karn's algorithm |4.2.3.1 |x| | | | | + Jacobson's RTO estimation alg. |4.2.3.1 |x| | | | | + Exponential backoff |4.2.3.1 |x| | | | | + SYN RTO calc same as data |4.2.3.1 | |x| | | | + Recommended initial values and bounds |4.2.3.1 | |x| | | | + | | | | | | | +Generating ACK's: | | | | | | | + Queue out-of-order segments |4.2.2.20| |x| | | | + Process all Q'd before send ACK |4.2.2.20|x| | | | | + Send ACK for out-of-order segment |4.2.2.21| | |x| | | + Delayed ACK's |4.2.3.2 | |x| | | | + Delay < 0.5 seconds |4.2.3.2 |x| | | | | + Every 2nd full-sized segment ACK'd |4.2.3.2 |x| | | | | + Receiver SWS-Avoidance Algorithm |4.2.3.3 |x| | | | | + | | | | | | | +Sending data | | | | | | | + Configurable TTL |4.2.2.19|x| | | | | + Sender SWS-Avoidance Algorithm |4.2.3.4 |x| | | | | + Nagle algorithm |4.2.3.4 | |x| | | | + Application can disable Nagle algorithm |4.2.3.4 |x| | | | | + | | | | | | | +Connection Failures: | | | | | | | + Negative advice to IP on R1 retxs |4.2.3.5 |x| | | | | + Close connection on R2 retxs |4.2.3.5 |x| | | | | + ALP can set R2 |4.2.3.5 |x| | | | |1 + Inform ALP of R1<=retxs<R2 |4.2.3.5 | |x| | | |1 + 
Recommended values for R1, R2 |4.2.3.5 | |x| | | | + Same mechanism for SYNs |4.2.3.5 |x| | | | | + R2 at least 3 minutes for SYN |4.2.3.5 |x| | | | | + | | | | | | | +Send Keep-alive Packets: |4.2.3.6 | | |x| | | + - Application can request |4.2.3.6 |x| | | | | + - Default is "off" |4.2.3.6 |x| | | | | + - Only send if idle for interval |4.2.3.6 |x| | | | | + - Interval configurable |4.2.3.6 |x| | | | | + + + +Internet Engineering Task Force [Page 110] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + - Default at least 2 hrs. |4.2.3.6 |x| | | | | + - Tolerant of lost ACK's |4.2.3.6 |x| | | | | + | | | | | | | +IP Options | | | | | | | + Ignore options TCP doesn't understand |4.2.3.8 |x| | | | | + Time Stamp support |4.2.3.8 | | |x| | | + Record Route support |4.2.3.8 | | |x| | | + Source Route: | | | | | | | + ALP can specify |4.2.3.8 |x| | | | |1 + Overrides src rt in datagram |4.2.3.8 |x| | | | | + Build return route from src rt |4.2.3.8 |x| | | | | + Later src route overrides |4.2.3.8 | |x| | | | + | | | | | | | +Receiving ICMP Messages from IP |4.2.3.9 |x| | | | | + Dest. Unreach (0,1,5) => inform ALP |4.2.3.9 | |x| | | | + Dest. Unreach (0,1,5) => abort conn |4.2.3.9 | | | | |x| + Dest. 
Unreach (2-4) => abort conn |4.2.3.9 | |x| | | | + Source Quench => slow start |4.2.3.9 | |x| | | | + Time Exceeded => tell ALP, don't abort |4.2.3.9 | |x| | | | + Param Problem => tell ALP, don't abort |4.2.3.9 | |x| | | | + | | | | | | | +Address Validation | | | | | | | + Reject OPEN call to invalid IP address |4.2.3.10|x| | | | | + Reject SYN from invalid IP address |4.2.3.10|x| | | | | + Silently discard SYN to bcast/mcast addr |4.2.3.10|x| | | | | + | | | | | | | +TCP/ALP Interface Services | | | | | | | + Error Report mechanism |4.2.4.1 |x| | | | | + ALP can disable Error Report Routine |4.2.4.1 | |x| | | | + ALP can specify TOS for sending |4.2.4.2 |x| | | | | + Passed unchanged to IP |4.2.4.2 | |x| | | | + ALP can change TOS during connection |4.2.4.2 | |x| | | | + Pass received TOS up to ALP |4.2.4.2 | | |x| | | + FLUSH call |4.2.4.3 | | |x| | | + Optional local IP addr parm. in OPEN |4.2.4.4 |x| | | | | +-------------------------------------------------|--------|-|-|-|-|-|-- +-------------------------------------------------|--------|-|-|-|-|-|-- + +FOOTNOTES: + +(1) "ALP" means Application-Layer program. + + + + + + + + + + +Internet Engineering Task Force [Page 111] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + +5. REFERENCES + +INTRODUCTORY REFERENCES + + +[INTRO:1] "Requirements for Internet Hosts -- Application and Support," + IETF Host Requirements Working Group, R. Braden, Ed., RFC-1123, + October 1989. + +[INTRO:2] "Requirements for Internet Gateways," R. Braden and J. + Postel, RFC-1009, June 1987. + +[INTRO:3] "DDN Protocol Handbook," NIC-50004, NIC-50005, NIC-50006, + (three volumes), SRI International, December 1985. + +[INTRO:4] "Official Internet Protocols," J. Reynolds and J. Postel, + RFC-1011, May 1987. + + This document is republished periodically with new RFC numbers; the + latest version must be used. + +[INTRO:5] "Protocol Document Order Information," O. Jacobsen and J. + Postel, RFC-980, March 1986. 
+ +[INTRO:6] "Assigned Numbers," J. Reynolds and J. Postel, RFC-1010, May + 1987. + + This document is republished periodically with new RFC numbers; the + latest version must be used. + +[INTRO:7] "Modularity and Efficiency in Protocol Implementations," D. + Clark, RFC-817, July 1982. + +[INTRO:8] "The Structuring of Systems Using Upcalls," D. Clark, 10th ACM + SOSP, Orcas Island, Washington, December 1985. + + +Secondary References: + + +[INTRO:9] "A Protocol for Packet Network Intercommunication," V. Cerf + and R. Kahn, IEEE Transactions on Communication, May 1974. + +[INTRO:10] "The ARPA Internet Protocol," J. Postel, C. Sunshine, and D. + Cohen, Computer Networks, Vol. 5, No. 4, July 1981. + +[INTRO:11] "The DARPA Internet Protocol Suite," B. Leiner, J. Postel, + R. Cole and D. Mills, Proceedings INFOCOM 85, IEEE, Washington DC, + + + +Internet Engineering Task Force [Page 112] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + March 1985. Also in: IEEE Communications Magazine, March 1985. + Also available as ISI-RS-85-153. + +[INTRO:12] "Final Text of DIS8473, Protocol for Providing the + Connectionless Mode Network Service," ANSI, published as RFC-994, + March 1986. + +[INTRO:13] "End System to Intermediate System Routing Exchange + Protocol," ANSI X3S3.3, published as RFC-995, April 1986. + + +LINK LAYER REFERENCES + + +[LINK:1] "Trailer Encapsulations," S. Leffler and M. Karels, RFC-893, + April 1984. + +[LINK:2] "An Ethernet Address Resolution Protocol," D. Plummer, RFC-826, + November 1982. + +[LINK:3] "A Standard for the Transmission of IP Datagrams over Ethernet + Networks," C. Hornig, RFC-894, April 1984. + +[LINK:4] "A Standard for the Transmission of IP Datagrams over IEEE 802 + Networks," J. Postel and J. Reynolds, RFC-1042, February 1988. + + This RFC contains a great deal of information of importance to + Internet implementers planning to use IEEE 802 networks. + + +IP LAYER REFERENCES + + +[IP:1] "Internet Protocol (IP)," J. 
Postel, RFC-791, September 1981. + +[IP:2] "Internet Control Message Protocol (ICMP)," J. Postel, RFC-792, + September 1981. + +[IP:3] "Internet Standard Subnetting Procedure," J. Mogul and J. Postel, + RFC-950, August 1985. + +[IP:4] "Host Extensions for IP Multicasting," S. Deering, RFC-1112, + August 1989. + +[IP:5] "Military Standard Internet Protocol," MIL-STD-1777, Department + of Defense, August 1983. + + This specification, as amended by RFC-963, is intended to describe + + + +Internet Engineering Task Force [Page 113] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + the Internet Protocol but has some serious omissions (e.g., the + mandatory subnet extension [IP:3] and the optional multicasting + extension [IP:4]). It is also out of date. If there is a + conflict, RFC-791, RFC-792, and RFC-950 must be taken as + authoritative, while the present document is authoritative over + all. + +[IP:6] "Some Problems with the Specification of the Military Standard + Internet Protocol," D. Sidhu, RFC-963, November 1985. + +[IP:7] "The TCP Maximum Segment Size and Related Topics," J. Postel, + RFC-879, November 1983. + + Discusses and clarifies the relationship between the TCP Maximum + Segment Size option and the IP datagram size. + +[IP:8] "Internet Protocol Security Options," B. Schofield, RFC-1108, + October 1989. + +[IP:9] "Fragmentation Considered Harmful," C. Kent and J. Mogul, ACM + SIGCOMM-87, August 1987. Published as ACM Comp Comm Review, Vol. + 17, no. 5. + + This useful paper discusses the problems created by Internet + fragmentation and presents alternative solutions. + +[IP:10] "IP Datagram Reassembly Algorithms," D. Clark, RFC-815, July + 1982. + + This and the following paper should be read by every implementor. + +[IP:11] "Fault Isolation and Recovery," D. Clark, RFC-816, July 1982. + +SECONDARY IP REFERENCES: + + +[IP:12] "Broadcasting Internet Datagrams in the Presence of Subnets," J. + Mogul, RFC-922, October 1984. 
+ +[IP:13] "Name, Addresses, Ports, and Routes," D. Clark, RFC-814, July + 1982. + +[IP:14] "Something a Host Could Do with Source Quench: The Source Quench + Introduced Delay (SQUID)," W. Prue and J. Postel, RFC-1016, July + 1987. + + This RFC first described directed broadcast addresses. However, + the bulk of the RFC is concerned with gateways, not hosts. + + + +Internet Engineering Task Force [Page 114] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + +UDP REFERENCES: + + +[UDP:1] "User Datagram Protocol," J. Postel, RFC-768, August 1980. + + +TCP REFERENCES: + + +[TCP:1] "Transmission Control Protocol," J. Postel, RFC-793, September + 1981. + + +[TCP:2] "Transmission Control Protocol," MIL-STD-1778, US Department of + Defense, August 1984. + + This specification as amended by RFC-964 is intended to describe + the same protocol as RFC-793 [TCP:1]. If there is a conflict, + RFC-793 takes precedence, and the present document is authoritative + over both. + + +[TCP:3] "Some Problems with the Specification of the Military Standard + Transmission Control Protocol," D. Sidhu and T. Blumer, RFC-964, + November 1985. + + +[TCP:4] "The TCP Maximum Segment Size and Related Topics," J. Postel, + RFC-879, November 1983. + + +[TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813, + July 1982. + + +[TCP:6] "Round Trip Time Estimation," P. Karn & C. Partridge, ACM + SIGCOMM-87, August 1987. + + +[TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM SIGCOMM-88, + August 1988. + + +SECONDARY TCP REFERENCES: + + +[TCP:8] "Modularity and Efficiency in Protocol Implementation," D. + Clark, RFC-817, July 1982. + + + +Internet Engineering Task Force [Page 115] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + +[TCP:9] "Congestion Control in IP/TCP," J. Nagle, RFC-896, January 1984. + + +[TCP:10] "Computing the Internet Checksum," R. Braden, D. Borman, and C. + Partridge, RFC-1071, September 1988. 
+ + +[TCP:11] "TCP Extensions for Long-Delay Paths," V. Jacobson & R. Braden, + RFC-1072, October 1988. + + +Security Considerations + + There are many security issues in the communication layers of host + software, but a full discussion is beyond the scope of this RFC. + + The Internet architecture generally provides little protection + against spoofing of IP source addresses, so any security mechanism + that is based upon verifying the IP source address of a datagram + should be treated with suspicion. However, in restricted + environments some source-address checking may be possible. For + example, there might be a secure LAN whose gateway to the rest of the + Internet discarded any incoming datagram with a source address that + spoofed the LAN address. In this case, a host on the LAN could use + the source address to test for local vs. remote source. This problem + is complicated by source routing, and some have suggested that + source-routed datagram forwarding by hosts (see Section 3.3.5) should + be outlawed for security reasons. + + Security-related issues are mentioned in sections concerning the IP + Security option (Section 3.2.1.8), the ICMP Parameter Problem message + (Section 3.2.2.5), IP options in UDP datagrams (Section 4.1.3.2), and + reserved TCP ports (Section 4.2.2.1). + +Author's Address + + Robert Braden + USC/Information Sciences Institute + 4676 Admiralty Way + Marina del Rey, CA 90292-6695 + + Phone: (213) 822 1511 + + EMail: Braden@ISI.EDU + + + + + + + +Internet Engineering Task Force [Page 116] + |