author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc814.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc814.txt')
-rw-r--r-- | doc/rfc/rfc814.txt | 763 |
1 files changed, 763 insertions, 0 deletions
diff --git a/doc/rfc/rfc814.txt b/doc/rfc/rfc814.txt new file mode 100644 index 0000000..b82819e --- /dev/null +++ b/doc/rfc/rfc814.txt @@ -0,0 +1,763 @@ + +RFC: 814 + + + + NAME, ADDRESSES, PORTS, AND ROUTES + + David D. Clark + MIT Laboratory for Computer Science + Computer Systems and Communications Group + July, 1982 + + + 1. Introduction + + + It has been said that the principal function of an operating system + +is to define a number of different names for the same object, so that it + +can busy itself keeping track of the relationship between all of the + +different names. Network protocols seem to have somewhat the same + +characteristic. In TCP/IP, there are several ways of referring to + +things. At the human visible interface, there are character string + +"names" to identify networks, hosts, and services. Host names are + +translated into network "addresses", 32-bit values that identify the + +network to which a host is attached, and the location of the host on + +that net. Service names are translated into a "port identifier", which + +in TCP is a 16-bit value. Finally, addresses are translated into + +"routes", which are the sequence of steps a packet must take to reach + +the specified addresses. Routes show up explicitly in the form of the + +internet routing options, and also implicitly in the address to route + +translation tables which all hosts and gateways maintain. + + + This RFC gives suggestions and guidance for the design of the + +tables and algorithms necessary to keep track of these various sorts of + +identifiers inside a host implementation of TCP/IP. + + 2 + + + 2. The Scope of the Problem + + + One of the first questions one can ask about a naming mechanism is + +how many names one can expect to encounter. In order to answer this, it + +is necessary to know something about the expected maximum size of the + +internet. Currently, the internet is fairly small. 
It contains no more + +than 25 active networks, and no more than a few hundred hosts. This + +makes it possible to install tables which exhaustively list all of these + +elements. However, any implementation undertaken now should be based on + +an assumption of a much larger internet. The guidelines currently + +recommended are an upper limit of about 1,000 networks. If we imagine + +an average number of 25 hosts per net, this would suggest a maximum + +number of 25,000 hosts. It is quite unclear whether this host estimate + +is high or low, but even if it is off by several factors of two, the + +resulting number is still large enough to suggest that current table + +management strategies are unacceptable. Some fresh techniques will be + +required to deal with the internet of the future. + + + 3. Names + + + As the previous section suggests, the internet will eventually have + +a sufficient number of names that a host cannot have a static table + +which provides a translation from every name to its associated address. + +There are several reasons other than sheer size why a host would not + +wish to have such a table. First, with that many names, we can expect + +names to be added and deleted at such a rate that an installer might + +spend all his time just revising the table. Second, most of the names + +will refer to addresses of machines with which nothing will ever be + + 3 + + +exchanged. In fact, there may be whole networks with which a particular + +host will never have any traffic. + + + To cope with this large and somewhat dynamic environment, the + +internet is moving from its current position in which a single name + +table is maintained by the NIC and distributed to all hosts, to a + +distributed approach in which each network (or group of networks) is + +responsible for maintaining its own names and providing a "name server" + +to translate between the names and the addresses in that network. 
Each + +host is assumed to store not a complete set of name-address + +translations, but only a cache of recently used names. When a name is + +provided by a user for translation to an address, the host will first + +examine its local cache, and if the name is not found there, will + +communicate with an appropriate name server to obtain the information, + +which it may then insert into its cache for future reference. + + + Unfortunately, the name server mechanism is not totally in place in + +the internet yet, so for the moment, it is necessary to continue to use + +the old strategy of maintaining a complete table of all names in every + +host. Implementors, however, should structure this table in such a way + +that it is easy to convert later to a name server approach. In + +particular, a reasonable programming strategy would be to make the name + +table accessible only through a subroutine interface, rather than by + +scattering direct references to the table all through the code. In this + +way, it will be possible, at a later date, to replace the subroutine + +with one capable of making calls on remote name servers. + + + A problem which occasionally arises in the ARPANET today is that + + 4 + + +the information in a local host table is out of date, because a host has + +moved, and a revision of the host table has not yet been installed from + +the NIC. In this case, one attempts to connect to a particular host and + +discovers an unexpected machine at the address obtained from the local + +table. If a human is directly observing the connection attempt, the + +error is usually detected immediately. However, for unattended + +operations such as the sending of queued mail, this sort of problem can + +lead to a great deal of confusion. 
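The subroutine interface recommended in this section, together with the cache of recently used names, can be sketched as follows. This is a minimal illustration and not part of the RFC: the class NameResolver, the function static_table_lookup, and the host names and addresses in HOST_TABLE are all hypothetical. The point is only that callers go through resolve(), so the lookup function can later be swapped for one that queries a remote name server.

```python
from typing import Callable, Dict, Optional

# Hypothetical static host table; the names and addresses are made up.
HOST_TABLE: Dict[str, str] = {
    "host-a": "10.0.0.5",
    "host-b": "10.1.0.9",
}


def static_table_lookup(name: str) -> Optional[str]:
    """Today's strategy: consult the complete local table."""
    return HOST_TABLE.get(name)


class NameResolver:
    """Single entry point for name-to-address translation (assumed design)."""

    def __init__(self, lookup: Callable[[str], Optional[str]]) -> None:
        self._lookup = lookup             # static table now, name server later
        self._cache: Dict[str, str] = {}  # holds only recently used names

    def resolve(self, name: str) -> Optional[str]:
        key = name.lower()
        if key in self._cache:            # examine the local cache first
            return self._cache[key]
        address = self._lookup(key)       # otherwise ask the backing source
        if address is not None:
            self._cache[key] = address    # remember for future reference
        return address

    def forget(self, name: str) -> None:
        """Drop a cached entry, e.g. after an unexpected host answered."""
        self._cache.pop(name.lower(), None)


resolver = NameResolver(static_table_lookup)
```

Because the cache has no way of learning that an address has changed, a stale entry stays in use until something calls forget(); that is the weakness the text goes on to describe.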
+ + + The nameserver scheme will only make this problem worse, if hosts + +cache locally the address associated with names that have been looked + +up, because the host has no way of knowing when the address has changed + +and the cache entry should be removed. To solve this problem, plans are + +currently under way to define a simple facility by which a host can + +query a foreign address to determine what name is actually associated + +with it. SMTP already defines a verification technique based on this + +approach. + + + 4. Addresses + + + The IP layer must know something about addresses. In particular, + +when a datagram is being sent out from a host, the IP layer must decide + +where to send it on the immediately connected network, based on the + +internet address. Mechanically, the IP first tests the internet address + +to see whether the network number of the recipient is the same as the + +network number of the sender. If so, the packet can be sent directly to + +the final recipient. If not, the datagram must be sent to a gateway for + +further forwarding. In this latter case, a second decision must be + + 5 + + +made, as there may be more than one gateway available on the immediately + +attached network. + + + When the internet address format was first specified, 8 bits were + +reserved to identify the network. Early implementations thus + +implemented the above algorithm by means of a table with 256 entries, + +one for each possible net, that specified the gateway of choice for that + +net, with a special case entry for those nets to which the host was + +immediately connected. Such tables were sometimes statically filled in, + +which caused confusion and malfunctions when gateways and networks moved + +(or crashed). + + + The current definition of the internet address provides three + +different options for network numbering, with the goal of allowing a + +very large number of networks to be part of the internet. 
Thus, it is + +no longer possible to imagine having an exhaustive table to select a + +gateway for any foreign net. Again, current implementations must use a + +strategy based on a local cache of routing information for addresses + +currently being used. + + + The recommended strategy for address to route translation is as + +follows. When the IP layer receives an outbound datagram for + +transmission, it extracts the network number from the destination + +address, and queries its local table to determine whether it knows a + +suitable gateway to which to send the datagram. If it does, the job is + +done. (But see RFC 816 on Fault Isolation and Recovery, for + +recommendations on how to deal with the possible failure of the + +gateway.) If there is no such entry in the local table, then select any + + 6 + + +accessible gateway at random, insert that as an entry in the table, and + +use it to send the packet. Either the guess will be right or wrong. If + +it is wrong, the gateway to which the packet was sent will return an + +ICMP redirect message to report that there is a better gateway to reach + +the net in question. The arrival of this redirect should cause an + +update of the local table. + + + The number of entries in the local table should be determined by + +the maximum number of active connections which this particular host can + +support at any one time. For a large time sharing system, one might + +imagine a table with 100 or more entries. For a personal computer being + +used to support a single user telnet connection, only one address to + +gateway association need be maintained at once. + + + The above strategy actually does not completely solve the problem, + +but only pushes it down one level, where the problem then arises of how + +a new host, freshly arriving on the internet, finds all of its + +accessible gateways. Intentionally, this problem is not solved within + +the internetwork architecture. 
The reason is that different networks + +have drastically different strategies for allowing a host to find out + +about other hosts on its immediate network. Some nets permit a + +broadcast mechanism. In this case, a host can send out a message and + +expect an answer back from all of the attached gateways. In other + +cases, where a particular network is richly provided with tools to + +support the internet, there may be a special network mechanism which a + +host can invoke to determine where the gateways are. In other cases, it + +may be necessary for an installer to manually provide the name of at + + 7 + + +least one accessible gateway. Once a host has discovered the name of + +one gateway, it can build up a table of all other available gateways, by + +keeping track of every gateway that has been reported back to it in an + +ICMP message. + + + 5. Advanced Topics in Addressing and Routing + + + The preceding discussion describes the mechanism required in a + +minimal implementation, an implementation intended only to provide + +operational service access today to the various networks that make up + +the internet. For any host which will participate in future research, + +as contrasted with service, some additional features are required. + +These features will also be helpful for service hosts if they wish to + +obtain access to some of the more exotic networks which will become part + +of the internet over the next few years. All implementors are urged to + +at least provide a structure into which these features could be later + +integrated. + + + There are several features, either already a part of the + +architecture or now under development, which are used to modify or + +expand the relationships between addresses and routes. The IP source + +route options allow a host to explicitly direct a datagram through a + +series of gateways to its foreign host. 
An alternative form of the ICMP + +redirect packet has been proposed, which would return information + +specific to a particular destination host, not a destination net. + +Finally, additional IP options have been proposed to identify particular + +routes within the internet that are unacceptable. The difficulty with + +implementing these new features is that the mechanisms do not lie + + 8 + + +entirely within the bounds of IP. All the mechanisms above are designed + +to apply to a particular connection, so that their use must be specified + +at the TCP level. Thus, the interface between IP and the layers above + +it must include mechanisms to allow passing this information back and + +forth, and TCP (or any other protocol at this level, such as UDP), must + +be prepared to store this information. The passing of information + +between IP and TCP is made more complicated by the fact that some of the + +information, in particular ICMP packets, may arrive at any time. The + +normal interface envisioned between TCP and IP is one across which + +packets can be sent or received. The existence of asynchronous ICMP + +messages implies that there must be an additional channel between the + +two, unrelated to the actual sending and receiving of data. (In fact, + +there are many other ICMP messages which arrive asynchronously and which + +must be passed from IP up to higher layers. See RFC 816, Fault + +Isolation and Recovery.) + + + Source routes are already in use in the internet, and many + +implementations will wish to be able to take advantage of them. The + +following sorts of usages should be permitted. First, a user, when + +initiating a TCP connection, should be able to hand a source route into + +TCP, which in turn must hand the source route to IP with every outgoing + +datagram. 
The user might initially obtain the source route by querying + +a different sort of name server, which would return a source route + +instead of an address, or the user may have fabricated the source route + +manually. A TCP which is listening for a connection, rather than + +attempting to open one, must be prepared to receive a datagram which + +contains an IP return route, in which case it must remember this return + +route, and use it as a source route on all returning datagrams. + + 9 + + + 6. Ports and Service Identifiers + + + The IP layer of the architecture contains the address information + +which specifies the destination host to which the datagram is being + +sent. In fact, datagrams are not intended just for particular hosts, + +but for particular agents within a host, processes or other entities + +that are the actual source and sink of the data. IP performs only a + +very simple dispatching once the datagram has arrived at the target + +host: it dispatches it to a particular protocol. It is the + +responsibility of that protocol handler, for example TCP, to finish + +dispatching the datagram to the particular connection for which it is + +destined. This next layer of dispatching is done using "port + +identifiers", which are a part of the header of the higher level + +protocol, and not the IP layer. + + + This two-layer dispatching architecture has caused a problem for + +certain implementations. In particular, some implementations have + +wished to put the IP layer within the kernel of the operating system, + +and the TCP layer as a user domain application program. Strict + +adherence to this partitioning can lead to grave performance problems, + +for the datagram must first be dispatched from the kernel to a TCP + +process, which then dispatches the datagram to its final destination + +process. The overhead of scheduling this dispatch process can severely + +limit the achievable throughput of the implementation. 
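The two-layer dispatching described above can be sketched as two table lookups. This is an illustration under stated assumptions, not an implementation taken from the RFC: the names PortDemux, handlers, ip_input, and received are hypothetical, and the payload is assumed to begin with a 16-bit source port followed by a 16-bit destination port, which is the layout TCP and UDP share. Protocol numbers 6 and 17 are the assigned values for TCP and UDP.

```python
import struct
from typing import Callable, Dict, List

TCP, UDP = 6, 17  # assigned internet protocol numbers

Receiver = Callable[[bytes], None]


class PortDemux:
    """Stage 2: a protocol handler that dispatches on the destination port."""

    def __init__(self) -> None:
        self.ports: Dict[int, Receiver] = {}

    def deliver(self, segment: bytes) -> bool:
        if len(segment) < 4:
            return False                      # too short to hold both ports
        _source_port, dest_port = struct.unpack("!HH", segment[:4])
        receiver = self.ports.get(dest_port)
        if receiver is None:
            return False                      # nobody bound to this port
        receiver(segment)
        return True


# Stage 1: IP knows only which handler owns each protocol number.
handlers: Dict[int, PortDemux] = {TCP: PortDemux(), UDP: PortDemux()}


def ip_input(protocol: int, payload: bytes) -> bool:
    """Hand an arrived datagram's payload to the handler for its protocol."""
    handler = handlers.get(protocol)
    return handler.deliver(payload) if handler is not None else False


received: List[bytes] = []
handlers[TCP].ports[23] = received.append     # hypothetical remote login listener
```

If ip_input ran in the kernel and each PortDemux ran in a separate user process, every datagram would cross a scheduling boundary between the two stages, which is the overhead the preceding paragraph warns about.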
+ + + As is discussed in RFC 817, Modularity and Efficiency in Protocol + +Implementations, this particular separation between kernel and user + +leads to other performance problems, even ignoring the issue of port + + 10 + + +level dispatching. However, there is an acceptable shortcut which can + +be taken to move the higher level dispatching function into the IP + +layer, if this makes the implementation substantially easier. + + + In principle, every higher level protocol could have a different + +dispatching algorithm. The reason for this is discussed below. + +However, for the protocols involved in the service offering being + +implemented today, TCP and UDP, the dispatching algorithm is exactly the + +same, and the port field is located in precisely the same place in the + +header. Therefore, unless one is interested in participating in further + +protocol research, there is only one higher level dispatch algorithm. + +This algorithm takes into account the internet level foreign address, + +the protocol number, and the local port and foreign port from the higher + +level protocol header. This algorithm can be implemented as a sort of + +adjunct to the IP layer implementation, as long as no other higher level + +protocols are to be implemented. (Actually, the above statement is only + +partially true, in that the UDP dispatch function is a subset of the TCP + +dispatch function. UDP dispatch depends only on protocol number and local + +port. However, there is an occasion within TCP when this exact same + +subset comes into play, when a process wishes to listen for a connection + +from any foreign host. Thus, the range of mechanisms necessary to + +support TCP dispatch are also sufficient to support precisely the UDP + +requirement.) + + + The decision to remove port level dispatching from IP to the higher + +level protocol has been questioned by some implementors. 
It has been + +argued that if all of the address structure were part of the IP layer, + + 11 + + +then IP could do all of the packet dispatching function within the host, + +which would lead to a simpler modularity. Three problems were + +identified with this. First, not all protocol implementors could agree + +on the size of the port identifier. TCP selected a fairly short port + +identifier, 16 bits, to reduce header size. Other protocols being + +designed, however, wanted a larger port identifier, perhaps 32 bits, so + +that the port identifier, if properly selected, could be considered + +probabilistically unique. Thus, constraining the port id to one + +particular IP level mechanism would prevent certain fruitful lines of + +research. Second, ports serve a special function in addition to + +datagram delivery: certain port numbers are reserved to identify + +particular services. Thus, TCP port 23 is the remote login service. If + +ports were implemented at the IP level, then the assignment of well + +known ports could not be done on a protocol basis, but would have to be + +done in a centralized manner for all of the IP architecture. Third, IP + +was designed with a very simple layering role: IP contained exactly + +those functions that the gateways must understand. If the port idea had + +been made a part of the IP layer, it would have suggested that gateways + +needed to know about ports, which is not the case. + + + There are, of course, other ways to avoid these problems. In + +particular, the "well-known port" problem can be solved by devising a + +second mechanism, distinct from port dispatching, to name well-known + +ports. Several protocols have settled on the idea of including, in the + +packet which sets up a connection to a particular service, a more + +general service descriptor, such as a character string field. 
These + +special packets, which are requesting connection to a particular + + 12 + + +service, are routed on arrival to a special server, sometimes called a + +"rendezvous server", which examines the service request, selects a + +random port which is to be used for this instance of the service, and + +then passes the packet along to the service itself to commence the + +interaction. + + + For the internet architecture, this strategy had the serious flaw + +that it presumed all protocols would fit into the same service paradigm: + +an initial setup phase, which might contain a certain overhead such as + +indirect routing through a rendezvous server, followed by the packets of + +the interaction itself, which would flow directly to the process + +providing the service. Unfortunately, not all high level protocols in + +internet were expected to fit this model. The best example of this is + +isolated datagram exchange using UDP. The simplest exchange in UDP is + +one process sending a single datagram to another. Especially on a local + +net, where the net related overhead is very low, this kind of simple + +single datagram interchange can be extremely efficient, with very low + +overhead in the hosts. However, since these individual packets would + +not be part of an established connection, if IP supported a strategy + +based on a rendezvous server and service descriptors, every isolated + +datagram would have to be routed indirectly in the receiving host + +through the rendezvous server, which would substantially increase the + +overhead of processing, and every datagram would have to carry the full + +service request field, which would increase the size of the packet + +header. + + + In general, if a network is intended for "virtual circuit service", + + 13 + + +or things similar to that, then using a special high overhead mechanism + +for circuit setup makes sense. 
However, current directions in research + +are leading away from this class of protocol, so once again the + +architecture was designed not to preclude alternative protocol + +structures. The only rational position was that the particular + +dispatching strategy used should be part of the higher level protocol + +design, not the IP layer. + + + This same argument about circuit setup mechanisms also applies to + +the design of the IP address structure. Many protocols do not transmit + +a full address field as part of every packet, but rather transmit a + +short identifier which is created as part of a circuit setup from source + +to destination. If the full address needs to be carried in only the + +first packet of a long exchange, then the overhead of carrying a very + +long address field can easily be justified. Under these circumstances, + +one can create truly extravagant address fields, which are capable of + +extending to address almost any conceivable entity. However, this + +strategy is useable only in a virtual circuit net, where the packets + +being transmitted are part of an established sequence; otherwise this + +large extravagant address must be transported on every packet. Since + +Internet explicitly rejected this restriction on the architecture, it + +was necessary to come up with an address field that was compact enough + +to be sent in every datagram, but general enough to correctly route the + +datagram through the catenet without a previous setup phase. The IP + +address of 32 bits is the compromise that results. Clearly it requires + +a substantial amount of shoehorning to address all of the interesting + +places in the universe with only 32 bits. On the other hand, had the + + 14 + + +address field become much bigger, IP would have been susceptible to + +another criticism, which is that the header had grown unworkably large. 
+ +Again, the fundamental design decision was that the protocol be designed + +in such a way that it supported research in new and different sorts of + +protocol architectures. + + + There are some limited restrictions imposed by the IP design on the + +port mechanism selected by the higher level process. In particular, + +when a packet goes awry somewhere on the internet, the offending packet + +is returned, along with an error indication, as part of an ICMP packet. + +An ICMP packet returns only the IP layer, and the next 64 bits of the + +original datagram. Thus, any higher level protocol which wishes to sort + +out from which port a particular offending datagram came must make sure + +that the port information is contained within the first 64 bits of the + +next level header. This also means, in most cases, that it is possible + +to imagine, as part of the IP layer, a port dispatch mechanism which + +works by masking and matching on the first 64 bits of the incoming + +higher level header. + + |
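The closing remark about masking and matching on the first 64 bits can be sketched as follows. This is one possible reading, not a mechanism specified by the RFC: the names Rule, port_rule, dispatch, first_64_bits, and returned_ports are hypothetical, and the bit layout assumed is the one TCP and UDP share, with the source port in the first 16 bits and the destination port in the next 16. For an incoming datagram the destination port is the local port and the source port is the foreign port.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    mask: int    # which of the 64 bits must match
    value: int   # required value of those bits
    owner: str   # hypothetical label for the receiving process


def first_64_bits(header: bytes) -> int:
    """Return the first 8 bytes of the higher level header as an integer."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of higher level header")
    return int.from_bytes(header[:8], "big")


def port_rule(owner: str, local_port: int,
              foreign_port: Optional[int] = None) -> Rule:
    """Build a rule for the assumed TCP/UDP port layout.

    Leaving foreign_port unset gives the listen-from-anyone case, which
    matches on the local port alone.
    """
    mask = 0xFFFF << 32                  # destination (local) port bits
    value = local_port << 32
    if foreign_port is not None:
        mask |= 0xFFFF << 48             # source (foreign) port bits
        value |= foreign_port << 48
    return Rule(mask, value, owner)


def dispatch(rules: List[Rule], header: bytes) -> Optional[str]:
    """Return the owner of the first rule whose masked bits match."""
    bits = first_64_bits(header)
    for rule in rules:
        if (bits & rule.mask) == rule.value:
            return rule.owner
    return None


def returned_ports(header: bytes) -> Tuple[int, int]:
    """Read (source port, destination port) from a header echoed in an ICMP error."""
    bits = first_64_bits(header)
    return (bits >> 48) & 0xFFFF, (bits >> 32) & 0xFFFF
```

Rules are tried in order, so a fully specified connection should be listed before a listener on the same local port. A complete dispatcher would also compare the protocol number and foreign address from the IP header, which this sketch leaves out.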