diff options
Diffstat (limited to 'doc/rfc/rfc2887.txt')
-rw-r--r-- | doc/rfc/rfc2887.txt | 1235 |
1 files changed, 1235 insertions, 0 deletions
diff --git a/doc/rfc/rfc2887.txt b/doc/rfc/rfc2887.txt new file mode 100644 index 0000000..453619b --- /dev/null +++ b/doc/rfc/rfc2887.txt @@ -0,0 +1,1235 @@ + + + + + + +Network Working Group M. Handley +Request for Comments: 2887 S. Floyd +Category: Informational ACIRI + B. Whetten + Talarian + R. Kermode + Motorola + L. Vicisano + Cisco + M. Luby + Digital Fountain, Inc. + August 2000 + + + The Reliable Multicast Design Space for Bulk Data Transfer + +Status of this Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2000). All Rights Reserved. + +Abstract + + The design space for reliable multicast is rich, with many possible + solutions having been devised. However, application requirements + serve to constrain this design space to a relatively small solution + space. This document provides an overview of the design space and + the ways in which application constraints affect possible solutions. + +1. Introduction + + The term "general purpose reliable multicast protocol" is something + of an oxymoron. Different applications have different requirements + of a reliable multicast protocol, and these requirements constrain + the design space in ways that two applications with differing + requirements often cannot share a single solution. There are however + many successful reliable multicast protocol designs that serve more + special purpose requirements well. + + In this document we attempt to review the design space for reliable + multicast protocols intended for bulk data transfer. The term bulk + data transfer should be taken as having broad meaning - the main + limitations are that the data stream is continuous and long lived - + + + +Handley, et al. Informational [Page 1] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + constraints necessary for the forms of congestion control we + currently understand. The purpose of this review is to gather + together an overview of the field and to make explicit the + constraints imposed by particular mechanisms. The aim is to provide + guidance to the standardization process for protocols and protocol + building blocks. In doing this, we cluster potential solutions into + a number of loose categories - real protocols may be composed of + mechanisms from more than one of these clusters. + + The main constraint on solutions is imposed by the need to scale to + large receiver sets. For small receiver sets the design space is + much less restricted. + +2. Application Constraints + + Application requirements for reliable multicast (RM) are as broad and + varied as the applications themselves. However, there are a set of + requirements that significantly affect the design of an RM protocol. + A brief list includes: + + o Does the application need to know that everyone received the data? + + o Does the application need to constrain differences between + receivers? + + o Does the application need to scale to large numbers of receivers? + + o Does the application need to be totally reliable? + + o Does the application need ordered data? + + o Does the application need to provide low-delay delivery? + + o Does the application need to provide time-bounded delivery? + + o Does the application need many interacting senders? + + o Is the application data flow intermittent? + + o Does the application need to work in the public Internet? + + o Does the application need to work without a return path (e.g. + satellite)? + + o Does the application need to provide secure delivery? + + + + + + +Handley, et al. Informational [Page 2] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + In the context of standardizing bulk data transfer protocols, we can + rule out applications with multiple interacting senders and + intermittent data flows. It is not that these applications are + unimportant, but that we do not yet have effective congestion control + for such applications. + +2.1. Did everyone receive the data? + + In many applications a logically defined unit or units of data is to + be delivered to multiple clients, e.g., a file or a set of files, a + software package, a stock quote or package of stock quotes, an event + notification, a set of slides, a frame or block from a video. An + application data unit (ADU) is defined to be a logically separable + unit of data that is useful to the application. In some cases, an + application data unit may be short enough to fit into a single packet + (e.g., an event notification or a stock quote), whereas in other + cases an application data unit may be much longer than a packet + (e.g., a software package). + + A protocol may optionally provide delivery confirmation to ensure + reliable delivery, i.e., a mechanism for receivers to inform the + sender when data has been delivered. There are two types of + confirmation, at the application data unit level and at the packet + level. Application data unit confirmation is useful at the + application level, e.g., to inform the application about receiver + progress and to decide when to stop sending packets about a + particular application data unit. Packet confirmation is useful at + the transport level, e.g., to inform the transport level when it can + release buffer space being used for storing packets for which + delivery has been confirmed. + + Some applications have a strong requirement for confirmation that all + the receivers got an ADU, or if not, to be informed of which specific + receivers failed to receive the entire ADU. Examples include + applications where receivers pay for data, and reliable file-system + replication. Other applications do not have such a requirement. An + example is the distribution of free software. + + If the application does need to know that every receiver got the ADU, + then a positive acknowledgment must be received from every receiver, + although it may be possible to aggregate these acknowledgments. If + the application needs to know precisely which receivers failed to get + the ADU, additional constraints are placed on acknowledgment + aggregation. + + It should be noted that different mechanisms can be used for ADU- + level confirmation and packet-level confirmation in the same + application. For example, an ADU-level confirmation mechanism using + + + +Handley, et al. Informational [Page 3] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + positive acknowledgments may sit on top of a packet-level NACK or + FEC-based transport. Typically this only makes sense when ADUs are + significantly larger than a single packet. + +2.2. Constraining differences + + Some applications need to constrain differences between receivers so + that the data reception characteristics for all receivers falls + within some range. An example is a stock price feed, where it is + unacceptable for a receiver to suffer delivery that is delayed + significantly more than any other receiver. + + This requirement is difficult to satisfy without harming performance. + Typically solutions involve not sending more than a limited amount of + new data until positive acknowledgments have been received from all + the receivers. Such a solution does not cope with network and end- + system failures well. + +2.3. Receiver Set Scaling + + There are many applications for RM that do not need to scale to large + numbers of receivers. For such applications, a range of solutions + may be available that are not available for applications where + scaling to large receiver sets is a requirement. + + A protocol must achieve good throughput of application data units to + receivers. This means that most data that is delivered to receivers + is useful in recovering the application data unit that they are + trying to receive. A protocol must also provide good congestion + control to fairly share the available network resources between all + applications. Receiver set scaling is one of the most important + constraints in meeting these requirements, because it strictly limits + the mechanisms that can be used to achieve these requirements to + those that will efficiently scale to a large receiver population. + Acknowledgement packets have been employed by many systems to achieve + these goals, but it is important to understand the strength and + limitations of different ways of using such packets. + + In a very small system, it may be acceptable to have the receivers + acknowledge every packet. This approach provides the sender with the + maximum amount of information about reception conditions at all the + receivers, information that can be used both to achieve good + throughput and to achieve congestion control. + + + + + + + + +Handley, et al. Informational [Page 4] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + For larger systems, such "flat ACK" schemes cause acknowledge + implosions at the sender. Attempts have been made to reduce this + problem by sending aggregate ACKs infrequently [RMWT98, BC94], but it + is very difficult to incorporate effective congestion control into + such protocols because of the spareceness of feedback. + + Using negative acknowledgments (NACKs) instead of ACKs reduces this + problem to one of NACK implosion (only from the receivers missing the + packets), and because the sender really only needs to know that at + least one receiver is missing data in order to achieve good + throughput, various NACK suppression mechanisms can be applied. + + An alternative to NACKs is ACK aggregation, which can be done by + arranging the receivers into a logical tree, so that each leaf sends + ACKs to its parent which aggregates them, and passes them on up the + tree. Tree-based protocols scale well, but tree formation can be + problematic. + + Other ACK topologies such as rings are also possible, but are often + more difficult to form and maintain than trees are. An alternative + strategy is to add mechanisms to routers so that they can help out in + achieving good throughput or in reducing the cost of achieving good + throughput. + + All these solutions improve receiver set scaling, but they all have + limits of one form or another. One class of solutions scales to an + infinite number of receivers by having no feedback channel whatsoever + in order to achieve good throughput. These open-loop solutions take + the initial data and encode it using an FEC-style mechanism. This + encoded data is transmitted in a continuous stream. Receivers then + join the session and receive packets until they have sufficient + packets to decode the original data, at which point they leave the + session. + + Thus, it is clear that the intended scale of the session constrains + the possible solutions. All solutions will work for very small + sessions, but as the intended receive set increases, the range of + possible solutions that can be deployed safely decreases. + + It should also be noted that hybrids of these mechanisms are + possible, and that using one mechanism at the packet-level and a + different (typically higher overhead) solution at the ADU level may + also scale reasonably if the ADUs are large compared to packets. + + + + + + + + +Handley, et al. Informational [Page 5] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + +2.4. Total vs Semi-reliable + + Many applications require delivery of application data units to be + totally reliable; if any of the application data unit is missing, + none of the received portion of the application data unit is useful. + File transfer applications are a good example of applications + requiring total reliability. + + However, some applications do not need total reliability. An example + is audio broadcasting, where missing packets reduce the quality of + the received audio but do not render it unusable. Such applications + can sometimes get by without any additional reliability over native + IP reliability, but often having a semi-reliable multicast protocol + is desirable. + +2.5. Time-bounded Delivery + + Many applications just require data to be delivered to the receivers + as fast as possible. They have no absolute deadline for delivery. + + However, some applications have hard delivery constraints - if the + data does not arrive at the receiver by a certain time, there is no + point in delivering it at all. Such time-boundedness may be as a + result of real-time constraints such as with audio or video + streaming, or as the result of new data superseding old data. In + both cases, the requirement is for the application to have a greater + degree of control over precisely what the application sends at which + time than might be required with applications such as file transfer. + + Time-bounded delivery usually also implies a semi-reliable protocol, + but the converse does not necessarily hold. + +3. Network Constraints + + The properties of the network in which the application is being + deployed may themselves constrain the reliable multicast design + space. + +3.1. Internet vs Intranet + + In principle the Internet and intranets are the same. In practice + however, the fact that an intranet is under one administration might + allow for solutions to be configured that can not easily be done in + the public Internet. Thus, if the data is of very high value, it + might be appropriate to enhance the routers to provide assistance to + a reliable multicast transport protocol. In the public Internet, it + is less likely that the additional expense required to support this + state in the routers would be acceptable. + + + +Handley, et al. Informational [Page 6] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + +3.2. Return Path + + In principle, when feedback is required from receivers, this feedback + can be multicast or unicast. Multicast feedback has advantages, + especially in NACK-based protocols where it is valuable for NACK + suppression. However, it is not clear at this time whether all ISPs + will allow all members of a session to send to that session. If + multicast feedback is not allowed, then unicast feedback can almost + always be substituted, although often at the expense of additional + messages and mechanisms. + + Some networks may not allow any form of feedback however. The + primary example of this occurs with satellite broadcasts where the + back channel may be very narrow or even non-existent. For such + networks the solution space is very constrained - only FEC-based + encodings have any real chance of working. If the receivers are + direct satellite receivers, then no congestion control is needed, but + it is dangerous to make such assumptions because it is possible for a + satellite hop to feed downstream networks. Thus, congestion control + still needs to be considered with solutions that do not have a return + path. + +3.3. Network Assistance + + A reliable multicast protocol must involve mechanisms running in end + hosts, and must involve routers forwarding multicast packets. + However under some circumstances, it is possible to rely on some + additional degree of assistance from network elements. Broadly + speaking we can cluster RM protocols into four classes depending on + the degree of support received from other network elements. + + No Additional Support + The routers merely forward packets, and only the sender and + receivers have any reliable multicast protocol state. + + Layered Approaches + Data is split across multiple multicast groups. Receivers join + appropriate groups to receive only the traffic they require. This + may in some cases require fast join or leave functionality from + the routers, and may require more forwarding state in the routers. + + Server-based Approaches + Additional nodes are used to assist with data delivery or feedback + aggregation. These additional nodes might not be normal senders + or receivers, and may be present on the distribution or feedback + tree only to provide assistance to the reliable multicast + protocol. They would not otherwise receive the multicast traffic. + + + + +Handley, et al. Informational [Page 7] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + Router-based Approaches + With router-based approaches, routers on the normal data + distribution tree from the sender to the receivers assist in the + delivery of data or feedback aggregation or suppression. As + routers can directly influence multicast routing, they have more + control over which traffic goes to which group members than + server-based approaches. However routers do not normally have a + large amount of spare memory or processing power, which restricts + how much functionality can be placed in the routers. In addition, + router code is normally more difficult to upgrade than application + code, so router-based approaches need to be very general as they + are more difficult to deploy and to change. + +4. Good Throughput Mechanisms + + Two main concerns that a RM protocol must address are congestion + control and good throughput. Packet loss plays a major role with + respect to both concerns. The primary symptom of congestion in many + networks is packet loss. The primary obstacle that must be overcome + to achieve good throughput is packet loss. Thus, measuring and + reacting to packet loss is crucial to address both concerns. RM + solutions that address these concerns can be roughly categorized as + using one or more of the following techniques: + + o Data packet acknowledgment. + + o Negative acknowledgment of missing data packets. + + o Redundancy allowing not all packets to be received. + + These techniques themselves can be usefully subdivided, so that we + can examine the parts of the requirement space in which each + mechanism can be deployed. In this section, we focus on using these + mechanisms for achieving good throughput, and in the next section we + focus on using these mechanisms for congestion control. + +4.1. ACK-based Mechanisms + + The simplest ACK-based mechanism involves every receiver sending an + ACK packet for every data packet it receives and resending packets + that are lost by any receiver. Such mechanisms are limited to very + small receiver groups by the implosion of ACKs received at the + sender, and for this reason they are impractical for most + applications. + + + + + + + +Handley, et al. Informational [Page 8] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + Putting multiple ACKs into a single data packet [RMWT98] reduces the + implosion problem by a constant amount, allowing slightly larger + receiver groups. However a limit is soon reached whereby feedback to + the sender is too infrequent for sender-based congestion control + mechanisms to work reliably. + + Arranging the receivers into a ring [WKM94] whereby an "ACK-token" is + passed around the ring prevents the implosion problem for data. + However ring creation and maintenance may itself be problematic. + Also if ring creation does not take into account network topology + (something which is difficult to achieve in practice), then the + number of ACK packets crossing the network backbone for each data + packet sent may increase O(n) with the number of receivers. + +4.1.1. Tree-based ACK Mechanisms + + Arranging the receivers into a tree [MWB+98, KCW98] whereby receivers + generate ACKs to a parent node, which aggregates those ACKs to its + parent in turn, is both more robust and more easily configured than a + ring. The ACK-tree is typically only used for ACK-aggregation - data + packets are multicast from the sender to the receivers as normal. + Trees are easier to construct than rings because more local + information can be used in their construction. Also they can be more + fault tolerant than rings because node failures only affect a subset + of receivers, each of which can easily and locally decide to by-pass + its parent and report directly to the node one level higher in the + tree. With good ACK-tree formation, tree-based ACK mechanisms have + the potential to be one of the most scalable RM solutions. + + To be simple to deploy, tree-based protocols must be self-organizing + - the receivers must form the tree themselves using local information + in a scalable manner. Such mechanisms are possible, but are not + trivial. The main scaling limitations of tree-based protocols + therefore come from the tree formation and maintenance mechanisms + rather than from the use of ACKs. Without such a scalable and + automatic tree-formation mechanism, tree-based protocols must rely on + manual configuration, which significantly limits their applicability + (often to intranets) and (due to the complexity of configuration) + their scalability. + + Orthogonal to the issue of tree formation is the issue of subtree + retransmission. With appropriate router mechanisms, or the use of + multiple multicast groups, it is possible to allow the intermediate + tree nodes to retransmit missing data to the nodes below them in the + tree rather than relying on the original sender to retransmit the + data. This relies on there being a good correlation at the point of + the intermediate node between the ACK tree and the actual data tree, + as well as there being a mechanism to constrain the retransmission to + + + +Handley, et al. Informational [Page 9] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + the subtree. A good automatic tree formation mechanism combined with + the use of administrative scoped multicast groups might provide such + a solution. Without such tree formation mechanisms, subtree + retransmission is difficult to deploy in large groups in the public + internet. This could also be solved by the use of transport- + level router mechanisms to assist or perform retransmission, although + existing router mechanisms [FLST98] support NACK-based rather than + ACK-based protocols. + + Another important issue is the nature of the aggregation performed at + interior nodes on the ACK-tree. Such nodes could: + + 1. aggregate ACKs by sending a single ACK when all their children + have ACKed, + + 2. aggregate ACKs by listing all the children that have ACKed, + + 3. send an aggregated ACK with a NACK-like exception list. + + For data packets, 1. is clearly more scalable, and should be + preferred. However if the sender needs to know exactly which + receivers received the data, 2. and 3. provide this information. + Fortunately, there is usually no need to do this on a per-packet + basis, but rather on a per-ADU basis. Doing 1. on a per packet + basis, and 3. on a per ADU basis is the most scalable solution for + applications that need this information, and suffers virtually no + disadvantage compared to the other solutions used on a per-packet + basis. + +4.2. NACK-based mechanisms + + Instead of sending an ACK for every data packet received, receivers + can send a negative acknowledgment (NACK) for every data packet they + discover they did not receive. This has a number of advantages over + ACK-based mechanisms: + + o The sender no longer needs to know exactly how many receivers + there are. This removes the topology-building phase needed for + ring- or tree-style ACK-based algorithms. + + o Fault-tolerance is made somewhat simpler by making receivers + responsible for reliability. + + o Sender state can be significantly reduced because the sender does + not need to keep track of the receivers state. + + + + + + +Handley, et al. Informational [Page 10] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + o Only a single NACK is needed from any receiver to indicate a + packet that is missing by any number of receivers. Thus NACK + suppression is possible. + + The disadvantages are that it is more difficult for the sender to + know that it can free transmission buffers, and that additional + session level mechanisms are needed if the sender really needs to + know if a particular receiver actually received all the data. + However for many applications, neither of these is an issue. + +4.2.1. NACK Suppression + + The key differences between NACK-based protocols is in how NACK- + suppression is performed. The goal is for only one NACK to reach the + sender (or a node that can resend the missing data) as soon as + possible after the loss is first noticed, and for only one copy of + the missing data to be received by those nodes needing + retransmission. + + Different mechanisms come close to satisfying these goals in + different ways. + + o SRM [FJM95] uses random timers weighted by the round trip time + between the sender and each node missing the data. This is + effective, but requires computing the RTT to each receiver before + suppression works properly. + + o NTE [HC97] uses a sender-triggered mechanism based on random keys + and sliding masks. This does not require random timers, and works + for very large sessions, but makes it difficult to provide the + constant low-level stream of feedback needed to perform congestion + control. + + o AAP [Ha99] uses exponentially distributed random timers and is + effective for large sessions without needing to compute the RTT to + each receiver. + + o PGM [FLST98] and LMS [PPV98] use additional mechanisms in routers + to suppress duplicate NACKs. In the case of PGM, router + assistance suppliments SRM-stype random timers and localizes the + suppression so that the whole group does not need suppressing. + + The most general of these mechanisms is probably exponentially + weighted random timers. Although SRM style timers can reduce + feedback delay, they are harder to use correctly in situations where + all the RTTs are not known, or where the number of respondees is + + + + + +Handley, et al. Informational [Page 11] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + unknown. In contrast, exponentially weighted random timers work well + across a large range of session sizes with good worst case delay + characteristics. + + Either form of random timer based mechanism can be supplemented by + router-support where it is available. Sender triggered NACK + mechanisms (e.g. [HC97]) are more difficult to integrate with + router-based support mechanisms. + +4.3. Replication + + Some RM protocols can be designed so as to not need explicit + reliability mechanisms except in comparatively rare cases. An + example is in a multicast game, where the position of a moving object + is continuously multicast. This positional stream does not require + additional reliability because a new position superseding the old one + will be sent before any retransmission could take place. However, + when the moving object interacts with other objects or stops moving, + then an explicit reliability mechanism is required to reliably send + the interaction information or last position. + + It is not just games that can be built in this manner - the NTE + shared text editor[HC97] uses just such a mechanism with changes to a + line of text. For every change the whole line is sent, and so long + as the user keeps typing no explicit reliability mechanism is needed. + The major advantage of replication is that it is not susceptible to + spatially uncorrelated packet loss. With a traditional ACK or NACK + based protocol, the probability of any particular packet being + received by all the receivers in a large group can be very low. This + leads to high retransmission rates. In contrast, replicated + streams do not suffer as the size of the receiver group increases - + different receivers lose different packets, but this does not + increase network traffic. + +4.4. Packet-level Forward Error Correction + + Forward Error Correction (FEC) is a well known technique for + protecting data against corruption. For reliable multicast it is + most useful in the form of erasure codes. + + The simplest form of packet-level FEC is to take a group of packets + that is to be sent, and to XOR the packets together to form a + newpacket which is also sent. If there were three original packets + plus the XOR packet sent, then if a receiver is missing any one of + the original data packets, but receives the XOR packet, then it can + reproduce the missing original packet. + + + + + +Handley, et al. Informational [Page 12] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + More general erasure codes exist [BKKKLZ95], [Ri97], [LMSSS97] that + allow the generation of n encoding packets from k original data + packets. In such cases, so long as at least k of the n encoding + packets are received, then the k original data packets can be + reproduced. + + To apply FEC the sender groups data packets into rounds, and encoding + packets are produced based on all the data packets in a round. A + round may consist of all data packets in an entire application data + unit in some cases, whereas in other cases it may consist of a group + of data packets that make up only a small portion of an application + data unit. + + Using erasure codes to repair packet loss is a significant + improvement over simple retransmission because the dependency on + which packets have been lost is removed. Thus, the amount of repair + traffic required to repair spatially uncorrelated packet loss is + considerably lessened. + + We can divide packet-level FEC schemes into two categories: proactive + FEC and reactive FEC. The difference between the two is that for + proactive FEC the sender decides a priori how many encoding packets + to send for each round of data packets, whereas for reactive FEC the + sender initially transmits only the original data packets for each + round. Then, the sender uses feedback from the receivers to compute + how many packets were lost by the receiver that experienced the most + loss in each round, and then only that number of additional encoding + packets are sent for that round. These encoding packets will then + also serve to repair loss at the other receivers that are missing + fewer packets. The receivers report via ACKs or NACKs how many + packets are missing from each round. With NACKs, only the receiver + missing the most packets need send a NACK for this round, so this is + used to weight the random timers in the NACK calculation. + + Proactive and reactive FEC can be combined, e.g., a certain amount of + proactive FEC can be sent for each round and if there are receivers + that experience more loss than can be overcome by this for some + rounds then they can request and receive additional encoding packets + for these rounds. + + FEC is very effective at reducing the repair traffic for packet loss. + However, it requires that the data to be sent to be grouped into + rounds, which can add to end-to-end latency. For bulk-data + applications this is typically not a problem, but this may be an + issue for interactive applications where replication may be a better + solution. + + + + + +Handley, et al. Informational [Page 13] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + +4.5. Layered FEC + + An alternative use of packet level FEC is possible when data is + spread across several multicast groups [RVC98], [BLMR98]. In such + cases, the original k data packets are used to generate n encoding + packets, where n is much larger than k. The n encoded packets are + then striped across multiple multicast groups. When a receiver + wishes to receive the original data it joins one or more of the + multicast groups, and receives the encoding packets. Once it has + received k different encoding packets, the receiver can then leave + all the multicast groups and reconstruct the original data. + + The primary importance of such a layering is that it allows different + receivers to be able to receive the traffic at different rates + according to the available capacity. Such schemes do not require any + form of feedback from the receivers to the sender to ensure good + throughput, and therefore the need for good throughput does not + constrain the size of the receiver set. However, to perform adequate + network congestion control using receiver joins and leaves in this + manner may require coordination between members that are behind the + same congested link from the sender. As described in the next + section, [RVC98] suggests such a layered congestion control scheme. + +5. Congestion Control Mechanisms + + The basic delivery model of the Internet is best-effort service. No + guarantees are given as to throughput, delay or packet loss. End- + systems are expected to be adaptive, and to reduce their transmission + rate to a level appropriate for the congestion state of the network. + Although increasingly the Internet will start to support reserved + bandwidth and differentiated service classes for specialist + applications, unless an end-system knows explicitly that it has + reserved bandwidth, it must still perform congestion control. + + Broadly speaking, there are five classes of single-sender multicast + congestion control solution: + + o Sender-controlled, one group. + + A single multicast group is used for data distribution. Feedback + from the group members is used to control the rate of this group. + The goal is to transmit at a rate dictated by the slowest + receiver. + + + + + + + + +Handley, et al. Informational [Page 14] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + o Sender-controlled, multiple groups. + + One initial multicast group is adaptively subdivided into multiple + subgroups with subdivisions centered on congestion points in the + network. Application-level relays buffer data from a group nearer + the original sender, and retransmit it at a slower rate into a + group further from the original sender. In this way, different + receivers can receiver the data at different rates. Sender-based + congestion control takes place between the members of a subgroup + and their relay. + + o Receiver-controlled, one group. + + A single multicast group is used for data distribution. The + receivers determine if the sender is transmitting too rapidly for + the current congestion state of the network, and they leave the + group if this is the case. + + o Receiver-controlled, layered organization. + + A layered approach for how to combine this scheme with a + congestion control protocol that requires no receiver feedback is + described in [RVC98]. The sender stripes data across multiple + multicast groups simultaneously. Receivers join and leave these + layered groups depending on their measurements of the congestion + state of the network, so that the amount of data being received is + always appropriate. However, this scheme relies on receivers to + join and leave the different multicast groups in a coordinated + fashion behind a bottleneck link, and it has not yet been + completely confirmed that this approach will scale in practice to + the Internet. As a result, more work on this congestion control + mechanism would be beneficial. + + o Router-based congestion control. + + It is possible to add additional mechanisms to multicast routers + to assist in multicast congestion control. Such mechanisms could + include: + + o Conditional joins (a multicast join that specifies a loss rate + above which it is acceptable for the router to reject the + join). + + o Router filtering of traffic that exceeds a reasonable rate. + This may include mechanisms for filtering traffic at different + points in the network at different rates depending on local + congestion conditions [LVS99]. + + + + +Handley, et al. Informational [Page 15] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + o Fair queuing schemes combined with end-to-end adaptation. + + Router-based schemes generally require more state in network + routers than has traditionally been acceptable for backbone + routers. Thus, in the near-term, such schemes are only likely to + be applicable for intranet solutions. + + For reliable multicast protocols, it is important to consider + congestion control at the same time as reliability is being + considered. The same mechanisms that are used to provide reliability + will sometimes be used to provide congestion control. + + In the case of receiver-based congestion control, open-loop delivery + using FEC is the likely choice for achieving good throughput for + bulk- data transfer. This is because open-loop delivery requires no + feedback from receivers, and thus it is a perfect match with a + receiver-based congestion-control mechanism that operates without + feedback from receivers. + +6. Security Considerations + + Generally speaking, security considerations have relatively little + effect on constraining the design space for reliable multicast + protocols. The primary issues constraining the design space are all + related to receiver-set scaling. For authentication of the source + and of data integrity, receiver-set scaling is not a significant + issue. However, for data encryption, key distribution and + particularly re-keying may be significantly affected by receiver-set + scaling. Tree and graph based re-keying solutions[WHA98,WGL97] would + appear to be appropriate solutions to these problems. It is not + clear however that such re-keying solutions need to directly affect + the design of the data distribution part of a reliable multicast + protocol. + + The primary question to consider for the security of reliable + multicast protocols is the role of third-parties. If nodes other + than the original source of the data are allowed to send or resend + data packets, then the security model for the protocol must take this + into account. In particular, it must be clear whether such third + parties are trusted or untrusted. A requirement for trusted third + parties can make protocols difficult to deploy on the Internet. + + Untrusted third parties (such as receivers that retransmit the data) + may be used so long as the data authentication mechanisms take this + into account. Typically this means that the original sender + digitally signs and timestamps the data, and that the third parties + resend this signed timestamped payload unmodified. + + + + +Handley, et al. Informational [Page 16] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + Unlike unicast protocols, denial-of-service attacks on multicast + transport state are easy if the protocol design does not take such + attacks into account. This is because any receiver can join the + session, and can then produce feedback that influences the progress + of a session involving many other receivers. Hence protection + against denial-of-service attacks on reliable multicast protocols + must be carefully considered. A receiver that requests + retransmission of every packet, or that refuses to acknowledge + packets in an ACK-based protocol can potentially bring a reliable + multicast session to a standstill. Senders must have appropriate + policy to deal with such conditions, and if necessary, evict the + receiver from the group. A single receiver masquerading as a large + number of receivers may still be an issue under such circumstances + with protocols that support NACK-like functionality. Providing + unique "keys" to each NACKer when they first NACK using a unicast + response might potentially prevent such attacks. + + Denial-of-service attacks caused by traffic flooding are however + somewhat easier to protect against than with unicast. Unwanted + senders can simply be pruned from the distribution tree using the + mechanisms implemented in IGMP v3[CDT99]. + +7. Conclusions + + In this document we present an overview of the design space for + reliable multicast within the context of one-to-many bulk-data + transfer. Other flavors of multicast application are not considered + in this document, and hence the overview given should not be + considered inclusive of the design space for protocols that fall + outside the context of one-to-many bulk-data transfer. During the + course of this overview, we have reaffirmed the notion that the + process of reliable multicast protocol design is affected by a number + of factors that render the generation of a "one size fits all + solution" moot. These factors are then described to show how an + application's needs serve to constrain the set of available + techniques that may be used to create a reliable multicast protocol. + We examined a number of basic techniques and to show how well they + can meet the needs of certain types of applications. + + This document is intended to provide guidance to the IETF community + regarding the standardization of reliable multicast protocols for + bulk-data transfer. Given the degree to which application + requirements constrain reliable multicast solutions, and the diverse + set of applications that need to be supported, it should be clear + that any standardization work should take great pains to be future- + proof. This would seem to imply not standardizing complete reliable + multicast transport protocols in one pass, but rather examining the + degree to which such protocols are separable into functional building + + + +Handley, et al. Informational [Page 17] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + blocks, and standardizing these blocks separately to the maximum + degree that makes sense. Such an approach allows for protocol + evolution, and allows applications with new constraints to be + supported with maximal reuse of existing and tested mechanisms. + +8. Acknowledgments + + This document represents an overview of the reliable multicast design + space. The ideas presented are not those of the authors, but are + collected from the varied presentations and discussions in the IRTF + Reliable Multicast Research Group. Although they are too numerous to + list here, we thank everyone who has participated in these + discussions for their contributions. + +9. Authors' Addresses + + Mark Handley + ATT Center for Internet Research at ICSI, + International Computer Science Institute, + 1947 Center Street, Suite 600, + Berkeley, CA 94704, USA + + EMail: mjh@aciri.org + + + Sally Floyd + ATT Center for Internet Research at ICSI, + International Computer Science Institute, + 1947 Center Street, Suite 600, + Berkeley, CA 94704, USA + + EMail: floyd@aciri.org + + + Brian Whetten + Talarian Corporation, + 333 Distel Circle, + Los Altos, CA 94022, USA + + EMail: whetten@talarian.com + + + + + + + + + + + +Handley, et al. Informational [Page 18] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + Roger Kermode + Motorola Australian Research Centre + Level 3, 12 Lord St, + Botany NSW 2019, + Australia + + EMail: Roger.Kermode@motorola.com + + + Lorenzo Vicisano + Cisco Systems, + 170 West Tasman Dr. + San Jose, CA 95134, USA + + EMail: lorenzo@cisco.com + + + Michael Luby + Digital Fountain, Inc. + 600 Alabama Street + San Francisco, CA 94110 + + EMail: luby@digitalfountain.com + +10. References + + [BC94] K. Birman, T. Clark. "Performance of the Isis Distributed + Computing Toolkit." Technical Report TR-94-1432, Dept. of + Computer Science, Cornell University. + + [BKKKLZ95] J. Bloemer, M. Kalfane, M. Karpinski, R. Karp, M. Luby, D. + Zuckerman, "An XOR-based Erasure Resilient Coding Scheme", + ICSI Technical Report No. TR-95-048, August 1995. + + [BLMR98] J. Byers, M. Luby, M. Mitzenmacher, A. Rege, "A Digital + Fountain Approach to Reliable Distribution of Bulk Data", + Proc ACM SIGCOMM 98. + + [CDT99] Cain, B., Deering, S., and A. Thyagarajan, "Internet Group + Management Protocol, Version 3", Work in Progress. + + [FLST98] Farinacci, D., Lin, S., Speakman, T. and A. Tweedly, "PGM + reliable transport protocol specification", Work in + Progress. + + [FJM95] S. Floyd, V. Jacobson, S. McCanne, "A Reliable Multicast + Framework for Light-weight Sessions and Application Level + Framing", Proc ACM SIGCOMM 95, Aug 1995 pp. 342-356. + + + +Handley, et al. Informational [Page 19] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + [Ha99] Handley, M., "Multicast address allocation protocol + (AAP)", Work in Progress. + + [HC97] M. Handley and J. Crowcroft, "Network text editor (NTE) a + scalable shared text editor for MBone," ACM Computer + Communication Review, vol. 27, pp. 197-208, Oct. 1997. ACM + SIGCOMM'97, Sept. 1997. + + [KCW98] Kadansky, M., Chiu, D. and J. Wesley, "Tree-based reliable + multicast (TRAM)", Work in Progress. + + [LMSSS97] M. Luby, M. Mitzenmacher, A. Shokrollahi, D. Spielman, V. + Stemann, "Practical Loss-Resilient Codes", Proc ACM + Symposium on Theory of Computing, 1997. + + [MWB+98] Montgomery, T., Whetten, B., Basavaiah, M., Paul, S., + Rastogi, N., Conlan, J. and T. Yeh, "THE RMTP-II + PROTOCOL", Work in Progress. + + [PPV98] C. Papadopoulos, G. Parulkar, and G. Varghese, "An error + control scheme for large-scale multicast applications," in + Proceedings of the Conference on Computer Communications + (IEEE Infocom), (San Francisco, California), p. 1188, + March/April 1998. + + [Ri97] L. Rizzo, "Effective erasure codes for reliable computer + communication protocols," ACM Computer Communication + Review, vol. 27, pp. 24-36, Apr. 1997. + + [RV97] L. Rizzo, L. Vicisano, "A Reliable Multicast data + Distribution Protocol based on software FEC techniques", + Proc. of The Fourth IEEE Workshop on the Architecture and + Implementation of High Performance Communication Systems + (HPCS'97), Sani Beach, Chalkidiki, Greece June 23-25, + 1997. + + [RVC98] L. Rizzo, L. Vicisano, J. Crowcroft, "The RLC multicast + congestion control algorithm", submitted to IEEE Network - + special issue multicast. + + [RMWT98] Robertson, K., Miller, K., White, M. and A. Tweedly, + "StarBurst multicast file transfer protocol (MFTP) + specification", Work in Progress. + + [WHA98] Wallner, D., Hardler, E. and R. Agee, "Key Management for + Multicast: Issues and Architectures", RFC 2627, June 1999. + + + + + +Handley, et al. Informational [Page 20] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + + [WKM94] Brian Whetten, Simon Kaplan, and Todd Montgomery, "A high + performance totally ordered multicast protocol," research + memorandum, Aug. 1994. + + [WGL97] C.K. Wong, M. Gouda, S. Lam, "Secure Group Communications + Using Key Graphs," Technical Report TR 97-23, Department + of Computer Sciences, The University of Texas at Austin, + July 1997. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Handley, et al. Informational [Page 21] + +RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000 + + +11. Full Copyright Statement + + Copyright (C) The Internet Society (2000). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + + + + + + + + + + + + + +Handley, et al. Informational [Page 22] + |