summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc2887.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc2887.txt')
-rw-r--r--doc/rfc/rfc2887.txt1235
1 files changed, 1235 insertions, 0 deletions
diff --git a/doc/rfc/rfc2887.txt b/doc/rfc/rfc2887.txt
new file mode 100644
index 0000000..453619b
--- /dev/null
+++ b/doc/rfc/rfc2887.txt
@@ -0,0 +1,1235 @@
+
+
+
+
+
+
+Network Working Group M. Handley
+Request for Comments: 2887 S. Floyd
+Category: Informational ACIRI
+ B. Whetten
+ Talarian
+ R. Kermode
+ Motorola
+ L. Vicisano
+ Cisco
+ M. Luby
+ Digital Fountain, Inc.
+ August 2000
+
+
+ The Reliable Multicast Design Space for Bulk Data Transfer
+
+Status of this Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2000). All Rights Reserved.
+
+Abstract
+
+ The design space for reliable multicast is rich, with many possible
+ solutions having been devised. However, application requirements
+ serve to constrain this design space to a relatively small solution
+ space. This document provides an overview of the design space and
+ the ways in which application constraints affect possible solutions.
+
+1. Introduction
+
+ The term "general purpose reliable multicast protocol" is something
+ of an oxymoron. Different applications have different requirements
+ of a reliable multicast protocol, and these requirements constrain
+ the design space in ways that two applications with differing
+ requirements often cannot share a single solution. There are however
+ many successful reliable multicast protocol designs that serve more
+ special purpose requirements well.
+
+ In this document we attempt to review the design space for reliable
+ multicast protocols intended for bulk data transfer. The term bulk
+ data transfer should be taken as having broad meaning - the main
+ limitations are that the data stream is continuous and long lived -
+
+
+
+Handley, et al. Informational [Page 1]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ constraints necessary for the forms of congestion control we
+ currently understand. The purpose of this review is to gather
+ together an overview of the field and to make explicit the
+ constraints imposed by particular mechanisms. The aim is to provide
+ guidance to the standardization process for protocols and protocol
+ building blocks. In doing this, we cluster potential solutions into
+ a number of loose categories - real protocols may be composed of
+ mechanisms from more than one of these clusters.
+
+ The main constraint on solutions is imposed by the need to scale to
+ large receiver sets. For small receiver sets the design space is
+ much less restricted.
+
+2. Application Constraints
+
+ Application requirements for reliable multicast (RM) are as broad and
+ varied as the applications themselves. However, there are a set of
+ requirements that significantly affect the design of an RM protocol.
+ A brief list includes:
+
+ o Does the application need to know that everyone received the data?
+
+ o Does the application need to constrain differences between
+ receivers?
+
+ o Does the application need to scale to large numbers of receivers?
+
+ o Does the application need to be totally reliable?
+
+ o Does the application need ordered data?
+
+ o Does the application need to provide low-delay delivery?
+
+ o Does the application need to provide time-bounded delivery?
+
+ o Does the application need many interacting senders?
+
+ o Is the application data flow intermittent?
+
+ o Does the application need to work in the public Internet?
+
+ o Does the application need to work without a return path (e.g.
+ satellite)?
+
+ o Does the application need to provide secure delivery?
+
+
+
+
+
+
+Handley, et al. Informational [Page 2]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ In the context of standardizing bulk data transfer protocols, we can
+ rule out applications with multiple interacting senders and
+ intermittent data flows. It is not that these applications are
+ unimportant, but that we do not yet have effective congestion control
+ for such applications.
+
+2.1. Did everyone receive the data?
+
+ In many applications a logically defined unit or units of data is to
+ be delivered to multiple clients, e.g., a file or a set of files, a
+ software package, a stock quote or package of stock quotes, an event
+ notification, a set of slides, a frame or block from a video. An
+ application data unit (ADU) is defined to be a logically separable
+ unit of data that is useful to the application. In some cases, an
+ application data unit may be short enough to fit into a single packet
+ (e.g., an event notification or a stock quote), whereas in other
+ cases an application data unit may be much longer than a packet
+ (e.g., a software package).
+
+ A protocol may optionally provide delivery confirmation to ensure
+ reliable delivery, i.e., a mechanism for receivers to inform the
+ sender when data has been delivered. There are two types of
+ confirmation, at the application data unit level and at the packet
+ level. Application data unit confirmation is useful at the
+ application level, e.g., to inform the application about receiver
+ progress and to decide when to stop sending packets about a
+ particular application data unit. Packet confirmation is useful at
+ the transport level, e.g., to inform the transport level when it can
+ release buffer space being used for storing packets for which
+ delivery has been confirmed.
+
+ Some applications have a strong requirement for confirmation that all
+ the receivers got an ADU, or if not, to be informed of which specific
+ receivers failed to receive the entire ADU. Examples include
+ applications where receivers pay for data, and reliable file-system
+ replication. Other applications do not have such a requirement. An
+ example is the distribution of free software.
+
+ If the application does need to know that every receiver got the ADU,
+ then a positive acknowledgment must be received from every receiver,
+ although it may be possible to aggregate these acknowledgments. If
+ the application needs to know precisely which receivers failed to get
+ the ADU, additional constraints are placed on acknowledgment
+ aggregation.
+
+ It should be noted that different mechanisms can be used for ADU-
+ level confirmation and packet-level confirmation in the same
+ application. For example, an ADU-level confirmation mechanism using
+
+
+
+Handley, et al. Informational [Page 3]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ positive acknowledgments may sit on top of a packet-level NACK or
+ FEC-based transport. Typically this only makes sense when ADUs are
+ significantly larger than a single packet.
+
+2.2. Constraining differences
+
+ Some applications need to constrain differences between receivers so
+ that the data reception characteristics for all receivers falls
+ within some range. An example is a stock price feed, where it is
+ unacceptable for a receiver to suffer delivery that is delayed
+ significantly more than any other receiver.
+
+ This requirement is difficult to satisfy without harming performance.
+ Typically solutions involve not sending more than a limited amount of
+ new data until positive acknowledgments have been received from all
+ the receivers. Such a solution does not cope with network and end-
+ system failures well.
+
+2.3. Receiver Set Scaling
+
+ There are many applications for RM that do not need to scale to large
+ numbers of receivers. For such applications, a range of solutions
+ may be available that are not available for applications where
+ scaling to large receiver sets is a requirement.
+
+ A protocol must achieve good throughput of application data units to
+ receivers. This means that most data that is delivered to receivers
+ is useful in recovering the application data unit that they are
+ trying to receive. A protocol must also provide good congestion
+ control to fairly share the available network resources between all
+ applications. Receiver set scaling is one of the most important
+ constraints in meeting these requirements, because it strictly limits
+ the mechanisms that can be used to achieve these requirements to
+ those that will efficiently scale to a large receiver population.
+ Acknowledgement packets have been employed by many systems to achieve
+ these goals, but it is important to understand the strength and
+ limitations of different ways of using such packets.
+
+ In a very small system, it may be acceptable to have the receivers
+ acknowledge every packet. This approach provides the sender with the
+ maximum amount of information about reception conditions at all the
+ receivers, information that can be used both to achieve good
+ throughput and to achieve congestion control.
+
+
+
+
+
+
+
+
+Handley, et al. Informational [Page 4]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ For larger systems, such "flat ACK" schemes cause acknowledge
+ implosions at the sender. Attempts have been made to reduce this
+ problem by sending aggregate ACKs infrequently [RMWT98, BC94], but it
+ is very difficult to incorporate effective congestion control into
+ such protocols because of the spareceness of feedback.
+
+ Using negative acknowledgments (NACKs) instead of ACKs reduces this
+ problem to one of NACK implosion (only from the receivers missing the
+ packets), and because the sender really only needs to know that at
+ least one receiver is missing data in order to achieve good
+ throughput, various NACK suppression mechanisms can be applied.
+
+ An alternative to NACKs is ACK aggregation, which can be done by
+ arranging the receivers into a logical tree, so that each leaf sends
+ ACKs to its parent which aggregates them, and passes them on up the
+ tree. Tree-based protocols scale well, but tree formation can be
+ problematic.
+
+ Other ACK topologies such as rings are also possible, but are often
+ more difficult to form and maintain than trees are. An alternative
+ strategy is to add mechanisms to routers so that they can help out in
+ achieving good throughput or in reducing the cost of achieving good
+ throughput.
+
+ All these solutions improve receiver set scaling, but they all have
+ limits of one form or another. One class of solutions scales to an
+ infinite number of receivers by having no feedback channel whatsoever
+ in order to achieve good throughput. These open-loop solutions take
+ the initial data and encode it using an FEC-style mechanism. This
+ encoded data is transmitted in a continuous stream. Receivers then
+ join the session and receive packets until they have sufficient
+ packets to decode the original data, at which point they leave the
+ session.
+
+ Thus, it is clear that the intended scale of the session constrains
+ the possible solutions. All solutions will work for very small
+ sessions, but as the intended receive set increases, the range of
+ possible solutions that can be deployed safely decreases.
+
+ It should also be noted that hybrids of these mechanisms are
+ possible, and that using one mechanism at the packet-level and a
+ different (typically higher overhead) solution at the ADU level may
+ also scale reasonably if the ADUs are large compared to packets.
+
+
+
+
+
+
+
+
+Handley, et al. Informational [Page 5]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+2.4. Total vs Semi-reliable
+
+ Many applications require delivery of application data units to be
+ totally reliable; if any of the application data unit is missing,
+ none of the received portion of the application data unit is useful.
+ File transfer applications are a good example of applications
+ requiring total reliability.
+
+ However, some applications do not need total reliability. An example
+ is audio broadcasting, where missing packets reduce the quality of
+ the received audio but do not render it unusable. Such applications
+ can sometimes get by without any additional reliability over native
+ IP reliability, but often having a semi-reliable multicast protocol
+ is desirable.
+
+2.5. Time-bounded Delivery
+
+ Many applications just require data to be delivered to the receivers
+ as fast as possible. They have no absolute deadline for delivery.
+
+ However, some applications have hard delivery constraints - if the
+ data does not arrive at the receiver by a certain time, there is no
+ point in delivering it at all. Such time-boundedness may be as a
+ result of real-time constraints such as with audio or video
+ streaming, or as the result of new data superseding old data. In
+ both cases, the requirement is for the application to have a greater
+ degree of control over precisely what the application sends at which
+ time than might be required with applications such as file transfer.
+
+ Time-bounded delivery usually also implies a semi-reliable protocol,
+ but the converse does not necessarily hold.
+
+3. Network Constraints
+
+ The properties of the network in which the application is being
+ deployed may themselves constrain the reliable multicast design
+ space.
+
+3.1. Internet vs Intranet
+
+ In principle the Internet and intranets are the same. In practice
+ however, the fact that an intranet is under one administration might
+ allow for solutions to be configured that can not easily be done in
+ the public Internet. Thus, if the data is of very high value, it
+ might be appropriate to enhance the routers to provide assistance to
+ a reliable multicast transport protocol. In the public Internet, it
+ is less likely that the additional expense required to support this
+ state in the routers would be acceptable.
+
+
+
+Handley, et al. Informational [Page 6]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+3.2. Return Path
+
+ In principle, when feedback is required from receivers, this feedback
+ can be multicast or unicast. Multicast feedback has advantages,
+ especially in NACK-based protocols where it is valuable for NACK
+ suppression. However, it is not clear at this time whether all ISPs
+ will allow all members of a session to send to that session. If
+ multicast feedback is not allowed, then unicast feedback can almost
+ always be substituted, although often at the expense of additional
+ messages and mechanisms.
+
+ Some networks may not allow any form of feedback however. The
+ primary example of this occurs with satellite broadcasts where the
+ back channel may be very narrow or even non-existent. For such
+ networks the solution space is very constrained - only FEC-based
+ encodings have any real chance of working. If the receivers are
+ direct satellite receivers, then no congestion control is needed, but
+ it is dangerous to make such assumptions because it is possible for a
+ satellite hop to feed downstream networks. Thus, congestion control
+ still needs to be considered with solutions that do not have a return
+ path.
+
+3.3. Network Assistance
+
+ A reliable multicast protocol must involve mechanisms running in end
+ hosts, and must involve routers forwarding multicast packets.
+ However under some circumstances, it is possible to rely on some
+ additional degree of assistance from network elements. Broadly
+ speaking we can cluster RM protocols into four classes depending on
+ the degree of support received from other network elements.
+
+ No Additional Support
+ The routers merely forward packets, and only the sender and
+ receivers have any reliable multicast protocol state.
+
+ Layered Approaches
+ Data is split across multiple multicast groups. Receivers join
+ appropriate groups to receive only the traffic they require. This
+ may in some cases require fast join or leave functionality from
+ the routers, and may require more forwarding state in the routers.
+
+ Server-based Approaches
+ Additional nodes are used to assist with data delivery or feedback
+ aggregation. These additional nodes might not be normal senders
+ or receivers, and may be present on the distribution or feedback
+ tree only to provide assistance to the reliable multicast
+ protocol. They would not otherwise receive the multicast traffic.
+
+
+
+
+Handley, et al. Informational [Page 7]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ Router-based Approaches
+ With router-based approaches, routers on the normal data
+ distribution tree from the sender to the receivers assist in the
+ delivery of data or feedback aggregation or suppression. As
+ routers can directly influence multicast routing, they have more
+ control over which traffic goes to which group members than
+ server-based approaches. However routers do not normally have a
+ large amount of spare memory or processing power, which restricts
+ how much functionality can be placed in the routers. In addition,
+ router code is normally more difficult to upgrade than application
+ code, so router-based approaches need to be very general as they
+ are more difficult to deploy and to change.
+
+4. Good Throughput Mechanisms
+
+ Two main concerns that a RM protocol must address are congestion
+ control and good throughput. Packet loss plays a major role with
+ respect to both concerns. The primary symptom of congestion in many
+ networks is packet loss. The primary obstacle that must be overcome
+ to achieve good throughput is packet loss. Thus, measuring and
+ reacting to packet loss is crucial to address both concerns. RM
+ solutions that address these concerns can be roughly categorized as
+ using one or more of the following techniques:
+
+ o Data packet acknowledgment.
+
+ o Negative acknowledgment of missing data packets.
+
+ o Redundancy allowing not all packets to be received.
+
+ These techniques themselves can be usefully subdivided, so that we
+ can examine the parts of the requirement space in which each
+ mechanism can be deployed. In this section, we focus on using these
+ mechanisms for achieving good throughput, and in the next section we
+ focus on using these mechanisms for congestion control.
+
+4.1. ACK-based Mechanisms
+
+ The simplest ACK-based mechanism involves every receiver sending an
+ ACK packet for every data packet it receives and resending packets
+ that are lost by any receiver. Such mechanisms are limited to very
+ small receiver groups by the implosion of ACKs received at the
+ sender, and for this reason they are impractical for most
+ applications.
+
+
+
+
+
+
+
+Handley, et al. Informational [Page 8]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ Putting multiple ACKs into a single data packet [RMWT98] reduces the
+ implosion problem by a constant amount, allowing slightly larger
+ receiver groups. However a limit is soon reached whereby feedback to
+ the sender is too infrequent for sender-based congestion control
+ mechanisms to work reliably.
+
+ Arranging the receivers into a ring [WKM94] whereby an "ACK-token" is
+ passed around the ring prevents the implosion problem for data.
+ However ring creation and maintenance may itself be problematic.
+ Also if ring creation does not take into account network topology
+ (something which is difficult to achieve in practice), then the
+ number of ACK packets crossing the network backbone for each data
+ packet sent may increase O(n) with the number of receivers.
+
+4.1.1. Tree-based ACK Mechanisms
+
+ Arranging the receivers into a tree [MWB+98, KCW98] whereby receivers
+ generate ACKs to a parent node, which aggregates those ACKs to its
+ parent in turn, is both more robust and more easily configured than a
+ ring. The ACK-tree is typically only used for ACK-aggregation - data
+ packets are multicast from the sender to the receivers as normal.
+ Trees are easier to construct than rings because more local
+ information can be used in their construction. Also they can be more
+ fault tolerant than rings because node failures only affect a subset
+ of receivers, each of which can easily and locally decide to by-pass
+ its parent and report directly to the node one level higher in the
+ tree. With good ACK-tree formation, tree-based ACK mechanisms have
+ the potential to be one of the most scalable RM solutions.
+
+ To be simple to deploy, tree-based protocols must be self-organizing
+ - the receivers must form the tree themselves using local information
+ in a scalable manner. Such mechanisms are possible, but are not
+ trivial. The main scaling limitations of tree-based protocols
+ therefore come from the tree formation and maintenance mechanisms
+ rather than from the use of ACKs. Without such a scalable and
+ automatic tree-formation mechanism, tree-based protocols must rely on
+ manual configuration, which significantly limits their applicability
+ (often to intranets) and (due to the complexity of configuration)
+ their scalability.
+
+ Orthogonal to the issue of tree formation is the issue of subtree
+ retransmission. With appropriate router mechanisms, or the use of
+ multiple multicast groups, it is possible to allow the intermediate
+ tree nodes to retransmit missing data to the nodes below them in the
+ tree rather than relying on the original sender to retransmit the
+ data. This relies on there being a good correlation at the point of
+ the intermediate node between the ACK tree and the actual data tree,
+ as well as there being a mechanism to constrain the retransmission to
+
+
+
+Handley, et al. Informational [Page 9]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ the subtree. A good automatic tree formation mechanism combined with
+ the use of administrative scoped multicast groups might provide such
+ a solution. Without such tree formation mechanisms, subtree
+ retransmission is difficult to deploy in large groups in the public
+ internet. This could also be solved by the use of transport-
+ level router mechanisms to assist or perform retransmission, although
+ existing router mechanisms [FLST98] support NACK-based rather than
+ ACK-based protocols.
+
+ Another important issue is the nature of the aggregation performed at
+ interior nodes on the ACK-tree. Such nodes could:
+
+ 1. aggregate ACKs by sending a single ACK when all their children
+ have ACKed,
+
+ 2. aggregate ACKs by listing all the children that have ACKed,
+
+ 3. send an aggregated ACK with a NACK-like exception list.
+
+ For data packets, 1. is clearly more scalable, and should be
+ preferred. However if the sender needs to know exactly which
+ receivers received the data, 2. and 3. provide this information.
+ Fortunately, there is usually no need to do this on a per-packet
+ basis, but rather on a per-ADU basis. Doing 1. on a per packet
+ basis, and 3. on a per ADU basis is the most scalable solution for
+ applications that need this information, and suffers virtually no
+ disadvantage compared to the other solutions used on a per-packet
+ basis.
+
+4.2. NACK-based mechanisms
+
+ Instead of sending an ACK for every data packet received, receivers
+ can send a negative acknowledgment (NACK) for every data packet they
+ discover they did not receive. This has a number of advantages over
+ ACK-based mechanisms:
+
+ o The sender no longer needs to know exactly how many receivers
+ there are. This removes the topology-building phase needed for
+ ring- or tree-style ACK-based algorithms.
+
+ o Fault-tolerance is made somewhat simpler by making receivers
+ responsible for reliability.
+
+ o Sender state can be significantly reduced because the sender does
+ not need to keep track of the receivers state.
+
+
+
+
+
+
+Handley, et al. Informational [Page 10]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ o Only a single NACK is needed from any receiver to indicate a
+ packet that is missing by any number of receivers. Thus NACK
+ suppression is possible.
+
+ The disadvantages are that it is more difficult for the sender to
+ know that it can free transmission buffers, and that additional
+ session level mechanisms are needed if the sender really needs to
+ know if a particular receiver actually received all the data.
+ However for many applications, neither of these is an issue.
+
+4.2.1. NACK Suppression
+
+ The key differences between NACK-based protocols is in how NACK-
+ suppression is performed. The goal is for only one NACK to reach the
+ sender (or a node that can resend the missing data) as soon as
+ possible after the loss is first noticed, and for only one copy of
+ the missing data to be received by those nodes needing
+ retransmission.
+
+ Different mechanisms come close to satisfying these goals in
+ different ways.
+
+ o SRM [FJM95] uses random timers weighted by the round trip time
+ between the sender and each node missing the data. This is
+ effective, but requires computing the RTT to each receiver before
+ suppression works properly.
+
+ o NTE [HC97] uses a sender-triggered mechanism based on random keys
+ and sliding masks. This does not require random timers, and works
+ for very large sessions, but makes it difficult to provide the
+ constant low-level stream of feedback needed to perform congestion
+ control.
+
+ o AAP [Ha99] uses exponentially distributed random timers and is
+ effective for large sessions without needing to compute the RTT to
+ each receiver.
+
+ o PGM [FLST98] and LMS [PPV98] use additional mechanisms in routers
+ to suppress duplicate NACKs. In the case of PGM, router
+ assistance suppliments SRM-stype random timers and localizes the
+ suppression so that the whole group does not need suppressing.
+
+ The most general of these mechanisms is probably exponentially
+ weighted random timers. Although SRM style timers can reduce
+ feedback delay, they are harder to use correctly in situations where
+ all the RTTs are not known, or where the number of respondees is
+
+
+
+
+
+Handley, et al. Informational [Page 11]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ unknown. In contrast, exponentially weighted random timers work well
+ across a large range of session sizes with good worst case delay
+ characteristics.
+
+ Either form of random timer based mechanism can be supplemented by
+ router-support where it is available. Sender triggered NACK
+ mechanisms (e.g. [HC97]) are more difficult to integrate with
+ router-based support mechanisms.
+
+4.3. Replication
+
+ Some RM protocols can be designed so as to not need explicit
+ reliability mechanisms except in comparatively rare cases. An
+ example is in a multicast game, where the position of a moving object
+ is continuously multicast. This positional stream does not require
+ additional reliability because a new position superseding the old one
+ will be sent before any retransmission could take place. However,
+ when the moving object interacts with other objects or stops moving,
+ then an explicit reliability mechanism is required to reliably send
+ the interaction information or last position.
+
+ It is not just games that can be built in this manner - the NTE
+ shared text editor[HC97] uses just such a mechanism with changes to a
+ line of text. For every change the whole line is sent, and so long
+ as the user keeps typing no explicit reliability mechanism is needed.
+ The major advantage of replication is that it is not susceptible to
+ spatially uncorrelated packet loss. With a traditional ACK or NACK
+ based protocol, the probability of any particular packet being
+ received by all the receivers in a large group can be very low. This
+ leads to high retransmission rates. In contrast, replicated
+ streams do not suffer as the size of the receiver group increases -
+ different receivers lose different packets, but this does not
+ increase network traffic.
+
+4.4. Packet-level Forward Error Correction
+
+ Forward Error Correction (FEC) is a well known technique for
+ protecting data against corruption. For reliable multicast it is
+ most useful in the form of erasure codes.
+
+ The simplest form of packet-level FEC is to take a group of packets
+ that is to be sent, and to XOR the packets together to form a
+ newpacket which is also sent. If there were three original packets
+ plus the XOR packet sent, then if a receiver is missing any one of
+ the original data packets, but receives the XOR packet, then it can
+ reproduce the missing original packet.
+
+
+
+
+
+Handley, et al. Informational [Page 12]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ More general erasure codes exist [BKKKLZ95], [Ri97], [LMSSS97] that
+ allow the generation of n encoding packets from k original data
+ packets. In such cases, so long as at least k of the n encoding
+ packets are received, then the k original data packets can be
+ reproduced.
+
+ To apply FEC the sender groups data packets into rounds, and encoding
+ packets are produced based on all the data packets in a round. A
+ round may consist of all data packets in an entire application data
+ unit in some cases, whereas in other cases it may consist of a group
+ of data packets that make up only a small portion of an application
+ data unit.
+
+ Using erasure codes to repair packet loss is a significant
+ improvement over simple retransmission because the dependency on
+ which packets have been lost is removed. Thus, the amount of repair
+ traffic required to repair spatially uncorrelated packet loss is
+ considerably lessened.
+
+ We can divide packet-level FEC schemes into two categories: proactive
+ FEC and reactive FEC. The difference between the two is that for
+ proactive FEC the sender decides a priori how many encoding packets
+ to send for each round of data packets, whereas for reactive FEC the
+ sender initially transmits only the original data packets for each
+ round. Then, the sender uses feedback from the receivers to compute
+ how many packets were lost by the receiver that experienced the most
+ loss in each round, and then only that number of additional encoding
+ packets are sent for that round. These encoding packets will then
+ also serve to repair loss at the other receivers that are missing
+ fewer packets. The receivers report via ACKs or NACKs how many
+ packets are missing from each round. With NACKs, only the receiver
+ missing the most packets need send a NACK for this round, so this is
+ used to weight the random timers in the NACK calculation.
+
+ Proactive and reactive FEC can be combined, e.g., a certain amount of
+ proactive FEC can be sent for each round and if there are receivers
+ that experience more loss than can be overcome by this for some
+ rounds then they can request and receive additional encoding packets
+ for these rounds.
+
+ FEC is very effective at reducing the repair traffic for packet loss.
+ However, it requires that the data to be sent to be grouped into
+ rounds, which can add to end-to-end latency. For bulk-data
+ applications this is typically not a problem, but this may be an
+ issue for interactive applications where replication may be a better
+ solution.
+
+
+
+
+
+Handley, et al. Informational [Page 13]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+4.5. Layered FEC
+
+ An alternative use of packet level FEC is possible when data is
+ spread across several multicast groups [RVC98], [BLMR98]. In such
+ cases, the original k data packets are used to generate n encoding
+ packets, where n is much larger than k. The n encoded packets are
+ then striped across multiple multicast groups. When a receiver
+ wishes to receive the original data it joins one or more of the
+ multicast groups, and receives the encoding packets. Once it has
+ received k different encoding packets, the receiver can then leave
+ all the multicast groups and reconstruct the original data.
+
+ The primary importance of such a layering is that it allows different
+ receivers to be able to receive the traffic at different rates
+ according to the available capacity. Such schemes do not require any
+ form of feedback from the receivers to the sender to ensure good
+ throughput, and therefore the need for good throughput does not
+ constrain the size of the receiver set. However, to perform adequate
+ network congestion control using receiver joins and leaves in this
+ manner may require coordination between members that are behind the
+ same congested link from the sender. As described in the next
+ section, [RVC98] suggests such a layered congestion control scheme.
+
+5. Congestion Control Mechanisms
+
+ The basic delivery model of the Internet is best-effort service. No
+ guarantees are given as to throughput, delay or packet loss. End-
+ systems are expected to be adaptive, and to reduce their transmission
+ rate to a level appropriate for the congestion state of the network.
+ Although increasingly the Internet will start to support reserved
+ bandwidth and differentiated service classes for specialist
+ applications, unless an end-system knows explicitly that it has
+ reserved bandwidth, it must still perform congestion control.
+
+ Broadly speaking, there are five classes of single-sender multicast
+ congestion control solution:
+
+ o Sender-controlled, one group.
+
+ A single multicast group is used for data distribution. Feedback
+ from the group members is used to control the rate of this group.
+ The goal is to transmit at a rate dictated by the slowest
+ receiver.
+
+
+
+
+
+
+
+
+Handley, et al. Informational [Page 14]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ o Sender-controlled, multiple groups.
+
+ One initial multicast group is adaptively subdivided into multiple
+ subgroups with subdivisions centered on congestion points in the
+ network. Application-level relays buffer data from a group nearer
+ the original sender, and retransmit it at a slower rate into a
+ group further from the original sender. In this way, different
+ receivers can receiver the data at different rates. Sender-based
+ congestion control takes place between the members of a subgroup
+ and their relay.
+
+ o Receiver-controlled, one group.
+
+ A single multicast group is used for data distribution. The
+ receivers determine if the sender is transmitting too rapidly for
+ the current congestion state of the network, and they leave the
+ group if this is the case.
+
+ o Receiver-controlled, layered organization.
+
+ A layered approach for how to combine this scheme with a
+ congestion control protocol that requires no receiver feedback is
+ described in [RVC98]. The sender stripes data across multiple
+ multicast groups simultaneously. Receivers join and leave these
+ layered groups depending on their measurements of the congestion
+ state of the network, so that the amount of data being received is
+ always appropriate. However, this scheme relies on receivers to
+ join and leave the different multicast groups in a coordinated
+ fashion behind a bottleneck link, and it has not yet been
+ completely confirmed that this approach will scale in practice to
+ the Internet. As a result, more work on this congestion control
+ mechanism would be beneficial.
+
+ o Router-based congestion control.
+
+ It is possible to add additional mechanisms to multicast routers
+ to assist in multicast congestion control. Such mechanisms could
+ include:
+
+ o Conditional joins (a multicast join that specifies a loss rate
+ above which it is acceptable for the router to reject the
+ join).
+
+ o Router filtering of traffic that exceeds a reasonable rate.
+ This may include mechanisms for filtering traffic at different
+ points in the network at different rates depending on local
+ congestion conditions [LVS99].
+
+
+
+
+Handley, et al. Informational [Page 15]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ o Fair queuing schemes combined with end-to-end adaptation.
+
+ Router-based schemes generally require more state in network
+ routers than has traditionally been acceptable for backbone
+ routers. Thus, in the near-term, such schemes are only likely to
+ be applicable for intranet solutions.
+
+ For reliable multicast protocols, it is important to consider
+ congestion control at the same time as reliability is being
+ considered. The same mechanisms that are used to provide reliability
+ will sometimes be used to provide congestion control.
+
+ In the case of receiver-based congestion control, open-loop delivery
+ using FEC is the likely choice for achieving good throughput for
+ bulk- data transfer. This is because open-loop delivery requires no
+ feedback from receivers, and thus it is a perfect match with a
+ receiver-based congestion-control mechanism that operates without
+ feedback from receivers.
+
+6. Security Considerations
+
+ Generally speaking, security considerations have relatively little
+ effect on constraining the design space for reliable multicast
+ protocols. The primary issues constraining the design space are all
+ related to receiver-set scaling. For authentication of the source
+ and of data integrity, receiver-set scaling is not a significant
+ issue. However, for data encryption, key distribution and
+ particularly re-keying may be significantly affected by receiver-set
+ scaling. Tree and graph based re-keying solutions[WHA98,WGL97] would
+ appear to be appropriate solutions to these problems. It is not
+ clear however that such re-keying solutions need to directly affect
+ the design of the data distribution part of a reliable multicast
+ protocol.
+
+ The primary question to consider for the security of reliable
+ multicast protocols is the role of third-parties. If nodes other
+ than the original source of the data are allowed to send or resend
+ data packets, then the security model for the protocol must take this
+ into account. In particular, it must be clear whether such third
+ parties are trusted or untrusted. A requirement for trusted third
+ parties can make protocols difficult to deploy on the Internet.
+
+ Untrusted third parties (such as receivers that retransmit the data)
+ may be used so long as the data authentication mechanisms take this
+ into account. Typically this means that the original sender
+ digitally signs and timestamps the data, and that the third parties
+ resend this signed timestamped payload unmodified.
+
+
+
+
+Handley, et al. Informational [Page 16]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ Unlike unicast protocols, denial-of-service attacks on multicast
+ transport state are easy if the protocol design does not take such
+ attacks into account. This is because any receiver can join the
+ session, and can then produce feedback that influences the progress
+ of a session involving many other receivers. Hence protection
+ against denial-of-service attacks on reliable multicast protocols
+ must be carefully considered. A receiver that requests
+ retransmission of every packet, or that refuses to acknowledge
+ packets in an ACK-based protocol can potentially bring a reliable
+ multicast session to a standstill. Senders must have appropriate
+ policy to deal with such conditions, and if necessary, evict the
+ receiver from the group. A single receiver masquerading as a large
+ number of receivers may still be an issue under such circumstances
+ with protocols that support NACK-like functionality. Providing
+ unique "keys" to each NACKer when they first NACK using a unicast
+ response might potentially prevent such attacks.
+
+ Denial-of-service attacks caused by traffic flooding are however
+ somewhat easier to protect against than with unicast. Unwanted
+ senders can simply be pruned from the distribution tree using the
+ mechanisms implemented in IGMP v3[CDT99].
+
+7. Conclusions
+
+ In this document we present an overview of the design space for
+ reliable multicast within the context of one-to-many bulk-data
+ transfer. Other flavors of multicast application are not considered
+ in this document, and hence the overview given should not be
+ considered inclusive of the design space for protocols that fall
+ outside the context of one-to-many bulk-data transfer. During the
+ course of this overview, we have reaffirmed the notion that the
+ process of reliable multicast protocol design is affected by a number
+ of factors that render the generation of a "one size fits all
+ solution" moot. These factors are then described to show how an
+ application's needs serve to constrain the set of available
+ techniques that may be used to create a reliable multicast protocol.
+ We examined a number of basic techniques and to show how well they
+ can meet the needs of certain types of applications.
+
+ This document is intended to provide guidance to the IETF community
+ regarding the standardization of reliable multicast protocols for
+ bulk-data transfer. Given the degree to which application
+ requirements constrain reliable multicast solutions, and the diverse
+ set of applications that need to be supported, it should be clear
+ that any standardization work should take great pains to be future-
+ proof. This would seem to imply not standardizing complete reliable
+ multicast transport protocols in one pass, but rather examining the
+ degree to which such protocols are separable into functional building
+
+
+
+Handley, et al. Informational [Page 17]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ blocks, and standardizing these blocks separately to the maximum
+ degree that makes sense. Such an approach allows for protocol
+ evolution, and allows applications with new constraints to be
+ supported with maximal reuse of existing and tested mechanisms.
+
+8. Acknowledgments
+
+ This document represents an overview of the reliable multicast design
+ space. The ideas presented are not those of the authors, but are
+ collected from the varied presentations and discussions in the IRTF
+ Reliable Multicast Research Group. Although they are too numerous to
+ list here, we thank everyone who has participated in these
+ discussions for their contributions.
+
+9. Authors' Addresses
+
+ Mark Handley
+ ATT Center for Internet Research at ICSI,
+ International Computer Science Institute,
+ 1947 Center Street, Suite 600,
+ Berkeley, CA 94704, USA
+
+ EMail: mjh@aciri.org
+
+
+ Sally Floyd
+ ATT Center for Internet Research at ICSI,
+ International Computer Science Institute,
+ 1947 Center Street, Suite 600,
+ Berkeley, CA 94704, USA
+
+ EMail: floyd@aciri.org
+
+
+ Brian Whetten
+ Talarian Corporation,
+ 333 Distel Circle,
+ Los Altos, CA 94022, USA
+
+ EMail: whetten@talarian.com
+
+
+
+
+
+
+
+
+
+
+
+Handley, et al. Informational [Page 18]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ Roger Kermode
+ Motorola Australian Research Centre
+ Level 3, 12 Lord St,
+ Botany NSW 2019,
+ Australia
+
+ EMail: Roger.Kermode@motorola.com
+
+
+ Lorenzo Vicisano
+ Cisco Systems,
+ 170 West Tasman Dr.
+ San Jose, CA 95134, USA
+
+ EMail: lorenzo@cisco.com
+
+
+ Michael Luby
+ Digital Fountain, Inc.
+ 600 Alabama Street
+ San Francisco, CA 94110
+
+ EMail: luby@digitalfountain.com
+
+10. References
+
+ [BC94] K. Birman, T. Clark. "Performance of the Isis Distributed
+ Computing Toolkit." Technical Report TR-94-1432, Dept. of
+ Computer Science, Cornell University.
+
+ [BKKKLZ95] J. Bloemer, M. Kalfane, M. Karpinski, R. Karp, M. Luby, D.
+ Zuckerman, "An XOR-based Erasure Resilient Coding Scheme",
+ ICSI Technical Report No. TR-95-048, August 1995.
+
+ [BLMR98] J. Byers, M. Luby, M. Mitzenmacher, A. Rege, "A Digital
+ Fountain Approach to Reliable Distribution of Bulk Data",
+ Proc ACM SIGCOMM 98.
+
+ [CDT99] Cain, B., Deering, S., and A. Thyagarajan, "Internet Group
+ Management Protocol, Version 3", Work in Progress.
+
+ [FLST98] Farinacci, D., Lin, S., Speakman, T. and A. Tweedly, "PGM
+ reliable transport protocol specification", Work in
+ Progress.
+
+ [FJM95] S. Floyd, V. Jacobson, S. McCanne, "A Reliable Multicast
+ Framework for Light-weight Sessions and Application Level
+ Framing", Proc ACM SIGCOMM 95, Aug 1995 pp. 342-356.
+
+
+
+Handley, et al. Informational [Page 19]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ [Ha99] Handley, M., "Multicast address allocation protocol
+ (AAP)", Work in Progress.
+
+ [HC97] M. Handley and J. Crowcroft, "Network text editor (NTE) a
+ scalable shared text editor for MBone," ACM Computer
+ Communication Review, vol. 27, pp. 197-208, Oct. 1997. ACM
+ SIGCOMM'97, Sept. 1997.
+
+ [KCW98] Kadansky, M., Chiu, D. and J. Wesley, "Tree-based reliable
+ multicast (TRAM)", Work in Progress.
+
+ [LMSSS97] M. Luby, M. Mitzenmacher, A. Shokrollahi, D. Spielman, V.
+ Stemann, "Practical Loss-Resilient Codes", Proc ACM
+ Symposium on Theory of Computing, 1997.
+
+ [MWB+98] Montgomery, T., Whetten, B., Basavaiah, M., Paul, S.,
+ Rastogi, N., Conlan, J. and T. Yeh, "THE RMTP-II
+ PROTOCOL", Work in Progress.
+
+ [PPV98] C. Papadopoulos, G. Parulkar, and G. Varghese, "An error
+ control scheme for large-scale multicast applications," in
+ Proceedings of the Conference on Computer Communications
+ (IEEE Infocom), (San Francisco, California), p. 1188,
+ March/April 1998.
+
+ [Ri97] L. Rizzo, "Effective erasure codes for reliable computer
+ communication protocols," ACM Computer Communication
+ Review, vol. 27, pp. 24-36, Apr. 1997.
+
+ [RV97] L. Rizzo, L. Vicisano, "A Reliable Multicast data
+ Distribution Protocol based on software FEC techniques",
+ Proc. of The Fourth IEEE Workshop on the Architecture and
+ Implementation of High Performance Communication Systems
+ (HPCS'97), Sani Beach, Chalkidiki, Greece June 23-25,
+ 1997.
+
+ [RVC98] L. Rizzo, L. Vicisano, J. Crowcroft, "The RLC multicast
+ congestion control algorithm", submitted to IEEE Network -
+ special issue multicast.
+
+ [RMWT98] Robertson, K., Miller, K., White, M. and A. Tweedly,
+ "StarBurst multicast file transfer protocol (MFTP)
+ specification", Work in Progress.
+
+ [WHA98] Wallner, D., Hardler, E. and R. Agee, "Key Management for
+ Multicast: Issues and Architectures", RFC 2627, June 1999.
+
+
+
+
+
+Handley, et al. Informational [Page 20]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+ [WKM94] Brian Whetten, Simon Kaplan, and Todd Montgomery, "A high
+ performance totally ordered multicast protocol," research
+ memorandum, Aug. 1994.
+
+ [WGL97] C.K. Wong, M. Gouda, S. Lam, "Secure Group Communications
+ Using Key Graphs," Technical Report TR 97-23, Department
+ of Computer Sciences, The University of Texas at Austin,
+ July 1997.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Handley, et al. Informational [Page 21]
+
+RFC 2887 Multicast Design Space for Bulk Data Transfer August 2000
+
+
+11. Full Copyright Statement
+
+ Copyright (C) The Internet Society (2000). All Rights Reserved.
+
+ This document and translations of it may be copied and furnished to
+ others, and derivative works that comment on or otherwise explain it
+ or assist in its implementation may be prepared, copied, published
+ and distributed, in whole or in part, without restriction of any
+ kind, provided that the above copyright notice and this paragraph are
+ included on all such copies and derivative works. However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assigns.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is currently provided by the
+ Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Handley, et al. Informational [Page 22]
+