summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc3208.txt
diff options
context:
space:
mode:
authorThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committerThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
treee3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc3208.txt
parentea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc3208.txt')
-rw-r--r--doc/rfc/rfc3208.txt6219
1 files changed, 6219 insertions, 0 deletions
diff --git a/doc/rfc/rfc3208.txt b/doc/rfc/rfc3208.txt
new file mode 100644
index 0000000..fb82c26
--- /dev/null
+++ b/doc/rfc/rfc3208.txt
@@ -0,0 +1,6219 @@
+
+
+
+
+
+
+Network Working Group T. Speakman
+Request for Comments: 3208 Cisco Systems
+Category: Experimental J. Crowcroft
+ UCL
+ J. Gemmell
+ Microsoft
+ D. Farinacci
+ Procket Networks
+ S. Lin
+ Juniper Networks
+ D. Leshchiner
+ TIBCO Software
+ M. Luby
+ Digital Fountain
+ T. Montgomery
+ Talarian Corporation
+ L. Rizzo
+ University of Pisa
+ A. Tweedly
+ N. Bhaskar
+ R. Edmonstone
+ R. Sumanasekera
+ L. Vicisano
+ Cisco Systems
+ December 2001
+
+
+ PGM Reliable Transport Protocol Specification
+
+Status of this Memo
+
+ This memo defines an Experimental Protocol for the Internet
+ community. It does not specify an Internet standard of any kind.
+ Discussion and suggestions for improvement are requested.
+ Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2001). All Rights Reserved.
+
+Abstract
+
+ Pragmatic General Multicast (PGM) is a reliable multicast transport
+ protocol for applications that require ordered or unordered,
+ duplicate-free, multicast data delivery from multiple sources to
+ multiple receivers. PGM guarantees that a receiver in the group
+ either receives all data packets from transmissions and repairs, or
+ is able to detect unrecoverable data packet loss. PGM is
+
+
+
+Speakman, et. al. Experimental [Page 1]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ specifically intended as a workable solution for multicast
+ applications with basic reliability requirements. Its central design
+ goal is simplicity of operation with due regard for scalability and
+ network efficiency.
+
+Table of Contents
+
+ 1. Introduction and Overview .................................. 3
+ 2. Architectural Description .................................. 9
+ 3. Terms and Concepts ......................................... 12
+ 4. Procedures - General ....................................... 18
+ 5. Procedures - Sources ....................................... 19
+ 6. Procedures - Receivers ..................................... 22
+ 7. Procedures - Network Elements .............................. 27
+ 8. Packet Formats ............................................. 31
+ 9. Options .................................................... 40
+ 10. Security Considerations .................................... 56
+ 11. Appendix A - Forward Error Correction ...................... 58
+ 12. Appendix B - Support for Congestion Control ................ 72
+ 13. Appendix C - SPM Requests .................................. 79
+ 14. Appendix D - Poll Mechanism ................................ 82
+ 15. Appendix E - Implosion Prevention .......................... 92
+ 16. Appendix F - Transmit Window Example ....................... 98
+ 17 Appendix G - Applicability Statement ....................... 103
+ 18. Abbreviations .............................................. 105
+ 19. Acknowledgments ............................................ 106
+ 20. References ................................................. 106
+ 21. Authors' Addresses.......................................... 108
+ 22. Full Copyright Statement ................................... 111
+
+Nota Bene:
+
+ The publication of this specification is intended to freeze the
+ definition of PGM in the interest of fostering both ongoing and
+ prospective experimentation with the protocol. The intent of that
+ experimentation is to provide experience with the implementation and
+ deployment of a reliable multicast protocol of this class so as to be
+ able to feed that experience back into the longer-term
+ standardization process underway in the Reliable Multicast Transport
+ Working Group of the IETF. Appendix G provides more specific detail
+ on the scope and status of some of this experimentation. Reports of
+ experiments include [16-23]. Additional results and new
+ experimentation are encouraged.
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 2]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+1. Introduction and Overview
+
+ A variety of reliable protocols have been proposed for multicast data
+ delivery, each with an emphasis on particular types of applications,
+ network characteristics, or definitions of reliability ([1], [2],
+ [3], [4]). In this tradition, Pragmatic General Multicast (PGM) is a
+ reliable transport protocol for applications that require ordered or
+ unordered, duplicate-free, multicast data delivery from multiple
+ sources to multiple receivers.
+
+ PGM is specifically intended as a workable solution for multicast
+ applications with basic reliability requirements rather than as a
+ comprehensive solution for multicast applications with sophisticated
+ ordering, agreement, and robustness requirements. Its central design
+ goal is simplicity of operation with due regard for scalability and
+ network efficiency.
+
+ PGM has no notion of group membership. It simply provides reliable
+ multicast data delivery within a transmit window advanced by a source
+ according to a purely local strategy. Reliable delivery is provided
+ within a source's transmit window from the time a receiver joins the
+ group until it departs. PGM guarantees that a receiver in the group
+ either receives all data packets from transmissions and repairs, or
+ is able to detect unrecoverable data packet loss. PGM supports any
+ number of sources within a multicast group, each fully identified by
+ a globally unique Transport Session Identifier (TSI), but since these
+ sources/sessions operate entirely independently of each other, this
+ specification is phrased in terms of a single source and extends
+ without modification to multiple sources.
+
+ More specifically, PGM is not intended for use with applications that
+ depend either upon acknowledged delivery to a known group of
+ recipients, or upon total ordering amongst multiple sources.
+
+ Rather, PGM is best suited to those applications in which members may
+ join and leave at any time, and that are either insensitive to
+ unrecoverable data packet loss or are prepared to resort to
+ application recovery in the event. Through its optional extensions,
+ PGM provides specific mechanisms to support applications as disparate
+ as stock and news updates, data conferencing, low-delay real-time
+ video transfer, and bulk data transfer.
+
+ In the following text, transport-layer originators of PGM data
+ packets are referred to as sources, transport-layer consumers of PGM
+ data packets are referred to as receivers, and network-layer entities
+ in the intervening network are referred to as network elements.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 3]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Unless otherwise specified, the term "repair" will be used to
+ indicate both the actual retransmission of a copy of a missing packet
+ or the transmission of an FEC repair packet.
+
+Terminology
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [14] and
+ indicate requirement levels for compliant PGM implementations.
+
+1.1. Summary of Operation
+
+ PGM runs over a datagram multicast protocol such as IP multicast [5].
+ In the normal course of data transfer, a source multicasts sequenced
+ data packets (ODATA), and receivers unicast selective negative
+ acknowledgments (NAKs) for data packets detected to be missing from
+ the expected sequence. Network elements forward NAKs PGM-hop-by-
+ PGM-hop to the source, and confirm each hop by multicasting a NAK
+ confirmation (NCF) in response on the interface on which the NAK was
+ received. Repairs (RDATA) may be provided either by the source
+ itself or by a Designated Local Repairer (DLR) in response to a NAK.
+
+ Since NAKs provide the sole mechanism for reliability, PGM is
+ particularly sensitive to their loss. To minimize NAK loss, PGM
+ defines a network-layer hop-by-hop procedure for reliable NAK
+ forwarding.
+
+ Upon detection of a missing data packet, a receiver repeatedly
+ unicasts a NAK to the last-hop PGM network element on the
+ distribution tree from the source. A receiver repeats this NAK until
+ it receives a NAK confirmation (NCF) multicast to the group from that
+ PGM network element. That network element responds with an NCF to
+ the first occurrence of the NAK and any further retransmissions of
+ that same NAK from any receiver. In turn, the network element
+ repeatedly forwards the NAK to the upstream PGM network element on
+ the reverse of the distribution path from the source of the original
+ data packet until it also receives an NCF from that network element.
+ Finally, the source itself receives and confirms the NAK by
+ multicasting an NCF to the group.
+
+ While NCFs are multicast to the group, they are not propagated by PGM
+ network elements since they act as hop-by-hop confirmations.
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 4]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ To avoid NAK implosion, PGM specifies procedures for subnet-based NAK
+ suppression amongst receivers and NAK elimination within network
+ elements. The usual result is the propagation of just one copy of a
+ given NAK along the reverse of the distribution path from any network
+ with directly connected receivers to a source.
+
+ The net effect is that unicast NAKs return from a receiver to a
+ source on the reverse of the path on which ODATA was forwarded, that
+ is, on the reverse of the distribution tree from the source. More
+ specifically, they return through exactly the same sequence of PGM
+ network elements through which ODATA was forwarded, but in reverse.
+ The reasons for handling NAKs this way will become clear in the
+ discussion of constraining repairs, but first it's necessary to
+ describe the mechanisms for establishing the requisite source path
+ state in PGM network elements.
+
+ To establish source path state in PGM network elements, the basic
+ data transfer operation is augmented by Source Path Messages (SPMs)
+ from a source, periodically interleaved with ODATA. SPMs function
+ primarily to establish source path state for a given TSI in all PGM
+ network elements on the distribution tree from the source. PGM
+ network elements use this information to address returning unicast
+ NAKs directly to the upstream PGM network element toward the source,
+ and thereby insure that NAKs return from a receiver to a source on
+ the reverse of the distribution path for the TSI.
+
+ SPMs are sent by a source at a rate that serves to maintain up-to-
+ date PGM neighbor information. In addition, SPMs complement the role
+ of DATA packets in provoking further NAKs from receivers, and
+ maintaining receive window state in the receivers.
+
+ As a further efficiency, PGM specifies procedures for the constraint
+ of repairs by network elements so that they reach only those network
+ segments containing group members that did not receive the original
+ transmission. As NAKs traverse the reverse of the ODATA path
+ (upward), they establish repair state in the network elements which
+ is used in turn to constrain the (downward) forwarding of the
+ corresponding RDATA.
+
+ Besides procedures for the source to provide repairs, PGM also
+ specifies options and procedures that permit designated local
+ repairers (DLRs) to announce their availability and to redirect
+ repair requests (NAKs) to themselves rather than to the original
+ source. In addition to these conventional procedures for loss
+ recovery through selective ARQ, Appendix A specifies Forward Error
+ Correction (FEC) procedures for sources to provide and receivers to
+ request general error correcting parity packets rather than selective
+ retransmissions.
+
+
+
+Speakman, et. al. Experimental [Page 5]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Finally, since PGM operates without regular return traffic from
+ receivers, conventional feedback mechanisms for transport flow and
+ congestion control cannot be applied. Appendix B specifies a TCP-
+ friendly, NE-based solution for PGM congestion control, and cites a
+ reference to a TCP-friendly, end-to-end solution for PGM congestion
+ control.
+
+ In its basic operation, PGM relies on a purely rate-limited
+ transmission strategy in the source to bound the bandwidth consumed
+ by PGM transport sessions and to define the transmit window
+ maintained by the source.
+
+ PGM defines four basic packet types: three that flow downstream
+ (SPMs, DATA, NCFs), and one that flows upstream (NAKs).
+
+1.2. Design Goals and Constraints
+
+ PGM has been designed to serve that broad range of multicast
+ applications that have relatively simple reliability requirements,
+ and to do so in a way that realizes the much advertised but often
+ unrealized network efficiencies of multicast data transfer. The
+ usual impediments to realizing these efficiencies are the implosion
+ of negative and positive acknowledgments from receivers to sources,
+ repair latency from the source, and the propagation of repairs to
+ disinterested receivers.
+
+1.2.1. Reliability.
+
+ Reliable data delivery across an unreliable network is conventionally
+ achieved through an end-to-end protocol in which a source (implicitly
+ or explicitly) solicits receipt confirmation from a receiver, and the
+ receiver responds positively or negatively. While the frequency of
+ negative acknowledgments is a function of the reliability of the
+ network and the receiver's resources (and so, potentially quite low),
+ the frequency of positive acknowledgments is fixed at at least the
+ rate at which the transmit window is advanced, and usually more
+ often.
+
+ Negative acknowledgments primarily determine repairs and reliability.
+ Positive acknowledgments primarily determine transmit buffer
+ management.
+
+ When these principles are extended without modification to multicast
+ protocols, the result, at least for positive acknowledgments, is a
+ burden of positive acknowledgments transmitted to the source that
+ quickly threatens to overwhelm it as the number of receivers grows.
+ More succinctly, ACK implosion keeps ACK-based reliable multicast
+ protocols from scaling well.
+
+
+
+Speakman, et. al. Experimental [Page 6]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ One of the goals of PGM is to get as strong a definition of
+ reliability as possible from as simple a protocol as possible. ACK
+ implosion can be addressed in a variety of effective but complicated
+ ways, most of which require re-transmit capability from other than
+ the original source.
+
+ An alternative is to dispense with positive acknowledgments
+ altogether, and to resort to other strategies for buffer management
+ while retaining negative acknowledgments for repairs and reliability.
+ The approach taken in PGM is to retain negative acknowledgments, but
+ to dispense with positive acknowledgments and resort instead to
+ timeouts at the source to manage transmit resources.
+
+ The definition of reliability with PGM is a direct consequence of
+ this design decision. PGM guarantees that a receiver either receives
+ all data packets from transmissions and repairs, or is able to detect
+ unrecoverable data packet loss.
+
+ PGM includes strategies for repeatedly provoking NAKs from receivers,
+ and for adding reliability to the NAKs themselves. By reinforcing
+ the NAK mechanism, PGM minimizes the probability that a receiver will
+ detect a missing data packet so late that the packet is unavailable
+ for repair either from the source or from a designated local repairer
+ (DLR). Without ACKs and knowledge of group membership, however, PGM
+ cannot eliminate this possibility.
+
+1.2.2. Group Membership
+
+ A second consequence of eliminating ACKs is that knowledge of group
+ membership is neither required nor provided by the protocol.
+ Although a source may receive some PGM packets (NAKs for instance)
+ from some receivers, the identity of the receivers does not figure in
+ the processing of those packets. Group membership MAY change during
+ the course of a PGM transport session without the knowledge of or
+ consequence to the source or the remaining receivers.
+
+1.2.3. Efficiency
+
+ While PGM avoids the implosion of positive acknowledgments simply by
+ dispensing with ACKs, the implosion of negative acknowledgments is
+ addressed directly.
+
+ Receivers observe a random back-off prior to generating a NAK during
+ which interval the NAK is suppressed (i.e. it is not sent, but the
+ receiver acts as if it had sent it) by the receiver upon receipt of a
+ matching NCF. In addition, PGM network elements eliminate duplicate
+ NAKs received on different interfaces on the same network element.
+
+
+
+
+Speakman, et. al. Experimental [Page 7]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ The combination of these two strategies usually results in the source
+ receiving just a single NAK for any given lost data packet.
+
+ Whether a repair is provided from a DLR or the original source, it is
+ important to constrain that repair to only those network segments
+ containing members that negatively acknowledged the original
+ transmission rather than propagating it throughout the group. PGM
+ specifies procedures for network elements to use the pattern of NAKs
+ to define a sub-tree within the group upon which to forward the
+ corresponding repair so that it reaches only those receivers that
+ missed it in the first place.
+
+1.2.4. Simplicity
+
+ PGM is designed to achieve the greatest improvement in reliability
+ (as compared to the usual UDP) with the least complexity. As a
+ result, PGM does NOT address conference control, global ordering
+ amongst multiple sources in the group, nor recovery from network
+ partitions.
+
+1.2.5. Operability
+
+ PGM is designed to function, albeit with less efficiency, even when
+ some or all of the network elements in the multicast tree have no
+ knowledge of PGM. To that end, all PGM data packets can be
+ conventionally multicast routed by non-PGM network elements with no
+ loss of functionality, but with some inefficiency in the propagation
+ of RDATA and NCFs.
+
+ In addition, since NAKs are unicast to the last-hop PGM network
+ element and NCFs are multicast to the group, NAK/NCF operation is
+ also consistent across non-PGM network elements. Note that for NAK
+ suppression to be most effective, receivers should always have a PGM
+ network element as a first hop network element between themselves and
+ every path to every PGM source. If receivers are several hops
+ removed from the first PGM network element, the efficacy of NAK
+ suppression may degrade.
+
+1.3. Options
+
+ In addition to the basic data transfer operation described above, PGM
+ specifies several end-to-end options to address specific application
+ requirements. PGM specifies options to support fragmentation, late
+ joining, redirection, Forward Error Correction (FEC), reachability,
+ and session synchronization/termination/reset. Options MAY be
+ appended to PGM data packet headers only by their original
+ transmitters. While they MAY be interpreted by network elements,
+ options are neither added nor removed by network elements.
+
+
+
+Speakman, et. al. Experimental [Page 8]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ All options are receiver-significant (i.e., they must be interpreted
+ by receivers). Some options are also network-significant (i.e., they
+ must be interpreted by network elements).
+
+ Fragmentation MAY be used in conjunction with data packets to allow a
+ transport-layer entity at the source to break up application-layer
+ data packets into multiple PGM data packets to conform with the
+ maximum transmission unit (MTU) supported by the network layer.
+
+ Late joining allows a source to indicate whether or not receivers may
+ request all available repairs when they initially join a particular
+ transport session.
+
+ Redirection MAY be used in conjunction with Poll Responses to allow a
+ DLR to respond to normal NCFs or POLLs with a redirecting POLR
+ advertising its own address as an alternative re-transmitter to the
+ original source.
+
+ FEC techniques MAY be applied by receivers to use source-provided
+ parity packets rather than selective retransmissions to effect loss
+ recovery.
+
+2. Architectural Description
+
+ As an end-to-end transport protocol, PGM specifies packet formats and
+ procedures for sources to transmit and for receivers to receive data.
+ To enhance the efficiency of this data transfer, PGM also specifies
+ packet formats and procedures for network elements to improve the
+ reliability of NAKs and to constrain the propagation of repairs. The
+ division of these functions is described in this section and expanded
+ in detail in the next section.
+
+2.1. Source Functions
+
+ Data Transmission
+
+ Sources multicast ODATA packets to the group within the
+ transmit window at a given transmit rate.
+
+ Source Path State
+
+ Sources multicast SPMs to the group, interleaved with ODATA if
+ present, to establish source path state in PGM network
+ elements.
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 9]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ NAK Reliability
+
+ Sources multicast NCFs to the group in response to any NAKs
+ they receive.
+
+ Repairs
+
+ Sources multicast RDATA packets to the group in response to
+ NAKs received for data packets within the transmit window.
+
+ Transmit Window Advance
+
+ Sources MAY advance the trailing edge of the window according
+ to one of a number of strategies. Implementations MAY support
+ automatic adjustments such as keeping the window at a fixed
+ size in bytes, a fixed number of packets or a fixed real time
+ duration. In addition, they MAY optionally delay window
+ advancement based on NAK-silence for a certain period. Some
+ possible strategies are outlined later in this document.
+
+2.2. Receiver Functions
+
+ Source Path State
+
+ Receivers use SPMs to determine the last-hop PGM network
+ element for a given TSI to which to direct their NAKs.
+
+ Data Reception
+
+ Receivers receive ODATA within the transmit window and
+ eliminate any duplicates.
+
+ Repair Requests
+
+ Receivers unicast NAKs to the last-hop PGM network element (and
+ MAY optionally multicast a NAK with TTL of 1 to the local
+ group) for data packets within the receive window detected to
+ be missing from the expected sequence. A receiver MUST
+ repeatedly transmit a given NAK until it receives a matching
+ NCF.
+
+ NAK Suppression
+
+ Receivers suppress NAKs for which a matching NCF or NAK is
+ received during the NAK transmit back-off interval.
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 10]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Receive Window Advance
+
+ Receivers immediately advance their receive windows upon
+ receipt of any PGM data packet or SPM within the transmit
+ window that advances the receive window.
+
+2.3. Network Element Functions
+
+ Network elements forward ODATA without intervention.
+
+ Source Path State
+
+ Network elements intercept SPMs and use them to establish
+ source path state for the corresponding TSI before multicast
+ forwarding them in the usual way.
+
+ NAK Reliability
+
+ Network elements multicast NCFs to the group in response to any
+ NAK they receive. For each NAK received, network elements
+ create repair state recording the transport session identifier,
+ the sequence number of the NAK, and the input interface on
+ which the NAK was received.
+
+ Constrained NAK Forwarding
+
+ Network elements repeatedly unicast forward only the first copy
+ of any NAK they receive to the upstream PGM network element on
+ the distribution path for the TSI until they receive an NCF in
+ response. In addition, they MAY optionally multicast this NAK
+ upstream with TTL of 1.
+
+ Nota Bene: Once confirmed by an NCF, network elements discard NAK
+ packets; NAKs are NOT retained in network elements beyond this
+ forwarding operation, but state about the reception of them is
+ stored.
+
+ NAK Elimination
+
+ Network elements discard exact duplicates of any NAK for which
+ they already have repair state (i.e., that has been forwarded
+ either by themselves or a neighboring PGM network element), and
+ respond with a matching NCF.
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 11]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Constrained RDATA Forwarding
+
+ Network elements use NAKs to maintain repair state consisting
+ of a list of interfaces upon which a given NAK was received,
+ and they forward the corresponding RDATA only on these
+ interfaces.
+
+ NAK Anticipation
+
+ If a network element hears an upstream NCF (i.e., on the
+ upstream interface for the distribution tree for the TSI), it
+ establishes repair state without outgoing interfaces in
+ anticipation of responding to and eliminating duplicates of the
+ NAK that may arrive from downstream.
+
+3. Terms and Concepts
+
+ Before proceeding from the preceding overview to the detail in the
+ subsequent Procedures, this section presents some concepts and
+ definitions that make that detail more intelligible.
+
+3.1. Transport Session Identifiers
+
+ Every PGM packet is identified by a:
+
+ TSI transport session identifier
+
+ TSIs MUST be globally unique, and only one source at a time may act
+ as the source for a transport session. (Note that repairers do not
+ change the TSI in any RDATA they transmit). TSIs are composed of the
+ concatenation of a globally unique source identifier (GSI) and a
+ source-assigned data-source port.
+
+ Since all PGM packets originated by receivers are in response to PGM
+ packets originated by a source, receivers simply echo the TSI heard
+ from the source in any corresponding packets they originate.
+
+ Since all PGM packets originated by network elements are in response
+ to PGM packets originated by a receiver, network elements simply echo
+ the TSI heard from the receiver in any corresponding packets they
+ originate.
+
+3.2. Sequence Numbers
+
+ PGM uses a circular sequence number space from 0 through ((2**32) -
+ 1) to identify and order ODATA packets. Sources MUST number ODATA
+ packets in unit increments in the order in which the corresponding
+ application data is submitted for transmission. Within a transmit or
+
+
+
+Speakman, et. al. Experimental [Page 12]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ receive window (defined below), a sequence number x is "less" or
+ "older" than sequence number y if it numbers an ODATA packet
+ preceding ODATA packet y, and a sequence number y is "greater" or
+ "more recent" than sequence number x if it numbers an ODATA packet
+ subsequent to ODATA packet x.
+
+3.3. Transmit Window
+
+ The description of the operation of PGM rests fundamentally on the
+ definition of the source-maintained transmit window. This definition
+ in turn is derived directly from the amount of transmitted data (in
+ seconds) a source retains for repair (TXW_SECS), and the maximum
+ transmit rate (in bytes/second) maintained by a source to regulate
+ its bandwidth utilization (TXW_MAX_RTE).
+
+ In terms of sequence numbers, the transmit window is the range of
+ sequence numbers consumed by the source for sequentially numbering
+ and transmitting the most recent TXW_SECS of ODATA packets. The
+ trailing (or left) edge of the transmit window (TXW_TRAIL) is defined
+ as the sequence number of the oldest data packet available for repair
+ from a source. The leading (or right) edge of the transmit window
+ (TXW_LEAD) is defined as the sequence number of the most recent data
+ packet a source has transmitted.
+
+ The size of the transmit window in sequence numbers (TXW_SQNS) (i.e.,
+ the difference between the leading and trailing edges plus one) MUST
+ be no greater than half the PGM sequence number space less one.
+
+ When TXW_TRAIL is equal to TXW_LEAD, the transmit window size is one.
+ When TXW_TRAIL is equal to TXW_LEAD plus one, the transmit window
+ size is empty.
+
+3.4. Receive Window
+
+ The receive window at the receivers is determined entirely by PGM
+ packets from the source. That is, a receiver simply obeys what the
+ source tells it in terms of window state and advancement.
+
+ For a given transport session identified by a TSI, a receiver
+ maintains:
+
+ RXW_TRAIL the sequence number defining the trailing edge of the
+ receive window, the sequence number (known from data
+ packets and SPMs) of the oldest data packet available
+ for repair from the source
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 13]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ RXW_LEAD the sequence number defining the leading edge of the
+ receive window, the greatest sequence number of any
+ received data packet within the transmit window
+
+ The receive window is the range of sequence numbers a receiver is
+ expected to use to identify receivable ODATA.
+
+ A data packet is described as being "in" the receive window if its
+ sequence number is in the receive window.
+
+ The receive window is advanced by the receiver when it receives an
+ SPM or ODATA packet within the transmit window that increments
+ RXW_TRAIL. Receivers also advance their receive windows upon receipt
+ of any PGM data packet within the receive window that advances the
+ receive window.
+
+3.5. Source Path State
+
+ To establish the repair state required to constrain RDATA, it's
+ essential that NAKs return from a receiver to a source on the reverse
+ of the distribution tree from the source. That is, they must return
+ through the same sequence of PGM network elements through which the
+ ODATA was forwarded, but in reverse. There are two reasons for this,
+ the less obvious one being by far the more important.
+
+ The first and obvious reason is that RDATA is forwarded on the same
+ path as ODATA and so repair state must be established on this path if
+ it is to constrain the propagation of RDATA.
+
+ The second and less obvious reason is that in the absence of repair
+ state, PGM network elements do NOT forward RDATA, so the default
+ behavior is to discard repairs. If repair state is not properly
+ established for interfaces on which ODATA went missing, then
+ receivers on those interfaces will continue to NAK for lost data and
+ ultimately experience unrecoverable data loss.
+
+ The principle function of SPMs is to provide the source path state
+ required for PGM network elements to forward NAKs from one PGM
+ network element to the next on the reverse of the distribution tree
+ for the TSI, establishing repair state each step of the way. This
+ source path state is simply the address of the upstream PGM network
+ element on the reverse of the distribution tree for the TSI. That
+ upstream PGM network element may be more than one subnet hop away.
+ SPMs establish the identity of the upstream PGM network element on
+ the distribution tree for each TSI in each group in each PGM network
+ element, a sort of virtual PGM topology. So although NAKs are
+ unicast addressed, they are NOT unicast routed by PGM network
+ elements in the conventional sense. Instead PGM network elements use
+
+
+
+Speakman, et. al. Experimental [Page 14]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ the source path state established by SPMs to direct NAKs PGM-hop-by-
+ PGM-hop toward the source. The idea is to constrain NAKs to the pure
+ PGM topology spanning the more heterogeneous underlying topology of
+ both PGM and non-PGM network elements.
+
+ The result is repair state in every PGM network element between the
+ receiver and the source so that the corresponding RDATA is never
+ discarded by a PGM network element for lack of repair state.
+
+ SPMs also maintain transmit window state in receivers by advertising
+ the trailing and leading edges of the transmit window (SPM_TRAIL and
+ SPM_LEAD). In the absence of data, SPMs MAY be used to close the
+ transmit window in time by advancing the transmit window until
+ SPM_TRAIL is equal to SPM_LEAD plus one.
+
+3.6. Packet Contents
+
+ This section just provides enough short-hand to make the Procedures
+ intelligible. For the full details of packet contents, please refer
+ to Packet Formats below.
+
+3.6.1. Source Path Messages
+
+3.6.1.1. SPMs
+
+ SPMs are transmitted by sources to establish source-path state in PGM
+ network elements, and to provide transmit-window state in receivers.
+
+ SPMs are multicast to the group and contain:
+
+ SPM_TSI the source-assigned TSI for the session to which the
+ SPM corresponds
+
+ SPM_SQN a sequence number assigned sequentially by the source
+ in unit increments and scoped by SPM_TSI
+
+ Nota Bene: this is an entirely separate sequence than is used to
+ number ODATA and RDATA.
+
+ SPM_TRAIL the sequence number defining the trailing edge of the
+ source's transmit window (TXW_TRAIL)
+
+ SPM_LEAD the sequence number defining the leading edge of the
+ source's transmit window (TXW_LEAD)
+
+ SPM_PATH the network-layer address (NLA) of the interface on
+ the PGM network element on which the SPM is forwarded
+
+
+
+
+Speakman, et. al. Experimental [Page 15]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+3.6.2. Data Packets
+
+3.6.2.1. ODATA - Original Data
+
+ ODATA packets are transmitted by sources to send application data to
+ receivers.
+
+ ODATA packets are multicast to the group and contain:
+
+ OD_TSI the globally unique source-assigned TSI
+
+ OD_TRAIL the sequence number defining the trailing edge of the
+ source's transmit window (TXW_TRAIL)
+
+ OD_TRAIL makes the protocol more robust in the face of
+ lost SPMs. By including the trailing edge of the
+ transmit window on every data packet, receivers that
+ have missed any SPMs that advanced the transmit window
+ can still detect the case, recover the application,
+ and potentially re-synchronize to the transport
+ session.
+
+ OD_SQN a sequence number assigned sequentially by the source
+ in unit increments and scoped by OD_TSI
+
+3.6.2.2. RDATA - Repair Data
+
+ RDATA packets are repair packets transmitted by sources or DLRs in
+ response to NAKs.
+
+ RDATA packets are multicast to the group and contain:
+
+ RD_TSI OD_TSI of the ODATA packet for which this is a repair
+
+ RD_TRAIL the sequence number defining the trailing edge of the
+ source's transmit window (TXW_TRAIL). This is updated
+ to the most current value when the repair is sent, so
+ it is not necessarily the same as OD_TRAIL of the
+ ODATA packet for which this is a repair
+
+ RD_SQN OD_SQN of the ODATA packet for which this is a repair
+
+3.6.3. Negative Acknowledgments
+
+3.6.3.1. NAKs - Negative Acknowledgments
+
+ NAKs are transmitted by receivers to request repairs for missing data
+ packets.
+
+
+
+Speakman, et. al. Experimental [Page 16]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ NAKs are unicast (PGM-hop-by-PGM-hop) to the source and contain:
+
+ NAK_TSI OD_TSI of the ODATA packet for which a repair is
+ requested
+
+ NAK_SQN OD_SQN of the ODATA packet for which a repair is
+ requested
+
+ NAK_SRC the unicast NLA of the original source of the missing
+ ODATA.
+
+ NAK_GRP the multicast group NLA
+
+3.6.3.2. NNAKs - Null Negative Acknowledgments
+
+ NNAKs are transmitted by a DLR that receives NAKs redirected to it by
+ either receivers or network elements to provide flow-control feed-
+ back to a source.
+
+ NNAKs are unicast (PGM-hop-by-PGM-hop) to the source and contain:
+
+ NNAK_TSI NAK_TSI of the corresponding re-directed NAK.
+
+ NNAK_SQN NAK_SQN of the corresponding re-directed NAK.
+
+ NNAK_SRC NAK_SRC of the corresponding re-directed NAK.
+
+ NNAK_GRP NAK_GRP of the corresponding re-directed NAK.
+
+3.6.4. Negative Acknowledgment Confirmations
+
+3.6.4.1. NCFs - NAK confirmations
+
+ NCFs are transmitted by network elements and sources in response to
+ NAKs.
+
+ NCFs are multicast to the group and contain:
+
+ NCF_TSI NAK_TSI of the NAK being confirmed
+
+ NCF_SQN NAK_SQN of the NAK being confirmed
+
+ NCF_SRC NAK_SRC of the NAK being confirmed
+
+ NCF_GRP NAK_GRP of the NAK being confirmed
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 17]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+3.6.5. Option Encodings
+
+ OPT_LENGTH 0x00 - Option's Length
+
+ OPT_FRAGMENT 0x01 - Fragmentation
+
+ OPT_NAK_LIST 0x02 - List of NAK entries
+
+ OPT_JOIN 0x03 - Late Joining
+
+ OPT_REDIRECT 0x07 - Redirect
+
+ OPT_SYN 0x0D - Synchronization
+
+ OPT_FIN 0x0E - Session Fin receivers, conventional
+ feedbackish
+
+ OPT_RST 0x0F - Session Reset
+
+ OPT_PARITY_PRM 0x08 - Forward Error Correction Parameters
+
+ OPT_PARITY_GRP 0x09 - Forward Error Correction Group Number
+
+ OPT_CURR_TGSIZE 0x0A - Forward Error Correction Group Size
+
+ OPT_CR 0x10 - Congestion Report
+
+ OPT_CRQST 0x11 - Congestion Report Request
+
+ OPT_NAK_BO_IVL 0x04 - NAK Back-Off Interval
+
+ OPT_NAK_BO_RNG 0x05 - NAK Back-Off Range
+
+ OPT_NBR_UNREACH 0x0B - Neighbor Unreachable
+
+ OPT_PATH_NLA 0x0C - Path NLA
+
+ OPT_INVALID 0x7F - Option invalidated
+
+4. Procedures - General
+
+ Since SPMs, NCFs, and RDATA must be treated conditionally by PGM
+ network elements, they must be distinguished from other packets in
+ the chosen multicast network protocol if PGM network elements are to
+ extract them from the usual switching path.
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 18]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ The most obvious way for network elements to achieve this is to
+ examine every packet in the network for the PGM transport protocol
+ and packet types. However, the overhead of this approach is costly
+ for high-performance, multi-protocol network elements. An
+ alternative, and a requirement for PGM over IP multicast, is that
+ SPMs, NCFs, and RDATA MUST be transmitted with the IP Router Alert
+ Option [6]. This option gives network elements a network-layer
+ indication that a packet should be extracted from IP switching for
+ more detailed processing.
+
+5. Procedures - Sources
+
+5.1. Data Transmission
+
+ Since PGM relies on a purely rate-limited transmission strategy in
+ the source to bound the bandwidth consumed by PGM transport sessions,
+ an assortment of techniques is assembled here to make that strategy
+ as conservative and robust as possible. These techniques are the
+ minimum REQUIRED of a PGM source.
+
+5.1.1. Maximum Cumulative Transmit Rate
+
+ A source MUST number ODATA packets in the order in which they are
+ submitted for transmission by the application. A source MUST
+ transmit ODATA packets in sequence and only within the transmit
+ window beginning with TXW_TRAIL at no greater a rate than
+ TXW_MAX_RTE.
+
+ TXW_MAX_RTE is typically the maximum cumulative transmit rate of SPM,
+ ODATA, and RDATA. Different transmission strategies MAY define
+ TXW_MAX_RTE as appropriate for the implementation.
+
+5.1.2. Transmit Rate Regulation
+
+ To regulate its transmit rate, a source MUST use a token bucket
+ scheme or any other traffic management scheme that yields equivalent
+ behavior. A token bucket [7] is characterized by a continually
+ sustainable data rate (the token rate) and the extent to which the
+ data rate may exceed the token rate for short periods of time (the
+ token bucket size). Over any arbitrarily chosen interval, the number
+ of bytes the source may transmit MUST NOT exceed the token bucket
+ size plus the product of the token rate and the chosen interval.
+
+ In addition, a source MUST bound the maximum rate at which successive
+ packets may be transmitted using a leaky bucket scheme drained at a
+ maximum transmit rate, or equivalent mechanism.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 19]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+5.1.3. Outgoing Packet Ordering
+
+ To preserve the logic of PGM's transmit window, a source MUST
+ strictly prioritize sending of pending NCFs first, pending SPMs
+ second, and only send ODATA or RDATA when no NCFs or SPMs are
+ pending. The priority of RDATA versus ODATA is application
+ dependent. The sender MAY implement weighted bandwidth sharing
+ between RDATA and ODATA. Note that strict prioritization of RDATA
+ over ODATA may stall progress of ODATA if there are receivers that
+ keep generating NAKs so as to always have RDATA pending (e.g. a
+ steady stream of late joiners with OPT_JOIN). Strictly prioritizing
+ ODATA over RDATA may lead to a larger portion of receivers getting
+ unrecoverable losses.
+
+5.1.4. Ambient SPMs
+
+ Interleaved with ODATA and RDATA, a source MUST transmit SPMs at a
+ rate at least sufficient to maintain current source path state in PGM
+ network elements. Note that source path state in network elements
+ does not track underlying changes in the distribution tree from a
+ source until an SPM traverses the altered distribution tree. The
+ consequence is that NAKs may go unconfirmed both at receivers and
+ amongst network elements while changes in the underlying distribution
+ tree take place.
+
+5.1.5. Heartbeat SPMs
+
+ In the absence of data to transmit, a source SHOULD transmit SPMs at
+ a decaying rate in order to assist early detection of lost data, to
+ maintain current source path state in PGM network elements, and to
+ maintain current receive window state in the receivers.
+
+ In this scheme [8], a source maintains an inter-heartbeat timer
+ IHB_TMR which times the interval between the most recent packet
+ (ODATA, RDATA, or SPM) transmission and the next heartbeat
+ transmission. IHB_TMR is initialized to a minimum interval IHB_MIN
+ after the transmission of any data packet. If IHB_TMR expires, the
+ source transmits a heartbeat SPM and initializes IHB_TMR to double
+ its previous value. The transmission of consecutive heartbeat SPMs
+ doubles IHB each time up to a maximum interval IHB_MAX. The
+ transmission of any data packet initializes IHB_TMR to IHB_MIN once
+ again. The effect is to provoke prompt detection of missing packets
+ in the absence of data to transmit, and to do so with minimal
+ bandwidth overhead.
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 20]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+5.1.6. Ambient and Heartbeat SPMs
+
+ Ambient and heartbeat SPMs are described as driven by separate timers
+ in this specification to highlight their contrasting functions.
+ Ambient SPMs are driven by a count-down timer that expires regularly
+ while heartbeat SPMs are driven by a count-down timer that keeps
+ being reset by data, and the interval of which changes once it begins
+ to expire. The ambient SPM timer is just counting down in real-time
+ while the heartbeat timer is measuring the inter-data-packet
+ interval.
+
+ In the presence of data, no heartbeat SPMs will be transmitted since
+ the transmission of data keeps setting the IHB_TMR back to its
+ initial value. At the same time however, ambient SPMs MUST be
+ interleaved into the data as a matter of course, not necessarily as a
+ heartbeat mechanism. This ambient transmission of SPMs is REQUIRED
+ to keep the distribution tree information in the network current and
+ to allow new receivers to synchronize with the session.
+
+ An implementation SHOULD de-couple ambient and heartbeat SPM timers
+ sufficiently to permit them to be configured independently of each
+ other.
+
+5.2. Negative Acknowledgment Confirmation
+
+ A source MUST immediately multicast an NCF in response to any NAK it
+ receives. The NCF is REQUIRED since the alternative of responding
+ immediately with RDATA would not allow other PGM network elements on
+ the same subnet to do NAK anticipation, nor would it allow DLRs on
+ the same subnet to provide repairs. A source SHOULD be able to
+ detect a NAK storm and adopt countermeasure to protect the network
+ against a denial of service. A possible countermeasure is to send
+ the first NCF immediately in response to a NAK and then delay the
+ generation of further NCFs (for identical NAKs) by a small interval,
+ so that identical NCFs are rate-limited, without affecting the
+ ability to suppress NAKs.
+
+5.3. Repairs
+
+ After multicasting an NCF in response to a NAK, a source MUST then
+ multicast RDATA (while respecting TXW_MAX_RTE) in response to any NAK
+ it receives for data packets within the transmit window.
+
+ In the interest of increasing the efficiency of a particular RDATA
+ packet, a source MAY delay RDATA transmission to accommodate the
+ arrival of NAKs from the whole loss neighborhood. This delay SHOULD
+ not exceed twice the greatest propagation delay in the loss
+ neighborhood.
+
+
+
+Speakman, et. al. Experimental [Page 21]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+6. Procedures - Receivers
+
+6.1. Data Reception
+
+ Initial data reception
+
+ A receiver SHOULD initiate data reception beginning with the first
+ data packet it receives within the advertised transmit window. This
+ packet's sequence number (ODATA_SQN) temporarily defines the trailing
+ edge of the transmit window from the receiver's perspective. That
+ is, it is assigned to RXW_TRAIL_INIT within the receiver, and until
+ the trailing edge sequence number advertised in subsequent packets
+ (SPMs or ODATA or RDATA) increments past RXW_TRAIL_INIT, the receiver
+ MUST only request repairs for sequence numbers subsequent to
+ RXW_TRAIL_INIT. Thereafter, it MAY request repairs anywhere in the
+ transmit window. This temporary restriction on repair requests
+ prevents receivers from requesting a potentially large amount of
+ history when they first begin to receive a given PGM transport
+ session.
+
+ Note that the JOIN option, discussed later, MAY be used to provide a
+ different value for RXW_TRAIL_INIT.
+
+ Receiving and discarding data packets
+
+ Within a given transport session, a receiver MUST accept any ODATA or
+ RDATA packets within the receive window. A receiver MUST discard any
+ data packet that duplicates one already received in the transmit
+ window. A receiver MUST discard any data packet outside of the
+ receive window.
+
+ Contiguous data
+
+ Contiguous data is comprised of those data packets within the receive
+ window that have been received and are in the range from RXW_TRAIL up
+ to (but not including) the first missing sequence number in the
+ receive window. The most recently received data packet of contiguous
+ data defines the leading edge of contiguous data.
+
+ As its default mode of operation, a receiver MUST deliver only
+ contiguous data packets to the application, and it MUST do so in the
+ order defined by those data packets' sequence numbers. This provides
+ applications with a reliable ordered data flow.
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 22]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Non contiguous data
+
+ PGM receiver implementations MAY optionally provide a mode of
+ operation in which data is delivered to an application in the order
+ received. However, the implementation MUST only deliver complete
+ application protocol data units (APDUs) to the application. That is,
+ APDUs that have been fragmented into different TPDUs MUST be
+ reassembled before delivery to the application.
+
+6.2. Source Path Messages
+
+ Receivers MUST receive and sequence SPMs for any TSI they are
+ receiving. An SPM is in sequence if its sequence number is greater
+ than that of the most recent in-sequence SPM and within half the PGM
+ number space. Out-of-sequence SPMs MUST be discarded.
+
+ For each TSI, receivers MUST use the most recent SPM to determine the
+ NLA of the upstream PGM network element for use in NAK addressing. A
+ receiver MUST NOT initiate repair requests until it has received at
+ least one SPM for the corresponding TSI.
+
+ Since SPMs require per-hop processing, it is likely that they will be
+ forwarded at a slower rate than data, and that they will arrive out
+ of sync with the data stream. In this case, the window information
+ that the SPMs carry will be out of date. Receivers SHOULD expect
+ this to be the case and SHOULD detect it by comparing the packet lead
+ and trail values with the values the receivers have stored for lead
+ and trail. If the SPM packet values are less, they SHOULD be
+ ignored, but the rest of the packet SHOULD be processed as normal.
+
+6.3. Data Recovery by Negative Acknowledgment
+
+ Detecting missing data packets
+
+ Receivers MUST detect gaps in the expected data sequence in the
+ following manners:
+
+ by comparing the sequence number on the most recently received
+ ODATA or RDATA packet with the leading edge of contiguous data
+
+ by comparing SPM_LEAD of the most recently received SPM with the
+ leading edge of contiguous data
+
+ In both cases, if the receiver has not received all intervening data
+ packets, it MAY initiate selective NAK generation for each missing
+ sequence number.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 23]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ In addition, a receiver may detect a single missing data packet by
+ receiving an NCF or multicast NAK for a data packet within the
+ transmit window which it has not received. In this case it MAY
+ initiate selective NAK generation for the said sequence number.
+
+ In all cases, receivers SHOULD temper the initiation of NAK
+ generation to account for simple mis-ordering introduced by the
+ network. A possible mechanism to achieve this is to assume loss only
+ after the reception of N packets with sequence numbers higher than
+ those of the (assumed) lost packets. A possible value for N is 2.
+ This method SHOULD be complemented with a timeout based mechanism
+ that handles the loss of the last packet before a pause in the
+ transmission of the data stream. The leading edge field in SPMs
+ SHOULD also be taken into account in the loss detection algorithm.
+
+ Generating NAKs
+
+ NAK generation follows the detection of a missing data packet and is
+ the cycle of:
+
+ waiting for a random period of time (NAK_RB_IVL) while listening
+ for matching NCFs or NAKs
+
+ transmitting a NAK if a matching NCF or NAK is not heard
+
+ waiting a period (NAK_RPT_IVL) for a matching NCF and recommencing
+ NAK generation if the matching NCF is not received
+
+ waiting a period (NAK_RDATA_IVL) for data and recommencing NAK
+ generation if the matching data is not received
+
+ The entire generation process can be summarized by the following
+ state machine:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 24]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ |
+ | detect missing tpdu
+ | - clear data retry count
+ | - clear NCF retry count
+ V
+ matching NCF |--------------------------|
+ <---------------| BACK-OFF_STATE | <----------------------
+ | | start timer(NAK_RB_IVL) | ^ ^
+ | | | | |
+ | |--------------------------| | |
+ | matching | | timer expires | |
+ | NAK | | - send NAK | |
+ | | | | |
+ | V V | |
+ | |--------------------------| | |
+ | | WAIT_NCF_STATE | | |
+ | matching NCF | start timer(NAK_RPT_IVL) | | |
+ |<--------------| |------------> |
+ | |--------------------------| timer expires |
+ | | | ^ - increment NCF |
+ | NAK_NCF_RETRIES | | | retry count |
+ | exceeded | | | |
+ | V ----------- |
+ | Cancelation matching NAK |
+ | - restart timer(NAK_RPT_IVL) |
+ | |
+ | |
+ V |--------------------------| |
+ --------------->| WAIT_DATA_STATE |----------------------->
+ |start timer(NAK_RDATA_IVL)| timer expires
+ | | - increment data
+ |--------------------------| retry count
+ | | ^
+ NAK_DATA_RETRIES | | |
+ exceeded | | |
+ | -----------
+ | matching NCF or NAK
+ V - restart timer(NAK_RDATA_IVL)
+ Cancellation
+
+ In any state, receipt of matching RDATA or ODATA completes data
+ recovery and successful exit from the state machine. State
+ transition stops any running timers.
+
+ In any state, if the trailing edge of the window moves beyond the
+ sequence number, data recovery for that sequence number terminates.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 25]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ During NAK_RB_IVL a NAK is said to be pending. When awaiting data or
+ an NCF, a NAK is said to be outstanding.
+
+ Backing off NAK transmission
+
+ Before transmitting a NAK, a receiver MUST wait some interval
+ NAK_RB_IVL chosen randomly over some time period NAK_BO_IVL. During
+ this period, receipt of a matching NAK or a matching NCF will suspend
+ NAK generation. NAK_RB_IVL is counted down from the time a missing
+ data packet is detected.
+
+ A value for NAK_BO_IVL learned from OPT_NAK_BO_IVL (see 16.4.1 below)
+ MUST NOT be used by a receiver (i.e., the receiver MUST NOT NAK)
+ unless either NAK_BO_IVL_SQN is zero, or the receiver has seen
+ POLL_RND == 0 for POLL_SQN =< NAK_BO_IVL_SQN within half the sequence
+ number space.
+
+ When a parity NAK (Appendix A, FEC) is being generated, the back-off
+ interval SHOULD be inversely biased with respect to the number of
+ parity packets requested. This way NAKs requesting larger numbers of
+ parity packets are likely to be sent first and thus suppress other
+ NAKs. A NAK for a given transmission group suppresses another NAK
+ for the same transmission group only if it is requesting an equal or
+ larger number of parity packets.
+
+ When a receiver has to transmit a sequence of NAKs, it SHOULD
+ transmit the NAKs in order from oldest to most recent.
+
+ Suspending NAK generation
+
+ Suspending NAK generation just means waiting for either NAK_RB_IVL,
+ NAK_RPT_IVL or NAK_RDATA_IVL to pass. A receiver MUST suspend NAK
+ generation if a duplicate of the NAK is already pending from this
+ receiver or the NAK is already outstanding from this or another
+ receiver.
+
+ NAK suppression
+
+ A receiver MUST suppress NAK generation and wait at least
+ NAK_RDATA_IVL before recommencing NAK generation if it hears a
+ matching NCF or NAK during NAK_RB_IVL. A matching NCF must match
+ NCF_TSI with NAK_TSI, and NCF_SQN with NAK_SQN.
+
+ Transmitting a NAK
+
+ Upon expiry of NAK_RB_IVL, a receiver MUST unicast a NAK to the
+ upstream PGM network element for the TSI specifying the transport
+ session identifier and missing sequence number. In addition, it MAY
+
+
+
+Speakman, et. al. Experimental [Page 26]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ multicast a NAK with TTL of 1 to the group, if the PGM parent is not
+ directly connected. It also records both the address of the source
+ of the corresponding ODATA and the address of the group in the NAK
+ header.
+
+ It MUST repeat the NAK at a rate governed by NAK_RPT_IVL up to
+ NAK_NCF_RETRIES times while waiting for a matching NCF. It MUST then
+ wait NAK_RDATA_IVL before recommencing NAK generation. If it hears a
+ matching NCF or NAK during NAK_RDATA_IVL, it MUST wait anew for
+ NAK_RDATA_IVL before recommencing NAK generation (i.e. matching NCFs
+ and NAKs restart NAK_RDATA_IVL).
+
+ Completion of NAK generation
+
+ NAK generation is complete only upon the receipt of the matching
+ RDATA (or even ODATA) packet at any time during NAK generation.
+
+ Cancellation of NAK generation
+
+ NAK generation is cancelled upon the advancing of the receive window
+ so as to exclude the matching sequence number of a pending or
+ outstanding NAK, or NAK_DATA_RETRIES / NAK_NCF_RETRIES being
+ exceeded. Cancellation of NAK generation indicates unrecoverable
+ data loss.
+
+ Receiving NCFs and multicast NAKs
+
+ A receiver MUST discard any NCFs or NAKs it hears for data packets
+ outside the transmit window or for data packets it has received.
+ Otherwise they are treated as appropriate for the current repair
+ state.
+
+7. Procedures - Network Elements
+
+7.1. Source Path State
+
+ Upon receipt of an in-sequence SPM, a network element records the
+ Source Path Address SPM_PATH with the multicast routing information
+ for the TSI. If the receiving network element is on the same subnet
+ as the forwarding network element, this address will be the same as
+ the address of the immediately upstream network element on the
+ distribution tree for the TSI. If, however, non-PGM network elements
+ intervene between the forwarding and the receiving network elements,
+ this address will be the address of the first PGM network element
+ across the intervening network elements.
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 27]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ The network element then forwards the SPM on each outgoing interface
+ for that TSI. As it does so, it encodes the network address of the
+ outgoing interface in SPM_PATH in each copy of the SPM it forwards.
+
+7.2. NAK Confirmation
+
+ Network elements MUST immediately transmit an NCF in response to any
+ unicast NAK they receive. The NCF MUST be multicast to the group on
+ the interface on which the NAK was received.
+
+ Nota Bene: In order to avoid creating multicast routing state for
+ PGM network elements across non-PGM-capable clouds, the network-
+ header source address of NCFs transmitted by network elements MUST
+ be set to the ODATA source's NLA, not the network element's NLA as
+ might be expected.
+
+ Network elements should be able to detect a NAK storm and adopt
+ counter-measure to protect the network against a denial of service.
+ A possible countermeasure is to send the first NCF immediately in
+ response to a NAK and then delay the generation of further NCFs (for
+ identical NAKs) by a small interval, so that identical NCFs are
+ rate-limited, without affecting the ability to suppress NAKs.
+
+ Simultaneously, network elements MUST establish repair state for the
+ NAK if such state does not already exist, and add the interface on
+ which the NAK was received to the corresponding repair interface list
+ if the interface is not already listed.
+
+7.3. Constrained NAK Forwarding
+
+ The NAK forwarding procedures for network elements are quite similar
+ to those for receivers, but three important differences should be
+ noted.
+
+ First, network elements do NOT back off before forwarding a NAK
+ (i.e., there is no NAK_BO_IVL) since the resulting delay of the NAK
+ would compound with each hop. Note that NAK arrivals will be
+ randomized by the receivers from which they originate, and this
+ factor in conjunction with NAK anticipation and elimination will
+ combine to forestall NAK storms on subnets with a dense network
+ element population.
+
+ Second, network elements do NOT retry confirmed NAKs if RDATA is not
+ seen; they simply discard the repair state and rely on receivers to
+ re-request the repair. This approach keeps the repair state in the
+ network elements relatively ephemeral and responsive to underlying
+ routing changes.
+
+
+
+
+Speakman, et. al. Experimental [Page 28]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Third, note that ODATA does NOT cancel NAK forwarding in network
+ elements since it is switched by network elements without transport-
+ layer intervention.
+
+ Nota Bene: Once confirmed by an NCF, network elements discard NAK
+ packets; they are NOT retained in network elements beyond this
+ forwarding operation.
+
+ NAK forwarding requires that a network element listen to NCFs for the
+ same transport session. NAK forwarding also requires that a network
+ element observe two time out intervals for any given NAK (i.e., per
+ NAK_TSI and NAK_SQN): NAK_RPT_IVL and NAK_RDATA_IVL.
+
+ The NAK repeat interval NAK_RPT_IVL, limits the length of time for
+ which a network element will repeat a NAK while waiting for a
+ corresponding NCF. NAK_RPT_IVL is counted down from the transmission
+ of a NAK. Expiry of NAK_RPT_IVL cancels NAK forwarding (due to
+ missing NCF).
+
+ The NAK RDATA interval NAK_RDATA_IVL, limits the length of time for
+ which a network element will wait for the corresponding RDATA.
+ NAK_RDATA_IVL is counted down from the time a matching NCF is
+ received. Expiry of NAK_RDATA_IVL causes the network element to
+ discard the corresponding repair state (due to missing RDATA).
+
+ During NAK_RPT_IVL, a NAK is said to be pending. During
+ NAK_RDATA_IVL, a NAK is said to be outstanding.
+
+ A Network element MUST forward NAKs only to the upstream PGM network
+ element for the TSI.
+
+ A network element MUST repeat a NAK at a rate of NAK_RPT_RTE for an
+ interval of NAK_RPT_IVL until it receives a matching NCF. A matching
+ NCF must match NCF_TSI with NAK_TSI, and NCF_SQN with NAK_SQN.
+
+ Upon reception of the corresponding NCF, network elements MUST wait
+ at least NAK_RDATA_IVL for the corresponding RDATA. Receipt of the
+ corresponding RDATA at any time during NAK forwarding cancels NAK
+ forwarding and tears down the corresponding repair state in the
+ network element.
+
+7.4. NAK elimination
+
+ Two NAKs duplicate each other if they bear the same NAK_TSI and
+ NAK_SQN. Network elements MUST discard all duplicates of a NAK that
+ is pending.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 29]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Once a NAK is outstanding, network elements MUST discard all
+ duplicates of that NAK for NAK_ELIM_IVL. Upon expiry of
+ NAK_ELIM_IVL, network elements MUST suspend NAK elimination for that
+ TSI/SQN until the first duplicate of that NAK is seen after the
+ expiry of NAK_ELIM_IVL. This duplicate MUST be forwarded in the
+ usual manner. Once this duplicate NAK is outstanding, network
+ elements MUST once again discard all duplicates of that NAK for
+ NAK_ELIM_IVL, and so on. NAK_RDATA_IVL MUST be reset each time a NAK
+ for the corresponding TSI/SQN is confirmed (i.e., each time
+ NAK_ELIM_IVL is reset). NAK_ELIM_IVL MUST be some small fraction of
+ NAK_RDATA_IVL.
+
+ NAK_ELIM_IVL acts to balance implosion prevention against repair
+ state liveness. That is, it results in the elimination of all but at
+ most one NAK per NAK_ELIM_IVL thereby allowing repeated NAKs to keep
+ the repair state alive in the PGM network elements.
+
+7.5. NAK Anticipation
+
+ An unsolicited NCF is one that is received by a network element when
+ the network element has no corresponding pending or outstanding NAK.
+ Network elements MUST process unsolicited NCFs differently depending
+ on the interface on which they are received.
+
+ If the interface on which an NCF is received is the same interface
+ the network element would use to reach the upstream PGM network
+ element, the network element simply establishes repair state for
+ NCF_TSI and NCF_SQN without adding the interface to the repair
+ interface list, and discards the NCF. If the repair state already
+ exists, the network element restarts the NAK_RDATA_IVL and
+ NAK_ELIM_IVL timers and discards the NCF.
+
+ If the interface on which an NCF is received is not the same
+ interface the network element would use to reach the upstream PGM
+ network element, the network element does not establish repair state
+ and just discards the NCF.
+
+ Anticipated NAKs permit the elimination of any subsequent matching
+ NAKs from downstream. Upon establishing anticipated repair state,
+ network elements MUST eliminate subsequent NAKs only for a period of
+ NAK_ELIM_IVL. Upon expiry of NAK_ELIM_IVL, network elements MUST
+ suspend NAK elimination for that TSI/SQN until the first duplicate of
+ that NAK is seen after the expiry of NAK_ELIM_IVL. This duplicate
+ MUST be forwarded in the usual manner. Once this duplicate NAK is
+ outstanding, network elements MUST once again discard all duplicates
+ of that NAK for NAK_ELIM_IVL, and so on. NAK_RDATA_IVL MUST be reset
+
+
+
+
+
+Speakman, et. al. Experimental [Page 30]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ each time a NAK for the corresponding TSI/SQN is confirmed (i.e.,
+ each time NAK_ELIM_IVL is reset). NAK_ELIM_IVL must be some small
+ fraction of NAK_RDATA_IVL.
+
+7.6. NAK Shedding
+
+ Network elements MAY implement local procedures for withholding NAK
+ confirmations for receivers detected to be reporting excessive loss.
+ The result of these procedures would ultimately be unrecoverable data
+ loss in the receiver.
+
+7.7. Addressing NAKs
+
+ A PGM network element uses the source and group addresses (NLAs)
+ contained in the transport header to find the state for the
+ corresponding TSI, looks up the corresponding upstream PGM network
+ element's address, uses it to re-address the (unicast) NAK, and
+ unicasts it on the upstream interface for the distribution tree for
+ the TSI.
+
+7.8. Constrained RDATA Forwarding
+
+ Network elements MUST maintain repair state for each interface on
+ which a given NAK is received at least once. Network elements MUST
+ then use this list of interfaces to constrain the forwarding of the
+ corresponding RDATA packet only to those interfaces in the list. An
+ RDATA packet corresponds to a NAK if it matches NAK_TSI and NAK_SQN.
+
+ Network elements MUST maintain this repair state only until either
+ the corresponding RDATA is received and forwarded, or NAK_RDATA_IVL
+ passes after forwarding the most recent instance of a given NAK.
+ Thereafter, the corresponding repair state MUST be discarded.
+
+ Network elements SHOULD discard and not forward RDATA packets for
+ which they have no repair state. Note that the consequence of this
+ procedure is that, while it constrains repairs to the interested
+ subset of the network, loss of repair state precipitates further NAKs
+ from neglected receivers.
+
+8. Packet Formats
+
+ All of the packet formats described in this section are transport-
+ layer headers that MUST immediately follow the network-layer header
+ in the packet. Only data packet headers (ODATA and RDATA) may be
+ followed in the packet by application data. For each packet type,
+ the network-header source and destination addresses are specified in
+
+
+
+
+
+Speakman, et. al. Experimental [Page 31]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ addition to the format and contents of the transport layer header.
+ Recall from General Procedures that, for PGM over IP multicast, SPMs,
+ NCFs, and RDATA MUST also bear the IP Router Alert Option.
+
+ For PGM over IP, the IP protocol number is 113.
+
+ In all packets the descriptions of Data-Source Port, Data-Destination
+ Port, Type, Options, Checksum, Global Source ID (GSI), and Transport
+ Service Data Unit (TSDU) Length are:
+
+ Data-Source Port:
+
+ A random port number generated by the source. This port number
+ MUST be unique within the source. Source Port together with
+ Global Source ID forms the TSI.
+
+ Data-Destination Port:
+
+ A globally well-known port number assigned to the given PGM
+ application.
+
+ Type:
+
+ The high-order two bits of the Type field encode a version
+ number, 0x0 in this instance. The low-order nibble of the type
+ field encodes the specific packet type. The intervening two
+ bits (the low-order two bits of the high-order nibble) are
+ reserved and MUST be zero.
+
+ Within the low-order nibble of the Type field:
+
+ values in the range 0x0 through 0x3 represent SPM-like
+ packets (i.e., session-specific, sourced by a source,
+ periodic),
+
+ values in the range 0x4 through 0x7 represent DATA-like
+ packets (i.e., data and repairs),
+
+ values in the range 0x8 through 0xB represent NAK-like
+ packets (i.e., hop-by-hop reliable NAK forwarding
+ procedures),
+
+ and values in the range 0xC through 0xF represent SPMR-like
+ packets (i.e., session-specific, sourced by a receiver,
+ asynchronous).
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 32]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Options:
+
+ This field encodes binary indications of the presence and
+ significance of any options. It also directly encodes some
+ options.
+
+ bit 0 set => One or more Option Extensions are present
+
+ bit 1 set => One or more Options are network-significant
+
+ Note that this bit is clear when OPT_FRAGMENT and/or
+ OPT_JOIN are the only options present.
+
+ bit 6 set => Packet is a parity packet for a transmission group
+ of variable sized packets (OPT_VAR_PKTLEN). Only present when
+ OPT_PARITY is also present.
+
+ bit 7 set => Packet is a parity packet (OPT_PARITY)
+
+ Bits are numbered here from left (0 = MSB) to right (7 = LSB).
+
+ All the other options (option extensions) are encoded in
+ extensions to the PGM header.
+
+ Checksum:
+
+ This field is the usual 1's complement of the 1's complement
+ sum of the entire PGM packet including header.
+
+ The checksum does not include a network-layer pseudo header for
+ compatibility with network address translation. If the
+ computed checksum is zero, it is transmitted as all ones. A
+ value of zero in this field means the transmitter generated no
+ checksum.
+
+ Note that if any entity between a source and a receiver
+ modifies the PGM header for any reason, it MUST either
+ recompute the checksum or clear it. The checksum is mandatory
+ on data packets (ODATA and RDATA).
+
+ Global Source ID:
+
+ A globally unique source identifier. This ID MUST NOT change
+ throughout the duration of the transport session. A
+ RECOMMENDED identifier is the low-order 48 bits of the MD5 [9]
+ signature of the DNS name of the source. Global Source ID
+ together with Data-Source Port forms the TSI.
+
+
+
+
+Speakman, et. al. Experimental [Page 33]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ TSDU Length:
+
+ The length in octets of the transport data unit exclusive of
+ the transport header.
+
+ Note that those who require the TPDU length must obtain it from
+ sum of the transport header length (TH) and the TSDU length.
+ TH length is the sum of the size of the particular PGM packet
+ header (type_specific_size) plus the length of any options that
+ might be present.
+
+ Address Family Indicators (AFIs) are as specified in [10].
+
+8.1. Source Path Messages
+
+ SPMs are sent by a source to establish source path state in network
+ elements and to provide transmit window state to receivers.
+
+ The network-header source address of an SPM is the unicast NLA of the
+ entity that originates the SPM.
+
+ The network-header destination address of an SPM is a multicast group
+ NLA.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Source Port | Destination Port |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Type | Options | Checksum |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Global Source ID ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ... Global Source ID | TSDU Length |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | SPM's Sequence Number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Trailing Edge Sequence Number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Leading Edge Sequence Number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | NLA AFI | Reserved |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Path NLA ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
+ | Option Extensions when present ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+
+
+
+Speakman, et. al. Experimental [Page 34]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Source Port:
+
+ SPM_SPORT
+
+ Data-Source Port, together with SPM_GSI forms SPM_TSI
+
+ Destination Port:
+
+ SPM_DPORT
+
+ Data-Destination Port
+
+ Type:
+
+ SPM_TYPE = 0x00
+
+ Global Source ID:
+
+ SPM_GSI
+
+ Together with SPM_SPORT forms SPM_TSI
+
+ SPM's Sequence Number
+
+ SPM_SQN
+
+ The sequence number assigned to the SPM by the source.
+
+ Trailing Edge Sequence Number:
+
+ SPM_TRAIL
+
+ The sequence number defining the current trailing edge of the
+ source's transmit window (TXW_TRAIL).
+
+ Leading Edge Sequence Number:
+
+ SPM_LEAD
+
+ The sequence number defining the current leading edge of the
+ source's transmit window (TXW_LEAD).
+
+ If SPM_TRAIL == 0 and SPM_LEAD == 0x80000000, this indicates that
+ no window information is present in the packet.
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 35]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Path NLA:
+
+ SPM_PATH
+
+ The NLA of the interface on the network element on which this SPM
+ was forwarded. Initialized by a source to the source's NLA,
+ rewritten by each PGM network element upon forwarding.
+
+8.2. Data Packets
+
+ Data packets carry application data from a source or a repairer to
+ receivers.
+
+ ODATA:
+
+ Original data packets transmitted by a source.
+
+ RDATA:
+
+ Repairs transmitted by a source or by a designated local
+ repairer (DLR) in response to a NAK.
+
+ The network-header source address of a data packet is the unicast NLA
+ of the entity that originates the data packet.
+
+ The network-header destination address of a data packet is a
+ multicast group NLA.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Source Port | Destination Port |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Type | Options | Checksum |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Global Source ID ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ... Global Source ID | TSDU Length |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Data Packet Sequence Number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Trailing Edge Sequence Number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Option Extensions when present ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Data ...
+ +-+-+- ...
+
+
+
+
+Speakman, et. al. Experimental [Page 36]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Source Port:
+
+ OD_SPORT, RD_SPORT
+
+ Data-Source Port, together with Global Source ID forms:
+
+ OD_TSI, RD_TSI
+
+ Destination Port:
+
+ OD_DPORT, RD_DPORT
+
+ Data-Destination Port
+
+ Type:
+
+ OD_TYPE = 0x04 RD_TYPE = 0x05
+
+ Global Source ID:
+
+ OD_GSI, RD_GSI
+
+ Together with Source Port forms:
+
+ OD_TSI, RD_TSI
+
+ Data Packet Sequence Number:
+
+ OD_SQN, RD_SQN
+
+ The sequence number originally assigned to the ODATA packet by the
+ source.
+
+ Trailing Edge Sequence Number:
+
+ OD_TRAIL, RD_TRAIL
+
+ The sequence number defining the current trailing edge of the
+ source's transmit window (TXW_TRAIL). In RDATA, this MAY not be
+ the same as OD_TRAIL of the ODATA packet for which it is a repair.
+
+ Data:
+
+ Application data.
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 37]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+8.3. Negative Acknowledgments and Confirmations
+
+ NAK:
+
+ Negative Acknowledgments are sent by receivers to request the
+ repair of an ODATA packet detected to be missing from the
+ expected sequence.
+
+ N-NAK:
+
+ Null Negative Acknowledgments are sent by DLRs to provide flow
+ control feedback to the source of ODATA for which the DLR has
+ provided the corresponding RDATA.
+
+ The network-header source address of a NAK is the unicast NLA of the
+ entity that originates the NAK. The network-header source address of
+ NAK is rewritten by each PGM network element with its own.
+
+ The network-header destination address of a NAK is initialized by the
+ originator of the NAK (a receiver) to the unicast NLA of the upstream
+ PGM network element known from SPMs. The network-header destination
+ address of a NAK is rewritten by each PGM network element with the
+ unicast NLA of the upstream PGM network element to which this NAK is
+ forwarded. On the final hop, the network-header destination address
+ of a NAK is rewritten by the PGM network element with the unicast NLA
+ of the original source or the unicast NLA of a DLR.
+
+ NCF:
+
+ NAK Confirmations are sent by network elements and sources to
+ confirm the receipt of a NAK.
+
+ The network-header source address of an NCF is the ODATA source's
+ NLA, not the network element's NLA as might be expected.
+
+ The network-header destination address of an NCF is a multicast group
+ NLA.
+
+ Note that in NAKs and N-NAKs, unlike the other packets, the field
+ SPORT contains the Data-Destination port and the field DPORT contains
+ the Data-Source port. As a general rule, the content of SPORT/DPORT
+ is determined by the direction of the flow: in packets which travel
+ down-stream SPORT is the port number chosen in the data source
+ (Data-Source Port) and DPORT is the data destination port number
+ (Data-Destination Port). The opposite holds for packets which travel
+ upstream. This makes DPORT the protocol endpoint in the recipient
+ host, regardless of the direction of the packet.
+
+
+
+
+Speakman, et. al. Experimental [Page 38]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Source Port | Destination Port |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Type | Options | Checksum |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Global Source ID ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ... Global Source ID | TSDU Length |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Requested Sequence Number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | NLA AFI | Reserved |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Source NLA ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
+ | NLA AFI | Reserved |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Multicast Group NLA ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
+ | Option Extensions when present ...
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ...
+
+ Source Port:
+
+ NAK_SPORT, NNAK_SPORT
+
+ Data-Destination Port
+
+ NCF_SPORT
+
+ Data-Source Port, together with Global Source ID forms NCF_TSI
+
+ Destination Port:
+
+ NAK_DPORT, NNAK_DPORT
+
+ Data-Source Port, together with Global Source ID forms:
+
+ NAK_TSI, NNAK_TSI
+
+ NCF_DPORT
+
+ Data-Destination Port
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 39]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Type:
+
+ NAK_TYPE = 0x08 NNAK_TYPE = 0x09
+
+ NCF_TYPE = 0x0A
+
+ Global Source ID:
+
+ NAK_GSI, NNAK_GSI, NCF_GSI
+
+ Together with Data-Source Port forms
+
+ NAK_TSI, NNAK_TSI, NCF_TSI
+
+ Requested Sequence Number:
+
+ NAK_SQN, NNAK_SQN
+
+ NAK_SQN is the sequence number of the ODATA packet for which a
+ repair is requested.
+
+ NNAK_SQN is the sequence number of the RDATA packet for which a
+ repair has been provided by a DLR.
+
+ NCF_SQN
+
+ NCF_SQN is NAK_SQN from the NAK being confirmed.
+
+ Source NLA:
+
+ NAK_SRC, NNAK_SRC, NCF_SRC
+
+ The unicast NLA of the original source of the missing ODATA.
+
+ Multicast Group NLA:
+
+ NAK_GRP, NNAK_GRP, NCF_GRP
+
+ The multicast group NLA. NCFs MAY bear OPT_REDIRECT and/or
+ OPT_NAK_LIST
+
+9. Options
+
+ PGM specifies several end-to-end options to address specific
+ application requirements. PGM specifies options to support
+ fragmentation, late joining, and redirection.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 40]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Options MAY be appended to PGM data packet headers only by their
+ original transmitters. While they MAY be interpreted by network
+ elements, options are neither added nor removed by network elements.
+
+ Options are all in the TLV style, or Type, Length, Value. The Type
+ field is contained in the first byte, where bit 0 is the OPT_END bit,
+ followed by 7 bits of type. The OPT_END bit MUST be set in the last
+ option in the option list, whichever that might be. The Length field
+ is the total length of the option in bytes, and directly follows the
+ Type field. Following the Length field are 5 reserved bits, the
+ OP_ENCODED flag, the 2 Option Extensibility bits OPX and the
+ OP_ENCODED_NULL flag. Last are 7 bits designated for option specific
+ information which may be defined on a per-option basis. If not
+ defined for a particular option, they MUST be set to 0.
+
+ The Option Extensibility bits dictate the desired treatment of an
+ option if it is unknown to the network element processing it.
+
+ Nota Bene: Only network elements pay any attention to these bits.
+
+ The OPX bits are defined as follows:
+
+ 00 - Ignore the option
+
+ 01 - Invalidate the option by changing the type to OPT_INVALID
+ = 0x7F
+
+ 10 - Discard the packet
+
+ 11 - Unsupported, and reserved for future use
+
+ Some options present in data packet (ODATA and RDATA) are strictly
+ associated with the packet content (PGM payload), OPT_FRAGMENT being
+ an example. These options must be preserved even when the data
+ packet that would normally contain them is not received, but its the
+ payload is recovered though the use of FEC. PGM specifies a
+ mechanism to accomplish this that uses the F (OP_ENCODED) and U
+ (OP_ENCODED_NULL) bits in the option common header. OP_ENCODED and
+ OP_ENCODED_NULL MUST be normally set to zero except when the option
+ is used in FEC packets to preserve original options. See Appendix A
+ for details.
+
+ There is a limit of 16 options per packet.
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 41]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ General Option Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U|Opt. Specific|
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Option Value ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+...+-+-+
+
+9.1. Option extension length - OPT_LENGTH
+
+ When option extensions are appended to the standard PGM header, the
+ extensions MUST be preceded by an option extension length field
+ specifying the total length of all option extensions.
+
+ In addition, the presence of the options MUST be encoded in the
+ Options field of the standard PGM header before the Checksum is
+ computed.
+
+ All network-significant options MUST be appended before any
+ exclusively receiver-significant options.
+
+ To provide an indication of the end of option extensions, OPT_END
+ (0x80) MUST be set in the Option Type field of the trailing option
+ extension.
+
+9.1.1. OPT_LENGTH - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Option Type | Option Length | Total length of all options |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x00
+
+ Option Length = 4 octets
+
+ Total length of all options
+
+ The total length in octets of all option extensions including
+ OPT_LENGTH.
+
+ OPT_LENGTH is NOT network-significant.
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 42]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+9.2. Fragmentation Option - OPT_FRAGMENT
+
+ Fragmentation allows transport-layer entities at a source to break up
+ application protocol data units (APDUs) into multiple PGM data
+ packets (TPDUs) to conform with the MTU supported by the network
+ layer. The fragmentation option MAY be applied to ODATA and RDATA
+ packets only.
+
+ Architecturally, the accumulation of TSDUs into APDUs is applied to
+ TPDUs that have already been received, duplicate eliminated, and
+ contiguously sequenced by the receiver. Thus APDUs MAY be
+ reassembled across increments of the transmit window.
+
+9.2.1. OPT_FRAGMENT - Packet Extension Contents
+
+ OPT_FRAG_OFF the offset of the fragment from the beginning of the
+ APDU
+
+ OPT_FRAG_LEN the total length of the original APDU
+
+9.2.2. OPT_FRAGMENT - Procedures - Sources
+
+ A source fragments APDUs into a contiguous series of fragments no
+ larger than the MTU supported by the network layer. A source
+ sequentially and uniquely assigns OD_SQNs to these fragments in the
+ order in which they occur in the APDU. A source then sets
+ OPT_FRAG_OFF to the value of the offset of the fragment in the
+ original APDU (where the first byte of the APDU is at offset 0, and
+ OPT_FRAG_OFF numbers the first byte in the fragment), and set
+ OPT_FRAG_LEN to the value of the total length of the original APDU.
+
+9.2.3. OPT_FRAGMENT - Procedures - Receivers
+
+ Receivers detect and accumulate fragmented packets until they have
+ received an entire contiguous sequence of packets comprising an APDU.
+ This sequence begins with the fragment bearing OPT_FRAG_OFF of 0, and
+ terminates with the fragment whose length added to its OPT_FRAG_OFF
+ is OPT_FRAG_LEN.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 43]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+9.2.4. OPT_FRAGMENT - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | First Sequence Number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Offset |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Length |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x01
+
+ Option Length = 12 octets
+
+ First Sequence Number
+
+ Sequence Number of the PGM DATA/RDATA packet containing the first
+ fragment of the APDU.
+
+ Offset
+
+ The byte offset of the fragment from the beginning of the APDU
+ (OPT_FRAG_OFF).
+
+ Length
+
+ The total length of the original APDU (OPT_FRAG_LEN).
+
+ OPT_FRAGMENT is NOT network-significant.
+
+9.3. NAK List Option - OPT_NAK_LIST
+
+ The NAK List option MAY be used in conjunction with NAKs to allow
+ receivers to request transmission for more than one sequence number
+ with a single NAK packet. The option is limited to 62 listed NAK
+ entries. The NAK list MUST be unique and duplicate free. It MUST be
+ ordered, and MUST consist of either a list of selective or a list of
+ parity NAKs. In general, network elements, sources and receivers
+ must process a NAK list as if they had received individual NAKs for
+ each sequence number in the list. The procedures for each are
+ outlined in detail earlier in this document. Clarifications and
+ differences are detailed here.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 44]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+9.3.1. OPT_NAK_LIST - Packet Extensions Contents
+
+ A list of sequence numbers for which retransmission is requested.
+
+9.3.2. OPT_NAK_LIST - Procedures - Receivers
+
+ Receivers MAY append the NAK List option to a NAK to indicate that
+ they wish retransmission of a number of RDATA.
+
+ Receivers SHOULD proceed to back off NAK transmission in a manner
+ consistent with the procedures outlined for single sequence number
+ NAKs. Note that the repair of each separate sequence number will be
+ completed upon receipt of a separate RDATA packet.
+
+ Reception of an NCF or multicast NAK containing the NAK List option
+ suspends generation of NAKs for all sequence numbers within the NAK
+ list, as well as the sequence number within the NAK header.
+
+9.3.3. OPT_NAK_LIST - Procedures - Network Elements
+
+ Network elements MUST immediately respond to a NAK with an identical
+ NCF containing the same NAK list as the NAK itself.
+
+ Network elements MUST forward a NAK containing a NAK List option if
+ any one sequence number specified by the NAK (including that in the
+ main NAK header) is not currently outstanding. That is, it MUST
+ forward the NAK, if any one sequence number does not have an
+ elimination timer running for it. The NAK must be forwarded intact.
+
+ Network elements MUST eliminate a NAK containing the NAK list option
+ only if all sequence numbers specified by the NAK (including that in
+ the main NAK header) are outstanding. That is, they are all running
+ an elimination timer.
+
+ Upon receipt of an unsolicited NCF containing the NAK list option, a
+ network element MUST anticipate data for every sequence number
+ specified by the NAK as if it had received an NCF for every sequence
+ number specified by the NAK.
+
+9.3.4. OPT_NAK_LIST - Procedures - Sources
+
+ A source MUST immediately respond to a NAK with an identical NCF
+ containing the same NAK list as the NAK itself.
+
+ It MUST then multicast RDATA (while respecting TXW_MAX_RTE) for every
+ requested sequence number.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 45]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+9.3.5. OPT_NAK_LIST - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Requested Sequence Number 1 |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ..... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Requested Sequence Number N |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x02
+
+ Option Length = 4 + (4 * number of SQNs) octets
+
+ Requested Sequence Number
+
+ A list of up to 62 additional sequence numbers to which the NAK
+ applies.
+
+ OPT_NAK_LIST is network-significant.
+
+9.4. Late Joining Option - OPT_JOIN
+
+ Late joining allows a source to bound the amount of repair history
+ receivers may request when they initially join a particular transport
+ session.
+
+ This option indicates that receivers that join a transport session in
+ progress MAY request repair of all data as far back as the given
+ minimum sequence number from the time they join the transport
+ session. The default is for receivers to receive data only from the
+ first packet they receive and onward.
+
+9.4.1. OPT_JOIN - Packet Extensions Contents
+
+ OPT_JOIN_MIN the minimum sequence number for repair
+
+9.4.2. OPT_JOIN - Procedures - Receivers
+
+ If a PGM packet (ODATA, RDATA, or SPM) bears OPT_JOIN, a receiver MAY
+ initialize the trailing edge of the receive window (RXW_TRAIL_INIT)
+ to the given Minimum Sequence Number and proceeds with normal data
+ reception.
+
+
+
+
+Speakman, et. al. Experimental [Page 46]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+9.4.3. OPT_JOIN - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Minimum Sequence Number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+
+ Option Type = 0x03
+
+ Option Length = 8 octets
+
+ Minimum Sequence Number
+
+ The minimum sequence number defining the initial trailing edge of
+ the receive window for a late joining receiver.
+
+ OPT_JOIN is NOT network-significant.
+
+9.5. Redirect Option - OPT_REDIRECT
+
+ Redirection MAY be used by a designated local repairer (DLR) to
+ advertise its own address as an alternative to the original source,
+ for requesting repairs.
+
+ These procedures allow a PGM Network Element to use a DLR that is one
+ PGM hop from it either upstream or downstream in the multicast
+ distribution tree. The former are referred to as upstream DLRs. The
+ latter are referred to as off-tree DLRs. Off-Tree because even
+ though they are downstream of the point of loss, they might not lie
+ on the subtree affected by the loss.
+
+ A DLR MUST receive any PGM sessions for which it wishes to provide
+ retransmissions. A DLR SHOULD respond to NCFs or POLLs sourced by
+ its PGM parent with a redirecting POLR response packet containing an
+ OPT_REDIRECT which provides its own network layer address.
+ Recipients of redirecting POLRs MAY then direct NAKs for subsequent
+ ODATA sequence numbers to the DLR rather than to the original source.
+ In addition, DLRs that receive redirected NAKs for which they have
+ RDATA MUST send a NULL NAK to provide flow control to the original
+ source without also provoking a repair from that source.
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 47]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+9.5.1. OPT_REDIRECT - Packet Extensions Contents
+
+ OPT_REDIR_NLA the DLR's own unicast network-layer address to which
+ recipients of the redirecting POLR MAY direct
+ subsequent NAKs for the corresponding TSI.
+
+9.5.2. OPT_REDIRECT - Procedures - DLRs
+
+ A DLR MUST receive any PGM sessions for which it wishes to provide a
+ source of repairs. In addition to acting as an ordinary PGM
+ receiver, a DLR MAY then respond to NCFs or relevant POLLs sourced by
+ parent network elements (or even by the source itself) by sending a
+ POLR containing an OPT_REDIRECT providing its own network-layer
+ address.
+
+ If a DLR can provide FEC repairs it MUST denote this by setting
+ OPT_PARITY in the PGM header of its POLR response.
+
+9.5.2.1. Upstream DLRs
+
+ If the NCF completes NAK transmission initiated by the DLR itself,
+ the DLR MUST NOT send a redirecting POLR.
+
+ When a DLR receives an NCF from its upstream PGM parent, it SHOULD
+ send a redirecting POLR, multicast to the group. The DLR SHOULD
+ record that it is acting as an upstream DLR for the said session.
+ Note that this POLR MUST have both the data source's source address
+ and the router alert option in its network header.
+
+ An upstream DLR MUST act as an ordinary PGM source in responding to
+ any NAK it receives (i.e., directed to it). That is, it SHOULD
+ respond first with a normal NCF and then RDATA as usual. In
+ addition, an upstream DLR that receives redirected NAKs for which it
+ has RDATA MUST send a NULL NAK to provide flow control to the
+ original source. If it cannot provide the RDATA it forwards the NAK
+ to the upstream PGM neighbor as usual.
+
+ Nota Bene: In order to propagate on exactly the same distribution
+ tree as ODATA, RDATA and POLR packets transmitted by DLRs MUST
+ bear the ODATA source's NLA as the network-header source address,
+ not the DLR's NLA as might be expected.
+
+
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 48]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+9.5.2.2. Off-Tree DLRs
+
+ A DLR that receives a POLL with sub-type PGM_POLL_DLR MUST respond
+ with a unicast redirecting POLR if it provides the appropriate
+ service. The DLR SHOULD respond using the rules outlined for polling
+ in Appendix D of this text. If the DLR responds, it SHOULD record
+ that it is acting as an off-tree DLR for the said session.
+
+ An off-tree DLR acts in a special way in responding to any NAK it
+ receives (i.e., directed to it). It MUST respond to a NAK directed
+ to it from its parent by unicasting an NCF and RDATA to its parent.
+ The parent will then forward the RDATA down the distribution tree.
+ The DLR uses its own and the parent's NLA addresses in the network
+ header for the source and destination respectively. The unicast NCF
+ and RDATA packets SHOULD not have the router alert option. In all
+ other ways the RDATA header should be "as if" the packet had come
+ from the source.
+
+ Again, an off-tree DLR that receives redirected NAKs for which it has
+ RDATA MUST originate a NULL NAK to provide flow control to the
+ original source. It MUST originate the NULL NAK before originating
+ the RDATA. This must be done to reduce the state held in the network
+ element.
+
+ If it cannot provide the RDATA for a given NAK, an off-tree DLR
+ SHOULD confirm the NAK with a unicast NCF as normal, then immediately
+ send a NAK for the said data packet back to its parent.
+
+9.5.2.3. Simultaneous Upstream and Off-Tree DLR operation
+
+ Note that it is possible for a DLR to provide service to its parent
+ and to downstream network elements simultaneously. A downstream loss
+ coupled with a loss for the same data on some other part of the
+ distribution tree served by its parent could cause this. In this
+ case it may provide both upstream and off-tree functionality
+ simultaneously.
+
+ Note that a DLR differentiates between NAKs from an NE downstream or
+ from its parent by comparing the network-header source address of the
+ NAK with it's upstream PGM parent's NLA. The DLR knows the parent's
+ NLA from the session's SPM messages.
+
+
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 49]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+9.5.3. OPT_REDIRECT - Procedures - Network Elements
+
+9.5.3.1. Discovering DLRs
+
+ When a PGM router receives notification of a loss via a NAK, it
+ SHOULD first try to use a known DLR to recover the loss. If such a
+ DLR is not known it SHOULD initiate DLR discovery. DLR discovery may
+ occur in two ways. If there are upstream DLRs, the NAK transmitted
+ by this router to its PGM parent will trigger their discovery, via a
+ redirecting POLR. Also, a network element SHOULD initiate a search
+ for off-tree DLRs using the PGM polling mechanism, and the sub-type
+ PGM_POLL_DLR.
+
+ If a DLR can provide FEC repairs it will denote this by setting
+ OPT_PARITY in the PGM header of its POLR response. A network element
+ SHOULD only direct parity NAKs to a DLR that can provide FEC repairs.
+
+9.5.3.2. Redirected Repair
+
+ When it can, a network element SHOULD use upstream DLRs.
+
+ Upon receiving a redirecting POLR, network elements SHOULD record the
+ redirecting information for the TSI, and SHOULD redirect subsequent
+ NAKs for the same TSI to the network address provided in the
+ redirecting POLR rather than to the PGM neighbor known via the SPMs.
+ Note, however, that a redirecting POLR is NOT regarded as matching
+ the NAK that provoked it, so it does not complete the transmission of
+ that NAK. Only a normal matching NCF can complete the transmission
+ of a NAK.
+
+ For subsequent NAKs, if the network element has recorded redirection
+ information for the corresponding TSI, it MAY change the destination
+ network address of those NAKs and attempt to transmit them to the
+ DLR. No NAK for a specific SQN SHOULD be sent to an off-tree DLR if
+ a NAK for the SQN has been seen on the interface associated with the
+ DLR. Instead the NAK SHOULD be forwarded upstream. Subsequent NAKs
+ for different SQNs MAY be forwarded to the said DLR (again assuming
+ no NAK for them has been seen on the interface to the DLR).
+
+ If a corresponding NCF is not received from the DLR within
+ NAK_RPT_IVL, the network element MUST discard the redirecting
+ information for the TSI and re-attempt to forward the NAK towards the
+ PGM upstream neighbor.
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 50]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ If a NAK is received from the DLR for a requested SQN, the network
+ element MUST discard the redirecting information for the SQN and re-
+ attempt to forward the NAK towards the PGM upstream neighbor. The
+ network element MAY still direct NAKs for different SQNs to the DLR.
+
+ RDATA and NCFs from upstream DLRs will flow down the distribution
+ tree. However, RDATA and NCFs from off-tree DLRs will be unicast to
+ the network element. The network element will terminate the NCF, but
+ MUST put the source's NLA and the group address into the network
+ header and MUST add router alert before forwarding the RDATA packet
+ to the distribution subtree.
+
+ NULL NAKs from an off-tree DLR for an RDATA packet requested from
+ that off-tree DLR MUST always be forwarded upstream. The network
+ element can assume that these will arrive before the matching RDATA.
+ Other NULL NAKs are forwarded only if matching repair state has not
+ already been created. Network elements MUST NOT confirm or retry
+ NULL NAKs and they MUST NOT add the receiving interface to the repair
+ state. If a NULL NAK is used to initially create repair state, this
+ fact must be recorded so that any subsequent non-NULL NAK will not be
+ eliminated, but rather will be forwarded to provoke an actual repair.
+ State created by a NULL NAK exists only for NAK_ELIM_IVL.
+
+9.5.4. OPT_REDIRECT - Procedures - Receivers
+
+ These procedures are intended to be applied in instances where a
+ receiver's first hop router on the reverse path to the source is not
+ a PGM Network Element. So, receivers MUST ignore a redirecting POLR
+ from a DLR on the same IP subnet that the receiver resides on, since
+ this is likely to suffer identical loss to the receiver and so be
+ useless. Therefore, these procedures are entirely OPTIONAL. A
+ receiver MAY choose to ignore all redirecting POLRs since in cases
+ where its first hop router on the reverse path is PGM capable, it
+ would ignore them anyway. Also, note that receivers will never learn
+ of off-tree DLRs.
+
+ Upon receiving a redirecting POLR, receivers SHOULD record the
+ redirecting information for the TSI, and MAY redirect subsequent NAKs
+ for the same TSI to the network address provided in the redirecting
+ POLR rather than to the PGM neighbor for the corresponding ODATA for
+ which the receiver is requesting repair. Note, however, that a
+ redirecting POLR is NOT regarded as matching the NAK that provoked
+ it, so it does not complete the transmission of that NAK. Only a
+ normal matching NCF can complete the transmission of a NAK.
+
+ For subsequent NAKs, if the receiver has recorded redirection
+ information for the corresponding TSI, it MAY change the destination
+ network address of those NAKs and attempt to transmit them to the
+
+
+
+Speakman, et. al. Experimental [Page 51]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ DLR. If a corresponding NCF is not received within NAK_RPT_IVL, the
+ receiver MUST discard the redirecting information for the TSI and
+ re-attempt to forward the NAK to the PGM neighbor for the original
+ source of the missing ODATA.
+
+9.5.5. OPT_REDIRECT - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | NLA AFI | Reserved |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | DLR's NLA ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
+
+ Option Type = 0x07
+
+ Option Length = 4 + NLA length
+
+ DLR's NLA
+
+ The DLR's own unicast network address to which recipients of the
+ redirecting POLR may direct subsequent NAKs.
+
+ OPT_REDIRECT is network-significant.
+
+9.6. OPT_SYN - Synchronization Option
+
+ The SYN option indicates the starting data packet for a session. It
+ must only appear in ODATA or RDATA packets.
+
+ The SYN option MAY be used to provide a useful abstraction to
+ applications that can simplify application design by providing stream
+ start notification. It MAY also be used to let a late joiner to a
+ session know that it is indeed late (i.e. it would not see the SYN
+ option).
+
+9.6.1. OPT_SYN - Procedures - Receivers
+
+ Procedures for receivers are implementation dependent. A receiver
+ MAY use the SYN to provide its applications with abstractions of the
+ data stream.
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 52]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+9.6.2. OPT_SYN - Procedures - Sources
+
+ Sources MAY include OPT_SYN in the first data for a session. That
+ is, they MAY include the option in:
+
+ the first ODATA sent on a session by a PGM source
+
+ any RDATA sent as a result of loss of this ODATA packet
+
+ all FEC packets for the first transmission group; in this case it
+ is interpreted as the first packet having the SYN
+
+9.6.3. OPT_SYN - Procedures - DLRs
+
+ In an identical manner to sources, DLRs MUST provide OPT_SYN in
+ any retransmitted data that is at the start of a session.
+
+9.6.4. OPT_SYN - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x0D
+
+ Option Length = 4
+
+ OPT_SYN is NOT network-significant.
+
+9.7. OPT_FIN - Session Finish Option
+
+ This FIN option indicates the last data packet for a session and
+ an orderly close down.
+
+ The FIN option MAY be used to provide an abstraction to
+ applications that can simplify application design by providing
+ stream end notification.
+
+ This option MAY be present in the last data packet or transmission
+ group for a session. The FIN PGM option MUST appear in every SPM
+ sent after the last ODATA for a session. The SPM_LEAD sequence
+ number in an SPM with the FIN option indicates the last known data
+ successfully transmitted for the session.
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 53]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+9.7.1. OPT_FIN - Procedures - Receivers
+
+ A receiver SHOULD use receipt of a FIN to let it know that it can
+ tear down its data structures for the said session once a suitable
+ time period has expired (TXW_SECS). It MAY still try to solicit
+ retransmissions within the existing transmit window.
+
+ Other than this, procedures for receivers are implementation
+ dependent. A receiver MAY use the FIN to provide its applications
+ with abstractions of the data stream and to inform its
+ applications that the session is ending.
+
+ 9.7.2. OPT_FIN - Procedures - Sources
+
+ Sources MUST include OPT_FIN in every SPM sent after it has been
+ determined that the application has closed gracefully. If a
+ source is aware at the time of transmission that it is ending a
+ session the source MAY include OPT_FIN in,
+
+ the last ODATA
+
+ any associated RDATAs for the last data
+
+ FEC packets for the last transmission group; in this case it is
+ interpreted as the last packet having the FIN
+
+ When a source detects that it needs to send an OPT_FIN it SHOULD
+ immediately send it. This is done either by appending it to the last
+ data packet or transmission group or by immediately sending an SPM
+ and resetting the SPM heartbeat timer (i.e. it does not wait for a
+ timer to expire before sending the SPM). After sending an OPT_FIN,
+ the session SHOULD not close and stop sending SPMs until after a time
+ period equal to TXW_SECS.
+
+9.7.3. OPT_FIN - Procedures - DLRs
+
+ In an identical manner to sources, DLRs MUST provide OPT_FIN in any
+ retransmitted data that is at the end of a session.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 54]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+9.7.4. OPT_FIN - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x0E
+
+ Option Length = 4
+
+ OPT_FIN is NOT network-significant.
+
+9.8. OPT_RST - Session Reset Option
+
+ The RST option MAY appear in every SPM sent after an unrecoverable
+ error is identified by the source. This acts to notify the receivers
+ that the session is being aborted. This option MAY appear only in
+ SPMs. The SPM_LEAD sequence number in an SPM with the RST option
+ indicates the last known data successfully transmitted for the
+ session.
+
+9.8.1. OPT_RST - Procedures - Receivers
+
+ Receivers SHOULD treat the reception of OPT_RST in an SPM as an abort
+ of the session.
+
+ A receiver that receives an SPM with an OPT_RST with the N bit set
+ SHOULD not send any more NAKs for the said session towards the
+ source. If the N bit (see 9.8.5) is not set, the receiver MAY
+ continue to try to solicit retransmit data within the current
+ transmit window.
+
+9.8.2. OPT_RST - Procedures - Sources
+
+ Sources SHOULD include OPT_RST in every SPM sent after it has been
+ determined that an unrecoverable error condition has occurred. The N
+ bit of the OPT_RST SHOULD only be sent if the source has determined
+ that it cannot process NAKs for the session. The cause of the
+ OPT_RST is set to an implementation specific value. If the error
+ code is unknown, then the value of 0x00 is used. When a source
+ detects that it needs to send an OPT_RST it SHOULD immediately send
+ it. This is done by immediately sending an SPM and resetting the SPM
+ heartbeat timer (i.e. it does not wait for a timer to expire before
+ sending the SPM). After sending an OPT_RST, the session SHOULD not
+ close and stop sending SPMs until after a time period equal to
+ TXW_SECS.
+
+
+
+Speakman, et. al. Experimental [Page 55]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+9.8.3. OPT_RST - Procedures - DLRs
+
+ None.
+
+9.8.4. OPT_RST - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U|N|Error Code |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x0F
+
+ Option Length = 4
+
+ N bit
+
+ The N bit is set to 1 to indicate that NAKs for previous ODATA
+ will go unanswered from the source. The application will tell the
+ source to turn this bit on or off.
+
+ Error Code
+
+ The 6 bit error code field is used to forward an error code down
+ to the receivers from the source.
+
+ The value of 0x00 indicates an unknown reset reason. Any other
+ value indicates the application purposely aborted and gave a
+ reason (the error code value) that may have meaning to the end
+ receiver application. These values are entirely application
+ dependent.
+
+ OPT_RST is NOT network-significant.
+
+10. Security Considerations
+
+ In addition to the usual problems of end-to-end authentication, PGM
+ is vulnerable to a number of security risks that are specific to the
+ mechanisms it uses to establish source path state, to establish
+ repair state, to forward NAKs, to identify DLRs, and to distribute
+ repairs. These mechanisms expose PGM network elements themselves to
+ security risks since network elements not only switch but also
+ interpret SPMs, NAKs, NCFs, and RDATA, all of which may legitimately
+ be transmitted by PGM sources, receivers, and DLRs. Short of full
+ authentication of all neighboring sources, receivers, DLRs, and
+ network elements, the protocol is not impervious to abuse.
+
+
+
+
+Speakman, et. al. Experimental [Page 56]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ So putting aside the problems of rogue PGM network elements for the
+ moment, there are enough potential security risks to network elements
+ associated with sources, receivers, and DLRs alone. These risks
+ include denial of service through the exhausting of both CPU
+ bandwidth and memory, as well as loss of (repair) data connectivity
+ through the muddling of repair state.
+
+ False SPMs may cause PGM network elements to mis-direct NAKs intended
+ for the legitimate source with the result that the requested RDATA
+ would not be forthcoming.
+
+ False NAKs may cause PGM network elements to establish spurious
+ repair state that will expire only upon time-out and could lead to
+ memory exhaustion in the meantime.
+
+ False NCFs may cause PGM network elements to suspend NAK forwarding
+ prematurely (or to mis-direct NAKs in the case of redirecting POLRs)
+ resulting eventually in loss of RDATA.
+
+ False RDATA may cause PGM network elements to tear down legitimate
+ repair state resulting eventually in loss of legitimate RDATA.
+
+ The development of precautions for network elements to protect
+ themselves against incidental or unsophisticated versions of these
+ attacks is work outside of this spec and includes:
+
+ Damping of jitter in the value of either the network-header source
+ address of SPMs or the path NLA in SPMs. While the network-header
+ source address is expected to change seldom, the path NLA is
+ expected to change occasionally as a consequence of changes in
+ underlying multicast routing information.
+
+ The extension of NAK shedding procedures to control the volume, not
+ just the rate, of confirmed NAKs. In either case, these procedures
+ assist network elements in surviving NAK attacks at the expense of
+ maintaining service. More efficiently, network elements may use the
+ knowledge of TSIs and their associated transmit windows gleaned from
+ SPMs to control the proliferation of repair state.
+
+ A three-way handshake between network elements and DLRs that would
+ permit a network element to ascertain with greater confidence that an
+ alleged DLR is identified by the alleged network-header source
+ address, and is PGM conversant.
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 57]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+11. Appendix A - Forward Error Correction
+
+11.1. Introduction
+
+ The following procedures incorporate packet-level Reed Solomon
+ Erasure correcting techniques as described in [11] and [12] into PGM.
+ This approach to Forward Error Correction (FEC) is based upon the
+ computation of h parity packets from k data packets for a total of n
+ packets such that a receiver can reconstruct the k data packets out
+ of any k of the n packets. The original k data packets are referred
+ to as the Transmission Group, and the total n packets as the FEC
+ Block.
+
+ These procedures permit any combination of pro-active FEC or on-
+ demand FEC with conventional ARQ (selective retransmission) within a
+ given TSI to provide any flavor of layered or integrated FEC. The
+ two approaches can be used by the same or different receivers in a
+ single transport session without conflict. Once provided by a
+ source, the actual use of FEC or selective retransmission for loss
+ recovery in the session is entirely at the discretion of the
+ receivers. Note however that receivers SHOULD NOT ask for selective
+ retransmissions when FEC is available, nevertheless sources MUST
+ provide selective retransmissions in response to selective NAKs from
+ the leading partial transmission group (i.e. the most recent
+ transmission group, which is not yet full). For any group that is
+ full, the source SHOULD provide FEC on demand in response to a
+ selective NAK.
+
+ Pro-active FEC refers to the technique of computing parity packets at
+ transmission time and transmitting them as a matter of course
+ following the data packets. Pro-active FEC is RECOMMENDED for
+ providing loss recovery over simplex or asymmetric multicast channels
+ over which returning repair requests is either impossible or costly.
+ It provides increased reliability at the expense of bandwidth.
+
+ On-demand FEC refers to the technique of computing parity packets at
+ repair time and transmitting them only upon demand (i.e., receiver-
+ based loss detection and repair request). On-demand FEC is
+ RECOMMENDED for providing loss recovery of uncorrelated loss in very
+ large receiver populations in which the probability of any single
+ packet being lost is substantial. It provides equivalent reliability
+ to selective NAKs (ARQ) at no more and typically less expense of
+ bandwidth.
+
+ Selective NAKs are NAKs that request the retransmission of specific
+ packets by sequence number corresponding to the sequence number of
+ any data packets detected to be missing from the expected sequence
+ (conventional ARQ). Selective NAKs can be used for recovering losses
+
+
+
+Speakman, et. al. Experimental [Page 58]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ occurring in leading partial transmission groups, i.e. in the most
+ recent transmission group, which is not yet full. The RECOMMENDED
+ way of handling partial transmission groups, however, is for the data
+ source to use variable-size transmission groups (see below).
+
+ Parity NAKs are NAKs that request the transmission of a specific
+ number of parity packets by count corresponding to the count of the
+ number of data packets detected to be missing from a group of k data
+ packets (on-demand FEC).
+
+ The objective of these procedures is to incorporate these FEC
+ techniques into PGM so that:
+
+ sources MAY provide parity packets either pro-actively or on-
+ demand, interchangeably within the same TSI,
+
+ receivers MAY use either selective or parity NAKs interchangeably
+ within the same TSI (however, in a session where on-demand parity
+ is available receivers SHOULD only use parity NAKs).
+
+ network elements maintain repair state based on either selective
+ or parity NAKs in the same data structure, altering only search,
+ RDATA constraint, and deletion algorithms in either case,
+
+ and only OPTION additions to the basic packet formats are
+ REQUIRED.
+
+11.2. Overview
+
+ Advertising FEC parameters in the transport session
+
+ Sources add OPT_PARITY_PRM to SPMs to provide session-specific
+ parameters such as the number of packets (TGSIZE == k) in a
+ transmission group. This option lets receivers know how many packets
+ there are in a transmission group, and it lets network elements sort
+ repair state by transmission group number. This option includes an
+ indication of whether pro-active and/or on-demand parity is available
+ from the source.
+
+ Distinguishing parity packets from data packets
+
+ Sources send pro-active parity packets as ODATA (NEs do not forward
+ RDATA unless a repair state is present) and on-demand parity packets
+ as RDATA. A source MUST add OPT_PARITY to the ODATA/RDATA packet
+ header of parity packets to permit network elements and receivers to
+ distinguish them from data packets.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 59]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Data and parity packet numbering
+
+ Parity packets MUST be calculated over a fixed number k of data
+ packets known as the Transmission Group. Grouping of packets into
+ transmission groups effectively partitions a packet sequence number
+ into a high-order portion (TG_SQN) specifying the transmission group
+ (TG), and a low-order portion (PKT_SQN) specifying the packet number
+ (PKT-NUM in the range 0 through k-1) within that group. From an
+ implementation point of view, it's handy if k, the TG size, is a
+ power of 2. If so, then TG_SQN and PKT_SQN can be mapped side-by-
+ side into the 32 bit SQN. log2(TGSIZE) is then the size in bits of
+ PKT_SQN.
+
+ This mapping does not reduce the effective sequence number space
+ since parity packets marked with OPT_PARITY allow the sequence space
+ (PKT_SQN) to be completely reused in order to number the h parity
+ packets, as long as h is not greater than k.
+
+ In the case where h is greater than k, a source MUST add
+ OPT_PARITY_GRP to any parity packet numbered j greater than k-1,
+ specifying the number m of the group of k parity packets to which the
+ packet belongs, where m is just the quotient from the integer
+ division of j by k. Correspondingly, PKT-NUM for such parity packets
+ is just j modulo k. In other words, when a source needs to generate
+ more parity packets than there were original data packets (perhaps
+ because of a particularly lossy line such that a receiver lost not
+ only the original data but some of the parity RDATA as well), use the
+ OPT_PARITY_GRP option in order to number and identify the
+ transmission group of the extra packets that would exceed the normal
+ sequential number space.
+
+ Note that parity NAKs (and consequently their corresponding parity
+ NCFs) MUST also contain the OPT_PARITY flag in the options field of
+ the fixed header, and that in these packets, PKT_SQN MUST contain
+ PKT_CNT, the number of missing packets, rather than PKT_NUM, the SQN
+ of a specific missing packet. More on all this later.
+
+ Variable Transmission Group Size
+
+ The transmission group size advertised in the OPT_PARITY_PRM option
+ on SPMs MUST be a power of 2 and constant for the duration of the
+ session. However, the actual transmission group size used MAY not be
+ constant for the duration of the session, and MAY not be a power of
+ 2. When a TG size different from the one advertised in
+ OPT_PARITY_PRM is used, the TG size advertised in OPT_PARITY_PRM MUST
+ be interpreted as specifying the maximum effective size of the TG.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 60]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ When the actual TG size is not a power of 2 or is smaller than the
+ max TG size, there will be sparse utilization of the sequence number
+ space since some of the sequence numbers that would have been
+ consumed in numbering a maximum sized TG will not be assigned to
+ packets in the smaller TG. The start of the next transmission group
+ will always begin on the boundary of the maximum TG size as though
+ each of the sequence numbers had been utilized.
+
+ When the source decides to use a smaller group size than that
+ advertised in OPT_PARITY_PRM, it appends OPT_CURR_TGSIZE to the last
+ data packet (ODATA) in the truncated transmission group. This lets
+ the receiver know that it should not expect any more packets in this
+ transmission group, and that it may start requesting repairs for any
+ missing packets. If the last data packet itself went missing, the
+ receiver will detect the end of the group when it receives a parity
+ packet for the group, an SPM with SPM_LEAD equal to OD_SQN of the
+ last data packet, or the first packet of the next group, whichever
+ comes first. In addition, any parity packet from this TG will also
+ carry the OPT_CURR_TGSIZE option as will any SPM sent with SPM_LEAD
+ equal to OD_SQN of the last data packet.
+
+ Variable TSDU length
+
+ If a non constant TSDU length is used within a given transmission
+ group, the size of parity packets in the corresponding FEC block MUST
+ be equal to the size of the largest original data packet in the
+ block. Parity packets MUST be computed by padding the original
+ packets with zeros up to the size of the largest data packet. Note
+ that original data packets are transmitted without padding.
+
+ Receivers using a combination of original packets and FEC packets to
+ rebuild missing packets MUST pad the original packets in the same way
+ as the source does. The receiver MUST then feed the padded original
+ packets plus the parity packets to the FEC decoder. The decoder
+ produces the original packets padded with zeros up to the size of the
+ largest original packet in the group. In order for the receiver to
+ eliminate the padding on the reconstructed data packets, the original
+ size of the packet MUST be known, and this is accomplished as
+ follows:
+
+ The source, along with the packet payloads, encodes the TSDU
+ length and appends the 2-byte encoded length to the padded FEC
+ packets.
+
+ Receivers pad the original packets that they received to the
+ largest original packet size and then append the TSDU length to
+ the padded packets. They then pass them and the FEC packets to
+ the FEC decoder.
+
+
+
+Speakman, et. al. Experimental [Page 61]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ The decoder produces padded original packets with their original
+ TSDU length appended. Receivers MUST now use this length to get
+ rid of the padding.
+
+ A source that transmits variable size packets MUST take into account
+ the fact that FEC packets will have a size equal to the maximum size
+ of the original packets plus the size of the length field (2 bytes).
+
+ If a fixed packet size is used within a transmission group, the
+ encoded length is not appended to the parity packets. The presence
+ of the fixed header option flag OPT_VAR_PKTLEN in parity packets
+ allows receivers to distinguish between transmission groups with
+ variable sized packets and fixed-size ones, and behave accordingly.
+
+ Payload-specific options
+
+ Some options present in data packet (ODATA and RDATA) are strictly
+ associated with the packet content (PGM payload), OPT_FRAGMENT being
+ an example. These options must be preserved even when the data
+ packet that would normally contain them is not received, but its the
+ payload is recovered though the use of FEC.
+
+ To achieve this, PGM encodes the content of these options in special
+ options that are inserted in parity packets. Two flags present in
+ the the option common-header are used for this process: bit F
+ (OP_ENCODED) and bit U (OP_ENCODED_NULL).
+
+ Whenever at least one of the original packets of a TG contains a
+ payload-specific option of a given type, the source MUST include an
+ encoded version of that option type in all the parity packets it
+ transmits. The encoded option is computed by applying FEC encoding
+ to the whole option with the exception of the first three bytes of
+ the option common-header (E, Option Type, Option Length, OP_ENCODED
+ and OPX fields). The type, length and OPX of the encoded option are
+ the same as the type, length and OPX in the original options.
+ OP_ENCODED is set to 1 (all original option have OP_ENCODED = 0).
+
+ The encoding is performed using the same process that is used to
+ compute the payload of the parity packet. i.e. the FEC encoder is fed
+ with one copy of that option type for each original packet in the TG.
+ If one (or more) original packet of the TG does not contain that
+ option type, an all zeroes option is used for the encoding process.
+ To be able to distinguish this "dummy" option from valid options with
+ all-zeroes payload, OP_ENCODED_NULL is used. OP_ENCODED_NULL is set
+ to 0 in all the original options, but the value of 1 is used in the
+ encoding process if the option did not exist in the original packet.
+ On the receiver side, all option with OP_ENCODED_NULL equal to 1 are
+ discarded after decoding.
+
+
+
+Speakman, et. al. Experimental [Page 62]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ When a receiver recovers a missing packet using FEC repair packets,
+ it MUST also recover payload-specific options, if any. The presence
+ of these can be unequivocally detected through the presence of
+ encoded options in parity packets (encoded options have OP_ENCODED
+ set to 1). Receivers apply FEC-recovery to encoded options and
+ possibly original options, as they do to recover packet payloads.
+ The FEC decoding is applied to the whole option with the exception of
+ the first three bytes of the option common-header (E, Option Type,
+ Option Length, OP_ENCODED and OPX fields). Each decoded option is
+ associated with the relative payload, unless OP_ENCODED_NULL turns
+ out to be 1, in which case the decoded option is discarded.
+
+ The decoding MUST be performed using the 1st occurrence of a given
+ option type in original/parity packets. If one or more original
+ packets do not contain that option type, an option of the same type
+ with zero value must be used. This option MUST have OP_ENCODED_NULL
+ equal to 1.
+
+11.3. Packet Contents
+
+ This section just provides enough short-hand to make the Procedures
+ intelligible. For the full details of packet contents, please refer
+ to Packet Formats below.
+
+ OPT_PARITY indicated in pro-active (ODATA) and on-demand
+ (RDATA) parity packets to distinguish them from
+ data packets. This option is directly encoded in
+ the "Option" field of the fixed PGM header
+
+ OPT_VAR_PKTLEN MAY be present in pro-active (ODATA) and on-demand
+ (RDATA) parity packets to indicate that the
+ corresponding transmission group is composed of
+ variable size data packets. This option is
+ directly encoded in the "Option" field of the fixed
+ PGM header
+
+ OPT_PARITY_PRM appended by sources to SPMs to specify session-
+ specific parameters such as the transmission group
+ size and the availability of pro-active and/or on-
+ demand parity from the source
+
+ OPT_PARITY_GRP the number of the group (greater than 0) of h
+ parity packets to which the parity packet belongs
+ when more than k parity packets are provided by the
+ source
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 63]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ OPT_CURR_TGSIZE appended by sources to the last data packet and any
+ parity packets in a variable sized transmission
+ group to indicate to the receiver the actual size
+ of a transmission group. May also be appended to
+ certain SPMs
+
+11.3.1. Parity NAKs
+
+ NAK_TG_SQN the high-order portion of NAK_SQN specifying the
+ transmission group for which parity packets are
+ requested
+
+ NAK_PKT_CNT the low-order portion of NAK_SQN specifying the
+ number of missing data packets for which parity
+ packets are requested
+
+ Nota Bene: NAK_PKT_CNT (and NCF_PKT_CNT) are 0-based counters,
+ meaning that NAK_PKT_CNT = 0 means that 1 FEC RDATA is being
+ requested, and in general NAK_PKT_CNT = k - 1 means that k FEC
+ RDATA are being requested.
+
+11.3.2. Parity NCFs
+
+ NCF_TG_SQN the high-order portion of NCF_SQN specifying the
+ transmission group for which parity packets were
+ requested
+
+ NCF_PKT_CNT the low-order portion of NCF_SQN specifying the
+ number of missing data packets for which parity
+ packets were requested
+
+ Nota Bene: NCF_PKT_CNT (and NAK_PKT_CNT) are 0-based counters,
+ meaning that NAK_PKT_CNT = 0 means that 1 FEC RDATA is being
+ requested, and in general NAK_PKT_CNT = k - 1 means that k FEC
+ RDATA are being requested.
+
+11.3.3. On-demand Parity
+
+ RDATA_TG_SQN the high-order portion of RDATA_SQN specifying the
+ transmission group to which the parity packet
+ belongs
+
+ RDATA_PKT_SQN the low-order portion of RDATA_SQN specifying the
+ parity packet sequence number within the
+ transmission group
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 64]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+11.3.4. Pro-active Parity
+
+ ODATA_TG_SQN the high-order portion of ODATA_SQN specifying the
+ transmission group to which the parity packet
+ belongs
+
+ ODATA_PKT_SQN the low-order portion of ODATA_SQN specifying the
+ parity packet sequence number within the
+ transmission group
+
+11.4. Procedures - Sources
+
+ If a source elects to provide parity for a given transport session,
+ it MUST first provide the transmission group size PARITY_PRM_TGS in
+ the OPT_PARITY_PRM option of its SPMs. This becomes the maximum
+ effective transmission group size in the event that the source elects
+ to send smaller size transmission groups. If a source elects to
+ provide proactive parity for a given transport session, it MUST set
+ PARITY_PRM_PRO in the OPT_PARITY_PRM option of its SPMs. If a source
+ elects to provide on-demand parity for a given transport session, it
+ MUST set PARITY_PRM_OND in the OPT_PARITY_PRM option of its SPMs.
+
+ A source MUST send any pro-active parity packets for a given
+ transmission group only after it has first sent all of the
+ corresponding k data packets in that group. Pro-active parity
+ packets MUST be sent as ODATA with OPT_PARITY in the fixed header.
+
+ If a source elects to provide on-demand parity, it MUST respond to a
+ parity NAK for a transmission group with a parity NCF. The source
+ MUST complete the transmission of the k original data packets and the
+ proactive parity packets, possibly scheduled, before starting the
+ transmission of on-demand parity packets. Subsequently, the source
+ MUST send the number of parity packets requested by that parity NAK.
+ On-demand parity packets MUST be sent as RDATA with OPT_PARITY in the
+ fixed header. Previously transmitted pro-active parity packets
+ cannot be reused as on-demand parity packets, these MUST be computed
+ with new, previously unused, indexes.
+
+ In either case, the source MUST provide selective retransmissions
+ only in response to selective NAKs from the leading partial
+ transmission group. For any group that is full, the source SHOULD
+ provide FEC on demand in response to a selective retransmission
+ request.
+
+ In the absence of data to transmit, a source SHOULD prematurely
+ terminate the current transmission group by including OPT_CURR_TGSIZE
+ to the last data packet or to any proactive parity packets provided.
+
+
+
+
+Speakman, et. al. Experimental [Page 65]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ If the last data packet has already been transmitted and there is no
+ provision for sending proactive parity packets, an SPM with
+ OPT_CURR_TGSIZE SHOULD be sent.
+
+ A source consolidates requests for on-demand parity in the same
+ transmission group according to the following procedures. If the
+ number of pending (i.e., unsent) parity packets from a previous
+ request for on-demand parity packets is equal to or greater than
+ NAK_PKT_CNT in a subsequent NAK, that subsequent NAK MUST be
+ confirmed but MAY otherwise be ignored. If the number of pending
+ (i.e., unsent) parity packets from a previous request for on-demand
+ parity packets is less than NAK_PKT_CNT in a subsequent NAK, that
+ subsequent NAK MUST be confirmed but the source need only increase
+ the number of pending parity packets to NAK_PKT_CNT.
+
+ When a source provides parity packets relative to a transmission
+ group with variable sized packets, it MUST compute parity packets by
+ padding the smaller original packets with zeroes out to the size of
+ the largest of the original packets. The source MUST also append the
+ encoded TSDU lengths at the end of any padding or directly to the end
+ of the largest packet, and add the OPT_VAR_PKTLEN option as specified
+ in the overview description.
+
+ When a source provides variable sized transmission groups, it SHOULD
+ append the OPT_CURR_TGSIZE option to the last data packet in the
+ shortened group, and it MUST append the OPT_CURR_TGSIZE option to any
+ parity packets it sends within that group. In case the the last data
+ packet is sent before a determination has been made to shorten the
+ group and there is no provision for sending proactive parity packets,
+ an SPM with OPT_CURR_TGSIZE SHOULD be sent. The source MUST also add
+ OPT_CURR_TGSIZE to any SPM that it sends with SPM_LEAD equal to
+ OD_SQN of the last data packet.
+
+ A receiver MUST NAK for the entire number of packets missing based on
+ the maximum TG size, even if it already knows that the actual TG size
+ is smaller. The source MUST take this into account and compute the
+ number of packets effectively needed as the difference between
+ NAK_PKT_CNT and an offset computed as the difference between the max
+ TG size and the effective TG size.
+
+11.5. Procedures - Receivers
+
+ If a receiver elects to make use of parity packets for loss recovery,
+ it MUST first learn the transmission group size PARITY_PRM_TGS from
+ OPT_PARITY_PRM in the SPMs for the TSI. The transmission group size
+ is used by a receiver to determine the sequence number boundaries
+ between transmission groups.
+
+
+
+
+Speakman, et. al. Experimental [Page 66]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Thereafter, if PARITY_PRM_PRO is also set in the SPMs for the TSI, a
+ receiver SHOULD use any pro-active parity packets it receives for
+ loss recovery, and if PARITY_PRM_OND is also set in the SPMs for the
+ TSI, it MAY solicit on-demand parity packets upon loss detection. If
+ PARITY_PRM_OND is set, a receiver MUST NOT send selective NAKs,
+ except in partial transmission groups if the source does not use the
+ variable transmission-group size option. Parity packets are ODATA
+ (pro-active) or RDATA (on-demand) packets distinguished by OPT_PARITY
+ which lets receivers know that ODATA/RDATA_TG_SQN identifies the
+ group of PARITY_PRM_TGS packets to which the parity may be applied
+ for loss recovery in the corresponding transmission group, and that
+ ODATA/RDATA_PKT_SQN is being reused to number the parity packets
+ within that group. Receivers order parity packets and eliminate
+ duplicates within a transmission group based on ODATA/RDATA_PKT_SQN
+ and on OPT_PARITY_GRP if present.
+
+ To solicit on-demand parity packets, a receiver MUST send parity NAKs
+ upon loss detection. For the purposes of soliciting on-demand
+ parity, loss detection occurs at transmission group boundaries, i.e.
+ upon receipt of the last data packet in a transmission group, upon
+ receipt of any data packet in any subsequent transmission group, or
+ upon receipt of any parity packet in the current or a subsequent
+ transmission group.
+
+ A parity NAK is simply a NAK with OPT_PARITY and NAK_PKT_CNT set to
+ the count of the number of packets detected to be missing from the
+ transmission group specified by NAK_TG_SQN. Note that this
+ constrains the receiver to request no more parity packets than there
+ are data packets in the transmission group.
+
+ A receiver SHOULD bias the value of NAK_BO_IVL for parity NAKs
+ inversely proportional to NAK_PKT_CNT so that NAKs for larger losses
+ are likely to be scheduled ahead of NAKs for smaller losses in the
+ same receiver population.
+
+ A confirming NCF for a parity NAK is a parity NCF with NCF_PKT_CNT
+ equal to or greater than that specified by the parity NAK.
+
+ A receiver's NAK_RDATA_IVL timer is not cancelled until all requested
+ parity packets have been received.
+
+ In the absence of data (detected from SPMs bearing SPM_LEAD equal to
+ RXW_LEAD) on non-transmission-group boundaries, receivers MAY resort
+ to selective NAKs for any missing packets in that partial
+ transmission group.
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 67]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ When a receiver handles parity packets belonging to a transmission
+ group with variable sized packets, (detected from the presence of the
+ OPT_VAR_PKTLEN option in the parity packets), it MUST decode them as
+ specified in the overview description and use the decoded TSDU length
+ to get rid of the padding in the decoded packet.
+
+ If the source was using a variable sized transmission group via the
+ OPT_CURR_TGSIZE, the receiver might learn this before having
+ requested (and received) any retransmission. The above happens if it
+ sees OPT_CURR_TGSIZE in the last data packet of the TG, in any
+ proactive parity packet or in a SPM. If the receivers learns this
+ and determines that it has missed one or more packets in the
+ shortened transmission group, it MAY then NAK for them without
+ waiting for the start of the next transmission group. Otherwise it
+ will start NAKing at the start of the next transmission group.
+
+ In both cases, the receiver MUST NAK for the number of packets
+ missing assuming that the size of the transmission group is the
+ maximum effective transmission group. In other words, the receivers
+ cannot exploit the fact that it might already know that the
+ transmission group was smaller but MUST always NAK for the number of
+ packets it believes are missing, plus the number of packets required
+ to bring the total packets up to the maximum effective transmission
+ group size.
+
+ After the first parity packet has been delivered to the receiver, the
+ actual TG size is known to him, either because already known or
+ because discovered via OPT_CURR_TGSIZE contained in the parity
+ packet. Hence the receiver can decode the whole group as soon as the
+ minimum number of parity packets needed is received.
+
+11.6. Procedures - Network Elements
+
+ Pro-active parity packets (ODATA with OPT_PARITY) are switched by
+ network elements without transport-layer intervention.
+
+ On-demand parity packets (RDATA with OPT_PARITY) necessitate modified
+ request, confirmation and repair constraint procedures for network
+ elements. In the context of these procedures, repair state is
+ maintained per NAK_TSI and NAK_TG_SQN, and in addition to recording
+ the interfaces on which corresponding NAKs have been received,
+ records the largest value of NAK_PKT_CNT seen in corresponding NAKs
+ on each interface. This value is referred to as the known packet
+ count. The largest of the known packet counts recorded for any
+ interface in the repair state for the transmit group or carried by an
+ NCF is referred to as the largest known packet count.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 68]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Upon receipt of a parity NAK, a network element responds with the
+ corresponding parity NCF. The corresponding parity NCF is just an
+ NCF formed in the usual way (i.e., a multicast copy of the NAK with
+ the packet type changed), but with the addition of OPT_PARITY and
+ with NCF_PKT_CNT set to the larger of NAK_PKT_CNT and the known
+ packet count for the receiving interface. The network element then
+ creates repair state in the usual way with the following
+ modifications.
+
+ If repair state for the receiving interface does not exist, the
+ network element MUST create it and additionally record NAK_PKT_CNT
+ from the parity NAK as the known packet count for the receiving
+ interface.
+
+ If repair state for the receiving interface already exists, the
+ network element MUST eliminate the NAK only if NAK_ELIM_IVL has not
+ expired and NAK_PKT_CNT is equal to or less than the largest known
+ packet count. If NAK_PKT_CNT is greater than the known packet count
+ for the receiving interface, the network element MUST update the
+ latter with the larger NAK_PKT_CNT.
+
+ Upon either adding a new interface or updating the known packet count
+ for an existing interface, the network element MUST determine if
+ NAK_PKT_CNT is greater than the largest known packet count. If so or
+ if NAK_ELIM_IVL has expired, the network element MUST forward the
+ parity NAK in the usual way with a value of NAK_PKT_CNT equal to the
+ largest known packet count.
+
+ Upon receipt of an on-demand parity packet, a network element MUST
+ locate existing repair state for the corresponding RDATA_TSI and
+ RDATA_TG_SQN. If no such repair state exists, the network element
+ MUST discard the RDATA as usual.
+
+ If corresponding repair state exists, the largest known packet count
+ MUST be decremented by one, then the network element MUST forward the
+ RDATA on all interfaces in the existing repair state, and decrement
+ the known packet count by one for each. Any interfaces whose known
+ packet count is thereby reduced to zero MUST be deleted from the
+ repair state. If the number of interfaces is thereby reduced to
+ zero, the repair state itself MUST be deleted.
+
+ Upon reception of a parity NCF, network elements MUST cancel pending
+ NAK retransmission only if NCF_PKT_CNT is greater or equal to the
+ largest known packet count. Network elements MUST use parity NCFs to
+ anticipate NAKs in the usual way with the addition of recording
+ NCF_PKT_CNT from the parity NCF as the largest known packet count
+ with the anticipated state so that any subsequent NAKs received with
+ NAK_PKT_CNT equal to or less than NCF_PKT_CNT will be eliminated, and
+
+
+
+Speakman, et. al. Experimental [Page 69]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ any with NAK_PKT_CNT greater than NCF_PKT_CNT will be forwarded.
+ Network elements which receive a parity NCF with NCF_PKT_CNT larger
+ than the largest known packet count MUST also use it to anticipate
+ NAKs, increasing the largest known packet count to reflect
+ NCF_PKT_CNT (partial anticipation).
+
+ Parity NNAKs follow the usual elimination procedures with the
+ exception that NNAKs are eliminated only if existing NAK state has a
+ NAK_PKT_CNT greater than NNAK_PKT_CNT.
+
+ Network elements must take extra precaution when the source is using
+ a variable sized transmission group. Network elements learn that the
+ source is using a TG size smaller than the maximum from
+ OPT_CURR_TGSIZE in parity RDATAs or in SPMs. When this happens, they
+ compute a TG size offset as the difference between the maximum TG
+ size and the actual TG size advertised by OPT_CURR_TGSIZE. Upon
+ reception of parity RDATA, the TG size offset is used to update the
+ repair state as follows:
+
+ Any interface whose known packet count is reduced to the TG size
+ offset is deleted from the repair state.
+
+ This replaces the normal rule for deleting interfaces that applies
+ when the TG size is equal to the maximum TG size.
+
+11.7. Procedures - DLRs
+
+ A DLR with the ability to provide FEC repairs MUST indicate this by
+ setting the OPT_PARITY bit in the redirecting POLR. It MUST then
+ process any redirected FEC NAKs in the usual way.
+
+11.8. Packet Formats
+
+11.8.1. OPT_PARITY_PRM - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |P O|
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Transmission Group Size |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x08
+
+ Option Length = 8 octets
+
+ P-bit (PARITY_PRM_PRO)
+
+
+
+Speakman, et. al. Experimental [Page 70]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Indicates when set that the source is providing pro-active parity
+ packets.
+
+ O-bit (PARITY_PRM_OND)
+
+ Indicates when set that the source is providing on-demand parity
+ packets.
+
+ At least one of PARITY_PRM_PRO and PARITY_PRM_OND MUST be set.
+
+ Transmission Group Size (PARITY_PRM_TGS)
+
+ The number of data packets in the transmission group over which
+ the parity packets are calculated. If a variable transmission
+ group size is being used, then this becomes the maximum effective
+ transmission group size across the session.
+
+ OPT_PARITY_PRM MAY be appended only to SPMs.
+
+ OPT_PARITY_PRM is network-significant.
+
+11.8.2. OPT_PARITY_GRP - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Parity Group Number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x09
+
+ Option Length = 8 octets
+
+ Parity Group Number (PRM_GROUP)
+
+ The number of the group of k parity packets amongst the h parity
+ packets within the transmission group to which the parity packet
+ belongs, where the first k parity packets are in group zero.
+ PRM_GROUP MUST NOT be zero.
+
+ OPT_PARITY_GRP MAY be appended only to parity packets.
+
+ OPT_PARITY_GRP is NOT network-significant.
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 71]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+11.8.3. OPT_CURR_TGSIZE - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Actual Transmission Group Size |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x0A
+
+ Option Length = 8 octets
+
+ Actual Transmission Group Size (PRM_ATGSIZE)
+
+ The actual number of data packets in this transmission group.
+ This MUST be less than or equal to the maximum transmission group
+ size PARITY_PRM_TGS in OPT_PARITY_PRM.
+
+ OPT_CURR_TGSIZE MAY be appended to data and parity packets (ODATA or
+ RDATA) and to SPMs.
+
+ OPT_CURR_TGSIZE is network-significant except when appended to ODATA.
+
+12. Appendix B - Support for Congestion Control
+
+12.1. Introduction
+
+ A source MUST implement strategies for congestion avoidance, aimed at
+ providing overall network stability, fairness among competing PGM
+ flows, and some degree of fairness towards coexisting TCP flows [13].
+ In order to do this, the source must be provided with feedback on the
+ status of the network in terms of traffic load. This appendix
+ specifies NE procedures that provide such feedback to the source in a
+ scalable way. (An alternative TCP-friendly scheme for congestion
+ control that does not require NE support can be found in [16]).
+
+ The procedures specified in this section enable the collection and
+ selective forwarding of three types of feedback to the source:
+
+ o Worst link load as measured in network elements.
+
+ o Worst end-to-end path load as measured in network elements.
+
+ o Worst end-to-end path load as reported by receivers.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 72]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ This specification defines in detail NE procedures, receivers
+ procedures and packet formats. It also defines basic procedures in
+ receivers for generating congestion reports. This specification does
+ not define the procedures used by PGM sources to adapt their
+ transmission rates in response of congestion reports. Those
+ procedures depend upon the specific congestion control scheme.
+
+ PGM defines a header option that PGM receivers may append to NAKs
+ (OPT_CR). OPT_CR carries congestion reports in NAKs that propagate
+ upstream towards the source.
+
+ During the process of hop-by-hop reverse NAK forwarding, NEs examine
+ OPT_CR and possibly modify its contents prior to forwarding the NAK
+ upstream. Forwarding CRs also has the side effect of creating
+ congestion report state in the NE. The presence of OPT_CR and its
+ contents also influences the normal NAK suppression rules. Both the
+ modification performed on the congestion report and the additional
+ suppression rules depend on the content of the congestion report and
+ on the congestion report state recorded in the NE as detailed below.
+
+ OPT_CR contains the following fields:
+
+ OPT_CR_NE_WL Reports the load in the worst link as detected though
+ NE internal measurements
+
+ OPT_CR_NE_WP Reports the load in the worst end-to-end path as
+ detected though NE internal measurements
+
+ OPT_CR_RX_WP Reports the load in the worst end-to-end path as
+ detected by receivers
+
+ A load report is either a packet drop rate (as measured at an NE's
+ interfaces) or a packet loss rate (as measured in receivers). Its
+ value is linearly encoded in the range 0-0xFFFF, where 0xFFFF
+ represents a 100% loss/drop rate. Receivers that send a NAK bearing
+ OPT_CR determine which of the three report fields are being reported.
+
+ OPT_CR also contains the following fields:
+
+ OPT_CR_NEL A bit indicating that OPT_CR_NE_WL is being reported.
+
+ OPT_CR_NEP A bit indicating that OPT_CR_NE_WP is being reported.
+
+ OPT_CR_RXP A bit indicating that OPT_CR_RX_WP is being reported.
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 73]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ OPT_CR_LEAD A SQN in the ODATA space that serves as a temporal
+ reference for the load report values. This is
+ initialized by receivers with the leading edge of the
+ transmit window as known at the moment of transmitting
+ the NAK. This value MAY be advanced in NEs that
+ modify the content of OPT_CR.
+
+ OPT_CR_RCVR The identity of the receiver that generated the worst
+ OPT_CR_RX_WP.
+
+ The complete format of the option is specified later.
+
+12.2. NE-Based Worst Link Report
+
+ To permit network elements to report worst link, receivers append
+ OPT_CR to a NAK with bit OPT_CR_NEL set and OPT_CR_NE_WL set to zero.
+ NEs receiving NAKs that contain OPT_CR_NE_WL process the option and
+ update per-TSI state related to it as described below. The ultimate
+ result of the NEs' actions ensures that when a NAK leaves a sub-tree,
+ OPT_CR_NE_WL contains a congestion report that reflects the load of
+ the worst link in that sub-tree. To achieve this, NEs rewrite
+ OPT_CR_NE_WL with the worst value among the loads measured on the
+ local (outgoing) links for the session and the congestion reports
+ received from those links.
+
+ Note that the mechanism described in this sub-section does not permit
+ the monitoring of the load on (outgoing) links at non-PGM-capable
+ multicast routers. For this reason, NE-Based Worst Link Reports
+ SHOULD be used in pure PGM topologies only. Otherwise, this
+ mechanism might fail in detecting congestion. To overcome this
+ limitation PGM sources MAY use a heuristic that combines NE-Based
+ Worst Link Reports and Receiver-Based Reports.
+
+12.3. NE-Based Worst Path Report
+
+ To permit network elements to report a worst path, receivers append
+ OPT_CR to a NAK with bit OPT_CR_NEP set and OPT_CR_NE_WP set to zero.
+ The processing of this field is similar to that of OPT_CR_NE_WL with
+ the difference that, on the reception of a NAK, the value of
+ OPT_CR_NE_WP is adjusted with the load measured on the interface on
+ which the NAK was received according to the following formula:
+
+ OPT_CR_NE_WP = if_load + OPT_CR_NE_WP * (100% - if_loss_rate)
+
+ The worst among the adjusted OPT_CR_NE_WP is then written in the
+ outgoing NAK. This results in a hop-by-hop accumulation of link loss
+ rates into a path loss rate.
+
+
+
+
+Speakman, et. al. Experimental [Page 74]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ As with OPT_CR_NE_WL, the congestion report in OPT_CR_NE_WP may be
+ invalid if the multicast distribution tree includes non-PGM-capable
+ routers.
+
+12.4. Receiver-Based Worst Report
+
+ To report a packet loss rate, receivers append OPT_CR to a NAK with
+ bit OPT_CR_RXP set and OPT_CR_RX_WP set to the packet loss rate. NEs
+ receiving NAKs that contain OPT_CR_RX_WP process the option and
+ update per-TSI state related to it as described below. The ultimate
+ result of the NEs' actions ensures that when a NAK leaves a sub-tree,
+ OPT_CR_RX_WP contains a congestion report that reflects the load of
+ the worst receiver in that sub-tree. To achieve this, NEs rewrite
+ OTP_CR_RE_WP with the worst value among the congestion reports
+ received on its outgoing links for the session. In addition to this,
+ OPT_CR_RCVR MUST contain the NLA of the receiver that originally
+ measured the value of OTP_CR_RE_WP being forwarded.
+
+12.5. Procedures - Receivers
+
+ To enable the generation of any type of congestion report, receivers
+ MUST insert OPT_CR in each NAK they generate and provide the
+ corresponding field (OPT_CR_NE_WL, OPT_CR_NE_WP, OPT_CR_RX_WP). The
+ specific fields to be reported will be advertised to receivers in
+ OPT_CRQST on the session's SPMs. Receivers MUST provide only those
+ options requested in OPT_CRQST.
+
+ Receivers MUST initialize OPT_CR_NE_WL and OPT_CR_NE_WP to 0 and they
+ MUST initialize OPT_CR_RCVR to their NLA. At the moment of sending
+ the NAK, they MUST also initialize OPT_CR_LEAD to the leading edge of
+ the transmission window.
+
+ Additionally, if a receiver generates a NAK with OPT_CR with
+ OPT_CR_RX_WP, it MUST initialize OPT_CR_RX_WP to the proper value,
+ internally computed.
+
+12.6. Procedures - Network Elements
+
+ Network elements start processing each OPT_CR by selecting a
+ reference SQN in the ODATA space. The reference SQN selected is the
+ highest SQN known to the NE. Usually this is OPT_CR_LEAD contained
+ in the NAK received.
+
+ They use the selected SQN to age the value of load measurement as
+ follows:
+
+ o locally measured load values (e.g. interface loads) are
+ considered up-to-date
+
+
+
+Speakman, et. al. Experimental [Page 75]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ o load values carried in OPT_CR are considered up-to-date and are
+ not aged so as to be independent of variance in round-trip
+ times from the network element to the receivers
+
+ o old load values recorded in the NE are exponentially aged
+ according to the difference between the selected reference SQN
+ and the reference SQN associated with the old load value.
+
+ The exponential aging is computed so that a recorded value gets
+ scaled down by a factor exp(-1/2) each time the expected inter-NAK
+ time elapses. Hence the aging formula must include the current loss
+ rate as follows:
+
+ aged_loss_rate = loss_rate * exp( - SQN_difference * loss_rate /
+ 2)
+
+ Note that the quantity 1/loss_rate is the expected SQN_lag between
+ two NAKs, hence the formula above can also be read as:
+
+ aged_loss_rate = loss_rate * exp( - 1/2 * SQN_difference /
+ SQN_lag)
+
+ which equates to (loss_rate * exp(-1/2)) when the SQN_difference is
+ equal to expected SQN_lag between two NAKs.
+
+ All the subsequent computations refer to the aged load values.
+
+ Network elements process OPT_CR by handling the three possible types
+ of congestion reports independently.
+
+ For each congestion report in an incoming NAK, a new value is
+ computed to be used in the outgoing NAK:
+
+ o The new value for OPT_CR_NE_WL is the maximum of the load
+ measured on the outgoing interfaces for the session, the value
+ of OPT_CR_NE_WL of the incoming NAK, and the value previously
+ sent upstream (recorded in the NE). All these values are as
+ obtained after the aging process.
+
+ o The new value for OPT_CR_NE_WP is the maximum of the value
+ previously sent upstream (after aging) and the value of
+ OPT_CR_NE_WP in the incoming NAK adjusted with the load on the
+ interface upon which the NAK was received (as described above).
+
+ o The new value for OPT_CR_RX_WP is the maximum of the value
+ previously sent upstream (after aging) and the value of
+ OPT_CR_RX_WP in the incoming NAK.
+
+
+
+
+Speakman, et. al. Experimental [Page 76]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ o If OPT_CR_RX_WP was selected from the incoming NAK, the new
+ value for OPT_CR_RCVR is the one in the incoming NAK.
+ Otherwise it is the value previously sent upstream.
+
+ o The new value for OPT_CR_LEAD is the reference SQN selected for
+ the aging procedure.
+
+12.6.1. Overriding Normal Suppression Rules
+
+ Normal suppression rules hold to determine if a NAK should be
+ forwarded upstream or not. However if any of the outgoing congestion
+ reports has changed by more than 5% relative to the one previously
+ sent upstream, this new NAK is not suppressed.
+
+12.6.2. Link Load Measurement
+
+ PGM routers monitor the load on all their outgoing links and record
+ it in the form of per-interface loss rate statistics. "load" MUST be
+ interpreted as the percentage of the packets meant to be forwarded on
+ the interface that were dropped. Load statistics refer to the
+ aggregate traffic on the links and not to PGM traffic only.
+
+ This document does not specify the algorithm to be used to collect
+ such statistics, but requires that such algorithm provide both
+ accuracy and responsiveness in the measurement process. As far as
+ accuracy is concerned, the expected measurement error SHOULD be
+ upper-limited (e.g. less than than 10%). As far as responsiveness is
+ concerned, the measured load SHOULD converge to the actual value in a
+ limited time (e.g. converge to 90% of the actual value in less than
+ 200 milliseconds), when the instantaneous offered load changes.
+ Whenever both requirements cannot be met at the same time, accuracy
+ SHOULD be traded for responsiveness.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 77]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+12.7. Packet Formats
+
+12.7.1. OPT_CR - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| L P R|
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Congestion Report Reference SQN |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | NE Worst Link | NE Worst Path |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Rcvr Worst Path | Reserved |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | NLA AFI | Reserved |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Worst Receiver's NLA ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
+
+ Option Type = 0x10
+
+ Option Length = 20 octets + NLA length
+
+ L OPT_CR_NEL bit : set indicates OPT_CR_NE_WL is being reported
+
+ P OPT_CR_NEP bit : set indicates OPT_CR_NE_WP is being reported
+
+ R OPT_CR_RXP bit : set indicates OPT_CR_RX_WP is being reported
+
+ Congestion Report Reference SQN (OPT_CR_LEAD).
+
+ A SQN in the ODATA space that serves as a temporal reference point
+ for the load report values.
+
+ NE Worst Link (OPT_CR_NE_WL).
+
+ Reports the load in the worst link as detected though NE internal
+ measurements
+
+ NE Worst Path (OPT_CR_NE_WP).
+
+ Reports the load in the worst end-to-end path as detected though
+ NE internal measurements
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 78]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Rcvr Worst Path (OPT_CR_RX_WP).
+
+ Reports the load in the worst end-to-end path as detected by
+ receivers
+
+ Worst Receiver's NLA (OPT_CR_RCVR).
+
+ The unicast address of the receiver that generated the worst
+ OPT_CR_RX_WP.
+
+ OPT_CR MAY be appended only to NAKs.
+
+ OPT-CR is network-significant.
+
+12.7.2. OPT_CRQST - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| L P R|
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x11
+
+ Option Length = 4 octets
+
+ L OPT_CRQST_NEL bit : set indicates OPT_CR_NE_WL is being
+ requested
+
+ P OPT_CRQST_NEP bit : set indicates OPT_CR_NE_WP is being
+ requested
+
+ R OPT_CRQST_RXP bit : set indicates OPT_CR_RX_WP is being
+ requested
+
+ OPT_CRQST MAY be appended only to SPMs.
+
+ OPT-CRQST is network-significant.
+
+13. Appendix C - SPM Requests
+
+13.1. Introduction
+
+ SPM Requests (SPMRs) MAY be used to solicit an SPM from a source in a
+ non-implosive way. The typical application is for late-joining
+ receivers to solicit SPMs directly from a source in order to be able
+ to NAK for missing packets without having to wait for a regularly
+ scheduled SPM from that source.
+
+
+
+Speakman, et. al. Experimental [Page 79]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+13.2. Overview
+
+ Allowing for SPMR implosion protection procedures, a receiver MAY
+ unicast an SPMR to a source to solicit the most current session,
+ window, and path state from that source any time after the receiver
+ has joined the group. A receiver may learn the TSI and source to
+ which to direct the SPMR from any other PGM packet it receives in the
+ group, or by any other means such as from local configuration or
+ directory services. The receiver MUST use the usual SPM procedures
+ to glean the unicast address to which it should direct its NAKs from
+ the solicited SPM.
+
+13.3. Packet Contents
+
+ This section just provides enough short-hand to make the Procedures
+ intelligible. For the full details of packet contents, please refer
+ to Packet Formats below.
+
+13.3.1. SPM Requests
+
+ SPMRs are transmitted by receivers to solicit SPMs from a source.
+
+ SPMs are unicast to a source and contain:
+
+ SPMR_TSI the source-assigned TSI for the session to which the
+ SPMR corresponds
+
+13.4. Procedures - Sources
+
+ A source MUST respond immediately to an SPMR with the corresponding
+ SPM rate limited to once per IHB_MIN per TSI. The corresponding SPM
+ matches SPM_TSI to SPMR_TSI and SPM_DPORT to SPMR_DPORT.
+
+13.5. Procedures - Receivers
+
+ To moderate the potentially implosive behavior of SPMRs at least on a
+ densely populated subnet, receivers MUST use the following back-off
+ and suppression procedure based on multicasting the SPMR with a TTL
+ of 1 ahead of and in addition to unicasting the SPMR to the source.
+ The role of the multicast SPMR is to suppress the transmission of
+ identical SPMRs from the subnet.
+
+ More specifically, before unicasting a given SPMR, receivers MUST
+ choose a random delay on SPMR_BO_IVL (~250 msecs) during which they
+ listen for a multicast of an identical SPMR. If a receiver does not
+ see a matching multicast SPMR within its chosen random interval, it
+ MUST first multicast its own SPMR to the group with a TTL of 1 before
+ then unicasting its own SPMR to the source. If a receiver does see a
+
+
+
+Speakman, et. al. Experimental [Page 80]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ matching multicast SPMR within its chosen random interval, it MUST
+ refrain from unicasting its SPMR and wait instead for the
+ corresponding SPM.
+
+ In addition, receipt of the corresponding SPM within this random
+ interval SHOULD cancel transmission of an SPMR.
+
+ In either case, the receiver MUST wait at least SPMR_SPM_IVL before
+ attempting to repeat the SPMR by choosing another delay on
+ SPMR_BO_IVL and repeating the procedure above.
+
+ The corresponding SPMR matches SPMR_TSI to SPMR_TSI and SPMR_DPORT to
+ SPMR_DPORT. The corresponding SPM matches SPM_TSI to SPMR_TSI and
+ SPM_DPORT to SPMR_DPORT.
+
+13.6. SPM Requests
+
+ SPMR:
+
+ SPM Requests are sent by receivers to request the immediate
+ transmission of an SPM for the given TSI from a source.
+
+ The network-header source address of an SPMR is the unicast NLA of
+ the entity that originates the SPMR.
+
+ The network-header destination address of an SPMR is the unicast NLA
+ of the source from which the corresponding SPM is requested.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Source Port | Destination Port |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Type | Options | Checksum |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Global Source ID ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ... Global Source ID | TSDU Length |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Option Extensions when present ...
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ...
+
+ Source Port:
+
+ SPMR_SPORT
+
+ Data-Destination Port
+
+
+
+
+Speakman, et. al. Experimental [Page 81]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Destination Port:
+
+ SPMR_DPORT
+
+ Data-Source Port, together with Global Source ID forms SPMR_TSI
+
+ Type:
+
+ SPMR_TYPE = 0x0C
+
+ Global Source ID:
+
+ SPMR_GSI
+
+ Together with Source Port forms
+
+ SPMR_TSI
+
+14. Appendix D - Poll Mechanism
+
+14.1. Introduction
+
+ These procedures provide PGM network elements and sources with the
+ ability to poll their downstream PGM neighbors to solicit replies
+ in an implosion-controlled way.
+
+ Both general polls and specific polls are possible. The former
+ provide a PGM (parent) node with a way to check if there are any
+ PGM (children) nodes connected to it, both network elements and
+ receivers, and to estimate their number. The latter may be used
+ by PGM parent nodes to search for nodes with specific properties
+ among its PGM children. An example of application for this is DLR
+ discovery.
+
+ Polling is implemented using two additional PGM packets:
+
+ POLL a Poll Request that PGM parent nodes multicast to the group to
+ perform the poll. Similarly to NCFs, POLL packets stop at the
+ first PGM node they reach, as they are not forwarded by PGM
+ network elements.
+
+ POLR a Poll Response that PGM children nodes (either network elements
+ or receivers) use to reply to a Poll Request by addressing it
+ to the NLA of the interface from which the triggering POLL was
+ sent.
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 82]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ The polling mechanism dictates that PGM children nodes that receive a
+ POLL packet reply to it only if certain conditions are satisfied and
+ ignore the POLL otherwise. Two types of condition are possible: a
+ random condition that defines a probability of replying for the
+ polled child, and a deterministic condition. Both the random
+ condition and the deterministic condition are controlled by the
+ polling PGM parent node by specifying the probability of replying and
+ defining the deterministic condition(s) respectively. Random-only
+ poll, deterministic-only poll or a combination of the two are
+ possible.
+
+ The random condition in polls allows the prevention of implosion of
+ replies by controlling their number. Given a probability of replying
+ P and assuming that each receiver makes an independent decision, the
+ number of expected replies to a poll is P*N where N is the number of
+ PGM children relative to the polling PGM parent. The polling node
+ can control the number of expected replies by specifying P in the
+ POLL packet.
+
+14.2. Packet Contents
+
+ This section just provides enough short-hand to make the Procedures
+ intelligible. For the full details of packet contents, please refer
+ to Packet Formats below.
+
+14.2.1. POLL (Poll Request)
+
+ POLL_SQN a sequence number assigned sequentially by the polling
+ parent in unit increments and scoped by POLL_PATH and
+ the TSI of the session.
+
+ POLL_ROUND a poll round sequence number. Multiple poll rounds
+ are possible within a POLL_SQN.
+
+ POLL_S_TYPE the sub-type of the poll request
+
+ POLL_PATH the network-layer address (NLA) of the interface on
+ the PGM network element or source on which the POLL is
+ transmitted
+
+ POLL_BO_IVL the back-off interval that MUST be used to compute the
+ random back-off time to wait before sending the
+ response to a poll. POLL_BO_IVL is expressed in
+ microseconds.
+
+ POLL_RAND a random string used to implement the randomness in
+ replying
+
+
+
+
+Speakman, et. al. Experimental [Page 83]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ POLL_MASK a bit-mask used to determine the probability of random
+ replies
+
+ Poll request MAY also contain options which specify deterministic
+ conditions for the reply. No options are currently defined.
+
+14.2.2. POLR (Poll Response)
+
+ POLR_SQN POLL_SQN of the poll request for which this is a reply
+
+ POLR_ROUND POLL_ROUND of the poll request for which this is a
+ reply
+
+ Poll response MAY also contain options. No options are currently
+ defined.
+
+14.3. Procedures - General
+
+14.3.1. General Polls
+
+ General Polls may be used to check for and count PGM children that
+ are 1 PGM hop downstream of an interface of a given node. They have
+ POLL_S_TYPE equal to PGM_POLL_GENERAL. PGM children that receive a
+ general poll decide whether to reply to it only based on the random
+ condition present in the POLL.
+
+ To prevent response implosion, PGM parents that initiate a general
+ poll SHOULD establish the probability of replying to the poll, P, so
+ that the expected number of replies is contained. The expected
+ number of replies is N * P, where N is the number of children. To be
+ able to compute this number, PGM parents SHOULD already have a rough
+ estimate of the number of children. If they do not have a recent
+ estimate of this number, they SHOULD send the first poll with a very
+ low probability of replying and increase it in subsequent polls in
+ order to get the desired number of replies.
+
+ To prevent poll-response implosion caused by a sudden increase in the
+ children population occurring between two consecutive polls with
+ increasing probability of replying, PGM parents SHOULD use poll
+ rounds. Poll rounds allow PGM parents to "freeze" the size of the
+ children population when they start a poll and to maintain it
+ constant across multiple polls (with the same POLL_SQN but different
+ POLL_ROUND). This works by allowing PGM children to respond to a
+ poll only if its POLL_ROUND is zero or if they have previously
+ received a poll with the same POLL_SQN and POLL_ROUND equal to zero.
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 84]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ In addition to this PGM children MUST observe a random back-off in
+ replying to a poll. This spreads out the replies in time and allows
+ a PGM parent to abort the poll if too many replies are being
+ received. To abort an ongoing poll a PGM parent MUST initiate
+ another poll with different POLL_SQN. PGM children that receive a
+ POLL MUST cancel any pending reply for POLLs with POLL_SQN different
+ from the one of the last POLL received.
+
+ For a given poll with probability of replying P, a PGM parent
+ estimates the number of children as M / P, where M is the number of
+ responses received. PGM parents SHOULD keep polling periodically and
+ use some average of the result of recent polls as their estimate for
+ the number of children.
+
+14.3.2. Specific Polls
+
+ Specific polls provide a way to search for PGM children that comply
+ to specific requisites. As an example specific poll could be used to
+ search for down-stream DLRs. A specific poll is characterized by a
+ POLL_S_TYPE different from PGM_POLL_GENERAL. PGM children decide
+ whether to reply to a specific poll or not based on the POLL_S_TYPE,
+ on the random condition and on options possibly present in the POLL.
+ The way options should be interpreted is defined by POLL_S_TYPE. The
+ random condition MUST be interpreted as an additional condition to be
+ satisfied. To disable the random condition PGM parents MUST specify
+ a probability of replying P equal to 1.
+
+ PGM children MUST ignore a POLL packet if they do not understand
+ POLL_S_TYPE. Some specific POLL_S_TYPE may also require that the
+ children ignore a POLL if they do not fully understand all the PGM
+ options present in the packet.
+
+14.4. Procedures - PGM Parents (Sources or Network Elements)
+
+ A PGM parent (source or network element), that wants to poll the
+ first PGM-hop children connected to one of its outgoing interfaces
+ MUST send a POLL packet on that interface with:
+
+ POLL_SQN equal to POLL_SQN of the last POLL sent incremented by
+ one. If poll rounds are used, this must be equal to
+ POLL_SQN of the last group of rounds incremented by
+ one.
+
+ POLL_ROUND The round of the poll. If the poll has a single
+ round, this must be zero. If the poll has multiple
+ rounds, this must be one plus the last POLL_ROUND for
+ the same POLL_SQN, or zero if this is the first round
+ within this POLL_SQN.
+
+
+
+Speakman, et. al. Experimental [Page 85]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ POLL_S_TYPE the type of the poll. For general poll use
+ PGM_POLL_GENERAL
+
+ POLL_PATH set to the NLA of the outgoing interface
+
+ POLL_BO_IVL set to the wanted reply back-off interval. As far as
+ the choice of this is concerned, using NAK_BO_IVL is
+ usually a conservative option, however a smaller value
+ MAY be used, if the number of expected replies can be
+ determined with a good confidence or if a
+ conservatively low probability of reply (P) is being
+ used (see POLL_MASK next). When the number of
+ expected replies is unknown, a large POLL_BO_IVL
+ SHOULD be used, so that the poll can be effectively
+ aborted if the number of replies being received is too
+ large.
+
+ POLL_RAND MUST be a random string re-computed each time a new
+ poll is sent on a given interface
+
+ POLL_MASK determines the probability of replying, P, according
+ to the relationship P = 1 / ( 2 ^ B ), where B is the
+ number of bits set in POLL_MASK [15]. If this is a
+ deterministic poll, B MUST be 0, i.e. POLL_MASK MUST
+ be a all-zeroes bit-mask.
+
+ Nota Bene: POLLs transmitted by network elements MUST bear the
+ ODATA source's network-header source address, not the network
+ element's NLA. POLLs MUST also be transmitted with the IP
+
+ Router Alert Option [6], to be allow PGM network element to
+ intercept them.
+
+ A PGM parent that has started a poll by sending a POLL packet SHOULD
+ wait at least POLL_BO_IVL before starting another poll. During this
+ interval it SHOULD collect all the valid response (the one with
+ POLR_SQN and POLR_ROUND matching with the outstanding poll) and
+ process them at the end of the collection interval.
+
+ A PGM parent SHOULD observe the rules mentioned in the description of
+ general procedures, to prevent implosion of response. These rules
+ should in general be observed both for generic polls and specific
+ polls. The latter however can be performed using deterministic poll
+ (with no implosion prevention) if the expected number of replies is
+ known to be small. A PGM parent that issue a generic poll with the
+ intent of estimating the children population size SHOULD use poll
+ rounds to "freeze" the children that are involved in the measure
+ process. This allows the sender to "open the door wider" across
+
+
+
+Speakman, et. al. Experimental [Page 86]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ subsequent rounds (by increasing the probability of response),
+ without fear of being flooded by late joiners. Note the use of
+ rounds has the drawback of introducing additional delay in the
+ estimate of the population size, as the estimate obtained at the end
+ of a round-group refers to the condition present at the time of the
+ first round.
+
+ A PGM parent that has started a poll SHOULD monitor the number of
+ replies during the collection phase. If this become too large, the
+ PGM parent SHOULD abort the poll by immediately starting a new poll
+ (different POLL_SQN) and specifying a very low probability of
+ replying.
+
+
+ When polling is being used to estimate the receiver population for
+ the purpose of calculating NAK_BO_IVL, OPT_NAK_BO_IVL (see 16.4.1
+ below) MUST be appended to SPMs, MAY be appended to NCFs and POLLs,
+ and in all cases MUST have NAK_BO_IVL_SQN set to POLL_SQN of the most
+ recent complete round of polls, and MUST bear that round's
+ corresponding derived value of NAK_BAK_IVL. In this way,
+ OPT_NAK_BO_IVL provides a current value for NAK_BO_IVL at the same
+ time as information is being gathered for the calculation of a future
+ value of NAK_BO_IVL.
+
+14.5. Procedures - PGM Children (Receivers or Network Elements)
+
+ PGM receivers and network elements MUST compute a 32-bit random node
+ identifier (RAND_NODE_ID) at startup time. When a PGM child
+ (receiver or network element) receives a POLL it MUST use its
+ RAND_NODE_ID to match POLL_RAND of incoming POLLs. The match is
+ limited to the bits specified by POLL_MASK. If the incoming POLL
+ contain a POLL_MASK made of all zeroes, the match is successful
+ despite the content of POLL_RAND (deterministic reply). If the match
+ fails, then the receiver or network element MUST discard the POLL
+ without any further action, otherwise it MUST check the fields
+ POLL_ROUND, POLL_S_TYPE and any PGM option included in the POLL to
+ determine whether it SHOULD reply to the poll.
+
+ If POLL_ROUND is non-zero and the PGM receiver has not received a
+ previous poll with the same POLL_SQN and a zero POLL_ROUND, it MUST
+ discard the poll without further action.
+
+ If POLL_S_TYPE is equal to PGM_POLL_GENERAL, the PGM child MUST
+ schedule a reply to the POLL despite the presence of PGM options on
+ the POLL packet.
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 87]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ If POLL_S_TYPE is different from PGM_POLL_GENERAL, the decision on
+ whether a reply should be scheduled depends on the actual type and on
+ the options possibly present in the POLL.
+
+ If POLL_S_TYPE is unknown to the recipient of the POLL, it MUST NOT
+ reply and ignore the poll. Currently the only POLL_S_TYPE defined
+ are PGM_POLL_GENERAL and PGM_POLL_DLR.
+
+ If a PGM receiver or network element has decided to reply to a POLL,
+ it MUST schedule the transmission of a single POLR at a random time
+ in the future. The random delay is chosen in the interval [0,
+ POLL_BO_IVL]. POLL_BO_IVL is the one contained in the POLL received.
+ When this timer expires, it MUST send a POLR using POLL_PATH of the
+ received POLL as destination address. POLR_SQN MUST be equal to
+ POLL_SQN and POLR_ROUND must be equal to POLL_ROUND. The POLR MAY
+ contain PGM options according to the semantic of POLL_S_TYPE or the
+ semantic of PGM options possibly present in the POLL. If POLL_S_TYPE
+ is PGM_POLL_GENERAL no option is REQUIRED.
+
+ A PGM receiver or network element MUST cancel any pending
+ transmission of POLRs if a new POLL is received with POLL_SQN
+ different from POLR_SQN of the poll that scheduled POLRs.
+
+14.6. Constant Definition
+
+ The POLL_S_TYPE values currently defined are:
+
+ PGM_POLL_GENERAL 0
+
+ PGM_POLL_DLR 1
+
+14.7. Packet Formats
+
+ The packet formats described in this section are transport-layer
+ headers that MUST immediately follow the network-layer header in the
+ packet.
+
+ The descriptions of Data-Source Port, Data-Destination Port, Options,
+ Checksum, Global Source ID (GSI), and TSDU Length are those provided
+ in Section 8.
+
+14.7.1. Poll Request
+
+ POLL are sent by PGM parents (sources or network elements) to
+ initiate a poll among their first PGM-hop children.
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 88]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ POLLs are sent to the ODATA multicast group. The network-header
+ source address of a POLL is the ODATA source's NLA. POLL MUST be
+ transmitted with the IP Router Alert Option.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Source Port | Destination Port |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Type | Options | Checksum |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Global Source ID ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ... Global Source ID | TSDU Length |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | POLL's Sequence Number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | POLL's Round | POLL's Sub-type |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | NLA AFI | Reserved |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Path NLA ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
+ | POLL's Back-off Interval |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Random String |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Matching Bit-Mask |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Option Extensions when present ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Source Port:
+
+ POLL_SPORT
+
+ Data-Source Port, together with POLL_GSI forms POLL_TSI
+
+ Destination Port:
+
+ POLL_DPORT
+
+ Data-Destination Port
+
+ Type:
+
+ POLL_TYPE = 0x01
+
+
+
+
+Speakman, et. al. Experimental [Page 89]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Global Source ID:
+
+ POLL_GSI
+
+ Together with POLL_SPORT forms POLL_TSI
+
+ POLL's Sequence Number
+
+ POLL_SQN
+
+ The sequence number assigned to the POLL by the originator.
+
+ POLL's Round
+
+ POLL_ROUND
+
+ The round number within the poll sequence number.
+
+ POLL's Sub-type
+
+ POLL_S_TYPE
+
+ The sub-type of the poll request.
+
+ Path NLA:
+
+ POLL_PATH
+
+ The NLA of the interface on the source or network element on which
+ this POLL was forwarded.
+
+ POLL's Back-off Interval
+
+ POLL_BO_IVL
+
+ The back-off interval used to compute a random back-off for the
+ reply, expressed in microseconds.
+
+ Random String
+
+ POLL_RAND
+
+ A random string used to implement the random condition in
+ replying.
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 90]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Matching Bit-Mask
+
+ POLL_MASK
+
+ A bit-mask used to determine the probability of random replies.
+
+14.7.2. Poll Response
+
+ POLR are sent by PGM children (receivers or network elements) to
+ reply to a POLL.
+
+ The network-header source address of a POLR is the unicast NLA of the
+ entity that originates the POLR. The network-header destination
+ address of a POLR is initialized by the originator of the POLL to the
+ unicast NLA of the upstream PGM element (source or network element)
+ known from the POLL that triggered the POLR.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Source Port | Destination Port |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Type | Options | Checksum |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Global Source ID ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | ... Global Source ID | TSDU Length |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | POLR's Sequence Number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | POLR's Round | reserved |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Option Extensions when present ... |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Source Port:
+
+ POLR_SPORT
+
+ Data-Destination Port
+
+ Destination Port:
+
+ POLR_DPORT
+
+ Data-Source Port, together with Global Source ID forms POLR_TSI
+
+
+
+
+
+Speakman, et. al. Experimental [Page 91]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Type:
+
+ POLR_TYPE = 0x02
+
+ Global Source ID:
+
+ POLR_GSI
+
+ Together with POLR_DPORT forms POLR_TSI
+
+ POLR's Sequence Number
+
+ POLR_SQN
+
+ The sequence number (POLL_SQN) of the POLL packet for which this
+ is a reply.
+
+ POLR's Round
+
+ POLR_ROUND
+
+ The round number (POLL_ROUND) of the POLL packet for which this is
+ a reply.
+
+15. Appendix E - Implosion Prevention
+
+15.1. Introduction
+
+ These procedures are intended to prevent NAK implosion and to limit
+ its extent in case of the loss of all or part of the suppressing
+ multicast distribution tree. They also provide a means to adaptively
+ tune the NAK back-off interval, NAK_BO_IVL.
+
+ The PGM virtual topology is established and refreshed by SPMs.
+ Between one SPM and the next, PGM nodes may have an out-of-date view
+ of the PGM topology due to multicast routing changes, flapping, or a
+ link/router failure. If any of the above happens relative to a PGM
+ parent node, a potential NAK implosion problem arises because the
+ parent node is unable to suppress the generation of duplicate NAKs as
+ it cannot reach its children using NCFs. The procedures described
+ below introduce an alternative way of performing suppression in this
+ case. They also attempt to prevent implosion by adaptively tuning
+ NAK_BO_IVL.
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 92]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+15.2. Tuning NAK_BO_IVL
+
+ Sources and network elements continuously monitor the number of
+ duplicated NAKs received and use this observation to tune the NAK
+ back-off interval (NAK_BO_IVL) for the first PGM-hop receivers
+ connected to them. Receivers learn the current value of NAK_BO_IVL
+ through OPT_NAK_BO_IVL appended to NCFs or SPMs.
+
+15.2.1. Procedures - Sources and Network Elements
+
+ For each TSI, sources and network elements advertise the value of
+ NAK_BO_IVL that their first PGM-hop receivers should use. They
+ advertise a separate value on all the outgoing interfaces for the TSI
+ and keep track of the last values advertised.
+
+ For each interface and TSI, sources and network elements count the
+ number of NAKs received for a specific repair state (i.e., per
+ sequence number per TSI) from the time the interface was first added
+ to the repair state list until the time the repair state is
+ discarded. Then they use this number to tune the current value of
+ NAK_BO_IVL as follows:
+
+ Increase the current value NAK_BO_IVL when the first duplicate NAK
+ is received for a given SQN on a particular interface.
+
+ Decrease the value of NAK_BO_IVL if no duplicate NAKs are received on
+ a particular interface for the last NAK_PROBE_NUM measurements where
+ each measurement corresponds to the creation of a new repair state.
+
+ An upper and lower limit are defined for the possible value of
+ NAK_BO_IVL at any time. These are NAK_BO_IVL_MAX and NAK_BO_IVL_MIN
+ respectively. The initial value that should be used as a starting
+ point to tune NAK_BO_IVL is NAK_BO_IVL_DEFAULT. The policies
+ RECOMMENDED for increasing and decreasing NAK_BO_IVL are multiplying
+ by two and dividing by two respectively.
+
+ Sources and network elements advertise the current value of
+ NAK_BO_IVL through the OPT_NAK_BO_IVL that they append to NCFs. They
+ MAY also append OPT_NAK_BO_IVL to outgoing SPMs.
+
+ In order to avoid forwarding the NAK_BO_IVL advertised by the parent,
+ network elements must be able to recognize OPT_NAK_BO_IVL. Network
+ elements that receive SPMs containing OPT_NAK_BO_IVL MUST either
+ remove the option or over-write its content (NAK_BO_IVL) with the
+ current value of NAK_BO_IVL for the outgoing interface(s), before
+ forwarding the SPMs.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 93]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Sources MAY advertise the value of NAK_BO_IVL_MAX and NAK_BO_IVL_MIN
+ to the session by appending a OPT_NAK_BO_RNG to SPMs.
+
+15.2.2. Procedures - Receivers
+
+ Receivers learn the value of NAK_BO_IVL to use through the option
+ OPT_NAK_BO_IVL, when this is present in NCFs or SPMs. A value for
+ NAK_BO_IVL learned from OPT_NAK_BO_IVL MUST NOT be used by a receiver
+ unless either NAK_BO_IVL_SQN is zero, or the receiver has seen
+ POLL_RND == 0 for POLL_SQN =< NAK_BO_IVL_SQN within half the sequence
+ number space. The initial value of NAK_BO_IVL is set to
+ NAK_BO_IVL_DEFAULT.
+
+ Receivers that receive an SPM containing OPT_NAK_BO_RNG MUST use its
+ content to set the local values of NAK_BO_IVL_MAX and NAK_BO_IVL_MIN.
+
+15.2.3. Adjusting NAK_BO_IVL in the absence of NAKs
+
+ Monitoring the number of duplicate NAKs provides a means to track
+ indirectly the change in the size of first PGM-hop receiver
+ population and adjust NAK_BO_IVL accordingly. Note that the number
+ of duplicate NAKs for a given SQN is related to the number of first
+ PGM-hop children that scheduled (or forwarded) a NAK and not to the
+ absolute number of first PGM-hop children. This mechanism, however,
+ does not work in the absence of packet loss, hence a large number of
+ duplicate NAKs is possible after a period without NAKs, if many new
+ receivers have joined the session in the meanwhile. To address this
+ issue, PGM Sources and network elements SHOULD periodically poll the
+ number of first PGM-hop children using the "general poll" procedures
+ described in Appendix D. If the result of the polls shows that the
+ population size has increased significantly during a period without
+ NAKs, they SHOULD increase NAK_BO_IVL as a safety measure.
+
+15.3. Containing Implosion in the Presence of Network Failures
+
+15.3.1. Detecting Network Failures
+
+ In some cases PGM (parent) network elements can promptly detect the
+ loss of all or part of the suppressing multicast distribution tree
+ (due to network failures or route changes) by checking their
+ multicast connectivity, when they receive NAKs. In some other cases
+ this is not possible as the connectivity problem might occur at some
+ other non-PGM node downstream or might take time to reflect in the
+ multicast routing table. To address these latter cases, PGM uses a
+ simple heuristic: a failure is assumed for a TSI when the count of
+ duplicated NAKs received for a repair state reaches the value
+ DUP_NAK_MAX in one of the interfaces.
+
+
+
+
+Speakman, et. al. Experimental [Page 94]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+15.3.2. Containing Implosion
+
+ When a PGM source or network element detects or assumes a failure for
+ which it looses multicast connectivity to down-stream PGM agents
+ (either receivers or other network elements), it sends unicast NCFs
+ to them in response to NAKs. Downstream PGM network elements which
+ receive unicast NCFs and have multicast connectivity to the multicast
+ session send special SPMs to prevent further NAKs until a regular SPM
+ sent by the source refreshes the PGM tree.
+
+ Procedures - Sources and Network Elements
+
+ PGM sources or network elements which detect or assume a failure that
+ prevents them from reaching down-stream PGM agents through multicast
+ NCFs revert to confirming NAKs through unicast NCFs for a given TSI
+ on a given interface. If the PGM agent is the source itself, than it
+ MUST generate an SPM for the TSI, in addition to sending the unicast
+ NCF.
+
+ Network elements MUST keep using unicast NCFs until they receive a
+ regular SPM from the source.
+
+ When a unicast NCF is sent for the reasons described above, it MUST
+ contain the OPT_NBR_UNREACH option and the OPT_PATH_NLA option.
+ OPT_NBR_UNREACH indicates that the sender is unable to use multicast
+ to reach downstream PGM agents. OPT_PATH_NLA carries the network
+ layer address of the NCF sender, namely the NLA of the interface
+ leading to the unreachable subtree.
+
+ When a PGM network element receives an NCF containing the
+ OPT_NBR_UNREACH option, it MUST ignore it if OPT_PATH_NLA specifies
+ an upstream neighbour different from the one currently known to be
+ the upstream neighbor for the TSI. Assuming the network element
+ matches the OPT_PATH_NLA of the upstream neighbour address, it MUST
+ stop forwarding NAKs for the TSI until it receives a regular SPM for
+ the TSI. In addition, it MUST also generate a special SPM to prevent
+ downstream receivers from sending more NAKs. This special SPM MUST
+ contain the OPT_NBR_UNREACH option and SHOULD have a SPM_SQN equal to
+ SPM_SQN of the last regular SPM forwarded. The OPT_NBR_UNREACH
+ option invalidates the windowing information in SPMs (SPM_TRAIL and
+ SPM_LEAD). The PGM network element that adds the OPT_NBR_UNREACH
+ option SHOULD invalidate the windowing information by setting
+ SPM_TRAIL to 0 and SPM_LEAD to 0x80000000.
+
+ PGM network elements which receive an SPM containing the
+ OPT_NBR_UNREACH option and whose SPM_PATH matches the currently known
+ PGM parent, MUST forward them in the normal way and MUST stop
+
+
+
+
+Speakman, et. al. Experimental [Page 95]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ forwarding NAKs for the TSI until they receive a regular SPM for the
+ TSI. If the SPM_PATH does not match the currently known PGM parent,
+ the SPM containing the OPT_NBR_UNREACH option MUST be ignored.
+
+ Procedures - Receivers
+
+ PGM receivers which receive either an NCF or an SPM containing the
+ OPT_NBR_UNREACH option MUST stop sending NAKs until a regular SPM is
+ received for the TSI.
+
+ On reception of a unicast NCF containing the OPT_NBR_UNREACH option
+ receivers MUST generate a multicast copy of the packet with TTL set
+ to one on the RPF interface for the data source. This will prevent
+ other receivers in the same subnet from generating NAKs.
+
+ Receivers MUST ignore windowing information in SPMs which contain the
+ OPT_NBR_UNREACH option.
+
+ Receivers MUST ignore NCFs containing the OPT_NBR_UNREACH option if
+ the OPT_PATH_NLA specifies a neighbour different than the one
+ currently know to be the PGM parent neighbour. Similarly receivers
+ MUST ignore SPMs containing the OPT_NBR_UNREACH option if SPM_PATH
+ does not match the current PGM parent.
+
+15.4. Packet Formats
+
+15.4.1. OPT_NAK_BO_IVL - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | NAK Back-Off Interval |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | NAK Back-Off Interval SQN |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x04
+
+ NAK Back-Off Interval
+
+ The value of NAK-generation Back-Off Interval in microseconds.
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 96]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ NAK Back-Off Interval Sequence Number
+
+ The POLL_SQN to which this value of NAK_BO_IVL corresponds. Zero
+ is reserved and means NAK_BO_IVL is NOT being determined through
+ polling (see Appendix D) and may be used immediately. Otherwise,
+ NAK_BO_IVL MUST NOT be used unless the receiver has also seen
+ POLL_ROUND = 0 for POLL_SQN =< NAK_BO_IVL_SQN within half the
+ sequence number space.
+
+ OPT_NAK_BO_IVL MAY be appended to NCFs, SPMs, or POLLs.
+
+ OPT_NAK_BO_IVL is network-significant.
+
+15.4.2. OPT_NAK_BO_RNG - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Maximum NAK Back-Off Interval |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Minimum NAK Back-Off Interval |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x05
+
+ Maximum NAK Back-Off Interval
+
+ The maximum value of NAK-generation Back-Off Interval in
+ microseconds.
+
+ Minimum NAK Back-Off Interval
+
+ The minimum value of NAK-generation Back-Off Interval in
+ microseconds.
+
+ OPT_NAK_BO_RNG MAY be appended to SPMs.
+
+ OPT_NAK_BO_RNG is network-significant.
+
+15.4.3. OPT_NBR_UNREACH - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+
+
+Speakman, et. al. Experimental [Page 97]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Option Type = 0x0B
+
+ When present in SPMs, it invalidates the windowing information.
+
+ OPT_NBR_UNREACH MAY be appended to SPMs and NCFs.
+
+ OPT_NBR_UNREACH is network-significant.
+
+15.4.4. OPT_PATH_NLA - Packet Extension Format
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |E| Option Type | Option Length |Reserved |F|OPX|U| |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Path NLA |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Option Type = 0x0C
+
+ Path NLA
+
+ The NLA of the interface on the originating PGM network element
+ that it uses to send multicast SPMs to the recipient of the packet
+ containing this option.
+
+ OPT_PATH_NLA MAY be appended to NCFs.
+
+ OPT_PATH_NLA is network-significant.
+
+16. Appendix F - Transmit Window Example
+
+ Nota Bene: The concept of and all references to the increment
+ window (TXW_INC) and the window increment (TXW_ADV_SECS)
+ throughout this document are for illustrative purposes only. They
+ provide the shorthand with which to describe the concept of
+ advancing the transmit window without also implying any particular
+ implementation or policy of advancement.
+
+ The size of the transmit window in seconds is simply TXW_SECS. The
+ size of the transmit window in bytes (TXW_BYTES) is (TXW_MAX_RTE *
+ TXW_SECS). The size of the transmit window in sequence numbers
+ (TXW_SQNS) is (TXW_BYTES / bytes-per-packet).
+
+ The fraction of the transmit window size (in seconds of data) by
+ which the transmit window is advanced (TXW_ADV_SECS) is called the
+ window increment. The trailing (oldest) such fraction of the
+ transmit window itself is called the increment window.
+
+
+
+Speakman, et. al. Experimental [Page 98]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ In terms of sequence numbers, the increment window is the range of
+ sequence numbers that will be the first to be expired from the
+ transmit window. The trailing (or left) edge of the increment window
+ is just TXW_TRAIL, the trailing (or left) edge of the transmit
+ window. The leading (or right) edge of the increment window
+ (TXW_INC) is defined as one less than the sequence number of the
+ first data packet transmitted by the source TXW_ADV_SECS after
+ transmitting TXW_TRAIL.
+
+ A data packet is described as being "in" the transmit or increment
+ window, respectively, if its sequence number is in the range defined
+ by the transmit or increment window, respectively.
+
+ The transmit window is advanced across the increment window by the
+ source when it increments TXW_TRAIL to TXW_INC. When the transmit
+ window is advanced across the increment window, the increment window
+ is emptied (i.e., TXW_TRAIL is momentarily equal to TXW_INC), begins
+ to refill immediately as transmission proceeds, is full again
+ TXW_ADV_SECS later (i.e., TXW_TRAIL is separated from TXW_INC by
+ TXW_ADV_SECS of data), at which point the transmit window is advanced
+ again, and so on.
+
+16.1. Advancing across the Increment Window
+
+ In anticipation of advancing the transmit window, the source starts a
+ timer TXW_ADV_IVL_TMR which runs for time period TXW_ADV_IVL.
+ TXW_ADV_IVL has a value in the range (0, TXW_ADV_SECS). The value
+ MAY be configurable or MAY be determined statically by the strategy
+ used for advancing the transmit window.
+
+ When TXW_ADV_IVL_TMR is running, a source MAY reset TXW_ADV_IVL_TMR
+ if NAKs are received for packets in the increment window. In
+ addition, a source MAY transmit RDATA in the increment window with
+ priority over other data within the transmit window.
+
+ When TXW_ADV_IVL_TMR expires, a source SHOULD advance the trailing
+ edge of the transmit window from TXW_TRAIL to TXW_INC.
+
+ Once the transmit window is advanced across the increment window,
+ SPM_TRAIL, OD_TRAIL and RD_TRAIL are set to the new value of
+ TXW_TRAIL in all subsequent transmitted packets, until the next
+ window advancement.
+
+ PGM does not constrain the strategies that a source may use for
+ advancing the transmit window. The source MAY implement any scheme
+ or number of schemes. Three suggested strategies are outlined here.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 99]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Consider the following example:
+
+ Assuming a constant transmit rate of 128kbps and a constant data
+ packet size of 1500 bytes, if a source maintains the past 30
+ seconds of data for repair and increments its transmit window in 5
+ second increments, then
+
+ TXW_MAX_RTE = 16kBps
+ TXW_ADV_SECS = 5 seconds,
+ TXW_SECS = 35 seconds,
+ TXW_BYTES = 560kB,
+ TXW_SQNS = 383 (rounded up),
+
+ and the size of the increment window in sequence numbers
+ (TXW_MAX_RTE * TXW_ADV_SECS / 1500) = 54 (rounded down).
+
+ Continuing this example, the following is a diagram of the transmit
+ window and the increment window therein in terms of sequence numbers.
+
+
+ TXW_TRAIL TXW_LEAD
+ | |
+ | |
+ |--|--------------- Transmit Window -------------|----|
+ v | | v
+ v v
+ n-1 | n | n+1 | ... | n+53 | n+54 | ... | n+381 | n+382 | n+383
+ ^
+ ^ | ^
+ |--- Increment Window|---|
+ |
+ |
+ TXW_INC
+
+ So the values of the sequence numbers defining these windows are:
+
+ TXW_TRAIL = n
+ TXW_INC = n+53
+ TXW_LEAD = n+382
+
+ Nota Bene: In this example the window sizes in terms of sequence
+ numbers can be determined only because of the assumption of a
+ constant data packet size of 1500 bytes. When the data packet
+ sizes are variable, more or fewer sequence numbers MAY be consumed
+ transmitting the same amount (TXW_BYTES) of data.
+
+ So, for a given transport session identified by a TSI, a source
+ maintains:
+
+
+
+Speakman, et. al. Experimental [Page 100]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ TXW_MAX_RTE a maximum transmit rate in kBytes per second, the
+ cumulative transmit rate of some combination of SPMs,
+ ODATA, and RDATA depending on the transmit window
+ advancement strategy
+
+ TXW_TRAIL the sequence number defining the trailing edge of the
+ transmit window, the sequence number of the oldest
+ data packet available for repair
+
+ TXW_LEAD the sequence number defining the leading edge of the
+ transmit window, the sequence number of the most
+ recently transmitted ODATA packet
+
+ TXW_INC the sequence number defining the leading edge of the
+ increment window, the sequence number of the most
+ recently transmitted data packet amongst those that
+ will expire upon the next increment of the transmit
+ window
+
+ PGM does not constrain the strategies that a source may use for
+ advancing the transmit window. A source MAY implement any scheme or
+ number of schemes. This is possible because a PGM receiver must obey
+ the window provided by the source in its packets. Three strategies
+ are suggested within this document.
+
+ In the first, called "Advance with Time", the transmit window
+ maintains the last TXW_SECS of data in real-time, regardless of
+ whether any data was sent in that real time period or not. The
+ actual number of bytes maintained at any instant in time will vary
+ between 0 and TXW_BYTES, depending on traffic during the last
+ TXW_SECS. In this case, TXW_MAX_RTE is the cumulative transmit rate
+ of SPMs and ODATA.
+
+ In the second, called "Advance with Data", the transmit window
+ maintains the last TXW_BYTES bytes of data for repair. That is, it
+ maintains the theoretical maximum amount of data that could be
+ transmitted in the time period TXW_SECS, regardless of when they were
+ transmitted. In this case, TXW_MAX_RTE is the cumulative transmit
+ rate of SPMs, ODATA, and RDATA.
+
+ The third strategy leaves control of the window in the hands of the
+ application. The API provided by a source implementation for this,
+ could allow the application to control the window in terms of APDUs
+ and to manually step the window. This gives a form of Application
+ Level Framing (ALF). In this case, TXW_MAX_RTE is the cumulative
+ transmit rate of SPMs, ODATA, and RDATA.
+
+
+
+
+
+Speakman, et. al. Experimental [Page 101]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+16.2. Advancing with Data
+
+ In the first strategy, TXW_MAX_RTE is calculated from SPMs and both
+ ODATA and RDATA, and NAKs reset TXW_ADV_IVL_TMR. In this mode of
+ operation the transmit window maintains the last TXW_BYTES bytes of
+ data for repair. That is, it maintains the theoretical maximum
+ amount of data that could be transmitted in the time period TXW_SECS.
+ This means that the following timers are not treated as real-time
+ timers, instead they are "data driven". That is, they expire when
+ the amount of data that could be sent in the time period they define
+ is sent. They are the SPM ambient time interval, TXW_ADV_SECS,
+ TXW_SECS, TXW_ADV_IVL, TXW_ADV_IVL_TMR and the join interval. Note
+ that the SPM heartbeat timers still run in real-time.
+
+ While TXW_ADV_IVL_TMR is running, a source uses the receipt of a NAK
+ for ODATA within the increment window to reset timer TXW_ADV_IVL_TMR
+ to TXW_ADV_IVL so that transmit window advancement is delayed until
+ no NAKs for data in the increment window are seen for TXW_ADV_IVL
+ seconds. If the transmit window should fill in the meantime, further
+ transmissions would be suspended until the transmit window can be
+ advanced.
+
+ A source MUST advance the transmit window across the increment window
+ only upon expiry of TXW_ADV_IVL_TMR.
+
+ This mode of operation is intended for non-real-time, messaging
+ applications based on the receipt of complete data at the expense of
+ delay.
+
+16.3. Advancing with Time
+
+ This strategy advances the transmit window in real-time. In this
+ mode of operation, TXW_MAX_RTE is calculated from SPMs and ODATA only
+ to maintain a constant data throughput rate by consuming extra
+ bandwidth for repairs. TXW_ADV_IVL has the value 0 which advances
+ the transmit window without regard for whether NAKs for data in the
+ increment window are still being received.
+
+ In this mode of operation, all timers are treated as real-time
+ timers.
+
+ This mode of operation is intended for real-time, streaming
+ applications based on the receipt of timely data at the expense of
+ completeness.
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 102]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+16.4. Advancing under explicit application control
+
+ Some applications may wish more explicit control of the transmit
+ window than that provided by the advance with data / time strategies
+ above. An implementation MAY provide this mode of operation and
+ allow an application to explicitly control the window in terms of
+ APDUs.
+
+17. Appendix G - Applicability Statement
+
+ As stated in the introduction, PGM has been designed with a specific
+ class of applications in mind in recognition of the fact that a
+ general solution for reliable multicast has proven elusive. The
+ applicability of PGM is narrowed further, and perhaps more
+ significantly, by the prototypical nature of at least four of the
+ transport elements the protocol incorporates. These are congestion
+ control, router assist, local retransmission, and a programmatic API
+ for reliable multicast protocols of this class. At the same time as
+ standardization efforts address each of these elements individually,
+ this publication is intended to foster experimentation with these
+ elements in general, and to inform that standardization process with
+ results from practise.
+
+ This section briefly describes some of the experimental aspects of
+ PGM and makes non-normative references to some examples of current
+ practise based upon them.
+
+ At least 3 different approaches to congestion control can be explored
+ with PGM: a receiver-feedback based approach, a router-assist based
+ approach, and layer-coding based approach. The first is supported by
+ the negative acknowledgement mechanism in PGM augmented by an
+ application-layer acknowledgement mechanism. The second is supported
+ by the router exception processing mechanism in PGM. The third is
+ supported by the FEC mechanisms in PGM. An example of a receiver-
+ feedback based approach is provided in [16], and a proposal for a
+ router-assist based approach was proposed in [17]. Open issues for
+ the researchers include how do each of these approaches behave in the
+ presence of multiple competing sessions of the same discipline or of
+ different disciplines, TCP most notably; how do each of them behave
+ over a particular range of topologies, and over a particular range of
+ loads; and how do each of them scale as a function of the size of the
+ receiver population.
+
+ Router assist has applications not just to implosion control and
+ retransmit constraint as described in this specification, but also to
+ congestion control as described above, and more generally to any
+ feature which may be enhanced by access to per-network-element state
+ and processing. The full range of these features is as yet
+
+
+
+Speakman, et. al. Experimental [Page 103]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ unexplored, but a general mechanism for providing router assist in a
+ transport-protocol independent way (GRA) is a topic of active
+ research [18]. That effort has been primarily informed by the router
+ assist component of PGM, and implementation and deployment experience
+ with PGM will continue to be fed back into the specification and
+ eventual standardization of GRA. Open questions facing the
+ researchers ([19], [20], [21]) include how router-based state scales
+ relative to the feature benefit obtained, how system-wide factors
+ (such as throughput and retransmit latency) vary relative to the
+ scale and topology of deployed router assistance, and how incremental
+ deployment considerations affect the tractability of router-assist
+ based features. Router assist may have additional implications in
+ the area of congestion control to the extent that it may be applied
+ in multi-group layered coding schemes to increase the granularity and
+ reduce the latency of receiver based congestion control.
+
+ GRA itself explicitly incorporates elements of active networking, and
+ to the extent that the router assist component of PGM is reflected in
+ GRA, experimentation with the narrowly defined network-element
+ functionality of PGM will provide some of the first real world
+ experience with this promising if controversial technology.
+
+ Local retransmission is not a new idea in general in reliable
+ multicast, but the specific approach taken in PGM of locating re-
+ transmitters on the distribution tree for the session, diverting
+ repair requests from network elements to the re-transmitters, and
+ then propagating repairs downward from the repair point on the
+ distribution tree raises interesting questions concerning where to
+ locate re-transmitters in a given topology, and how network elements
+ locate those re-transmitters and evaluate their efficiency relative
+ to other available sources of retransmissions, most notably the
+ source itself. This particular aspect of PGM, while fully specified,
+ has only been implemented on the network element side, and awaits a
+ host-side implementation before questions like these can be
+ addressed.
+
+ PGM presents the opportunity to develop a programming API for
+ reliable multicast applications that reflects both those
+ applications' service requirements as well as the services provided
+ by PGM in support of those applications that may usefully be made
+ visible above the transport interface. At least a couple of host-
+ side implementations of PGM and a concomitant API have been developed
+ for research purposes ([22], [23]), and are available as open source
+ explicitly for the kind of experimentation described in this section.
+
+ Perhaps the broadest experiment that PGM can enable in a community of
+ researchers using a reasonable scale experimental transport protocol
+ is simply in the definition, implementation, and deployment of IP
+
+
+
+Speakman, et. al. Experimental [Page 104]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ multicast applications for which the reliability provided by PGM is a
+ significant enabler. Experience with such applications will not just
+ illuminate the value of reliable multicast, but will also provoke
+ practical examination of and responses to the attendant policy issues
+ (such as peering, billing, access control, firewalls, NATs, etc.),
+ and, if successful, will ultimately encourage more wide spread
+ deployment of IP multicast itself.
+
+18. Abbreviations
+
+ ACK Acknowledgment
+ AFI Address Family Indicator
+ ALF Application Level Framing
+ APDU Application Protocol Data Unit
+ ARQ Automatic Repeat reQuest
+ DLR Designated Local Repairer
+ GSI Globally Unique Source Identifier
+ FEC Forward Error Correction
+ MD5 Message-Digest Algorithm
+ MTU Maximum Transmission Unit
+ NAK Negative Acknowledgment
+ NCF NAK Confirmation
+ NLA Network Layer Address
+ NNAK Null Negative Acknowledgment
+ ODATA Original Data
+ POLL Poll Request
+ POLR Poll Response
+ RDATA Repair Data
+ RSN Receive State Notification
+ SPM Source Path Message
+ SPMR SPM Request
+ TG Transmission Group
+ TGSIZE Transmission Group Size
+ TPDU Transport Protocol Data Unit
+ TSDU Transport Service Data Unit
+ TSI Transport Session Identifier
+ TSN Transmit State Notification
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 105]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+19. Acknowledgements
+
+ The design and specification of PGM has been substantially influenced
+ by reviews and revisions provided by several people who took the time
+ to read and critique this document. These include, in alphabetical
+ order:
+
+ Bob Albrightson
+ Joel Bion
+ Mark Bowles
+ Steve Deering
+ Tugrul Firatli
+ Dan Harkins
+ Dima Khoury
+ Gerard Newman
+ Dave Oran
+ Denny Page
+ Ken Pillay
+ Chetan Rai
+ Yakov Rekhter
+ Dave Rossetti
+ Paul Stirpe
+ Brian Whetten
+ Kyle York
+
+20. References
+
+ [1] B. Whetten, T. Montgomery, S. Kaplan, "A High Performance
+ Totally Ordered Multicast Protocol", in "Theory and Practice in
+ Distributed Systems", Springer Verlag LCNS938, 1994.
+
+ [2] S. Floyd, V. Jacobson, C. Liu, S. McCanne, L. Zhang, "A
+ Reliable Multicast Framework for Light-weight Sessions and
+ Application Level Framing", ACM Transactions on Networking,
+ November 1996.
+
+ [3] J. C. Lin, S. Paul, "RMTP: A Reliable Multicast Transport
+ Protocol", ACM SIGCOMM August 1996.
+
+ [4] Miller, K., Robertson, K., Tweedly, A. and M. White, "Multicast
+ File Transfer Protocol (MFTP) Specification", Work In Progress.
+
+ [5] Deering, S., "Host Extensions for IP Multicasting", STD 5, RFC
+ 1112, August 1989.
+
+ [6] Katz, D., "IP Router Alert Option", RFC 2113, February 1997.
+
+ [7] C. Partridge, "Gigabit Networking", Addison Wesley 1994.
+
+
+
+Speakman, et. al. Experimental [Page 106]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ [8] H. W. Holbrook, S. K. Singhal, D. R. Cheriton, "Log-Based
+ Receiver-Reliable Multicast for Distributed Interactive
+ Simulation", ACM SIGCOMM 1995.
+
+ [9] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, April
+ 1992.
+
+ [10] Reynolds, J. and J. Postel, "Assigned Numbers", STD 2, RFC
+ 1700, October 1994.
+
+ [11] J. Nonnenmacher, E. Biersack, D. Towsley, "Parity-Based Loss
+ Recovery for Reliable Multicast Transmission", ACM SIGCOMM
+ September 1997.
+
+ [12] L. Rizzo, "Effective Erasure Codes for Reliable Computer
+ Communication Protocols", Computer Communication Review, April
+ 1997.
+
+ [13] V. Jacobson, "Congestion Avoidance and Control", ACM SIGCOMM
+ August 1988.
+
+ [14] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP, 14, RFC 2119, March 1997.
+
+ [15] J. Bolot, T. Turletti, I. Wakeman, "Scalable Feedback Control
+ for Multicast Video Distribution in the Internet", Proc.
+ ACM/Sigcomm 94, pp. 58-67.
+
+ [16] L. Rizzo, "pgmcc: A TCP-friendly Single-Rate Multicast
+ Congestion Control Scheme", Proc. of ACM SIGCOMM August 2000.
+
+ [17] M. Luby, L. Vicisano, T. Speakman. "Heterogeneous multicast
+ congestion control based on router packet filtering", RMT
+ working group, June 1999, Pisa, Italy.
+
+ [18] Cain, B., Speakman, T. and D. Towsley, "Generic Router Assist
+ (GRA) Building Block, Motivation and Architecture", Work In
+ Progress.
+
+ [19] C. Papadopoulos, and E. Laliotis,"Incremental Deployment of a
+ Router-assisted Reliable Multicast Scheme,", Proc. of Networked
+ Group Communications (NGC2000), Stanford University, Palo Alto,
+ CA. November 2000.
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 107]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ [20] C. Papadopoulos, "RAIMS: an Architecture for Router-Assisted
+ Internet Multicast Services." Presented at ETH, Zurich,
+ Switzerland, October 23 2000.
+
+ [21] J. Chesterfield, A. Diana, A. Greenhalgh, M. Lad, and M. Lim,
+ "A BSD Router Implementation of PGM",
+ http://www.cs.ucl.ac.uk/external/m.lad/rpgm/
+
+ [22] L. Rizzo, "A PGM Host Implementation for FreeBSD",
+ http://www.iet.unipi.it/~luigi/pgm.html
+
+ [23] M. Psaltaki, R. Araujo, G. Aldabbagh, P. Kouniakis, and A.
+ Giannopoulos, "Pragmatic General Multicast (PGM) host
+ implementation for FreeBSD.",
+ http://www.cs.ucl.ac.uk/research/darpa/pgm/PGM_FINAL.html
+
+21. Authors' Addresses
+
+ Tony Speakman
+ EMail: speakman@cisco.com
+
+ Dino Farinacci
+ Procket Networks
+ 3850 North First Street
+ San Jose, CA 95134
+ USA
+ EMail: dino@procket.com
+
+ Steven Lin
+ Juniper Networks
+ 1194 N. Mathilda Ave.
+ Sunnyvale, CA 94086
+ USA
+ EMail: steven@juniper.net
+
+ Alex Tweedly
+ EMail: agt@cisco.com
+
+ Nidhi Bhaskar
+ EMail: nbhaskar@cisco.com
+
+ Richard Edmonstone
+ EMail: redmonst@cisco.com
+
+ Rajitha Sumanasekera
+ EMail: rajitha@cisco.com
+
+
+
+
+
+Speakman, et. al. Experimental [Page 108]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Lorenzo Vicisano
+ Cisco Systems, Inc.
+ 170 West Tasman Drive,
+ San Jose, CA 95134
+ USA
+ EMail: lorenzo@cisco.com
+
+ Jon Crowcroft
+ Department of Computer Science
+ University College London
+ Gower Street
+ London WC1E 6BT
+ UK
+ EMail: j.crowcroft@cs.ucl.ac.uk
+
+ Jim Gemmell
+ Microsoft Bay Area Research Center
+ 301 Howard Street, #830
+ San Francisco, CA 94105
+ USA
+ EMail: jgemmell@microsoft.com
+
+ Dan Leshchiner
+ Tibco Software
+ 3165 Porter Dr.
+ Palo Alto, CA 94304
+ USA
+ EMail: dleshc@tibco.com
+
+ Michael Luby
+ Digital Fountain, Inc.
+ 39141 Civic Center Drive
+ Fremont CA 94538
+ USA
+ EMail: luby@digitalfountain.com
+
+ Todd L. Montgomery
+ Talarian Corporation
+ 124 Sherman Ave.
+ Morgantown, WV 26501
+ USA
+ EMail: todd@talarian.com
+
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 109]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+ Luigi Rizzo
+ Dip. di Ing. dell'Informazione
+ Universita` di Pisa
+ via Diotisalvi 2
+ 56126 Pisa
+ Italy
+ EMail: luigi@iet.unipi.it
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 110]
+
+RFC 3208 PGM Reliable Transport Protocol December 2001
+
+
+22. Full Copyright Statement
+
+ Copyright (C) The Internet Society (2001). All Rights Reserved.
+
+ This document and translations of it may be copied and furnished to
+ others, and derivative works that comment on or otherwise explain it
+ or assist in its implementation may be prepared, copied, published
+ and distributed, in whole or in part, without restriction of any
+ kind, provided that the above copyright notice and this paragraph are
+ included on all such copies and derivative works. However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assigns.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is currently provided by the
+ Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Speakman, et. al. Experimental [Page 111]
+