Diffstat (limited to 'doc/rfc/rfc8834.txt')
-rw-r--r--  doc/rfc/rfc8834.txt  2195
1 files changed, 2195 insertions, 0 deletions
diff --git a/doc/rfc/rfc8834.txt b/doc/rfc/rfc8834.txt
new file mode 100644
index 0000000..4e1ff9f
--- /dev/null
+++ b/doc/rfc/rfc8834.txt
@@ -0,0 +1,2195 @@
+
+
+
+
+Internet Engineering Task Force (IETF)                        C. Perkins
+Request for Comments: 8834                         University of Glasgow
+Category: Standards Track                                  M. Westerlund
+ISSN: 2070-1721                                                 Ericsson
+                                                                  J. Ott
+                                             Technical University Munich
+                                                            January 2021
+
+
+                Media Transport and Use of RTP in WebRTC
+
+Abstract
+
+ The framework for Web Real-Time Communication (WebRTC) provides
+ support for direct interactive rich communication using audio, video,
+ text, collaboration, games, etc. between two peers' web browsers.
+ This memo describes the media transport aspects of the WebRTC
+ framework. It specifies how the Real-time Transport Protocol (RTP)
+ is used in the WebRTC context and gives requirements for which RTP
+ features, profiles, and extensions need to be supported.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ https://www.rfc-editor.org/info/rfc8834.
+
+Copyright Notice
+
+ Copyright (c) 2021 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (https://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction
+ 2. Rationale
+ 3. Terminology
+ 4. WebRTC Use of RTP: Core Protocols
+ 4.1. RTP and RTCP
+ 4.2. Choice of the RTP Profile
+ 4.3. Choice of RTP Payload Formats
+ 4.4. Use of RTP Sessions
+ 4.5. RTP and RTCP Multiplexing
+ 4.6. Reduced Size RTCP
+ 4.7. Symmetric RTP/RTCP
+ 4.8. Choice of RTP Synchronization Source (SSRC)
+ 4.9. Generation of the RTCP Canonical Name (CNAME)
+ 4.10. Handling of Leap Seconds
+ 5. WebRTC Use of RTP: Extensions
+ 5.1. Conferencing Extensions and Topologies
+ 5.1.1. Full Intra Request (FIR)
+ 5.1.2. Picture Loss Indication (PLI)
+ 5.1.3. Slice Loss Indication (SLI)
+ 5.1.4. Reference Picture Selection Indication (RPSI)
+ 5.1.5. Temporal-Spatial Trade-Off Request (TSTR)
+ 5.1.6. Temporary Maximum Media Stream Bit Rate Request (TMMBR)
+ 5.2. Header Extensions
+ 5.2.1. Rapid Synchronization
+ 5.2.2. Client-to-Mixer Audio Level
+ 5.2.3. Mixer-to-Client Audio Level
+ 5.2.4. Media Stream Identification
+ 5.2.5. Coordination of Video Orientation
+ 6. WebRTC Use of RTP: Improving Transport Robustness
+ 6.1. Negative Acknowledgements and RTP Retransmission
+ 6.2. Forward Error Correction (FEC)
+ 7. WebRTC Use of RTP: Rate Control and Media Adaptation
+ 7.1. Boundary Conditions and Circuit Breakers
+ 7.2. Congestion Control Interoperability and Legacy Systems
+ 8. WebRTC Use of RTP: Performance Monitoring
+ 9. WebRTC Use of RTP: Future Extensions
+ 10. Signaling Considerations
+ 11. WebRTC API Considerations
+ 12. RTP Implementation Considerations
+ 12.1. Configuration and Use of RTP Sessions
+ 12.1.1. Use of Multiple Media Sources within an RTP Session
+ 12.1.2. Use of Multiple RTP Sessions
+ 12.1.3. Differentiated Treatment of RTP Streams
+ 12.2. Media Source, RTP Streams, and Participant Identification
+ 12.2.1. Media Source Identification
+ 12.2.2. SSRC Collision Detection
+ 12.2.3. Media Synchronization Context
+ 13. Security Considerations
+ 14. IANA Considerations
+ 15. References
+ 15.1. Normative References
+ 15.2. Informative References
+ Acknowledgements
+ Authors' Addresses
+
+1. Introduction
+
+ The Real-time Transport Protocol (RTP) [RFC3550] provides a framework
+ for delivery of audio and video teleconferencing data and other real-
+ time media applications. Previous work has defined the RTP protocol,
+ along with numerous profiles, payload formats, and other extensions.
+ When combined with appropriate signaling, these form the basis for
+ many teleconferencing systems.
+
+ The Web Real-Time Communication (WebRTC) framework provides the
+ protocol building blocks to support direct, interactive, real-time
+ communication using audio, video, collaboration, games, etc. between
+ two peers' web browsers. This memo describes how the RTP framework
+ is to be used in the WebRTC context. It proposes a baseline set of
+ RTP features that are to be implemented by all WebRTC endpoints,
+ along with suggested extensions for enhanced functionality.
+
+ This memo specifies a protocol intended for use within the WebRTC
+ framework but is not restricted to that context. An overview of the
+ WebRTC framework is given in [RFC8825].
+
+ The structure of this memo is as follows. Section 2 outlines our
+ rationale for preparing this memo and choosing these RTP features.
+ Section 3 defines terminology. Requirements for core RTP protocols
+ are described in Section 4, and suggested RTP extensions are
+ described in Section 5. Section 6 outlines mechanisms that can
+ increase robustness to network problems, while Section 7 describes
+ congestion control and rate adaptation mechanisms. The discussion of
+ mandated RTP mechanisms concludes in Section 8 with a review of
+ performance monitoring and network management tools. Section 9 gives
+ some guidelines for future incorporation of other RTP and RTP Control
+ Protocol (RTCP) extensions into this framework. Section 10 describes
+ requirements placed on the signaling channel. Section 11 discusses
+ the relationship between features of the RTP framework and the WebRTC
+ application programming interface (API), and Section 12 discusses RTP
+ implementation considerations. The memo concludes with security
+ considerations (Section 13) and IANA considerations (Section 14).
+
+2. Rationale
+
+ The RTP framework comprises the RTP data transfer protocol, the RTP
+ control protocol, and numerous RTP payload formats, profiles, and
+ extensions. This range of add-ons has allowed RTP to meet various
+ needs that were not envisaged by the original protocol designers and
+ support many new media encodings, but it raises the question of what
+ extensions are to be supported by new implementations. The
+ development of the WebRTC framework provides an opportunity to review
+ the available RTP features and extensions and define a common
+ baseline RTP feature set for all WebRTC endpoints. This builds on
+ the past 20 years of RTP development to mandate the use of extensions
+ that have shown widespread utility, while still remaining compatible
+ with the wide installed base of RTP implementations where possible.
+
+ RTP and RTCP extensions that are not discussed in this document can
+ be implemented by WebRTC endpoints if they are beneficial for new use
+ cases. However, they are not necessary to address the WebRTC use
+ cases and requirements identified in [RFC7478].
+
+ While the baseline set of RTP features and extensions defined in this
+ memo is targeted at the requirements of the WebRTC framework, it is
+ expected to be broadly useful for other conferencing-related uses of
+ RTP. In particular, it is likely that this set of RTP features and
+ extensions will be appropriate for other desktop or mobile video-
+ conferencing systems, or for room-based high-quality telepresence
+ applications.
+
+3. Terminology
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in BCP
+ 14 [RFC2119] [RFC8174] when, and only when, they appear in all
+ capitals, as shown here. Lower- or mixed-case uses of these key
+ words are not to be interpreted as carrying special significance in
+ this memo.
+
+ We define the following additional terms:
+
+ WebRTC MediaStream: The MediaStream concept defined by the W3C in
+ the WebRTC API [W3C.WD-mediacapture-streams]. A MediaStream
+ consists of zero or more MediaStreamTracks.
+
+ MediaStreamTrack: Part of the MediaStream concept defined by the W3C
+ in the WebRTC API [W3C.WD-mediacapture-streams]. A
+ MediaStreamTrack is an individual stream of media from any type of
+ media source such as a microphone or a camera, but conceptual
+ sources such as an audio mix or a video composition are also
+ possible.
+
+ Transport-layer flow: A unidirectional flow of transport packets
+ that are identified by a particular 5-tuple of source IP address,
+ source port, destination IP address, destination port, and
+ transport protocol.
+
+ Bidirectional transport-layer flow: A bidirectional transport-layer
+ flow is a transport-layer flow that is symmetric. That is, the
+ transport-layer flow in the reverse direction has a 5-tuple where
+ the source and destination address and ports are swapped compared
+ to the forward path transport-layer flow, and the transport
+ protocol is the same.
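
   The symmetry defined above is mechanical enough to sketch in code.
   The following Python fragment (the type and helper names are ours,
   illustrative only) models a 5-tuple and checks the reverse-path
   relationship:

```python
from typing import NamedTuple

class FiveTuple(NamedTuple):
    """Identifies a unidirectional transport-layer flow."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # e.g., "UDP"

def reverse(flow: FiveTuple) -> FiveTuple:
    """The reverse-direction flow swaps source and destination."""
    return FiveTuple(flow.dst_ip, flow.dst_port,
                     flow.src_ip, flow.src_port, flow.protocol)

def is_bidirectional_pair(a: FiveTuple, b: FiveTuple) -> bool:
    """True if a and b form a symmetric (bidirectional) flow pair."""
    return b == reverse(a)

forward = FiveTuple("192.0.2.1", 50000, "198.51.100.2", 60000, "UDP")
print(is_bidirectional_pair(forward, reverse(forward)))  # True
```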
+
+ This document uses the terminology from [RFC7656] and [RFC8825].
+ Other terms are used according to their definitions from the RTP
+ specification [RFC3550]. In particular, note the following
+ frequently used terms: RTP stream, RTP session, and endpoint.
+
+4. WebRTC Use of RTP: Core Protocols
+
+ The following sections describe the core features of RTP and RTCP
+ that need to be implemented, along with the mandated RTP profiles.
+ Also described are the core extensions providing essential features
+ that all WebRTC endpoints need to implement to function effectively
+ on today's networks.
+
+4.1. RTP and RTCP
+
+ The Real-time Transport Protocol (RTP) [RFC3550] is REQUIRED to be
+ implemented as the media transport protocol for WebRTC. RTP itself
+ comprises two parts: the RTP data transfer protocol and the RTP
+ Control Protocol (RTCP). RTCP is a fundamental and integral part of
+ RTP and MUST be implemented and used in all WebRTC endpoints.
+
+ The following RTP and RTCP features are sometimes omitted in limited-
+ functionality implementations of RTP, but they are REQUIRED in all
+ WebRTC endpoints:
+
+ * Support for use of multiple simultaneous synchronization source
+ (SSRC) values in a single RTP session, including support for RTP
+ endpoints that send many SSRC values simultaneously, following
+ [RFC3550] and [RFC8108]. The RTCP optimizations for multi-SSRC
+ sessions defined in [RFC8861] MAY be supported; if supported, the
+ usage MUST be signaled.
+
+ * Random choice of SSRC on joining a session; collision detection
+ and resolution for SSRC values (see also Section 4.8).
+
+ * Support for reception of RTP data packets containing contributing
+ source (CSRC) lists, as generated by RTP mixers, and RTCP packets
+ relating to CSRCs.
+
+ * Sending correct synchronization information in the RTCP Sender
+ Reports, to allow receivers to implement lip synchronization; see
+ Section 5.2.1 regarding support for the rapid RTP synchronization
+ extensions.
+
+ * Support for multiple synchronization contexts. Participants that
+ send multiple simultaneous RTP packet streams SHOULD do so as part
+ of a single synchronization context, using a single RTCP CNAME for
+ all streams and allowing receivers to play the streams out in a
+ synchronized manner. For compatibility with potential future
+ versions of this specification, or for interoperability with non-
+ WebRTC devices through a gateway, receivers MUST support multiple
+ synchronization contexts, indicated by the use of multiple RTCP
+ CNAMEs in an RTP session. This specification mandates the usage
+ of a single CNAME when sending RTP streams in some circumstances;
+ see Section 4.9.
+
+ * Support for sending and receiving RTCP Sender Report (SR),
+ Receiver Report (RR), Source Description (SDES), and BYE packet
+ types. Note that support for other RTCP packet types is OPTIONAL
+ unless mandated by other parts of this specification. Note that
+ additional RTCP packet types are used by the RTP/SAVPF profile
+ (Section 4.2) and the other RTCP extensions (Section 5). WebRTC
+ endpoints that implement the Session Description Protocol (SDP)
+ bundle negotiation extension will use the SDP Grouping Framework
+ "mid" attribute to identify media streams. Such endpoints MUST
+ implement the RTCP SDES media identification (MID) item described
+ in [RFC8843].
+
+ * Support for multiple endpoints in a single RTP session, and for
+ scaling the RTCP transmission interval according to the number of
+ participants in the session; support for randomized RTCP
+ transmission intervals to avoid synchronization of RTCP reports;
+ support for RTCP timer reconsideration (Section 6.3.6 of
+ [RFC3550]) and reverse reconsideration (Section 6.3.4 of
+ [RFC3550]).
+
+ * Support for configuring the RTCP bandwidth as a fraction of the
+ media bandwidth, and for configuring the fraction of the RTCP
+ bandwidth allocated to senders -- e.g., using the SDP "b=" line
+ [RFC4566] [RFC3556].
+
+ * Support for the reduced minimum RTCP reporting interval described
+ in Section 6.2 of [RFC3550]. When using the reduced minimum RTCP
+ reporting interval, the fixed (nonreduced) minimum interval MUST
+ be used when calculating the participant timeout interval (see
+ Sections 6.2 and 6.3.5 of [RFC3550]). The delay before sending
+ the initial compound RTCP packet can be set to zero (see
+ Section 6.2 of [RFC3550] as updated by [RFC8108]).
+
+ * Support for discontinuous transmission. RTP allows endpoints to
+ pause and resume transmission at any time. When resuming, the RTP
+ sequence number will increase by one, as usual, while the increase
+ in the RTP timestamp value will depend on the duration of the
+ pause. Discontinuous transmission is most commonly used with some
+ audio payload formats, but it is not audio specific and can be
+ used with any RTP payload format.
+
+ * Ignore unknown RTCP packet types and RTP header extensions. This
+ is to ensure robust handling of future extensions, middlebox
+ behaviors, etc., that can result in receiving RTP header
+ extensions or RTCP packet types that were not signaled. If a
+ compound RTCP packet that contains a mixture of known and unknown
+ RTCP packet types is received, the known packet types need to be
+ processed as usual, with only the unknown packet types being
+ discarded.
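
   The last requirement above (process known packet types in a compound
   RTCP packet, skip unknown ones) can be sketched as follows.  The
   packet-type constants are from RFC 3550; the parser itself is a
   minimal illustration, not a complete RTCP implementation:

```python
import struct

# RTCP packet types defined in RFC 3550.
KNOWN_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE"}

def walk_compound_rtcp(data: bytes):
    """Yield (name, packet) for each known packet in a compound RTCP
    packet, silently skipping unknown packet types."""
    offset = 0
    while offset + 4 <= len(data):
        first, pt, length = struct.unpack_from("!BBH", data, offset)
        size = (length + 1) * 4   # length field is in 32-bit words, minus one
        packet = data[offset:offset + size]
        if pt in KNOWN_TYPES:
            yield KNOWN_TYPES[pt], packet
        # Unknown packet types are ignored, not treated as an error.
        offset += size

# A compound packet: a minimal RR (PT=201) followed by an unknown PT=210.
rr = struct.pack("!BBH", 0x80, 201, 1) + b"\x00\x00\x00\x01"
unknown = struct.pack("!BBH", 0x80, 210, 0)
print([name for name, _ in walk_compound_rtcp(rr + unknown)])  # ['RR']
```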
+
+ It is known that a significant number of legacy RTP implementations,
+ especially those targeted at systems with only Voice over IP (VoIP),
+ do not support all of the above features and in some cases do not
+ support RTCP at all. Implementers are advised to consider the
+ requirements for graceful degradation when interoperating with legacy
+ implementations.
+
+ Other implementation considerations are discussed in Section 12.
+
+4.2. Choice of the RTP Profile
+
+ The complete specification of RTP for a particular application domain
+ requires the choice of an RTP profile. For WebRTC use, the extended
+ secure RTP profile for RTCP-based feedback (RTP/SAVPF) [RFC5124], as
+ extended by [RFC7007], MUST be implemented. The RTP/SAVPF profile is
+ the combination of the basic RTP/AVP profile [RFC3551], the RTP
+ profile for RTCP-based feedback (RTP/AVPF) [RFC4585], and the secure
+ RTP profile (RTP/SAVP) [RFC3711].
+
+ The RTCP-based feedback extensions [RFC4585] are needed for the
+ improved RTCP timer model. This allows more flexible transmission of
+ RTCP packets in response to events, rather than strictly according to
+ bandwidth, and is vital for being able to report congestion signals
+ as well as media events. These extensions also allow saving RTCP
+ bandwidth, and an endpoint will commonly only use the full RTCP
+ bandwidth allocation if there are many events that require feedback.
+ The timer rules are also needed to make use of the RTP conferencing
+ extensions discussed in Section 5.1.
+
+ | Note: The enhanced RTCP timer model defined in the RTP/AVPF
+ | profile is backwards compatible with legacy systems that
+ | implement only the RTP/AVP or RTP/SAVP profile, given some
+ | constraints on parameter configuration such as the RTCP
+ | bandwidth value and "trr-int". The most important factor for
+ | interworking with RTP/(S)AVP endpoints via a gateway is to set
+ | the "trr-int" parameter to a value representing 4 seconds; see
+ | Section 7.1.3 of [RFC8108].
+
+ The secure RTP (SRTP) profile extensions [RFC3711] are needed to
+ provide media encryption, integrity protection, replay protection,
+ and a limited form of source authentication. WebRTC endpoints MUST
+ NOT send packets using the basic RTP/AVP profile or the RTP/AVPF
+ profile; they MUST employ the full RTP/SAVPF profile to protect all
+ RTP and RTCP packets that are generated. In other words,
+ implementations MUST use SRTP and Secure RTCP (SRTCP). The RTP/SAVPF
+ profile MUST be configured using the cipher suites, DTLS-SRTP
+ protection profiles, keying mechanisms, and other parameters
+ described in [RFC8827].
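
   As a minimal illustration of the profile requirement, an endpoint
   can reject any media description that does not offer the secure
   feedback profile.  The helper below is a sketch (the function name
   and accepted transport set are ours; "UDP/TLS/RTP/SAVPF" is the
   DTLS-SRTP transport identifier commonly used with WebRTC):

```python
def profile_allowed(m_line: str) -> bool:
    """Return True only if an SDP m= line offers the secure feedback
    profile mandated for WebRTC (RTP/SAVPF, or the DTLS-keyed
    UDP/TLS/RTP/SAVPF transport variant)."""
    fields = m_line.split()
    proto = fields[2] if len(fields) > 2 else ""
    return proto in ("RTP/SAVPF", "UDP/TLS/RTP/SAVPF")

print(profile_allowed("m=audio 49170 UDP/TLS/RTP/SAVPF 111"))  # True
print(profile_allowed("m=audio 49170 RTP/AVP 0"))              # False
```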
+
+4.3. Choice of RTP Payload Formats
+
+ Mandatory-to-implement audio codecs and RTP payload formats for
+ WebRTC endpoints are defined in [RFC7874]. Mandatory-to-implement
+ video codecs and RTP payload formats for WebRTC endpoints are defined
+ in [RFC7742]. WebRTC endpoints MAY additionally implement any other
+ codec for which an RTP payload format and associated signaling has
+ been defined.
+
+ WebRTC endpoints cannot assume that the other participants in an RTP
+ session understand any RTP payload format, no matter how common. The
+ mapping between RTP payload type numbers and specific configurations
+ of particular RTP payload formats MUST be agreed before those payload
+ types/formats can be used. In an SDP context, this can be done using
+ the "a=rtpmap:" and "a=fmtp:" attributes associated with an "m="
+ line, along with any other SDP attributes needed to configure the RTP
+ payload format.
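
   For example, the payload-type mapping carried in "a=rtpmap:" lines
   can be collected as sketched below (a simplified parser; real SDP
   handling also needs "a=fmtp:" and other attributes):

```python
import re

def parse_rtpmap(sdp_lines):
    """Build a payload type -> (encoding, clock rate, channels) map
    from SDP "a=rtpmap:" attribute lines.  Channels default to 1
    when the optional third component is absent."""
    table = {}
    for line in sdp_lines:
        m = re.match(r"a=rtpmap:(\d+) ([^/]+)/(\d+)(?:/(\d+))?", line)
        if m:
            pt = int(m.group(1))
            channels = int(m.group(4)) if m.group(4) else 1
            table[pt] = (m.group(2), int(m.group(3)), channels)
    return table

sdp = ["a=rtpmap:111 opus/48000/2", "a=rtpmap:0 PCMU/8000"]
print(parse_rtpmap(sdp))
# {111: ('opus', 48000, 2), 0: ('PCMU', 8000, 1)}
```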
+
+ Endpoints can signal support for multiple RTP payload formats or
+ multiple configurations of a single RTP payload format, as long as
+ each unique RTP payload format configuration uses a different RTP
+ payload type number. As outlined in Section 4.8, the RTP payload
+ type number is sometimes used to associate an RTP packet stream with
+ a signaling context. This association is possible provided unique
+ RTP payload type numbers are used in each context. For example, an
+ RTP packet stream can be associated with an SDP "m=" line by
+ comparing the RTP payload type numbers used by the RTP packet stream
+ with payload types signaled in the "a=rtpmap:" lines in the media
+ sections of the SDP. This leads to the following considerations:
+
+ If RTP packet streams are being associated with signaling contexts
+ based on the RTP payload type, then the assignment of RTP payload
+ type numbers MUST be unique across signaling contexts.
+
+ If the same RTP payload format configuration is used in multiple
+ contexts, then a different RTP payload type number has to be
+ assigned in each context to ensure uniqueness.
+
+ If the RTP payload type number is not being used to associate RTP
+ packet streams with a signaling context, then the same RTP payload
+ type number can be used to indicate the exact same RTP payload
+ format configuration in multiple contexts.
+
+ A single RTP payload type number MUST NOT be assigned to different
+ RTP payload formats, or different configurations of the same RTP
+ payload format, within a single RTP session (note that the "m=" lines
+ in an SDP BUNDLE group [RFC8843] form a single RTP session).
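
   The uniqueness rule can be checked mechanically when media sections
   are bundled into one RTP session.  The following sketch (names and
   data shapes are ours) flags a payload type number that is bound to
   two different format configurations:

```python
def check_unique_payload_types(media_sections):
    """Verify that no RTP payload type number is reused for different
    format configurations within one RTP session (e.g., the m= lines
    of a BUNDLE group).  Each media section is a dict mapping payload
    type number -> format configuration string."""
    assignments = {}
    for section in media_sections:
        for pt, config in section.items():
            if pt in assignments and assignments[pt] != config:
                raise ValueError(f"payload type {pt} assigned to both "
                                 f"{assignments[pt]!r} and {config!r}")
            assignments[pt] = config
    return assignments

audio = {111: "opus/48000/2"}
video = {96: "VP8/90000"}
print(check_unique_payload_types([audio, video]))
# {111: 'opus/48000/2', 96: 'VP8/90000'}
```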
+
+ An endpoint that has signaled support for multiple RTP payload
+ formats MUST be able to accept data in any of those payload formats
+ at any time, unless it has previously signaled limitations on its
+ decoding capability. This requirement is constrained if several
+ types of media (e.g., audio and video) are sent in the same RTP
+ session. In such a case, a source (SSRC) is restricted to switching
+ only between the RTP payload formats signaled for the type of media
+ that is being sent by that source; see Section 4.4. To support rapid
+ rate adaptation by changing codecs, RTP does not require advance
+ signaling for changes between RTP payload formats used by a single
+ SSRC that were signaled during session setup.
+
+ If performing changes between two RTP payload types that use
+ different RTP clock rates, an RTP sender MUST follow the
+ recommendations in Section 4.1 of [RFC7160]. RTP receivers MUST
+ follow the recommendations in Section 4.3 of [RFC7160] in order to
+ support sources that switch between clock rates in an RTP session.
+ These recommendations for receivers are backwards compatible with the
+ case where senders use only a single clock rate.
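
   The key receiver-side observation is that RTP timestamp arithmetic
   is only meaningful within a single clock rate.  The guard below is
   a simplified sketch (the full procedures are in Section 4 of
   [RFC7160]; the function name and None-on-switch convention are
   ours):

```python
def elapsed_between(ts_a, rate_a, ts_b, rate_b):
    """Elapsed media time between two RTP packets, in seconds.
    Across a clock-rate switch, the raw timestamp difference cannot
    be interpreted directly; signal that case to the caller so it
    can apply the RFC 7160 receiver procedures instead."""
    if rate_a != rate_b:
        return None  # clock-rate switch: handle per RFC 7160
    delta = (ts_b - ts_a) % (2 ** 32)  # modulo-2^32 wraparound
    return delta / rate_a

print(elapsed_between(160, 8000, 320, 8000))   # 0.02
print(elapsed_between(160, 8000, 960, 48000))  # None
```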
+
+4.4. Use of RTP Sessions
+
+ An association amongst a set of endpoints communicating using RTP is
+ known as an RTP session [RFC3550]. An endpoint can be involved in
+ several RTP sessions at the same time. In a multimedia session, each
+ type of media has typically been carried in a separate RTP session
+ (e.g., using one RTP session for the audio and a separate RTP session
+ using a different transport-layer flow for the video). WebRTC
+ endpoints are REQUIRED to implement support for multimedia sessions
+ in this way, separating each RTP session using different transport-
+ layer flows for compatibility with legacy systems (this is sometimes
+ called session multiplexing).
+
+ In modern-day networks, however, with the widespread use of network
+ address/port translators (NAT/NAPT) and firewalls, it is desirable to
+ reduce the number of transport-layer flows used by RTP applications.
+ This can be done by sending all the RTP packet streams in a single
+ RTP session, which will comprise a single transport-layer flow. This
+ will prevent the use of some quality-of-service mechanisms, as
+ discussed in Section 12.1.3. Implementations are therefore also
+ REQUIRED to support transport of all RTP packet streams, independent
+ of media type, in a single RTP session using a single transport-layer
+ flow, according to [RFC8860] (this is sometimes called SSRC
+ multiplexing). If multiple types of media are to be used in a single
+ RTP session, all participants in that RTP session MUST agree to this
+ usage. In an SDP context, the mechanisms described in [RFC8843] can
+ be used to signal such a bundle of RTP packet streams forming a
+ single RTP session.
+
+ Further discussion about the suitability of different RTP session
+ structures and multiplexing methods to different scenarios can be
+ found in [RFC8872].
+
+4.5. RTP and RTCP Multiplexing
+
+ Historically, RTP and RTCP have been run on separate transport-layer
+ flows (e.g., two UDP ports for each RTP session, one for RTP and one
+ for RTCP). With the increased use of Network Address/Port
+ Translation (NAT/NAPT), this has become problematic, since
+ maintaining multiple NAT bindings can be costly. It also complicates
+ firewall administration, since multiple ports need to be opened to
+ allow RTP traffic. To reduce these costs and session setup times,
+ implementations are REQUIRED to support multiplexing RTP data packets
+ and RTCP control packets on a single transport-layer flow [RFC5761].
+ Such RTP and RTCP multiplexing MUST be negotiated in the signaling
+ channel before it is used. If SDP is used for signaling, this
+ negotiation MUST use the mechanism defined in [RFC5761].
+ Implementations can also support sending RTP and RTCP on separate
+ transport-layer flows, but this is OPTIONAL to implement. If an
+ implementation does not support RTP and RTCP sent on separate
+ transport-layer flows, it MUST indicate that using the mechanism
+ defined in [RFC8858].
+
+ Note that the use of RTP and RTCP multiplexed onto a single
+ transport-layer flow ensures that there is occasional traffic sent on
+ that port, even if there is no active media traffic. This can be
+ useful to keep NAT bindings alive [RFC6263].
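
   When RTP and RTCP share one transport-layer flow, incoming packets
   are demultiplexed by inspecting the second octet, as specified in
   Section 4 of [RFC5761]: RTCP packet types occupy the range 192-223,
   and the corresponding RTP payload types (64-95) are avoided.  A
   minimal classifier (helper name is ours):

```python
def classify(packet: bytes) -> str:
    """Demultiplex RTP and RTCP sharing a single transport-layer
    flow.  The second octet of an RTCP packet (its packet type)
    lies in 192..223; RTP payload types 64-95, which would collide
    with that range, are not used when multiplexing (RFC 5761)."""
    if len(packet) < 2:
        return "invalid"
    return "RTCP" if 192 <= packet[1] <= 223 else "RTP"

print(classify(bytes([0x80, 200])))  # RTCP (Sender Report, PT=200)
print(classify(bytes([0x80, 96])))   # RTP  (dynamic payload type 96)
```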
+
+4.6. Reduced Size RTCP
+
+ RTCP packets are usually sent as compound RTCP packets, and [RFC3550]
+ requires that those compound packets start with an SR or RR packet.
+ When using frequent RTCP feedback messages under the RTP/AVPF profile
+ [RFC4585], these statistics are not needed in every packet, and they
+ unnecessarily increase the mean RTCP packet size. This can limit the
+ frequency at which RTCP packets can be sent within the RTCP bandwidth
+ share.
+
+ To avoid this problem, [RFC5506] specifies how to reduce the mean
+ RTCP message size and allow for more frequent feedback. Frequent
+ feedback, in turn, is essential to make real-time applications
+ quickly aware of changing network conditions and to allow them to
+ adapt their transmission and encoding behavior. Implementations MUST
+ support sending and receiving noncompound RTCP feedback packets
+ [RFC5506]. Use of noncompound RTCP packets MUST be negotiated using
+ the signaling channel. If SDP is used for signaling, this
+ negotiation MUST use the attributes defined in [RFC5506]. For
+ backwards compatibility, implementations are also REQUIRED to support
+ the use of compound RTCP feedback packets if the remote endpoint does
+ not agree to the use of noncompound RTCP in the signaling exchange.
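
   As an illustration of the saving, a reduced-size RTCP packet can
   carry a single feedback message with no SR/RR prefix.  The sketch
   below builds a Picture Loss Indication (payload-specific feedback,
   PT=206, FMT=1, per [RFC4585]) in 12 bytes; the helper name is ours:

```python
import struct

def make_pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """A reduced-size (noncompound) RTCP packet carrying only a
    Picture Loss Indication, as permitted by RFC 5506 once
    negotiated: RTCP header, sender SSRC, media source SSRC."""
    first = (2 << 6) | 1  # version 2, no padding, FMT=1 (PLI)
    length = 2            # 3 32-bit words in total, minus one
    return struct.pack("!BBHII", first, 206, length,
                       sender_ssrc, media_ssrc)

pli = make_pli(0x11111111, 0x22222222)
print(len(pli), pli[1])  # 12 206
```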
+
+4.7. Symmetric RTP/RTCP
+
+ To ease traversal of NAT and firewall devices, implementations are
+ REQUIRED to implement and use symmetric RTP [RFC4961]. The reason
+ for using symmetric RTP is primarily to avoid issues with NATs and
+ firewalls by ensuring that the send and receive RTP packet streams,
+ as well as RTCP, are actually bidirectional transport-layer flows.
+ This will keep alive the NAT and firewall pinholes and help indicate
+ consent that the receive direction is a transport-layer flow the
+ intended recipient actually wants. In addition, it saves resources,
+ specifically ports at the endpoints, but also in the network, because
+ the NAT mappings or firewall state is not unnecessarily bloated. The
+ amount of per-flow QoS state kept in the network is also reduced.
+
+4.8. Choice of RTP Synchronization Source (SSRC)
+
+ Implementations are REQUIRED to support signaled RTP synchronization
+ source (SSRC) identifiers. If SDP is used, this MUST be done using
+ the "a=ssrc:" SDP attribute defined in Sections 4.1 and 5 of
+ [RFC5576] and the "previous-ssrc" source attribute defined in
+ Section 6.2 of [RFC5576]; other per-SSRC attributes defined in
+ [RFC5576] MAY be supported.
+
+ While support for signaled SSRC identifiers is mandated, their use in
+ an RTP session is OPTIONAL. Implementations MUST be prepared to
+ accept RTP and RTCP packets using SSRCs that have not been explicitly
+ signaled ahead of time. Implementations MUST support random SSRC
+ assignment and MUST support SSRC collision detection and resolution,
+ according to [RFC3550]. When using signaled SSRC values, collision
+ detection MUST be performed as described in Section 5 of [RFC5576].
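
   Random assignment with collision avoidance can be sketched as
   follows (illustrative only; a full implementation also sends an
   RTCP BYE for the abandoned SSRC and follows the RFC 3550 collision
   resolution rules):

```python
import secrets

def allocate_ssrc(in_use: set) -> int:
    """Pick a cryptographically random 32-bit SSRC that does not
    collide with any SSRC already observed in the session, and
    record it as in use."""
    while True:
        ssrc = secrets.randbits(32)
        if ssrc not in in_use:
            in_use.add(ssrc)
            return ssrc

session = {0xDEADBEEF}          # SSRCs already seen from other members
mine = allocate_ssrc(session)
print(mine != 0xDEADBEEF, mine in session)  # True True
```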
+
+ It is often desirable to associate an RTP packet stream with a non-
+ RTP context. For users of the WebRTC API, a mapping between SSRCs
+ and MediaStreamTracks is provided per Section 11. For gateways or
+ other usages, it is possible to associate an RTP packet stream with
+ an "m=" line in a session description formatted using SDP. If SSRCs
+ are signaled, this is straightforward (in SDP, the "a=ssrc:" line
+ will be at the media level, allowing a direct association with an
+ "m=" line). If SSRCs are not signaled, the RTP payload type numbers
+ used in an RTP packet stream are often sufficient to associate that
+ packet stream with a signaling context. For example, if RTP payload
+ type numbers are assigned as described in Section 4.3 of this memo,
+ the RTP payload types used by an RTP packet stream can be compared
+ with values in SDP "a=rtpmap:" lines, which are at the media level in
+ SDP and so map to an "m=" line.
+
+4.9. Generation of the RTCP Canonical Name (CNAME)
+
+ The RTCP Canonical Name (CNAME) provides a persistent transport-level
+ identifier for an RTP endpoint. While the SSRC identifier for an RTP
+ endpoint can change if a collision is detected or when the RTP
+ application is restarted, its RTCP CNAME is meant to stay unchanged
+ for the duration of an RTCPeerConnection [W3C.WebRTC], so that RTP
+ endpoints can be uniquely identified and associated with their RTP
+ packet streams within a set of related RTP sessions.
+
+ Each RTP endpoint MUST have at least one RTCP CNAME, and that RTCP
+ CNAME MUST be unique within the RTCPeerConnection. RTCP CNAMEs
+ identify a particular synchronization context -- i.e., all SSRCs
+ associated with a single RTCP CNAME share a common reference clock.
+ If an endpoint has SSRCs that are associated with several
+ unsynchronized reference clocks, and hence different synchronization
+ contexts, it will need to use multiple RTCP CNAMEs, one for each
+ synchronization context.
+
+ Taking the discussion in Section 11 into account, a WebRTC endpoint
+ MUST NOT use more than one RTCP CNAME in the RTP sessions belonging
+ to a single RTCPeerConnection (that is, an RTCPeerConnection forms a
+ synchronization context). RTP middleboxes MAY generate RTP packet
+ streams associated with more than one RTCP CNAME, to allow them to
+ avoid having to resynchronize media from multiple different endpoints
+ that are part of a multiparty RTP session.
+
+ The RTP specification [RFC3550] includes guidelines for choosing a
+ unique RTP CNAME, but these are not sufficient in the presence of NAT
+ devices. In addition, long-term persistent identifiers can be
+ problematic from a privacy viewpoint (Section 13). Accordingly, a
+ WebRTC endpoint MUST generate a new, unique, short-term persistent
+ RTCP CNAME for each RTCPeerConnection, following [RFC7022], with a
+ single exception; if explicitly requested at creation, an
+ RTCPeerConnection MAY use the same CNAME as an existing
+ RTCPeerConnection within their common same-origin context.
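
   In the spirit of [RFC7022], a short-term persistent CNAME is simply
   at least 96 bits of cryptographically random data, base64 encoded.
   A minimal sketch (function name is ours):

```python
import base64
import secrets

def new_rtcp_cname() -> str:
    """Generate a short-term persistent RTCP CNAME per RFC 7022:
    96 random bits, base64 encoded, giving 16 ASCII characters.
    A fresh value is generated for each RTCPeerConnection."""
    return base64.b64encode(secrets.token_bytes(12)).decode("ascii")

cname = new_rtcp_cname()
print(len(cname))  # 16
```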
+
+ A WebRTC endpoint MUST support reception of any CNAME that matches
+ the syntax limitations specified by the RTP specification [RFC3550]
+ and cannot assume that any CNAME will be chosen according to the form
+ suggested above.
+
+4.10. Handling of Leap Seconds
+
+ The guidelines given in [RFC7164] regarding handling of leap seconds
+ to limit their impact on RTP media play-out and synchronization
+ SHOULD be followed.
+
+5. WebRTC Use of RTP: Extensions
+
+ There are a number of RTP extensions that are either needed to obtain
+ full functionality, or extremely useful to improve on the baseline
+ performance, in the WebRTC context. One set of these extensions is
+ related to conferencing, while others are more generic in nature.
+ The following subsections describe the various RTP extensions
+ mandated or suggested for use within WebRTC.
+
+5.1. Conferencing Extensions and Topologies
+
+ RTP is a protocol that inherently supports group communication.
+ Groups can be implemented by having each endpoint send its RTP packet
+ streams to an RTP middlebox that redistributes the traffic, by using
+ a mesh of unicast RTP packet streams between endpoints, or by using
+ an IP multicast group to distribute the RTP packet streams. These
+ topologies can be implemented in a number of ways as discussed in
+ [RFC7667].
+
+ While the use of IP multicast groups is popular in IPTV systems, the
+ topologies based on RTP middleboxes are dominant in interactive
+ video-conferencing environments. Topologies based on a mesh of
+ unicast transport-layer flows to create a common RTP session have not
+ seen widespread deployment to date. Accordingly, WebRTC endpoints
+ are not expected to support topologies based on IP multicast groups
+ or mesh-based topologies, such as a point-to-multipoint mesh
+ configured as a single RTP session ("Topo-Mesh" in the terminology of
+ [RFC7667]). However, a point-to-multipoint mesh constructed using
+ several RTP sessions, implemented in WebRTC using independent
+ RTCPeerConnections [W3C.WebRTC], can be expected to be used in WebRTC
+ and needs to be supported.
+
+ WebRTC endpoints implemented according to this memo are expected to
+ support all the topologies described in [RFC7667] where the RTP
+ endpoints send and receive unicast RTP packet streams to and from
+ some peer device, provided that peer can participate in performing
+ congestion control on the RTP packet streams. The peer device could
+ be another RTP endpoint, or it could be an RTP middlebox that
+ redistributes the RTP packet streams to other RTP endpoints. This
+ limitation means that some of the RTP middlebox-based topologies are
+ not suitable for use in WebRTC. Specifically:
+
+ * Video-switching Multipoint Control Units (MCUs) (Topo-Video-
+ switch-MCU) SHOULD NOT be used, since they make the use of RTCP
+ for congestion control and quality-of-service reports problematic
+ (see Section 3.8 of [RFC7667]).
+
+ * The Relay-Transport Translator (Topo-PtM-Trn-Translator) topology
+ SHOULD NOT be used, because its safe use requires a congestion
+ control algorithm or RTP circuit breaker that handles point to
+ multipoint, which has not yet been standardized.
+
+ The following topology can be used; however, it has some issues worth
+ noting:
+
+ * Content-modifying MCUs with RTCP termination (Topo-RTCP-
+ terminating-MCU) MAY be used. Note that in this RTP topology, RTP
+ loop detection and identification of active senders is the
+ responsibility of the WebRTC application; since the clients are
+ isolated from each other at the RTP layer, RTP cannot assist with
+ these functions (see Section 3.9 of [RFC7667]).
+
+ The RTP extensions described in Sections 5.1.1 to 5.1.6 are designed
+ to be used with centralized conferencing, where an RTP middlebox
+ (e.g., a conference bridge) receives a participant's RTP packet
+ streams and distributes them to the other participants. These
+ extensions are not necessary for interoperability; an RTP endpoint
+ that does not implement these extensions will work correctly but
+ might offer poor performance. Support for the listed extensions will
+ greatly improve the quality of experience; to ensure a reasonable
+ baseline quality, support for some of these extensions is mandatory
+ for WebRTC endpoints.
+
+ The RTCP conferencing extensions are defined in "Extended RTP Profile
+ for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/
+ AVPF)" [RFC4585] and "Codec Control Messages in the RTP Audio-Visual
+ Profile with Feedback (AVPF)" [RFC5104]; they are fully usable by the
+ secure variant of this profile (RTP/SAVPF) [RFC5124].
+
+5.1.1. Full Intra Request (FIR)
+
+ The Full Intra Request message is defined in Sections 3.5.1 and 4.3.1
+ of Codec Control Messages [RFC5104]. It is used to make the mixer
+ request a new Intra picture from a participant in the session. This
+ is used when switching between sources to ensure that the receivers
+ can decode the video or other predictive media encoding with long
+ prediction chains. WebRTC endpoints that are sending media MUST
+ understand and react to FIR feedback messages they receive, since
+ this greatly improves the user experience when using centralized
+ mixer-based conferencing. Support for sending FIR messages is
+ OPTIONAL.
+
+5.1.2. Picture Loss Indication (PLI)
+
+ The Picture Loss Indication message is defined in Section 6.3.1 of
+ the RTP/AVPF profile [RFC4585]. It is used by a receiver to tell the
+ sending encoder that it lost the decoder context and would like to
+ have it repaired somehow. This is semantically different from the
+ Full Intra Request above, as there could be multiple ways to fulfill
+ the request. WebRTC endpoints that are sending media MUST understand
+ and react to PLI feedback messages as a loss-tolerance mechanism.
+ Receivers MAY send PLI messages.
+
+5.1.3. Slice Loss Indication (SLI)
+
+ The Slice Loss Indication message is defined in Section 6.3.2 of the
+ RTP/AVPF profile [RFC4585]. It is used by a receiver to tell the
+ encoder that it has detected the loss or corruption of one or more
+ consecutive macro blocks and would like to have these repaired
+ somehow. It is RECOMMENDED that receivers generate SLI feedback
+ messages if slices are lost when using a codec that supports the
+ concept of macro blocks. A sender that receives an SLI feedback
+ message SHOULD attempt to repair the lost slice(s).
+
+5.1.4. Reference Picture Selection Indication (RPSI)
+
+ Reference Picture Selection Indication (RPSI) messages are defined in
+ Section 6.3.3 of the RTP/AVPF profile [RFC4585]. Some video-encoding
+ standards allow the use of older reference pictures than the most
+ recent one for predictive coding. If such a codec is in use, and if
+ the encoder has learned that encoder-decoder synchronization has been
+ lost, then a known-as-correct reference picture can be used as a base
+ for future coding. The RPSI message allows this to be signaled.
+ Receivers that detect that encoder-decoder synchronization has been
+ lost SHOULD generate an RPSI feedback message if the codec being used
+ supports reference-picture selection. An RTP packet-stream sender
+ that receives such an RPSI message SHOULD act on that message to
+ change the reference picture, if it is possible to do so within the
+ available bandwidth constraints and with the codec being used.
+
+5.1.5. Temporal-Spatial Trade-Off Request (TSTR)
+
+ The temporal-spatial trade-off request and notification are defined
+ in Sections 3.5.2 and 4.3.2 of [RFC5104]. This request can be used
+ to ask the video encoder to change the trade-off it makes between
+ temporal and spatial resolution -- for example, to prefer high
+ spatial image quality but low frame rate. Support for TSTR requests
+ and notifications is OPTIONAL.
+
+5.1.6. Temporary Maximum Media Stream Bit Rate Request (TMMBR)
+
+ The Temporary Maximum Media Stream Bit Rate Request (TMMBR) feedback
+ message is defined in Sections 3.5.4 and 4.2.1 of Codec Control
+ Messages [RFC5104]. This request and its corresponding Temporary
+ Maximum Media Stream Bit Rate Notification (TMMBN) message [RFC5104]
+ are used by a media receiver to inform the sending party that there
+ is a current limitation on the amount of bandwidth available to this
+ receiver. There can be various reasons for this: for example, an RTP
+ mixer can use this message to limit the media rate of the sender
+ being forwarded by the mixer (without doing media transcoding) to fit
+ the bottlenecks existing towards the other session participants.
+ WebRTC endpoints that are sending media are REQUIRED to implement
+ support for TMMBR messages and MUST follow bandwidth limitations set
+ by a TMMBR message received for their SSRC. The sending of TMMBR
+ messages is OPTIONAL.
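+
+ The bitrate carried in a TMMBR message is packed as a 17-bit
+ mantissa and a 6-bit exponent (Section 4.2.1.1 of [RFC5104]). The
+ following non-normative sketch shows one way to perform the
+ conversion; rounding down is deliberate so that the encoded value
+ never exceeds the requested limit. Function names are illustrative:

```python
def encode_tmmbr_bitrate(bps: int) -> tuple[int, int]:
    # RFC 5104: maximum total media bitrate = mantissa * 2**exp,
    # with a 17-bit mantissa and a 6-bit exponent. Shift right
    # (rounding down) until the mantissa fits in 17 bits.
    exp = 0
    while bps >= (1 << 17):
        bps >>= 1
        exp += 1
    return exp, bps

def decode_tmmbr_bitrate(exp: int, mantissa: int) -> int:
    # Inverse mapping back to bits per second.
    return mantissa << exp
```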
+
+5.2. Header Extensions
+
+ The RTP specification [RFC3550] provides the capability to include
+ RTP header extensions containing in-band data, but the format and
+ semantics of the extensions are poorly specified. The use of header
+ extensions is OPTIONAL in WebRTC, but if they are used, they MUST be
+ formatted and signaled following the general mechanism for RTP header
+ extensions defined in [RFC8285], since this gives well-defined
+ semantics to RTP header extensions.
+
+ As noted in [RFC8285], the requirement from the RTP specification
+ that header extensions are "designed so that the header extension may
+ be ignored" [RFC3550] stands. To be specific, header extensions MUST
+ only be used for data that can safely be ignored by the recipient
+ without affecting interoperability and MUST NOT be used when the
+ presence of the extension has changed the form or nature of the rest
+ of the packet in a way that is not compatible with the way the stream
+ is signaled (e.g., as defined by the payload type). Valid examples
+ of RTP header extensions might include metadata that is additional to
+ the usual RTP information but that can safely be ignored without
+ compromising interoperability.
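+
+ For illustration, the body of a header extension using the one-byte
+ profile of [RFC8285] (identified by the 0xBEDE pattern) consists of
+ elements carrying a 4-bit ID and a 4-bit length. A non-normative
+ parsing sketch, with an illustrative function name (the mapping of
+ IDs to specific extensions comes from signaling):

```python
def parse_one_byte_extensions(ext_body: bytes) -> dict[int, bytes]:
    # Parse the payload of an RTP header extension that uses the
    # RFC 8285 one-byte profile: each element is one byte holding
    # ID (high nibble) and length-minus-one (low nibble), followed
    # by that many data bytes. A zero byte is padding; ID 15 is
    # reserved and terminates processing.
    elements = {}
    i = 0
    while i < len(ext_body):
        byte = ext_body[i]
        if byte == 0:          # padding byte
            i += 1
            continue
        ext_id = byte >> 4
        length = (byte & 0x0F) + 1
        if ext_id == 15:       # reserved ID: stop processing
            break
        elements[ext_id] = ext_body[i + 1 : i + 1 + length]
        i += 1 + length
    return elements
```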
+
+5.2.1. Rapid Synchronization
+
+ Many RTP sessions require synchronization between audio, video, and
+ other content. This synchronization is performed by receivers, using
+ information contained in RTCP SR packets, as described in the RTP
+ specification [RFC3550]. This basic mechanism can be slow, however,
+ so it is RECOMMENDED that the rapid RTP synchronization extensions
+ described in [RFC6051] be implemented in addition to RTCP SR-based
+ synchronization.
+
+ This header extension uses the generic header extension framework
+ described in [RFC8285] and so needs to be negotiated before it can be
+ used.
+
+5.2.2. Client-to-Mixer Audio Level
+
+ The client-to-mixer audio level extension [RFC6464] is an RTP header
+ extension used by an endpoint to inform a mixer about the level of
+ audio activity in the packet to which the header is attached. This
+ enables an RTP middlebox to make mixing or selection decisions
+ without decoding or detailed inspection of the payload, reducing the
+ complexity in some types of mixers. It can also save decoding
+ resources in receivers, which can choose to decode only the most
+ relevant RTP packet streams based on audio activity levels.
+
+ The client-to-mixer audio level header extension [RFC6464] MUST be
+ implemented. It is REQUIRED that implementations be capable of
+ encrypting the header extension according to [RFC6904], since the
+ information contained in these header extensions can be considered
+ sensitive. The use of this encryption is RECOMMENDED; however, usage
+ of the encryption can be explicitly disabled through API or
+ signaling.
+
+ This header extension uses the generic header extension framework
+ described in [RFC8285] and so needs to be negotiated before it can be
+ used.
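+
+ The [RFC6464] element itself is a single byte: a V flag indicating
+ whether the sender believes the packet contains voice activity, plus
+ a 7-bit level expressed as -dBov (0 is loudest, 127 is silence). A
+ non-normative sketch of the encoding, with an illustrative function
+ name:

```python
def encode_audio_level(level_dbov: int, voice: bool) -> bytes:
    # RFC 6464 one-byte element: V flag (most significant bit)
    # plus the audio level as -dBov, clamped to the range 0..127.
    # level_dbov is the (non-positive) level in dBov.
    level = min(127, max(0, -level_dbov))
    return bytes([(0x80 if voice else 0x00) | level])
```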
+
+5.2.3. Mixer-to-Client Audio Level
+
+ The mixer-to-client audio level header extension [RFC6465] provides
+ an endpoint with the audio level of the different sources mixed into
+ a common source stream by an RTP mixer. This enables a user
+ interface to indicate the relative activity level of each session
+ participant, rather than merely showing, via the CSRC field, whether
+ a participant is included in the mix. This is a pure optimization of
+ non-critical functions and is
+ hence OPTIONAL to implement. If this header extension is
+ implemented, it is REQUIRED that implementations be capable of
+ encrypting the header extension according to [RFC6904], since the
+ information contained in these header extensions can be considered
+ sensitive. It is further RECOMMENDED that this encryption be used,
+ unless the encryption has been explicitly disabled through API or
+ signaling.
+
+ This header extension uses the generic header extension framework
+ described in [RFC8285] and so needs to be negotiated before it can be
+ used.
+
+5.2.4. Media Stream Identification
+
+ WebRTC endpoints that implement the SDP bundle negotiation extension
+ will use the SDP Grouping Framework "mid" attribute to identify media
+ streams. Such endpoints MUST implement the RTP MID header extension
+ described in [RFC8843].
+
+ This header extension uses the generic header extension framework
+ described in [RFC8285] and so needs to be negotiated before it can be
+ used.
+
+5.2.5. Coordination of Video Orientation
+
+ WebRTC endpoints that send or receive video MUST implement the
+ coordination of video orientation (CVO) RTP header extension as
+ described in Section 4 of [RFC7742].
+
+ This header extension uses the generic header extension framework
+ described in [RFC8285] and so needs to be negotiated before it can be
+ used.
+
+6. WebRTC Use of RTP: Improving Transport Robustness
+
+ There are tools that can make RTP packet streams robust against
+ packet loss and reduce the impact of loss on media quality. However,
+ they generally add some overhead compared to a non-robust stream.
+ The overhead needs to be considered, and the aggregate bitrate MUST
+ be rate controlled to avoid causing network congestion (see
+ Section 7). As a result, improving robustness might require a lower
+ base encoding quality but has the potential to deliver that quality
+ with fewer errors. The mechanisms described in the following
+ subsections can be used to improve tolerance to packet loss.
+
+6.1. Negative Acknowledgements and RTP Retransmission
+
+ As a consequence of supporting the RTP/SAVPF profile, implementations
+ can send negative acknowledgements (NACKs) for RTP data packets
+ [RFC4585]. This feedback can be used to inform a sender of the loss
+ of particular RTP packets, subject to the capacity limitations of the
+ RTCP feedback channel. A sender can use this information to optimize
+ the user experience by adapting the media encoding to compensate for
+ known lost packets.
+
+ RTP packet stream senders are REQUIRED to understand the generic NACK
+ message defined in Section 6.2.1 of [RFC4585], but they MAY choose to
+ ignore some or all of this feedback (following Section 4.2 of
+ [RFC4585]). Receivers MAY send NACKs for missing RTP packets.
+ Guidelines on when to send NACKs are provided in [RFC4585]. It is
+ not expected that a receiver will send a NACK for every lost RTP
+ packet; rather, it needs to consider the cost of sending NACK
+ feedback and the importance of the lost packet to make an informed
+ decision on whether it is worth telling the sender about a packet-
+ loss event.
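+
+ Each generic NACK item (Section 6.2.1 of [RFC4585]) carries a 16-bit
+ packet ID (PID) plus a 16-bit bitmask (BLP) covering the 16
+ sequence numbers that follow it, so reports of nearby losses can
+ share a single item. A non-normative sketch of the packing, using an
+ illustrative function name and ignoring sequence-number wraparound
+ for brevity:

```python
def build_nack_fci(lost_seqs: list[int]) -> list[tuple[int, int]]:
    # Pack lost sequence numbers into RFC 4585 generic NACK items:
    # (PID, BLP) pairs, where bit n of BLP flags the loss of
    # packet PID + n + 1. Assumes no 16-bit wraparound.
    items: list[tuple[int, int]] = []
    for seq in sorted(lost_seqs):
        if items and 0 < seq - items[-1][0] <= 16:
            pid, blp = items[-1]
            items[-1] = (pid, blp | (1 << (seq - pid - 1)))
        else:
            items.append((seq, 0))
    return items
```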
+
+ The RTP retransmission payload format [RFC4588] offers the ability to
+ retransmit lost packets based on NACK feedback. Retransmission needs
+ to be used with care in interactive real-time applications to ensure
+ that the retransmitted packet arrives in time to be useful, but it
+ can be effective in environments with relatively low network RTT.
+ (An RTP sender can estimate the RTT to the receivers using the
+ information in RTCP SR and RR packets, as described at the end of
+ Section 6.4.1 of [RFC3550].) The use of retransmissions can also
+ increase the forward RTP bandwidth and can potentially cause
+ increased packet loss if the original packet loss was caused by
+ network congestion. Note, however, that retransmission of an
+ important lost packet to repair decoder state can have lower cost
+ than sending a full intra frame. It is not appropriate to blindly
+ retransmit RTP packets in response to a NACK. The importance of lost
+ packets and the likelihood of them arriving in time to be useful need
+ to be considered before RTP retransmission is used.
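+
+ The RTT estimate mentioned above combines the last SR timestamp
+ (LSR) and delay since last SR (DLSR) fields of a received RR with
+ the RR's arrival time, all expressed in the middle-32-bit NTP
+ format (units of 1/65536 seconds). A non-normative sketch, with an
+ illustrative function name:

```python
def rtt_seconds(arrival_ntp32: int, lsr: int, dlsr: int) -> float:
    # RFC 3550, Section 6.4.1: RTT = A - LSR - DLSR, where A is the
    # arrival time of the RR. All values are middle-32-bit NTP
    # timestamps, so the subtraction is performed modulo 2**32 and
    # the result is converted from 1/65536-second units to seconds.
    return ((arrival_ntp32 - lsr - dlsr) & 0xFFFFFFFF) / 65536.0
```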
+
+ Receivers are REQUIRED to implement support for RTP retransmission
+ packets [RFC4588] sent using SSRC multiplexing and MAY also support
+ RTP retransmission packets sent using session multiplexing. Senders
+ MAY send RTP retransmission packets in response to NACKs if support
+ for the RTP retransmission payload format has been negotiated and the
+ sender believes it is useful to send a retransmission of the
+ packet(s) referenced in the NACK. Senders do not need to retransmit
+ every NACKed packet.
+
+6.2. Forward Error Correction (FEC)
+
+ The use of Forward Error Correction (FEC) can provide an effective
+ protection against some degree of packet loss, at the cost of steady
+ bandwidth overhead. There are several FEC schemes that are defined
+ for use with RTP. Some of these schemes are specific to a particular
+ RTP payload format, and others operate across RTP packets and can be
+ used with any payload format. Note that using redundant encoding or
+ FEC will lead to increased play-out delay, which needs to be
+ considered when choosing FEC schemes and their parameters.
+
+ WebRTC endpoints MUST follow the recommendations for FEC use given in
+ [RFC8854]. WebRTC endpoints MAY support other types of FEC, but
+ these MUST be negotiated before they are used.
+
+7. WebRTC Use of RTP: Rate Control and Media Adaptation
+
+ WebRTC will be used in heterogeneous network environments using a
+ variety of link technologies, including both wired and wireless
+ links, to interconnect potentially large groups of users around the
+ world. As a result, the network paths between users can have widely
+ varying one-way delays, available bitrates, load levels, and traffic
+ mixtures. Individual endpoints can send one or more RTP packet
+ streams to each participant, and there can be several participants.
+ Each of these RTP packet streams can contain different types of
+ media, and the type of media, bitrate, and number of RTP packet
+ streams as well as transport-layer flows can be highly asymmetric.
+ Non-RTP traffic can share the network paths with RTP transport-layer
+ flows. Since the network environment is not predictable or stable,
+ WebRTC endpoints MUST ensure that the RTP traffic they generate can
+ adapt to match changes in the available network capacity.
+
+ The quality of experience for users of WebRTC is very dependent on
+ effective adaptation of the media to the limitations of the network.
+ Endpoints have to be designed so they do not transmit significantly
+ more data than the network path can support, except for very short
+ time periods; otherwise, high levels of network packet loss or delay
+ spikes will occur, causing media quality degradation. The limiting
+ factor on the capacity of the network path might be the link
+ bandwidth, or it might be competition with other traffic on the link
+ (this can be non-WebRTC traffic, traffic due to other WebRTC flows,
+ or even competition with other WebRTC flows in the same session).
+
+ An effective media congestion control algorithm is therefore an
+ essential part of the WebRTC framework. However, at the time of this
+ writing, there is no standard congestion control algorithm that can
+ be used for interactive media applications such as WebRTC's flows.
+ Some requirements for congestion control algorithms for
+ RTCPeerConnections are discussed in [RFC8836]. If a standardized
+ congestion control algorithm that satisfies these requirements is
+ developed in the future, this memo will need to be updated to mandate
+ its use.
+
+7.1. Boundary Conditions and Circuit Breakers
+
+ WebRTC endpoints MUST implement the RTP circuit breaker algorithm
+ that is described in [RFC8083]. The RTP circuit breaker is designed
+ to enable applications to recognize and react to situations of
+ extreme network congestion. However, since the RTP circuit breaker
+ might not be triggered until congestion becomes extreme, it cannot be
+ considered a substitute for congestion control, and applications MUST
+ also implement congestion control to allow them to adapt to changes
+ in network capacity. The congestion control algorithm will have to
+ be proprietary until a standardized congestion control algorithm is
+ available. Any future RTP congestion control algorithms are expected
+ to operate within the envelope allowed by the circuit breaker.
+
+ The session-establishment signaling will also necessarily establish
+ boundaries to which the media bitrate will conform. The choice of
+ media codecs provides upper and lower bounds on the bitrates that the
+ application can use to provide useful quality, and it determines the
+ packetization choices that are available. In addition, the signaling
+ channel can establish maximum media bitrate boundaries using, for
+ example, the SDP "b=AS:" or "b=CT:" lines and the RTP/AVPF TMMBR
+ messages (see Section 5.1.6 of this memo). Signaled bandwidth
+ limitations, such as SDP "b=AS:" or "b=CT:" lines received from the
+ peer, MUST be followed when sending RTP packet streams. A WebRTC
+ endpoint receiving media SHOULD signal its bandwidth limitations.
+ These limitations have to be based on known bandwidth limitations,
+ for example the capacity of the edge links.
+
+7.2. Congestion Control Interoperability and Legacy Systems
+
+ All endpoints that wish to interwork with WebRTC MUST implement RTCP
+ and provide congestion feedback via the defined RTCP reporting
+ mechanisms.
+
+ When interworking with legacy implementations that support RTCP using
+ the RTP/AVP profile [RFC3551], congestion feedback is provided in
+ RTCP RR packets every few seconds. Implementations that have to
+ interwork with such endpoints MUST ensure that they keep within the
+ RTP circuit breaker [RFC8083] constraints to limit the congestion
+ they can cause.
+
+ If a legacy endpoint supports RTP/AVPF, this enables negotiation of
+ important parameters for frequent reporting, such as the "trr-int"
+ parameter, and the possibility that the endpoint supports some useful
+ feedback format for congestion control purposes such as TMMBR
+ [RFC5104]. Implementations that have to interwork with such
+ endpoints MUST ensure that they stay within the RTP circuit breaker
+ [RFC8083] constraints to limit the congestion they can cause, but
+ they might find that they can achieve better congestion response
+ depending on the amount of feedback that is available.
+
+ With proprietary congestion control algorithms, issues can arise when
+ different algorithms and implementations interact in a communication
+ session. If the implementations have made different choices with
+ regard to the type of adaptation, for example one sender based and
+ one receiver based, the result can be a situation where one direction
+ is dual controlled while the other direction is not controlled at
+ all. This memo cannot mandate behavior for proprietary
+ congestion control algorithms, but implementations that use such
+ algorithms ought to be aware of this issue and try to ensure that
+ effective congestion control is negotiated for media flowing in both
+ directions. If the IETF were to standardize both sender- and
+ receiver-based congestion control algorithms for WebRTC traffic in
+ the future, the issues of interoperability, control, and ensuring
+ that both directions of media flow are congestion controlled would
+ also need to be considered.
+
+8. WebRTC Use of RTP: Performance Monitoring
+
+ As described in Section 4.1, implementations are REQUIRED to generate
+ RTCP Sender Report (SR) and Receiver Report (RR) packets relating to
+ the RTP packet streams they send and receive. These RTCP reports can
+ be used for performance monitoring purposes, since they include basic
+ packet-loss and jitter statistics.
+
+ A large number of additional performance metrics are supported by the
+ RTCP Extended Reports (XR) framework; see [RFC3611] and [RFC6792].
+ At the time of this writing, it is not clear what extended metrics
+ are suitable for use in WebRTC, so there is no requirement that
+ implementations generate RTCP XR packets. However, implementations
+ that can use detailed performance monitoring data MAY generate RTCP
+ XR packets as appropriate. The use of RTCP XR packets SHOULD be
+ signaled; implementations MUST ignore RTCP XR packets that are
+ unexpected or not understood.
+
+9. WebRTC Use of RTP: Future Extensions
+
+ It is possible that the core set of RTP protocols and RTP extensions
+ specified in this memo will prove insufficient for the future needs
+ of WebRTC. In this case, future updates to this memo have to be made
+ following "Guidelines for Writers of RTP Payload Format
+ Specifications" [RFC2736], "How to Write an RTP Payload Format"
+ [RFC8088], and "Guidelines for Extending the RTP Control Protocol
+ (RTCP)" [RFC5968]. They also SHOULD take into account any future
+ guidelines for extending RTP and related protocols that have been
+ developed.
+
+ Authors of future extensions are urged to consider the wide range of
+ environments in which RTP is used when recommending extensions, since
+ extensions that are applicable in some scenarios can be problematic
+ in others. Where possible, the WebRTC framework will adopt RTP
+ extensions that are of general utility, to enable easy implementation
+ of a gateway to other applications using RTP, rather than adopt
+ mechanisms that are narrowly targeted at specific WebRTC use cases.
+
+10. Signaling Considerations
+
+ RTP is built with the assumption that an external signaling channel
+ exists and can be used to configure RTP sessions and their features.
+ The basic configuration of an RTP session consists of the following
+ parameters:
+
+ RTP profile: The name of the RTP profile to be used in the session.
+ The RTP/AVP [RFC3551] and RTP/AVPF [RFC4585] profiles can
+ interoperate on a basic level, as can their secure variants, RTP/
+ SAVP [RFC3711] and RTP/SAVPF [RFC5124]. The secure variants of
+ the profiles do not directly interoperate with the nonsecure
+ variants, due to the presence of additional header fields for
+ authentication in SRTP packets and cryptographic transformation of
+ the payload. WebRTC requires the use of the RTP/SAVPF profile,
+ and this MUST be signaled. Interworking functions might transform
+ this into the RTP/SAVP profile for a legacy use case by indicating
+ to the WebRTC endpoint that RTP/SAVPF is used and configuring
+ a "trr-int" value of 4 seconds.
+
+ Transport information: Source and destination IP address(es) and
+ ports for RTP and RTCP MUST be signaled for each RTP session. In
+ WebRTC, these transport addresses will be provided by Interactive
+ Connectivity Establishment (ICE) [RFC8445] that signals candidates
+ and arrives at nominated candidate address pairs. If RTP and RTCP
+ multiplexing [RFC5761] is to be used such that a single port --
+ i.e., transport-layer flow -- is used for RTP and RTCP flows, this
+ MUST be signaled (see Section 4.5).
+
+ RTP payload types, media formats, and format parameters: The mapping
+ between media type names (and hence the RTP payload formats to be
+ used) and the RTP payload type numbers MUST be signaled. Each
+ media type MAY also have a number of media type parameters that
+ MUST also be signaled to configure the codec and RTP payload
+ format (the "a=fmtp:" line from SDP). Section 4.3 of this memo
+ discusses requirements for uniqueness of payload types.
+
+ RTP extensions: The use of any additional RTP header extensions and
+ RTCP packet types, including any necessary parameters, MUST be
+ signaled. This signaling ensures that a WebRTC endpoint's
+ behavior, especially when sending, is predictable and consistent.
+ For robustness and compatibility with non-WebRTC systems that
+ might be connected to a WebRTC session via a gateway,
+ implementations are REQUIRED to ignore unknown RTCP packets and
+ RTP header extensions (see also Section 4.1).
+
+ RTCP bandwidth: Support for exchanging RTCP bandwidth values with
+ the endpoints will be necessary. This SHALL be done as described
+ in "Session Description Protocol (SDP) Bandwidth Modifiers for RTP
+ Control Protocol (RTCP) Bandwidth" [RFC3556] if using SDP, or
+ something semantically equivalent. This also ensures that the
+ endpoints have a common view of the RTCP bandwidth. A common view
+ of the RTCP bandwidth among different endpoints is important to
+ prevent differences in RTCP packet timing and timeout intervals
+ causing interoperability problems.
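+
+ With SDP, the [RFC3556] modifiers are the "b=RS:" line (RTCP
+ bandwidth allocated to active data senders) and the "b=RR:" line
+ (RTCP bandwidth allocated to other participants), both in bits per
+ second. A non-normative helper, with an illustrative function name:

```python
def rtcp_bandwidth_lines(rs_bps: int, rr_bps: int) -> list[str]:
    # RFC 3556 SDP bandwidth modifiers: RS covers active data
    # senders, RR covers other participants; both values are in
    # bits per second (unlike "b=AS:", which is in kilobits).
    return [f"b=RS:{rs_bps}", f"b=RR:{rr_bps}"]
```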
+
+ These parameters are often expressed in SDP messages conveyed within
+ an offer/answer exchange. RTP does not depend on SDP or the offer/
+ answer model but does require all the necessary parameters to be
+ agreed upon and provided to the RTP implementation. Note that in
+ WebRTC, it will depend on the signaling model and API how these
+ parameters need to be configured, but they will need to either be set
+ in the API or explicitly signaled between the peers.
+
+11. WebRTC API Considerations
+
+ The WebRTC API [W3C.WebRTC] and the Media Capture and Streams API
+ [W3C.WD-mediacapture-streams] define and use the concept of a
+ MediaStream that consists of zero or more MediaStreamTracks. A
+ MediaStreamTrack is an individual stream of media from any type of
+ media source, such as a microphone or a camera, but conceptual
+ sources, like an audio mix or a video composition, are also possible.
+ The MediaStreamTracks within a MediaStream might need to be
+ synchronized during playback.
+
+ A MediaStreamTrack's realization in RTP, in the context of an
+ RTCPeerConnection, consists of a source packet stream, identified by
+ an SSRC, sent within an RTP session that is part of the
+ RTCPeerConnection. The MediaStreamTrack can also result in
+ additional packet streams, and thus SSRCs, in the same RTP session.
+ These can be dependent packet streams from scalable encoding of the
+ source stream associated with the MediaStreamTrack, if such a media
+ encoder is used. They can also be redundancy packet streams; these
+ are created when applying Forward Error Correction (Section 6.2) or
+ RTP retransmission (Section 6.1) to the source packet stream.
+
+ It is important to note that the same media source can be feeding
+ multiple MediaStreamTracks. As different sets of constraints or
+ other parameters can be applied to the MediaStreamTrack, each
+ MediaStreamTrack instance added to an RTCPeerConnection SHALL result
+ in an independent source packet stream with its own set of associated
+ packet streams and thus different SSRC(s). It will depend on applied
+ constraints and parameters if the source stream and the encoding
+ configuration will be identical between different MediaStreamTracks
+ sharing the same media source. If the encoding parameters and
+ constraints are the same, an implementation could choose to use only
+ one encoded stream to create the different RTP packet streams. Note
+ that such optimizations would need to take into account that the
+ constraints for one of the MediaStreamTracks can change at any
+ moment, meaning that the encoding configurations might no longer be
+ identical, and two different encoder instances would then be needed.
+
+ The same MediaStreamTrack can also be included in multiple
+ MediaStreams; thus, multiple sets of MediaStreams can implicitly need
+ to use the same synchronization base. To ensure that this works in
+ all cases and does not force an endpoint to disrupt the media by
+ changing synchronization base and CNAME during delivery of any
+ ongoing packet streams, all MediaStreamTracks and their associated
+ SSRCs originating from the same endpoint need to be sent using the
+ same CNAME within one RTCPeerConnection. This motivates the use of a
+ single CNAME in Section 4.9.
+
+ | The requirement to use the same CNAME for all SSRCs that
+ | originate from the same endpoint does not require a middlebox
+ | that forwards traffic from multiple endpoints to only use a
+ | single CNAME.
+
+ Different CNAMEs normally need to be used for different
+ RTCPeerConnection instances, as specified in Section 4.9. Having two
+ communication sessions with the same CNAME could enable tracking of a
+ user or device across different services (see Section 4.4.1 of
+ [RFC8826] for details). A web application can request that the
+ CNAMEs used in different RTCPeerConnections (within a same-origin
+ context) be the same; this allows for synchronization of the
+ endpoint's RTP packet streams across the different
+ RTCPeerConnections.
+
+ | Note: This doesn't result in a tracking issue, since the
+ | creation of matching CNAMEs depends on existing tracking within
+ | a single origin.
+
+ The above will currently force a WebRTC endpoint that receives a
+ MediaStreamTrack on one RTCPeerConnection and adds it as an outgoing
+ one on any RTCPeerConnection to perform resynchronization of the
+ stream.
+ Since the sending party needs to change the CNAME to the one it uses,
+ this implies it has to use a local system clock as the timebase for
+ the synchronization. Thus, the relative relation between the
+ timebase of the incoming stream and the system sending out needs to
+ be defined. This relation also needs monitoring for clock drift and
+ likely adjustments of the synchronization. The sending entity is
+ also responsible for congestion control for its sent streams. In
+ cases of packet loss, the loss of incoming data also needs to be
+ handled. This leads to the observation that the method that is least
+ likely to cause issues or interruptions in the outgoing source packet
+ stream is a model of full decoding, including repair, followed by
+ encoding of the media again into the outgoing packet stream.
+ Optimizations of this method are clearly possible and implementation
+ specific.
+
+ A WebRTC endpoint MUST support receiving multiple MediaStreamTracks,
+ where each of the different MediaStreamTracks (and its sets of
+ associated packet streams) uses different CNAMEs. However,
+ MediaStreamTracks that are received with different CNAMEs have no
+ defined synchronization.
+
+ | Note: The motivation for supporting reception of multiple
+ | CNAMEs is to allow for forward compatibility with any future
+ | changes that enable more efficient stream handling when
+ | endpoints relay/forward streams. It also ensures that
+ | endpoints can interoperate with certain types of multistream
+ | middleboxes or endpoints that are not WebRTC.
+
+ "JavaScript Session Establishment Protocol (JSEP)" [RFC8829]
+ specifies that the binding between the WebRTC MediaStreams,
+ MediaStreamTracks, and the SSRC is done as specified in "WebRTC
+ MediaStream Identification in the Session Description Protocol"
+ [RFC8830]. Section 4.1 of the MediaStream Identification (MSID)
+ document [RFC8830] also defines how to map source packet streams with
+ unknown SSRCs to MediaStreamTracks and MediaStreams. The latter is
+ relevant for handling some cases of legacy interoperability. Commonly,
+ the RTP payload type of any incoming packets will reveal if the
+ packet stream is a source stream or a redundancy or dependent packet
+ stream. The association to the correct source packet stream depends
+ on the payload format in use for the packet stream.
+
+ Finally, this specification puts a requirement on the WebRTC API to
+ realize a method for determining the CSRC list (Section 4.1) as well
+ as the mixer-to-client audio levels (Section 5.2.3) (when supported);
+ the basic requirements for this are further discussed in
+ Section 12.2.1.
+
+12. RTP Implementation Considerations
+
+ The following discussion provides some guidance on the implementation
+ of the RTP features described in this memo. The focus is on a WebRTC
+ endpoint implementation perspective, and while some mention is made
+ of the behavior of middleboxes, that is not the focus of this memo.
+
+12.1. Configuration and Use of RTP Sessions
+
+ A WebRTC endpoint will be a simultaneous participant in one or more
+ RTP sessions. Each RTP session can convey multiple media sources and
+ include media data from multiple endpoints. In the following, some
+ ways in which WebRTC endpoints can configure and use RTP sessions are
+ outlined.
+
+12.1.1. Use of Multiple Media Sources within an RTP Session
+
+ RTP is a group communication protocol, and every RTP session can
+ potentially contain multiple RTP packet streams. There are several
+ reasons why this might be desirable:
+
+ * Multiple media types:
+
+ Outside of WebRTC, it is common to use one RTP session for each
+ type of media source (e.g., one RTP session for audio sources and
+ one for video sources, each sent over different transport-layer
+ flows). However, to reduce the number of UDP ports used, the
+ default in WebRTC is to send all types of media in a single RTP
+ session, as described in Section 4.4, using RTP and RTCP
+ multiplexing (Section 4.5) to further reduce the number of UDP
+ ports needed. This RTP session then uses only one bidirectional
+ transport-layer flow but will contain multiple RTP packet streams,
+ each containing a different type of media. A common example might
+ be an endpoint with a camera and microphone that sends two RTP
+ packet streams, one video and one audio, into a single RTP
+ session.
+
+ * Multiple capture devices:
+
+ A WebRTC endpoint might have multiple cameras, microphones, or
+ other media capture devices, and so it might want to generate
+ several RTP packet streams of the same media type. Alternatively,
+ it might want to send media from a single capture device in
+ several different formats or quality settings at once. Both can
+ result in a single endpoint sending multiple RTP packet streams of
+ the same media type into a single RTP session at the same time.
+
+ * Associated repair data:
+
+ An endpoint might send an RTP packet stream that is somehow
+ associated with another stream. For example, it might send an RTP
+ packet stream that contains FEC or retransmission data relating to
+ another stream. Some RTP payload formats send this sort of
+ associated repair data as part of the source packet stream, while
+ others send it as a separate packet stream.
+
+ * Layered or multiple-description coding:
+
+ Within a single RTP session, an endpoint can use a layered media
+ codec -- for example, H.264 Scalable Video Coding (SVC) -- or a
+ multiple-description codec that generates multiple RTP packet
+ streams, each with a distinct RTP SSRC.
+
+ * RTP mixers, translators, and other middleboxes:
+
+ An RTP session, in the WebRTC context, is a point-to-point
+ association between an endpoint and some other peer device, where
+ those devices share a common SSRC space. The peer device might be
+ another WebRTC endpoint, or it might be an RTP mixer, translator,
+ or some other form of media-processing middlebox. In the latter
+ cases, the middlebox might send mixed or relayed RTP streams from
+ several participants, which the WebRTC endpoint will need to
+ render. Thus, even though a WebRTC endpoint might only be a
+ member of a single RTP session, the peer device might be extending
+ that RTP session to incorporate other endpoints. WebRTC is a
+ group communication environment, and endpoints need to be capable
+ of receiving, decoding, and playing out multiple RTP packet
+ streams at once, even in a single RTP session.
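
When all media types share one transport-layer flow with RTP and RTCP
multiplexing (Section 4.5), the first demultiplexing decision is
whether an incoming packet is RTP or RTCP. A minimal sketch of the
usual heuristic (from RFC 5761) follows; it is illustrative only:

```python
def classify_packet(packet: bytes) -> str:
    """Heuristic from RFC 5761 for demultiplexing RTP and RTCP that
    share one transport-layer flow: the second octet of an RTCP packet
    holds the RTCP packet type (192-223 for the common types), a range
    that the combination of the RTP marker bit and the permitted RTP
    payload types avoids."""
    if len(packet) < 2:
        return "too short"
    second_octet = packet[1]
    if 192 <= second_octet <= 223:
        return "rtcp"
    return "rtp"
```

For example, an RTCP Sender Report (packet type 200) classifies as
RTCP, while an RTP packet with dynamic payload type 96 classifies as
RTP whether or not its marker bit is set.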
+
+12.1.2. Use of Multiple RTP Sessions
+
+ In addition to sending and receiving multiple RTP packet streams
+ within a single RTP session, a WebRTC endpoint might participate in
+ multiple RTP sessions. There are several reasons why a WebRTC
+ endpoint might choose to do this:
+
+ * To interoperate with legacy devices:
+
+ The common practice in the non-WebRTC world is to send different
+ types of media in separate RTP sessions -- for example, using one
+ RTP session for audio and another RTP session, on a separate
+ transport-layer flow, for video. All WebRTC endpoints need to
+ support the option of sending different types of media on
+ different RTP sessions so they can interwork with such legacy
+ devices. This is discussed further in Section 4.4.
+
+ * To provide enhanced quality of service:
+
+ Some network-based quality-of-service mechanisms operate on the
+ granularity of transport-layer flows. If use of these mechanisms
+ to provide differentiated quality of service for some RTP packet
+ streams is desired, then those RTP packet streams need to be sent
+ in a separate RTP session using a different transport-layer flow,
+ and with appropriate quality-of-service marking. This is
+ discussed further in Section 12.1.3.
+
+ * To separate media with different purposes:
+
+ An endpoint might want to send RTP packet streams that have
+ different purposes on different RTP sessions, to make it easy for
+ the peer device to distinguish them. For example, some
+ centralized multiparty conferencing systems display the active
+ speaker in high resolution but show low-resolution "thumbnails" of
+ other participants. Such systems might configure the endpoints to
+ send simulcast high- and low-resolution versions of their video
+ using separate RTP sessions to simplify the operation of the RTP
+ middlebox. In the WebRTC context, this is currently possible by
+ establishing multiple WebRTC MediaStreamTracks that have the same
+ media source in one (or more) RTCPeerConnection. Each
+ MediaStreamTrack is then configured to deliver a particular media
+ quality and thus media bitrate, and it will produce an
+ independently encoded version with the codec parameters agreed
+ specifically in the context of that RTCPeerConnection. The RTP
+ middlebox can distinguish packets corresponding to the low- and
+ high-resolution streams by inspecting their SSRC, RTP payload
+ type, or some other information contained in RTP payload, RTP
+ header extension, or RTCP packets. However, it can be easier to
+ distinguish the RTP packet streams if they arrive on separate RTP
+ sessions on separate transport-layer flows.
+
+ * To directly connect with multiple peers:
+
+ A multiparty conference does not need to use an RTP middlebox.
+ Rather, a multi-unicast mesh can be created, comprising several
+ distinct RTP sessions, with each participant sending RTP traffic
+ over a separate RTP session (that is, using an independent
+ RTCPeerConnection object) to every other participant, as shown in
+ Figure 1. This topology has the benefit of not requiring an RTP
+ middlebox node that is trusted to access and manipulate the media
+ data. The downside is that it increases the used bandwidth at
+ each sender by requiring one copy of the RTP packet streams for
+ each participant that is part of the same session beyond the
+ sender itself.
+
+ +---+ +---+
+ | A |<--->| B |
+ +---+ +---+
+ ^ ^
+ \ /
+ \ /
+ v v
+ +---+
+ | C |
+ +---+
+
+ Figure 1: Multi-unicast Using Several RTP Sessions
+
+ The multi-unicast topology could also be implemented as a single
+ RTP session, spanning multiple peer-to-peer transport-layer
+ connections, or as several pairwise RTP sessions, one between each
+ pair of peers. To maintain a coherent mapping of the relationship
+ between RTP sessions and RTCPeerConnection objects, it is
+ RECOMMENDED that this be implemented as several individual RTP
+ sessions. The only downside is that endpoint A will not learn of
+ the quality of any transmission happening between B and C, since
+ it will not see RTCP reports for the RTP session between B and C,
+ whereas it would if all three participants were part of a single
+ RTP session. Experience with the Mbone tools (experimental RTP-
+ based multicast conferencing tools from the late 1990s) has shown
+ that RTCP reception quality reports for third parties can be
+ presented to users in a way that helps them understand asymmetric
+ network problems, and the approach of using separate RTP sessions
+ prevents this. However, an advantage of using separate RTP
+ sessions is that it enables using different media bitrates and RTP
+ session configurations between the different peers, thus not
+ forcing B to endure the same quality reductions as C will if there
+ are limitations in the transport from A to C. It is believed that
+ these advantages outweigh the limitations in debugging power.
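
The bandwidth cost of the mesh in Figure 1 can be made concrete with
some simple, illustrative arithmetic (the bitrate chosen below is an
arbitrary example):

```python
def mesh_cost(participants: int, stream_kbps: int):
    """Illustrative arithmetic for the full-mesh topology of Figure 1:
    each endpoint sends a separate copy of its media to every other
    participant over its own RTCPeerConnection, so the uplink cost
    grows linearly with the number of peers."""
    connections = participants * (participants - 1) // 2
    uplink_per_sender = (participants - 1) * stream_kbps
    return connections, uplink_per_sender

# Three participants (A, B, C) each sending 1000 kb/s of video:
# 3 pairwise connections in total, 2000 kb/s of uplink per sender.
```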
+
+ * To indirectly connect with multiple peers:
+
+ A common scenario in multiparty conferencing is to create indirect
+ connections to multiple peers, using an RTP mixer, translator, or
+ some other type of RTP middlebox. Figure 2 outlines a simple
+ topology that might be used in a four-person centralized
+ conference. The middlebox acts to optimize the transmission of
+ RTP packet streams from certain perspectives, either by only
+ sending some of the received RTP packet stream to any given
+ receiver, or by providing a combined RTP packet stream out of a
+ set of contributing streams.
+
+ +---+ +-------------+ +---+
+ | A |<---->| |<---->| B |
+ +---+ | RTP mixer, | +---+
+ | translator, |
+ | or other |
+ +---+ | middlebox | +---+
+ | C |<---->| |<---->| D |
+ +---+ +-------------+ +---+
+
+ Figure 2: RTP Mixer with Only Unicast Paths
+
+ There are various methods of implementation for the middlebox. If
+ implemented as a standard RTP mixer or translator, a single RTP
+ session will extend across the middlebox and encompass all the
+ endpoints in one multiparty session. Other types of middleboxes
+ might use separate RTP sessions between each endpoint and the
+ middlebox. A common aspect is that these RTP middleboxes can use
+ a number of tools to control the media encoding provided by a
+ WebRTC endpoint. This includes functions like requesting the
+ breaking of the encoding chain and having the encoder produce a
+ so-called Intra frame. Another common aspect is limiting the
+ bitrate of a stream to better match the mixed output. Other
+ aspects are controlling the most suitable frame rate, picture
+ resolution, and the trade-off between frame rate and spatial
+ quality. The middlebox has the responsibility to correctly
+ perform congestion control, identify sources, and manage
+ synchronization while providing the application with suitable
+ media optimizations. The middlebox also has to be a trusted node
+ when it comes to security, since it manipulates either the RTP
+ header or the media itself (or both) received from one endpoint
+ before sending them on towards the endpoint(s); thus, it needs to
+ be able to decrypt and then re-encrypt the RTP packet stream
+ before sending it out.
+
+ Mixers are expected to not forward RTCP reports regarding RTP
+ packet streams across themselves. This is due to the difference
+ between the RTP packet streams provided to the different
+ endpoints. The original media source lacks information about a
+ mixer's manipulations prior to being sent to the different
+ receivers. This scenario also results in an endpoint's feedback
+ or requests going to the mixer. When the mixer can't act on this
+ by itself, it is forced to go to the original media source to
+ fulfill the receiver's request. This will not necessarily be
+ explicitly visible in any RTP and RTCP traffic, but the
+ interactions and the time to complete them will indicate such
+ dependencies.
+
+ Providing source authentication in multiparty scenarios is a
+ challenge. In the mixer-based topologies, an endpoint's source
+ authentication is based on, firstly, verifying that media comes
+ from the mixer by cryptographic verification and, secondly, trust
+ in the mixer to correctly identify any source towards the
+ endpoint. In RTP sessions where multiple endpoints are directly
+ visible to an endpoint, all endpoints will have knowledge about
+ each others' master keys and can thus inject packets claiming to
+ come from another endpoint in the session. Any node performing
+ relay can perform noncryptographic mitigation by preventing
+ forwarding of packets that have SSRC fields that came from other
+ endpoints before. For cryptographic verification of the source,
+ SRTP would require additional security mechanisms -- for example,
+ Timed Efficient Stream Loss-Tolerant Authentication (TESLA) for
+ SRTP [RFC4383] -- that are not part of the base WebRTC standards.
+
+ * To forward media between multiple peers:
+
+ It is sometimes desirable for an endpoint that receives an RTP
+ packet stream to be able to forward that RTP packet stream to a
+ third party. There are some obvious security and privacy
+ implications in supporting this, but also potential uses. This is
+ supported in the W3C API by taking the received and decoded media
+ and using it as a media source that is re-encoded and transmitted
+ as a new stream.
+
+ At the RTP layer, media forwarding acts as a back-to-back RTP
+ receiver and RTP sender. The receiving side terminates the RTP
+ session and decodes the media, while the sender side re-encodes
+ and transmits the media using an entirely separate RTP session.
+ The original sender will only see a single receiver of the media,
+ and will not be able to tell that forwarding is happening based on
+ RTP-layer information, since the RTP session that is used to send
+ the forwarded media is not connected to the RTP session on which
+ the media was received by the node doing the forwarding.
+
+ The endpoint that is performing the forwarding is responsible for
+ producing an RTP packet stream suitable for onwards transmission.
+ The outgoing RTP session that is used to send the forwarded media
+ is entirely separate from the RTP session on which the media was
+ received. This will require media transcoding for congestion
+ control purposes to produce a suitable bitrate for the outgoing
+ RTP session, reducing media quality and forcing the forwarding
+ endpoint to spend resources on the transcoding. The media
+ transcoding does result in a separation of the two different legs,
+ removing almost all dependencies, and allowing the forwarding
+ endpoint to optimize its media transcoding operation. The cost is
+ greatly increased computational complexity on the forwarding node.
+ Receivers of the forwarded stream will see the forwarding device
+ as the sender of the stream and will not be able to tell from the
+ RTP layer that they are receiving a forwarded stream rather than
+ an entirely new RTP packet stream generated by the forwarding
+ device.
+
+12.1.3. Differentiated Treatment of RTP Streams
+
+ There are use cases for differentiated treatment of RTP packet
+ streams. Such differentiation can happen at several places in the
+ system. First of all is the prioritization within the endpoint
+ sending the media, which controls both which RTP packet streams will
+ be sent and their allocation of bitrate out of the current available
+ aggregate, as determined by the congestion control.
+
+ It is expected that the WebRTC API [W3C.WebRTC] will allow the
+ application to indicate relative priorities for different
+ MediaStreamTracks. These priorities can then be used to influence
+ the local RTP processing, especially when it comes to determining how
+ to divide the available bandwidth between the RTP packet streams for
+ the sake of congestion control. Any changes in relative priority
+ will also need to be considered for RTP packet streams that are
+ associated with the main RTP packet streams, such as redundant
+ streams for RTP retransmission and FEC. The importance of such
+ redundant RTP packet streams is dependent on the media type and codec
+ used, with regard to how robust that codec is against packet loss.
+ However, a default policy might be to use the same priority for a
+ redundant RTP packet stream as for the source RTP packet stream.
+
+ Secondly, the network can prioritize transport-layer flows and
+ subflows, including RTP packet streams. Typically, differential
+ treatment involves two steps: first, identifying whether an IP
+ packet belongs to a class that has to be treated differently;
+ second, applying the actual mechanism that prioritizes packets.
+ Three common methods for classifying IP packets are:
+
+ DiffServ: The endpoint marks a packet with a DiffServ code point to
+ indicate to the network that the packet belongs to a particular
+ class.
+
+ Flow based: Packets that need to be given a particular treatment are
+ identified using a combination of IP and port address.
+
+ Deep packet inspection: A network classifier (DPI) inspects the
+ packet and tries to determine if the packet represents a
+ particular application and type that is to be prioritized.
+
+ Flow-based differentiation will provide the same treatment to all
+ packets within a transport-layer flow, i.e., relative prioritization
+ is not possible. Moreover, if the resources are limited, it might
+ not be possible to provide differential treatment compared to best
+ effort for all the RTP packet streams used in a WebRTC session. The
+ use of flow-based differentiation needs to be coordinated between the
+ WebRTC system and the network(s). The WebRTC endpoint needs to know
+ that flow-based differentiation might be used to provide the
+ separation of the RTP packet streams onto different UDP flows to
+ enable a more granular usage of flow-based differentiation. The
+ flows used, their 5-tuples, and their prioritization will need to be
+ communicated to the network so that it can identify the flows
+ correctly to enable prioritization. No specific protocol support for
+ this is specified.
+
+ DiffServ assumes that either the endpoint or a classifier can mark
+ the packets with an appropriate Differentiated Services Code Point
+ (DSCP) so that the packets are treated according to that marking. If
+ the endpoint is to mark the traffic, two requirements arise in the
+ WebRTC context: 1) The WebRTC endpoint has to know which DSCPs to use
+ and know that it can use them on some set of RTP packet streams. 2)
+ The information needs to be propagated to the operating system when
+ transmitting the packet. Details of this process are outside the
+ scope of this memo and are further discussed in "Differentiated
+ Services Code Point (DSCP) Packet Markings for WebRTC QoS" [RFC8837].
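
As a sketch of requirement 2) above, the following Python fragment
shows one common way an application propagates a DSCP to the operating
system for a UDP socket. The DSCP value used is an illustrative
example; the actual WebRTC markings are defined in [RFC8837], and
platform support for setting the TOS/Traffic Class octet varies.

```python
import socket

# Illustrative DSCP value only; RFC 8837 defines the WebRTC mappings
# (e.g., Expedited Forwarding for low-delay audio).
DSCP_EF = 46

def mark_socket_dscp(sock: socket.socket, dscp: int) -> None:
    """Propagate a DSCP to the OS for outgoing packets on this socket.
    On most platforms, the 6-bit DSCP occupies the upper bits of the
    8-bit IP TOS / Traffic Class octet, hence the shift by 2."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
mark_socket_dscp(sock, DSCP_EF)   # RTP packets sent here carry EF
```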
+
+ Despite the SRTP media encryption, deep packet inspectors will still
+ be fairly capable of classifying the RTP streams. The reason is that
+ SRTP leaves the first 12 bytes of the RTP header unencrypted. This
+ enables easy RTP stream identification using the SSRC and provides
+ the classifier with useful information that can be correlated to
+ determine, for example, the stream's media type. Using packet sizes,
+ reception times, packet inter-spacing, RTP timestamp increments, and
+ sequence numbers, fairly reliable classifications are achieved.
+
+ For packet-based marking schemes, it might be possible to mark
+ individual RTP packets differently based on the relative priority of
+ the RTP payload. For example, video codecs that have I, P, and B
+ pictures could prioritize any payloads carrying only B frames less,
+ as these are less damaging to lose. However, depending on the QoS
+ mechanism and what markings are applied, this can result in not only
+ different packet-drop probabilities but also packet reordering; see
+ [RFC8837] and [RFC7657] for further discussion. As a default policy,
+ all RTP packets related to an RTP packet stream ought to be provided
+ with the same prioritization; per-packet prioritization is outside
+ the scope of this memo but might be specified elsewhere in the future.
+
+ It is also important to consider how RTCP packets associated with a
+ particular RTP packet stream need to be marked. RTCP compound
+ packets with Sender Reports (SRs) ought to be marked with the same
+ priority as the RTP packet stream itself, so the RTCP-based round-
+ trip time (RTT) measurements are done using the same transport-layer
+ flow priority as the RTP packet stream experiences. RTCP compound
+ packets containing an RR packet ought to be sent with the priority
+ used by the majority of the RTP packet streams reported on. RTCP
+ packets containing time-critical feedback packets can use higher
+ priority to improve the timeliness and likelihood of delivery of such
+ feedback.
+
+12.2. Media Source, RTP Streams, and Participant Identification
+
+12.2.1. Media Source Identification
+
+ Each RTP packet stream is identified by a unique synchronization
+ source (SSRC) identifier. The SSRC identifier is carried in each of
+ the RTP packets comprising an RTP packet stream, and is also used to
+ identify that stream in the corresponding RTCP reports. The SSRC is
+ chosen as discussed in Section 4.8. The first stage in
+ demultiplexing RTP and RTCP packets received on a single transport-
+ layer flow at a WebRTC endpoint is to separate the RTP packet streams
+ based on their SSRC value; once that is done, additional
+ demultiplexing steps can determine how and where to render the media.
+
+ RTP allows a mixer, or other RTP-layer middlebox, to combine encoded
+ streams from multiple media sources to form a new encoded stream from
+ a new media source (the mixer). The RTP packets in that new RTP
+ packet stream can include a contributing source (CSRC) list,
+ indicating which original SSRCs contributed to the combined source
+ stream. As described in Section 4.1, implementations need to support
+ reception of RTP data packets containing a CSRC list and RTCP packets
+ that relate to sources present in the CSRC list. The CSRC list can
+ change on a packet-by-packet basis, depending on the mixing operation
+ being performed. Knowledge of what media sources contributed to a
+ particular RTP packet can be important if the user interface
+ indicates which participants are active in the session. Changes in
+ the CSRC list included in packets need to be exposed to the WebRTC
+ application using some API if the application is to be able to track
+ changes in session participation. It is desirable to map CSRC values
+ back into WebRTC MediaStream identities as they cross this API, to
+ avoid exposing the SSRC/CSRC namespace to WebRTC applications.
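
A minimal, non-normative parse of the fixed RTP header (RFC 3550,
Section 5.1) shows where the SSRC and the mixer-inserted CSRC list
discussed above are found:

```python
import struct

def parse_rtp_header(packet: bytes):
    """Minimal parse of the fixed RTP header (RFC 3550, Section 5.1),
    extracting the fields used for demultiplexing and participant
    identification: the SSRC of the sender and the CSRC list that a
    mixer inserts (its length is given by the CC field)."""
    if len(packet) < 12:
        raise ValueError("not an RTP packet")
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = first >> 6
    cc = first & 0x0F                 # number of CSRC entries (0-15)
    payload_type = second & 0x7F
    csrcs = list(struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc]))
    return {"version": version, "pt": payload_type,
            "ssrc": ssrc, "csrcs": csrcs}
```

The `csrcs` list is what would be mapped back to WebRTC MediaStream
identities before crossing the API, as described above.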
+
+ If the mixer-to-client audio level extension [RFC6465] is being used
+ in the session (see Section 5.2.3), the information in the CSRC list
+ is augmented by audio-level information for each contributing source.
+ It is desirable to expose this information to the WebRTC application
+ using some API, after mapping the CSRC values to WebRTC MediaStream
+ identities, so it can be exposed in the user interface.
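
Decoding the mixer-to-client audio-level information is simple; the
sketch below pairs each level octet with its CSRC, under the RFC 6465
encoding (one octet per contributing source, a 7-bit level in -dBov
where 0 is loudest and 127 is silence, high bit reserved). The CSRC
values in the test are arbitrary examples.

```python
def decode_mixer_to_client_levels(ext_payload: bytes, csrcs):
    """Sketch of decoding the mixer-to-client audio-level extension
    (RFC 6465): one octet per CSRC, carrying a 7-bit audio level in
    -dBov (0 loudest, 127 silence); the high bit is reserved. Pairing
    levels with CSRC values lets an application display per-speaker
    activity."""
    levels = {}
    for csrc, octet in zip(csrcs, ext_payload):
        levels[csrc] = -(octet & 0x7F)   # level in dBov, always <= 0
    return levels
```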
+
+12.2.2. SSRC Collision Detection
+
+ The RTP standard requires RTP implementations to have support for
+ detecting and handling SSRC collisions -- i.e., be able to resolve
+ the conflict when two different endpoints use the same SSRC value
+ (see Section 8.2 of [RFC3550]). This requirement also applies to
+ WebRTC endpoints. There are several scenarios where SSRC collisions
+ can occur:
+
+ * In a point-to-point session where each SSRC is associated with
+ either of the two endpoints and the main media-carrying SSRC
+ identifier will be announced in the signaling channel, a collision
+ is less likely to occur due to the information about used SSRCs.
+ If SDP is used, this information is provided by source-specific
+ SDP attributes [RFC5576]. Still, collisions can occur if both
+ endpoints start using a new SSRC identifier prior to having
+ signaled it to the peer and received acknowledgement on the
+ signaling message. "Source-Specific Media Attributes in the
+ Session Description Protocol (SDP)" [RFC5576] contains a mechanism
+ to signal how the endpoint resolved the SSRC collision.
+
+ * SSRC values that have not been signaled could also appear in an
+ RTP session. This is more likely than it appears, since some RTP
+ functions use extra SSRCs to provide their functionality. For
+ example, retransmission data might be transmitted using a separate
+ RTP packet stream that requires its own SSRC, separate from the
+ SSRC of the source RTP packet stream [RFC4588]. In those cases,
+ an endpoint can create a new SSRC that strictly doesn't need to be
+ announced over the signaling channel to function correctly at both
+ the RTP and RTCPeerConnection levels.
+
+ * Multiple endpoints in a multiparty conference can create new
+ sources and signal those towards the RTP middlebox. In cases
+ where the SSRC/CSRC are propagated between the different endpoints
+ from the RTP middlebox, collisions can occur.
+
+ * An RTP middlebox could connect an endpoint's RTCPeerConnection to
+ another RTCPeerConnection from the same endpoint, thus forming a
+ loop where the endpoint will receive its own traffic. While it is
+ clearly considered a bug, it is important that the endpoint be
+ able to recognize and handle the case when it occurs. This case
+ becomes even more problematic when media mixers and such are
+ involved, where the stream received is a different stream but
+ still contains this client's input.
+
+ These SSRC/CSRC collisions can only be handled on the RTP level when
+ the same RTP session is extended across multiple RTCPeerConnections
+ by an RTP middlebox. To resolve the more generic case where multiple
+ RTCPeerConnections are interconnected, identification of the media
+ source or sources that are part of a MediaStreamTrack being
+ propagated across multiple interconnected RTCPeerConnections needs to
+ be preserved across these interconnections.
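
The collision-handling behavior required by Section 8.2 of [RFC3550]
can be sketched as follows; this is an illustrative model, not a
normative algorithm, and the class name is invented for the example:

```python
import secrets
from typing import Optional

class SsrcAllocator:
    """Sketch of local SSRC management with collision handling in the
    spirit of RFC 3550, Section 8.2: remember SSRCs seen from other
    participants, and if one of our own SSRCs collides, renumber our
    stream with a fresh random value."""
    def __init__(self):
        self.local = set()    # SSRCs we are sending with
        self.remote = set()   # SSRCs observed from other participants

    def allocate(self) -> int:
        while True:
            ssrc = secrets.randbits(32)
            if ssrc not in self.local and ssrc not in self.remote:
                self.local.add(ssrc)
                return ssrc

    def on_remote_ssrc(self, ssrc: int) -> Optional[int]:
        """Record a remote SSRC; return a replacement SSRC if a
        collision forced us to renumber one of our own streams."""
        self.remote.add(ssrc)
        if ssrc in self.local:
            self.local.discard(ssrc)
            return self.allocate()
        return None
```

A real endpoint would additionally send an RTCP BYE for the old SSRC
and, where applicable, update the signaled SSRC information.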
+
+12.2.3. Media Synchronization Context
+
+ When an endpoint sends media from more than one media source, it
+ needs to consider if (and which of) these media sources are to be
+ synchronized. In RTP/RTCP, synchronization is provided by having a
+ set of RTP packet streams be indicated as coming from the same
+ synchronization context and logical endpoint by using the same RTCP
+ CNAME identifier.
+
+ The next provision is that the internal clocks of all media sources
+ -- i.e., what drives the RTP timestamp -- can be correlated to a
+ system clock that is provided in RTCP Sender Reports encoded in an
+ NTP format. By correlating all RTP timestamps to a common system
+ clock for all sources, the timing relation of the different RTP
+ packet streams, also across multiple RTP sessions, can be derived at
+ the receiver and, if desired, the streams can be synchronized. The
+ requirement is for the media sender to provide the correlation
+ information; whether or not the information is used is up to the
+ receiver.
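
The correlation provided by RTCP Sender Reports can be sketched as
follows: each SR pairs an RTP timestamp with an NTP-format wallclock
time, letting a receiver map later RTP timestamps onto the common
clock. The numbers below are illustrative, and 32-bit RTP timestamp
wraparound is ignored for brevity.

```python
def rtp_to_wallclock(sr_ntp_seconds: float, sr_rtp_ts: int,
                     rtp_ts: int, clock_rate: int) -> float:
    """Map an RTP timestamp to the sender's wallclock, using the
    (NTP time, RTP timestamp) pair from the most recent RTCP Sender
    Report for that SSRC. Streams whose mapped times match are played
    out together."""
    elapsed = (rtp_ts - sr_rtp_ts) / clock_rate
    return sr_ntp_seconds + elapsed

# Audio (48 kHz clock) and video (90 kHz clock) samples that map to
# the same wallclock instant are synchronized at playout:
audio_time = rtp_to_wallclock(1000.0, 96000, 144000, 48000)
video_time = rtp_to_wallclock(1000.0, 180000, 270000, 90000)
```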
+
+13. Security Considerations
+
+ The overall security architecture for WebRTC is described in
+ [RFC8827], and security considerations for the WebRTC framework are
+ described in [RFC8826]. These considerations also apply to this
+ memo.
+
+ The security considerations of the RTP specification, the RTP/SAVPF
+ profile, and the various RTP/RTCP extensions and RTP payload formats
+ that form the complete protocol suite described in this memo apply.
+ It is believed that there are no new security considerations
+ resulting from the combination of these various protocol extensions.
+
+ "Extended Secure RTP Profile for Real-time Transport Control Protocol
+ (RTCP)-Based Feedback (RTP/SAVPF)" [RFC5124] provides handling of
+ fundamental issues by offering confidentiality, integrity, and
+ partial source authentication. A media-security solution that is
+ mandatory to implement and use is created by combining this secured
+ RTP profile and DTLS-SRTP keying [RFC5764], as defined by Section 5.5
+ of [RFC8827].
+
+ RTCP packets convey a Canonical Name (CNAME) identifier that is used
+ to associate RTP packet streams that need to be synchronized across
+ related RTP sessions. Inappropriate choice of CNAME values can be a
+ privacy concern, since long-term persistent CNAME identifiers can be
+ used to track users across multiple WebRTC calls. Section 4.9 of
+ this memo mandates generation of short-term persistent RTCP CNAMEs,
+ as specified in RFC 7022, resulting in untraceable CNAME values that
+ alleviate this risk.
+
+ Some potential denial-of-service attacks exist if the RTCP reporting
+ interval is configured to an inappropriate value. This could be done
+ by configuring the RTCP bandwidth fraction to an excessively large or
+ small value using the SDP "b=RR:" or "b=RS:" lines [RFC3556] or some
+ similar mechanism, or by choosing an excessively large or small value
+ for the RTP/AVPF minimal receiver report interval (if using SDP, this
+ is the "a=rtcp-fb:... trr-int" parameter) [RFC4585]. The risks are
+ as follows:
+
+ 1. the RTCP bandwidth could be configured to make the regular
+ reporting interval so large that effective congestion control
+ cannot be maintained, potentially leading to denial of service
+ due to congestion caused by the media traffic;
+
+ 2. the RTCP interval could be configured to a very small value,
+ causing endpoints to generate high-rate RTCP traffic, potentially
+ leading to denial of service due to the RTCP traffic not being
+ congestion controlled; and
+
+ 3. RTCP parameters could be configured differently for each
+ endpoint, with some of the endpoints using a large reporting
+ interval and some using a smaller interval, leading to denial of
+ service through premature participant timeouts caused by mismatched
+ timeout periods that are based on the reporting interval. This
+ is a particular concern if endpoints use a small but nonzero
+ value for the RTP/AVPF minimal receiver report interval (trr-int)
+ [RFC4585], as discussed in Section 6.1 of [RFC8108].
+
+ Premature participant timeout can be avoided by using the fixed
+ (nonreduced) minimum interval when calculating the participant
+ timeout (see Section 4.1 of this memo and Section 7.1.2 of
+ [RFC8108]). To address the other concerns, endpoints SHOULD ignore
+ parameters that configure the RTCP reporting interval to be
+ significantly longer than the default five-second interval specified
+ in [RFC3550] (unless the media data rate is so low that the longer
+ reporting interval roughly corresponds to 5% of the media data rate),
+ or that configure the RTCP reporting interval small enough that the
+ RTCP bandwidth would exceed the media bandwidth.
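The SHOULD-level guidance above can be illustrated with a simple sanity check on signalled RTCP bandwidth values. This is a sketch only: the function name, the assumed average compound-packet size, and the exact thresholds are illustrative choices made here, not values taken from any specification.

```python
DEFAULT_RTCP_INTERVAL = 5.0      # seconds; default interval from RFC 3550
AVG_RTCP_PACKET_BITS = 8 * 120   # assumed average compound RTCP packet size

def rtcp_bandwidth_acceptable(rtcp_bw_bps: float,
                              media_bw_bps: float) -> bool:
    """Return False for configurations an endpoint SHOULD ignore."""
    if rtcp_bw_bps <= 0:
        return False
    # Interval too small: RTCP traffic would exceed the media bandwidth.
    if rtcp_bw_bps > media_bw_bps:
        return False
    # Reporting interval implied by the configured bandwidth (one
    # participant sending one average-sized compound packet per interval).
    implied_interval = AVG_RTCP_PACKET_BITS / rtcp_bw_bps
    # Interval too large: significantly longer than the five-second
    # default, without the excuse that the media rate is so low that
    # RTCP at roughly 5% of the media rate naturally yields a long
    # interval.
    if (implied_interval > DEFAULT_RTCP_INTERVAL
            and rtcp_bw_bps < 0.05 * media_bw_bps):
        return False
    return True
```

An endpoint applying such a check would fall back to its default RTCP parameters when a peer signals values outside these bounds, rather than rejecting the session.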
+
+ The guidelines in [RFC6562] apply when using variable bitrate (VBR)
+ audio codecs such as Opus (see Section 4.3 for discussion of mandated
+ audio codecs). The guidelines in [RFC6562] also apply, but are of
+ lesser importance, when using the client-to-mixer audio level header
+ extensions (Section 5.2.2) or the mixer-to-client audio level header
+ extensions (Section 5.2.3). The use of encryption for these header
+ extensions is RECOMMENDED, unless there are known reasons not to
+ encrypt, such as RTP middleboxes performing voice-activity-based
+ source selection or third-party monitoring that will greatly benefit
+ from the information, and this has been expressed using the API or
+ signaling. If
+ further evidence is produced to show that information leakage is
+ significant from audio-level indications, then use of encryption
+ needs to be mandated at that time.
+
+ In multiparty communication scenarios using RTP middleboxes,
+ significant trust is placed in these middleboxes to preserve the
+ session's security. The middlebox needs to maintain confidentiality
+ and integrity and perform source authentication. As discussed in
+ Section 12.1.1, the middlebox can perform checks that prevent any
+ endpoint participating in a conference from impersonating another.
+ Some additional security considerations regarding multiparty
+ topologies can be found in [RFC7667].
+
+14. IANA Considerations
+
+ This document has no IANA actions.
+
+15. References
+
+15.1. Normative References
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119,
+ DOI 10.17487/RFC2119, March 1997,
+ <https://www.rfc-editor.org/info/rfc2119>.
+
+ [RFC2736] Handley, M. and C. Perkins, "Guidelines for Writers of RTP
+ Payload Format Specifications", BCP 36, RFC 2736,
+ DOI 10.17487/RFC2736, December 1999,
+ <https://www.rfc-editor.org/info/rfc2736>.
+
+ [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
+ Jacobson, "RTP: A Transport Protocol for Real-Time
+ Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
+ July 2003, <https://www.rfc-editor.org/info/rfc3550>.
+
+ [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
+ Video Conferences with Minimal Control", STD 65, RFC 3551,
+ DOI 10.17487/RFC3551, July 2003,
+ <https://www.rfc-editor.org/info/rfc3551>.
+
+ [RFC3556] Casner, S., "Session Description Protocol (SDP) Bandwidth
+ Modifiers for RTP Control Protocol (RTCP) Bandwidth",
+ RFC 3556, DOI 10.17487/RFC3556, July 2003,
+ <https://www.rfc-editor.org/info/rfc3556>.
+
+ [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+ Norrman, "The Secure Real-time Transport Protocol (SRTP)",
+ RFC 3711, DOI 10.17487/RFC3711, March 2004,
+ <https://www.rfc-editor.org/info/rfc3711>.
+
+ [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
+ Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
+ July 2006, <https://www.rfc-editor.org/info/rfc4566>.
+
+ [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
+ "Extended RTP Profile for Real-time Transport Control
+ Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
+ DOI 10.17487/RFC4585, July 2006,
+ <https://www.rfc-editor.org/info/rfc4585>.
+
+ [RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
+ Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
+ DOI 10.17487/RFC4588, July 2006,
+ <https://www.rfc-editor.org/info/rfc4588>.
+
+ [RFC4961] Wing, D., "Symmetric RTP / RTP Control Protocol (RTCP)",
+ BCP 131, RFC 4961, DOI 10.17487/RFC4961, July 2007,
+ <https://www.rfc-editor.org/info/rfc4961>.
+
+ [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
+ "Codec Control Messages in the RTP Audio-Visual Profile
+ with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
+ February 2008, <https://www.rfc-editor.org/info/rfc5104>.
+
+ [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for
+ Real-time Transport Control Protocol (RTCP)-Based Feedback
+ (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
+ 2008, <https://www.rfc-editor.org/info/rfc5124>.
+
+ [RFC5506] Johansson, I. and M. Westerlund, "Support for Reduced-Size
+ Real-Time Transport Control Protocol (RTCP): Opportunities
+ and Consequences", RFC 5506, DOI 10.17487/RFC5506, April
+ 2009, <https://www.rfc-editor.org/info/rfc5506>.
+
+ [RFC5761] Perkins, C. and M. Westerlund, "Multiplexing RTP Data and
+ Control Packets on a Single Port", RFC 5761,
+ DOI 10.17487/RFC5761, April 2010,
+ <https://www.rfc-editor.org/info/rfc5761>.
+
+ [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer
+ Security (DTLS) Extension to Establish Keys for the Secure
+ Real-time Transport Protocol (SRTP)", RFC 5764,
+ DOI 10.17487/RFC5764, May 2010,
+ <https://www.rfc-editor.org/info/rfc5764>.
+
+ [RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
+ Flows", RFC 6051, DOI 10.17487/RFC6051, November 2010,
+ <https://www.rfc-editor.org/info/rfc6051>.
+
+ [RFC6464] Lennox, J., Ed., Ivov, E., and E. Marocco, "A Real-time
+ Transport Protocol (RTP) Header Extension for Client-to-
+ Mixer Audio Level Indication", RFC 6464,
+ DOI 10.17487/RFC6464, December 2011,
+ <https://www.rfc-editor.org/info/rfc6464>.
+
+ [RFC6465] Ivov, E., Ed., Marocco, E., Ed., and J. Lennox, "A Real-
+ time Transport Protocol (RTP) Header Extension for Mixer-
+ to-Client Audio Level Indication", RFC 6465,
+ DOI 10.17487/RFC6465, December 2011,
+ <https://www.rfc-editor.org/info/rfc6465>.
+
+ [RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of
+ Variable Bit Rate Audio with Secure RTP", RFC 6562,
+ DOI 10.17487/RFC6562, March 2012,
+ <https://www.rfc-editor.org/info/rfc6562>.
+
+ [RFC6904] Lennox, J., "Encryption of Header Extensions in the Secure
+ Real-time Transport Protocol (SRTP)", RFC 6904,
+ DOI 10.17487/RFC6904, April 2013,
+ <https://www.rfc-editor.org/info/rfc6904>.
+
+ [RFC7007] Terriberry, T., "Update to Remove DVI4 from the
+ Recommended Codecs for the RTP Profile for Audio and Video
+ Conferences with Minimal Control (RTP/AVP)", RFC 7007,
+ DOI 10.17487/RFC7007, August 2013,
+ <https://www.rfc-editor.org/info/rfc7007>.
+
+ [RFC7022] Begen, A., Perkins, C., Wing, D., and E. Rescorla,
+ "Guidelines for Choosing RTP Control Protocol (RTCP)
+ Canonical Names (CNAMEs)", RFC 7022, DOI 10.17487/RFC7022,
+ September 2013, <https://www.rfc-editor.org/info/rfc7022>.
+
+ [RFC7160] Petit-Huguenin, M. and G. Zorn, Ed., "Support for Multiple
+ Clock Rates in an RTP Session", RFC 7160,
+ DOI 10.17487/RFC7160, April 2014,
+ <https://www.rfc-editor.org/info/rfc7160>.
+
+ [RFC7164] Gross, K. and R. van Brandenburg, "RTP and Leap Seconds",
+ RFC 7164, DOI 10.17487/RFC7164, March 2014,
+ <https://www.rfc-editor.org/info/rfc7164>.
+
+ [RFC7742] Roach, A.B., "WebRTC Video Processing and Codec
+ Requirements", RFC 7742, DOI 10.17487/RFC7742, March 2016,
+ <https://www.rfc-editor.org/info/rfc7742>.
+
+ [RFC7874] Valin, JM. and C. Bran, "WebRTC Audio Codec and Processing
+ Requirements", RFC 7874, DOI 10.17487/RFC7874, May 2016,
+ <https://www.rfc-editor.org/info/rfc7874>.
+
+ [RFC8083] Perkins, C. and V. Singh, "Multimedia Congestion Control:
+ Circuit Breakers for Unicast RTP Sessions", RFC 8083,
+ DOI 10.17487/RFC8083, March 2017,
+ <https://www.rfc-editor.org/info/rfc8083>.
+
+ [RFC8108] Lennox, J., Westerlund, M., Wu, Q., and C. Perkins,
+ "Sending Multiple RTP Streams in a Single RTP Session",
+ RFC 8108, DOI 10.17487/RFC8108, March 2017,
+ <https://www.rfc-editor.org/info/rfc8108>.
+
+ [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+ 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+ May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
+ [RFC8285] Singer, D., Desineni, H., and R. Even, Ed., "A General
+ Mechanism for RTP Header Extensions", RFC 8285,
+ DOI 10.17487/RFC8285, October 2017,
+ <https://www.rfc-editor.org/info/rfc8285>.
+
+ [RFC8825] Alvestrand, H., "Overview: Real-Time Protocols for
+ Browser-Based Applications", RFC 8825,
+ DOI 10.17487/RFC8825, January 2021,
+ <https://www.rfc-editor.org/info/rfc8825>.
+
+ [RFC8826] Rescorla, E., "Security Considerations for WebRTC",
+ RFC 8826, DOI 10.17487/RFC8826, January 2021,
+ <https://www.rfc-editor.org/info/rfc8826>.
+
+ [RFC8827] Rescorla, E., "WebRTC Security Architecture", RFC 8827,
+ DOI 10.17487/RFC8827, January 2021,
+ <https://www.rfc-editor.org/info/rfc8827>.
+
+ [RFC8843] Holmberg, C., Alvestrand, H., and C. Jennings,
+ "Negotiating Media Multiplexing Using the Session
+ Description Protocol (SDP)", RFC 8843,
+ DOI 10.17487/RFC8843, January 2021,
+ <https://www.rfc-editor.org/info/rfc8843>.
+
+ [RFC8854] Uberti, J., "WebRTC Forward Error Correction
+ Requirements", RFC 8854, DOI 10.17487/RFC8854, January
+ 2021, <https://www.rfc-editor.org/info/rfc8854>.
+
+ [RFC8858] Holmberg, C., "Indicating Exclusive Support of RTP and RTP
+ Control Protocol (RTCP) Multiplexing Using the Session
+ Description Protocol (SDP)", RFC 8858,
+ DOI 10.17487/RFC8858, January 2021,
+ <https://www.rfc-editor.org/info/rfc8858>.
+
+ [RFC8860] Westerlund, M., Perkins, C., and J. Lennox, "Sending
+ Multiple Types of Media in a Single RTP Session",
+ RFC 8860, DOI 10.17487/RFC8860, January 2021,
+ <https://www.rfc-editor.org/info/rfc8860>.
+
+ [RFC8861] Lennox, J., Westerlund, M., Wu, Q., and C. Perkins,
+ "Sending Multiple RTP Streams in a Single RTP Session:
+ Grouping RTP Control Protocol (RTCP) Reception Statistics
+ and Other Feedback", RFC 8861, DOI 10.17487/RFC8861,
+ January 2021, <https://www.rfc-editor.org/info/rfc8861>.
+
+ [W3C.WD-mediacapture-streams]
+ Jennings, C., Aboba, B., Bruaroey, J-I., and H. Boström,
+ "Media Capture and Streams", W3C Candidate Recommendation,
+ <https://www.w3.org/TR/mediacapture-streams/>.
+
+ [W3C.WebRTC]
+ Jennings, C., Boström, H., and J-I. Bruaroey, "WebRTC 1.0:
+ Real-time Communication Between Browsers", W3C Proposed
+ Recommendation, <https://www.w3.org/TR/webrtc/>.
+
+15.2. Informative References
+
+ [RFC3611] Friedman, T., Ed., Caceres, R., Ed., and A. Clark, Ed.,
+ "RTP Control Protocol Extended Reports (RTCP XR)",
+ RFC 3611, DOI 10.17487/RFC3611, November 2003,
+ <https://www.rfc-editor.org/info/rfc3611>.
+
+ [RFC4383] Baugher, M. and E. Carrara, "The Use of Timed Efficient
+ Stream Loss-Tolerant Authentication (TESLA) in the Secure
+ Real-time Transport Protocol (SRTP)", RFC 4383,
+ DOI 10.17487/RFC4383, February 2006,
+ <https://www.rfc-editor.org/info/rfc4383>.
+
+ [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific
+ Media Attributes in the Session Description Protocol
+ (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009,
+ <https://www.rfc-editor.org/info/rfc5576>.
+
+ [RFC5968] Ott, J. and C. Perkins, "Guidelines for Extending the RTP
+ Control Protocol (RTCP)", RFC 5968, DOI 10.17487/RFC5968,
+ September 2010, <https://www.rfc-editor.org/info/rfc5968>.
+
+ [RFC6263] Marjou, X. and A. Sollaud, "Application Mechanism for
+ Keeping Alive the NAT Mappings Associated with RTP / RTP
+ Control Protocol (RTCP) Flows", RFC 6263,
+ DOI 10.17487/RFC6263, June 2011,
+ <https://www.rfc-editor.org/info/rfc6263>.
+
+ [RFC6792] Wu, Q., Ed., Hunt, G., and P. Arden, "Guidelines for Use
+ of the RTP Monitoring Framework", RFC 6792,
+ DOI 10.17487/RFC6792, November 2012,
+ <https://www.rfc-editor.org/info/rfc6792>.
+
+ [RFC7478] Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real-
+ Time Communication Use Cases and Requirements", RFC 7478,
+ DOI 10.17487/RFC7478, March 2015,
+ <https://www.rfc-editor.org/info/rfc7478>.
+
+ [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
+ B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
+ for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
+ DOI 10.17487/RFC7656, November 2015,
+ <https://www.rfc-editor.org/info/rfc7656>.
+
+ [RFC7657] Black, D., Ed. and P. Jones, "Differentiated Services
+ (Diffserv) and Real-Time Communication", RFC 7657,
+ DOI 10.17487/RFC7657, November 2015,
+ <https://www.rfc-editor.org/info/rfc7657>.
+
+ [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
+ DOI 10.17487/RFC7667, November 2015,
+ <https://www.rfc-editor.org/info/rfc7667>.
+
+ [RFC8088] Westerlund, M., "How to Write an RTP Payload Format",
+ RFC 8088, DOI 10.17487/RFC8088, May 2017,
+ <https://www.rfc-editor.org/info/rfc8088>.
+
+ [RFC8445] Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive
+ Connectivity Establishment (ICE): A Protocol for Network
+ Address Translator (NAT) Traversal", RFC 8445,
+ DOI 10.17487/RFC8445, July 2018,
+ <https://www.rfc-editor.org/info/rfc8445>.
+
+ [RFC8829] Uberti, J., Jennings, C., and E. Rescorla, Ed.,
+ "JavaScript Session Establishment Protocol (JSEP)",
+ RFC 8829, DOI 10.17487/RFC8829, January 2021,
+ <https://www.rfc-editor.org/info/rfc8829>.
+
+ [RFC8830] Alvestrand, H., "WebRTC MediaStream Identification in the
+ Session Description Protocol", RFC 8830,
+ DOI 10.17487/RFC8830, January 2021,
+ <https://www.rfc-editor.org/info/rfc8830>.
+
+ [RFC8836] Jesup, R. and Z. Sarker, Ed., "Congestion Control
+ Requirements for Interactive Real-Time Media", RFC 8836,
+ DOI 10.17487/RFC8836, January 2021,
+ <https://www.rfc-editor.org/info/rfc8836>.
+
+ [RFC8837] Jones, P., Dhesikan, S., Jennings, C., and D. Druta,
+ "Differentiated Services Code Point (DSCP) Packet Markings
+ for WebRTC QoS", RFC 8837, DOI 10.17487/RFC8837, January
+ 2021, <https://www.rfc-editor.org/info/rfc8837>.
+
+ [RFC8872] Westerlund, M., Burman, B., Perkins, C., Alvestrand, H.,
+ and R. Even, "Guidelines for Using the Multiplexing
+ Features of RTP to Support Multiple Media Streams",
+ RFC 8872, DOI 10.17487/RFC8872, January 2021,
+ <https://www.rfc-editor.org/info/rfc8872>.
+
+Acknowledgements
+
+ The authors would like to thank Bernard Aboba, Harald Alvestrand,
+ Cary Bran, Ben Campbell, Alissa Cooper, Spencer Dawkins, Charles
+ Eckel, Alex Eleftheriadis, Christian Groves, Chris Inacio, Cullen
+ Jennings, Olle Johansson, Suhas Nandakumar, Dan Romascanu, Jim
+ Spring, Martin Thomson, and the other members of the IETF RTCWEB
+ working group for their valuable feedback.
+
+Authors' Addresses
+
+ Colin Perkins
+ University of Glasgow
+ School of Computing Science
+ Glasgow
+ G12 8QQ
+ United Kingdom
+
+ Email: csp@csperkins.org
+ URI: https://csperkins.org/
+
+
+ Magnus Westerlund
+ Ericsson
+ Torshamnsgatan 23
+ SE-164 80 Kista
+ Sweden
+
+ Email: magnus.westerlund@ericsson.com
+
+
+ Jörg Ott
+ Technical University Munich
+ Department of Informatics
+ Chair of Connected Mobility
+ Boltzmannstrasse 3
+ 85748 Garching
+ Germany
+
+ Email: ott@in.tum.de