Diffstat (limited to 'doc/rfc/rfc8872.txt')
-rw-r--r-- | doc/rfc/rfc8872.txt | 2046 |
1 files changed, 2046 insertions, 0 deletions
diff --git a/doc/rfc/rfc8872.txt b/doc/rfc/rfc8872.txt new file mode 100644 index 0000000..326b481 --- /dev/null +++ b/doc/rfc/rfc8872.txt @@ -0,0 +1,2046 @@ + + + + +Internet Engineering Task Force (IETF) M. Westerlund +Request for Comments: 8872 B. Burman +Category: Informational Ericsson +ISSN: 2070-1721 C. Perkins + University of Glasgow + H. Alvestrand + Google + R. Even + January 2021 + + + Guidelines for Using the Multiplexing Features of RTP to Support + Multiple Media Streams + +Abstract + + The Real-time Transport Protocol (RTP) is a flexible protocol that + can be used in a wide range of applications, networks, and system + topologies. That flexibility makes for wide applicability but can + complicate the application design process. One particular design + question that has received much attention is how to support multiple + media streams in RTP. This memo discusses the available options and + design trade-offs, and provides guidelines on how to use the + multiplexing features of RTP to support multiple media streams. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Not all documents + approved by the IESG are candidates for any level of Internet + Standard; see Section 2 of RFC 7841. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + https://www.rfc-editor.org/info/rfc8872. + +Copyright Notice + + Copyright (c) 2021 IETF Trust and the persons identified as the + document authors. All rights reserved. 
+ + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + +Table of Contents + + 1. Introduction + 2. Definitions + 2.1. Terminology + 2.2. Focus of This Document + 3. RTP Multiplexing Overview + 3.1. Reasons for Multiplexing and Grouping RTP Streams + 3.2. RTP Multiplexing Points + 3.2.1. RTP Session + 3.2.2. Synchronization Source (SSRC) + 3.2.3. Contributing Source (CSRC) + 3.2.4. RTP Payload Type + 3.3. Issues Related to RTP Topologies + 3.4. Issues Related to RTP and RTCP + 3.4.1. The RTP Specification + 3.4.2. Multiple SSRCs in a Session + 3.4.3. Binding Related Sources + 3.4.4. Forward Error Correction + 4. Considerations for RTP Multiplexing + 4.1. Interworking Considerations + 4.1.1. Application Interworking + 4.1.2. RTP Translator Interworking + 4.1.3. Gateway Interworking + 4.1.4. Legacy Considerations for Multiple SSRCs + 4.2. Network Considerations + 4.2.1. Quality of Service + 4.2.2. NAT and Firewall Traversal + 4.2.3. Multicast + 4.3. Security and Key-Management Considerations + 4.3.1. Security Context Scope + 4.3.2. Key Management for Multi-party Sessions + 4.3.3. Complexity Implications + 5. RTP Multiplexing Design Choices + 5.1. Multiple Media Types in One Session + 5.2. Multiple SSRCs of the Same Media Type + 5.3. Multiple Sessions for One Media Type + 5.4. Single SSRC per Endpoint + 5.5. Summary + 6. Guidelines + 7. IANA Considerations + 8. Security Considerations + 9. References + 9.1. Normative References + 9.2. Informative References + Appendix A. 
Dismissing Payload Type Multiplexing + Appendix B. Signaling Considerations + B.1. Session-Oriented Properties + B.2. SDP Prevents Multiple Media Types + B.3. Signaling RTP Stream Usage + Acknowledgments + Contributors + Authors' Addresses + +1. Introduction + + The Real-time Transport Protocol (RTP) [RFC3550] is a commonly used + protocol for real-time media transport. It is a protocol that + provides great flexibility and can support a large set of different + applications. From the beginning, RTP was designed for multiple + participants in a communication session. It supports many topology + paradigms and usages, as defined in [RFC7667]. RTP has several + multiplexing points designed for different purposes; these points + enable support of multiple RTP streams and switching between + different encoding or packetization techniques for the media. By + using multiple RTP sessions, sets of RTP streams can be structured + for efficient processing or identification. Thus, to meet an + application's needs, an RTP application designer needs to understand + how best to use the RTP session, the RTP stream identifier + (synchronization source (SSRC)), and the RTP payload type. + + There has been increased interest in more-advanced usage of RTP. For + example, multiple RTP streams can be used when a single endpoint has + multiple media sources (like multiple cameras or microphones) from + which streams of media need to be sent simultaneously. Consequently, + questions are raised regarding the most appropriate RTP usage. The + limitations in some implementations, RTP/RTCP extensions, and + signaling have also been exposed. This document aims to clarify the + usefulness of some functionalities in RTP that, hopefully, will + result in future implementations that are more complete. + + The purpose of this document is to provide clear information about + the possibilities of RTP when it comes to multiplexing. 
The RTP + application designer needs to understand the implications arising + from a particular usage of the RTP multiplexing points. This + document provides some guidelines and recommends against some usages + as being unsuitable, in general or for particular purposes. + + This document starts with some definitions and then goes into + existing RTP functionalities around multiplexing. Both the desired + behavior and the implications of a particular behavior depend on + which topologies are used; therefore, this topic requires some + consideration. We then discuss some choices regarding multiplexing + behavior and the impacts of those choices. Some designs of RTP usage + are also discussed. Finally, some guidelines and examples are + provided. + +2. Definitions + +2.1. Terminology + + The definitions in Section 3 of [RFC3550] are referenced normatively. + + The taxonomy defined in [RFC7656] is referenced normatively. + + The following terms and abbreviations are used in this document: + + Multi-party: + Communication that includes multiple endpoints. In this document, + "multi-party" will be used to refer to scenarios where more than + two endpoints communicate. + + Multiplexing: + An operation that takes multiple entities as input, aggregating + them onto some common resource while keeping the individual + entities addressable such that they can later be fully and + unambiguously separated (demultiplexed) again. + + RTP Receiver: + An endpoint or middlebox receiving RTP streams and RTCP messages. + It uses at least one SSRC to send RTCP messages. An RTP receiver + may also be an RTP sender. + + RTP Sender: + An endpoint sending one or more RTP streams but also sending RTCP + messages. + + RTP Session Group: + One or more RTP sessions that are used together to perform some + function. Examples include multiple RTP sessions used to carry + different layers of a layered encoding. 
In an RTP Session Group, + CNAMEs are assumed to be valid across all RTP sessions and + designate synchronization contexts that can cross RTP sessions; + i.e., SSRCs that map to a common CNAME can be assumed to have RTCP + Sender Report (SR) timing information derived from a common clock + such that they can be synchronized for playout. + + Signaling: + The process of configuring endpoints to participate in one or more + RTP sessions. + + | Note: The above definitions of "RTP receiver" and "RTP sender" + | are consistent with the usage in [RFC3550]. + +2.2. Focus of This Document + + This document is focused on issues that affect RTP. Thus, issues + that involve signaling protocols -- such as whether SIP [RFC3261], + Jingle [JINGLE], or some other protocol is in use for session + configuration; the particular syntaxes used to define RTP session + properties; or the constraints imposed by particular choices in the + signaling protocols -- are mentioned only as examples in order to + describe the RTP issues more precisely. + + This document assumes that the applications will use RTCP. While + there are applications that don't send RTCP, they do not conform to + the RTP specification and thus can be regarded as reusing the RTP + packet format but not implementing RTP. + +3. RTP Multiplexing Overview + +3.1. Reasons for Multiplexing and Grouping RTP Streams + + There are several reasons why an endpoint might choose to send + multiple media streams. In the discussion below, please keep in mind + that the reasons for having multiple RTP streams vary and include, + but are not limited to, the following: + + * There might be multiple media sources. 
+ + * Multiple RTP streams might be needed to represent one media + source, for example: + + - To carry different layers of a scalable encoding of a media + source + + - Alternative encodings during simulcast, using different codecs + for the same audio stream + + - Alternative formats during simulcast, multiple resolutions of + the same video stream + + * A retransmission stream might repeat some parts of the content of + another RTP stream. + + * A Forward Error Correction (FEC) stream might provide material + that can be used to repair another RTP stream. + + For each of these reasons, it is necessary to decide whether each + additional RTP stream is sent within the same RTP session as the + other RTP streams or it is necessary to use additional RTP sessions + to group the RTP streams. For a combination of reasons, the suitable + choice for one situation might not be the suitable choice for another + situation. The choice is easiest when multiplexing multiple media + sources of the same media type. However, all reasons warrant + discussion and clarification regarding how to deal with them. As the + discussion below will show, a single solution does not suit all + purposes. To utilize RTP well and as efficiently as possible, both + are needed. The real issue is knowing when to create multiple RTP + sessions versus when to send multiple RTP streams in a single RTP + session. + +3.2. RTP Multiplexing Points + + This section describes the multiplexing points present in RTP that + can be used to distinguish RTP streams and groups of RTP streams. + Figure 1 outlines the process of demultiplexing incoming RTP streams, + starting with one or more sockets representing the reception of one + or more transport flows, e.g., based on the UDP destination port. It + also demultiplexes RTP/RTCP from any other protocols, such as Session + Traversal Utilities for NAT (STUN) [RFC5389] and DTLS-SRTP [RFC5764] + on the same transport as described in [RFC7983]. 
The Processing and + Buffering (PB) step in Figure 1 terminates RTP/RTCP and prepares the + RTP payload for input to the decoder. + + | | | + | | | packets + +-- v v v + | +------------+ + | | Socket(s) | Transport Protocol Demultiplexing + | +------------+ + | || || + RTP | RTP/ || |+-----> DTLS (SRTP keying, SCTP, etc.) + Session | RTCP || +------> STUN (multiplexed using same port) + +-- || + +-- || + | ++(split by SSRC)-++---> Identify SSRC collision + | || || || || + | (associate with signaling by MID/RID) + | vv vv vv vv + RTP | +--+ +--+ +--+ +--+ Jitter buffer, + Streams | |PB| |PB| |PB| |PB| process RTCP, etc. + | +--+ +--+ +--+ +--+ + +-- | | | | + (select decoder based on payload type (PT)) + +-- | / | / + | +-----+ | / + | / | |/ + Payload | v v v + Formats | +---+ +---+ +---+ + | |Dec| |Dec| |Dec| Decoders + | +---+ +---+ +---+ + +-- + + Figure 1: RTP Demultiplexing Process + +3.2.1. RTP Session + + An RTP session is the highest semantic layer in RTP and represents an + association between a group of communicating endpoints. RTP does not + contain a session identifier, yet different RTP sessions must be + possible to identify both across a set of different endpoints and + from the perspective of a single endpoint. + + For RTP session separation across endpoints, the set of participants + that form an RTP session is defined as those that share a single SSRC + space [RFC3550]. That is, if a group of participants are each aware + of the SSRC identifiers belonging to the other participants, then + those participants are in a single RTP session. A participant can + become aware of an SSRC identifier by receiving an RTP packet + containing the identifier in the SSRC field or contributing source + (CSRC) list, by receiving an RTCP packet listing it in an SSRC field, + or through signaling (e.g., the Session Description Protocol (SDP) + [RFC4566] "a=ssrc:" attribute [RFC5576]). 
Thus, the scope of an RTP + session is determined by the participants' network interconnection + topology, in combination with RTP and RTCP forwarding strategies + deployed by the endpoints and any middleboxes, and by the signaling. + + For RTP session separation within a single endpoint, RTP relies on + the underlying transport layer and the signaling to identify RTP + sessions in a manner that is meaningful to the application. A single + endpoint can have one or more transport flows for the same RTP + session, and a single RTP session can span multiple transport-layer + flows even if all endpoints use a single transport-layer flow per + endpoint for that RTP session. The signaling layer might give RTP + sessions an explicit identifier, or the identification might be + implicit based on the addresses and ports used. Accordingly, a + single RTP session can have multiple associated identifiers, explicit + and implicit, belonging to different contexts. For example, when + running RTP on top of UDP/IP, an endpoint can identify and delimit an + RTP session from other RTP sessions by their UDP source and + destination IP addresses and their UDP port numbers. A single RTP + session can be using multiple IP/UDP flows for receiving and/or + sending RTP packets to other endpoints or middleboxes, even if the + endpoint does not have multiple IP addresses. Using multiple IP + addresses only makes it more likely that multiple IP/UDP flows will + be required. Another example is SDP media descriptions (the "m=" + line and the subsequent associated lines) that signal the transport + flow and RTP session configuration for the endpoint's part of the RTP + session. The SDP grouping framework [RFC5888] allows labeling of the + media descriptions to be used so that RTP Session Groups can be + created. 
Through the use of "Negotiating Media Multiplexing Using + the Session Description Protocol (SDP)" [RFC8843], multiple media + descriptions become part of a common RTP session where each media + description represents the RTP streams sent or received for a media + source. + + RTP makes no normative statements about the relationship between + different RTP sessions; however, applications that use more than one + RTP session need to understand how the different RTP sessions that + they create relate to one another. + +3.2.2. Synchronization Source (SSRC) + + An SSRC identifies a source of an RTP stream, or an RTP receiver when + sending RTCP. Every endpoint has at least one SSRC identifier, even + if it does not send RTP packets. RTP endpoints that are only RTP + receivers still send RTCP and use their SSRC identifiers in the RTCP + packets they send. An endpoint can have multiple SSRC identifiers if + it sends multiple RTP streams. Endpoints that function as both RTP + sender and RTP receiver use the same SSRC(s) in both roles. + + The SSRC is a 32-bit identifier. It is present in every RTP and RTCP + packet header and in the payload of some RTCP packet types. It can + also be present in SDP signaling. Unless presignaled, e.g., using + the SDP "a=ssrc:" attribute [RFC5576], the SSRC is chosen at random. + It is not dependent on the network address of the endpoint and is + intended to be unique within an RTP session. SSRC collisions can + occur and are handled as specified in [RFC3550] and [RFC5576], + resulting in the SSRC of the colliding RTP streams or receivers + changing. An endpoint that changes its network transport address + during a session has to choose a new SSRC identifier to avoid being + interpreted as a looped source, unless a mechanism providing a + virtual transport (such as Interactive Connectivity Establishment + (ICE) [RFC8445]) abstracts the changes. 
+ + SSRC identifiers that belong to the same synchronization context + (i.e., that represent RTP streams that can be synchronized using + information in RTCP SR packets) use identical CNAME chunks in + corresponding RTCP source description (SDES) packets. SDP signaling + can also be used to provide explicit SSRC grouping [RFC5576]. + + In some cases, the same SSRC identifier value is used to relate + streams in two different RTP sessions, such as in RTP retransmission + [RFC4588]. This is to be avoided, since there is no guarantee that + SSRC values are unique across RTP sessions. In the case of RTP + retransmission [RFC4588], it is recommended to use explicit binding + of the source RTP stream and the redundancy stream, e.g., using the + RepairedRtpStreamId RTCP SDES item [RFC8852]. The + RepairedRtpStreamId is a rather recent mechanism, so one cannot + expect older applications to follow this recommendation. + + Note that the RTP sequence number and RTP timestamp are scoped by the + SSRC and are thus specific per RTP stream. + + Different types of entities use an SSRC to identify themselves, as + follows: + + * A real media source uses the SSRC to identify a "physical" media + source. + + * A conceptual media source uses the SSRC to identify the result of + applying some filtering function in a network node -- for example, + a filtering function in an RTP mixer that provides the most active + speaker based on some criteria, or a mix representing a set of + other sources. + + * An RTP receiver uses the SSRC to identify itself as the source of + its RTCP reports. + + An endpoint that generates more than one media type, e.g., a + conference participant sending both audio and video, need not (and, + indeed, should not) use the same SSRC value across RTP sessions. 
+ Using RTCP compound packets containing the CNAME SDES item is the + designated method for binding an SSRC to a CNAME, effectively cross- + correlating SSRCs within and between RTP sessions as coming from the + same endpoint. The main property attributed to SSRCs associated with + the same CNAME is that they are from a particular synchronization + context and can be synchronized at playback. + + An RTP receiver receiving a previously unseen SSRC value will + interpret it as a new source. It might in fact be a previously + existing source that had to change its SSRC number due to an SSRC + conflict. Using the media identification (MID) extension [RFC8843] + helps to identify which media source the new SSRC represents, and + using the restriction identifier (RID) extension [RFC8851] helps to + identify what encoding or redundancy stream it represents, even + though the SSRC changed. However, the originator of the previous + SSRC ought to have ended the conflicting source by sending an RTCP + BYE for it prior to starting to send with the new SSRC, making the + new SSRC a new source. + +3.2.3. Contributing Source (CSRC) + + The CSRC is not a separate identifier. Rather, an SSRC identifier is + listed as a CSRC in the RTP header of a packet generated by an RTP + mixer or video Multipoint Control Unit (MCU) / switch, if the + corresponding SSRC was in the header of one of the packets that + contributed to the output. + + It is not possible, in general, to extract media represented by an + individual CSRC, since it is typically the result of a media merge + (e.g., mix) operation on the individual media streams corresponding + to the CSRC identifiers. The exception is the case where only a + single CSRC is indicated, as this represents the forwarding of an RTP + stream that might have been modified. 
The RTP header extension ("A + Real-time Transport Protocol (RTP) Header Extension for + Mixer-to-Client Audio Level Indication" [RFC6465]) expands on the + receiver's information about a packet with a CSRC list. Due to these + restrictions, a CSRC will not be considered a fully qualified + multiplexing point and will be disregarded in the rest of this + document. + +3.2.4. RTP Payload Type + + Each RTP stream utilizes one or more RTP payload formats. An RTP + payload format describes how the output of a particular media codec + is framed and encoded into RTP packets. The payload format is + identified by the payload type (PT) field in the RTP packet header. + The combination of SSRC and PT therefore identifies a specific RTP + stream in a specific encoding format. The format definition can be + taken from [RFC3551] for statically allocated payload types but ought + to be explicitly defined in signaling, such as SDP, for both static + and dynamic payload types. The term "format" here includes those + aspects described by out-of-band signaling means; in SDP, the term + "format" includes media type, RTP timestamp sampling rate, codec, + codec configuration, payload format configurations, and various + robustness mechanisms such as redundant encodings [RFC2198]. + + The RTP payload type is scoped by the sending endpoint within an RTP + session. PT has the same meaning across all RTP streams in an RTP + session. All SSRCs sent from a single endpoint share the same + payload type definitions. The RTP payload type is designed such that + only a single payload type is valid at any instant in time in the RTP + stream's timestamp timeline, effectively time-multiplexing different + payload types if any change occurs. The payload type can change on a + per-packet basis for an SSRC -- for example, a speech codec making + use of generic comfort noise [RFC3389]. 
If there is a true need to + send multiple payload types for the same SSRC that are valid for the + same instant, then redundant encodings [RFC2198] can be used. + Several additional constraints, other than those mentioned above, + need to be met to enable this usage, one of which is that the + combined payload sizes of the different payload types ought not + exceed the transport MTU. + + Other aspects of using the RTP payload format are described in "How + to Write an RTP Payload Format" [RFC8088]. + + The payload type is not a multiplexing point at the RTP layer (see + Appendix A for a detailed discussion of why using the payload type as + an RTP multiplexing point does not work). The RTP payload type is, + however, used to determine how to consume and decode an RTP stream. + The RTP payload type number is sometimes used to associate an RTP + stream with the signaling, which in general requires that unique RTP + payload type numbers be used in each context. Using MID, e.g., when + bundling "m=" sections [RFC8843], can replace the payload type as a + signaling association, and unique RTP payload types are then no + longer required for that purpose. + +3.3. Issues Related to RTP Topologies + + The impact of how RTP multiplexing is performed will in general vary + with how the RTP session participants are interconnected, as + described in "RTP Topologies" [RFC7667]. + + Even the most basic use case -- "Topo-Point-to-Point" as described in + [RFC7667] -- raises a number of considerations, which are discussed + in detail in the following sections. They range over such aspects as + the following: + + * Does my communication peer support RTP as defined with multiple + SSRCs per RTP session? + + * Do I need network differentiation in the form of QoS + (Section 4.2.1)? + + * Can the application more easily process and handle the media + streams if they are in different RTP sessions? + + * Do I need to use additional RTP streams for RTP retransmission or + FEC? 
+ + For some point-to-multipoint topologies (e.g., Topo-ASM and Topo-SSM + [RFC7667]), multicast is used to interconnect the session + participants. Special considerations (documented in Section 4.2.3) + are then needed, as multicast is a one-to-many distribution system. + + Sometimes, an RTP communication session can end up in a situation + where the communicating peers are not compatible, for various + reasons: + + * No common media codec for a media type, thus requiring + transcoding. + + * Different support for multiple RTP streams and RTP sessions. + + * Usage of different media transport protocols (i.e., one peer uses + RTP, but the other peer uses a different transport protocol). + + * Usage of different transport protocols, e.g., UDP, the Datagram + Congestion Control Protocol (DCCP), or TCP. + + * Different security solutions (e.g., IPsec, TLS, DTLS, or the + Secure Real-time Transport Protocol (SRTP)) with different keying + mechanisms. + + These compatibility issues can often be resolved by the inclusion of + a translator between the two peers -- the Topo-PtP-Translator, as + described in [RFC7667]. The translator's main purpose is to make the + peers look compatible to each other. There can also be reasons other + than compatibility for inserting a translator in the form of a + middlebox or gateway -- for example, a need to monitor the RTP + streams. Beware that changing the stream transport characteristics + in the translator can require a thorough understanding of aspects + ranging from congestion control and media-level adaptations to + application-layer semantics. + + Within the uses enabled by the RTP standard, the point-to-point + topology can contain one or more RTP sessions with one or more media + sources per session, each having one or more RTP streams per media + source. + +3.4. Issues Related to RTP and RTCP + + Using multiple RTP streams is a well-supported feature of RTP. 
+ However, for most implementers or people writing RTP/RTCP + applications or extensions attempting to apply multiple streams, it + can be unclear when it is most appropriate to add an additional RTP + stream in an existing RTP session and when it is better to use + multiple RTP sessions. This section discusses the various + considerations that need to be taken into account. + +3.4.1. The RTP Specification + + RFC 3550 contains some recommendations and a numbered list + (Section 5.2 of [RFC3550]) of five arguments regarding different + aspects of RTP multiplexing. Please review Section 5.2 of [RFC3550]. + Five important aspects are quoted below. + + 1. | If, say, two audio streams shared the same RTP session and the + | same SSRC value, and one were to change encodings and thus + | acquire a different RTP payload type, there would be no + | general way of identifying which stream had changed encodings. + + This argument advocates the use of different SSRCs for each + individual RTP stream, as this is fundamental to RTP operation. + + 2. | An SSRC is defined to identify a single timing and sequence + | number space. Interleaving multiple payload types would + | require different timing spaces if the media clock rates + | differ and would require different sequence number spaces to + | tell which payload type suffered packet loss. + + This argument advocates against demultiplexing RTP streams within + a session based only on their RTP payload type numbers; it still + stands, as can be seen by the extensive list of issues discussed + in Appendix A. + + 3. | The RTCP sender and receiver reports (see Section 6.4) can + | only describe one timing and sequence number space per SSRC + | and do not carry a payload type field. + + This argument is yet another argument against payload type + multiplexing. + + 4. | An RTP mixer would not be able to combine interleaved streams + | of incompatible media into one stream. 
+ + This argument advocates against multiplexing RTP packets that + require different handling into the same session. In most cases, + the RTP mixer must embed application logic to handle streams; the + separation of streams according to stream type is just another + piece of application logic, which might or might not be + appropriate for a particular application. One type of + application that can mix different media sources blindly is the + audio-only telephone bridge, although the ability to do that + comes from the well-defined scenario that is aided by the use of + a single media type, even though individual streams may use + incompatible codec types; most other types of applications need + application-specific logic to perform the mix correctly. + + 5. | Carrying multiple media in one RTP session precludes: the use + | of different network paths or network resource allocations if + | appropriate; reception of a subset of the media if desired, + | for example just audio if video would exceed the available + | bandwidth; and receiver implementations that use separate + | processes for the different media, whereas using separate RTP + | sessions permits either single- or multiple-process + | implementations. + + This argument discusses network aspects that are described in + Section 4.2. It also goes into aspects of implementation, like + split component terminals (see Section 3.10 of [RFC7667]) -- + endpoints where different processes or interconnected devices + handle different aspects of the whole multimedia session. + + To summarize, RFC 3550's view on multiplexing is to use unique SSRCs + for anything that is its own media/packet stream and use different + RTP sessions for media streams that don't share a media type. This + document supports the first point; it is very valid. The latter + needs further discussion, as imposing a single solution on all usages + of RTP is inappropriate. 
"Sending Multiple Types of Media in a + Single RTP Session" [RFC8860] updates RFC 3550 to allow multiple + media types in an RTP session and provides a detailed analysis of the + potential benefits and issues related to having multiple media types + in the same RTP session. Thus, [RFC8860] provides a wider scope for + an RTP session and considers multiple media types in one RTP session + as a possible choice for the RTP application designer. + +3.4.2. Multiple SSRCs in a Session + + Using multiple SSRCs at one endpoint in an RTP session requires that + some unclear aspects of the RTP specification be resolved. These + items could potentially lead to some interoperability issues as well + as some potential significant inefficiencies, as further discussed in + "Sending Multiple RTP Streams in a Single RTP Session" [RFC8108]. An + RTP application designer should consider these issues and the + application's possible impact caused by a lack of appropriate RTP + handling or optimization in the peer endpoints. + + Using multiple RTP sessions can potentially mitigate application + issues caused by multiple SSRCs in an RTP session. + +3.4.3. Binding Related Sources + + A common problem in a number of various RTP extensions has been how + to bind related RTP streams together. This issue is common to both + using additional SSRCs and multiple RTP sessions. + + The solutions can be divided into a few groups: + + * RTP/RTCP based + + * Signaling based, e.g., SDP + + * Grouping related RTP sessions + + * Grouping SSRCs within an RTP session + + Most solutions are explicit, but some implicit methods have also been + applied to the problem. + + The SDP-based signaling solutions are: + + SDP media description grouping: + The SDP grouping framework [RFC5888] uses various semantics to + group any number of media descriptions. 
SDP media description + grouping has primarily been used to group RTP sessions, but in + combination with [RFC8843], it can also group multiple media + descriptions within a single RTP session. + + SDP media multiplexing: + "Negotiating Media Multiplexing Using the Session Description + Protocol (SDP)" [RFC8843] uses information taken from both SDP and + RTCP to associate RTP streams to SDP media descriptions. This + allows both SDP and RTCP to group RTP streams belonging to an SDP + media description and group multiple SDP media descriptions into a + single RTP session. + + SDP SSRC grouping: + "Source-Specific Media Attributes in the Session Description + Protocol (SDP)" [RFC5576] includes a solution for grouping SSRCs + in the same way that the grouping framework groups media + descriptions. + + The above grouping constructs support many use cases. Those + solutions have shortcomings in cases where the session's dynamic + properties are such that it is difficult or a drain on resources to + keep the list of related SSRCs up to date. + + One RTP/RTCP-based grouping solution is to use the RTCP SDES CNAME to + bind related RTP streams to an endpoint or a synchronization context. + For applications with a single RTP stream per type (media, source, or + redundancy stream), the CNAME is sufficient for that purpose, + independent of whether one or more RTP sessions are used. However, + some applications choose not to use a CNAME because of perceived + complexity or a desire not to implement RTCP and instead use the same + SSRC value to bind related RTP streams across multiple RTP sessions. + RTP retransmission [RFC4588], when configured to use multiple RTP + sessions, and generic FEC [RFC5109] both use the CNAME method to + relate the RTP streams, which may work but might have some downsides + in RTP sessions with many participating SSRCs. 
It is not recommended + to use identical SSRC values across RTP sessions to relate RTP + streams; when an SSRC collision occurs, this will force a change of + that SSRC in all RTP sessions and will thus resynchronize all of the + streams instead of only the single media stream experiencing the + collision. + + Another method for implicitly binding SSRCs is used by RTP + retransmission [RFC4588] when using the same RTP session as the + source RTP stream for retransmissions. A receiver that is missing a + packet issues an RTP retransmission request and then awaits a new + SSRC carrying the RTP retransmission payload, where that SSRC is from + the same CNAME. This limits a requester to having only one + outstanding retransmission request on any new SSRCs per endpoint. + + "RTP Payload Format Restrictions" [RFC8851] provides an RTP/RTCP- + based mechanism to unambiguously identify the RTP streams within an + RTP session and restrict the streams' payload format parameters in a + codec-agnostic way beyond what is provided with the regular payload + types. The mapping is done by specifying an "a=rid" value in the SDP + offer/answer signaling and having the corresponding RtpStreamId value + as an SDES item and an RTP header extension [RFC8852]. The RID + solution also includes a solution for binding redundancy RTP streams + to their original source RTP streams, given that those streams use + RID identifiers. The redundancy stream uses the RepairedRtpStreamId + SDES item and RTP header extension to declare the RtpStreamId value + of the source stream to create the binding. + + Experience has shown that an explicit binding between the RTP + streams, agnostic of SSRC values, behaves well. That way, solutions + using multiple RTP streams in a single RTP session and in multiple + RTP sessions will use the same type of binding. + +3.4.4. Forward Error Correction + + There exist a number of FEC-based schemes designed to mitigate packet + loss in the original streams. 
Most of the FEC schemes protect a + single source flow. This protection is achieved by transmitting a + certain amount of redundant information that is encoded such that it + can repair one or more instances of packet loss over the set of + packets the redundant information protects. This sequence of + redundant information needs to be transmitted as its own media stream + or, in some cases, instead of the original media stream. Thus, many + of these schemes create a need for binding related flows, as + discussed above. Looking at the history of these schemes, there are + schemes using multiple SSRCs and schemes using multiple RTP sessions, + and some schemes that support both modes of operation. + + Using multiple RTP sessions supports the case where some set of + receivers might not be able to utilize the FEC information. By + placing it in a separate RTP session and if separating RTP sessions + at the transport level, FEC can easily be ignored at the transport + level, without considering any RTP-layer information. + + In usages involving multicast, sending FEC information in a separate + multicast group allows for similar flexibility. This is especially + useful when receivers see heterogeneous packet loss rates. A + receiver can decide, based on measurement of experienced packet loss + rates, whether to join a multicast group with suitable FEC data + repair capabilities. + +4. Considerations for RTP Multiplexing + +4.1. Interworking Considerations + + There are several different kinds of interworking, and this section + discusses two: interworking directly between different applications + and the interworking of applications through an RTP translator. The + discussion includes the implications of potentially different RTP + multiplexing point choices and limitations that have to be considered + when working with some legacy applications. + +4.1.1. 
Application Interworking + + It is not uncommon that applications or services of similar but not + identical usage, especially those intended for interactive + communication, encounter a situation where one wants to interconnect + two or more of these applications. + + In these cases, one ends up in a situation where one might use a + gateway to interconnect applications. This gateway must then either + change the multiplexing structure or adhere to the respective + limitations in each application. + + There are two fundamental approaches to building a gateway: using RTP + translator interworking (RTP bridging), where the gateway acts as an + RTP translator with the two interconnected applications being members + of the same RTP session; or using gateway interworking + (Section 4.1.3) with RTP termination, where there are independent RTP + sessions between each interconnected application and the gateway. + + For interworking to be feasible, any security solution in use needs + to be compatible and capable of exchanging keys with either the peer + or the gateway under the trust model being used. Secondly, the + applications need to use media streams in a way that makes sense in + both applications. + +4.1.2. RTP Translator Interworking + + From an RTP perspective, the RTP translator approach could work if + all the applications are using the same codecs with the same payload + types, have made the same multiplexing choices, and have the same + capabilities regarding the number of simultaneous RTP streams + combined with the same set of RTP/RTCP extensions being supported. + Unfortunately, this might not always be true. + + When a gateway is implemented via an RTP translator, an important + consideration is if the two applications being interconnected need to + use the same approach to multiplexing. 
If one side is using RTP + session multiplexing and the other is using SSRC multiplexing with + BUNDLE [RFC8843], it may be possible for the RTP translator to map + the RTP streams between both sides using some method, e.g., based on + the number and order of SDP "m=" lines from each side. There are + also challenges related to SSRC collision handling, since, unless + SSRC translation is applied on the RTP translator, there may be a + collision on the SSRC multiplexing side that the RTP session + multiplexing side will not be aware of. Furthermore, if one of the + applications is capable of working in several modes (such as being + able to use additional RTP streams in one RTP session or multiple RTP + sessions at will) and the other one is not, successful + interconnection depends on locking the more flexible application into + the operating mode where interconnection can be successful, even if + none of the participants are using the less flexible application when + the RTP sessions are being created. + +4.1.3. Gateway Interworking + + When one terminates RTP sessions at the gateway, there are certain + tasks that the gateway has to carry out: + + * Generating appropriate RTCP reports for all RTP streams (possibly + based on incoming RTCP reports) originating from SSRCs controlled + by the gateway. + + * Handling SSRC collision resolution in each application's RTP + sessions. + + * Signaling, choosing, and policing appropriate bitrates for each + session. + + For applications that use any security mechanism, e.g., in the form + of SRTP, the gateway needs to be able to decrypt and verify source + integrity of the incoming packets and then re-encrypt, integrity + protect, and sign the packets as the peer in the other application's + security context. This is necessary even if all that's needed is a + simple remapping of SSRC numbers. If this is done, the gateway also + needs to be a member of the security contexts of both sides and thus + a trusted entity. 
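The "simple remapping of SSRC numbers" mentioned above can be sketched as follows. This is a minimal illustration, assuming an unprotected RTP packet; with SRTP, the gateway would first have to decrypt and authenticate the packet in the sender's security context and afterwards protect it again in the other application's context, which is not shown. The function name rewrite_ssrc and the ssrc_map table are hypothetical, and any CSRC entries are left untouched.

```python
import struct

# Fixed RTP header: V/P/X/CC, M/PT, sequence number, timestamp, SSRC.
RTP_FIXED_HEADER_LEN = 12
SSRC_OFFSET = 8  # the SSRC occupies bytes 8..11 of the fixed header


def rewrite_ssrc(packet: bytes, ssrc_map: dict) -> bytes:
    """Return a copy of an RTP packet with its SSRC replaced per ssrc_map.

    SSRCs that are not present in ssrc_map are forwarded unchanged.
    """
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")
    if packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    (old_ssrc,) = struct.unpack_from("!I", packet, SSRC_OFFSET)
    new_ssrc = ssrc_map.get(old_ssrc, old_ssrc)
    return (
        packet[:SSRC_OFFSET]
        + struct.pack("!I", new_ssrc)
        + packet[RTP_FIXED_HEADER_LEN:]
    )
```

A gateway would keep one such table per direction, so that SSRC collision resolution on one side stays invisible to the other side.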
+ + The gateway might also need to apply transcoding (for incompatible + codec types), media-level adaptations that cannot be solved through + media negotiation (such as rescaling for incompatible video size + requirements), suppression of content that is known not to be handled + in the destination application, or the addition or removal of + redundancy coding or scalability layers to fit the needs of the + destination domain. + + From the above, we can see that the gateway needs to have an intimate + knowledge of the application requirements; a gateway is by its nature + application specific and not a commodity product. + + These gateways might therefore potentially block application + evolution by blocking RTP and RTCP extensions that the applications + have been extended with but that are unknown to the gateway. + + If one uses a security mechanism like SRTP, the gateway and the + necessary trust in it by the peers pose an additional risk to + communication security. The gateway also incurs additional + complexities in the form of the decrypt-encrypt cycles needed for + each forwarded packet. SRTP, due to its keying structure, also + requires that each RTP session need different master keys, as the use + of the same key in two RTP sessions can, for some ciphers, result in + a reuse of a one-time pad that completely breaks the confidentiality + of the packets. + +4.1.4. Legacy Considerations for Multiple SSRCs + + Historically, the most common RTP use cases have been point-to-point + Voice over IP (VoIP) or streaming applications, commonly with no more + than one media source per endpoint and media type (typically audio or + video). Even in conferencing applications, especially voice-only, + the conference focus or bridge provides to each participant a single + stream containing a mix of the other participants. 
It is also common + to have individual RTP sessions between each endpoint and the RTP + mixer, meaning that the mixer functions as an RTP-terminating + gateway. + + Applications and systems that aren't updated to handle multiple + streams following these recommendations can have issues with + participating in RTP sessions containing multiple SSRCs within a + single session, such as: + + 1. The need to handle more than one stream simultaneously rather + than replacing an already-existing stream with a new one. + + 2. Being capable of decoding multiple streams simultaneously. + + 3. Being capable of rendering multiple streams simultaneously. + + This indicates that gateways attempting to interconnect to this class + of devices have to make sure that only one RTP stream of each media + type gets delivered to the endpoint if it's expecting only one and + that the multiplexing format is what the device expects. It is + highly unlikely that RTP translator-based interworking can be made to + function successfully in such a context. + +4.2. Network Considerations + + The RTP implementer needs to consider that the RTP multiplexing + choice also impacts network-level mechanisms. + +4.2.1. Quality of Service + + QoS mechanisms are either flow based or packet marking based. RSVP + [RFC2205] is an example of a flow-based mechanism, while Diffserv + [RFC2474] is an example of a packet-marking-based mechanism. + + For a flow-based scheme, additional SSRCs will receive the same QoS + as all other RTP streams being part of the same 5-tuple (protocol, + source address, destination address, source port, destination port), + which is the most common selector for flow-based QoS. + + For a packet-marking-based scheme, the method of multiplexing will + not affect the possibility of using QoS. 
Different Differentiated + Services Code Points (DSCPs) can be assigned to different packets + within a transport flow (5-tuple) as well as within an RTP stream, + assuming the usage of UDP or other transport protocols that do not + have issues with packet reordering within the transport flow + (5-tuple). To avoid packet-reordering issues, packets belonging to + the same RTP flow should limit their use of DSCPs to packets whose + corresponding Per-Hop Behavior (PHB) do not enable reordering. If + the transport protocol being used assumes in-order delivery of + packets (e.g., TCP and the Stream Control Transmission Protocol + (SCTP)), then a single DSCP should be used. For more discussion on + this topic, see [RFC7657]. + + The method for assigning marking to packets can impact what number of + RTP sessions to choose. If this marking is done using a network + ingress function, it can have issues discriminating the different RTP + streams. The network API on the endpoint also needs to be capable of + setting the marking on a per-packet basis to reach full + functionality. + +4.2.2. NAT and Firewall Traversal + + In today's networks, there exist a large number of middleboxes. + Those that normally have the most impact on RTP are Network Address + Translators (NATs) and Firewalls (FWs). + + Below, we analyze and comment on the impact of requiring more + underlying transport flows in the presence of NATs and FWs: + + Endpoint Port Consumption: + A given IP address only has 65536 available local ports per + transport protocol for all consumers of ports that exist on the + machine. This is normally never an issue for an end-user machine. + It can become an issue for servers that handle a large number of + simultaneous streams. 
However, if the application uses ICE to + authenticate STUN requests, a server can serve multiple endpoints + from the same local port and use the whole 5-tuple (source and + destination address, source and destination port, protocol) as the + identifier of flows after having securely bound them to the remote + endpoint address using the STUN request. In theory, the minimum + number of media server ports needed is the maximum number of + simultaneous RTP sessions a single endpoint can use. In practice, + implementations will probably benefit from using more server ports + to simplify implementation or avoid performance bottlenecks. + + NAT State: + If an endpoint sits behind a NAT, each flow it generates to an + external address will result in a state that has to be kept in the + NAT. That state is a limited resource. In home or Small + Office/Home Office (SOHO) NATs, the most limited resource is + memory or processing. For large-scale NATs serving many internal + endpoints, available external ports are likely the scarce + resource. Port limitations are primarily a problem for larger + centralized NATs where endpoint-independent mapping requires each + flow to use one port for the external IP address. This affects + the maximum number of internal users per external IP address. + However, as a comparison, a real-time video conference session + with audio and video likely uses less than 10 UDP flows, compared + to certain web applications that can use 100+ TCP flows to various + servers from a single browser instance. + + Extra Delay Added by NAT Traversal: + Performing the NAT/FW traversal takes a certain amount of time for + each flow. The best-case scenario for additional NAT/FW traversal + time after finding the first valid candidate pair following the + specified ICE procedures is 1.5*RTT + Ta*(Additional_Flows-1), + where Ta is the pacing timer. 
That assumes a message in one + direction, immediately followed by a return message in the + opposite direction to confirm reachability. It isn't more, + because ICE first finds one candidate pair that works, prior to + attempting to establish multiple flows. Thus, there is no extra + time until one has found a working candidate pair. Based on that + working pair, the extra time is needed to establish the additional + flows (two or three, in most cases) in parallel. However, packet + loss causes extra delays of at least 500 ms (the minimal + retransmission timer for ICE). + + NAT Traversal Failure Rate: + Due to the need to establish more than a single flow through the + NAT, there is some risk that establishing the first flow will + succeed but one or more of the additional flows will fail. The + risk of this happening is hard to quantify but should be fairly + low, as one flow from the same interfaces has just been + successfully established. Thus, only such rare events as NAT + resource overload, selecting particular port numbers that are + filtered, etc., ought to be reasons for failure. + + Deep Packet Inspection and Multiple Streams: + FWs differ in how deeply they inspect packets. Previous + experience using FWs and Session Border Gateways (SBGs) with RTP + shows that there is a significant risk that the FWs and SBGs will + reject RTP sessions that use multiple SSRCs. + + Using additional RTP streams in the same RTP session and transport + flow does not introduce any additional NAT traversal complexities per + RTP stream. This can be compared with (normally) one or two + additional transport flows per RTP session when using multiple RTP + sessions. Additional lower-layer transport flows will be needed, + unless an explicit demultiplexing layer is added between RTP and the + transport protocol. At the time of this writing, no such mechanism + was defined. + +4.2.3. 
Multicast + + Multicast groups provide a powerful tool for a number of real-time + applications, especially those that desire broadcast-like behaviors + with one endpoint transmitting to a large number of receivers, like + in IPTV. An RTP/RTCP extension to better support Source-Specific + Multicast (SSM) [RFC5760] is also available. Many-to-many + communication, which RTP [RFC3550] was originally built to support, + has several limitations in common with multicast. + + One limitation is that, for any group, sender-side adaptations with + the intent to suit all receivers would have to adapt to the most + limited receiver experiencing the worst conditions among the group + participants, which imposes degradation for all participants. For + broadcast-type applications with a large number of receivers, this is + not acceptable. Instead, various receiver-based solutions are + employed to ensure that the receivers achieve the best possible + performance. By using scalable encoding and placing each scalability + layer in a different multicast group, the receiver can control the + amount of traffic it receives. To have each scalability layer in a + different multicast group, one RTP session per multicast group is + used. + + In addition, the transport flow considerations in multicast are a bit + different from unicast; NATs with port translation are not useful in + the multicast environment, meaning that the entire port range of each + multicast address is available for distinguishing between RTP + sessions. + + Thus, when using broadcast applications it appears easiest and most + straightforward to use multiple RTP sessions for sending different + media flows used for adapting to network conditions. It is also + common that streams improving transport robustness are sent in their + own multicast group to allow for interworking with legacy + applications or to support different levels of protection. 
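The receiver-driven adaptation described earlier in this section, where each scalability layer is sent in its own multicast group and RTP session, can be sketched as follows. This is only an illustration under stated assumptions: the function name layers_to_join, the layer bitrates, and the bandwidth estimate are hypothetical, and a real receiver would also base its decision on measured packet loss.

```python
def layers_to_join(layer_bitrates_kbps, available_kbps):
    """Return how many layers, counted from the base layer, fit the budget.

    layer_bitrates_kbps lists the bitrate of each layer in dependency
    order (base layer first). Layers are cumulative, so the receiver
    stops at the first layer that would exceed its budget.
    """
    total = 0
    joined = 0
    for rate in layer_bitrates_kbps:
        if total + rate > available_kbps:
            break
        total += rate
        joined += 1
    return joined
```

For each layer it decides to receive, the endpoint would join the corresponding multicast group and become a member of that group's RTP session.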
+ + Many-to-many applications have different needs, and the most + appropriate multiplexing choice will depend on how the actual + application is realized. Multicast applications that are capable of + using sender-side congestion control can avoid the use of multiple + multicast sessions and RTP sessions that result from the use of + receiver-side congestion control. + + The properties of a broadcast application using RTP multicast are as + follows: + + 1. The application uses a group of RTP sessions -- not just one. + Each endpoint will need to be a member of a number of RTP + sessions in order to perform well. + + 2. Within each RTP session, the number of RTP receivers is likely to + be much larger than the number of RTP senders. + + 3. The application needs signaling functions to identify the + relationships between RTP sessions. + + 4. The application needs signaling or RTP/RTCP functions to identify + the relationships between SSRCs in different RTP sessions when + more complex relations than those that can be expressed by the + CNAME exist. + + Both broadcast and many-to-many multicast applications share a + signaling requirement; all of the participants need the same RTP and + payload type configuration. Otherwise, A could, for example, be + using payload type 97 as the video codec H.264 while B thinks it is + MPEG-2. SDP offer/answer [RFC3264] is not appropriate for ensuring + this property in a broadcast/multicast context. The signaling + aspects of broadcast/multicast are not explored further in this memo. + + Security solutions for this type of group communication are also + challenging. First, the key-management mechanism and the security + protocol need to support group communication. Second, source + authentication requires special solutions. For more discussion on + this topic, please review "Options for Securing RTP Sessions" + [RFC7201]. + +4.3. 
Security and Key-Management Considerations + + When dealing with point-to-point two-member RTP sessions only, there + are few security issues that are relevant to the choice of having one + RTP session or multiple RTP sessions. However, there are a few + aspects of multi-party sessions that might warrant consideration. + For general information regarding possible methods of securing RTP, + please review [RFC7201]. + +4.3.1. Security Context Scope + + When using SRTP [RFC3711], the security context scope is important + and can be a necessary differentiation in some applications. As + SRTP's crypto suites are (so far) built around symmetric keys, the + receiver will need to have the same key as the sender. As a result, + no one in a multi-party session can be certain that a received packet + was really sent by the claimed sender and not by another party having + access to the key. The single SRTP algorithm not having this + property is Timed Efficient Stream Loss-Tolerant Authentication + (TESLA) source authentication [RFC4383]. However, TESLA adds delay + to achieve source authentication. In most cases, symmetric ciphers + provide sufficient security properties, but in a few cases they can + create issues. + + The first case is when someone leaves a multi-party session and one + wants to ensure that the party that left can no longer access the RTP + streams. This requires that everyone rekey without disclosing the + new keys to the excluded party. + + A second case is when security is used as an enforcing mechanism for + stream access differentiation between different receivers. Take, for + example, a scalable layer or a high-quality simulcast version that + only users paying a premium are allowed to access. The mechanism + preventing a receiver from getting the high-quality stream can be + based on the stream being encrypted with a key that users can't + access without paying a premium, using the key-management mechanism + to limit access to the key. 
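The access differentiation case above can be sketched as a key-distribution policy, assuming that each differently protected stream is placed in its own RTP session with its own key. The function name keys_for_receiver, the session labels, and the key values are hypothetical; how keys are actually delivered depends on the key-management mechanism in use.

```python
def keys_for_receiver(entitled_sessions, session_keys):
    """Return only the keys for the RTP sessions a receiver may access.

    session_keys maps a session label to its key material;
    entitled_sessions is the set of labels the receiver has access to.
    """
    return {
        session: key
        for session, key in session_keys.items()
        if session in entitled_sessions
    }
```

A receiver without the premium entitlement never obtains the key of the high-quality session, so it cannot decrypt that stream even if the packets reach it.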
+ + As specified in [RFC3711], SRTP uses unique keys per SSRC; however, + the original assumption was a single-session master key from which + SSRC-specific RTP and RTCP keys were derived. However, that + assumption was proven incorrect, as the application usage and the + developed key-management mechanisms have chosen many different + methods for ensuring unique keys per SSRC. The key-management + functions have different abilities to establish different sets of + keys, normally on a per-endpoint basis. For example, DTLS-SRTP + [RFC5764] and Security Descriptions [RFC4568] establish different + keys for outgoing and incoming traffic from an endpoint. This key + usage has to be written into the cryptographic context, possibly + associated with different SSRCs. Thus, limitations do exist, + depending on the chosen key-management method and due to the + integration of particular implementations of the key-management + method and SRTP. + +4.3.2. Key Management for Multi-party Sessions + + The capabilities of the key-management method combined with the RTP + multiplexing choices affect the resulting security properties, + control over the secured media, and who has access to it. + + Multi-party sessions contain at least one RTP stream from each active + participant. Depending on the multi-party topology [RFC7667], each + participant can both send and receive multiple RTP streams. + Transport translator-based sessions (Topo-Trn-Translator) and + multicast sessions (Topo-ASM) can use neither Security Descriptions + [RFC4568] nor DTLS-SRTP [RFC5764] without an extension, because each + endpoint provides its own set of keys. In centralized conferences, + the signaling counterpart is a conference server, and the transport + translator is the media-plane unicast counterpart (to which DTLS + messages would be sent). 
Thus, an extension like Encrypted Key + Transport [RFC8870] or a solution based on Multimedia Internet KEYing + (MIKEY) [RFC3830] that allows for keying all session participants + with the same master key is needed. + + Privacy-Enhanced RTP Conferencing (PERC) also enables a different + trust model with semi-trusted media-switching RTP middleboxes + [RFC8871]. + +4.3.3. Complexity Implications + + There can be complex interactions between the choice of multiplexing + and topology and the security functions. This becomes especially + evident in RTP topologies having any type of middlebox that processes + or modifies RTP/RTCP packets. While the overhead of an RTP + translator or mixer rewriting an SSRC value in the RTP packet of an + unencrypted session is low, the cost is higher when using + cryptographic security functions. For example, if using SRTP + [RFC3711], the actual security context and exact crypto key are + determined by the SSRC field value. If one changes the SSRC value, + the encryption and authentication must use another key. Thus, + changing the SSRC value implies a decryption using the old SSRC and + its security context, followed by an encryption using the new one. + +5. RTP Multiplexing Design Choices + + This section discusses how some RTP multiplexing design choices can + be used in applications to achieve certain goals and summarizes the + implications of such choices. The benefits and downsides of each + design are also discussed. + +5.1. Multiple Media Types in One Session + + This design uses a single RTP session for multiple different media + types, like audio and video, and possibly also transport robustness + mechanisms like FEC or retransmission. An endpoint can send zero, + one, or multiple media sources per media type, resulting in a number + of RTP streams of various media types for both source and redundancy + streams. + + Advantages: + + 1. Only a single RTP session is used, which implies: + + * Minimal need to keep NAT/FW state. 
+ + * Minimal NAT/FW traversal cost. + + * Fate-sharing for all media flows. + + * Minimal overhead for security association establishment. + + 2. Dynamic allocation of RTP streams can be handled almost entirely + at the RTP level. The extent to which this allocation can be + kept at the RTP level depends on the application's needs for an + explicit indication of stream usage and in how timely a fashion + that information can be signaled. + + Disadvantages: + + 1. It is less suitable for interworking with other applications that + use individual RTP sessions per media type or multiple sessions + for a single media type, due to the risk of SSRC collisions and + thus a potential need for SSRC translation. + + 2. Negotiation of individual bandwidths for the different media + types is currently only possible in SDP when using RID [RFC8851]. + + 3. It is not suitable for split component terminals (see + Section 3.10 of [RFC7667]). + + 4. Flow-based QoS cannot be used to provide separate treatment of + RTP streams compared to others in the single RTP session. + + 5. If there is significant asymmetry between the RTP streams' RTCP + reporting needs, there are some challenges related to + configuration and usage to avoid wasting RTCP reporting on the + RTP stream that does not need such frequent reporting. + + 6. It is not suitable for applications where some receivers like to + receive only a subset of the RTP streams, especially if multicast + or a transport translator is being used. + + 7. There are some additional concerns regarding legacy + implementations that do not support the RTP specification fully + when it comes to handling multiple SSRCs per endpoint, as + multiple simultaneous media types are sent as separate SSRCs in + the same RTP session. + + 8. 
If the applications need finer control over which session + participants are included in different sets of security + associations, most key-management mechanisms will have + difficulties establishing such a session. + +5.2. Multiple SSRCs of the Same Media Type + + In this design, each RTP session serves only a single media type. + The RTP session can contain multiple RTP streams, from either a + single endpoint or multiple endpoints. This commonly creates a low + number of RTP sessions, typically only one for audio and one for + video, with a corresponding need for two listening ports when using + RTP/RTCP multiplexing [RFC5761]. + + Advantages: + + 1. It works well with split component terminals (see Section 3.10 of + [RFC7667]) where the split is per media type. + + 2. It enables flow-based QoS with different prioritization levels + between media types. + + 3. For applications with dynamic usage of RTP streams (i.e., streams + are frequently added and removed), having much of the state + associated with the RTP session rather than per individual SSRC + can avoid the need for in-session signaling of meta-information + about each SSRC. In simple cases, this allows for unsignaled RTP + streams where session-level information and an RTCP SDES item + (e.g., CNAME) are sufficient. In the more complex cases where + more source-specific metadata needs to be signaled, the SSRC can + be associated with an intermediate identifier, e.g., the MID + conveyed as an SDES item as defined in Section 15 of [RFC8843]. + + 4. The overhead of security association establishment is low. + + Disadvantages: + + 1. A slightly higher number of RTP sessions are needed, compared to + multiple media types in one session (Section 5.1). This implies + the following: + + * More NAT/FW state is needed. + + * The cost of NAT/FW traversal is increased in terms of both + processing and delay. + + 2. 
There is some potential for concern regarding legacy + implementations that don't support the RTP specification fully + when it comes to handling multiple SSRCs per endpoint. + + 3. It is not possible to control security associations for sets of + RTP streams within the same media type with today's key- + management mechanisms, unless these are split into different RTP + sessions (Section 5.3). + + For RTP applications where all RTP streams of the same media type + share the same usage, this structure provides efficiency gains in the + amount of network state used and provides more fate-sharing with + other media flows of the same type. At the same time, it still + maintains almost all functionalities for the negotiation signaling of + properties per individual media type and also enables flow-based QoS + prioritization between media types. It handles multi-party sessions + well, independently of multicast or centralized transport + distribution, as additional sources can dynamically enter and leave + the session. + +5.3. Multiple Sessions for One Media Type + + This design goes one step further than the design discussed in + Section 5.2 by also using multiple RTP sessions for a single media + type. The main reason for going in this direction is that the RTP + application needs separation of the RTP streams according to their + usage, such as, for example, scalability over multicast, simulcast, + the need for extended QoS prioritization, or the need for fine- + grained signaling using RTP session-focused signaling tools. + + Advantages: + + 1. This design is more suitable for multicast usage where receivers + can individually select which RTP sessions they want to + participate in, assuming that each RTP session has its own + multicast group. + + 2. When multiple different usages exist, the application can + indicate its usage of the RTP streams at the RTP session level. + + 3. 
There is less need for SSRC-specific explicit signaling for each + media stream and thus a reduced need for explicit and timely + signaling when RTP streams are added or removed. + + 4. It enables detailed QoS prioritization for flow-based mechanisms. + + 5. It works well with split component terminals (see Section 3.10 of + [RFC7667]). + + 6. The scope for who is included in a security association can be + structured around the different RTP sessions, thus enabling such + functionality with existing key-management mechanisms. + + Disadvantages: + + 1. There is an increased amount of session configuration state + compared to multiple SSRCs of the same media type (Section 5.2), + due to the increased number of RTP sessions. + + 2. For RTP streams that are part of scalability, simulcast, or + transport robustness, a method for binding sources across + multiple RTP sessions is needed. + + 3. There is some potential for concern regarding legacy + implementations that don't support the RTP specification fully + when it comes to handling multiple SSRCs per endpoint. + + 4. The overhead of security association establishment is higher, due + to the increased number of RTP sessions. + + 5. If the applications need finer control over which participants in + a given RTP session are included in different sets of security + associations, most of today's key-management mechanisms will have + difficulties establishing such a session. + + For more-complex RTP applications that have several different usages + for RTP streams of the same media type or that use scalability or + simulcast, this solution can enable those functions, at the cost of + increased overhead associated with the additional sessions. This + type of structure is suitable for more-advanced applications as well + as multicast-based applications requiring differentiation to + different participants. + +5.4. 
Single SSRC per Endpoint + + In this design, each endpoint in a point-to-point session has only a + single SSRC; thus, the RTP session contains only two SSRCs -- one + local and one remote. This session can be used either + unidirectionally (i.e., one SSRC sends an RTP stream that is received + by the other SSRC) or bidirectionally (i.e., the two SSRCs both send + an RTP stream and receive the RTP stream sent by the other endpoint). + If the application needs additional media flows between the + endpoints, it will have to establish additional RTP sessions. + + Advantages: + + 1. This design has great potential for interoperability with legacy + applications, as it will not tax any RTP stack implementations. + + 2. The signaling system makes it possible to negotiate and describe + the exact formats and bitrates for each RTP stream, especially + using today's tools in SDP. + + 3. It is possible to control security associations per RTP stream + with current key-management functions, since each RTP stream is + directly related to an RTP session and the most commonly used + keying mechanisms operate on a per-session basis. + + Disadvantages: + + 1. The amount of NAT/FW state grows linearly with the number of RTP + streams. + + 2. NAT/FW traversal increases delay and resource consumption. + + 3. There are likely more signaling messages and more signaling + processing requirements due to the increased amount of + session-related information. + + 4. There is higher potential for a single RTP stream to fail during + transport between the endpoints, due to the need for a separate + NAT/FW traversal for every RTP stream, since there is only one + stream per session. + + 5. The amount of explicit state for relating RTP streams grows, + depending on how the application relates RTP streams. + + 6. Port consumption might become a problem for centralized services, + where the central node's port or 5-tuple filter consumption grows + rapidly with the number of sessions. + + 7. 
For applications where RTP stream usage is highly dynamic, i.e., + entities frequently enter and leave sessions, the amount of + signaling can become high. Issues can also arise from the need + for timely establishment of additional RTP sessions. + + 8. If, against the recommendation in [RFC3550], the same SSRC value + is reused in multiple RTP sessions rather than being randomly + chosen, interworking with applications that use a different + multiplexing structure will require SSRC translation. + + RTP applications with a strong need to interwork with legacy RTP + applications can potentially benefit from this structure. However, a + large number of media descriptions in SDP can also run into issues + with existing implementations. For any application needing a larger + number of media flows, the overhead can become very significant. + This structure is also not suitable for non-mixed multi-party + sessions, as any given RTP stream from each participant, although + having the same usage in the application, needs its own RTP session. + In addition, the dynamic behavior that can arise in multi-party + applications can tax the signaling system and make timely media + establishment more difficult. + +5.5. Summary + + Both the "single SSRC per endpoint" (Section 5.4) and "multiple media + types in one session" (Section 5.1) cases require full explicit + signaling of the media stream relationships. However, they operate + on two different levels, where the first primarily enables session- + level binding and the second needs SSRC-level binding. From another + perspective, the two solutions are the two extremes when it comes to + the number of RTP sessions needed. + + The two other designs -- multiple SSRCs of the same media type + (Section 5.2) and multiple sessions for one media type (Section 5.3) + -- are two examples that primarily allow for some implicit mapping of + the role or usage of the RTP streams based on which RTP session they + appear in. 
Thus, they potentially allow for less signaling and, in + particular, reduce the need for real-time signaling in sessions with + a dynamically changing number of RTP streams. They also represent + points between the first two designs when it comes to the number of + RTP sessions established, i.e., they represent an attempt to balance + the number of RTP sessions with the functionality the communication + session provides at both the network level and the signaling level. + +6. Guidelines + + This section contains a number of multi-stream guidelines for + implementers, system designers, and specification writers. + + Do not require the use of the same SSRC value across RTP sessions: + As discussed in Section 3.4.3, there are downsides to using the + same SSRC in multiple RTP sessions as a mechanism to bind related + RTP streams together. It is instead recommended to use a + mechanism to explicitly signal the relationship, in either + RTP/RTCP or the signaling mechanism used to establish the RTP + session(s). + + Use additional RTP streams for additional media sources: + In the cases where an RTP endpoint needs to transmit additional + RTP streams of the same media type in the application, with the + same processing requirements at the network and RTP layers, it is + suggested to send them in the same RTP session. For example, in + the case of a telepresence room where there are three cameras and + each camera captures two persons sitting at the table, we suggest + that each camera send its own RTP stream within a single RTP + session. + + Use additional RTP sessions for streams with different + requirements: + When RTP streams have different processing requirements from the + network or the RTP layer at the endpoints, it is suggested that + the different types of streams be put in different RTP sessions. + This includes the case where different participants want different + subsets of the set of RTP streams. 
+ + Use grouping when using multiple RTP sessions: + When using multiple RTP session solutions, it is suggested to + explicitly group the involved RTP sessions when needed using a + signaling mechanism -- for example, see "The Session Description + Protocol (SDP) Grouping Framework" [RFC5888] -- using some + appropriate grouping semantics. + + Ensure that RTP/RTCP extensions support multiple RTP streams as + well as multiple RTP sessions: + When defining an RTP or RTCP extension, the creator needs to + consider if this extension is applicable for use with additional + SSRCs and multiple RTP sessions. Any extension intended to be + generic must support both. Extensions that are not as generally + applicable will have to consider whether interoperability is + better served by defining a single solution or providing both + options. + + Provide adequate extensions for transport support: + When defining new RTP/RTCP extensions intended for transport + support, like the retransmission or FEC mechanisms, they must + include support for both multiple RTP streams in the same RTP + session and multiple RTP sessions, such that application + developers can choose freely from the set of mechanisms without + concerning themselves with which of the multiplexing choices a + particular solution supports. + +7. IANA Considerations + + This document has no IANA actions. + +8. Security Considerations + + The security considerations discussed in the RTP specification + [RFC3550]; any applicable RTP profile [RFC3551] [RFC4585] [RFC3711]; + and the extensions for sending multiple media types in a single RTP + session [RFC8860], RID [RFC8851], BUNDLE [RFC8843], [RFC5760], and + [RFC5761] apply if selected and thus need to be considered in the + evaluation. + + Section 4.3 discusses the security implications of choosing multiple + SSRCs vs. multiple RTP sessions. + +9. References + +9.1. Normative References + + [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 
+ Jacobson, "RTP: A Transport Protocol for Real-Time + Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, + July 2003, <https://www.rfc-editor.org/info/rfc3550>. + + [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and + Video Conferences with Minimal Control", STD 65, RFC 3551, + DOI 10.17487/RFC3551, July 2003, + <https://www.rfc-editor.org/info/rfc3551>. + + [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. + Norrman, "The Secure Real-time Transport Protocol (SRTP)", + RFC 3711, DOI 10.17487/RFC3711, March 2004, + <https://www.rfc-editor.org/info/rfc3711>. + + [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, + "Extended RTP Profile for Real-time Transport Control + Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, + DOI 10.17487/RFC4585, July 2006, + <https://www.rfc-editor.org/info/rfc4585>. + + [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific + Media Attributes in the Session Description Protocol + (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009, + <https://www.rfc-editor.org/info/rfc5576>. + + [RFC5760] Ott, J., Chesterfield, J., and E. Schooler, "RTP Control + Protocol (RTCP) Extensions for Single-Source Multicast + Sessions with Unicast Feedback", RFC 5760, + DOI 10.17487/RFC5760, February 2010, + <https://www.rfc-editor.org/info/rfc5760>. + + [RFC5761] Perkins, C. and M. Westerlund, "Multiplexing RTP Data and + Control Packets on a Single Port", RFC 5761, + DOI 10.17487/RFC5761, April 2010, + <https://www.rfc-editor.org/info/rfc5761>. + + [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and + B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms + for Real-Time Transport Protocol (RTP) Sources", RFC 7656, + DOI 10.17487/RFC7656, November 2015, + <https://www.rfc-editor.org/info/rfc7656>. + + [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, + DOI 10.17487/RFC7667, November 2015, + <https://www.rfc-editor.org/info/rfc7667>. 
+ + [RFC8843] Holmberg, C., Alvestrand, H., and C. Jennings, + "Negotiating Media Multiplexing Using the Session + Description Protocol (SDP)", RFC 8843, + DOI 10.17487/RFC8843, January 2021, + <https://www.rfc-editor.org/info/rfc8843>. + + [RFC8851] Roach, A.B., Ed., "RTP Payload Format Restrictions", + RFC 8851, DOI 10.17487/RFC8851, January 2021, + <https://www.rfc-editor.org/info/rfc8851>. + + [RFC8852] Roach, A.B., Nandakumar, S., and P. Thatcher, "RTP Stream + Identifier Source Description (SDES)", RFC 8852, + DOI 10.17487/RFC8852, January 2021, + <https://www.rfc-editor.org/info/rfc8852>. + + [RFC8860] Westerlund, M., Perkins, C., and J. Lennox, "Sending + Multiple Types of Media in a Single RTP Session", + RFC 8860, DOI 10.17487/RFC8860, January 2021, + <https://www.rfc-editor.org/info/rfc8860>. + + [RFC8870] Jennings, C., Mattsson, J., McGrew, D., Wing, D., and F. + Andreasen, "Encrypted Key Transport for DTLS and Secure + RTP", RFC 8870, DOI 10.17487/RFC8870, January 2021, + <https://www.rfc-editor.org/info/rfc8870>. + +9.2. Informative References + + [JINGLE] Ludwig, S., Beda, J., Saint-Andre, P., McQueen, R., Egan, + S., and J. Hildebrand, "XEP-0166: Jingle", September 2018, + <https://xmpp.org/extensions/xep-0166.html>. + + [RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., + Handley, M., Bolot, J.C., Vega-Garcia, A., and S. Fosse- + Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, + DOI 10.17487/RFC2198, September 1997, + <https://www.rfc-editor.org/info/rfc2198>. + + [RFC2205] Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S. + Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 + Functional Specification", RFC 2205, DOI 10.17487/RFC2205, + September 1997, <https://www.rfc-editor.org/info/rfc2205>. + + [RFC2474] Nichols, K., Blake, S., Baker, F., and D. 
Black, + "Definition of the Differentiated Services Field (DS + Field) in the IPv4 and IPv6 Headers", RFC 2474, + DOI 10.17487/RFC2474, December 1998, + <https://www.rfc-editor.org/info/rfc2474>. + + [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session + Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974, + October 2000, <https://www.rfc-editor.org/info/rfc2974>. + + [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, + A., Peterson, J., Sparks, R., Handley, M., and E. + Schooler, "SIP: Session Initiation Protocol", RFC 3261, + DOI 10.17487/RFC3261, June 2002, + <https://www.rfc-editor.org/info/rfc3261>. + + [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model + with Session Description Protocol (SDP)", RFC 3264, + DOI 10.17487/RFC3264, June 2002, + <https://www.rfc-editor.org/info/rfc3264>. + + [RFC3389] Zopf, R., "Real-time Transport Protocol (RTP) Payload for + Comfort Noise (CN)", RFC 3389, DOI 10.17487/RFC3389, + September 2002, <https://www.rfc-editor.org/info/rfc3389>. + + [RFC3830] Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K. + Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830, + DOI 10.17487/RFC3830, August 2004, + <https://www.rfc-editor.org/info/rfc3830>. + + [RFC4103] Hellstrom, G. and P. Jones, "RTP Payload for Text + Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005, + <https://www.rfc-editor.org/info/rfc4103>. + + [RFC4383] Baugher, M. and E. Carrara, "The Use of Timed Efficient + Stream Loss-Tolerant Authentication (TESLA) in the Secure + Real-time Transport Protocol (SRTP)", RFC 4383, + DOI 10.17487/RFC4383, February 2006, + <https://www.rfc-editor.org/info/rfc4383>. + + [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session + Description Protocol", RFC 4566, DOI 10.17487/RFC4566, + July 2006, <https://www.rfc-editor.org/info/rfc4566>. + + [RFC4568] Andreasen, F., Baugher, M., and D. 
Wing, "Session + Description Protocol (SDP) Security Descriptions for Media + Streams", RFC 4568, DOI 10.17487/RFC4568, July 2006, + <https://www.rfc-editor.org/info/rfc4568>. + + [RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. + Hakenberg, "RTP Retransmission Payload Format", RFC 4588, + DOI 10.17487/RFC4588, July 2006, + <https://www.rfc-editor.org/info/rfc4588>. + + [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, + "Codec Control Messages in the RTP Audio-Visual Profile + with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104, + February 2008, <https://www.rfc-editor.org/info/rfc5104>. + + [RFC5109] Li, A., Ed., "RTP Payload Format for Generic Forward Error + Correction", RFC 5109, DOI 10.17487/RFC5109, December + 2007, <https://www.rfc-editor.org/info/rfc5109>. + + [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, + "Session Traversal Utilities for NAT (STUN)", RFC 5389, + DOI 10.17487/RFC5389, October 2008, + <https://www.rfc-editor.org/info/rfc5389>. + + [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer + Security (DTLS) Extension to Establish Keys for the Secure + Real-time Transport Protocol (SRTP)", RFC 5764, + DOI 10.17487/RFC5764, May 2010, + <https://www.rfc-editor.org/info/rfc5764>. + + [RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description + Protocol (SDP) Grouping Framework", RFC 5888, + DOI 10.17487/RFC5888, June 2010, + <https://www.rfc-editor.org/info/rfc5888>. + + [RFC6465] Ivov, E., Ed., Marocco, E., Ed., and J. Lennox, "A Real- + time Transport Protocol (RTP) Header Extension for Mixer- + to-Client Audio Level Indication", RFC 6465, + DOI 10.17487/RFC6465, December 2011, + <https://www.rfc-editor.org/info/rfc6465>. + + [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP + Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014, + <https://www.rfc-editor.org/info/rfc7201>. + + [RFC7657] Black, D., Ed. and P. 
Jones, "Differentiated Services + (Diffserv) and Real-Time Communication", RFC 7657, + DOI 10.17487/RFC7657, November 2015, + <https://www.rfc-editor.org/info/rfc7657>. + + [RFC7826] Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M., + and M. Stiemerling, Ed., "Real-Time Streaming Protocol + Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December + 2016, <https://www.rfc-editor.org/info/rfc7826>. + + [RFC7983] Petit-Huguenin, M. and G. Salgueiro, "Multiplexing Scheme + Updates for Secure Real-time Transport Protocol (SRTP) + Extension for Datagram Transport Layer Security (DTLS)", + RFC 7983, DOI 10.17487/RFC7983, September 2016, + <https://www.rfc-editor.org/info/rfc7983>. + + [RFC8088] Westerlund, M., "How to Write an RTP Payload Format", + RFC 8088, DOI 10.17487/RFC8088, May 2017, + <https://www.rfc-editor.org/info/rfc8088>. + + [RFC8108] Lennox, J., Westerlund, M., Wu, Q., and C. Perkins, + "Sending Multiple RTP Streams in a Single RTP Session", + RFC 8108, DOI 10.17487/RFC8108, March 2017, + <https://www.rfc-editor.org/info/rfc8108>. + + [RFC8445] Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive + Connectivity Establishment (ICE): A Protocol for Network + Address Translator (NAT) Traversal", RFC 8445, + DOI 10.17487/RFC8445, July 2018, + <https://www.rfc-editor.org/info/rfc8445>. + + [RFC8871] Jones, P., Benham, D., and C. Groves, "A Solution + Framework for Private Media in Privacy-Enhanced RTP + Conferencing (PERC)", RFC 8871, DOI 10.17487/RFC8871, + January 2021, <https://www.rfc-editor.org/info/rfc8871>. + +Appendix A. Dismissing Payload Type Multiplexing + + This section documents a number of reasons why using the payload type + as a multiplexing point is unsuitable for most issues related to + multiple RTP streams. Attempting to use payload type multiplexing + beyond its defined usage has well-known negative effects on RTP, as + discussed below. 
To use the payload type as the single discriminator + for multiple streams implies that all the different RTP streams are + being sent with the same SSRC, thus using the same timestamp and + sequence number space. The many effects of using payload type + multiplexing are as follows: + + 1. Constraints are placed on the RTP timestamp rate for the + multiplexed media. For example, RTP streams that use different + RTP timestamp rates cannot be combined, as the timestamp values + need to be consistent across all multiplexed media frames. + Thus, streams are forced to use the same RTP timestamp rate. + When this is not possible, payload type multiplexing cannot be + used. + + 2. Many RTP payload formats can fragment a media object over + multiple RTP packets, like parts of a video frame. These + payload formats need to determine the order of the fragments to + correctly decode them. Thus, it is important to ensure that all + fragments related to a frame or a similar media object are + transmitted in sequence and without interruptions within the + object. This can be done relatively easily on the sender side + by ensuring that the fragments of each RTP stream are sent in + sequence. + + 3. Some media formats require uninterrupted sequence number space + between media parts. These are media formats where any missing + RTP sequence number will result in decoding failure or invoking + a repair mechanism within a single media context. The text/t140 + payload format [RFC4103] is an example of such a format. These + formats will need a sequence numbering abstraction function + between RTP and the individual RTP stream before being used with + payload type multiplexing. + + 4. Sending multiple media streams in the same sequence number space + makes it impossible to determine which media stream lost a + packet. 
Such a scenario causes difficulties, since the receiver + cannot determine to which stream it should apply packet-loss + concealment or other stream-specific loss-mitigation mechanisms. + + 5. If RTP retransmission [RFC4588] is used and packet loss occurs, + it is possible to ask for the missing packet(s) by SSRC and + sequence number -- not by payload type. If only some of the + payload type multiplexed streams are of interest, there is no + way to tell which missing packet or packets belong to the stream + or streams of interest, and all lost packets need to be + requested, wasting bandwidth. + + 6. The current RTCP feedback mechanisms are built around providing + feedback on RTP streams based on stream ID (SSRC), packet + (sequence numbers), and time interval (RTP timestamps). There + is almost never a field to indicate which payload type is + reported, so sending feedback for a specific RTP payload type is + difficult without extending existing RTCP reporting. + + 7. The current RTCP media control messages specification [RFC5104] + is oriented around controlling particular media flows, i.e., + requests are done by addressing a particular SSRC. Such + mechanisms would need to be redefined to support payload type + multiplexing. + + 8. The number of payload types is inherently limited. Accordingly, + using payload type multiplexing limits the number of streams + that can be multiplexed and does not scale. This limitation is + exacerbated if one uses solutions like RTP and RTCP multiplexing + [RFC5761] where a number of payload types are blocked due to the + overlap between RTP and RTCP. + + 9. At times, there is a need to group multiplexed streams. This is + currently possible for RTP sessions and SSRCs, but there is no + defined way to group payload types. + + 10. It is currently not possible to signal bandwidth requirements + per RTP stream when using payload type multiplexing. + + 11. 
Most existing SDP media-level attributes cannot be applied on a + per-payload-type basis and would require redefinition in that + context. + + 12. A legacy endpoint that does not understand the indication that + different RTP payload types are different RTP streams might be + slightly confused by the large amount of possibly overlapping or + identically defined RTP payload types. + +Appendix B. Signaling Considerations + + Signaling is not an architectural consideration for RTP itself, so + this discussion has been moved to an appendix. However, it is + extremely important for anyone building complete applications, so it + is deserving of discussion. + + We document some issues here that need to be addressed when using + some form of signaling to establish RTP sessions. These issues + cannot be addressed by simply tweaking, extending, or profiling RTP; + rather, they require a dedicated and in-depth look at the signaling + primitives that set up the RTP sessions. + + There exist various signaling solutions for establishing RTP + sessions. Many are based on SDP [RFC4566]; however, SDP + functionality is also dependent on the signaling protocols carrying + the SDP. The Real-Time Streaming Protocol (RTSP) [RFC7826] and the + Session Announcement Protocol (SAP) [RFC2974] both use SDP in a + declarative fashion, while SIP [RFC3261] uses SDP with the additional + definition of offer/answer [RFC3264]. The impact on signaling, and + especially on SDP, needs to be considered, as it can greatly affect + how to deploy a certain multiplexing point choice. + +B.1. Session-Oriented Properties + + One aspect of existing signaling protocols is that they are focused + on RTP sessions or, in the case of SDP, the concept of media + descriptions. A number of things are signaled at the media + description level, but those are not necessarily strictly bound to an + RTP session and could be of interest for signaling, especially for a + particular RTP stream (SSRC) within the session. 
The following + properties have been identified as being potentially useful for + signaling, and not only at the RTP session level: + + * Bitrate and/or bandwidth can be specified today only as an + aggregate limit, or as a common "any RTP stream" limit, unless + either codec-specific bandwidth limiting or RTCP signaling using + Temporary Maximum Media Stream Bit Rate Request (TMMBR) messages + [RFC5104] is used. + + * Which SSRC will use which RTP payload type (this information will + be visible in the first media packet but is sometimes useful to + have before the packet arrives). + + Some of these issues are clearly SDP's problem rather than RTP + limitations. However, if the aim is to deploy a solution that uses + several SSRCs and contains several sets of RTP streams with different + properties (encoding/packetization parameters, bitrate, etc.), + putting each set in a different RTP session would directly enable + negotiation of the parameters for each set. If insisting on + additional SSRCs only, a number of signaling extensions are needed to + clarify that there are multiple sets of RTP streams with different + properties and that they in fact need to be kept different, since a + single set will not satisfy the application's requirements. + + For some parameters, such as RTP payload type, resolution, and frame + rate, an SSRC-linked mechanism has been proposed in [RFC8851]. + +B.2. SDP Prevents Multiple Media Types + + SDP uses the "m=" line to both delineate an RTP session and specify + the top-level media type: audio, video, text, image, application. + This media type is used as the top-level media type for identifying + the actual payload format and is bound to a particular payload type + using the "a=rtpmap:" attribute. This binding has to be loosened in + order to use SDP to describe RTP sessions containing multiple top- + level media types. 
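As an illustration of the binding described above, the following is a minimal sketch (not part of the RFC text) of how a receiver might read an SDP body and interpret each "a=rtpmap:" encoding name under the top-level media type of the enclosing "m=" line. The SDP body, port numbers, payload type numbers, and the helper names payload_types_by_media and full_media_types are assumptions made for this example only.

```python
# Hypothetical SDP body used only for illustration; addresses, ports,
# payload type numbers, and codecs are assumptions, not from the memo.
EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 96
a=rtpmap:96 opus/48000/2
m=video 49172 RTP/AVP 97 98
a=rtpmap:97 VP8/90000
a=rtpmap:98 H264/90000
"""


def payload_types_by_media(sdp):
    """Return a list of (top-level media type, {payload type: encoding})."""
    sections = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            media = line[2:].split()[0]
            sections.append((media, {}))
        elif line.startswith("a=rtpmap:") and sections:
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            # The encoding belongs to the most recent "m=" line, so its
            # top-level media type is fixed by that media description.
            sections[-1][1][int(pt)] = encoding
    return sections


def full_media_types(sdp):
    """Map each payload type to '<top-level type>/<subtype>' per m= line."""
    return [
        {pt: f"{media}/{enc.split('/')[0]}" for pt, enc in pts.items()}
        for media, pts in payload_types_by_media(sdp)
    ]


if __name__ == "__main__":
    for section in full_media_types(EXAMPLE_SDP):
        print(section)
```

In this sketch, every payload type inherits its top-level media type from the "m=" line it appears under, so a single media description cannot carry both audio and video payload types. This is the binding that has to be loosened to describe one RTP session with multiple top-level media types.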
+ + [RFC8843] describes how to let multiple SDP media descriptions use a + single underlying transport in SDP, which allows the definition of + one RTP session with different top-level media types. + +B.3. Signaling RTP Stream Usage + + RTP streams being transported in RTP have a particular usage in an + RTP application. In many applications to date, this usage of the RTP + stream is implicitly signaled. For example, an application might + choose to take all incoming audio RTP streams, mix them, and play + them out. However, in more-advanced applications that use multiple + RTP streams, there will be more than a single usage or purpose among + the set of RTP streams being sent or received. RTP applications will + need to somehow signal this usage. The signaling that is used will + have to identify the RTP streams affected by their RTP-level + identifiers, which means that they have to be identified by either + their session or their SSRC + session. + + In some applications, the receiver cannot utilize the RTP stream at + all before it has received the signaling message describing the RTP + stream and its usage. In other applications, there exists a default + handling method that is appropriate. + + If all RTP streams in an RTP session are to be treated in the same + way, identifying the session is enough. If SSRCs in a session are to + be treated differently, signaling needs to identify both the session + and the SSRC. + + If this signaling affects how any RTP central node, like an RTP mixer + or translator that selects, mixes, or processes streams, treats the + streams, the node will also need to receive the same signaling to + know how to treat RTP streams with different usages in the right + fashion. + +Acknowledgments + + The authors would like to acknowledge and thank Cullen Jennings, Dale + R. Worley, Huang Yihong (Rachel), Benjamin Kaduk, Mirja Kühlewind, + and Vijay Gurbani for review and comments. 
+ +Contributors + + Hui Zheng (Marvin) contributed to WG draft versions -04 and -05 of + the document. + +Authors' Addresses + + Magnus Westerlund + Ericsson + Torshamnsgatan 23 + SE-164 80 Kista + Sweden + + Email: magnus.westerlund@ericsson.com + + + Bo Burman + Ericsson + Gronlandsgatan 31 + SE-164 60 Kista + Sweden + + Email: bo.burman@ericsson.com + + + Colin Perkins + University of Glasgow + School of Computing Science + Glasgow + G12 8QQ + United Kingdom + + Email: csp@csperkins.org + + + Harald Tveit Alvestrand + Google + Kungsbron 2 + SE-11122 Stockholm + Sweden + + Email: harald@alvestrand.no + + + Roni Even + + Email: ron.even.tlv@gmail.com