1 files changed, 2046 insertions, 0 deletions
diff --git a/doc/rfc/rfc8872.txt b/doc/rfc/rfc8872.txt
new file mode 100644
index 0000000..326b481
--- /dev/null
+++ b/doc/rfc/rfc8872.txt
@@ -0,0 +1,2046 @@
+
+
+
+
+Internet Engineering Task Force (IETF)                     M. Westerlund
+Request for Comments: 8872                                     B. Burman
+Category: Informational                                         Ericsson
+ISSN: 2070-1721                                               C. Perkins
+                                                   University of Glasgow
+                                                           H. Alvestrand
+                                                                  Google
+                                                                 R. Even
+                                                            January 2021
+
+
+    Guidelines for Using the Multiplexing Features of RTP to Support
+                         Multiple Media Streams
+
+Abstract
+
+   The Real-time Transport Protocol (RTP) is a flexible protocol that
+   can be used in a wide range of applications, networks, and system
+   topologies.  That flexibility makes for wide applicability but can
+   complicate the application design process.  One particular design
+   question that has received much attention is how to support multiple
+   media streams in RTP.  This memo discusses the available options and
+   design trade-offs, and provides guidelines on how to use the
+   multiplexing features of RTP to support multiple media streams.
+
+Status of This Memo
+
+   This document is not an Internet Standards Track specification; it is
+   published for informational purposes.
+
+   This document is a product of the Internet Engineering Task Force
+   (IETF).  It represents the consensus of the IETF community.  It has
+   received public review and has been approved for publication by the
+   Internet Engineering Steering Group (IESG).  Not all documents
+   approved by the IESG are candidates for any level of Internet
+   Standard; see Section 2 of RFC 7841.
+
+   Information about the current status of this document, any errata,
+   and how to provide feedback on it may be obtained at
+   https://www.rfc-editor.org/info/rfc8872.
+
+Copyright Notice
+
+   Copyright (c) 2021 IETF Trust and the persons identified as the
+   document authors.  All rights reserved.
+
+   This document is subject to BCP 78 and the IETF Trust's Legal
+   Provisions Relating to IETF Documents
+   (https://trustee.ietf.org/license-info) in effect on the date of
+   publication of this document.  Please review these documents
+   carefully, as they describe your rights and restrictions with respect
+   to this document.  Code Components extracted from this document must
+   include Simplified BSD License text as described in Section 4.e of
+   the Trust Legal Provisions and are provided without warranty as
+   described in the Simplified BSD License.
+
+Table of Contents
+
+   1.  Introduction
+   2.  Definitions
+     2.1.  Terminology
+     2.2.  Focus of This Document
+   3.  RTP Multiplexing Overview
+     3.1.  Reasons for Multiplexing and Grouping RTP Streams
+     3.2.  RTP Multiplexing Points
+       3.2.1.  RTP Session
+       3.2.2.  Synchronization Source (SSRC)
+       3.2.3.  Contributing Source (CSRC)
+       3.2.4.  RTP Payload Type
+     3.3.  Issues Related to RTP Topologies
+     3.4.  Issues Related to RTP and RTCP
+       3.4.1.  The RTP Specification
+       3.4.2.  Multiple SSRCs in a Session
+       3.4.3.  Binding Related Sources
+       3.4.4.  Forward Error Correction
+   4.  Considerations for RTP Multiplexing
+     4.1.  Interworking Considerations
+       4.1.1.  Application Interworking
+       4.1.2.  RTP Translator Interworking
+       4.1.3.  Gateway Interworking
+       4.1.4.  Legacy Considerations for Multiple SSRCs
+     4.2.  Network Considerations
+       4.2.1.  Quality of Service
+       4.2.2.  NAT and Firewall Traversal
+       4.2.3.  Multicast
+     4.3.  Security and Key-Management Considerations
+       4.3.1.  Security Context Scope
+       4.3.2.  Key Management for Multi-party Sessions
+       4.3.3.  Complexity Implications
+   5.  RTP Multiplexing Design Choices
+     5.1.  Multiple Media Types in One Session
+     5.2.  Multiple SSRCs of the Same Media Type
+     5.3.  Multiple Sessions for One Media Type
+     5.4.  Single SSRC per Endpoint
+     5.5.  Summary
+   6.  Guidelines
+   7.  IANA Considerations
+   8.  Security Considerations
+   9.  References
+     9.1.  Normative References
+     9.2.  Informative References
+   Appendix A.  Dismissing Payload Type Multiplexing
+   Appendix B.  Signaling Considerations
+     B.1.  Session-Oriented Properties
+     B.2.  SDP Prevents Multiple Media Types
+     B.3.  Signaling RTP Stream Usage
+   Acknowledgments
+   Contributors
+   Authors' Addresses
+
+1.  Introduction
+
+   The Real-time Transport Protocol (RTP) [RFC3550] is a commonly used
+   protocol for real-time media transport.  It is a protocol that
+   provides great flexibility and can support a large set of different
+   applications.  From the beginning, RTP was designed for multiple
+   participants in a communication session.  It supports many topology
+   paradigms and usages, as defined in [RFC7667].  RTP has several
+   multiplexing points designed for different purposes; these points
+   enable support of multiple RTP streams and switching between
+   different encoding or packetization techniques for the media.  By
+   using multiple RTP sessions, sets of RTP streams can be structured
+   for efficient processing or identification.  Thus, to meet an
+   application's needs, an RTP application designer needs to understand
+   how best to use the RTP session, the RTP stream identifier
+   (synchronization source (SSRC)), and the RTP payload type.
+
+   There has been increased interest in more-advanced usage of RTP.  For
+   example, multiple RTP streams can be used when a single endpoint has
+   multiple media sources (like multiple cameras or microphones) from
+   which streams of media need to be sent simultaneously.  Consequently,
+   questions are raised regarding the most appropriate RTP usage.  The
+   limitations in some implementations, RTP/RTCP extensions, and
+   signaling have also been exposed.  This document aims to clarify the
+   usefulness of some functionalities in RTP that, hopefully, will
+   result in future implementations that are more complete.
+
+   The purpose of this document is to provide clear information about
+   the possibilities of RTP when it comes to multiplexing.  The RTP
+   application designer needs to understand the implications arising
+   from a particular usage of the RTP multiplexing points.  This
+   document provides some guidelines and recommends against some usages
+   as being unsuitable, in general or for particular purposes.
+
+   This document starts with some definitions and then goes into
+   existing RTP functionalities around multiplexing.  Both the desired
+   behavior and the implications of a particular behavior depend on
+   which topologies are used; therefore, this topic requires some
+   consideration.  We then discuss some choices regarding multiplexing
+   behavior and the impacts of those choices.  Some designs of RTP usage
+   are also discussed.  Finally, some guidelines and examples are
+   provided.
+
+2.  Definitions
+
+2.1.  Terminology
+
+   The definitions in Section 3 of [RFC3550] are referenced normatively.
+
+   The taxonomy defined in [RFC7656] is referenced normatively.
+
+   The following terms and abbreviations are used in this document:
+
+   Multi-party:
+      Communication that includes multiple endpoints.  In this document,
+      "multi-party" will be used to refer to scenarios where more than
+      two endpoints communicate.
+
+   Multiplexing:
+      An operation that takes multiple entities as input, aggregating
+      them onto some common resource while keeping the individual
+      entities addressable such that they can later be fully and
+      unambiguously separated (demultiplexed) again.
+
+   RTP Receiver:
+      An endpoint or middlebox receiving RTP streams and RTCP messages.
+      It uses at least one SSRC to send RTCP messages.  An RTP receiver
+      may also be an RTP sender.
+
+   RTP Sender:
+      An endpoint sending one or more RTP streams but also sending RTCP
+      messages.
+
+   RTP Session Group:
+      One or more RTP sessions that are used together to perform some
+      function.  Examples include multiple RTP sessions used to carry
+      different layers of a layered encoding.  In an RTP Session Group,
+      CNAMEs are assumed to be valid across all RTP sessions and
+      designate synchronization contexts that can cross RTP sessions;
+      i.e., SSRCs that map to a common CNAME can be assumed to have RTCP
+      Sender Report (SR) timing information derived from a common clock
+      such that they can be synchronized for playout.
+
+   Signaling:
+      The process of configuring endpoints to participate in one or more
+      RTP sessions.
+
+      |  Note: The above definitions of "RTP receiver" and "RTP sender"
+      |  are consistent with the usage in [RFC3550].
+
+2.2.  Focus of This Document
+
+   This document is focused on issues that affect RTP.  Thus, issues
+   that involve signaling protocols -- such as whether SIP [RFC3261],
+   Jingle [JINGLE], or some other protocol is in use for session
+   configuration; the particular syntaxes used to define RTP session
+   properties; or the constraints imposed by particular choices in the
+   signaling protocols -- are mentioned only as examples in order to
+   describe the RTP issues more precisely.
+
+   This document assumes that the applications will use RTCP.  While
+   there are applications that don't send RTCP, they do not conform to
+   the RTP specification and thus can be regarded as reusing the RTP
+   packet format but not implementing RTP.
+
+3.  RTP Multiplexing Overview
+
+3.1.  Reasons for Multiplexing and Grouping RTP Streams
+
+   There are several reasons why an endpoint might choose to send
+   multiple media streams.  In the discussion below, please keep in mind
+   that the reasons for having multiple RTP streams vary and include,
+   but are not limited to, the following:
+
+   *  There might be multiple media sources.
+
+   *  Multiple RTP streams might be needed to represent one media
+      source, for example:
+
+      -  To carry different layers of a scalable encoding of a media
+         source
+
+      -  Alternative encodings during simulcast, using different codecs
+         for the same audio stream
+
+      -  Alternative formats during simulcast, multiple resolutions of
+         the same video stream
+
+   *  A retransmission stream might repeat some parts of the content of
+      another RTP stream.
+
+   *  A Forward Error Correction (FEC) stream might provide material
+      that can be used to repair another RTP stream.
+
+   For each of these reasons, it is necessary to decide whether each
+   additional RTP stream is sent within the same RTP session as the
+   other RTP streams or it is necessary to use additional RTP sessions
+   to group the RTP streams.  For a combination of reasons, the suitable
+   choice for one situation might not be the suitable choice for another
+   situation.  The choice is easiest when multiplexing multiple media
+   sources of the same media type.  However, all reasons warrant
+   discussion and clarification regarding how to deal with them.  As the
+   discussion below will show, a single solution does not suit all
+   purposes.  To utilize RTP well and as efficiently as possible, both
+   approaches are needed.  The real issue is knowing when to create
+   multiple RTP sessions versus when to send multiple RTP streams in a
+   single RTP session.
+
+3.2.  RTP Multiplexing Points
+
+   This section describes the multiplexing points present in RTP that
+   can be used to distinguish RTP streams and groups of RTP streams.
+   Figure 1 outlines the process of demultiplexing incoming RTP streams,
+   starting with one or more sockets representing the reception of one
+   or more transport flows, e.g., based on the UDP destination port.  It
+   also demultiplexes RTP/RTCP from any other protocols, such as Session
+   Traversal Utilities for NAT (STUN) [RFC5389] and DTLS-SRTP [RFC5764]
+   on the same transport as described in [RFC7983].  The Processing and
+   Buffering (PB) step in Figure 1 terminates RTP/RTCP and prepares the
+   RTP payload for input to the decoder.
+
+                      |   |   |
+                      |   |   | packets
+           +--        v   v   v
+           |        +------------+
+           |        |  Socket(s) |   Transport Protocol Demultiplexing
+           |        +------------+
+           |            ||  ||
+      RTP  |       RTP/ ||  |+-----> DTLS (SRTP keying, SCTP, etc.)
+   Session |       RTCP ||  +------> STUN (multiplexed using same port)
+           +--          ||
+           +--          ||
+           |      ++(split by SSRC)-++---> Identify SSRC collision
+           |      ||    ||    ||    ||
+           | (associate with signaling by MID/RID)
+           |      vv    vv    vv    vv
+     RTP   |     +--+  +--+  +--+  +--+ Jitter buffer,
+   Streams |     |PB|  |PB|  |PB|  |PB| process RTCP, etc.
+           |     +--+  +--+  +--+  +--+
+           +--     |    |      |    |
+             (select decoder based on payload type (PT))
+           +--     |   /       |  /
+           |       +-----+     | /
+           |         /   |     |/
+   Payload |        v    v     v
+   Formats |     +---+ +---+ +---+
+           |     |Dec| |Dec| |Dec| Decoders
+           |     +---+ +---+ +---+
+           +--
+
+                    Figure 1: RTP Demultiplexing Process
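As an illustration only (this sketch is not taken from the RFC text), the first demultiplexing step in Figure 1 can be approximated by classifying each datagram on its first octet, using the ranges given in [RFC7983]. The function name `classify_packet` and the returned labels are hypothetical. Treating a second octet of 192..223 as RTCP is an assumed heuristic in the spirit of RTP/RTCP multiplexing on one port, not something this document specifies.

```python
def classify_packet(data: bytes) -> str:
    """Classify a received datagram by its first octet (ranges from RFC 7983)."""
    if not data:
        return "drop"
    first = data[0]
    if 0 <= first <= 3:
        return "stun"
    if 16 <= first <= 19:
        return "zrtp"
    if 20 <= first <= 63:
        return "dtls"
    if 64 <= first <= 79:
        return "turn-channel"
    if 128 <= first <= 191:
        # RTP and RTCP share this range (version field == 2).
        # Assumption: a second octet in 192..223 marks an RTCP packet.
        if len(data) >= 2 and 192 <= data[1] <= 223:
            return "rtcp"
        return "rtp"
    # Octet values outside the ranges above are not assigned here.
    return "drop"
```

A receiver would typically run such a check on every datagram before handing RTP/RTCP packets to the per-SSRC processing shown in the lower half of the figure.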
+
+3.2.1.  RTP Session
+
+   An RTP session is the highest semantic layer in RTP and represents an
+   association between a group of communicating endpoints.  RTP does not
+   contain a session identifier, yet it must be possible to identify
+   different RTP sessions both across a set of different endpoints and
+   from the perspective of a single endpoint.
+
+   For RTP session separation across endpoints, the set of participants
+   that form an RTP session is defined as those that share a single SSRC
+   space [RFC3550].  That is, if a group of participants are each aware
+   of the SSRC identifiers belonging to the other participants, then
+   those participants are in a single RTP session.  A participant can
+   become aware of an SSRC identifier by receiving an RTP packet
+   containing the identifier in the SSRC field or contributing source
+   (CSRC) list, by receiving an RTCP packet listing it in an SSRC field,
+   or through signaling (e.g., the Session Description Protocol (SDP)
+   [RFC4566] "a=ssrc:" attribute [RFC5576]).  Thus, the scope of an RTP
+   session is determined by the participants' network interconnection
+   topology, in combination with RTP and RTCP forwarding strategies
+   deployed by the endpoints and any middleboxes, and by the signaling.
+
+   For RTP session separation within a single endpoint, RTP relies on
+   the underlying transport layer and the signaling to identify RTP
+   sessions in a manner that is meaningful to the application.  A single
+   endpoint can have one or more transport flows for the same RTP
+   session, and a single RTP session can span multiple transport-layer
+   flows even if all endpoints use a single transport-layer flow per
+   endpoint for that RTP session.  The signaling layer might give RTP
+   sessions an explicit identifier, or the identification might be
+   implicit based on the addresses and ports used.  Accordingly, a
+   single RTP session can have multiple associated identifiers, explicit
+   and implicit, belonging to different contexts.  For example, when
+   running RTP on top of UDP/IP, an endpoint can identify and delimit an
+   RTP session from other RTP sessions by their UDP source and
+   destination IP addresses and their UDP port numbers.  A single RTP
+   session can be using multiple IP/UDP flows for receiving and/or
+   sending RTP packets to other endpoints or middleboxes, even if the
+   endpoint does not have multiple IP addresses.  Using multiple IP
+   addresses only makes it more likely that multiple IP/UDP flows will
+   be required.  Another example is SDP media descriptions (the "m="
+   line and the subsequent associated lines) that signal the transport
+   flow and RTP session configuration for the endpoint's part of the RTP
+   session.  The SDP grouping framework [RFC5888] allows labeling of the
+   media descriptions to be used so that RTP Session Groups can be
+   created.  Through the use of "Negotiating Media Multiplexing Using
+   the Session Description Protocol (SDP)" [RFC8843], multiple media
+   descriptions become part of a common RTP session where each media
+   description represents the RTP streams sent or received for a media
+   source.
+
+   RTP makes no normative statements about the relationship between
+   different RTP sessions; however, applications that use more than one
+   RTP session need to understand how the different RTP sessions that
+   they create relate to one another.
+
+3.2.2.  Synchronization Source (SSRC)
+
+   An SSRC identifies a source of an RTP stream, or an RTP receiver when
+   sending RTCP.  Every endpoint has at least one SSRC identifier, even
+   if it does not send RTP packets.  RTP endpoints that are only RTP
+   receivers still send RTCP and use their SSRC identifiers in the RTCP
+   packets they send.  An endpoint can have multiple SSRC identifiers if
+   it sends multiple RTP streams.  Endpoints that function as both RTP
+   sender and RTP receiver use the same SSRC(s) in both roles.
+
+   The SSRC is a 32-bit identifier.  It is present in every RTP and RTCP
+   packet header and in the payload of some RTCP packet types.  It can
+   also be present in SDP signaling.  Unless presignaled, e.g., using
+   the SDP "a=ssrc:" attribute [RFC5576], the SSRC is chosen at random.
+   It is not dependent on the network address of the endpoint and is
+   intended to be unique within an RTP session.  SSRC collisions can
+   occur and are handled as specified in [RFC3550] and [RFC5576],
+   resulting in the SSRC of the colliding RTP streams or receivers
+   changing.  An endpoint that changes its network transport address
+   during a session has to choose a new SSRC identifier to avoid being
+   interpreted as a looped source, unless a mechanism providing a
+   virtual transport (such as Interactive Connectivity Establishment
+   (ICE) [RFC8445]) abstracts the changes.
+
+   SSRC identifiers that belong to the same synchronization context
+   (i.e., that represent RTP streams that can be synchronized using
+   information in RTCP SR packets) use identical CNAME chunks in
+   corresponding RTCP source description (SDES) packets.  SDP signaling
+   can also be used to provide explicit SSRC grouping [RFC5576].
+
+   In some cases, the same SSRC identifier value is used to relate
+   streams in two different RTP sessions, such as in RTP retransmission
+   [RFC4588].  This is to be avoided, since there is no guarantee that
+   SSRC values are unique across RTP sessions.  In the case of RTP
+   retransmission [RFC4588], it is recommended to use explicit binding
+   of the source RTP stream and the redundancy stream, e.g., using the
+   RepairedRtpStreamId RTCP SDES item [RFC8852].  The
+   RepairedRtpStreamId is a rather recent mechanism, so one cannot
+   expect older applications to follow this recommendation.
+
+   Note that the RTP sequence number and RTP timestamp are scoped by the
+   SSRC and are thus specific per RTP stream.
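As an illustration only (not part of the RFC text), the sketch below parses the 12-octet fixed RTP header defined in [RFC3550] and keeps per-stream state keyed by SSRC, reflecting that the sequence number and timestamp are scoped by the SSRC. The names `RtpHeader`, `parse_rtp_header`, and `latest_sequence_by_ssrc` are hypothetical. Header extensions and padding are deliberately ignored.

```python
import struct
from typing import Dict, Iterable, NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header and CSRC list (RFC 3550, Section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-octet fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError("unsupported RTP version")
    csrc_count = b0 & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = struct.unpack(f"!{csrc_count}I", packet[12:end]) if csrc_count else ()
    return RtpHeader(version, bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc, csrcs)


def latest_sequence_by_ssrc(packets: Iterable[bytes]) -> Dict[int, int]:
    """Record the last sequence number seen for each SSRC (one entry per RTP stream)."""
    state: Dict[int, int] = {}
    for pkt in packets:
        hdr = parse_rtp_header(pkt)
        state[hdr.ssrc] = hdr.sequence_number
    return state
```

Keying receiver state by SSRC alone is sufficient only within one RTP session; an application handling several sessions would need to add its own session label to the key.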
+
+   Different types of entities use an SSRC to identify themselves, as
+   follows:
+
+   *  A real media source uses the SSRC to identify a "physical" media
+      source.
+
+   *  A conceptual media source uses the SSRC to identify the result of
+      applying some filtering function in a network node -- for example,
+      a filtering function in an RTP mixer that provides the most active
+      speaker based on some criteria, or a mix representing a set of
+      other sources.
+
+   *  An RTP receiver uses the SSRC to identify itself as the source of
+      its RTCP reports.
+
+   An endpoint that generates more than one media type, e.g., a
+   conference participant sending both audio and video, need not (and,
+   indeed, should not) use the same SSRC value across RTP sessions.
+   Using RTCP compound packets containing the CNAME SDES item is the
+   designated method for binding an SSRC to a CNAME, effectively cross-
+   correlating SSRCs within and between RTP sessions as coming from the
+   same endpoint.  The main property attributed to SSRCs associated with
+   the same CNAME is that they are from a particular synchronization
+   context and can be synchronized at playback.
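As an illustration only (not part of the RFC text), the CNAME binding described above can be modeled as grouping SSRCs by the CNAME carried in their RTCP SDES packets. The function `group_by_cname` and the `session_id` labels are hypothetical. RTP itself carries no session identifier, so the label stands for whatever the application or signaling uses to tell its RTP sessions apart.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def group_by_cname(
    sdes_items: Iterable[Tuple[str, int, str]],
) -> Dict[str, Set[Tuple[str, int]]]:
    """Group (session_id, ssrc) pairs by the CNAME reported for them.

    Each input item is (session_id, ssrc, cname), as would be learned from
    RTCP compound packets containing a CNAME SDES item.
    """
    contexts: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for session_id, ssrc, cname in sdes_items:
        contexts[cname].add((session_id, ssrc))
    return dict(contexts)
```

Every set in the result is one synchronization context: its members can be aligned for playout even when they use different SSRC values in different RTP sessions.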
+
+   An RTP receiver receiving a previously unseen SSRC value will
+   interpret it as a new source.  It might in fact be a previously
+   existing source that had to change its SSRC number due to an SSRC
+   conflict.  Using the media identification (MID) extension [RFC8843]
+   helps to identify which media source the new SSRC represents, and
+   using the restriction identifier (RID) extension [RFC8851] helps to
+   identify what encoding or redundancy stream it represents, even
+   though the SSRC changed.  However, the originator of the previous
+   SSRC ought to have ended the conflicting source by sending an RTCP
+   BYE for it prior to starting to send with the new SSRC, making the
+   new SSRC a new source.
+
+3.2.3.  Contributing Source (CSRC)
+
+   The CSRC is not a separate identifier.  Rather, an SSRC identifier is
+   listed as a CSRC in the RTP header of a packet generated by an RTP
+   mixer or video Multipoint Control Unit (MCU) / switch, if the
+   corresponding SSRC was in the header of one of the packets that
+   contributed to the output.
+
+   It is not possible, in general, to extract media represented by an
+   individual CSRC, since it is typically the result of a media merge
+   (e.g., mix) operation on the individual media streams corresponding
+   to the CSRC identifiers.  The exception is the case where only a
+   single CSRC is indicated, as this represents the forwarding of an RTP
+   stream that might have been modified.  The RTP header extension ("A
+   Real-time Transport Protocol (RTP) Header Extension for
+   Mixer-to-Client Audio Level Indication" [RFC6465]) expands on the
+   receiver's information about a packet with a CSRC list.  Due to these
+   restrictions, a CSRC will not be considered a fully qualified
+   multiplexing point and will be disregarded in the rest of this
+   document.
+
+3.2.4.  RTP Payload Type
+
+   Each RTP stream utilizes one or more RTP payload formats.  An RTP
+   payload format describes how the output of a particular media codec
+   is framed and encoded into RTP packets.  The payload format is
+   identified by the payload type (PT) field in the RTP packet header.
+   The combination of SSRC and PT therefore identifies a specific RTP
+   stream in a specific encoding format.  The format definition can be
+   taken from [RFC3551] for statically allocated payload types but ought
+   to be explicitly defined in signaling, such as SDP, for both static
+   and dynamic payload types.  The term "format" here includes those
+   aspects described by out-of-band signaling means; in SDP, the term
+   "format" includes media type, RTP timestamp sampling rate, codec,
+   codec configuration, payload format configurations, and various
+   robustness mechanisms such as redundant encodings [RFC2198].
+
+   The RTP payload type is scoped by the sending endpoint within an RTP
+   session.  PT has the same meaning across all RTP streams in an RTP
+   session.  All SSRCs sent from a single endpoint share the same
+   payload type definitions.  The RTP payload type is designed such that
+   only a single payload type is valid at any instant in time in the RTP
+   stream's timestamp timeline, effectively time-multiplexing different
+   payload types if any change occurs.  The payload type can change on a
+   per-packet basis for an SSRC -- for example, a speech codec making
+   use of generic comfort noise [RFC3389].  If there is a true need to
+   send multiple payload types for the same SSRC that are valid for the
+   same instant, then redundant encodings [RFC2198] can be used.
+   Several additional constraints, other than those mentioned above,
+   need to be met to enable this usage, one of which is that the
+   combined payload sizes of the different payload types ought not
+   exceed the transport MTU.
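As an illustration only (not part of the RFC text), payload type handling can be sketched as a lookup table that is shared by all SSRCs sent from one endpoint in a session and is consulted for every packet. The three static entries are taken from [RFC3551]. The dynamic entry for payload type 96 is a hypothetical value standing in for whatever the signaling defines, and the function names are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

Format = Tuple[str, int]  # (encoding name, RTP clock rate in Hz)

# A few statically allocated payload types from RFC 3551.
STATIC_PAYLOAD_TYPES: Dict[int, Format] = {
    0: ("PCMU", 8000),
    8: ("PCMA", 8000),
    13: ("CN", 8000),  # generic comfort noise
}


def build_pt_table(signaled: Dict[int, Format]) -> Dict[int, Format]:
    """Combine static definitions with those given in signaling (signaling wins)."""
    table = dict(STATIC_PAYLOAD_TYPES)
    table.update(signaled)
    return table


def formats_for_stream(table: Dict[int, Format], payload_types: Iterable[int]) -> List[str]:
    """Resolve the format of each packet in one RTP stream from its payload type."""
    names = []
    for pt in payload_types:
        if pt not in table:
            raise ValueError(f"payload type {pt} is not defined for this session")
        names.append(table[pt][0])
    return names


# Hypothetical signaling result: dynamic payload type 96 mapped to Opus.
table = build_pt_table({96: ("opus", 48000)})
# One SSRC switching between speech and comfort noise on a per-packet basis.
stream_formats = formats_for_stream(table, [0, 0, 13, 0])
```

Only one payload type applies to each packet, so the lookup is made per packet rather than once per stream.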
+
+   Other aspects of using the RTP payload format are described in "How
+   to Write an RTP Payload Format" [RFC8088].
+
+   The payload type is not a multiplexing point at the RTP layer (see
+   Appendix A for a detailed discussion of why using the payload type as
+   an RTP multiplexing point does not work).  The RTP payload type is,
+   however, used to determine how to consume and decode an RTP stream.
+   The RTP payload type number is sometimes used to associate an RTP
+   stream with the signaling, which in general requires that unique RTP
+   payload type numbers be used in each context.  Using MID, e.g., when
+   bundling "m=" sections [RFC8843], can replace the payload type as a
+   signaling association, and unique RTP payload types are then no
+   longer required for that purpose.
+
+3.3.  Issues Related to RTP Topologies
+
+   The impact of how RTP multiplexing is performed will in general vary
+   with how the RTP session participants are interconnected, as
+   described in "RTP Topologies" [RFC7667].
+
+   Even the most basic use case -- "Topo-Point-to-Point" as described in
+   [RFC7667] -- raises a number of considerations, which are discussed
+   in detail in the following sections.  They range over such aspects as
+   the following:
+
+   *  Does my communication peer support RTP as defined with multiple
+      SSRCs per RTP session?
+
+   *  Do I need network differentiation in the form of QoS
+      (Section 4.2.1)?
+
+   *  Can the application more easily process and handle the media
+      streams if they are in different RTP sessions?
+
+   *  Do I need to use additional RTP streams for RTP retransmission or
+      FEC?
+
+   For some point-to-multipoint topologies (e.g., Topo-ASM and Topo-SSM
+   [RFC7667]), multicast is used to interconnect the session
+   participants.  Special considerations (documented in Section 4.2.3)
+   are then needed, as multicast is a one-to-many distribution system.
+
+   Sometimes, an RTP communication session can end up in a situation
+   where the communicating peers are not compatible, for various
+   reasons:
+
+   *  No common media codec for a media type, thus requiring
+      transcoding.
+
+   *  Different support for multiple RTP streams and RTP sessions.
+
+   *  Usage of different media transport protocols (i.e., one peer uses
+      RTP, but the other peer uses a different transport protocol).
+
+   *  Usage of different transport protocols, e.g., UDP, the Datagram
+      Congestion Control Protocol (DCCP), or TCP.
+
+   *  Different security solutions (e.g., IPsec, TLS, DTLS, or the
+      Secure Real-time Transport Protocol (SRTP)) with different keying
+      mechanisms.
+
+   These compatibility issues can often be resolved by the inclusion of
+   a translator between the two peers -- the Topo-PtP-Translator, as
+   described in [RFC7667].  The translator's main purpose is to make the
+   peers look compatible to each other.  There can also be reasons other
+   than compatibility for inserting a translator in the form of a
+   middlebox or gateway -- for example, a need to monitor the RTP
+   streams.  Beware that changing the stream transport characteristics
+   in the translator can require a thorough understanding of aspects
+   ranging from congestion control and media-level adaptations to
+   application-layer semantics.
+
+   Within the uses enabled by the RTP standard, the point-to-point
+   topology can contain one or more RTP sessions with one or more media
+   sources per session, each having one or more RTP streams per media
+   source.
+
+3.4.  Issues Related to RTP and RTCP
+
+   Using multiple RTP streams is a well-supported feature of RTP.
+   However, for most implementers or people writing RTP/RTCP
+   applications or extensions attempting to apply multiple streams, it
+   can be unclear when it is most appropriate to add an additional RTP
+   stream in an existing RTP session and when it is better to use
+   multiple RTP sessions.  This section discusses the various
+   considerations that need to be taken into account.
+
+3.4.1.  The RTP Specification
+
+   RFC 3550 contains some recommendations and a numbered list
+   (Section 5.2 of [RFC3550]) of five arguments regarding different
+   aspects of RTP multiplexing.  Please review Section 5.2 of [RFC3550].
+   Five important aspects are quoted below.
+
+   1.  |  If, say, two audio streams shared the same RTP session and the
+       |  same SSRC value, and one were to change encodings and thus
+       |  acquire a different RTP payload type, there would be no
+       |  general way of identifying which stream had changed encodings.
+
+       This argument advocates the use of different SSRCs for each
+       individual RTP stream, as this is fundamental to RTP operation.
+
+   2.  |  An SSRC is defined to identify a single timing and sequence
+       |  number space.  Interleaving multiple payload types would
+       |  require different timing spaces if the media clock rates
+       |  differ and would require different sequence number spaces to
+       |  tell which payload type suffered packet loss.
+
+       This argument advocates against demultiplexing RTP streams within
+       a session based only on their RTP payload type numbers; it still
+       stands, as can be seen by the extensive list of issues discussed
+       in Appendix A.
+
+   3.  |  The RTCP sender and receiver reports (see Section 6.4) can
+       |  only describe one timing and sequence number space per SSRC
+       |  and do not carry a payload type field.
+
+       This argument is yet another argument against payload type
+       multiplexing.
+
+   4.  |  An RTP mixer would not be able to combine interleaved streams
+       |  of incompatible media into one stream.
+
+       This argument advocates against multiplexing RTP packets that
+       require different handling into the same session.  In most cases,
+       the RTP mixer must embed application logic to handle streams; the
+       separation of streams according to stream type is just another
+       piece of application logic, which might or might not be
+       appropriate for a particular application.  One type of
+       application that can mix different media sources blindly is the
+       audio-only telephone bridge, although the ability to do that
+       comes from the well-defined scenario that is aided by the use of
+       a single media type, even though individual streams may use
+       incompatible codec types; most other types of applications need
+       application-specific logic to perform the mix correctly.
+
+   5.  |  Carrying multiple media in one RTP session precludes: the use
+       |  of different network paths or network resource allocations if
+       |  appropriate; reception of a subset of the media if desired,
+       |  for example just audio if video would exceed the available
+       |  bandwidth; and receiver implementations that use separate
+       |  processes for the different media, whereas using separate RTP
+       |  sessions permits either single- or multiple-process
+       |  implementations.
+
+       This argument discusses network aspects that are described in
+       Section 4.2.  It also goes into aspects of implementation, like
+       split component terminals (see Section 3.10 of [RFC7667]) --
+       endpoints where different processes or interconnected devices
+       handle different aspects of the whole multimedia session.
+
+   To summarize, RFC 3550's view on multiplexing is to use unique SSRCs
+   for anything that is its own media/packet stream and use different
+   RTP sessions for media streams that don't share a media type.  This
+   document supports the first point, which remains valid.  The latter
+   needs further discussion, as imposing a single solution on all usages
+   of RTP is inappropriate.  "Sending Multiple Types of Media in a
+   Single RTP Session" [RFC8860] updates RFC 3550 to allow multiple
+   media types in an RTP session and provides a detailed analysis of the
+   potential benefits and issues related to having multiple media types
+   in the same RTP session.  Thus, [RFC8860] provides a wider scope for
+   an RTP session and considers multiple media types in one RTP session
+   as a possible choice for the RTP application designer.
+
+3.4.2.  Multiple SSRCs in a Session
+
+   Using multiple SSRCs at one endpoint in an RTP session requires that
+   some unclear aspects of the RTP specification be resolved.  These
+   aspects could lead to interoperability issues as well as significant
+   inefficiencies, as further discussed in "Sending Multiple RTP Streams
+   in a Single RTP Session" [RFC8108].  An RTP application designer
+   should consider these issues and the possible impact on the
+   application of peer endpoints that lack appropriate RTP handling or
+   optimization.
+
+   Using multiple RTP sessions can potentially mitigate application
+   issues caused by multiple SSRCs in an RTP session.
+
+3.4.3.  Binding Related Sources
+
+   A common problem in a number of RTP extensions has been how to bind
+   related RTP streams together.  This issue arises both when using
+   additional SSRCs and when using multiple RTP sessions.
+
+   The solutions can be divided into a few groups:
+
+   *  RTP/RTCP based
+
+   *  Signaling based, e.g., SDP
+
+   *  Grouping related RTP sessions
+
+   *  Grouping SSRCs within an RTP session
+
+   Most solutions are explicit, but some implicit methods have also been
+   applied to the problem.
+
+   The SDP-based signaling solutions are:
+
+   SDP media description grouping:
+      The SDP grouping framework [RFC5888] uses various semantics to
+      group any number of media descriptions.  SDP media description
+      grouping has primarily been used to group RTP sessions, but in
+      combination with [RFC8843], it can also group multiple media
+      descriptions within a single RTP session.
+
+   SDP media multiplexing:
+      "Negotiating Media Multiplexing Using the Session Description
+      Protocol (SDP)" [RFC8843] uses information taken from both SDP and
+      RTCP to associate RTP streams to SDP media descriptions.  This
+      allows both SDP and RTCP to group RTP streams belonging to an SDP
+      media description and group multiple SDP media descriptions into a
+      single RTP session.
+
+   SDP SSRC grouping:
+      "Source-Specific Media Attributes in the Session Description
+      Protocol (SDP)" [RFC5576] includes a solution for grouping SSRCs
+      in the same way that the grouping framework groups media
+      descriptions.
+
+   The above grouping constructs support many use cases.  Those
+   solutions have shortcomings in cases where the session's dynamic
+   properties are such that it is difficult or a drain on resources to
+   keep the list of related SSRCs up to date.
+
+   One RTP/RTCP-based grouping solution is to use the RTCP SDES CNAME to
+   bind related RTP streams to an endpoint or a synchronization context.
+   For applications with a single RTP stream per type (media, source, or
+   redundancy stream), the CNAME is sufficient for that purpose,
+   independent of whether one or more RTP sessions are used.  However,
+   some applications choose not to use a CNAME because of perceived
+   complexity or a desire not to implement RTCP and instead use the same
+   SSRC value to bind related RTP streams across multiple RTP sessions.
+   RTP retransmission [RFC4588], when configured to use multiple RTP
+   sessions, and generic FEC [RFC5109] both use the CNAME method to
+   relate the RTP streams, which may work but might have some downsides
+   in RTP sessions with many participating SSRCs.  It is not recommended
+   to use identical SSRC values across RTP sessions to relate RTP
+   streams; when an SSRC collision occurs, this will force a change of
+   that SSRC in all RTP sessions and will thus resynchronize all of the
+   streams instead of only the single media stream experiencing the
+   collision.
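+
+   As a minimal illustrative sketch (not part of any cited
+   specification), the CNAME-based binding described above can be
+   modeled as grouping (session, SSRC) pairs by the CNAME that each
+   SSRC reports.  The function name, the input tuple shape, and the
+   example CNAMEs are assumptions made purely for illustration.
+
```python
from collections import defaultdict


def group_by_cname(sdes_reports):
    """Group (session, SSRC) pairs by the CNAME they reported.

    sdes_reports: iterable of (session_id, ssrc, cname) tuples, as might
    be collected from received RTCP SDES packets (hypothetical shape).
    """
    groups = defaultdict(set)
    for session_id, ssrc, cname in sdes_reports:
        groups[cname].add((session_id, ssrc))
    return dict(groups)


# Hypothetical observations from two RTP sessions.
reports = [
    ("audio", 0x1111AAAA, "alice@example.com"),
    ("video", 0x2222BBBB, "alice@example.com"),
    ("audio", 0x3333CCCC, "bob@example.com"),
]
bindings = group_by_cname(reports)
```
+
+   Because the binding key is the CNAME rather than the SSRC value, an
+   SSRC change after a collision in one RTP session replaces only one
+   entry; the SSRCs used in the other RTP sessions are unaffected.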
+
+   Another method for implicitly binding SSRCs is used by RTP
+   retransmission [RFC4588] when using the same RTP session as the
+   source RTP stream for retransmissions.  A receiver that is missing a
+   packet issues an RTP retransmission request and then awaits a new
+   SSRC carrying the RTP retransmission payload, where that SSRC is from
+   the same CNAME.  This limits a requester to having only one
+   outstanding retransmission request on any new SSRCs per endpoint.
+
+   "RTP Payload Format Restrictions" [RFC8851] provides an RTP/RTCP-
+   based mechanism to unambiguously identify the RTP streams within an
+   RTP session and restrict the streams' payload format parameters in a
+   codec-agnostic way beyond what is provided with the regular payload
+   types.  The mapping is done by specifying an "a=rid" value in the SDP
+   offer/answer signaling and having the corresponding RtpStreamId value
+   as an SDES item and an RTP header extension [RFC8852].  The RID
+   solution also includes a solution for binding redundancy RTP streams
+   to their original source RTP streams, given that those streams use
+   RID identifiers.  The redundancy stream uses the RepairedRtpStreamId
+   SDES item and RTP header extension to declare the RtpStreamId value
+   of the source stream to create the binding.
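+
+   The RID-based binding of redundancy streams can be sketched as a
+   lookup from the RepairedRtpStreamId value to the stream that
+   carries the matching RtpStreamId value.  This is an illustration
+   under stated assumptions only: the dictionary layout, the key
+   names "rid" and "repaired_rid", and the function name are
+   hypothetical and are not taken from [RFC8851] or [RFC8852].
+
```python
def bind_redundancy_streams(streams):
    """Map each redundancy SSRC to the SSRC of the stream it repairs.

    streams: dict mapping SSRC -> dict holding either
      "rid"          (RtpStreamId value of a source stream) or
      "repaired_rid" (RepairedRtpStreamId value of a redundancy stream).
    The layout is a hypothetical in-memory view of received SDES items
    or RTP header extensions.
    """
    source_by_rid = {
        info["rid"]: ssrc for ssrc, info in streams.items() if "rid" in info
    }
    bindings = {}
    for ssrc, info in streams.items():
        repaired = info.get("repaired_rid")
        if repaired in source_by_rid:
            bindings[ssrc] = source_by_rid[repaired]
    return bindings


# Hypothetical example: two simulcast sources and one repair stream.
observed = {
    0x0A0A0A0A: {"rid": "hi"},
    0x0B0B0B0B: {"rid": "lo"},
    0x0C0C0C0C: {"repaired_rid": "hi"},
}
rid_bindings = bind_redundancy_streams(observed)
```
+
+   The binding never inspects the SSRC values themselves, so it stays
+   valid if any of the streams changes its SSRC.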
+
+   Experience has shown that an explicit binding between the RTP
+   streams, agnostic of SSRC values, behaves well.  That way, solutions
+   using multiple RTP streams in a single RTP session and in multiple
+   RTP sessions will use the same type of binding.
+
+3.4.4.  Forward Error Correction
+
+   There exist a number of FEC-based schemes designed to mitigate packet
+   loss in the original streams.  Most of the FEC schemes protect a
+   single source flow.  This protection is achieved by transmitting a
+   certain amount of redundant information that is encoded such that it
+   can repair one or more instances of packet loss over the set of
+   packets the redundant information protects.  This sequence of
+   redundant information needs to be transmitted as its own media stream
+   or, in some cases, instead of the original media stream.  Thus, many
+   of these schemes create a need for binding related flows, as
+   discussed above.  Looking at the history of these schemes, there are
+   schemes using multiple SSRCs and schemes using multiple RTP sessions,
+   and some schemes that support both modes of operation.
+
+   Using multiple RTP sessions supports the case where some set of
+   receivers might not be able to utilize the FEC information.  If the
+   FEC information is placed in a separate RTP session and the RTP
+   sessions are separated at the transport level, FEC can easily be
+   ignored at the transport level, without considering any RTP-layer
+   information.
+
+   In usages involving multicast, sending FEC information in a separate
+   multicast group allows for similar flexibility.  This is especially
+   useful when receivers see heterogeneous packet loss rates.  A
+   receiver can decide, based on measurement of experienced packet loss
+   rates, whether to join a multicast group with suitable FEC data
+   repair capabilities.
+
+4.  Considerations for RTP Multiplexing
+
+4.1.  Interworking Considerations
+
+   There are several different kinds of interworking, and this section
+   discusses two: interworking directly between different applications
+   and the interworking of applications through an RTP translator.  The
+   discussion includes the implications of potentially different RTP
+   multiplexing point choices and limitations that have to be considered
+   when working with some legacy applications.
+
+4.1.1.  Application Interworking
+
+   It is not uncommon to want to interconnect two or more applications
+   or services of similar but not identical usage, especially those
+   intended for interactive communication.
+
+   In these cases, a gateway might be used to interconnect the
+   applications.  This gateway must then either change the multiplexing
+   structure or adhere to the respective limitations in each
+   application.
+
+   There are two fundamental approaches to building a gateway: using RTP
+   translator interworking (RTP bridging), where the gateway acts as an
+   RTP translator with the two interconnected applications being members
+   of the same RTP session; or using gateway interworking
+   (Section 4.1.3) with RTP termination, where there are independent RTP
+   sessions between each interconnected application and the gateway.
+
+   For interworking to be feasible, any security solution in use needs
+   to be compatible and capable of exchanging keys with either the peer
+   or the gateway under the trust model being used.  Secondly, the
+   applications need to use media streams in a way that makes sense in
+   both applications.
+
+4.1.2.  RTP Translator Interworking
+
+   From an RTP perspective, the RTP translator approach could work if
+   all the applications are using the same codecs with the same payload
+   types, have made the same multiplexing choices, and have the same
+   capabilities regarding the number of simultaneous RTP streams
+   combined with the same set of RTP/RTCP extensions being supported.
+   Unfortunately, this might not always be true.
+
+   When a gateway is implemented via an RTP translator, an important
+   consideration is whether the two applications being interconnected
+   need to use the same approach to multiplexing.  If one side is using
+   RTP session multiplexing and the other is using SSRC multiplexing
+   with BUNDLE [RFC8843], it may be possible for the RTP translator to
+   map the RTP streams between both sides using some method, e.g., based
+   on the number and order of SDP "m=" lines from each side.  There are
+   also challenges related to SSRC collision handling, since, unless
+   SSRC translation is applied on the RTP translator, there may be a
+   collision on the SSRC multiplexing side that the RTP session
+   multiplexing side will not be aware of.  Furthermore, if one of the
+   applications is capable of working in several modes (such as being
+   able to use additional RTP streams in one RTP session or multiple RTP
+   sessions at will) and the other one is not, successful
+   interconnection depends on locking the more flexible application into
+   the operating mode where interconnection can be successful, even if
+   none of the participants are using the less flexible application when
+   the RTP sessions are being created.
+
+4.1.3.  Gateway Interworking
+
+   When one terminates RTP sessions at the gateway, there are certain
+   tasks that the gateway has to carry out:
+
+   *  Generating appropriate RTCP reports for all RTP streams (possibly
+      based on incoming RTCP reports) originating from SSRCs controlled
+      by the gateway.
+
+   *  Handling SSRC collision resolution in each application's RTP
+      sessions.
+
+   *  Signaling, choosing, and policing appropriate bitrates for each
+      session.
+
+   For applications that use any security mechanism, e.g., in the form
+   of SRTP, the gateway needs to be able to decrypt and verify source
+   integrity of the incoming packets and then re-encrypt, integrity
+   protect, and sign the packets as the peer in the other application's
+   security context.  This is necessary even if all that's needed is a
+   simple remapping of SSRC numbers.  If this is done, the gateway also
+   needs to be a member of the security contexts of both sides and thus
+   a trusted entity.
+
+   The gateway might also need to apply transcoding (for incompatible
+   codec types), media-level adaptations that cannot be solved through
+   media negotiation (such as rescaling for incompatible video size
+   requirements), suppression of content that is known not to be handled
+   in the destination application, or the addition or removal of
+   redundancy coding or scalability layers to fit the needs of the
+   destination domain.
+
+   From the above, we can see that the gateway needs to have an intimate
+   knowledge of the application requirements; a gateway is by its nature
+   application specific and not a commodity product.
+
+   These gateways might therefore hinder application evolution by
+   blocking RTP and RTCP extensions that the applications have adopted
+   but that are unknown to the gateway.
+
+   If one uses a security mechanism like SRTP, the gateway and the
+   necessary trust in it by the peers pose an additional risk to
+   communication security.  The gateway also incurs additional
+   complexities in the form of the decrypt-encrypt cycles needed for
+   each forwarded packet.  SRTP, due to its keying structure, also
+   requires that each RTP session have a different master key, as the
+   use of the same key in two RTP sessions can, for some ciphers, result
+   in a reuse of a one-time pad that completely breaks the
+   confidentiality of the packets.
+
+4.1.4.  Legacy Considerations for Multiple SSRCs
+
+   Historically, the most common RTP use cases have been point-to-point
+   Voice over IP (VoIP) or streaming applications, commonly with no more
+   than one media source per endpoint and media type (typically audio or
+   video).  Even in conferencing applications, especially voice-only,
+   the conference focus or bridge provides to each participant a single
+   stream containing a mix of the other participants.  It is also common
+   to have individual RTP sessions between each endpoint and the RTP
+   mixer, meaning that the mixer functions as an RTP-terminating
+   gateway.
+
+   Applications and systems that aren't updated to handle multiple
+   streams following these recommendations can have issues with
+   participating in RTP sessions containing multiple SSRCs within a
+   single session, such as:
+
+   1.  The need to handle more than one stream simultaneously rather
+       than replacing an already-existing stream with a new one.
+
+   2.  Being capable of decoding multiple streams simultaneously.
+
+   3.  Being capable of rendering multiple streams simultaneously.
+
+   This indicates that gateways attempting to interconnect to this class
+   of devices have to make sure that only one RTP stream of each media
+   type gets delivered to the endpoint if it's expecting only one and
+   that the multiplexing format is what the device expects.  It is
+   highly unlikely that RTP translator-based interworking can be made to
+   function successfully in such a context.
+
+4.2.  Network Considerations
+
+   The RTP implementer needs to consider that the RTP multiplexing
+   choice also impacts network-level mechanisms.
+
+4.2.1.  Quality of Service
+
+   QoS mechanisms are either flow based or packet marking based.  RSVP
+   [RFC2205] is an example of a flow-based mechanism, while Diffserv
+   [RFC2474] is an example of a packet-marking-based mechanism.
+
+   For a flow-based scheme, additional SSRCs will receive the same QoS
+   as all other RTP streams being part of the same 5-tuple (protocol,
+   source address, destination address, source port, destination port),
+   which is the most common selector for flow-based QoS.
+
+   For a packet-marking-based scheme, the method of multiplexing will
+   not affect the possibility of using QoS.  Different Differentiated
+   Services Code Points (DSCPs) can be assigned to different packets
+   within a transport flow (5-tuple) as well as within an RTP stream,
+   assuming the usage of UDP or other transport protocols that do not
+   have issues with packet reordering within the transport flow
+   (5-tuple).  To avoid packet-reordering issues, packets belonging to
+   the same RTP flow should limit their use of DSCPs to those whose
+   corresponding Per-Hop Behaviors (PHBs) do not enable reordering.  If
+   the transport protocol being used assumes in-order delivery of
+   packets (e.g., TCP and the Stream Control Transmission Protocol
+   (SCTP)), then a single DSCP should be used.  For more discussion on
+   this topic, see [RFC7657].
+
+   The method for assigning markings to packets can affect the choice of
+   how many RTP sessions to use.  If this marking is done using a
+   network ingress function, that function can have issues
+   discriminating the different RTP streams.  The network API on the
+   endpoint also needs to be capable of setting the marking on a
+   per-packet basis to reach full functionality.
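+
+   As a small sketch of what an endpoint-side marking policy might
+   look like, the following maps a stream kind to a DSCP and converts
+   the 6-bit DSCP into the octet value that many socket APIs expect.
+   The policy table and function names are hypothetical; EF (46) and
+   AF41 (34) are used only as familiar example code points, not as a
+   recommendation made by this memo.
+
```python
# Well-known DSCP code points (6-bit values), used here only as examples.
DSCP_EF = 46    # Expedited Forwarding
DSCP_AF41 = 34  # Assured Forwarding, class 4, low drop precedence


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper six bits of the DS/TOS octet."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2  # the two low-order bits are left clear (ECN field)


# Hypothetical per-stream policy for RTP streams sharing one 5-tuple.
MARKING_POLICY = {"audio": DSCP_EF, "video": DSCP_AF41}


def tos_for_stream(kind: str) -> int:
    """Return the octet value to request for a packet of the given kind."""
    return dscp_to_tos(MARKING_POLICY[kind])
```
+
+   On many POSIX systems, such a value can be applied to a whole
+   socket with setsockopt() and the IP_TOS option; marking individual
+   packets differently within one socket generally needs a
+   platform-specific per-packet mechanism, which is the capability
+   the paragraph above refers to.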
+
+4.2.2.  NAT and Firewall Traversal
+
+   In today's networks, there exist a large number of middleboxes.
+   Those that normally have the most impact on RTP are Network Address
+   Translators (NATs) and Firewalls (FWs).
+
+   Below, we analyze and comment on the impact of requiring more
+   underlying transport flows in the presence of NATs and FWs:
+
+   Endpoint Port Consumption:
+      A given IP address only has 65536 available local ports per
+      transport protocol for all consumers of ports that exist on the
+      machine.  This is normally never an issue for an end-user machine.
+      It can become an issue for servers that handle a large number of
+      simultaneous streams.  However, if the application uses ICE to
+      authenticate STUN requests, a server can serve multiple endpoints
+      from the same local port and use the whole 5-tuple (source and
+      destination address, source and destination port, protocol) as the
+      identifier of flows after having securely bound them to the remote
+      endpoint address using the STUN request.  In theory, the minimum
+      number of media server ports needed is the maximum number of
+      simultaneous RTP sessions a single endpoint can use.  In practice,
+      implementations will probably benefit from using more server ports
+      to simplify implementation or avoid performance bottlenecks.
+
+   NAT State:
+      If an endpoint sits behind a NAT, each flow it generates to an
+      external address will result in a state that has to be kept in the
+      NAT.  That state is a limited resource.  In home or Small
+      Office/Home Office (SOHO) NATs, the most limited resource is
+      memory or processing.  For large-scale NATs serving many internal
+      endpoints, available external ports are likely the scarce
+      resource.  Port limitations are primarily a problem for larger
+      centralized NATs where endpoint-independent mapping requires each
+      flow to use one port for the external IP address.  This affects
+      the maximum number of internal users per external IP address.
+      However, as a comparison, a real-time video conference session
+      with audio and video likely uses fewer than 10 UDP flows, compared
+      to certain web applications that can use 100+ TCP flows to various
+      servers from a single browser instance.
+
+   Extra Delay Added by NAT Traversal:
+      Performing the NAT/FW traversal takes a certain amount of time for
+      each flow.  The best-case scenario for additional NAT/FW traversal
+      time after finding the first valid candidate pair following the
+      specified ICE procedures is 1.5*RTT + Ta*(Additional_Flows-1),
+      where Ta is the pacing timer.  That assumes a message in one
+      direction, immediately followed by a return message in the
+      opposite direction to confirm reachability.  It isn't more,
+      because ICE first finds one candidate pair that works, prior to
+      attempting to establish multiple flows.  Thus, there is no extra
+      time until one has found a working candidate pair.  Based on that
+      working pair, the extra time is needed to establish the additional
+      flows (two or three, in most cases) in parallel.  However, packet
+      loss causes extra delays of at least 500 ms (the minimal
+      retransmission timer for ICE).
+
+   NAT Traversal Failure Rate:
+      Due to the need to establish more than a single flow through the
+      NAT, there is some risk that establishing the first flow will
+      succeed but one or more of the additional flows will fail.  The
+      risk of this happening is hard to quantify but should be fairly
+      low, as one flow from the same interfaces has just been
+      successfully established.  Thus, only such rare events as NAT
+      resource overload, selecting particular port numbers that are
+      filtered, etc., ought to be reasons for failure.
+
+   Deep Packet Inspection and Multiple Streams:
+      FWs differ in how deeply they inspect packets.  Previous
+      experience using FWs and Session Border Gateways (SBGs) with RTP
+      shows that there is a significant risk that the FWs and SBGs will
+      reject RTP sessions that use multiple SSRCs.
+
+   Using additional RTP streams in the same RTP session and transport
+   flow does not introduce any additional NAT traversal complexities per
+   RTP stream.  This can be compared with (normally) one or two
+   additional transport flows per RTP session when using multiple RTP
+   sessions.  Additional lower-layer transport flows will be needed,
+   unless an explicit demultiplexing layer is added between RTP and the
+   transport protocol.  At the time of this writing, no such mechanism
+   was defined.
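+
+   The best-case estimate given under "Extra Delay Added by NAT
+   Traversal" can be written out as a small calculation.  This is a
+   sketch only: the function names are hypothetical, the RTT and Ta
+   values in the example are arbitrary inputs, and the loss case is
+   modeled as a lower bound that adds the 500 ms minimum
+   retransmission timer mentioned in the text exactly once.
+
```python
ICE_MIN_RETRANSMIT_MS = 500  # minimal ICE retransmission timer (from the text)


def best_case_extra_ms(rtt_ms: float, ta_ms: float, additional_flows: int) -> float:
    """Best-case extra traversal time: 1.5*RTT + Ta*(Additional_Flows-1)."""
    if additional_flows < 1:
        return 0.0  # no additional flows, so no additional traversal time
    return 1.5 * rtt_ms + ta_ms * (additional_flows - 1)


def lower_bound_with_loss_ms(rtt_ms: float, ta_ms: float, additional_flows: int) -> float:
    """Lower bound when at least one connectivity check is lost."""
    if additional_flows < 1:
        return 0.0
    return best_case_extra_ms(rtt_ms, ta_ms, additional_flows) + ICE_MIN_RETRANSMIT_MS


# Example inputs (assumed values): RTT = 100 ms, Ta = 50 ms, 3 extra flows.
no_loss = best_case_extra_ms(100, 50, 3)
one_loss = lower_bound_with_loss_ms(100, 50, 3)
```
+
+   With these example inputs, the pacing term contributes less than
+   the round-trip term, and a single lost check dominates both.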
+
+4.2.3.  Multicast
+
+   Multicast groups provide a powerful tool for a number of real-time
+   applications, especially those that desire broadcast-like behaviors
+   with one endpoint transmitting to a large number of receivers, like
+   in IPTV.  An RTP/RTCP extension to better support Source-Specific
+   Multicast (SSM) [RFC5760] is also available.  Many-to-many
+   communication, which RTP [RFC3550] was originally built to support,
+   has several limitations in common with multicast.
+
+   One limitation is that, for any group, sender-side adaptations with
+   the intent to suit all receivers would have to adapt to the most
+   limited receiver experiencing the worst conditions among the group
+   participants, which imposes degradation for all participants.  For
+   broadcast-type applications with a large number of receivers, this is
+   not acceptable.  Instead, various receiver-based solutions are
+   employed to ensure that the receivers achieve the best possible
+   performance.  By using scalable encoding and placing each scalability
+   layer in a different multicast group, the receiver can control the
+   amount of traffic it receives.  To have each scalability layer in a
+   different multicast group, one RTP session per multicast group is
+   used.
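+
+   A receiver-driven choice of scalability layers can be sketched as
+   follows.  The layer table, the bitrates, the group addresses (taken
+   from the 233.252.0.0/24 documentation range), and the function name
+   are all assumptions for illustration; a real receiver would base
+   the decision on measured loss or throughput and would then issue
+   the corresponding multicast joins.
+
```python
# Hypothetical layers, base layer first: (multicast group, bitrate in kbit/s).
# Each group carries its own RTP session.
LAYERS = [
    ("233.252.0.1", 300),   # base layer
    ("233.252.0.2", 500),   # first enhancement layer
    ("233.252.0.3", 1200),  # second enhancement layer
]


def groups_to_join(layers, available_kbps):
    """Return the groups to join, in order, that fit the bandwidth budget.

    Layers are cumulative: an enhancement layer is only useful when all
    lower layers are received, so selection stops at the first layer
    that would exceed the budget.
    """
    joined = []
    total = 0
    for group, rate_kbps in layers:
        if total + rate_kbps > available_kbps:
            break
        joined.append(group)
        total += rate_kbps
    return joined
```
+
+   For example, with a budget of 1000 kbit/s, this sketch selects the
+   first two groups (800 kbit/s in total) and leaves the third group
+   unjoined.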
+
+   In addition, the transport flow considerations in multicast are a bit
+   different from unicast; NATs with port translation are not useful in
+   the multicast environment, meaning that the entire port range of each
+   multicast address is available for distinguishing between RTP
+   sessions.
+
+   Thus, when using broadcast applications, it appears easiest and most
+   straightforward to use multiple RTP sessions for sending different
+   media flows used for adapting to network conditions.  It is also
+   common that streams improving transport robustness are sent in their
+   own multicast group to allow for interworking with legacy
+   applications or to support different levels of protection.
+
+   Many-to-many applications have different needs, and the most
+   appropriate multiplexing choice will depend on how the actual
+   application is realized.  Multicast applications that are capable of
+   using sender-side congestion control can avoid the use of multiple
+   multicast sessions and RTP sessions that result from the use of
+   receiver-side congestion control.
+
+   The properties of a broadcast application using RTP multicast are as
+   follows:
+
+   1.  The application uses a group of RTP sessions -- not just one.
+       Each endpoint will need to be a member of a number of RTP
+       sessions in order to perform well.
+
+   2.  Within each RTP session, the number of RTP receivers is likely to
+       be much larger than the number of RTP senders.
+
+   3.  The application needs signaling functions to identify the
+       relationships between RTP sessions.
+
+   4.  The application needs signaling or RTP/RTCP functions to identify
+       the relationships between SSRCs in different RTP sessions when
+       more complex relations than those that can be expressed by the
+       CNAME exist.
+
+   Both broadcast and many-to-many multicast applications share a
+   signaling requirement; all of the participants need the same RTP and
+   payload type configuration.  Otherwise, A could, for example, be
+   using payload type 97 for the video codec H.264 while B thinks it is
+   MPEG-2.  SDP offer/answer [RFC3264] is not appropriate for ensuring
+   this property in a broadcast/multicast context.  The signaling
+   aspects of broadcast/multicast are not explored further in this memo.
+
+   Security solutions for this type of group communication are also
+   challenging.  First, the key-management mechanism and the security
+   protocol need to support group communication.  Second, source
+   authentication requires special solutions.  For more discussion on
+   this topic, please review "Options for Securing RTP Sessions"
+   [RFC7201].
+
+4.3.  Security and Key-Management Considerations
+
+   When dealing with point-to-point two-member RTP sessions only, there
+   are few security issues that are relevant to the choice of having one
+   RTP session or multiple RTP sessions.  However, there are a few
+   aspects of multi-party sessions that might warrant consideration.
+   For general information regarding possible methods of securing RTP,
+   please review [RFC7201].
+
+4.3.1.  Security Context Scope
+
+   When using SRTP [RFC3711], the security context scope is important
+   and can be a necessary differentiation in some applications.  As
+   SRTP's crypto suites are (so far) built around symmetric keys, the
+   receiver will need to have the same key as the sender.  As a result,
+   no one in a multi-party session can be certain that a received packet
+   was really sent by the claimed sender and not by another party having
+   access to the key.  The only SRTP mechanism without this limitation
+   is Timed Efficient Stream Loss-Tolerant Authentication (TESLA)
+   source authentication [RFC4383].  However, TESLA adds delay to
+   achieve source authentication.  In most cases, symmetric ciphers
+   provide sufficient security properties, but in a few cases they can
+   create issues.
+
+   The first case is when someone leaves a multi-party session and one
+   wants to ensure that the party that left can no longer access the RTP
+   streams.  This requires that everyone rekey without disclosing the
+   new keys to the excluded party.
+
+   A second case is when security is used as an enforcing mechanism for
+   stream access differentiation between different receivers.  Take, for
+   example, a scalable layer or a high-quality simulcast version that
+   only users paying a premium are allowed to access.  The mechanism
+   preventing a receiver from getting the high-quality stream can be
+   based on the stream being encrypted with a key that users can't
+   access without paying a premium, using the key-management mechanism
+   to limit access to the key.
+
+   As specified in [RFC3711], SRTP uses unique keys per SSRC.  The
+   original assumption was a single-session master key from which
+   SSRC-specific RTP and RTCP keys were derived.  That assumption has
+   proven incorrect, as application usage and the key-management
+   mechanisms developed since have chosen many different methods for
+   ensuring unique keys per SSRC.  The key-management functions have
+   different abilities to establish different sets of keys, normally on
+   a per-endpoint basis.  For example, DTLS-SRTP [RFC5764] and Security
+   Descriptions [RFC4568] establish different keys for outgoing and
+   incoming traffic from an endpoint.  This key usage has to be written
+   into the cryptographic context, possibly associated with different
+   SSRCs.  Thus, limitations do exist, depending on the chosen
+   key-management method and due to the integration of particular
+   implementations of the key-management method and SRTP.
+
+4.3.2.  Key Management for Multi-party Sessions
+
+   The capabilities of the key-management method combined with the RTP
+   multiplexing choices affect the resulting security properties,
+   control over the secured media, and who has access to it.
+
+   Multi-party sessions contain at least one RTP stream from each active
+   participant.  Depending on the multi-party topology [RFC7667], each
+   participant can both send and receive multiple RTP streams.
+   Transport translator-based sessions (Topo-Trn-Translator) and
+   multicast sessions (Topo-ASM) can use neither Security Descriptions
+   [RFC4568] nor DTLS-SRTP [RFC5764] without an extension, because each
+   endpoint provides its own set of keys.  In centralized conferences,
+   the signaling counterpart is a conference server, and the transport
+   translator is the media-plane unicast counterpart (to which DTLS
+   messages would be sent).  Thus, an extension like Encrypted Key
+   Transport [RFC8870] or a solution based on Multimedia Internet KEYing
+   (MIKEY) [RFC3830] that allows for keying all session participants
+   with the same master key is needed.
+
+   Privacy-Enhanced RTP Conferencing (PERC) also enables a different
+   trust model with semi-trusted media-switching RTP middleboxes
+   [RFC8871].
+
+4.3.3.  Complexity Implications
+
+   There can be complex interactions between the choice of multiplexing
+   and topology and the security functions.  This becomes especially
+   evident in RTP topologies having any type of middlebox that processes
+   or modifies RTP/RTCP packets.  While the overhead of an RTP
+   translator or mixer rewriting an SSRC value in the RTP packet of an
+   unencrypted session is low, the cost is higher when using
+   cryptographic security functions.  For example, if using SRTP
+   [RFC3711], the actual security context and exact crypto key are
+   determined by the SSRC field value.  If one changes the SSRC value,
+   the encryption and authentication must use another key.  Thus,
+   changing the SSRC value implies a decryption using the old SSRC and
+   its security context, followed by an encryption using the new one.
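+
+   The cost described above can be illustrated with a deliberately
+   simplified model.  The sketch below is NOT SRTP: it uses a toy
+   per-SSRC key derived with HMAC-SHA-256, models authentication only
+   (no encryption), and treats a packet as a 4-octet SSRC followed by
+   payload and a truncated tag.  All function names and the key
+   derivation are assumptions chosen only to show why a middlebox
+   cannot simply overwrite the SSRC field.
+
```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # truncated tag length used by this toy model


def toy_ssrc_key(master_key: bytes, ssrc: int) -> bytes:
    """Toy per-SSRC key; a stand-in for a real security context lookup."""
    return hmac.new(master_key, struct.pack("!I", ssrc), hashlib.sha256).digest()


def protect(master_key: bytes, ssrc: int, payload: bytes) -> bytes:
    """Build SSRC || payload || tag, with the tag keyed per SSRC."""
    packet = struct.pack("!I", ssrc) + payload
    key = toy_ssrc_key(master_key, ssrc)
    tag = hmac.new(key, packet, hashlib.sha256).digest()[:TAG_LEN]
    return packet + tag


def verify(master_key: bytes, protected: bytes) -> bool:
    """Check the tag using the context selected by the packet's SSRC."""
    packet, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    (ssrc,) = struct.unpack("!I", packet[:4])
    key = toy_ssrc_key(master_key, ssrc)
    expected = hmac.new(key, packet, hashlib.sha256).digest()[:TAG_LEN]
    return hmac.compare_digest(tag, expected)


def rewrite_ssrc(master_key: bytes, protected: bytes, new_ssrc: int) -> bytes:
    """Verify under the old SSRC's context, then re-protect under the new."""
    if not verify(master_key, protected):
        raise ValueError("authentication failed under the original SSRC")
    payload = protected[4:-TAG_LEN]
    return protect(master_key, new_ssrc, payload)
```
+
+   In this model, overwriting only the four SSRC octets yields a
+   packet that fails verification, whereas rewrite_ssrc() performs
+   the full verify-then-protect cycle that the text describes.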
+
+5.  RTP Multiplexing Design Choices
+
+   This section discusses how some RTP multiplexing design choices can
+   be used in applications to achieve certain goals and summarizes the
+   implications of such choices.  The benefits and downsides of each
+   design are also discussed.
+
+5.1.  Multiple Media Types in One Session
+
+   This design uses a single RTP session for multiple different media
+   types, like audio and video, and possibly also transport robustness
+   mechanisms like FEC or retransmission.  An endpoint can send zero,
+   one, or multiple media sources per media type, resulting in a number
+   of RTP streams of various media types for both source and redundancy
+   streams.
+
+   Advantages:
+
+   1.  Only a single RTP session is used, which implies:
+
+       *  Minimal need to keep NAT/FW state.
+
+       *  Minimal NAT/FW traversal cost.
+
+       *  Fate-sharing for all media flows.
+
+       *  Minimal overhead for security association establishment.
+
+   2.  Dynamic allocation of RTP streams can be handled almost entirely
+       at the RTP level.  The extent to which this allocation can be
+       kept at the RTP level depends on the application's needs for an
+       explicit indication of stream usage and in how timely a fashion
+       that information can be signaled.
+
+   Disadvantages:
+
+   1.  It is less suitable for interworking with other applications that
+       use individual RTP sessions per media type or multiple sessions
+       for a single media type, due to the risk of SSRC collisions and
+       thus a potential need for SSRC translation.
+
+   2.  Negotiation of individual bandwidths for the different media
+       types is currently only possible in SDP when using RID [RFC8851].
+
+   3.  It is not suitable for split component terminals (see
+       Section 3.10 of [RFC7667]).
+
+   4.  Flow-based QoS cannot be used to provide separate treatment of
+       RTP streams compared to others in the single RTP session.
+
+   5.  If there is significant asymmetry between the RTP streams' RTCP
+       reporting needs, there are some challenges related to
+       configuration and usage to avoid wasting RTCP reporting on the
+       RTP stream that does not need such frequent reporting.
+
+   6.  It is not suitable for applications where some receivers want to
+       receive only a subset of the RTP streams, especially if multicast
+       or a transport translator is being used.
+
+   7.  There are some additional concerns regarding legacy
+       implementations that do not support the RTP specification fully
+       when it comes to handling multiple SSRCs per endpoint, as
+       multiple simultaneous media types are sent as separate SSRCs in
+       the same RTP session.
+
+   8.  If the applications need finer control over which session
+       participants are included in different sets of security
+       associations, most key-management mechanisms will have
+       difficulties establishing such a session.
+
+5.2.  Multiple SSRCs of the Same Media Type
+
+   In this design, each RTP session serves only a single media type.
+   The RTP session can contain multiple RTP streams, from either a
+   single endpoint or multiple endpoints.  This commonly creates a low
+   number of RTP sessions, typically only one for audio and one for
+   video, with a corresponding need for two listening ports when using
+   RTP/RTCP multiplexing [RFC5761].
+
+   Advantages:
+
+   1.  It works well with split component terminals (see Section 3.10 of
+       [RFC7667]) where the split is per media type.
+
+   2.  It enables flow-based QoS with different prioritization levels
+       between media types.
+
+   3.  For applications with dynamic usage of RTP streams (i.e., streams
+       are frequently added and removed), having much of the state
+       associated with the RTP session rather than per individual SSRC
+       can avoid the need for in-session signaling of meta-information
+       about each SSRC.  In simple cases, this allows for unsignaled RTP
+       streams where session-level information and an RTCP SDES item
+       (e.g., CNAME) are sufficient.  In the more complex cases where
+       more source-specific metadata needs to be signaled, the SSRC can
+       be associated with an intermediate identifier, e.g., the MID
+       conveyed as an SDES item as defined in Section 15 of [RFC8843].
+
+   4.  The overhead of security association establishment is low.
+
+   Disadvantages:
+
+   1.  A slightly higher number of RTP sessions are needed, compared to
+       multiple media types in one session (Section 5.1).  This implies
+       the following:
+
+       *  More NAT/FW state is needed.
+
+       *  The cost of NAT/FW traversal is increased in terms of both
+          processing and delay.
+
+   2.  There is some potential for concern regarding legacy
+       implementations that don't support the RTP specification fully
+       when it comes to handling multiple SSRCs per endpoint.
+
+   3.  It is not possible to control security associations for sets of
+       RTP streams within the same media type with today's key-
+       management mechanisms, unless these are split into different RTP
+       sessions (Section 5.3).
+
+   For RTP applications where all RTP streams of the same media type
+   share the same usage, this structure provides efficiency gains in the
+   amount of network state used and provides more fate-sharing with
+   other media flows of the same type.  At the same time, it still
+   maintains almost all functionalities for the negotiation signaling of
+   properties per individual media type and also enables flow-based QoS
+   prioritization between media types.  It handles multi-party sessions
+   well, independently of multicast or centralized transport
+   distribution, as additional sources can dynamically enter and leave
+   the session.
+
+5.3.  Multiple Sessions for One Media Type
+
+   This design goes one step further than the design discussed in
+   Section 5.2 by also using multiple RTP sessions for a single media
+   type.  The main reason for going in this direction is that the RTP
+   application needs separation of the RTP streams according to their
+   usage, such as, for example, scalability over multicast, simulcast,
+   the need for extended QoS prioritization, or the need for fine-
+   grained signaling using RTP session-focused signaling tools.
+
+   Advantages:
+
+   1.  This design is more suitable for multicast usage where receivers
+       can individually select which RTP sessions they want to
+       participate in, assuming that each RTP session has its own
+       multicast group.
+
+   2.  When multiple different usages exist, the application can
+       indicate its usage of the RTP streams at the RTP session level.
+
+   3.  There is less need for SSRC-specific explicit signaling for each
+       media stream and thus a reduced need for explicit and timely
+       signaling when RTP streams are added or removed.
+
+   4.  It enables detailed QoS prioritization for flow-based mechanisms.
+
+   5.  It works well with split component terminals (see Section 3.10 of
+       [RFC7667]).
+
+   6.  The scope for who is included in a security association can be
+       structured around the different RTP sessions, thus enabling such
+       functionality with existing key-management mechanisms.
+
+   Disadvantages:
+
+   1.  There is an increased amount of session configuration state
+       compared to multiple SSRCs of the same media type (Section 5.2),
+       due to the increased number of RTP sessions.
+
+   2.  For RTP streams that are part of scalability, simulcast, or
+       transport robustness, a method for binding sources across
+       multiple RTP sessions is needed.
+
+   3.  There is some potential for concern regarding legacy
+       implementations that don't support the RTP specification fully
+       when it comes to handling multiple SSRCs per endpoint.
+
+   4.  The overhead of security association establishment is higher, due
+       to the increased number of RTP sessions.
+
+   5.  If the applications need finer control over which participants in
+       a given RTP session are included in different sets of security
+       associations, most of today's key-management mechanisms will have
+       difficulties establishing such a session.
+
+   For more-complex RTP applications that have several different usages
+   for RTP streams of the same media type or that use scalability or
+   simulcast, this solution can enable those functions, at the cost of
+   increased overhead associated with the additional sessions.  This
+   type of structure is suitable for more-advanced applications as well
+   as multicast-based applications requiring differentiation to
+   different participants.
+
+5.4.  Single SSRC per Endpoint
+
+   In this design, each endpoint in a point-to-point session has only a
+   single SSRC; thus, the RTP session contains only two SSRCs -- one
+   local and one remote.  This session can be used either
+   unidirectionally (i.e., one SSRC sends an RTP stream that is received
+   by the other SSRC) or bidirectionally (i.e., the two SSRCs both send
+   an RTP stream and receive the RTP stream sent by the other endpoint).
+   If the application needs additional media flows between the
+   endpoints, it will have to establish additional RTP sessions.
+
+   Advantages:
+
+   1.  This design has great potential for interoperability with legacy
+       applications, as it will not tax any RTP stack implementations.
+
+   2.  The signaling system makes it possible to negotiate and describe
+       the exact formats and bitrates for each RTP stream, especially
+       using today's tools in SDP.
+
+   3.  It is possible to control security associations per RTP stream
+       with current key-management functions, since each RTP stream is
+       directly related to an RTP session and the most commonly used
+       keying mechanisms operate on a per-session basis.
+
+   Disadvantages:
+
+   1.  The amount of NAT/FW state grows linearly with the number of RTP
+       streams.
+
+   2.  NAT/FW traversal increases delay and resource consumption.
+
+   3.  The number of signaling messages and the signaling processing
+       requirements are likely to be higher, due to the increased
+       amount of session-related information.
+
+   4.  There is higher potential for a single RTP stream to fail during
+       transport between the endpoints, due to the need for a separate
+       NAT/FW traversal for every RTP stream, since there is only one
+       stream per session.
+
+   5.  The amount of explicit state for relating RTP streams grows,
+       depending on how the application relates RTP streams.
+
+   6.  Port consumption might become a problem for centralized services,
+       where the central node's port or 5-tuple filter consumption grows
+       rapidly with the number of sessions.
+
+   7.  For applications where RTP stream usage is highly dynamic, i.e.,
+       entities frequently enter and leave sessions, the amount of
+       signaling can become high.  Issues can also arise from the need
+       for timely establishment of additional RTP sessions.
+
+   8.  If, against the recommendation in [RFC3550], the same SSRC value
+       is reused in multiple RTP sessions rather than being randomly
+       chosen, interworking with applications that use a different
+       multiplexing structure will require SSRC translation.
+
+   RTP applications with a strong need to interwork with legacy RTP
+   applications can potentially benefit from this structure.  However, a
+   large number of media descriptions in SDP can also run into issues
+   with existing implementations.  For any application needing a larger
+   number of media flows, the overhead can become very significant.
+   This structure is also not suitable for non-mixed multi-party
+   sessions, as any given RTP stream from each participant, although
+   having the same usage in the application, needs its own RTP session.
+   In addition, the dynamic behavior that can arise in multi-party
+   applications can tax the signaling system and make timely media
+   establishment more difficult.
+
+5.5.  Summary
+
+   Both the "single SSRC per endpoint" (Section 5.4) and "multiple media
+   types in one session" (Section 5.1) cases require full explicit
+   signaling of the media stream relationships.  However, they operate
+   on two different levels, where the first primarily enables session-
+   level binding and the second needs SSRC-level binding.  From another
+   perspective, the two solutions are the two extremes when it comes to
+   the number of RTP sessions needed.
+
+   The two other designs -- multiple SSRCs of the same media type
+   (Section 5.2) and multiple sessions for one media type (Section 5.3)
+   -- are two examples that primarily allow for some implicit mapping of
+   the role or usage of the RTP streams based on which RTP session they
+   appear in.  Thus, they potentially allow for less signaling and, in
+   particular, reduce the need for real-time signaling in sessions with
+   a dynamically changing number of RTP streams.  They also represent
+   points between the first two designs when it comes to the number of
+   RTP sessions established, i.e., they represent an attempt to balance
+   the number of RTP sessions with the functionality the communication
+   session provides at both the network level and the signaling level.
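+
+   To make the comparison concrete, the following Python sketch counts
+   the RTP sessions each design would need for one endpoint's outgoing
+   RTP streams in a point-to-point scenario.  It is a simplified model
+   and not part of any specification: the function name, the
+   (media type, usage) labels, and the assumption that the remote
+   endpoint sends no more streams than the local one are all
+   hypothetical.
+
```python
def rtp_session_counts(streams):
    """Count RTP sessions per design for a list of outgoing RTP streams.

    `streams` holds one (media_type, usage) tuple per RTP stream, e.g.
    ("video", "slides").  The keys of the result are the section numbers
    of the designs being compared.
    """
    media_types = {media_type for media_type, _usage in streams}
    type_usage_pairs = set(streams)
    return {
        # Section 5.1: everything shares a single RTP session.
        "5.1": 1 if streams else 0,
        # Section 5.2: one RTP session per media type.
        "5.2": len(media_types),
        # Section 5.3: one RTP session per (media type, usage) pair.
        "5.3": len(type_usage_pairs),
        # Section 5.4: one SSRC per endpoint per session, so every
        # outgoing RTP stream needs its own RTP session.
        "5.4": len(streams),
    }


if __name__ == "__main__":
    example = [
        ("audio", "main"),
        ("video", "main"),
        ("video", "main"),
        ("video", "slides"),
    ]
    for design, count in rtp_session_counts(example).items():
        print(f"Section {design}: {count} RTP session(s)")
```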
+
+6.  Guidelines
+
+   This section contains a number of multi-stream guidelines for
+   implementers, system designers, and specification writers.
+
+   Do not require the use of the same SSRC value across RTP sessions:
+      As discussed in Section 3.4.3, there are downsides to using the
+      same SSRC in multiple RTP sessions as a mechanism to bind related
+      RTP streams together.  It is instead recommended to use a
+      mechanism to explicitly signal the relationship, in either
+      RTP/RTCP or the signaling mechanism used to establish the RTP
+      session(s).
+
+   Use additional RTP streams for additional media sources:
+      In the cases where an RTP endpoint needs to transmit additional
+      RTP streams of the same media type in the application, with the
+      same processing requirements at the network and RTP layers, it is
+      suggested to send them in the same RTP session.  For example, in
+      the case of a telepresence room where there are three cameras and
+      each camera captures two persons sitting at the table, we suggest
+      that each camera send its own RTP stream within a single RTP
+      session.
+
+   Use additional RTP sessions for streams with different
+   requirements:
+      When RTP streams have different processing requirements from the
+      network or the RTP layer at the endpoints, it is suggested that
+      the different types of streams be put in different RTP sessions.
+      This includes the case where different participants want different
+      subsets of the set of RTP streams.
+
+   Use grouping when using multiple RTP sessions:
+      When using solutions based on multiple RTP sessions, it is
+      suggested to explicitly group the involved RTP sessions, when
+      needed, using a signaling mechanism with appropriate grouping
+      semantics -- for example, see "The Session Description Protocol
+      (SDP) Grouping Framework" [RFC5888].
+
+   Ensure that RTP/RTCP extensions support multiple RTP streams as
+   well as multiple RTP sessions:
+      When defining an RTP or RTCP extension, the creator needs to
+      consider if this extension is applicable for use with additional
+      SSRCs and multiple RTP sessions.  Any extension intended to be
+      generic must support both.  Extensions that are not as generally
+      applicable will have to consider whether interoperability is
+      better served by defining a single solution or providing both
+      options.
+
+   Provide adequate extensions for transport support:
+      When defining new RTP/RTCP extensions intended for transport
+      support, like the retransmission or FEC mechanisms, they must
+      include support for both multiple RTP streams in the same RTP
+      session and multiple RTP sessions, such that application
+      developers can choose freely from the set of mechanisms without
+      concerning themselves with which of the multiplexing choices a
+      particular solution supports.
+
+7.  IANA Considerations
+
+   This document has no IANA actions.
+
+8.  Security Considerations
+
+   The security considerations discussed in the RTP specification
+   [RFC3550]; any applicable RTP profile [RFC3551] [RFC4585] [RFC3711];
+   and the extensions for sending multiple media types in a single RTP
+   session [RFC8860], RID [RFC8851], BUNDLE [RFC8843], [RFC5760], and
+   [RFC5761] apply if selected and thus need to be considered in the
+   evaluation.
+
+   Section 4.3 discusses the security implications of choosing multiple
+   SSRCs vs. multiple RTP sessions.
+
+9.  References
+
+9.1.  Normative References
+
+   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
+              Jacobson, "RTP: A Transport Protocol for Real-Time
+              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
+              July 2003, <https://www.rfc-editor.org/info/rfc3550>.
+
+   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
+              Video Conferences with Minimal Control", STD 65, RFC 3551,
+              DOI 10.17487/RFC3551, July 2003,
+              <https://www.rfc-editor.org/info/rfc3551>.
+
+   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
+              RFC 3711, DOI 10.17487/RFC3711, March 2004,
+              <https://www.rfc-editor.org/info/rfc3711>.
+
+   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
+              "Extended RTP Profile for Real-time Transport Control
+              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
+              DOI 10.17487/RFC4585, July 2006,
+              <https://www.rfc-editor.org/info/rfc4585>.
+
+   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
+              Media Attributes in the Session Description Protocol
+              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009,
+              <https://www.rfc-editor.org/info/rfc5576>.
+
+   [RFC5760]  Ott, J., Chesterfield, J., and E. Schooler, "RTP Control
+              Protocol (RTCP) Extensions for Single-Source Multicast
+              Sessions with Unicast Feedback", RFC 5760,
+              DOI 10.17487/RFC5760, February 2010,
+              <https://www.rfc-editor.org/info/rfc5760>.
+
+   [RFC5761]  Perkins, C. and M. Westerlund, "Multiplexing RTP Data and
+              Control Packets on a Single Port", RFC 5761,
+              DOI 10.17487/RFC5761, April 2010,
+              <https://www.rfc-editor.org/info/rfc5761>.
+
+   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
+              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
+              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
+              DOI 10.17487/RFC7656, November 2015,
+              <https://www.rfc-editor.org/info/rfc7656>.
+
+   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
+              DOI 10.17487/RFC7667, November 2015,
+              <https://www.rfc-editor.org/info/rfc7667>.
+
+   [RFC8843]  Holmberg, C., Alvestrand, H., and C. Jennings,
+              "Negotiating Media Multiplexing Using the Session
+              Description Protocol (SDP)", RFC 8843,
+              DOI 10.17487/RFC8843, January 2021,
+              <https://www.rfc-editor.org/info/rfc8843>.
+
+   [RFC8851]  Roach, A.B., Ed., "RTP Payload Format Restrictions",
+              RFC 8851, DOI 10.17487/RFC8851, January 2021,
+              <https://www.rfc-editor.org/info/rfc8851>.
+
+   [RFC8852]  Roach, A.B., Nandakumar, S., and P. Thatcher, "RTP Stream
+              Identifier Source Description (SDES)", RFC 8852,
+              DOI 10.17487/RFC8852, January 2021,
+              <https://www.rfc-editor.org/info/rfc8852>.
+
+   [RFC8860]  Westerlund, M., Perkins, C., and J. Lennox, "Sending
+              Multiple Types of Media in a Single RTP Session",
+              RFC 8860, DOI 10.17487/RFC8860, January 2021,
+              <https://www.rfc-editor.org/info/rfc8860>.
+
+   [RFC8870]  Jennings, C., Mattsson, J., McGrew, D., Wing, D., and F.
+              Andreasen, "Encrypted Key Transport for DTLS and Secure
+              RTP", RFC 8870, DOI 10.17487/RFC8870, January 2021,
+              <https://www.rfc-editor.org/info/rfc8870>.
+
+9.2.  Informative References
+
+   [JINGLE]   Ludwig, S., Beda, J., Saint-Andre, P., McQueen, R., Egan,
+              S., and J. Hildebrand, "XEP-0166: Jingle", September 2018,
+              <https://xmpp.org/extensions/xep-0166.html>.
+
+   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
+              Handley, M., Bolot, J.C., Vega-Garcia, A., and S. Fosse-
+              Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
+              DOI 10.17487/RFC2198, September 1997,
+              <https://www.rfc-editor.org/info/rfc2198>.
+
+   [RFC2205]  Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S.
+              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
+              Functional Specification", RFC 2205, DOI 10.17487/RFC2205,
+              September 1997, <https://www.rfc-editor.org/info/rfc2205>.
+
+   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
+              "Definition of the Differentiated Services Field (DS
+              Field) in the IPv4 and IPv6 Headers", RFC 2474,
+              DOI 10.17487/RFC2474, December 1998,
+              <https://www.rfc-editor.org/info/rfc2474>.
+
+   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
+              Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974,
+              October 2000, <https://www.rfc-editor.org/info/rfc2974>.
+
+   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
+              A., Peterson, J., Sparks, R., Handley, M., and E.
+              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
+              DOI 10.17487/RFC3261, June 2002,
+              <https://www.rfc-editor.org/info/rfc3261>.
+
+   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
+              with Session Description Protocol (SDP)", RFC 3264,
+              DOI 10.17487/RFC3264, June 2002,
+              <https://www.rfc-editor.org/info/rfc3264>.
+
+   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
+              Comfort Noise (CN)", RFC 3389, DOI 10.17487/RFC3389,
+              September 2002, <https://www.rfc-editor.org/info/rfc3389>.
+
+   [RFC3830]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K.
+              Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830,
+              DOI 10.17487/RFC3830, August 2004,
+              <https://www.rfc-editor.org/info/rfc3830>.
+
+   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
+              Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005,
+              <https://www.rfc-editor.org/info/rfc4103>.
+
+   [RFC4383]  Baugher, M. and E. Carrara, "The Use of Timed Efficient
+              Stream Loss-Tolerant Authentication (TESLA) in the Secure
+              Real-time Transport Protocol (SRTP)", RFC 4383,
+              DOI 10.17487/RFC4383, February 2006,
+              <https://www.rfc-editor.org/info/rfc4383>.
+
+   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
+              Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
+              July 2006, <https://www.rfc-editor.org/info/rfc4566>.
+
+   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
+              Description Protocol (SDP) Security Descriptions for Media
+              Streams", RFC 4568, DOI 10.17487/RFC4568, July 2006,
+              <https://www.rfc-editor.org/info/rfc4568>.
+
+   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
+              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
+              DOI 10.17487/RFC4588, July 2006,
+              <https://www.rfc-editor.org/info/rfc4588>.
+
+   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
+              "Codec Control Messages in the RTP Audio-Visual Profile
+              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
+              February 2008, <https://www.rfc-editor.org/info/rfc5104>.
+
+   [RFC5109]  Li, A., Ed., "RTP Payload Format for Generic Forward Error
+              Correction", RFC 5109, DOI 10.17487/RFC5109, December
+              2007, <https://www.rfc-editor.org/info/rfc5109>.
+
+   [RFC5389]  Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
+              "Session Traversal Utilities for NAT (STUN)", RFC 5389,
+              DOI 10.17487/RFC5389, October 2008,
+              <https://www.rfc-editor.org/info/rfc5389>.
+
+   [RFC5764]  McGrew, D. and E. Rescorla, "Datagram Transport Layer
+              Security (DTLS) Extension to Establish Keys for the Secure
+              Real-time Transport Protocol (SRTP)", RFC 5764,
+              DOI 10.17487/RFC5764, May 2010,
+              <https://www.rfc-editor.org/info/rfc5764>.
+
+   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
+              Protocol (SDP) Grouping Framework", RFC 5888,
+              DOI 10.17487/RFC5888, June 2010,
+              <https://www.rfc-editor.org/info/rfc5888>.
+
+   [RFC6465]  Ivov, E., Ed., Marocco, E., Ed., and J. Lennox, "A Real-
+              time Transport Protocol (RTP) Header Extension for Mixer-
+              to-Client Audio Level Indication", RFC 6465,
+              DOI 10.17487/RFC6465, December 2011,
+              <https://www.rfc-editor.org/info/rfc6465>.
+
+   [RFC7201]  Westerlund, M. and C. Perkins, "Options for Securing RTP
+              Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
+              <https://www.rfc-editor.org/info/rfc7201>.
+
+   [RFC7657]  Black, D., Ed. and P. Jones, "Differentiated Services
+              (Diffserv) and Real-Time Communication", RFC 7657,
+              DOI 10.17487/RFC7657, November 2015,
+              <https://www.rfc-editor.org/info/rfc7657>.
+
+   [RFC7826]  Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M.,
+              and M. Stiemerling, Ed., "Real-Time Streaming Protocol
+              Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December
+              2016, <https://www.rfc-editor.org/info/rfc7826>.
+
+   [RFC7983]  Petit-Huguenin, M. and G. Salgueiro, "Multiplexing Scheme
+              Updates for Secure Real-time Transport Protocol (SRTP)
+              Extension for Datagram Transport Layer Security (DTLS)",
+              RFC 7983, DOI 10.17487/RFC7983, September 2016,
+              <https://www.rfc-editor.org/info/rfc7983>.
+
+   [RFC8088]  Westerlund, M., "How to Write an RTP Payload Format",
+              RFC 8088, DOI 10.17487/RFC8088, May 2017,
+              <https://www.rfc-editor.org/info/rfc8088>.
+
+   [RFC8108]  Lennox, J., Westerlund, M., Wu, Q., and C. Perkins,
+              "Sending Multiple RTP Streams in a Single RTP Session",
+              RFC 8108, DOI 10.17487/RFC8108, March 2017,
+              <https://www.rfc-editor.org/info/rfc8108>.
+
+   [RFC8445]  Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive
+              Connectivity Establishment (ICE): A Protocol for Network
+              Address Translator (NAT) Traversal", RFC 8445,
+              DOI 10.17487/RFC8445, July 2018,
+              <https://www.rfc-editor.org/info/rfc8445>.
+
+   [RFC8871]  Jones, P., Benham, D., and C. Groves, "A Solution
+              Framework for Private Media in Privacy-Enhanced RTP
+              Conferencing (PERC)", RFC 8871, DOI 10.17487/RFC8871,
+              January 2021, <https://www.rfc-editor.org/info/rfc8871>.
+
+Appendix A.  Dismissing Payload Type Multiplexing
+
+   This section documents a number of reasons why using the payload type
+   as a multiplexing point is unsuitable for most issues related to
+   multiple RTP streams.  Attempting to use payload type multiplexing
+   beyond its defined usage has well-known negative effects on RTP, as
+   discussed below.  To use the payload type as the single discriminator
+   for multiple streams implies that all the different RTP streams are
+   being sent with the same SSRC, thus using the same timestamp and
+   sequence number space.  The many effects of using payload type
+   multiplexing are as follows:
+
+   1.   Constraints are placed on the RTP timestamp rate for the
+        multiplexed media.  For example, RTP streams that use different
+        RTP timestamp rates cannot be combined, as the timestamp values
+        need to be consistent across all multiplexed media frames.
+        Thus, streams are forced to use the same RTP timestamp rate.
+        When this is not possible, payload type multiplexing cannot be
+        used.
+
+   2.   Many RTP payload formats can fragment a media object over
+        multiple RTP packets, like parts of a video frame.  These
+        payload formats need to determine the order of the fragments to
+        correctly decode them.  Thus, it is important to ensure that all
+        fragments related to a frame or a similar media object are
+        transmitted in sequence and without interruptions within the
+        object.  This can be done relatively easily on the sender side
+        by ensuring that the fragments of each RTP stream are sent in
+        sequence.
+
+   3.   Some media formats require uninterrupted sequence number space
+        between media parts.  These are media formats where any missing
+        RTP sequence number will result in decoding failure or invoking
+        a repair mechanism within a single media context.  The text/t140
+        payload format [RFC4103] is an example of such a format.  These
+        formats will need a sequence numbering abstraction function
+        between RTP and the individual RTP stream before being used with
+        payload type multiplexing.
+
+   4.   Sending multiple media streams in the same sequence number space
+        makes it impossible to determine which media stream lost a
+        packet.  Such a scenario causes difficulties, since the receiver
+        cannot determine to which stream it should apply packet-loss
+        concealment or other stream-specific loss-mitigation mechanisms.
+
+   5.   If RTP retransmission [RFC4588] is used and packet loss occurs,
+        it is possible to ask for the missing packet(s) by SSRC and
+        sequence number -- not by payload type.  If only some of the
+        payload type multiplexed streams are of interest, there is no
+        way to tell which missing packet or packets belong to the stream
+        or streams of interest, and all lost packets need to be
+        requested, wasting bandwidth.
+
+   6.   The current RTCP feedback mechanisms are built around providing
+        feedback on RTP streams based on stream ID (SSRC), packet
+        (sequence numbers), and time interval (RTP timestamps).  There
+        is almost never a field to indicate which payload type is
+        reported, so sending feedback for a specific RTP payload type is
+        difficult without extending existing RTCP reporting.
+
+   7.   The current RTCP media control messages specification [RFC5104]
+        is oriented around controlling particular media flows, i.e.,
+        requests are done by addressing a particular SSRC.  Such
+        mechanisms would need to be redefined to support payload type
+        multiplexing.
+
+   8.   The number of payload types is inherently limited.  Accordingly,
+        using payload type multiplexing limits the number of streams
+        that can be multiplexed and does not scale.  This limitation is
+        exacerbated if one uses solutions like RTP and RTCP multiplexing
+        [RFC5761] where a number of payload types are blocked due to the
+        overlap between RTP and RTCP.
+
+   9.   At times, there is a need to group multiplexed streams.  This is
+        currently possible for RTP sessions and SSRCs, but there is no
+        defined way to group payload types.
+
+   10.  It is currently not possible to signal bandwidth requirements
+        per RTP stream when using payload type multiplexing.
+
+   11.  Most existing SDP media-level attributes cannot be applied on a
+        per-payload-type basis and would require redefinition in that
+        context.
+
+   12.  A legacy endpoint that does not understand the indication that
+        different RTP payload types are different RTP streams might be
+        confused by the large number of possibly overlapping or
+        identically defined RTP payload types.
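+
+   As a minimal illustration of items 4 and 5, the following Python
+   sketch models two payload-type-multiplexed streams that share one
+   SSRC and therefore one sequence number space.  The payload type
+   values (96 and 97), the packet tuples, and the helper name are
+   assumptions made for this example only; sequence number wrap-around
+   is ignored.
+
```python
def missing_sequence_numbers(received):
    """Return the sequence numbers absent between the lowest and highest
    received values (no wrap-around handling in this sketch)."""
    seen = {seq for seq, _payload_type in received}
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


# Packets as (sequence number, payload type) pairs.  Payload types 96 and
# 97 are assumed to denote two different media streams on the same SSRC.
sent = [(100, 96), (101, 97), (102, 96), (103, 97), (104, 96)]
lost = {101, 102}
received = [packet for packet in sent if packet[0] not in lost]

gaps = missing_sequence_numbers(received)

# Only received packets reveal their payload type.  For the gaps, the
# receiver has a sequence number and nothing else.
known_payload_types = dict(received)
unattributable = [seq for seq in gaps if seq not in known_payload_types]
```
+
+   The receiver learns that two packets are missing, but nothing it
+   holds says which stream each one belonged to, so it can neither
+   target concealment nor restrict a retransmission request to the
+   stream of interest.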
+
+Appendix B.  Signaling Considerations
+
+   Signaling is not an architectural consideration for RTP itself, so
+   this discussion has been moved to an appendix.  However, it is
+   extremely important for anyone building complete applications, so it
+   deserves discussion here.
+
+   We document some issues here that need to be addressed when using
+   some form of signaling to establish RTP sessions.  These issues
+   cannot be addressed by simply tweaking, extending, or profiling RTP;
+   rather, they require a dedicated and in-depth look at the signaling
+   primitives that set up the RTP sessions.
+
+   There exist various signaling solutions for establishing RTP
+   sessions.  Many are based on SDP [RFC4566]; however, SDP
+   functionality is also dependent on the signaling protocols carrying
+   the SDP.  The Real-Time Streaming Protocol (RTSP) [RFC7826] and the
+   Session Announcement Protocol (SAP) [RFC2974] both use SDP in a
+   declarative fashion, while SIP [RFC3261] uses SDP with the additional
+   definition of offer/answer [RFC3264].  The impact on signaling, and
+   especially on SDP, needs to be considered, as it can greatly affect
+   how to deploy a certain multiplexing point choice.
+
+B.1.  Session-Oriented Properties
+
+   One aspect of existing signaling protocols is that they are focused
+   on RTP sessions or, in the case of SDP, the concept of media
+   descriptions.  A number of properties are signaled at the media
+   description level, but they are not necessarily strictly bound to an
+   RTP session, and it could be useful to signal them for a particular
+   RTP stream (SSRC) within the session.  The following properties have
+   been identified as being potentially useful to signal at more than
+   just the RTP session level:
+
+   *  Bitrate and/or bandwidth can be specified today only as an
+      aggregate limit, or as a common "any RTP stream" limit, unless
+      either codec-specific bandwidth limiting or RTCP signaling using
+      Temporary Maximum Media Stream Bit Rate Request (TMMBR) messages
+      [RFC5104] is used.
+
+   *  Which SSRC will use which RTP payload type (this information will
+      be visible in the first media packet but is sometimes useful to
+      have before the packet arrives).
+
+   Some of these issues are clearly SDP's problem rather than RTP
+   limitations.  However, if the aim is to deploy a solution that uses
+   several SSRCs and contains several sets of RTP streams with different
+   properties (encoding/packetization parameters, bitrate, etc.),
+   putting each set in a different RTP session would directly enable
+   negotiation of the parameters for each set.  If the solution instead
+   relies on additional SSRCs only, a number of signaling extensions are
+   needed to clarify that there are multiple sets of RTP streams with
+   different properties and that they in fact need to be kept
+   different, since a single set will not satisfy the application's
+   requirements.
+
+   For some parameters, such as RTP payload type, resolution, and frame
+   rate, an SSRC-linked mechanism has been proposed in [RFC8851].
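+
+   As a rough sketch of what such a mechanism conveys, the Python
+   fragment below parses one "a=rid" attribute line in the style of
+   [RFC8851].  The example line, the helper name, and the returned
+   dictionary layout are assumptions for illustration; the parser
+   handles only well-formed "key=value" restrictions and is not a
+   complete implementation of that specification.
+
```python
def parse_rid(line):
    """Parse 'a=rid:<id> <direction> [pt=<list>;]key=value;...'.

    Simplified sketch: assumes well-formed input and that every
    restriction is written as key=value.
    """
    body = line[len("a=rid:"):]
    rid_id, direction, *rest = body.split(" ", 2)
    params = {}
    if rest:
        for item in rest[0].split(";"):
            key, _, value = item.partition("=")
            params[key] = value
    # "pt" lists the payload types the restrictions apply to, if present.
    payload_types = [int(pt) for pt in params.pop("pt", "").split(",") if pt]
    return {
        "id": rid_id,
        "direction": direction,
        "payload_types": payload_types,
        "restrictions": params,
    }


# Hypothetical attribute line restricting payload type 97 to 720p30.
rid = parse_rid("a=rid:1 send pt=97;max-width=1280;max-height=720;max-fps=30")
```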
+
+B.2.  SDP Prevents Multiple Media Types
+
+   SDP uses the "m=" line to both delineate an RTP session and specify
+   the top-level media type: audio, video, text, image, application.
+   This media type is used as the top-level media type for identifying
+   the actual payload format and is bound to a particular payload type
+   using the "a=rtpmap:" attribute.  This binding has to be loosened in
+   order to use SDP to describe RTP sessions containing multiple top-
+   level media types.
+
+   [RFC8843] describes how to let multiple SDP media descriptions use a
+   single underlying transport in SDP, which allows the definition of
+   one RTP session with different top-level media types.
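+
+   The following Python sketch shows, under simplifying assumptions,
+   how a receiver of such a description could work out which top-level
+   media types share one transport.  The SDP text, the addresses and
+   ports, and the helper name are invented for the example; only the
+   "a=group:BUNDLE" and "a=mid" attributes are interpreted, and a
+   single BUNDLE group is assumed.
+
```python
SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
a=group:BUNDLE audio video
m=audio 10000 RTP/AVP 0
a=mid:audio
m=video 10002 RTP/AVP 96
a=rtpmap:96 VP8/90000
a=mid:video
"""


def bundled_media_types(sdp):
    """Return the top-level media types of the "m=" sections whose
    "a=mid" value is listed in the (single, assumed) BUNDLE group."""
    bundle_mids = []
    media_by_mid = {}
    current_media = None
    for line in sdp.splitlines():
        if line.startswith("a=group:BUNDLE"):
            bundle_mids = line.split()[1:]
        elif line.startswith("m="):
            current_media = line[2:].split()[0]
        elif line.startswith("a=mid:") and current_media is not None:
            media_by_mid[line[len("a=mid:"):]] = current_media
    return [media_by_mid[mid] for mid in bundle_mids if mid in media_by_mid]


shared_transport_media = bundled_media_types(SDP)
```
+
+   Here, two media descriptions with different top-level media types
+   end up in the same group and so describe one RTP session.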
+
+B.3.  Signaling RTP Stream Usage
+
+   Each RTP stream has a particular usage within an RTP application.
+   In many applications to date, this usage is implicitly signaled.
+   For example, an application might
+   choose to take all incoming audio RTP streams, mix them, and play
+   them out.  However, in more-advanced applications that use multiple
+   RTP streams, there will be more than a single usage or purpose among
+   the set of RTP streams being sent or received.  RTP applications will
+   need to somehow signal this usage.  The signaling that is used will
+   have to identify the RTP streams affected by their RTP-level
+   identifiers, which means that they have to be identified by either
+   their session or their SSRC + session.
+
+   In some applications, the receiver cannot utilize the RTP stream at
+   all before it has received the signaling message describing the RTP
+   stream and its usage.  In other applications, there exists a default
+   handling method that is appropriate.
+
+   If all RTP streams in an RTP session are to be treated in the same
+   way, identifying the session is enough.  If SSRCs in a session are to
+   be treated differently, signaling needs to identify both the session
+   and the SSRC.
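+
+   A minimal Python sketch of this two-level identification follows.
+   The session names, SSRC values, usage labels, and function name are
+   all hypothetical; the point is only that a per-stream entry needs
+   the (session, SSRC) pair as its key, while a per-session entry needs
+   the session alone.
+
```python
def usage_for(session_id, ssrc, per_stream, per_session, default=None):
    """Look up the signaled usage of an RTP stream.

    A usage signaled for the (session, SSRC) pair wins; otherwise the
    session-wide usage applies; otherwise the application's default
    handling (if any) is returned.
    """
    if (session_id, ssrc) in per_stream:
        return per_stream[(session_id, ssrc)]
    return per_session.get(session_id, default)


# All streams in the audio session are treated alike, so the session
# identifier is enough.
per_session = {"audio-session": "mix-and-play"}

# Streams in the video session are treated differently, so each entry
# is keyed by both session and SSRC.
per_stream = {
    ("video-session", 0x1234ABCD): "main-camera",
    ("video-session", 0x0BADCAFE): "slides",
}

audio_usage = usage_for("audio-session", 0x00000001, per_stream, per_session)
camera_usage = usage_for("video-session", 0x1234ABCD, per_stream, per_session)
unknown_usage = usage_for("video-session", 0x55555555, per_stream,
                          per_session, default="hold-until-signaled")
```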
+
+   If this signaling affects how an RTP central node (such as an RTP
+   mixer or translator that selects, mixes, or processes streams)
+   treats the streams, that node also needs to receive the same
+   signaling so that it can handle RTP streams with different usages
+   correctly.
+
+Acknowledgments
+
+   The authors would like to acknowledge and thank Cullen Jennings, Dale
+   R. Worley, Huang Yihong (Rachel), Benjamin Kaduk, Mirja Kühlewind,
+   and Vijay Gurbani for review and comments.
+
+Contributors
+
+   Hui Zheng (Marvin) contributed to WG draft versions -04 and -05 of
+   the document.
+
+Authors' Addresses
+
+   Magnus Westerlund
+   Ericsson
+   Torshamnsgatan 23
+   SE-164 80 Kista
+   Sweden
+
+   Email: magnus.westerlund@ericsson.com
+
+
+   Bo Burman
+   Ericsson
+   Gronlandsgatan 31
+   SE-164 60 Kista
+   Sweden
+
+   Email: bo.burman@ericsson.com
+
+
+   Colin Perkins
+   University of Glasgow
+   School of Computing Science
+   Glasgow
+   G12 8QQ
+   United Kingdom
+
+   Email: csp@csperkins.org
+
+
+   Harald Tveit Alvestrand
+   Google
+   Kungsbron 2
+   SE-11122 Stockholm
+   Sweden
+
+   Email: harald@alvestrand.no
+
+
+   Roni Even
+
+   Email: ron.even.tlv@gmail.com