diff options
Diffstat (limited to 'doc/rfc/rfc7667.txt')
-rw-r--r-- | doc/rfc/rfc7667.txt | 2691 |
1 files changed, 2691 insertions, 0 deletions
diff --git a/doc/rfc/rfc7667.txt b/doc/rfc/rfc7667.txt new file mode 100644 index 0000000..6686d0b --- /dev/null +++ b/doc/rfc/rfc7667.txt @@ -0,0 +1,2691 @@ + + + + + + +Internet Engineering Task Force (IETF) M. Westerlund +Request for Comments: 7667 Ericsson +Obsoletes: 5117 S. Wenger +Category: Informational Vidyo +ISSN: 2070-1721 November 2015 + + + RTP Topologies + +Abstract + + This document discusses point-to-point and multi-endpoint topologies + used in environments based on the Real-time Transport Protocol (RTP). + In particular, centralized topologies commonly employed in the video + conferencing industry are mapped to the RTP terminology. + + This document is updated with additional topologies and replaces RFC + 5117. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Not all documents + approved by the IESG are a candidate for any level of Internet + Standard; see Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc7667. + + + + + + + + + + + + + + + + + +Westerlund & Wenger Informational [Page 1] + +RFC 7667 RTP Topologies November 2015 + + +Copyright Notice + + Copyright (c) 2015 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Westerlund & Wenger Informational [Page 2] + +RFC 7667 RTP Topologies November 2015 + + +Table of Contents + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 + 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 5 + 2.1. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 5 + 2.2. Definitions Related to RTP Grouping Taxonomy . . . . . . 5 + 3. Topologies . . . . . . . . . . . . . . . . . . . . . . . . . 6 + 3.1. Point to Point . . . . . . . . . . . . . . . . . . . . . 6 + 3.2. Point to Point via Middlebox . . . . . . . . . . . . . . 7 + 3.2.1. Translators . . . . . . . . . . . . . . . . . . . . . 7 + 3.2.2. Back-to-Back RTP sessions . . . . . . . . . . . . . . 11 + 3.3. Point to Multipoint Using Multicast . . . . . . . . . . . 12 + 3.3.1. Any-Source Multicast (ASM) . . . . . . . . . . . . . 12 + 3.3.2. Source-Specific Multicast (SSM) . . . . . . . . . . . 14 + 3.3.3. SSM with Local Unicast Resources . . . . . . . . . . 15 + 3.4. Point to Multipoint Using Mesh . . . . . . . . . . . . . 17 + 3.5. Point to Multipoint Using the RFC 3550 Translator . . . . 20 + 3.5.1. Relay - Transport Translator . . . . . . . . . . . . 20 + 3.5.2. Media Translator . . . . . . . . . . . . . . . . . . 21 + 3.6. Point to Multipoint Using the RFC 3550 Mixer Model . . . 22 + 3.6.1. Media-Mixing Mixer . . . . . . . . . . . . . . . . . 24 + 3.6.2. Media-Switching Mixer . . . . . . . . . . . . . . . . 27 + 3.7. Selective Forwarding Middlebox . . . . . . . . . . . . . 29 + 3.8. Point to Multipoint Using Video-Switching MCUs . . . . . 33 + 3.9. Point to Multipoint Using RTCP-Terminating MCU . . . . . 34 + 3.10. Split Component Terminal . . . . . . . . . . . . . . . . 35 + 3.11. Non-symmetric Mixer/Translators . . . . . . . . . . . . . 38 + 3.12. Combining Topologies . . . . . . . . . . . . . . . . . . 38 + 4. Topology Properties . . . . . . . . . . . . . . . . . . . . . 39 + 4.1. All-to-All Media Transmission . . . . . . . . . . . . . . 39 + 4.2. Transport or Media Interoperability . . . . . . . . . . . 40 + 4.3. Per-Domain Bitrate Adaptation . . . . . . . . . . . . . . 40 + 4.4. Aggregation of Media . . . . . . . . . . . . . . . . . . 41 + 4.5. View of All Session Participants . . . . . . . . . . . . 41 + 4.6. Loop Detection . . . . . . . . . . . . . . . . . . . . . 42 + 4.7. Consistency between Header Extensions and RTCP . . . . . 42 + 5. Comparison of Topologies . . . . . . . . . . . . . . . . . . 42 + 6. Security Considerations . . . . . . . . . . . . . . . . . . . 43 + 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 45 + 7.1. Normative References . . . . . . . . . . . . . . . . . . 45 + 7.2. Informative References . . . . . . . . . . . . . . . . . 45 + Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 48 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 48 + + + + + + + + +Westerlund & Wenger Informational [Page 3] + +RFC 7667 RTP Topologies November 2015 + + +1. Introduction + + Real-time Transport Protocol (RTP) [RFC3550] topologies describe + methods for interconnecting RTP entities and their processing + behavior for RTP and the RTP Control Protocol (RTCP). This document + tries to address past and existing confusion, especially with respect + to terms not defined in RTP but in common use in the communication + industry, such as the Multipoint Control Unit or MCU. + + When the Audio-Visual Profile with Feedback (AVPF) [RFC4585] was + developed, the main emphasis lay in the efficient support of + point-to-point and small multipoint scenarios without centralized + multipoint control. In practice, however, most multipoint + conferences operate utilizing centralized units referred to as MCUs. + MCUs may implement mixer or translator functionality (in RTP + [RFC3550] terminology) and signaling support. They may also contain + additional application-layer functionality. This document focuses on + the media transport aspects of the MCU that can be realized using + RTP, as discussed below. Further considered are the properties of + mixers and translators, and how some types of deployed MCUs deviate + from these properties. + + This document also codifies new multipoint architectures that have + recently been introduced and that were not anticipated in RFC 5117; + thus, this document replaces [RFC5117]. These architectures use + scalable video coding and simulcasting, and their associated + centralized units are referred to as Selective Forwarding Middleboxes + (SFMs). This codification provides a common information basis for + future discussion and specification work. + + The new topologies are Point to Point via Middlebox (Section 3.2), + Source-Specific Multicast (Section 3.3.2), SSM with Local Unicast + Resources (Section 3.3.3), Point to Multipoint Using Mesh + (Section 3.4), Selective Forwarding Middlebox (Section 3.7), and + Split Component Terminal (Section 3.10). The Point to Multipoint + Using the RFC 3550 Mixer Model (Section 3.6) has been significantly + expanded to cover two different versions, namely Media-Mixing Mixer + (Section 3.6.1) and Media-Switching Mixer (Section 3.6.2). + + The document's attempt to clarify and explain sections of the RTP + spec [RFC3550] is informal. It is not intended to update or change + what is normatively specified within RFC 3550. + + + + + + + + + +Westerlund & Wenger Informational [Page 4] + +RFC 7667 RTP Topologies November 2015 + + +2. Definitions + +2.1. Glossary + + ASM: Any-Source Multicast + + AVPF: The extended RTP profile for RTCP-based feedback + + CSRC: Contributing Source + + Link: The data transport to the next IP hop + + Middlebox: A device that is on the Path that media travel between + two endpoints + + MCU: Multipoint Control Unit + + Path: The concatenation of multiple links, resulting in an + end-to-end data transfer. + + PtM: Point to Multipoint + + PtP: Point to Point + + SFM: Selective Forwarding Middlebox + + SSM: Source-Specific Multicast + + SSRC: Synchronization Source + +2.2. Definitions Related to RTP Grouping Taxonomy + + The following definitions have been taken from [RFC7656]. + + Communication Session: A Communication Session is an association + among two or more Participants communicating with each other via + one or more Multimedia Sessions. + + Endpoint: A single addressable entity sending or receiving RTP + packets. It may be decomposed into several functional blocks, but + as long as it behaves as a single RTP stack mentity, it is + classified as a single "endpoint". + + Media Source: A Media Source is the logical source of a time + progressing digital media stream synchronized to a reference + clock. This stream is called a Source Stream. + + + + + +Westerlund & Wenger Informational [Page 5] + +RFC 7667 RTP Topologies November 2015 + + + Multimedia Session: A Multimedia Session is an association among a + group of participants engaged in communication via one or more RTP + sessions. + +3. Topologies + + This subsection defines several topologies that are relevant for + codec control but also RTP usage in other contexts. The section + starts with point-to-point cases, with or without middleboxes. Then + it follows a number of different methods for establishing point-to- + multipoint communication. These are structured around the most + fundamental enabler, i.e., multicast, a mesh of connections, + translators, mixers, and finally MCUs and SFMs. The section ends by + discussing decomposited terminals, asymmetric middlebox behaviors, + and combining topologies. + + The topologies may be referenced in other documents by a shortcut + name, indicated by the prefix "Topo-". + + For each of the RTP-defined topologies, we discuss how RTP, RTCP, and + the carried media are handled. With respect to RTCP, we also discuss + the handling of RTCP feedback messages as defined in [RFC4585] and + [RFC5104]. + +3.1. Point to Point + + Shortcut name: Topo-Point-to-Point + + The Point-to-Point (PtP) topology (Figure 1) consists of two + endpoints, communicating using unicast. Both RTP and RTCP traffic + are conveyed endpoint to endpoint, using unicast traffic only (even + if, in exotic cases, this unicast traffic happens to be conveyed over + an IP multicast address). + + +---+ +---+ + | A |<------->| B | + +---+ +---+ + + Figure 1: Point to Point + + The main property of this topology is that A sends to B, and only B, + while B sends to A, and only A. This avoids all complexities of + handling multiple endpoints and combining the requirements stemming + from them. Note that an endpoint can still use multiple RTP + Synchronization Sources (SSRCs) in an RTP session. The number of RTP + sessions in use between A and B can also be of any number, subject + only to system-level limitations like the number range of ports. + + + + +Westerlund & Wenger Informational [Page 6] + +RFC 7667 RTP Topologies November 2015 + + + RTCP feedback messages for the indicated SSRCs are communicated + directly between the endpoints. Therefore, this topology poses + minimal (if any) issues for any feedback messages. For RTP sessions + that use multiple SSRCs per endpoint, it can be relevant to implement + support for cross-reporting suppression as defined in "Sending + Multiple Media Streams in a Single RTP Session" [MULTI-STREAM-OPT]. + +3.2. Point to Point via Middlebox + + This section discusses cases where two endpoints communicate but have + one or more middleboxes involved in the RTP session. + +3.2.1. Translators + + Shortcut name: Topo-PtP-Translator + + Two main categories of translators can be distinguished: Transport + Translators and Media Translators. Both translator types share + common attributes that separate them from mixers. For each RTP + stream that the translator receives, it generates an individual RTP + stream in the other domain. A translator keeps the SSRC for an RTP + stream across the translation, whereas a mixer can select a single + RTP stream from multiple received RTP streams (in cases like audio/ + video switching) or send out an RTP stream composed of multiple mixed + media received in multiple RTP streams (in cases like audio mixing or + video tiling), but always under its own SSRC, possibly using the CSRC + field to indicate the source(s) of the content. Mixers are more + common in point-to-multipoint cases than in PtP. The reason is that + in PtP use cases, the primary focus of a middlebox is enabling + interoperability, between otherwise non-interoperable endpoints, such + as transcoding to a codec the receiver supports, which can be done by + a Media Translator. + + As specified in Section 7.1 of [RFC3550], the SSRC space is common + for all participants in the RTP session, independent of on which side + of the translator the session resides. Therefore, it is the + responsibility of the endpoints (as the RTP session participants) to + run SSRC collision detection, and the SSRC is thus a field the + translator cannot change. Any Source Description (SDES) information + associated with an SSRC or CSRC also needs to be forwarded between + the domains for any SSRC/CSRC used in the different domains. + + A translator commonly does not use an SSRC of its own and is not + visible as an active participant in the RTP session. One reason to + have its own SSRC is when a translator acts as a quality monitor that + sends RTCP reports and therefore is required to have an SSRC. + Another example is the case when a translator is prepared to use RTCP + feedback messages. This may, for example, occur in a translator + + + +Westerlund & Wenger Informational [Page 7] + +RFC 7667 RTP Topologies November 2015 + + + configured to detect packet loss of important video packets, and it + wants to trigger repair by the media sending endpoint, by sending + feedback messages. While such feedback could use the SSRC of the + target for the translator (the receiving endpoint), this in turn + would require translation of the target RTCP reports to make them + consistent. It may be simpler to expose an additional SSRC in the + session. The only concern is that endpoints failing to support the + full RTP specification may have issues with multiple SSRCs reporting + on the RTP streams sent by that endpoint, as this use case may be + viewed as exotic by implementers. + + In general, a translator implementation should consider which RTCP + feedback messages or codec-control messages it needs to understand in + relation to the functionality of the translator itself. This is + completely in line with the requirement to also translate RTCP + messages between the domains. + +3.2.1.1. Transport Relay/Anchoring + + Shortcut name: Topo-PtP-Relay + + There exist a number of different types of middleboxes that might be + inserted between two endpoints on the transport level, e.g., to + perform changes on the IP/UDP headers, and are, therefore, basic + Transport Translators. These middleboxes come in many variations + including NAT [RFC3022] traversal by pinning the media path to a + public address domain relay and network topologies where the RTP + stream is required to pass a particular point for audit by employing + relaying, or preserving privacy by hiding each peer's transport + addresses to the other party. Other protocols or functionalities + that provide this behavior are Traversal Using Relays around NAT + (TURN) [RFC5766] servers, Session Border Gateways, and Media + Processing Nodes with media anchoring functionalities. + + +---+ +---+ +---+ + | A |<------>| T |<------->| B | + +---+ +---+ +---+ + + Figure 2: Point to Point with Translator + + A common element in these functions is that they are normally + transparent at the RTP level, i.e., they perform no changes on any + RTP or RTCP packet fields and only affect the lower layers. They may + affect, however, the path since the RTP and RTCP packets are routed + between the endpoints in the RTP session, and thereby they indirectly + affect the RTP session. For this reason, one could believe that + Transport Translator-type middleboxes do not need to be included in + this document. This topology, however, can raise additional + + + +Westerlund & Wenger Informational [Page 8] + +RFC 7667 RTP Topologies November 2015 + + + requirements in the RTP implementation and its interactions with the + signaling solution. Both in signaling and in certain RTCP fields, + network addresses other than those of the relay can occur since B has + a different network address than the relay (T). Implementations that + cannot support this will also not work correctly when endpoints are + subject to NAT. + + The Transport Relay implementations also have to take into account + security considerations. In particular, source address filtering of + incoming packets is usually important in relays, to prevent attackers + from injecting traffic into a session, which one peer may, in the + absence of adequate security in the relay, think it comes from the + other peer. + +3.2.1.2. Transport Translator + + Shortcut name: Topo-Trn-Translator + + Transport Translators (Topo-Trn-Translator) do not modify the RTP + stream itself but are concerned with transport parameters. Transport + parameters, in the sense of this section, comprise the transport + addresses (to bridge different domains such as unicast to multicast) + and the media packetization to allow other transport protocols to be + interconnected to a session (in gateways). + + Translators that bridge between different protocol worlds need to be + concerned about the mapping of the SSRC/CSRC (Contributing Source) + concept to the non-RTP protocol. When designing a translator to a + non-RTP-based media transport, an important consideration is how to + handle different sources and their identities. This problem space is + not discussed henceforth. + + Of the Transport Translators, this memo is primarily interested in + those that use RTP on both sides, and this is assumed henceforth. + + The most basic Transport Translators that operate below the RTP level + were already discussed in Section 3.2.1.1. + +3.2.1.3. Media Translator + + Shortcut name: Topo-Media-Translator + + Media Translators (Topo-Media-Translator) modify the media inside the + RTP stream. This process is commonly known as transcoding. The + modification of the media can be as small as removing parts of the + stream, and it can go all the way to a full decoding and re-encoding + (down to the sample level or equivalent) utilizing a different media + + + + +Westerlund & Wenger Informational [Page 9] + +RFC 7667 RTP Topologies November 2015 + + + codec. Media Translators are commonly used to connect endpoints + without a common interoperability point in the media encoding. + + Stand-alone Media Translators are rare. Most commonly, a combination + of Transport and Media Translator is used to translate both the media + and the transport aspects of the RTP stream carrying the media + between two transport domains. + + When media translation occurs, the translator's task regarding + handling of RTCP traffic becomes substantially more complex. In this + case, the translator needs to rewrite endpoint B's RTCP receiver + report before forwarding them to endpoint A. The rewriting is needed + as the RTP stream received by B is not the same RTP stream as the + other participants receive. For example, the number of packets + transmitted to B may be lower than what A sends, due to the different + media format and data rate. Therefore, if the receiver reports were + forwarded without changes, the extended highest sequence number would + indicate that B was substantially behind in reception, while it most + likely would not be. Therefore, the translator must translate that + number to a corresponding sequence number for the stream the + translator received. Similar requirements exist for most other + fields in the RTCP receiver reports. + + A Media Translator may in some cases act on behalf of the "real" + source (the endpoint originally sending the media to the translator) + and respond to RTCP feedback messages. This may occur, for example, + when a receiving endpoint requests a bandwidth reduction, and the + Media Translator has not detected any congestion or other reasons for + bandwidth reduction between the sending endpoint and itself. In that + case, it is sensible that the Media Translator reacts to codec + control messages itself, for example, by transcoding to a lower media + rate. + + A variant of translator behavior worth pointing out is the one + depicted in Figure 3 of an endpoint A sending an RTP stream + containing media (only) to B. On the path, there is a device T that + manipulates the RTP streams on A's behalf. One common example is + that T adds a second RTP stream containing Forward Error Correction + (FEC) information in order to protect A's (non FEC-protected) RTP + stream. In this case, T needs to semantically bind the new FEC RTP + stream to A's media-carrying RTP stream, for example, by using the + same CNAME as A. + + + + + + + + + +Westerlund & Wenger Informational [Page 10] + +RFC 7667 RTP Topologies November 2015 + + + +------+ +------+ +------+ + | | | | | | + | A |------->| T |-------->| B | + | | | |---FEC-->| | + +------+ +------+ +------+ + + Figure 3: Media Translator Adding FEC + + There may also be cases where information is added into the original + RTP stream, while leaving most or all of the original RTP packets + intact (with the exception of certain RTP header fields, such as the + sequence number). One example is the injection of metadata into the + RTP stream, carried in their own RTP packets. + + Similarly, a Media Translator can sometimes remove information from + the RTP stream, while otherwise leaving the remaining RTP packets + unchanged (again with the exception of certain RTP header fields). + + Either type of functionality where T manipulates the RTP stream, or + adds an accompanying RTP stream, on behalf of A is also covered under + the Media Translator definition. + +3.2.2. Back-to-Back RTP sessions + + Shortcut name: Topo-Back-To-Back + + There exist middleboxes that interconnect two endpoints (A and B) + through themselves (MB), but not by being part of a common RTP + session. Instead, they establish two different RTP sessions: one + between A and the middlebox and another between the middlebox and B. + This topology is called Topo-Back-To-Back. + + |<--Session A-->| |<--Session B-->| + +------+ +------+ +------+ + | A |------->| MB |-------->| B | + +------+ +------+ +------+ + + Figure 4: Back-to-Back RTP Sessions through Middlebox + + The middlebox acts as an application-level gateway and bridges the + two RTP sessions. This bridging can be as basic as forwarding the + RTP payloads between the sessions or more complex including media + transcoding. The difference of this topology relative to the single + RTP session context is the handling of the SSRCs and the other + session-related identifiers, such as CNAMEs. With two different RTP + sessions, these can be freely changed and it becomes the middlebox's + responsibility to maintain the correct relations. + + + + +Westerlund & Wenger Informational [Page 11] + +RFC 7667 RTP Topologies November 2015 + + + The signaling or other above RTP-level functionalities referencing + RTP streams may be what is most impacted by using two RTP sessions + and changing identifiers. The structure with two RTP sessions also + puts a congestion control requirement on the middlebox, because it + becomes fully responsible for the media stream it sources into each + of the sessions. + + Adherence to congestion control can be solved locally on each of the + two segments or by bridging statistics from the receiving endpoint + through the middlebox to the sending endpoint. From an + implementation point, however, the latter requires dealing with a + number of inconsistencies. First, packet loss must be detected for + an RTP stream sent from A to the middlebox, and that loss must be + reported through a skipped sequence number in the RTP stream from the + middlebox to B. This coupling and the resulting inconsistencies are + conceptually easier to handle when considering the two RTP streams as + belonging to a single RTP session. + +3.3. Point to Multipoint Using Multicast + + Multicast is an IP-layer functionality that is available in some + networks. Two main flavors can be distinguished: Any-Source + Multicast (ASM) [RFC1112] where any multicast group participant can + send to the group address and expect the packet to reach all group + participants and Source-Specific Multicast (SSM) [RFC3569], where + only a particular IP host sends to the multicast group. Each of + these models are discussed below in their respective sections. + +3.3.1. Any-Source Multicast (ASM) + + Shortcut name: Topo-ASM (was Topo-Multicast) + + +-----+ + +---+ / \ +---+ + | A |----/ \---| B | + +---+ / Multi- \ +---+ + + cast + + +---+ \ Network / +---+ + | C |----\ /---| D | + +---+ \ / +---+ + +-----+ + + Figure 5: Point to Multipoint Using Multicast + + + + + + + + +Westerlund & Wenger Informational [Page 12] + +RFC 7667 RTP Topologies November 2015 + + + Point to Multipoint (PtM) is defined here as using a multicast + topology as a transmission model, in which traffic from any multicast + group participant reaches all the other multicast group participants, + except for cases such as: + + o packet loss, or + + o when a multicast group participant does not wish to receive the + traffic for a specific multicast group and, therefore, has not + subscribed to the IP multicast group in question. This scenario + can occur, for example, where a Multimedia Session is distributed + using two or more multicast groups, and a multicast group + participant is subscribed only to a subset of these sessions. + + In the above context, "traffic" encompasses both RTP and RTCP + traffic. The number of multicast group participants can vary between + one and many, as RTP and RTCP scale to very large multicast groups + (the theoretical limit of the number of participants in a single RTP + session is in the range of billions). The above can be realized + using ASM. + + For feedback usage, it is useful to define a "small multicast group" + as a group where the number of multicast group participants is so low + (and other factors such as the connectivity is so good) that it + allows the participants to use early or immediate feedback, as + defined in AVPF [RFC4585]. Even when the environment would allow for + the use of a small multicast group, some applications may still want + to use the more limited options for RTCP feedback available to large + multicast groups, for example, when there is a likelihood that the + threshold of the small multicast group (in terms of multicast group + participants) may be exceeded during the lifetime of a session. + + RTCP feedback messages in multicast reach, like media data, every + subscriber (subject to packet losses and multicast group + subscription). Therefore, the feedback suppression mechanism + discussed in [RFC4585] is typically required. Each individual + endpoint that is a multicast group participant needs to process every + feedback message it receives, not only to determine if it is affected + or if the feedback message applies only to some other endpoint but + also to derive timing restrictions for the sending of its own + feedback messages, if any. + + + + + + + + + + +Westerlund & Wenger Informational [Page 13] + +RFC 7667 RTP Topologies November 2015 + + +3.3.2. Source-Specific Multicast (SSM) + + Shortcut name: Topo-SSM + + In Any-Source Multicast, any of the multicast group participants can + send to all the other multicast group participants, by sending a + packet to the multicast group. In contrast, Source-Specific + Multicast [RFC3569][RFC4607] refers to scenarios where only a single + source (Distribution Source) can send to the multicast group, + creating a topology that looks like the one below: + + +--------+ +-----+ + |Media | | | Source-Specific + |Sender 1|<----->| D S | Multicast + +--------+ | I O | +--+----------------> R(1) + | S U | | | | + +--------+ | T R | | +-----------> R(2) | + |Media |<----->| R C |->+ | : | | + |Sender 2| | I E | | +------> R(n-1) | | + +--------+ | B | | | | | | + : | U | +--+--> R(n) | | | + : | T +-| | | | | + : | I | |<---------+ | | | + +--------+ | O |F|<---------------+ | | + |Media | | N |T|<--------------------+ | + |Sender M|<----->| | |<-------------------------+ + +--------+ +-----+ RTCP Unicast + + FT = Feedback Target + Transport from the Feedback Target to the Distribution + Source is via unicast or multicast RTCP if they are not + co-located. + + Figure 6: Point to Multipoint Using Source-Specific Multicast + + In the SSM topology (Figure 6), a number of RTP sending endpoints + (RTP sources henceforth) (1 to M) are allowed to send media to the + SSM group. These sources send media to a dedicated Distribution + Source, which forwards the RTP streams to the multicast group on + behalf of the original RTP sources. The RTP streams reach the + receiving endpoints (receivers henceforth) (R(1) to R(n)). The + receivers' RTCP messages cannot be sent to the multicast group, as + the SSM multicast group by definition has only a single IP sender. + To support RTCP, an RTP extension for SSM [RFC5760] was defined. It + uses unicast transmission to send RTCP from each of the receivers to + one or more Feedback Targets (FT). The Feedback Targets relay the + RTCP unmodified, or provide a summary of the participants' RTCP + reports towards the whole group by forwarding the RTCP traffic to the + + + +Westerlund & Wenger Informational [Page 14] + +RFC 7667 RTP Topologies November 2015 + + + Distribution Source. Figure 6 only shows a single Feedback Target + integrated in the Distribution Source, but for scalability the FT can + be distributed and each instance can have responsibility for + subgroups of the receivers. For summary reports, however, there + typically must be a single Feedback Target aggregating all the + summaries to a common message to the whole receiver group. + + The RTP extension for SSM specifies how feedback (both reception + information and specific feedback events) are handled. The more + general problems associated with the use of multicast, where everyone + receives what the Distribution Source sends, need to be accounted + for. + + The aforementioned situation results in common behavior for RTP + multicast: + + 1. Multicast applications often use a group of RTP sessions, not + one. Each endpoint needs to be a member of most or all of these + RTP sessions in order to perform well. + + 2. Within each RTP session, the number of media sinks is likely to + be much larger than the number of RTP sources. + + 3. Multicast applications need signaling functions to identify the + relationships between RTP sessions. + + 4. Multicast applications need signaling functions to identify the + relationships between SSRCs in different RTP sessions. + + All multicast configurations share a signaling requirement: all of + the endpoints need to have the same RTP and payload type + configuration. Otherwise, endpoint A could, for example, be using + payload type 97 to identify the video codec H.264, while endpoint B + would identify it as MPEG-2, with unpredictable but almost certainly + not visually pleasing results. + + Security solutions for this type of group communication are also + challenging. First, the key management and the security protocol + must support group communication. Source authentication becomes more + difficult and requires specialized solutions. For more discussion on + this, please review "Options for Securing RTP Sessions" [RFC7201]. + +3.3.3. SSM with Local Unicast Resources + + Shortcut name: Topo-SSM-RAMS + + "Unicast-Based Rapid Acquisition of Multicast RTP Sessions" [RFC6285] + results in additional extensions to SSM topology. + + + +Westerlund & Wenger Informational [Page 15] + +RFC 7667 RTP Topologies November 2015 + + + ----------- -------------- + | |------------------------------------>| | + | |.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.->| | + | | | | + | Multicast | ---------------- | | + | Source | | Retransmission | | | + | |-------->| Server (RS) | | | + | |.-.-.-.->| | | | + | | | ------------ | | | + ----------- | | Feedback | |<.=.=.=.=.| | + | | Target (FT)| |<~~~~~~~~~| RTP Receiver | + PRIMARY MULTICAST | ------------ | | (RTP_Rx) | + RTP SESSION with | | | | + UNICAST FEEDBACK | | | | + | | | | + - - - - - - - - - - - |- - - - - - - - |- - - - - |- - - - - - - |- - + | | | | + UNICAST BURST | ------------ | | | + (or RETRANSMISSION) | | Burst/ | |<~~~~~~~~>| | + RTP SESSION | | Retrans. | |.........>| | + | |Source (BRS)| |<.=.=.=.=>| | + | ------------ | | | + | | | | + ---------------- -------------- + + -------> Multicast RTP Stream + .-.-.-.> Multicast RTCP Stream + .=.=.=.> Unicast RTCP Reports + ~~~~~~~> Unicast RTCP Feedback Messages + .......> Unicast RTP Stream + + Figure 7: SSM with Local Unicast Resources (RAMS) + + The rapid acquisition extension allows an endpoint joining an SSM + multicast session to request media starting with the last sync point + (from where media can be decoded without requiring context + established by the decoding of prior packets) to be sent at high + speed until such time where, after the decoding of these burst- + delivered media packets, the correct media timing is established, + i.e., media packets are received within adequate buffer intervals for + this application. This is accomplished by first establishing a + unicast PtP RTP session between the Burst/Retransmission Source (BRS) + (Figure 7) and the RTP Receiver. The unicast session is used to + transmit cached packets from the multicast group at higher then + normal speed in order to synchronize the receiver to the ongoing + multicast RTP stream. Once the RTP receiver and its decoder have + caught up with the multicast session's current delivery, the receiver + switches over to receiving directly from the multicast group. In + + + +Westerlund & Wenger Informational [Page 16] + +RFC 7667 RTP Topologies November 2015 + + + many deployed applications, the (still existing) PtP RTP session is + used as a repair channel, i.e., for RTP Retransmission traffic of + those packets that were not received from the multicast group. + +3.4. Point to Multipoint Using Mesh + + Shortcut name: Topo-Mesh + + +---+ +---+ + | A |<---->| B | + +---+ +---+ + ^ ^ + \ / + \ / + v v + +---+ + | C | + +---+ + + Figure 8: Point to Multipoint Using Mesh + + Based on the RTP session definition, it is clearly possible to have a + joint RTP session involving three or more endpoints over multiple + unicast transport flows, like the joint three-endpoint session + depicted above. In this case, A needs to send its RTP streams and + RTCP packets to both B and C over their respective transport flows. + As long as all endpoints do the same, everyone will have a joint view + of the RTP session. + + This topology does not create any additional requirements beyond the + need to have multiple transport flows associated with a single RTP + session. Note that an endpoint may use a single local port to + receive all these transport flows (in which case the sending port, IP + address, or SSRC can be used to demultiplex), or it might have + separate local reception ports for each of the endpoints. + + + + + + + + + + + + + + + + +Westerlund & Wenger Informational [Page 17] + +RFC 7667 RTP Topologies November 2015 + + + +-A--------------------+ + |+---+ | + ||CAM| | +-B-----------+ + |+---+ +-UDP1------| |-UDP1------+ | + | | | +-RTP1----| |-RTP1----+ | | + | V | | +-Video-| |-Video-+ | | | + |+----+ | | | |<----------------|BV1 | | | | + ||ENC |----+-+-+--->AV1|---------------->| | | | | + |+----+ | | +-------| |-------+ | | | + | | | +---------| |---------+ | | + | | +-----------| |-----------+ | + | | | +-------------+ + | | | + | | | +-C-----------+ + | | +-UDP2------| |-UDP2------+ | + | | | +-RTP1----| |-RTP1----+ | | + | | | | +-Video-| |-Video-+ | | | + | +-------+-+-+--->AV1|---------------->| | | | | + | | | | |<----------------|CV1 | | | | + | | | +-------| |-------+ | | | + | | +---------| |---------+ | | + | +-----------| |-----------+ | + +----------------------+ +-------------+ + + Figure 9: A Multi-Unicast Mesh with a Joint RTP Session + + Figure 9 depicts endpoint A's view of using a common RTP session when + establishing the mesh as shown in Figure 8. There is only one RTP + session (RTP1) but two transport flows (UDP1 and UDP2). The Media + Source (CAM) is encoded and transmitted over the SSRC (AV1) across + both transport layers. However, as this is a joint RTP session, the + two streams must be the same. Thus, a congestion control adaptation + needed for the paths A to B and A to C needs to use the most + restricting path's properties. + + An alternative structure for establishing the above topology is to + use independent RTP sessions between each pair of peers, i.e., three + different RTP sessions. In some scenarios, the same RTP stream may + be sent from the transmitting endpoint; however, it also supports + local adaptation taking place in one or more of the RTP streams, + rendering them non-identical. + + + + + + + + + + +Westerlund & Wenger Informational [Page 18] + +RFC 7667 RTP Topologies November 2015 + + + +-A----------------------+ +-B-----------+ + |+---+ | | | + ||MIC| +-UDP1------| |-UDP1------+ | + |+---+ | +-RTP1----| |-RTP1----+ | | + | | +----+ | | +-Audio-| |-Audio-+ | | | + | +->|ENC1|--+-+-+--->AA1|------------->| | | | | + | | +----+ | | | |<-------------|BA1 | | | | + | | | | +-------| |-------+ | | | + | | | +---------| |---------+ | | + | | +-----------| |-----------+ | + | | ------------| |-------------| + | | | |-------------+ + | | | + | | | +-C-----------+ + | | | | | + | | +-UDP2------| |-UDP2------+ | + | | | +-RTP2----| |-RTP2----+ | | + | | +----+ | | +-Audio-| |-Audio-+ | | | + | +->|ENC2|--+-+-+--->AA2|------------->| | | | | + | +----+ | | | |<-------------|CA1 | | | | + | | | +-------| |-------+ | | | + | | +---------| |---------+ | | + | +-----------| |-----------+ | + +------------------------+ +-------------+ + + Figure 10: A Multi-Unicast Mesh with an Independent RTP Session + + Let's review the topology when independent RTP sessions are used from + A's perspective in Figure 10 by considering both how the media is + handled and how the RTP sessions are set up in Figure 10. A's + microphone is captured and the audio is fed into two different + encoder instances, each with a different independent RTP session, + i.e., RTP1 and RTP2, respectively. The SSRCs (AA1 and AA2) in each + RTP session are completely independent, and the media bitrate + produced by the encoders can also be tuned differently to address any + congestion control requirements differing for the paths A to B + compared to A to C. + + From a topologies viewpoint, an important difference exists in the + behavior around RTCP. First, when a single RTP session spans all + three endpoints A, B, and C, and their connecting RTP streams, a + common RTCP bandwidth is calculated and used for this single joint + session. In contrast, when there are multiple independent RTP + sessions, each RTP session has its local RTCP bandwidth allocation. + + Further, when multiple sessions are used, endpoints not directly + involved in a session do not have any awareness of the conditions in + those sessions. For example, in the case of the three-endpoint + + + +Westerlund & Wenger Informational [Page 19] + +RFC 7667 RTP Topologies November 2015 + + + configuration in Figure 8, endpoint A has no awareness of the + conditions occurring in the session between endpoints B and C + (whereas if a single RTP session were used, it would have such + awareness). + + Loop detection is also affected. With independent RTP sessions, the + SSRC/CSRC cannot be used to determine when an endpoint receives its + own media stream, or a mixed media stream including its own media + stream (a condition known as a loop). The identification of loops + and, in most cases, their avoidance, has to be achieved by other + means, for example, through signaling or the use of an RTP external + namespace binding SSRC/CSRC among any communicating RTP sessions in + the mesh. + +3.5. Point to Multipoint Using the RFC 3550 Translator + + This section discusses some additional usages related to point to + multipoint of translators compared to the point-to-point cases in + Section 3.2.1. + +3.5.1. Relay - Transport Translator + + Shortcut name: Topo-PtM-Trn-Translator + + This section discusses Transport Translator-only usages to enable + multipoint sessions. + + +-----+ + +---+ / \ +------------+ +---+ + | A |<---/ \ | |<---->| B | + +---+ / \ | | +---+ + + Multicast +->| Translator | + +---+ \ Network / | | +---+ + | C |<---\ / | |<---->| D | + +---+ \ / +------------+ +---+ + +-----+ + + Figure 11: Point to Multipoint Using Multicast + + Figure 11 depicts an example of a Transport Translator performing at + least IP address translation. It allows the (non-multicast-capable) + endpoints B and D to take part in an Any-Source Multicast session + involving endpoints A and C, by having the translator forward their + unicast traffic to the multicast addresses in use, and vice versa. + It must also forward B's traffic to D, and vice versa, to provide + both B and D with a complete view of the session. + + + + + +Westerlund & Wenger Informational [Page 20] + +RFC 7667 RTP Topologies November 2015 + + + +---+ +------------+ +---+ + | A |<---->| |<---->| B | + +---+ | | +---+ + | Translator | + +---+ | | +---+ + | C |<---->| |<---->| D | + +---+ +------------+ +---+ + + Figure 12: RTP Translator (Relay) with Only Unicast Paths + + Another translator scenario is depicted in Figure 12. The translator + in this case connects multiple endpoints through unicast. This can + be implemented using a very simple Transport Translator which, in + this document, is called a relay. The relay forwards all traffic it + receives, both RTP and RTCP, to all other endpoints. In doing so, a + multicast network is emulated without relying on a multicast-capable + network infrastructure. + + For RTCP feedback, this results in a similar set of considerations to + those described in the ASM RTP topology. It also puts some + additional signaling requirements onto the session establishment; for + example, a common configuration of RTP payload types is required. + + Transport Translators and relays should always consider implementing + source address filtering, to prevent attackers from using the + listening ports on the translator to inject traffic. The translator + can, however, go one step further, especially if explicit SSRC + signaling is used, to prevent endpoints from sending SSRCs other than + its own (that are, for example, used by other participants in the + session). This can improve the security properties of the session, + despite the use of group keys that on a cryptographic level allows + anyone to impersonate another in the same RTP session. + + A translator that doesn't change the RTP/RTCP packet content can be + operated without requiring it to have access to the security contexts + used to protect the RTP/RTCP traffic between the participants. + +3.5.2. Media Translator + + In the context of multipoint communications, a Media Translator is + not providing new mechanisms to establish a multipoint session. It + is more of an enabler, or facilitator, that ensures a given endpoint + or a defined subset of endpoints can participate in the session. + + If endpoint B in Figure 11 were behind a limited network path, the + translator may perform media transcoding to allow the traffic + received from the other endpoints to reach B without overloading the + path. This transcoding can help the other endpoints in the multicast + + + +Westerlund & Wenger Informational [Page 21] + +RFC 7667 RTP Topologies November 2015 + + + part of the session, by not requiring the quality transmitted by A to + be lowered to the bitrates that B is actually capable of receiving + (and vice versa). + +3.6. Point to Multipoint Using the RFC 3550 Mixer Model + + Shortcut name: Topo-Mixer + + A mixer is a middlebox that aggregates multiple RTP streams that are + part of a session by generating one or more new RTP streams and, in + most cases, by manipulating the media data. One common application + for a mixer is to allow a participant to receive a session with a + reduced amount of resources. + + +-----+ + +---+ / \ +-----------+ +---+ + | A |<---/ \ | |<---->| B | + +---+ / Multi- \ | | +---+ + + cast +->| Mixer | + +---+ \ Network / | | +---+ + | C |<---\ / | |<---->| D | + +---+ \ / +-----------+ +---+ + +-----+ + + Figure 13: Point to Multipoint Using the RFC 3550 Mixer Model + + A mixer can be viewed as a device terminating the RTP streams + received from other endpoints in the same RTP session. Using the + media data carried in the received RTP streams, a mixer generates + derived RTP streams that are sent to the receiving endpoints. + + The content that the mixer provides is the mixed aggregate of what + the mixer receives over the PtP or PtM paths, which are part of the + same Communication Session. + + The mixer creates the Media Source and the source RTP stream just + like an endpoint, as it mixes the content (often in the uncompressed + domain) and then encodes and packetizes it for transmission to a + receiving endpoint. The CSRC Count (CC) and CSRC fields in the RTP + header can be used to indicate the contributors to the newly + generated RTP stream. The SSRCs of the to-be-mixed streams on the + mixer input appear as the CSRCs at the mixer output. That output + stream uses a unique SSRC that identifies the mixer's stream. The + CSRC should be forwarded between the different endpoints to allow for + loop detection and identification of sources that are part of the + Communication Session. Note that Section 7.1 of RFC 3550 requires + + + + + +Westerlund & Wenger Informational [Page 22] + +RFC 7667 RTP Topologies November 2015 + + + the SSRC space to be shared between domains for these reasons. This + also implies that any SDES information normally needs to be forwarded + across the mixer. + + The mixer is responsible for generating RTCP packets in accordance + with its role. It is an RTP receiver and should therefore send RTCP + receiver reports for the RTP streams it receives and terminates. In + its role as an RTP sender, it should also generate RTCP sender + reports for those RTP streams it sends. As specified in Section 7.3 + of RFC 3550, a mixer must not forward RTCP unaltered between the two + domains. + + The mixer depicted in Figure 13 is involved in three domains that + need to be separated: the Any-Source Multicast network (including + endpoints A and C), endpoint B, and endpoint D. Assuming all four + endpoints in the conference are interested in receiving content from + all other endpoints, the mixer produces different mixed RTP streams + for B and D, as the one to B may contain content received from D, and + vice versa. However, the mixer may only need one SSRC per media type + in each domain where it is the receiving entity and transmitter of + mixed content. + + In the multicast domain, a mixer still needs to provide a mixed view + of the other domains. This makes the mixer simpler to implement and + avoids any issues with advanced RTCP handling or loop detection, + which would be problematic if the mixer were providing non-symmetric + behavior. Please see Section 3.11 for more discussion on this topic. + The mixing operation, however, in each domain could potentially be + different. + + A mixer is responsible for receiving RTCP feedback messages and + handling them appropriately. The definition of "appropriate" depends + on the message itself and the context. In some cases, the reception + of a codec-control message by the mixer may result in the generation + and transmission of RTCP feedback messages by the mixer to the + endpoints in the other domain(s). In other cases, a message is + handled by the mixer locally and therefore not forwarded to any other + domain. + + When replacing the multicast network in Figure 13 (to the left of the + mixer) with individual unicast paths as depicted in Figure 14, the + mixer model is very similar to the one discussed in Section 3.9 + below. Please see the discussion in Section 3.9 about the + differences between these two models. + + + + + + + +Westerlund & Wenger Informational [Page 23] + +RFC 7667 RTP Topologies November 2015 + + + +---+ +------------+ +---+ + | A |<---->| |<---->| B | + +---+ | | +---+ + | Mixer | + +---+ | | +---+ + | C |<---->| |<---->| D | + +---+ +------------+ +---+ + + Figure 14: RTP Mixer with Only Unicast Paths + + We now discuss in more detail the different mixing operations that a + mixer can perform and how they can affect RTP and RTCP behavior. + +3.6.1. Media-Mixing Mixer + + The Media-Mixing Mixer is likely the one that most think of when they + hear the term "mixer". Its basic mode of operation is that it + receives RTP streams from several endpoints and selects the stream(s) + to be included in a media-domain mix. The selection can be through + static configuration or by dynamic, content-dependent means such as + voice activation. The mixer then creates a single outgoing RTP + stream from this mix. + + The most commonly deployed Media-Mixing Mixer is probably the audio + mixer, used in voice conferencing, where the output consists of a + mixture of all the input audio signals; this needs minimal signaling + to be successfully set up. From a signal processing viewpoint, audio + mixing is relatively straightforward and commonly possible for a + reasonable number of endpoints. Assume, for example, that one wants + to mix N streams from N different endpoints. The mixer needs to + decode those N streams, typically into the sample domain, and then + produce N or N+1 mixes. Different mixes are needed so that each + endpoint gets a mix of all other sources except its own, as this + would result in an echo. When N is lower than the number of all + endpoints, one may produce a mix of all N streams for the group that + are currently not included in the mix; thus, N+1 mixes. These audio + streams are then encoded again, RTP packetized, and sent out. In + many cases, audio level normalization, noise suppression, and similar + signal processing steps are also required or desirable before the + actual mixing process commences. + + In video, the term "mixing" has a different interpretation than + audio. It is commonly used to refer to the process of spatially + combining contributed video streams, which is also known as "tiling". + The reconstructed, appropriately scaled down videos can be spatially + arranged in a set of tiles, with each tile containing the video from + an endpoint (typically showing a human participant). Tiles can be of + different sizes so that, for example, a particularly important + + + +Westerlund & Wenger Informational [Page 24] + +RFC 7667 RTP Topologies November 2015 + + + participant, or the loudest speaker, is being shown in a larger tile + than other participants. A self-view picture can be included in the + tiling, which can be either locally produced or feedback from a + mixer-received and reconstructed video image. Such remote loopback + allows for confidence monitoring, i.e., it enables the participant to + see himself/herself in the same quality as other participants see + him/her. The tiling normally operates on reconstructed video in the + sample domain. The tiled image is encoded, packetized, and sent by + the mixer to the receiving endpoints. It is possible that a + middlebox with media mixing duties contains only a single mixer of + the aforementioned type, in which case all participants necessarily + see the same tiled video, even if it is being sent over different RTP + streams. More common, however, are mixing arrangements where an + individual mixer is available for each outgoing port of the + middlebox, allowing individual compositions for each receiving + endpoint (a feature commonly referred to as personalized layout). + + One problem with media mixing is that it consumes both large amounts + of media processing resources (for the decoding and mixing process in + the uncompressed domain) and encoding resources (for the encoding of + the mixed signal). Another problem is the quality degradation + created by decoding and re-encoding the media, which is the result of + the lossy nature of the most commonly used media codecs. A third + problem is the latency introduced by the media mixing, which can be + substantial and annoyingly noticeable in case of video, or in case of + audio if that mixed audio is lip-synchronized with high-latency + video. The advantage of media mixing is that it is straightforward + for the endpoints to handle the single media stream (which includes + the mixed aggregate of many sources), as they don't need to handle + multiple decodings, local mixing, and composition. In fact, mixers + were introduced in pre-RTP times so that legacy, single stream + receiving endpoints (that, in some protocol environments, actually + didn't need to be aware of the multipoint nature of the conference) + could successfully participate in what a user would recognize as a + multiparty video conference. + + + + + + + + + + + + + + + + +Westerlund & Wenger Informational [Page 25] + +RFC 7667 RTP Topologies November 2015 + + + +-A---------+ +-MIXER----------------------+ + | +-RTP1----| |-RTP1------+ +-----+ | + | | +-Audio-| |-Audio---+ | +---+ | | | + | | | AA1|--------->|---------+-+-|DEC|->| | | + | | | |<---------|MA1 <----+ | +---+ | | | + | | | | |(BA1+CA1)|\| +---+ | | | + | | +-------| |---------+ +-|ENC|<-| B+C | | + | +---------| |-----------+ +---+ | | | + +-----------+ | | | | + | | M | | + +-B---------+ | | E | | + | +-RTP2----| |-RTP2------+ | D | | + | | +-Audio-| |-Audio---+ | +---+ | I | | + | | | BA1|--------->|---------+-+-|DEC|->| A | | + | | | |<---------|MA2 <----+ | +---+ | | | + | | +-------| |(AA1+CA1)|\| +---+ | | | + | +---------| |---------+ +-|ENC|<-| A+C | | + +-----------+ |-----------+ +---+ | | | + | | M | | + +-C---------+ | | I | | + | +-RTP3----| |-RTP3------+ | X | | + | | +-Audio-| |-Audio---+ | +---+ | E | | + | | | CA1|--------->|---------+-+-|DEC|->| R | | + | | | |<---------|MA3 <----+ | +---+ | | | + | | +-------| |(AA1+BA1)|\| +---+ | | | + | +---------| |---------+ +-|ENC|<-| A+B | | + +-----------+ |-----------+ +---+ +-----+ | + +----------------------------+ + + Figure 15: Session and SSRC Details for Media Mixer + + From an RTP perspective, media mixing can be a very simple process, + as can be seen in Figure 15. The mixer presents one SSRC towards the + receiving endpoint, e.g., MA1 to Peer A, where the associated stream + is the media mix of the other endpoints. As each peer, in this + example, receives a different version of a mix from the mixer, there + is no actual relation between the different RTP sessions in terms of + actual media or transport-level information. There are, however, + common relationships between RTP1-RTP3, namely SSRC space and + identity information. When A receives the MA1 stream, which is a + combination of BA1 and CA1 streams, the mixer may include CSRC + information in the MA1 stream to identify the Contributing Sources + BA1 and CA1, allowing the receiver to identify the Contributing + Sources even if this were not possible through the media itself or + through other signaling means. + + The CSRC has, in turn, utility in RTP extensions, like the RTP header + extension for Mixer-to-Client Audio Level Indication [RFC6465]. If + + + +Westerlund & Wenger Informational [Page 26] + +RFC 7667 RTP Topologies November 2015 + + + the SSRCs from the endpoint to mixer paths are used as CSRCs in + another RTP session, then RTP1, RTP2, and RTP3 become one joint + session as they have a common SSRC space. At this stage, the mixer + also needs to consider which RTCP information it needs to expose in + the different paths. In the above scenario, a mixer would normally + expose nothing more than the SDES information and RTCP BYE for a CSRC + leaving the session. The main goal would be to enable the correct + binding against the application logic and other information sources. + This also enables loop detection in the RTP session. + +3.6.2. Media-Switching Mixer + + Media-Switching Mixers are used in limited functionality scenarios + where no, or only very limited, concurrent presentation of multiple + sources is required by the application and also in more complex + multi-stream usages with receiver mixing or tiling, including + combined with simulcast and/or scalability between source and mixer. + An RTP mixer based on media switching avoids the media decoding and + encoding operations in the mixer, as it conceptually forwards the + encoded media stream as it was being sent to the mixer. It does not + avoid, however, the decryption and re-encryption cycle as it rewrites + RTP headers. Forwarding media (in contrast to reconstructing-mixing- + encoding media) reduces the amount of computational resources needed + in the mixer and increases the media quality (both in terms of + fidelity and reduced latency). + + A Media-Switching Mixer maintains a pool of SSRCs representing + conceptual or functional RTP streams that the mixer can produce. + These RTP streams are created by selecting media from one of the RTP + streams received by the mixer and forwarded to the peer using the + mixer's own SSRCs. The mixer can switch between available sources if + that is required by the concept for the source, like the currently + active speaker. Note that the mixer, in most cases, still needs to + perform a certain amount of media processing, as many media formats + do not allow to "tune into" the stream at arbitrary points in their + bitstream. + + To achieve a coherent RTP stream from the mixer's SSRC, the mixer + needs to rewrite the incoming RTP packet's header. First, the SSRC + field must be set to the value of the mixer's SSRC. Second, the + sequence number must be the next in the sequence of outgoing packets + it sent. Third, the RTP timestamp value needs to be adjusted using + an offset that changes each time one switches the Media Source. + Finally, depending on the negotiation of the RTP payload type, the + value representing this particular RTP payload configuration may have + to be changed if the different endpoint-to-mixer paths have not + arrived on the same numbering for a given configuration. This also + + + + +Westerlund & Wenger Informational [Page 27] + +RFC 7667 RTP Topologies November 2015 + + + requires that the different endpoints support a common set of codecs, + otherwise media transcoding for codec compatibility would still be + required. + + We now consider the operation of a Media-Switching Mixer that + supports a video conference with six participating endpoints (A-F) + where the two most recent speakers in the conference are shown to + each receiving endpoint. Thus, the mixer has two SSRCs sending video + to each peer, and each peer is capable of locally handling two video + streams simultaneously. + + +-A---------+ +-MIXER----------------------+ + | +-RTP1----| |-RTP1------+ +-----+ | + | | +-Video-| |-Video---+ | | | | + | | | AV1|------------>|---------+-+------->| S | | + | | | |<------------|MV1 <----+-+-BV1----| W | | + | | | |<------------|MV2 <----+-+-EV1----| I | | + | | +-------| |---------+ | | T | | + | +---------| |-----------+ | C | | + +-----------+ | | H | | + | | | | + +-B---------+ | | M | | + | +-RTP2----| |-RTP2------+ | A | | + | | +-Video-| |-Video---+ | | T | | + | | | BV1|------------>|---------+-+------->| R | | + | | | |<------------|MV3 <----+-+-AV1----| I | | + | | | |<------------|MV4 <----+-+-EV1----| X | | + | | +-------| |---------+ | | | | + | +---------| |-----------+ | | | + +-----------+ | | | | + : : : : + : : : : + +-F---------+ | | | | + | +-RTP6----| |-RTP6------+ | | | + | | +-Video-| |-Video---+ | | | | + | | | FV1|------------>|---------+-+------->| | | + | | | |<------------|MV11 <---+-+-AV1----| | | + | | | |<------------|MV12 <---+-+-EV1----| | | + | | +-------| |---------+ | | | | + | +---------| |-----------+ +-----+ | + +-----------+ +----------------------------+ + + + Figure 16: Media-Switching RTP Mixer + + + + + + + +Westerlund & Wenger Informational [Page 28] + +RFC 7667 RTP Topologies November 2015 + + + The Media-Switching Mixer can, similarly to the Media-Mixing Mixer, + reduce the bitrate required for media transmission towards the + different peers by selecting and forwarding only a subset of RTP + streams it receives from the sending endpoints. In case the mixer + receives simulcast transmissions or a scalable encoding of the Media + Source, the mixer has more degrees of freedom to select streams or + subsets of streams to forward to a receiving endpoint, both based on + transport or endpoint restrictions as well as application logic. + + To ensure that a media receiver in an endpoint can correctly decode + the media in the RTP stream after a switch, a codec that uses + temporal prediction needs to start its decoding from independent + refresh points, or points in the bitstream offering similar + functionality (like "dirty refresh points"). For some codecs, for + example, frame-based speech and audio codecs, this is easily achieved + by starting the decoding at RTP packet boundaries, as each packet + boundary provides a refresh point (assuming proper packetization on + the encoder side). For other codecs, particularly in video, refresh + points are less common in the bitstream or may not be present at all + without an explicit request to the respective encoder. The Full + Intra Request [RFC5104] RTCP codec control message has been defined + for this purpose. + + In this type of mixer, one could consider fully terminating the RTP + sessions between the different endpoint and mixer paths. The same + arguments and considerations as discussed in Section 3.9 need to be + taken into consideration and apply here. + +3.7. Selective Forwarding Middlebox + + Another method for handling media in the RTP mixer is to "project", + or make available, all potential RTP sources (SSRCs) into a per- + endpoint, independent RTP session. The middlebox can select which of + the potential sources that are currently actively transmitting media + will be sent to each of the endpoints. This is similar to the Media- + Switching Mixer but has some important differences in RTP details. + + + + + + + + + + + + + + + +Westerlund & Wenger Informational [Page 29] + +RFC 7667 RTP Topologies November 2015 + + + +-A---------+ +-Middlebox-----------------+ + | +-RTP1----| |-RTP1------+ +-----+ | + | | +-Video-| |-Video---+ | | | | + | | | AV1|------------>|---------+-+------>| | | + | | | |<------------|BV1 <----+-+-------| S | | + | | | |<------------|CV1 <----+-+-------| W | | + | | | |<------------|DV1 <----+-+-------| I | | + | | | |<------------|EV1 <----+-+-------| T | | + | | | |<------------|FV1 <----+-+-------| C | | + | | +-------| |---------+ | | H | | + | +---------| |-----------+ | | | + +-----------+ | | M | | + | | A | | + +-B---------+ | | T | | + | +-RTP2----| |-RTP2------+ | R | | + | | +-Video-| |-Video---+ | | I | | + | | | BV1|------------>|---------+-+------>| X | | + | | | |<------------|AV1 <----+-+-------| | | + | | | |<------------|CV1 <----+-+-------| | | + | | | | : : : |: : : : : : : : :| | | + | | | |<------------|FV1 <----+-+-------| | | + | | +-------| |---------+ | | | | + | +---------| |-----------+ | | | + +-----------+ | | | | + : : : : + : : : : + +-F---------+ | | | | + | +-RTP6----| |-RTP6------+ | | | + | | +-Video-| |-Video---+ | | | | + | | | FV1|------------>|---------+-+------>| | | + | | | |<------------|AV1 <----+-+-------| | | + | | | | : : : |: : : : : : : : :| | | + | | | |<------------|EV1 <----+-+-------| | | + | | +-------| |---------+ | | | | + | +---------| |-----------+ +-----+ | + +-----------+ +---------------------------+ + + Figure 17: Selective Forwarding Middlebox + + In the six endpoint conference depicted above (in Figure 17), one can + see that endpoint A is aware of five incoming SSRCs, BV1-FV1. If + this middlebox intends to have a similar behavior as in Section 3.6.2 + where the mixer provides the endpoints with the two latest speaking + endpoints, then only two out of these five SSRCs need concurrently + transmit media to A. As the middlebox selects the source in the + different RTP sessions that transmit media to the endpoints, each RTP + stream requires the rewriting of certain RTP header fields when being + projected from one session into another. In particular, the sequence + + + +Westerlund & Wenger Informational [Page 30] + +RFC 7667 RTP Topologies November 2015 + + + number needs to be consecutively incremented based on the packet + actually being transmitted in each RTP session. Therefore, the RTP + sequence number offset will change each time a source is turned on in + an RTP session. The timestamp (possibly offset) stays the same. + + The RTP sessions can be considered independent, resulting in that the + SSRC numbers used can also be handled independently. This simplifies + the SSRC collision detection and avoidance but requires tools such as + remapping tables between the RTP sessions. Using independent RTP + sessions is not required, as it is possible for the switching + behavior to also perform with a common SSRC space. However, in this + case, collision detection and handling becomes a different problem. + It is up to the implementation to use a single common SSRC space or + separate ones. + + Using separate SSRC spaces has some implications. For example, the + RTP stream that is being sent by endpoint B to the middlebox (BV1) + may use an SSRC value of 12345678. When that RTP stream is sent to + endpoint F by the middlebox, it can use any SSRC value, e.g., + 87654321. As a result, each endpoint may have a different view of + the application usage of a particular SSRC. Any RTP-level identity + information, such as SDES items, also needs to update the SSRC + referenced, if the included SDES items are intended to be global. + Thus, the application must not use SSRC as references to RTP streams + when communicating with other peers directly. This also affects loop + detection, which will fail to work as there is no common namespace + and identities across the different legs in the Communication Session + on the RTP level. Instead, this responsibility falls onto higher + layers. + + The middlebox is also responsible for receiving any RTCP codec + control requests coming from an endpoint and deciding if it can act + on the request locally or needs to translate the request into the RTP + session/transport leg that contains the Media Source. Both endpoints + and the middlebox need to implement conference-related codec control + functionalities to provide a good experience. Commonly used are Full + Intra Request to request from the Media Source that switching points + be provided between the sources and Temporary Maximum Media Bitrate + Request (TMMBR) to enable the middlebox to aggregate congestion + control responses towards the Media Source so to enable it to adjust + its bitrate (obviously, only in case the limitation is not in the + source to middlebox link). + + The Selective Forwarding Middlebox has been introduced in recently + developed videoconferencing systems in conjunction with, and to + capitalize on, scalable video coding as well as simulcasting. An + example of scalable video coding is Annex G of H.264, but other + codecs, including H.264 AVC and VP8, also exhibit scalability, albeit + + + +Westerlund & Wenger Informational [Page 31] + +RFC 7667 RTP Topologies November 2015 + + + only in the temporal dimension. In both scalable coding and + simulcast cases, the video signal is represented by a set of two or + more bitstreams, providing a corresponding number of distinct + fidelity points. The middlebox selects which parts of a scalable + bitstream (or which bitstream, in the case of simulcasting) to + forward to each of the receiving endpoints. The decision may be + driven by a number of factors, such as available bitrate, desired + layout, etc. Contrary to transcoding MCUs, SFMs have extremely low + delay and provide features that are typically associated with high- + end systems (personalized layout, error localization) without any + signal processing at the middlebox. They are also capable of scaling + to a large number of concurrent users, and--due to their very low + delay--can also be cascaded. + + This version of the middlebox also puts different requirements on the + endpoint when it comes to decoder instances and handling of the RTP + streams providing media. As each projected SSRC can, at any time, + provide media, the endpoint either needs to be able to handle as many + decoder instances as the middlebox received, or have efficient + switching of decoder contexts in a more limited set of actual decoder + instances to cope with the switches. The application also gets more + responsibility to update how the media provided is to be presented to + the user. + + Note that this topology could potentially be seen as a Media + Translator that includes an on/off logic as part of its media + translation. The topology has the property that all SSRCs present in + the session are visible to an endpoint. It also has mixer aspects, + as the streams it provides are not basically translated versions, but + instead they have conceptual property assigned to them and can be + both turned on/off as well as fully or partially delivered. Thus, + this topology appears to be some hybrid between the translator and + mixer model. + + The differences between a Selective Forwarding Middlebox and a + Switching-Media Mixer (Section 3.6.2) are minor, and they share most + properties. The above requirement on having a large number of + decoding instances or requiring efficient switching of decoder + contexts, are one point of difference. The other is how the + identification is performed, where the mixer uses CSRC to provide + information on what is included in a particular RTP stream that + represents a particular concept. Selective forwarding gets the + source information through the SSRC and instead uses other mechanisms + to indicate the streams intended usage, if needed. + + + + + + + +Westerlund & Wenger Informational [Page 32] + +RFC 7667 RTP Topologies November 2015 + + +3.8. Point to Multipoint Using Video-Switching MCUs + + Shortcut name: Topo-Video-switch-MCU + + +---+ +------------+ +---+ + | A |------| Multipoint |------| B | + +---+ | Control | +---+ + | Unit | + +---+ | (MCU) | +---+ + | C |------| |------| D | + +---+ +------------+ +---+ + + Figure 18: Point to Multipoint Using a Video-Switching MCU + + This PtM topology was popular in early implementations of multipoint + videoconferencing systems due to its simplicity, and the + corresponding middlebox design has been known as a "video-switching + MCU". The more complex RTCP-terminating MCUs, discussed in the next + section, became the norm, however, when technology allowed + implementations at acceptable costs. + + A video-switching MCU forwards to a participant a single media + stream, selected from the available streams. The criteria for + selection are often based on voice activity in the audio-visual + conference, but other conference management mechanisms (like + presentation mode or explicit floor control) are known to exist as + well. + + The video-switching MCU may also perform media translation to modify + the content in bitrate, encoding, or resolution. However, it still + may indicate the original sender of the content through the SSRC. In + this case, the values of the CC and CSRC fields are retained. + + If not terminating RTP, the RTCP sender reports are forwarded for the + currently selected sender. All RTCP receiver reports are freely + forwarded between the endpoints. In addition, the MCU may also + originate RTCP control traffic in order to control the session and/or + report on status from its viewpoint. + + The video-switching MCU has most of the attributes of a translator. + However, its stream selection is a mixing behavior. This behavior + has some RTP and RTCP issues associated with it. The suppression of + all but one RTP stream results in most participants seeing only a + subset of the sent RTP streams at any given time, often a single RTP + stream per conference. Therefore, RTCP receiver reports only report + on these RTP streams. Consequently, the endpoints emitting RTP + streams that are not currently forwarded receive a view of the + session that indicates their RTP streams disappear somewhere en + + + +Westerlund & Wenger Informational [Page 33] + +RFC 7667 RTP Topologies November 2015 + + + route. This makes the use of RTCP for congestion control, or any + type of quality reporting, very problematic. + + To avoid the aforementioned issues, the MCU needs to implement two + features. First, it needs to act as a mixer (see Section 3.6) and + forward the selected RTP stream under its own SSRC and with the + appropriate CSRC values. Second, the MCU needs to modify the RTCP + RRs it forwards between the domains. As a result, it is recommended + that one implement a centralized video-switching conference using a + mixer according to RFC 3550, instead of the shortcut implementation + described here. + +3.9. Point to Multipoint Using RTCP-Terminating MCU + + Shortcut name: Topo-RTCP-terminating-MCU + + +---+ +------------+ +---+ + | A |<---->| Multipoint |<---->| B | + +---+ | Control | +---+ + | Unit | + +---+ | (MCU) | +---+ + | C |<---->| |<---->| D | + +---+ +------------+ +---+ + + Figure 19: Point to Multipoint Using Content Modifying MCUs + + In this PtM scenario, each endpoint runs an RTP point-to-point + session between itself and the MCU. This is a very commonly deployed + topology in multipoint video conferencing. The content that the MCU + provides to each participant is either: + + a. a selection of the content received from the other endpoints or + + b. the mixed aggregate of what the MCU receives from the other PtP + paths, which are part of the same Communication Session. + + In case (a), the MCU may modify the content in terms of bitrate, + encoding format, or resolution. No explicit RTP mechanism is used to + establish the relationship between the original RTP stream of the + media being sent and the RTP stream the MCU sends. In other words, + the outgoing RTP streams typically use a different SSRC, and may well + use a different payload type (PT), even if this different PT happens + to be mapped to the same media type. This is a result of the + individually negotiated RTP session for each endpoint. + + In case (b), the MCU is the Media Source and generates the Source RTP + Stream as it mixes the received content and then encodes and + packetizes it for transmission to an endpoint. According to RTP + + + +Westerlund & Wenger Informational [Page 34] + +RFC 7667 RTP Topologies November 2015 + + + [RFC3550], the SSRC of the contributors are to be signaled using the + CSRC/CC mechanism. In practice, today, most deployed MCUs do not + implement this feature. Instead, the identification of the endpoints + whose content is included in the mixer's output is not indicated + through any explicit RTP mechanism. That is, most deployed MCUs set + the CC field in the RTP header to zero, thereby indicating no + available CSRC information, even if they could identify the original + sending endpoints as suggested in RTP. + + The main feature that sets this topology apart from what RFC 3550 + describes is the breaking of the common RTP session across the + centralized device, such as the MCU. This results in the loss of + explicit RTP-level indication of all participants. If one were using + the mechanisms available in RTP and RTCP to signal this explicitly, + the topology would follow the approach of an RTP mixer. The lack of + explicit indication has at least the following potential problems: + + 1. Loop detection cannot be performed on the RTP level. When + carelessly connecting two misconfigured MCUs, a loop could be + generated. + + 2. There is no information about active media senders available in + the RTP packet. As this information is missing, receivers cannot + use it. It also deprives the client of information related to + currently active senders in a machine-usable way, thus preventing + clients from indicating currently active speakers in user + interfaces, etc. + + Note that many/most deployed MCUs (and video conferencing endpoints) + rely on signaling-layer mechanisms for the identification of the + Contributing Sources, for example, a SIP conferencing package + [RFC4575]. This alleviates, to some extent, the aforementioned + issues resulting from ignoring RTP's CSRC mechanism. + +3.10. Split Component Terminal + + Shortcut name: Topo-Split-Terminal + + In some applications, for example, in some telepresence systems, + terminals may not be integrated into a single functional unit but + composed of more than one subunits. For example, a telepresence room + terminal employing multiple cameras and monitors may consist of + multiple video conferencing subunits, each capable of handling a + single camera and monitor. Another example would be a video + conferencing terminal in which audio is handled by one subunit, and + video by another. Each of these subunits uses its own physical + network interface (for example: Ethernet jack) and network address. + + + + +Westerlund & Wenger Informational [Page 35] + +RFC 7667 RTP Topologies November 2015 + + + The various (media processing) subunits need (logically and + physically) to be interconnected by control functionality, but their + media plane functionality may be split. These types of terminals are + referred to as split component terminals. Historically, the earliest + split component terminals were perhaps the independent audio and + video conference software tools used over the MBONE in the late + 1990s. + + An example for such a split component terminal is depicted in + Figure 20. Within split component terminal A, at least audio and + video subunits are addressed by their own network addresses. In some + of these systems, the control stack subunit may also have its own + network address. + + From an RTP viewpoint, each of the subunits terminates RTP and acts + as an endpoint in the sense that each subunit includes its own, + independent RTP stack. However, as the subunits are semantically + part of the same terminal, it is appropriate that this semantic + relationship is expressed in RTCP protocol elements, namely in the + CNAME. + + +---------------------+ + | Endpoint A | + | Local Area Network | + | +------------+ | + | +->| Audio |<+-RTP---\ + | | +------------+ | \ +------+ + | | +------------+ | +-->| | + | +->| Video |<+-RTP-------->| B | + | | +------------+ | +-->| | + | | +------------+ | / +------+ + | +->| Control |<+-SIP---/ + | +------------+ | + +---------------------+ + + Figure 20: Split Component Terminal + + It is further sensible that the subunits share a common clock from + which RTP and RTCP clocks are derived, to facilitate synchronization + and avoid clock drift. + + To indicate that audio and video Source Streams generated by + different subunits share a common clock, and can be synchronized, the + RTP streams generated from those Source Streams need to include the + same CNAME in their RTCP SDES packets. The use of a common CNAME for + RTP flows carried in different transport-layer flows is entirely + normal for RTP and RTCP senders, and fully compliant RTP endpoints, + middleboxes, and other tools should have no problem with this. + + + +Westerlund & Wenger Informational [Page 36] + +RFC 7667 RTP Topologies November 2015 + + + However, outside of the split component terminal scenario (and + perhaps a multihomed endpoint scenario, which is not further + discussed herein), the use of a common CNAME in RTP streams sent from + separate endpoints (as opposed to a common CNAME for RTP streams sent + on different transport-layer flows between two endpoints) is rare. + It has been reported that at least some third-party tools like some + network monitors do not handle gracefully endpoints that use a common + CNAME across multiple transport-layer flows: they report an error + condition in which two separate endpoints are using the same CNAME. + Depending on the sophistication of the support staff, such erroneous + reports can lead to support issues. + + The aforementioned support issue can sometimes be avoided if each of + the subunits of a split component terminal is configured to use a + different CNAME, with the synchronization between the RTP streams + being indicated by some non-RTP signaling channel rather than using a + common CNAME sent in RTCP. This complicates the signaling, + especially in cases where there are multiple SSRCs in use with + complex synchronization requirements, as is the same in many current + telepresence systems. Unless one uses RTCP terminating topologies + such as Topo-RTCP-terminating-MCU, sessions involving more than one + video subunit with a common CNAME are close to unavoidable. + + The different RTP streams comprising a split terminal system can form + a single RTP session or they can form multiple RTP sessions, + depending on the visibility of their SSRC values in RTCP reports. If + the receiver of the RTP streams sent by the split terminal sends + reports relating to all of the RTP flows (i.e., to each SSRC) in each + RTCP report, then a single RTP session is formed. Alternatively, if + the receiver of the RTP streams sent by the split terminal does not + send cross-reports in RTCP, then the audio and video form separate + RTP sessions. + + For example, in Figure 20, B will send RTCP reports to each of the + subunits of A. If the RTCP packets that B sends to the audio subunit + of A include reports on the reception quality of the video as well as + the audio, and similarly if the RTCP packets that B sends to the + video subunit of A include reports on the reception quality of the + audio as well as video, then a single RTP session is formed. + However, if the RTCP packets B sends to the audio subunit of A only + report on the received audio, and the RTCP packets B sends to the + video subunit of A only report on the received video, then there are + two separate RTP sessions. + + Forming a single RTP session across the RTP streams sent by the + different subunits of a split terminal gives each subunit visibility + into reception quality of RTP streams sent by the other subunits. + + + + +Westerlund & Wenger Informational [Page 37] + +RFC 7667 RTP Topologies November 2015 + + + This information can help diagnose reception quality problems, but at + the cost of increased RTCP bandwidth use. + + RTP streams sent by the subunits of a split terminal need to use the + same CNAME in their RTCP packets if they are to be synchronized, + irrespective of whether a single RTP session is formed or not. + +3.11. Non-symmetric Mixer/Translators + + Shortcut name: Topo-Asymmetric + + It is theoretically possible to construct an MCU that is a mixer in + one direction and a translator in another. The main reason to + consider this would be to allow topologies similar to Figure 13, + where the mixer does not need to mix in the direction from B or D + towards the multicast domains with A and C. Instead, the RTP streams + from B and D are forwarded without changes. Avoiding this mixing + would save media processing resources that perform the mixing in + cases where it isn't needed. However, there would still be a need to + mix B's media towards D. Only in the direction B -> multicast domain + or D -> multicast domain would it be possible to work as a + translator. In all other directions, it would function as a mixer. + + The mixer/translator would still need to process and change the RTCP + before forwarding it in the directions of B or D to the multicast + domain. One issue is that A and C do not know about the mixed-media + stream the mixer sends to either B or D. Therefore, any reports + related to these streams must be removed. Also, receiver reports + related to A's and C's RTP streams would be missing. To avoid A and + C thinking that B and D aren't receiving A and C at all, the mixer + needs to insert locally generated reports reflecting the situation + for the streams from A and C into B's and D's sender reports. In the + opposite direction, the receiver reports from A and C about B's and + D's streams also need to be aggregated into the mixer's receiver + reports sent to B and D. Since B and D only have the mixer as source + for the stream, all RTCP from A and C must be suppressed by the + mixer. + + This topology is so problematic, and it is so easy to get the RTCP + processing wrong, that it is not recommended for implementation. + +3.12. Combining Topologies + + Topologies can be combined and linked to each other using mixers or + translators. However, care must be taken in handling the SSRC/CSRC + space. A mixer does not forward RTCP from sources in other domains, + but instead generates its own RTCP packets for each domain it mixes + into, including the necessary SDES information for both the CSRCs and + + + +Westerlund & Wenger Informational [Page 38] + +RFC 7667 RTP Topologies November 2015 + + + the SSRCs. Thus, in a mixed domain, the only SSRCs seen will be the + ones present in the domain, while there can be CSRCs from all the + domains connected together with a combination of mixers and + translators. The combined SSRC and CSRC space is common over any + translator or mixer. It is important to facilitate loop detection, + something that is likely to be even more important in combined + topologies due to the mixed behavior between the domains. Any + hybrid, like the Topo-Video-switch-MCU or Topo-Asymmetric, requires + considerable thought on how RTCP is dealt with. + +4. Topology Properties + + The topologies discussed in Section 3 have different properties. + This section describes these properties. Note that, even if a + certain property is supported within a particular topology concept, + the necessary functionality may be optional to implement. + +4.1. All-to-All Media Transmission + + To recapitulate, multicast, and in particular ASM, provides the + functionality that everyone may send to, or receive from, everyone + else within the session. SSM can provide a similar functionality by + having anyone intending to participate as a sender to send its media + to the SSM Distribution Source. The SSM Distribution Source forwards + the media to all receivers subscribed to the multicast group. Mesh, + MCUs, mixers, Selective Forwarding Middleboxes (SFMs), and + translators may all provide that functionality at least on some basic + level. However, there are some differences in which type of + reachability they provide. + + The topologies that come closest to emulating Any-Source IP + Multicast, with all-to-all transmission capabilities, are the + Transport Translator function called "relay" in Section 3.5, as well + as the Mesh with joint RTP sessions (Section 3.4). Media + Translators, Mesh with independent RTP Sessions, mixers, SFUs, and + the MCU variants do not provide a fully meshed forwarding on the + transport level; instead, they only allow limited forwarding of + content from the other session participants. + + The "all-to-all media transmission" requires that any media + transmitting endpoint considers the path to the least-capable + receiving endpoint. Otherwise, the media transmissions may overload + that path. Therefore, a sending endpoint needs to monitor the path + from itself to any of the receiving endpoints, to detect the + currently least-capable receiver and adapt its sending rate + accordingly. As multiple endpoints may send simultaneously, the + available resources may vary. RTCP's receiver reports help perform + this monitoring, at least on a medium time scale. + + + +Westerlund & Wenger Informational [Page 39] + +RFC 7667 RTP Topologies November 2015 + + + The resource consumption for performing all-to-all transmission + varies depending on the topology. Both ASM and SSM have the benefit + that only one copy of each packet traverses a particular link. Using + a relay causes the transmission of one copy of a packet per + endpoint-to-relay path and packet transmitted. However, in most + cases, the links carrying the multiple copies will be the ones close + to the relay (which can be assumed to be part of the network + infrastructure with good connectivity to the backbone) rather than + the endpoints (which may be behind slower access links). The Mesh + topologies causes N-1 streams of transmitted packets to traverse the + first-hop link from the endpoint, in a mesh with N endpoints. How + long the different paths are common is highly situation dependent. + + The transmission of RTCP by design adapts to any changes in the + number of participants due to the transmission algorithm, defined in + the RTP specification [RFC3550], and the extensions in AVPF [RFC4585] + (when applicable). That way, the resources utilized for RTCP stay + within the bounds configured for the session. + +4.2. Transport or Media Interoperability + + All translators, mixers, RTCP-terminating MCUs, and Mesh with + individual RTP sessions allow changing the media encoding or the + transport to other properties of the other domain, thereby providing + extended interoperability in cases where the endpoints lack a common + set of media codecs and/or transport protocols. Selective Forwarding + Middleboxes can adopt the transport and (at least) selectively + forward the encoded streams that match a receiving endpoint's + capability. It requires an additional translator to change the media + encoding if the encoded streams do not match the receiving endpoint's + capabilities. + +4.3. Per-Domain Bitrate Adaptation + + Endpoints are often connected to each other with a heterogeneous set + of paths. This makes congestion control in a Point-to-Multipoint set + problematic. In the ASM, SSM, Mesh with common RTP session, and + Transport Relay scenarios, each individual sending endpoint has to + adapt to the receiving endpoint behind the least-capable path, + yielding suboptimal quality for the endpoints behind the more capable + paths. This is no longer an issue when Media Translators, mixers, + SFMs, or MCUs are involved, as each endpoint only needs to adapt to + the slowest path within its own domain. The translator, mixer, SFM, + or MCU topologies all require their respective outgoing RTP streams + to adjust the bitrate, packet rate, etc., to adapt to the least- + capable path in each of the other domains. That way one can avoid + lowering the quality to the least-capable endpoint in all the domains + at the cost (complexity, delay, equipment) of the mixer, SFM, or + + + +Westerlund & Wenger Informational [Page 40] + +RFC 7667 RTP Topologies November 2015 + + + translator, and potentially the media sender (multicast/layered + encoding and sending the different representations). + +4.4. Aggregation of Media + + In the all-to-all media property mentioned above and provided by ASM, + SSM, Mesh with common RTP session, and relay, all simultaneous media + transmissions share the available bitrate. For endpoints with + limited reception capabilities, this may result in a situation where + even a minimal, acceptable media quality cannot be accomplished, + because multiple RTP streams need to share the same resources. One + solution to this problem is to use a mixer, or MCU, to aggregate the + multiple RTP streams into a single one, where the single RTP stream + takes up less resources in terms of bitrate. This aggregation can be + performed according to different methods. Mixing or selection are + two common methods. Selection is almost always possible and easy to + implement. Mixing requires resources in the mixer and may be + relatively easy and not impair the quality too badly (audio) or quite + difficult (video tiling, which is not only computationally complex + but also reduces the pixel count per stream, with corresponding loss + in perceptual quality). + +4.5. View of All Session Participants + + The RTP protocol includes functionality to identify the session + participants through the use of the SSRC and CSRC fields. In + addition, it is capable of carrying some further identity information + about these participants using the RTCP SDES. In topologies that + provide a full all-to-all functionality, i.e., ASM, Mesh with common + RTP session, and relay, a compliant RTP implementation offers the + functionality directly as specified in RTP. In topologies that do + not offer all-to-all communication, it is necessary that RTCP is + handled correctly in domain bridging functions. RTP includes + explicit specification text for translators and mixers, and for SFMs + the required functionality can be derived from that text. However, + the MCU described in Section 3.8 cannot offer the full functionality + for session participant identification through RTP means. The + topologies that create independent RTP sessions per endpoint or pair + of endpoints, like a Back-to-Back RTP session, MESH with independent + RTP sessions, and the RTCP terminating MCU (Section 3.9), with an + exception of SFM, do not support RTP-based identification of session + participants. In all those cases, other non-RTP-based mechanisms + need to be implemented if such knowledge is required or desirable. + When it comes to SFM, the SSRC namespace is not necessarily joint. + Instead, identification will require knowledge of SSRC/CSRC mappings + that the SFM performed; see Section 3.7. + + + + + +Westerlund & Wenger Informational [Page 41] + +RFC 7667 RTP Topologies November 2015 + + +4.6. Loop Detection + + In complex topologies with multiple interconnected domains, it is + possible to unintentionally form media loops. RTP and RTCP support + detecting such loops, as long as the SSRC and CSRC identities are + maintained and correctly set in forwarded packets. Loop detection + will work in ASM, SSM, Mesh with joint RTP session, and relay. It is + likely that loop detection works for the video-switching MCU, + Section 3.8, at least as long as it forwards the RTCP between the + endpoints. However, the Back-to-Back RTP sessions, Mesh with + independent RTP sessions, and SFMs will definitely break the loop + detection mechanism. + +4.7. Consistency between Header Extensions and RTCP + + Some RTP header extensions have relevance not only end to end but + also hop to hop, meaning at least some of the middleboxes in the path + are aware of their potential presence through signaling, intercept + and interpret such header extensions, and potentially also rewrite or + generate them. Modern header extensions generally follow "A General + Mechanism for RTP Header Extensions" [RFC5285], which allows for all + of the above. Examples for such header extensions include the Media + ID (MID) in [SDP-BUNDLE]. At the time of writing, there was also a + proposal for how to include some SDES into an RTP header extension + [RTCP-SDES]. + + When such header extensions are in use, any middlebox that + understands it must ensure consistency between the extensions it sees + and/or generates and the RTCP it receives and generates. For + example, the MID of the bundle is sent in an RTP header extension and + also in an RTCP SDES message. This apparent redundancy was + introduced as unaware middleboxes may choose to discard RTP header + extensions. Obviously, inconsistency between the MID sent in the RTP + header extension and in the RTCP SDES message could lead to + undesirable results, and, therefore, consistency is needed. + Middleboxes unaware of the nature of a header extension, as specified + in [RFC5285], are free to forward or discard header extensions. + +5. Comparison of Topologies + + The table below attempts to summarize the properties of the different + topologies. The legend to the topology abbreviations are: + Topo-Point-to-Point (PtP), Topo-ASM (ASM), Topo-SSM (SSM), Topo-Trn- + Translator (TT), Topo-Media-Translator (including Transport + Translator) (MT), Topo-Mesh with joint session (MJS), Topo-Mesh with + individual sessions (MIS), Topo-Mixer (Mix), Topo-Asymmetric (ASY), + Topo-Video-switch-MCU (VSM), Topo-RTCP-terminating-MCU (RTM), and + Selective Forwarding Middlebox (SFM). In the table below, Y + + + +Westerlund & Wenger Informational [Page 42] + +RFC 7667 RTP Topologies November 2015 + + + indicates Yes or full support, N indicates No support, (Y) indicates + partial support, and N/A indicates not applicable. + + Property PtP ASM SSM TT MT MJS MIS Mix ASY VSM RTM SFM + --------------------------------------------------------------------- + All-to-All Media N Y (Y) Y Y Y (Y) (Y) (Y) (Y) (Y) (Y) + Interoperability N/A N N Y Y Y Y Y Y N Y Y + Per-Domain Adaptation N/A N N N Y N Y Y Y N Y Y + Aggregation of Media N N N N N N N Y (Y) Y Y N + Full Session View Y Y Y Y Y Y N Y Y (Y) N Y + Loop Detection Y Y Y Y Y Y N Y Y (Y) N N + + Please note that the Media Translator also includes the Transport + Translator functionality. + +6. Security Considerations + + The use of mixers, SFMs, and translators has impact on security and + the security functions used. The primary issue is that mixers, SFMs, + and translators modify packets, thus preventing the use of integrity + and source authentication, unless they are trusted devices that take + part in the security context, e.g., the device can send Secure Real- + time Transport Protocol (SRTP) and Secure Real-time Transport Control + Protocol (SRTCP) [RFC3711] packets to endpoints in the Communication + Session. If encryption is employed, the Media Translator, SFM, and + mixer need to be able to decrypt the media to perform its function. + A Transport Translator may be used without access to the encrypted + payload in cases where it translates parts that are not included in + the encryption and integrity protection, for example, IP address and + UDP port numbers in a media stream using SRTP [RFC3711]. However, in + general, the translator, SFM, or mixer needs to be part of the + signaling context and get the necessary security associations (e.g., + SRTP crypto contexts) established with its RTP session participants. + + Including the mixer, SFM, and translator in the security context + allows the entity, if subverted or misbehaving, to perform a number + of very serious attacks as it has full access. It can perform all + the attacks possible (see RFC 3550 and any applicable profiles) as if + the media session were not protected at all, while giving the + impression to the human session participants that they are protected. + + Transport Translators have no interactions with cryptography that + work above the transport layer, such as SRTP, since that sort of + translator leaves the RTP header and payload unaltered. Media + Translators, on the other hand, have strong interactions with + cryptography, since they alter the RTP payload. A Media Translator + in a session that uses cryptographic protection needs to perform + cryptographic processing to both inbound and outbound packets. + + + +Westerlund & Wenger Informational [Page 43] + +RFC 7667 RTP Topologies November 2015 + + + A Media Translator may need to use different cryptographic keys for + the inbound and outbound processing. For SRTP, different keys are + required, because an RFC 3550 Media Translator leaves the SSRC + unchanged during its packet processing, and SRTP key sharing is only + allowed when distinct SSRCs can be used to protect distinct packet + streams. + + When the Media Translator uses different keys to process inbound and + outbound packets, each session participant needs to be provided with + the appropriate key, depending on whether they are listening to the + translator or the original source. (Note that there is an + architectural difference between RTP media translation, in which + participants can rely on the RTP payload type field of a packet to + determine appropriate processing, and cryptographically protected + media translation, in which participants must use information that is + not carried in the packet.) + + When using security mechanisms with translators, SFMs, and mixers, it + is possible that the translator, SFM, or mixer could create different + security associations for the different domains they are working in. + Doing so has some implications: + + First, it might weaken security if the mixer/translator accepts a + weaker algorithm or key in one domain rather than in another. + Therefore, care should be taken that appropriately strong security + parameters are negotiated in all domains. In many cases, + "appropriate" translates to "similar" strength. If a key-management + system does allow the negotiation of security parameters resulting in + a different strength of the security, then this system should notify + the participants in the other domains about this. + + Second, the number of crypto contexts (keys and security-related + state) needed (for example, in SRTP [RFC3711]) may vary between + mixers, SFMs, and translators. A mixer normally needs to represent + only a single SSRC per domain and therefore needs to create only one + security association (SRTP crypto context) per domain. In contrast, + a translator needs one security association per participant it + translates towards, in the opposite domain. Considering Figure 11, + the translator needs two security associations towards the multicast + domain: one for B and one for D. It may be forced to maintain a set + of totally independent security associations between itself and B and + D, respectively, so as to avoid two-time pad occurrences. These + contexts must also be capable of handling all the sources present in + the other domains. Hence, using completely independent security + associations (for certain keying mechanisms) may force a translator + to handle N*DM keys and related state, where N is the total number of + SSRCs used over all domains and DM is the total number of domains. + + + + +Westerlund & Wenger Informational [Page 44] + +RFC 7667 RTP Topologies November 2015 + + + The ASM, SSM, Relay, and Mesh (with common RTP session) topologies + each have multiple endpoints that require shared knowledge about the + different crypto contexts for the endpoints. These multiparty + topologies have special requirements on the key management as well as + the security functions. Specifically, source authentication in these + environments has special requirements. + + There exist a number of different mechanisms to provide keys to the + different participants. One example is the choice between group keys + and unique keys per SSRC. The appropriate keying model is impacted + by the topologies one intends to use. The final security properties + are dependent on both the topologies in use and the keying + mechanisms' properties and need to be considered by the application. + Exactly which mechanisms are used is outside of the scope of this + document. Please review RTP Security Options [RFC7201] to get a + better understanding of most of the available options. + +7. References + +7.1. Normative References + + [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. + Jacobson, "RTP: A Transport Protocol for Real-Time + Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, + July 2003, <http://www.rfc-editor.org/info/rfc3550>. + + [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, + "Extended RTP Profile for Real-time Transport Control + Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, + DOI 10.17487/RFC4585, July 2006, + <http://www.rfc-editor.org/info/rfc4585>. + + [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and + B. Burman, Ed., "A Taxonomy of Grouping Semantics and + Mechanisms for Real-Time Transport Protocol (RTP) + Sources", RFC 7656, November 2015, + <http://www.rfc-editor.org/info/rfc7656>. + +7.2. Informative References + + [MULTI-STREAM-OPT] + Lennox, J., Westerlund, M., Wu, W., and C. Perkins, + "Sending Multiple Media Streams in a Single RTP Session: + Grouping RTCP Reception Statistics and Other Feedback", + Work in Progress, draft-ietf-avtcore-rtp-multi-stream- + optimisation-08, October 2015. + + + + + +Westerlund & Wenger Informational [Page 45] + +RFC 7667 RTP Topologies November 2015 + + + [RFC1112] Deering, S., "Host extensions for IP multicasting", STD 5, + RFC 1112, DOI 10.17487/RFC1112, August 1989, + <http://www.rfc-editor.org/info/rfc1112>. + + [RFC3022] Srisuresh, P. and K. Egevang, "Traditional IP Network + Address Translator (Traditional NAT)", RFC 3022, + DOI 10.17487/RFC3022, January 2001, + <http://www.rfc-editor.org/info/rfc3022>. + + [RFC3569] Bhattacharyya, S., Ed., "An Overview of Source-Specific + Multicast (SSM)", RFC 3569, DOI 10.17487/RFC3569, July + 2003, <http://www.rfc-editor.org/info/rfc3569>. + + [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. + Norrman, "The Secure Real-time Transport Protocol (SRTP)", + RFC 3711, DOI 10.17487/RFC3711, March 2004, + <http://www.rfc-editor.org/info/rfc3711>. + + [RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A + Session Initiation Protocol (SIP) Event Package for + Conference State", RFC 4575, DOI 10.17487/RFC4575, August + 2006, <http://www.rfc-editor.org/info/rfc4575>. + + [RFC4607] Holbrook, H. and B. Cain, "Source-Specific Multicast for + IP", RFC 4607, DOI 10.17487/RFC4607, August 2006, + <http://www.rfc-editor.org/info/rfc4607>. + + [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, + "Codec Control Messages in the RTP Audio-Visual Profile + with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104, + February 2008, <http://www.rfc-editor.org/info/rfc5104>. + + [RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, + DOI 10.17487/RFC5117, January 2008, + <http://www.rfc-editor.org/info/rfc5117>. + + [RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP + Header Extensions", RFC 5285, DOI 10.17487/RFC5285, July + 2008, <http://www.rfc-editor.org/info/rfc5285>. + + [RFC5760] Ott, J., Chesterfield, J., and E. Schooler, "RTP Control + Protocol (RTCP) Extensions for Single-Source Multicast + Sessions with Unicast Feedback", RFC 5760, + DOI 10.17487/RFC5760, February 2010, + <http://www.rfc-editor.org/info/rfc5760>. + + + + + + +Westerlund & Wenger Informational [Page 46] + +RFC 7667 RTP Topologies November 2015 + + + [RFC5766] Mahy, R., Matthews, P., and J. Rosenberg, "Traversal Using + Relays around NAT (TURN): Relay Extensions to Session + Traversal Utilities for NAT (STUN)", RFC 5766, + DOI 10.17487/RFC5766, April 2010, + <http://www.rfc-editor.org/info/rfc5766>. + + [RFC6285] Ver Steeg, B., Begen, A., Van Caenegem, T., and Z. Vax, + "Unicast-Based Rapid Acquisition of Multicast RTP + Sessions", RFC 6285, DOI 10.17487/RFC6285, June 2011, + <http://www.rfc-editor.org/info/rfc6285>. + + [RFC6465] Ivov, E., Ed., Marocco, E., Ed., and J. Lennox, "A Real- + time Transport Protocol (RTP) Header Extension for Mixer- + to-Client Audio Level Indication", RFC 6465, + DOI 10.17487/RFC6465, December 2011, + <http://www.rfc-editor.org/info/rfc6465>. + + [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP + Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014, + <http://www.rfc-editor.org/info/rfc7201>. + + [RTCP-SDES] + Westerlund, M., Burman, B., Even, R., and M. Zanaty, "RTP + Header Extension for RTCP Source Description Items", Work + in Progress, draft-ietf-avtext-sdes-hdr-ext-02, July 2015. + + [SDP-BUNDLE] + Holmberg, C., Alvestrand, H., and C. Jennings, + "Negotiating Media Multiplexing Using the Session + Description Protocol (SDP)", Work in Progress, + draft-ietf-mmusic-sdp-bundle-negotiation-23, July 2015. + + + + + + + + + + + + + + + + + + + + +Westerlund & Wenger Informational [Page 47] + +RFC 7667 RTP Topologies November 2015 + + +Acknowledgements + + The authors would like to thank Mark Baugher, Bo Burman, Ben + Campbell, Umesh Chandra, Alex Eleftheriadis, Roni Even, Ladan Gharai, + Geoff Hunt, Suresh Krishnan, Keith Lantz, Jonathan Lennox, Scarlet + Liuyan, Suhas Nandakumar, Colin Perkins, and Dan Wing for their help + in reviewing and improving this document. + +Authors' Addresses + + Magnus Westerlund + Ericsson + Farogatan 2 + SE-164 80 Kista + Sweden + + Phone: +46 10 714 82 87 + Email: magnus.westerlund@ericsson.com + + + Stephan Wenger + Vidyo + 433 Hackensack Ave + Hackensack, NJ 07601 + United States + + Email: stewe@stewe.org + + + + + + + + + + + + + + + + + + + + + + + + +Westerlund & Wenger Informational [Page 48] + |