From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc5117.txt | 1179 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1179 insertions(+) create mode 100644 doc/rfc/rfc5117.txt (limited to 'doc/rfc/rfc5117.txt') diff --git a/doc/rfc/rfc5117.txt b/doc/rfc/rfc5117.txt new file mode 100644 index 0000000..c745b4f --- /dev/null +++ b/doc/rfc/rfc5117.txt @@ -0,0 +1,1179 @@ + + + + + + +Network Working Group M. Westerlund +Request for Comments: 5117 Ericsson +Category: Informational S. Wenger + Nokia + January 2008 + + + RTP Topologies + +Status of This Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Abstract + + This document discusses multi-endpoint topologies used in Real-time + Transport Protocol (RTP)-based environments. In particular, + centralized topologies commonly employed in the video conferencing + industry are mapped to the RTP terminology. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Westerlund & Wenger Informational [Page 1] + +RFC 5117 RTP Topologies January 2008 + + +Table of Contents + + 1. Introduction ....................................................2 + 2. Definitions .....................................................3 + 2.1. Glossary ...................................................3 + 2.2. Indicating Requirement Levels ..............................3 + 3. Topologies ......................................................3 + 3.1. Point to Point .............................................4 + 3.2. Point to Multipoint Using Multicast ........................5 + 3.3. Point to Multipoint Using the RFC 3550 Translator ..........6 + 3.4. Point to Multipoint Using the RFC 3550 Mixer Model .........9 + 3.5. Point to Multipoint Using Video Switching MCUs ............11 + 3.6. Point to Multipoint Using RTCP-Terminating MCU ............12 + 3.7. Non-Symmetric Mixer/Translators ...........................13 + 3.8. Combining Topologies ......................................14 + 4. Comparing Topologies ...........................................15 + 4.1. Topology Properties .......................................15 + 4.1.1. All to All Media Transmission ......................15 + 4.1.2. Transport or Media Interoperability ................16 + 4.1.3. Per Domain Bit-Rate Adaptation .....................16 + 4.1.4. Aggregation of Media ...............................16 + 4.1.5. View of All Session Participants ...................16 + 4.1.6. Loop Detection .....................................17 + 4.2. Comparison of Topologies ..................................17 + 5. Security Considerations ........................................17 + 6. Acknowledgements ...............................................19 + 7. References .....................................................19 + 7.1. Normative References ......................................19 + 7.2. Informative References ....................................20 + +1. Introduction + + When working on the Codec Control Messages [CCM], considerable + confusion was noticed in the community with respect to terms such as + Multipoint Control Unit (MCU), Mixer, and Translator, and their usage + in various topologies. This document tries to address this confusion + by providing a common information basis for future discussion and + specification work. It attempts to clarify and explain sections of + the Real-time Transport Protocol (RTP) spec [RFC3550] in an informal + way. It is not intended to update or change what is normatively + specified within RFC 3550. + + When the Audio-Visual Profile with Feedback (AVPF) [RFC4585] was + developed the main emphasis lay in the efficient support of point to + point and small multipoint scenarios without centralized multipoint + control. However, in practice, many small multipoint conferences + operate utilizing devices known as Multipoint Control Units (MCUs). + MCUs may implement Mixer or Translator (in RTP [RFC3550] terminology) + + + +Westerlund & Wenger Informational [Page 2] + +RFC 5117 RTP Topologies January 2008 + + + functionality and signalling support. They may also contain + additional application functionality. This document focuses on the + media transport aspects of the MCU that can be realized using RTP, as + discussed below. Further considered are the properties of Mixers and + Translators, and how some types of deployed MCUs deviate from these + properties. + +2. Definitions + +2.1. Glossary + + ASM - Any Source Multicast + AVPF - The Extended RTP Profile for RTCP-based Feedback + CSRC - Contributing Source + Link - The data transport to the next IP hop + MCU - Multipoint Control Unit + Path - The concatenation of multiple links, resulting in an + end-to-end data transfer. + PtM - Point to Multipoint + PtP - Point to Point + SSM - Source-Specific Multicast + SSRC - Synchronization Source + +2.2. Indicating Requirement Levels + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in RFC 2119 [RFC2119]. + + The RFC 2119 language is used in this document to highlight those + important requirements and/or resulting solutions that are necessary + to address the issues raised in this document. + +3. Topologies + + This subsection defines several basic topologies that are relevant + for codec control. The first four relate to the RTP system model + utilizing multicast and/or unicast, as envisioned in RFC 3550. The + last two topologies, in contrast, describe the deployed system models + as used in many H.323 [H323] video conferences, where both the media + streams and the RTP Control Protocol (RTCP) control traffic terminate + at the MCU. In these two cases, the media sender does not receive + the (unmodified or Translator-modified) Receiver Reports from all + sources (which it needs to interpret based on Synchronization Source + (SSRC) values) and therefore has no full information about all the + endpoint's situation as reported in RTCP Receiver Reports (RRs). + More topologies can be constructed by combining any of the models; + see Section 3.8. + + + +Westerlund & Wenger Informational [Page 3] + +RFC 5117 RTP Topologies January 2008 + + + The topologies may be referenced in other documents by a shortcut + name, indicated by the prefix "Topo-". + + For each of the RTP-defined topologies, we discuss how RTP, RTCP, and + the carried media are handled. With respect to RTCP, we also + introduce the handling of RTCP feedback messages as defined in + [RFC4585] and [CCM]. Any important differences between the two will + be illuminated in the discussion. + +3.1. Point to Point + + Shortcut name: Topo-Point-to-Point + + The Point to Point (PtP) topology (Figure 1) consists of two + endpoints, communicating using unicast. Both RTP and RTCP traffic + are conveyed endpoint-to-endpoint, using unicast traffic only (even + if, in exotic cases, this unicast traffic happens to be conveyed over + an IP-multicast address). + + +---+ +---+ + | A |<------->| B | + +---+ +---+ + + Figure 1 - Point to Point + + The main property of this topology is that A sends to B, and only B, + while B sends to A, and only A. This avoids all complexities of + handling multiple endpoints and combining the requirements from them. + Note that an endpoint can still use multiple RTP Synchronization + Sources (SSRCs) in an RTP session. + + RTCP feedback messages for the indicated SSRCs are communicated + directly between the endpoints. Therefore, this topology poses + minimal (if any) issues for any feedback messages. + + + + + + + + + + + + + + + + + +Westerlund & Wenger Informational [Page 4] + +RFC 5117 RTP Topologies January 2008 + + +3.2. Point to Multipoint Using Multicast + + Shortcut name: Topo-Multicast + + +-----+ + +---+ / \ +---+ + | A |----/ \---| B | + +---+ / Multi- \ +---+ + + Cast + + +---+ \ Network / +---+ + | C |----\ /---| D | + +---+ \ / +---+ + +-----+ + + Figure 2 - Point to Multipoint Using Multicast + + Point to Multipoint (PtM) is defined here as using a multicast + topology as a transmission model, in which traffic from any + participant reaches all the other participants, except for cases such + as: + + o packet loss, or + + o when a participant does not wish to receive the traffic for a + specific multicast group and therefore has not subscribed to the + IP-multicast group in question. This is for the cases where a + multi-media session is distributed using two or more multicast + groups. + + In the above context, "traffic" encompasses both RTP and RTCP + traffic. The number of participants can vary between one and many, + as RTP and RTCP scale to very large multicast groups (the theoretical + limit of the number of participants in a single RTP session is + approximately two billion). The above can be realized using Any + Source Multicast (ASM). Source-Specific Multicast (SSM) may be also + be used with RTP. However, then only the designated source may reach + all receivers. Please review [RTCP-SSM] for how RTCP can be made to + work in combination with SSM. + + This document is primarily interested in that subset of multicast + sessions wherein the number of participants in the multicast group is + so low that it allows the participants to use early or immediate + feedback, as defined in AVPF [RFC4585]. This document refers to + those groups as "small multicast groups". + + RTCP feedback messages in multicast will, like media, reach everyone + (subject to packet losses and multicast group subscription). + Therefore, the feedback suppression mechanism discussed in [RFC4585] + + + +Westerlund & Wenger Informational [Page 5] + +RFC 5117 RTP Topologies January 2008 + + + is required. Each individual node needs to process every feedback + message it receives to determine if it is affected or if the feedback + message applies only to some other participant. + +3.3. Point to Multipoint Using the RFC 3550 Translator + + Shortcut name: Topo-Translator + + Two main categories of Translators can be distinguished: + + Transport Translators (Topo-Trn-Translator) do not modify the media + stream itself, but are concerned with transport parameters. + Transport parameters, in the sense of this section, comprise the + transport addresses (to bridge different domains) and the media + packetization to allow other transport protocols to be interconnected + to a session (in gateways). Of the transport Translators, this memo + is primarily interested in those that use RTP on both sides, and this + is assumed henceforth. Translators that bridge between different + protocol worlds need to be concerned about the mapping of the + SSRC/CSRC (Contributing Source) concept to the non-RTP protocol. + When designing a Translator to a non-RTP-based media transport, one + crucial factor lies in how to handle different sources and their + identities. This problem space is not discussed henceforth. + + Media Translators (Topo-Media-Translator), in contrast, modify the + media stream itself. This process is commonly known as transcoding. + The modification of the media stream can be as small as removing + parts of the stream, and it can go all the way to a full transcoding + (down to the sample level or equivalent) utilizing a different media + codec. Media Translators are commonly used to connect entities + without a common interoperability point. + + Stand-alone Media Translators are rare. Most commonly, a combination + of Transport and Media Translators are used to translate both the + media stream and the transport aspects of a stream between two + transport domains (or clouds). + + Both Translator types share common attributes that separate them from + Mixers. For each media stream that the Translator receives, it + generates an individual stream in the other domain. A Translator + always keeps the SSRC for a stream across the translation, where a + Mixer can select a media stream, or send them out mixed, always under + its own SSRC, using the CSRC field to indicate the source(s) of the + content. + + + + + + + +Westerlund & Wenger Informational [Page 6] + +RFC 5117 RTP Topologies January 2008 + + + The RTCP translation process can be trivial, for example, when + Transport Translators just need to adjust IP addresses, or they can + be quite complex as in the case of media Translators. See Section + 7.2 of [RFC3550]. + + +-----+ + +---+ / \ +------------+ +---+ + | A |<---/ \ | |<---->| B | + +---+ / Multi- \ | | +---+ + + Cast +->| Translator | + +---+ \ Network / | | +---+ + | C |<---\ / | |<---->| D | + +---+ \ / +------------+ +---+ + +-----+ + + Figure 3 - Point to Multipoint Using a Translator + + Figure 3 depicts an example of a Transport Translator performing at + least IP address translation. It allows the (non-multicast-capable) + participants B and D to take part in a multicast session by having + the Translator forward their unicast traffic to the multicast + addresses in use, and vice versa. It must also forward B's traffic + to D, and vice versa, to provide each of B and D with a complete view + of the session. + + If B were behind a limited network path, the Translator may perform + media transcoding to allow the traffic received from the other + participants to reach B without overloading the path. + + When, in the example depicted in Figure 3, the Translator acts only + as a Transport Translator, then the RTCP traffic can simply be + forwarded, similar to the media traffic. However, when media + translation occurs, the Translator's task becomes substantially more + complex, even with respect to the RTCP traffic. In this case, the + Translator needs to rewrite B's RTCP Receiver Report before + forwarding them to D and the multicast network. The rewriting is + needed as the stream received by B is not the same stream as the + other participants receive. For example, the number of packets + transmitted to B may be lower than what D receives, due to the + different media format. Therefore, if the Receiver Reports were + forwarded without changes, the extended highest sequence number would + indicate that B were substantially behind in reception, while it most + likely it would not be. Therefore, the Translator must translate + that number to a corresponding sequence number for the stream the + Translator received. Similar arguments can be made for most other + fields in the RTCP Receiver Reports. + + + + + +Westerlund & Wenger Informational [Page 7] + +RFC 5117 RTP Topologies January 2008 + + + As specified in Section 7.1 of [RFC3550], the SSRC space is common + for all participants in the session, independent of on which side + they are of the Translator. Therefore, it is the responsibility of + the participants to run SSRC collision detection, and the SSRC is a + field the Translator should not change. + + +---+ +------------+ +---+ + | A |<---->| |<---->| B | + +---+ | | +---+ + | Translator | + +---+ | | +---+ + | C |<---->| |<---->| D | + +---+ +------------+ +---+ + + Figure 4 - RTP Translator (Relay) with Only Unicast Paths + + Another Translator scenario is depicted in Figure 4. Herein, the + Translator connects multiple users of a conference through unicast. + This can be implemented using a very simple transport Translator, + which in this document is called a relay. The relay forwards all + traffic it receives, both RTP and RTCP, to all other participants. + In doing so, a multicast network is emulated without relying on a + multicast-capable network infrastructure. + + A Translator normally does not use an SSRC of its own, and is not + visible as an active participant in the session. One exception can + be conceived when a Translator acts as a quality monitor that sends + RTCP reports and therefore is required to have an SSRC. Another + example is the case when a Translator is prepared to use RTCP + feedback messages. This may, for example, occur when it suffers + packet loss of important video packets and wants to trigger repair by + the media sender, by sending feedback messages. To be able to do + this it needs to have a unique SSRC. + + A media Translator may in some cases act on behalf of the "real" + source and respond to RTCP feedback messages. This may occur, for + example, when a receiver requests a bandwidth reduction, and the + media Translator has not detected any congestion or other reasons for + bandwidth reduction between the media source and itself. In that + case, it is sensible that the media Translator reacts to the codec + control messages itself, for example, by transcoding to a lower media + rate. If it were not reacting, the media quality in the media + sender's domain may suffer, as a result of the media sender adjusting + its media rate (and quality) according to the needs of the slow + past-Translator endpoint, at the expense of the rate and quality of + all other session participants. + + + + + +Westerlund & Wenger Informational [Page 8] + +RFC 5117 RTP Topologies January 2008 + + + In general, a Translator implementation should consider which RTCP + feedback messages or codec-control messages it needs to understand in + relation to the functionality of the Translator itself. This is + completely in line with the requirement to also translate RTCP + messages between the domains. + +3.4. Point to Multipoint Using the RFC 3550 Mixer Model + + Shortcut name: Topo-Mixer + + A Mixer is a middlebox that aggregates multiple RTP streams, which + are part of a session, by mixing the media data and generating a new + RTP stream. One common application for a Mixer is to allow a + participant to receive a session with a reduced amount of resources. + + +-----+ + +---+ / \ +-----------+ +---+ + | A |<---/ \ | |<---->| B | + +---+ / Multi- \ | | +---+ + + Cast +->| Mixer | + +---+ \ Network / | | +---+ + | C |<---\ / | |<---->| D | + +---+ \ / +-----------+ +---+ + +-----+ + + Figure 5 - Point to Multipoint Using the RFC 3550 Mixer Model + + A Mixer can be viewed as a device terminating the media streams + received from other session participants. Using the media data from + the received media streams, a Mixer generates a media stream that is + sent to the session participant. + + The content that the Mixer provides is the mixed aggregate of what + the Mixer receives over the PtP or PtM paths, which are part of the + same conference session. + + The Mixer is the content source, as it mixes the content (often in + the uncompressed domain) and then encodes it for transmission to a + participant. The CSRC Count (CC) and CSRC fields in the RTP header + are used to indicate the contributors of to the newly generated + stream. The SSRCs of the to-be-mixed streams on the Mixer input + appear as the CSRCs at the Mixer output. That output stream uses a + unique SSRC that identifies the Mixer's stream. The CSRC are + forwarded between the two domains to allow for loop detection and + identification of sources that are part of the global session. Note + that Section 7.1 of RFC 3550 requires the SSRC space to be shared + between domains for these reasons. + + + + +Westerlund & Wenger Informational [Page 9] + +RFC 5117 RTP Topologies January 2008 + + + The Mixer is responsible for generating RTCP packets in accordance + with its role. It is a receiver and should therefore send reception + reports for the media streams it receives. In its role as a media + sender, it should also generate Sender Reports for those media + streams sent. As specified in Section 7.3 of RFC 3550, a Mixer must + not forward RTCP unaltered between the two domains. + + The Mixer depicted in Figure 5 is involved in three domains that need + to be separated: the multicast network, participant B, and + participant D. The Mixer produces different mixed streams to B and + D, as the one to B may contain content received from D, and vice + versa. However, the Mixer only needs one SSRC in each domain that is + the receiving entity and transmitter of mixed content. + + In the multicast domain, a Mixer still needs to provide a mixed view + of the other domains. This makes the Mixer simpler to implement and + avoids any issues with advanced RTCP handling or loop detection, + which would be problematic if the Mixer were providing non-symmetric + behavior. Please see Section 3.7 for more discussion on this topic. + + A Mixer is responsible for receiving RTCP feedback messages and + handling them appropriately. The definition of "appropriate" depends + on the message itself and the context. In some cases, the reception + of a codec-control message may result in the generation and + transmission of RTCP feedback messages by the Mixer to the + participants in the other domain. In other cases, a message is + handled by the Mixer itself and therefore not forwarded to any other + domain. + + When replacing the multicast network in Figure 5 (to the left of the + Mixer) with individual unicast paths as depicted in Figure 6, the + Mixer model is very similar to the one discussed in Section 3.6 + below. Please see the discussion in Section 3.6 about the + differences between these two models. + + +---+ +------------+ +---+ + | A |<---->| |<---->| B | + +---+ | | +---+ + | Mixer | + +---+ | | +---+ + | C |<---->| |<---->| D | + +---+ +------------+ +---+ + + Figure 6 - RTP Mixer with Only Unicast Paths + + + + + + + +Westerlund & Wenger Informational [Page 10] + +RFC 5117 RTP Topologies January 2008 + + +3.5. Point to Multipoint Using Video Switching MCUs + + Shortcut name: Topo-Video-switch-MCU + + +---+ +------------+ +---+ + | A |------| Multipoint |------| B | + +---+ | Control | +---+ + | Unit | + +---+ | (MCU) | +---+ + | C |------| |------| D | + +---+ +------------+ +---+ + + Figure 7 - Point to Multipoint Using a Video Switching MCU + + This PtM topology is still deployed today, although the + RTCP-terminating MCUs, as discussed in the next section, are perhaps + more common. This topology, as well as the following one, reflect + today's lack of wide availability of IP multicast technologies, as + well as the simplicity of content switching when compared to content + mixing. The technology is commonly implemented in what is known as + "Video Switching MCUs". + + A video switching MCU forwards to a participant a single media + stream, selected from the available streams. The criteria for + selection are often based on voice activity in the audio-visual + conference, but other conference management mechanisms (like + presentation mode or explicit floor control) are known to exist as + well. + + The video switching MCU may also perform media translation to modify + the content in bit-rate, encoding, or resolution. However, it still + may indicate the original sender of the content through the SSRC. In + this case, the values of the CC and CSRC fields are retained. + + If not terminating RTP, the RTCP Sender Reports are forwarded for the + currently selected sender. All RTCP Receiver Reports are freely + forwarded between the participants. In addition, the MCU may also + originate RTCP control traffic in order to control the session and/or + report on status from its viewpoint. + + The video switching MCU has most of the attributes of a Translator. + However, its stream selection is a mixing behavior. This behavior + has some RTP and RTCP issues associated with it. The suppression of + all but one media stream results in most participants seeing only a + subset of the sent media streams at any given time, often a single + stream per conference. Therefore, RTCP Receiver Reports only report + on these streams. Consequently, the media senders that are not + currently forwarded receive a view of the session that indicates + + + +Westerlund & Wenger Informational [Page 11] + +RFC 5117 RTP Topologies January 2008 + + + their media streams disappear somewhere en route. This makes the use + of RTCP for congestion control, or any type of quality reporting, + very problematic. + + To avoid the aforementioned issues, the MCU needs to implement two + features. First, it needs to act as a Mixer (see Section 3.4) and + forward the selected media stream under its own SSRC and with the + appropriate CSRC values. Second, the MCU needs to modify the RTCP + RRs it forwards between the domains. As a result, it is RECOMMENDED + that one implement a centralized video switching conference using a + Mixer according to RFC 3550, instead of the shortcut implementation + described here. + +3.6. Point to Multipoint Using RTCP-Terminating MCU + + Shortcut name: Topo-RTCP-terminating-MCU + + +---+ +------------+ +---+ + | A |<---->| Multipoint |<---->| B | + +---+ | Control | +---+ + | Unit | + +---+ | (MCU) | +---+ + | C |<---->| |<---->| D | + +---+ +------------+ +---+ + + Figure 8 - Point to Multipoint Using Content Modifying MCUs + + In this PtM scenario, each participant runs an RTP point-to-point + session between itself and the MCU. This is a very commonly deployed + topology in multipoint video conferencing. The content that the MCU + provides to each participant is either: + + a) a selection of the content received from the other participants, + or + + b) the mixed aggregate of what the MCU receives from the other PtP + paths, which are part of the same conference session. + + In case a), the MCU may modify the content in bit-rate, encoding, or + resolution. No explicit RTP mechanism is used to establish the + relationship between the original media sender and the version the + MCU sends. In other words, the outgoing sessions typically use a + different SSRC, and may well use a different payload type (PT), even + if this different PT happens to be mapped to the same media type. + This is a result of the individually negotiated session for each + participant. + + + + + +Westerlund & Wenger Informational [Page 12] + +RFC 5117 RTP Topologies January 2008 + + + In case b), the MCU is the content source as it mixes the content and + then encodes it for transmission to a participant. According to RTP + [RFC3550], the SSRC of the contributors are to be signalled using the + CSRC/CC mechanism. In practice, today, most deployed MCUs do not + implement this feature. Instead, the identification of the + participants whose content is included in the Mixer's output is not + indicated through any explicit RTP mechanism. That is, most deployed + MCUs set the CSRC Count (CC) field in the RTP header to zero, thereby + indicating no available CSRC information, even if they could identify + the content sources as suggested in RTP. + + The main feature that sets this topology apart from what RFC 3550 + describes is the breaking of the common RTP session across the + centralized device, such as the MCU. This results in the loss of + explicit RTP-level indication of all participants. If one were using + the mechanisms available in RTP and RTCP to signal this explicitly, + the topology would follow the approach of an RTP Mixer. The lack of + explicit indication has at least the following potential problems: + + 1) Loop detection cannot be performed on the RTP level. When + carelessly connecting two misconfigured MCUs, a loop could be + generated. + + 2) There is no information about active media senders available in + the RTP packet. As this information is missing, receivers cannot + use it. It also deprives the client of information related to + currently active senders in a machine-usable way, thus preventing + clients from indicating currently active speakers in user + interfaces, etc. + + Note that deployed MCUs (and endpoints) rely on signalling layer + mechanisms for the identification of the contributing sources, for + example, a SIP conferencing package [RFC4575]. This alleviates, to + some extent, the aforementioned issues resulting from ignoring RTP's + CSRC mechanism. + + As a result of the shortcomings of this topology, it is RECOMMENDED + to instead implement the Mixer concept as specified by RFC 3550. + +3.7. Non-Symmetric Mixer/Translators + + Shortcut name: Topo-Asymmetric + + It is theoretically possible to construct an MCU that is a Mixer in + one direction and a Translator in another. The main reason to + consider this would be to allow topologies similar to Figure 5, where + the Mixer does not need to mix in the direction from B or D towards + the multicast domains with A and C. Instead, the media streams from + + + +Westerlund & Wenger Informational [Page 13] + +RFC 5117 RTP Topologies January 2008 + + + B and D are forwarded without changes. Avoiding this mixing would + save media processing resources that perform the mixing in cases + where it isn't needed. However, there would still be a need to mix + B's stream towards D. Only in the direction B -> multicast domain or + D -> multicast domain would it be possible to work as a Translator. + In all other directions, it would function as a Mixer. + + The Mixer/Translator would still need to process and change the RTCP + before forwarding it in the directions of B or D to the multicast + domain. One issue is that A and C do not know about the mixed-media + stream the Mixer sends to either B or D. Thus, any reports related + to these streams must be removed. Also, receiver reports related to + A and C's media stream would be missing. To avoid A and C thinking + that B and D aren't receiving A and C at all, the Mixer needs to + insert its Receiver Reports for the streams from A and C into B and + D's Sender Reports. In the opposite direction, the Receiver Reports + from A and C about B's and D's stream also need to be aggregated into + the Mixer's Receiver Reports sent to B and D. Since B and D only + have the Mixer as source for the stream, all RTCP from A and C must + be suppressed by the Mixer. + + This topology is so problematic and it is so easy to get the RTCP + processing wrong, that it is NOT RECOMMENDED to implement this + topology. + +3.8. Combining Topologies + + Topologies can be combined and linked to each other using Mixers or + Translators. However, care must be taken in handling the SSRC/CSRC + space. A Mixer will not forward RTCP from sources in other domains, + but will instead generate its own RTCP packets for each domain it + mixes into, including the necessary Source Description (SDES) + information for both the CSRCs and the SSRCs. Thus, in a mixed + domain, the only SSRCs seen will be the ones present in the domain, + while there can be CSRCs from all the domains connected together with + a combination of Mixers and Translators. The combined SSRC and CSRC + space is common over any Translator or Mixer. This is important to + facilitate loop detection, something that is likely to be even more + important in combined topologies due to the mixed behavior between + the domains. Any hybrid, like the Topo-Video-switch-MCU or + Topo-Asymmetric, requires considerable thought on how RTCP is dealt + with. + + + + + + + + + +Westerlund & Wenger Informational [Page 14] + +RFC 5117 RTP Topologies January 2008 + + +4. Comparing Topologies + + The topologies discussed in Section 3 have different properties. + This section first lists these properties and then maps the different + topologies to them. Please note that even if a certain property is + supported within a particular topology concept, the necessary + functionality may, in many cases, be optional to implement. + +4.1. Topology Properties + +4.1.1. All to All Media Transmission + + Multicast, at least Any Source Multicast (ASM), provides the + functionality that everyone may send to, or receive from, everyone + else within the session. MCUs, Mixers, and Translators may all + provide that functionality at least on some basic level. However, + there are some differences in which type of reachability they + provide. + + The transport Translator function called "relay", in Section 3.3, is + the one that provides the emulation of ASM that is closest to true + IP-multicast-based, all to all transmission. Media Translators, + Mixers, and the MCU variants do not provide a fully meshed forwarding + on the transport level; instead, they only allow limited forwarding + of content from the other session participants. + + The "all to all media transmission" requires that any media + transmitting entity considers the path to the least capable receiver. + Otherwise, the media transmissions may overload that path. + Therefore, a media sender needs to monitor the path from itself to + any of the participants, to detect the currently least capable + receiver, and adapt its sending rate accordingly. As multiple + participants may send simultaneously, the available resources may + vary. RTCP's Receiver Reports help performing this monitoring, at + least on a medium time scale. + + The transmission of RTCP automatically adapts to any changes in the + number of participants due to the transmission algorithm, defined in + the RTP specification [RFC3550], and the extensions in AVPF [RFC4585] + (when applicable). That way, the resources utilized for RTCP stay + within the bounds configured for the session. + + + + + + + + + + +Westerlund & Wenger Informational [Page 15] + +RFC 5117 RTP Topologies January 2008 + + +4.1.2. Transport or Media Interoperability + + Translators, Mixers, and RTCP-terminating MCU all allow changing the + media encoding or the transport to other properties of the other + domain, thereby providing extended interoperability in cases where + the participants lack a common set of media codecs and/or transport + protocols. + +4.1.3. Per Domain Bit-Rate Adaptation + + Participants are most likely to be connected to each other with a + heterogeneous set of paths. This makes congestion control in a Point + to Multipoint set problematic. For the ASM and "relay" scenario, + each individual sender has to adapt to the receiver with the least + capable path. This is no longer necessary when Media Translators, + Mixers, or MCUs are involved, as each participant only needs to adapt + to the slowest path within its own domain. The Translator, Mixer, or + MCU topologies all require their respective outgoing streams to + adjust the bit-rate, packet-rate, etc., to adapt to the least capable + path in each of the other domains. That way one can avoid lowering + the quality to the least-capable participant in all the domains at + the cost (complexity, delay, equipment) of the Mixer or Translator. + +4.1.4. Aggregation of Media + + In the all to all media property mentioned above and provided by ASM, + all simultaneous media transmissions share the available bit-rate. + For participants with limited reception capabilities, this may result + in a situation where even a minimal acceptable media quality cannot + be accomplished. This is the result of multiple media streams + needing to share the available resources. The solution to this + problem is to provide for a Mixer or MCU to aggregate the multiple + streams into a single one. This aggregation can be performed + according to different methods. Mixing or selection are two common + methods. + +4.1.5. View of All Session Participants + + The RTP protocol includes functionality to identify the session + participants through the use of the SSRC and CSRC fields. In + addition, it is capable of carrying some further identity information + about these participants using the RTCP Source Descriptors (SDES). + To maintain this functionality, it is necessary that RTCP is handled + correctly in domain bridging function. This is specified for + Translators and Mixers. The MCU described in Section 3.5 does not + entirely fulfill this. The one described in Section 3.6 does not + support this at all. + + + + +Westerlund & Wenger Informational [Page 16] + +RFC 5117 RTP Topologies January 2008 + + +4.1.6. Loop Detection + + In complex topologies with multiple interconnected domains, it is + possible to form media loops. RTP and RTCP support detecting such + loops, as long as the SSRC and CSRC identities are correctly set in + forwarded packets. It is likely that loop detection works for the + MCU, described in Section 3.5, at least as long as it forwards the + RTCP between the participants. However, the MCU in Section 3.6 will + definitely break the loop detection mechanism. + +4.2. Comparison of Topologies + + The table below attempts to summarize the properties of the different + topologies. The legend to the topology abbreviations are: + Topo-Point-to-Point (PtP), Topo-Multicast (Multic), + Topo-Trns-Translator (TTrn), Topo-Media-Translator (including + Transport Translator) (MTrn), Topo-Mixer (Mixer), Topo-Asymmetric + (ASY), Topo-Video-switch-MCU (MCUs), and Topo-RTCP-terminating-MCU + (MCUt). In the table below, Y indicates Yes or full support, N + indicates No support, (Y) indicates partial support, and N/A + indicates not applicable. + + Property PtP Multic TTrn MTrn Mixer ASY MCUs MCUt + ------------------------------------------------------------------ + All to All media N Y Y Y (Y) (Y) (Y) (Y) + Interoperability N/A N Y Y Y Y N Y + Per Domain Adaptation N/A N N Y Y Y N Y + Aggregation of media N N N N Y (Y) Y Y + Full Session View Y Y Y Y Y Y (Y) N + Loop Detection Y Y Y Y Y Y (Y) N + + Please note that the Media Translator also includes the transport + Translator functionality. + +5. Security Considerations + + The use of Mixers and Translators has impact on security and the + security functions used. The primary issue is that both Mixers and + Translators modify packets, thus preventing the use of integrity and + source authentication, unless they are trusted devices that take part + in the security context, e.g., the device can send Secure Realtime + Transport Protocol (SRTP) and Secure Realtime Transport Control + Protocol (SRTCP) [RFC3711] packets to session endpoints. If + encryption is employed, the media Translator and Mixer need to be + able to decrypt the media to perform its function. A transport + Translator may be used without access to the encrypted payload in + cases where it translates parts that are not included in the + encryption and integrity protection, for example, IP address and UDP + + + +Westerlund & Wenger Informational [Page 17] + +RFC 5117 RTP Topologies January 2008 + + + port numbers in a media stream using SRTP [RFC3711]. However, in + general, the Translator or Mixer needs to be part of the signalling + context and get the necessary security associations (e.g., SRTP + crypto contexts) established with its RTP session participants. + + Including the Mixer and Translator in the security context allows the + entity, if subverted or misbehaving, to perform a number of very + serious attacks as it has full access. It can perform all the + attacks possible (see RFC 3550 and any applicable profiles) as if the + media session were not protected at all, while giving the impression + to the session participants that they are protected. + + Transport Translators have no interactions with cryptography that + works above the transport layer, such as SRTP, since that sort of + Translator leaves the RTP header and payload unaltered. Media + Translators, on the other hand, have strong interactions with + cryptography, since they alter the RTP payload. A media Translator + in a session that uses cryptographic protection needs to perform + cryptographic processing to both inbound and outbound packets. + + A media Translator may need to use different cryptographic keys for + the inbound and outbound processing. For SRTP, different keys are + required, because an RFC 3550 media Translator leaves the SSRC + unchanged during its packet processing, and SRTP key sharing is only + allowed when distinct SSRCs can be used to protect distinct packet + streams. + + When the media Translator uses different keys to process inbound and + outbound packets, each session participant needs to be provided with + the appropriate key, depending on whether they are listening to the + Translator or the original source. (Note that there is an + architectural difference between RTP media translation, in which + participants can rely on the RTP Payload Type field of a packet to + determine appropriate processing, and cryptographically protected + media translation, in which participants must use information that is + not carried in the packet.) + + When using security mechanisms with Translators and Mixers, it is + possible that the Translator or Mixer could create different security + associations for the different domains they are working in. Doing so + has some implications: + + First, it might weaken security if the Mixer/Translator accepts a + weaker algorithm or key in one domain than in another. Therefore, + care should be taken that appropriately strong security parameters + are negotiated in all domains. In many cases, "appropriate" + + + + + +Westerlund & Wenger Informational [Page 18] + +RFC 5117 RTP Topologies January 2008 + + + translates to "similar" strength. If a key management system does + allow the negotiation of security parameters resulting in a different + strength of the security, then this system SHOULD notify the + participants in the other domains about this. + + Second, the number of crypto contexts (keys and security related + state) needed (for example, in SRTP [RFC3711]) may vary between + Mixers and Translators. A Mixer normally needs to represent only a + single SSRC per domain and therefore needs to create only one + security association (SRTP crypto context) per domain. In contrast, + a Translator needs one security association per participant it + translates towards, in the opposite domain. Considering Figure 3, + the Translator needs two security associations towards the multicast + domain, one for B and one for D. It may be forced to maintain a set + of totally independent security associations between itself and B and + D respectively, so as to avoid two-time pad occurrences. These + contexts must also be capable of handling all the sources present in + the other domains. Hence, using completely independent security + associations (for certain keying mechanisms) may force a Translator + to handle N*DM keys and related state; where N is the total number of + SSRCs used over all domains and DM is the total number of domains. + + There exist a number of different mechanisms to provide keys to the + different participants. One example is the choice between group keys + and unique keys per SSRC. The appropriate keying model is impacted + by the topologies one intends to use. The final security properties + are dependent on both the topologies in use and the keying + mechanisms' properties, and need to be considered by the application. + Exactly which mechanisms are used is outside of the scope of this + document. + +6. Acknowledgements + + The authors would like to thank Bo Burman, Umesh Chandra, Roni Even, + Keith Lantz, Ladan Gharai, Geoff Hunt, and Mark Baugher for their + help in reviewing this document. + +7. References + +7.1. Normative References + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. + Jacobson, "RTP: A Transport Protocol for Real-Time + Applications", STD 64, RFC 3550, July 2003. + + + + +Westerlund & Wenger Informational [Page 19] + +RFC 5117 RTP Topologies January 2008 + + + [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. + Norrman, "The Secure Real-time Transport Protocol + (SRTP)", RFC 3711, March 2004. + + [RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A + Session Initiation Protocol (SIP) Event Package for + Conference State", RFC 4575, August 2006. + + [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. + Rey, "Extended RTP Profile for Real-time Transport + Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC + 4585, July 2006. + +7.2. Informative References + + [CCM] Wenger, S., Chandra, U., Westerlund, M., Burman, B., + "Codec Control Messages in the RTP Audio-Visual Profile + with Feedback (AVPF)", Work in Progress, July 2007. + + [H323] ITU-T Recommendation H.323, "Packet-based multimedia + communications systems", June 2006. + + [RTCP-SSM] J. Ott, J. Chesterfield, E. Schooler, "RTCP Extensions + for Single-Source Multicast Sessions with Unicast + Feedback," Work in Progress, March 2007. + +Authors' Addresses + + Magnus Westerlund + Ericsson Research + Ericsson AB + SE-164 80 Stockholm, SWEDEN + + Phone: +46 8 7190000 + EMail: magnus.westerlund@ericsson.com + + + Stephan Wenger + Nokia Corporation + P.O. Box 100 + FIN-33721 Tampere + FINLAND + + Phone: +358-50-486-0637 + EMail: stewe@stewe.org + + + + + + +Westerlund & Wenger Informational [Page 20] + +RFC 5117 RTP Topologies January 2008 + + +Full Copyright Statement + + Copyright (C) The IETF Trust (2008). + + This document is subject to the rights, licenses and restrictions + contained in BCP 78, and except as set forth therein, the authors + retain all their rights. + + This document and the information contained herein are provided on an + "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS + OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND + THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS + OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF + THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED + WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Intellectual Property + + The IETF takes no position regarding the validity or scope of any + Intellectual Property Rights or other rights that might be claimed to + pertain to the implementation or use of the technology described in + this document or the extent to which any license under such rights + might or might not be available; nor does it represent that it has + made any independent effort to identify any such rights. Information + on the procedures with respect to rights in RFC documents can be + found in BCP 78 and BCP 79. + + Copies of IPR disclosures made to the IETF Secretariat and any + assurances of licenses to be made available, or the result of an + attempt made to obtain a general license or permission for the use of + such proprietary rights by implementers or users of this + specification can be obtained from the IETF on-line IPR repository at + http://www.ietf.org/ipr. + + The IETF invites any interested party to bring to its attention any + copyrights, patents or patent applications, or other proprietary + rights that may cover technology that may be required to implement + this standard. Please address the information to the IETF at + ietf-ipr@ietf.org. + + + + + + + + + + + + +Westerlund & Wenger Informational [Page 21] + -- cgit v1.2.3