author Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit 4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
tree e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc7656.txt
parent ea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc7656.txt')
-rw-r--r-- doc/rfc/rfc7656.txt 2579
1 files changed, 2579 insertions, 0 deletions
diff --git a/doc/rfc/rfc7656.txt b/doc/rfc/rfc7656.txt
new file mode 100644
index 0000000..59f793b
--- /dev/null
+++ b/doc/rfc/rfc7656.txt
@@ -0,0 +1,2579 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) J. Lennox
+Request for Comments: 7656 Vidyo
+Category: Informational K. Gross
+ISSN: 2070-1721 AVA
+ S. Nandakumar
+ G. Salgueiro
+ Cisco Systems
+ B. Burman, Ed.
+ Ericsson
+ November 2015
+
+
+ A Taxonomy of Semantics and Mechanisms for
+ Real-Time Transport Protocol (RTP) Sources
+
+Abstract
+
+ The terminology about, and associations among, Real-time Transport
+ Protocol (RTP) sources can be complex and somewhat opaque. This
+ document describes a number of existing and proposed properties and
+ relationships among RTP sources and defines common terminology for
+ discussing protocol entities and their relationships.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for informational purposes.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Not all documents
+ approved by the IESG are a candidate for any level of Internet
+ Standard; see Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc7656.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Lennox, et al. Informational [Page 1]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+Copyright Notice
+
+ Copyright (c) 2015 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Lennox, et al. Informational [Page 2]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 5
+ 2. Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 5
+ 2.1. Media Chain . . . . . . . . . . . . . . . . . . . . . . . 5
+ 2.1.1. Physical Stimulus . . . . . . . . . . . . . . . . . . 10
+ 2.1.2. Media Capture . . . . . . . . . . . . . . . . . . . . 10
+ 2.1.3. Raw Stream . . . . . . . . . . . . . . . . . . . . . 10
+ 2.1.4. Media Source . . . . . . . . . . . . . . . . . . . . 11
+ 2.1.5. Source Stream . . . . . . . . . . . . . . . . . . . . 11
+ 2.1.6. Media Encoder . . . . . . . . . . . . . . . . . . . . 12
+ 2.1.7. Encoded Stream . . . . . . . . . . . . . . . . . . . 13
+ 2.1.8. Dependent Stream . . . . . . . . . . . . . . . . . . 13
+ 2.1.9. Media Packetizer . . . . . . . . . . . . . . . . . . 13
+ 2.1.10. RTP Stream . . . . . . . . . . . . . . . . . . . . . 14
+ 2.1.11. RTP-Based Redundancy . . . . . . . . . . . . . . . . 14
+ 2.1.12. Redundancy RTP Stream . . . . . . . . . . . . . . . . 15
+ 2.1.13. RTP-Based Security . . . . . . . . . . . . . . . . . 15
+ 2.1.14. Secured RTP Stream . . . . . . . . . . . . . . . . . 16
+ 2.1.15. Media Transport . . . . . . . . . . . . . . . . . . . 16
+ 2.1.16. Media Transport Sender . . . . . . . . . . . . . . . 17
+ 2.1.17. Sent RTP Stream . . . . . . . . . . . . . . . . . . . 18
+ 2.1.18. Network Transport . . . . . . . . . . . . . . . . . . 18
+ 2.1.19. Transported RTP Stream . . . . . . . . . . . . . . . 18
+ 2.1.20. Media Transport Receiver . . . . . . . . . . . . . . 18
+ 2.1.21. Received Secured RTP Stream . . . . . . . . . . . . . 19
+ 2.1.22. RTP-Based Validation . . . . . . . . . . . . . . . . 19
+ 2.1.23. Received RTP Stream . . . . . . . . . . . . . . . . . 19
+ 2.1.24. Received Redundancy RTP Stream . . . . . . . . . . . 19
+ 2.1.25. RTP-Based Repair . . . . . . . . . . . . . . . . . . 19
+ 2.1.26. Repaired RTP Stream . . . . . . . . . . . . . . . . . 19
+ 2.1.27. Media Depacketizer . . . . . . . . . . . . . . . . . 20
+ 2.1.28. Received Encoded Stream . . . . . . . . . . . . . . . 20
+ 2.1.29. Media Decoder . . . . . . . . . . . . . . . . . . . . 20
+ 2.1.30. Received Source Stream . . . . . . . . . . . . . . . 20
+ 2.1.31. Media Sink . . . . . . . . . . . . . . . . . . . . . 21
+ 2.1.32. Received Raw Stream . . . . . . . . . . . . . . . . . 21
+ 2.1.33. Media Render . . . . . . . . . . . . . . . . . . . . 21
+ 2.2. Communication Entities . . . . . . . . . . . . . . . . . 22
+ 2.2.1. Endpoint . . . . . . . . . . . . . . . . . . . . . . 23
+ 2.2.2. RTP Session . . . . . . . . . . . . . . . . . . . . . 23
+ 2.2.3. Participant . . . . . . . . . . . . . . . . . . . . . 24
+ 2.2.4. Multimedia Session . . . . . . . . . . . . . . . . . 24
+ 2.2.5. Communication Session . . . . . . . . . . . . . . . . 25
+ 3. Concepts of Inter-Relations . . . . . . . . . . . . . . . . . 25
+ 3.1. Synchronization Context . . . . . . . . . . . . . . . . . 26
+ 3.1.1. RTCP CNAME . . . . . . . . . . . . . . . . . . . . . 26
+ 3.1.2. Clock Source Signaling . . . . . . . . . . . . . . . 26
+
+
+
+Lennox, et al. Informational [Page 3]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+ 3.1.3. Implicitly via RtcMediaStream . . . . . . . . . . . . 26
+ 3.1.4. Explicitly via SDP Mechanisms . . . . . . . . . . . . 26
+ 3.2. Endpoint . . . . . . . . . . . . . . . . . . . . . . . . 27
+ 3.3. Participant . . . . . . . . . . . . . . . . . . . . . . . 27
+ 3.4. RtcMediaStream . . . . . . . . . . . . . . . . . . . . . 27
+ 3.5. Multi-Channel Audio . . . . . . . . . . . . . . . . . . . 28
+ 3.6. Simulcast . . . . . . . . . . . . . . . . . . . . . . . . 28
+ 3.7. Layered Multi-Stream . . . . . . . . . . . . . . . . . . 30
+ 3.8. RTP Stream Duplication . . . . . . . . . . . . . . . . . 32
+ 3.9. Redundancy Format . . . . . . . . . . . . . . . . . . . . 33
+ 3.10. RTP Retransmission . . . . . . . . . . . . . . . . . . . 33
+ 3.11. Forward Error Correction . . . . . . . . . . . . . . . . 35
+ 3.12. RTP Stream Separation . . . . . . . . . . . . . . . . . . 36
+ 3.13. Multiple RTP Sessions over one Media Transport . . . . . 37
+ 4. Mapping from Existing Terms . . . . . . . . . . . . . . . . . 37
+ 4.1. Telepresence Terms . . . . . . . . . . . . . . . . . . . 37
+ 4.1.1. Audio Capture . . . . . . . . . . . . . . . . . . . . 37
+ 4.1.2. Capture Device . . . . . . . . . . . . . . . . . . . 37
+ 4.1.3. Capture Encoding . . . . . . . . . . . . . . . . . . 38
+ 4.1.4. Capture Scene . . . . . . . . . . . . . . . . . . . . 38
+ 4.1.5. Endpoint . . . . . . . . . . . . . . . . . . . . . . 38
+ 4.1.6. Individual Encoding . . . . . . . . . . . . . . . . . 38
+ 4.1.7. Media Capture . . . . . . . . . . . . . . . . . . . . 38
+ 4.1.8. Media Consumer . . . . . . . . . . . . . . . . . . . 38
+ 4.1.9. Media Provider . . . . . . . . . . . . . . . . . . . 39
+ 4.1.10. Stream . . . . . . . . . . . . . . . . . . . . . . . 39
+ 4.1.11. Video Capture . . . . . . . . . . . . . . . . . . . . 39
+ 4.2. Media Description . . . . . . . . . . . . . . . . . . . . 39
+ 4.3. Media Stream . . . . . . . . . . . . . . . . . . . . . . 39
+ 4.4. Multimedia Conference . . . . . . . . . . . . . . . . . . 39
+ 4.5. Multimedia Session . . . . . . . . . . . . . . . . . . . 40
+ 4.6. Multipoint Control Unit (MCU) . . . . . . . . . . . . . . 40
+ 4.7. Multi-Session Transmission (MST) . . . . . . . . . . . . 40
+ 4.8. Recording Device . . . . . . . . . . . . . . . . . . . . 41
+ 4.9. RtcMediaStream . . . . . . . . . . . . . . . . . . . . . 41
+ 4.10. RtcMediaStreamTrack . . . . . . . . . . . . . . . . . . . 41
+ 4.11. RTP Receiver . . . . . . . . . . . . . . . . . . . . . . 41
+ 4.12. RTP Sender . . . . . . . . . . . . . . . . . . . . . . . 41
+ 4.13. RTP Session . . . . . . . . . . . . . . . . . . . . . . . 41
+ 4.14. Single-Session Transmission (SST) . . . . . . . . . . . . 41
+ 4.15. SSRC . . . . . . . . . . . . . . . . . . . . . . . . . . 42
+ 5. Security Considerations . . . . . . . . . . . . . . . . . . . 42
+ 6. Informative References . . . . . . . . . . . . . . . . . . . 42
+ Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 45
+ Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 45
+ Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 46
+
+
+
+
+
+Lennox, et al. Informational [Page 4]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+1. Introduction
+
+ The existing taxonomy of sources in the Real-time Transport Protocol
+ (RTP) [RFC3550] has previously been regarded as confusing and
+ inconsistent. Consequently, a deep understanding of how the
+ different terms relate to each other becomes a real challenge.
+ Frequently cited examples of this confusion are (1) how different
+ protocols that make use of RTP use the same terms to signify
+ different things and (2) how the complexities addressed at one layer
+ are often glossed over or ignored at another.
+
+ This document improves clarity by reviewing the semantics of various
+ aspects of sources in RTP. As an organizing mechanism, it approaches
+ this by describing various ways that RTP sources are transformed on
+ their way between sender and receiver, and how they can be grouped
+ and associated together.
+
+ All non-specific references to ControLling mUltiple streams for
+ tElepresence (CLUE) in this document map to [CLUE-FRAME], and all
+ references to Web Real-time Communications (WebRTC) map to
+ [WEBRTC-OVERVIEW].
+
+2. Concepts
+
+ This section defines concepts that serve to identify and name various
+ transformations and streams in a given RTP usage. For each concept,
+ alternate definitions and usages that coexist today are listed along
+ with various characteristics that further describe the concept.
+ These concepts are divided into two categories: one is related to the
+ chain of streams and transformations that Media can be subject to,
+ and the other is for entities involved in the communication.
+
+2.1. Media Chain
+
+ In the context of this document, media is a sequence of synthetic or
+ Physical Stimuli (Section 2.1.1) -- for example, sound waves,
+ photons, key strokes -- represented in digital form. Synthesized
+ media is typically generated directly in the digital domain.
+
+ This section contains the concepts that can be involved in taking
+ media at a sender side and transporting it to a receiver, which may
+ recover a sequence of physical stimuli. This chain of concepts is of
+ two main types: streams and transformations. Streams are time-based
+ sequences of samples of the physical stimulus in various
+ representations, while transformations change the representation of
+ the streams in some way.
+
+
+
+
+
+Lennox, et al. Informational [Page 5]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+ The examples below are basic, and it is important to keep in
+ mind that this conceptual model enables more complex usages. Some
+ will be further discussed in later sections of this document. In
+ general, the following applies to this model:
+
+ o A transformation may have zero or more inputs and one or more
+ outputs.
+
+ o A stream is of some type, such as audio, video, real-time text,
+ etc.
+
+ o A stream has one source transformation and one or more sink
+ transformations (with the exception of physical stimulus
+ (Section 2.1.1) that may lack source or sink transformation).
+
+ o Streams can be forwarded from a transformation output to any
+ number of inputs on other transformations that support that type.
+
+ o If the output of a transformation is sent to multiple
+ transformations, those streams will be identical; it takes a
+ transformation to make them different.
+
+ o There are no formal limitations on how streams are connected to
+ transformations.
+
+ It is also important to remember that this is a conceptual model.
+ Thus, real-world implementations may look different and have a
+ different structure.
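+
+ The rules above can be sketched in code. The following Python
+ fragment is a minimal illustration only; the class names Stream and
+ Transformation, the methods emit and connect, and the "Recorder"
+ sink are hypothetical and are not defined by this document.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """A typed stream produced by exactly one source transformation."""
    name: str
    media_type: str                      # e.g. "audio", "video", "text"
    sinks: list = field(default_factory=list)


@dataclass
class Transformation:
    """A transformation with zero or more inputs and one or more outputs."""
    name: str
    accepts: set                         # media types this transformation supports
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def emit(self, name, media_type):
        """Create an output stream whose source is this transformation."""
        stream = Stream(name, media_type)
        self.outputs.append(stream)
        return stream

    def connect(self, stream):
        """Attach a stream as an input; its media type must be supported."""
        if stream.media_type not in self.accepts:
            raise ValueError(f"{self.name} does not accept {stream.media_type}")
        self.inputs.append(stream)
        stream.sinks.append(self)


# One output forwarded to two sink transformations (identical streams).
capture = Transformation("Media Capture", accepts=set())
raw = capture.emit("Raw Stream", "audio")
source = Transformation("Media Source", accepts={"audio"})
recorder = Transformation("Recorder", accepts={"audio"})  # hypothetical sink
source.connect(raw)
recorder.connect(raw)
```

+ Here a single raw stream is forwarded to two sink transformations,
+ and a transformation rejects a stream type it does not support.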
+
+ To provide a basic understanding of the relationships in the chain,
+ we first introduce the concepts for the sender side (Figure 1). This
+ covers physical stimuli until media packets are emitted onto the
+ network.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Lennox, et al. Informational [Page 6]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+ Physical Stimulus
+ |
+ V
+ +----------------------+
+ | Media Capture |
+ +----------------------+
+ |
+ Raw Stream
+ V
+ +----------------------+
+ | Media Source |<- Synchronization Timing
+ +----------------------+
+ |
+ Source Stream
+ V
+ +----------------------+
+ | Media Encoder |
+ +----------------------+
+ |
+ Encoded Stream +------------+
+ V | V
+ +----------------------+ | +----------------------+
+ | Media Packetizer | | | RTP-Based Redundancy |
+ +----------------------+ | +----------------------+
+ | | |
+ +-------------+ Redundancy RTP Stream
+ Source RTP Stream |
+ V V
+ +----------------------+ +----------------------+
+ | RTP-Based Security | | RTP-Based Security |
+ +----------------------+ +----------------------+
+ | |
+ Secured RTP Stream Secured Redundancy RTP Stream
+ V V
+ +----------------------+ +----------------------+
+ | Media Transport | | Media Transport |
+ +----------------------+ +----------------------+
+
+ Figure 1: Sender Side Concepts in the Media Chain
+
+ In Figure 1, we have included a branched chain to cover the concepts
+ for using redundancy to improve the reliability of the transport.
+ The Media Transport concept is an aggregate that is decomposed in
+ Section 2.1.15.
+
+
+
+
+
+
+
+Lennox, et al. Informational [Page 7]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+ In Figure 2, we review a receiver media chain matching the sender
+ side, to look at the inverse transformations and their attempts to
+ recover streams identical to those in the sender chain, subject to
+ what may be lossy compression and imperfect media transport. Note
+ that the streams out of a reverse transformation, like the Source
+ Stream out of the Media Decoder, are in many cases not the same as
+ the corresponding ones on the sender side; thus, they are prefixed
+ with "received" to denote a potentially modified version. They
+ differ because some transformations are irreversible. For example,
+ lossy source coding in the Media Encoder prevents the source stream
+ out of the media decoder from being the same as the one fed into the
+ media encoder. Other reasons include packet loss in the media
+ transport transformation that even RTP-based Repair, if used, fails
+ to repair.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Lennox, et al. Informational [Page 8]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+ +----------------------+ +----------------------+
+ | Media Transport | | Media Transport |
+ +----------------------+ +----------------------+
+ Received | Received | Secured
+ Secured RTP Stream Redundancy RTP Stream
+ V V
+ +----------------------+ +----------------------+
+ | RTP-Based Validation | | RTP-Based Validation |
+ +----------------------+ +----------------------+
+ | |
+ Received RTP Stream Received Redundancy RTP Stream
+ | |
+ | +--------------------+
+ V V
+ +----------------------+
+ | RTP-Based Repair |
+ +----------------------+
+ |
+ Repaired RTP Stream
+ V
+ +----------------------+
+ | Media Depacketizer |
+ +----------------------+
+ |
+ Received Encoded Stream
+ V
+ +----------------------+
+ | Media Decoder |
+ +----------------------+
+ |
+ Received Source Stream
+ V
+ +----------------------+
+ | Media Sink |--> Synchronization Information
+ +----------------------+
+ |
+ Received Raw Stream
+ V
+ +----------------------+
+ | Media Render |
+ +----------------------+
+ |
+ V
+ Physical Stimulus
+
+ Figure 2: Receiver Side Concepts of the Media Chain
+
+
+
+
+
+Lennox, et al. Informational [Page 9]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+2.1.1. Physical Stimulus
+
+ The physical stimulus is a physical event in the analog domain that
+ can be sampled and converted to digital form by an appropriate sensor
+ or transducer. This includes sound waves making up audio, photons in
+ a light field, or other excitations or interactions with sensors,
+ like keystrokes on a keyboard.
+
+2.1.2. Media Capture
+
+ Media Capture is the process of transforming the analog physical
+ stimulus (Section 2.1.1) into digital media using an appropriate
+ sensor or transducer. The media capture performs a digital sampling
+ of the physical stimulus, usually periodically, and outputs this in
+ some representation as a Raw Stream (Section 2.1.3). This data is
+ considered "media", because it includes data that is periodically
+ sampled or made up of a set of timed asynchronous events. The media
+ capture is normally instantiated in some type of device, i.e., media
+ capture device. Examples of different types of media capturing
+ devices are digital cameras, microphones connected to A/D converters,
+ or keyboards.
+
+ Characteristics:
+
+ o A media capture is identified either by hardware/manufacturer ID
+ or via a session-scoped device identifier as mandated by the
+ application usage.
+
+ o A media capture can generate an Encoded Stream (Section 2.1.7) if
+ the capture device supports such a configuration.
+
+ o The nature of the media capture may impose constraints on the
+ clock handling in some of the subsequent steps. For example, many
+ audio or video capture devices are not completely free in
+ selecting the sample rate.
+
+2.1.3. Raw Stream
+
+ A raw stream is the time progressing stream of digitally sampled
+ information, usually periodically sampled and provided by a media
+ capture (Section 2.1.2). A raw stream can also contain synthesized
+ media that may not require any explicit media capture, since it is
+ already in an appropriate digital form.
+
+
+
+
+
+
+
+
+Lennox, et al. Informational [Page 10]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+2.1.4. Media Source
+
+ A Media Source is the logical source of a time progressing digital
+ media stream synchronized to a reference clock. This stream is
+ called a source stream (Section 2.1.5). This transformation takes
+ one or more raw streams (Section 2.1.3) and provides a source stream
+ as output. The output is synchronized with a reference clock
+ (Section 3.1), which can be as simple as a system local wall clock or
+ as complex as an NTP synchronized clock.
+
+ The output can be of different types. One type is directly
+ associated with a particular media capture's raw stream. Others are
+ more conceptual sources, like an audio mix of multiple source streams
+ (Figure 3). Mixing multiple streams typically requires that the
+ input streams can be related in time, meaning that they have
+ to be source streams (Section 2.1.5) rather than raw streams. In
+ Figure 3, the generated source stream is a mix of the three input
+ source streams.
+
+ Source Source Source
+ Stream Stream Stream
+ | | |
+ V V V
+ +--------------------------+
+ | Media Source |<-- Reference Clock
+ | Mixer |
+ +--------------------------+
+ |
+ V
+ Source Stream
+
+ Figure 3: Conceptual Media Source in the form of an Audio Mixer
+
+ Another possible example of a conceptual media source is a video
+ surveillance switch, where the input is multiple source streams from
+ different cameras, and the output is one of those source streams
+ based on some selection criteria, such as round robin or some video
+ activity measure.
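+
+ The two conceptual media sources described above, a mixer and a
+ round-robin switch, can be sketched as follows. This is an
+ illustration under stated assumptions: each source stream is
+ modeled as a dictionary from a timestamp on the shared reference
+ clock to an integer sample, and the function names and the
+ "period" parameter are hypothetical.

```python
def mix(source_streams):
    """Mix source streams given as {timestamp: sample} dicts on a shared clock."""
    mixed = {}
    for stream in source_streams:
        for timestamp, sample in stream.items():
            mixed[timestamp] = mixed.get(timestamp, 0) + sample
    return dict(sorted(mixed.items()))


def round_robin_switch(source_streams, period):
    """Select one input per time slot of length `period`, cycling through inputs."""
    selected = {}
    timestamps = sorted(set().union(*source_streams))
    for timestamp in timestamps:
        active = source_streams[(timestamp // period) % len(source_streams)]
        if timestamp in active:
            selected[timestamp] = active[timestamp]
    return selected


# Three inputs already synchronized to the same reference clock.
a = {0: 1, 20: 2}
b = {0: 10, 20: 20}
c = {0: 100, 20: 200}
mixed = mix([a, b, c])                               # {0: 111, 20: 222}
switched = round_robin_switch([a, b, c], period=20)  # {0: 1, 20: 20}
```

+ Both functions rely on the inputs sharing one timeline, which is
+ why the inputs have to be source streams rather than raw streams.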
+
+2.1.5. Source Stream
+
+ A source stream is a stream of digital samples that has been
+ synchronized with a reference clock and comes from a particular media
+ source (Section 2.1.4).
+
+
+
+
+
+
+
+Lennox, et al. Informational [Page 11]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+2.1.6. Media Encoder
+
+ A media encoder is a transform that is responsible for encoding the
+ media data from a source stream (Section 2.1.5) into another
+ representation, usually more compact, that is output as an encoded
+ stream (Section 2.1.7).
+
+ The media encoder step commonly includes pre-encoding
+ transformations, such as scaling, resampling, etc. The media encoder
+ can have a significant number of configuration options that affect
+ the properties of the encoded stream. This includes properties such
+ as codec, bitrate, start points for decoding, resolution, bandwidth,
+ or other fidelity affecting properties.
+
+ Scalable media encoders need special attention as they produce
+ multiple outputs that are potentially of different types. As shown
+ in Figure 4, a scalable media encoder takes one input source stream
+ and encodes it into multiple output streams of two different types:
+ at least one encoded stream that is independently decodable and one
+ or more Dependent Streams (Section 2.1.8). Decoding requires at
+ least one encoded stream and zero or more dependent streams. A
+ dependent stream's dependency is one of the grouping relations this
+ document discusses further in Section 3.7.
+
+ Source Stream
+ |
+ V
+ +--------------------------+
+ | Scalable Media Encoder |
+ +--------------------------+
+ | | ... |
+ V V V
+ Encoded Dependent Dependent
+ Stream Stream Stream
+
+ Figure 4: Scalable Media Encoder Input and Outputs
+
+ There are also other variants of encoders, like so-called Multiple
+ Description Coding (MDC). Such media encoders produce multiple
+ independent and thus individually decodable encoded streams.
+ However, (logically) combining multiple of these encoded streams into
+ a single Received Source Stream during decoding leads to an
+ improvement in perceptual reproduced quality when compared to
+ decoding a single encoded stream.
+
+ Creating multiple encoded streams from the same source stream, where
+ the encoded streams are neither in a scalable nor in an MDC
+
+
+
+
+Lennox, et al. Informational [Page 12]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+ relationship is commonly utilized in simulcast [SDP-SIMULCAST]
+ environments.
+
+2.1.7. Encoded Stream
+
+ A stream of time synchronized encoded media that can be independently
+ decoded.
+
+ Due to temporal dependencies, an encoded stream may have limitations
+ in where decoding can be started. These entry points, for example,
+ Intra frames from a video encoder, may require identification and
+ their generation may be event based or configured to occur
+ periodically.
+
+2.1.8. Dependent Stream
+
+ A stream of time-synchronized encoded media fragments that can be
+ decoded only together with one or more encoded streams
+ (Section 2.1.7) and zero or more other dependent streams.
+
+ Each dependent stream has a set of dependencies. These dependencies
+ must be understood by the parties in a Multimedia Session
+ (Section 2.2.4) that intend to use a dependent stream.
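+
+ One way to picture such a dependency set is as a graph that a
+ receiver walks before decoding. The sketch below is illustrative
+ only; the function name, the stream labels "base", "enh1", and
+ "enh2", and the dictionary representation are assumptions made for
+ this example.

```python
def decodable(stream, dependencies, available):
    """Return True if `stream` and everything it depends on is in `available`."""
    pending = [stream]
    seen = set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        if current not in available:
            return False
        pending.extend(dependencies.get(current, ()))
    return True


# Hypothetical layering: "base" is an encoded stream, while "enh1" and
# "enh2" are dependent streams.
dependencies = {"enh1": ["base"], "enh2": ["enh1"]}

full = decodable("enh2", dependencies, {"base", "enh1", "enh2"})  # True
gap = decodable("enh2", dependencies, {"base", "enh2"})           # False
```

+ The encoded stream has no dependencies and is decodable on its own,
+ while a dependent stream is usable only if its whole chain arrived.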
+
+2.1.9. Media Packetizer
+
+ The media packetizer is the transformation that takes one or more
+ encoded (Section 2.1.7) or dependent streams (Section 2.1.8), puts
+ their content into one or more sequences of packets, normally RTP
+ packets, and outputs source RTP streams (Section 2.1.10). This step
+ includes generating both RTP payloads and RTP packets. The media
+ packetizer then selects which synchronization source(s) (SSRC)
+ [RFC3550] and RTP Sessions (Section 2.2.2) to use.
+
+ The media packetizer can combine multiple encoded or dependent
+ streams into one or more RTP Streams:
+
+ o The media packetizer can use multiple inputs when producing a
+ single RTP stream. One such example is Single RTP stream on a
+ Single media Transport (SRST) packetization when using Scalable
+ Video Coding (SVC) (Section 3.7).
+
+ o The media packetizer can also produce multiple RTP streams, for
+ example, when encoded and/or dependent streams are distributed
+ over multiple RTP streams. One example of this is Multiple RTP
+ streams on Multiple media Transports (MRMT) packetization when
+ using SVC (Section 3.7).
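+
+ As a rough illustration of the simplest case, one encoded stream
+ into one RTP stream, the sketch below prepends the 12-byte RTP
+ fixed header of [RFC3550] to each encoded frame. The function name
+ and its parameters are hypothetical. Starting the sequence number
+ and timestamp at zero and advancing the timestamp by 3000 per frame
+ are simplifying assumptions; real packetizers follow the RTP
+ payload format for the codec and may fragment or aggregate frames.

```python
import struct


def packetize(frames, ssrc, payload_type, first_seq=0, clock_step=3000):
    """Wrap each encoded frame in a 12-byte RTP fixed header."""
    packets = []
    for index, frame in enumerate(frames):
        byte0 = 2 << 6                     # version 2, no padding, no extension, CC=0
        byte1 = payload_type & 0x7F        # marker bit left clear
        seq = (first_seq + index) & 0xFFFF
        timestamp = (index * clock_step) & 0xFFFFFFFF
        header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
        packets.append(header + frame)
    return packets


# Two frames, one SSRC, dynamic payload type 96 (an assumption).
packets = packetize([b"a", b"bc"], ssrc=0x1234, payload_type=96)
```

+ Every packet of the resulting RTP stream carries the same SSRC,
+ and the sequence number increases by one per packet.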
+
+
+
+
+Lennox, et al. Informational [Page 13]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+2.1.10. RTP Stream
+
+ An RTP stream is a stream of RTP packets containing media data,
+ source or redundant. The RTP stream is identified by an SSRC
+ belonging to a particular RTP Session. The RTP session is identified
+ as discussed in Section 2.2.2.
+
+ A source RTP stream is an RTP stream directly related to an encoded
+ stream (Section 2.1.7), targeted for transport over RTP without any
+ additional RTP-based Redundancy (Section 2.1.11) applied.
+
+ Characteristics:
+
+ o Each RTP stream is identified by an SSRC [RFC3550] that is carried
+ in every RTP and RTP Control Protocol (RTCP) packet header. The
+ SSRC is unique in a specific RTP session context.
+
+ o At any given point in time, an RTP stream can have one and only
+ one SSRC, but SSRCs for a given RTP stream can change over time.
+ SSRC collision and clock rate change [RFC7160] are examples of
+ valid reasons to change SSRC for an RTP stream. In those cases,
+ the RTP stream itself is not changed in any significant way, only
+ the identifying SSRC number.
+
+ o Each SSRC defines a unique RTP sequence numbering and timing
+ space.
+
+ o Several RTP streams, each with their own SSRC, may represent a
+ single media source.
+
+ o Several RTP streams, each with their own SSRC, can be carried in a
+ single RTP session.
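+
+ A receiver can separate the RTP streams of one RTP session by
+ reading the SSRC field of each packet. The sketch below assumes
+ well-formed packets with the [RFC3550] fixed header; the function
+ names and the example SSRC values are hypothetical.

```python
import struct
from collections import defaultdict


def group_by_ssrc(packets):
    """Group RTP packets of one RTP session into RTP streams keyed by SSRC."""
    streams = defaultdict(list)
    for packet in packets:
        # The sequence number is at bytes 2-3 and the SSRC at bytes 8-11
        # of the fixed header.
        seq, = struct.unpack("!H", packet[2:4])
        ssrc, = struct.unpack("!I", packet[8:12])
        streams[ssrc].append(seq)
    return dict(streams)


def header(seq, ssrc):
    """Build a bare fixed header for the example (payload type 96 assumed)."""
    return struct.pack("!BBHII", 0x80, 96, seq, 0, ssrc)


session_packets = [header(7, 0xA), header(100, 0xB), header(8, 0xA)]
streams = group_by_ssrc(session_packets)   # {0xA: [7, 8], 0xB: [100]}
```

+ Each SSRC has its own sequence number space, so the numbers 7 and
+ 8 of one stream are unrelated to the number 100 of the other.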
+
+2.1.11. RTP-Based Redundancy
+
+ RTP-based redundancy is defined here as a transformation that
+ generates redundant or repair packets sent out as a Redundancy RTP
+ Stream (Section 2.1.12) to mitigate Network Transport
+ (Section 2.1.18) impairments, like packet loss and delay. Note that
+ this excludes the type of redundancy that most suitable media
+ encoders (Section 2.1.6) may add to the media format of the encoded
+ stream (Section 2.1.7) that makes it cope better with RTP packet
+ losses.
+
+ The RTP-based redundancy exists in many flavors: they may generate
+ independent repair streams that are used in addition to the source
+ stream (like RTP Retransmission (Section 3.10) and some special types
+ of Forward Error Correction (FEC) (Section 3.11), like RTP stream
+
+
+
+Lennox, et al. Informational [Page 14]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+ duplication (Section 3.8)); they may generate a new source stream by
+ combining redundancy information with source information (using XOR
+ FEC as a redundancy payload (Section 3.9)); or they may completely
+ replace the source information with only redundancy packets.
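+
+ The idea behind XOR-based repair data can be sketched with plain
+ byte strings. This is not the packet format of any FEC
+ specification; the function names are hypothetical, and the sketch
+ assumes equal-length payloads and the loss of at most one payload
+ per protected group.

```python
def xor_parity(payloads):
    """Return the byte-wise XOR of equal-length payloads."""
    if not payloads:
        raise ValueError("need at least one payload")
    length = len(payloads[0])
    if any(len(p) != length for p in payloads):
        raise ValueError("this sketch assumes equal-length payloads")
    parity = bytearray(length)
    for payload in payloads:
        for i, byte in enumerate(payload):
            parity[i] ^= byte
    return bytes(parity)


def recover(received, parity):
    """Rebuild the single missing payload from the survivors and the parity."""
    return xor_parity(list(received) + [parity])


source = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(source)            # carried in the redundancy RTP stream
repaired = recover([source[0], source[2]], parity)
```

+ XORing the surviving payloads with the parity cancels them out and
+ leaves the missing payload, here the second one.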
+
+2.1.12. Redundancy RTP Stream
+
+ A redundancy RTP stream is an RTP stream (Section 2.1.10) that
+ contains no original source data, only redundant data, which may
+ either be used as standalone or be combined with one or more Received
+ RTP Streams (Section 2.1.23) to produce Repaired RTP Streams
+ (Section 2.1.26).
+
+2.1.13. RTP-Based Security
+
+ The optional RTP-based Security transformation applies security
+ services such as authentication, integrity protection, and
+ confidentiality to an input RTP stream, like what is specified in
+ "The Secure Real-time Transport Protocol (SRTP)" [RFC3711], producing
+ a Secured RTP Stream (Section 2.1.14). Either an RTP stream
+ (Section 2.1.10) or a redundancy RTP stream (Section 2.1.12) can be
+ used as input to this transformation.
+
+ In SRTP and the related Secure RTCP (SRTCP), all of the above-
+ mentioned security services are optional, except for integrity
+ protection of SRTCP, which is mandatory. Also confidentiality
+ (encryption) is effectively optional in SRTP, since it is possible to
+ use a NULL encryption algorithm. As described in [RFC7201], the
+ strength of SRTP data origin authentication depends on the
+ cryptographic transform and key management used. For example, in
+ group communication, it is sometimes possible to authenticate
+ group membership but not the actual RTP stream sender.
+
+ RTP-based security and RTP-based redundancy can be combined in a few
+ different ways. One way is depicted in Figure 1, where an RTP stream
+ and its corresponding redundancy RTP stream are protected by separate
+ RTP-based security transforms. In other cases, like when a Media
+ Translator is adding FEC in Section 3.2.1.3 of [RTP-TOPOLOGIES], a
+ middlebox can apply RTP-based redundancy to an already secured RTP
+ stream instead of a source RTP stream. One example of that is
+ depicted in Figure 5 below.
+
+
+
+
+
+
+
+
+
+
+Lennox, et al. Informational [Page 15]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+ Source RTP Stream +------------+
+ V | V
+ +----------------------+ | +----------------------+
+ | RTP-Based Security | | | RTP-Based Redundancy |
+ +----------------------+ | +----------------------+
+ | | |
+ | | Redundancy RTP Stream
+ +-------------+ |
+ | V
+ | +----------------------+
+ Secured RTP Stream | RTP-Based Security |
+ | +----------------------+
+ | |
+ | Secured Redundancy RTP Stream
+ V V
+ +----------------------+ +----------------------+
+ | Media Transport | | Media Transport |
+ +----------------------+ +----------------------+
+
+ Figure 5: Adding Redundancy to a Secured RTP Stream
+
+ In this case, the redundancy RTP stream may already have been secured
+ for confidentiality (encrypted) by the first RTP-based security, and
+ it may therefore not be necessary to apply additional confidentiality
+ protection in the second RTP-based security. To avoid attacks and
+ negative impact on RTP-based Repair (Section 2.1.25) and the
+ resulting repaired RTP stream (Section 2.1.26), it is, however, still
+ necessary to have this second RTP-based security apply both
+ authentication and integrity protection to the redundancy RTP stream.
+
+2.1.14. Secured RTP Stream
+
+ A secured RTP stream is a source or redundancy RTP stream that is
+ protected through RTP-based security (Section 2.1.13) by one or more
+ of the confidentiality, integrity, or authentication security
+ services.
+
+2.1.15. Media Transport
+
+ A media transport defines the transformation that the RTP streams
+ (Section 2.1.10) are subjected to by the end-to-end transport from
+ one RTP Sender (Section 4.12) to one specific RTP Receiver
+ (Section 4.11) (an RTP session (Section 2.2.2) may contain multiple
+ RTP receivers per sender). Each media transport is defined by a
+ transport association that is normally identified by a 5-tuple
+ (source address, source port, destination address, destination port,
+ transport protocol), but a proposal exists for sending multiple
+ transport associations on a single 5-tuple [TRANSPORT-MULTIPLEX].
+
+
+
+Lennox, et al. Informational [Page 16]
+
+RFC 7656 RTP Taxonomy November 2015
+
+
+ Characteristics:
+
+ o Media transport transmits RTP streams of RTP packets from a source
+ transport address to a destination transport address.
+
+ o Each media transport contains only a single RTP session.
+
+ o A single RTP session can span multiple media transports.
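+
+ The relation between media transports and RTP sessions stated
+ above can be sketched as a lookup table keyed by the 5-tuple. The
+ class names, method names, and addresses are hypothetical and are
+ used only for illustration.

```python
from typing import NamedTuple


class TransportAssociation(NamedTuple):
    """The 5-tuple that normally identifies a media transport."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str


class MediaTransportTable:
    def __init__(self):
        self._session_by_transport = {}

    def bind(self, transport, rtp_session):
        """Bind a media transport to an RTP session; a second session is rejected."""
        bound = self._session_by_transport.setdefault(transport, rtp_session)
        if bound != rtp_session:
            raise ValueError("a media transport contains only a single RTP session")

    def transports_for(self, rtp_session):
        """List the media transports that one RTP session spans."""
        return [t for t, s in self._session_by_transport.items() if s == rtp_session]


# One sender, two receivers, one RTP session (documentation addresses).
t1 = TransportAssociation("192.0.2.1", 5004, "198.51.100.7", 5004, "UDP")
t2 = TransportAssociation("192.0.2.1", 5004, "198.51.100.8", 5004, "UDP")
table = MediaTransportTable()
table.bind(t1, "audio")
table.bind(t2, "audio")
```

+ The session "audio" spans both transports, while binding a second
+ session to t1 raises an error.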
+
+ The media transport concept sometimes needs to be decomposed into
+ more steps to enable discussion of what a sender emits that gets
+ transformed by the network before it is received by the receiver.
+ Thus, we also provide this media transport decomposition (Figure 6).
+
+                               RTP Stream
+                                    |
+                                    V
+                       +--------------------------+
+                       |  Media Transport Sender  |
+                       +--------------------------+
+                                    |
+                             Sent RTP Stream
+                                    V
+                       +--------------------------+
+                       |    Network Transport     |
+                       +--------------------------+
+                                    |
+                         Transported RTP Stream
+                                    V
+                       +--------------------------+
+                       | Media Transport Receiver |
+                       +--------------------------+
+                                    |
+                                    V
+                           Received RTP Stream
+
+ Figure 6: Decomposition of Media Transport
+
+2.1.16. Media Transport Sender
+
+ The first transformation within the media transport (Section 2.1.15)
+ is the Media Transport Sender. The sending Endpoint (Section 2.2.1)
+ takes an RTP stream and emits the packets onto the network using the
+ transport association established for this media transport, thereby
+ creating a Sent RTP Stream (Section 2.1.17). In the process, it
+ transforms the RTP stream in several ways. First, it generates the
+ necessary protocol headers for the transport association, for
+ example, IP and UDP headers, thus forming IP/UDP/RTP packets. In
+ addition, the media transport sender may queue, intentionally pace,
+ or otherwise affect how the packets are emitted onto the network,
+ thereby potentially introducing delay and delay variations [RFC5481]
+ that characterize the sent RTP stream.
+
+2.1.17. Sent RTP Stream
+
+ The sent RTP stream is the RTP stream as it enters the first hop of
+ the network path to its destination. The sent RTP stream is
+ identified using network transport addresses, like the 5-tuple
+ (source IP address, source port, destination IP address, destination
+ port, and protocol (UDP)) for IP/UDP.
+
+2.1.18. Network Transport
+
+ Network transport is the transformation that subjects the sent RTP
+ stream (Section 2.1.17) to traveling from the source to the
+ destination through the network. This transformation can result in
+ loss of some packets, delay, and delay variation on a per-packet
+ basis, packet duplication, and packet header or data corruption.
+ This transformation produces a Transported RTP Stream
+ (Section 2.1.19) at the exit of the network path.
+
+2.1.19. Transported RTP Stream
+
+ The transported RTP stream is the RTP stream that is emitted out of
+ the network path at the destination, subjected to the network
+ transport's transformation (Section 2.1.18).
+
+2.1.20. Media Transport Receiver
+
+ The Media Transport Receiver is the receiver endpoint's
+ (Section 2.2.1) transformation of the transported RTP stream
+ (Section 2.1.19) by its reception process, which results in the
+ received RTP stream (Section 2.1.23). This transformation includes
+ transport checksums being verified. Sensible system designs
+ typically either discard packets with mismatching checksums or pass
+ them on while somehow marking them in the resulting received RTP
+ stream so as to alert subsequent transformations about the possible
+ corrupt state. In this context, it is worth noting that there is
+ typically some probability for corrupt packets to pass through
+ undetected (with a seemingly correct checksum). Other
+ transformations can compensate for delay variations in receiving a
+ packet on the network interface and providing it to the application
+ (de-jitter buffer).
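+
+ The two checksum-handling options described above can be sketched as
+ follows. This is an illustration under stated assumptions: zlib.crc32
+ stands in for whatever checksum the transport actually uses, and
+ ReceivedPacket, suspected_corrupt, and receive are hypothetical names.
+
```python
import zlib
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ReceivedPacket:
    payload: bytes
    checksum: int                     # value carried alongside the packet
    suspected_corrupt: bool = False   # marker for later transformations


def receive(packets: Iterable[ReceivedPacket],
            discard: bool = True) -> List[ReceivedPacket]:
    """Verify checksums; drop mismatching packets or pass them on marked."""
    received = []
    for pkt in packets:
        if zlib.crc32(pkt.payload) == pkt.checksum:
            received.append(pkt)
        elif not discard:
            pkt.suspected_corrupt = True
            received.append(pkt)
    return received
```
+
+ With discard=True, mismatching packets never reach the received RTP
+ stream; with discard=False, they are passed on with the marker set so
+ that a later step can decide how to treat them.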
+
+2.1.21. Received Secured RTP Stream
+
+ This is the secured RTP stream (Section 2.1.14) resulting from the
+ media transport (Section 2.1.15) aggregate transformation.
+
+2.1.22. RTP-Based Validation
+
+ RTP-based Validation is the reverse transformation of RTP-based
+ security (Section 2.1.13). If this transformation fails, the result
+ is either not usable and must be discarded or may be usable but
+ cannot be trusted. If the transformation succeeds, the result can be
+ a received RTP stream (Section 2.1.23) or a Received Redundancy RTP
+ Stream (Section 2.1.24), depending on what was input to the
+ corresponding RTP-based security transformation, but it can also be a
+ Received Secured RTP Stream (Section 2.1.21) in case several RTP-
+ based security transformations were applied.
+
+2.1.23. Received RTP Stream
+
+ The received RTP stream is the RTP stream (Section 2.1.10) resulting
+ from the media transport's aggregate transformation (Section 2.1.15),
+ i.e., subjected to packet loss, packet corruption, packet
+ duplication, delay, and delay variation from sender to receiver.
+
+2.1.24. Received Redundancy RTP Stream
+
+ The received redundancy RTP stream is the redundancy RTP stream
+ (Section 2.1.12) resulting from the media transport's aggregate
+ transformation, i.e., subjected to packet loss, packet corruption,
+ packet duplication, delay, and delay variation from sender to
+ receiver.
+
+2.1.25. RTP-Based Repair
+
+ RTP-based repair is a transformation that takes as input zero or more
+ received RTP streams (Section 2.1.23) and one or more received
+ redundancy RTP streams (Section 2.1.24) and produces one or more
+ repaired RTP streams (Section 2.1.26) that are as close to the
+ corresponding sent source RTP streams (Section 2.1.10) as possible,
+ using different RTP-based repair methods, for example, the ones
+ referred to in RTP-based redundancy (Section 2.1.11).
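+
+ As one possible illustration of such a transformation, the sketch
+ below uses a single XOR parity packet as the redundancy information.
+ This is a deliberately simple stand-in rather than any specific RTP
+ FEC payload format; it assumes equal-length payloads, and the names
+ make_parity and repair are hypothetical.
+
```python
from functools import reduce
from typing import Dict, Iterable


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length payloads."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(payloads: Iterable[bytes]) -> bytes:
    """Sender side: redundancy data protecting a group of payloads."""
    return reduce(xor_bytes, payloads)


def repair(received: Dict[int, bytes], group: Iterable[int],
           parity: bytes) -> Dict[int, bytes]:
    """Receiver side: recover at most one missing payload of the group.

    received maps sequence numbers to payloads; group lists the sequence
    numbers that the parity packet protects.
    """
    group = list(group)
    missing = [seq for seq in group if seq not in received]
    repaired = dict(received)
    if len(missing) == 1:
        present = [received[seq] for seq in group if seq in received]
        repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired
```
+
+ If more than one packet of the group is lost, this simple scheme
+ cannot recover anything, and the repaired stream equals the received
+ one.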
+
+2.1.26. Repaired RTP Stream
+
+ A repaired RTP stream is a received RTP stream (Section 2.1.23) for
+ which received redundancy RTP stream (Section 2.1.24) information has
+ been used to try to recover the source RTP stream (Section 2.1.10) as
+ it was before media transport (Section 2.1.15).
+
+2.1.27. Media Depacketizer
+
+ A Media Depacketizer takes one or more RTP streams (Section 2.1.10),
+ depacketizes them, and attempts to reconstitute the encoded streams
+ (Section 2.1.7) or dependent streams (Section 2.1.8) present in those
+ RTP streams.
+
+ In practical implementations, the media depacketizer and the media
+ decoder may be tightly coupled and share information to improve or
+ optimize the overall decoding and error concealment process. It is,
+ however, not expected that there would be any benefit in defining a
+ taxonomy for those detailed (and likely very implementation-
+ dependent) steps.
+
+2.1.28. Received Encoded Stream
+
+ The Received Encoded Stream is the received version of an encoded
+ stream (Section 2.1.7).
+
+2.1.29. Media Decoder
+
+ A media decoder is a transformation that is responsible for decoding
+ encoded streams (Section 2.1.7) and any dependent streams
+ (Section 2.1.8) into a source stream (Section 2.1.5).
+
+ In practical implementations, the media decoder and the media
+ depacketizer may be tightly coupled and share information to improve
+ or optimize the overall decoding process in various ways. It is,
+ however, not expected that there would be any benefit in defining a
+ taxonomy for those detailed (and likely very implementation-
+ dependent) steps.
+
+ A media decoder has to deal with any errors in the encoded streams
+ that resulted from corruption or failure to repair packet losses.
+ Therefore, it is commonly robust to errors and losses, and includes
+ concealment methods.
+
+2.1.30. Received Source Stream
+
+ The received source stream is the received version of a source stream
+ (Section 2.1.5).
+
+2.1.31. Media Sink
+
+ The Media Sink receives a source stream (Section 2.1.5) that
+ contains, usually periodically, sampled media data together with
+ associated synchronization information. Depending on the application,
+ this source stream then needs to be transformed into a raw stream
+ (Section 2.1.3) that is conveyed to the Media Render (Section 2.1.33)
+ and synchronized with the output from other media sinks. The media
+ sink may also be connected with a media source (Section 2.1.4) and be
+ used as part of a conceptual media source.
+
+ The media sink can further transform the source stream into a
+ representation that is suitable for rendering on the media render as
+ defined by the application or system-wide configuration. This
+ includes sample scaling, level adjustments, etc.
+
+2.1.32. Received Raw Stream
+
+ The Received Raw Stream is the received version of a raw stream
+ (Section 2.1.3).
+
+2.1.33. Media Render
+
+ A media render takes a raw stream (Section 2.1.3) and converts it
+ into physical stimulus (Section 2.1.1) that a human user can
+ perceive. Examples of such devices are screens and D/A converters
+ connected to amplifiers and loudspeakers.
+
+ An endpoint can potentially have multiple media renders for each
+ media type.
+
+2.2. Communication Entities
+
+ This section contains concepts for entities involved in the
+ communication.
+
+   +------------------------------------------------------------+
+   | Communication Session                                      |
+   |                                                            |
+   | +----------------+                      +----------------+ |
+   | | Participant A  |    +------------+    | Participant B  | |
+   | |                |    | Multimedia |    |                | |
+   | | +------------+ |<==>| Session    |<==>| +------------+ | |
+   | | | Endpoint A | |    |            |    | | Endpoint B | | |
+   | | |            | |    +------------+    | |            | | |
+   | | | +----------+-+----------------------+-+----------+ | | |
+   | | | | RTP      | |                      | |          | | | |
+   | | | | Session  |-+---Media Transport----+>|          | | | |
+   | | | | Audio    |<+---Media Transport----+-|          | | | |
+   | | | |          | |          ^           | |          | | | |
+   | | | +----------+-+----------|-----------+-+----------+ | | |
+   | | |            | |          v           | |            | | |
+   | | |            | | +-----------------+  | |            | | |
+   | | |            | | | Synchronization |  | |            | | |
+   | | |            | | |     Context     |  | |            | | |
+   | | |            | | +-----------------+  | |            | | |
+   | | |            | |          ^           | |            | | |
+   | | | +----------+-+----------|-----------+-+----------+ | | |
+   | | | | RTP      | |          v           | |          | | | |
+   | | | | Session  |<+---Media Transport----+-|          | | | |
+   | | | | Video    |-+---Media Transport----+>|          | | | |
+   | | | |          | |                      | |          | | | |
+   | | | +----------+-+----------------------+-+----------+ | | |
+   | | +------------+ |                      | +------------+ | |
+   | +----------------+                      +----------------+ |
+   +------------------------------------------------------------+
+
+ Figure 7: Example Point-to-Point Communication Session with Two RTP
+ Sessions
+
+ Figure 7 shows a high-level example representation of a very basic
+ point-to-point Communication Session between Participants A and B.
+ It uses two RTP sessions, one for audio and one for video, between A's
+ and B's endpoints, where each RTP session is a group communications
+ channel that can potentially carry a number of RTP streams. It uses
+ separate media transports for those RTP sessions. The
+ multimedia session shared by the participants can, for example, be
+ established using SIP (i.e., there is a SIP dialog between A and B).
+
+ The terms used in Figure 7 are further elaborated in the subsections
+ below.
+
+2.2.1. Endpoint
+
+ An endpoint is a single addressable entity sending or receiving RTP
+ packets. It may be decomposed into several functional blocks, but as
+ long as it behaves as a single RTP stack entity, it is classified as
+ a single "endpoint".
+
+ Characteristics:
+
+ o Endpoints can be identified in several different ways. While RTCP
+ Canonical Names (CNAMEs) [RFC3550] provide a globally unique and
+ stable identification mechanism for the duration of the
+ communication session (see Section 2.2.5), their validity applies
+ exclusively within a Synchronization Context (Section 3.1). Thus,
+ one endpoint can handle multiple CNAMEs, each of which can be
+ shared among a set of endpoints belonging to the same participant
+ (Section 2.2.3). Therefore, mechanisms outside the scope of RTP,
+ such as application-defined mechanisms, must be used to provide
+ endpoint identification when outside this synchronization context.
+
+ o An endpoint can be associated with at most one participant
+ (Section 2.2.3) at any single point in time.
+
+ o In some contexts, an endpoint would typically correspond to a
+ single "host", for example, a computer using a single network
+ interface and being used by a single human user. In other
+ contexts, a single "host" can serve multiple participants, in
+ which case each participant's endpoint may share properties, for
+ example, the IP address part of a transport address.
+
+2.2.2. RTP Session
+
+ An RTP session is an association among a group of participants
+ communicating with RTP. It is a group communications channel that
+ can potentially carry a number of RTP streams. Within an RTP
+ session, every participant can find metadata and control information
+ (over RTCP) about all the RTP streams in the RTP session. The
+ bandwidth of the RTCP control channel is shared between all
+ participants within an RTP session.
+
+ Characteristics:
+
+ o An RTP session can carry one or more RTP streams.
+
+ o An RTP session shares a single SSRC space as defined in [RFC3550].
+ That is, the endpoints participating in an RTP session can see an
+ SSRC identifier transmitted by any of the other endpoints. An
+ endpoint can receive an SSRC either as SSRC or as a contributing
+ source (CSRC) in RTP and RTCP packets, as defined by the
+ endpoints' network interconnection topology.
+
+ o An RTP session uses at least two media transports
+ (Section 2.1.15): one for sending and one for receiving.
+ Commonly, the receiving media transport is the reverse direction
+ of the media transport used for sending. An RTP session may use
+ many media transports and these define the session's network
+ interconnection topology.
+
+ o A single media transport always carries a single RTP session.
+
+ o Multiple RTP sessions can be conceptually related, for example,
+ originating from or targeted for the same participant
+ (Section 2.2.3) or endpoint (Section 2.2.1), or by containing RTP
+ streams that are somehow related (Section 3).
+
+2.2.3. Participant
+
+ A participant is an entity reachable by a single signaling address
+ and is thus related more to the signaling context than to the media
+ context.
+
+ Characteristics:
+
+ o A single signaling-addressable entity, using an application-
+ specific signaling address space, for example, a SIP URI.
+
+ o A participant can participate in several multimedia sessions
+ (Section 2.2.4).
+
+ o A participant can be composed of several associated endpoints
+ (Section 2.2.1).
+
+2.2.4. Multimedia Session
+
+ A multimedia session is an association among a group of participants
+ (Section 2.2.3) engaged in the communication via one or more RTP
+ sessions (Section 2.2.2). It defines logical relationships among
+ media sources (Section 2.1.4) that appear in multiple RTP sessions.
+
+ Characteristics:
+
+ o A multimedia session can be composed of several RTP sessions with
+ potentially multiple RTP streams per RTP session.
+
+ o Each participant in a multimedia session can have a multitude of
+ media captures and media rendering devices.
+
+ o A single multimedia session can contain media from one or more
+ synchronization contexts (Section 3.1). An example of that is a
+ multimedia session containing one set of audio and video for
+ communication purposes belonging to one synchronization context,
+ and another set of audio and video for presentation purposes (like
+ playing a video file) with a separate synchronization context that
+ has no strong timing relationship and need not be strictly
+ synchronized with the audio and video used for communication.
+
+2.2.5. Communication Session
+
+ A communication session is an association among two or more
+ participants (Section 2.2.3) communicating with each other via one or
+ more multimedia sessions (Section 2.2.4).
+
+ Characteristics:
+
+ o Each participant in a communication session is identified via an
+ application-specific signaling address.
+
+ o A communication session is composed of participants that share at
+ least one multimedia session, involving one or more parallel RTP
+ sessions with potentially multiple RTP streams per RTP session.
+
+ For example, in a full mesh communication, the communication session
+ consists of a set of separate multimedia sessions between each pair
+ of participants. Another example is a centralized conference, where
+ the communication session consists of a set of multimedia sessions
+ between each participant and the conference handler.
+
+3. Concepts of Inter-Relations
+
+ This section uses the concepts from previous sections and looks at
+ different types of relationships among them. These relationships
+ occur at different abstraction levels and for different purposes, but
+ the reason for the needed relationship at a certain step in the media
+ handling chain may exist at another step. For example, the use of
+ simulcast (Section 3.6) implies a need to determine relations at the
+ RTP stream level, but the underlying reason is that multiple media
+ encoders use the same media source, i.e., to be able to identify a
+ common media source.
+
+3.1. Synchronization Context
+
+ A synchronization context defines a requirement for a strong timing
+ relationship between the media sources, typically requiring alignment
+ of clock sources. Such a relationship can be identified in multiple
+ ways as listed below. A single media source can only belong to a
+ single synchronization context, since it is assumed that a single
+ media source can only have a single media clock and requiring
+ alignment to several synchronization contexts (and thus reference
+ clocks) will effectively merge those into a single synchronization
+ context.
+
+3.1.1. RTCP CNAME
+
+ [RFC3550] describes inter-media synchronization between RTP sessions
+ based on RTCP CNAME, RTP, and timestamps of a reference clock
+ formatted using the Network Time Protocol (NTP) [RFC5905]. As
+ indicated in [RFC7273], despite using NTP format timestamps, it is
+ not required that the clock be synchronized to an NTP source.
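+
+ A sketch of the timestamp mapping this relies on: an RTCP sender report
+ pairs a reference clock timestamp with the RTP timestamp of the same
+ instant, so a receiver can place RTP timestamps from streams that share
+ a CNAME on one timeline. The function name and the use of seconds as
+ floating-point values are assumptions made for brevity.
+
```python
def rtp_to_reference_seconds(rtp_ts: int, sr_rtp_ts: int,
                             sr_ref_seconds: float, clock_rate: int) -> float:
    """Map an RTP timestamp onto the sender's reference clock.

    sr_rtp_ts and sr_ref_seconds are the timestamp pair taken from the most
    recent sender report; clock_rate is the RTP clock rate in Hz.
    """
    # Signed 32-bit difference, so RTP timestamp wraparound is handled and
    # packets slightly older than the sender report map to earlier times.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_ref_seconds + delta / clock_rate
```
+
+ Audio at 48 kHz and video at 90 kHz that report the same reference
+ time in their sender reports can then be aligned by comparing the
+ returned values, regardless of their unrelated RTP timestamp offsets.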
+
+3.1.2. Clock Source Signaling
+
+ [RFC7273] provides a mechanism to signal the clock source in the
+ Session Description Protocol (SDP) [RFC4566] both for the reference
+ clock as well as the media clock, thus allowing a synchronization
+ context to be defined beyond the one defined by the usage of CNAME
+ source descriptions.
+
+3.1.3. Implicitly via RtcMediaStream
+
+ WebRTC defines RtcMediaStream with one or more RtcMediaStreamTracks.
+ All tracks in a RtcMediaStream are intended to be synchronized when
+ rendered, implying that they must be generated such that
+ synchronization is possible.
+
+3.1.4. Explicitly via SDP Mechanisms
+
+ The SDP Grouping Framework [RFC5888] defines an "m=" line
+ (Section 4.2) grouping mechanism called Lip Synchronization (with LS
+ identification-tag) for establishing the synchronization requirement
+ across "m=" lines when they map to individual sources.
+
+ Source-Specific Media Attributes in SDP [RFC5576] extends the above
+ mechanism when multiple media sources are described by a single "m="
+ line.
+
+3.2. Endpoint
+
+ Some applications require knowledge of what media sources originate
+ from a particular endpoint (Section 2.2.1). This knowledge can inform
+ decisions such as packet routing between parts of the topology, which
+ depend on knowing the endpoint origin of the RTP streams.
+
+ In RTP, this identification has been overloaded with the
+ synchronization context (Section 3.1) through the usage of the RTCP
+ source description CNAME (Section 3.1.1). This works for some
+ usages, but in others it breaks down. For example, if an endpoint
+ has two sets of media sources that have different synchronization
+ contexts, like the audio and video of the human participant as well
+ as a set of media sources of audio and video for a shared movie,
+ CNAME would not be an appropriate identification for that endpoint.
+ Therefore, an endpoint may have multiple CNAMEs. The CNAMEs or the
+ media sources themselves can be related to the endpoint.
+
+3.3. Participant
+
+ In communication scenarios, information about which media sources
+ originate from which participant (Section 2.2.3) is commonly needed.
+ One reason is, for example, to enable the application to correctly
+ display participant identity information associated with the media
+ sources. This association is handled through signaling to point at a
+ specific multimedia session where the media sources may be explicitly
+ or implicitly tied to a particular endpoint.
+
+ Participant information becomes more problematic when there are media
+ sources that are generated through mixing or other conceptual
+ processing of raw streams or source streams that originate from
+ different participants. These types of media sources can thus have a
+ dynamically varying set of origins and participants. RTP contains
+ the concept of CSRC that carries information about the previous step
+ origin of the included media content on the RTP level.
+
+3.4. RtcMediaStream
+
+ An RtcMediaStream in WebRTC is an explicit grouping of a set of media
+ sources (RtcMediaStreamTracks) that share a common identifier and a
+ single synchronization context (Section 3.1).
+
+3.5. Multi-Channel Audio
+
+ There exist a number of RTP payload formats that can carry multi-
+ channel audio, despite the codec being a single-channel (mono)
+ encoder. Multi-channel audio can be viewed as multiple media sources
+ sharing a common synchronization context. These are independently
+ encoded by a media encoder and the different encoded streams are
+ packetized together in a time-synchronized way into a single source
+ RTP stream, using the codec's RTP payload format. Examples of
+ codecs that support multi-channel audio are PCMA and PCMU [RFC3551],
+ Adaptive Multi Rate (AMR) [RFC4867], and G.719 [RFC5404].
+
+3.6. Simulcast
+
+ A media source represented as multiple independent encoded streams
+ constitutes a simulcast [SDP-SIMULCAST] or Multiple Description
+ Coding (MDC) of that media source. Figure 8 shows an example of a
+ media source that is encoded into three separate simulcast streams
+ that are in turn sent on the same media transport flow. When using
+ simulcast, the RTP streams may be sharing an RTP session and media
+ transport, or be separated on different RTP sessions and media
+ transports, or be any combination of these two. One major reason to
+ use separate media transports is to make use of different quality of
+ service (QoS) for the different source RTP streams. Some
+ considerations on separating related RTP streams are discussed in
+ Section 3.12.
+
+                           +----------------+
+                           |  Media Source  |
+                           +----------------+
+                    Source Stream  |
+            +----------------------+----------------------+
+            |                      |                      |
+            V                      V                      V
+   +------------------+   +------------------+   +------------------+
+   |  Media Encoder   |   |  Media Encoder   |   |  Media Encoder   |
+   +------------------+   +------------------+   +------------------+
+            | Encoded              | Encoded              | Encoded
+            | Stream               | Stream               | Stream
+            V                      V                      V
+   +------------------+   +------------------+   +------------------+
+   | Media Packetizer |   | Media Packetizer |   | Media Packetizer |
+   +------------------+   +------------------+   +------------------+
+            | Source               | Source               | Source
+            | RTP                  | RTP                  | RTP
+            | Stream               | Stream               | Stream
+            +-----------------+    |    +-----------------+
+                              |    |    |
+                              V    V    V
+                         +-------------------+
+                         |  Media Transport  |
+                         +-------------------+
+
+ Figure 8: Example of Media Source Simulcast
+
+ The simulcast relation between the RTP streams is the common media
+ source. In addition to being able to identify the common media source,
+ a receiver of the RTP stream may need to know which configuration or
+ encoding goals lay behind the produced encoded stream and its
+ properties. This enables selection of the stream that is most useful
+ in the application at that moment.
+
+3.7. Layered Multi-Stream
+
+ Layered Multi-Stream (LMS) is a mechanism by which different portions
+ of a layered or scalable encoding of a source stream are sent using
+ separate RTP streams (sometimes in separate RTP sessions). LMSs are
+ useful for receiver control of layered media.
+
+ A media source represented as an encoded stream and multiple
+ dependent streams constitutes a media source that has layered
+ dependencies. Figure 9 represents an example of a media source that
+ is encoded into three dependent layers, where two layers are sent on
+ the same media transport using different RTP streams, i.e., SSRCs,
+ and the third layer is sent on a separate media transport.
+
+                          +----------------+
+                          |  Media Source  |
+                          +----------------+
+                                  |
+                                  |
+                                  V
+     +---------------------------------------------------------+
+     |                      Media Encoder                      |
+     +---------------------------------------------------------+
+             |                    |                    |
+      Encoded Stream      Dependent Stream     Dependent Stream
+             |                    |                    |
+             V                    V                    V
+     +----------------+   +----------------+   +----------------+
+     |Media Packetizer|   |Media Packetizer|   |Media Packetizer|
+     +----------------+   +----------------+   +----------------+
+             |                    |                    |
+        RTP Stream           RTP Stream           RTP Stream
+             |                    |                    |
+             +------+      +------+                    |
+                    |      |                           |
+                    V      V                           V
+              +-----------------+             +-----------------+
+              | Media Transport |             | Media Transport |
+              +-----------------+             +-----------------+
+
+ Figure 9: Example of Media Source Layered Dependency
+
+ It is sometimes useful to make a distinction between using a single
+ media transport or multiple separate media transports when (in both
+ cases) using multiple RTP streams to carry encoded streams and
+ dependent streams for a media source. Therefore, the following new
+ terminology is defined here:
+
+ SRST: Single RTP stream on a Single media Transport
+
+ MRST: Multiple RTP streams on a Single media Transport
+
+ MRMT: Multiple RTP streams on Multiple media Transports
+
+ MRST and MRMT relations need to identify the common media encoder
+ origin for the encoded and dependent streams. When using different
+ RTP sessions (MRMT), a single RTP stream per media encoder, and a
+ single media source in each RTP session, common SSRCs and CNAMEs can
+ be used to identify the common media source. When multiple RTP
+ streams are sent from one media encoder in the same RTP session
+ (MRST), then CNAME is the only currently specified RTP identifier
+ that can be used. In cases where multiple media encoders use
+ multiple media sources sharing synchronization context, and thus have
+ a common CNAME, additional heuristics or identification need to be
+ applied to create the MRST or MRMT relationships between the RTP
+ streams.
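+
+ The three terms can be told apart mechanically, as in the hedged
+ sketch below. It assumes that the caller has already grouped the RTP
+ streams belonging to one media encoder, for example using the
+ identifiers discussed above; classify and the opaque stream and
+ transport labels are hypothetical.
+
```python
from typing import Hashable, Iterable, Tuple


def classify(streams: Iterable[Tuple[Hashable, Hashable]]) -> str:
    """Name the relation among RTP streams that share a media encoder.

    streams holds (stream_id, transport_id) pairs. stream_id is an opaque
    label for one RTP stream, for example an (RTP session, SSRC) pair, since
    the same SSRC value may be reused in different RTP sessions.
    """
    pairs = set(streams)
    if not pairs:
        raise ValueError("at least one RTP stream is required")
    stream_ids = {stream_id for stream_id, _ in pairs}
    transport_ids = {transport_id for _, transport_id in pairs}
    if len(stream_ids) == 1 and len(transport_ids) == 1:
        return "SRST"
    if len(stream_ids) > 1 and len(transport_ids) == 1:
        return "MRST"
    if len(stream_ids) > 1 and len(transport_ids) > 1:
        return "MRMT"
    raise ValueError("one RTP stream on several transports is not covered")
```
+
+ Applied to the arrangement in Figure 9, where two layers share one
+ media transport and the third uses another, the result is MRMT.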
+
+3.8. RTP Stream Duplication
+
+ RTP Stream Duplication [RFC7198], using the same or different media
+ transports, and optionally also delaying the duplicate [RFC7197],
+ offers a simple way to protect media flows from packet loss in some
+ cases (see Figure 10). This is a specific type of redundancy. All
+ but one source RTP stream (Section 2.1.10) are effectively redundancy
+ RTP streams (Section 2.1.12), but since both source and redundant RTP
+ streams are the same, it does not matter which one is which. This
+ can also be seen as a specific type of simulcast (Section 3.6) that
+ transmits the same encoded stream (Section 2.1.7) multiple times.
+
+                       +----------------+
+                       |  Media Source  |
+                       +----------------+
+                 Source Stream  |
+                                V
+                       +----------------+
+                       | Media Encoder  |
+                       +----------------+
+                Encoded Stream  |
+                    +-----------+-----------+
+                    |                       |
+                    V                       V
+           +------------------+    +------------------+
+           | Media Packetizer |    | Media Packetizer |
+           +------------------+    +------------------+
+             Source | RTP Stream     Source | RTP Stream
+                    |                       V
+                    |                +-------------+
+                    |                | Delay (opt) |
+                    |                +-------------+
+                    |                       |
+                    +-----------+-----------+
+                                |
+                                V
+                      +-------------------+
+                      |  Media Transport  |
+                      +-------------------+
+
+ Figure 10: Example of RTP Stream Duplication
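+
+ On the receiving side, duplication implies that copies of the same
+ packet are merged back into one stream. A minimal sketch, assuming
+ that duplicates carry the same RTP sequence number and ignoring
+ sequence number wraparound and memory bounds; drop_duplicates is a
+ hypothetical name:
+
```python
from typing import Iterable, Iterator, Tuple


def drop_duplicates(
        packets: Iterable[Tuple[int, bytes]]) -> Iterator[Tuple[int, bytes]]:
    """Yield each (sequence number, payload) pair once, in arrival order."""
    seen = set()
    for seq, payload in packets:
        if seq in seen:
            continue  # a copy of this packet was already delivered
        seen.add(seq)
        yield seq, payload
```
+
+ A packet that is lost in one copy of the stream but arrives in the
+ other is delivered once, which is the protection that duplication
+ offers.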
+
+3.9. Redundancy Format
+
+ "RTP Payload for Redundant Audio Data" [RFC2198] defines a transport
+ for redundant audio data together with primary data in the same RTP
+ payload. The redundant data can be a time-delayed version of the
+ primary or another time-delayed encoded stream using a different
+ media encoder to encode the same media source as the primary, as
+ depicted in Figure 11.
+
+               +--------------------+
+               |    Media Source    |
+               +--------------------+
+                         |
+                   Source Stream
+                         |
+                         +------------------------+
+                         |                        |
+                         V                        V
+               +--------------------+   +--------------------+
+               |   Media Encoder    |   |   Media Encoder    |
+               +--------------------+   +--------------------+
+                         |                        |
+                         |                 +------------+
+                  Encoded Stream           | Time Delay |
+                         |                 +------------+
+                         |                        |
+                         |     +------------------+
+                         V     V
+               +--------------------+
+               |  Media Packetizer  |
+               +--------------------+
+                         |
+                         V
+                    RTP Stream
+
+ Figure 11: Concept for Usage of Audio Redundancy with Different Media
+ Encoders
+
+ The redundancy format thus provides the necessary meta information to
+ correctly relate different parts of the same encoded stream. In the
+ case depicted above (Figure 11), it relates the received source stream
+ fragments coming out of different media decoders so that they can be
+ combined into a less erroneous source stream.
+
+3.10. RTP Retransmission
+
+ Figure 12 shows an example where a media source's source RTP stream
+ is protected by a retransmission (RTX) flow [RFC4588]. In this
+ example, the source RTP stream and the redundancy RTP stream share
+ the same media transport.
+
+          +--------------------+
+          |    Media Source    |
+          +--------------------+
+                    |
+                    V
+          +--------------------+
+          |   Media Encoder    |
+          +--------------------+
+                    |                              Retransmission
+             Encoded Stream      +--------+     +---- Request
+                    V            |        V     V
+          +--------------------+ | +--------------------+
+          |  Media Packetizer  | | | RTP Retransmission |
+          +--------------------+ | +--------------------+
+                    |            |           |
+                    +------------+ Redundancy RTP Stream
+            Source RTP Stream                |
+                    |                        |
+                    +---------+    +---------+
+                              |    |
+                              V    V
+                       +-----------------+
+                       | Media Transport |
+                       +-----------------+
+
+ Figure 12: Example of Media Source Retransmission Flows
+
+ The RTP retransmission example (Figure 12) illustrates that this
+ mechanism works purely on the source RTP stream. The RTP
+ retransmission transformation buffers the sent source RTP stream
+ and, upon request, emits a retransmitted packet with an extra payload
+ header as a redundancy RTP stream. The RTP retransmission mechanism
+ [RFC4588] is specified such that there is a one-to-one relation
+ between the source RTP stream and the redundancy RTP stream.
+ Therefore, a redundancy RTP stream needs to be associated with its
+ source RTP stream. This is done based on CNAME selectors and
+ heuristics to match requested packets for a given source RTP stream
+ with the original sequence number in the payload of any new
+ redundancy RTP stream using the RTX payload format. In cases where
+ the redundancy RTP stream is sent in a different RTP session than the
+ source RTP stream, the RTP session relation is signaled by using the
+ SDP media grouping's [RFC5888] Flow Identification (FID
+ identification-tag) semantics.
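+
+ As a minimal sketch of the association step described above, the
+ following Python models a sender-side retransmission buffer and the
+ RTX payload layout of [RFC4588], in which the 2-octet original
+ sequence number (OSN) precedes the original payload. The names
+ RetransmissionBuffer, make_rtx_payload, and parse_rtx_payload are
+ illustrative assumptions and are not taken from any specification;
+ RTP headers, SSRC handling, and CNAME matching are left out.
+
```python
import struct
from collections import OrderedDict
from typing import Optional, Tuple


def make_rtx_payload(original_seq: int, original_payload: bytes) -> bytes:
    """Build an RTX payload: 16-bit OSN followed by the original payload."""
    return struct.pack("!H", original_seq & 0xFFFF) + original_payload


def parse_rtx_payload(rtx_payload: bytes) -> Tuple[int, bytes]:
    """Split an RTX payload into (original sequence number, payload)."""
    if len(rtx_payload) < 2:
        raise ValueError("RTX payload too short to carry an OSN")
    (osn,) = struct.unpack("!H", rtx_payload[:2])
    return osn, rtx_payload[2:]


class RetransmissionBuffer:
    """Hypothetical sender-side buffer of recently sent source packets."""

    def __init__(self, capacity: int = 512) -> None:
        self._capacity = capacity
        self._sent: "OrderedDict[int, bytes]" = OrderedDict()

    def store(self, seq: int, payload: bytes) -> None:
        """Remember a sent source packet, evicting the oldest when full."""
        self._sent[seq & 0xFFFF] = payload
        while len(self._sent) > self._capacity:
            self._sent.popitem(last=False)

    def retransmit(self, seq: int) -> Optional[bytes]:
        """Return the RTX payload for a requested packet, if still buffered."""
        payload = self._sent.get(seq & 0xFFFF)
        if payload is None:
            return None
        return make_rtx_payload(seq, payload)
```
+
+ A receiver that requested a given sequence number can compare it
+ with the OSN returned by parse_rtx_payload to match the redundancy
+ RTP stream packet with the missing source RTP stream packet.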
+
+3.11. Forward Error Correction
+
+ Figure 13 shows an example where two media sources' source RTP
+ streams are protected by FEC. Source RTP stream A has an RTP-based
+ redundancy transformation in FEC encoder 1. This produces a
+ redundancy RTP stream 1, that is only related to source RTP stream A.
+ The FEC encoder 2, however, takes two source RTP streams (A and B)
+ and produces a redundancy RTP stream 2 that protects them jointly,
+ i.e., redundancy RTP stream 2 relates to two source RTP streams (a
+ FEC group). FEC decoding, when needed due to packet loss or packet
+ corruption at the receiver, requires knowledge of which source RTP
+ streams the FEC encoding was based on.
+
+ In Figure 13, all RTP streams are sent on the same media transport.
+ This is, however, not the only possible choice. Numerous
+ combinations exist for spreading these RTP streams over different
+ media transports to achieve the communication application's goal.
+
+   +--------------------+                +--------------------+
+   |   Media Source A   |                |   Media Source B   |
+   +--------------------+                +--------------------+
+             |                                     |
+             V                                     V
+   +--------------------+                +--------------------+
+   |  Media Encoder A   |                |  Media Encoder B   |
+   +--------------------+                +--------------------+
+             |                                     |
+      Encoded Stream                        Encoded Stream
+             V                                     V
+   +--------------------+                +--------------------+
+   | Media Packetizer A |                | Media Packetizer B |
+   +--------------------+                +--------------------+
+             |                                     |
+    Source RTP Stream A                   Source RTP Stream B
+             |                                     |
+       +-----+---------+-------------+         +---+---+
+       |               V             V         V       |
+       |       +---------------+  +---------------+    |
+       |       | FEC Encoder 1 |  | FEC Encoder 2 |    |
+       |       +---------------+  +---------------+    |
+       |  Redundancy   |     Redundancy   |            |
+       |  RTP Stream 1 |     RTP Stream 2 |            |
+       V               V                  V            V
+   +----------------------------------------------------------+
+   |                     Media Transport                      |
+   +----------------------------------------------------------+
+
+ Figure 13: Example of FEC Redundancy RTP Streams
+
+ As FEC encoding exists in various forms, the methods for relating FEC
+ redundancy RTP streams with their source information in source RTP
+ streams are many. The XOR-based RTP FEC payload format [RFC5109] is
+ defined in such a way that a redundancy RTP stream has a one-to-one
+ relation with a source RTP stream. In fact, the RFC requires the
+ redundancy RTP stream to use the same SSRC as the source RTP stream.
+ This requires the use of either a separate RTP session or the
+ redundancy RTP payload format [RFC2198]. The underlying relation
+ requirement for this FEC format and a particular redundancy RTP
+ stream is to know the related source RTP stream, including its SSRC.
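+
+ The principle behind XOR-based protection can be sketched as follows.
+ This is a deliberately simplified illustration that operates on bare
+ payloads only; it is not the packet format of [RFC5109], which also
+ defines a FEC header and protects header fields and lengths. The
+ function names xor_parity and recover_missing are assumptions made
+ for this example.
+
```python
from typing import Iterable, List


def xor_parity(payloads: Iterable[bytes]) -> bytes:
    """XOR payloads together, zero-padding shorter ones to the longest."""
    items: List[bytes] = list(payloads)
    if not items:
        raise ValueError("at least one payload is required")
    parity = bytearray(max(len(p) for p in items))
    for payload in items:
        for i, octet in enumerate(payload):
            parity[i] ^= octet
    return bytes(parity)


def recover_missing(received: Iterable[bytes], parity: bytes) -> bytes:
    """Recover the single missing payload of a protected group.

    The result is zero-padded to the parity length, because this sketch
    does not protect the original payload length.
    """
    return xor_parity(list(received) + [parity])
```
+
+ Recovery only works when the receiver knows exactly which source
+ packets went into the parity, which is why the relation between the
+ redundancy RTP stream and its source RTP stream has to be known.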
+
+3.12. RTP Stream Separation
+
+ RTP streams can be separated exclusively based on their SSRCs, at the
+ RTP session level, or at the multimedia session level.
+
+ When the RTP streams that have a relationship are all sent in the
+ same RTP session and are uniquely identified based on their SSRC
+ only, it is termed an "SSRC-only-based separation". Such streams can
+ be related via RTCP CNAME to identify that the streams belong to the
+ same endpoint. SSRC-based approaches [RFC5576], when used, can
+ explicitly relate various such RTP streams.
+
+ On the other hand, when RTP streams that are related are sent in the
+ context of different RTP sessions to achieve separation, it is known
+ as "RTP session-based separation". This is commonly used when the
+ different RTP streams are intended for different media transports.
+
+ Several mechanisms that use RTP session-based separation rely on it
+ as a grouping mechanism expressing the relationship. The solutions
+ have been based on using the same SSRC value in the different RTP
+ sessions to implicitly indicate their relation. That way, no
+ explicit RTP level mechanism has been needed; only signaling level
+ relations have been established using semantics from the media-line
+ grouping framework [RFC5888]. Examples of this are RTP
+ retransmission [RFC4588], SVC Multi-Session Transmission [RFC6190],
+ and XOR-based FEC [RFC5109]. RTCP CNAME explicitly relates RTP
+ streams across different RTP sessions, as explained in the previous
+ section. Such a relationship can be used to perform inter-media
+ synchronization.
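+
+ The two kinds of relation mentioned above, explicit via RTCP CNAME
+ and implicit via a shared SSRC value, can be sketched as simple
+ lookups. The RtpStreamId type, the session labels, and both function
+ names are hypothetical and serve only to illustrate that an SSRC is
+ scoped to one RTP session while a CNAME spans RTP sessions.
+
```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class RtpStreamId:
    """Hypothetical identifier: an SSRC is only unique within one RTP session."""
    rtp_session: str
    ssrc: int


def group_by_cname(cnames: Dict[RtpStreamId, str]) -> Dict[str, List[RtpStreamId]]:
    """Group RTP streams from any RTP session by their RTCP CNAME."""
    groups: Dict[str, List[RtpStreamId]] = defaultdict(list)
    for stream, cname in cnames.items():
        groups[cname].append(stream)
    return dict(groups)


def ssrcs_shared_across_sessions(streams: Iterable[RtpStreamId]) -> Dict[int, List[str]]:
    """Find SSRC values that appear in more than one RTP session."""
    sessions_by_ssrc: Dict[int, Set[str]] = defaultdict(set)
    for stream in streams:
        sessions_by_ssrc[stream.ssrc].add(stream.rtp_session)
    return {
        ssrc: sorted(sessions)
        for ssrc, sessions in sessions_by_ssrc.items()
        if len(sessions) > 1
    }
```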
+
+ RTP streams that are related and need to be associated can be part of
+ different multimedia sessions, rather than just different RTP
+ sessions within the same multimedia session context. This puts
+ further demand on the scope of the mechanism(s) and its handling of
+ identifiers used for expressing the relationships.
+
+3.13. Multiple RTP Sessions over one Media Transport
+
+ [TRANSPORT-MULTIPLEX] describes a mechanism that allows several RTP
+ sessions to be carried over a single underlying media transport. The
+ main reasons for doing this are related to the impact of using one or
+ more media transports (using a common network path or potentially
+ having different ones). The fewer media transports used, the less
+ need there is for NAT/firewall traversal resources and the fewer
+ flows require flow-based QoS.
+
+ However, multiple RTP sessions over one media transport imply that a
+ single media transport 5-tuple is not sufficient to express in which
+ RTP session context a particular RTP stream exists. Complexities in
+ the relationship between media transports and RTP sessions already
+ exist as one RTP session contains multiple media transports, e.g.,
+ even a Peer-to-Peer RTP Session with RTP/RTCP Multiplexing requires
+ two media transports, one in each direction. The relationship
+ between media transports and RTP sessions as well as additional
+ levels of identifiers needs to be considered in both signaling design
+ and when defining terminology.
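+
+ The consequence for demultiplexing can be sketched as follows: once
+ several RTP sessions share one media transport, the lookup key has
+ to carry an additional identifier besides the 5-tuple. The
+ session_id field below is a hypothetical stand-in for such an
+ identifier; how it is actually conveyed is defined by the
+ multiplexing mechanism and is not modeled here. FiveTuple and
+ SessionDemux are likewise names assumed for this example.
+
```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FiveTuple:
    """Transport-level identification of one media transport direction."""
    protocol: str
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


class SessionDemux:
    """Maps (5-tuple, hypothetical session_id) to an RTP session label."""

    def __init__(self) -> None:
        self._sessions: Dict[Tuple[FiveTuple, int], str] = {}

    def register(self, transport: FiveTuple, session_id: int, rtp_session: str) -> None:
        """Associate an RTP session with a transport and session identifier."""
        self._sessions[(transport, session_id)] = rtp_session

    def lookup(self, transport: FiveTuple, session_id: int) -> Optional[str]:
        """Return the RTP session for a packet, or None if unknown."""
        return self._sessions.get((transport, session_id))
```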
+
+4. Mapping from Existing Terms
+
+ This section describes a selected set of terms from some relevant
+ RFCs and Internet-Drafts (at the time of writing), using the concepts
+ from previous sections.
+
+4.1. Telepresence Terms
+
+ The terms in this subsection are used in the context of CLUE
+ [CLUE-FRAME]. Note that some terms listed in this subsection use the
+ same names as terms defined elsewhere in this document. Unless
+ explicitly marked as "RTP Taxonomy", terms used in this subsection
+ are to be read as references to the CLUE-specific terms defined
+ within this subsection.
+
+4.1.1. Audio Capture
+
+ Defined in CLUE as a Media Capture (Section 4.1.7) for audio.
+ Describes an audio media source (Section 2.1.4).
+
+4.1.2. Capture Device
+
+ Defined in CLUE as a device that converts physical input into an
+ electrical signal. Identifies a physical entity performing an RTP
+ Taxonomy media capture (Section 2.1.2) transformation.
+
+4.1.3. Capture Encoding
+
+ Defined in CLUE as a specific Encoding (Section 4.1.6) of a Media
+ Capture (Section 4.1.7). Describes an encoded stream (Section 2.1.7)
+ related to CLUE-specific semantic information.
+
+4.1.4. Capture Scene
+
+ Defined in CLUE as a structure representing a spatial region captured
+ by one or more Capture Devices (Section 4.1.2), each capturing media
+ representing a portion of the region. Describes a set of spatially
+ related media sources (Section 2.1.4).
+
+4.1.5. Endpoint
+
+ Defined in CLUE as a CLUE-capable device that is the logical point of
+ final termination through receiving, decoding, and rendering and/or
+ initiation through capturing, encoding, and sending of media Streams
+ (Section 4.1.10). CLUE further defines it to consist of one or more
+ physical devices with source and sink media streams, and exactly one
+ participant [RFC4353]. Describes exactly one participant
+ (Section 2.2.3) and one or more RTP Taxonomy endpoints
+ (Section 2.2.1).
+
+4.1.6. Individual Encoding
+
+ Defined in CLUE as a set of parameters representing a way to encode a
+ Media Capture (Section 4.1.7) to become a Capture Encoding
+ (Section 4.1.3). Describes the configuration information needed to
+ perform a media encoder (Section 2.1.6) transformation.
+
+4.1.7. Media Capture
+
+ Defined in CLUE as a source of media, such as from one or more
+ Capture Devices (Section 4.1.2) or constructed from other media
+ Streams (Section 4.1.10). Describes either an RTP Taxonomy media
+ capture (Section 2.1.2) or a media source (Section 2.1.4), depending
+ on the context in which the term is used.
+
+4.1.8. Media Consumer
+
+ Defined in CLUE as a CLUE-capable device that intends to receive
+ Capture Encodings (Section 4.1.3). Describes the media receiving
+ part of an RTP Taxonomy endpoint (Section 2.2.1).
+
+4.1.9. Media Provider
+
+ Defined in CLUE as a CLUE-capable device that intends to send Capture
+ Encodings (Section 4.1.3). Describes the media sending part of an
+ RTP Taxonomy endpoint (Section 2.2.1).
+
+4.1.10. Stream
+
+ Defined in CLUE as a Capture Encoding (Section 4.1.3) sent from a
+ Media Provider (Section 4.1.9) to a Media Consumer (Section 4.1.8)
+ via RTP. Describes an RTP stream (Section 2.1.10).
+
+4.1.11. Video Capture
+
+ Defined in CLUE as a Media Capture (Section 4.1.7) for video.
+ Describes a video media source (Section 2.1.4).
+
+4.2. Media Description
+
+ A single Session Description Protocol (SDP) [RFC4566] Media
+ Description (or media block; an "m=" line and all subsequent lines
+ until the next "m=" line or the end of the SDP) describes part of the
+ necessary configuration and identification information needed for a
+ media encoder transformation, as well as the necessary configuration
+ and identification information for the media decoder to be able to
+ correctly interpret a received RTP stream.
+
+ A media description typically relates to a single media source. This
+ is, for example, an explicit restriction in WebRTC. However, nothing
+ prevents the same media description (and same RTP session) from being
+ reused for multiple media sources [RTP-MULTI-STREAM]. It can thus
+ describe properties of one or more RTP streams, and can also describe
+ properties valid for an entire RTP session (via [RFC5576] mechanisms,
+ for example).
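+
+ The definition of a media description given above (an "m=" line and
+ all subsequent lines until the next "m=" line or the end of the SDP)
+ can be sketched as a small splitter. The function name
+ split_media_descriptions is an assumption for this example, and the
+ code performs no validation of the SDP syntax.
+
```python
from typing import List, Tuple


def split_media_descriptions(sdp: str) -> Tuple[List[str], List[List[str]]]:
    """Return (session-level lines, list of media descriptions)."""
    session_lines: List[str] = []
    media: List[List[str]] = []
    for line in sdp.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("m="):
            # Each "m=" line opens a new media description.
            media.append([line])
        elif media:
            # Lines after an "m=" line belong to that media description.
            media[-1].append(line)
        else:
            # Lines before the first "m=" line are session-level.
            session_lines.append(line)
    return session_lines, media
```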
+
+4.3. Media Stream
+
+ RTP [RFC3550] uses media stream, audio stream, video stream, and a
+ stream of (RTP) packets interchangeably, which are all RTP streams.
+
+4.4. Multimedia Conference
+
+ A Multimedia Conference is a communication session (Section 2.2.5)
+ between two or more participants (Section 2.2.3), along with the
+ software they are using to communicate.
+
+4.5. Multimedia Session
+
+ SDP [RFC4566] defines a multimedia session as a set of multimedia
+ senders and receivers and the data streams flowing from senders to
+ receivers, which would correspond to a set of endpoints and the RTP
+ streams that flow between them. In this document, multimedia session
+ (Section 2.2.4) also assumes those endpoints belong to a set of
+ participants that are engaged in communication via a set of related
+ RTP streams.
+
+ RTP [RFC3550] defines a multimedia session as a set of concurrent RTP
+ sessions among a common group of participants. For example, a video
+ conference may contain an audio RTP session and a video RTP session.
+ This would correspond to a group of participants (each using one or
+ more endpoints) sharing a set of concurrent RTP sessions. In this
+ document, multimedia session also defines those RTP sessions to have
+ some relation and be part of a communication among the participants.
+
+4.6. Multipoint Control Unit (MCU)
+
+ This term is commonly used to describe the central node in any type
+ of star topology [RTP-TOPOLOGIES] conference. It describes a device
+ that includes one participant (Section 2.2.3) (usually corresponding
+ to a so-called conference focus) and one or more related endpoints
+ (Section 2.2.1) (sometimes one or more per conference participant).
+
+4.7. Multi-Session Transmission (MST)
+
+ One of two transmission modes defined in H.264-based SVC [RFC6190],
+ the other mode being a Single-Session Transmission (SST)
+ (Section 4.14). In Multi-Session Transmission (MST), the SVC media
+ encoder sends encoded streams and dependent streams distributed
+ across two or more RTP streams in one or more RTP sessions. The term
+ "MST" is ambiguous in RFC 6190, especially since the name indicates
+ the use of multiple "sessions", while MST-type packetization is in
+ fact required whenever two or more RTP streams are used for the
+ encoded and dependent streams, regardless of whether those are sent
+ in one or more RTP sessions. MST corresponds to either the MRST or
+ MRMT (Section 3.7) stream relations defined in this document. The SVC
+ RTP payload RFC
+ [RFC6190] is not particularly explicit about how the common media
+ encoder (Section 2.1.6) relation between encoded streams
+ (Section 2.1.7) and dependent streams (Section 2.1.8) is to be
+ implemented.
+
+4.8. Recording Device
+
+ WebRTC specifications use this term to refer to locally available
+ entities performing a media capture (Section 2.1.2) transformation.
+
+4.9. RtcMediaStream
+
+ A WebRTC RtcMediaStream is a set of media sources (Section 2.1.4)
+ sharing the same synchronization context (Section 3.1).
+
+4.10. RtcMediaStreamTrack
+
+ A WebRTC RtcMediaStreamTrack is a media source (Section 2.1.4).
+
+4.11. RTP Receiver
+
+ RTP [RFC3550] uses this term, which can be seen as the RTP protocol
+ part of a media depacketizer (Section 2.1.27).
+
+4.12. RTP Sender
+
+ RTP [RFC3550] uses this term, which can be seen as the RTP protocol
+ part of a media packetizer (Section 2.1.9).
+
+4.13. RTP Session
+
+ Within the context of SDP, a single "m=" line can map to a single RTP
+ session (Section 2.2.2), or multiple "m=" lines can map to a single
+ RTP session. The latter is enabled via multiplexing schemes such as
+ BUNDLE [SDP-BUNDLE], for example, which allows mapping of multiple
+ "m=" lines to a single RTP session.
+
+4.14. Single-Session Transmission (SST)
+
+ One of two transmission modes defined in H.264-based SVC [RFC6190],
+ the other mode being MST (Section 4.7). In SST, the SVC media
+ encoder sends encoded streams (Section 2.1.7) and dependent streams
+ (Section 2.1.8) combined into a single RTP stream (Section 2.1.10) in
+ a single RTP session (Section 2.2.2), using the SVC RTP payload
+ format. The term "SST" is ambiguous in RFC 6190, in that it
+ sometimes refers to the use of a single RTP stream, like in sections
+ relating to packetization, and sometimes appears to refer to use of a
+ single RTP session, like in the context of discussing SDP. SST
+ closely corresponds to SRST (Section 3.7) defined in this document.
+
+4.15. SSRC
+
+ RTP [RFC3550] defines this as "the source of a stream of RTP
+ packets", which indicates that an SSRC is not only a unique
+ identifier for the encoded stream (Section 2.1.7) carried in those
+ packets but is also effectively used as a term to denote a media
+ packetizer (Section 2.1.9). In [RFC3550], it is stated that "a
+ synchronization source may change its data format, e.g., audio
+ encoding, over time". The related encoded stream data format in an
+ RTP stream (Section 2.1.10) is identified by the RTP payload type.
+ Changing the data format for an encoded stream effectively also
+ changes what media encoder (Section 2.1.6) is used for the encoded
+ stream. No ambiguity is introduced to SSRC as an encoded stream
+ identifier by allowing RTP payload type changes, as long as only a
+ single RTP payload type is valid for any given RTP Timestamp. This
+ is aligned with and further described by Section 5.2 of [RFC3550].
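+
+ The condition that only a single RTP payload type is valid for any
+ given RTP Timestamp can be sketched as a check over parsed packet
+ headers. The parser below reads the 12-octet fixed RTP header of
+ [RFC3550]; the names RtpHeader, parse_rtp_header, and
+ PayloadTypeChecker are assumptions for this example, and the checker
+ keeps state without bound, which a real implementation would not.
+
```python
import struct
from typing import Dict, NamedTuple, Tuple


class RtpHeader(NamedTuple):
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-octet fixed RTP header (CSRCs and extensions ignored)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    # The low 7 bits of the second octet carry the payload type.
    return RtpHeader(second & 0x7F, seq, timestamp, ssrc)


class PayloadTypeChecker:
    """Hypothetical check: one payload type per (SSRC, RTP timestamp)."""

    def __init__(self) -> None:
        self._seen: Dict[Tuple[int, int], int] = {}

    def is_consistent(self, header: RtpHeader) -> bool:
        """Return False if this timestamp was seen with another payload type."""
        key = (header.ssrc, header.timestamp)
        previous = self._seen.setdefault(key, header.payload_type)
        return previous == header.payload_type
```
+
+ A payload type change between different timestamps of the same SSRC
+ passes this check; only two payload types for one timestamp fail it.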
+
+5. Security Considerations
+
+ The purpose of this document is to make clarifications and reduce the
+ confusion prevalent in RTP taxonomy because of inconsistent usage by
+ multiple technologies and protocols making use of the RTP protocol.
+ It does not introduce any new security considerations beyond those
+ already well documented in the RTP protocol [RFC3550] and each of the
+ many respective specifications of the various protocols making use of
+ it.
+
+ Having a well-defined common terminology and understanding of the
+ complexities of the RTP architecture will help lead us to better
+ standards, avoiding security problems.
+
+6. Informative References
+
+ [CLUE-FRAME]
+ Duckworth, M., Pepperell, A., and S. Wenger, "Framework
+ for Telepresence Multi-Streams", Work in Progress,
+ draft-ietf-clue-framework-22, April 2015.
+
+ [RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
+ Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
+ Parisis, "RTP Payload for Redundant Audio Data", RFC 2198,
+ DOI 10.17487/RFC2198, September 1997,
+ <http://www.rfc-editor.org/info/rfc2198>.
+
+ [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
+ Jacobson, "RTP: A Transport Protocol for Real-Time
+ Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
+ July 2003, <http://www.rfc-editor.org/info/rfc3550>.
+
+ [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
+ Video Conferences with Minimal Control", STD 65, RFC 3551,
+ DOI 10.17487/RFC3551, July 2003,
+ <http://www.rfc-editor.org/info/rfc3551>.
+
+ [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+ Norrman, "The Secure Real-time Transport Protocol (SRTP)",
+ RFC 3711, DOI 10.17487/RFC3711, March 2004,
+ <http://www.rfc-editor.org/info/rfc3711>.
+
+ [RFC4353] Rosenberg, J., "A Framework for Conferencing with the
+ Session Initiation Protocol (SIP)", RFC 4353,
+ DOI 10.17487/RFC4353, February 2006,
+ <http://www.rfc-editor.org/info/rfc4353>.
+
+ [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
+ Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
+ July 2006, <http://www.rfc-editor.org/info/rfc4566>.
+
+ [RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
+ Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
+ DOI 10.17487/RFC4588, July 2006,
+ <http://www.rfc-editor.org/info/rfc4588>.
+
+ [RFC4867] Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie,
+ "RTP Payload Format and File Storage Format for the
+ Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband
+ (AMR-WB) Audio Codecs", RFC 4867, DOI 10.17487/RFC4867,
+ April 2007, <http://www.rfc-editor.org/info/rfc4867>.
+
+ [RFC5109] Li, A., Ed., "RTP Payload Format for Generic Forward Error
+ Correction", RFC 5109, DOI 10.17487/RFC5109, December
+ 2007, <http://www.rfc-editor.org/info/rfc5109>.
+
+ [RFC5404] Westerlund, M. and I. Johansson, "RTP Payload Format for
+ G.719", RFC 5404, DOI 10.17487/RFC5404, January 2009,
+ <http://www.rfc-editor.org/info/rfc5404>.
+
+ [RFC5481] Morton, A. and B. Claise, "Packet Delay Variation
+ Applicability Statement", RFC 5481, DOI 10.17487/RFC5481,
+ March 2009, <http://www.rfc-editor.org/info/rfc5481>.
+
+ [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific
+ Media Attributes in the Session Description Protocol
+ (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009,
+ <http://www.rfc-editor.org/info/rfc5576>.
+
+ [RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description
+ Protocol (SDP) Grouping Framework", RFC 5888,
+ DOI 10.17487/RFC5888, June 2010,
+ <http://www.rfc-editor.org/info/rfc5888>.
+
+ [RFC5905] Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
+ "Network Time Protocol Version 4: Protocol and Algorithms
+ Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010,
+ <http://www.rfc-editor.org/info/rfc5905>.
+
+ [RFC6190] Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
+ "RTP Payload Format for Scalable Video Coding", RFC 6190,
+ DOI 10.17487/RFC6190, May 2011,
+ <http://www.rfc-editor.org/info/rfc6190>.
+
+ [RFC7160] Petit-Huguenin, M. and G. Zorn, Ed., "Support for Multiple
+ Clock Rates in an RTP Session", RFC 7160,
+ DOI 10.17487/RFC7160, April 2014,
+ <http://www.rfc-editor.org/info/rfc7160>.
+
+ [RFC7197] Begen, A., Cai, Y., and H. Ou, "Duplication Delay
+ Attribute in the Session Description Protocol", RFC 7197,
+ DOI 10.17487/RFC7197, April 2014,
+ <http://www.rfc-editor.org/info/rfc7197>.
+
+ [RFC7198] Begen, A. and C. Perkins, "Duplicating RTP Streams",
+ RFC 7198, DOI 10.17487/RFC7198, April 2014,
+ <http://www.rfc-editor.org/info/rfc7198>.
+
+ [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP
+ Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
+ <http://www.rfc-editor.org/info/rfc7201>.
+
+ [RFC7273] Williams, A., Gross, K., van Brandenburg, R., and H.
+ Stokking, "RTP Clock Source Signalling", RFC 7273,
+ DOI 10.17487/RFC7273, June 2014,
+ <http://www.rfc-editor.org/info/rfc7273>.
+
+ [RTP-MULTI-STREAM]
+ Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
+ "Sending Multiple Media Streams in a Single RTP Session",
+ Work in Progress, draft-ietf-avtcore-rtp-multi-stream-08,
+ July 2015.
+
+ [RTP-TOPOLOGIES]
+ Westerlund, M. and S. Wenger, "RTP Topologies", Work in
+ Progress, draft-ietf-avtcore-rtp-topologies-update-10,
+ July 2015.
+
+ [SDP-BUNDLE]
+ Holmberg, C., Alvestrand, H., and C. Jennings,
+ "Negotiating Media Multiplexing Using the Session
+ Description Protocol (SDP)", Work in Progress,
+ draft-ietf-mmusic-sdp-bundle-negotiation-23, July 2015.
+
+ [SDP-SIMULCAST]
+ Burman, B., Westerlund, M., Nandakumar, S., and M. Zanaty,
+ "Using Simulcast in SDP and RTP Sessions", Work in
+ Progress, draft-ietf-mmusic-sdp-simulcast-01, July 2015.
+
+ [TRANSPORT-MULTIPLEX]
+ Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP
+ Sessions onto a Single Lower-Layer Transport", Work in
+ Progress, draft-westerlund-avtcore-transport-multiplexing-
+ 07, October 2013.
+
+ [WEBRTC-OVERVIEW]
+ Alvestrand, H., "Overview: Real Time Protocols for
+ Browser-based Applications", Work in Progress,
+ draft-ietf-rtcweb-overview-14, June 2015.
+
+Acknowledgements
+
+ This document has many concepts borrowed from several documents such
+ as WebRTC [WEBRTC-OVERVIEW], CLUE [CLUE-FRAME], and Multiplexing
+ Architecture [TRANSPORT-MULTIPLEX]. The authors would like to thank
+ all the authors of each of those documents.
+
+ The authors would also like to acknowledge the insights, guidance,
+ and contributions of Magnus Westerlund, Roni Even, Paul Kyzivat,
+ Colin Perkins, Keith Drage, Harald Alvestrand, Alex Eleftheriadis, Mo
+ Zanaty, Stephan Wenger, and Bernard Aboba.
+
+Contributors
+
+ Magnus Westerlund has contributed the concept model for the media
+ chain using transformations and streams model, including rewriting
+ pre-existing concepts into this model and adding missing concepts.
+ The first proposal for updating the relationships and the topologies
+ based on this concept was also performed by Magnus.
+
+Authors' Addresses
+
+ Jonathan Lennox
+ Vidyo, Inc.
+ 433 Hackensack Avenue
+ Seventh Floor
+ Hackensack, NJ 07601
+ United States
+
+ Email: jonathan@vidyo.com
+
+
+ Kevin Gross
+ AVA Networks, LLC
+ Boulder, CO
+ United States
+
+ Email: kevin.gross@avanw.com
+
+
+ Suhas Nandakumar
+ Cisco Systems
+ 170 West Tasman Drive
+ San Jose, CA 95134
+ United States
+
+ Email: snandaku@cisco.com
+
+
+ Gonzalo Salgueiro
+ Cisco Systems
+ 7200-12 Kit Creek Road
+ Research Triangle Park, NC 27709
+ United States
+
+ Email: gsalguei@cisco.com
+
+
+ Bo Burman (editor)
+ Ericsson
+ Kistavagen 25
+ SE-16480 Stockholm
+ Sweden
+
+ Email: bo.burman@ericsson.com
+