Diffstat (limited to 'doc/rfc/rfc9071.txt')
-rw-r--r-- doc/rfc/rfc9071.txt 1923
1 files changed, 1923 insertions, 0 deletions
diff --git a/doc/rfc/rfc9071.txt b/doc/rfc/rfc9071.txt
new file mode 100644
index 0000000..9166840
--- /dev/null
+++ b/doc/rfc/rfc9071.txt
@@ -0,0 +1,1923 @@
+
+
+
+
+Internet Engineering Task Force (IETF) G. Hellström
+Request for Comments: 9071 GHAccess
+Updates: 4103 July 2021
+Category: Standards Track
+ISSN: 2070-1721
+
+
+ RTP-Mixer Formatting of Multiparty Real-Time Text
+
+Abstract
+
+ This document provides enhancements of real-time text (as specified
+ in RFC 4103) suitable for mixing in a centralized conference model,
+ enabling source identification and rapidly interleaved transmission
+ of text from different sources. The intended use is for real-time
+ text mixers and participant endpoints capable of providing an
+ efficient presentation or other treatment of a multiparty real-time
+ text session. The specified mechanism builds on the standard use of
+ the Contributing Source (CSRC) list in the Real-time Transport
+ Protocol (RTP) packet for source identification. The method makes
+ use of the same "text/t140" and "text/red" formats as for two-party
+ sessions.
+
+ Solutions using multiple RTP streams in the same RTP session are
+ briefly mentioned, as they could have some benefits over the RTP-
+ mixer model. The RTP-mixer model was selected to be used for the
+ fully specified solution in this document because it can be applied
+ to a wide range of existing RTP implementations.
+
+ A capability exchange is specified so that it can be verified that a
+ mixer and a participant can handle the multiparty-coded real-time
+ text stream using the RTP-mixer method. The capability is indicated
+ by the use of a Session Description Protocol (SDP) (RFC 8866) media
+ attribute, "rtt-mixer".
+
+ This document updates RFC 4103 ("RTP Payload for Text Conversation").
+
+ A specification for how a mixer can format text for the case when the
+ endpoint is not multiparty aware is also provided.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ https://www.rfc-editor.org/info/rfc9071.
+
+Copyright Notice
+
+ Copyright (c) 2021 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (https://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction
+ 1.1. Terminology
+ 1.2. Main Method, Fallback Method, and Considered Alternatives
+ 1.3. Intended Application
+ 2. Overview of the Two Specified Solutions and Selection of Method
+ 2.1. The RTP-Mixer-Based Solution for Multiparty-Aware Endpoints
+ 2.2. Mixing for Multiparty-Unaware Endpoints
+ 2.3. Offer/Answer Considerations
+ 2.4. Actions Depending on Capability Negotiation Result
+ 3. Details for the RTP-Mixer-Based Mixing Method for
+ Multiparty-Aware Endpoints
+ 3.1. Use of Fields in the RTP Packets
+ 3.2. Initial Transmission of a BOM Character
+ 3.3. Keep-Alive
+ 3.4. Transmission Interval
+ 3.5. Only One Source per Packet
+ 3.6. Do Not Send Received Text to the Originating Source
+ 3.7. Clean Incoming Text
+ 3.8. Principles of Redundant Transmission
+ 3.9. Text Placement in Packets
+ 3.10. Empty T140blocks
+ 3.11. Creation of the Redundancy
+ 3.12. Timer Offset Fields
+ 3.13. Other RTP Header Fields
+ 3.14. Pause in Transmission
+ 3.15. RTCP Considerations
+ 3.16. Reception of Multiparty Contents
+ 3.17. Performance Considerations
+ 3.18. Security for Session Control and Media
+ 3.19. SDP Offer/Answer Examples
+ 3.20. Packet Sequence Example from Interleaved Transmission
+ 3.21. Maximum Character Rate "cps" Setting
+ 4. Presentation-Level Considerations
+ 4.1. Presentation by Multiparty-Aware Endpoints
+ 4.2. Multiparty Mixing for Multiparty-Unaware Endpoints
+ 5. Relationship to Conference Control
+ 5.1. Use with SIP Centralized Conferencing Framework
+ 5.2. Conference Control
+ 6. Gateway Considerations
+ 6.1. Gateway Considerations with Textphones
+ 6.2. Gateway Considerations with WebRTC
+ 7. Updates to RFC 4103
+ 8. Congestion Considerations
+ 9. IANA Considerations
+ 9.1. Registration of the "rtt-mixer" SDP Media Attribute
+ 10. Security Considerations
+ 11. References
+ 11.1. Normative References
+ 11.2. Informative References
+ Acknowledgements
+ Author's Address
+
+1. Introduction
+
+ "RTP Payload for Text Conversation" [RFC4103] specifies the use of
+ the Real-time Transport Protocol (RTP) [RFC3550] for transmission of
+ real-time text (often called RTT) and the "text/t140" format. It
+ also specifies a redundancy format, "text/red", for increased
+ robustness. The "text/red" format is registered in [RFC4102].
+
+ Real-time text is usually provided together with audio and sometimes
+ with video in conversational sessions.
+
+ A requirement related to multiparty sessions from the presentation-
+ level standard T.140 [T140] for real-time text is as follows:
+
+ | The display of text from the members of the conversation should be
+ | arranged so that the text from each participant is clearly
+ | readable, and its source and the relative timing of entered text
+ | is visualized in the display.
+
+ Another requirement is that the mixing procedure must not introduce
+ delays in the text streams that could be perceived as disruptive to
+ the real-time experience of the receiving users.
+
+ The use of real-time text is increasing, particularly in emergency
+ calls. Emergency call use requires multiparty mixing, because it is
+ common that one agent needs to transfer the call to another
+ specialized agent but is obliged to stay on the call to at least
+ verify that the transfer was successful.
+ Mixer implementations for RFC 4103 ("RTP Payload for Text
+ Conversation") can use traditional RTP functions (RFC 3550) for
+ mixing and source identification, but the performance of the mixer
+ when giving turns for the different sources to transmit is limited
+ when using the default transmission characteristics with redundancy.
+
+ The redundancy scheme described in [RFC4103] enables efficient
+ transmission of earlier transmitted redundant text in packets
+ together with new text. However, the redundancy header format has no
+ source indicators for the redundant transmissions. The redundant
+ parts in a packet must therefore be from the same source as the new
+ text. The recommended transmission is one new and two redundant
+ generations of text (T140blocks) in each packet, and the recommended
+ transmission interval for two-party use is 300 ms.
+
+ Real-time text mixers for multiparty sessions need to include the
+ source with each transmitted group of text from a conference
+ participant so that the text can be transmitted interleaved with text
+ groups from different sources at the rate at which they are created.
+ This enables the text groups to be presented by endpoints in a
+ suitable grouping with other text from the same source.
+
+ The presentation can then be arranged so that text from different
+ sources can be presented in real time and easily read. At the same
+ time, it is possible for a reading user to perceive approximately
+ when the text was created in real time by the different parties. The
+ transmission and mixing are intended to be done in a general way, so
+ that presentation can be arranged in a layout decided upon by the
+ receiving endpoint.
+
+ Existing implementations of RFC 4103 in endpoints that do not
+ implement the updates specified in this document cannot be expected
+ to properly present real-time text mixed for multiparty-aware
+ endpoints.
+
+ A negotiation mechanism is therefore needed to verify whether the
+ parties (1) are able to handle a common method for multiparty
+ transmissions and (2) can agree on using that method.
+
+ A fallback mixing procedure is also needed for cases when the
+ negotiation result indicates that a receiving endpoint is not capable
+ of handling the mixed format. Otherwise, multiparty-unaware endpoints
+ might present all received multiparty mixed text as if it came from
+ the same source, regardless of any accompanying source indication
+ coded in fields in the packet, or they might act on the multiparty
+ content in other undesirable ways. The fallback
+ method is called the mixing procedure for multiparty-unaware
+ endpoints. The fallback method is naturally not expected to meet all
+ performance requirements placed on the mixing procedure for
+ multiparty-aware endpoints.
+
+ This document updates [RFC4103] by introducing an attribute for
+ declaring support of the RTP-mixer-based multiparty-mixing case and
+ rules for source indications and interleaving of text from different
+ sources.
+
+1.1. Terminology
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in
+ BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
+ capitals, as shown here.
+
+ The terms "Source Description" (SDES), "Canonical Name" (CNAME),
+ "Name" (NAME), "Synchronization Source" (SSRC), "Contributing Source"
+ (CSRC), "CSRC list", "CSRC count" (CC), "RTP Control Protocol"
+ (RTCP), and "RTP mixer" are defined in [RFC3550].
+
+ "real-time text" (RTT) is text transmitted instantly as it is typed
+ or created. Recipients can immediately read the message while it is
+ being written, without waiting.
+
+ The term "T140block" is defined in [RFC4103] to contain one or more
+ T.140 code elements.
+
+ "TTY" stands for a textphone type used in North America.
+
+ Web Real-Time Communication (WebRTC) is specified by the World Wide
+ Web Consortium (W3C) and the IETF. See [RFC8825].
+
+ "DTLS-SRTP" is a Datagram Transport Layer Security (DTLS) extension
+ for use with the Secure Real-time Transport Protocol / Secure Real-
+ time Transport Control Protocol (SRTP/SRTCP) as specified in
+ [RFC5764].
+
+ The term "multiparty aware" describes an endpoint that (1) receives
+ real-time text from multiple sources through a common conference
+ mixer, (2) is able to present the text in real time, separated by
+ source, and (3) presents the text so that a user can get an
+ impression of the approximate relative timing of text from different
+ parties.
+
+ The term "multiparty unaware" describes an endpoint that cannot
+ itself separate text from different sources when the text is received
+ through a common conference mixer.
+
+1.2. Main Method, Fallback Method, and Considered Alternatives
+
+ A number of alternatives were considered when searching for an
+ efficient and easily implemented multiparty method for real-time
+ text. This section briefly explains a few of them.
+
+ Multiple RTP streams, one per participant:
+ One RTP stream per source would be sent in the same RTP session
+ with the "text/red" format. From some points of view, the use of
+ multiple RTP streams, one for each source, sent in the same RTP
+ session would be efficient and would use exactly the same packet
+ format as [RFC4103] and the same payload type. A couple of
+ relevant scenarios using multiple RTP streams are specified in
+ "RTP Topologies" [RFC7667]. One possibility of special interest
+ is the Selective Forwarding Middlebox (SFM) topology specified in
+ Section 3.7 of [RFC7667], which could enable end-to-end
+ encryption. In contrast to audio and video, real-time text is
+ only transmitted when the users actually transmit information.
+ Thus, an SFM solution would not need to exclude any party from
+ transmission under normal conditions. In order to allow the mixer
+ to convey the packets with the payload preserved and encrypted, an
+ SFM solution would need to act on some specific characteristics of
+ the "text/red" format. The redundancy headers are part of the
+ payload, so the receiver would need to just assume that the
+ payload type number in the redundancy header is for "text/t140".
+ The characters per second ("cps") parameter would need to act per
+ stream. The relationship between the SSRC and the source would
+ need to be conveyed in some specified way, e.g., in the CSRC.
+ Recovery and loss detection would preferably be based on RTP
+ sequence number gap detection. Thus, sequence number gaps in the
+ incoming stream to the mixer would need to be reflected in the
+ stream to the participant, with no new gaps created by the mixer.
+ However, the RTP implementation in both mixers and endpoints needs
+ to support multiple streams in the same RTP session in order to
+ use this mechanism. To provide the best opportunities for
+ deployment, it should be possible to upgrade existing endpoint
+ solutions to be multiparty aware with a reasonable amount of
+ effort. There is currently a lack of support for multi-stream RTP
+ in certain implementations. This fact led to only brief mention
+ of this solution in this document as an option for further study.
+
+ RTP-mixer-based method for multiparty-aware endpoints:
+ The "text/red" format as defined in RFC 4102 and applied in RFC
+ 4103 is sent with the RTP-mixer method indicating the source in
+ the CSRC field. The "text/red" format with a "text/t140" payload
+ in a single RTP stream can be sent when text is available from the
+ call participants instead of at the regular 300 ms intervals.
+ Transmission of packets with text from different sources can then
+ be done smoothly while simultaneous transmission occurs as long as
+ it is not limited by the maximum character rate "cps" value. With
+ ten participants sending text simultaneously, the switching and
+ transmission performance is good. With more simultaneously
+ sending participants and with receivers at default capacity, there
+ will be a noticeable jerkiness and delay in text presentation.
+ The more participants who send text simultaneously, the more
+ jerkiness will occur. Two seconds of jerkiness will be noticeable
+ and slightly unpleasant, but it corresponds in time to what typing
+ humans often cause by hesitating or changing position while
+ typing. A benefit of this method is that no new packet format
+ needs to be introduced and implemented. Since simultaneous typing
+ by more than two parties is expected to be very rare -- as
+ described in Section 1.3 -- this method can be used successfully
+ with good performance. Recovery of text in the case of packet
+ loss is based on analysis of timestamps of received redundancy
+ versus earlier received text. Negotiation is based on a new SDP
+ media attribute, "rtt-mixer". This method was selected to be the
+ main method specified in this document.
+
+ Multiple sources per packet:
+ A new "text" media subtype would be specified with up to 15
+ sources in each packet. The mechanism would make use of the RTP-
+ mixer model specified in RTP [RFC3550]. The sources would be
+ indicated in strict order in the CSRC list of the RTP packets.
+ The CSRC list can have up to 15 members. Therefore, text from up
+ to 15 sources can be included in each packet. Packets are
+ normally sent at 300 ms intervals. The mean delay would be 150
+ ms. A new redundancy packet format would be specified. This
+ method would result in good performance but would require
+ standardization and implementation of new releases in the target
+ technologies; these would take more time than desirable to
+ complete. It was therefore not selected to be included in this
+ document.
+
+ Mixing for multiparty-unaware endpoints:
+ The presentation of text from multiple parties is prepared by the
+ mixer in one single stream. It is desirable to have a method that
+ does not require any modifications in existing user devices
+ implementing RFC 4103 for real-time text without explicit support
+ of multiparty sessions. This is made possible by having the mixer
+ insert a new line and a text-formatted source label before each
+ switch of text source in the stream. Switching the source can
+ only be done in places in the text where it does not disturb the
+ perception of the contents. Text from only one source at a time
+ can be presented in real time. The delay will therefore vary. In
+ calls where parties take turns properly by ending their entries
+ with a new line, the limitations will have little influence on
+ the user experience. When only two parties send text, these two
+ will see the text in real time with no delay. Although this
+ method also has other limitations, it is included in this document
+ as a fallback method.
+
+ Real-time text transport in WebRTC:
+ [RFC8865] specifies how the WebRTC data channel can be used to
+ transport real-time text. That specification contains a section
+ briefly describing its use in multiparty sessions. The focus of
+ this document is RTP transport. Therefore, even if the WebRTC
+ transport provides good multiparty performance, it is only
+ mentioned in this document in relation to providing gateways with
+ multiparty capabilities between RTP and WebRTC technologies.
+
+1.3. Intended Application
+
+ The method for multiparty real-time text specified in this document
+ is primarily intended for use in transmissions between mixers and
+ endpoints in centralized mixing configurations. It is also
+ applicable between mixers. An often-mentioned application is for
+ emergency service calls with real-time text and voice, where a call
+ taker wants to make an attended handover of a call to another agent
+ and stay on the call to observe the session. Multimedia conference
+ sessions with support for participants to contribute with text are
+ another example. Conferences with central support for speech-to-text
+ conversion represent yet another example.
+
+ In all these applications, normally only one participant at a time
+ will send long text comments. In some cases, one other participant
+ will occasionally contribute with a longer comment simultaneously.
+ That may also happen in some rare cases when text is translated to
+ text in another language in a conference. Apart from these cases,
+ other participants are only expected to contribute with very brief
+ comments while others are sending text.
+
+ Users expect the text they send to be presented in real time in a
+ readable way to the other participants even if they send
+ simultaneously with other users and even when they make brief edit
+ operations of their text by backspacing and correcting their text.
+
+ Text is supposed to be human generated, by some means of text input,
+ such as typing on a keyboard or using speech-to-text technology.
+ Occasional small cut-and-paste operations may appear even if that is
+ not the initial purpose of real-time text.
+
+ The real-time characteristics of real-time text are essential for the
+ participants to be able to contribute to a conversation. If the text
+ is delayed too much between the typing of a character and its
+ presentation, then, in some conference situations, the opportunity to
+ comment will be gone and someone else will grab the turn. A delay of
+ more than one second in such situations is an obstacle to good
+ conversation.
+
+2. Overview of the Two Specified Solutions and Selection of Method
+
+ This section briefly introduces the two methods specified in this
+ document.
+
+2.1. The RTP-Mixer-Based Solution for Multiparty-Aware Endpoints
+
+ This method specifies the negotiated use of the formats described in
+ RFC 4103, for multiparty transmissions in a single RTP stream. The
+ main purpose of this document is to specify a method for true
+ multiparty real-time text mixing for multiparty-aware endpoints that
+ can be widely deployed. The RTP-mixer-based method makes use of the
+ current format for real-time text as provided in [RFC4103]. This
+ method updates RFC 4103 by clarifying one way to use it in the
+ multiparty situation. That is done by completing a negotiation for
+ this kind of multiparty capability and by interleaving packets from
+ different sources. The source is indicated in the CSRC element in
+ the RTP packets. Specific considerations are made regarding the
+ ability to recover text after packet loss.
+
+ The detailed procedures for the RTP-mixer-based multiparty-aware case
+ are specified in Section 3.
+
+ Please refer to [RFC4103] when reading this document.
+
+2.2. Mixing for Multiparty-Unaware Endpoints
+
+ This document also specifies a method to be used in cases when the
+ endpoint participating in a multiparty call does not itself implement
+ any solution or does not implement the same solution as the mixer.
+ This method requires the mixer to insert text dividers and readable
+ labels and only send text from one source at a time until a suitable
+ point appears for changing the source. This solution is a fallback
+ method with functional limitations. It operates at the presentation
+ level.
+
+ A mixer SHOULD by default format and transmit text to a call
+ participant so that the text is suitable for presentation on a
+ multiparty-unaware endpoint that has not negotiated any method for
+ true multiparty real-time text handling but has negotiated a "text/
+ red" or "text/t140" format in a session. This SHOULD be done if
+ nothing else is specified for the application, in order to maintain
+ interoperability. Section 4.2 specifies how this mixing is done.
+
+2.3. Offer/Answer Considerations
+
+ "RTP Payload for Text Conversation" [RFC4103] specifies the use of
+ RTP [RFC3550] and a redundancy format ("text/red", as defined in
+ [RFC4102]) for increased robustness of real-time text transmission.
+ This document updates [RFC4103] by introducing a capability
+ negotiation for handling multiparty real-time text, a way to indicate
+ the source of transmitted text, and rules for efficient timing of the
+ transmissions interleaved from different sources.
+
+ The capability negotiation for the RTP-mixer-based multiparty method
+ is based on the use of the SDP media attribute "rtt-mixer".
+
+ The syntax is as follows:
+
+ a=rtt-mixer
+
+ If in the future any other method for RTP-based multiparty real-time
+ text is specified by additional work, it is assumed that it will be
+ recognized by some specific SDP feature exchange.
+
+2.3.1. Initial Offer
+
+ A party that intends to set up a session and is willing to use the
+ RTP-mixer-based method provided in this specification for sending,
+ receiving, or both sending and receiving real-time text SHALL include
+ the "rtt-mixer" SDP attribute in the corresponding "text" media
+ section in the initial offer.
+
+ The party MAY indicate its capability regarding both the RTP-mixer-
+ based method provided in this specification and other methods.
+
+ When the offerer has sent the offer, which includes the "rtt-mixer"
+ attribute, it MUST be prepared to receive and handle real-time text
+ formatted according to both the method for multiparty-aware parties
+ specified in Section 3 and two-party formatted real-time text.
+
+2.3.2. Answering the Offer
+
+ A party that receives an offer containing the "rtt-mixer" SDP
+ attribute and is willing to use the RTP-mixer-based method provided
+ in this specification for sending, receiving, or both sending and
+ receiving real-time text SHALL include the "rtt-mixer" SDP attribute
+ in the corresponding "text" media section in the answer.
+
+ If the offer did not contain the "rtt-mixer" attribute, the answer
+ MUST NOT contain the "rtt-mixer" attribute.
+
+ Even when the "rtt-mixer" attribute is successfully negotiated, the
+ parties MAY send and receive two-party coded real-time text.
+
+ An answer MUST NOT include acceptance of more than one method for
+ multiparty real-time text in the same RTP session.
+
+ When the answer, which includes acceptance, is transmitted, the
+ answerer MUST be prepared to act on received text in the negotiated
+ session according to the method for multiparty-aware parties
+ specified in Section 3. Reception of text for a two-party session
+ SHALL also be supported.
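
As a non-normative illustration of the answerer rules above, the following Python sketch decides whether the "text" media section of an answer carries the attribute. The helper names (`has_rtt_mixer`, `answer_attributes`) and the `willing` flag are assumptions made for this example, the port and payload type numbers are illustrative only, and a real implementation would use a full SDP parser rather than line matching.

```python
def has_rtt_mixer(text_media_lines):
    """Return True if the given "text" media section contains a=rtt-mixer."""
    return any(line.strip() == "a=rtt-mixer" for line in text_media_lines)


def answer_attributes(offer_text_media_lines, willing):
    """Return the multiparty attribute lines to place in the answer.

    The attribute is echoed only when the offer contained it AND the
    answerer is willing to use the RTP-mixer-based method. If the offer
    lacked it, the answer must not contain it.
    """
    if has_rtt_mixer(offer_text_media_lines) and willing:
        return ["a=rtt-mixer"]
    return []


# Illustrative "text" media section of an offer (values are examples only).
offer = [
    "m=text 11000 RTP/AVP 100 98",
    "a=rtpmap:98 t140/1000",
    "a=rtpmap:100 red/1000",
    "a=fmtp:100 98/98/98",
    "a=rtt-mixer",
]

print(answer_attributes(offer, willing=True))
print(answer_attributes(offer[:-1], willing=True))
```

Even when the attribute is echoed, both sides still have to accept two-party formatted text, so this check only enables the multiparty format and never replaces the two-party one.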
+
+2.3.3. Offerer Processing the Answer
+
+ When the answer is processed by the offerer, the offerer MUST follow
+ the requirements listed in Section 2.4.
+
+2.3.4. Modifying a Session
+
+ A session MAY be modified at any time by any party offering a
+ modified SDP with or without the "rtt-mixer" SDP attribute expressing
+ a desired change in the support of multiparty real-time text.
+
+ If the modified offer adds the indication of support for multiparty
+ real-time text by including the "rtt-mixer" SDP attribute, the
+ procedures specified in the previous subsections SHALL be applied.
+
+ If the modified offer deletes the indication of support for
+ multiparty real-time text by excluding the "rtt-mixer" SDP attribute,
+ the answer MUST NOT contain the "rtt-mixer" attribute. After
+ processing this SDP exchange, the parties MUST NOT send real-time
+ text formatted for multiparty-aware parties according to this
+ specification.
+
+2.4. Actions Depending on Capability Negotiation Result
+
+ A transmitting party SHALL send text according to the RTP-mixer-based
+ multiparty method only when the negotiation for that method was
+ successful and when it conveys text for another source. In all other
+ cases, the packets SHALL be populated and interpreted as for a two-
+ party session.
+
+ A party that has negotiated the "rtt-mixer" SDP media attribute and
+ acts as an RTP mixer sending multiparty text MUST (1) populate the
+ CSRC list and (2) format the packets according to Section 3.
+
+ A party that has negotiated the "rtt-mixer" SDP media attribute MUST
+ interpret the contents of the CC field, the CSRC list, and the
+ packets according to Section 3 in received RTP packets in the
+ corresponding RTP stream.
+
+ A party that has not successfully completed the negotiation of the
+ "rtt-mixer" SDP media attribute MUST NOT transmit packets interleaved
+ from different sources in the same RTP stream, as specified in
+ Section 3. If the party is a mixer and did declare the "rtt-mixer"
+ SDP media attribute, it SHOULD perform the procedure for multiparty-
+ unaware endpoints. If the party is not a mixer, it SHOULD transmit
+ as in a two-party session according to [RFC4103].
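
The decision described in this section can be sketched in Python as follows. This is a simplified, non-normative reading: the enum, the function name, and the three boolean inputs are assumptions made for this example, and the SHOULD-level fallback for mixers is treated here as always taken.

```python
from enum import Enum


class TextFormat(Enum):
    MULTIPARTY_AWARE = "interleaved packets with CSRC (Section 3)"
    MULTIPARTY_UNAWARE = "presentation-level mixing (Section 4.2)"
    TWO_PARTY = "two-party packets (RFC 4103)"


def select_text_format(negotiated, is_mixer, conveys_other_source):
    """Pick the format for outgoing text on one RTP stream.

    negotiated:           "rtt-mixer" was successfully negotiated
    is_mixer:             the sender acts as an RTP mixer
    conveys_other_source: the text originates from another source
    """
    if negotiated and conveys_other_source:
        return TextFormat.MULTIPARTY_AWARE
    if is_mixer and conveys_other_source:
        # No successful negotiation: fall back to mixing for
        # multiparty-unaware endpoints.
        return TextFormat.MULTIPARTY_UNAWARE
    # All other cases are populated as for a two-party session.
    return TextFormat.TWO_PARTY


print(select_text_format(True, True, True).name)
print(select_text_format(False, True, True).name)
print(select_text_format(False, False, False).name)
```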
+
+3. Details for the RTP-Mixer-Based Mixing Method for Multiparty-Aware
+ Endpoints
+
+3.1. Use of Fields in the RTP Packets
+
+ The CC field SHALL show the number of members in the CSRC list, which
+ SHALL be one (1) in transmissions from a mixer when conveying text
+ from other sources in a multiparty session, and otherwise 0.
+
+ When text is conveyed by a mixer during a multiparty session, a CSRC
+ list SHALL be included in the packet. The single member in the CSRC
+ list SHALL contain the SSRC of the source of the T140blocks in the
+ packet.
+
+ When redundancy is used, the RECOMMENDED level of redundancy is to
+ use one primary and two redundant generations of T140blocks. In some
+ cases, a primary or redundant T140block is empty but is still
+ represented by a member in the redundancy header.
+
+ In other respects, the contents of the RTP packets will be as
+ specified in [RFC4103].
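
To make the field usage concrete, here is a minimal Python sketch that packs the fixed RTP header of RFC 3550 followed by a CSRC list with at most one member. The function name and parameters are assumptions made for this example; header extensions, padding, and the "text/red" payload itself are left out.

```python
import struct


def build_rtp_header(payload_type, seq, timestamp, mixer_ssrc,
                     source_ssrc=None, marker=False):
    """Pack an RTP header; CC is 1 when text from another source is conveyed.

    mixer_ssrc goes in the SSRC field. source_ssrc, if given, becomes the
    single CSRC list member identifying the source of the T140blocks.
    """
    csrc_list = [] if source_ssrc is None else [source_ssrc]
    cc = len(csrc_list)                      # 1 when mixing, otherwise 0
    byte0 = (2 << 6) | cc                    # V=2, P=0, X=0, CC
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, mixer_ssrc & 0xFFFFFFFF)
    for csrc in csrc_list:
        header += struct.pack("!I", csrc & 0xFFFFFFFF)
    return header


mixed = build_rtp_header(100, 1, 1000, 0x11111111, source_ssrc=0x22222222)
own = build_rtp_header(100, 2, 1300, 0x11111111)
print(len(mixed), mixed[0] & 0x0F)
print(len(own), own[0] & 0x0F)
```

A receiver can apply the same layout in reverse: the low four bits of the first octet give CC, and when CC is 1 the four octets after the SSRC identify the source of the text.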
+
+3.2. Initial Transmission of a BOM Character
+
+ As soon as a participant is known to participate in a session with
+ another entity and is available for text reception, a Unicode byte
+ order mark (BOM) character SHALL be sent to it by the other entity
+ according to the procedures in this section. This is useful in many
+ configurations for opening ports and firewalls and for setting up the
+ connection between the application and the network. If the
+ transmitter is a mixer, then the source of this character SHALL be
+ indicated to be the mixer itself.
+
+ Note that the BOM character SHALL be transmitted with the same
+ redundancy procedures as any other text.
+
+3.3. Keep-Alive
+
+ After the initial BOM transmission described in Section 3.2, the
+ transmitter SHALL send keep-alive traffic to the receiver(s) at
+ regular intervals when no other traffic has occurred during that
+ interval, if keep-alive has been decided upon for the connection in
+ question. It is RECOMMENDED to use the keep-alive solution provided
+ in [RFC6263]. The consent check [RFC7675] is a possible alternative
+ if it is already in use for other reasons.
+
+3.4. Transmission Interval
+
+ A "text/red" or "text/t140" transmitter in a mixer SHALL send packets
+ distributed over time as long as there is something (new or redundant
+ T140blocks) to transmit. The maximum transmission interval between
+ text transmissions from the same source SHALL then be 330 ms, when no
+ other limitations cause a longer interval to be temporarily used. It
+ is RECOMMENDED to send the next packet to a receiver as soon as new
+ text to that receiver is available, as long as the mean character
+ rate of new text to the receiver calculated over the last 10 one-
+ second intervals does not exceed the "cps" value of the receiver.
+ The intention is to keep the latency low and network load limited
+ while keeping good protection against text loss in bursty packet loss
+ conditions. The main purpose of the 330 ms interval is for the
+ timing of redundant transmissions, when no new text from the same
+ source is available.
+
+ The value of 330 ms is used, because many sources of text will
+ transmit new text at 300 ms intervals during periods of continuous
+ user typing, and then reception in the mixer of such new text will
+ cause a combined transmission of the new text and the unsent
+ redundancy from the previous transmission. Only when the user stops
+ typing will the 330 ms interval be applied to send the redundancy.
+
+ If the characters per second ("cps") value is reached, a longer
+ transmission interval SHALL be applied for text from all sources as
+ specified in [RFC4103] and only as much of the text queued for
+ transmission SHALL be sent at the end of each transmission interval
+ as can be allowed without exceeding the "cps" value. Division of
+ text for partial transmission MUST then be made at T140block borders.
+ When the transmission rate falls below the "cps" value again, the
+ transmission intervals SHALL be reset to 330 ms and transmission of
+ new text SHALL again be made as soon as new text is available.
+
+ | NOTE: Extending the transmission intervals during periods of
+ | high load does not change the number of characters to be
+ | conveyed. It just evens out the load over time and reduces the
+ | number of packets per second. With human-created
+ | conversational text, the sending user will eventually take a
+ | pause, letting transmission catch up.
+
+ See also Section 8.
+
+ For a transmitter not acting as a mixer, the transmission interval
+ principles provided in [RFC4103] apply, and the normal transmission
+ interval SHALL be 300 ms.
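
The rate rule in this section can be illustrated with the following Python sketch, which tracks the characters sent to one receiver over the last 10 one-second intervals and releases queued text only in whole T140blocks. The class and function names are assumptions made for this example, and the default of 30 for "cps" is taken from RFC 4103. Timers for the 330 ms redundancy interval are not modelled.

```python
from collections import deque


class CpsWindow:
    """Sliding count of new characters sent to one receiver."""

    def __init__(self, cps=30, window_s=10):
        self.cps = cps
        self.window_s = window_s
        self.sent = deque()              # (whole-second index, characters)

    def allowance(self, now_s):
        """Characters that may still be sent without exceeding the mean rate."""
        current = int(now_s)
        # Drop entries older than the last `window_s` one-second intervals.
        while self.sent and self.sent[0][0] <= current - self.window_s:
            self.sent.popleft()
        used = sum(count for _, count in self.sent)
        return max(0, self.cps * self.window_s - used)

    def record(self, now_s, char_count):
        self.sent.append((int(now_s), char_count))


def take_blocks(queue, allowance):
    """Pop whole T140blocks from the queue while they fit in the allowance."""
    taken = []
    while queue and len(queue[0]) <= allowance:
        block = queue.popleft()
        allowance -= len(block)
        taken.append(block)
    return taken


window = CpsWindow()
window.record(0.5, 290)                  # a burst early in the window
pending = deque(["hello", "world!", "x"])
print(take_blocks(pending, window.allowance(1.0)))
print(list(pending))
```

Text that does not fit stays queued, which corresponds to the longer transmission interval applied while the "cps" value is reached. Once older entries leave the window, the allowance recovers and text can again be sent as soon as it arrives.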
+
+3.5. Only One Source per Packet
+
+ New text and redundant copies of earlier text from one source SHALL
+ be transmitted in the same packet if available for transmission at
+ the same time. Text from different sources MUST NOT be transmitted
+ in the same packet.
+
+3.6. Do Not Send Received Text to the Originating Source
+
+ Text received by a mixer from a participant SHOULD NOT be included in
+ transmissions from the mixer to that participant, because for text
+ that is produced locally, the normal behavior of the endpoint is to
+ present such text directly when it is produced.
+
+3.7. Clean Incoming Text
+
+ A mixer SHALL handle reception, recovery from packet loss, deletion
+ of superfluous redundancy, marking of possible text loss, and
+ deletion of BOM characters from each participant before queueing
+ received text for transmission to receiving participants as specified
+ in [RFC4103] for single-party sources and Section 3.16 for multiparty
+ sources (chained mixers).
+
+3.8. Principles of Redundant Transmission
+
+ A transmitting party using redundancy SHALL send redundant
+ repetitions of T140blocks already transmitted in earlier packets.
+
+ The number of redundant generations of T140blocks to include in
+ transmitted packets SHALL be deduced from the SDP negotiation. It
+ SHALL be set to the minimum of the number declared by the two parties
+ negotiating a connection. It is RECOMMENDED to declare and transmit
+ one original and two redundant generations of the T140blocks, because
+ this provides good protection against text loss in the case of packet
+ loss and also provides low overhead.
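+
+ As an illustration of the minimum rule, the sketch below derives the
+ number of redundant generations from the "text/red" format
+ parameters of both parties (e.g., "98/98/98" for one original and
+ two redundant generations). The function name is hypothetical, and
+ the parameter strings are assumed to be already extracted from SDP.

```python
def redundant_generations(local_red_fmtp, remote_red_fmtp):
    """Return the number of redundant generations both parties declared."""

    def declared(fmtp):
        # "98/98/98" lists the primary followed by each redundant generation.
        return len(fmtp.strip().split("/")) - 1

    return min(declared(local_red_fmtp), declared(remote_red_fmtp))
```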
+
+3.9. Text Placement in Packets
+
+ The mixer SHALL compose and transmit an RTP packet to a receiver when
+ one or more of the following conditions have occurred:
+
+ * The transmission interval is the normal 330 ms (no matter whether
+ the transmission interval has passed or not), and there is newly
+ received unsent text available for transmission to that receiver.
+
+ * The current transmission interval has passed and is longer than
+ the normal 330 ms, and there is newly received unsent text
+ available for transmission to that receiver.
+
+ * The current transmission interval (normally 330 ms) has passed
+ since already-transmitted text was queued for transmission as
+ redundant text.
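+
+ The three conditions above can be expressed as a single predicate.
+ This is a sketch under stated assumptions: all times are in
+ milliseconds, the argument names are hypothetical, and
+ "redundancy_due_ms" is the planned transmission time of queued
+ redundant text (or None if there is none).

```python
NORMAL_INTERVAL_MS = 330


def should_transmit(now_ms, last_tx_ms, interval_ms, has_new_text,
                    redundancy_due_ms):
    """Return True if the mixer should compose and send a packet now."""
    interval_passed = now_ms - last_tx_ms >= interval_ms
    # Condition 1: normal interval, new text is sent without waiting.
    if has_new_text and interval_ms == NORMAL_INTERVAL_MS:
        return True
    # Condition 2: extended interval, new text waits until it has passed.
    if has_new_text and interval_ms > NORMAL_INTERVAL_MS and interval_passed:
        return True
    # Condition 3: queued redundant text has reached its planned time.
    if redundancy_due_ms is not None and now_ms >= redundancy_due_ms:
        return True
    return False
```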
+
+ The principles provided in [RFC4103] apply for populating the header,
+ the redundancy header, and the data in the packet with specific
+ information, as detailed here and in the following sections.
+
+ At the time of transmission, the mixer SHALL populate the RTP packet
+ with all T140blocks queued for transmission originating from the
+ source selected for transmission as long as this is not in conflict
+ with the allowed number of characters per second ("cps") or the
+ maximum packet size. In this way, the latency of the latest received
+ text is kept low even in moments of simultaneous transmission from
+ many sources.
+
+ Redundant text SHALL also be included, and the assessment of how much
+ new text can be included within the maximum packet size MUST take
+ into account that the redundancy has priority to be transmitted in
+ its entirety. See Section 3.4.
+
+ The SSRC of the source SHALL be placed as the only member in the CSRC
+ list.
+
+ | Note: The CSRC list in an RTP packet only includes the
+ | participant whose text is included in text blocks. It is not
+ | the same as the total list of participants in a conference.
+ | With audio and video media, the CSRC list would often contain
+ | all participants who are not muted, whereas text participants
+ | that don't type are completely silent and thus are not
+ | represented in RTP packet CSRC lists.
+
+3.10. Empty T140blocks
+
+ If no unsent T140blocks were available for a source at the time of
+ populating a packet but already-transmitted T140blocks are available
+ that have not yet been sent the full intended number of redundant
+ transmissions, then the primary area in the packet consists of an
+ empty T140block, which is included in the packet for transmission
+ without taking up any length. The corresponding SSRC SHALL be placed as
+ usual in its place in the CSRC list.
+
+ The first packet in the session, the first after a source switch, and
+ the first after a pause SHALL be populated with the available
+ T140blocks for the source selected to be sent as the primary, and
+ empty T140blocks for the agreed-upon number of redundancy
+ generations.
+
+3.11. Creation of the Redundancy
+
+ The primary T140block from a source in the latest transmitted packet
+ is saved for populating the first redundant T140block for that source
+ in the next transmission of text from that source. The first
+ redundant T140block for that source from the latest transmission is
+ saved for populating the second redundant T140block in the next
+ transmission of text from that source.
+
+ Usually, this is the level of redundancy used. If a higher level of
+ redundancy is negotiated, then the procedure SHALL be continued until
+ all available redundant levels of T140blocks are placed in the
+ packet. If a receiver has negotiated a lower number of "text/red"
+ generations, then that level SHALL be the maximum used by the
+ transmitter.
+
+ The T140blocks saved for transmission as redundant data are assigned
+ a planned transmission time of 330 ms after the current time but
+ SHOULD be transmitted earlier if new text for the same source gets
+ selected for transmission before that time.
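+
+ The shifting of generations described in this section can be
+ sketched as below. The list layout [P, R1, R2, ...] and the use of
+ an empty string for an empty T140block are assumptions made for this
+ illustration only.

```python
def next_packet_blocks(previous_blocks, new_primary, generations=2):
    """Build [P, R1, R2, ...] for the next packet from one source.

    previous_blocks -- [P, R1, R2, ...] of the latest packet from that
                       source, or [] if none was sent (assumed layout)
    new_primary     -- new text for the source, "" if there is none
    """
    # The old primary becomes R1, the old R1 becomes R2, and so on.
    redundant = list(previous_blocks[:generations])
    # Missing generations are filled with empty T140blocks.
    redundant += [""] * (generations - len(redundant))
    return [new_primary] + redundant
```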
+
+3.12. Timer Offset Fields
+
+ The timestamp offset values SHALL be inserted in the redundancy
+ header. Each value is the difference between the RTP timestamp of
+ the current packet and the RTP timestamp of the packet in which the
+ corresponding T140block was sent as the primary.
+
+ The timestamp offsets are expressed in the same clock tick units as
+ the RTP timestamp.
+
+ The timestamp offset values for empty T140blocks have no relevance
+ but SHOULD be assigned realistic values.
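+
+ A minimal sketch of the offset calculation follows. The function
+ name is hypothetical. The check against 14 bits reflects the size of
+ the timestamp offset field in the redundancy header of RFC 2198, and
+ the modulo operation accounts for wraparound of the 32-bit RTP
+ timestamp.

```python
def timestamp_offsets(packet_timestamp, primary_timestamps):
    """Return the offset for each redundant block, in RTP clock ticks.

    primary_timestamps -- RTP timestamps at which each redundant block
                          was originally sent as the primary
    """
    offsets = []
    for sent_as_primary in primary_timestamps:
        offset = (packet_timestamp - sent_as_primary) % (1 << 32)
        if offset >= (1 << 14):
            raise ValueError("offset does not fit the 14-bit field")
        offsets.append(offset)
    return offsets
```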
+
+3.13. Other RTP Header Fields
+
+ The number of members in the CSRC list (0 or 1) SHALL be placed in
+ the CC header field. Only mixers place value 1 in the CC field. A
+ value of 0 indicates that the source is the transmitting device
+ itself and that the source is indicated by the SSRC field. This
+ value is used by endpoints and also by mixers sending self-sourced
+ data.
+
+ The current time SHALL be inserted in the timestamp.
+
+ The SSRC header field SHALL contain the SSRC of the RTP session where
+ the packet will be transmitted.
+
+ The M-bit SHALL be handled as specified in [RFC4103].
+
+3.14. Pause in Transmission
+
+ When there is no new T140block to transmit and no redundant T140block
+ that has not been retransmitted the intended number of times from any
+ source, the transmission process SHALL be stopped until either new
+ T140blocks arrive or a keep-alive method calls for transmission of
+ keep-alive packets.
+
+3.15. RTCP Considerations
+
+ A mixer SHALL send RTCP reports with SDES, CNAME, and NAME
+ information about the sources in the multiparty call. This makes it
+ possible for participants to compose a suitable label for text from
+ each source.
+
+ Privacy considerations SHALL be taken when composing these fields.
+ They contain name and address information that may be considered
+ sensitive if the information is transmitted in its entirety, e.g., to
+ unauthenticated participants.
+
+3.16. Reception of Multiparty Contents
+
+ The "text/red" receiver included in an endpoint with presentation
+ functions will receive RTP packets in the single stream from the
+ mixer and SHALL distribute the T140blocks for presentation in
+ presentation areas for each source. Other receiver roles, such as
+ gateways or chained mixers, are also feasible. Whether the stream
+ will only be forwarded or will be distributed based on the different
+ sources must be taken into consideration.
+
+3.16.1. Acting on the Source of the Packet Contents
+
+ If the CC field value of a received packet is 1, it indicates that
+ the text is conveyed from a source indicated in the single member in
+ the CSRC list, and the receiver MUST act on the source according to
+ its role. If the CC value is 0, the source is indicated in the SSRC
+ field.
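+
+ The source selection can be sketched directly on the fixed RTP
+ header layout of RFC 3550 (CC in the low four bits of the first
+ octet, SSRC at octet 8, CSRC list from octet 12). The function name
+ is an assumption, and header extensions and payload are ignored.

```python
import struct


def packet_source(rtp_packet):
    """Return the identifier of the text source of a received RTP packet."""
    if len(rtp_packet) < 12:
        raise ValueError("truncated RTP header")
    cc = rtp_packet[0] & 0x0F
    if cc == 0:
        # No CSRC list: the transmitting device itself is the source.
        return struct.unpack_from("!I", rtp_packet, 8)[0]
    if cc == 1 and len(rtp_packet) >= 16:
        # Single CSRC entry placed by the mixer.
        return struct.unpack_from("!I", rtp_packet, 12)[0]
    raise ValueError("unexpected CC value or truncated CSRC list")
```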
+
+3.16.2. Detection and Indication of Possible Text Loss
+
+ The receiver SHALL monitor the RTP sequence numbers of the received
+ packets for gaps and for packets received out of order. If a
+ sequence number gap appears and still exists after some defined short
+ time for jitter and reordering resolution, the packets in the gap
+ SHALL be regarded as lost.
+
+ If it is known that only one source is active in the RTP session,
+ then it is likely that a gap equal to or larger than the agreed-upon
+ number of redundancy generations (including the primary) causes text
+ loss. In that case, the receiver SHALL create a T140block with a
+ marker for possible text loss [T140ad1], associate it with the
+ source, and insert it in the reception buffer for that source.
+
+ If it is known that more than one source is active in the RTP
+ session, then it is not possible in general to evaluate if text was
+ lost when packets were lost. With two active sources and the
+ recommended number of redundancy generations (one original and two
+ redundant), it can take a gap of five consecutive lost packets before
+ any text may be lost, but text loss can also appear if three non-
+ consecutive packets are lost when they contained consecutive data
+ from the same source. A simple method for deciding when there is a
+ risk of resulting text loss is to evaluate if three or more packets
+ were lost within one second. If this simple method is used, then a
+ T140block SHOULD be created with a marker for possible text loss
+ [T140ad1] and associated with the SSRC of the RTP session as a
+ general input from the mixer.
+
+ Implementations MAY apply more refined methods for more reliable
+ detection of whether text was lost or not. Any refined method SHOULD
+ prefer marking possible loss rather than not marking when it is
+ uncertain if there was loss.
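+
+ The simple method for sessions with more than one active source
+ (three or more packets lost within one second) can be sketched as
+ below. The function name is hypothetical, and the times of the lost
+ packets are assumed to be estimated by the receiver, e.g., from the
+ arrival times of neighboring packets.

```python
def possible_text_loss(lost_packet_times_ms, window_ms=1000, threshold=3):
    """Return True if `threshold` or more losses fall within `window_ms`."""
    times = sorted(lost_packet_times_ms)
    for i in range(len(times) - threshold + 1):
        # Compare the first and last loss of each run of `threshold` losses.
        if times[i + threshold - 1] - times[i] <= window_ms:
            return True
    return False
```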
+
+3.16.3. Extracting Text and Handling Recovery
+
+ When applying the following procedures, the effects of possible
+ timestamp wraparound and the RTP session possibly changing the SSRC
+ MUST be considered.
+
+ When a packet is received in an RTP session using the packetization
+ for multiparty-aware endpoints, its T140blocks SHALL be extracted as
+ described below.
+
+ The source SHALL be extracted from the CSRC list if available, and
+ otherwise from the SSRC.
+
+ If the received packet is the first packet received from the source,
+ then all T140blocks in the packet SHALL be retrieved and assigned to
+ a receive buffer for that source, beginning with the oldest available
+ redundant generation, continuing with the younger redundant
+ generations in age order, and finally ending with the primary.
+
+ | Note: The normal case is that in the first packet, only the
+ | primary data has contents. The redundant data has contents in
+ | the first received packet from a source only after initial
+ | packet loss.
+
+ If the packet is not the first packet from a source, then if
+ redundant data is available, the process SHALL start with the oldest
+ generation. The timestamp of that redundant data SHALL be created by
+ subtracting its timestamp offset from the RTP timestamp. If the
+ resulting timestamp is later than the latest retrieved data from the
+ same source, then the redundant data SHALL be retrieved and appended
+ to the receive buffer. The process SHALL be continued in the same
+ way for all younger generations of redundant data. After that, the
+ timestamp of the packet SHALL be compared with the timestamp of the
+ latest retrieved data from the same source and if it is later, then
+ the primary data SHALL be retrieved from the packet and appended to
+ the receive buffer for the source.
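+
+ The extraction procedure can be sketched per source as follows.
+ This is an illustration under stated assumptions: the function and
+ argument names are hypothetical, an empty string stands for an empty
+ T140block, and timestamp wraparound and SSRC changes (which MUST be
+ considered in a real receiver) are not handled.

```python
def extract_new_blocks(packet_timestamp, redundant, primary, latest_time):
    """Return (recovered_blocks, new_latest_time) for one source.

    redundant   -- list of (timestamp_offset, text), oldest generation first
    primary     -- text of the primary T140block, "" if empty
    latest_time -- timestamp of the latest data already retrieved from this
                   source, or None if this is its first packet
    """
    recovered = []
    for offset, text in redundant:
        if not text:
            continue  # empty T140block: nothing to recover
        original_time = packet_timestamp - offset
        if latest_time is None or original_time > latest_time:
            recovered.append(text)
            latest_time = original_time
    if latest_time is None or packet_timestamp > latest_time:
        if primary:
            recovered.append(primary)
        latest_time = packet_timestamp
    return recovered, latest_time
```

+
+ With redundancy (630, "B1") and (330, "B2"), an empty primary, a
+ packet timestamp of 21130, and latest retrieved data at 20500, only
+ "B2" is retrieved.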
+
+3.16.4. Delete BOM
+
+ The Unicode BOM character is used as a start indication and is
+ sometimes used as a filler or keep-alive by transmission
+ implementations. Any BOM characters SHALL be deleted after
+ extraction from received packets.
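+
+ A minimal sketch of the deletion, assuming the extracted T140blocks
+ have already been decoded from UTF-8 into a string (the function
+ name is hypothetical):

```python
BOM = "\ufeff"  # Unicode byte order mark, ZERO WIDTH NO-BREAK SPACE


def delete_bom(text):
    """Remove every BOM character from extracted text."""
    return text.replace(BOM, "")
```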
+
+3.17. Performance Considerations
+
+ This solution has good performance with low text delays, as long as
+ the mean number of characters per second sent during any 10-second
+ interval from a number of simultaneously sending participants to a
+ receiving participant does not reach the "cps" value. At higher
+ numbers of sent characters per second, a jerkiness is visible in the
+ presentation of text. The solution is therefore suitable for
+ emergency service use, relay service use, and small or well-managed
+ larger multimedia conferences. In large unmanaged conferences with a
+ high number of participants, situations might arise on very rare
+ occasions where many participants happen to send text
+ simultaneously. In such circumstances, the result may be
+ unpleasantly jerky presentation of text from each sending
+ participant. It should be noted that it is only the number of users
+ sending text within the same moment that causes jerkiness, not the
+ total number of users with real-time text capability.
+
+3.18. Security for Session Control and Media
+
+ Security mechanisms to provide confidentiality, integrity protection,
+ and peer authentication SHOULD be applied when possible regarding the
+ capabilities of the participating devices by using the Session
+ Initiation Protocol (SIP) over TLS by default according to
+ Section 3.1.3 of [RFC5630] on the session control level and by
+ default using DTLS-SRTP [RFC5764] at the media level. In
+ applications where legacy endpoints without security are allowed, a
+ negotiation SHOULD be performed to decide if encryption at the media
+ level will be applied. If no other security solution is mandated for
+ the application, then the Opportunistic Secure Real-time Transport
+ Protocol (OSRTP) [RFC8643] is a suitable method to be applied to
+ negotiate SRTP media security with DTLS. For simplicity, most SDP
+ examples below are expressed without the security additions. The
+ principles (but not all details) for applying DTLS-SRTP security
+ [RFC5764] are shown in a couple of the following examples.
+
+ Further general security considerations are covered in Section 10.
+
+ End-to-end encryption would require further work and could be based
+ on WebRTC as specified in Section 1.2 or on double encryption as
+ specified in [RFC8723].
+
+3.19. SDP Offer/Answer Examples
+
+ This section shows some examples of SDP for session negotiation of
+ the real-time text media in SIP sessions. Audio is usually provided
+ in the same session, and sometimes also video. The examples only
+ show the part of importance for the real-time text media. The
+ examples relate to the single RTP stream mixing for multiparty-aware
+ endpoints and for multiparty-unaware endpoints.
+
+ | Note: Multiparty real-time text MAY also be provided through
+ | other methods, e.g., by a Selective Forwarding Middlebox (SFM).
+ | In that case, the SDP of the offer will include something
+ | specific for that method, e.g., an SDP attribute or another
+ | media format. An answer selecting the use of that method would
+ | accept it via a corresponding acknowledgement included in the
+ | SDP. The offer may also contain the "rtt-mixer" SDP media
+ | attribute for the main real-time text media when the offerer
+ | has this capability for both multiparty methods, while an
+ | answer, choosing to use SFM, will not include the "rtt-mixer"
+ | SDP media attribute.
+
+ Offer example for the "text/red" format, multiparty support, and
+ capability for 90 characters per second:
+
+ m=text 11000 RTP/AVP 100 98
+ a=rtpmap:98 t140/1000
+ a=fmtp:98 cps=90
+ a=rtpmap:100 red/1000
+ a=fmtp:100 98/98/98
+ a=rtt-mixer
+
+ Answer example from a multiparty-aware device:
+
+ m=text 14000 RTP/AVP 100 98
+ a=rtpmap:98 t140/1000
+ a=fmtp:98 cps=90
+ a=rtpmap:100 red/1000
+ a=fmtp:100 98/98/98
+ a=rtt-mixer
+
+ Offer example for the "text/red" format, including multiparty and
+ security:
+
+ a=fingerprint: (fingerprint1)
+ m=text 11000 RTP/AVP 100 98
+ a=rtpmap:98 t140/1000
+ a=rtpmap:100 red/1000
+ a=fmtp:100 98/98/98
+ a=rtt-mixer
+
+ The "fingerprint" is sufficient to offer DTLS-SRTP, with the media
+ line still indicating RTP/AVP.
+
+ | Note: For brevity, the entire value of the SDP "fingerprint"
+ | attribute is not shown in this and the following example.
+
+ Answer example from a multiparty-aware device with security:
+
+ a=fingerprint: (fingerprint2)
+ m=text 16000 RTP/AVP 100 98
+ a=rtpmap:98 t140/1000
+ a=rtpmap:100 red/1000
+ a=fmtp:100 98/98/98
+ a=rtt-mixer
+
+ With the "fingerprint", the device acknowledges the use of DTLS-SRTP.
+
+ Answer example from a multiparty-unaware device that also does not
+ support security:
+
+ m=text 12000 RTP/AVP 100 98
+ a=rtpmap:98 t140/1000
+ a=rtpmap:100 red/1000
+ a=fmtp:100 98/98/98
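+
+ A mixer or endpoint could check a received SDP for the multiparty
+ capability along the following lines. This is a simplified sketch:
+ the function name is an assumption, and only the presence of the
+ "rtt-mixer" attribute in a "text" media section is examined.

```python
def text_media_has_rtt_mixer(sdp):
    """Return True if a text media section carries a=rtt-mixer."""
    in_text_section = False
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # A new media section starts; remember whether it is text.
            in_text_section = line.startswith("m=text ")
        elif in_text_section and line == "a=rtt-mixer":
            return True
    return False
```

+
+ Applied to the examples above, the offers and the multiparty-aware
+ answers yield True, and the answer from the multiparty-unaware
+ device yields False.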
+
+3.20. Packet Sequence Example from Interleaved Transmission
+
+ This example shows a symbolic flow of packets from a mixer, including
+ loss and recovery. The sequence includes interleaved transmission of
+ text from two real-time text sources: A and B. P indicates primary
+ data. R1 is the first redundant generation of data, and R2 is the
+ second redundant generation of data. A1, B1, A2, etc. are text
+ chunks (T140blocks) received from the respective sources and sent on
+ to the receiver by the mixer. X indicates a dropped packet between
+ the mixer and a receiver. The session is assumed to use the original
+ and two redundant generations of real-time text.
+
+ |-----------------------|
+ |Seq no 101, Time=20400 |
+ |CC=1                   |
+ |CSRC list A            |
+ |R2: A1, Offset=600     |
+ |R1: A2, Offset=300     |
+ |P: A3                  |
+ |-----------------------|
+
+ Assuming that earlier packets (with text A1 and A2) were received in
+ sequence, text A3 is received from packet 101 and assigned to
+ reception buffer A. The mixer is now assumed to have received
+ initial text from source B 100 ms after packet 101 and will send that
+ text. Transmission of A2 and A3 as redundancy is planned for 330 ms
+ after packet 101 if no new text from A is ready to be sent before
+ that.
+
+ |-----------------------|
+ |Seq no 102, Time=20500 |
+ |CC=1                   |
+ |CSRC list B            |
+ |R2: Empty, Offset=600  |
+ |R1: Empty, Offset=300  |
+ |P: B1                  |
+ |-----------------------|
+
+ Packet 102 is received.
+
+ B1 is retrieved from this packet. Redundant transmission of B1 is
+ planned 330 ms after packet 102.
+
+ X------------------------|
+ X Seq no 103, Time=20730 |
+ X CC=1                   |
+ X CSRC list A            |
+ X R2: A2, Offset=630     |
+ X R1: A3, Offset=330     |
+ X P: Empty               |
+ X------------------------|
+
+ Packet 103 is assumed to be lost due to network problems.
+
+ It contains redundancy for A. Sending A3 as second-level
+ redundancy is planned for 330 ms after packet 103.
+
+ X------------------------|
+ X Seq no 104, Time=20800 |
+ X CC=1                   |
+ X CSRC list B            |
+ X R2: Empty, Offset=600  |
+ X R1: B1, Offset=300     |
+ X P: B2                  |
+ X------------------------|
+
+ Packet 104 contains text from B, including new B2 and redundant
+ B1. It is assumed dropped due to network problems.
+
+ The mixer has A3 redundancy to send, but no new text appears from
+ A, and therefore the redundancy is sent 330 ms after the previous
+ packet with text from A.
+
+ |------------------------|
+ | Seq no 105, Time=21060 |
+ | CC=1                   |
+ | CSRC list A            |
+ | R2: A3, Offset=660     |
+ | R1: Empty, Offset=330  |
+ | P: Empty               |
+ |------------------------|
+
+ Packet 105 is received.
+
+ A gap for lost packets 103 and 104 is detected. Assume that no
+ other loss was detected during the last second. It can then be
+ concluded that nothing was totally lost.
+
+ R2 is checked. Its original time was 21060-660=20400. A packet
+ with text from A was received with that timestamp, so nothing
+ needs to be recovered.
+
+ B1 and B2 still need to be transmitted as redundancy. This is
+ planned 330 ms after packet 104. That would be at 21130.
+
+ |-----------------------|
+ |Seq no 106, Time=21130 |
+ |CC=1                   |
+ |CSRC list B            |
+ |R2: B1, Offset=630     |
+ |R1: B2, Offset=330     |
+ |P: Empty               |
+ |-----------------------|
+
+ Packet 106 is received.
+
+ The second-level redundancy in packet 106 is B1 and has a
+ timestamp offset of 630 ms. The timestamp of packet 106 minus 630
+ is 20500, which is the timestamp of packet 102 that was received.
+ So, B1 does not need to be retrieved. The first-level redundancy
+ in packet 106 has an offset of 330. The timestamp of packet 106
+ minus 330 is 20800. That is later than the latest received packet
+ with source B. Therefore, B2 is retrieved and assigned to the
+ input buffer for source B. No primary is available in packet 106.
+
+ After this sequence, A3, B1, and B2 have been received. In this
+ case, no text was lost.
+
+3.21. Maximum Character Rate "cps" Setting
+
+ The default maximum rate of reception of "text/t140" real-time text,
+ as specified in [RFC4103], is 30 characters per second. The actual
+ rate is calculated without regard to any redundant text transmission
+ and is, in the multiparty case, evaluated for all sources
+ contributing to transmission to a receiver. The value MAY be
+ modified in the "cps" parameter of the "fmtp" attribute for the
+ "text/t140" format of the "text" media section.
+
+ A mixer combining real-time text from a number of sources may
+ occasionally have a higher combined flow of text coming from the
+ sources. Endpoints SHOULD therefore include a suitable higher value
+ for the "cps" parameter, corresponding to their real reception
+ capability. The default "cps" value 30 can be assumed to be
+ sufficient for small meetings and well-managed larger conferences
+ with users only making manual text entry. A "cps" value of 90 can be
+ assumed to be sufficient even for large unmanaged conferences and for
+ cases when speech-to-text technologies are used for text entry. This
+ is also a reachable performance for receivers in modern technologies,
+ and 90 is therefore the RECOMMENDED "cps" value. See [RFC4103] for
+ the format and use of the "cps" parameter. The same rules apply for
+ the multiparty case.
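+
+ Reading the effective "cps" value from a text media section can be
+ sketched as below. The function name is hypothetical; the sketch
+ assumes one "text/t140" payload type per section and falls back to
+ the default of 30 when no "cps" parameter is present.

```python
import re

DEFAULT_CPS = 30  # default from RFC 4103 when no "cps" parameter is present


def receiver_cps(text_media_section):
    """Return the "cps" value declared for text/t140, or the default."""
    lines = [line.strip() for line in text_media_section.splitlines()]
    payload_type = None
    for line in lines:
        match = re.fullmatch(r"a=rtpmap:(\d+) t140/1000", line, re.IGNORECASE)
        if match:
            payload_type = match.group(1)
    if payload_type is None:
        return DEFAULT_CPS
    prefix = "a=fmtp:" + payload_type + " "
    for line in lines:
        if line.startswith(prefix):
            # Format parameters are separated by semicolons.
            match = re.search(r"(?:^|;)\s*cps=(\d+)", line[len(prefix):])
            if match:
                return int(match.group(1))
    return DEFAULT_CPS
```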
+
+4. Presentation-Level Considerations
+
+ "Protocol for multimedia application text conversation" [T140]
+ provides the presentation-level requirements for RTP transport as
+ described in [RFC4103]. Functions for erasure and other formatting
+ functions are specified in [T140], which has the following general
+ statement for the presentation:
+
+ | The display of text from the members of the conversation should be
+ | arranged so that the text from each participant is clearly
+ | readable, and its source and the relative timing of entered text
+ | is visualized in the display. Mechanisms for looking back in the
+ | contents from the current session should be provided. The text
+ | should be displayed as soon as it is received.
+
+ Strict application of [T140] is essential for the interoperability of
+ real-time text implementations and to fulfill the intention that the
+ session participants have the same information conveyed in the text
+ contents of the conversation without necessarily having the exact
+ same layout of the conversation.
+
+ [T140] specifies a set of presentation control codes (Section 4.2.4)
+ to include in the stream. Some of them are optional.
+ Implementations MUST ignore optional control codes that they do not
+ support.
+
+ There is no strict "message" concept in real-time text. The Unicode
+ Line Separator character SHALL be used as a separator allowing a part
+ of received text to be grouped in a presentation. The character
+ combination "CRLF" may be used by other implementations as a
+ replacement for the Line Separator. The "CRLF" combination SHALL be
+ erased by just one erasing action, the same as the Line Separator.
+ Presentation functions are allowed to group text for presentation in
+ smaller groups than the Line Separators imply and present such groups
+ with a source indication together with text groups from other sources
+ (see the following presentation examples). Erasure has no specific
+ limit by any delimiter in the text stream.
+
+4.1. Presentation by Multiparty-Aware Endpoints
+
+ A multiparty-aware receiving party presenting real-time text MUST
+ separate text from different sources and present them in separate
+ presentation fields. The receiving party MAY separate the
+ presentation of parts of text from a source in readable groups based
+ on criteria other than a Line Separator and merge these groups in the
+ presentation area when it benefits the user to most easily find and
+ read text from the different participants. The criteria MAY, for
+ example, be a received comma, a full stop, some other type of phrase
+ delimiter, or a long pause.
+
+ When text is received from multiple original sources, the
+ presentation SHALL provide a view where text is added in multiple
+ presentation fields.
+
+ If the presentation presents text from different sources in one
+ common area, the presenting endpoint SHOULD insert text from the
+ local user, where the text ends at suitable points and is merged
+ properly with received text to indicate the relative timing for when
+ the text groups were completed. In this presentation mode, the
+ receiving endpoint SHALL present the source of the different groups
+ of text. This presentation style is called the "chat" style here and
+ provides the possibility of following text arriving from multiple
+ parties and the approximate relative time that text is received as
+ related to text from the local user.
+
+ A view of a three-party real-time text call in chat style is shown in
+ this example.
+
+ _________________________________________________
+ |                                              |^|
+ |[Alice] Hi, Alice here.                       |-|
+ |                                              | |
+ |[Bob] Bob as well.                            | |
+ |                                              | |
+ |[Eve] Hi, this is Eve, calling from Paris.    | |
+ |      I thought you should be here.           | |
+ |                                              | |
+ |[Alice] I am coming on Thursday, my           | |
+ |      performance is not until Friday morning.| |
+ |                                              | |
+ |[Bob] And I on Wednesday evening.             | |
+ |                                              | |
+ |[Alice] Can we meet on Thursday evening?      | |
+ |                                              | |
+ |[Eve] Yes, definitely. How about 7pm.         | |
+ |      at the entrance of the restaurant       | |
+ |      Le Lion Blanc?                          | |
+ |[Eve] we can have dinner and then take a walk |-|
+ |______________________________________________|v|
+ | <Eve-typing> But I need to be back to        |^|
+ |    the hotel by 11 because I need            |-|
+ |                                              | |
+ | <Bob-typing> I wou                           |-|
+ |______________________________________________|v|
+ | of course, I underst                           |
+ |________________________________________________|
+
+ Figure 1: Example of a Three-Party Real-Time Text Call Presented
+ in Chat Style Seen at Participant Alice's Endpoint
+
+ Presentation styles other than the chat style MAY be arranged.
+
+ Figure 2 shows how a coordinated column view MAY be presented.
+
+ _____________________________________________________________________
+ |        Bob         |         Eve          |         Alice         |
+ |____________________|______________________|_______________________|
+ |                    |                      |I will arrive by TGV.  |
+ |My flight is to Orly|                      |Convenient to the main |
+ |                    |Hi all, can we plan   |station.               |
+ |                    |for the seminar?      |                       |
+ |Eve, will you do    |                      |                       |
+ |your presentation on|                      |                       |
+ |Friday?             |Yes, Friday at 10.    |                       |
+ |Fine, wo            |                      |We need to meet befo   |
+ |___________________________________________________________________|
+
+ Figure 2: An Example of a Coordinated Column View of a
+ Three-Party Session with Entries Ordered Vertically in
+ Approximate Time Order
+
+4.2. Multiparty Mixing for Multiparty-Unaware Endpoints
+
+ When the mixer has indicated multiparty real-time text capability in
+ an SDP negotiation but the multiparty capability negotiation fails
+ with an endpoint, the agreed-upon "text/red" or "text/t140" format
+ SHALL be used and the mixer SHOULD compose a best-effort presentation
+ of multiparty real-time text in one stream intended to be presented
+ by an endpoint with no multiparty awareness, when that is desired in
+ the actual implementation. The following specifies a procedure that
+ MAY be applied in that situation.
+
+ This presentation format has functional limitations and SHOULD be
+ used only to enable participation in multiparty calls by legacy
+ deployed endpoints implementing only RFC 4103 without any multiparty
+ extensions specified in this document.
+
+ The principles and procedures below do not specify any new protocol
+ elements. They are instead composed of information provided in
+ [T140] and an ambition to provide a best-effort presentation on an
+ endpoint that has functions originally intended only for two-party
+ calls.
+
+ The mixer performing the mixing for multiparty-unaware endpoints
+ SHALL compose a simulated, limited multiparty real-time text view
+ suitable for presentation in one presentation area. The mixer SHALL
+ group text in suitable groups and prepare them for presentation by
+ inserting a Line Separator between them if the transmitted text did
+ not already end with a new line (Line Separator or CRLF). A
+ presentable label SHALL be composed and sent for the source initially
+ in the session and after each source switch. With this procedure,
+ the time for switching from transmission of text from one source to
+ transmission of text from another source depends on the actions of
+ the users. In order to expedite source switching, a user can, for
+ example, end its turn with a new line.
+
+4.2.1. Actions by the Mixer at Reception from the Call Participants
+
+ When text is received by the mixer from the different participants,
+ the mixer SHALL recover text from redundancy if any packets are lost.
+ The marker for lost text [T140ad1] SHALL be inserted in the stream if
+ unrecoverable loss appears. Any Unicode BOM characters, possibly
+ used for keep-alives, SHALL be deleted. The time of creation of text
+ (retrieved from the RTP timestamp) SHALL be stored together with the
+ received text from each source in queues for transmission to the
+ recipients in order to be able to evaluate text loss.
+
+4.2.2. Actions by the Mixer for Transmission to the Recipients
+
+ The following procedure SHALL be applied for each multiparty-unaware
+ recipient of multiparty text from the mixer.
+
+ The text for transmission SHALL be formatted by the mixer for each
+ receiving user for presentation in one single presentation area.
+ Text received from a participant SHOULD NOT be included in
+ transmissions to that participant, because it is usually presented
+ locally at transmission time. When there is text available for
+ transmission from the mixer to a receiving party from more than one
+ participant, the mixer SHALL switch between transmission of text from
+ the different sources at suitable points in the transmitted stream.
+
+ When switching the source, the mixer SHALL insert a Line Separator if
+ the already-transmitted text did not end with a new line (Line
+ Separator or CRLF). A label SHALL be composed of information in the
+ CNAME and NAME fields in RTCP reports from the participant to have
+ its text transmitted, or from other session information for that
+ user. The label SHALL be delimited by suitable characters (e.g.,
+ "[ ]") and transmitted. The CSRC SHALL indicate the selected source.
+ Then, text from that selected participant SHALL be transmitted until
+ a new suitable point for switching the source is reached.
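+
+ Composing the text sent at a source switch can be sketched as below.
+ The function name, the "[ ]" delimiters followed by a space, and the
+ omission of the Line Separator when nothing has been transmitted yet
+ are assumptions of this illustration; SGR handling is not shown.

```python
LINE_SEPARATOR = "\u2028"


def source_switch_prefix(already_sent, label):
    """Return the text to send before text from the newly selected source.

    already_sent -- text transmitted to this recipient so far
    label        -- presentable name composed for the new source
    """
    prefix = ""
    # Start the new source on its own line unless the stream already
    # ended with a new line (Line Separator or CRLF).
    if already_sent and not already_sent.endswith((LINE_SEPARATOR, "\r\n")):
        prefix = LINE_SEPARATOR
    return prefix + "[" + label + "] "
```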
+
+ Information available to the mixer for composing the label may
+ contain sensitive personal information that SHOULD NOT be revealed in
+ sessions not securely authenticated and confidentiality protected.
+ Privacy considerations regarding how much personal information is
+ included in the label SHOULD therefore be taken when composing the
+ label.
+
+ Seeking a suitable point for switching the source SHALL be done when
+ text waiting for transmission from any party is older than the last
+ transmitted text. Suitable points for switching are:
+
+ * A completed phrase ending with a comma.
+
+ * A completed sentence.
+
+ * A new line (Line Separator or CRLF).
+
+ * A long pause (e.g., > 10 seconds) in received text from the
+ currently transmitted source.
+
+ * If text from one participant has been transmitted with text from
+ other sources waiting for transmission for a long time (e.g., > 1
+ minute) and none of the other suitable points for switching has
+ occurred, a source switch MAY be forced by the mixer at the next
+ word delimiter, and also even if a word delimiter does not occur
+ within some period of time (e.g., 15 seconds) after the scan for a
+ word delimiter started.
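
The first four switch points in the list above can be sketched as a
small predicate. This is only an illustration, not part of the
specification: the function name, the set of sentence-ending
characters, and the default pause threshold (taken from the "e.g."
value in the list) are assumptions, and the forced switch in the last
bullet is not covered.

```python
LINE_SEPARATOR = "\u2028"
SENTENCE_ENDINGS = ".!?"  # assumed set of sentence-ending characters


def is_switch_point(sent_text: str, idle_seconds: float,
                    long_pause_s: float = 10.0) -> bool:
    """Return True if the source may be switched after sent_text.

    sent_text is the text sent so far from the current source;
    idle_seconds is the time since text was last received from it.
    """
    if idle_seconds > long_pause_s:
        return True  # long pause in text from the current source
    if sent_text.endswith((LINE_SEPARATOR, "\r\n")):
        return True  # new line
    stripped = sent_text.rstrip(" ")
    if not stripped:
        return False
    last = stripped[-1]
    # phrase ending with a comma, or a completed sentence
    return last == "," or last in SENTENCE_ENDINGS
```

With the text of Figure 3, is_switch_point("Fine, wo", 0.0) is false
because the word is incomplete, but the same text becomes a switch
point once the source has been idle for longer than the threshold.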
+
+ When switching the source, the source that has the oldest text in
+ queue SHALL be selected to be transmitted. A character display count
+ SHALL be maintained for the currently transmitted source, starting at
+ zero after the label is transmitted for the currently transmitted
+ source.
+
+ The status SHALL be maintained for the latest control code for Select
+ Graphic Rendition (SGR) from each source. If there is an SGR code
+ stored as the status for the current source before the source switch
+ is done, a reset of SGR SHALL be sent by the sequence SGR 0 [U+009B
+ U+0000 U+006D] after the new line and before the new label during a
+ source switch. See Section 4.2.4 for an explanation. This
+ transmission does not influence the display count.
+
+ If there is an SGR code stored for the new source after the source
+ switch, that SGR code SHALL be transmitted to the recipient before
+ the label. This transmission does not influence the display count.
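
The order of the elements sent at a source switch can be sketched as
follows. This is a minimal illustration under stated assumptions: the
function and parameter names are hypothetical, the label is delimited
with "[ ]" followed by a space, no Line Separator is sent when nothing
has been transmitted yet, and the SGR 0 sequence uses the code points
exactly as written in the text above.

```python
LINE_SEPARATOR = "\u2028"
# SGR 0, using the code points as written in the text above
SGR_RESET = "\u009b\u0000\u006d"


def source_switch_prefix(sent_so_far, label, old_sgr, new_sgr):
    """Compose what is sent before text from the newly selected source.

    old_sgr / new_sgr are the stored SGR codes (or None) for the
    previous and the new source.
    """
    parts = []
    # New line, unless the transmitted text already ended with one
    # (assumption: nothing is inserted if no text was sent yet).
    if sent_so_far and not sent_so_far.endswith((LINE_SEPARATOR, "\r\n")):
        parts.append(LINE_SEPARATOR)
    # Reset rendition if the previous source had an SGR code stored.
    if old_sgr:
        parts.append(SGR_RESET)
    # Restore the stored SGR code of the new source before the label.
    if new_sgr:
        parts.append(new_sgr)
    parts.append("[" + label + "] ")
    return "".join(parts)
```

None of the returned characters influence the display count, which
starts at zero once the label has been sent.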
+
+4.2.3. Actions on Transmission of Text
+
+ Text from a source sent to the recipient SHALL increase the display
+ count by one per transmitted character.
+
+4.2.4. Actions on Transmission of Control Codes
+
+ The following control codes, as specified by T.140 [T140], require
+ specific actions. They SHALL cause specific considerations in the
+ mixer. Note that the codes presented here are expressed in UTF-16,
+ while transmission is made in the UTF-8 encoding of these codes.
+
+ BEL (U+0007): Bell. Alert in session. Provides for alerting during
+ an active session. The display count SHALL NOT be altered.
+
+ NEW LINE (U+2028): Line Separator. Check and perform a source
+ switch if appropriate. Increase the display count by 1.
+
+ CR LF (U+000D U+000A): A supported, but not preferred, way of
+ requesting a new line. Check and perform a source switch if
+ appropriate. Increase the display count by 1.
+
+ INT (ESC U+0061): Interrupt (used to initiate the mode negotiation
+ procedure). The display count SHALL NOT be altered.
+
+ SGR (U+009B Ps U+006D): Select Graphic Rendition. Ps represents the
+ rendition parameters specified in [ISO6429]. (For freely
+ available equivalent information, please see [ECMA-48].) The
+ display count SHALL NOT be altered. The SGR code SHOULD be stored
+ for the current source.
+
+ SOS (U+0098): Start of String. Used as a general protocol element
+ introducer, followed by a maximum 256-byte string and the ST. The
+ display count SHALL NOT be altered.
+
+ ST (U+009C): String Terminator. End of SOS string. The display
+ count SHALL NOT be altered.
+
+ ESC (U+001B): Escape. Used in control strings. The display count
+ SHALL NOT be altered for the complete escape code.
+
+ Byte order mark (BOM) (U+FEFF): "Zero width no-break space". Used
+ for synchronization and keep-alive. It SHALL be deleted from
+ incoming streams. It SHALL also be sent first after session
+ establishment to the recipient. The display count SHALL NOT be
+ altered.
+
+ Missing text mark (U+FFFD): "Replacement character". Represented as
+ a question mark in a rhombus, or, if that is not feasible,
+ replaced by an apostrophe ('). It marks the place in the stream
+ of possible text loss. This mark SHALL be inserted by the
+ reception procedure in the case of unrecoverable loss of packets.
+ The display count SHALL be increased by one when sent as for any
+ other character.
+
+ SGR: If a control code for SGR other than a reset of the graphic
+ rendition (SGR 0) is sent to a recipient, that control code SHALL
+ also be stored as the status for the source in the storage for SGR
+ status. If a reset graphic rendition (SGR 0) originating from a
+ source is sent, then the SGR status storage for that source SHALL
+ be cleared. The display count SHALL NOT be increased.
+
+ BS (U+0008): "Back Space". Intended to erase the last entered
+ character by a source. Erasure by backspace cannot always be
+ performed as the erasing party intended. If an erasing action
+ erases all text up to the end of the leading label after a source
+ switch, then the mixer MUST NOT transmit more backspaces.
+ Instead, it is RECOMMENDED that a letter "X" be inserted in the
+ text stream for each backspace as an indication of the intent to
+ erase more. A new line is usually coded by a Line Separator, but
+ the character combination "CRLF" MAY be used instead. Erasure of
+ a new line is, in both cases, done by just one erasing action
+ (backspace). If the display count has a positive value, it SHALL
+ be decreased by one when the BS is sent. If the display count is
+ at zero, it SHALL NOT be altered.
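
The display count rules above can be sketched as a small filter
applied to text from the current source. The class name is
hypothetical, and the sketch is deliberately partial: it handles BEL,
BOM, BS, CR LF, and ordinary characters (including the Line Separator
and U+FFFD), but not escape sequences, SGR, or SOS/ST strings. It also
assumes that an inserted "X" leaves the display count at zero.

```python
BS = "\u0008"
BEL = "\u0007"
BOM = "\ufeff"


class DisplayCounter:
    """Tracks the display count for the currently transmitted source."""

    def __init__(self):
        self.count = 0  # zero right after the label has been sent

    def process(self, text: str) -> str:
        """Return the text to transmit and update the display count."""
        out = []
        i = 0
        while i < len(text):
            ch = text[i]
            if text.startswith("\r\n", i):
                out.append("\r\n")  # CR LF counts as one new line
                self.count += 1
                i += 2
                continue
            if ch == BOM:
                pass  # deleted from incoming streams
            elif ch == BEL:
                out.append(ch)  # sent, count not altered
            elif ch == BS:
                if self.count > 0:
                    out.append(BS)
                    self.count -= 1
                else:
                    # Nothing left to erase after the label:
                    # show the intent to erase instead.
                    out.append("X")
            else:
                out.append(ch)
                self.count += 1
            i += 1
        return "".join(out)
```

For example, after "ab" has been sent, three backspaces are passed on
as two backspaces followed by one "X".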
+
+4.2.5. Packet Transmission
+
+ A mixer transmitting to a multiparty-unaware endpoint SHALL send
+ primary data only from one source per packet. The SSRC SHALL be the
+ SSRC of the mixer. The CSRC list MAY contain one member and be the
+ SSRC of the source of the primary data.
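
As an illustration of the header fields named above, the following
sketch packs an RTP fixed header (layout per RFC 3550) with the
mixer's SSRC and an optional single-member CSRC list. The function
name and parameters are hypothetical, and the payload type is whatever
was negotiated for the session.

```python
import struct


def build_rtp_header(payload_type, seq, timestamp, mixer_ssrc,
                     source_ssrc=None, marker=False):
    """Pack an RTP header with zero or one CSRC entry.

    mixer_ssrc goes in the SSRC field; source_ssrc, if given, is the
    SSRC of the source of the primary data and becomes the only
    member of the CSRC list.
    """
    csrcs = [] if source_ssrc is None else [source_ssrc]
    byte0 = (2 << 6) | len(csrcs)  # V=2, P=0, X=0, CC=len(csrcs)
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, mixer_ssrc)
    for csrc in csrcs:
        header += struct.pack("!I", csrc)
    return header
```

The header is 12 bytes without a CSRC entry and 16 bytes with one.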
+
+4.2.6. Functional Limitations
+
+ When a multiparty-unaware endpoint presents a conversation in one
+ display area in a chat style, it inserts source indications for
+ remote text and local user text as they are merged in completed text
+ groups. When an endpoint using this layout receives and presents
+ text mixed for multiparty-unaware endpoints, there will be two levels
+ of source indicators for the received text: one generated by the
+ mixer and inserted in a label after each source switch, and another
+ generated by the receiving endpoint and inserted after each switch
+ between the local source and the remote source in the presentation
+ area. This will waste display space and look inconsistent to the
+ reader.
+
+ New text can be presented from only one source at a time. Switching
+ the source to be presented takes place at suitable places in the
+ text, such as the end of a phrase, the end of a sentence, or a Line
+ Separator, or upon detecting inactivity. Therefore, the time to
+ switch to present waiting text from other sources may grow long, and
+ it will vary and depend on the actions of the currently presented
+ source.
+
+ Erasure can only be done up to the latest source switch. If a user
+ tries to erase more text, the erasing actions will be presented as a
+ letter "X" after the label.
+
+ Text loss because of network errors may hit the label between entries
+ from different parties, causing the risk of a misunderstanding
+ regarding which source provided a piece of text.
+
+ Because of these facts, it is strongly RECOMMENDED that multiparty
+ awareness be implemented in real-time text endpoints. The use of the
+ mixing method for multiparty-unaware endpoints should be left for use
+ with endpoints that are impossible to upgrade to become multiparty
+ aware.
+
+4.2.7. Example Views of Presentation on Multiparty-Unaware Endpoints
+
+ The following pictures are examples of the view on a participant's
+ display for the multiparty-unaware case.
+
+ Figure 3 shows how a coordinated column view MAY be presented on
+ Alice's device in a view with two columns. The mixer inserts labels
+ to show how the sources alternate in the column with received text.
+ The mixer alternates between the sources at suitable points in the
+ text exchange so that text entries from each party can be
+ conveniently read.
+
+ ___________________________________________________
+ | Conference | Alice |
+ |_________________________|_________________________|
+ | |I will arrive by TGV. |
+ |[Bob]: My flight is to |Convenient to the main |
+ |Orly. |station. |
+ |[Eve]: Hi all, can we | |
+ |plan for the seminar. | |
+ | | |
+ |[Bob]: Eve, will you do | |
+ |your presentation on | |
+ |Friday? | |
+ |[Eve]: Yes, Friday at 10.| |
+ |[Bob]: Fine, wo |We need to meet befo |
+ |_________________________|_________________________|
+
+ Figure 3: Alice, Who Has a Conference-Unaware Client, Is
+ Receiving the Multiparty Real-Time Text in a Single Stream
+
+ Figure 4 shows a chat-style view. Receiving applications
+ traditionally include a label showing the source of the text, here
+ shown with parentheses "()". The mixer also inserts source labels
+ for the multiparty call participants, here shown with brackets "[]".
+
+ _________________________________________________
+ | |^|
+ |(Alice) Hi, Alice here. |-|
+ | | |
+ |(mix)[Bob] Bob as well. | |
+ | | |
+ |[Eve] Hi, this is Eve, calling from Paris | |
+ | I thought you should be here. | |
+ | | |
+ |(Alice) I am coming on Thursday, my | |
+ | performance is not until Friday morning.| |
+ | | |
+ |(mix)[Bob] And I on Wednesday evening. | |
+ | | |
+ |[Eve] we can have dinner and then walk | |
+ | | |
+ |[Eve] But I need to be back to | |
+ | the hotel by 11 because I need | |
+ | |-|
+ |______________________________________________|v|
+ | of course, I underst |
+ |________________________________________________|
+
+ Figure 4: An Example of a View of the Multiparty-Unaware
+ Presentation in Chat Style, Where Alice Is the Local User
+
+5. Relationship to Conference Control
+
+5.1. Use with SIP Centralized Conferencing Framework
+
+ The Session Initiation Protocol (SIP) conferencing framework, mainly
+ specified in [RFC4353], [RFC4579], and [RFC4575], is suitable for
+ coordinating sessions, including multiparty real-time text. The
+ real-time text stream between the mixer and a participant is one and
+ the same during the conference. Participants are announced by
+ notifications when they join or leave, and further
+ user information may be provided. The SSRC of the text to expect
+ from joined users MAY be included in a notification. The
+ notifications MAY be used for both security purposes and translation
+ to a label for presentation to other users.
+
+5.2. Conference Control
+
+ In managed conferences, control of the real-time text media SHOULD be
+ provided in the same way as for other media, e.g., for muting and
+ unmuting by the direction attributes in SDP [RFC8866].
+
+ Note that floor control functions may be of value for real-time text
+ users as well as for users of other media in a conference.
+
+6. Gateway Considerations
+
+ Multiparty real-time text sessions may involve gateways of different
+ kinds. Gateways involved in setting up sessions SHALL correctly
+ reflect the multiparty capability or unawareness of the combination
+ of the gateway and the remote endpoint beyond the gateway.
+
+6.1. Gateway Considerations with Textphones
+
+ One case that may occur is a gateway to the Public Switched Telephone
+ Network (PSTN) for communication with textphones (e.g., TTYs).
+ Textphones are limited devices with no multiparty awareness, and it
+ SHOULD therefore be appropriate for the gateway to not indicate
+ multiparty awareness for that case. Another solution is that the
+ gateway indicates multiparty capability towards the mixer and
+ includes the multiparty mixer function for multiparty-unaware
+ endpoints itself. This solution makes it possible to adapt to the
+ functional limitations of the textphone.
+
+ More information on gateways to textphones is found in [RFC5194].
+
+6.2. Gateway Considerations with WebRTC
+
+ Gateway operation between RTP-mixer-based multiparty real-time text
+ and WebRTC-based real-time text may also be required. Real-time text
+ transport in WebRTC is specified in [RFC8865].
+
+ A multiparty bridge may have functionality for communicating via
+ real-time text in both (1) RTP streams with real-time text and (2)
+ WebRTC T.140 data channels. Other configurations may consist of a
+ multiparty bridge with either technology for real-time text transport
+ and a separate gateway for conversion of the text communication
+ streams between RTP and T.140 data channels.
+
+ In WebRTC, it is assumed that for a multiparty session, one T.140
+ data channel is established for each source from a gateway or bridge
+ to each participant. Each participant also has a data channel with a
+ two-way connection with the gateway or bridge.
+
+ A T.140 data channel used for two-way communication is for text from
+ the WebRTC user and from the bridge or gateway itself to the WebRTC
+ user. The label parameter of this T.140 data channel is used as the
+ NAME field in RTCP to participants on the RTP side. The other T.140
+ data channels are only for text from other participants to the WebRTC
+ user.
+
+ When a new participant has entered the session with RTP transport of
+ real-time text, a new T.140 data channel SHOULD be established to
+ WebRTC users with the label parameter composed of information from
+ the NAME field in RTCP on the RTP side.
+
+ When a new participant has entered the multiparty session with real-
+ time text transport in a WebRTC T.140 data channel, the new
+ participant SHOULD be announced by a notification to RTP users. The
+ label parameter from the WebRTC side or other suitable information
+ from the session or stream establishment procedure SHOULD be used to
+ compose the NAME RTCP field on the RTP side.
+
+ When a participant on the RTP side is disconnected from the
+ multiparty session, the corresponding T.140 data channel(s) SHOULD be
+ closed.
+
+ When a WebRTC user of T.140 data channels disconnects from the mixer,
+ the corresponding RTP streams or sources in an RTP-mixed stream
+ SHOULD be closed.
+
+ T.140 data channels MAY be opened and closed by negotiation or
+ renegotiation of the session, or by any other valid means, as
+ specified in Section 1 of [RFC8865].
+
+7. Updates to RFC 4103
+
+ This document updates [RFC4103] by introducing an SDP media
+ attribute, "rtt-mixer", for negotiation of multiparty-mixing
+ capability with the format described in [RFC4103] and by specifying
+ the rules for packets when multiparty capability is negotiated and in
+ use.
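
A receiver of an SDP description could check for the attribute
roughly as follows. This is a simplified sketch, not a full SDP
parser: the function name is hypothetical, and it only looks for the
valueless media-level attribute line "a=rtt-mixer" inside an "m=text"
section.

```python
def text_media_supports_rtt_mixer(sdp: str) -> bool:
    """Return True if any m=text section carries a=rtt-mixer."""
    in_text_media = False
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # A new media section starts; track whether it is text.
            in_text_media = line.startswith("m=text ")
        elif in_text_media and line == "a=rtt-mixer":
            return True
    return False
```

An attribute line that appears before the first "m=" line, or inside
a media section of another type, is ignored because the attribute is
registered at the media level.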
+
+8. Congestion Considerations
+
+ The congestion considerations and recommended actions provided in
+ [RFC4103] are also valid in multiparty situations.
+
+ The time values SHALL then be applied per source of text sent to a
+ receiver.
+
+ In the very unlikely event that many participants in a conference
+ send text simultaneously for a long period of time, a delay may build
+ up for the presentation of text at the receivers if the limitation in
+ characters per second ("cps") to be transmitted to the participants
+ is exceeded. A delay of more than 15 seconds can cause confusion in
+ the session. It is therefore RECOMMENDED that an RTP mixer discard
+ such text causing excessive delays and insert a general indication of
+ possible text loss [T140ad1] in the session. If the main text
+ contributor is indicated in any way, the mixer MAY avoid deleting
+ text from that participant. It should, however, be noted that human
+ creation of text normally contains pauses, when the transmission can
+ catch up, so that transmission-overload situations are expected to be
+ very rare.
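
The recommended discard behavior could be sketched as below. The
function name, the queue layout, and the use of U+FFFD (the missing
text mark from Section 4.2.4) as the indication of possible text loss
are assumptions made for illustration; the default threshold is the
delay value mentioned above. The exemption for a main text contributor
is not modeled.

```python
from collections import deque

MISSING_TEXT_MARK = "\ufffd"


def drop_stale_text(queue, now, max_delay_s=15.0):
    """Discard queued text delayed longer than max_delay_s.

    queue is a deque of (arrival_time, text) tuples, oldest first.
    If anything is discarded, a single missing text mark is placed
    at the head of the queue. Returns True if text was discarded.
    """
    dropped = False
    while queue and now - queue[0][0] > max_delay_s:
        queue.popleft()
        dropped = True
    if dropped:
        # Stamp the mark with the current time so that it is not
        # itself discarded on the next call.
        queue.appendleft((now, MISSING_TEXT_MARK))
    return dropped
```

A mixer would call such a function before each transmission
opportunity towards a given receiver.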
+
+9. IANA Considerations
+
+9.1. Registration of the "rtt-mixer" SDP Media Attribute
+
+ IANA has registered the new SDP attribute "rtt-mixer".
+
+ Contact name: IESG
+
+ Contact email: iesg@ietf.org
+
+ Attribute name: rtt-mixer
+
+ Attribute semantics: See RFC 9071, Section 2.3
+
+ Attribute value: none
+
+ Usage level: media
+
+ Purpose: To indicate mixer and endpoint support of multiparty mixing
+ for real-time text transmission, using a common RTP stream for
+ transmission of text from a number of sources mixed with one
+ source at a time and where the source is indicated in a single
+ CSRC-list member.
+
+ Charset Dependent: no
+
+ O/A procedures: See RFC 9071, Section 2.3
+
+ Mux Category: NORMAL
+
+ Reference: RFC 9071
+
+10. Security Considerations
+
+ The RTP-mixer model requires the mixer to be allowed to decrypt,
+ pack, and encrypt secured text from conference participants.
+ Therefore, the mixer needs to be trusted to maintain confidentiality
+ and integrity of the real-time text data. This situation is similar
+ to the situation for handling audio and video media in centralized
+ mixers.
+
+ The requirement to transfer information about the user in RTCP
+ reports in SDES, CNAME, and NAME fields, and in conference
+ notifications, may raise privacy concerns, as already stated in RFC
+ 3550 [RFC3550], and may be restricted for privacy reasons. When used
+ for the creation of readable labels in the presentation, the
+ receiving user will then get a more symbolic label for the source.
+
+ The services available through the real-time text mixer may be of
+ special interest to deaf and hard-of-hearing individuals. Some users
+ may want to refrain from revealing such characteristics broadly in
+ conferences. Conference systems where the mixer is included MAY need
+ to be designed with the confidentiality of such characteristics in
+ mind.
+
+ Participants with malicious intentions may appear and, for example,
+ disrupt the multiparty session by emitting a continuous flow of text.
+ They may also send text that appears to originate from other
+ participants. Countermeasures should include requiring secure
+ signaling, media, and authentication, and providing higher-layer
+ conference functions, e.g., for blocking, muting, and expelling
+ participants.
+
+ Participants with malicious intentions may also try to disrupt the
+ presentation by sending incomplete or malformed control codes.
+ Handling of text from the different sources by the receivers MUST
+ therefore be well separated so that the effects of such actions only
+ affect text from the source causing the action.
+
+ Care should be taken to avoid the possibility of attacks by
+ unauthenticated call participants, and even eavesdropping and
+ manipulation of content by non-participants, if the use of the mixer
+ is permitted for users both with and without security procedures.
+
+ As already stated in Section 3.18, security in media SHOULD be
+ applied by using DTLS-SRTP [RFC5764] at the media level.
+
+ Further security considerations specific to this application are
+ specified in Section 3.18.
+
+11. References
+
+11.1. Normative References
+
+ [ECMA-48] Ecma International, "ECMA-48: Control functions for coded
+ character sets", 5th edition, June 1991,
+ <https://www.ecma-international.org/publications-and-
+ standards/standards/ecma-48/>.
+
+ [ISO6429] ISO/IEC, "Information technology - Control functions for
+ coded character sets", ISO/IEC 6429:1992, December
+ 1992, <https://www.iso.org/obp/ui/#iso:std:iso-
+ iec:6429:ed-3:v1:en>.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119,
+ DOI 10.17487/RFC2119, March 1997,
+ <https://www.rfc-editor.org/info/rfc2119>.
+
+ [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
+ Jacobson, "RTP: A Transport Protocol for Real-Time
+ Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
+ July 2003, <https://www.rfc-editor.org/info/rfc3550>.
+
+ [RFC4102] Jones, P., "Registration of the text/red MIME Sub-Type",
+ RFC 4102, DOI 10.17487/RFC4102, June 2005,
+ <https://www.rfc-editor.org/info/rfc4102>.
+
+ [RFC4103] Hellstrom, G. and P. Jones, "RTP Payload for Text
+ Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005,
+ <https://www.rfc-editor.org/info/rfc4103>.
+
+ [RFC5630] Audet, F., "The Use of the SIPS URI Scheme in the Session
+ Initiation Protocol (SIP)", RFC 5630,
+ DOI 10.17487/RFC5630, October 2009,
+ <https://www.rfc-editor.org/info/rfc5630>.
+
+ [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer
+ Security (DTLS) Extension to Establish Keys for the Secure
+ Real-time Transport Protocol (SRTP)", RFC 5764,
+ DOI 10.17487/RFC5764, May 2010,
+ <https://www.rfc-editor.org/info/rfc5764>.
+
+ [RFC6263] Marjou, X. and A. Sollaud, "Application Mechanism for
+ Keeping Alive the NAT Mappings Associated with RTP / RTP
+ Control Protocol (RTCP) Flows", RFC 6263,
+ DOI 10.17487/RFC6263, June 2011,
+ <https://www.rfc-editor.org/info/rfc6263>.
+
+ [RFC7675] Perumal, M., Wing, D., Ravindranath, R., Reddy, T., and M.
+ Thomson, "Session Traversal Utilities for NAT (STUN) Usage
+ for Consent Freshness", RFC 7675, DOI 10.17487/RFC7675,
+ October 2015, <https://www.rfc-editor.org/info/rfc7675>.
+
+ [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+ 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+ May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
+ [RFC8865] Holmberg, C. and G. Hellström, "T.140 Real-Time Text
+ Conversation over WebRTC Data Channels", RFC 8865,
+ DOI 10.17487/RFC8865, January 2021,
+ <https://www.rfc-editor.org/info/rfc8865>.
+
+ [RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP:
+ Session Description Protocol", RFC 8866,
+ DOI 10.17487/RFC8866, January 2021,
+ <https://www.rfc-editor.org/info/rfc8866>.
+
+ [T140] ITU-T, "Protocol for multimedia application text
+ conversation", ITU-T Recommendation T.140, February 1998,
+ <https://www.itu.int/rec/T-REC-T.140-199802-I/en>.
+
+ [T140ad1] ITU-T, "Recommendation T.140 Addendum", February 2000,
+ <https://www.itu.int/rec/T-REC-T.140-200002-I!Add1/en>.
+
+11.2. Informative References
+
+ [RFC4353] Rosenberg, J., "A Framework for Conferencing with the
+ Session Initiation Protocol (SIP)", RFC 4353,
+ DOI 10.17487/RFC4353, February 2006,
+ <https://www.rfc-editor.org/info/rfc4353>.
+
+ [RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A
+ Session Initiation Protocol (SIP) Event Package for
+ Conference State", RFC 4575, DOI 10.17487/RFC4575, August
+ 2006, <https://www.rfc-editor.org/info/rfc4575>.
+
+ [RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol
+ (SIP) Call Control - Conferencing for User Agents",
+ BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006,
+ <https://www.rfc-editor.org/info/rfc4579>.
+
+ [RFC5194] van Wijk, A., Ed. and G. Gybels, Ed., "Framework for Real-
+ Time Text over IP Using the Session Initiation Protocol
+ (SIP)", RFC 5194, DOI 10.17487/RFC5194, June 2008,
+ <https://www.rfc-editor.org/info/rfc5194>.
+
+ [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
+ DOI 10.17487/RFC7667, November 2015,
+ <https://www.rfc-editor.org/info/rfc7667>.
+
+ [RFC8643] Johnston, A., Aboba, B., Hutton, A., Jesske, R., and T.
+ Stach, "An Opportunistic Approach for Secure Real-time
+ Transport Protocol (OSRTP)", RFC 8643,
+ DOI 10.17487/RFC8643, August 2019,
+ <https://www.rfc-editor.org/info/rfc8643>.
+
+ [RFC8723] Jennings, C., Jones, P., Barnes, R., and A.B. Roach,
+ "Double Encryption Procedures for the Secure Real-Time
+ Transport Protocol (SRTP)", RFC 8723,
+ DOI 10.17487/RFC8723, April 2020,
+ <https://www.rfc-editor.org/info/rfc8723>.
+
+ [RFC8825] Alvestrand, H., "Overview: Real-Time Protocols for
+ Browser-Based Applications", RFC 8825,
+ DOI 10.17487/RFC8825, January 2021,
+ <https://www.rfc-editor.org/info/rfc8825>.
+
+Acknowledgements
+
+ The author wants to thank the following persons for support, reviews,
+ and valuable comments: Bernard Aboba, Amanda Baber, Roman Danyliw,
+ Spencer Dawkins, Martin Duke, Lars Eggert, James Hamlin, Benjamin
+ Kaduk, Murray Kucherawy, Paul Kyzivat, Jonathan Lennox, Lorenzo
+ Miniero, Dan Mongrain, Francesca Palombini, Colin Perkins, Brian
+ Rosen, Rich Salz, Jürgen Schönwälder, Robert Wilton, Dale Worley,
+ Yong Xin, and Peter Yee.
+
+Author's Address
+
+ Gunnar Hellström
+ Gunnar Hellström Accessible Communication
+ SE-13670 Vendelsö
+ Sweden
+
+ Email: gunnar.hellstrom@ghaccess.se