Diffstat (limited to 'doc/rfc/rfc9071.txt')
-rw-r--r-- | doc/rfc/rfc9071.txt | 1923 |
1 files changed, 1923 insertions, 0 deletions
diff --git a/doc/rfc/rfc9071.txt b/doc/rfc/rfc9071.txt new file mode 100644 index 0000000..9166840 --- /dev/null +++ b/doc/rfc/rfc9071.txt @@ -0,0 +1,1923 @@ + + + + +Internet Engineering Task Force (IETF) G. Hellström +Request for Comments: 9071 GHAccess +Updates: 4103 July 2021 +Category: Standards Track +ISSN: 2070-1721 + + + RTP-Mixer Formatting of Multiparty Real-Time Text + +Abstract + + This document provides enhancements of real-time text (as specified + in RFC 4103) suitable for mixing in a centralized conference model, + enabling source identification and rapidly interleaved transmission + of text from different sources. The intended use is for real-time + text mixers and participant endpoints capable of providing an + efficient presentation or other treatment of a multiparty real-time + text session. The specified mechanism builds on the standard use of + the Contributing Source (CSRC) list in the Real-time Transport + Protocol (RTP) packet for source identification. The method makes + use of the same "text/t140" and "text/red" formats as for two-party + sessions. + + Solutions using multiple RTP streams in the same RTP session are + briefly mentioned, as they could have some benefits over the RTP- + mixer model. The RTP-mixer model was selected to be used for the + fully specified solution in this document because it can be applied + to a wide range of existing RTP implementations. + + A capability exchange is specified so that it can be verified that a + mixer and a participant can handle the multiparty-coded real-time + text stream using the RTP-mixer method. The capability is indicated + by the use of a Session Description Protocol (SDP) (RFC 8866) media + attribute, "rtt-mixer". + + This document updates RFC 4103 ("RTP Payload for Text Conversation"). + + A specification for how a mixer can format text for the case when the + endpoint is not multiparty aware is also provided. + +Status of This Memo + + This is an Internet Standards Track document. 
+ + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Further information on + Internet Standards is available in Section 2 of RFC 7841. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + https://www.rfc-editor.org/info/rfc9071. + +Copyright Notice + + Copyright (c) 2021 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + +Table of Contents + + 1. Introduction + 1.1. Terminology + 1.2. Main Method, Fallback Method, and Considered Alternatives + 1.3. Intended Application + 2. Overview of the Two Specified Solutions and Selection of Method + 2.1. The RTP-Mixer-Based Solution for Multiparty-Aware Endpoints + 2.2. Mixing for Multiparty-Unaware Endpoints + 2.3. Offer/Answer Considerations + 2.4. Actions Depending on Capability Negotiation Result + 3. Details for the RTP-Mixer-Based Mixing Method for + Multiparty-Aware Endpoints + 3.1. Use of Fields in the RTP Packets + 3.2. Initial Transmission of a BOM Character + 3.3. Keep-Alive + 3.4. Transmission Interval + 3.5. Only One Source per Packet + 3.6. Do Not Send Received Text to the Originating Source + 3.7. Clean Incoming Text + 3.8. 
Principles of Redundant Transmission + 3.9. Text Placement in Packets + 3.10. Empty T140blocks + 3.11. Creation of the Redundancy + 3.12. Timer Offset Fields + 3.13. Other RTP Header Fields + 3.14. Pause in Transmission + 3.15. RTCP Considerations + 3.16. Reception of Multiparty Contents + 3.17. Performance Considerations + 3.18. Security for Session Control and Media + 3.19. SDP Offer/Answer Examples + 3.20. Packet Sequence Example from Interleaved Transmission + 3.21. Maximum Character Rate "cps" Setting + 4. Presentation-Level Considerations + 4.1. Presentation by Multiparty-Aware Endpoints + 4.2. Multiparty Mixing for Multiparty-Unaware Endpoints + 5. Relationship to Conference Control + 5.1. Use with SIP Centralized Conferencing Framework + 5.2. Conference Control + 6. Gateway Considerations + 6.1. Gateway Considerations with Textphones + 6.2. Gateway Considerations with WebRTC + 7. Updates to RFC 4103 + 8. Congestion Considerations + 9. IANA Considerations + 9.1. Registration of the "rtt-mixer" SDP Media Attribute + 10. Security Considerations + 11. References + 11.1. Normative References + 11.2. Informative References + Acknowledgements + Author's Address + +1. Introduction + + "RTP Payload for Text Conversation" [RFC4103] specifies the use of + the Real-time Transport Protocol (RTP) [RFC3550] for transmission of + real-time text (often called RTT) and the "text/t140" format. It + also specifies a redundancy format, "text/red", for increased + robustness. The "text/red" format is registered in [RFC4102]. + + Real-time text is usually provided together with audio and sometimes + with video in conversational sessions. 
+ + A requirement related to multiparty sessions from the presentation- + level standard T.140 [T140] for real-time text is as follows: + + | The display of text from the members of the conversation should be + | arranged so that the text from each participant is clearly + | readable, and its source and the relative timing of entered text + | is visualized in the display. + + Another requirement is that the mixing procedure must not introduce + delays in the text streams that could be perceived as disruptive to + the real-time experience of the receiving users. + + The use of real-time text is increasing, and specifically, use in + emergency calls is increasing. Emergency call use requires + multiparty mixing, because it is common that one agent needs to + transfer the call to another specialized agent but is obliged to stay + on the call to at least verify that the transfer was successful. + Mixer implementations for RFC 4103 ("RTP Payload for Text + Conversation") can use traditional RTP functions (RFC 3550) for + mixing and source identification, but the performance of the mixer + when giving turns for the different sources to transmit is limited + when using the default transmission characteristics with redundancy. + + The redundancy scheme described in [RFC4103] enables efficient + transmission of earlier transmitted redundant text in packets + together with new text. However, the redundancy header format has no + source indicators for the redundant transmissions. The redundant + parts in a packet must therefore be from the same source as the new + text. The recommended transmission is one new and two redundant + generations of text (T140blocks) in each packet, and the recommended + transmission interval for two-party use is 300 ms. 
+ + Real-time text mixers for multiparty sessions need to include the + source with each transmitted group of text from a conference + participant so that the text can be transmitted interleaved with text + groups from different sources at the rate at which they are created. + This enables the text groups to be presented by endpoints in a + suitable grouping with other text from the same source. + + The presentation can then be arranged so that text from different + sources can be presented in real time and easily read. At the same + time, it is possible for a reading user to perceive approximately + when the text was created in real time by the different parties. The + transmission and mixing are intended to be done in a general way, so + that presentation can be arranged in a layout decided upon by the + receiving endpoint. + + Existing implementations of RFC 4103 in endpoints that do not + implement the updates specified in this document cannot be expected + to properly present real-time text mixed for multiparty-aware + endpoints. + + A negotiation mechanism is therefore needed to verify if the parties + (1) are able to handle a common method for multiparty transmissions + and (2) can agree on using that method. + + A fallback mixing procedure is also needed for cases when the + negotiation result indicates that a receiving endpoint is not capable + of handling the mixed format. Multiparty-unaware endpoints would + possibly otherwise present all received multiparty mixed text as if + it came from the same source regardless of any accompanying source + indication coded in fields in the packet. Or, they may have other + undesirable ways of acting on the multiparty content. The fallback + method is called the mixing procedure for multiparty-unaware + endpoints. The fallback method is naturally not expected to meet all + performance requirements placed on the mixing procedure for + multiparty-aware endpoints. 
+ + This document updates [RFC4103] by introducing an attribute for + declaring support of the RTP-mixer-based multiparty-mixing case and + rules for source indications and interleaving of text from different + sources. + +1.1. Terminology + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and + "OPTIONAL" in this document are to be interpreted as described in + BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all + capitals, as shown here. + + The terms "Source Description" (SDES), "Canonical Name" (CNAME), + "Name" (NAME), "Synchronization Source" (SSRC), "Contributing Source" + (CSRC), "CSRC list", "CSRC count" (CC), "RTP Control Protocol" + (RTCP), and "RTP mixer" are defined in [RFC3550]. + + "real-time text" (RTT) is text transmitted instantly as it is typed + or created. Recipients can immediately read the message while it is + being written, without waiting. + + The term "T140block" is defined in [RFC4103] to contain one or more + T.140 code elements. + + "TTY" stands for a textphone type used in North America. + + Web Real-Time Communication (WebRTC) is specified by the World Wide + Web Consortium (W3C) and the IETF. See [RFC8825]. + + "DTLS-SRTP" is a Datagram Transport Layer Security (DTLS) extension + for use with the Secure Real-time Transport Protocol / Secure Real- + time Transport Control Protocol (SRTP/SRTCP) as specified in + [RFC5764]. + + The term "multiparty aware" describes an endpoint that (1) receives + real-time text from multiple sources through a common conference + mixer, (2) is able to present the text in real time, separated by + source, and (3) presents the text so that a user can get an + impression of the approximate relative timing of text from different + parties. 
+ + The term "multiparty unaware" describes an endpoint that cannot + itself separate text from different sources when the text is received + through a common conference mixer. + +1.2. Main Method, Fallback Method, and Considered Alternatives + + A number of alternatives were considered when searching for an + efficient and easily implemented multiparty method for real-time + text. This section briefly explains a few of them. + + Multiple RTP streams, one per participant: + One RTP stream per source would be sent in the same RTP session + with the "text/red" format. From some points of view, the use of + multiple RTP streams, one for each source, sent in the same RTP + session would be efficient and would use exactly the same packet + format as [RFC4103] and the same payload type. A couple of + relevant scenarios using multiple RTP streams are specified in + "RTP Topologies" [RFC7667]. One possibility of special interest + is the Selective Forwarding Middlebox (SFM) topology specified in + Section 3.7 of [RFC7667], which could enable end-to-end + encryption. In contrast to audio and video, real-time text is + only transmitted when the users actually transmit information. + Thus, an SFM solution would not need to exclude any party from + transmission under normal conditions. In order to allow the mixer + to convey the packets with the payload preserved and encrypted, an + SFM solution would need to act on some specific characteristics of + the "text/red" format. The redundancy headers are part of the + payload, so the receiver would need to just assume that the + payload type number in the redundancy header is for "text/t140". + The characters per second ("cps") parameter would need to act per + stream. The relationship between the SSRC and the source would + need to be conveyed in some specified way, e.g., in the CSRC. + Recovery and loss detection would preferably be based on RTP + sequence number gap detection. 
Thus, sequence number gaps in the + incoming stream to the mixer would need to be reflected in the + stream to the participant, with no new gaps created by the mixer. + However, the RTP implementation in both mixers and endpoints needs + to support multiple streams in the same RTP session in order to + use this mechanism. To provide the best opportunities for + deployment, it should be possible to upgrade existing endpoint + solutions to be multiparty aware with a reasonable amount of + effort. There is currently a lack of support for multi-stream RTP + in certain implementations. This fact led to only brief mention + of this solution in this document as an option for further study. + + RTP-mixer-based method for multiparty-aware endpoints: + The "text/red" format as defined in RFC 4102 and applied in RFC + 4103 is sent with the RTP-mixer method indicating the source in + the CSRC field. The "text/red" format with a "text/t140" payload + in a single RTP stream can be sent when text is available from the + call participants instead of at the regular 300 ms intervals. + Transmission of packets with text from different sources can then + be done smoothly while simultaneous transmission occurs as long as + it is not limited by the maximum character rate "cps" value. With + ten participants sending text simultaneously, the switching and + transmission performance is good. With more simultaneously + sending participants and with receivers at default capacity, there + will be a noticeable jerkiness and delay in text presentation. + The more participants who send text simultaneously, the more + jerkiness will occur. Two seconds of jerkiness will be noticeable + and slightly unpleasant, but it corresponds in time to what typing + humans often cause by hesitating or changing position while + typing. A benefit of this method is that no new packet format + needs to be introduced and implemented. 
Since simultaneous typing + by more than two parties is expected to be very rare -- as + described in Section 1.3 -- this method can be used successfully + with good performance. Recovery of text in the case of packet + loss is based on analysis of timestamps of received redundancy + versus earlier received text. Negotiation is based on a new SDP + media attribute, "rtt-mixer". This method was selected to be the + main method specified in this document. + + Multiple sources per packet: + A new "text" media subtype would be specified with up to 15 + sources in each packet. The mechanism would make use of the RTP- + mixer model specified in RTP [RFC3550]. The sources would be + indicated in strict order in the CSRC list of the RTP packets. + The CSRC list can have up to 15 members. Therefore, text from up + to 15 sources can be included in each packet. Packets are + normally sent at 300 ms intervals. The mean delay would be 150 + ms. A new redundancy packet format would be specified. This + method would result in good performance but would require + standardization and implementation of new releases in the target + technologies; these would take more time than desirable to + complete. It was therefore not selected to be included in this + document. + + Mixing for multiparty-unaware endpoints: + The presentation of text from multiple parties is prepared by the + mixer in one single stream. It is desirable to have a method that + does not require any modifications in existing user devices + implementing RFC 4103 for real-time text without explicit support + of multiparty sessions. This is made possible by having the mixer + insert a new line and a text-formatted source label before each + switch of text source in the stream. Switching the source can + only be done in places in the text where it does not disturb the + perception of the contents. Text from only one source at a time + can be presented in real time. The delay will therefore vary. 
In + calls where parties take turns properly by ending their entries + with a new line, the limitations will have limited influence on + the user experience. When only two parties send text, these two + will see the text in real time with no delay. Although this + method also has other limitations, it is included in this document + as a fallback method. + + Real-time text transport in WebRTC: + [RFC8865] specifies how the WebRTC data channel can be used to + transport real-time text. That specification contains a section + briefly describing its use in multiparty sessions. The focus of + this document is RTP transport. Therefore, even if the WebRTC + transport provides good multiparty performance, it is only + mentioned in this document in relation to providing gateways with + multiparty capabilities between RTP and WebRTC technologies. + +1.3. Intended Application + + The method for multiparty real-time text specified in this document + is primarily intended for use in transmissions between mixers and + endpoints in centralized mixing configurations. It is also + applicable between mixers. An often-mentioned application is for + emergency service calls with real-time text and voice, where a call + taker wants to make an attended handover of a call to another agent + and stay on the call to observe the session. Multimedia conference + sessions with support for participants to contribute with text is + another example. Conferences with central support for speech-to-text + conversion represent yet another example. + + In all these applications, normally only one participant at a time + will send long text comments. In some cases, one other participant + will occasionally contribute with a longer comment simultaneously. + That may also happen in some rare cases when text is translated to + text in another language in a conference. Apart from these cases, + other participants are only expected to contribute with very brief + comments while others are sending text. 
+ + Users expect the text they send to be presented in real time in a + readable way to the other participants even if they send + simultaneously with other users and even when they make brief edit + operations of their text by backspacing and correcting their text. + + Text is supposed to be human generated, by some means of text input, + such as typing on a keyboard or using speech-to-text technology. + Occasional small cut-and-paste operations may appear even if that is + not the initial purpose of real-time text. + + The real-time characteristics of real-time text are essential for the + participants to be able to contribute to a conversation. If the text + is delayed too much between the typing of a character and its + presentation, then, in some conference situations, the opportunity to + comment will be gone and someone else will grab the turn. A delay of + more than one second in such situations is an obstacle to good + conversation. + +2. Overview of the Two Specified Solutions and Selection of Method + + This section contains a brief introduction of the two methods + specified in this document. + +2.1. The RTP-Mixer-Based Solution for Multiparty-Aware Endpoints + + This method specifies the negotiated use of the formats described in + RFC 4103, for multiparty transmissions in a single RTP stream. The + main purpose of this document is to specify a method for true + multiparty real-time text mixing for multiparty-aware endpoints that + can be widely deployed. The RTP-mixer-based method makes use of the + current format for real-time text as provided in [RFC4103]. This + method updates RFC 4103 by clarifying one way to use it in the + multiparty situation. That is done by completing a negotiation for + this kind of multiparty capability and by interleaving packets from + different sources. The source is indicated in the CSRC element in + the RTP packets. Specific considerations are made regarding the + ability to recover text after packet loss. 
+ + The detailed procedures for the RTP-mixer-based multiparty-aware case + are specified in Section 3. + + Please refer to [RFC4103] when reading this document. + +2.2. Mixing for Multiparty-Unaware Endpoints + + This document also specifies a method to be used in cases when the + endpoint participating in a multiparty call does not itself implement + any solution or does not implement the same solution as the mixer. + This method requires the mixer to insert text dividers and readable + labels and only send text from one source at a time until a suitable + point appears for changing the source. This solution is a fallback + method with functional limitations. It operates at the presentation + level. + + A mixer SHOULD by default format and transmit text to a call + participant so that the text is suitable for presentation on a + multiparty-unaware endpoint that has not negotiated any method for + true multiparty real-time text handling but has negotiated a "text/ + red" or "text/t140" format in a session. This SHOULD be done if + nothing else is specified for the application, in order to maintain + interoperability. Section 4.2 specifies how this mixing is done. + +2.3. Offer/Answer Considerations + + "RTP Payload for Text Conversation" [RFC4103] specifies the use of + RTP [RFC3550] and a redundancy format ("text/red", as defined in + [RFC4102]) for increased robustness of real-time text transmission. + This document updates [RFC4103] by introducing a capability + negotiation for handling multiparty real-time text, a way to indicate + the source of transmitted text, and rules for efficient timing of the + transmissions interleaved from different sources. + + The capability negotiation for the RTP-mixer-based multiparty method + is based on the use of the SDP media attribute "rtt-mixer". 
+ + The syntax is as follows: + + a=rtt-mixer + + If in the future any other method for RTP-based multiparty real-time + text is specified by additional work, it is assumed that it will be + recognized by some specific SDP feature exchange. + +2.3.1. Initial Offer + + A party that intends to set up a session and is willing to use the + RTP-mixer-based method provided in this specification for sending, + receiving, or both sending and receiving real-time text SHALL include + the "rtt-mixer" SDP attribute in the corresponding "text" media + section in the initial offer. + + The party MAY indicate its capability regarding both the RTP-mixer- + based method provided in this specification and other methods. + + When the offerer has sent the offer, which includes the "rtt-mixer" + attribute, it MUST be prepared to receive and handle real-time text + formatted according to both the method for multiparty-aware parties + specified in Section 3 and two-party formatted real-time text. + +2.3.2. Answering the Offer + + A party that receives an offer containing the "rtt-mixer" SDP + attribute and is willing to use the RTP-mixer-based method provided + in this specification for sending, receiving, or both sending and + receiving real-time text SHALL include the "rtt-mixer" SDP attribute + in the corresponding "text" media section in the answer. + + If the offer did not contain the "rtt-mixer" attribute, the answer + MUST NOT contain the "rtt-mixer" attribute. + + Even when the "rtt-mixer" attribute is successfully negotiated, the + parties MAY send and receive two-party coded real-time text. + + An answer MUST NOT include acceptance of more than one method for + multiparty real-time text in the same RTP session. + + When the answer, which includes acceptance, is transmitted, the + answerer MUST be prepared to act on received text in the negotiated + session according to the method for multiparty-aware parties + specified in Section 3. 
Reception of text for a two-party session + SHALL also be supported. + +2.3.3. Offerer Processing the Answer + + When the answer is processed by the offerer, the offerer MUST follow + the requirements listed in Section 2.4. + +2.3.4. Modifying a Session + + A session MAY be modified at any time by any party offering a + modified SDP with or without the "rtt-mixer" SDP attribute expressing + a desired change in the support of multiparty real-time text. + + If the modified offer adds the indication of support for multiparty + real-time text by including the "rtt-mixer" SDP attribute, the + procedures specified in the previous subsections SHALL be applied. + + If the modified offer deletes the indication of support for + multiparty real-time text by excluding the "rtt-mixer" SDP attribute, + the answer MUST NOT contain the "rtt-mixer" attribute. After + processing this SDP exchange, the parties MUST NOT send real-time + text formatted for multiparty-aware parties according to this + specification. + +2.4. Actions Depending on Capability Negotiation Result + + A transmitting party SHALL send text according to the RTP-mixer-based + multiparty method only when the negotiation for that method was + successful and when it conveys text for another source. In all other + cases, the packets SHALL be populated and interpreted as for a two- + party session. + + A party that has negotiated the "rtt-mixer" SDP media attribute and + acts as an RTP mixer sending multiparty text MUST (1) populate the + CSRC list and (2) format the packets according to Section 3. + + A party that has negotiated the "rtt-mixer" SDP media attribute MUST + interpret the contents of the CC field, the CSRC list, and the + packets according to Section 3 in received RTP packets in the + corresponding RTP stream. 
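The offer/answer rules above can be illustrated with a minimal sketch. It assumes the SDP is available as plain text and that only the first "text" media section matters; the function names are hypothetical and are not defined by this document or by any SDP library.

```python
def text_media_attributes(sdp: str) -> list[str]:
    """Return the attribute lines (without the 'a=' prefix) of the first m=text section."""
    attrs = []
    in_text = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            if in_text:
                break  # the text media section has ended
            in_text = line.startswith("m=text ")
            continue
        if in_text and line.startswith("a="):
            attrs.append(line[2:])
    return attrs


def declares_rtt_mixer(sdp: str) -> bool:
    """True if the text media section carries the 'rtt-mixer' attribute."""
    return "rtt-mixer" in text_media_attributes(sdp)


def answer_may_include_rtt_mixer(offer: str, willing: bool) -> bool:
    """An answer MUST NOT contain 'rtt-mixer' unless the offer contained it."""
    return willing and declares_rtt_mixer(offer)


def multiparty_negotiated(offer: str, answer: str) -> bool:
    """Negotiation succeeded only if both offer and answer declare 'rtt-mixer'."""
    return declares_rtt_mixer(offer) and declares_rtt_mixer(answer)
```

When `multiparty_negotiated` returns False, a mixer would fall back to the procedure for multiparty-unaware endpoints, and a non-mixer would transmit as in a two-party session.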
+ + A party that has not successfully completed the negotiation of the + "rtt-mixer" SDP media attribute MUST NOT transmit packets interleaved + from different sources in the same RTP stream, as specified in + Section 3. If the party is a mixer and did declare the "rtt-mixer" + SDP media attribute, it SHOULD perform the procedure for multiparty- + unaware endpoints. If the party is not a mixer, it SHOULD transmit + as in a two-party session according to [RFC4103]. + +3. Details for the RTP-Mixer-Based Mixing Method for Multiparty-Aware + Endpoints + +3.1. Use of Fields in the RTP Packets + + The CC field SHALL show the number of members in the CSRC list, which + SHALL be one (1) in transmissions from a mixer when conveying text + from other sources in a multiparty session, and otherwise 0. + + When text is conveyed by a mixer during a multiparty session, a CSRC + list SHALL be included in the packet. The single member in the CSRC + list SHALL contain the SSRC of the source of the T140blocks in the + packet. + + When redundancy is used, the RECOMMENDED level of redundancy is to + use one primary and two redundant generations of T140blocks. In some + cases, a primary or redundant T140block is empty but is still + represented by a member in the redundancy header. + + In other respects, the contents of the RTP packets will be as + specified in [RFC4103]. + +3.2. Initial Transmission of a BOM Character + + As soon as a participant is known to participate in a session with + another entity and is available for text reception, a Unicode byte + order mark (BOM) character SHALL be sent to it by the other entity + according to the procedures in this section. This is useful in many + configurations for opening ports and firewalls and for setting up the + connection between the application and the network. If the + transmitter is a mixer, then the source of this character SHALL be + indicated to be the mixer itself. 
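The field usage in Section 3.1, together with the mixer-originated BOM described above, can be sketched as follows. The fixed RTP header layout is taken from RFC 3550. The helper names are hypothetical, and treating the mixer's own BOM as a packet with CC = 0 is an interpretation of the "otherwise 0" wording in Section 3.1, not something this document states explicitly.

```python
import struct

RTP_VERSION = 2
BOM_UTF8 = "\ufeff".encode("utf-8")  # the three bytes EF BB BF


def build_rtp_header(payload_type, seq, timestamp, ssrc, csrc_list=(), marker=False):
    """Pack an RTP header (no padding, no extension) with an optional CSRC list."""
    if len(csrc_list) > 15:
        raise ValueError("the CSRC list holds at most 15 members")
    byte0 = (RTP_VERSION << 6) | len(csrc_list)          # V=2, P=0, X=0, CC
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)   # M, PT
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    for csrc in csrc_list:
        header += struct.pack("!I", csrc)
    return header


def header_for_conveyed_text(pt, seq, ts, mixer_ssrc, source_ssrc):
    """Mixer conveying text from another source: CC = 1, CSRC = SSRC of that source."""
    return build_rtp_header(pt, seq, ts, mixer_ssrc, csrc_list=(source_ssrc,))


def header_for_mixer_text(pt, seq, ts, mixer_ssrc):
    """Mixer as the source itself (assumed here for the initial BOM): CC = 0."""
    return build_rtp_header(pt, seq, ts, mixer_ssrc)


def parse_cc_and_csrc(header):
    """Return (CC, list of CSRC values) from a packed RTP header."""
    cc = header[0] & 0x0F
    csrcs = [struct.unpack("!I", header[12 + 4 * i:16 + 4 * i])[0] for i in range(cc)]
    return cc, csrcs
```

A receiver that has negotiated "rtt-mixer" would use the single CSRC value, when present, to attribute the T140blocks in the packet to their source.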
+ + Note that the BOM character SHALL be transmitted with the same + redundancy procedures as any other text. + +3.3. Keep-Alive + + After that, the transmitter SHALL send keep-alive traffic to the + receiver(s) at regular intervals when no other traffic has occurred + during that interval, if that is decided upon for the actual + connection. It is RECOMMENDED to use the keep-alive solution + provided in [RFC6263]. The consent check [RFC7675] is a possible + alternative if it is used anyway for other reasons. + +3.4. Transmission Interval + + A "text/red" or "text/t140" transmitter in a mixer SHALL send packets + distributed over time as long as there is something (new or redundant + T140blocks) to transmit. The maximum transmission interval between + text transmissions from the same source SHALL then be 330 ms, when no + other limitations cause a longer interval to be temporarily used. It + is RECOMMENDED to send the next packet to a receiver as soon as new + text to that receiver is available, as long as the mean character + rate of new text to the receiver calculated over the last 10 one- + second intervals does not exceed the "cps" value of the receiver. + The intention is to keep the latency low and network load limited + while keeping good protection against text loss in bursty packet loss + conditions. The main purpose of the 330 ms interval is for the + timing of redundant transmissions, when no new text from the same + source is available. + + The value of 330 ms is used, because many sources of text will + transmit new text at 300 ms intervals during periods of continuous + user typing, and then reception in the mixer of such new text will + cause a combined transmission of the new text and the unsent + redundancy from the previous transmission. Only when the user stops + typing will the 330 ms interval be applied to send the redundancy. 
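The rate check described in this section can be sketched with a sliding window. This is a minimal illustration only: the class name is hypothetical, a continuous 10-second window is used as an approximation of "the last 10 one-second intervals", and the value 30 used in the usage note is the default "cps" value from RFC 4103.

```python
from collections import deque

REDUNDANCY_INTERVAL_S = 0.330  # maximum interval between transmissions from one source


class CpsWindow:
    """Tracks new-text characters sent to one receiver over the last `window_s` seconds."""

    def __init__(self, cps, window_s=10):
        self.cps = cps
        self.window_s = window_s
        self._sent = deque()  # (send time in seconds, number of characters)

    def record(self, now, n_chars):
        """Note that n_chars of new text were sent at time `now`."""
        self._sent.append((now, n_chars))

    def mean_rate(self, now):
        """Mean characters per second over the window ending at `now`."""
        while self._sent and self._sent[0][0] <= now - self.window_s:
            self._sent.popleft()
        return sum(n for _, n in self._sent) / self.window_s

    def may_send_immediately(self, now, n_new):
        """True if sending n_new characters now keeps the mean rate within 'cps'."""
        return self.mean_rate(now) + n_new / self.window_s <= self.cps


def redundancy_deadline(last_tx):
    """Latest time to send pending redundancy when no new text arrives."""
    return last_tx + REDUNDANCY_INTERVAL_S
```

For example, with `CpsWindow(30)`, new text is forwarded as soon as it arrives until 300 characters have been sent within the window; after that, `may_send_immediately` returns False until older entries age out.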
+ + If the characters per second ("cps") value is reached, a longer + transmission interval SHALL be applied for text from all sources as + specified in [RFC4103] and only as much of the text queued for + transmission SHALL be sent at the end of each transmission interval + as can be allowed without exceeding the "cps" value. Division of + text for partial transmission MUST then be made at T140block borders. + When the transmission rate falls below the "cps" value again, the + transmission intervals SHALL be reset to 330 ms and transmission of + new text SHALL again be made as soon as new text is available. + + | NOTE: Extending the transmission intervals during periods of + | high load does not change the number of characters to be + | conveyed. It just evens out the load over time and reduces the + | number of packets per second. With human-created + | conversational text, the sending user will eventually take a + | pause, letting transmission catch up. + + See also Section 8. + + For a transmitter not acting as a mixer, the transmission interval + principles provided in [RFC4103] apply, and the normal transmission + interval SHALL be 300 ms. + +3.5. Only One Source per Packet + + New text and redundant copies of earlier text from one source SHALL + be transmitted in the same packet if available for transmission at + the same time. Text from different sources MUST NOT be transmitted + in the same packet. + +3.6. Do Not Send Received Text to the Originating Source + + Text received by a mixer from a participant SHOULD NOT be included in + transmissions from the mixer to that participant, because for text + that is produced locally, the normal behavior of the endpoint is to + present such text directly when it is produced. + +3.7. 
Clean Incoming Text + + A mixer SHALL handle reception, recovery from packet loss, deletion + of superfluous redundancy, marking of possible text loss, and + deletion of BOM characters from each participant before queueing + received text for transmission to receiving participants as specified + in [RFC4103] for single-party sources and Section 3.16 for multiparty + sources (chained mixers). + +3.8. Principles of Redundant Transmission + + A transmitting party using redundancy SHALL send redundant + repetitions of T140blocks already transmitted in earlier packets. + + The number of redundant generations of T140blocks to include in + transmitted packets SHALL be deduced from the SDP negotiation. It + SHALL be set to the minimum of the number declared by the two parties + negotiating a connection. It is RECOMMENDED to declare and transmit + one original and two redundant generations of the T140blocks, because + this provides good protection against text loss in the case of packet + loss and also provides low overhead. + +3.9. Text Placement in Packets + + The mixer SHALL compose and transmit an RTP packet to a receiver when + one or more of the following conditions have occurred: + + * The transmission interval is the normal 330 ms (no matter whether + the transmission interval has passed or not), and there is newly + received unsent text available for transmission to that receiver. + + * The current transmission interval has passed and is longer than + the normal 330 ms, and there is newly received unsent text + available for transmission to that receiver. + + * The current transmission interval (normally 330 ms) has passed + since already-transmitted text was queued for transmission as + redundant text. + + The principles provided in [RFC4103] apply for populating the header, + the redundancy header, and the data in the packet with specific + information, as detailed here and in the following sections. 
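The three trigger conditions listed in Section 3.9 can be read as a single decision function. The sketch below is only an illustration under stated assumptions: the function name, its parameters, and the use of millisecond integers are hypothetical, and per-source queueing and "cps" limiting are left out.

```python
from typing import Optional

NORMAL_INTERVAL_MS = 330  # normal mixer transmission interval from Section 3.4


def should_send(now_ms: int,
                interval_ms: int,
                last_tx_ms: int,
                has_new_text: bool,
                redundancy_queued_ms: Optional[int]) -> bool:
    """Return True when the mixer should compose a packet for a receiver.

    interval_ms          -- current transmission interval (330 or longer)
    last_tx_ms           -- time of the previous transmission to the receiver
    redundancy_queued_ms -- time when already-sent text was queued as
                            redundancy, or None if nothing is pending
    """
    # Condition 1: normal interval and new text: send without waiting.
    if has_new_text and interval_ms == NORMAL_INTERVAL_MS:
        return True
    # Condition 2: extended interval: new text waits until it has passed.
    if (has_new_text and interval_ms > NORMAL_INTERVAL_MS
            and now_ms - last_tx_ms >= interval_ms):
        return True
    # Condition 3: pending redundancy is sent once the interval has passed.
    if (redundancy_queued_ms is not None
            and now_ms - redundancy_queued_ms >= interval_ms):
        return True
    return False
```

For example, with the normal interval, new text arriving 100 ms after the previous packet is sent at once, while redundancy alone waits the full 330 ms.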
+ + At the time of transmission, the mixer SHALL populate the RTP packet + with all T140blocks queued for transmission originating from the + source selected for transmission as long as this is not in conflict + with the allowed number of characters per second ("cps") or the + maximum packet size. In this way, the latency of the latest received + text is kept low even in moments of simultaneous transmission from + many sources. + + Redundant text SHALL also be included, and the assessment of how much + new text can be included within the maximum packet size MUST take + into account that the redundancy has priority to be transmitted in + its entirety. See Section 3.4. + + The SSRC of the source SHALL be placed as the only member in the CSRC + list. + + | Note: The CSRC list in an RTP packet only includes the + | participant whose text is included in text blocks. It is not + | the same as the total list of participants in a conference. + | With audio and video media, the CSRC list would often contain + | all participants who are not muted, whereas text participants + | that don't type are completely silent and thus are not + | represented in RTP packet CSRC lists. + +3.10. Empty T140blocks + + If no unsent T140blocks were available for a source at the time of + populating a packet but already-transmitted T140blocks are available + that have not yet been sent the full intended number of redundant + transmissions, then the primary area in the packet is composed of an + empty T140block and included (without taking up any length) in the + packet for transmission. The corresponding SSRC SHALL be placed as + usual in its place in the CSRC list. + + The first packet in the session, the first after a source switch, and + the first after a pause SHALL be populated with the available + T140blocks for the source selected to be sent as the primary, and + empty T140blocks for the agreed-upon number of redundancy + generations. + +3.11. 
Creation of the Redundancy + + The primary T140block from a source in the latest transmitted packet + is saved for populating the first redundant T140block for that source + in the next transmission of text from that source. The first + redundant T140block for that source from the latest transmission is + saved for populating the second redundant T140block in the next + transmission of text from that source. + + Usually, this is the level of redundancy used. If a higher level of + redundancy is negotiated, then the procedure SHALL be continued until + all available redundant levels of T140blocks are placed in the + packet. If a receiver has negotiated a lower number of "text/red" + generations, then that level SHALL be the maximum used by the + transmitter. + + The T140blocks saved for transmission as redundant data are assigned + a planned transmission time of 330 ms after the current time but + SHOULD be transmitted earlier if new text for the same source gets + selected for transmission before that time. + +3.12. Timer Offset Fields + + The timestamp offset values SHALL be inserted in the redundancy + header, with the time offset from the RTP timestamp in the packet + when the corresponding T140block was sent as the primary. + + The timestamp offsets are expressed in the same clock tick units as + the RTP timestamp. + + The timestamp offset values for empty T140blocks have no relevance + but SHOULD be assigned realistic values. + +3.13. Other RTP Header Fields + + The number of members in the CSRC list (0 or 1) SHALL be placed in + the CC header field. Only mixers place value 1 in the CC field. A + value of 0 indicates that the source is the transmitting device + itself and that the source is indicated by the SSRC field. This + value is used by endpoints and also by mixers sending self-sourced + data. + + The current time SHALL be inserted in the timestamp. 
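The shifting of generations in Section 3.11 and the timestamp offsets in Section 3.12 can be sketched as follows. This is an illustrative model, not a packetizer: `SourceState` and `build_packet` are hypothetical names, a 1000 Hz RTP clock is assumed (as in "t140/1000"), an empty string stands for an empty T140block, and the padding with empty generations for the first packet from a source (Section 3.10) is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class SourceState:
    ssrc: int                  # SSRC of the original source (placed in CSRC)
    generations: int = 2       # negotiated number of redundant generations
    history: list = field(default_factory=list)  # (text, sent_ms), newest first

    def build_packet(self, now_ms: int, new_text: str = "") -> dict:
        """Compose a symbolic packet and shift the saved generations."""
        # Each offset is the distance from this packet's timestamp back to
        # the time when the block was originally sent as primary.
        redundant = [(text, now_ms - sent_ms) for text, sent_ms in self.history]
        packet = {
            "timestamp": now_ms,
            "cc": 1,                                  # mixer: one CSRC member
            "csrc": [self.ssrc],
            "redundant": list(reversed(redundant)),   # oldest generation first
            "primary": new_text,
        }
        # The primary becomes the first redundant generation next time,
        # and the oldest generation falls out.
        self.history.insert(0, (new_text, now_ms))
        del self.history[self.generations:]
        return packet
```

Seeded with A2 (sent at 20100) and A1 (sent at 19800), successive calls at 20400 with "A3", then at 20730 and 21060 with no new text, yield the same generations and offsets as the packets for source A in the sequence example of Section 3.20.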
+ + The SSRC header field SHALL contain the SSRC of the RTP session where + the packet will be transmitted. + + The M-bit SHALL be handled as specified in [RFC4103]. + +3.14. Pause in Transmission + + When there is no new T140block to transmit and no redundant T140block + that has not been retransmitted the intended number of times from any + source, the transmission process SHALL be stopped until either new + T140blocks arrive or a keep-alive method calls for transmission of + keep-alive packets. + +3.15. RTCP Considerations + + A mixer SHALL send RTCP reports with SDES, CNAME, and NAME + information about the sources in the multiparty call. This makes it + possible for participants to compose a suitable label for text from + each source. + + Privacy considerations SHALL be taken when composing these fields. + They contain name and address information that may be considered + sensitive if the information is transmitted in its entirety, e.g., to + unauthenticated participants. + +3.16. Reception of Multiparty Contents + + The "text/red" receiver included in an endpoint with presentation + functions will receive RTP packets in the single stream from the + mixer and SHALL distribute the T140blocks for presentation in + presentation areas for each source. Other receiver roles, such as + gateways or chained mixers, are also feasible. Whether the stream + will only be forwarded or will be distributed based on the different + sources must be taken into consideration. + +3.16.1. Acting on the Source of the Packet Contents + + If the CC field value of a received packet is 1, it indicates that + the text is conveyed from a source indicated in the single member in + the CSRC list, and the receiver MUST act on the source according to + its role. If the CC value is 0, the source is indicated in the SSRC + field. + +3.16.2. 
Detection and Indication of Possible Text Loss + + The receiver SHALL monitor the RTP sequence numbers of the received + packets for gaps and for packets received out of order. If a + sequence number gap appears and still exists after some defined short + time for jitter and reordering resolution, the packets in the gap + SHALL be regarded as lost. + + If it is known that only one source is active in the RTP session, + then it is likely that a gap equal to or larger than the agreed-upon + number of redundancy generations (including the primary) causes text + loss. In that case, the receiver SHALL create a T140block with a + marker for possible text loss [T140ad1], associate it with the + source, and insert it in the reception buffer for that source. + + If it is known that more than one source is active in the RTP + session, then it is not possible in general to evaluate if text was + lost when packets were lost. With two active sources and the + recommended number of redundancy generations (one original and two + redundant), it can take a gap of five consecutive lost packets before + any text may be lost, but text loss can also appear if three non- + consecutive packets are lost when they contained consecutive data + from the same source. A simple method for deciding when there is a + risk of resulting text loss is to evaluate if three or more packets + were lost within one second. If this simple method is used, then a + T140block SHOULD be created with a marker for possible text loss + [T140ad1] and associated with the SSRC of the RTP session as a + general input from the mixer. + + Implementations MAY apply more refined methods for more reliable + detection of whether text was lost or not. Any refined method SHOULD + prefer marking possible loss rather than not marking when it is + uncertain if there was loss. + +3.16.3. 
Extracting Text and Handling Recovery + + When applying the following procedures, the effects of possible + timestamp wraparound and the RTP session possibly changing the SSRC + MUST be considered. + + When a packet is received in an RTP session using the packetization + for multiparty-aware endpoints, its T140blocks SHALL be extracted as + described below. + + The source SHALL be extracted from the CSRC list if available, and + otherwise from the SSRC. + + If the received packet is the first packet received from the source, + then all T140blocks in the packet SHALL be retrieved and assigned to + a receive buffer for that source, beginning with the oldest available + redundant generation, continuing with the younger redundant + generations in age order, and finally ending with the primary. + + | Note: The normal case is that in the first packet, only the + | primary data has contents. The redundant data has contents in + | the first received packet from a source only after initial + | packet loss. + + If the packet is not the first packet from a source, then if + redundant data is available, the process SHALL start with the oldest + generation. The timestamp of that redundant data SHALL be created by + subtracting its timestamp offset from the RTP timestamp. If the + resulting timestamp is later than the latest retrieved data from the + same source, then the redundant data SHALL be retrieved and appended + to the receive buffer. The process SHALL be continued in the same + way for all younger generations of redundant data. After that, the + timestamp of the packet SHALL be compared with the timestamp of the + latest retrieved data from the same source and if it is later, then + the primary data SHALL be retrieved from the packet and appended to + the receive buffer for the source. + +3.16.4. Delete BOM + + The Unicode BOM character is used as a start indication and is + sometimes used as a filler or keep-alive by transmission + implementations. 
Any BOM characters SHALL be deleted after + extraction from received packets. + +3.17. Performance Considerations + + This solution has good performance with low text delays, as long as + the mean number of characters per second sent during any 10-second + interval from a number of simultaneously sending participants to a + receiving participant does not reach the "cps" value. At higher + numbers of sent characters per second, jerkiness becomes visible in + the presentation of text. The solution is therefore suitable for + emergency service use, relay service use, and small or well-managed + larger multimedia conferences. In large unmanaged conferences with a + high number of participants, situations might arise, but only on + very rare occasions, where many participants happen to send text + simultaneously. In such circumstances, the result may be an + unpleasantly jerky presentation of text from each sending + participant. It should be noted that it is only the number of users + sending text within the same moment that causes jerkiness, not the + total number of users with real-time text capability. + +3.18. Security for Session Control and Media + + Security mechanisms to provide confidentiality, integrity protection, + and peer authentication SHOULD be applied when possible regarding the + capabilities of the participating devices by using the Session + Initiation Protocol (SIP) over TLS by default according to + Section 3.1.3 of [RFC5630] on the session control level and by + default using DTLS-SRTP [RFC5764] at the media level. In + applications where legacy endpoints without security are allowed, a + negotiation SHOULD be performed to decide if encryption at the media + level will be applied. If no other security solution is mandated for + the application, then the Opportunistic Secure Real-time Transport + Protocol (OSRTP) [RFC8643] is a suitable method to be applied to + negotiate SRTP media security with DTLS.
For simplicity, most SDP + examples below are expressed without the security additions. The + principles (but not all details) for applying DTLS-SRTP security + [RFC5764] are shown in a couple of the following examples. + + Further general security considerations are covered in Section 10. + + End-to-end encryption would require further work and could be based + on WebRTC as specified in Section 1.2 or on double encryption as + specified in [RFC8723]. + +3.19. SDP Offer/Answer Examples + + This section shows some examples of SDP for session negotiation of + the real-time text media in SIP sessions. Audio is usually provided + in the same session, and sometimes also video. The examples only + show the part of importance for the real-time text media. The + examples relate to the single RTP stream mixing for multiparty-aware + endpoints and for multiparty-unaware endpoints. + + | Note: Multiparty real-time text MAY also be provided through + | other methods, e.g., by a Selective Forwarding Middlebox (SFM). + | In that case, the SDP of the offer will include something + | specific for that method, e.g., an SDP attribute or another + | media format. An answer selecting the use of that method would + | accept it via a corresponding acknowledgement included in the + | SDP. The offer may also contain the "rtt-mixer" SDP media + | attribute for the main real-time text media when the offerer + | has this capability for both multiparty methods, while an + | answer, choosing to use SFM, will not include the "rtt-mixer" + | SDP media attribute. 
+ + Offer example for the "text/red" format, multiparty support, and + capability for 90 characters per second: + + m=text 11000 RTP/AVP 100 98 + a=rtpmap:98 t140/1000 + a=fmtp:98 cps=90 + a=rtpmap:100 red/1000 + a=fmtp:100 98/98/98 + a=rtt-mixer + + Answer example from a multiparty-aware device: + + m=text 14000 RTP/AVP 100 98 + a=rtpmap:98 t140/1000 + a=fmtp:98 cps=90 + a=rtpmap:100 red/1000 + a=fmtp:100 98/98/98 + a=rtt-mixer + + Offer example for the "text/red" format, including multiparty and + security: + + a=fingerprint: (fingerprint1) + m=text 11000 RTP/AVP 100 98 + a=rtpmap:98 t140/1000 + a=rtpmap:100 red/1000 + a=fmtp:100 98/98/98 + a=rtt-mixer + + The "fingerprint" is sufficient to offer DTLS-SRTP, with the media + line still indicating RTP/AVP. + + | Note: For brevity, the entire value of the SDP "fingerprint" + | attribute is not shown in this and the following example. + + Answer example from a multiparty-aware device with security: + + a=fingerprint: (fingerprint2) + m=text 16000 RTP/AVP 100 98 + a=rtpmap:98 t140/1000 + a=rtpmap:100 red/1000 + a=fmtp:100 98/98/98 + a=rtt-mixer + + With the "fingerprint", the device acknowledges the use of DTLS-SRTP. + + Answer example from a multiparty-unaware device that also does not + support security: + + m=text 12000 RTP/AVP 100 98 + a=rtpmap:98 t140/1000 + a=rtpmap:100 red/1000 + a=fmtp:100 98/98/98 + +3.20. Packet Sequence Example from Interleaved Transmission + + This example shows a symbolic flow of packets from a mixer, including + loss and recovery. The sequence includes interleaved transmission of + text from two real-time text sources: A and B. P indicates primary + data. R1 is the first redundant generation of data, and R2 is the + second redundant generation of data. A1, B1, A2, etc. are text + chunks (T140blocks) received from the respective sources and sent on + to the receiver by the mixer. X indicates a dropped packet between + the mixer and a receiver. 
The session is assumed to use the original + and two redundant generations of real-time text. + + |-----------------------| + |Seq no 101, Time=20400 | + |CC=1 | + |CSRC list A | + |R2: A1, Offset=600 | + |R1: A2, Offset=300 | + |P: A3 | + |-----------------------| + + Assuming that earlier packets (with text A1 and A2) were received in + sequence, text A3 is received from packet 101 and assigned to + reception buffer A. The mixer is now assumed to have received + initial text from source B 100 ms after packet 101 and will send that + text. Transmission of A2 and A3 as redundancy is planned for 330 ms + after packet 101 if no new text from A is ready to be sent before + that. + + |-----------------------| + |Seq no 102, Time=20500 | + |CC=1 | + |CSRC list B | + |R2: Empty, Offset=600 | + |R1: Empty, Offset=300 | + |P: B1 | + |-----------------------| + + Packet 102 is received. + + B1 is retrieved from this packet. Redundant transmission of B1 is + planned 330 ms after packet 102. + + X------------------------| + X Seq no 103, Time=20730 | + X CC=1 | + X CSRC list A | + X R2: A2, Offset=630 | + X R1: A3, Offset=330 | + X P: Empty | + X------------------------| + + Packet 103 is assumed to be lost due to network problems. + + It contains redundancy for A. Sending A3 as second-level + redundancy is planned for 330 ms after packet 103. + + X------------------------| + X Seq no 104, Time=20800 | + X CC=1 | + X CSRC list B | + X R2: Empty, Offset=600 | + X R1: B1, Offset=300 | + X P: B2 | + X------------------------| + + Packet 104 contains text from B, including new B2 and redundant + B1. It is assumed dropped due to network problems. + + The mixer has A3 redundancy to send, but no new text appears from + A, and therefore the redundancy is sent 330 ms after the previous + packet with text from A.
+ + |------------------------| + | Seq no 105, Time=21060 | + | CC=1 | + | CSRC list A | + | R2: A3, Offset=660 | + | R1: Empty, Offset=330 | + | P: Empty | + |------------------------| + + Packet 105 is received. + + A gap for lost packets 103 and 104 is detected. Assume that no + other loss was detected during the last second. It can then be + concluded that nothing was totally lost. + + R2 is checked. Its original time was 21060-660=20400. A packet + with text from A was received with that timestamp, so nothing + needs to be recovered. + + B1 and B2 still need to be transmitted as redundancy. This is + planned 330 ms after packet 104. That would be at 21130. + + |-----------------------| + |Seq no 106, Time=21130 | + |CC=1 | + |CSRC list B | + | R2: B1, Offset=630 | + | R1: B2, Offset=330 | + | P: Empty | + |-----------------------| + + Packet 106 is received. + + The second-level redundancy in packet 106 is B1 and has a + timestamp offset of 630 ms. The timestamp of packet 106 minus 630 + is 20500, which is the timestamp of packet 102 that was received. + So, B1 does not need to be retrieved. The first-level redundancy + in packet 106 has an offset of 330. The timestamp of packet 106 + minus 330 is 20800. That is later than the latest received packet + with source B. Therefore, B2 is retrieved and assigned to the + input buffer for source B. No primary is available in packet 106. + + After this sequence, A3, B1, and B2 have been received. In this + case, no text was lost. + +3.21. Maximum Character Rate "cps" Setting + + The default maximum rate of reception of "text/t140" real-time text, + as specified in [RFC4103], is 30 characters per second. The actual + rate is calculated without regard to any redundant text transmission + and is, in the multiparty case, evaluated for all sources + contributing to transmission to a receiver. The value MAY be + modified in the "cps" parameter of the "fmtp" attribute for the + "text/t140" format of the "text" media section.
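As an illustration of where the "cps" value lives, the sketch below reads it from the "text" media section of an SDP body and falls back to the default of 30. It is a simplified reading aid, not a full SDP parser: the function name `cps_from_sdp` is hypothetical, and the sketch assumes one "text" media section passed in as a string with a "t140/1000" rtpmap line.

```python
import re

DEFAULT_CPS = 30  # default maximum reception rate per RFC 4103


def cps_from_sdp(media_section: str) -> int:
    """Return the "cps" declared for the t140 payload type, or the default."""
    lines = [line.strip() for line in media_section.splitlines()]

    # Find the payload type number mapped to the "text/t140" format.
    payload_type = None
    for line in lines:
        match = re.match(r"a=rtpmap:(\d+)\s+t140/1000", line, re.IGNORECASE)
        if match:
            payload_type = match.group(1)
            break
    if payload_type is None:
        return DEFAULT_CPS

    # Look for "cps=<n>" among the fmtp parameters of that payload type.
    for line in lines:
        match = re.match(rf"a=fmtp:{payload_type}\s+(.*)", line)
        if match:
            for param in match.group(1).split(";"):
                name, _, value = param.strip().partition("=")
                if name.lower() == "cps" and value.isdigit():
                    return int(value)
    return DEFAULT_CPS
```

Applied to the first offer example in Section 3.19, which carries "a=fmtp:98 cps=90", the function returns 90; for a media section without a "cps" parameter, it returns 30.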
+ + A mixer combining real-time text from a number of sources may + occasionally have a higher combined flow of text coming from the + sources. Endpoints SHOULD therefore include a suitable higher value + for the "cps" parameter, corresponding to its real reception + capability. The default "cps" value 30 can be assumed to be + sufficient for small meetings and well-managed larger conferences + with users only making manual text entry. A "cps" value of 90 can be + assumed to be sufficient even for large unmanaged conferences and for + cases when speech-to-text technologies are used for text entry. This + is also a reachable performance for receivers in modern technologies, + and 90 is therefore the RECOMMENDED "cps" value. See [RFC4103] for + the format and use of the "cps" parameter. The same rules apply for + the multiparty case. + +4. Presentation-Level Considerations + + "Protocol for multimedia application text conversation" [T140] + provides the presentation-level requirements for RTP transport as + described in [RFC4103]. Functions for erasure and other formatting + functions are specified in [T140], which has the following general + statement for the presentation: + + | The display of text from the members of the conversation should be + | arranged so that the text from each participant is clearly + | readable, and its source and the relative timing of entered text + | is visualized in the display. Mechanisms for looking back in the + | contents from the current session should be provided. The text + | should be displayed as soon as it is received. + + Strict application of [T140] is essential for the interoperability of + real-time text implementations and to fulfill the intention that the + session participants have the same information conveyed in the text + contents of the conversation without necessarily having the exact + same layout of the conversation. 
+ + [T140] specifies a set of presentation control codes (Section 4.2.4) + to include in the stream. Some of them are optional. + Implementations MUST ignore optional control codes that they do not + support. + + There is no strict "message" concept in real-time text. The Unicode + Line Separator character SHALL be used as a separator allowing a part + of received text to be grouped in a presentation. The character + combination "CRLF" may be used by other implementations as a + replacement for the Line Separator. The "CRLF" combination SHALL be + erased by just one erasing action, the same as the Line Separator. + Presentation functions are allowed to group text for presentation in + smaller groups than the Line Separators imply and present such groups + with a source indication together with text groups from other sources + (see the following presentation examples). Erasure has no specific + limit by any delimiter in the text stream. + +4.1. Presentation by Multiparty-Aware Endpoints + + A multiparty-aware receiving party presenting real-time text MUST + separate text from different sources and present them in separate + presentation fields. The receiving party MAY separate the + presentation of parts of text from a source in readable groups based + on criteria other than a Line Separator and merge these groups in the + presentation area when it benefits the user to most easily find and + read text from the different participants. The criteria MAY, for + example, be a received comma, a full stop, some other type of phrase + delimiter, or a long pause. + + When text is received from multiple original sources, the + presentation SHALL provide a view where text is added in multiple + presentation fields. 
+ + If the presentation presents text from different sources in one + common area, the presenting endpoint SHOULD insert text from the + local user, where the text ends at suitable points and is merged + properly with received text to indicate the relative timing for when + the text groups were completed. In this presentation mode, the + receiving endpoint SHALL present the source of the different groups + of text. This presentation style is called the "chat" style here and + provides the possibility of following text arriving from multiple + parties and the approximate relative time that text is received as + related to text from the local user. + + A view of a three-party real-time text call in chat style is shown in + this example. + + _________________________________________________ + | |^| + |[Alice] Hi, Alice here. |-| + | | | + |[Bob] Bob as well. | | + | | | + |[Eve] Hi, this is Eve, calling from Paris. | | + | I thought you should be here. | | + | | | + |[Alice] I am coming on Thursday, my | | + | performance is not until Friday morning.| | + | | | + |[Bob] And I on Wednesday evening. | | + | | | + |[Alice] Can we meet on Thursday evening? | | + | | | + |[Eve] Yes, definitely. How about 7pm. | | + | at the entrance of the restaurant | | + | Le Lion Blanc? | | + |[Eve] we can have dinner and then take a walk |-| + |______________________________________________|v| + | <Eve-typing> But I need to be back to |^| + | the hotel by 11 because I need |-| + | | | + | <Bob-typing> I wou |-| + |______________________________________________|v| + | of course, I underst | + |________________________________________________| + + Figure 1: Example of a Three-Party Real-Time Text Call Presented + in Chat Style Seen at Participant Alice's Endpoint + + Presentation styles other than the chat style MAY be arranged. + + Figure 2 shows how a coordinated column view MAY be presented. 
+ + _____________________________________________________________________ + | Bob | Eve | Alice | + |____________________|______________________|_______________________| + | | |I will arrive by TGV. | + |My flight is to Orly| |Convenient to the main | + | |Hi all, can we plan |station. | + | |for the seminar? | | + |Eve, will you do | | | + |your presentation on| | | + |Friday? |Yes, Friday at 10. | | + |Fine, wo | |We need to meet befo | + |___________________________________________________________________| + + Figure 2: An Example of a Coordinated Column View of a + Three-Party Session with Entries Ordered Vertically in + Approximate Time Order + +4.2. Multiparty Mixing for Multiparty-Unaware Endpoints + + When the mixer has indicated multiparty real-time text capability in + an SDP negotiation but the multiparty capability negotiation fails + with an endpoint, the agreed-upon "text/red" or "text/t140" format + SHALL be used and the mixer SHOULD compose a best-effort presentation + of multiparty real-time text in one stream intended to be presented + by an endpoint with no multiparty awareness, when that is desired in + the actual implementation. The following specifies a procedure that + MAY be applied in that situation. + + This presentation format has functional limitations and SHOULD be + used only to enable participation in multiparty calls by legacy + deployed endpoints implementing only RFC 4103 without any multiparty + extensions specified in this document. + + The principles and procedures below do not specify any new protocol + elements. They are instead composed of information provided in + [T140] and an ambition to provide a best-effort presentation on an + endpoint that has functions originally intended only for two-party + calls. + + The mixer performing the mixing for multiparty-unaware endpoints + SHALL compose a simulated, limited multiparty real-time text view + suitable for presentation in one presentation area. 
The mixer SHALL + group text in suitable groups and prepare them for presentation by + inserting a Line Separator between them if the transmitted text did + not already end with a new line (Line Separator or CRLF). A + presentable label SHALL be composed and sent for the source initially + in the session and after each source switch. With this procedure, + the time for switching from transmission of text from one source to + transmission of text from another source depends on the actions of + the users. In order to expedite source switching, a user can, for + example, end its turn with a new line. + +4.2.1. Actions by the Mixer at Reception from the Call Participants + + When text is received by the mixer from the different participants, + the mixer SHALL recover text from redundancy if any packets are lost. + The marker for lost text [T140ad1] SHALL be inserted in the stream if + unrecoverable loss appears. Any Unicode BOM characters, possibly + used for keep-alives, SHALL be deleted. The time of creation of text + (retrieved from the RTP timestamp) SHALL be stored together with the + received text from each source in queues for transmission to the + recipients in order to be able to evaluate text loss. + +4.2.2. Actions by the Mixer for Transmission to the Recipients + + The following procedure SHALL be applied for each multiparty-unaware + recipient of multiparty text from the mixer. + + The text for transmission SHALL be formatted by the mixer for each + receiving user for presentation in one single presentation area. + Text received from a participant SHOULD NOT be included in + transmissions to that participant, because it is usually presented + locally at transmission time. When there is text available for + transmission from the mixer to a receiving party from more than one + participant, the mixer SHALL switch between transmission of text from + the different sources at suitable points in the transmitted stream. 
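The text inserted by the mixer at a source switch for a multiparty-unaware recipient (a Line Separator when the transmitted text did not already end with a new line, followed by a label for the new source) can be sketched as below. The function name, the square-bracket delimiters, and the trailing space after the label are assumptions chosen for illustration; SGR handling and the display count are not covered.

```python
LINE_SEPARATOR = "\u2028"  # Unicode Line Separator
CRLF = "\r\n"              # supported, but not preferred, new line


def switch_prefix(already_sent: str, label: str) -> str:
    """Return the characters to transmit before text from a new source.

    already_sent -- text transmitted to this recipient so far
    label        -- presentable name of the source about to be transmitted
    """
    # A new line is needed only if earlier text exists and did not
    # already end with a Line Separator or CRLF.
    needs_new_line = bool(already_sent) and not already_sent.endswith(
        (LINE_SEPARATOR, CRLF))
    prefix = LINE_SEPARATOR if needs_new_line else ""
    # Delimiters around the label are an assumed presentation choice.
    return prefix + "[" + label + "] "
```

At the start of a session, nothing has been sent, so only the label is produced; after text that ends in the middle of a line, the label is preceded by a Line Separator.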
+ + When switching the source, the mixer SHALL insert a Line Separator if + the already-transmitted text did not end with a new line (Line + Separator or CRLF). A label SHALL be composed of information in the + CNAME and NAME fields in RTCP reports from the participant to have + its text transmitted, or from other session information for that + user. The label SHALL be delimited by suitable characters (e.g., + "[ ]") and transmitted. The CSRC SHALL indicate the selected source. + Then, text from that selected participant SHALL be transmitted until + a new suitable point for switching the source is reached. + + Information available to the mixer for composing the label may + contain sensitive personal information that SHOULD NOT be revealed in + sessions not securely authenticated and confidentiality protected. + Privacy considerations regarding how much personal information is + included in the label SHOULD therefore be taken when composing the + label. + + Seeking a suitable point for switching the source SHALL be done when + there is older text waiting for transmission from any party than the + age of the last transmitted text. Suitable points for switching are: + + * A completed phrase ending with a comma. + + * A completed sentence. + + * A new line (Line Separator or CRLF). + + * A long pause (e.g., > 10 seconds) in received text from the + currently transmitted source. + + * If text from one participant has been transmitted with text from + other sources waiting for transmission for a long time (e.g., > 1 + minute) and none of the other suitable points for switching has + occurred, a source switch MAY be forced by the mixer at the next + word delimiter, and also even if a word delimiter does not occur + within some period of time (e.g., 15 seconds) after the scan for a + word delimiter started. + + When switching the source, the source that has the oldest text in + queue SHALL be selected to be transmitted. 
A character display count + SHALL be maintained for the currently transmitted source, starting at + zero after the label is transmitted for the currently transmitted + source. + + The status SHALL be maintained for the latest control code for Select + Graphic Rendition (SGR) from each source. If there is an SGR code + stored as the status for the current source before the source switch + is done, a reset of SGR SHALL be sent by the sequence SGR 0 [U+009B + U+0000 U+006D] after the new line and before the new label during a + source switch. See Section 4.2.4 for an explanation. This + transmission does not influence the display count. + + If there is an SGR code stored for the new source after the source + switch, that SGR code SHALL be transmitted to the recipient before + the label. This transmission does not influence the display count. + +4.2.3. Actions on Transmission of Text + + Text from a source sent to the recipient SHALL increase the display + count by one per transmitted character. + +4.2.4. Actions on Transmission of Control Codes + + The following control codes, as specified by T.140 [T140], require + specific actions. They SHALL cause specific considerations in the + mixer. Note that the codes presented here are expressed in UTF-16, + while transmission is made in the UTF-8 encoding of these codes. + + BEL (U+0007): Bell. Alert in session. Provides for alerting during + an active session. The display count SHALL NOT be altered. + + NEW LINE (U+2028): Line Separator. Check and perform a source + switch if appropriate. Increase the display count by 1. + + CR LF (U+000D U+000A): A supported, but not preferred, way of + requesting a new line. Check and perform a source switch if + appropriate. Increase the display count by 1. + + INT (ESC U+0061): Interrupt (used to initiate the mode negotiation + procedure). The display count SHALL NOT be altered. + + SGR (U+009B Ps U+006D): Select Graphic Rendition. 
Ps represents the + rendition parameters specified in [ISO6429]. (For freely + available equivalent information, please see [ECMA-48].) The + display count SHALL NOT be altered. The SGR code SHOULD be stored + for the current source. + + SOS (U+0098): Start of String. Used as a general protocol element + introducer, followed by a maximum 256-byte string and the ST. The + display count SHALL NOT be altered. + + ST (U+009C): String Terminator. End of SOS string. The display + count SHALL NOT be altered. + + ESC (U+001B): Escape. Used in control strings. The display count + SHALL NOT be altered for the complete escape code. + + Byte order mark (BOM) (U+FEFF): "Zero width no-break space". Used + for synchronization and keep-alive. It SHALL be deleted from + incoming streams. It SHALL also be sent first after session + establishment to the recipient. The display count SHALL NOT be + altered. + + Missing text mark (U+FFFD): "Replacement character". Represented as + a question mark in a rhombus, or, if that is not feasible, + replaced by an apostrophe ('). It marks the place in the stream + of possible text loss. This mark SHALL be inserted by the + reception procedure in the case of unrecoverable loss of packets. + The display count SHALL be increased by one when sent as for any + other character. + + SGR: If a control code for SGR other than a reset of the graphic + rendition (SGR 0) is sent to a recipient, that control code SHALL + also be stored as the status for the source in the storage for SGR + status. If a reset graphic rendition (SGR 0) originating from a + source is sent, then the SGR status storage for that source SHALL + be cleared. The display count SHALL NOT be increased. + + BS (U+0008): "Back Space". Intended to erase the last entered + character by a source. Erasure by backspace cannot always be + performed as the erasing party intended. 
If an erasing action + erases all text up to the end of the leading label after a source + switch, then the mixer MUST NOT transmit more backspaces. + Instead, it is RECOMMENDED that a letter "X" be inserted in the + text stream for each backspace as an indication of the intent to + erase more. A new line is usually coded by a Line Separator, but + the character combination "CRLF" MAY be used instead. Erasure of + a new line is, in both cases, done by just one erasing action + (backspace). If the display count has a positive value, it SHALL + be decreased by one when the BS is sent. If the display count is + at zero, it SHALL NOT be altered. + +4.2.5. Packet Transmission + + A mixer transmitting to a multiparty-unaware endpoint SHALL send + primary data only from one source per packet. The SSRC SHALL be the + SSRC of the mixer. The CSRC list MAY contain one member and be the + SSRC of the source of the primary data. + +4.2.6. Functional Limitations + + When a multiparty-unaware endpoint presents a conversation in one + display area in a chat style, it inserts source indications for + remote text and local user text as they are merged in completed text + groups. When an endpoint using this layout receives and presents + text mixed for multiparty-unaware endpoints, there will be two levels + of source indicators for the received text: one generated by the + mixer and inserted in a label after each source switch, and another + generated by the receiving endpoint and inserted after each switch + between the local source and the remote source in the presentation + area. This will waste display space and look inconsistent to the + reader. + + New text can be presented from only one source at a time. Switching + the source to be presented takes place at suitable places in the + text, such as the end of a phrase, the end of a sentence, or a Line + Separator, or upon detecting inactivity. 
Therefore, the time to
+   switch to present waiting text from other sources may grow long, and
+   it will vary and depend on the actions of the currently presented
+   source.
+
+   Erasure can only be done up to the latest source switch. If a user
+   tries to erase more text, the erasing actions will be presented as a
+   letter "X" after the label.
+
+   Text loss because of network errors may hit the label between entries
+   from different parties, causing the risk of a misunderstanding
+   regarding which source provided a piece of text.
+
+   Because of these facts, it is strongly RECOMMENDED that multiparty
+   awareness be implemented in real-time text endpoints. The use of the
+   mixing method for multiparty-unaware endpoints should be left for use
+   with endpoints that are impossible to upgrade to become multiparty
+   aware.
+
+4.2.7. Example Views of Presentation on Multiparty-Unaware Endpoints
+
+   The following pictures are examples of the view on a participant's
+   display for the multiparty-unaware case.
+
+   Figure 3 shows how a coordinated column view MAY be presented on
+   Alice's device in a view with two columns. The mixer inserts labels
+   to show how the sources alternate in the column with received text.
+   The mixer alternates between the sources at suitable points in the
+   text exchange so that text entries from each party can be
+   conveniently read.
+
+    ___________________________________________________
+   |       Conference        |          Alice          |
+   |_________________________|_________________________|
+   |                         |I will arrive by TGV.    |
+   |[Bob]: My flight is to   |Convenient to the main   |
+   |Orly.                    |station.                 |
+   |[Eve]: Hi all, can we    |                         |
+   |plan for the seminar.    |                         |
+   |                         |                         |
+   |[Bob]: Eve, will you do  |                         |
+   |your presentation on     |                         |
+   |Friday?                  |                         |
+   |[Eve]: Yes, Friday at 10.|                         |
+   |[Bob]: Fine, wo          |We need to meet befo     |
+   |_________________________|_________________________|
+
+      Figure 3: Alice, Who Has a Conference-Unaware Client, Is
+      Receiving the Multiparty Real-Time Text in a Single Stream
+
+   In Figure 4, there is a tradition in receiving applications to
+   include a label showing the source of the text, here shown with
+   parentheses "()". The mixer also inserts source labels for the
+   multiparty call participants, here shown with brackets "[]".
+
+    _________________________________________________
+   |                                              |^|
+   |(Alice) Hi, Alice here.                       |-|
+   |                                              | |
+   |(mix)[Bob] Bob as well.                       | |
+   |                                              | |
+   |[Eve] Hi, this is Eve, calling from Paris     | |
+   |      I thought you should be here.           | |
+   |                                              | |
+   |(Alice) I am coming on Thursday, my           | |
+   |      performance is not until Friday morning.| |
+   |                                              | |
+   |(mix)[Bob] And I on Wednesday evening.        | |
+   |                                              | |
+   |[Eve] we can have dinner and then walk        | |
+   |                                              | |
+   |[Eve] But I need to be back to                | |
+   |      the hotel by 11 because I need          | |
+   |                                              |-|
+   |______________________________________________|v|
+   | of course, I underst                           |
+   |________________________________________________|
+
+      Figure 4: An Example of a View of the Multiparty-Unaware
+      Presentation in Chat Style, Where Alice Is the Local User
+
+5. Relationship to Conference Control
+
+5.1. Use with SIP Centralized Conferencing Framework
+
+   The Session Initiation Protocol (SIP) conferencing framework, mainly
+   specified in [RFC4353], [RFC4579], and [RFC4575], is suitable for
+   coordinating sessions, including multiparty real-time text. The
+   real-time text stream between the mixer and a participant is one and
+   the same during the conference. Participants get announced by
+   notifications when participants are joining or leaving, and further
+   user information may be provided. The SSRC of the text to expect
+   from joined users MAY be included in a notification.
The + notifications MAY be used for both security purposes and translation + to a label for presentation to other users. + +5.2. Conference Control + + In managed conferences, control of the real-time text media SHOULD be + provided in the same way as for other media, e.g., for muting and + unmuting by the direction attributes in SDP [RFC8866]. + + Note that floor control functions may be of value for real-time text + users as well as for users of other media in a conference. + +6. Gateway Considerations + + Multiparty real-time text sessions may involve gateways of different + kinds. Gateways involved in setting up sessions SHALL correctly + reflect the multiparty capability or unawareness of the combination + of the gateway and the remote endpoint beyond the gateway. + +6.1. Gateway Considerations with Textphones + + One case that may occur is a gateway to the Public Switched Telephone + Network (PSTN) for communication with textphones (e.g., TTYs). + Textphones are limited devices with no multiparty awareness, and it + SHOULD therefore be appropriate for the gateway to not indicate + multiparty awareness for that case. Another solution is that the + gateway indicates multiparty capability towards the mixer and + includes the multiparty mixer function for multiparty-unaware + endpoints itself. This solution makes it possible to adapt to the + functional limitations of the textphone. + + More information on gateways to textphones is found in [RFC5194]. + +6.2. Gateway Considerations with WebRTC + + Gateway operation between RTP-mixer-based multiparty real-time text + and WebRTC-based real-time text may also be required. Real-time text + transport in WebRTC is specified in [RFC8865]. + + A multiparty bridge may have functionality for communicating via + real-time text in both (1) RTP streams with real-time text and (2) + WebRTC T.140 data channels. 
Other configurations may consist of a + multiparty bridge with either technology for real-time text transport + and a separate gateway for conversion of the text communication + streams between RTP and T.140 data channels. + + In WebRTC, it is assumed that for a multiparty session, one T.140 + data channel is established for each source from a gateway or bridge + to each participant. Each participant also has a data channel with a + two-way connection with the gateway or bridge. + + A T.140 data channel used for two-way communication is for text from + the WebRTC user and from the bridge or gateway itself to the WebRTC + user. The label parameter of this T.140 data channel is used as the + NAME field in RTCP to participants on the RTP side. The other T.140 + data channels are only for text from other participants to the WebRTC + user. + + When a new participant has entered the session with RTP transport of + real-time text, a new T.140 data channel SHOULD be established to + WebRTC users with the label parameter composed of information from + the NAME field in RTCP on the RTP side. + + When a new participant has entered the multiparty session with real- + time text transport in a WebRTC T.140 data channel, the new + participant SHOULD be announced by a notification to RTP users. The + label parameter from the WebRTC side or other suitable information + from the session or stream establishment procedure SHOULD be used to + compose the NAME RTCP field on the RTP side. + + When a participant on the RTP side is disconnected from the + multiparty session, the corresponding T.140 data channel(s) SHOULD be + closed. + + When a WebRTC user of T.140 data channels disconnects from the mixer, + the corresponding RTP streams or sources in an RTP-mixed stream + SHOULD be closed. + + T.140 data channels MAY be opened and closed by negotiation or + renegotiation of the session, or by any other valid means, as + specified in Section 1 of [RFC8865]. + +7. 
Updates to RFC 4103 + + This document updates [RFC4103] by introducing an SDP media + attribute, "rtt-mixer", for negotiation of multiparty-mixing + capability with the format described in [RFC4103] and by specifying + the rules for packets when multiparty capability is negotiated and in + use. + +8. Congestion Considerations + + The congestion considerations and recommended actions provided in + [RFC4103] are also valid in multiparty situations. + + The time values SHALL then be applied per source of text sent to a + receiver. + + In the very unlikely event that many participants in a conference + send text simultaneously for a long period of time, a delay may build + up for the presentation of text at the receivers if the limitation in + characters per second ("cps") to be transmitted to the participants + is exceeded. A delay of more than 15 seconds can cause confusion in + the session. It is therefore RECOMMENDED that an RTP mixer discard + such text causing excessive delays and insert a general indication of + possible text loss [T140ad1] in the session. If the main text + contributor is indicated in any way, the mixer MAY avoid deleting + text from that participant. It should, however, be noted that human + creation of text normally contains pauses, when the transmission can + catch up, so that transmission-overload situations are expected to be + very rare. + +9. IANA Considerations + +9.1. Registration of the "rtt-mixer" SDP Media Attribute + + IANA has registered the new SDP attribute "rtt-mixer". 
+
+   Contact name: IESG
+
+   Contact email: iesg@ietf.org
+
+   Attribute name: rtt-mixer
+
+   Attribute semantics: See RFC 9071, Section 2.3
+
+   Attribute value: none
+
+   Usage level: media
+
+   Purpose: To indicate mixer and endpoint support of multiparty mixing
+      for real-time text transmission, using a common RTP stream for
+      transmission of text from a number of sources mixed with one
+      source at a time and where the source is indicated in a single
+      CSRC-list member.
+
+   Charset Dependent: no
+
+   O/A procedures: See RFC 9071, Section 2.3
+
+   Mux Category: NORMAL
+
+   Reference: RFC 9071
+
+10. Security Considerations
+
+   The RTP-mixer model requires the mixer to be allowed to decrypt,
+   pack, and encrypt secured text from conference participants.
+   Therefore, the mixer needs to be trusted to maintain confidentiality
+   and integrity of the real-time text data. This situation is similar
+   to the situation for handling audio and video media in centralized
+   mixers.
+
+   The requirement to transfer information about the user in RTCP
+   reports in SDES, CNAME, and NAME fields, and in conference
+   notifications, may have privacy concerns, as already stated in RFC
+   3550 [RFC3550], and may be restricted for privacy reasons. When used
+   for the creation of readable labels in the presentation, the
+   receiving user will then get a more symbolic label for the source.
+
+   The services available through the real-time text mixer may be of
+   special interest to deaf and hard-of-hearing individuals. Some users
+   may want to refrain from revealing such characteristics broadly in
+   conferences. Conference systems where the mixer is included MAY need
+   to be designed with the confidentiality of such characteristics in
+   mind.
+
+   Participants with malicious intentions may appear and, for example,
+   disrupt the multiparty session by emitting a continuous flow of text.
+   They may also send text that appears to originate from other
+   participants.
Countermeasures should include requiring secure + signaling, media, and authentication, and providing higher-layer + conference functions, e.g., for blocking, muting, and expelling + participants. + + Participants with malicious intentions may also try to disrupt the + presentation by sending incomplete or malformed control codes. + Handling of text from the different sources by the receivers MUST + therefore be well separated so that the effects of such actions only + affect text from the source causing the action. + + Care should be taken to avoid the possibility of attacks by + unauthenticated call participants, and even eavesdropping and + manipulation of content by non-participants, if the use of the mixer + is permitted for users both with and without security procedures. + + As already stated in Section 3.18, security in media SHOULD be + applied by using DTLS-SRTP [RFC5764] at the media level. + + Further security considerations specific to this application are + specified in Section 3.18. + +11. References + +11.1. Normative References + + [ECMA-48] Ecma International, "ECMA-48: Control functions for coded + character sets", 5th edition, June 1991, + <https://www.ecma-international.org/publications-and- + standards/standards/ecma-48/>. + + [ISO6429] ISO/IEC, "Information technology - Control functions for + coded character sets", ISO/IEC ISO/IEC 6429:1992, December + 1992, <https://www.iso.org/obp/ui/#iso:std:iso- + iec:6429:ed-3:v1:en>. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + <https://www.rfc-editor.org/info/rfc2119>. + + [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. + Jacobson, "RTP: A Transport Protocol for Real-Time + Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, + July 2003, <https://www.rfc-editor.org/info/rfc3550>. 
+ + [RFC4102] Jones, P., "Registration of the text/red MIME Sub-Type", + RFC 4102, DOI 10.17487/RFC4102, June 2005, + <https://www.rfc-editor.org/info/rfc4102>. + + [RFC4103] Hellstrom, G. and P. Jones, "RTP Payload for Text + Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005, + <https://www.rfc-editor.org/info/rfc4103>. + + [RFC5630] Audet, F., "The Use of the SIPS URI Scheme in the Session + Initiation Protocol (SIP)", RFC 5630, + DOI 10.17487/RFC5630, October 2009, + <https://www.rfc-editor.org/info/rfc5630>. + + [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer + Security (DTLS) Extension to Establish Keys for the Secure + Real-time Transport Protocol (SRTP)", RFC 5764, + DOI 10.17487/RFC5764, May 2010, + <https://www.rfc-editor.org/info/rfc5764>. + + [RFC6263] Marjou, X. and A. Sollaud, "Application Mechanism for + Keeping Alive the NAT Mappings Associated with RTP / RTP + Control Protocol (RTCP) Flows", RFC 6263, + DOI 10.17487/RFC6263, June 2011, + <https://www.rfc-editor.org/info/rfc6263>. + + [RFC7675] Perumal, M., Wing, D., Ravindranath, R., Reddy, T., and M. + Thomson, "Session Traversal Utilities for NAT (STUN) Usage + for Consent Freshness", RFC 7675, DOI 10.17487/RFC7675, + October 2015, <https://www.rfc-editor.org/info/rfc7675>. + + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, + May 2017, <https://www.rfc-editor.org/info/rfc8174>. + + [RFC8865] Holmberg, C. and G. Hellström, "T.140 Real-Time Text + Conversation over WebRTC Data Channels", RFC 8865, + DOI 10.17487/RFC8865, January 2021, + <https://www.rfc-editor.org/info/rfc8865>. + + [RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: + Session Description Protocol", RFC 8866, + DOI 10.17487/RFC8866, January 2021, + <https://www.rfc-editor.org/info/rfc8866>. 
+ + [T140] ITU-T, "Protocol for multimedia application text + conversation", ITU-T Recommendation T.140, February 1998, + <https://www.itu.int/rec/T-REC-T.140-199802-I/en>. + + [T140ad1] ITU-T, "Recommendation T.140 Addendum", February 2000, + <https://www.itu.int/rec/T-REC-T.140-200002-I!Add1/en>. + +11.2. Informative References + + [RFC4353] Rosenberg, J., "A Framework for Conferencing with the + Session Initiation Protocol (SIP)", RFC 4353, + DOI 10.17487/RFC4353, February 2006, + <https://www.rfc-editor.org/info/rfc4353>. + + [RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, Ed., "A + Session Initiation Protocol (SIP) Event Package for + Conference State", RFC 4575, DOI 10.17487/RFC4575, August + 2006, <https://www.rfc-editor.org/info/rfc4575>. + + [RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol + (SIP) Call Control - Conferencing for User Agents", + BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006, + <https://www.rfc-editor.org/info/rfc4579>. + + [RFC5194] van Wijk, A., Ed. and G. Gybels, Ed., "Framework for Real- + Time Text over IP Using the Session Initiation Protocol + (SIP)", RFC 5194, DOI 10.17487/RFC5194, June 2008, + <https://www.rfc-editor.org/info/rfc5194>. + + [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, + DOI 10.17487/RFC7667, November 2015, + <https://www.rfc-editor.org/info/rfc7667>. + + [RFC8643] Johnston, A., Aboba, B., Hutton, A., Jesske, R., and T. + Stach, "An Opportunistic Approach for Secure Real-time + Transport Protocol (OSRTP)", RFC 8643, + DOI 10.17487/RFC8643, August 2019, + <https://www.rfc-editor.org/info/rfc8643>. + + [RFC8723] Jennings, C., Jones, P., Barnes, R., and A.B. Roach, + "Double Encryption Procedures for the Secure Real-Time + Transport Protocol (SRTP)", RFC 8723, + DOI 10.17487/RFC8723, April 2020, + <https://www.rfc-editor.org/info/rfc8723>. 
+
+   [RFC8825] Alvestrand, H., "Overview: Real-Time Protocols for
+              Browser-Based Applications", RFC 8825,
+              DOI 10.17487/RFC8825, January 2021,
+              <https://www.rfc-editor.org/info/rfc8825>.
+
+Acknowledgements
+
+   The author wants to thank the following persons for support, reviews,
+   and valuable comments: Bernard Aboba, Amanda Baber, Roman Danyliw,
+   Spencer Dawkins, Martin Duke, Lars Eggert, James Hamlin, Benjamin
+   Kaduk, Murray Kucherawy, Paul Kyzivat, Jonathan Lennox, Lorenzo
+   Miniero, Dan Mongrain, Francesca Palombini, Colin Perkins, Brian
+   Rosen, Rich Salz, Jürgen Schönwälder, Robert Wilton, Dale Worley,
+   Yong Xin, and Peter Yee.
+
+Author's Address
+
+   Gunnar Hellström
+   Gunnar Hellström Accessible Communication
+   SE-13670 Vendelsö
+   Sweden
+
+   Email: gunnar.hellstrom@ghaccess.se